using wget to get information from a web page, then using grep and sed to remove formatting

     
    process:  download files (script: base-get-book.sh )
        #####################################
              wget -O INFO-1 "http://www...page 1
              wget -O INFO-2 "http://www...page 2
              wget -O INFO-3 "http://www...page 3
        #####################################
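The repeated wget lines above could also be written as a loop. This is only a sketch: it assumes the page number is the last part of the address, which the truncated URLs above do not confirm. The function name fetch_pages, the WGET override variable, and the example.com address are all made up for illustration.

```shell
#!/bin/bash
# fetch_pages BASE_URL LAST_PAGE
# Assumption: page N lives at BASE_URL followed by N (hypothetical layout).
fetch_pages() {
    local base_url=$1 last=$2 n
    for n in $(seq 1 "$last"); do
        # -O names the output file INFO-<n>, matching the commands above.
        # WGET is a hypothetical override: set WGET=echo for a dry run.
        ${WGET:-wget} -O "INFO-$n" "${base_url}${n}"
    done
}

# example call (placeholder address, substitute the real one):
#   fetch_pages "http://www.example.com/page" 3
```

Running it with WGET=echo prints the commands instead of downloading, which is a cheap way to check the addresses first.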
      process:  strip out formatting to get specific content
         (script: extract-text-book.sh)
        #####################################
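        #!/bin/bash
        #####################################
        # raw downloads are moved into INFO/ once each one has been processed
        mkdir -p INFO
        for x in INFO-*
            do
        #####################################
        # keep only lines tagged "text $x", drop span contents, HTML tags and bracketed notes,
        # then start a new line at each number (\n in the replacement needs GNU sed)
        grep "text $x" "$x" | sed 's/New American/ New American/g' | \
                sed 's/[^<]*<\/span>/ /g'  | \
                sed 's/<[^>]*>//g' | sed 's/ \[[^]]*\]//g' | sed 's/ \([1-9]\)/\n\1/g'  > "$x.txt"
        #####################################
        mv "$x" INFO
            done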
        #####################################
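To see what the last two sed steps do, here is a small sketch that applies only the tag-stripping and line-splitting expressions to a made-up line. The function name strip_and_split and the sample input are illustrative, not taken from the real pages.

```shell
#!/bin/bash
# strip_and_split: read marked-up text on stdin, print one numbered entry per line
strip_and_split() {
    # 1st expression: delete anything that looks like a tag
    # 2nd expression: turn "space + digit 1-9" into "newline + digit";
    #   the backslash-newline form is used here because \n in the
    #   replacement is a GNU sed extension
    sed -e 's/<[^>]*>//g' -e 's/ \([1-9]\)/\
\1/g'
}

# made-up sample line, for illustration only
printf '%s\n' '<p>1 first entry <b>bold</b> text 2 second entry 10 tenth entry</p>' | strip_and_split
```

Note that the split fires on any space followed by a digit, so a number inside the text itself also starts a new line. That is one reason the files still need a manual pass in vi later.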
      process:  make sure all files have proper names for sorting
        ls > fix
        #####################################
        vi fix 
            delete lines whose numbers already sort correctly, e.g.   filename-10.txt
            (also delete the lines for "fix" itself and the INFO directory)
            keep lines that need a leading zero added, then:
        :%s/.*/mv & &/g
            result:     mv filename-1.txt filename-1.txt
            then move the cursor to the number in the second name and insert a 0 before the 1, to get:  filename-01.txt
            repeat for each remaining line (usually 1-9).
        :wq
        #####################################
        sh ./fix ; rm -f fix
        #####################################
        ls -al      (the files should now list in numeric order)
        #####################################
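The hand edit in vi can be replaced by a short loop when the names follow the pattern name-N.txt. This is a sketch under that assumption; the function name pad_names is made up here.

```shell
#!/bin/bash
# pad_names: rename NAME-N.txt to NAME-0N.txt for single-digit N
# in the current directory; two-digit names are left alone.
pad_names() {
    local f new
    for f in *-[1-9].txt; do
        [ -e "$f" ] || continue        # the glob matched nothing
        # ${f%-*}  = everything before the last dash  (filename)
        # ${f##*-} = everything after the last dash   (1.txt)
        new="${f%-*}-0${f##*-}"
        mv -- "$f" "$new"
    done
}
```

Call pad_names from the directory that holds the .txt files, then run ls to confirm the order.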
      process:  remove remaining internal brackets
        use script to remove brackets:      
                    sh ./remove-brackets.sh *
        #####################################
       script: remove-brackets.sh 
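        #!/bin/bash
        # removes bracketed markers of one or two characters, e.g. [a] or [12],
        # from every .txt file in the current directory (command-line arguments are ignored)
        echo "remove bracketed footnotes"
        perl -pi -e 's/\[.\]//g' *.txt
        perl -pi -e 's/\[..\]//g' *.txt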
        #####################################
     process:  edit all the files to remove errors and put each numbered entry on a single line 
            vi INFO-*
        once every line starts with its number and holds all of its text, prefix the lines, then write and move to the next file:
        :%s/.*/INFO 9:&/g
        :wn
        :%s/.*/INFO 10:&/g
        :wn
        :%s/.*/INFO 11:&/g
        :wn
            #  where INFO is the book or file name, and 9 is the chapter.  
            # The ampersand stands for the matched line, so the original number and text follow the prefix
            repeat until all files are edited
        #####################################
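The prefix step can also be done outside vi. A minimal sketch, assuming the book name contains no slash or ampersand (either would confuse sed); the function name prefix_chapter is made up for illustration.

```shell
#!/bin/bash
# prefix_chapter BOOK CHAPTER FILE
# Puts "BOOK CHAPTER:" at the start of every line of FILE,
# the same result as  :%s/.*/BOOK CHAPTER:&/g  in vi.
prefix_chapter() {
    # write to a temporary file first, then replace the original
    sed "s/^/$1 $2:/" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

# example call:
#   prefix_chapter INFO 9 INFO-09.txt
```

This only saves typing. The files still need to be checked by eye first, since the prefix is added to every line whether or not it is correct.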
     process:  copy files to proper directories and other servers:
        cp INFO-*.txt ../SORTED-INFO
        scp -r ../SORTED-INFO/ 192.168.11.11:/home/luser/FILES/SORTED-INFO
        or 
         rsync -r ../SORTED-INFO/ 192.168.11.11:/home/luser/FILES/SORTED-INFO
        #####################################
     repeat until all 1,189 chapters are cleaned up and sorted... 
        then check that the combined file has 31,102 lines and, depending on the version, 781,621 words
        cat INFO-*.txt > total-info.txt ; wc total-info.txt
    (note: as of 3/7/2017 I'm on 14/66)
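The line count can be checked by a script instead of by eye. A sketch; the function name check_lines is made up, and the expected number is passed in rather than hard-coded.

```shell
#!/bin/bash
# check_lines FILE EXPECTED
# Succeeds when FILE has exactly EXPECTED lines, fails otherwise.
check_lines() {
    local actual
    actual=$(wc -l < "$1")
    actual=$((actual))      # drop the padding some wc versions print
    if [ "$actual" -eq "$2" ]; then
        echo "ok: $actual lines"
    else
        echo "mismatch: $actual lines, expected $2"
        return 1
    fi
}

# example call:
#   check_lines total-info.txt 31102
```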
        


