Session #112 - Nov 10, 2017 - FRIDAY - Linux at Lunch

KEY WORD SCRIPT walk through! (I skipped a session on 3 Nov 2017.)

The Key Word Script uses regular expressions and command-line tools (grep, sed, awk, cut, uniq, sort, perl, tr and vi) to extract KEY WORDS from a text document. Key words are, hopefully, proper names and places: words beginning with a capital letter. We'll walk through this script line by line; it has already spun off a few modified versions that create different content.

The script produces two key-word reports: one that lists how many times each word appears, and another that shows each word in context using grep -C 1, which prints the matching line plus the line before and the line after.

It's possible to insert HTML markup into this script to create a web page, as I have demonstrated previously with: http://johnmeister.com/linux/Scripts/mk-webpage-2015-05-08.html

I know there are GUI-based tools like WordPress that might create nicer pages, but those tools have security issues, require a database, and are not trivial to install, learn, use or manage. SIMPLE ALWAYS WORKS.

It's interesting to see the names and places listed in just ONE of the 66 books:
http://johnmeister.com/bible/BibleSummary/KEY-WORDs/01-GENESIS/ZLIST-names-01k_Gen.txt

The script itself: http://johnmeister.com/bible/BibleSummary/KEY-WORDs/GET-KEY.sh.txt
(I will create a copy of this on johnmeister.com/linux that works with generic text files.)

------------------------------------------------------------
#!/bin/bash
### john meister 29 october 2017 - copyright 2017
### script to find Key words - upper case - Who and Where
USAGE="GET-KEY.sh.txt , e.g. GET-KEY.sh.txt 02n_Exo.txt"
[ -r "$1" ] || { echo "$USAGE"; exit 1; }   ### stop if no readable input file was given
#########################################################################
cat "$1" | tr -d '[{}();":,!.?]' > T2       ### remove most special characters, save as T2
perl -pi -e 's/ /\n/g' T2                    ### put each word on its own line
grep -v "^'" T2 > T3                         ### drop words that begin with a tick, save as T3
cat T3 | sort | uniq | grep -v '^[a-z]' | grep -v '^[0-9]' > T4   ### sort, remove lowercase words and numbers, save as T4
rm -f T2 T3                                  ### delete temporary files
##########################################################################
echo "edit T4 in vi to remove non-keywords; :wq will resume; press Enter to start"
read
vi T4     # T4 will be moved to ZLIST-names when done
#########################
OUT="KEY-names-with-VS-$1"    # Key names with counts and verses
OUT2="TIMES-names-$1"         # Key names with counts only
cat /dev/null > "$OUT"        # start both output files empty
cat /dev/null > "$OUT2"
for Z in `cat T4`
do
  echo "----------------------------------------------------------------------" | tee -a "$OUT"
  COUNT=$(grep -c -- "$Z" "$1")   # number of lines containing the word
  ####################
  echo "The word \"$Z\" in \"$1\" is found $COUNT times." | tee -a "$OUT2"
  echo "======================================================================" | tee -a "$OUT2"
  ####################
  echo "The word \"$Z\" in \"$1\" occurs in these verses $COUNT times:" | tee -a "$OUT"
  echo "----------------------------------------------------------------------" | tee -a "$OUT"
  grep -C 1 -- "$Z" "$1" | tee -a "$OUT"
  echo "======================================================================" | tee -a "$OUT"
done
###########################################################
mv T4 "ZLIST-names-$1"
##########################################################
------------------------------------------------------------

(The original counted matches with "grep $Z $1 | wc -l > T1" and then cut columns 5-9 or 6-9 out of T1; that only works when wc pads its output, which GNU wc does not do when reading from a pipe, so the count is taken with grep -c instead.)

So, for the commands "GET-KEY.sh.txt 01k_Gen.txt" and "GET-KEY.sh.txt 01n_Gen.txt", the extracted files look like this:

KEY-names-with-VS-01k_Gen.txt   2017-10-30 00:40   1.2M
KEY-names-with-VS-01n_Gen.txt   2017-10-30 00:40   1.2M
TIMES-names-01k_Gen.txt         2017-10-30 00:40    69K
TIMES-names-01n_Gen.txt         2017-10-30 00:40    69K
ZLIST-names-01k_Gen.txt         2017-10-30 00:40   3.8K
ZLIST-names-01n_Gen.txt         2017-10-30 00:40   3.9K

In addition, I want to modify the output of http://johnmeister.com/bible/BibleSummary/KEY-WORDs/01-GENESIS/KEY-names-with-VS-01k_Gen.txt so that it only shows the references, e.g.:

----------------------------------------------------------------------
The word "Achbor" in "01k_Gen.txt" occurs 2 times: Gen 36:38; Gen 36:39
======================================================================

or:

----------------------------------------------------------------------
The word "Achbor" in "01k_Gen.txt" occurs 2 times:
Gen 36:38; Gen 36:39
======================================================================

SIMPLE is portable, secure and easily modified, but it may take more effort and understanding, may include more "steps", and sometimes requires manual intervention; NOTICE the USE of "read" in the script above. I had to manually edit the "key word" list to remove unimportant words like "The", "And", "Because", etc., along with a few other items that didn't filter out properly.
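The reference-only layout wanted for "Achbor" could be produced with something like the sketch below. It is a hypothetical helper (show_refs is my name for it, not part of GET-KEY.sh.txt), and it ASSUMES every line of the input file begins with a reference in the form "Book Chapter:Verse", e.g. "Gen 36:38 And Saul died..."; if the text file is laid out differently, the awk fields would need to change. It also uses grep -w (whole words only), so unlike the plain grep in the script it will not count "Achbor" inside a longer word.

```shell
#!/bin/bash
# Hypothetical sketch: print one reference-only entry for a key word.
# ASSUMPTION: each input line starts with "Book Chapter:Verse", e.g.
#   Gen 36:38 And Saul died, and Baalhanan the son of Achbor reigned ...
show_refs() {
  local word="$1" file="$2" count refs
  # -c counts matching lines; -w matches whole words only
  count=$(grep -cw -- "$word" "$file")
  # keep the first two fields (the reference) of each matching line,
  # joined with "; " and with no trailing separator
  refs=$(grep -w -- "$word" "$file" |
         awk '{ printf "%s%s %s", sep, $1, $2; sep = "; " }')
  echo "----------------------------------------------------------------------"
  echo "The word \"$word\" in \"$file\" occurs $count times: $refs"
  echo "======================================================================"
}
```

It could then replace the grep -C 1 section of the loop, e.g. "for Z in `cat ZLIST-names-01k_Gen.txt`; do show_refs "$Z" 01k_Gen.txt; done > REFS-names-01k_Gen.txt", where the REFS-names file name is only an example.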
It is possible that I could create a "heredoc" to filter out unimportant words like "The" and "And", but once the text has been split into one word per line, a prefix filter such as grep -v "^And" would also remove every other word that starts with the same letters (e.g. "And" would also remove "Andover"). That would push the filtering back into the text before the newline insertion, and since one has no idea in advance which words are "important" or of interest, it's better to remove them manually.
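If some of that filtering were scripted, an exact whole-line match sidesteps the prefix problem: with grep -x a pattern must match the entire line, and with -F it is treated as a fixed string rather than a regular expression, so the stop word "And" removes only the line "And" and leaves "Andover" alone. A minimal sketch, assuming the one-word-per-line list T4 from the script and a hypothetical helper named filter_stopwords whose stop words come from a heredoc. It does not solve the harder problem of knowing which words matter, so the manual vi pass would still be needed for anything not on the list.

```shell
#!/bin/bash
# Hypothetical sketch: remove common words from a one-word-per-line list.
# -v inverts the match, -x requires the WHOLE line to match, and -F treats
# each stop word as a fixed string, so "And" does not remove "Andover".
filter_stopwords() {
  local list="$1" stopwords
  # stop words, one per line, supplied through a heredoc
  stopwords=$(cat <<'EOF'
The
And
Because
EOF
)
  # process substitution hands the stop-word list to grep as a pattern file
  grep -vxF -f <(printf '%s\n' "$stopwords") -- "$list"
}
```

For example, "filter_stopwords T4 > T5" just before the vi step would leave fewer words to delete by hand (T5 is an example name).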
