using sed in vi to replace the pipe symbol (|) with a colon (:)
28 Apr 2018
OBJECTIVE: replace the pipes (|) with a space and a colon, so "Gen|1|1|" reads "Gen 1:1"
------------------------------------------------
The copy of the LXX and Greek New Testament that I found online uses pipe symbols to separate book, chapter, verse and text, which makes it harder to read:
--> head -n 2 01g_Gen.txt
Gen|1|1|ἐν ἀρχῇ ἐποίησεν ὁ θεὸς τὸν οὐρανὸν καὶ τὴν γῆν
Gen|1|2|ἡ δὲ γῆ ἦν ἀόρατος καὶ ἀκατασκεύαστος καὶ σκότος ἐπάνω τῆς ἀβύσσου καὶ πνεῦμα θεοῦ ἐπεφέρετο ἐπάνω τοῦ ὕδατος
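this worked to replace the pipes by hand in vi (Y and X are throwaway markers: Y tags the colon after the book name, X the colon after the verse number, and each marker is then swapped, together with its colon, for a space):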
----------------------------------------------------------------------
<esc>:%s/|/:/g
<esc>:%s/[a-z]:/&Y/g
<esc>:%s/:Y/ /g
<esc>:%s/[0-9]:[0-9].*:/&X/g
<esc>:%s/:X/ /g
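The five substitutions above could probably be collapsed into one. A sketch of a one-pass alternative, assuming every line starts with book|chapter|verse| (the function name pipes_to_ref is made up for illustration): \( \) capture each field up to the next pipe, and \1 \2 \3 put them back with a space and a colon. The same pattern should also work inside vi as :%s/^\([^|]*\)|\([^|]*\)|\([^|]*\)|/\1 \2:\3 /

```shell
#!/bin/sh
# One-pass sed version of the vi steps above (a sketch).
# Assumes each line starts with three pipe-terminated fields:
#   book|chapter|verse|text
# [^|]* matches up to the next pipe, so everything after the third
# pipe (the verse text) is left untouched, colons included.
pipes_to_ref() {    # usage: pipes_to_ref [FILE...]  (reads stdin if no FILE)
    sed 's/^\([^|]*\)|\([^|]*\)|\([^|]*\)|/\1 \2:\3 /' "$@"
}

# Example (writes a converted copy, original untouched):
#   pipes_to_ref 01g_Gen.txt > test
```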
I had to copy the original over the test file a few times, and since cp is aliased on my system, I called /bin/cp directly:
---------------------------------------------------------------------------
--> /bin/cp 01g_Gen.txt test
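/bin/cp works because an alias only replaces the bare command name. Two other common ways to skip an alias (assuming something like alias cp='cp -i') are \cp and the POSIX command builtin; fresh_copy below is just a hypothetical helper name:

```shell
#!/bin/sh
# Overwrite the scratch copy with a fresh copy of the original.
# "command cp" bypasses any cp alias or shell function, so an alias
# such as cp='cp -i' cannot add an overwrite prompt; -f forces it.
fresh_copy() {    # usage: fresh_copy ORIGINAL COPY
    command cp -f -- "$1" "$2"
}

# Example: fresh_copy 01g_Gen.txt test
```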
this worked with perl at the command line:
------------------------------------------------
--> perl -pi -e 's/\|/:/g' test ; perl -pi -e 's/(.*):(.*):(.*):/$1 $2:$3 /' test ; head -n 2 test
Gen 1:1 ἐν ἀρχῇ ἐποίησεν ὁ θεὸς τὸν οὐρανὸν καὶ τὴν γῆν
Gen 1:2 ἡ δὲ γῆ ἦν ἀόρατος καὶ ἀκατασκεύαστος καὶ σκότος ἐπάνω τῆς ἀβύσσου καὶ πνεῦμα θεοῦ ἐπεφέρετο ἐπάνω τοῦ ὕδατος
testing results
-------------------------
-->diff 01g_Gen.txt test | wc -l
3068
-->cat 01g_Gen.txt | wc -l
1533
-->cat test | wc -l
1533
-->grep "|" test
(nothing)
-->grep "|" 01g_Gen.txt | wc -l
1533
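diff prints each changed line twice (the old copy prefixed with < and the new one with >) plus a hunk header such as 1,1533c1,1533 and a --- separator, so when every line differs, diff | wc -l comes to 2 x 1533 + 2 = 3068. Counting only the < lines gives the number of changed lines directly; changed_lines is a hypothetical helper name, and it assumes the two files have the same number of lines:

```shell
#!/bin/sh
# Number of lines that differ between two line-aligned files.
# In diff's normal output every old (changed) line starts with "<",
# so counting those counts each changed line once.
# "|| true" keeps the exit status 0 when grep finds no match (it still prints 0).
changed_lines() {    # usage: changed_lines ORIGINAL CONVERTED
    diff "$1" "$2" | grep -c '^<' || true
}

# Example: changed_lines 01g_Gen.txt test
```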
------------------------------------------------
RABBIT TRAIL
------------------------------------------------
--> cat test | wc
1533 37871 388211
------------------------------------------------
--> cat 01g_Gen.txt | wc
1533 34807 388211
------------------------------------------------
--> bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
37871-34807
3064
quit (to exit)
------------------------------------------------
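bc also reads from a pipe (echo '37871-34807' | bc), and for whole numbers the shell's own arithmetic expansion works without starting bc at all. A minimal sketch using the word counts from above:

```shell
#!/bin/sh
# POSIX shell arithmetic: integer math without an interactive bc session.
new_words=37871    # from: cat test | wc
old_words=34807    # from: cat 01g_Gen.txt | wc
diff_words=$((new_words - old_words))
echo "$diff_words"    # prints 3064
```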
--- why are we off by 4??? ---
------------------------------------------------
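One way to chase the mismatch is to compare word counts line by line and list every line that did not gain exactly two words (Gen|1|1|ἐν is one word, Gen 1:1 ἐν is three). A sketch with awk; word_delta_report is a hypothetical name, and awk's default field splitting (spaces and tabs) is close to, but not exactly, what wc counts as a word:

```shell
#!/bin/sh
# List lines whose word count did not grow by exactly 2.
# While awk reads the first file NR == FNR, so the first rule stores
# that file's field count per line; the second rule then compares each
# line of the second file with the stored count for the same line number.
word_delta_report() {    # usage: word_delta_report ORIGINAL CONVERTED
    awk 'NR == FNR { before[FNR] = NF; next }
         NF - before[FNR] != 2 { print FNR ": " before[FNR] " -> " NF }' "$1" "$2"
}

# Example: word_delta_report 01g_Gen.txt test
```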
--> cmp 01g_Gen.txt test
01g_Gen.txt test differ: char 4, line 1
------------------------------------------------
--> head -n 1 01g_Gen.txt ; head -n 1 test
Gen|1|1|ἐν ἀρχῇ ἐποίησεν ὁ θεὸς τὸν οὐρανὸν καὶ τὴν γῆν
Gen 1:1 ἐν ἀρχῇ ἐποίησεν ὁ θεὸς τὸν οὐρανὸν καὶ τὴν γῆν
------------------------------------------------
--> head -n 1 01g_Gen.txt | wc ; head -n 1 test | wc
1 10 108
1 12 108
------------------------------------------------
wait... the original Greek "plain" text may have come from a Microsoft (Windows) source, with DOS-style line endings (the "^M" carriage returns)!
-->cat 01g_Gen.txt | col -b > clean-01g_Gen.txt
--> diff clean-01g_Gen.txt 01g_Gen.txt | wc -l
3068
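Files saved on Windows end each line with a carriage return plus newline (CRLF), and that carriage return is what shows up as ^M. col -b happened to remove it here, but col is really a filter for reverse line feeds and, unless -x is given, may also turn runs of spaces into tabs; tr -d '\r' deletes only the carriage returns. A sketch with two hypothetical helper names:

```shell
#!/bin/sh
# Count lines that contain a carriage return (the ^M of CRLF endings).
# printf '\r' produces a literal CR for grep to search for;
# "|| true" keeps the exit status 0 when there are no matches.
count_cr_lines() {    # usage: count_cr_lines FILE
    grep -c "$(printf '\r')" "$1" || true
}

# Write a copy of IN to OUT with every carriage return removed.
strip_cr() {          # usage: strip_cr IN OUT
    tr -d '\r' < "$1" > "$2"
}

# Example: count_cr_lines 01g_Gen.txt
#          strip_cr 01g_Gen.txt clean-01g_Gen.txt
```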
------------------------------------------------
ok, let's rerun this process without the "^M" or other MS characters...
--> head -n 1 01g_Gen.txt | wc ; head -n 1 test | wc
1 10 107
1 12 107
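one less byte per line (presumably the "^M")... guess that makes sense, but it didn't explain
the difference between 3064 and the expected 3068.
(In hindsight, 3068 was the wrong target: it is the number of lines diff printed, not a word count. Each converted line gains two words, 10 -> 12 for Gen 1:1, so the expected word difference is 2 x 1533 = 3066, and the real gap is 2 words, not 4.)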
At this point the text appears to be OK after the perl commands, so I will use this
command string to replace the pipes for readability.
the working command string for the Greek "plain" text, replacing the pipes with a space and colons:
--> perl -pi -e 's/\|/:/g' test ; perl -pi -e 's/(.*):(.*):(.*):/$1 $2:$3 /' test ; head -n 2 test
(replacing test with the actual file names... will need to get on the html copies too... may attempt
to target only the lines within the body of the html... haven't tried that before... would think
that replacing the "%" in ":%s" with "x,y", where x = first line and y = last line, should work... let's see:)
oh my... will save this adventure for a later date... :) 4/28/2018
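will figure out how to do in perl what can be done with sed inside vi... there are differences... (for example, in vi a bare | is a literal pipe, while in perl it means "or" and has to be escaped as \|, which is why the perl command uses s/\|/:/g)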
and... one can do a LOT of damage with perl on the command line... so, ALWAYS test with a COPY of your real file.
(then remember to remove the extra copies before you forget what they were for...)
At this point, I will test the working perl command on an html file and see if I break anything.
Theoretically there should not be any pipe symbols in my three translation Bible pages, other than those from the Greek.
...special thanx to Kevin P. for the inspiration and ideas, always good to talk with you.
------------------------------------------------
JohnMeister.com