Using perl to fix text formatting issues
These commands can be run inside a text document from the vi editor; in fact, that is what I had been doing repeatedly.
I eventually modified the sed strings to work with command-line perl and incorporated them into a cleanup script.
This script can be run on the same text multiple times without causing any harm.
The first line, #!/bin/bash, is the shell directive: all commands in this file should be executed by the bash shell.
The second line is an echo statement that prints a message to the screen to let you know the script ran.
The first two perl lines remove short bracketed markers (one or two characters between square brackets), such as footnote references.
The other lines add and adjust spaces after specific punctuation.
###############################################################################################
#!/bin/bash
echo "remove bracketed footnotes, fixed commas and semicolon spacing"
perl -pi -e 's/\[.\]//g' *.txt # remove bracketed text such as a footnote
perl -pi -e 's/\[..\]//g' *.txt # remove bracketed text such as a footnote
perl -pi -e 's/,/, /g' *.txt # adds a space after a comma
perl -pi -e 's/, +/, /g' *.txt # collapses any extra spaces after a comma into a single space
perl -pi -e 's/;/; /g' *.txt # adds a space after a semicolon
perl -pi -e 's/:/: /g' *.txt # adds a space after a colon
perl -pi -e 's/\./. /g' *.txt # adds a space after a period
perl -pi -e 's/\?/? /g' *.txt # adds a space after a question mark (the ? must be escaped in the pattern)
###############################################################################################
# The following sed and perl commands replace all the commented-out lines below, using a bracket expression [A-Z] to match any uppercase letter
###############################################################################################
sed -i 's/[A-Z]/ &/g' *.txt # adds a space before each uppercase letter (& stands for the matched letter)
perl -pi -e 's/ {2,}/ /g' *.txt # collapses two or more spaces into one
###############################################################################################
# perl -pi -e 's/A/ A/g' *.txt # adds a space before an uppercase A
# perl -pi -e 's/ +A/ A/g' *.txt # removes any extra spaces before an uppercase A
# perl -pi -e 's/T/ T/g' *.txt # adds a space before an uppercase T
# perl -pi -e 's/ +T/ T/g' *.txt # removes any extra spaces before an uppercase T
# perl -pi -e 's/W/ W/g' *.txt # adds a space before an uppercase W
# perl -pi -e 's/ +W/ W/g' *.txt # removes any extra spaces before an uppercase W
# perl -pi -e 's/Y/ Y/g' *.txt # adds a space before an uppercase Y
# perl -pi -e 's/ +Y/ Y/g' *.txt # removes any extra spaces before an uppercase Y
###############################################################################################
# need to figure out bracket in perl: perl -pi -e 's/(.*)/<img src=\"$1\"><br>$1<hr>/'
###############################################################################################