how to fix 3,359 pages with an incomplete URL, using grep, find and perl...
fixing embedded HTML tags in a large number of files
# fixing BibleTech
# BAD STRING, needs .net: a href="http://BibleTech" target
Updated the HTML files some time ago, changing a URL or a path, and accidentally dropped the .net from the domain in a large number of links, leaving:
a href="http://BibleTech" target
Have used perl before to fix other broken or outdated URLs on the server (each line below starts with the script or notes file it was copied from):
# fix-domains-dec-2017.sh:# find /home/luser/httpd/ -type f -name "*.html" -exec perl -pi -e 's$http://linuxmeister.net$http://johnmeister.com/linux$g' {} \;
# fix-domains-dec-2017.sh:find /home/luser/httpd/ -type f -name "*.html" -exec perl -pi -e 's$linuxmeister.net$johnmeister.com/linux$g' {} \;
# x-perl-find-fix-linuxmeister.net.txt:--> find . -type f -name '*.html' -exec perl -pi -e 's$linuxmeister.net$johnmeister.com/linux$g' {} \;
#
TESTING the command on a single file FIRST
- first find a file in the current directory
- then execute the perl string at the command line
- then check to see if it fixed it with a grep
- then open the test file with vi or more and make sure nothing else broke
------------------------------------------------
look for and pick a file:
--> grep "a href=\"http://BibleTech\" target" *.html
Books-by-John.html:<a href="http://BibleTech" target="_blank">BibleTech</a>- Bible overview
use find with perl to edit ONE of the listed files (here The_Art_of_Linux_System_Administration.html):
------------------------------------------------
--> find . -type f -name The_Art_of_Linux_System_Administration.html \
-exec perl -pi -e 's$a href=\"http://BibleTech\" target$a href=\"http://BibleTech.net\" target$g' {} \;
use grep again; the test file should NOT come up:
------------------------------------------------
--> grep "a href=\"http://BibleTech\" target" *.html
(SILENCE IS GOLDEN - remember the Philosophy of LINUX (UNIX)!)
Then widen the find from the one file to every .html file and run:
--> find . -type f -name "*.html" -exec perl -pi -e 's$a href=\"http://BibleTech\" target$a href=\"http://BibleTech.net\" target$g' {} \;
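With -exec ... {} \; find starts one perl per file; ending the -exec with {} + instead passes many file names to each perl (POSIX find supports both forms). A minimal sketch on a hypothetical scratch tree (index.html, sub/page.html and notes.txt are made up); it also checks that the non-.html file is left alone and that the g flag fixes two bad links on the same line.

```shell
#!/bin/sh
# Sketch: fix every matching .html file under a directory, batching file
# names with "{} +". The sample tree is hypothetical.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/site/sub"
printf '%s\n' '<a href="http://BibleTech" target="_blank">BibleTech</a>' > "$tmpdir/site/index.html"
printf '%s\n' '<a href="http://BibleTech" target="_blank">x</a> <a href="http://BibleTech" target="_blank">y</a>' > "$tmpdir/site/sub/page.html"
printf '%s\n' 'a href="http://BibleTech" target' > "$tmpdir/site/notes.txt"   # not .html: left alone

# "*.html" is quoted so the shell hands the pattern to find unexpanded.
find "$tmpdir/site" -type f -name "*.html" \
  -exec perl -pi -e 's$a href="http://BibleTech" target$a href="http://BibleTech.net" target$g' {} +

# Count broken and fixed links in the .html files, and check the .txt file.
broken=$(cat "$tmpdir/site/index.html" "$tmpdir/site/sub/page.html" | grep -oF 'http://BibleTech"' | wc -l | tr -d ' ')
fixed=$(cat "$tmpdir/site/index.html" "$tmpdir/site/sub/page.html" | grep -oF 'http://BibleTech.net"' | wc -l | tr -d ' ')
txt=$(grep -c 'http://BibleTech"' "$tmpdir/site/notes.txt")
rm -rf "$tmpdir"
echo "broken=$broken fixed=$fixed txt_untouched=$txt"
```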
Decided to first count the matches and time each step (wc -l counts matching lines, so a page with two bad links would count twice):
# ------------------------------------------------
# --> time find . -type f -name "*.html" -exec grep "a href=\"http://BibleTech\" target" {} \; | wc -l
# 3359
#
# real 4m25.992s
# user 0m56.230s
# sys 0m23.291s
#
# ------------------------------------------------
# --> time find . -type f -name "*.html" -exec perl -pi -e 's$a href=\"http://BibleTech\" target$a href=\"http://BibleTech.net\" target$g' {} \;
#
# real 8m40.815s
# user 1m32.800s
# sys 0m46.286s
#
# ------------------------------------------------
# --> time find . -type f -name "*.html" -exec grep "a href=\"http://BibleTech\" target" {} \;
#
# real 4m7.665s
# user 0m54.618s
# sys 0m23.934s
#
# ------------------------------------------------
JohnMeister.com