how to fix 3,359 pages with an incomplete URL, using grep, find and perl...

fixing embedded HTML tags in a large number of files

#  fixing BibleTech
#   BAD STRING, needs .net   a href="http://BibleTech" target

Updated the HTML files some time ago, changing a URL or a path, and accidentally removed part of the domain (the .net) from a large number of links...

      a href="http://BibleTech" target  

Have used perl to fix other broken or outdated URLs on the server:

# find /home/luser/httpd/ -type f -name "*.html" -exec perl -pi -e 's$$$g' {} \;
#> find . -type f -name '*.html' -exec perl -pi -e 's$$$g' {} \;

(s$$$g is the empty template: the search string goes between the first and second $, the replacement between the second and third.)

TESTING the command on a single file FIRST

  1. first find a file in the current directory
  2. then execute the perl string at the command line
  3. then check to see if it fixed it with a grep
  4. then open the test file with vi or more and make sure nothing else broke
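
The four steps above can be sketched on a throwaway copy before touching a real page. This is only an illustration: the scratch directory, the file name test.html, and the example.org link are made up, and the corrected address http://BibleTech.net is an assumption based on the "needs .net" note. The sketch uses | as the substitution delimiter instead of $, which is just a matter of taste.

```shell
# Scratch file standing in for one real page (hypothetical content).
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.html" <<'EOF'
<a href="http://BibleTech" target="_blank">BibleTech</a>- Bible overview
<a href="http://example.org/" target="_blank">some other link</a>
EOF

# 1. keep a pristine copy so the change can be compared afterwards
cp "$tmpdir/test.html" "$tmpdir/test.html.orig"

# 2. run the perl substitution on that ONE file
#    (the .net replacement is an assumption, see above)
perl -pi -e 's|a href="http://BibleTech" target|a href="http://BibleTech.net" target|g' \
  "$tmpdir/test.html"

# 3. count remaining bad links; grep -c prints 0 and exits non-zero when none are left
remaining=$(grep -c 'a href="http://BibleTech" target' "$tmpdir/test.html" || true)
echo "bad links remaining: $remaining"

# 4. show exactly what changed, to confirm nothing else broke
diff "$tmpdir/test.html.orig" "$tmpdir/test.html" || true
```

The diff in step 4 does the same job as opening the file in vi or more, but it shows only the changed lines, which is easier to check by eye.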

look for and pick a file:
------------------------------------------------
--> grep "a href=\"http://BibleTech\" target" *.html
Books-by-John.html:<a href="http://BibleTech" target="_blank">BibleTech</a>- Bible overview

use find with perl to edit that ONE file:
------------------------------------------------
--> find . -type f -name The_Art_of_Linux_System_Administration.html \
      -exec perl -pi -e 's$a href=\"http://BibleTech\" target$a href=\"\" target$g' {} \;

use grep, the file should NOT come up:
------------------------------------------------
--> grep "a href=\"http://BibleTech\" target" *.html

(SILENCE IS GOLDEN - remember the Philosophy of LINUX (UNIX)!)

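Then modify the find and perl string to cover every HTML file and run (quote the "*.html" so the shell hands the pattern to find instead of expanding it in the current directory)...

find . -type f -name "*.html" -exec perl -pi -e 's$a href=\"http://BibleTech\" target$a href=\"\" target$g' {} \;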

Decided first to count the matching lines, and to time each action:

# ------------------------------------------------
# --> time find . -type f -name "*.html" -exec grep "a href=\"http://BibleTech\" target" {} \; | wc -l
# 3359
#
# real 4m25.992s
# user 0m56.230s
# sys  0m23.291s
#
# ------------------------------------------------
# --> time find . -type f -name "*.html" -exec perl -pi -e 's$a href=\"http://BibleTech\" target$a href=\"\" target$g' {} \;
#
# real 8m40.815s
# user 1m32.800s
# sys  0m46.286s
#
# ------------------------------------------------
# --> time find . -type f -name "*.html" -exec grep "a href=\"http://BibleTech\" target" {} \;
#
# real 4m7.665s
# user 0m54.618s
# sys  0m23.934s
#
# ------------------------------------------------
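
Part of those run times is likely process start-up, because -exec ... {} \; launches one grep or one perl per file. Ending the -exec with + instead of \; lets find pass many file names to each invocation. This was not timed on the server above, so treat it as a sketch to try, not a measured improvement; the scratch tree, the file names, and the .net replacement are assumptions.

```shell
# Scratch tree: three pages with the bad link, one without (hypothetical).
site=$(mktemp -d)
for n in 1 2 3; do
  printf '%s\n' '<a href="http://BibleTech" target="_blank">BibleTech</a>' > "$site/page$n.html"
done
printf '%s\n' '<p>no link here</p>' > "$site/clean.html"

pattern='a href="http://BibleTech" target'

# grep -l lists each matching FILE once, so wc -l counts files, not lines
before=$(find "$site" -type f -name "*.html" -exec grep -l "$pattern" {} + | wc -l)

# one perl process handles a whole batch of files
find "$site" -type f -name "*.html" \
  -exec perl -pi -e 's|a href="http://BibleTech" target|a href="http://BibleTech.net" target|g' {} +

after=$(find "$site" -type f -name "*.html" -exec grep -l "$pattern" {} + | wc -l)
echo "files with bad link before: $before, after: $after"
```

Prefixing either find with time, as in the runs above, would show whether batching helps on a given tree.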

