using find, grep, and perl, with awk, to find and replace a URL

 30 January 2020

problem: - website contained links to a sold-out DVD on Amazon; the links needed to point to O'Reilly instead
solution: - find the html files containing the Amazon URL and replace it with the O'Reilly Media URL

 the simple test: 
grep -i Amazon *.html
(that only works within one directory, but it does show every instance, along with any other lines that mention Amazon)

 a little more involved "test" (to make sure the special characters are escaped):

grep "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd" HEADER.html
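One caveat with hand-escaping: in GNU grep's basic regular expressions, \? and \+ are repetition operators rather than literal characters, so an escaped pattern may fail to match the literal URL. A minimal sketch of an alternative, using grep -F (fixed-string matching) so nothing needs escaping. The temporary directory and the sample HEADER.html here are stand-ins created for the demonstration, not the real site files.

```shell
# Assumption: a throwaway sample file stands in for the real HEADER.html.
old='https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd'
demo=$(mktemp -d)
printf '<a href="%s">LPIC-2 DVD</a>\n' "$old" > "$demo/HEADER.html"

# -F treats the pattern as a fixed string, so ? & + and . are matched literally
grep -n -F "$old" "$demo/HEADER.html"
```

The single quotes around the URL keep the shell from interpreting & and ?, and -F keeps grep from interpreting anything at all.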

with 4TB worth of files, the search needed to descend through the directories using find; however, a few problems resulted...


the first find command (the "well enough" one below) printed a lot of "Is a directory" errors on stderr, with the matches mixed in at the same time...
    didn't take the time to send stderr to the bit bucket (2>/dev/null)... or to use -type f, and/or even better, -name "*.html"
    the main goal at first was simply to find the files...  simple always works...
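The two fixes mentioned above can be sketched side by side. This is only an illustration on a throwaway directory (the demo tree and its one sample file are assumptions), showing that hiding stderr and restricting find to regular files give the same matches.

```shell
# Assumption: a small demo tree stands in for the real web directories.
demo=$(mktemp -d)
mkdir -p "$demo/linux"
echo 'see Amazon for the DVD' > "$demo/linux/HEADER.html"

# without -type f, grep is also run on directories and complains on stderr;
# 2>/dev/null hides the complaints but keeps the matches on stdout
noisy=$(find "$demo" -exec grep -H -i amazon {} \; 2>/dev/null)

# -type f never hands a directory to grep, so there is nothing to hide
quiet=$(find "$demo" -type f -name '*.html' -exec grep -H -i amazon {} \;)

printf '%s\n' "$noisy" "$quiet"
```

The second form is usually the safer habit, since 2>/dev/null also hides errors that matter, such as permission problems.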

this worked, well enough:
find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;

this worked better (used later):
find jeep -type f -name "*.html"  -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; 
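With this many files, starting one grep per file adds up. A hedged variation, not what was run at the time: ending -exec with + instead of \; lets find pass many file names to each grep. The demo tree and its files below are made up for the sketch.

```shell
# Assumption: a demo tree with two matching files and one non-matching file.
old='https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd'
demo=$(mktemp -d)
mkdir -p "$demo/jeep" "$demo/linux"
printf '<a href="%s">DVD</a>\n' "$old" > "$demo/jeep/HEADER.html"
printf '<a href="%s">DVD</a>\n' "$old" > "$demo/linux/HEADER.html"
echo '<p>no link here</p>' > "$demo/linux/index.html"

# {} + batches many file names into each grep run instead of one grep per file
hits=$(find "$demo" -type f -name '*.html' -exec grep -H -n -F "$old" {} +)
printf '%s\n' "$hits"
```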

AT THIS POINT, the string that needed to be replaced had been found...  
    Could see that it was primarily in HEADER.html files


 the command to FIND the pages works; next, test the perl command on a copy of a HEADER.html file named test.html:

perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' test.html

that was successful and tested with grep:  grep -i amazon test.html


 the next step was to combine find with the perl substitution, starting with individual directories:

find /web/info/ -type f -name 'HEADER.html' -exec perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' {} \;

find /web/linux/ -type f -name 'HEADER.html' -exec perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' {} \;


 once a few directories were tested successfully, the entire web tree was searched and updated:

find /web/ -type f -name 'HEADER.html' -exec \
perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' {} \;



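to list just the file names, the grep output was piped to awk:
find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; | awk '{print $1}'
with the default whitespace separator, $1 still carried the line number (file:line:...), so the field separator was added... that worked... mostly... 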

find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; | awk -F : '{print $1}'
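The difference between the two awk calls is easier to see on a single line of grep -H -n output. The sample line below is made up for illustration; only its file:line:content shape matters.

```shell
# Assumption: a made-up line in the file:line:content shape of grep -H -n output.
line='./linux/HEADER.html:12:<a href="https://www.amazon.com/">DVD</a>'

# default separator is whitespace, so $1 runs up to the first space
by_space=$(printf '%s\n' "$line" | awk '{print $1}')

# with -F : the first field ends at the first colon, leaving just the path
by_colon=$(printf '%s\n' "$line" | awk -F : '{print $1}')

echo "$by_space"   # ./linux/HEADER.html:12:<a
echo "$by_colon"   # ./linux/HEADER.html
```

The "mostly" is fair: a path that itself contains a colon would be cut short. When only the file names are wanted, grep -l prints them directly and no awk step is needed.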

now that this find was working, its output was sent to a file that could be reviewed and used to create a script to update or delete; tee -a was used so the output was appended to the list.

find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; | awk -F : '{print $1}' | tee -a LIST.txt

while the find was running and producing the list, a few tweaks to awk were made, and then that output was sent to a file that was edited and turned into a script.

tail -f LIST.txt

   cat LIST.txt | awk '{print $1}'
   cat LIST.txt | awk '{print $1$2}'
   cat LIST.txt | awk -F : '{print $1}'
   cat LIST.txt | awk -F : '{print $1}' > edit-files
   vi edit-files
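Because grep prints one line per match, a file with several matching lines shows up several times in the list. A small sketch of removing the repeats with sort -u before editing; the list contents here are invented for the example.

```shell
# Assumption: invented LIST.txt contents in which one file matched twice.
demo=$(mktemp -d)
printf '%s\n' './info/HEADER.html' './linux/HEADER.html' './info/HEADER.html' > "$demo/LIST.txt"

# sort -u sorts the names and keeps one copy of each
sort -u "$demo/LIST.txt" > "$demo/edit-files"
cat "$demo/edit-files"
```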


find jeep -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; 
find info -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; 
find linux -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; 

found 3 non-HEADER files; fixed them with the commands in the quick script below... 

--> cat edit-files 
#!/bin/bash
perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' linux/Intro-to-Linux/One-Hour-Linux-Sessions-2019.html

perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' linux/Intro-to-Linux/One-Hour-Linux-Sessions-2018.html

perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' linux/LinuxMeister.net-Books-Videos-n-links.html

-------------------
testing:
grep -i amazon linux/LinuxMeister.net-Books-Videos-n-links.html
find linux -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;
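No output from these commands means success, which is easy to misread. A sketch of a check that says so explicitly; the function name check_tree and the demo tree are made up for this example.

```shell
# Assumption: check_tree is a hypothetical helper; the demo tree is already fixed.
old='https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd'
demo=$(mktemp -d)
mkdir -p "$demo/linux"
echo '<a href="http://shop.oreilly.com/product/0636920050209.do">DVD</a>' > "$demo/linux/HEADER.html"

# report whether any .html file under the given directory still has the old URL
check_tree() {
  if find "$1" -type f -name '*.html' -exec grep -l -F "$old" {} + | grep -q .; then
    echo 'old URL still present'
  else
    echo 'old URL gone'
  fi
}

check_tree "$demo"
```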
