using find, grep and perl, with awk to find and replace a URL

 30 January 2020

problem: - the website contained links to the sold-out DVD on Amazon; the links needed to point to O'Reilly instead
solution: - find the html files containing the Amazon URL and replace it with the O'Reilly Media URL

 the simple test: 
grep -i Amazon *.html
(that only works within one directory, but does show all the instances, and other content)
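
The one-directory limit can be lifted by letting grep recurse on its own. A minimal sketch, assuming GNU grep (the -r and --include options are not part of POSIX grep); the directory tree is a throwaway one made up for illustration, not the real site:

```shell
#!/bin/bash
# Build a small hypothetical tree to search (not the real web site).
demo=$(mktemp -d)
mkdir -p "$demo/linux" "$demo/jeep"
echo '<a href="https://www.amazon.com/x">DVD</a>' > "$demo/linux/HEADER.html"
echo '<a href="https://example.com/">other</a>'   > "$demo/jeep/HEADER.html"
echo 'Amazon mentioned in a text file'            > "$demo/notes.txt"

# -r descends into subdirectories, -i ignores case,
# -l prints only the names of matching files,
# --include limits the search to *.html (GNU grep extension).
matches=$(cd "$demo" && grep -ril --include='*.html' amazon .)
echo "$matches"
```

Only the file names are printed here; dropping -l brings back the matching lines, as in the simple test.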

 a little more involved "test" (to make sure the special characters are escaped):

grep "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd" HEADER.html
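
One caution on the escaping: in GNU grep's basic regular expressions, \? and \+ are repetition operators rather than literal characters, so adding backslashes can change what the pattern matches. A sketch of an alternative, using grep -F to treat the whole URL as a fixed string; the sample file is hypothetical:

```shell
#!/bin/bash
# The URL exactly as it appears in the pages, with no escaping.
url='https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd'

# Hypothetical sample page containing the link once.
sample=$(mktemp)
printf '<a href="%s">LPIC-2 DVD</a>\n' "$url" > "$sample"

# -F: treat the pattern as a fixed string, so ? + . & need no escaping.
# -c: print the number of matching lines.
# --: end of options, in case a pattern ever starts with a dash.
count=$(grep -F -c -- "$url" "$sample")
echo "matching lines: $count"
rm -f "$sample"
```

The single quotes around the URL keep the shell from acting on & and ?, which is a separate concern from the regular expression syntax.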

with 4TB worth of files... need to descend through the directories using find, however a few problems resulted...


the command produced a lot of standard errors about a file being a directory, but also the information at the same time... didn't take the time to introduce the command to send stderr to the bit bucket... or use the -type f, and/or even better, -name "*.html"
main reason was trying to find the files at first... simple always works...

this worked, well enough:

find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;

this worked better (used later):

find jeep -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;

AT THIS POINT, the string that needed to be replaced was found... Could see that it was primarily HEADER.html files
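
The refinements mentioned above (-type f, -name, and sending stderr to the bit bucket) can be combined in one command. A minimal sketch on a hypothetical tree; the use of {} + and grep -F are additions for illustration, not what was run at the time:

```shell
#!/bin/bash
url='https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd'

# Hypothetical tree standing in for the web root.
root=$(mktemp -d)
mkdir -p "$root/jeep/sub"
printf '<a href="%s">DVD</a>\n' "$url" > "$root/jeep/sub/HEADER.html"
printf 'no link here\n'               > "$root/jeep/index.html"

# -type f skips directories (no "Is a directory" errors),
# -name limits the search to HTML files,
# {} + hands many files to each grep instead of starting one grep per file,
# 2>/dev/null discards remaining errors such as unreadable files,
# cut keeps only the file name and line number from grep's output.
found=$(cd "$root" && find jeep -type f -name '*.html' \
          -exec grep -F -H -n -- "$url" {} + 2>/dev/null | cut -d: -f1,2)
echo "$found"
rm -rf "$root"
```

On a tree with 4TB of files, the {} + form should matter mostly for speed, since far fewer grep processes are started.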

the command to FIND the pages works, next, test the PERL command on a copy of a HEADER.html file renamed test.html:

perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' test.html

that was successful and tested with grep:

grep -i amazon test.html

next step was to use find with the perl string by first using individual directories:

find /web/info/ -type f -name 'HEADER.html' -exec perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' {} \;

find /web/linux/ -type f -name 'HEADER.html' -exec perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' {} \;

once a few directories were tested successfully, the entire web tree was searched and updated:

find /web/ -type f -name 'HEADER.html' -exec \
perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' {} \;

to pull just the file names out of the grep output, it was piped to awk:

find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; | awk '{print $1}'

added the field separator... that worked... mostly...

find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; | awk -F : '{print $1}'

now that this find was working, it was sent to a file that could be reviewed and used to create a script to update or delete, used tee with an append.

find . -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \; | awk -F : '{print $1}' | tee -a LIST.txt

while the find was running and producing the list, a few tweaks to awk were made and then that output was sent to a file that was edited and turned into a script.

tail -f LIST.txt
cat LIST.txt | awk '{print $1}'
cat LIST.txt | awk '{print $1$2}'
cat LIST.txt | awk -F : '{print $1}'
cat LIST.txt | awk -F : '{print $1}' > edit-files
vi edit-files

find jeep -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;
find info -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;
find linux -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;

found 3 non-HEADER files, fixed with commands in a quick script below...

--> cat edit-files
#!/bin/bash
perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' linux/Intro-to-Linux/One-Hour-Linux-Sessions-2019.html
perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' linux/Intro-to-Linux/One-Hour-Linux-Sessions-2018.html
perl -pi -e 's$https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref\=sr_1_1\?ie=UTF8\&qid\=1484728403\&sr\=8-1\&keywords\=lpic-2\+dvd$http://shop.oreilly.com/product/0636920050209.do$g' linux/LinuxMeister.net-Books-Videos-n-links.html
-------------------
testing:

grep -i amazon linux/LinuxMeister.net-Books-Videos-n-links.html
find linux -type f -name "*.html" -exec grep -H -n "https://www.amazon.com/Study-Guide-LPIC-2-Exams-201/dp/B01I25VO9A/ref=sr_1_1?ie=UTF8&qid=1484728403&sr=8-1&keywords=lpic-2+dvd" {} \;

JohnMeister.com

