using find and perl to manage URLs for domains in HTML
#
# 10 Dec 2017
# process to consolidate domains that are subdomains of a master domain,
# using find and perl
#
# --> tree -d -L 1 /srv/www/htdocs/
# /srv/www/htdocs/
# ├── 2015
# ├── 2016
# ├── 2017
# ├── Amsoil
# ├── bible          # http://johnmeister.com/bible or http://bibletech.net
# ├── BIBLE          # contains a redirect: <meta HTTP-EQUIV="REFRESH" content="3; url=http://BibleTech.net/">
# ├── cars-of-john
# ├── fotomeister    # http://johnmeister.com/fotomeister or http://johnmeister.com (expired)
# ├── fotos
# ├── HD-DECK
# ├── jeep           # http://johnmeister.com/jeep or http://jeepmeister.com
# ├── john
# ├── linux          # http://johnmeister.com/linux or http://linuxmeister.net
# ├── tech           # http://johnmeister.com/tech or http://meistertech.net
# └── Time_Lapse
#
# 15 directories
# ------------------------------------------------
#
# 1) make a backup copy of the original README.html
#
# 2) edit README.html to eliminate references to expired or expiring domains
#
# 3) identify how many files will be affected:
#    find /srv/www/htdocs/ -type f -name README.html -print | wc -l
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name README.html -print | wc -l
# 5055
# ------------------------------------------------
#
# 4) save the listing of files for later reference:
#    find /srv/www/htdocs/ -type f -name README.html -print > listing-of-README-locations.txt
#
# 5) use find to locate every README.html and copy the updated version over it:
#    find /srv/www/htdocs/ -type f -name README.html -exec cp /srv/www/htdocs/README.html {} \;
#
#    USE TIME TO RECORD TOTAL PROCESSING TIME:
# --> time find . -type f -name README.html -exec cp /srv/www/htdocs/README.html {} \;
# cp: '/srv/www/htdocs/README.html' and './README.html' are the same file
#     (the master copy matches itself, hence the harmless warning)
#
# real    0m51.229s
# user    0m7.111s
# sys     0m4.639s
#
# --> man time | col -b | grep -v ^$
# TIME(1)                  Linux User's Manual                  TIME(1)
# NAME
#        time - time a simple command or give resource usage
# SYNOPSIS
#        time [options] command [arguments...]
# ------------------------------------------------
#
# 6) edit /etc/apache2/errors.conf to point 404 to "missing.html"
# ------------------------------------------------
# --> sudo vi /etc/apache2/errors.conf
#     comment out the default, insert "ErrorDocument 404 /missing.html"
# ------------------------------------------------
# --> cat /etc/apache2/errors.conf | grep -C 3 missing
#
# Some examples:
# ErrorDocument 500 "The server made a boo boo."
# ErrorDocument 404 /missing.html
# ErrorDocument 404 "/cgi-bin/missing_handler.pl"
# ErrorDocument 402 http://www.example.com/subscription_info.html
# --
# ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
# ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
# # ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
# ErrorDocument 404 /missing.html
# ErrorDocument 405 /error/HTTP_METHOD_NOT_ALLOWED.html.var
# ErrorDocument 408 /error/HTTP_REQUEST_TIME_OUT.html.var
# ErrorDocument 410 /error/HTTP_GONE.html.var
# ------------------------------------------------
#
# 7) place missing.html in each of the "subdomain" directories - bible, jeep, linux, tech, fotomeister
#    this allows a "3-way" means of accessing content: the normal URL, the main-domain URL, and an "oops" page.
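# ------------------------------------------------
# NOTE: a more cautious variant of step 5 (a sketch, not the command that was
# run above): each target can be preserved before it is overwritten by chaining
# two cp commands inside an sh -c wrapper; the ".bak" suffix is an arbitrary
# choice here.
#
#    find /srv/www/htdocs/ -type f -name README.html \
#         -exec sh -c 'cp -p "$1" "$1.bak" && cp /srv/www/htdocs/README.html "$1"' _ {} \;
#
#    (the "_" fills $0 of the inline script, so find's {} arrives as $1;
#     the master copy will again warn that it "is the same file" as itself)
# ------------------------------------------------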
# ------------------------------------------------
#
# 8) after placing the "missing.html" pages I realized I had forgotten a domain - updated it as follows:
# --> find /srv/www/htdocs/ -type f -name missing.html -exec cp /srv/www/htdocs/missing.html {} \;
# ------------------------------------------------
#
# 9) to replace all the embedded links with the master-domain link, find was combined with perl:
#
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://linuxmeister.net$http://johnmeister.com/linux$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://jeepmeister.com$http://johnmeister.com/jeep$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://meistertech.net$http://johnmeister.com/tech$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://bibletech.net$http://johnmeister.com/bible$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://seasianmissions.org$http://johnmeister.com/bible/SEAM$g' {} \;
#
# 10) there were also references in plain text - titles, headings, and so on; these commands replaced those:
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$linuxmeister.net$johnmeister.com/linux$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$jeepmeister.com$johnmeister.com/jeep$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$meistertech.net$johnmeister.com/tech$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$bibletech.net$johnmeister.com/bible$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$LinuxMeister.net$LinuxMeister$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$JeepMeister.com$JeepMeister$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$MeisterTech.net$MeisterTech$g' {} \;
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$BibleTech.net$BibleTech$g' {} \;
#
# 11) the command can also be executed from the current directory:
#    find . -type f -name "*.html" -exec perl -pi -e 's$http://johnmeister.com$http://johnmeister.com/fotomeister$g' {} \;
#
# 12) find can also be used to count the total files:
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name "*.html" -print | wc -l
# 24292
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name "*.jpg" -print | wc -l
# 1967047
# ------------------------------------------------
# NOTE: wildcards must be quoted, otherwise the shell expands them in the
# current directory before find ever runs, and the search fails:
# --> find /srv/www/htdocs/ -type f -name *.html -print | wc -l
# 0
# ------------------------------------------------
#
# 13) using ls -AlR and grep can lead to different results - or none
#     (grep patterns are regular expressions, not shell globs, so "*.html"
#     only matches a literal "*.html"):
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep *.html | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep "*.html" | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep '*.html' | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep html | wc -l
# 26671
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep \.html | wc -l
# 26671
# ------------------------------------------------
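# ------------------------------------------------
# NOTE: before in-place edits like steps 9 and 10, the number of files that
# would be touched can be previewed with grep -l (a sketch, using the
# jeepmeister pattern as one example):
#
#    find /srv/www/htdocs/ -type f -name "*.html" -exec grep -l 'jeepmeister\.com' {} + | wc -l
#
# perl's -i switch also accepts a backup suffix, making the same edit reversible:
#
#    find /srv/www/htdocs/ -type f -name "*.html" \
#         -exec perl -pi.bak -e 's$http://jeepmeister.com$http://johnmeister.com/jeep$g' {} \;
#
# the originals are kept alongside as *.html.bak and can be deleted once the
# results check out.
# ------------------------------------------------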
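# ------------------------------------------------
# NOTE: the per-domain one-liners in steps 9 and 10 each rescan all ~24000
# html files; as a sketch, several substitutions could be combined into a
# single perl pass, and "-exec ... {} +" hands perl a batch of files per
# invocation instead of forking one perl process per file:
#
#    find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e '
#        s$http://jeepmeister.com$http://johnmeister.com/jeep$g;
#        s$http://meistertech.net$http://johnmeister.com/tech$g;
#        s$http://bibletech.net$http://johnmeister.com/bible$g;
#    ' {} +
# ------------------------------------------------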