using find and perl to manage URLs for domains in HTML

#  10 Dec 2017
#  process to consolidate domains that are subdomains of a master domain,
#    using find and perl
# 
# --> tree -d -L 1 /srv/www/htdocs/
# /srv/www/htdocs/
# ├── 2015
# ├── 2016
# ├── 2017
# ├── Amsoil
# ├── bible           # http://johnmeister.com/bible  or http://bibletech.net
# ├── BIBLE           # contains a redirect:  <meta HTTP-EQUIV="REFRESH" content="3; url=http://BibleTech.net/">
# ├── cars-of-john
# ├── fotomeister     # http://johnmeister.com/fotomeister  or http://johnmeister.com  (expired)
# ├── fotos
# ├── HD-DECK
# ├── jeep            # http://johnmeister.com/jeep  or http://jeepmeister.com
# ├── john
# ├── linux           # http://johnmeister.com/linux  or http://linuxmeister.net
# ├── tech            # http://johnmeister.com/tech  or http://meistertech.net
# └── Time_Lapse
# 
# 15 directories
#  ------------------------------------------------
#    
#    1) make backup copy of original README.html
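#       e.g. (the dated .bak suffix is just one naming convention):
#       cp -p /srv/www/htdocs/README.html /srv/www/htdocs/README.html.2017-12-10.bak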
#
#    2) edit README.html to eliminate references to expired or expiring domains
#
#    3) identify how many files will be affected: 
#       find /srv/www/htdocs/ -type f -name README.html -print | wc -l
#  ------------------------------------------------
#  --> find /srv/www/htdocs/ -type f -name README.html -print | wc -l
#  5055
#  ------------------------------------------------
#
#    4) save a listing of the affected files for later reference:
#       find /srv/www/htdocs/ -type f -name README.html -print > listing-of-README-locations.txt
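#       the saved list can also drive later batch work; a sketch using GNU xargs
#       (assumes newline-delimited paths, no filenames containing newlines):
#       xargs -d '\n' -a listing-of-README-locations.txt grep -l 'jeepmeister' | wc -l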
#
#    5) use find to locate each README.html and overwrite it with the updated copy:
#       find /srv/www/htdocs/ -type f -name README.html -exec cp /srv/www/htdocs/README.html {} \;
# 
#    USE TIME TO RECORD TOTAL PROCESSING TIME:
# --> time find . -type f -name README.html -exec cp /srv/www/htdocs/README.html {} \;
# cp: '/srv/www/htdocs/README.html' and './README.html' are the same file
# 
# real    0m51.229s
# user    0m7.111s
# sys     0m4.639s
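#
#    NOTE: the cp warning above is harmless - find matched the source file
#    itself; GNU find's -mindepth 2 skips anything at the top level:
# --> find /srv/www/htdocs/ -mindepth 2 -type f -name README.html -exec cp /srv/www/htdocs/README.html {} \;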
#
# 
# --> man time | col -b | grep -v ^$
# TIME(1)                                   Linux User's Manual                                  TIME(1)
# NAME
#        time - time a simple command or give resource usage
# SYNOPSIS
#        time [options] command [arguments...]
#  ------------------------------------------------
# 
#
#    6) edit /etc/apache2/errors.conf to point 404 to "missing.html"
#  ------------------------------------------------
#       --> sudo vi /etc/apache2/errors.conf   #  comment default, insert "ErrorDocument 404 /missing.html"
#  ------------------------------------------------
#       --> cat /etc/apache2/errors.conf | grep -C 3 missing
#
#   Some examples:
#   ErrorDocument 500 "The server made a boo boo."
#   ErrorDocument 404 /missing.html
#   ErrorDocument 404 "/cgi-bin/missing_handler.pl"
#   ErrorDocument 402 http://www.example.com/subscription_info.html
# --
#     ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
#     ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
#  #    ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
#     ErrorDocument 404 /missing.html
#     ErrorDocument 405 /error/HTTP_METHOD_NOT_ALLOWED.html.var
#     ErrorDocument 408 /error/HTTP_REQUEST_TIME_OUT.html.var
#     ErrorDocument 410 /error/HTTP_GONE.html.var
#  ------------------------------------------------
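#       NOTE: Apache reads errors.conf only at start or reload; on a SUSE-style
#       /etc/apache2 layout (an assumption here) that would be:
#       --> sudo systemctl reload apache2
#  ------------------------------------------------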
#   7) place missing.html in each of the "subdomain" directories - bible, jeep, linux, tech, fotomeister
#       this gives three ways to reach content: the normal URL, the main domain, and an "oops" page, e.g.:
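#       a minimal sketch for the initial copies, assuming just those five directories:
#       for d in bible jeep linux tech fotomeister; do
#           cp /srv/www/htdocs/missing.html /srv/www/htdocs/$d/
#       done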
#  ------------------------------------------------
#   8) after placing the "missing.html" pages I realized I forgot a domain - updated it as follows:
#    --> find /srv/www/htdocs/ -type f -name missing.html -exec cp /srv/www/htdocs/missing.html {} \;
#  ------------------------------------------------
#   9) to replace all the embedded links with the master domain link, find was combined with perl:
# 
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://linuxmeister.net$http://johnmeister.com/linux$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://jeepmeister.com$http://johnmeister.com/jeep$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://meistertech.net$http://johnmeister.com/tech$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://bibletech.net$http://johnmeister.com/bible$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://seasianmissions.org$http://johnmeister.com/bible/SEAM$g' {} \;
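#
#    a quick way to verify a pass is complete is to count the files that still
#    mention an old domain (GNU grep's -r and --include assumed):
# --> grep -rl --include='*.html' 'jeepmeister.com' /srv/www/htdocs/ | wc -l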
#
#   10) there were also references in plain text - titles, headings and so on; these commands replaced those:
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$linuxmeister.net$johnmeister.com/linux$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$jeepmeister.com$johnmeister.com/jeep$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$meistertech.net$johnmeister.com/tech$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$bibletech.net$johnmeister.com/bible$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$LinuxMeister.net$LinuxMeister$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$JeepMeister.com$JeepMeister$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$MeisterTech.net$MeisterTech$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$BibleTech.net$BibleTech$g' {} \;
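#
#    NOTE: the substitutions above are case-sensitive; if pages mix case in
#    the URLs themselves (e.g. HTTP://JeepMeister.com), perl's /i modifier
#    catches the variants in one pass - a sketch, not a command run here:
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://jeepmeister\.com$http://johnmeister.com/jeep$gi' {} \;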
#
#   11) the command can also be executed from the current directory; a relative
#       find only touches files under that directory, which limits the scope of the substitution:
# find . -type f -name "*.html" -exec perl -pi -e 's$http://johnmeister.com$http://johnmeister.com/fotomeister$g' {} \;
#   
#   12) find can also be used to count the total files
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name "*.html" -print | wc -l
# 24292
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name "*.jpg" -print | wc -l
# 1967047
# ------------------------------------------------
#    NOTE:  wildcards must be quoted, or the shell expands them before find runs and the result is wrong:
# --> find /srv/www/htdocs/ -type f -name *.html -print | wc -l
# 0
# ------------------------------------------------
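#    NOTE: it is the shell doing the damage: an unquoted *.html is expanded
#    against the current directory before find runs, so find receives a
#    literal filename instead of a pattern; echo makes the expansion visible
#    (output illustrative - it depends on what the current directory holds):
# --> echo -name *.html
# -name index.html
# ------------------------------------------------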
#
#   13) using ls -AlR and grep can lead to different results - or none - since grep matches regular expressions, not shell globs:
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep *.html | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep "*.html" | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep '*.html' | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep html | wc -l
# 26671
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep \.html | wc -l
# 26671
# ------------------------------------------------
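#
#    grep matches regular expressions, not globs: '*.html' is not a valid way
#    to say "ends in .html", and the unquoted \.html above reaches grep as
#    .html, where the dot matches any character; to count only names that
#    really end in .html, anchor an escaped pattern:
# --> ls -AlR /srv/www/htdocs/ | grep '\.html$' | wc -l
# ------------------------------------------------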

