using find and perl to manage domain URLs in HTML pages
# 10 Dec 2017
# process to consolidate stand-alone domains into subdirectories of a master domain,
# using find and perl
#
# --> tree -d -L 1 /srv/www/htdocs/
# /srv/www/htdocs/
# ├── 2015
# ├── 2016
# ├── 2017
# ├── Amsoil
# ├── bible # http://johnmeister.com/bible or http://bibletech.net
# ├── BIBLE # contains a redirect: <meta HTTP-EQUIV="REFRESH" content="3; url=http://BibleTech.net/">
# ├── cars-of-john
# ├── fotomeister # http://johnmeister.com/fotomeister or http://johnmeister.com (expired)
# ├── fotos
# ├── HD-DECK
# ├── jeep # http://johnmeister.com/jeep or http://jeepmeister.com
# ├── john
# ├── linux # http://johnmeister.com/linux or http://linuxmeister.net
# ├── tech # http://johnmeister.com/tech or http://meistertech.net
# └── Time_Lapse
#
# 15 directories
# ------------------------------------------------
#
# 1) make backup copy of original README.html
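#    a sketch (the dated suffix is just one naming choice):
#    cp /srv/www/htdocs/README.html /srv/www/htdocs/README.html.2017-12-10.bak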
#
# 2) edit README.html to eliminate references to expired or expiring domains
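#    to confirm the edit, the retired domain names can be grepped for; a sketch:
#    grep -Ei 'jeepmeister|meistertech|bibletech|linuxmeister' /srv/www/htdocs/README.html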
#
# 3) identify how many files will be affected:
# find /srv/www/htdocs/ -type f -name README.html -print | wc -l
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name README.html -print | wc -l
# 5055
# ------------------------------------------------
#
# 4) add listing of files to a file to save:
# find /srv/www/htdocs/ -type f -name README.html -print > listing-of-README-locations.txt
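#    a variant sketch - tee saves the list and still feeds the count in one pass:
#    find /srv/www/htdocs/ -type f -name README.html -print | tee listing-of-README-locations.txt | wc -l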
#
# 5) use find to find and copy updated README.html
# find /srv/www/htdocs/ -type f -name README.html -exec cp /srv/www/htdocs/README.html {} \;
#
# USE TIME TO RECORD TOTAL PROCESSING TIME:
# --> time find . -type f -name README.html -exec cp /srv/www/htdocs/README.html {} \;
# cp: '/srv/www/htdocs/README.html' and './README.html' are the same file
#
# real 0m51.229s
# user 0m7.111s
# sys 0m4.639s
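#
# the "same file" message above is harmless: find also visits the master copy.
# to silence it, the source can be excluded; a sketch assuming GNU find (-samefile):
# find /srv/www/htdocs/ -type f -name README.html ! -samefile /srv/www/htdocs/README.html \
#      -exec cp /srv/www/htdocs/README.html {} \;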
#
#
# --> man time | col -b | grep -v ^$
# TIME(1) Linux User's Manual TIME(1)
# NAME
# time - time a simple command or give resource usage
# SYNOPSIS
# time [options] command [arguments...]
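#
# note: bash also has a built-in time keyword; the man page above is /usr/bin/time,
# and the GNU version reports fuller resource usage with -v, as a sketch:
# /usr/bin/time -v find /srv/www/htdocs/ -type f -name "*.html" > /dev/null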
# ------------------------------------------------
#
#
# 6) edit /etc/apache2/errors.conf to point 404 to "missing.html"
# ------------------------------------------------
# --> sudo vi /etc/apache2/errors.conf # comment default, insert "ErrorDocument 404 /missing.html"
# ------------------------------------------------
# --> cat /etc/apache2/errors.conf | grep -C 3 missing
#
# Some examples:
# ErrorDocument 500 "The server made a boo boo."
# ErrorDocument 404 /missing.html
# ErrorDocument 404 "/cgi-bin/missing_handler.pl"
# ErrorDocument 402 http://www.example.com/subscription_info.html
# --
# ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
# ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
# # ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
# ErrorDocument 404 /missing.html
# ErrorDocument 405 /error/HTTP_METHOD_NOT_ALLOWED.html.var
# ErrorDocument 408 /error/HTTP_REQUEST_TIME_OUT.html.var
# ErrorDocument 410 /error/HTTP_GONE.html.var
# ------------------------------------------------
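# before reloading apache the edited config can be sanity-checked; a sketch
# assuming a systemd setup (the control tool is apachectl or apache2ctl by distro):
# --> sudo apachectl configtest && sudo systemctl reload apache2
# ------------------------------------------------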
# 7) place missing.html in each of the "subdomain" directories - bible, jeep, linux, tech, fotomeister
# this allows a "3 way" means of accessing content: the normal URL, the main domain, and an "oops" page.
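#     one way to place them, as a sketch (directory names from the tree above):
#     for d in bible jeep linux tech fotomeister; do
#         cp /srv/www/htdocs/missing.html /srv/www/htdocs/$d/
#     done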
# ------------------------------------------------
# 8) after placing the "missing.html" pages I realized I forgot a domain - updated it as follows:
# --> find /srv/www/htdocs/ -type f -name missing.html -exec cp /srv/www/htdocs/missing.html {} \;
# ------------------------------------------------
# 9) to replace all the embedded links with master-domain links, find was combined with perl:
#
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://linuxmeister.net$http://johnmeister.com/linux$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://jeepmeister.com$http://johnmeister.com/jeep$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://meistertech.net$http://johnmeister.com/tech$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://bibletech.net$http://johnmeister.com/bible$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$http://seasianmissions.org$http://johnmeister.com/bible/SEAM$g' {} \;
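#
# before each in-place edit the affected files can be counted; a sketch:
# find /srv/www/htdocs/ -type f -name "*.html" -exec grep -l 'jeepmeister.com' {} \; | wc -l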
#
# 10) there were also plain-text references (titles, headings, and so on); these commands replaced those:
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$linuxmeister.net$johnmeister.com/linux$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$jeepmeister.com$johnmeister.com/jeep$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$meistertech.net$johnmeister.com/tech$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$bibletech.net$johnmeister.com/bible$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$LinuxMeister.net$LinuxMeister$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$JeepMeister.com$JeepMeister$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$MeisterTech.net$MeisterTech$g' {} \;
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$BibleTech.net$BibleTech$g' {} \;
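#
# a note on syntax: perl accepts nearly any delimiter after s, and $ here avoids
# escaping the / in URLs; an unescaped . matches any character, so a stricter
# form of the commands above escapes it, as a sketch:
# find /srv/www/htdocs/ -type f -name "*.html" -exec perl -pi -e 's$meistertech\.net$johnmeister.com/tech$g' {} \;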
#
# 11) the command can also be executed from the current directory:
# find . -type f -name "*.html" -exec perl -pi -e 's$http://johnmeister.com$http://johnmeister.com/fotomeister$g' {} \;
#
# 12) find can also be used to count the total number of files:
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name "*.html" -print | wc -l
# 24292
# ------------------------------------------------
# --> find /srv/www/htdocs/ -type f -name "*.jpg" -print | wc -l
# 1967047
# ------------------------------------------------
# NOTE: wildcards must be quoted (or escaped), otherwise the shell expands them
# against the current directory before find ever sees the pattern, and it fails:
# --> find /srv/www/htdocs/ -type f -name *.html -print | wc -l
# 0
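# escaping the wildcard works as well as quoting it, as a sketch:
# --> find /srv/www/htdocs/ -type f -name \*.html -print | wc -l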
# ------------------------------------------------
#
# 13) using ls -AlR and grep can lead to different results - or none - since grep takes a regular expression, not a shell glob:
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep *.html | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep "*.html" | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep '*.html' | wc -l
# 0
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep html | wc -l
# 26671
# ------------------------------------------------
# --> ls -AlR /srv/www/htdocs/ | grep \.html | wc -l
# 26671
# ------------------------------------------------
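# (the backslash in the last command is eaten by the shell, so grep sees .html,
# where . matches any character.) to match only names ending in .html, the
# suffix can be anchored as a regex; a sketch:
# --> ls -AlR /srv/www/htdocs/ | grep '\.html$' | wc -l
# ------------------------------------------------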