########################################################################
# Hadoop Linux cluster NOTES - in my lab - 8 June 2015 - john meister ##
###  NOTE, URLs were internal sources AND Cloudera was NOT used     ####
# Cloudera offered a proprietary setup version of Apache Hadoop     ####
# with a ludicrous license agreement... set things up manually: RTM ####
#  got this cluster working, but notes are not complete...          ###
########################################################################
should have used ext3 (when I set these up...)
#####
Ext4:
######
The Ext4 Linux filesystem has delayed allocation of data which makes it 
handle unplanned server shutdowns/power outages less well than classic 
ext3. Consider turning off the delalloc option in /etc/fstab unless you trust your UPS.
from:   http://wiki.apache.org/hadoop/DiskSetup

       delalloc
              Deferring block allocation until write-out time.

       nodelalloc
              Disable delayed allocation. Blocks are allocated when data is
              copied from user to page cache.

set /etc/fstab:  (8 jun 2015)

------------------------------------------------
root@Linux-Lab-1 [/root]
------------------------------------------------
--> cat /etc/fstab
/dev/system/swap     swap                 swap       defaults              0 0
/dev/system/root     /                    ext4       acl,user_xattr,nodelalloc        1 1
UUID=4AE5-CA05       /boot/efi            vfat       umask=0002,utf8=true  0 0
############################################################################################
### NOTE:  added nodelalloc to the ext4 root entry (8 jun 2015).
###  By default ext4 delays block allocation until write-out time, so recently written
###  data is at risk if power fails. nodelalloc allocates blocks as soon as data is
###  copied to the page cache, which reduces (but does not eliminate) that risk.
############################################################################################

NOTE:  (need to update /etc/fstab on all nodes... to show nodelalloc for ext4)
NOTE:  FRESH INSTALLS SHOULD USE ext3!
-------------------------------------------------
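
NOTE (added sketch, not from the original session): one way to see which ext4 entries on a node still lack nodelalloc. The helper name ext4_missing_nodelalloc is made up for illustration; it only reads an fstab-format file and changes nothing.

```shell
# Print the mount point of every ext4 entry in an fstab-format file
# whose option field does not include nodelalloc.
# ext4_missing_nodelalloc is a hypothetical helper name.
ext4_missing_nodelalloc() {
    fstab="${1:-/etc/fstab}"
    awk '$1 !~ /^#/ && $3 == "ext4" && ("," $4 ",") !~ /,nodelalloc,/ { print $2 }' "$fstab"
}

# usage: ext4_missing_nodelalloc                  (checks /etc/fstab)
#        ext4_missing_nodelalloc /tmp/fstab.copy  (checks a copy pulled from another node)
```

Empty output means every ext4 entry already carries the option. The option field is wrapped in commas before matching so that nodelalloc is only recognized as a whole option.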

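$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
(NOTE: addgroup/adduser is Debian/Ubuntu syntax; on SuSE the stock tools are
 groupadd and useradd, e.g. groupadd hadoop; useradd -m -g hadoop hduser)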

------------------------------------------------
disable IPv6
------------------------------------------------
john@Linux-Lab-1 [/home/john]
------------------------------------------------
--> cat /etc/sysctl.conf
####
#
# /etc/sysctl.conf is meant for local sysctl settings
#
# sysctl reads settings from the following locations:
#   /boot/sysctl.conf-<kernelversion>
#   /lib/sysctl.d/*.conf
#   /usr/lib/sysctl.d/*.conf
#   /usr/local/lib/sysctl.d/*.conf
#   /etc/sysctl.d/*.conf
#   /run/sysctl.d/*.conf
#   /etc/sysctl.conf
#
# To disable or override a distribution provided file just place a
# file with the same name in /etc/sysctl.d/
#
# See sysctl.conf(5), sysctl.d(5) and sysctl(8) for more information
#
####
net.ipv4.ip_forward = 0
## net.ipv6.conf.all.forwarding = 0
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
------------------------------------------------
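
NOTE (added sketch): a quick check that a sysctl.conf-style file really carries all three disable_ipv6 settings shown above. ipv6_disabled_in is a made-up helper name; it only reads the file.

```shell
# Return 0 if a sysctl.conf-style file sets disable_ipv6 = 1 for
# all, default and lo; return 1 otherwise.
# ipv6_disabled_in is a hypothetical helper name.
ipv6_disabled_in() {
    conf="${1:-/etc/sysctl.conf}"
    for key in all default lo; do
        grep -Eq "^[[:space:]]*net\.ipv6\.conf\.${key}\.disable_ipv6[[:space:]]*=[[:space:]]*1[[:space:]]*\$" "$conf" || return 1
    done
    return 0
}

# usage: ipv6_disabled_in && echo "IPv6 disabled in /etc/sysctl.conf"
```

The file is only read at boot or when loaded by hand: `sudo sysctl -p` applies /etc/sysctl.conf, and `cat /proc/sys/net/ipv6/conf/all/disable_ipv6` should then print 1.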

S/W install list (revised):
	NEED TO DOWNLOAD/INSTALL from:
	(was an internal source under /repositories/apache/; need to locate online)
-------------------------------------------------
	
------------------
software required:  (need version info for the following)
------------------
Flume  - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/flume/

HDFS Balancer  - VERSION ??? - nothing on java source.some-company-dot-com

HDFS Name Node  - VERSION ??? - nothing on java source.some-company-dot-com

HBase Master  - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/hbase/

HBase Region Server - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/hbase/

HBase Rest  - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/hbase/

HDFS Datanode  - VERSION ??? - nothing on java source.some-company-dot-com

Hive - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/hive/

Sqoop  - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/sqoop/

YARN  - VERSION ??? - nothing on java source.some-company-dot-com
	NO INFO INTERNALLY

Zookeeper  - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/zookeeper/

Jaspersoft Server 
	last status 5/18/2015 - Ron waiting
	
PostgreSQL DB  - VERSION ???
	installed on nodes... not sure of version

Hive Metastore Server  - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/hive/

Hue Server  - VERSION ??? - nothing on java source.some-company-dot-com
	hey Hue, where are you?  not on java...

OOZIE  - VERSION ???
	http://java source.some-company-dot-com/repositories/apache/oozie/

Camel - VERSION ???
	App Server (JBoss + Camel + Tomcat)  - 
	http://java source.some-company-dot-com/repositories/apache/camel/apache-camel/

Tomcat - VERSION ???
	App Server (JBoss + Camel + Tomcat)  - 
	http://java source.some-company-dot-com/repositories/apache/tomcat/

------------------
software required:  have version info and s/w
------------------
Hadoop 2.6.0 - was an internal source, need to locate online:
	/repositories/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers
App Server (JBoss + Camel + Tomcat):
	JBoss - JBOSS-EAP-6.3-Download   (copied to master node)
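
NOTE (added sketch): the "VERSION ???" entries above could be filled in by asking each node directly. This assumes ssh access to the four lab nodes, that the JDK is in /opt/jdk1.7.0_67 as listed, and that hadoop and psql are on the remote PATH (they may not be). collect_versions and the RUN variable are made-up names.

```shell
NODES="Linux-Lab-1 Linux-Lab-2 Linux-Lab-3 Linux-Lab-4"
RUN="${RUN:-ssh}"    # set RUN=echo for a dry run that only prints the commands

# Ask every node for its Java, Hadoop and PostgreSQL versions.
# Tools that are missing on a node simply produce no line.
collect_versions() {
    for node in $NODES; do
        echo "== $node =="
        $RUN "$node" '/opt/jdk1.7.0_67/bin/java -version 2>&1 | head -n 1; hadoop version 2>/dev/null | head -n 1; psql --version 2>/dev/null'
    done
}

# usage: collect_versions > versions-$(date +%Y%m%d).txt
```

java -version writes to stderr, hence the 2>&1 before head.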


############################################################################################
DETAILED ORIGINAL LIST:
############################################################################################
Master Node (Linux-Lab-1) software install list 
1.	Cloudera Manager Event Server 
2.	Cloudera Manager Host Monitor 
3.	Cloudera Manager Service Alert Publisher 
4.	Cloudera Manager Service Monitor 
5.	Flume 
6.	HDFS Balancer 
7.	HDFS Name Node 
8.	OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers 
________________________________________
SECOND Node (Linux-Lab-2) software install list 
1.	HBase Master 
2.	HBase Region Server1 
3.	HBase Rest 
4.	HDFS Datanode1 
5.	Hive 
6.	Sqoop 
7.	YARN 
8.	Zookeeper 
9.	OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers 
________________________________________
THIRD Node (Linux-Lab-3) software install list 
1.	Jaspersoft Server 
2.	PostgreSQL DB 
3.	HBase Region Server2 
4.	HDFS Datanode2 
5.	HDFS Secondary NameNode 
6.	YARN 
7.	Zookeeper 
8.	OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers 
________________________________________
FOURTH Node (Linux-Lab-4) software install list 
1.	App Server (JBoss + Camel + Tomcat) 
2.	HBase Region Server3 
3.	HDFS Datanode3 
4.	Hive Metastore Server 
5.	Hue Server 
6.	OOZIE 
7.	YARN 
8.	Zookeeper 
9.	OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers 
##########################################################

#################################################################
ORIGINAL NOTES when planning for Cloudera (did NOT use Cloudera)
#################################################################
Software details: 
•	SuSE Linux 13.2 (Cloudera Supported) - Operating system - installed and configured - 21 Apr 2015 
•	Cloudera Express (Manager + Hadoop + HBase, etc.??) - 5.2.4 (recent stable release) - TBD 
•	JasperReports - 6.0 - Reporting Server - TBD 
•	JBoss - Community Edition 7.0 - Application Server - TBD 
•	PostgreSQL Server - 9.1 - RDBMS for Cloudera, Jaspersoft & Principal RDBMS Store for Analytics Platform (ck version - installed) 
•	Apache Tomcat - 7.0 - Web Server add-on - installed, but needs to be configured 
•	Apache Camel - 2.15.1 - Routing and Mediation Engine - TBD 
•	Java 1.7.0_67 (/opt/jdk1.7.0_67) - ck installed version - TBD 
##########################################################
TCP/IP ports: no firewall installed - Apparmor NOT installed - ssh enabled
•	5432 - PostgreSQL 
•	8080/8443 - Tomcat Server 
•	8180/8543 - JBoss Server (Standard Ports increased by 100) 
•	180/543 - Apache Camel (Standard Ports increased by 100) 
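
NOTE (added sketch): the "increased by 100" rule is plain arithmetic, e.g. 8080 + 100 = 8180 and 8443 + 100 = 8543. offset_ports is a made-up helper for checking a port plan before editing any server config.

```shell
# Print each port shifted by a fixed offset, one per line.
# offset_ports is a hypothetical helper name.
offset_ports() {
    offset="$1"
    shift
    for port in "$@"; do
        echo "$((port + offset))"
    done
}

# usage: offset_ports 100 8080 8443
#   prints 8180 and 8543
```

JBoss EAP 6 can apply such an offset itself through the jboss.socket.binding.port-offset property; confirm the exact usage against the EAP documentation for the installed version before relying on it.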
##########################################################
system details: - complete 21 Apr 2015
1.	set up 4 Dell T5600 systems with 500GB drives, 16GB memory, 2 procs, SuSE 13.2, KDE desktop - done 
2.	set up sys admin ssh/sudo, fixed IPs - done 
3.	created localized script to capture network performance, user access, and disk usage - done 
4.	expand monitoring scripts and move to lab system, run via ssh - TBD 
5.	configure script to run from lab manager's linux system to remotely monitor nodes via cron - TBD 
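
NOTE (added sketch): this is NOT the original monitoring script, only a minimal stand-in showing the kind of snapshot items 3-5 describe (disk usage, user access, network counters). node_snapshot is a made-up name, and /proc/net/dev is Linux-specific.

```shell
# Print a small snapshot of disk usage, logged-in users and network counters.
# node_snapshot is a hypothetical name; this is not the original lab script.
node_snapshot() {
    echo "== $(uname -n) $(date '+%Y-%m-%d %H:%M:%S') =="
    echo "-- disk --"
    df -h
    echo "-- users --"
    who
    echo "-- network --"
    if [ -r /proc/net/dev ]; then
        cat /proc/net/dev
    fi
}

# usage (local):  node_snapshot >> /var/tmp/node-snapshot.log
```

From bash on the lab manager's system, `ssh Linux-Lab-2 "$(declare -f node_snapshot); node_snapshot"` would run the same function on a remote node without copying a file there; wrapping that in a loop over the four nodes and calling it from cron covers item 5.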
##########################################################
system and user details: -   hduser or hadoop?
user hduser to install app w/ sudo - local only (use case to force su -) 

1.	set up user accounts for Ron, Raja, Matt, Gaja AND hduser - 30 Apr
hduser: hduser:x:1005:100:hduser - local account only - MUST su from existing account :/home/hduser:/bin/bash
	(note - add case)

John: john:x:1000:100:john :/home/john:/bin/bash
# maybe: Ron: ron:x:1002:100:Ron :/home/fondenr:/bin/bash
# maybe: Matt: matt:x:1003:100:Matt :/home/matt:/bin/bash
# maybe: Gaja: gaja:x:1004:100:
2.	establish rights per Gaja's email -TBD 
3.	add users via script or manually via scp - 30 Apr
	ssh x "sudo cp /etc/passwd /etc/passwd-30apr2015"    (back up first; x = node name)
	scp passwd Linux-Lab-1:/home/john/
4.	set up shadow, group, .bashrc, .History and .ssh directories for each user - 30 Apr
5.	set up ssh id_rsa.pub for each user on each node - TBD
6.	establish ssh between nodes for each user - TBD 
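
NOTE (added sketch): steps 5 and 6 (still TBD above) could be done with ssh-keygen and ssh-copy-id. The passphrase-less key, the DRY_RUN switch and the names run/push_keys are assumptions for illustration, not something the original notes settled on. By default the block only prints the commands.

```shell
NODES="Linux-Lab-1 Linux-Lab-2 Linux-Lab-3 Linux-Lab-4"
KEY="${KEY:-$HOME/.ssh/id_rsa}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = execute them

# Either print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create an RSA key if none exists, then install the public key
# for the given user on every node.
push_keys() {
    user="$1"
    [ -f "$KEY" ] || run ssh-keygen -t rsa -N "" -f "$KEY"
    for node in $NODES; do
        run ssh-copy-id -i "$KEY.pub" "$user@$node"
    done
}

# usage: push_keys hduser             (dry run)
#        DRY_RUN=0 push_keys hduser   (really do it; prompts for passwords)
```

Run it as the user whose key is being distributed; afterwards `ssh hduser@Linux-Lab-2 true` should succeed without a password prompt.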
##########################################################
TO ARCHIVE (use rsync or scp from opsdevlab)
------------------------------------------------
john@opsdevlab [/home/john/FILES/ARCHIVE/Linux-Lab-1-john-SOURCE]
------------------------------------------------
--> scp -rp Linux-Lab-1:/home/john .
------------------------------------------------
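
NOTE (added sketch): an rsync variant of the scp archive step, since the heading mentions rsync but only scp is shown. archive_home and DRY_RUN are made-up names; the target directory naming copies the Linux-Lab-1-john-SOURCE pattern from the prompt above. By default it only prints the command.

```shell
DRY_RUN="${DRY_RUN:-1}"    # 1 = print the command only, 0 = run it

# Archive one user's home directory from one node into
# <dest>/<node>-<user>-SOURCE/ using rsync in archive mode.
archive_home() {
    node="$1"
    user="$2"
    dest="${3:-.}"
    target="$dest/$node-$user-SOURCE/"
    if [ "$DRY_RUN" = 1 ]; then
        echo "rsync -a $node:/home/$user/ $target"
    else
        mkdir -p "$target" && rsync -a "$node:/home/$user/" "$target"
    fi
}

# usage: archive_home Linux-Lab-1 john /home/john/FILES/ARCHIVE
```

Unlike scp -rp, a repeated rsync run only transfers files that changed. Note the trailing slash on the source: it copies the contents of the home directory rather than creating an extra john/ level inside the target.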
##########################################################

------------------------------------------------
        EXPORTED VARIABLES (these match the stock hadoop-env.sh, comments stripped):
------------------------------------------------
# the Java implementation to use; must already be set in the calling
# environment (the JDK is in /opt/jdk1.7.0_67 on these nodes)
export JAVA_HOME=${JAVA_HOME}
# config directory, defaults to /etc/hadoop when not already set
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
# append any capacity-scheduler jars to the classpath
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
# prefer the IPv4 stack (IPv6 is disabled on these nodes)
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
# per-daemon JVM options: security and audit loggers
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} \
    -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} \
    -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
# heap size for the portmap daemon and for client commands
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
# secure-datanode user, log directory and pid directories
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
# string that identifies this Hadoop instance in log and pid file names
export HADOOP_IDENT_STRING=$USER
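
NOTE (added sketch): `export JAVA_HOME=${JAVA_HOME}` only passes through whatever the calling shell has, and daemons started over ssh often have nothing set. One hedged way to fall back to the JDK path listed in these notes is shown here; resolve_java_home is a made-up helper, not part of Hadoop.

```shell
# Print a usable JAVA_HOME: the current value if set, otherwise the
# fallback passed as $1. Fails if no executable bin/java is found there.
# resolve_java_home is a hypothetical helper name.
resolve_java_home() {
    candidate="${JAVA_HOME:-$1}"
    if [ -x "$candidate/bin/java" ]; then
        echo "$candidate"
    else
        echo "no executable java under '$candidate'" >&2
        return 1
    fi
}

# usage in an env file (assumption, not the original file):
#   export JAVA_HOME=$(resolve_java_home /opt/jdk1.7.0_67)
```

The simpler alternative is to hard-code the path in hadoop-env.sh on every node; the function form just makes a wrong path fail loudly instead of silently.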

------------------------------------------------
external sites providing tech info and details
------------------------------------------------
http://wiki.apache.org/hadoop/
http://wiki.apache.org/hadoop/HowToConfigure
http://wiki.apache.org/hadoop/DiskSetup
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
http://wiki.apache.org/hadoop/HadoopMapReduce
http://www.philippeadjiman.com/blog/the-hadoop-tutorial-series/
http://wiki.apache.org/nutch/NutchHadoopTutorial
------------------------------------------------

