########################################################################
# Hadoop Linux cluster NOTES - in my lab - 8 June 2015 - john meister ##
### NOTE, URLs were internal sources AND Cloudera was NOT used ####
# Cloudera offered a proprietary setup version of Apache Hadoop ####
# with a ludicrous license agreement... set things up manually: RTM ####
# got this cluster working, but notes are not complete... ###
########################################################################
should have used ext3 (when I set these up...)
#####
Ext4:
######
The Ext4 Linux filesystem has delayed allocation of data which makes it
handle unplanned server shutdowns/power outages less well than classic
ext3. Consider turning off the delalloc option in /etc/fstab unless you trust your UPS.
from: http://wiki.apache.org/hadoop/DiskSetup
delalloc
Deferring block allocation until write-out time.
nodelalloc
Disable delayed allocation. Blocks are allocated when data is copied
from user to page cache.
set /etc/fstab: (8 jun 2015)
------------------------------------------------
root@Linux-Lab-1 [/root]
------------------------------------------------
--> cat /etc/fstab
/dev/system/swap swap swap defaults 0 0
/dev/system/root / ext4 acl,user_xattr,nodelalloc 1 1
UUID=4AE5-CA05 /boot/efi vfat umask=0002,utf8=true 0 0
############################################################################################
### NOTE: added nodelalloc for ext4 to prevent data corruption:
### by default ext4 delays block allocation until write-out, risking data loss if power fails.
############################################################################################
NOTE: (need to update /etc/fstab on all nodes... to show nodelalloc for ext4)
NOTE: FRESH INSTALLS SHOULD USE ext3!
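a rough sketch of scripting the fstab change for the other nodes - works on a
scratch copy by default (FSTAB and the /tmp path are assumptions, NOT the live
file); point it at /etc/fstab only after checking the diff. note: awk rewrites
the fields separated by single spaces.

```shell
#!/bin/sh
# Sketch: append nodelalloc to the mount options of every ext4 entry.
# FSTAB defaults to a scratch file so this can be rehearsed safely.
FSTAB="${FSTAB:-/tmp/fstab.test}"

# demo input when using the scratch default
[ -f "$FSTAB" ] || cat > "$FSTAB" <<'EOF'
/dev/system/swap swap swap defaults 0 0
/dev/system/root / ext4 acl,user_xattr 1 1
EOF

cp "$FSTAB" "$FSTAB.bak"           # keep a backup first
# add nodelalloc to the 4th field of ext4 lines that lack it
awk '$3=="ext4" && $4 !~ /nodelalloc/ {$4=$4",nodelalloc"} {print}' \
    "$FSTAB.bak" > "$FSTAB"
grep ext4 "$FSTAB"
```

after editing the real file a reboot (or remount of /) picks up the option;
grep nodelalloc /proc/mounts confirms it took.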
-------------------------------------------------
sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
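dry-run sketch for doing the same user/group setup on all four nodes. the
commands above are Debian's addgroup/adduser wrappers; groupadd/useradd below
are the portable equivalents (an assumption for SuSE). commands are printed,
not executed - run them via ssh once they look right.

```shell
#!/bin/sh
# Dry-run: print the hadoop group/user commands for every lab node.
nodes="Linux-Lab-1 Linux-Lab-2 Linux-Lab-3 Linux-Lab-4"
plan=$(for node in $nodes; do
    printf 'ssh %s sudo groupadd hadoop\n' "$node"
    printf 'ssh %s sudo useradd -m -g hadoop -s /bin/bash hduser\n' "$node"
done)
echo "$plan"
```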
------------------------------------------------
disable IPv6
------------------------------------------------
john@Linux-Lab-1 [/home/john]
------------------------------------------------
--> cat /etc/sysctl.conf
####
#
# /etc/sysctl.conf is meant for local sysctl settings
#
# sysctl reads settings from the following locations:
# /boot/sysctl.conf-
# /lib/sysctl.d/*.conf
# /usr/lib/sysctl.d/*.conf
# /usr/local/lib/sysctl.d/*.conf
# /etc/sysctl.d/*.conf
# /run/sysctl.d/*.conf
# /etc/sysctl.conf
#
# To disable or override a distribution provided file just place a
# file with the same name in /etc/sysctl.d/
#
# See sysctl.conf(5), sysctl.d(5) and sysctl(8) for more information
#
####
net.ipv4.ip_forward = 0
## net.ipv6.conf.all.forwarding = 0
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
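sketch for staging the three disable lines as a sysctl drop-in instead of
editing sysctl.conf directly (the sysctl.conf header above says /etc/sysctl.d/
overrides are the supported route). CONF defaults to a scratch path (an
assumption); the real target would be /etc/sysctl.d/90-disable-ipv6.conf.

```shell
#!/bin/sh
# Stage the IPv6-disable settings as a sysctl drop-in file.
CONF="${CONF:-/tmp/90-disable-ipv6.conf}"
cat > "$CONF" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
# all three interfaces staged?
grep -c 'disable_ipv6 = 1' "$CONF"
```

then: sudo sysctl -p /etc/sysctl.d/90-disable-ipv6.conf to apply;
cat /proc/sys/net/ipv6/conf/all/disable_ipv6 should read 1 afterward.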
------------------------------------------------
S/W install list (revised)
NEED TO DOWNLOAD/INSTALL from:
// was an internal source, need to locate online: /repositories/apache/
-------------------------------------------------
------------------
software required: (need version info for the following)
------------------
Flume - VERSION ???
http://java source.some-company-dot-com/repositories/apache/flume/
HDFS Balancer - VERSION ??? - nothing on java source.some-company-dot-com
HDFS Name Node - VERSION ??? - nothing on java source.some-company-dot-com
HBase Master - VERSION ???
http://java source.some-company-dot-com/repositories/apache/hbase/
HBase Region Server - VERSION ???
http://java source.some-company-dot-com/repositories/apache/hbase/
HBase Rest - VERSION ???
http://java source.some-company-dot-com/repositories/apache/hbase/
HDFS Datanode - VERSION ??? - nothing on java source.some-company-dot-com
Hive - VERSION ???
http://java source.some-company-dot-com/repositories/apache/hive/
Sqoop - VERSION ???
http://java source.some-company-dot-com/repositories/apache/sqoop/
YARN - VERSION ??? - nothing on java source.some-company-dot-com
NO INFO INTERNALLY
Zookeeper - VERSION ???
http://java source.some-company-dot-com/repositories/apache/zookeeper/
Jaspersoft Server
last status 5/18/2015 - Ron waiting
PostgreSQL DB - VERSION ???
installed on nodes... not sure of version
Hive Metastore Server - VERSION ???
http://java source.some-company-dot-com/repositories/apache/hive/
Hue Server - VERSION ??? - nothing on java source.some-company-dot-com
hey Hue, where are you? not on java...
OOZIE - VERSION ???
http://java source.some-company-dot-com/repositories/apache/oozie/
Camel - VERSION ???
App Server (JBoss + Camel + Tomcat) -
http://java source.some-company-dot-com/repositories/apache/camel/apache-camel/
Tomcat - VERSION ???
App Server (JBoss + Camel + Tomcat) -
http://java source.some-company-dot-com/repositories/apache/tomcat/
------------------
software required: have version info and s/w
------------------
Hadoop - // was an internal source, need to locate online: /repositories/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers
App Server (JBoss + Camel + Tomcat) -
JBoss - JBOSS-EAP-6.3-Download (copied to master node)
############################################################################################
DETAILED ORIGINAL LIST:
############################################################################################
Master Node (Linux-Lab-1) software install list
1. Cloudera Manager Event Server
2. Cloudera Manager Host Monitor
3. Cloudera Manager Service Alert Publisher
4. Cloudera Manager Service Monitor
5. Flume
6. HDFS Balancer
7. HDFS Name Node
8.OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers
________________________________________
SECOND Node (Linux-Lab-2) software install list
1. HBase Master
2. HBase Region Server1
3. HBase Rest
4. HDFS Datanode1
5. Hive
6. Sqoop
7. YARN
8. Zookeeper
9. OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers
________________________________________
THIRD Node (Linux-Lab-3) software install list
1. Jaspersoft Server
2. PostgreSQL DB
3. HBase Region Server2
4. HDFS Datanode2
5. HDFS Secondary NameNode
6. YARN
7. Zookeeper
8. OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers
________________________________________
FOURTH Node (Linux-Lab-4) software install list
1. App Server (JBoss + Camel + Tomcat)
2. HBase Region Server3
3. HDFS Datanode3
4. Hive Metastore Server
5. Hue Server
6. OOZIE
7. YARN
8. Zookeeper
9. OTHER S/W: Java 1.7.0_67 (/opt/jdk1.7.0_67), ntpd, sshd, crond, sudoers
##########################################################
#################################################################
ORIGINAL NOTES when planning for Cloudera (did NOT use Cloudera)
#################################################################
Software details:
• SuSE Linux 13.2 (Cloudera Supported) - Operating system - installed and configured - 21 Apr 2015
• Cloudera Express (Manager + Hadoop + HBase, etc.??) - 5.2.4 (recent stable release) - TBD
• JasperReports - 6.0 - Reporting Server - TBD
• JBoss - Community Edition 7.0 - Application Server - TBD
• PostgreSQL Server - 9.1 - RDBMS for Cloudera, Jaspersoft & Principal RDBMS Store for Analytics Platform (ck version - installed)
• Apache Tomcat - 7.0 - Web Server add-on - installed, but needs to be configured
• Apache Camel - 2.15.1 - Routing and Mediation Engine - TBD
• Java 1.7.0_67 (/opt/jdk1.7.0_67) - ck installed version - TBD
##########################################################
TCP/IP ports: no firewall installed - Apparmor NOT installed - ssh enabled
• 5432 - PostgreSQL
• 8080/8443 - Tomcat Server
• 8180/8553 JBoss Server (Standard Ports increased by 100)
• 180/543 Apache Camel (Standard Ports increased by 100)
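with no firewall or AppArmor the only gate is whether each service actually
listens, so a quick port sweep covers it. sketch below loops over the planned
ports and reports TCP listeners via ss (on a fresh box expect every port to
show closed).

```shell
#!/bin/sh
# Report open/closed for each planned service port on this node.
listeners=$(ss -ltn 2>/dev/null)
report=""
for port in 5432 8080 8443 8180 8553 180 543; do
    if echo "$listeners" | grep -q ":$port "; then
        state=open
    else
        state=closed
    fi
    report="$report$port=$state
"
done
printf '%s' "$report"
```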
##########################################################
system details: - complete 21 Apr 2015
1. setup 4 Dell T5600 systems with 500GB drives, 16GB memory, 2 procs, SuSE 13.2, KDE desktop - done
2. setup sys admin ssh/sudo, fixed IP's - done
3. created localized script to capture network performance, user access, and disk usage - done
4. expand monitoring scripts and move to lab system, run via ssh - TBD
5. configure script to run from lab manager's linux system to remotely monitor nodes via cron - TBD
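a minimal sketch of the kind of per-node health line the monitoring script in
item 3 collects - timestamp, load, root disk usage, logged-in users - one line
per run, meant to be appended to a log from cron. field names here are my own
shorthand, not from the real script.

```shell
#!/bin/sh
# Emit one health line for this node: time, host, load, disk, users.
load=$(cut -d' ' -f1 /proc/loadavg)
disk=$(df -P / | awk 'NR==2 {print $5}')
users=$(who | wc -l)
line="$(date '+%F %T') host=$(hostname) load=$load disk=$disk users=$users"
echo "$line"
```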
##########################################################
system and user details: - hduser or hadoop?
user hduser to install app w/ sudo - local only (use case to force su -)
1.setup user accounts for Ron, Raja, Matt, Gaja AND hduser - 30 Apr
hduser: hduser:x:1005:100:
hduser - local account only - MUST su from existing account :
/home/hduser:/bin/bash (note - add case)
John: john:x:1000:100:john :/home/john:/bin/bash
# maybe: Ron: ron:x:1002:100:Ron :/home/fondenr:/bin/bash
# maybe: Matt: matt:x:1003:100:Matt :/home/matt:/bin/bash
# maybe: Gaja: gaja:x:1004:100:
2. establish rights per Gaja's email -TBD
3. add users via script or manually via scp - 30 Apr
ssh x "sudo cp /etc/passwd /etc/passwd-30apr2015"
scp passwd Linux-Lab-1:/home/john/
4. setup shadow, group, .bashrc, .History and .ssh directories for each user - 30 Apr
5. setup ssh id_rsa.pub for each user on each node - TBD
6. establish ssh between nodes for each user - TBD
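sketch of steps 5/6: generate a passphrase-less RSA key for hduser, then stage
the push to the other nodes. KEYDIR defaults to a scratch dir (an assumption)
so this can be rehearsed without touching ~/.ssh; the ssh-copy-id commands are
printed, not run.

```shell
#!/bin/sh
# Create the hduser key pair (if absent) and print distribution commands.
KEYDIR="${KEYDIR:-/tmp/hduser-ssh-test}"
mkdir -p "$KEYDIR" && chmod 700 "$KEYDIR"
[ -f "$KEYDIR/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"
for node in Linux-Lab-2 Linux-Lab-3 Linux-Lab-4; do
    # drop the echo to actually distribute the key
    echo "ssh-copy-id -i $KEYDIR/id_rsa.pub hduser@$node"
done
```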
##########################################################
TO ARCHIVE (use rsync or scp from opsdevlab)
------------------------------------------------
john@opsdevlab [/home/john/FILES/ARCHIVE/Linux-Lab-1-john-SOURCE]
------------------------------------------------
--> scp -rp Linux-Lab-1:/home/john .
------------------------------------------------
##########################################################
------------------------------------------------
EXPORTED VARIABLES:
------------------------------------------------
export JAVA_HOME=/opt/jdk1.7.0_67   # JDK path on these nodes - Hadoop needs this set explicitly
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
else
export HADOOP_CLASSPATH=$f
fi
done
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} \
-Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} \
-Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_IDENT_STRING=$USER
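quick rehearsal of the contrib-jar loop above: point HADOOP_HOME at a scratch
tree (mktemp paths, dummy jars - all assumptions) and watch HADOOP_CLASSPATH
accumulate the jars colon-separated, same logic as the hadoop-env.sh fragment.

```shell
#!/bin/sh
# Demonstrate the capacity-scheduler classpath accumulation loop.
HADOOP_HOME=$(mktemp -d)
mkdir -p "$HADOOP_HOME/contrib/capacity-scheduler"
touch "$HADOOP_HOME/contrib/capacity-scheduler/a.jar" \
      "$HADOOP_HOME/contrib/capacity-scheduler/b.jar"
unset HADOOP_CLASSPATH
for f in "$HADOOP_HOME"/contrib/capacity-scheduler/*.jar; do
    if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
    else
        export HADOOP_CLASSPATH=$f
    fi
done
echo "$HADOOP_CLASSPATH"
```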
------------------------------------------------
external sites providing tech info and details
------------------------------------------------
http://wiki.apache.org/hadoop/
http://wiki.apache.org/hadoop/HowToConfigure
http://wiki.apache.org/hadoop/DiskSetup
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
http://wiki.apache.org/hadoop/HadoopMapReduce
http://www.philippeadjiman.com/blog/the-hadoop-tutorial-series/
http://wiki.apache.org/nutch/NutchHadoopTutorial
------------------------------------------------
JohnMeister.com Today's Date: