Foreword:
This article continues the previous one, Hadoop 2.5.2 Distributed Cluster Setup (Part 1): Basic Environment Setup (Virtual Machines). With the base environment in place, it is now time to deploy and configure Hadoop itself.
5. Installing Hadoop
Download the Hadoop release you need from the official Hadoop website; this guide uses Hadoop 2.5.2.
First download the Hadoop 2.5.2 tarball into the /usr/local directory on the master server:
# cd /usr/local
# wget
Once the download completes, extract it:
# tar -zxvf hadoop-2.5.2.tar.gz
# cd hadoop-2.5.2
# mkdir data    # this directory will serve as Hadoop's hadoop.tmp.dir
Open the Hadoop core-site.xml file with vim:
# vim /usr/local/hadoop-2.5.2/etc/hadoop/core-site.xml
Configure etc/hadoop/core-site.xml as follows:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Hmaster:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.5.2/data</value>
  </property>
</configuration>
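Two asides on this file: fs.default.name is the deprecated 1.x name of fs.defaultFS, which Hadoop 2.5.2 still honors but warns about. And once JAVA_HOME is configured (see below), you can confirm a property is being picked up by querying the effective configuration from the Hadoop root (a quick check, not part of the original steps):
# bin/hdfs getconf -confKey fs.default.name
This should print hdfs://Hmaster:9000.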
Edit etc/hadoop/mapred-site.xml and configure it as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>Hmaster:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Hmaster:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Hmaster:19888</value>
  </property>
</configuration>
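If mapred-site.xml did not exist when you opened it, that is because the 2.5.2 distribution ships only a template for this file; copy it first and then apply the configuration above:
# cp /usr/local/hadoop-2.5.2/etc/hadoop/mapred-site.xml.template /usr/local/hadoop-2.5.2/etc/hadoop/mapred-site.xml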
Edit etc/hadoop/hdfs-site.xml as follows. Note that the directory values must not contain special characters such as dots or commas, and they must be written as full URIs starting with file:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop-2.5.2/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop-2.5.2/dfs/data</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>Hmaster:9000</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
  </property>
</configuration>
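The name and data directories referenced above are not created automatically in every case; creating them up front avoids the format failure mentioned in section 6 (a precaution, using the paths from the configuration above):
# mkdir -p /usr/local/hadoop-2.5.2/dfs/name /usr/local/hadoop-2.5.2/dfs/data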
Edit etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Hmaster:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Hmaster:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Hmaster:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Hmaster:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Hmaster:8088</value>
  </property>
</configuration>
Then edit etc/hadoop/slaves, which lets the cluster control scripts identify and manage each slave server:
Hslave1
Hslave2
Hslave3
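The start scripts ssh into every host listed in slaves, so the passwordless ssh set up in part one must be working for root. A quick loop to verify it (a convenience check, not from the original):
# for h in Hslave1 Hslave2 Hslave3; do ssh $h hostname; done
Each iteration should print the slave's hostname without prompting for a password.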
Set the Java environment variable in etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh:
export JAVA_HOME=/usr/java/default
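If your JDK lives somewhere other than /usr/java/default, this resolves the real path of the java binary (assuming java is already on the PATH); strip the trailing /bin/java, and /jre if present, to get JAVA_HOME:
# readlink -f $(which java)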
You can use scp to copy the fully configured Hadoop directory to the other servers in the cluster:
# scp -r hadoop-2.5.2 root@Hslave1:/usr/local
# scp -r hadoop-2.5.2 root@Hslave2:/usr/local
# scp -r hadoop-2.5.2 root@Hslave3:/usr/local
Edit the /etc/profile file to configure the Hadoop environment variables (repeat this on each node if you want the hadoop command available there as well):
#HADOOP VARIABLES START
export HADOOP_HOME=/usr/local/hadoop-2.5.2
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
#HADOOP VARIABLES END
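After saving, reload the profile so the variables take effect in the current shell, and confirm the hadoop binary resolves (a quick sanity check, not part of the original steps):
# source /etc/profile
# hadoop version
The second command should report Hadoop 2.5.2.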
At this point the basic distributed Hadoop configuration is complete; next we start the cluster and verify it.
6. Starting and Verifying Hadoop
(1) Format the file system
# cd /usr/local/hadoop-2.5.2
# ./bin/hdfs namenode -format
17/11/04 22:19:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Hmaster/192.168.0.200
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.5.2
STARTUP_MSG: classpath = /usr/local/hadoop-2.5.2/etc/hadoop:/usr/local/hadoop-2.5.2/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop-2.5.2/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/usr/local/hadoop-2.5.2/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop-
...
...
STARTUP_MSG: build = -r cc72e9b000545b86b75a61f4835eb86d57bfafc0; compiled by 'jenkins' on 2014-11-14T23:45Z
STARTUP_MSG: java = 1.8.0_91
************************************************************/
17/11/04 22:19:13 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
17/11/04 22:19:13 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-c7182aaa-a763-4276-b7eb-f11b4bd86a63
17/11/04 22:19:14 INFO namenode.FSNamesystem: fsLock is fair:true
17/11/04 22:19:14 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
17/11/04 22:19:14 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
17/11/04 22:19:14 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
17/11/04 22:19:14 INFO blockmanagement.BlockManager: The block deletion will start around 2017 十一月 04 22:19:14
17/11/04 22:19:14 INFO util.GSet: Computing capacity for map BlocksMap
17/11/04 22:19:14 INFO util.GSet: VM type = 64-bit
17/11/04 22:19:14 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
17/11/04 22:19:14 INFO util.GSet: capacity = 2^21 = 2097152 entries
17/11/04 22:19:15 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
17/11/04 22:19:15 INFO blockmanagement.BlockManager: defaultReplication = 3
17/11/04 22:19:15 INFO blockmanagement.BlockManager: maxReplication = 512
17/11/04 22:19:15 INFO blockmanagement.BlockManager: minReplication = 1
17/11/04 22:19:15 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
17/11/04 22:19:15 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
17/11/04 22:19:15 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
17/11/04 22:19:15 INFO blockmanagement.BlockManager: encryptDataTransfer = false
17/11/04 22:19:15 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
17/11/04 22:19:15 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
17/11/04 22:19:15 INFO namenode.FSNamesystem: supergroup = supergroup
17/11/04 22:19:15 INFO namenode.FSNamesystem: isPermissionEnabled = false
17/11/04 22:19:15 INFO namenode.FSNamesystem: HA Enabled: false
17/11/04 22:19:15 INFO namenode.FSNamesystem: Append Enabled: true
17/11/04 22:19:15 INFO util.GSet: Computing capacity for map INodeMap
17/11/04 22:19:15 INFO util.GSet: VM type = 64-bit
17/11/04 22:19:15 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
17/11/04 22:19:15 INFO util.GSet: capacity = 2^20 = 1048576 entries
17/11/04 22:19:15 INFO namenode.NameNode: Caching file names occuring more than 10 times
17/11/04 22:19:15 INFO util.GSet: Computing capacity for map cachedBlocks
17/11/04 22:19:15 INFO util.GSet: VM type = 64-bit
17/11/04 22:19:15 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
17/11/04 22:19:15 INFO util.GSet: capacity = 2^18 = 262144 entries
17/11/04 22:19:15 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
17/11/04 22:19:15 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/11/04 22:19:15 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
17/11/04 22:19:15 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/11/04 22:19:15 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/11/04 22:19:15 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/11/04 22:19:15 INFO util.GSet: VM type = 64-bit
17/11/04 22:19:15 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
17/11/04 22:19:15 INFO util.GSet: capacity = 2^15 = 32768 entries
17/11/04 22:19:15 INFO namenode.NNConf: ACLs enabled? false
17/11/04 22:19:15 INFO namenode.NNConf: XAttrs enabled? true
17/11/04 22:19:15 INFO namenode.NNConf: Maximum size of an xattr: 16384
17/11/04 22:19:15 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1165534849-192.168.0.200-1509805155604
17/11/04 22:19:15 INFO common.Storage: Storage directory /usr/local/hadoop-2.5.2/dfs/name has been successfully formatted.
17/11/04 22:19:15 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/11/04 22:19:15 INFO util.ExitUtil: Exiting with status 0
17/11/04 22:19:15 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Hmaster/192.168.0.200
************************************************************/
If the format fails because the dfs directory is missing, create it manually:
# mkdir /usr/local/hadoop-2.5.2/dfs
On success, the output will include:
INFO common.Storage: Storage directory /usr/local/hadoop-2.5.2/dfs/name has been successfully formatted.
(2) Start Hadoop
# cd /usr/local/hadoop-2.5.2
# sbin/start-all.sh
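Note that in Hadoop 2.x start-all.sh is deprecated and simply delegates to the two scripts below; running them separately makes it easier to see which subsystem failed to come up:
# sbin/start-dfs.sh
# sbin/start-yarn.sh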
Once the startup scripts finish, run the jps command on the master and on each of the three slaves to check the Java processes. A successful start looks like this:
# On Hmaster:
# jps
5346 ResourceManager
5619 Jps
5206 SecondaryNameNode
5032 NameNode
# On Hslave1 / Hslave2 / Hslave3:
# jps
4291 NodeManager
4133 DataNode
4460 Jps
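Beyond jps, two further checks confirm the cluster is healthy (standard checks, assuming the ports configured above plus the default NameNode UI port): browse the NameNode web UI at http://Hmaster:50070 and the ResourceManager web UI at http://Hmaster:8088, and ask HDFS for a report:
# bin/hdfs dfsadmin -report
The report should show three live datanodes.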
If the startup hangs on output like the following, add StrictHostKeyChecking no to the /etc/ssh/ssh_config file, then restart the SSH service (/etc/init.d/ssh restart):
...
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 08:1d:db:e4:d2:e0:87:89:ed:ca:69:82:17:6a:83:57
...
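A less invasive alternative to disabling strict host key checking globally is to accept each host's key once with ssh-keyscan (an alternative approach, not from the original article):
# for h in Hmaster Hslave1 Hslave2 Hslave3; do ssh-keyscan $h >> ~/.ssh/known_hosts; done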
7. Possible Problems
(1) If some service processes fail to start during start-all.sh, first rule out the firewall,
then check the error messages in the failing service's startup log.
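How exactly you check the firewall depends on the distribution; on a CentOS 6-style system, for example (assuming the iptables service is in use):
# service iptables status
# service iptables stop    # stop it temporarily to rule it out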
(2) Initialization failed for Block pool (Datanode Uuid unassigned)
Where to look: every namenode directory, every datanode directory, and the temporary directories on the slave nodes.
Cause:
1) The namenode clusterID on the master does not match the datanode clusterID on the slaves.
2) This is the result of formatting the namenode multiple times: each format generates a new clusterID on the master while the datanodes keep the old one, so the records no longer agree.
Fix:
Before reformatting, stop all services (stop-dfs.sh and stop-yarn.sh, or stop-all.sh). Once you have confirmed everything is down, go to the namenode directory, the datanode directory, and the temporary directory on every node, delete all of their contents, then reformat and restart to test.
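To confirm the mismatch before deleting anything, compare the clusterID recorded in each node's VERSION file (paths follow the hdfs-site.xml configuration above):
# grep clusterID /usr/local/hadoop-2.5.2/dfs/name/current/VERSION    # on the master
# grep clusterID /usr/local/hadoop-2.5.2/dfs/data/current/VERSION    # on each slave
If the two values differ, the datanodes will refuse to join the namenode's block pool.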