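Preface:
Many readers want to know how to deploy a Hadoop cluster, so this post collects the steps in one place. Let's work through them together.

Prerequisites
Three virtual machines running CentOS 7, with the hostnames bigdata01, bigdata02, and bigdata03. The password on each machine is root.
Before starting, make sure all three machines already have the JDK installed, passwordless SSH configured, the firewall disabled, and hostname mappings in place.

Configure the /etc/hosts file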
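On all three virtual machines, add the following entries to /etc/hosts (these are also the IP addresses of the three machines):

192.168.31.115 bigdata01
192.168.31.131 bigdata02
192.168.31.133 bigdata03

Virtual machine settings
bigdata01: 4 GB of memory or more
bigdata02 and bigdata03: 2 GB of memory or more

Role assignment:
bigdata01: NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer, WebAppProxyServer, QuorumPeerMain
bigdata02: DataNode, NodeManager, QuorumPeerMain
bigdata03: DataNode, NodeManager, QuorumPeerMain

Hadoop cluster deployment

Download the Hadoop package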
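Download the Hadoop package and extract it. The symbolic link step is optional, because the rest of this guide uses /usr/local/hadoop-3.4.0 directly.

# 1. Download
# Run on the bigdata01 node
$ cd /root
$ wget <URL of hadoop-3.4.0.tar.gz>

# 2. Extract
# Alternative target: /export/server (make sure the directory exists first)
# tar -zxvf hadoop-3.4.0.tar.gz -C /export/server/
$ tar -xf hadoop-3.4.0.tar.gz
$ mv hadoop-3.4.0 /usr/local/

# 3. Create a symbolic link
# ln -s /usr/local/hadoop-3.4.0-3.3.0 /usr/local/hadoop-3.4.0

Modify the configuration file: hadoop-env.sh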
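The configuration files live in /usr/local/hadoop-3.4.0/etc/hadoop. Start with hadoop-env.sh.
This file sets environment variables that apply only while Hadoop is running.
To make a variable permanent, it has to be written to /etc/profile.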
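Because the same exports may end up in both hadoop-env.sh and /etc/profile, it helps if re-running the setup does not duplicate lines. The following is a minimal sketch, assuming a hypothetical helper named append_once (it is not part of Hadoop):

```shell
#!/usr/bin/env bash
# append_once LINE FILE
# Hypothetical helper (not shipped with Hadoop): appends LINE to FILE
# only when FILE does not already contain exactly that line.
append_once() {
    local line="$1" file="$2"
    # -q quiet, -x match the whole line, -F fixed string (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage sketch (commented out so that sourcing this file changes nothing):
# append_once 'export HADOOP_HOME=/usr/local/hadoop-3.4.0' /etc/profile
```

Calling the helper twice with the same line leaves a single copy in the file, so the setup steps can be repeated safely.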
$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp hadoop-env.sh hadoop-env.sh-bak
$ vi hadoop-env.sh

# Add the following to the file:
export JAVA_HOME=/usr/java/jdk-11/
# Hadoop installation path
export HADOOP_HOME=/usr/local/hadoop-3.4.0
# Path of the Hadoop (HDFS) configuration files
# YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR.
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Hadoop YARN log directory
# YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR
# export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
# Hadoop HDFS log directory
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
# Users that start the Hadoop daemons
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root

Modify the configuration file: core-site.xml
Clear the file and fill in the following content:

$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp core-site.xml core-site.xml-bak
$ vi core-site.xml

# Write the following content
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata01:8020</value>
    <description></description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description></description>
  </property>
</configuration>

Configure: hdfs-site.xml
$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp hdfs-site.xml hdfs-site.xml-bak
$ vi hdfs-site.xml

# Write the following content
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>700</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/nn</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
  </property>
  <property>
    <name>dfs.namenode.hosts</name>
    <value>bigdata01,bigdata02,bigdata03</value>
    <description>List of permitted DataNodes.</description>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
    <description></description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
    <description></description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/dn</value>
  </property>
</configuration>

Configure: mapred-env.sh
$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp mapred-env.sh mapred-env.sh-bak
$ vi mapred-env.sh

# Append the following environment variables to the end of the file
export JAVA_HOME=/usr/java/jdk-11/
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA

Configure: mapred-site.xml
$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp mapred-site.xml mapred-site.xml-bak
$ vi mapred-site.xml

# Replace the content with the following
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>bigdata01:10020</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>bigdata01:19888</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/data/mr-history/tmp</value>
    <description></description>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/data/mr-history/done</value>
    <description></description>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>

Configure: yarn-env.sh
$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp yarn-env.sh yarn-env.sh-bak
$ vi yarn-env.sh

# Append the following environment variables to the end of the file
# WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
# WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
export JAVA_HOME=/usr/java/jdk-11/
export HADOOP_HOME=/usr/local/hadoop-3.4.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs/hdfs
# export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn

Configure: yarn-site.xml
$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp yarn-site.xml yarn-site.xml-bak
$ vi yarn-site.xml

# Replace the content with the following
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.log.server.url</name>
    <!-- the value is missing in the source; set it to your log server URL -->
    <value></value>
    <description></description>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>bigdata01:8089</value>
    <description>proxy server hostname and port</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Configuration to enable or disable log aggregation</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>Where to aggregate logs to.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>bigdata01</value>
    <description></description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description></description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/nm-local</value>
    <description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/nm-log</value>
    <description>Comma-separated list of paths on the local filesystem where logs are written.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
    <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for Map Reduce applications.</description>
  </property>
</configuration>

Modify the workers file
$ cd /usr/local/hadoop-3.4.0/etc/hadoop
$ cp workers workers-bak
$ vi workers

# Replace the content with the following
bigdata01
bigdata02
bigdata03

Distribute Hadoop to the other machines
# Run on the bigdata01 node
cd /usr/local/
scp -r hadoop-3.4.0 bigdata02:/usr/local
scp -r hadoop-3.4.0 bigdata03:/usr/local

Modify the /etc/profile file
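Run on all nodes:

# 1. Edit /etc/profile and add the two export lines
$ vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop-3.4.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# 2. Reload the environment variables and confirm that hadoop is found
$ source /etc/profile
$ which hadoop

Create the required directories

Run on bigdata01: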
mkdir -p /data/nn
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local

Run on bigdata02:
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local

Run on bigdata03:
mkdir -p /data/dn
mkdir -p /data/nm-log
mkdir -p /data/nm-local

Format the NameNode
Run on bigdata01:
hadoop namenode -format
The hadoop command is a program in $HADOOP_HOME/bin.
Because that directory was added to the PATH environment variable, hadoop can be run from any location.
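Formatting initializes the NameNode metadata directory, and formatting again replaces the existing metadata, so it is worth checking before re-running the command. A minimal sketch of such a guard, assuming the /data/nn directory configured earlier and relying on the current/VERSION file that a formatted NameNode directory contains (the function name is made up for illustration):

```shell
#!/usr/bin/env bash
# nn_is_formatted DIR
# Succeeds (exit status 0) when DIR already looks like a formatted
# NameNode directory, i.e. it contains current/VERSION.
nn_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Usage sketch (commented out so that sourcing this file runs nothing):
# if nn_is_formatted /data/nn; then
#     echo "/data/nn is already formatted; skipping" >&2
# else
#     hadoop namenode -format
# fi
```

The check only inspects the local directory, so it has to run on the node that hosts the NameNode (bigdata01 in this guide).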
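Start the HDFS cluster

# Run on bigdata01
$ start-dfs.sh
# To stop the cluster:
$ stop-dfs.sh

# ------------------ Check status ------------------ #
# Method 1
# If HDFS started successfully, the following processes should be visible:
#   - NameNode (on the master node)
#   - DataNode (on every node that stores data)
#   - SecondaryNameNode in a non-HA setup, or a standby NameNode if HA (High Availability) is enabled
$ jps
4408 NameNode
4589 DataNode

# Method 2
# Open the NameNode web UI, which usually listens on port 9870.

# Method 4
# Check the HDFS log files for a startup confirmation or error messages.
# They are often under /var/log/hadoop/hdfs/, but the location depends on your configuration;
# this guide sets HADOOP_LOG_DIR to $HADOOP_HOME/logs/hdfs.

# Method 5
# Test with an HDFS client command
$ hdfs dfs -ls /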
The start-dfs.sh command is a program in $HADOOP_HOME/sbin.
Because that directory was added to the PATH environment variable, start-dfs.sh can be run from any location.
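The jps check can be scripted so that a missing daemon is reported by name. A minimal sketch, assuming jps prints one "<pid> <ClassName>" pair per line as in the output shown in this guide; missing_daemons is a hypothetical helper, not a Hadoop command:

```shell
#!/usr/bin/env bash
# missing_daemons NAME...
# Reads `jps` output ("<pid> <ClassName>" per line) on stdin and prints
# each expected daemon name that does not appear, one per line.
missing_daemons() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        { seen[$2] = 1 }    # the second field is the process name
        END {
            for (i = 1; i <= n; i++)
                if (!(w[i] in seen)) print w[i]
        }'
}

# Usage sketch on bigdata01 (prints nothing when both daemons are up):
# jps | missing_daemons NameNode DataNode
```

Matching on the whole second field keeps SecondaryNameNode from being mistaken for NameNode, which a plain substring grep would do.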
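Start the YARN cluster

# Run on bigdata01
start-yarn.sh
# To stop the cluster:
stop-yarn.sh

# ------------------ Check status ------------------ #
# Method 1
$ yarn --daemon status resourcemanager
$ yarn --daemon status nodemanager

# Method 2
# The master node should have a ResourceManager process.
# The worker nodes should have a NodeManager process.
$ jps
4680 ResourceManager
5241 NodeManager

# Method 3
# Open the ResourceManager web UI in a browser; the port is usually 8088.

# Method 4
# Check the YARN log files (for example under /var/log/hadoop-yarn/)
# for a startup confirmation or error messages.

Start the history server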
$ mapred --daemon start historyserver
# To stop it, replace start with stop
$ mapred --daemon stop historyserver

Start the web proxy server
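# yarn-daemon.sh is the deprecated equivalent of yarn --daemon
yarn --daemon start proxyserver
# To stop it, replace start with stop
yarn --daemon stop proxyserver

Verify that the Hadoop cluster is running

Verify the processes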
Run jps on bigdata01, bigdata02, and bigdata03 to check that every process started successfully.

# bigdata01
$ jps
8401 NameNode
8513 DataNode
9201 WebAppProxyServer
9106 JobHistoryServer
8712 SecondaryNameNode

# bigdata02
$ jps
22768 DataNode

# bigdata03
$ jps
26675 DataNode

Verify HDFS
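Open the NameNode web UI in a browser (it usually listens on port 9870). Then create a file named test.txt with any content and run:

# hdfs dfs is equivalent to hadoop fs
$ hadoop fs -put test.txt /test.txt
# or
$ hdfs dfs -put test.txt /test.txt

$ hadoop fs -cat /test.txt
# or
$ hdfs dfs -cat /test.txt

hadoop fs command reference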
# -------------------------- Other commands -------------------------- #
hadoop fs -ls /                                                 # show directory contents
hadoop fs -ls -R /                                              # show directory contents recursively
hadoop fs -mkdir /user/tguigu                                   # create a directory on HDFS
hadoop fs -moveFromLocal test.txt /user/tguigu/data             # move (cut and paste) a local file to HDFS
hadoop fs -appendToFile test.txt /user/tguigu/data/test.txt     # append a file to the end of an existing file
hadoop fs -cat                                                  # show file contents
hadoop fs -tail                                                 # show the end of a file
hadoop fs -cp /user/tguigu/../x.txt /user/tguigu/test../        # copy from one HDFS path to another
hadoop fs -mv /user/tguigu/../x.txt /.../                       # move a file within HDFS
hadoop fs -get /user/tguigu/../x.txt ./                         # same as copyToLocal: download from HDFS to the local disk
hadoop fs -getmerge /user/tguigu//test/* ./zaiyiqi.txt          # download several files merged into one
hadoop fs -put                                                  # same as copyFromLocal: upload
hadoop fs -rm                                                   # delete a file or directory
hadoop fs -rmdir                                                # delete an empty directory
hadoop fs -df                                                   # report free space on the filesystem
hadoop fs -du                                                   # report file sizes
hadoop fs -setrep                                               # set the replication factor of files in HDFS

# Example: hadoop fs -setrep -R 3 /user/hadoop/data
hadoop fs -setrep [-R] [-w] <numReplicas> <path>
  -setrep        : the subcommand that sets the replication factor.
  -R             : recursive; applies to every file in the given directory and its subdirectories.
  -w             : wait; the command returns only after all replicas have been created.
  <numReplicas>  : the replication factor you want.
  <path>         : the HDFS file or directory whose replication factor is changed.

# -------------------------- hadoop fs -------------------------- #
# [root@lpf-vm-115 hadoop-3.4.0]# hdfs dfs -h
# [root@lpf-vm-115 hadoop-3.4.0]# hadoop fs -h
-h: Unknown command
Usage: hadoop fs [generic options]
    [-appendToFile [-n] <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum [-v] <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-concat <target path> <src path> <src path> ...]
    [-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] [-q <thread pool queue size>] <localsrc> ... <dst>]
    [-copyToLocal [-f] [-p] [-crc] [-ignoreCrc] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>]
    [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] [-s] <path> ...]
    [-cp [-f] [-p | -p[topax]] [-d] [-t <thread count>] [-q <thread pool queue size>] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] [-v] [-x] <path> ...]
    [-expunge [-immediate] [-fs <path>]]
    [-find <path> ... <expression> ...]
    [-get [-f] [-p] [-crc] [-ignoreCrc] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
    [-head <file>]
    [-help [cmd ...]]
    [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] [-d] [-t <thread count>] [-q <thread pool queue size>] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] [-s <sleep interval>] <file>]
    [-test -[defswrz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touch [-a] [-m] [-t TIMESTAMP (yyyyMMdd:HHmmss) ] [-c] <path> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>            specify an application configuration file
-D <property=value>                   define a value for a given property
-fs <file:///|hdfs://namenode:port>   specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>      specify a ResourceManager
-files <file1,...>                    specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>                   specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>              specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

Verify YARN
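Open the ResourceManager web UI in a browser (the port is usually 8088), then run:

# Create the file words.txt with the following content
$ vi words.txt
example osc hadoop
osc hadoop hadoop
osc hadoop

# Upload the file to HDFS
$ hadoop fs -put words.txt /words.txt

# Run the following job to verify that YARN works.
# If the job shows up in the web UI without errors, the cluster deployment succeeded.
hadoop jar \
/usr/local/hadoop-3.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar \
wordcount \
-Dmapred.job.queue.name=root.root \
/words.txt \
/output

$ hadoop fs -cat /words.txt

yarn command reference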
Commonly used yarn commands, for reference:

# ------------------------ yarn command reference ------------------------ #
# yarn examples
# Check the state of the YARN daemons
yarn --daemon status resourcemanager
yarn --daemon status nodemanager

# List the applications that are running, or have recently run, on the YARN cluster
$ yarn application -list [-appStates <state1, state2, ...>] [-all]
$ yarn application -list -all
# Show the details of a given application
$ yarn application -status <Application ID>
# Show the logs of an application
yarn logs -applicationId <Application ID>
# Kill a given application
yarn application -kill <Application ID>

$ yarn top
$ yarn node
# Show the nodes of the cluster
$ yarn node -list        # shows each NodeManager's information URL and node ID
$ yarn node -list -all
$ yarn node -list -showDetails
# Filter the node list by state
$ yarn node -list -states <States>
# Show the detailed status of a single node
# -status <NodeId>   Prints the status report of the node.
$ yarn node -status bigdata02:30823

# yarn -h
[root@lpf-vm-115 bin]# yarn -h
WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
Usage: yarn [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
 or    yarn [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
  where CLASSNAME is a user-provided Java class

  OPTIONS is none or any of:

--buildpaths                       attempt to add class files from build tree
--config dir                       Hadoop config directory
--daemon (start|status|stop)       operate on a daemon
--debug                            turn on shell script debug mode
--help                             usage information
--hostnames list[,of,host,names]   hosts to use in worker mode
--hosts filename                   list of hosts to use in worker mode
--loglevel level                   set the log4j level for this command
--workers                          turn on worker mode

  SUBCOMMAND is one of:

    Admin Commands:

daemonlog            get/set the log level for each daemon
node                 prints node report(s)
rmadmin              admin tools
routeradmin          router admin tools
scmadmin             SharedCacheManager admin tools

    Client Commands:

app|application      prints application(s) report/kill application/manage long running application
applicationattempt   prints applicationattempt(s) report
classpath            prints the class path needed to get the hadoop jar and the required libraries
cluster              prints cluster information
container            prints container(s) report
envvars              display computed Hadoop environment variables
fs2cs                converts Fair Scheduler configuration to Capacity Scheduler (EXPERIMENTAL)
jar <jar>            run a jar file
logs                 dump container logs
nodeattributes       node attributes cli client
queue                prints queue information
schedulerconf        Updates scheduler configuration
timelinereader       run the timeline reader server
top                  view cluster information
version              print the version

    Daemon Commands:

globalpolicygenerator   run the Global Policy Generator
nodemanager             run a nodemanager on each worker
proxyserver             run the web app proxy server
registrydns             run the registry DNS server
resourcemanager         run the ResourceManager
router                  run the Router daemon
sharedcachemanager      run the SharedCacheManager daemon
timelineserver          run the timeline server

SUBCOMMAND may print help when invoked w/o parameters or with -h.

Other web page URLs
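Hadoop home page (the NameNode web UI usually listens on port 9870)
YARN ResourceManager web UI (port 8088)
DataNode information
YARN log URL
Hadoop JobHistory (mapreduce.jobhistory.webapp.address, set to bigdata01:19888 in this guide)
yarn.web-proxy.address (set to bigdata01:8089 in this guide)
NodeManager information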
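A quick way to see which of these pages respond is to probe them from the shell. The sketch below assumes plain HTTP, the ports named in this guide, and that curl is installed; ui_url and probe_ui are hypothetical helper names:

```shell
#!/usr/bin/env bash
# ui_url HOST PORT
# Prints http://HOST:PORT/ (plain HTTP is an assumption of this sketch).
ui_url() {
    printf 'http://%s:%s/\n' "$1" "$2"
}

# probe_ui HOST PORT
# Prints the URL followed by the HTTP status code that curl received
# (000 when no connection could be made within 5 seconds).
probe_ui() {
    local url
    url=$(ui_url "$1" "$2")
    printf '%s %s\n' "$url" "$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")"
}

# Usage sketch with the ports mentioned in this guide:
# for port in 9870 8088 19888 8089; do probe_ui bigdata01 "$port"; done
```

A 200 or a redirect code suggests the daemon behind that port is up; 000 usually means the service is not running or the port is blocked.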