Start HDFS on the cluster with start-dfs.sh. The Java processes then running on the node (as listed by jps) are:
22038 NameNode
22134 DataNode
22314 JournalNode
The zkfc process (DFSZKFailoverController) is missing. To troubleshoot, check the log: cat /opt/sxt/Hadoop-2.6.5/logs/hadoop-root-zkfc-Node2.log
2021-08-09 14:09:14,036 FATAL org.apache.hadoop.ha.ZKFailoverController:
Unable to start failover controller. Parent znode does not exist.
Run with -formatZK flag to initialize ZooKeeper.
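The missing daemon above was spotted by reading the process list by eye. A minimal sketch of the same check as a shell function follows; the function name missing_daemons and the list of expected daemons are assumptions taken from the processes named in this note, not part of Hadoop.

```shell
# Print each expected daemon that does not appear in jps output read from stdin.
# The expected list is an assumption based on the processes named in this note.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode JournalNode DFSZKFailoverController; do
        # jps prints "<pid> <MainClass>"; compare the class name in column 2 exactly.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage on a live node: jps | missing_daemons
```

An empty result means all four daemons are present on that node; adjust the list for nodes that are not expected to run every role.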
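Fix: stop the cluster, re-initialize zkfc on the affected node, then restart ZooKeeper and HDFS. Note that hdfs zkfc -formatZK has to reach the ZooKeeper ensemble in order to create the parent znode.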
hdfs zkfc -formatZK
start-zk.sh
start-hdfs.sh
22476 DFSZKFailoverController
start-yarn.sh
./bin/yarn rmadmin -getServiceState rm1
./bin/yarn rmadmin -getServiceState rm2
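Both ResourceManager nodes report active and neither is standby, which indicates a problem with the HA configuration.
Since the YARN web UI is still reachable, the ZooKeeper configuration is the likely cause; check the zoo.cfg file.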
dataDir=/opt/sxt/zookeeper/zkData
server.1=Node1:2888:3388
server.2=Node2:2888:3388
server.3=Node3:2888:3388
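Pay attention to these two settings (dataDir and the server.N entries). In addition, the dataDir directory on Node1, Node2 and Node3 must contain a file named myid whose content is 1, 2 and 3 respectively, matching the N in that node's server.N entry.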
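To keep myid consistent with zoo.cfg, the id can be derived from the server.N lines instead of being typed by hand. The sketch below is one way to do that; the function name myid_for_host and the variable ZOO_CFG are hypothetical, and it assumes short hostnames without dots, as in the file above.

```shell
# Print the id N from the "server.N=<host>:<port>:<port>" line that names <host>.
# Assumes hostnames contain no dots (true for Node1..Node3 in this note).
myid_for_host() {
    cfg=$1
    host=$2
    # Splitting on '.', '=' and ':' turns "server.2=Node2:2888:3388" into
    # fields: server, 2, Node2, 2888, 3388.
    awk -F'[.=:]' -v h="$host" '$1 == "server" && $3 == h { print $2 }' "$cfg"
}

# Usage on Node2 (ZOO_CFG is a placeholder for the path to zoo.cfg):
#   myid_for_host "$ZOO_CFG" Node2 > /opt/sxt/zookeeper/zkData/myid
```

If the function prints nothing, the host has no server.N entry in the file, which is itself worth fixing before restarting ZooKeeper.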
Then restart ZooKeeper and YARN, and check again:
./bin/yarn rmadmin -getServiceState rm1
./bin/yarn rmadmin -getServiceState rm2
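The two ResourceManagers now report active and standby respectively. The YARN web UI (port 8088) and the HDFS web UI (port 50070) are both reachable and work normally.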
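The active/standby comparison can also be scripted so that the both-active state seen earlier is flagged automatically. This is only a sketch: the function name ha_verdict and its three result strings are made up for illustration, and it assumes that yarn rmadmin -getServiceState prints either active or standby.

```shell
# Classify the pair of states reported for rm1 and rm2.
ha_verdict() {
    case "$1,$2" in
        active,standby|standby,active) echo "ok" ;;       # healthy HA pair
        active,active) echo "both-active" ;;              # the symptom described in this note
        *) echo "unexpected" ;;                           # both standby, empty, or an error
    esac
}

# Usage from the Hadoop installation directory:
#   ha_verdict "$(./bin/yarn rmadmin -getServiceState rm1)" \
#              "$(./bin/yarn rmadmin -getServiceState rm2)"
```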