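I. Preface
The problem this time: after starting the Hadoop cluster, jps showed that the ResourceManager process had not started, so http://localhost:8088 was unreachable. The ResourceManager startup log reported the error "Embedded automatic failover is enabled, but yarn.resourcemanager.zk-address is not set". My initial diagnosis was that yarn-site.xml was missing the ZooKeeper address (host and port), along with the settings that enable automatic recovery and automatic failover.
The error reported in the ResourceManager log: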
2023-04-14 03:56:09,668 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Embedded automatic failover is enabled, but yarn.resourcemanager.zk-address is not set
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Embedded automatic failover is enabled, but yarn.resourcemanager.zk-address is not set
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:70)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:142)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:267)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1185)
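When a log is long, it can help to pull out just the "Service ... failed in state ...; cause: ..." lines like the one above. The following is a minimal Python sketch under that assumption. The function name `find_service_failures` is made up for illustration, and the regular expression is based only on the line format shown in this log, so other Hadoop versions may need adjustments.

```python
import re

# Pattern derived from the failure line in the log above (an assumption,
# not an official Hadoop log format specification).
FAILURE_RE = re.compile(
    r"Service (?P<service>\S+) failed in state (?P<state>\w+); cause: (?P<cause>.*)"
)


def find_service_failures(log_text):
    """Return (service, state, cause) tuples for every service-failure line."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            failures.append(
                (match["service"], match["state"], match["cause"].strip())
            )
    return failures
```

Reading the ResourceManager log file into a string and passing it to `find_service_failures` gives a short list of causes to start the analysis from.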
II. Solution
Edit yarn-site.xml and add the following:
<!-- Host:Port list of the ZooKeeper cluster servers -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>spark01:2181,spark02:2181,spark03:2181</value>
</property>
<!-- Enable automatic recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
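Before restarting, the configuration can be checked for the combination that triggers this error. The sketch below is a rough Python check, not a definitive validator. It assumes yarn-site.xml uses the usual `<configuration>` root with `<property>` children, and that `yarn.resourcemanager.ha.enabled` defaults to false while automatic failover defaults to enabled once HA is on; please confirm these defaults against the yarn-default.xml of your own version. The helper names `load_properties` and `missing_zk_address` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_zk_address(props):
    """True when HA with automatic failover is on but no ZooKeeper address is set."""
    # Assumed default: HA is disabled unless explicitly enabled.
    ha_enabled = props.get("yarn.resourcemanager.ha.enabled", "false").lower() == "true"
    # Assumed default: automatic failover is on whenever HA is enabled.
    auto_failover = (
        props.get("yarn.resourcemanager.ha.automatic-failover.enabled", "true").lower()
        == "true"
    )
    zk_address = props.get("yarn.resourcemanager.zk-address", "")
    return ha_enabled and auto_failover and not zk_address
```

For example, `missing_zk_address(load_properties(open("yarn-site.xml").read()))` returning True would suggest the configuration matches the situation described in the log.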
After restarting the Hadoop cluster, jps shows that the ResourceManager process has started successfully.
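The jps check can also be scripted. The following Python sketch only parses text in the usual "PID ProcessName" layout that jps prints. The function name `running_java_processes` is made up for illustration, and the PIDs in the usage comment are invented sample values.

```python
def running_java_processes(jps_output):
    """Return the set of process names found in 'PID Name' lines of jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        # Lines with only a PID (no name) are skipped.
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


# Usage sketch (PIDs are made-up sample values):
#   "ResourceManager" in running_java_processes("2101 NameNode\n2345 ResourceManager\n")
```

On a real node the input could come from `subprocess.run(["jps"], capture_output=True, text=True).stdout`, assuming jps is on the PATH.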
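III. Conclusion
I find this problem rather baffling. When I previously used hadoop-2.10.1 I had not configured these properties, and the cluster still started normally. The Hadoop version that hit the problem this time is 2.7.4, where the ResourceManager fails to start unless they are specified explicitly. For now I can only attribute this to a difference between versions. This article is for reference only: for any specific problem, check the logs first and analyze from there, and do not copy this fix blindly.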