Getting Started with Kafka


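1 Overview

1.1 Definition

Kafka is a distributed message queue (MQ) built on the publish/subscribe model. It is used mainly for real-time processing of big data.

1.2 Message queues

1.2.1 A traditional MQ use case: asynchronous processing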

[Figure: asynchronous processing with a message queue]

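1.2.2 Benefits of message queues

Decoupling: the processing on either side can be scaled or modified independently, as long as both sides honor the same interface contract.
Recoverability: when one component fails, the rest of the system is not affected, because the queue reduces the coupling between processes.
Buffering: absorbs the difference when messages are produced and consumed at different speeds.
Flexibility and peak handling: a message queue lets critical components withstand sudden bursts of traffic instead of collapsing under the overload.
Asynchronous communication: many messages do not need to be handled immediately; they can be put on a queue and processed once resources are available.

1.2.3 Two messaging models

Point-to-point: one-to-one. The consumer actively pulls data, and a message is removed from the queue once it has been received.
Publish/subscribe: one-to-many. A message is not removed after a consumer has consumed it, so several consumers can each read it.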

1.3 Kafka architecture

[Figure: Kafka architecture diagram]

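Notes

1. Within a consumer group, each partition of a topic can be consumed by only one consumer. For example, if consumerA and consumerB belong to the same group and A is consuming topicA-partition0, then B cannot consume partition0; B can only consume partition1.

2. If a consumer group contains only one consumer, say A, then A has to consume both partition0 and partition1.

3. Consumption is most efficient when the number of consumers in a group equals the number of partitions. If the group has more consumers than there are partitions, the extra consumers sit idle and waste resources.

4. Kafka depends on ZooKeeper (zk), which stores some of its metadata. In older versions, consumers also store some data in zk, such as the offset that records how far they have consumed; the offset is cached first and then written to zk.

Important: before version 0.9, offsets were stored in zk. From 0.9 onward they are stored in Kafka itself, in the internal topic __consumer_offsets.

The reason is decoupling: Kafka did not want to depend too heavily on zk.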
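
The partition rules in notes 1 to 3 can be illustrated with a small shell sketch. It uses a simplified round-robin model and is not Kafka's real assignment algorithm; the function name assign_partitions and the consumer-N labels are made up for this illustration.

```shell
#!/bin/sh
# Simplified model: partition p of a topic goes to consumer (p mod N) of one group.
# This is NOT Kafka's actual assignor; it only illustrates the rules above.
assign_partitions() {
  ap_partitions=$1   # number of partitions in the topic
  ap_consumers=$2    # number of consumers in the group
  ap_p=0
  while [ "$ap_p" -lt "$ap_partitions" ]; do
    echo "partition-$ap_p -> consumer-$((ap_p % ap_consumers))"
    ap_p=$((ap_p + 1))
  done
}

echo "2 partitions, 1 consumer (one consumer reads everything):"
assign_partitions 2 1
echo "2 partitions, 2 consumers (one partition each):"
assign_partitions 2 2
echo "2 partitions, 3 consumers (consumer-2 receives nothing):"
assign_partitions 2 3
```

In the last case consumer-2 never appears in the output, which is the idle consumer described in note 3.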

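1.4 Related reading

The relationship between consumers and partitions:

https://blog.csdn.net/mxw2552261/article/details/101441652

An explanation of the Kafka concepts listed below, and a detailed look at the replica mechanism: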

https://mp.weixin.qq.com/s?__biz=Mzg5NjMxMTYxNQ==&mid=2247486146&idx=1&sn=71d0c1e58df2243f2bc582b5fded7ce6&source=41#wechat_redirect

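1.5 Kafka concepts

Producer: the message producer, a client that sends messages to a Kafka broker.
Consumer: the message consumer, a client that fetches messages from a Kafka broker.
Consumer Group (CG): a group made up of several consumers. Each consumer in the group consumes different partitions, and a partition can be consumed by only one consumer within the group. Consumer groups do not affect one another. Every consumer belongs to some consumer group, so a consumer group is logically a single subscriber.
Broker: one Kafka server is one broker. A cluster consists of several brokers, and one broker can hold several topics.
Topic: can be thought of as a queue. Producers and consumers both work against a topic.
Partition: for scalability, a very large topic can be spread across several brokers (that is, servers). A topic can be split into several partitions, and each partition is an ordered queue.
Replica: to make sure that partition data is not lost when a node in the cluster fails, and that Kafka keeps working, Kafka provides a replica mechanism. Every partition of a topic has several replicas: one leader and a number of followers.
Leader: the "primary" among the replicas of a partition. Producers send data to the leader, and consumers read data from the leader.
Follower: a "secondary" among the replicas of a partition. It continuously syncs data from the leader to stay up to date. When the leader fails, one of the followers becomes the new leader.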

2 ZooKeeper Cluster Setup

2.1 Download ZooKeeper

https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/

2.2 Install the JDK on Linux

https://www.oracle.com/java/technologies/javase/javase8u211-later-archive-downloads.html

2.2.1 Download the .tar.gz package and upload it to /usr/local/software on the server


# Extract the archive
tar -zxvf jdk-8u311-linux-x64.tar.gz

# Configure environment variables
vi /etc/profile

# Append to the end of the file (set JAVA_HOME to the directory where you extracted the JDK)
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_311
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}

# Apply the changes
source /etc/profile

# Verify the installation
javac -version
java -version
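
Since a missing or wrong JAVA_HOME also causes trouble later when starting ZooKeeper, a quick check can help. The following is a minimal sketch; check_java_home is a hypothetical helper written for this article, not part of the JDK or ZooKeeper.

```shell
#!/bin/sh
# check_java_home is a hypothetical helper, not part of the JDK or ZooKeeper.
# It reports whether JAVA_HOME points at a directory that contains bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "${JAVA_HOME}/bin/java" ]; then
    echo "no executable java under ${JAVA_HOME}/bin" >&2
    return 1
  fi
  echo "JAVA_HOME looks valid: ${JAVA_HOME}"
}

# Usage (after `source /etc/profile`):
#   check_java_home && java -version
```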

2.3 Extract ZooKeeper and edit the configuration file
# Copy zoo_sample.cfg and name the copy zoo.cfg
cp zoo_sample.cfg zoo.cfg






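server.A=B:C:D
    A: a number that identifies the server;
    B: the server's IP address;
    C: the port the ZooKeeper servers use to communicate with each other;
    D: the port used for leader election.

2.4 Create the myid file

Create a file named myid in the data directory configured above (the dataDir setting in zoo.cfg).

Each server's myid contains the number from its own server.N entry in the cluster configuration:
server.0=192.168.146.200:2888:3888
server.1=192.168.146.201:2888:3888
server.2=192.168.146.202:2888:3888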
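
Creating the myid file can be sketched as follows. write_myid is a hypothetical helper, and the data directory path in the usage comment is an assumption; use whatever dataDir is set to in your zoo.cfg.

```shell
#!/bin/sh
# write_myid is a hypothetical helper for illustration.
# $1: the data directory (must match dataDir in zoo.cfg)
# $2: the number N from this host's server.N line
write_myid() {
  wm_data_dir=$1
  wm_server_id=$2
  mkdir -p "$wm_data_dir"
  # myid contains nothing but the server number
  echo "$wm_server_id" > "$wm_data_dir/myid"
}

# Usage on 192.168.146.201 (server.1 above), with an assumed dataDir:
#   write_myid /usr/local/zookeeper/data 1
```

Run it once on every node, each time with that node's own number.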

2.5 Problems encountered during setup

2.5.1 Wrong cluster IP address

Setting the server address in the configuration above to 0.0.0.0 caused an error.

Solution: https://stackoverflow.com/questions/30940981/zookeeper-error-cannot-open-channel-to-x-at-election-address

2.5.2 Firewall problems

In the security group of each server, open ports 2181, 2888 and 3888.

Check the firewalld service status
systemctl status firewalld
Start
systemctl start firewalld
Enable at boot
systemctl enable firewalld
Stop
systemctl stop firewalld
List the firewall rules
firewall-cmd --list-all
Check whether a port is open
firewall-cmd --query-port=8080/tcp
Open port 80
firewall-cmd --permanent --add-port=80/tcp
Remove a port
firewall-cmd --permanent --remove-port=8080/tcp
Reload the firewall (required after changing the configuration)
firewall-cmd --reload
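
Applied to the ZooKeeper ports used in this article, the firewall-cmd calls above look like the output of the sketch below. print_open_port_cmds is a hypothetical helper that only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# print_open_port_cmds is a hypothetical helper for illustration.
# It prints, but does not run, the firewall-cmd calls for a list of TCP ports.
print_open_port_cmds() {
  for pc_port in "$@"; do
    echo "firewall-cmd --permanent --add-port=${pc_port}/tcp"
  done
  # rules added with --permanent take effect only after a reload
  echo "firewall-cmd --reload"
}

# ZooKeeper client port, server-to-server port and election port
print_open_port_cmds 2181 2888 3888
```

Once the printed commands look right, they can be executed by piping the output into sh as root.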

2.6 Start ZooKeeper
[root@shanghai2-zone3-01 bin]# ./zkServer.sh start

# If startup fails with an error saying JAVA_HOME cannot be found, edit zkEnv.sh and set JAVA_HOME there
vi zkEnv.sh

# Stop
[root@shanghai2-zone3-01 bin]# ./zkServer.sh stop

# Restart (untested)
[root@shanghai2-zone3-01 bin]# ./zkServer.sh restart

# Check the status of the zk cluster
[root@shanghai2-zone3-01 bin]# ./zkServer.sh status

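3 Getting Started with Kafka

3.1 Deployment

3.1.1 Download the Kafka package

   kafka.apache.org

3.1.2 Cluster deployment

Extract the package
[root@ebs-60287 software]# tar -zxvf kafka_2.11-0.11.0.0.tgz -C /opt/module/
Rename the extracted directory
[root@ebs-60287 module]# mv kafka_2.11-0.11.0.0/ kafka
Create a logs directory under /opt/module/kafka
[root@ebs-60287 kafka]# mkdir logs
Edit the configuration file (only server.properties needs to change)
-- Set a unique ID for each broker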

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

-- Whether topics may be deleted; the default is false
# Switch to enable topic deletion or not, default value is false
#delete.topic.enable=true

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
- Log settings
############################# Log Basics #############################

-- Directory for log files (both message data and server logs are stored here)
# A comma separated list of directories under which to store log files
log.dirs=/opt/module/kafka/logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1


############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

-- How long log files (that is, message data) are retained; both logs and data are stored as .log files
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

- ZooKeeper settings
############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
- Configure environment variables

3.1.3 Common commands

3.1.3.1 Start

# Start in the foreground (blocks the terminal)
bin/kafka-server-start.sh config/server.properties
# Start in the background
bin/kafka-server-start.sh -daemon config/server.properties

3.1.3.2 Stop

# Stop
bin/kafka-server-stop.sh

3.1.3.3 Create a topic

# Create a topic
bin/kafka-topics.sh --create --zookeeper 192.168.1.200:2181 --topic test --partitions 1 --replication-factor 1
Parameters:
--topic               topic name
--replication-factor  number of replicas
--partitions          number of partitions

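3.1.3.4 Delete a topic

# Delete a topic (requires delete.topic.enable=true in server.properties; otherwise the topic is only marked for deletion)
bin/kafka-topics.sh --zookeeper 192.168.1.200:2181 --delete --topic first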

3.1.3.5 List topics and show details

# List the topics that have been created (the IP must match the one in config/server.properties)

bin/kafka-topics.sh --list --zookeeper 192.168.1.200:2181

# Show the details of one topic

bin/kafka-topics.sh --zookeeper 192.168.1.200:2181 --describe --topic first

3.1.3.6 Start a console producer

# Start a producer and send messages (the IP must match the one in config/server.properties)

bin/kafka-console-producer.sh --broker-list 192.168.1.200:9092 --topic test

3.1.3.7 Start a console consumer

# Start a consumer and read messages (the IP must match the one in config/server.properties)

bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.200:9092 --topic test --from-beginning

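3.1.3.8 Change the number of partitions

# Change the number of partitions (the count can only be increased, never reduced)
bin/kafka-topics.sh --zookeeper 192.168.1.200:2181 --alter --topic first --partitions 6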
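
The topic commands in this section use --zookeeper, which matches the Kafka 0.11 release installed above. Newer releases of kafka-topics.sh accept --bootstrap-server instead (from 2.2 onward), and 3.0 removed --zookeeper from this tool. The sketch below prints the same command in both styles; topics_cmd is a hypothetical helper, and the ports 2181 and 9092 are the ones assumed throughout this article.

```shell
#!/bin/sh
# topics_cmd is a hypothetical helper; it prints a command instead of running it.
#   style "zk":     --zookeeper HOST:2181         (as used in this article, Kafka 0.11)
#   style "broker": --bootstrap-server HOST:9092  (kafka-topics.sh in Kafka 2.2 and later)
topics_cmd() {
  tc_style=$1
  tc_host=$2
  shift 2
  case "$tc_style" in
    zk)     echo "bin/kafka-topics.sh --zookeeper ${tc_host}:2181 $*" ;;
    broker) echo "bin/kafka-topics.sh --bootstrap-server ${tc_host}:9092 $*" ;;
    *)      echo "unknown style: $tc_style" >&2; return 1 ;;
  esac
}

topics_cmd zk     192.168.1.200 --alter --topic first --partitions 6
topics_cmd broker 192.168.1.200 --alter --topic first --partitions 6
```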

3.1.3.9 Cluster start/stop script

[Screenshot: cluster start/stop script]

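3.1.3.10 Kafka errors

On a virtual machine of this kind, the external IP (the exposed address) and the real IP (the address shown by ifconfig) may only be linked by a mapping: when a user accesses the external IP, OpenStack forwards the traffic to the real IP.
In that setup, Kafka fails to start if listeners=PLAINTEXT://xx.xx.xx.xx:9092 in server.properties uses the external IP, because the socket cannot bind to and listen on that address.
The fix is simple: set the IP in listeners to the real IP (the one shown by ifconfig), for example listeners=PLAINTEXT://10.20.30.153:9092. Everywhere else, keep using the external IP; the real IP no longer matters there.
Example: the server's external IP is 123.44.55.66 and its internal address is 10.20.30.153.

listeners=PLAINTEXT://10.20.30.153:9092
advertised.listeners=PLAINTEXT://123.44.55.66:9092

Run ip a to find the addresses:
IP of the eth0 interface -> listeners
external IP of the server -> advertised.listeners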

3.1.4 Local test

3.1.4.1 Create a topic

The topic yangdezhi was created successfully. In cluster mode, the other two nodes can see this topic as well.
