Kafka Explained
This article will teach you the basics of a fast-growing and reliable streaming platform that makes data processing and storage a breeze!
What is Kafka?
Kafka is a publish/subscribe (pub/sub) messaging system that provides data streaming capabilities while also taking advantage of distributed computing.
What is a pub/sub messaging system? A pub/sub messaging system contains two components that relay some form of data or information between each other. One component publishes data while the other component subscribes to the publisher to receive the published data.
Kafka follows this pattern with its own set of components and features.
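The publish/subscribe pattern described above can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's API: the `MessageBus` class, its method names, and the stream names are all made up for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class MessageBus:
    """Toy in-memory pub/sub bus (illustrative only, not a Kafka API)."""

    def __init__(self) -> None:
        # Maps a stream name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, stream: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers[stream].append(callback)

    def publish(self, stream: str, message: bytes) -> None:
        # The publisher only names a stream; it never references a subscriber.
        for callback in self._subscribers[stream]:
            callback(message)


received: List[bytes] = []
bus = MessageBus()
bus.subscribe("gps", received.append)

bus.publish("gps", b"lat=43.65,lon=-79.38")
bus.publish("metrics", b"cpu=0.42")  # nobody subscribed, so nothing is delivered

print(received)
```

The point of the sketch is the decoupling: the publisher and the subscriber only share a stream name. Note that this toy pushes each message to subscribers immediately and then forgets it, which is a simplification of the pattern in general.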
Producers
The first component in a pub/sub messaging system is the publisher, which Kafka calls a Producer. The producer is a data source that publishes, or produces, messages into Kafka. One of Kafka's great features is that it is independent of data type: Kafka does not care what kind of data is being produced, whether it's the GPS signal of a car, application metrics from front-end servers, or even images!
Consumers
The second component in a pub/sub messaging system is the subscriber, which Kafka calls a Consumer. The consumer can subscribe, or listen, to a data stream and consume messages from that stream with no relationship to, or knowledge of, the producers.
Consumers can subscribe to multiple streams of data regardless of the type of data being consumed. In other words, you can have a single application that takes in data from as many different sources as you’d like. Kafka makes it easy to access the data you need while leaving the processing steps entirely in your control.
High-level Architecture
Now that you know where messages come from (producers) and how messages can be retrieved (consumers), let’s discuss what happens in between.
The above image illustrates a simple Kafka flow with three producers and two consumers. Each producer must specify a destination for its messages, and each consumer must specify where it consumes from. This middle ground between producer and consumer, where Kafka messages are stored, is called a Topic.
Topics, Partitions, and Offsets
A topic can be thought of as a table in a database: producers write to it and consumers read from it. Each topic contains Partitions, which are essentially logs that append Kafka messages as they arrive. To identify messages, each partition uses an auto-incrementing integer called an Offset, which is unique within that partition.
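To make the relationship between topics, partitions, and offsets concrete, here is a minimal sketch that models a partition as an append-only list whose indexes play the role of offsets. The `Partition` class and its methods are hypothetical names chosen for this example and are not part of any Kafka client library.

```python
from typing import List


class Partition:
    """Toy append-only log; a message's offset is its position in the log."""

    def __init__(self) -> None:
        self._log: List[bytes] = []

    def append(self, message: bytes) -> int:
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def read(self, offset: int) -> List[bytes]:
        # Return every message from `offset` to the end of the log.
        return self._log[offset:]


# A topic modeled as a list of partitions; each keeps its own offset sequence.
topic = [Partition(), Partition()]

first = topic[0].append(b"a")
second = topic[0].append(b"b")
other = topic[1].append(b"c")

print(first, second, other)  # offsets restart at 0 in each partition
print(topic[0].read(1))      # read partition 0 starting at offset 1
```

Because each partition numbers its own messages, an offset identifies a message only together with its partition, which is why the same offset can appear in both partitions here.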
Offsets give consumers the flexibility to read messages when and from where they want. This is done by committing the offset. A commit from a consumer is like checking items off a list: once a message has been consumed, the commit tells Kafka to mark that offset as processed for that consumer.
As a consumer, you can read a partition from a specified offset or from the last committed offset. How can this be useful? Consider an application that receives data every 2 hours. Keeping the application running continuously while it waits for messages can be very expensive. By reading from the last committed offset, you could instead have the application go live every 8 hours, consume all new messages, and commit the offset of the latest message. This can significantly reduce costs and resource usage.
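The wake-up-and-catch-up scenario above can be sketched as follows. This is an in-memory illustration under stated assumptions, not real client code: the partition is a plain list, `committed` and `consume_new` are hypothetical names, and the stored commit is the position of the next message to read, which is one common way to track progress.

```python
from typing import Dict, List

log: List[str] = []             # one partition, modeled as a plain list
committed: Dict[str, int] = {}  # consumer name -> next offset to read (assumed convention)


def consume_new(consumer: str) -> List[str]:
    """Read everything after the last commit, then commit the new position."""
    start = committed.get(consumer, 0)
    batch = log[start:]
    committed[consumer] = start + len(batch)
    return batch


# Four messages arrive, one every 2 hours, while the application is offline.
log.extend(["reading@00:00", "reading@02:00", "reading@04:00", "reading@06:00"])
first_run = consume_new("report-job")   # the application wakes up after 8 hours

# One more message arrives before the next wake-up.
log.append("reading@08:00")
second_run = consume_new("report-job")  # only the message added since the last commit

print(first_run)
print(second_run)
```

The application holds no state of its own between runs; the committed position is what lets it pick up exactly where it left off.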
Brokers and Clusters
Now you know about producers, consumers, and how messages flow within Kafka, but one of the most important components remains: the Kafka Broker. The broker is what ties the whole system together; it is the Kafka server responsible for handling all communication involving producers, consumers, and even other brokers. Producers rely on the broker to correctly accept each incoming Kafka message and store it in the appropriate topic. Consumers rely on the broker to handle their fetch and commit requests while consuming from topics.
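As a rough illustration of the broker's role, the sketch below models a single broker that accepts produce, fetch, and commit requests. It is a deliberate simplification: `ToyBroker` and its methods are hypothetical, each topic has only one partition, and a real broker handles these requests over the network rather than through direct method calls.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class ToyBroker:
    """Illustrative stand-in for a broker (not a real Kafka interface)."""

    def __init__(self) -> None:
        # Assumption: one partition per topic, stored as a list.
        self._topics: DefaultDict[str, List[bytes]] = defaultdict(list)
        self._commits: Dict[Tuple[str, str], int] = {}

    def produce(self, topic: str, message: bytes) -> int:
        """Store a message in its topic and return the offset it received."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def fetch(self, topic: str, offset: int) -> List[bytes]:
        """Return the messages in `topic` starting at `offset`."""
        return self._topics[topic][offset:]

    def commit(self, consumer: str, topic: str, offset: int) -> None:
        """Record the next offset `consumer` should read from `topic`."""
        self._commits[(consumer, topic)] = offset

    def committed(self, consumer: str, topic: str) -> int:
        """Return the recorded position, or 0 if nothing was committed yet."""
        return self._commits.get((consumer, topic), 0)


broker = ToyBroker()

# Producer side: hand messages to the broker, naming only the topic.
broker.produce("metrics", b"cpu=0.42")
broker.produce("metrics", b"cpu=0.57")

# Consumer side: ask where to resume, fetch from there, then commit.
start = broker.committed("dashboard", "metrics")
messages = broker.fetch("metrics", start)
broker.commit("dashboard", "metrics", start + len(messages))

print(messages)
```

Producers and consumers in this sketch never talk to each other; every interaction goes through the broker, which owns both the stored messages and the committed positions.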
A group of brokers is called a Kafka Cluster. One of the biggest perks of using Kafka is its use of distributed computing. A distributed system shares its workload among many computers called nodes. These nodes work together and communicate to complete the work, rather than having all of it assigned to a single node. When multiple Kafka brokers and clusters deal with large amounts of data, distributed computing saves resources and increases overall performance, making Kafka a desirable choice for big data applications.
Summary
Kafka is a great tool for handling and processing data, especially in big data applications. It's a reliable platform that provides low latency and high throughput with its data-streaming capabilities, along with an ample set of helpful features and services to make your application better.
This has been a high-level overview of what Kafka is and how it works, but there is much more to Kafka that makes it the great tool that it is. I recommend reading Kafka: The Definitive Guide, which describes the structure of Kafka in great detail and gives easy-to-follow steps for using it.
Translated from: https://medium.com/@chintan.mistry76/a-quick-introduction-to-kafka-101eedf28485