flume + kafka + SparkStreaming
1. First, a quick demo using a familiar Linux technique: watch a file with tail -F <filename> while another window appends data to it.
tail -F qqq.txt
echo "abcdfs" >> qqq.txt
Simulating how a web server produces a log:
Streams normally write into a buffer first and only flush to disk once the buffer reaches a certain size,
so a Flume agent tailing the file would not see records arrive line by line.
By flushing the stream after every single line, we mimic a web server appending log entries one at a time.
1) SocketTest.java — a small class that reads one file and copies it line by line into another file (despite the name, no sockets are involved):
import java.io.*;

public class SocketTest {
    public static void main(String[] args) throws IOException, InterruptedException {
        File ctoFile = new File(args[0]); // source: an existing log file
        File dest = new File(args[1]);    // destination: the file Flume will tail

        BufferedReader bfReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(ctoFile)));
        PrintWriter pw = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(dest))));

        String txtline;
        while ((txtline = bfReader.readLine()) != null) {
            Thread.sleep(2000); // one line every 2 seconds, like live traffic
            pw.println(txtline);
            pw.flush();         // flush per line so tail -F / Flume sees it immediately
        }
        bfReader.close();
        pw.close();
    }
}
2) On the Linux box, create the folder and the target file SocketTest/data.log:
mkdir SocketTest
touch SocketTest/data.log
3) Compile to a .class file and upload it to the Linux box. The program takes two arguments: the first is the source data file, the second is the target file.
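A plain javac compile is all it needs (assuming a JDK is on the path):
$> javac SocketTest.java
Then run it: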
java SocketTest access.20120104.log SocketTest/data.log
4) Watch data.log grow; a new line should appear roughly every 2 seconds (the Thread.sleep interval):
$> tail -F SocketTest/data.log
【flume + kafka + SparkStreaming】
Step 1: 【Use Flume to monitor the log file (replacing tail -F data.log)】
flume-exec-logger.conf
agent.sources = r1
agent.sinks = k1
agent.channels = c1
#source type: exec, i.e. run a shell command and capture its output
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /home/hyxy/Desktop/SocketTest/data.log
agent.sources.r1.channels = c1
#a logger sink prints events to the console, useful for testing that the source picks up new lines
agent.sinks.k1.type = logger
agent.sinks.k1.channel = c1
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100
agent.channels.c1.transactionCapacity = 100
Start Flume:
$>flume-ng agent -n agent -c /home/hyxy/apps/flume/conf/ -f /home/hyxy/apps/flume/conf/flume-exec-logger.conf -Dflume.root.logger=INFO,console
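With the agent running, replay the log from another terminal (the same command as in step 3 above); every line appended to data.log should appear as an event on the Flume console:
$> java SocketTest access.20120104.log SocketTest/data.log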
Step 2: 【flume + kafka】
flumekafka.conf
agent.sources = r1
agent.sinks = k1
agent.channels = c1
#source type: exec, same as in step 1
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /home/hyxy/Desktop/SocketTest/data.log
agent.sources.r1.channels = c1
#Kafka sink as documented for Flume 1.6: http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#kafka-sink
agent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.k1.topic = test222
agent.sinks.k1.brokerList = localhost:9092
agent.sinks.k1.batchSize = 20
agent.sinks.k1.requiredAcks = 1
agent.sinks.k1.channel = c1
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100
agent.channels.c1.transactionCapacity = 100
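To confirm events actually land in Kafka, create the topic and attach a console consumer. The commands below are a sketch assuming a local single-node Kafka from the 0.8/0.9 line (the versions the Flume 1.6 sink targets) with ZooKeeper on localhost:2181; newer Kafka versions take --bootstrap-server instead of --zookeeper:
$> kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test222
$> kafka-console-consumer.sh --zookeeper localhost:2181 --topic test222 --from-beginning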