
Stream Computing Project in Practice [Kafka + Flume + Spark Streaming + HBase]

1. Generate test data with Python

#!/usr/bin/env python
# coding: utf-8

# In[1]:


import random
import time


# In[2]:


url_paths=[
    "/class/112.html",
    "/class/132.html",
    "/class/146.html",
    "/class/177.html",
    "/class/212.html",
    "/class/342.html",
    "/class/202.html",
    "/class/562.html",
    "/class/862.html",
    "/course/111.html",
    "/course/332.html",
    "/learn/103.html",
    "/learn/992.html",
    "/error/172.html"
]

ip_slices=[12,152,153,198,214,123,26,45,99,45,36,72,99,203,204,129,238]

## Search-engine referrers (the page the visitor came from)

http_referers=[
    "http://www.baidu.com/s?wd={query}",
    "http://www.sogou.com/wb?wd={query}",
    "http://www.bing.com/search?wd={query}",
    "http://www.yahoo.com/q?wd={query}",
    "http://www.meituan.com/look?se={query}",
    "http://www.tencent.com/find?p={query}",
]

search_word=[
    "sparksql实战",
    "HIVE数据仓库",
    "python爬虫实战",
    "spark-streaming流计算",
    "KAFKA数据传输",
    "hadoop大数据基础",
    "scala语言"
]

status_codes=["200","-","423","-","-","123","200","-","105","-","-","184","-","-"]


# In[3]:


### Simulate a course URL: pick one path at random

def course_gennerate():
    return random.choice(url_paths)

### Simulate an IP address: join four randomly chosen octets with dots
def ip_gennerate():
    ip01=random.sample(ip_slices,4)
    return '.'.join(str(i) for i in ip01)


### Simulate a status code ("-" stands for a missing value)
def state_code():
    state=random.choice(status_codes)
    return state


### Simulate a search-engine referrer link with a random search term
def link_gennerate():
    link=random.choice(http_referers)
    refer=random.choice(search_word)
    return link.format(query=refer)


# In[4]:


state_code()


# In[5]:


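## Generate log lines containing the timestamp, course URL, IP address, referrer, and status code

def generate_log(count=20):
    # One timestamp is taken per call, so every line in a batch shares it
    time_str=time.strftime('%Y-%m-%d %H:%M:%S',time.localtime())
    ## Open the output file (overwritten on each run); the with-block closes it
    ## so that every line is flushed to disk
    with open("/home/hadoop/data/project_test/access.log","w") as f:
        while count>=1:
            query_log="{localtime}\t{url}\t{ip}\t{link}\t{state_code}".format(url=course_gennerate(),
                                                                 ip=ip_gennerate(),
                                                                 link=link_gennerate(),
                                                                 state_code=state_code(),
                                                                 localtime=time_str)
            print(query_log)
            f.write(query_log + "\n")
            count=count-1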



# In[6]:


generate_log()
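
Each line written by generate_log has five tab-separated fields, in the order given by the format string: timestamp, course URL, IP address, referrer link, status code. A minimal sketch of reading one line back, for example to sanity-check the file before it is handed to the rest of the pipeline, might look like the following. The names `LogRecord` and `parse_log_line` are hypothetical and not part of the original script, and the sample line is made up for illustration.

```python
from collections import namedtuple

# Hypothetical helper names; the field order mirrors the format string in generate_log.
LogRecord = namedtuple("LogRecord", ["localtime", "url", "ip", "link", "state_code"])

def parse_log_line(line):
    """Split one tab-separated access-log line into its five fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError("expected 5 tab-separated fields, got %d" % len(fields))
    return LogRecord(*fields)

# Made-up sample line in the same shape generate_log produces.
sample = "2020-01-01 12:00:00\t/class/112.html\t12.152.153.198\thttp://www.baidu.com/s?wd=scala\t200\n"
record = parse_log_line(sample)
print(record.url, record.ip, record.state_code)
```

Rejecting lines with the wrong field count early keeps malformed records from reaching downstream consumers, where they are harder to trace.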



2. Create the scheduled script and set up the schedule
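
The details of this step are not shown here. As a rough Python-only stand-in for a system scheduler, the sketch below calls a task at a fixed interval. `run_periodically` is a hypothetical helper, not part of the original project, and the interval and iteration count are assumptions chosen for the demonstration. In the project, the task would presumably be generate_log.

```python
import time

def run_periodically(task, interval_seconds, iterations):
    """Call task() `iterations` times, sleeping `interval_seconds` between calls."""
    results = []
    for i in range(iterations):
        results.append(task())
        # No sleep after the final call, so the function returns promptly.
        if i < iterations - 1:
            time.sleep(interval_seconds)
    return results

# Stand-in task for the demonstration; a short interval keeps the example fast.
calls = run_periodically(lambda: "tick", interval_seconds=0.01, iterations=3)
print(calls)
```

A long-running loop like this stops when the process exits, so a system-level scheduler is usually the more robust choice for a job that must keep producing logs.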
