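Our business recently needed to collect logs from remote servers. To avoid widening the technology stack, I implemented the collector in PHP.
A few small points came up along the way that are worth noting: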
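1. Pull, not push. With many servers, an architecture such as Flume would require installing software on every machine, which adds operations cost. So the collector pulls the logs itself, and nothing has to be installed on the producers (the servers).
2. SSH access. Every server is configured to allow SSH login, so PHP's ssh2 extension is enough to connect remotely and read files on the server.
3. Uniform log layout. Log files follow the same directory convention on every server, which keeps the program logic simple.
4. CLI mode. The collector is a long-running program and runs under the PHP CLI. Note that the CLI may load a different php.ini from the web server, so check which file is actually in effect (php --ini).
5. SSH failures. Network problems occasionally make the SSH connection or authentication fail; retrying after a delay is enough.
6. Log truncation and compression. Our ops team truncates and compresses the logs at a fixed time every day, so there are two kinds of file to read, compressed and uncompressed, and each needs its own handling.
7. Timestamps in the log. Second-resolution timestamps cannot tell requests apart, so we added $msec to the log format for millisecond resolution. Entries with the same millisecond, the same source IP and the same UA can be treated as one request.
8. Reading directories. opendir and readdir work on remote directories through the SFTP stream wrapper, for example opendir("ssh2.sftp://......"). After filtering out unwanted files, sort the rest by date (the file names embed the date, so sorting by name works) and process them one by one.
9. Reading compressed files. file_get_contents leaves the console unresponsive for a long time, so I read step by step with fopen and fread. Each fread call returned at most 8 KB here, so requesting more does not help. A progress line is printed after every fixed number of reads.
10. Caching compressed files. Once a file has been read successfully it is saved to a cache directory, both as a backup and for later reuse. If the program fails or is restarted, it checks the cache directory first and skips the network transfer when a cached copy exists.
11. Decompression. gzdecode does the job, but it holds the whole decompressed file in memory, so PHP's memory use spikes and memory_limit in php.ini has to be raised. The code below reads the cached file line by line with gzopen and gzgets instead, which avoids holding the decompressed content all at once.
12. Recording finished compressed files. After a compressed file has been processed it is recorded in the database, so later runs do not process it again.
13. Uncompressed logs. An uncompressed log is still growing, so it is not cached. The database stores the current file pointer (via ftell and fseek), the file size, and the file's date.
14. Detecting a replaced log. If the file's date differs from the recorded one, or the file is smaller than the recorded size, the file has been replaced and the pointer must be reset. Otherwise fseek goes straight to the recorded position and processing continues from where it stopped last time.
15. Splitting log lines. A regular expression that splits on spaces and delimiters is enough; the third-party logParser library also works. To save memory, return lines one at a time from a generator (yield) instead of building an array.
16. Deduplication. Before processing, load each server's last stored log entry (millisecond timestamp, IP and UA) so that rows already imported can be skipped.
17. Storing logs. I store the logs in MySQL. One INSERT per log line wastes a great deal of running time, so rows are accumulated and inserted 4000 at a time.
18. Error handling. Besides SSH failures, the reader can pick up a half-written line, which makes parsing fail. That case also throws an exception; the main program catches it and simply runs again.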
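Items 13, 14 and 18 can be sketched together in a few lines. This is a minimal sketch under stated assumptions, not the program's actual code: resumeOffset and readCompleteLines are hypothetical names, the bookkeeping row is assumed to be an array with offset and size keys, the date check from item 14 is left out, and a local file stands in for the ssh2.sftp:// stream. It also swaps in a different technique for the half-line problem: rather than throwing and re-running, it stops before an unfinished last line and leaves it for the next run.

```php
<?php
/**
 * Decide where to resume reading a growing log file.
 * $record is a hypothetical bookkeeping row: ['offset' => int, 'size' => int].
 */
function resumeOffset(array $record, int $currentSize): int
{
    // A file smaller than last time was truncated or replaced: start over.
    if ($currentSize < $record['size']) {
        return 0;
    }
    return $record['offset'];
}

/**
 * Yield complete lines starting at $offset.
 * The key of each yielded item is the file offset just after that line,
 * so the caller can persist it once the line has been handled.
 */
function readCompleteLines(string $path, int $offset): Generator
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        fseek($f, $offset);
        while (($line = fgets($f)) !== false) {
            // No trailing newline: the writer has not finished this line yet.
            if (substr($line, -1) !== "\n") {
                break;
            }
            yield ftell($f) => rtrim($line, "\r\n");
        }
    } finally {
        fclose($f);
    }
}
```

A caller would loop with foreach (readCompleteLines($path, $offset) as $next => $line) and store $next together with each inserted batch, so a restart continues right after the last complete line.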
The source code follows:
<?php
/**
* Created by IcePHP Framework.
* User: 蓝冰大侠
* Date: 2018/4/11
* Time: 15:09
*/
class MLogImport
{
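/**
* Name of the site currently being processed
* @var string
*/
private $feed;
/**
* Host address of the site currently being processed
* @var string
*/
private $host;
/**
* Login name / site name of the current site
* @var string
*/
private $user;
/**
* Login password of the current site
* @var string
*/
private $pass;
/**
* Directory holding the current site's logs
* @var string
*/
private $logPath;
// Local directory for copies of server logs; only compressed logs are backed up
const CACHE_PATH = DIR_ROOT . 'run/serverLog/';
/**
* SFTP resource for the current site
* @var resource
*/
private $sftp;
/**
* Name of the file currently being processed
* @var string
*/
private $file;
/**
* Last log row stored for this site
* @var array
*/
private $lastRow;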
private $agentPatterns;
private $agentReplaces;
public function __construct()
{
$maps = [
'/AppleWebKit\/[\d\.]*/i' => 'AppleWebKit/...',
'/Mobile\/[\d\w]*/i' => 'Mobile/...',
'/Safari\/[\d\.]*/i' => 'Safari/...',
'/CriOS\/[\d\.]*/i' => 'CriOS/...',
'/GSA\/[\d\.]*/i' => 'GSA/...',
'/Version\/[\d\.]*/i' => 'Version/...',
'/Chrome\/[\d\.]*/i' => 'Chrome/...',
'/Edge\/[\d\.]*/i' => 'Edge/...',
'/Firefox\/[\d\.]*/i' => 'Firefox/...',
'/SamsungBrowser\/[\d\.]*/i' => 'SamsungBrowser/...',
'/build\/[\d\w\-\.]*/i' => 'build/...',
'/Silk\/[\d\.]*/i' => 'Silk/...',
'/Crosswalk\/[\d\.]*/i' => 'Crosswalk/...',
'/Gecko\/[\d\.]*/i' => 'Gecko/...',
'/NTENTBrowser\/[\.\d]*/i' => 'NTENTBrowser/...',
'/Snapchat\/[\d\w\-\.]*/i' => 'Snapchat/...',
'/Java\/[\d\.]*/i' => 'Java/...',
'/UCBrowser\/[\d\.]*/i' => 'UCBrowser',
'/\(Linux[^\)]*SAMSUNG[^\)]*\)/i' => 'SAMSUNG...',
'/\([^\)]*IPAD[^\)]*\)/i' => 'IPAD...',
'/\([^\)]*SM-[^\)]*\)/i' => 'SM...',
'/LG\-[\w\d]*/i' => 'LG...',
'/LGL\d[\w\d]*/i' => 'LGL...',
'/itel it\d*/i' => 'itel...',
'/XT\d*/i' => 'XT...',
'/TECNO\-[\w\d]*/i' => 'TECNO...',
'/RCT[\d\w]*/i' => 'RCT...',
'/Micromax\s[\w\d]*/i' => 'Micromax...',
'/LGMS[\d]*/i' => 'LGMS...',
'/GT\-[\w\d]*/i' => 'GT...',
'/HUAWEI\s[\-\w\d]*/i' => 'HUAWEI...',
'/Lenovo\s[\-\w\d]*/i' => 'Lenovo...',
'/SCH\-[\w\d]*/i' => 'SCH...',
'/rv\:[\d\.]*/i' => 'rv:...',
'/Lumia\s\d+/i' => 'Lumia...',
'/Instagram\s[\d\.]*/i' => 'Instagram...',
'/iPhone OS 5[_\d]*/i' => 'iOS 5...',
'/iPhone OS 6[_\d]*/i' => 'iOS 6...',
'/iPhone OS 7[_\d]*/i' => 'iOS 7...',
'/iPhone OS 8[_\d]*/i' => 'iOS 8...',
'/iPhone OS 9[_\d]*/i' => 'iOS 9...',
'/iPhone OS 10[_\d]*/i' => 'iOS 10...',
'/iPhone OS 11[_\d]*/i' => 'iOS 11...',
'/iOS 5[_\d]*/i' => 'iOS 5...',
'/iOS 6[_\d]*/i' => 'iOS 6...',
'/iOS 7[_\d]*/i' => 'iOS 7...',
'/iOS 8[_\d]*/i' => 'iOS 8...',
'/iOS 9[_\d]*/i' => 'iOS 9...',
'/iOS 10[_\d]*/i' => 'iOS 10...',
'/iOS 11[_\d]*/i' => 'iOS 11...',
'/Android 2[\.\d]*/i' => 'Android 2...',
'/Android 3[\.\d]*/i' => 'Android 3...',
'/Android 4[\.\d]*/i' => 'Android 4...',
'/Android 5[\.\d]*/i' => 'Android 5...',
'/Android 6[\.\d]*/i' => 'Android 6...',
'/Android 7[\.\d]*/i' => 'Android 7...',
'/Android 8[\.\d]*/i' => 'Android 8...',
'/QuantcastSDK[^\s]*(\s\(\d+\))?/i' => 'QuantcastSDK...',
];
$this->agentPatterns = array_keys($maps);
$this->agentReplaces = array_values($maps);
}
/**
* Record a site's host, account, password and log path, then import its logs
* @param string $host host address
* @param string $user site name / login name
* @param string $pass login password
* @param string $logPath path of the log directory
* @throws Exception when SFTP cannot be initialised or the directory cannot be opened
*/
public function site(string $host, string $user, string $pass, string $logPath): void
{
$this->feed = $user;
$this->host = $host;
$this->user = $user;
$this->pass = $pass;
$this->logPath = $logPath;
// Delay before reconnecting, in seconds
$interval = 1;
$connect = null;
while (true) {
// Connect to the host; ssh2_connect returns false on failure
$connect = ssh2_connect($this->host, 22);
// Stop retrying once the connection and the password authentication both succeed
if (false !== $connect and false !== ssh2_auth_password($connect, $this->user, $this->pass)) {
break;
}
// Double the delay each time: 2, 4, 8, ... seconds
$interval *= 2;
echo "connect or auth failed at $this->host, retry after $interval seconds\r\n";
// Wait, then reconnect
sleep($interval);
}
// Logged in
echo "\r\nlogin $this->feed\r\n";
// Initialise SFTP, then read the file list
$this->sftp = ssh2_sftp($connect);
if(!$this->sftp){
throw new Exception('ssh2_sftp fail.');
}
$handle = opendir("ssh2.sftp://{$this->sftp}{$this->logPath}"); //ssh2.sftp://Resource #33/home/.....
if (!$handle) {
throw new Exception('open dir ssh2.sftp fail.');
}
$zippedFiles = [];
$unzippedFile = '';
while (false !== ($file = readdir($handle))) {
$filePath = "ssh2.sftp://{$this->sftp}{$this->logPath}/$file";
// Regular files only; skip directories
if (!is_file($filePath)) continue;
// Access logs only
if (left($file, 10) !== 'access.log') continue;
// Compressed file
if (substr($file, -3) === '.gz') {
// Skip files dated before 2018-04-05 (the log format changed that day)
if (substr($file, 11, 8) < '20180405') continue;
$zippedFiles[] = $file;
} else {
$unzippedFile = $file;
}
}
closedir($handle);
// Last row already stored for this site
$this->lastRow = table('log')->row('*', ['feedName' => $this->feed], 'id desc')->toArray();
// Sort by file name; the names embed the date, so this is chronological order
asort($zippedFiles);
// Process the compressed files one by one
foreach ($zippedFiles as $file) {
$this->file = $file;
$this->zipped();
}
// Then process the uncompressed log, if there is one
if ($unzippedFile) {
$this->file = $unzippedFile;
$this->unzipped();
}
}
/**
* Read a remote file line by line
* @param $indicator string stream URL of the remote file
* @param $size int file size in bytes
* @return Iterator generator that yields one line at a time
*/
private function readUnzipped(string $indicator, int $size): Iterator
{
echo "Begin read File:$this->file:" . STool::kmgt($size) . "\r\n";
// Open the file
$f = fopen($indicator, 'r');
if (!$f) {
return;
}
// Jump to the position reached on the previous run
if ($this->offset) {
fseek($f, $this->offset);
echo "Seek to $this->offset\r\n";
}
// Number of lines read
$lines = 0;
// Read line by line; fgets returns false at end of file
while (false !== ($line = fgets($f))) {
$lines++;
// Update the offset
$this->offset = ftell($f);
// Hand the line to the caller
yield $line;
// Print progress every 500 lines
if ($lines % 500 == 0) {
echo "read $this->feed $this->file Lines:$lines\r\n";
}
}
fclose($f);
echo "read $this->feed $this->file Lines:$lines\r\n";
echo "End.\r\n";
}
/**
* Download a remote compressed file into the local cache
* @return string path of the cached file
*/
private function readZipped(): string
{
// Build the remote file URL
$indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";
// Remote file size
$fileSize = filesize($indicator);
$size = STool::kmgt($fileSize);
// Reuse the cached copy if it exists and its size matches
$cacheFile = self::CACHE_PATH . $this->feed . '/' . $this->file;
if (is_file($cacheFile) and filesize($cacheFile) == $fileSize) {
echo "Read Zipped File From Cache:" . $this->file . ' ' . $size . "\r\n";
return $cacheFile;
}
// Otherwise read the file from the server
echo "Begin read File:{$this->file}:" . $size . "\r\n";
$fileHandle = fopen($indicator, 'rb');
if (!$fileHandle) {
dump($indicator, 'OPEN FAIL');
exit;
}
// Read the remote file content
$content = '';
$i = 0;
while (!feof($fileHandle)) {
// Each call returns at most 8 KB here, whatever length is requested
$content .= fread($fileHandle, 65536);
// Show progress every 16 reads (about 128 KB)
$i++;
if ($i % 16 == 0) {
echo "$this->feed $this->file Reading :" . STool::kmgt(strlen($content)) . "/$size\r\n";
}
}
fclose($fileHandle);
// Save to the cache file
echo "Save to cache:" . $cacheFile . " \r\n";
makeDir(dirname($cacheFile));
file_put_contents($cacheFile, $content);
// Return the path of the cached file
return $cacheFile;
}
/**
* Split a string into lines
* @param string $content
* @return Iterator
*/
public function explode(string $content): Iterator
{
$size = strlen($content);
$pointer = 0;
while ($pointer < $size) {
$next = strpos($content, "\n", $pointer);
if ($next === false) {
$line = substr($content, $pointer);
$next = $size;
} else {
$line = substr($content, $pointer, $next - $pointer);
}
yield $line;
$pointer = $next + 1;
}
}
private function valid(string $url): bool
{
return false !== strpos($url, '/?s=') or false !== strpos($url, '/?ss=') or preg_match('/^\/.*\/.*\/$/i', $url);
}
/**
* Process one compressed log file
*/
private function zipped(): void
{
// Skip files that have already been processed
$fileTable = table('zipped');
if ($fileTable->exist(['feedName' => $this->feed, 'fileName' => $this->file])) return;
// Open the cached copy and decompress it as a stream
$gz=gzopen($this->readZipped(),'r');
if (!$gz) {
throw new Exception('gzopen fail.');
}
echo "\r\nBegin Process File\r\n";
//$memTable = $this->createTemporaryTable(uniqid('tmp_'));
// Log table to insert into
$logTable = table('log');
// Buffer of rows waiting to be inserted
$rows = [];
$insertRowsCount = 0;
$key=0;
while(!gzeof($gz)) {
$line=gzgets($gz);
// gzgets returns false on error or at end of file
if (false === $line) break;
if ((++$key) % 30000 == 0) {
echo "Analysis LINES:$key\r\n";
}
// Skip blank lines
$line = trim($line);
if (!$line) continue;
// Parse the line
$parts = $this->explodeLine($line);
if (!$parts) continue;
// Keep search requests only
if (!$this->valid($parts['url'])) continue;
// Skip rows that were already imported
if ($this->lastRow) {
if ($parts['timestamp'] < $this->lastRow['timestamp']) continue;
if ($parts['timestamp'] == $this->lastRow['timestamp'] and $parts['url'] == $this->lastRow['url'] and $parts['ip'] == $this->lastRow['ip']) {
continue;
}
}
// Add to the buffer
$parts['feedName'] = $this->feed;
$rows[] = $parts;
// Insert every 4000 rows; larger batches run into too many placeholders
if (count($rows) >= 4000) {
$logTable->inserts($rows);
$insertRowsCount += count($rows);
SDebug::clearMsgs();
$rows = [];
}
}
gzclose($gz);
// Insert the remaining rows
if (count($rows)) {
$logTable->inserts($rows);
$insertRowsCount += count($rows);
SDebug::clearMsgs();
}
echo "insert LINES:$insertRowsCount\r\n";
// Mark this file as processed
//$fileTable->begin();
//$this->move($memTable);
$fileTable->insert(['feedName' => $this->feed, 'fileName' => $this->file]);
//$fileTable->commit();
}
/**
* Move the logs from the temporary table into the main table
* @param STable $memTable temporary table object
*/
private function move(STable $memTable)
{
$fields = ['feedName', 'accessTime', 'timestamp', 'ip', 'requestTime', 'responseTime', 'method', 'url', 'code', 'length', 'referrer', 'agentId', 'created', 'updated', 'forward'];
$fieldsStr = implode(',', $fields);
$memTable->execute("Insert" . " Into log($fieldsStr) select $fieldsStr from " . $memTable->name());
$memTable->deleteAll();
}
/**
* Offset within the current file
* @var int
*/
private $offset;
/**
* Process one uncompressed log file
*/
private function unzipped(): void
{
// Look up the state left by the previous run
$fileTable = table('unzipped');
// Create an initial record if there is none
if ($fileTable->notExist(['feedName' => $this->feed])) {
$fileTable->insert(['feedName' => $this->feed, 'offset' => 0, 'size' => 0, 'timestamp' => 0]);
}
// Load the state: offset (file pointer last time), size (file size last time), timestamp (last timestamp handled)
$info = $fileTable->row('*', ['feedName' => $this->feed]);
// Build the remote file URL
$indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";
// File size
$fileSize = filesize($indicator);
// The file shrank, so it is a new file
if ($fileSize < $info['size']) {
$this->offset = 0;
} else {
// Read the first line
$f = fopen($indicator, 'r');
if (!$f) {
throw new Exception('open file fail.');
}
$firstLine = fgets($f);
fclose($f);
// fgets returns false for an empty file; otherwise parse the first line
$timestamp = false === $firstLine ? 0 : $this->explodeLine($firstLine)['timestamp'];
// A first line newer than the last recorded timestamp also means a new file
if ($timestamp > $info['timestamp']) {
$this->offset = 0;
} else {
$this->offset = $info['offset'];
}
}
echo "\r\nBegin Process File\r\n";
// Log table to insert into
$logTable = table('log');
// Buffer of rows waiting to be inserted
$rows = [];
$insertedRowsCount = 0;
$iterator = $this->readUnzipped($indicator, $fileSize);
$lastTime = 0;
foreach ($iterator as $key => $line) {
// Skip blank lines
$line = trim($line);
if (!$line) continue;
// Parse the log line
$parts = $this->explodeLine($line);
if (!$parts) continue;
// Keep search requests only
if (!$this->valid($parts['url'])) continue;
// Skip rows that were already imported
if ($this->lastRow and (floatval($parts['timestamp']) < floatval($this->lastRow['timestamp']))) continue;
$rows[] = array_merge($parts, [
'feedName' => $this->feed
]);
// Largest timestamp so far
$lastTime = $parts['timestamp'];
// Batch insert
if (count($rows) >= 100) {
$insertedRowsCount += count($rows);
$logTable->inserts($rows);
$fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
echo "Insert LINES:$insertedRowsCount\r\n";
SDebug::clearMsgs();
$rows = [];
}
}
// Insert the remaining rows
if (count($rows)) {
$insertedRowsCount += count($rows);
$logTable->inserts($rows);
$fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
echo "Insert LINES:$insertedRowsCount\r\n";
SDebug::clearMsgs();
}
}
/**
* Parse one log line
* @param $line string
* @return array
* @throws Exception when the line does not match
*/
private function explodeLine(string $line): array
{
// Sample line (referrer query string shortened here): [08/Apr/2018:03:30:17 +0800] 1523129417.075 72.178.128.43 - 0.114 - "GET /index.php/blog/search/?s=lowering%20ldl%20cholesterol&subid=tgr_zhen_BFX0J1ILON6N__rmlwuf_73004751 HTTP/1.1" 499 0 "http://168634854.keywordblocks.com/Cholesterol_Hdl_Ldl_Ratio.cfm?..." "Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Version/11.0 Mobile/15D100 Safari/604.1" "-"
$ns = '([^\s]*)';
$str = '"([^"]*)"';
$datetime = '\[([^\]]*)\]';
// Match with a regular expression
$matched = preg_match("/$datetime $ns $ns \- $ns $ns $str $ns $ns $str $str $str/i", $line, $matches);
if (!$matched) {
throw new Exception('NOT MATCH');
}
// Split the request on spaces into method, URL and HTTP protocol
list($mode, $url, $protocol) = explode(' ', $matches[6]);
return [
'accessTime' => datetime(strtotime($matches[1])), // access time (seconds)
'timestamp' => floatval($matches[2]), // access timestamp (with milliseconds)
'ip' => $matches[3], // client IP
'requestTime' => floatval($matches[4]), // time Nginx spent processing the request
'responseTime' => floatval($matches[5]), // time Nginx took to complete the whole response
'method' => $mode, // GET/POST/...
'url' => $url, // request URL
'code' => $matches[7], // response status code
'length' => intval($matches[8]), // response body length
'referrer' => left($matches[9], 250), // referrer
'agentId' => $this->getAgentId($matches[10]), // user agent
'forward' => $matches[11] // real IP
];
}
// Load the map from agent string to ID
private function getAgentMap()
{
$rows = table('agent')->select('id,agent', null, 'agent')->toArray();
return array_column($rows, 'id', 'agent');
}
// Return the ID for an agent, creating the mapping if it does not exist yet
private function getAgentId($agent)
{
// Empty UA
if (!$agent) {
return 0;
}
// In-memory cache kept in a static variable
static $maps;
if (!$maps) {
$maps = $this->getAgentMap();
}
// Collapse [FBAN/FBIOS;...] blocks
$sub = mid($agent, '[', ']');
if ($sub) {
$agent = str_replace('[' . $sub . ']', '[...]', $agent);
}
$agent = str_replace(' (KHTML, like Gecko)', '', $agent);
// Merge version variants
$agent = preg_replace($this->agentPatterns, $this->agentReplaces, $agent);
// Truncate the agent to 191 characters
$agent = left($agent, 191);
if (!isset($maps[$agent])) {
$id = table('agent')->insertIgnore(['agent' => $agent]);
$maps[$agent] = $id;
}
return $maps[$agent];
}
}
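The notes say that SSH failures are retried after a delay and that a parse failure is caught by the main program, which then runs again. The main program itself is not shown, so the following is only a sketch of what such a loop could look like. retryWithBackoff is a hypothetical helper, the attempt limit is an assumption (the class above retries the connection without a limit), and the $sleep parameter exists only so the delay can be replaced in tests.

```php
<?php
/**
 * Run $job, retrying on Exception with a doubling delay.
 * Hypothetical helper, not part of the original program.
 *
 * @param callable      $job         the work to run; its return value is passed through
 * @param int           $maxAttempts give up and rethrow after this many attempts
 * @param int           $baseDelay   first delay in seconds; doubled after every failure
 * @param callable|null $sleep       delay function, defaults to PHP's sleep()
 */
function retryWithBackoff(callable $job, int $maxAttempts = 5, int $baseDelay = 2, ?callable $sleep = null)
{
    $sleep = $sleep ?? 'sleep';
    $delay = $baseDelay;
    for ($attempt = 1; ; $attempt++) {
        try {
            return $job();
        } catch (Exception $e) {
            // Out of attempts: let the caller see the last error.
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            $sleep($delay);
            $delay *= 2;
        }
    }
}

// Possible use with the class above (the variables are placeholders):
// retryWithBackoff(function () use ($host, $user, $pass, $logPath) {
//     (new MLogImport())->site($host, $user, $pass, $logPath);
// });
```

Because finished compressed files and the offset of the uncompressed log are both recorded in the database, running site() again after a failure continues from the recorded state rather than starting over.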
————————————————
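Copyright notice: this is an original article by CSDN blogger 「蓝冰大侠」, released under the CC 4.0 BY-SA license. Reprints must include a link to the original source and this notice.
Original link: https://blog.csdn.net/bluehire/article/details/79985203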