
How to Analyze Website Logs with Spark

Published: 2021-11-10 18:54:33  Source: 億速云  Views: 134  Author: 柒染  Category: Cloud Computing

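Analyzing website logs with Spark can seem daunting if you have not done it before. This article walks through one real problem, its cause, and the fix, in the hope that it helps you solve similar issues.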

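Frustratingly, since yesterday my personal website has been raising 504 alerts nonstop. Logging in to the machine showed that php-fpm was reporting errors. Restarting php-fpm cleared the error, but the alert came back within a few hours. Strange, because the site had run for almost a year without any trouble.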

[28-Sep-2016 11:53:19] NOTICE: ready to handle connections
[28-Sep-2016 11:53:19] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

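I assumed pm.max_children was simply set too low, so I raised it in the php-fpm configuration (from 5 to 20, as the later log shows):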

[28-Sep-2016 15:51:43] NOTICE: fpm is running, pid 28179
[28-Sep-2016 15:51:43] NOTICE: ready to handle connections
[28-Sep-2016 15:51:43] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 15:52:12] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 7 total children
[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it

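The result was the same: a few hours later the 504 alert fired again. Looking at the nginx log, I noticed a few odd IPs with very high request volumes. I suspected malicious traffic, so it seemed worth counting requests per IP in the access log.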

root@iZ28bhfjhgkZ:/var/log/nginx# vim access.log
121.42.53.180 - - [25/Sep/2016:06:26:29 +0800] "POST /wp-cron.php?doing_wp_cron=1474755989.0131719112396240234375 HTTP/1.0" 499 0 "-" "WordPress/4.3.1; http://zhwen.org"
182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] "GET /?p=675 HTTP/1.1" 200 8204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/themes/sparkling/inc/css/font-awesome.min.css?ver=4.3.1 HTTP/1.1" 200 26711 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70 HTTP/1.1" 200 374 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:58 +0800] "GET /wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.3.1 HTTP/1.1" 200 771 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
121.43.107.174 - - [25/Sep/2016:06:29:18 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
115.28.189.208 - - [25/Sep/2016:06:29:33 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] "GET /?paged=14 HTTP/1.1" 200 11164 "-" "YisouSpider"
182.92.148.207 - - [25/Sep/2016:06:31:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /?p=articles/cscope-tags HTTP/1.1" 200 10681 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 151 "-" "Safari/12602.1.50.0.10 CFNetwork/807.0.4 Darwin/16.0.0 (x86_64)"
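
Each line in the log above appears to follow nginx's default combined format: client IP, two identity fields, timestamp, request line, status, bytes sent, referer and user agent. As a minimal sketch, assuming that format holds for every line, the fields can be pulled apart in Scala with a regular expression. The AccessEntry case class, its field names and parseLine are hypothetical names introduced here for illustration; they are not from the original post.

```scala
// Hypothetical container; the field names are labels chosen for this sketch.
case class AccessEntry(ip: String, time: String, request: String,
                       status: Int, bytes: Long, referer: String, userAgent: String)

// Assumed layout: ip - user [time] "request" status bytes "referer" "user-agent"
val LogLine =
  """^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+) "([^"]*)" "([^"]*)"$""".r

// Returns None for any line that does not match the assumed layout.
def parseLine(line: String): Option[AccessEntry] = line match {
  case LogLine(ip, time, req, status, bytes, ref, ua) =>
    Some(AccessEntry(ip, time, req, status.toInt, bytes.toLong, ref, ua))
  case _ => None
}

// One of the log lines shown above.
val sample =
  "42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] \"GET /?paged=14 HTTP/1.1\" 200 11164 \"-\" \"YisouSpider\""

println(parseLine(sample).map(_.ip)) // prints Some(42.156.139.59)
```

Parsing whole entries is more than the IP count below needs, but it makes it easy to also group by status code or user agent later.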

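So I ran a simple count of the IPs in the access log:
1) First extract the IPs on the server to cut down the data volume (alternatively, compress the whole log and download that), then download the result to the local machine:
root@iZ28bhfjhgkZ:/var/log/nginx# cat access.log|awk '{print $1}' > tt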

Run the following code in spark-shell:

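// Load the extracted IP list (one IP per line); assumed here to be the tt file copied to this path
val line = sc.textFile("/data1/data/t1")

// In spark-shell, enter the multi-line chain below in :paste mode
line.flatMap(_.split(" "))       // split each line into tokens (here, one IP per line)
  .map((_, 1))                   // pair each IP with a count of 1
  .reduceByKey(_ + _)            // sum the counts per IP
  .map(e => (e._2, e._1))        // swap to (count, ip) so the count becomes the key
  .reduceByKey(_ + "," + _)      // join IPs that share the same count
  .sortByKey(true, 1)            // sort ascending by count into a single partition
  .saveAsTextFile("/data1/data/t3")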
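
To see what each stage of that chain computes, here is a rough local analogue using plain Scala collections instead of an RDD. It is only a sketch for illustration: the sample IPs are taken from the log excerpt above, the names ips, countByIp, ipsByCount and sortedByCount are made up here, and the joined IPs are sorted so the output is stable (Spark's reduceByKey does not guarantee that order).

```scala
// Sample input standing in for the extracted IP file (one IP per element).
val ips = Seq(
  "182.92.148.207", "203.208.60.226", "203.208.60.226",
  "121.43.107.174", "115.28.189.208", "182.92.148.207", "203.208.60.226")

// Stage 1: requests per IP (what map((_, 1)) + reduceByKey(_ + _) computes).
val countByIp: Map[String, Int] =
  ips.groupBy(identity).map { case (ip, hits) => (ip, hits.size) }

// Stage 2: swap to (count, ip) and join IPs that share a count
// (what map(e => (e._2, e._1)) + reduceByKey(_ + "," + _) computes).
val ipsByCount: Map[Int, String] =
  countByIp.toSeq
    .groupBy { case (_, count) => count }
    .map { case (count, pairs) => (count, pairs.map(_._1).sorted.mkString(",")) }

// Stage 3: ascending by count (what sortByKey(true, 1) does).
val sortedByCount: Seq[(Int, String)] = ipsByCount.toSeq.sortBy(_._1)

sortedByCount.foreach(println)
// prints:
// (1,115.28.189.208,121.43.107.174)
// (2,182.92.148.207)
// (3,203.208.60.226)
```

Because the sort is ascending, the busiest IPs end up at the bottom of the output, which is why the interesting entries appear last in the result file.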

2) The final output in t3 looks like this. A few IPs account for a very large number of requests, 191.96.249.53 in particular:

...
(855,182.92.148.207)
(3100,121.8.136.75)
(3889,61.135.169.81)
(53513,191.96.249.53)
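
The output lines above have the form (count,ip), and a line can carry several comma-separated IPs when they share a count. As a small sketch, they could be read back and filtered against a cutoff in Scala. The parseResult helper and the threshold of 10000 are assumptions made for this example; the original post picked the offending IP by eye.

```scala
import scala.util.Try

// The four result lines shown above.
val resultLines = Seq(
  "(855,182.92.148.207)",
  "(3100,121.8.136.75)",
  "(3889,61.135.169.81)",
  "(53513,191.96.249.53)")

// Split "(count,ip1,ip2,...)" into (count, List(ip1, ip2, ...)).
// Returns None when the line has no IP or the count is not a number.
def parseResult(line: String): Option[(Int, List[String])] =
  line.trim.stripPrefix("(").stripSuffix(")").split(",").toList match {
    case count :: ips if ips.nonEmpty => Try(count.toInt).toOption.map(n => (n, ips))
    case _ => None
  }

// Hypothetical cutoff; choose one that fits your own traffic.
val threshold = 10000

// Every IP whose request count is at or above the cutoff.
val heavyHitters: Seq[String] =
  resultLines
    .flatMap(line => parseResult(line).toList)
    .collect { case (count, ips) if count >= threshold => ips }
    .flatten

println(heavyHitters) // prints List(191.96.249.53)
```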

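3) Then add an iptables rule to block that IP, and the problem is solved. Spark makes this kind of statistical analysis very simple: essentially one line of code does the whole job.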

root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination        

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination        

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination        
root@iZ28bhfjhgkZ:/var/log# iptables -A INPUT -s 191.96.249.53 -j DROP
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination        
DROP       all  --  DEDICATED.SERVER     anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination        

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination        
root@iZ28bhfjhgkZ:/var/log#
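
If more than one address needs blocking, the same rule can be generated per IP rather than typed by hand. The sketch below only builds the command strings, in the same shape as the iptables command used above, and does not run them. The helper names isIpv4 and dropRule are hypothetical, and the check deliberately accepts plain dotted-quad IPv4 only so that nothing unexpected ends up in a shell command.

```scala
// Accept only dotted-quad IPv4 strings with every octet in 0..255.
def isIpv4(s: String): Boolean = {
  val parts = s.split("\\.", -1)
  parts.length == 4 && parts.forall { p =>
    p.nonEmpty && p.length <= 3 && p.forall(c => c >= '0' && c <= '9') && p.toInt <= 255
  }
}

// Same rule shape as the command in the session above; only the address varies.
def dropRule(ip: String): Option[String] =
  if (isIpv4(ip)) Some(s"iptables -A INPUT -s $ip -j DROP") else None

// The IP blocked in the original session.
val toBlock = Seq("191.96.249.53")

toBlock.flatMap(ip => dropRule(ip).toList).foreach(println)
// prints: iptables -A INPUT -s 191.96.249.53 -j DROP
```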
