I. Problem Description
After one Zabbix proxy was upgraded from 2.4.5 to 3.0.2, it crashed immediately on startup.
The error messages were:
2367:20160508:153246.830 One child process died (PID:42385,exitcode/signal:11). Exiting ...
42367:20160508:153248.904 Zabbix Proxy stopped. Zabbix 3.0.2 (revision 59540).
Raising the log level by setting DebugLevel=4 and starting the proxy again gave more detail:
42629:20160508:153529.004 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42628:20160508:153529.004 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42634:20160508:153529.004 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42651:20160508:153529.004 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42659:20160508:153529.004 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42661:20160508:153529.005 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42655:20160508:153529.005 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42663:20160508:153529.005 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42664:20160508:153529.005 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42666:20160508:153529.005 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42665:20160508:153529.006 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42667:20160508:153529.006 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42630:20160508:153529.006 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42668:20160508:153529.006 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42671:20160508:153529.006 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42670:20160508:153529.007 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42647:20160508:153529.007 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42674:20160508:153529.007 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42680:20160508:153529.007 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
42679:20160508:153529.008 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
9102:20160508:170455.283 End of zbx_strpool_destroy()
9102:20160508:170455.283 End of free_configuration_cache()
9102:20160508:170455.283 In free_ipmi_handler()
9102:20160508:170455.283 End of free_ipmi_handler()
9102:20160508:170455.284 In free_selfmon_collector() collector:0x7fa793e23000
9102:20160508:170455.284 End of free_selfmon_collector()
9102:20160508:170455.284 In unload_modules()
9102:20160508:170455.284 End of unload_modules()
9102:20160508:170455.284 Zabbix Proxy stopped. Zabbix 3.0.2 (revision 59540).
zabbix_proxy [9152]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
There are four proxies in total, all running CentOS 6. Only this one fails to start: a child process dies as soon as the proxy comes up.
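At DebugLevel=4 the log is noisy, so it can help to filter it down to the lines that describe the crash itself. The sketch below is one way to do that. The function names are made up for illustration, and the sample input is three lines copied from the excerpts above. On a real host the input would be the file named by LogFile in the proxy configuration. On Linux, signal 11 is SIGSEGV, so the child process died of a segmentation fault.

```shell
#!/bin/sh
# Keep only the lines that describe the crash itself: a dead child
# process, a failed lock, or the proxy reporting that it stopped.
crash_lines() {
    grep -E 'One child process died|lock failed|Zabbix Proxy stopped'
}

# Pull the signal number out of a "One child process died" line.
died_signal() {
    sed -n 's|.*exitcode/signal:\([0-9][0-9]*\)).*|\1|p'
}

# Three lines copied from the excerpts above, standing in for the real log.
sample_log() {
cat <<'EOF'
2367:20160508:153246.830 One child process died (PID:42385,exitcode/signal:11). Exiting ...
42629:20160508:153529.004 Got signal [signal:15(SIGTERM),sender_pid:42623,sender_uid:498,reason:0]. Exiting ...
zabbix_proxy [9152]: [file:'selfmon.c',line:375] lock failed: [22] Invalid argument
EOF
}

sample_log | crash_lines
sig=$(sample_log | died_signal)
# kill -l translates a signal number into its name
echo "child process died with signal $sig ($(kill -l "$sig"))"
```

The SIGTERM lines are filtered out on purpose: they are the parent telling the remaining children to exit, a consequence of the crash rather than its cause.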
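II. Resolution
The kernel on this proxy server is 3.9.8-1.el6.elrepo.x86_64, while the other proxies run 2.6.32-358.el6.x86_64. At first I did not consider the kernel version at all and assumed the problem lay with the proxy version or with system parameter settings.
1. Checking Linux semaphores
According to Zabbix's official bug tracker, "Invalid argument" may mean that a semaphore limit has been reached.
You can try raising the semaphore limits: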
echo 256 40000 32 32000 > /proc/sys/kernel/sem
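The four values are, in order:
SEMMSL: maximum number of semaphores per semaphore set
SEMMNS: system-wide maximum number of semaphores
SEMOPM: maximum number of operations in a single semop(2) call
SEMMNI: system-wide maximum number of semaphore sets
A change made with echo is lost after a reboot. To make it persistent, put the setting in /etc/sysctl.conf: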
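Because the four numbers are positional, it is easy to mix them up. The following minimal sketch labels them in the order listed above. `describe_sem` is a hypothetical helper name, and the first call uses the values from the echo command in the text.

```shell
#!/bin/sh
# Label the four fields of a kernel.sem value in the order the kernel
# prints them: SEMMSL SEMMNS SEMOPM SEMMNI.
describe_sem() {
    # $1: the four values as one string, e.g. "256 40000 32 32000"
    echo "$1" | {
        read -r semmsl semmns semopm semmni
        echo "SEMMSL (max semaphores per set)         $semmsl"
        echo "SEMMNS (max semaphores system-wide)     $semmns"
        echo "SEMOPM (max operations per semop call)  $semopm"
        echo "SEMMNI (max semaphore sets system-wide) $semmni"
    }
}

# The values written with echo in the text above.
describe_sem "256 40000 32 32000"

# On a Linux host the live values can be labelled the same way.
if [ -r /proc/sys/kernel/sem ]; then
    describe_sem "$(cat /proc/sys/kernel/sem)"
fi
```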
kernel.sem = 250 32000 100 10000
Then run sysctl -p to apply it.
Even with the new values, the proxy still crashed on startup.
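Before ruling the semaphore limits out, it is worth confirming that the value in /etc/sysctl.conf really is the one the kernel is using. The sketch below is one way to check, under the assumption that the file contains a plain `kernel.sem = ...` line. `configured_sem` is a hypothetical helper, and the sample file content is invented apart from the kernel.sem line taken from the text.

```shell
#!/bin/sh
# Print the last kernel.sem value found in sysctl.conf-style text on stdin,
# with surrounding whitespace removed. Commented-out lines are ignored
# because the pattern only matches lines that start with the key.
configured_sem() {
    awk -F= '
        /^[ \t]*kernel\.sem[ \t]*=/ { v = $2 }
        END { gsub(/^[ \t]+|[ \t]+$/, "", v); print v }
    '
}

# Hypothetical file content, using the kernel.sem line from the text above.
sample_conf() {
cat <<'EOF'
# semaphore limits for zabbix proxy
kernel.sem = 250 32000 100 10000
EOF
}

want=$(sample_conf | configured_sem)
echo "configured: $want"

# On a Linux host, compare against what the kernel is actually using.
if [ -r /proc/sys/kernel/sem ]; then
    have=$(echo $(cat /proc/sys/kernel/sem))   # unquoted echo turns tabs into single spaces
    if [ "$want" = "$have" ]; then
        echo "running value matches"
    else
        echo "running value differs: $have"
    fi
fi
```

On a real system the input would be `configured_sem < /etc/sysctl.conf`.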
2. Checking the kernel version
Updating to latest kernel (3.10.0-327.10.1.el7.x86_64) + reboot solved it for me
The error reported originally may be something else than what affected me.
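Reading this post made me suspect the kernel version. Someone had upgraded the kernel on this proxy in the past. I changed the Linux boot configuration so that 2.6.32 is the default kernel and rebooted. That resolved the problem, and the proxy no longer crashes. Note, however, that proxy 2.4.5 had been running normally on the 3.9.8 kernel.
Quite a few people have run into this bug; several posts report the same problem.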
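The text does not show how the default boot kernel was changed. On CentOS 6, which uses GRUB legacy, the `default=` line in /boot/grub/grub.conf selects a `title` entry by its zero-based position. The sketch below lists the entries with their indexes so the right number can be chosen. The helper names and the sample grub.conf are hypothetical; only the two kernel versions come from the text.

```shell
#!/bin/sh
# List the boot entries in GRUB legacy config text on stdin, numbered
# from 0 the same way the "default=" setting counts them.
list_entries() {
    awk '
        BEGIN { n = 0 }
        /^[ \t]*title[ \t]/ {
            sub(/^[ \t]*title[ \t]+/, "")
            print n ": " $0
            n++
        }
    '
}

# Print the index currently selected by "default=".
default_index() {
    awk -F= '/^[ \t]*default[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical grub.conf; only the two kernel versions come from the text.
sample_grub_conf() {
cat <<'EOF'
default=0
timeout=5
title CentOS (3.9.8-1.el6.elrepo.x86_64)
        root (hd0,0)
        kernel /vmlinuz-3.9.8-1.el6.elrepo.x86_64 ro root=/dev/sda1
title CentOS (2.6.32-358.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-358.el6.x86_64 ro root=/dev/sda1
EOF
}

sample_grub_conf | list_entries
echo "current default: $(sample_grub_conf | default_index)"
echo "running kernel:  $(uname -r)"
```

In this sample, booting 2.6.32 by default would mean changing `default=0` to `default=1`. After the reboot, `uname -r` confirms which kernel is actually running.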
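III. Background
This bug involves Linux semaphores. Semaphores exist to coordinate access to shared resources among multiple processes. The classic problem solved with semaphores is the dining philosophers problem.
On Linux, the ipcs command shows shared memory, semaphores, and message queues.
ipcs defaults to the -a option, which shows all three: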
# ipcs
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x6c6c6536 0          root       600        4096       0
0x68031618 1835009    zabbix     600        8388583    0
0x6c030ad7 11632648   zabbixagen 600        365056     13
0x68031672 2392073    zabbix     600        8388583    0
0x78031672 2424842    zabbix     600        16777216   0
0x74031672 2457611    zabbix     600        4194304    0
0x67031672 2490380    zabbix     600        1336934400 0
0x73031672 2523149    zabbix     600        235929600  0
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 0          root       600        1
0x00000000 32769      root       600        1
0x00000000 4325378    apache     600        1
0x7a031618 622595     zabbix     600        12
0x00000000 4358148    apache     600        1
0x7a030ad7 4292613    zabbixagen 600        13
0x7a031672 851974     zabbix     600        12
0x7a031638 1015815    zabbix     600        12
0x7a031620 1441800    zabbix     600        12
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
ipcs -m shows only shared memory segments
ipcs -s shows only semaphores
ipcs -q shows only message queues
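To see who is holding semaphores, the `ipcs -s` listing can be summed per owner. The following is a small sketch; `sems_per_owner` is a hypothetical helper, and the sample input is the Semaphore Arrays section from the listing above.

```shell
#!/bin/sh
# Sum the nsems column per owner. Input must be the output of `ipcs -s`
# only: in that listing, data rows start with the hex key, column 3 is
# the owner and column 5 is nsems.
sems_per_owner() {
    awk '/^0x/ { sum[$3] += $5 } END { for (o in sum) print o, sum[o] }' | sort
}

# The Semaphore Arrays section from the listing above.
sample_ipcs_s() {
cat <<'EOF'
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 0          root       600        1
0x00000000 32769      root       600        1
0x00000000 4325378    apache     600        1
0x7a031618 622595     zabbix     600        12
0x00000000 4358148    apache     600        1
0x7a030ad7 4292613    zabbixagen 600        13
0x7a031672 851974     zabbix     600        12
0x7a031638 1015815    zabbix     600        12
0x7a031620 1441800    zabbix     600        12
EOF
}

sample_ipcs_s | sems_per_owner
```

On a host with ipcs installed, the equivalent is `ipcs -s | sems_per_owner`. Feeding it the full `ipcs` output would be wrong, because shared memory rows also start with a hex key and their fifth column is a byte count.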
# ipcs -u
------ Shared Memory Status --------
segments allocated 71
pages allocated 1795473
pages resident  68130
pages swapped   57636
Swap performance: 0 attempts     0 successes
------ Semaphore Status --------
used arrays = 16
allocated semaphores = 152
------ Messages: Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes
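The `ipcs -u` summary makes it possible to check whether a semaphore limit is anywhere near exhausted. The sketch below compares the two semaphore counters with SEMMNI and SEMMNS. The helper names are hypothetical. The usage figures are the ones shown above and the limits are the kernel.sem line from section II; whether both were taken on the crashing proxy at the time of the crash is not stated, so treat the comparison as illustrative.

```shell
#!/bin/sh
# Extract "<used arrays> <allocated semaphores>" from `ipcs -u` output on stdin.
sem_usage() {
    awk -F= '
        /used arrays/          { arrays = $2 + 0 }
        /allocated semaphores/ { sems = $2 + 0 }
        END { print arrays, sems }
    '
}

# Compare usage against a kernel.sem string (SEMMSL SEMMNS SEMOPM SEMMNI).
check_sem_limits() {
    # $1: "<used arrays> <allocated semaphores>"   $2: kernel.sem values
    # Word-split both strings so that the positional parameters become:
    # $1=arrays $2=sems $3=SEMMSL $4=SEMMNS $5=SEMOPM $6=SEMMNI
    set -- $1 $2
    echo "semaphore arrays: $1 of $6 (SEMMNI)"
    echo "semaphores:       $2 of $4 (SEMMNS)"
    if [ "$1" -lt "$6" ] && [ "$2" -lt "$4" ]; then
        echo "below both limits"
    else
        echo "at or above a limit"
    fi
}

# The Semaphore Status section from the output above.
sample_ipcs_u() {
cat <<'EOF'
------ Semaphore Status --------
used arrays = 16
allocated semaphores = 152
EOF
}

check_sem_limits "$(sample_ipcs_u | sem_usage)" "250 32000 100 10000"
```

With these numbers the usage is far below both limits, which fits the observation that raising kernel.sem did not stop the crash.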
References:
https://support.zabbix.com/browse/ZBX-10657
http://www.cnblogs.com/forilen/p/4316358.html
https://support.zabbix.com/browse/ZBX-3974
http://blog.zabbix.com/mysterious-zabbix-problems-how-we-debug-them/1023/
https://en.wikipedia.org/wiki/Semaphore_(programming)