Python script to force recheck all critical services in Nagios
If you've ever run a Nagios server, you may have noticed that it's a bit of a pain to manually recheck a bunch of service monitors after you've made changes to your network, hosts, or services.
The normal way of rechecking a Nagios service monitor is through the GUI. However, Nagios also lets external programs send it commands through the external command file.
This file is usually located at /usr/local/nagios/var/rw/nagios.cmd, but yours may be elsewhere, depending on your distro and installation procedure. You can always find it by installing mlocate and running the locate command:
$ locate nagios.cmd
Here is the Nagios help page for External Commands:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/extcommands.html
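For reference, each line written to the command file has the form [timestamp] COMMAND;arguments, and SCHEDULE_FORCED_SVC_CHECK takes a host name, a service description, and a check time. Here is a minimal sketch of building such a line. The helper name format_forced_svc_check and the host and service names are my own inventions for illustration, not part of Nagios:

```python
import time


def format_forced_svc_check(host_name, service_description, when=None):
    """Build one external command line that forces a service check.

    The helper name is hypothetical; the line format follows the
    Nagios external command syntax: [time] COMMAND;arg1;arg2;...
    """
    if when is None:
        when = int(time.time())  # default to "now" in Unix epoch seconds
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (
        when, host_name, service_description, when)


# Example with made-up host and service names.
print(format_forced_svc_check("web01", "HTTP", when=1509653167), end="")
```

The same timestamp is used twice: once as the time the command was submitted and once as the time the check should run.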
The key to using External Commands is to realise that the current state of the Nagios system (that is, the current state of all its monitored hosts and services) is also stored in a file. This file is called 'status.dat' and is typically located in: /var/log/nagios/status.dat
Again, your mileage may vary, depending on your distro and installation.
The Nagios documentation describes it as follows: "This is the file that Nagios uses to store the current status, comment, and downtime information. This file is used by the CGIs so that current monitoring status can be reported via a web interface. The CGIs must have read access to this file in order to function properly. This file is deleted every time Nagios stops and recreated when it starts." [1]
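Since both paths vary between installations, one way to look them up is to read them from the main Nagios configuration file, where they are set by the command_file and status_file directives. The sketch below parses configuration text for those two directives. The function name read_nagios_paths and the sample fragment are assumptions for illustration only:

```python
def read_nagios_paths(config_text):
    """Return the command_file and status_file values found in nagios.cfg text."""
    paths = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("command_file", "status_file"):
            paths[key.strip()] = value.strip()
    return paths


# Illustrative configuration fragment (not taken from a real server).
sample = """
# sample nagios.cfg excerpt
status_file=/var/log/nagios/status.dat
command_file=/var/spool/nagios/cmd/nagios.cmd
"""
print(read_nagios_paths(sample))
```

On a real server you would pass in the contents of your own nagios.cfg, wherever your installation keeps it.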
We can use these two files in combination by scanning the status.dat file for hosts or services that are in a particular state, and then sending commands to the nagios.cmd file to process them.
Below is a Python script I wrote. It opens the status.dat file (you may have to update its location), finds the entry for each service (each service has its own block, starting with 'servicestatus {'), and reads its current state.
If the state is warning (1) or critical (2), it sends a command to the nagios.cmd file to force a recheck of the service.
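import re
import time

# Paths vary by distro and installation; adjust both to match your system.
STATUS_FILE = '/var/log/nagios/status.dat'
COMMAND_FILE = '/var/spool/nagios/cmd/nagios.cmd'

print("Force rechecking the following services:")
print()
with open(STATUS_FILE) as file:
    for line in file:
        if 'servicestatus {' in line:
            host_name = service_description = current_state = ""
            # Scan the first lines of the service block for the fields we need.
            for i in range(15):
                myline = next(file).strip()
                if re.match("host_name", myline):
                    host_name = myline
                if re.match("service_description", myline):
                    service_description = myline
                if re.match("current_state", myline):
                    current_state = myline
            # State 1 is WARNING and state 2 is CRITICAL.
            if current_state in ('current_state=1', 'current_state=2'):
                host_name = host_name.split("=", 1)[1]
                service_description = service_description.split("=", 1)[1]
                print(host_name)
                print(service_description)
                print(current_state)
                print()
                # Use the current time as both the command timestamp and the check time.
                now = int(time.time())
                cmd = "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (now, host_name, service_description, now)
                # Write straight to the command file instead of shelling out to echo.
                with open(COMMAND_FILE, 'w') as command_file:
                    command_file.write(cmd)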
[1] https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/configmain.html
Update (December 2017): for this script to work with the latest version of Nagios, you will need to change the line:
for i in range(15):
to:
for i in range(16):
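Because the fixed line count depends on the Nagios version, an alternative is to read each servicestatus block up to its closing brace and collect every key=value line along the way. This is only a sketch of that idea, not what the script above does. The function name services_needing_recheck and the sample data are made up for illustration:

```python
def services_needing_recheck(lines):
    """Yield (host_name, service_description, state) for services in state 1 or 2."""
    block = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("servicestatus {"):
            block = {}  # start collecting a new service block
        elif line == "}" and block is not None:
            state = block.get("current_state")
            if state in ("1", "2"):  # 1 = WARNING, 2 = CRITICAL
                yield block.get("host_name"), block.get("service_description"), int(state)
            block = None  # block finished, wait for the next one
        elif block is not None and "=" in line:
            key, value = line.split("=", 1)
            block[key] = value


# Made-up sample in the general shape of status.dat service blocks.
sample = """\
servicestatus {
    host_name=web01
    service_description=HTTP
    current_state=2
    }
servicestatus {
    host_name=web01
    service_description=PING
    current_state=0
    }
""".splitlines()

for host, service, state in services_needing_recheck(sample):
    print(host, service, state)
```

With this approach the position of current_state inside the block no longer matters, so the range value would not need adjusting between Nagios versions.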