Sunday, May 13, 2012

Monitoring tool - install monit

Icinga, Nagios and other monitoring tools can check that a specified daemon or process is running. They can even watch the icinga or nagios daemon itself, but what happens when the icinga or nagios daemon stops?
Monit can monitor a daemon by checking that a specified process or port is alive, and it can restart the daemon, or even stop it, when a check fails.
"Monit is a free open source utility for managing and monitoring, processes, programs, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations." MONIT Official
I'd like to cover installing monit first, and then how to monitor icinga with monit.
The configurations are available on my GitHub.

Install monit

  • set up the RPMforge repository
# rpm -ivh http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm
# sed -i 's/enabled = 1/enabled = 0/' /etc/yum.repos.d/rpmforge.repo
  • install monit
# yum -y --enablerepo=rpmforge install monit
  • verify installation
# monit -V
This is Monit version 5.3.2
Copyright (C) 2000-2011 Tildeslash Ltd. All Rights Reserved.
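The sed command disables every section of rpmforge.repo that is enabled, so yum only uses the repository when it is asked to with --enablerepo=rpmforge. A minimal sketch of the same substitution, run on a shortened, hypothetical repo file in a temporary directory rather than on the real /etc/yum.repos.d/rpmforge.repo:

```shell
#!/bin/sh
# Shortened, hypothetical repo file; the real rpmforge.repo has more keys.
tmpdir=$(mktemp -d)
repo="$tmpdir/rpmforge.repo"
cat > "$repo" << 'EOF'
[rpmforge]
name = RHEL $releasever - RPMforge.net - dag
enabled = 1

[rpmforge-extras]
name = RHEL $releasever - RPMforge.net - extras
enabled = 0
EOF

# Same substitution as above: turn every "enabled = 1" into "enabled = 0".
sed -i 's/enabled = 1/enabled = 0/' "$repo"

# Count the sections that are still enabled.
enabled_count=$(grep -c '^enabled = 1' "$repo")
echo "sections still enabled: $enabled_count"    # prints: sections still enabled: 0
```

The pattern only matches the spaced form "enabled = 1"; a repo file written as "enabled=1" would be left unchanged.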

Configuration

  • /etc/monitrc (monit control file)
    See the official documentation for more about the monit control file.
    The set alert directive below makes monit send an alert for every event except those listed inside the braces (checksum through timestamp). The commented-out events (action, nonexist, timeout) are not excluded, so they still trigger alerts.
# cat > /etc/monitrc << 'EOF'
set daemon 120 with start delay 30
set logfile /var/log/monit/monit.log
## Mail server used for sending alert e-mail
set mailserver localhost
set alert username@domain not {
checksum
content
data
exec
gid
icmp
invalid
fsflags
permission
pid
ppid
size
timestamp
#action
#nonexist
#timeout
}
mail-format {
from: monit@$HOST
subject: Monit Alert -- $SERVICE $EVENT --
message:
Hostname:       $HOST
Service:        $SERVICE
Action:         $ACTION
Date/Time:      $DATE
Info:           $DESCRIPTION
}
set idfile /var/monit/id
set statefile /var/monit/state
set eventqueue
    basedir /var/monit  
    slots 100           
set httpd port 2812 and
    allow localhost 
    allow 192.168.0.0/24
    allow admin:monit      
include /etc/monit.d/*.conf
EOF
  • set up logging
# mkdir /var/log/monit
# cat > /etc/logrotate.d/monit <<EOF
/var/log/monit/*.log {
  missingok
  notifempty
  rotate 12
  weekly
  compress
  postrotate
    /usr/bin/monit quit  
  endscript
}
EOF
  • set up an include file (service entry statement)
    The following is an example of monitoring ntpd.
# cat > /etc/monit.d/ntpd.conf << 'EOF'
check process ntpd
        with pidfile "/var/run/ntpd.pid"
        start program = "/etc/init.d/ntpd start"
        stop program = "/etc/init.d/ntpd stop"
        if 3 restarts within 3 cycles then alert

EOF
  • verify syntax
# monit -t
Control file syntax OK
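Two details of writing the control file are easy to get wrong. With an unquoted delimiter (<< EOF) the shell expands $HOST, $SERVICE, $EVENT and the other mail-format variables while writing the file, so monit never sees them; quoting the delimiter (<< 'EOF') keeps the text literal. Monit also refuses a control file whose permissions are looser than 0700. A minimal sketch that writes a one-line fragment both ways to a temporary directory instead of /etc/monitrc:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Make the demo deterministic: both variables are empty in this shell.
SERVICE='' EVENT=''

# Unquoted delimiter: the shell substitutes $SERVICE and $EVENT while writing.
cat > "$tmpdir/unquoted.rc" << EOF
subject: Monit Alert -- $SERVICE $EVENT --
EOF

# Quoted delimiter: the line is written verbatim, so monit receives the variables.
cat > "$tmpdir/quoted.rc" << 'EOF'
subject: Monit Alert -- $SERVICE $EVENT --
EOF

# monit rejects a control file with permissions looser than 0700.
chmod 600 "$tmpdir/quoted.rc"

cat "$tmpdir/unquoted.rc"    # prints: subject: Monit Alert --   --
cat "$tmpdir/quoted.rc"      # prints: subject: Monit Alert -- $SERVICE $EVENT --
```

On a real system the equivalent of the chmod step is "chmod 600 /etc/monitrc" before running "monit -t".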

Start up

  • run monit from init
    Monit can also be started from its init script, but running it from init with respawn makes sure that a Monit daemon is always running on the system.
# cat >> /etc/inittab <<EOF
mo:2345:respawn:/usr/bin/monit -Ic /etc/monitrc
EOF
  • make init re-read /etc/inittab
# telinit q
# tail -f /var/log/messages
May 13 12:34:35 ha-mgr02 init: Re-reading inittab
  • check that monit is running
# ps awuxc | grep 'monit'
root      1431  0.0  0.0  57432  1876 ?        Ssl  11:38   0:00 monit 
  • kill the monit process and check that init restarts it
# kill `pgrep monit` ; ps cawux | grep 'monit'
root     13661  0.0  0.0  57432  1780 ?        Ssl  13:31   0:00 monit

  • show status
# monit status
Process 'ntpd'
  status                            Running
  monitoring status                 Monitored
  pid                               32307
  parent pid                        1
  uptime                            12d 17h 44m 
  children                          0
  memory kilobytes                  5040
  memory kilobytes total            5040
  memory percent                    0.2%
  memory percent total              0.2%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sun, 13 May 2012 12:34:35

System 'system_ha-mgr02.forschooner.net'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.09] [0.20] [0.14]
  cpu                               1.6%us 3.2%sy 0.3%wa
  memory usage                      672540 kB [32.6%]
  swap usage                        120 kB [0.0%]
  data collected                    Sun, 13 May 2012 12:32:35
  • show summary 
# monit summary
The Monit daemon 5.3.2 uptime: 58m 

Process 'sshd'                      Running
Process 'ntpd'                      Running
System 'system_ha-mgr02.forschooner.net' Running
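The summary output is easy to check from a script. A minimal sketch of a hypothetical helper, not_running, that prints every service whose status is not Running. It assumes each service line has the form Type 'name' status, and it is run on a copy of the summary above in which the ntpd status has been changed to "Does not exist" for illustration:

```shell
#!/bin/sh
# Print the name of every service whose status is not "Running".
# Splitting on single quotes gives: $1 = type, $2 = name, $3 = status.
not_running() {
    awk -F"'" 'NF >= 3 {
        status = $3
        gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim the padding around the status
        if (status != "Running") print $2
    }'
}

sample_file=$(mktemp)
cat > "$sample_file" << 'EOF'
The Monit daemon 5.3.2 uptime: 58m

Process 'sshd'                      Running
Process 'ntpd'                      Does not exist
System 'system_ha-mgr02.forschooner.net' Running
EOF

result=$(not_running < "$sample_file")
echo "$result"    # prints: ntpd
```

On a live system the same function reads the real output: monit summary | not_running.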

Start up from Upstart

RHEL 6.x and CentOS 6.x use Upstart instead of SysV init (on those releases /etc/inittab is only used for the default runlevel), so monit has to be started from Upstart rather than from /etc/inittab.
  • set up /etc/init/monit.conf
# monit_bin=$(which monit)
# cat > /etc/init/monit.conf << EOF
# monit respawn
description     "Monit"

start on runlevel [2345]
stop on runlevel [!2345]
 
respawn
exec $monit_bin -Ic /etc/monitrc
EOF
  • show a list of the known jobs and instances
# initctl list
 rc stop/waiting
 tty (/dev/tty3) start/running, process 1249
 ...
 monit stop/waiting
 serial (hvc0) start/running, process 1239
 rcS-sulogin stop/waiting
  • start monit
# initctl start monit
 monit start/running, process 6873
  • see the status of the job (monit)
# initctl status monit
 monit start/running, process 6873
  • kill the monit process
# kill `pgrep monit`
  • check that Upstart restarts monit
# ps cawux | grep monit
 root      7140  0.0  0.1   7004  1840 ?        Ss   21:42   0:00 monit
  • check the log to confirm that monit was respawned
# tail -1 /var/log/messages
 Oct 20 12:42:41 ip-10-171-47-212 init: monit main process ended, respawning
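Unlike the control file, the Upstart job is written with an unquoted delimiter on purpose: the shell replaces $monit_bin with the path found on PATH, so the job file contains an absolute path. The closing EOF must stand alone on its line with no trailing spaces, otherwise the here-document never ends. A minimal sketch that writes the job to a temporary file and checks its exec line; it points at /etc/monitrc, the control file written in the Configuration section, and the fallback /usr/bin/monit (used when monit is not installed, e.g. on a test machine) is an assumption taken from the inittab entry.

```shell
#!/bin/sh
# Use the installed monit if there is one; otherwise assume /usr/bin/monit.
monit_bin=$(command -v monit || echo /usr/bin/monit)
job=$(mktemp)

# Unquoted EOF: $monit_bin is expanded now, while the file is written.
cat > "$job" << EOF
description     "Monit"

start on runlevel [2345]
stop on runlevel [!2345]

respawn
exec $monit_bin -Ic /etc/monitrc
EOF

# The exec line should contain an absolute path, not the variable name.
exec_line=$(grep '^exec ' "$job")
echo "$exec_line"
```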

Verification

  • access the Monit service manager (http://IP Address:2812, user admin and password monit as set by "allow admin:monit" in /etc/monitrc)

  • check that monit starts the ntp daemon again when it stops
# /etc/init.d/ntpd status
ntpd (pid  32307) is running...
# /etc/init.d/ntpd stop  
Shutting down ntpd:                                        [  OK  ]
  • check the log to confirm that monit started ntpd
# cat /var/log/monit/monit.log
[JST May 13 12:52:24] error    : 'ntpd' process is not running
[JST May 13 12:52:24] info     : 'ntpd' trying to restart
[JST May 13 12:52:24] info     : 'ntpd' start: /etc/init.d/ntpd
  • check that ntpd is running again
# /etc/init.d/ntpd status
ntpd (pid  9475) is running...
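The monit log also shows how often a service had to be restarted. A minimal sketch of a hypothetical helper, restart_counts, that counts the "trying to restart" lines per service; it assumes the log line format shown above and is run here on that excerpt (on a real system, pass /var/log/monit/monit.log):

```shell
#!/bin/sh
# Count "trying to restart" events per service in a monit log file.
# Assumes lines like: [JST May 13 12:52:24] info     : 'ntpd' trying to restart
restart_counts() {
    awk -F"'" '/trying to restart/ { n[$2]++ } END { for (s in n) print s, n[s] }' "$1"
}

log=$(mktemp)
cat > "$log" << 'EOF'
[JST May 13 12:52:24] error    : 'ntpd' process is not running
[JST May 13 12:52:24] info     : 'ntpd' trying to restart
[JST May 13 12:52:24] info     : 'ntpd' start: /etc/init.d/ntpd
EOF

restart_counts "$log"    # prints: ntpd 1
```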

Sample alert mails

The following are examples of the alert mails monit sends.
  • notifying that the daemon has stopped
<Subject>
Monit Alert -- ntpd Does not exist --
<Body>
Hostname:       ha-mgr02.forschooner.net
Service:        ntpd
Action:         restart
Date/Time:      Sun, 13 May 2012 12:52:24
Info:           process is not running 
  • notifying that the start action is done
<Subject>
Monit Alert -- ntpd Action done --
<Body>
Hostname:       ha-mgr02.forschooner.net
Service:        ntpd
Action:         alert
Date/Time:      Sun, 13 May 2012 12:54:15
Info:           start action done 
  • notifying that the daemon is running again
<Subject>
Monit Alert -- ntpd Exists --
<Body>
Hostname:       ha-mgr02.forschooner.net
Service:        ntpd
Action:         alert
Date/Time:      Sun, 13 May 2012 12:54:15
Info:           process is running with pid 9475
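Because the mail-format in /etc/monitrc builds the subject as "Monit Alert -- $SERVICE $EVENT --", a mail filter can split it back into service and event. A minimal sketch of a hypothetical helper, parse_subject, using only POSIX parameter expansion; it assumes service names contain no spaces, which holds for the services in this post:

```shell
#!/bin/sh
# Split "Monit Alert -- <service> <event> --" into "<service>|<event>".
parse_subject() {
    rest=${1#Monit Alert -- }    # drop the fixed prefix
    rest=${rest% --}             # drop the fixed suffix
    service=${rest%% *}          # first word is the service name
    event=${rest#* }             # everything after it is the event text
    printf '%s|%s\n' "$service" "$event"
}

parse_subject 'Monit Alert -- ntpd Does not exist --'    # prints: ntpd|Does not exist
parse_subject 'Monit Alert -- ntpd Action done --'       # prints: ntpd|Action done
parse_subject 'Monit Alert -- ntpd Exists --'            # prints: ntpd|Exists
```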








