Trying to set up SSH for Nagios on Ubuntu is quite maddening. So, if you are familiar with ssh and ssh-keygen and think that using check_by_ssh will therefore be a breeze, think again. Use NRPE instead.
NRPE runs the standard Nagios plugins on remote machines, which is what you need to monitor things like load and process counts on the hosts of, say, a small home network.
Assume that localhost runs a base Nagios monitoring server that is already up and running, and that remotehost is a machine without Nagios installed whose load and process count we want to monitor.
Here are the simple steps to get NRPE up and running on Ubuntu or Fedora:
On remotehost, run:
sudo apt-get install nagios-nrpe-server    (Ubuntu)
su -c "yum install nrpe.i386 nagios-plugins-procs.i386" (Fedora FC7)
To find the appropriate package names, search on remotehost:
apt-cache search nagios    (Ubuntu)
yum search nagios   (Fedora)
Make sure all required plugins (check_load, check_procs, etc.) are also installed on remotehost, in /usr/lib/nagios/plugins/ or the equivalent folder.
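Next, edit the config file /etc/nagios/nrpe.cfg and replace ... with the IP address of localhost, the machine running the Nagios monitoring server. Do not put spaces after the = or between the addresses; as the comments on this post show, a stray space there can break the connection:
/etc/nagios/nrpe.cfg:
allowed_hosts=127.0.0.1,...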
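Since a stray space in allowed_hosts is so easy to introduce, here is a minimal sketch of a helper that prints a config file with all whitespace stripped from that one line. The function name normalize_allowed_hosts is made up for illustration, and 192.0.2.10 in the usage note is only a placeholder address, not a value from this setup.

```shell
# Hypothetical helper: print an nrpe.cfg-style file with whitespace removed
# from the allowed_hosts line. It writes to stdout and does not modify the file.
# Usage: normalize_allowed_hosts /etc/nagios/nrpe.cfg
normalize_allowed_hosts() {
    sed -E '/^[[:space:]]*allowed_hosts[[:space:]]*=/ s/[[:space:]]+//g' "$1"
}
```

For example, a line such as "allowed_hosts= 127.0.0.1, 192.0.2.10" would be printed as "allowed_hosts=127.0.0.1,192.0.2.10". Review the output before copying it over the real config.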
In the same file, check the predefined commands, such as check_load and check_total_procs; those might be sufficient for many uses.
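To see which commands are already defined, something like the following sketch can help. It relies on the command[name]=command-line syntax that nrpe.cfg uses; the function name list_nrpe_commands is hypothetical.

```shell
# Hypothetical helper: list the active command[...] definitions in an
# nrpe.cfg-style file as "name: command line". Commented-out lines are skipped.
# Usage: list_nrpe_commands /etc/nagios/nrpe.cfg
list_nrpe_commands() {
    sed -n -E 's/^[[:space:]]*command\[([^]]+)\][[:space:]]*=[[:space:]]*(.*)$/\1: \2/p' "$1"
}
```

The names printed on the left are the ones that can be passed to check_nrpe with -c from the monitoring server.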
Finally, restart NRPE on remotehost:
sudo /etc/init.d/nagios-nrpe-server restart   (Ubuntu)
su -c "service nrpe restart"   (Fedora)
Now, back to localhost. Install the nrpe-plugin here:
sudo apt-get install nagios-nrpe-plugin    (Ubuntu)
Test it. From localhost, run:
/usr/lib/nagios/plugins/check_nrpe -H remotehost   (this should output remote NRPE version)
/usr/lib/nagios/plugins/check_nrpe -H remotehost -c check_total_procs   (outputs #procs)
/usr/lib/nagios/plugins/check_nrpe -H remotehost -c check_load   (outputs load)
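When testing by hand it helps to see how Nagios would interpret the result. Plugins report their state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so a small wrapper can print it. This is only a sketch; nagios_status_name and run_check are names invented here, and any exit code other than 0, 1 or 2 is simply shown as UNKNOWN.

```shell
# Map a plugin exit code to the Nagios state name.
nagios_status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is the standard UNKNOWN code; others are treated the same here
    esac
}

# Run a check command, show its output, then print the state Nagios would record.
# The wrapper returns the same exit code as the command it ran.
run_check() {
    rc=0
    "$@" || rc=$?
    echo "status: $(nagios_status_name "$rc")"
    return "$rc"
}
```

Usage would look like: run_check /usr/lib/nagios/plugins/check_nrpe -H remotehost -c check_load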
If this all works, then add the service check to your local nagios.cfg, for example:
define service{
        use                generic-service         ; Name of service template to use
#       hostgroup_name    remoteservers
        host_name           remotehost
        service_description  Current Load
        check_command       check_nrpe_1arg!check_load
        }
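If there are several NRPE commands to watch, the same stanza can be generated instead of typed. The following is a rough sketch, assuming the check_nrpe_1arg command and the generic-service template used above are available on your Nagios server; the function name nrpe_service_stanza is hypothetical, and the sketch uses host_name, the documented directive name for the host.

```shell
# Hypothetical helper: print a service definition that runs one NRPE command.
# Usage: nrpe_service_stanza HOST "DESCRIPTION" NRPE_COMMAND
nrpe_service_stanza() {
    printf 'define service{\n'
    printf '        use                  generic-service\n'
    printf '        host_name            %s\n' "$1"
    printf '        service_description  %s\n' "$2"
    printf '        check_command        check_nrpe_1arg!%s\n' "$3"
    printf '        }\n'
}
```

For example, nrpe_service_stanza remotehost "Total Processes" check_total_procs prints a stanza for the process count check, which can then be pasted into the config next to the load check.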
All done!
Now, what is necessary to set up check_by_ssh on an Ubuntu server? A mess!
The nagios user is installed with no shell and no fixed home directory: the /var/run/nagios3 "home dir" gets wiped on restart.
So, here are all the things that need to be done:
- Install required nagios plugins and nagios common items on remotehost.
- Edit /etc/passwd to allow remote shell access for the nagios user: change /bin/false to /bin/bash. Do this on all machines, since you will need to test the setup before turning it on. It is not strictly necessary on localhost, where a command like "sudo su -l -s /bin/bash nagios" can be used to run ssh commands as the nagios user, but it is necessary on remotehost so that the remote ssh commands can execute.
- On localhost, find a directory to hold the nagios ssh keys. Run ssh-keygen there to create the RSA key pair.
- Copy the public key to an authorized_keys file in a folder on remotehost.
- Figure out a way to create a known_hosts file for SSH. You may have to use SSH options to handle this.
- Become the nagios user on localhost and test that ssh to remotehost works without a password. Note down all the arguments needed to make this work: you will need to provide the location of the key file, the location of known_hosts, and maybe more.
- If that works, then the check_by_ssh command can be used to monitor a service.
- The post "Home Folder for Nagios is wiped on Reboot" discusses some of these issues.
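To give a feel for the arguments involved, here is a sketch that only prints the commands for the key, known_hosts, test and check steps; it does not run anything. The key directory, the use of ssh-keyscan to build known_hosts, the BatchMode option and the check_load thresholds are all assumptions for illustration, not settings from a working setup, and print_ssh_setup_plan is a made-up name. Copying the public key to remotehost is left out because where authorized_keys lives depends on how you solved the home directory problem.

```shell
# Hypothetical helper: print (not run) the commands for the ssh-based setup.
# Usage: print_ssh_setup_plan HOST KEY_DIR
print_ssh_setup_plan() {
    host=$1
    key_dir=$2   # assumed: a directory that survives reboots
    cat <<EOF
ssh-keygen -t rsa -N '' -f $key_dir/id_rsa
ssh-keyscan -t rsa $host >> $key_dir/known_hosts
ssh -i $key_dir/id_rsa -o UserKnownHostsFile=$key_dir/known_hosts -o BatchMode=yes nagios@$host true
/usr/lib/nagios/plugins/check_by_ssh -H $host -l nagios -i $key_dir/id_rsa -o UserKnownHostsFile=$key_dir/known_hosts -C '/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6'
EOF
}
```

Even in sketch form, every command needs the key file and known_hosts location spelled out, which is exactly the bookkeeping NRPE avoids.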
All in all, a very manual and tedious process.
So, the best thing for Ubuntu users is to stick to NRPE with Nagios; it is much easier than setting up check_by_ssh.
 
  
Comments
Thanks a bunch for the post!
Thanks a bunch for the post! I was about to embark on using check_by_ssh after failing to compile NRPE on my Ubuntu box. Your post saved me hours of frustration.
Could not complete SSL handshake
I followed all your steps, but whenever I try to run any of the test commands you have listed, it gives me the following error:
CHECK_NRPE: Error - Could not complete SSL handshake.
Any idea what would cause that?
Otherwise, thanks for the informative post!
Nevermind...
Answered my own question. I had put a space in between the IPs in /etc/nagios/nrpe.cfg which was breaking it. I will now proceed to facepalm:
/facepalm
Great post, thanks! =)
I've had the same problem.
I've had the same problem. Where you see:
allowed_hosts=127.0.0.1
I wrote
allowed_hosts= 127.0.0.1
until I noticed the unnecessary blank before the IP address. I took it out and now it works fine.
Disagree
Sorry, I would have to disagree, mainly because of my mixed environment. check_by_ssh is much easier to use: NRPE failed to compile on a lot of my servers, and I do not have the option of installing extra packages.