Lack of monitoring tools for Linux
I've been looking for a way to monitor a few dozen Linux servers lately, and there just doesn't seem to be a nice integrated tool to do it. In particular, I am looking for something that:
- Pulls various SNMP data from a list of Linux servers
- Stores said data for a user-specifiable amount of time
- Generates useful graphs of said data
- Sends emails out when said data exceeds certain thresholds
- Provides a decent web interface for controlling everything
- Runs under Linux
Undoubtedly, someone will eventually come out with the complete package that satisfies my every desire. Once that happens, I'll just be one step away from having everything I ever wanted from Linux, with better cluster administration tools being my last hurdle.
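To make the threshold requirement in the list above concrete, here is a minimal Python sketch of the "send email when data exceeds a threshold" step. It assumes the SNMP polling has already happened and produced a plain dictionary of values; the host names, metric names, limits, and addresses are all made up for illustration, and the message is only built, not sent.

```python
from email.message import EmailMessage


def find_breaches(samples, thresholds):
    """Return (host, metric, value, limit) for every value above its limit.

    samples:    {host: {metric: value}}  (assumed shape of already-polled data)
    thresholds: {metric: limit}          (metrics without a limit are ignored)
    """
    breaches = []
    for host, metrics in sorted(samples.items()):
        for metric, value in sorted(metrics.items()):
            limit = thresholds.get(metric)
            if limit is not None and value > limit:
                breaches.append((host, metric, value, limit))
    return breaches


def build_alert(breaches, sender, recipient):
    """Build (but do not send) one email summarising all breaches."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(breaches)} threshold breach(es)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "\n".join(f"{h}: {m} = {v} (limit {l})" for h, m, v, l in breaches)
    )
    return msg


# Hypothetical sample data; a real poller would fill this in.
samples = {
    "web1": {"load1": 7.5, "disk_pct": 40},
    "db1": {"load1": 0.3, "disk_pct": 95},
}
thresholds = {"load1": 4.0, "disk_pct": 90}

breaches = find_breaches(samples, thresholds)
alert = build_alert(breaches, "monitor@example.org", "admin@example.org")
```

Actually delivering the message could be done with the standard library's `smtplib.SMTP(...).send_message(alert)`, pointed at whatever mail relay is available.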
20 Comments:
Nagios on a central server will take care of most of this. Webmin on each Linux server will allow easy admin for most items. Nagios with Nagiostat/rrdtool will also do all the monitoring/alerting/metrics for routers, firewalls, switches, AS400, Windows, Unix, etc. in one spot.
I think I have the answer to your prayers. :) I just ran into this neat tool on slashdot.org called Splunk. It does real-time data collection from many different sources and compiles the results into a single "Google-like" interface for users to search through. It's really rather nice:
http://www.splunk.com/
Have a look at BigBrother http://www.bb4.org/
Live site at our university: https://bb.phys.ethz.ch/bb/
I'd say RRDtool is exactly what you describe as "Stores data for a user-specifiable amount of time", since you can configure it just like that (e.g. store full data for 3 months, then store the 1h-average for 2 more years), and it was invented at our university =) http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/
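As a rough illustration of that retention scheme, the Python sketch below works out the archive definitions in RRDtool's `RRA:CF:xff:steps:rows` form. It assumes a 300-second step, 90 days for "3 months" and 730 days for "2 years", none of which come from the comment above. Because every archive in an RRD counts back from the present, the hourly archive is sized to cover the first 3 months plus the 2 further years.

```python
STEP = 300    # assumed seconds between primary data points
DAY = 86400   # seconds per day


def rra(cf, resolution_s, retention_s, step=STEP, xff=0.5):
    """Build an RRA definition string for one archive.

    resolution_s: seconds represented by each stored row
    retention_s:  how far back the archive should reach, in seconds
    """
    if resolution_s % step != 0:
        raise ValueError("resolution must be a multiple of the step")
    steps_per_row = resolution_s // step   # primary points consolidated per row
    rows = retention_s // resolution_s     # rows needed to span the retention
    return f"RRA:{cf}:{xff}:{steps_per_row}:{rows}"


# Full-resolution data for about 3 months (assumed to be 90 days).
full = rra("AVERAGE", STEP, 90 * DAY)

# 1h averages reaching back 3 months + 2 years (assumed to be 90 + 730 days).
hourly = rra("AVERAGE", 3600, (90 + 730) * DAY)

print(full)
print(hourly)
```

The two strings could then be passed as arguments to `rrdtool create`, alongside a data source definition suited to whatever is being measured.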
Have you checked out Argus? It's open source, and has handled everything I've thrown at it thus far. Excellent software. Extremely easy to set up (unlike BB, Nagios, and the like)...
http://argus.tcp4me.com
We use OpenNMS. It takes a lot of configuring by hand, but it has worked well so far.
We use a combination of Nagios for monitoring and alerts, and Cacti for pretty graphs. Not an ideal solution, but it seems to be the best I've found so far.
Just For Fun Network Monitoring System... www.jffnms.org is an awesome, open source SNMP program that lets you set up satellite locations across the internet, and they all feed into your one MySQL database, which is graphed with more information than you can shake a stick at. You can even perform some functions (restart, etc.) on machines you're monitoring.
There is also orca (http://www.orcaware.com), which is good for collecting more detailed information on systems. It's relatively simple, very extensible and multiplatform. The Linux data collector is called procallator.
Hobbit is a (compatible) GPL rewrite of Bigbrother, with the trending features of LARRD built in (i.e. disk, memory, and load graphing work out of the box). Since all configuration is on the server side, setup is also a lot less effort (especially on distros that ship hobbit ...). Adding additional graphs is much easier (edit hobbitgraph.cfg rather than hack up LARRD), and additional data can be collected to rrd files quite easily using the ncv collector.
It does use RRD for most data storage, but that's what most people need for trend analysis.
Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.
http://www.cacti.net/
Take a look at netmrg www.netmrg.net. I use this with some scripts run out of snmp. Seems to work well.
Better late than never: zabbix.com is a nice tool that meets your requirements.
How about Munin? It is targeted towards a single machine, but it can send alerts to Nagios.
Hyperic http://www.hyperic.com/ is my weapon of choice.
Bash, Perl, Python, PHP and Ruby, to name a few: they "do" monitoring and will do whatever you ask them to do. Alert, store 'said' data, poll SNMP data, give you a nice UI.
What they won't or can't do is make coffee for you!
What about Groundwork Open Source? It doesn't add much to Nagios, but it's very easy to set up: http://www.groundworkopensource.com
Look for the free version.
Monitoring Alerts:
Nagios is horrible to manage, but it works so we use it. I'm not happy about it. I'd replace it in a second.
Graphs and trends:
Cacti is crap. It makes things way more complicated than they need to be. We got rid of it. I switched to Munin. Munin is not perfect, but reasonable. I like the plugins. It's SUPER easy to add plugins and graphs to Munin. Configuration and installation were so-so. All these Perl beasts suck when it comes to installation and configuration.
Don't forget the incredibly lightweight "mon".
http://www.kernel.org/pub/software/admin/mon/
or
http://mon.wiki.kernel.org/
aptitude install munin
For those using Nagios, here is a solution that extends Nagios notifications with email, SMS, and voice messages.
If interested, check the website:
http://www.alarmtilt.com/?action=solutions&browse=nagios