Saturday, February 04, 2006

Lack of monitoring tools for Linux

I've been looking for a way to monitor a few dozen Linux servers lately, and there just doesn't seem to be a nice integrated tool to do it. In particular, I am looking for something that:
  • Pulls various SNMP data from a list of Linux servers
  • Stores said data for a user-specifiable amount of time
  • Generates useful graphs of said data
  • Sends emails out when said data exceeds certain thresholds
  • Provides a decent web interface for controlling everything
  • Runs under Linux
Maybe I'm just blind, but there doesn't seem to be anything that can do all of the above. I can accomplish some of it using mon, for example, but then I don't have a decent web interface, data retrieval/storage, or graphing. I can use Cacti, but then I don't have good alerting or data storage (RRD files are "lossy": older samples get consolidated into averages as they age). I would write my own, but then I lose the nice user interface.
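
For a rough idea of what the storage and alerting half of a home-grown tool might involve, here is a minimal Python sketch. It assumes samples have already been collected somehow (the SNMP polling itself is left out), keeps every raw sample in SQLite for a configurable time, and reports hosts whose latest value exceeds a threshold. The table layout and the function names (open_store, record, prune, breaches) are made up for illustration and do not come from any existing tool.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the sample store. The schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(host TEXT, metric TEXT, ts REAL, value REAL)"
    )
    return db


def record(db, host, metric, value, ts=None):
    """Store one raw sample; nothing is averaged away."""
    if ts is None:
        ts = time.time()
    db.execute("INSERT INTO samples VALUES (?, ?, ?, ?)", (host, metric, ts, value))


def prune(db, keep_seconds, now=None):
    """Delete samples older than keep_seconds and return how many were removed."""
    if now is None:
        now = time.time()
    cur = db.execute("DELETE FROM samples WHERE ts < ?", (now - keep_seconds,))
    return cur.rowcount


def breaches(db, thresholds):
    """Return (host, metric, value) where the latest sample exceeds its threshold."""
    latest = {}
    # Walking the rows in time order means the last value seen per key is the newest.
    for host, metric, value in db.execute(
        "SELECT host, metric, value FROM samples ORDER BY ts"
    ):
        latest[(host, metric)] = value
    return [
        (host, metric, value)
        for (host, metric), value in sorted(latest.items())
        if metric in thresholds and value > thresholds[metric]
    ]
```

The list returned by breaches() could then be handed to whatever sends the email. Graphing and the web interface, the parts I would miss most, are not covered here.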

Undoubtedly, someone will eventually come out with the complete package that satisfies my every desire. Once that happens, I'll just be one step away from having everything I ever wanted from Linux, with better cluster administration tools being my last hurdle.

20 comments:

  1. Anonymous, 6:12 AM

    Nagios on a central server will take care of most of this. Webmin on each Linux server will allow easy admin for most items. Nagios with Nagiostat/rrdtool will also do all the monitoring/alerting/metrics for routers, firewalls, switches, AS400, Windows, Unix, etc. in one spot.

  2. Anonymous, 6:50 AM

    I think I have the answer to your prayers. :) I just ran into this neat tool at slashdot.org called Splunk. It does real-time data collection on many different sources and compiles them into a single "Google-like" interface for users to search through. It's really rather nice:

    http://www.splunk.com/

  3. Anonymous, 6:56 AM

    Have a look at BigBrother http://www.bb4.org/
    Live site at our university: https://bb.phys.ethz.ch/bb/

    I'd say RRDtool is exactly what you describe as "Stores data for a user-specifiable amount of time", since you can configure it just like that (e.g., store full data for 3 months, then store the 1h-average for 2 more years). It was also invented at our university =) http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/
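
The retention scheme described here maps onto RRDtool's round-robin archives (RRAs), each defined by how many base steps it consolidates per row and how many rows it keeps. The Python sketch below works out the row counts and assembles an `rrdtool create` argument list. The 300-second step, the data source name `load`, the file name, and the approximation of 3 months as 90 days and 2 years as 730 days are all assumptions for illustration. Because every archive counts back from the present, the hourly archive is sized to span the whole 3 months plus 2 years.

```python
DAY = 86400  # seconds in a day


def rra_rows(step, steps_per_row, span_seconds):
    """Rows an archive needs to cover span_seconds at the given resolution."""
    return span_seconds // (step * steps_per_row)


def create_command(path, step=300):
    """Build an `rrdtool create` argument list for the two-tier retention."""
    steps_per_hour = 3600 // step
    # Tier 1: every sample, for roughly 3 months (assumed 90 days).
    full = rra_rows(step, 1, 90 * DAY)
    # Tier 2: 1-hour averages reaching back 3 months plus 2 more years.
    hourly = rra_rows(step, steps_per_hour, (90 + 730) * DAY)
    return [
        "rrdtool", "create", path,
        "--step", str(step),
        # One gauge data source; the heartbeat is twice the step (an assumption).
        f"DS:load:GAUGE:{2 * step}:0:U",
        f"RRA:AVERAGE:0.5:1:{full}",
        f"RRA:AVERAGE:0.5:{steps_per_hour}:{hourly}",
    ]


if __name__ == "__main__":
    print(" ".join(create_command("load.rrd")))
```

The list can be passed to `subprocess.run` on a machine with rrdtool installed. Changing the retention is a matter of changing the spans, which is the "user-specifiable" part.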

  4. Anonymous, 7:09 AM

    Have you checked out Argus? It's open source, and has handled everything I've thrown at it thus far. Excellent software. Extremely easy to set up (unlike BB, Nagios, and the like)...

    http://argus.tcp4me.com

  5. Anonymous, 7:58 AM

    We use OpenNMS. It takes a lot of configuring by hand, but it has worked well so far.

  6. Anonymous, 8:38 AM

    We use a combination of Nagios for monitoring and alerts, and Cacti for pretty graphs. Not an ideal solution, but it seems to be the best I've found so far.

  7. Anonymous, 8:44 AM

    Just For Fun Network Monitoring System (www.jffnms.org) is an awesome, open source SNMP program that lets you set up satellite locations across the internet, and they all feed into your one MySQL database, which is graphed with more information than you can shake a stick at. You can even perform some functions (restart, etc.) on machines you're monitoring.

  8. There is also Orca (http://www.orcaware.com), which is good for collecting more detailed information on systems. It's relatively simple, very extensible, and multiplatform. The Linux data collector is called procallator.

  9. Anonymous, 9:26 AM

    Hobbit is a (compatible) GPL rewrite of Big Brother, with the trending features of LARRD built in (i.e., disk, memory, and load graphing work out of the box). Since all configuration is on the server side, setup is also a lot less effort (especially on distros that ship Hobbit ...). Adding additional graphs is much easier (edit hobbitgraph.cfg rather than hack up LARRD), and additional data can be collected to RRD files quite easily using the ncv collector.

    It does use RRD for most data storage, but that's what most people need for trend analysis.

  10. Anonymous, 1:45 PM

    Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.
    http://www.cacti.net/

  11. Anonymous, 2:41 PM

    Take a look at NetMRG (www.netmrg.net). I use this with some scripts run out of SNMP. Seems to work well.

  12. Anonymous, 10:44 PM

    Better late than never: zabbix.com is a nice tool that meets your requirements.

  13. How about Munin? It is targeted towards a single machine, but it can send alerts to Nagios.

  14. Anonymous, 11:13 PM

    Hyperic (http://www.hyperic.com/) is my weapon of choice.

  15. Bash, Perl, Python, PHP, and Ruby, to name a few, "do" monitor and will do whatever you ask them to: alert, store 'said' data, poll SNMP data, and give you a nice UI.

    What they won't or can't do is make coffee for you!

  16. Anonymous, 10:17 AM

    What about Groundwork Open Source? It doesn't add much to Nagios, but it's very easy to set up: http://www.groundworkopensource.com
    Look for the free version.

  17. Anonymous, 4:16 PM

    Monitoring Alerts:
    Nagios is horrible to manage, but it works so we use it. I'm not happy about it. I'd replace it in a second.

    Graphs and trends:
    Cacti is crap. It makes things way more complicated than they need to be. We got rid of it. I switched to Munin. Munin is not perfect, but reasonable. I like the plugins. It's SUPER easy to add plugins and graphs to Munin. Configuration and installation were so-so. All these Perl beasts suck when it comes to installation and configuration.

  18. Anonymous, 2:19 PM

    Don't forget the incredibly lightweight "mon".

    http://www.kernel.org/pub/software/admin/mon/
    or
    http://mon.wiki.kernel.org/

  19. Anonymous, 9:25 PM

    aptitude install munin

  20. For those using Nagios, here is a solution that extends Nagios notifications with email, SMS, and voice messages.

    If interested, check the website:
    http://www.alarmtilt.com/?action=solutions&browse=nagios
