Columbia University Cricket Package




Introduction
------------

The Columbia University Cricket Package includes enhancements to the
cricket scripts, new data collection scripts for Unix hosts, and the
configuration files used here.  You can examine Columbia's cricket
graphs interactively to see what is produced by these scripts and
config files:

   http://cricket.cc.columbia.edu/

At Columbia we use cricket version 1.0.5, so the modified cricket
scripts included in this package are based on that version.  Also
included in this package are other scripts and programs (unixhost,
cricketbat, cricketeer, mailsent, cricketweb, realmem) that were
developed at Columbia for use with Cricket.

At Columbia the data collection scripts required by each host are
installed in /opt/local/sbin.  You can install them somewhere else, but
then you'll need to change the references in the scripts that call
them.  These are the data collection scripts:

cricketbat      daemon that collects system info
cricketweb      reads web server logs
mailsent        counts email messages sent in the past 60 min
realmem         shows physical memory installed and memory in use
unixhost        called by snmpd to get system info

Overview
--------

The collect-subtrees script has been renamed cricketd and has been
modified to behave as a daemon instead of a cron job.  It collects the
data from each host using snmp requests at 5-minute intervals.  The
main advantage of the daemon approach is that if a subtree takes
longer than 5 minutes to process, there won't be multiple copies of the
script running.  Another advantage is that you can shut down cricketd
while you are making changes, and it won't start up again until you
explicitly start it.  We also added some features that are controlled
by parameters in the subtree-sets file (cricketd.conf); for example,
the polling interval for a set can be changed from the default
(5 minutes) to something else.
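The following minimal Python sketch shows why a daemon loop avoids overlapping runs. It is not the actual cricketd code. Each collection runs to completion before the next one is scheduled. When a cycle overruns the interval, the next cycle starts immediately instead of a second copy starting in parallel. The `collect_set` callable is a hypothetical stand-in for polling one set of subtrees. The `interval` argument plays the role of the per-set polling interval.

```python
import time
from typing import Callable, Optional


def run_daemon(collect_set: Callable[[], None],
               interval: float = 300.0,
               cycles: Optional[int] = None,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Run collect_set() every `interval` seconds, one cycle at a time.

    `cycles` limits the number of runs (useful for testing); a real
    daemon would pass None and loop until it is shut down.
    """
    done = 0
    next_start = clock()
    while cycles is None or done < cycles:
        collect_set()            # blocks until this whole set is polled
        done += 1
        next_start += interval
        now = clock()
        if now < next_start:
            sleep(next_start - now)   # finished early: wait for the next slot
        else:
            # Overran the interval: start the next cycle immediately
            # instead of launching a second, overlapping copy.
            next_start = now
    return done
```

A set configured with a 150-second interval would correspond to something like `run_daemon(poll_web_set, interval=150)`, where `poll_web_set` is hypothetical.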

The subtree-sets file has been renamed cricketd.conf, and it is
generated daily by cricket_maker.pl.  cricket_maker.pl reads cluster
information from Columbia's cluster database (hosts.ph), which is not
included in this package, and it calls listInterfaces to get
information about router interfaces.

cricketd calls collector which makes snmp requests to individual
hosts.  The snmpd daemon on each host calls unixhost to satisfy each
request.  Some requests can be satisfied quickly (load average,
memory, swap), and unixhost calls the appropriate program to get the
answer (uptime, realmem, swap).  For all other requests unixhost gets
the answer from the /tmp/cricketbat.hostname file, which is periodically
updated by cricketbat on each host.  This means unixhost can respond
quickly since it doesn't have to perform time-consuming tasks such as
counting web hits or email messages sent.  A cron job called
cricketbat_restart will automatically start up cricketbat on each
monitored host if it's not currently running.
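The split between cheap and expensive requests can be illustrated with the Python sketch below. This is not the actual unixhost script. For illustration only, it assumes that the cricketbat tmp file holds one "OID value" pair per line; the real file format is not documented here. The `fast` table of live handlers is also hypothetical.

```python
from typing import Callable, Dict, Optional


def parse_cache(text: str) -> Dict[str, str]:
    """Parse cached lines of the (assumed) form 'OID value' into a dict."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            values[parts[0]] = parts[1].strip()
    return values


def answer(oid: str,
           fast: Dict[str, Callable[[], str]],
           cache_path: str) -> Optional[str]:
    """Return the value for `oid`, or None if it is unknown."""
    if oid in fast:
        return fast[oid]()       # cheap value: compute it on the spot
    try:
        with open(cache_path) as f:   # expensive value: use cricketbat's copy
            return parse_cache(f.read()).get(oid)
    except FileNotFoundError:
        return None              # cricketbat has not written the file yet
```

Because the expensive values are only read from a file, the time unixhost spends per request stays small no matter how long cricketbat's own collection takes.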

This section shows the communication paths between programs.  cricketd
calls collector every 5 minutes, and collector makes SNMP requests to
the monitored hosts.  On each host the snmpd daemon calls unixhost to
get the requested information.

cricketd and collector run on the cricket host:

  cricketd -> collector -> (snmp requests)

The other modules run on the Unix hosts being monitored:

  (snmp requests) -> snmpd -> unixhost -> cricketbat

unixhost calls these programs:
  /bin/hostname
  /usr/bin/uptime
  /usr/bin/swap
  /opt/local/sbin/realmem
  /usr/bin/ps
  and it will restart cricketbat if it's not running

cricketbat calls these programs to collect system info:
  /opt/local/bin/zcat
  /opt/local/bin/tail
  /usr/bin/uptime
  /opt/local/bin/mailsent
  /opt/local/bin/cricketweb
  /opt/local/bin/wget


Detailed Descriptions
---------------------

collector

The collector script is part of the cricket distribution and it
was modified locally as follows.

cricketbat

This script should be installed on all Unix hosts that you're
interested in polling.  We use this script on both Solaris and Linux
hosts.  It can be installed in /opt/local/sbin/ or some other
directory.  This daemon runs on each Unix host to collect system
information at 5-minute intervals and write it to a tmp file.  This
allows snmpd to respond to requests quickly, since unixhost (called by
snmpd) can simply read the data from the tmp file, so the requests
don't time out.  cricketbat calls other programs and scripts to
collect specific data: cricketweb, mailsent, realmem.  It is started
automatically by unixhost if it's not already running, so there is no
need to start it up manually unless you want to.

Web server info is only collected on web server hosts.  You can ignore
the subroutines that count cubmail sessions and dialup sessions, since
they are specific to Columbia and probably don't apply to your site.
If you're using the Apache web server and you want cricketweb to count
active web server processes, you should enable /server-status requests
in your web server config file.  In order to count the number of hits
and bytes delivered, the cricketbat daemon needs read access to the web
server logs.

At Columbia cricketbat calls the ourhosts script (not included here)
to get a list of hostnames in a particular cluster.  To make it work on
your system you can simply hardcode the list of hostnames in
cricketbat, for example:

   $wwwhosts = 'alpha beta gamma';
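unixhost may read cricketbat's tmp file at any moment. One common way to keep it from reading a half-written file is to write a temporary file in the same directory and then rename it into place. The Python sketch below assumes this technique. The package description does not say whether cricketbat does it this way. The sketch also assumes a hypothetical format of one "OID value" pair per line.

```python
import os
import tempfile
from typing import Dict


def write_cache(path: str, values: Dict[str, str]) -> None:
    """Atomically replace `path` with one 'OID value' line per entry."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem) so
    # that os.replace() is an atomic rename on POSIX systems.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".cricketbat.")
    try:
        with os.fdopen(fd, "w") as f:
            for oid, value in values.items():
                f.write(f"{oid} {value}\n")
        os.chmod(tmp, 0o644)     # mkstemp uses 0600; let snmpd's user read it
        os.replace(tmp, path)    # readers see the old file or the new one
    except BaseException:
        os.unlink(tmp)
        raise
```

For example, `write_cache("/tmp/cricketbat." + socket.gethostname(), values)` would follow the /tmp/cricketbat.hostname naming used elsewhere in this document.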

cricketd.conf

This file is installed in ~cricket/cricket/subtree-sets and is read by
cricketd.  It contains a list of sets and the cricket subtrees that
are in each set.  Our modified cricketd allows you to specify special
arguments after the set name: "noblacklist" and "interval=".  These
are local enhancements.  You can include the word "noblacklist" after
a set name to disable blacklisting for that set.  Blacklisting is
described in the collector section, above.  You can include
"interval=150sec" after a set name to specify the polling interval to
use instead of the default polling interval, which is 300 secs.  Each
set will get a separate cricketd process as defined in cricketd-start.
In this example we have one subtree for each set, but you can have
multiple subtrees in a set if you prefer.  When you change the list of
sets you should also change the cricket-start file so that the number
of cricketd daemons matches the number of sets in the subtree-sets
file.

cricketd_init

This script is used to start/stop the cricketd daemon.  If necessary
it also kills any (collector) subprocesses that are active when
cricketd is shut down.  This script should be installed on the cricket
polling host as /etc/init.d/cricketd with a symlink (or hard link)
from /etc/rc3.d/ so that cricketd starts automatically when the system
starts up, for example:

   $ ln -s /etc/init.d/cricketd /etc/rc3.d/S84cricketd
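This document does not say how the init script finds and kills the collector subprocesses. One common approach is to signal the daemon's whole process group. The Python sketch below assumes cricketd was started as the leader of its own process group (for example via setsid), so that the collector children inherit that group. It falls back to signalling only the daemon itself, so that it never signals the caller's own group.

```python
import os
import signal


def stop_daemon(pid: int, sig: int = signal.SIGTERM) -> None:
    """Signal a daemon and, if it leads its own group, all its children."""
    try:
        pgid = os.getpgid(pid)
        if pgid == pid:
            os.killpg(pgid, sig)   # daemon leads its group: children go too
        else:
            os.kill(pid, sig)      # don't risk signalling our own group
    except ProcessLookupError:
        pass                       # the daemon has already exited
```

In practice the pid would come from a pid file or a ps listing. This document does not say which one cricketd_init uses.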

cricketd_restart

This script can be installed in the cricket home directory or in
/opt/local/sbin/.  After changing the config files the cricket
administrator can run this script as root to shut down the cricketd
daemons, recompile the config, and restart the daemons.

cricketeer

This CGI script can be used instead of grapher.cgi to display cricket
graphs.  It lets the user see all possible views and all possible
targets at the same time.  You are not limited to the views available
in a particular node of the config tree.  For an example of how it is
used, see
http://cricket.cc.columbia.edu/cgi-bin/cricket/cricketeer

cricketweb
cricketweb.c

This program can be installed in /opt/local/sbin/.  It will be called
by cricketbat to obtain information about the web server(s) on the
current host.  It reads the web server log to get the number of hits
in the past 10 minutes.  Cricket then divides by 10 to get requests
per minute.  cricketweb also counts bytes delivered in the past 10
minutes and converts the total to bits per second.  This is much more
accurate than parsing the results of a /server-status request, which
reports the hit count and byte count as smoothed results (averaged over
time).  cricketweb also counts the number of errors (failed requests)
and the number of requests from browsers in the columbia.edu domain, so
these can be graphed as well.  cricketweb doesn't count requests for
/server-status in the totals since those requests come from
cricketbat.
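The counting can be sketched in Python as follows. This is not cricketweb.c. It makes several assumptions. The log is in Apache's Common (or Combined) Log Format. HostnameLookups is on, so client hosts appear as names. A "failed request" is taken to mean a status code of 400 or higher. The sketch returns raw hit counts (left for Cricket to divide by 10) and bits per second over the 10-minute window.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable

# Common Log Format; the Combined format just appends fields, so the same
# pattern matches its prefix.
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)
WINDOW = timedelta(minutes=10)


def web_stats(lines: Iterable[str], now: datetime,
              domain: str = ".columbia.edu") -> Dict[str, float]:
    """Hit counts and bits/sec for log entries in the 10 minutes before `now`."""
    hits = cuhits = errors = nbytes = cubytes = 0
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue                      # malformed or unexpected line
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        if not (now - WINDOW < when <= now):
            continue
        if m["path"].startswith("/server-status"):
            continue                      # cricketbat's own polling
        size = 0 if m["bytes"] == "-" else int(m["bytes"])
        hits += 1
        nbytes += size
        if m["host"].endswith(domain):    # needs HostnameLookups on
            cuhits += 1
            cubytes += size
        if int(m["status"]) >= 400:       # assumed definition of "failed"
            errors += 1
    secs = WINDOW.total_seconds()
    return {"tothits": hits, "cuhits": cuhits, "errors": errors,
            "totbits": nbytes * 8 / secs, "cubits": cubytes * 8 / secs}
```

`now` must be timezone-aware, because the log timestamps carry a UTC offset.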

grapher.cgi

This CGI script is part of the standard cricket distribution.  Our
modified version of this script displays the current values for a
group of hosts side by side, so you don't need to display graphs for
all targets to compare them.  For an example, see
http://cricket.cc.columbia.edu/cgi-bin/cricket/grapher.cgi

mailsent
mailsent.c

This program can be installed in /opt/local/sbin/.  It will be called
by cricketbat to count the number of mail messages sent by this host
in the past 60 minutes.  It does this by parsing the /var/log/syslog
file and counting the lines containing "stat=Sent" with the date and
time in the past 60 minutes.  If the current time is between 4am and
5am, it also reads the previous version of syslog in
/var/log/syslog.1.gz, since the log has recently been rotated.
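The counting logic can be sketched in Python as follows. This is not mailsent.c. It assumes standard syslog lines that begin with a "Mon DD HH:MM:SS" timestamp. Because that timestamp has no year, the sketch assumes the current year and steps back one year for entries that would otherwise lie in the future. The 4am-5am rule for also reading /var/log/syslog.1.gz follows the description above.

```python
import gzip
from datetime import datetime, timedelta
from typing import Iterable, List


def count_sent(lines: Iterable[str], now: datetime,
               window: timedelta = timedelta(minutes=60)) -> int:
    """Count sendmail 'stat=Sent' lines logged within `window` of `now`."""
    count = 0
    for line in lines:
        if "stat=Sent" not in line:
            continue
        try:
            # Syslog timestamps ("Jul 15 04:10:00") carry no year.
            stamp = datetime.strptime(f"{now.year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue                     # not a standard syslog line
        if stamp > now:
            # e.g. a Dec 31 entry read just after midnight on Jan 1
            stamp = stamp.replace(year=now.year - 1)
        if now - window < stamp <= now:
            count += 1
    return count


def logs_to_read(now: datetime,
                 current: str = "/var/log/syslog",
                 previous: str = "/var/log/syslog.1.gz") -> List[str]:
    """Between 4am and 5am the past hour may span the rotated log too."""
    return [previous, current] if now.hour == 4 else [current]


def read_log(path: str) -> List[str]:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return f.read().splitlines()
```

A full run would compute `sum(count_sent(read_log(p), now) for p in logs_to_read(now))`.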

netapp-oids.html

This file was downloaded from www.netapp.com and contains some useful
information about OIDs used by Network Appliance filers.

netapp.mib

This file was copied from /root/etc/mib/netapp.mib on one of our
Network Appliance filers.  It contains a detailed list of values that
can be obtained from the filer using snmp requests.

realmem
realmem.c

This program can be installed in /opt/local/sbin/.  It will be called
by unixhost to determine the amount of physical memory (real memory)
installed on the current host and the amount of memory in use.
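The same two numbers can be approximated from Python with sysconf. This is only a sketch and not realmem.c. It assumes a platform that provides `SC_PHYS_PAGES` and `SC_AVPHYS_PAGES`; Linux and Solaris do, but some other systems do not. On Linux, "available" pages exclude the page cache, so "in use" here includes cached file data. realmem may define memory in use differently. The result is in kilobytes and ordered as in the snmpwalk description "real memory in use, total memory kb".

```python
import os
from typing import Tuple


def real_memory_kb() -> Tuple[int, int]:
    """Return (memory in use, total physical memory), both in kilobytes."""
    page = os.sysconf("SC_PAGE_SIZE")
    total = os.sysconf("SC_PHYS_PAGES")
    free = os.sysconf("SC_AVPHYS_PAGES")   # pages not currently in use
    return (total - free) * page // 1024, total * page // 1024
```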

snmpd.conf

Add the "pass" line shown in the Columbia's Local OIDs section, below,
to the snmpd.conf on your Solaris/Linux hosts so that unixhost is
called when the hosts are polled by collector.

unixhost

The unixhost script is called by snmpd to satisfy requests in our
branch of the MIB tree.  Your site's snmpd.conf file must contain a
line that directs those snmp requests to the unixhost script.  Some
requests are handled quickly by unixhost itself, and the answers to the
more time-consuming requests are read from the tmp file written by
cricketbat.  The MIB includes an OID for restarting the cricketbat
daemon.  unixhost is always called with two command line arguments:
the mode and the OID.  For example:

   /opt/local/sbin/unixhost -n .1.3.6.1.4.1.2021.255.3.1
   /opt/local/sbin/unixhost -p .1.3.6.1.4.1.2021.255.3.1

The "-n" option is used to return the next OID in the tree, which is
used by snmpwalk to traverse the tree.  The "-p" option (or any other
letter besides "n") will return the description and current value
associated with the given OID.  Use snmpwalk to check that all values
in your MIB tree are being returned properly.
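The "next OID" lookup must compare OIDs component by component as numbers. A plain string sort would put .255.10.1 before .255.9.7. The Python sketch below is not the actual unixhost script. It uses a hypothetical in-memory `table` of values. It returns the three lines that, as we understand Net-SNMP's pass convention, the script prints: the OID, its type, and its value. Any mode other than "-n" is treated as a plain lookup, matching the behaviour described above.

```python
from typing import Dict, List, Optional, Tuple

Value = Tuple[str, str]  # (snmp type, value), e.g. ("gauge", "46")


def oid_key(oid: str) -> Tuple[int, ...]:
    """Numeric sort key so that .255.10.1 sorts after .255.9.7."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def lookup(mode: str, oid: str, table: Dict[str, Value]) -> Optional[List[str]]:
    if mode == "-n":
        target = oid_key(oid)
        later = sorted((k for k in table if oid_key(k) > target), key=oid_key)
        if not later:
            return None          # end of our subtree: print nothing
        oid = later[0]
    elif oid not in table:
        return None
    snmp_type, value = table[oid]
    return [oid, snmp_type, value]
```

A caller would print `"\n".join(result)` when the result is not None. Asking for the next OID after the subtree root itself returns the first entry, which is how snmpwalk starts its traversal.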


Sample Output From snmpwalk
---------------------------

$ snmpwalk localhost yourcommunitystring .1.3.6.1.4.1.2021.255    
enterprises.ucdavis.255.1 = "UNIX Host Metrics"
enterprises.ucdavis.255.2.1 = "number of users"
enterprises.ucdavis.255.2.2 = Gauge32: 46
enterprises.ucdavis.255.3.1 = "swap space in use, total swap kb"
enterprises.ucdavis.255.3.2 = Gauge32: 246456
enterprises.ucdavis.255.3.3 = Gauge32: 930008
enterprises.ucdavis.255.4.1 = "real memory in use, total memory kb"
enterprises.ucdavis.255.4.2 = Gauge32: 452448
enterprises.ucdavis.255.4.3 = Gauge32: 524288
enterprises.ucdavis.255.5.1 = "mail messages queued"
enterprises.ucdavis.255.5.2 = Gauge32: 1
enterprises.ucdavis.255.6.1 = "mail messages sent per hour"
enterprises.ucdavis.255.6.2 = Gauge32: 71
enterprises.ucdavis.255.7.1 = "process count: total imap pine sendmail procmail postgres"
enterprises.ucdavis.255.7.2 = Gauge32: 198
enterprises.ucdavis.255.7.3 = Gauge32: 0
enterprises.ucdavis.255.7.4 = Gauge32: 9
enterprises.ucdavis.255.7.5 = Gauge32: 2
enterprises.ucdavis.255.7.6 = Gauge32: 0
enterprises.ucdavis.255.7.7 = Gauge32: 0
enterprises.ucdavis.255.8.1 = "web server tothits, cuhits, errors, totbits, cubits, procs"
enterprises.ucdavis.255.8.2 = Gauge32: 0
enterprises.ucdavis.255.8.3 = Gauge32: 0
enterprises.ucdavis.255.8.4 = Gauge32: 0
enterprises.ucdavis.255.8.5 = Gauge32: 0
enterprises.ucdavis.255.8.6 = Gauge32: 0
enterprises.ucdavis.255.8.7 = Gauge32: 0
enterprises.ucdavis.255.9.1 = "sec web server tothits, cuhits, errors, totbits, cubits, procs"
enterprises.ucdavis.255.9.2 = Gauge32: 0
enterprises.ucdavis.255.9.3 = Gauge32: 0
enterprises.ucdavis.255.9.4 = Gauge32: 0
enterprises.ucdavis.255.9.5 = Gauge32: 0
enterprises.ucdavis.255.9.6 = Gauge32: 0
enterprises.ucdavis.255.9.7 = Gauge32: 0
enterprises.ucdavis.255.10.1 = "cubmail sessions"
enterprises.ucdavis.255.10.2 = Gauge32: 0
enterprises.ucdavis.255.11.1 = "express, staff, and main dialups in use"
enterprises.ucdavis.255.11.2 = Gauge32: 0
enterprises.ucdavis.255.11.3 = Gauge32: 0
enterprises.ucdavis.255.11.4 = Gauge32: 0
enterprises.ucdavis.255.12.1 = "nfs retransmissions and timeouts"
enterprises.ucdavis.255.12.2 = Gauge32: 0
enterprises.ucdavis.255.12.3 = Gauge32: 0



Columbia's Local OIDs
---------------------

Columbia allocated a portion of the MIB tree for its own use: the UC
Davis (ucdavis) enterprise subtree, .1.3.6.1.4.1.2021, with ".255"
appended.  We added the following line to the end of our
/etc/snmpd.conf:

   pass .1.3.6.1.4.1.2021.255 /bin/sh /opt/local/sbin/unixhost

When snmpd gets a request for one of these local OIDs, it calls
unixhost, which either computes the value itself or reads the value
from /tmp/cricketbat.hostname.  The cricketbat script runs on each
host at 5-minute intervals and stores the results in this tmp file.

# these are computed by unixhost

OID	numusers		1.3.6.1.4.1.2021.255.2.2
OID	swapinuse		1.3.6.1.4.1.2021.255.3.2
OID	totalswap		1.3.6.1.4.1.2021.255.3.3
OID	meminuse		1.3.6.1.4.1.2021.255.4.2
OID	totalmem		1.3.6.1.4.1.2021.255.4.3
OID	mqueued			1.3.6.1.4.1.2021.255.5.2

# the more time-consuming values are computed by cricketbat

OID	msent			1.3.6.1.4.1.2021.255.6.2
OID	totalprocs		1.3.6.1.4.1.2021.255.7.2
OID	imapprocs		1.3.6.1.4.1.2021.255.7.3
OID	pineprocs		1.3.6.1.4.1.2021.255.7.4
OID	sendprocs		1.3.6.1.4.1.2021.255.7.5
OID	procprocs		1.3.6.1.4.1.2021.255.7.6
OID	pgresprocs		1.3.6.1.4.1.2021.255.7.7

OID	httptotrpm		1.3.6.1.4.1.2021.255.8.2
OID	httpcurpm		1.3.6.1.4.1.2021.255.8.3
OID	httpfailrpm		1.3.6.1.4.1.2021.255.8.4
OID	httptotbps		1.3.6.1.4.1.2021.255.8.5
OID	httpcubps		1.3.6.1.4.1.2021.255.8.6
OID	httpprocs		1.3.6.1.4.1.2021.255.8.7

OID	httpstotrpm		1.3.6.1.4.1.2021.255.9.2
OID	httpscurpm		1.3.6.1.4.1.2021.255.9.3
OID	httpsfailrpm		1.3.6.1.4.1.2021.255.9.4
OID	httpstotbps		1.3.6.1.4.1.2021.255.9.5
OID	httpscubps		1.3.6.1.4.1.2021.255.9.6
OID	httpsprocs		1.3.6.1.4.1.2021.255.9.7

OID	cubmailsess		1.3.6.1.4.1.2021.255.10.2
OID	expressdial		1.3.6.1.4.1.2021.255.11.2
OID	staffdial		1.3.6.1.4.1.2021.255.11.3
OID	nfsretrans		1.3.6.1.4.1.2021.255.12.2
OID	nfstimeout		1.3.6.1.4.1.2021.255.12.3

# this OID will start cricketbat unless it's already running

OID	cricketbat		1.3.6.1.4.1.2021.255.255.2



Copyright Notice
----------------

Copyright by the Trustees of Columbia University in the City of New
York.  The locally written scripts and documents in the Columbia
University Cricket Package are protected by copyright owned in whole
or in principal part by The Trustees of Columbia University in the
City of New York ("Columbia"). You may download this package for
reference and research purposes only.

COLUMBIA MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED,
WITH RESPECT TO THIS PACKAGE, OR ANY PART THEREOF, INCLUDING ANY
WARRANTIES OF TITLE, NONINFRINGEMENT OF COPYRIGHT OR PATENT RIGHTS OF
OTHERS, MERCHANTABILITY, OR FITNESS OR SUITABILITY FOR ANY PURPOSE.

Distribution and/or alteration by not-for-profit research or
educational institutions for their local use is permitted as long as
this notice is kept intact and attached to the document.  Any other
distribution of copies of these documents or any altered version
thereof is expressly prohibited without prior written consent of
Columbia.


Contact Info
------------

Comments and questions regarding the Columbia University Cricket
Package should be addressed to cricket@columbia.edu

Ben Beecher				July 15, 2005
Unix Systems Group			http://www.columbia.edu/acis/sy/
Academic Information Systems
Columbia University