Spong is the primary tool SST uses internally to monitor the base level of service for SST-managed and co-managed servers.  Spong alerts SST admins by email and pages on-call staff about issues it detects during predefined monitoring windows.


Monitoring and alerting beyond the default baseline is typically implemented by the customer or group requesting the additional monitoring.

 

By default, Spong monitors the following services, each at two levels of criticality (yellow/warning or red/failure).

Linux/AIX:  ping, CPU load, swap usage, disk usage, errors and warnings in the "messages" log, mail queue size, server management processes, SSH

Windows:  ping, CPU usage, memory (RAM) and pagefile usage, disk usage, errors and warnings in system event logs, select services and/or processes

*Optional network services (besides ping):  simple HTTP(S) (URL), FTP, SMTP, SNMP, DNS, LDAP, NFS, SSH, SQL ping

* May not be available for co-managed servers.  See the Co-managed MOU for details.
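The two criticality levels can be pictured as a pair of thresholds per check. The following is a minimal sketch only: the threshold values, the function name `disk_status`, and the "green" label for a healthy result are assumptions made for illustration, since this article does not state the values Spong actually uses.

```python
# Hypothetical thresholds for illustration; not the values SST configures in Spong.
DISK_WARN_PCT = 90   # assumed yellow/warning threshold
DISK_FAIL_PCT = 95   # assumed red/failure threshold


def disk_status(used_pct, warn=DISK_WARN_PCT, fail=DISK_FAIL_PCT):
    """Map a disk usage percentage to a criticality color."""
    if used_pct >= fail:
        return "red"      # failure
    if used_pct >= warn:
        return "yellow"   # warning
    return "green"        # assumed label for "no issue detected"


if __name__ == "__main__":
    for pct in (50, 92, 99):
        print(pct, disk_status(pct))
```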

Alerting options for Spong are as follows (SST may call the customer contact if a monitoring service alerts):

  1. Standard monitoring: 6am - 6pm, Monday - Friday
  2. Production monitoring: 6am - 12am (midnight), either daily or Monday - Friday
  3. 24x7 monitoring (for critical systems only)

(Alerting is disabled during scheduled maintenance such as OS patching.)
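As a rough illustration of how the alerting options above translate into a time check, the sketch below decides whether alerting would be active at a given moment. It is not Spong's configuration format or code: the `WINDOWS` table, the option names, and the `alerting_active` function are hypothetical, and "12am" is taken to mean midnight at the end of the day.

```python
from datetime import datetime, time

# Hypothetical encoding of the alerting options (not Spong's own config format).
# Each entry: (weekdays covered, start time, end time).
# Weekdays follow datetime.weekday(): Monday=0 ... Sunday=6.
# An end time of None means "through midnight".
WINDOWS = {
    "standard": (range(0, 5), time(6, 0), time(18, 0)),
    "production_weekdays": (range(0, 5), time(6, 0), None),
    "production_daily": (range(0, 7), time(6, 0), None),
    "24x7": (range(0, 7), time(0, 0), None),
}


def alerting_active(option, now, in_maintenance=False):
    """Return True if alerts would be sent at `now` under the given option."""
    if in_maintenance:
        # Alerting is disabled during scheduled maintenance such as OS patching.
        return False
    days, start, end = WINDOWS[option]
    if now.weekday() not in days:
        return False
    current = now.time()
    return current >= start and (end is None or current < end)


if __name__ == "__main__":
    thursday_evening = datetime(2019, 9, 12, 19, 0)
    print(alerting_active("standard", thursday_evening))
    print(alerting_active("production_weekdays", thursday_evening))
```

Under these assumptions, a Thursday 7pm event falls outside the standard window but inside the production window.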

Note: For monitoring coverage outside of 6am - 6pm Monday - Friday, the customer must provide phone numbers for two (2) contacts who can be reached promptly after hours.  The SST on-call admin is expected to respond within 30 minutes, and SST expects a similar response time from the customer.  Any reported issue that requires customer involvement must be dealt with promptly: responses such as "can this wait until the morning or Monday?" are unacceptable and will result in coverage reverting to 6am - 6pm Monday - Friday.  If an issue is not important enough for SST to wake the customer, it is not important enough for Spong to wake SST.

To request that a system be monitored by Spong, or to request changes to how or when a server is monitored, please contact the primary admin for your server or the ITS Help Desk.

Article number: 115986
Last updated: September 12, 2019