Prometheus is the primary tool that Enterprise Infrastructure (EI) uses internally to monitor the baseline level of service for ITS-managed and co-managed systems. When Prometheus detects an issue during a predefined monitoring window, it alerts EI sysadmins by email and pages on-call staff.
By default, Prometheus runs checks at different levels of criticality, including:
- Ping
- CPU
- Disk usage
- Swap usage
- Memory usage
- OS log errors & warnings
- Select services and/or processes
Alert windows, thresholds, and communication target groups vary by operating system and by each system's hosting service level.
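To illustrate how an alert policy could depend on both operating system and hosting service level, here is a minimal Python sketch. It is not EI's actual configuration: the service level names ("premium", "standard"), the percentage thresholds, the window labels, and the notification targets are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertPolicy:
    warn_pct: float  # usage percentage at which a warning is raised
    crit_pct: float  # usage percentage at which a critical alert is raised
    window: str      # monitoring window label (hypothetical)
    notify: str      # communication target group (hypothetical)


# Hypothetical policies keyed by (operating system, hosting service level).
# None of these values come from EI's real Prometheus environment.
POLICIES = {
    ("linux", "premium"): AlertPolicy(80.0, 90.0, "24x7", "email+page"),
    ("linux", "standard"): AlertPolicy(85.0, 95.0, "business-hours", "email"),
    ("windows", "premium"): AlertPolicy(80.0, 90.0, "24x7", "email+page"),
    ("windows", "standard"): AlertPolicy(85.0, 95.0, "business-hours", "email"),
}


def classify(os_name: str, service_level: str, usage_pct: float) -> str:
    """Return "ok", "warning", or "critical" for a usage reading."""
    policy = POLICIES[(os_name.lower(), service_level.lower())]
    if usage_pct >= policy.crit_pct:
        return "critical"
    if usage_pct >= policy.warn_pct:
        return "warning"
    return "ok"
```

With these placeholder values, the same 92% disk usage reading would be classified as critical on a "premium" Linux host but only as a warning on a "standard" one, which is the kind of variation described above.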
Additional information about Enterprise Infrastructure's Prometheus environment can be found on their SharePoint documentation site for Prometheus.