Alert aggregation and the correlation platform are extremely useful, streamlining our incident management
What is our primary use case?
We use BigPanda to aggregate alerts from multiple sources (Nagios, Sensu, Wavefront, Splunk, etc.) and correlate related alerts into incidents.
Pros and Cons
"Alert aggregation was the primary requirement. BigPanda pulls all this together into a single UI for us, allowing us to see related alerts grouped together into an incident, and enables us to easily create a JIRA ticket and Slack channel to manage an issue."
"We have also made extensive use of the outbound integrations to ticketing systems (JIRA) and collaboration tools (Slack). The main driver for us has been getting all alerting into a single UI and enabling us to streamline our incident management process."
"Our infrastructure is quite large - tens of thousands of servers, often with 30-plus checks running on each host with one minute intervals. This generates a lot of data often in bursts (when we have a large scale failure). This has caused some delay in the ingestion pipeline."
What other advice do I have?
I think BigPanda is a great company with a quality product. As with any large-scale tooling change, there will be challenges, but the team was very responsive in resolving issues.