What is our primary use case?
How has it helped my organization?
It has given all our platform engineers a powerful, easy-to-use monitoring system. Most of our platform organization is now involved in monitoring. Previously, only a handful of platform engineers were involved, because Graphite and Sensu were so cumbersome to use.
What is most valuable?
It is incredibly easy to do common monitoring actions:
- Excellent autocomplete for everything in the UI.
- Using tags is very intuitive (in contrast to the cumbersome regex-like querying in Graphite).
- Going from viewing a metric to creating a monitor that alerts on that metric is very easy. This matters because the easier it is to create monitors, the more monitors people will create. With Graphite and Sensu, the effort required to create and test a monitor was so great that we had only a handful of monitors. We now have over 300 monitors.
What needs improvement?
- It would be nice to be able to graph metrics by excluding certain tags (like you can do in monitors).
- It would also be nice if we had more insight into our own usage of Datadog (agents and custom metrics). They provide a usage page, which does help, but it is not updated in real time.
- It would be great if usage metrics were created automatically and we could create custom metrics. Instead, we ended up building some of our own tooling to track and alert on our own usage.
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
We have very rarely had stability issues, maybe only once or twice that we noticed. It is very reliable.
What do I think about the scalability of the solution?
How are customer service and technical support?
It is excellent. The web app has a real-time support chat window where a support engineer starts chatting with you within a minute. That is the "right" way to do support.
Which solution did I use previously and why did I switch?
We previously ran Graphite and Sensu ourselves. By moving to Datadog, we did not need to manage our own monitoring infrastructure anymore. Graphite was somewhat complex to run.
How was the initial setup?
Initial setup is easy. Install the agent and send it metrics. There are StatsD/Datadog libraries available for most languages.
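To give a rough idea of what "send it metrics" looks like, here is a minimal sketch that uses only the Python standard library rather than any particular client library. It assumes an agent is listening for StatsD traffic over UDP on localhost port 8125 (the usual default, but check your own configuration). The metric name, tag values, and the helper names `format_metric` and `send_metric` are made up for illustration. The `|#` tag suffix is the Datadog extension to the plain StatsD line format.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a StatsD datagram: name:value|type, plus optional Datadog-style tags."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags are a Datadog extension to StatsD: "|#key:value,key:value"
        payload += "|#" + ",".join(tags)
    return payload


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumptions about the local agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical counter metric with two hypothetical tags.
    datagram = format_metric(
        "checkout.requests", 1, "c", ["env:staging", "service:checkout"]
    )
    send_metric(datagram)
```

In practice you would most likely use one of the existing client libraries instead of hand-building datagrams, but the sketch shows how little is involved once the agent is running.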
What's my experience with pricing, setup cost, and licensing?
Pricing seems reasonable. It depends on the size of your organization, the size of your infrastructure, and what portion of your overall business costs goes toward infrastructure. It is hard to say without looking at all of this.
Which other solutions did I evaluate?
We looked at several competitors at the time (Summer 2016). There did not seem to be any compelling alternatives. Once we did the PoC with Datadog, we loved it and decided to move forward.
What other advice do I have?
Try it out and see if you like it.