Datadog Room for Improvement
The cost and pricing model needs continued improvement. It is pretty complex and takes a fair amount of intimate knowledge to know exactly how turning on a single function is going to impact your bill, especially when you don't see the metrics for a day or two.
We have recently had a number of issues with stability and delays on logging, monitoring, metric evaluation, and alerts. More often than not in the past month, it seems that we get the banner across the top of our dashboards that some service is impacted. These incidents don't always show up on the incident page, either.
Their logging solution is expensive for our use case. They do have the capability to rehydrate old or incomplete logs, and it works, but I would rather not have to think about that operation.
Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion. On a positive note, they do have lots of documentation; it just needs better curation.
Their APM solution still needs some work, but they are actively developing it. I would also like to see more database-specific application monitoring.
More pre-configured "Monitor Alerts" would be helpful. Datadog's knowledge of its customers and what they are looking for in terms of monitoring and alerting could be taken advantage of with pre-canned alerts. They have started this with "Recommended Monitors". That feature was very helpful when configuring our Kubernetes alerts. More would be even better.
Datadog tech support is very good. One area that could be more helpful is actually talking to someone or sharing your screen to help troubleshoot issues that arise. For new cloud engineers just coming into the cloud monitoring field, there is a learning curve. There is a lot to learn and figure out. For example, we still ran into some issues configuring the private link and more videos of how to do things could be of use.
Please add PHP profiling; you already have it for other popular programming languages such as Python and Java, which is great because we have a little bit of those, but our main app is powered by PHP and we don't have profiling for this yet. I guess it's only a matter of time for this to be added, so in the meantime, you can consider this review as a vote for PHP profiling support.
The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs. container instances.
We need the ability to create a service dependency map like Splunk ITSI. We have to build this in PagerDuty and it's not the best user experience. The ability to create custom inventory objects based on logs ingested would be a value add. It would be better if Datadog makes this a simple click and enable.
It would be helpful to have the ability to upgrade agents via the Datadog portal. Once agents are connected to the Datadog portal, we should be able to upgrade them quickly.
Security monitoring for Azure and for operating systems (Windows and Linux) is a feature area that needs to be addressed.
Dashboards for Azure Active Directory metrics and events should be improved.
Head of Digital & Cognitive Services at a tech company with 11-50 employees
It can have an artificial intelligence component. Even though I can seamlessly look at end-to-end security, it would be better to have alerts and notifications powered by an AI engine. I am not sure if they have an AI component. We have not reached out to them or looked at it, but this is something that I keep on talking about within our company in terms of features. Such a feature would be good to have, and it would further optimize my Security Ops team's abilities.
Director of Cloud Operations at a tech services company with 11-50 employees
It can have a more modernized pricing mechanism. We're actually working with them to figure out how to become more modular and have a better and more modernized pricing mechanism. The issue with Datadog is that you have to buy the whole suite of different products, and you kind of get stuck in the old utilization of 40% of their suite. Most organizations today break down between application development, networking, and security. Therefore, there should be a way to break down different modules into just app dev, infosec, networking, etc. Customers have various needs across their business lines, and sometimes, they're just not willing to have tools that they're not using 100%. AppDynamics is probably a little bit better in terms of being modular.
Datadog lacks a deeper application-level insight. Their competitors had eclipsed them in offering ET functionality that was important to us. That's why we stopped using it and switched to New Relic.
Datadog's price is also high.
The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts.
SLOs are also a great way to visualize how you are doing with regard to the level of service that you are providing, but the feature is missing crucial components such as:
- The ability to visualize the remaining error budget and how it evolved during the month. An error budget burndown graph would be helpful.
- The ability to display a different level of alert on an SLO based on how fast it is consuming the error budget. This is the slow burn versus fast burn.
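For readers unfamiliar with the terms, the two requests above can be illustrated with a small sketch of how remaining error budget and burn rate are commonly computed. This is not Datadog's implementation: the function names are hypothetical, and the thresholds (14.4 for a fast burn, 1.0 for a slow burn) are illustrative assumptions that would need tuning for a real SLO.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the period's error budget still unspent.

    1.0 means nothing spent, 0.0 means fully spent, negative means overspent.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        return 0.0  # a 100% target (or no traffic) leaves no budget to spend
    return 1.0 - failed_requests / allowed_failures


def burn_rate(slo_target, window_error_rate):
    """How many times faster than 'exactly on budget' errors are arriving.

    A burn rate of 1.0 spends the whole budget exactly over the SLO period.
    """
    return window_error_rate / (1.0 - slo_target)


def alert_level(slo_target, short_window_error_rate, long_window_error_rate,
                fast_threshold=14.4, slow_threshold=1.0):
    """Classify an SLO as 'fast', 'slow', or 'ok' (thresholds are assumptions).

    Fast burn: a high burn rate over a short window (for example, one hour).
    Slow burn: a lower but sustained burn rate over a long window (for example, days).
    """
    if burn_rate(slo_target, short_window_error_rate) >= fast_threshold:
        return "fast"
    if burn_rate(slo_target, long_window_error_rate) >= slow_threshold:
        return "slow"
    return "ok"


if __name__ == "__main__":
    # 99.9% target, 1,000,000 requests so far this month, 250 of them failed.
    print(error_budget_remaining(0.999, 1_000_000, 250))
    # 2% of requests failing in the short window is a fast burn.
    print(alert_level(0.999, short_window_error_rate=0.02,
                      long_window_error_rate=0.0005))
```

Plotting `error_budget_remaining` once per day would give the burndown graph requested in the first point, and `alert_level` corresponds to the slow-burn versus fast-burn distinction in the second.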
Senior Cloud Security Engineer at a financial services firm with 201-500 employees
I believe there is room for improvement with this solution. It wasn't easy for me to get a quick understanding of what this tool offers us as opposed to the added tools of AWS. By that, I mean with regard to finding a better way to apply some filters or to create some alarms. I don't get more advanced features in comparison to AWS, but at least I get a centralized way of doing things, which can be done on the AWS side as well. It's more complicated because you have to configure some other services to stream their logs from multiple accounts to one account. It could be more user-friendly and include advanced examples in the documentation showing some use cases or customer case studies, so you can get a clear idea that this functionality provides something extra.
We haven't used the solution enough yet to assess what features it is missing or what could be improved. Support in Latin America is something I would mark as a point to improve.
Project Director at a tech services company with 501-1,000 employees
Its pricing model can be improved. Its settings should be improved for a better understanding of billing. They should also provide some alerts when there is an increase in usage. For example, if usage increases by more than 20% from one week to the next, the customer should get an alert.
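The week-over-week rule suggested in this review comes down to simple arithmetic. The sketch below is only an illustration of that check, not an existing Datadog feature; the function names and the usage unit (any billable quantity, such as ingested log gigabytes) are hypothetical.

```python
def week_over_week_increase(previous_week, current_week):
    """Relative change in usage, e.g. 0.25 for a 25% increase.

    Returns None when there is no previous usage to compare against.
    """
    if previous_week <= 0:
        return None
    return (current_week - previous_week) / previous_week


def should_alert(previous_week, current_week, threshold=0.20):
    """True when usage grew by more than the threshold (20% by default)."""
    change = week_over_week_increase(previous_week, current_week)
    return change is not None and change > threshold


if __name__ == "__main__":
    # Hypothetical usage figures: 100 units last week, 125 units this week.
    print(should_alert(100, 125))
```

With these hypothetical figures, a jump from 100 to 125 units is a 25% increase and would trigger the alert, while a rise from 100 to 110 units would not.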
The error traceability is an area that can be improved. This is something that helps us to pinpoint the area where a problem is occurring. It is a function stack, and it should be showing us how each function is defined.
I'm still exploring the trial version, and it is fine. One thing that I haven't been able to figure out is how to retrieve a report. This is something that could be improved. I probably need to navigate to a place to access the reports.
Cloud Architect at a tech services company
Additional metrics should be included.
Better integration with other solutions is needed.
In the past two years, there have been a couple of outages.
Senior Software Engineer at a financial services firm with 201-500 employees
The Log Explorer could be better. I don't think it has log manipulation as Splunk does.