Delivery Manager, DBA Services at a manufacturing company with 10,001+ employees
Real User
2023-01-25T15:49:08Z
Jan 25, 2023
Datadog isn't as mature as some of the established players like Dynatrace or Splunk. It's a new product, so they are constantly releasing new features, and I don't have much to complain about.
Software Engineering Manager at a healthcare company with 501-1,000 employees
Real User
Top 20
2022-12-06T21:07:00Z
Dec 6, 2022
Overall, we really like the quality and relevance of all of the Datadog products that are currently being used. The documentation is very well organized and is the go-to place for us to find answers to our questions. We would really like to see more from the Service Catalog. It is something that we are interested in. However, some might think it lacks some key features at this time. We will definitely keep our eye out for this and adopt it when all the features are implemented. We're really looking forward to all the great things DD will do.
Integration should have been easier. It is very tough to go to all the services and enable Datadog integration for each AWS service. The AWS services could be listed on one page, showing only the services that are enabled. A similar approach should be taken for any other integration. Lately, chat support has had longer waiting times. We would love to get faster chat support. We also need additional support for sending the flare files.
Software Developer at a pharma/biotech company with 51-200 employees
Real User
Top 10
2022-12-06T20:54:00Z
Dec 6, 2022
Sometimes it’s difficult to customize certain queries to find specific things, specifically with the logging solution. I’ve used other logging platforms in the past that have extensive and mature query languages. These might not be super friendly to start out with, yet they can be very powerful. I wish there was more of an emphasis on query languages instead of the UI-based tooling that Datadog provides. Even though it is powerful on its own, the UI-based design lacks the elegance, efficiency, and depth of a query language.
Software Engineer at a comms service provider with 5,001-10,000 employees
Real User
2022-12-06T20:44:00Z
Dec 6, 2022
Delta traces on the Golang profiler are extremely expensive concerning memory utilization. In a Kubernetes environment where we would like to set per-pod memory allocations as low as possible, the overhead of that profiler feature is prohibitive. In one case, our pods (which were provisioned to target 250 MB and max at 500 MB memory) got stuck in a crash loop due to out-of-memory, which was caused entirely by the delta profiles feature of the profiler. Multistep Datadog synthetics lack the feature of basic arithmetic. For our use case, performing basic arithmetic on the output of previous steps to produce input for subsequent steps would be extremely useful.
There is not much that needs to be improved. The UI is super user-friendly. The deployment process is easy. We enjoy using the integrations with Slack and PagerDuty. Customer support is awesome from our experience. There is a lot of documentation for us to be able to use if we need to. I'm not sure if Datadog can monitor K8s deployments in real-time. For instance, being able to see a deployment step by step visually. This would be helpful if there were any incidents during the deployment. In general, Datadog is a great solution.
Software Engineering Manager at a hospitality company with 1,001-5,000 employees
Real User
2022-12-06T20:16:00Z
Dec 6, 2022
Datadog is so feature-rich that it is often hard to onboard new folks and tough to decide where to invest time. The APM is a perfect example of this. This feature alone has so much (profiling, tracing, span summary, flame graphs). I would love to see more of the insight and automation-focused features, such as the log patterns, where I can spend time more efficiently. The cost of Datadog at scale can get very expensive very quickly. I would like to see a better usage/cost dashboard with breakdowns like the AWS cost explorer.
Senior Software Engineer at a transportation company with 51-200 employees
Real User
2022-12-06T19:56:00Z
Dec 6, 2022
I found the documentation can sometimes be confusing. I tried configuring APM for some of our Python containers, and I had to cross-reference multiple blog posts and the official documentation to figure out which Datadog Agent to use, whether I needed ddtrace, what environment variables I should set, etc. Furthermore, when generating my own traces, I wasn't aware that ddtrace adds its own "monkey patching," which led to headaches with respect to configuring the service for RabbitMQ. A more unified and up-to-date documentation suite would be greatly appreciated.
Atlassian Expert at a tech consulting company with 51-200 employees
Real User
2022-12-06T19:50:00Z
Dec 6, 2022
The current way accounts are billed could be vastly improved - especially when involving multiple organizations across multiple accounts in combination with reserved commitments. Being able to have an automatic materialized report on certain dashboards that could be exported as PDF to be shared with non-Datadog users could help a lot. Other than that, we are more than happy with the features we use regularly.
Senior Site Reliability Engineer at a tech vendor with 10,001+ employees
Real User
2022-12-06T19:42:00Z
Dec 6, 2022
Managing dashboards as IaC is a bit hard to work out at times. I use custom tools to convert JSON dashboards to Terraform resources. Ideally, I'd like for some sort of building tool for this to be built into the app. For example, a templating system that can easily be exported to IaC would be transformative for us. There are also some aspects of the API that can be a bit verbose - especially in the area of new features like SLOs - and take some time to understand. That said, overall, they're well-documented enough to be a minor concern for us.
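The JSON-to-Terraform conversion this reviewer describes could look roughly like the sketch below. It is not the reviewer's tool. It assumes the Datadog Terraform provider's `datadog_dashboard_json` resource, which takes the dashboard definition as a JSON string in its `dashboard` argument. The function name `dashboard_json_to_terraform` and the list of read-only fields to strip are illustrative assumptions, so check them against your own exports.

```python
import json


def dashboard_json_to_terraform(dashboard: dict, resource_name: str) -> str:
    """Render an exported dashboard definition as Terraform HCL text."""
    # Fields assumed to be server-generated on export (an assumption;
    # adjust this set to match what your exported JSON actually contains).
    read_only = {"id", "url", "created_at", "modified_at", "author_handle"}
    cleaned = {k: v for k, v in dashboard.items() if k not in read_only}

    # Stable key order keeps diffs small when the dashboard is re-exported.
    body = json.dumps(cleaned, indent=2, sort_keys=True)

    # Terraform treats "${" and "%{" as template sequences inside heredocs,
    # so escape them to keep the JSON literal.
    body = body.replace("${", "$${").replace("%{", "%%{")

    return (
        f'resource "datadog_dashboard_json" "{resource_name}" {{\n'
        f"  dashboard = <<-EOT\n"
        f"{body}\n"
        f"  EOT\n"
        f"}}\n"
    )


if __name__ == "__main__":
    exported = {
        "id": "abc-123",
        "title": "Checkout",
        "layout_type": "ordered",
        "widgets": [],
    }
    print(dashboard_json_to_terraform(exported, "checkout"))
```

Keeping the definition as raw JSON avoids translating every widget into nested HCL blocks, at the cost of less readable Terraform plans.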
Senior Engineering Manager, Mobile Wireless Engineering at a comms service provider with 10,001+ employees
Real User
2022-12-06T19:26:00Z
Dec 6, 2022
We need more integration functionality, including certain metrics integration. We should be able to monitor devs and need it to build more monitoring tools and offer leadership metrics.
The product is quite complex, and there are so many features that I either didn't know about or wasn't sure how to use. One thing that could be improved is somehow surfacing interesting or relevant products that might be applicable given our infrastructure. Additionally, the billing can sometimes be confusing and opaque, especially around not making it obvious what the implications can be if you add different AWS integrations. This has caused some unexpected costs in the past due to engineers not understanding how Datadog pricing works.
We primarily use the log management functionality, and the only feedback I have there is better fuzzy text searching in logs (the kind that Kibana has). I've learned about a ton of other offerings, like APM, NPM, etc., over the course of workshops. Once I try those out, I'm sure I will have additional feedback.
Site Reliability Engineer at a financial services firm with 1-10 employees
Real User
Top 20
2022-10-05T09:22:08Z
Oct 5, 2022
Graph filters for logs need to be set manually which works well for JSON but not for unstructured logs. Making structured logs for high-performance applications is over our heads so we had to dump some technical streams for our logs.
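For readers unfamiliar with the distinction this review draws, the sketch below shows one common way to emit JSON-structured log lines from Python's standard `logging` module, so that each attribute arrives as a field rather than free text. The `JsonFormatter` class and the `fields` attribute are hypothetical names chosen for illustration. It does not address the reviewer's concern about the cost of structured logging in high-performance applications.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `fields` is a hypothetical attribute, passed via `extra=`,
        # that carries structured key/value pairs for this record.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("orders")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("order %s placed", "A1", extra={"fields": {"latency_ms": 12}})
```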
Datadog could be improved if it could detect other software in a container or server. Datadog is better than other APM or observability tools, but it focuses mostly on telling the customer what they need to know about the software, database or applications that land on the server. We also need to know the version before setting up an agent with the APM modeling tool. In some instances, ownership of a particular piece of software passes to another person, and the previous owner did not transfer the knowledge or data needed to manage the server. The new person needs to monitor this server and they need to know what software or version of software was installed on this server before they use the APM agent for monitoring. If Datadog could provide this insight, it would improve how we use the solution. In a future release, we would like to be able to complete a network traffic or network flow analysis to detect the errors or problems on the network.
Senior Engineer at an educational organization with 5,001-10,000 employees
Real User
Top 10
2022-08-15T10:42:13Z
Aug 15, 2022
Datadog needs more local Asia-Pacific support, and if they don't have a SaaS solution in Asia-Pacific, they should offer an on-prem version. I'm told that's not possible.
I haven't really noticed anything that they could improve upon. Maybe they could add in some features to go both ways, to maybe make some configuration changes, etc. That's a little bit outside of what Datadog does, though. It's really very full-featured, so I don't really have any complaints. I haven't really fully looked at the documentation as I know where I need to go and look at things. It could probably be a little bit of a better user experience. There are so many functions there that sometimes navigating your way around is a little bit hard. They have a really nice menu system. However, there's so much there. It's possible that I skipped a guided tour when I started. It’s not intuitive to everyone. There are a lot of technical features.
IT Test Manager at a transportation company with 10,001+ employees
Real User
2022-03-29T15:58:56Z
Mar 29, 2022
I'd like to see more flexibility in the customization. There are a few settings that need to be changed, but we are unable to make those changes as users or as the administrator. The tagging to get the different parts of the monitoring interconnected is a bit tricky and takes time to work out.
Chief Strategy Officer (CSO) at a computer software company with 11-50 employees
Real User
2022-02-04T12:22:43Z
Feb 4, 2022
Datadog has a lot of features kind of crammed into one dashboard. It's quite hard to work out which feature does what. There was a steep learning curve, trying to navigate through menus. The menu navigation could improve. If there was a more straightforward way of adding new functions or features to where each menu is placed that would be an improvement.
Sr.Tech.Analyst Monitoreo at a financial services firm with 1,001-5,000 employees
Real User
2021-11-07T10:11:00Z
Nov 7, 2021
It could use some additional features for working with metrics, like those Grafana or New Relic have. Datadog does not use library technologies like Dynatrace does. Datadog has machine learning too, but it does not have this option in all layers of monitoring, such as infrastructure, services, processes, and applications.
Senior Cyber Security Expert at a security firm with 11-50 employees
Real User
2021-09-09T19:57:09Z
Sep 9, 2021
While I like the ease of use, when compared with Tenable Nessus they could still improve their usability. They are okay, but there is room to be better. They could have more integration. They could be more intuitive as well, for example, in the intuitiveness of the user interfaces and in how long it takes for users to learn how to use Datadog. It is not impossible to use, or impossible to do the administration with it, but when you put these two next to each other, meaning Nessus and Datadog, Nessus comes out as the winner.
Project Director at a tech services company with 501-1,000 employees
Real User
2021-05-18T17:10:08Z
May 18, 2021
Its pricing model can be improved. Its settings should be improved for a better understanding of billing. They should also provide some alerts when there is an increase in usage. For example, if usage increases by more than 20% from one week to another, the customer should get an alert.
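The week-over-week check this reviewer asks for amounts to simple arithmetic, sketched below. The function names and the idea of feeding in weekly usage totals are assumptions made for illustration. Nothing here reflects how Datadog itself computes or reports usage.

```python
def usage_increase_pct(previous_week: float, current_week: float) -> float:
    """Percentage change in usage from one week to the next."""
    if previous_week <= 0:
        raise ValueError("previous_week must be positive")
    return (current_week - previous_week) / previous_week * 100.0


def should_alert(previous_week: float, current_week: float,
                 threshold_pct: float = 20.0) -> bool:
    """True when usage grew by more than threshold_pct week over week."""
    return usage_increase_pct(previous_week, current_week) > threshold_pct


if __name__ == "__main__":
    # Hypothetical weekly totals, e.g. ingested log events.
    print(usage_increase_pct(1000, 1250))
    print(should_alert(1000, 1250))
    print(should_alert(1000, 1100))
```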
It can have a more modernized pricing mechanism. We're actually working with them to figure out how to become more modular and have a better and more modernized pricing mechanism. The issue with Datadog is that you have to buy the whole suite of different products, and you kind of get stuck in the old utilization of 40% of their suite. Most organizations today break down between application development, networking, and security. Therefore, there should be a way to break down different modules into just app dev, infosec, networking, etc. Customers have various needs across their business lines, and sometimes, they're just not willing to have tools that they're not using 100%. AppDynamics is probably a little bit better in terms of being modular.
Senior Manager, Site Reliability Engineering at Extra Space Storage
Real User
2021-01-25T19:36:00Z
Jan 25, 2021
Continued improvement around cost and pricing model is needed. It is pretty complex and takes a fair amount of intimate knowledge to know exactly how turning on a single function is going to impact your bill, especially when you don't see the metrics for a day or two. We have recently had a number of issues with stability and delays on logging, monitoring, metric evaluation, and alerts. More often than not in the past month, it seems that we get the banner across the top of our dashboards that some service is impacted. These incidents don't always show up on the incident page, either.
We need the ability to create a service dependency map like Splunk ITSI. We have to build this in PagerDuty and it's not the best user experience. The ability to create custom inventory objects based on logs ingested would be a value add. It would be better if Datadog makes this a simple click and enable. It would be helpful to have the ability to upgrade agents via the Datadog portal. Once agents are connected to the Datadog portal, we should be able to upgrade them quickly. Security monitoring for Azure and Operating System (Windows and Linux) are features that need to be addressed. Dashboards for Azure Active Directory metrics and events should be improved.
The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts. SLOs are also a great way to visualize how you are doing with regard to the level of service that you are providing, but they are missing crucial components like:
* The ability to visualize the remaining error budget and how it evolved during the month. An error budget burndown graph would be helpful.
* The ability to display a different level of alert on an SLO based on how fast it is consuming the error budget. This is the slow burn versus fast burn.
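The two requests above rest on a small amount of arithmetic, sketched here under stated assumptions. The SLO is assumed to be request-based, `expected_total` is an assumed forecast of events for the month, and the `14.4` and `1.0` thresholds are illustrative defaults rather than anything taken from Datadog. A burn rate of 1 means the budget would be exactly used up by the end of the window.

```python
def burndown(target: float, expected_total: int, daily_bad: list) -> list:
    """Fraction of the error budget left after each day (1.0 = untouched)."""
    allowed_bad = (1.0 - target) * expected_total  # total budget, in events
    remaining, spent = [], 0
    for bad in daily_bad:
        spent += bad
        remaining.append(1.0 - spent / allowed_bad)
    return remaining


def burn_rate(target: float, window_total: int, window_bad: int) -> float:
    """How many times faster than 'sustainable' the budget is being spent."""
    observed_error_rate = window_bad / window_total
    return observed_error_rate / (1.0 - target)


def alert_level(fast_window_burn: float, slow_window_burn: float,
                fast_threshold: float = 14.4,
                slow_threshold: float = 1.0) -> str:
    """Map fast-burn and slow-burn rates to a severity (assumed thresholds)."""
    if fast_window_burn >= fast_threshold:
        return "page"
    if slow_window_burn >= slow_threshold:
        return "ticket"
    return "ok"


if __name__ == "__main__":
    # Hypothetical 99% SLO with 100,000 expected events this month.
    print(burndown(0.99, 100_000, [100, 400, 250]))
    print(burn_rate(0.99, 1000, 50))
    print(alert_level(fast_window_burn=20.0, slow_window_burn=2.0))
```

Plotting the `burndown` output per day gives the burndown graph the reviewer describes, and comparing a short-window and a long-window `burn_rate` is one way to separate fast burn from slow burn.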
Their logging solution is expensive for our use case. They do have the capability to rehydrate old or incomplete logs, and it works, but I would rather not have to think about that operation. Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion. On a positive note, they do have lots of documentation; it just needs better curation. Their APM solution still needs some work, but they are actively developing it. I would also like to see more database-specific application monitoring.
Please add PHP profiling; you already have it for other popular programming languages such as Python and Java, which is great because we have a little bit of those, but our main app is powered by PHP and we don't have profiling for this yet. I guess it's only a matter of time for this to be added, so in the meantime, you can consider this review as a vote for PHP profiling support. The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs. container instances.
More pre-configured "Monitor Alerts" would be helpful. Datadog's knowledge of its customers and what they are looking for in terms of monitoring and alerting could be taken advantage of with pre-canned alerts. They have started this with "Recommended Monitors". That feature was very helpful when configuring our Kubernetes alerts. More would be even better. Datadog tech support is very good. One area that could be more helpful is actually talking to someone or sharing your screen to help troubleshoot issues that arise. For new cloud engineers just coming into the cloud monitoring field, there is a learning curve. There is a lot to learn and figure out. For example, we still ran into some issues configuring the private link and more videos of how to do things could be of use.
Technology Competency and Solution Head at LearningMate
Real User
Top 20
2020-11-25T16:41:00Z
Nov 25, 2020
The error traceability is an area that can be improved. This is something that helps us to pinpoint the area where a problem is occurring. It is a function stack, and it should be showing us how each function is defined.
Senior Cloud Security Engineer at a financial services firm with 201-500 employees
Real User
2020-10-21T04:33:58Z
Oct 21, 2020
I believe there is room for improvement with this solution. It wasn't easy for me to get a quick understanding of what this tool offers us as opposed to the added tools of AWS. By that, I mean in regards to finding a better way to apply some filters or to create some alarms. I don't get more advanced features in comparison to AWS but at least I get a centralized way of doing things, which can be done on the AWS side as well. It's more complicated because you have to configure some other services to stream their logs from multi accounts to one account. It could be more user friendly and include advanced examples in the documentation showing some use cases or customer case studies, so you can get a clear idea that this functionality provides something extra.
Datadog lacks a deeper application-level insight. Their competitors had eclipsed them in offering ET functionality that was important to us. That's why we stopped using it and switched to New Relic. Datadog's price is also high.
There are things about it that we would like to be fixed, such as it is taking averages of average. This results in data that we don't expect, but overall we are happy with it.
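The "average of averages" effect this reviewer mentions is easy to reproduce with made-up numbers. When groups have different sample counts, averaging the per-group averages gives every group equal weight, so the result drifts away from the average over all samples. The host names and latency values below are hypothetical.

```python
from statistics import mean


def average_of_averages(groups: list) -> float:
    """Average the per-group averages (each group counts equally)."""
    return mean(mean(g) for g in groups)


def overall_average(groups: list) -> float:
    """Average over every sample (each sample counts equally)."""
    total = sum(sum(g) for g in groups)
    count = sum(len(g) for g in groups)
    return total / count


if __name__ == "__main__":
    host_a = [100, 100, 100, 100]  # four fast requests (latency in ms)
    host_b = [500]                 # one slow request
    print(average_of_averages([host_a, host_b]))
    print(overall_average([host_a, host_b]))
```

Here the average of averages is 300 ms while the true mean across all five requests is 180 ms, because the single slow request on `host_b` carries as much weight as the four requests on `host_a`.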
The product could do better with its notifications. I want more technical support than conferences because technical support helps with setting up the product much easier.
Some of their newer solutions are interesting, like their logging, but they are not fleshed out. They could use more metrics or synthetics, which would be really helpful. I would love to see support for front-end and mobile applications. Right now, it is mostly all back-end stuff. Being able to do some integration with our front-end products would be awesome.
The only thing that they were missing that has thrown us from the beginning (they are still missing it) is consistency in the APIs. There are a couple of guys on the automation side who complain, rightfully, about how hard it is because every new feature that comes out has a new way of interfacing with the API. This was our big red flag in the beginning, but given the price and other features, it wasn't enough for us to discount the product. We said that we would live with this one red flag, but it is still a red flag. Stability of the product has been a concern for us outside of the primary monitoring agents. It does not have the best interface.
System Ninja at a philanthropy with 51-200 employees
Real User
2018-12-11T08:30:00Z
Dec 11, 2018
We want to reduce having to go to different screens to obtain all the information. However, they are moving in the right direction from what we have noticed.
Site Reliability Engineer at a computer software company with 201-500 employees
Real User
2018-12-04T07:57:00Z
Dec 4, 2018
The way data is represented can be limiting. They have added their own little query language that you can use to manipulate things, so you can graph and relate two different metrics together. This is relatively new this year. When I first tried it out a long time ago, you could graph a metric and another metric, and they'd overlay, but you couldn't take the ratio between the two. However, it looks like this is the direction that they're going, and that's a good direction. I think they should continue adding things that way. I like being able to put the formulas in myself. I don't want the average. I want a rolling average over three minutes, not five minutes. They're getting better at letting the user customize this.
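The two operations this reviewer wants control over, a rolling average with a chosen window and the ratio of two metrics, can be sketched in plain Python as below. This is not Datadog's query language. The `(timestamp_seconds, value)` point format and both function names are assumptions made for illustration.

```python
from collections import deque


def rolling_average(points: list, window_seconds: int = 180) -> list:
    """Trailing average over the last window_seconds of samples.

    points: (timestamp_seconds, value) tuples sorted by timestamp.
    """
    window = deque()
    running_sum = 0.0
    out = []
    for ts, value in points:
        window.append((ts, value))
        running_sum += value
        # Evict samples that fall outside the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            running_sum -= old_value
        out.append((ts, running_sum / len(window)))
    return out


def ratio(numerator: list, denominator: list) -> list:
    """Point-wise ratio of two series, skipping missing or zero denominators."""
    denom = dict(denominator)
    return [(ts, v / denom[ts]) for ts, v in numerator if denom.get(ts)]


if __name__ == "__main__":
    samples = [(0, 10), (60, 20), (120, 30), (180, 40), (240, 50)]
    print(rolling_average(samples, window_seconds=180))
    errors = [(0, 5), (60, 10)]
    requests = [(0, 100), (60, 0)]
    print(ratio(errors, requests))
```

Changing `window_seconds` from 180 to 300 switches between the three-minute and five-minute averages the reviewer contrasts.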
Datadog is a cloud monitoring solution that is designed to assist administrators, IT teams, and other members of an organization who are charged with keeping a close eye on their networks. Administrators can use Datadog to set real-time alerts and schedule automated report generation. They can deal with issues as they arise and keep up to date with the overall health of their network while still being able to focus on other tasks. Users can also track the historical performance of their...
Datadog is expensive.
The solution needs to integrate AI tools.
The product needs to have a more enterprise-oriented approach to configuration.
We need more integration with security tools like Drata.
Custom-level metrics could be improved. Billing should be more transparent.
The product needs a better Datadog agent installation.
The logging could be improved in the future.
Sometimes, it takes a long time to load the dashboard if we have many charts.
Datadog could improve the flexibility with AI and ML concepts. This will allow customers to be more leveraged towards publishing.
The setup was a bit complex. As Datadog is a bit on the expensive side, I would recommend it for simple, uncomplicated solutions.
They could look into improving the integration. I'd like to see better pricing and more integration in the next release.
While I like the ease of use, when compared with Tenable Nessus they could still improve their usability. They are okay, but there is room to be better. They could have more integration. They could be more intuitive as well. For example, the intuitivity of the user interfaces, and how long it takes for users to learn how to use Datadog. It is not impossible to use, or impossible to do the administration with it but when you put these two next to each other, meaning Nessus and Datadog, Nessus comes out as the winner.
Its pricing model can be improved. Its settings should be improved for a better understanding of billing. They should also provide some alerts when there is an increase in usage. For example, if there is a 20% more increase from one week to another, the customer should get an alert.
It can have a more modernized pricing mechanism. We're actually working with them to figure out how to become more modular and have a better and more modernized pricing mechanism. The issue with Datadog is that you have to buy the whole suite of different products, and you kind of get stuck in the old utilization of 40% of their suite. Most organizations today break down between application development, networking, and security. Therefore, there should be a way to break down different modules into just app dev, infosec, networking, etc. Customers have various needs across their business lines, and sometimes, they're just not willing to have tools that they're not using 100%. AppDynamics is probably a little bit better in terms of being modular.
Continued improvement around cost and pricing model is needed. It is pretty complex and takes a fair amount of intimate knowledge to know exactly how turning on a single function is going to impact your bill, especially when you don't see the metrics for a day or two. We have recently had a number of issues with stability and delays on logging, monitoring, metric evaluation, and alerts. More often than not in the past month, it seems that we get the banner across the top of our dashboards that some service is impacted. They don't always show up on the incident page, either.
We need the ability to create a service dependency map like Splunk ITSI. We have to build this in PagerDuty and it's not the best user experience. The ability to create custom inventory objects based on logs ingested would be a value add. It would be better if Datadog makes this a simple click and enable. It would be helpful to have the ability to upgrade agents via the Datadog portal. Once agents are connected to the Datadog portal, we should be able to upgrade them quickly. Security monitoring for Azure and Operating System (Windows and Linux) are features that need to be addressed. Dashboards for Azure Active Directory metrics and events should be improved.
The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts. SLOs are also a great way to visualize how you are doing with regard to the level of service that you are providing, but it is missing crucial components like: * The ability to visualize the remaining error budget and how it evolved during the month. An error budget burndown graph would be helpful. * The ability to display a different level of alert on an SLO based on how fast it is consuming the error budget. This is the slow burn versus fast burn.
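The error budget and burn-rate ideas in this review can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the SLO is treated as a simple request success ratio, and the fast and slow thresholds are illustrative values rather than anything Datadog defines.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (1.0 = untouched)."""
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1 - failed_requests / allowed_failures


def burn_rate(slo_target: float, window_requests: int,
              window_failures: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget would be used up exactly at the
    end of the SLO period if the current rate continued.
    """
    if window_requests == 0:
        return 0.0
    return (window_failures / window_requests) / (1 - slo_target)


def classify_burn(rate: float, fast: float = 10.0, slow: float = 1.0) -> str:
    """Map a burn rate to an alert level; thresholds are assumptions."""
    if rate >= fast:
        return "fast"
    if rate >= slow:
        return "slow"
    return "ok"


# Usage: a 99% SLO with 20 failures in 1,000 requests burns at about 2x.
print(classify_burn(burn_rate(0.99, 1000, 20)))
```

Plotting `error_budget_remaining` once per day over the month would give the burndown graph the reviewer asks for.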
In the past two years, there have been a couple of outages.
Their logging solution is expensive for our use case. They do have the capability to rehydrate old or incomplete logs, and it works, but I would rather not have to think about that operation. Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion. On a positive note, they do have lots of documentation; it just needs better curation. Their APM solution still needs some work, but they are actively developing it. I would also like to see more database-specific application monitoring.
Please add PHP profiling; you already have it for other popular programming languages such as Python and Java, which is great because we have a little bit of those, but our main app is powered by PHP and we don't have profiling for this yet. I guess it's only a matter of time for this to be added, so in the meantime, you can consider this review as a vote for the PHP profiling support. The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs. container instances.
More pre-configured "Monitor Alerts" would be helpful. Datadog's knowledge of its customers and what they are looking for in terms of monitoring and alerting could be taken advantage of with pre-canned alerts. They have started this with "Recommended Monitors". That feature was very helpful when configuring our Kubernetes alerts. More would be even better. Datadog tech support is very good. One area that could be more helpful is actually talking to someone or sharing your screen to help troubleshoot issues that arise. For new cloud engineers just coming into the cloud monitoring field, there is a learning curve. There is a lot to learn and figure out. For example, we still ran into some issues configuring the private link and more videos of how to do things could be of use.
The error traceability is an area that can be improved. This is something that helps us to pinpoint the area where a problem is occurring. It is a function stack, and it should be showing us how each function is defined.
I believe there is room for improvement with this solution. It wasn't easy for me to get a quick understanding of what this tool offers us as opposed to the added tools of AWS. By that, I mean in regards to finding a better way to apply some filters or to create some alarms. I don't get more advanced features in comparison to AWS but at least I get a centralized way of doing things, which can be done on the AWS side as well. It's more complicated because you have to configure some other services to stream their logs from multi accounts to one account. It could be more user friendly and include advanced examples in the documentation showing some use cases or customer case studies, so you can get a clear idea that this functionality provides something extra.
Datadog lacks a deeper application-level insight. Their competitors had eclipsed them in offering ET functionality that was important to us. That's why we stopped using it and switched to New Relic. Datadog's price is also high.
Additional metrics should be included. Better integration with other solutions is needed.
There are things about it that we would like to be fixed, such as it taking averages of averages. This results in data that we don't expect, but overall we are happy with it.
The product could do better with its notifications. I want more technical support than conferences because technical support makes setting up the product much easier.
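To illustrate why an average of averages can give unexpected data, here is a small sketch with made-up numbers. The host names and latency values are hypothetical; the point is only that groups with different sample counts skew the result.

```python
from statistics import mean

# Hypothetical latency samples (ms): host-a reported four points, host-b one.
samples_by_host = {
    "host-a": [100, 100, 100, 100],
    "host-b": [400],
}

# Average each host first, then average those averages.
average_of_averages = mean(mean(values) for values in samples_by_host.values())

# Average over every raw sample, weighting each data point equally.
overall_average = mean(
    value for values in samples_by_host.values() for value in values
)

print(average_of_averages, overall_average)
```

The first figure gives host-b's single sample as much weight as all four of host-a's, which is why it differs from the average over the raw points.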
Some of their newer solutions are interesting, like their logging, but they are not fleshed out. They could use more metrics or synthetics, which would be really helpful. I would love to see support for front-end and mobile applications. Right now, it is mostly all back-end stuff. Being able to do some integration with our front-end products would be awesome.
The only thing that they were missing that has thrown us from the beginning (they are still missing it) is consistency in the APIs. There are a couple of guys on the automation side who complain rightfully over how hard it is because every new feature which comes out has a new way of interfacing with the API. This was our big, red flag in the beginning, but given the price and other features, it wasn't enough for us to discount. We said that we would live with this one red flag, but it is still a red flag. Stability of the product has been a concern for us outside of the primary monitoring agents. It does not have the best interface.
I would like testing for data in the future. That would be really nice. Also, I would like some additional enhancement in the visuals.
We want to reduce having to go to different screens to obtain all the information. However, they are moving in the right direction from what we have noticed.
The on-premise version is very difficult to upgrade.
The way data is represented can be limiting. They have added their own little query language that you can use to manipulate things, so you can graph and relate two different metrics together. This is relatively new this year. When I first tried it out a long time ago, you could graph a metric and another metric, and they'd overlay, but you couldn't take the ratio between the two. However, it looks like this is the direction that they're going, and that's a good direction. I think they should continue adding things that way. I like being able to put the formulas in myself. I don't want the average. I want a rolling average over three minutes, not five minutes. They're getting better at letting the user customize this.
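As a rough illustration of the kind of user-defined formula mentioned here, the sketch below computes a rolling average over a configurable time window (three minutes by default). It is plain Python, not Datadog's query language; the function name and the `(timestamp, value)` sample format are assumptions.

```python
from collections import deque


def rolling_average(samples, window_seconds=180):
    """Time-based rolling average.

    `samples` is an iterable of (timestamp_seconds, value) pairs in
    time order. Each output point averages the values whose timestamps
    fall within the last `window_seconds` seconds.
    """
    window = deque()
    total = 0.0
    result = []
    for timestamp, value in samples:
        window.append((timestamp, value))
        total += value
        # Drop points that have aged out of the window.
        while window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        result.append((timestamp, total / len(window)))
    return result


# Usage: one sample per minute, averaged over three minutes.
points = [(0, 10), (60, 20), (120, 30), (180, 40), (240, 50)]
print(rolling_average(points))
```

Changing `window_seconds` to 300 would give the five-minute variant the reviewer does not want.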