Software Engineering Manager at a healthcare company with 501-1,000 employees
Real User
Top 20
Great CI visibility, logging, and monitoring
Pros and Cons
  • "Datadog helps us detect issues early on and helps in troubleshooting."
  • "We would really like to see more from the Service Catalog."

What is our primary use case?

We mainly use the product to monitor our infrastructure and apps. It is the go-to tool when we want to check that things are running properly. We use Datadog synthetic monitors to ensure our app works across different locations in the United States. 

We also have set up Datadog monitors to send alerts if things stop working as expected. 

We use Continuous Integration Pipeline visibility to make sure our developers are not being blocked by infrastructure and other things that might be out of their control.

How has it helped my organization?

Datadog helps us detect issues early on and helps in troubleshooting. Creating Service Level Objectives and defining monitors is helping us to stay on top of potential issues that might affect our users. 

We take advantage of Application Performance Monitoring to ensure our applications are working as expected, and our users can get the healthcare they need at a price they can afford. 

Synthetic monitoring also helps us in testing our application in different browsers.

What is most valuable?

The most valuable aspects of the solution include: 

CI visibility, which helps us in making sure our CI systems are running efficiently and are not blocking our developers from releasing new software and fixing bugs.

Logs, which help us in debugging issues where we can search for logs and can make sure they are relevant to the issues we are looking at.

APM, which can help us to stay on top of our applications by giving us the confidence that our apps are running.

Monitoring. We use monitoring a lot to ensure we know about potential issues and fix them before they affect our customers.

What needs improvement?

Overall, we really like the quality and relevance of all of the Datadog products that are currently being used. 

The documentation is very well organized and is the go-to place for us to find answers to our questions. 

We would really like to see more from the Service Catalog. It is something that we are interested in; however, it appears to lack some key features at this time. We will definitely keep an eye on it and adopt it when those features are implemented.

We're really looking forward to all the great things Datadog will do.

Buyer's Guide
Datadog
April 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
769,479 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for three years.

What do I think about the stability of the solution?

The stability is great.

What do I think about the scalability of the solution?

The scalability is great.

How are customer service and support?

Technical support is great.

What about the implementation team?

We handled the initial setup in-house.

What's my experience with pricing, setup cost, and licensing?

I don't have any insights into pricing.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Software Engineer at a transportation company with 51-200 employees
Real User
Good dashboard, excellent monitoring, and easy to expand
Pros and Cons
  • "Datadog has helped us a ton by allowing us to set up a multitude of easily configurable alarms across our tech stack and infrastructure."
  • "I found the documentation can sometimes be confusing."

What is our primary use case?

We primarily use Datadog for alerts. If we're running out of database connections or CPU credits we want to find out in Slack. Datadog provides nice features for that.

Secondarily, we use Datadog for analyzing historical trends and forecasting potential issues.

I'm trying to learn how to add in Continuous Profiler in our primary backend servers and set up Synthetic Tests for monitoring our front end.

Everything is mostly on AWS, and the Datadog integrations help a ton.

How has it helped my organization?

Datadog has helped us a ton by allowing us to set up a multitude of easily configurable alarms across our tech stack and infrastructure. It doesn't matter if it's in AWS Lambda or a Docker container in AWS EC2, Datadog's intuitive interface makes alarms incredibly easy to configure, reducing our resolution time for incidents.

A lot of the value comes from how frictionless the integrations are. Adding in a Datadog agent or flipping a switch on the Datadog UI to start streaming Lambda data makes the product so incredibly appealing for my company.

What is most valuable?

The monitoring feature has been the most valuable.

I really like the dashboard. Monitoring has a straightforward tie-in to business value at my company (e.g., declaring incidents). Things like having a dashboard and APM make my job easier. That said, DevX is a little bit of a harder sell to executives in my company.

The dashboard feature makes it so easy to inspect multiple metrics at once across services. It's truly been a lifesaver when I'm personally trying to understand why performance degradation is happening.

What needs improvement?

I found the documentation can sometimes be confusing. I tried configuring APM for some of our Python containers, and I had to cross-reference multiple blog posts and the official documentation to figure out which Datadog Agent to use, whether I needed a ddtrace trace, what environment variables I should set, etc.

Furthermore, when generating my own traces, I wasn't aware that ddtrace applies its own "monkey patching," which led to headaches when configuring the service for RabbitMQ.

A more unified and up-to-date documentation suite would be greatly appreciated.
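For readers unfamiliar with the term, here is a minimal sketch of what monkey patching means in general: replacing a method at runtime with a wrapper that records timing. It is not ddtrace's implementation, and `FakeChannel`, `patch_publish`, and `recorded_spans` are hypothetical names invented for illustration, not part of any RabbitMQ client or Datadog library.

```python
import functools
import time


class FakeChannel:
    """Hypothetical stand-in for a messaging client (not a real RabbitMQ API)."""

    def publish(self, body):
        return f"sent:{body}"


# Collected (operation, duration_seconds) pairs; a real tracer would ship these elsewhere.
recorded_spans = []


def patch_publish(cls):
    """Swap cls.publish for a timing wrapper. Safe to call more than once."""
    if getattr(cls.publish, "_patched", False):
        return  # already wrapped, avoid double instrumentation
    original = cls.publish

    @functools.wraps(original)
    def traced_publish(self, body):
        start = time.perf_counter()
        try:
            return original(self, body)
        finally:
            recorded_spans.append(("publish", time.perf_counter() - start))

    traced_publish._patched = True
    cls.publish = traced_publish


patch_publish(FakeChannel)
patch_publish(FakeChannel)  # second call is a no-op
result = FakeChannel().publish("hello")
```

Because the class itself is modified, every caller of `publish` is affected without changing its own code, which is convenient but can surprise anyone who wraps or subclasses the same method.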

For how long have I used the solution?

I've used the solution for about two years.

What do I think about the stability of the solution?

I don't recall seeing an incident from Datadog in the past couple of years, and that's been wonderful.

What do I think about the scalability of the solution?

The solution is incredibly scalable! To be fair, our data throughput to Datadog isn't super huge; however, we have never seen issues as it scaled to handle more of our data.

Which solution did I use previously and why did I switch?

We used to use AWS CloudWatch for a lot of our monitoring needs. That said, the interface felt clunky, confusing, and limited.

What was our ROI?

We don't have hard numbers on ROI. That said, overall, it has been a wonderful addition to our tooling suite.

Which other solutions did I evaluate?

We also looked at Honeycomb and are currently using both in production.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Cloud Engineer at a comms service provider with 10,001+ employees
Real User
Good platform monitoring and great cost and performance optimization
Pros and Cons
  • "The observability pipelines are the most valuable aspect of the solution."
  • "Geo-data is also something very critical that we hope to see in the future."

What is our primary use case?

We use the solution primarily for platform monitoring for the services that are deployed in AWS. It gives a better way to monitor the services, including pods, cost, high availability, etc. This way, observability is ensured and also customer services are uninterrupted. 

We also host data pipelines between the cloud and on-prem, and we use Datadog to ensure better services. We report issues based on the metrics reported through it.

How has it helped my organization?

Cost and performance optimization were the major enhancements for our organization. It gives us platform monitoring for the services that are deployed in AWS for a better way to monitor the services (pods, cost, high availability, etc.). With this product, we ensure observability and keep customer services uninterrupted. We host the data pipelines between the cloud and on-prem, and Datadog helps to ensure better services. We can report issues based on the metrics reported through it.

What is most valuable?

The observability pipelines are the most valuable aspect of the solution. 

Platform monitoring for the services that are deployed in AWS is helpful. It gives a better way to monitor the services. With Datadog, we ensure observability and maintain uninterrupted customer service. 

We can host the data pipelines between the cloud and the on-prem. Issues are easily reported.

The data streams are good. Data lineage is something that really helped in ensuring tracking of the data and metrics and also the volumes processed.

What needs improvement?

We'd like to see better transformers.

Live chat would be the best way to support us. 

Also, the features that we saw getting launched recently were something we expected and we're glad to see them coming.  

Geo-data is also something very critical that we hope to see in the future.

For how long have I used the solution?

I've used the solution for two or more years. 

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
LuWang - PeerSpot reviewer
DevOps Engineer at Screencastify
Real User
Customizable and helpful for isolating and filtering environments
Pros and Cons
  • "We have way more observability than what we had before - on the application and the overall system."
  • "Auto instrumentation on tracing has not been very easy to find in the documentation."

What is our primary use case?

We use Datadog for observability and system/application health, mainly for product support, triaging, debugging, and incident responses.

We use a lot of the logging and the Datadog agent to collect logs, metrics, and traces from our GKE workloads. We use APM and continuous profiling for latency and performance measurement. We use RUM to observe frontend user events, such as tracing on request and what actions they take before errors occur. We also use error tracking and source maps to debug production failures.

We are still relatively new to the product, and we are planning to use more of the notebook functionality and power packs to record run books and break knowledge silos. We also need to utilize dashboards and continuous profiling more for performance measurement and integrate Datadog alerts for incident response.

How has it helped my organization?

We have way more observability than what we had before - on the application and the overall system. That includes the GKE cluster, nodes, and pods. It's helped with our cloud-run instances, databases, and data storage.

We also started observability in the CI pipeline to measure our CI performance, as it was a pain point for us. We are aiming to do incremental deployments and releases, and the bottleneck so far has been our CI performance. The visibility on which actions or functions take the most time allows us to pinpoint and focus on improving configurations on these.

What is most valuable?

We use structured logging a lot to triage production issues. The querying, attributes and tags manipulation, and customization have been very helpful in isolating and filtering environments. The integration with Winston logger has also been a breeze.

First and foremost, structured logging, tags, and attributes have not only allowed us to narrow down a problem quickly in production; they have also let us create dashboards from these logs to understand more user behaviors, such as how many users stop and leave our application before an upload has completed. That helps us understand how important processing time is to a user.
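As an illustration of the idea, here is a minimal sketch of structured logging using only the Python standard library (our own stack uses Winston, so this is an analogy rather than our setup). The logger name, the `attributes` key, and the fields `env`, `upload_id`, and `percent_complete` are assumptions made up for the example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object so fields can be filtered on."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge caller-supplied attributes (passed through `extra`) into the event.
        payload.update(getattr(record, "attributes", {}))
        return json.dumps(payload, sort_keys=True)


# Write to an in-memory stream here; a real service would log to stdout or a file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("upload-service")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info(
    "upload abandoned",
    extra={"attributes": {"env": "production", "upload_id": "u-123", "percent_complete": 40}},
)

event = json.loads(stream.getvalue())
```

With events shaped like this, filtering by `env` or counting "upload abandoned" messages becomes a query over fields rather than a text search.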

We also intend to use distributed tracing more to understand where the error has occurred in a particular request.

What needs improvement?

Definitely, documentation could use improvement. As I navigated and tried to find instrumentation and implementation details, I discovered inconsistency among SDKs based on languages.

There are also places where highlighting can be improved. I once created an issue on GitHub, and it was resolved right away by an engineer. He pointed out that it was actually in the documentation. I looked again and found it was not very obvious. We were stuck on the problem for days.

Auto instrumentation on tracing has not been very easy to find in the documentation. We ended up using OpenTelemetry, yet the conversion between tracing contexts has been difficult.

For how long have I used the solution?

We've used the solution between six months and a year. 

How are customer service and support?

Customer service and support are generally very fast. One ticket, which involved changing the log index retention period, never received a response. Any support tickets related to technical issues were resolved pretty fast.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used to use GCP Stackdriver for logging and monitoring since our infrastructure is all GCP based. It was lacking a lot, particularly on tracing and structured logging. We often had a lot of trouble triaging and diagnosing a production problem. Datadog's specialty is observability. Since we started using the product, we were able to create dashboards, and utilize APM, continuous profiling, RUM, and distributed tracing for production support and user trends.

Datadog also offers labs and workshops for its products, which is very helpful.

What about the implementation team?

We implemented the product ourselves.

What was our ROI?

I'm not sure what our ROI would be.

What's my experience with pricing, setup cost, and licensing?

We started with on-demand pricing as we were re-writing our product, and we weren't sure about the total usage. After we went into production and released the product, we experienced a price surge. Fortunately, our Datadog account manager reached out to us and suggested a monthly subscription, which is what we'll be switching to.

I'd advise keeping an eye on the usage and possibly setting up some monitoring on price. We didn't have much of a setup cost; we started with a free trial and continued with on-demand after the trial ended.

Which other solutions did I evaluate?

We didn't evaluate many of the other options. However, we do also use OpenTelemetry, which is vendor agnostic and integrates with Datadog.

What other advice do I have?

We always keep the Datadog agent to the latest version.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Software Engineer at Spring Health
User
Great dashboards and custom metrics with the ability to parse logs
Pros and Cons
  • "The dashboards are great."
  • "We need more advanced querying against logs."

What is our primary use case?

We share dashboards, set up alerts, and monitor everything that happens in our system. We use it in staging, features, production, and our load test environment. It is exceptionally helpful for making our engineering more data-driven. 

I came from a company that believes we should focus on being telemetry driven. Instilling this in a smaller, less mature engineering organization has been challenging. However, it is much easier while using Datadog.

What is most valuable?

The dashboards are great. They are an easy way to give others who are not SMEs visibility into what we need to watch.

I enjoy the custom metrics. With this, we can take things that were once logs and then retain them longer.

We are able to parse logs. To be honest, this was only useful due to the fact that we had not yet set up the Datadog agent properly in PHP. Once we did this, the Datadog log parsing was no longer needed.

The ability to pin to a date and time is very helpful. This allows us to pinpoint exactly what was happening.

What needs improvement?

We need more advanced querying against logs. While most issues I have had here can be alleviated by way of sending better-formatted logs, it would be cool to do SQL-type queries against our data.

We need a way to see dashboard metadata. We launched a huge customer, and we saw more people using Datadog than ever across the entire organization, yet had no way to tell.

It would be ideal if we had some way to compare arbitrary date times more easily. We would love to use the Diff Graph command against some hard-coded value, for instance, against some known event.

For how long have I used the solution?

I've used the solution for eight months.

What do I think about the scalability of the solution?

The scalability is great!

Which solution did I use previously and why did I switch?

We previously used New Relic. I was not part of the decision-making team that made the switch.

What was our ROI?

The ROI is the speed at which we can debug live sites. It has been excellent. It's amazing how many incidents we can capture before customers notice.

Which other solutions did I evaluate?

We looked into New Relic and a home-brewed solution as potential other options.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Infrastructure engineer at an insurance company with 10,001+ employees
Real User
Good infrastructure, helpful logs, and useful alerts
Pros and Cons
  • "It has a high-level insight into the infrastructure model of the application and provides important detailed data on the host and metrics, which is the main concern of our customers."
  • "I sometimes log in and see items changed, either in the UI or a feature enabled. To see it for the first time without proper communication can sometimes come as a shock."

What is our primary use case?

Our use case is to provide cloud organization application monitoring. I use it for insight into what host in what region has activity or what market is using Datadog to its fullest potential and utilizing that for cost. This may also help determine who is using monitoring and setting alerts or just setting up monitoring and not doing anything about it. The use case can also be to check when the host or applications are down, or if the usage of CPU, memory, etc., is too high.

How has it helped my organization?

The solution has improved our organization from a market perspective. We have multiple departments and need some time to gather that data from a grouping point of view. Grouping that data via tag or seeing the separation is easy. In addition, it provides metrics and insights for senior leadership to have a high-level view of usage and cost. Application teams have better insight into their application, outages, when to plan for patches, updates, etc. Also, they have a better understanding of where the data gaps may be.

What is most valuable?

The infrastructure is the most valuable. It has a high-level insight into the infrastructure model of the application and provides important detailed data on the host and metrics, which is the main concern of our customers. It provides confirmation that the layer where the application is running is monitored and will be alerted when it is down and not functional. The customers can have ease of mind knowing their metrics are accurately being measured. The value of data provided, including service name, logs, and all other pertinent details tied to the host, makes it a valuable source of data.

What needs improvement?

The solution can be improved via open communication to the broader audience on what has changed and what has not changed. I sometimes log in and see items changed, either in the UI or a feature enabled. To see it for the first time without proper communication can sometimes come as a shock.

For how long have I used the solution?

I have been using the solution for three years.

What do I think about the stability of the solution?

The stability is great.

How are customer service and support?

Technical support is great. Datadog has the resources and knowledge to tackle questions.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I did not previously use a different solution.

How was the initial setup?

The initial setup is straightforward.

What about the implementation team?

The initial setup was handled in-house.

Which other solutions did I evaluate?

I did not evaluate any other solutions.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Architect at a comms service provider with 10,001+ employees
Real User
Good for monitoring and following metrics with a helpful flame graph
Pros and Cons
  • "Flame graphs are pretty useful for understanding how GraphQL resolves our federated queries when it comes to identifying slow points in our requests."
  • "I often have issues with the UI in my browser."

What is our primary use case?

We use the solution primarily for distributed tracing, service insight and observability, metrics, and monitoring. We create custom metrics from outbound service calls to trace the availability of back-office systems. 

We use the flame graph to get insights into our GraphQL implementation. It helps highlight how resolvers work. 

However, it's lacking in tracing which GraphQL queries are run, and we use custom spans for that.
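To make the custom-metric idea concrete, here is a minimal sketch of emitting a counter for an outbound call in the DogStatsD text format (`name:value|type|#tags`) over UDP. The metric name `backoffice.call.count`, the tag keys, and the helper names are hypothetical; the host and port shown are the commonly used local Agent defaults and are an assumption about the deployment, not a description of ours.

```python
import socket


def format_metric(name, value, metric_type, tags=None):
    """Build a DogStatsD-style datagram: name:value|type|#key:value,key:value."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort tags so the output is deterministic.
        tag_text = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
        datagram += f"|#{tag_text}"
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a locally running agent (assumed address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# One counter increment per outbound call, tagged by target system and outcome.
line = format_metric(
    "backoffice.call.count", 1, "c", {"system": "billing", "status": "ok"}
)
# send_metric(line)  # uncomment where an agent is listening
```

Tagging each call by back-office system and outcome is what lets availability be charted per system from a single metric name.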

How has it helped my organization?

Prior, the team only had Instana, and few people used it. The main barriers to entry were the access (since it was not integrated into our SSO) and the user experience, which made it hard to follow. We had an on-prem version, and it wasn't the snappiest. The APM has made observability and tracing more accessible to developers.

What is most valuable?

Flame graphs are pretty useful for understanding how GraphQL resolves our federated queries when it comes to identifying slow points in our requests. In our microservice environment with 170 services, there are complex transactions over the course of a single user request, since we essentially operate as a middle layer integrating with 90 back-office systems.

What needs improvement?

I often have issues with the UI in my browser. I tend to have a lot of tabs open, yet have issues with it not responding or not showing data. A couple of times, pasting the URL into an incognito window shows the data that's there.

For how long have I used the solution?

I've used the solution for two years. 

How was the initial setup?

The initial setup was complex and required a bit of tweaking to get everything configured correctly and into our pipelines.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Software Engineer at a tech vendor with 1,001-5,000 employees
Real User
Great profiling and tracing but storage is expensive
Pros and Cons
  • "Anything I've wanted to do, I found a way to get it done through Datadog."
  • "When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself."

What is our primary use case?

We use the solution for application hosting and a little bit of everything when it comes to supporting a worldwide logistics tracking service. It's used as a central service for collecting telemetry and logs. We find it does the same work as all of our old tools combined, including Prometheus, Kibana, Google Logs, and more; putting all of this information in a single platform makes it easy to corroborate information and associate a request with the data, which might be lost when it is saved as logs.

How has it helped my organization?

At my organization, we have plenty of microservices written in different languages. Different teams prefer one or the other framework or library within those languages.

With Datadog, we can get in a single line and march in the same direction; our logs and metrics are collected in the same fashion, making it easy to find bugs or integration problems across services and understand how they interact with other systems.

What is most valuable?

I primarily prefer to utilize the profiling and tracing feature. It can potentially be used as a more-informed alternative to logs.

Beyond that, anything I've wanted to do, I found a way to get it done through Datadog. It allows for testing, logging, hardware monitoring, system performance, memory consumption, advanced observability, AI assistance, cross-team collaboration, and business analytics. Datadog helps some of the world’s biggest brands transform faster with the help of true AIOps, AI-assisted answers, UX and business analytics, cloud observability, and smart AI assistance.

It's all supporting my desire to build a great application, and in a centralized SaaS application, it's hard to say anything can beat it.

What needs improvement?

The storage of logs is a little bit unexpected; most services generate gigabytes of logs, and their size is not excessive. When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself.

For how long have I used the solution?

I've used the solution for one year.

What do I think about the stability of the solution?

We have no concerns with stability.

What do I think about the scalability of the solution?

There appear to be no issues with scaling.

How are customer service and support?

Technical support is slow. It takes forever to get responses from the support team.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I've previously used Kibana and Prometheus. We are still using these.

How was the initial setup?

Setting up through the environment variables made it unbelievably easy to get started.

What about the implementation team?

We've implemented the solution in-house.

What was our ROI?

I do not have this number off-hand, as I am not the finance guy. I just like the product.

What's my experience with pricing, setup cost, and licensing?

I'd advise new users not to start off by sending logs.

Which other solutions did I evaluate?

We did not really look at other options.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: April 2024