What is our primary use case?
It's used in two major use cases:
- Monitoring and our own internal IT operations.
- We provide our customers access to Dynatrace tenants so they can also leverage it while developing their code running on our platform.
It does full stack monitoring for internal operations, problem diagnostics, APM use cases, and performance management for our customers.
We have multiple instances of Dynatrace running, where about half of them are running in our data centers and the other half are running in the public cloud. Therefore, it's a hybrid deployment. We use a mixture of cloud providers, including AWS, Microsoft Azure (running Kubernetes), and Google Cloud Platform.
We have traditional deployments on VMware virtual machines as well as running stuff in the cloud. We have a couple hundred Kubernetes clusters monitored using Dynatrace. Dynatrace's functionality in this area is unmatched, combined with its full stack visibility, ease of deployment, and handling of completely dynamic changes. The container environments are also dynamic since you have microservices spinning up and down as you go. I have never seen another tool doing this with the same reliability.
How has it helped my organization?
Dynatrace has improved our organization through operational support. We also have a large services organization which directly works with customers, and sometimes you run into situations where customers ask how they can improve their applications. Traditionally, these service teams would go for assessments. Eventually, they would even go onsite and through performance workshops with them to find some low-hanging fruit that they could address, and this was very tedious work. By introducing Dynatrace, you suddenly have real-time data. Then, the process of doing performance reviews switches from workshops or a defined time frame analysis (and then taking actions) to a more continuous approach where you constantly have Dynatrace performance data of the landscape.
Service engineers save a lot of time because they can just go in, look at the data, and share it with the customer, who has the same view, and say, "Here's an improvement which can be immediately implemented." It's not like a collection of big, multiple findings that are consolidated into one results presentation, after which the customer needs to do something. It's more like a continuous performance analysis and improvement process, which is more efficient than those workshop approaches. That's one of the biggest advantages that our services team sees because it helps DevOps to focus on continuous delivery and shift quality issues to pre-production.
Dynatrace is tightly integrated with ITSM. It's integrated with ServiceNow, which our support team is using.
We provide a platform, then the customer ships the code and deploys it. Therefore, we rely on testing by the customer, and sometimes, they miss something and it breaks. Then, it doesn't work as expected so we have to step in, and say, "Yes, your site is down," or "It's not functioning properly." We do the analysis because typically the customer says, "Okay, it's not us. It must be you as the service provider." This is where we gain a lot of efficiency. The support team is the first line of defense there. They get the information they need to quickly pinpoint the problem. E.g., the customer deployed, then two hours later, issues were occurring. This is when you don't want to waste time. Our support engineers need the visibility so they can immediately communicate to the customer, saying, "Yes, it's on our side," or "It's on your side." If it's on the customer's side, they can let them know exactly where they need to go. This is where we gain most of the time.
It helps our operations that the solution uses a single agent for automated deployment and discovery. If you think about all the work in the past where we had different agents, tools, or scripts deployed to monitor specific aspects of an environment, then having one agent definitely helps. For example, for our rollout, when we migrated all the different tools to Dynatrace, we did this over the weekend. We installed the agent, then just watched the data and findings coming in, which was a huge benefit. We installed one thing and it discovers everything.
I suppose the solution has decreased time to market for our individual customers with new innovations/capabilities. Dynatrace helps them gain better insights, allowing them to do another deployment faster.
What is most valuable?
It has auto detection of almost everything. With the full stack capabilities, getting one agent deployed allows you not to worry about anything else because the agent detects everything. This is in combination with the AI so you don't need to worry about any baselines or setting up any thresholds. This is all done automatically, which brings us the biggest benefit.
Configuration as code integrating through APIs is really important when automating at scale. If you think about the tens of thousands of hosts that you deploy to, then APIs are key when automating deployments, the management of those instances, and configuration as well as integrating with other systems. This would not work without sophisticated, far-reaching APIs.
Dynatrace easily integrates with our infrastructure or applications, then reliably triggers self-healing actions or remediation actions. This is something that we really love to use because it definitely removes a lot of human interaction. You just let the machine do the job and can trust it, and that's the most important thing. I have seen systems where the users were very reluctant to trust the system to take actions where typically a human would do the job manually. Dynatrace considers all the information that it gathers, then triggers self-healing actions which are quite reliable. It doesn't need a lot of human adjustment to make it work.
We use real-user monitoring a lot to get insight into end users and our customers, e.g., customer behavior.
What needs improvement?
While the integrations are great, sometimes our customers are not as far along in Dynatrace concepts from a technical perspective as they need to be, whether it's a cultural thing or an educational thing. Thus, some of our customers are not as advanced as Dynatrace would like them to be. From a technical perspective, all the capabilities are there but the concepts are not yet spread out within the ecosystem to their fullest extent. Therefore, Dynatrace is ahead of its time.
Documentation could be improved. E.g., sometimes you don't know how to properly use Dynatrace because the documentation is lagging behind the features being deployed.
On very large deployment scenarios, the APIs for configuration and configuration management came in slowly. This is something that is good already but could be better.
In the product, I am missing some configuration automation APIs.
For how long have I used the solution?
The company has been using Dynatrace on different occasions for the past eight years. The current product of Dynatrace has only been out for four years.
What do I think about the stability of the solution?
We operate services for our customers with pretty high SLAs. We guarantee the systems we run are reliable. We also guarantee uptime. In the past three years, we have run up to 50 updates with Dynatrace and had only one or two issues where the system had to be brought down. There are almost no issues at all with stability. It is rock-solid.
They are improving constantly with every release and adding new stuff. We have updates about every two weeks.
What do I think about the scalability of the solution?
We have about 2,500 people using it.
We currently manage seven Dynatrace clusters with several thousand Dynatrace tenants, and in total almost 30,000 hosts are monitored with Dynatrace. We're not reaching the limits of Dynatrace's scalability. This is probably one of the largest deployments, but we have not seen any limitations so far.
We want to leverage even more services:
- Real-user monitoring
- Possibly look into session replay.
- Expand the footprint of synthetic monitoring.
- Build more integrations by leveraging all the data Dynatrace captures for custom metrics into our BI reporting, billing systems, internal cross charging functionality, and scaling/optimizing our environments in terms of resource usage.
There is a lot of data in Dynatrace at the moment that we do not fully utilize.
How are customer service and technical support?
The technical support is great. We have a pretty good contract with Dynatrace for contacting support. They are pretty responsive and very knowledgeable. You get a DevOps engineer from Dynatrace jumping on immediately with very high expertise. You don't get the typical Level 1 automated standard reply of "Yes, we will take care of it," after which you have to ping back.
Which solution did I use previously and why did I switch?
We came from a former product of Dynatrace, which was called AppMon and is not really sold anymore, though there are customers out there who still use it. We used it for the traditional APM scenario, then migrated to Dynatrace to extend the visibility for hybrid cloud deployment.
We had been using a mixture of Opsview, Splunk, SolarWinds, and other tools. We switched because of the complexity of managing all these tools. It became unmaintainable. E.g., historically, people would write scripts for Nagios Opsview, then maintain them. If we lost the people who had been maintaining those scripts, then nobody knew how the checks worked for those custom scripts. Also, the maintenance overhead was pretty high.
From the perspective of the end users using different monitoring solutions, you had different teams who had to go to different tools and contend with data in one tool not being exactly the same data as another tool. While the overlap between tools was there, the complexity in accessing those tools and knowing how to use those tools became such a big organizational and maintenance overhead that we decided to pull them all into one tool to harmonize it. We wanted one tool where the interface and data are the same regardless of whatever you're monitoring.
How was the initial setup?
The initial setup was straightforward. We looked into Dynatrace and were able to roll it out to 12,000 hosts within four weeks.
With the Managed version, you can have it installed and up and running in less than an hour. This is on the condition that you have the hardware to install it on and access to the systems/services that you want to monitor.
Initially, some people were skeptical about the one agent really working, so we did test it. Now, we have had so many good experiences that when we deploy, build new services, or spin up new instances, Dynatrace is one of the first things that is always there. We don't even test the agents anymore. We completely rely on this mature product that is solid and stable when we deploy staging, development, QA environments, or playgrounds. There is no deployment without Dynatrace agents.
What about the implementation team?
We deployed Dynatrace ourselves as we have a lot of experience working with it. Deploying Dynatrace depends on the environments that you run it on. Since that was all orchestrated with things like Puppet, Chef and Ansible for us, it was just a matter of writing a bit of automation code that wasn't already in place. One person was needed to do this properly, and it is not that hard because it applies to almost every environment that we deploy. For new services that we provide, it's done within the development teams writing those services. Therefore, there is no dedicated Dynatrace team responsible for integrating Dynatrace with services.
There is an API for almost everything. If you run it Managed, this means you have to administer Dynatrace's installation yourself. You run it and take care of some prerequisites, like sizing. Any system updates, bug fixes, or upgrades to the whole cluster require almost zero maintenance. All you need to do is confirm it or let Dynatrace update itself. In the past three years, we had almost 50 updates or installations where we didn't even need to touch anything. We just had one or two occasions where an update broke functionality, and those were fixed with the next update and within hours. It's almost self-maintaining.
We do have a dedicated staff for maintenance, but this team is not spending a lot of time on actually managing Dynatrace. They do the integrations of Dynatrace and other tools as well as development of custom integrations and configurations. This team is also responsible for the infrastructure and ensuring the machines Dynatrace runs on are scaled or adjusted properly. However, this is minor effort for them. We have a dedicated team of 20 to 30 SRE engineers and their responsibility is not only to Dynatrace. They are responsible for the whole infrastructure and surrounding tools.
What was our ROI?
As we use it internally, our internal operations have gained a lot more efficiency. The time to triage and resolve problems in different environments has been reduced by 50 percent, if not more. When Dynatrace raises a problem, the team does not need to bring together experts from other teams to look at the problem, log files, etc. You almost have Dynatrace training our support engineers because it's so easy to pinpoint the root cause of problems.
The solution has decreased our mean time to identification by approximately 50 percent.
There has been a positive impact on the instances run for our customers. Overall, uptime got better because we became faster at fixing the problems causing downtime.
The solution has saved us money through the consolidation of tools. With a hybrid landscape, we had multiple tools. When we consolidated, we replaced four or five other monitoring tools with one. For the last ROI calculation that I did, Dynatrace was saving us up to $500,000 per year.
In addition, our speed is up 40 to 50 percent. Therefore, our human cost and licensing savings together are one to two million.
What's my experience with pricing, setup cost, and licensing?
We are a very big customer. We obviously have a special price point.
If there are no corporate requirements to run Dynatrace Managed (operating it yourself), I would definitely go for the SaaS option. For small and medium-sized companies, the SaaS option is probably the cheapest one. You don't need to look into operating it. You don't need to run hardware. It is pay as you go.
We looked into what Dynatrace could actually replace. If the price point is high, think about the impact it would have on the entire organization to consolidate and replace monitoring tools. If implemented correctly, then it has a lot of savings potential for the organization. That is something that should go into any ROI calculation.
Which other solutions did I evaluate?
We looked at the other big players in this space: New Relic and AppDynamics. Looking at the cloud, full stack capabilities, ease of deployment, and scalability that Dynatrace has, it definitely stood out in comparison. The full stack story was pretty compelling, where you have one agent deployed and it provides everything.
What other advice do I have?
Trust what it's doing. Don't question what it's doing. If you don't understand it yet, take the time to try to understand it. Do not implement or force the old ways of monitoring onto a completely different approach, like Dynatrace. That's definitely the biggest lesson a lot of people in our organization had to go through.
Be curious and embrace the different approach. It is definitely worth it. The different approach that it does is a good one. It's different but it's something that actually works. Those guys know what they have built and what they are doing.
It is partly integrated with CI/CD. We are operating a platform with our applications, but our customers are responsible for testing and CI/CD deployed into our environments. Internally, some of our teams use it. The majority of our CI/CD deployment is our customers' responsibility, and while we do provide them Dynatrace for CI/CD, we do not control how they integrate it.
We are in the process of rolling out synthetic monitoring at scale to replace other tools.
We are not yet using session replay, which is mostly due to data compliance restrictions. We have very strict data privacy protections. We do have customers who are highly interested in using the feature, but we are not using it at the moment.
Overall, I would give the solution a clear 10 (out of 10).
Which deployment model are you using for this solution?