What is our primary use case?
We use it to follow up on user experience data for our banking applications. For example, when you open your mobile app to view your account, the click you make is measured in Dynatrace. It's stored, and we check the timing at each moment.
We are also following up the timing differences between our different releases. When we have a new version release, we are already checking within our test environment to see what the impact of each change is before it goes to production. And we follow that up in production as well.
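As a rough illustration of that kind of release-to-release comparison, here is a minimal Python sketch. It does not use any Dynatrace API; the function name, the sample timings, and the 10 percent threshold are all assumptions made up for the example.

```python
from statistics import median


def compare_releases(baseline_ms, candidate_ms, max_regression=0.10):
    """Compare the median duration of one user action across two releases.

    Returns (relative_change, regressed), where relative_change is
    (candidate median - baseline median) / baseline median.
    """
    base = median(baseline_ms)
    cand = median(candidate_ms)
    change = (cand - base) / base
    return change, change > max_regression


# Hypothetical timings (ms) for a "view account" action in a test environment.
old_release = [410, 395, 430, 405, 420]
new_release = [480, 470, 510, 465, 495]

change, regressed = compare_releases(old_release, new_release)
print(f"median change: {change:+.1%}, regression: {regressed}")
```

The median is used here rather than the mean so that a single slow outlier does not flag a release on its own; a real comparison would likely look at percentiles as well.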
In addition, we are following up the availability of all our different systems.
And root cause analysis is also one of the main business cases.
So we have three main use cases:
- To follow up on what's going on in production
- To react proactively to problems that could happen
- To get insights into all our systems, see the correlation between them and, in that way, improve our services to our end users
We use the on-prem solution, but it's the same as the SaaS solution that they are offering. They have Dynatrace SaaS and Dynatrace Managed, and ours is the Managed version. Currently we're on version 181, but that changes every month.
How has it helped my organization?
The support for dynamic microservices on Kubernetes really adds value. There is already a lot of monitoring functionality built into Kubernetes and Docker, and there are free tools like Prometheus which can display it. That's very good for technical people. For the owner of the pod itself, that's enough. But those things don't provide any business value. If you want business value from it, you need to lift it to a higher level, and that's where you need the correlations. You need to correlate what is happening between all these different services. What is the flow like between the services? How are they interconnected? That's where Dynatrace gives added value. You can also take the data coming from Kubernetes and include it in Dynatrace, meaning you have a single pane of glass where you can see everything. You can see the technical things, but you have the bigger business value on top of it as well.
Before Dynatrace, we were testing just by trying out the application ourselves and getting a feeling for the performance. That's how it very often would go. You would start up an application and it was judged by the feeling of the person who was using it at that moment in time. That, of course, is not representative of what the actual end-user feeling would be. We were totally blind. We actually need this to be able to be closer to the customer. To really care about the customer, you need to know what he is doing.
Also, incidents are resolved much faster by using Dynatrace. And that's for front-end, because we actually know what is going on. But it's also for server-side incidents where we can see the correlation. Using this solution our MTTR has been lowered by 25 percent. It's pinpointing the actual errors or the actual database calls, so it goes faster. But, of course, you still have to do it. It still needs to be implemented. It doesn't do the implementation work for you.
Root cause detection, which shows how the infrastructure components interact with each other, helps. We know what is going wrong and where to pinpoint it. Before, we needed to fill a room with all the experts. The back-end expert would say, "I'm not seeing anything on the back-end." And the network expert would say, "I'm not seeing anything on the network." When you see the interaction between the different aspects, it's immediately clear you have to search in your Java development, or you have to search in your database, because all the other ones don't have any impact on the performance. You see it in Dynatrace because all the numbers are there. It really helps with that. It also helps to pinpoint which teams should work on the solution. In addition to speeding up the process of finding your root cause, it's also lowering the number of people who need to pay attention to the problem. We need just a single team to work on it. All the rest can go home.
It has decreased our mean time to identification by 90 percent, meaning it only takes us one-tenth of the time it used to, because it immediately pinpoints where the problem is.
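To make the arithmetic behind those percentages concrete, here is a small Python sketch. The starting durations of 60 and 120 minutes are hypothetical figures chosen for the example; only the 90 percent and 25 percent reductions come from our experience.

```python
def time_after_reduction(before_minutes, percent_reduction):
    """Return the duration that remains after a percentage reduction."""
    return before_minutes * (100 - percent_reduction) / 100


# Hypothetical: identification used to take 60 minutes; a 90% reduction leaves one-tenth.
mtti_after = time_after_reduction(60, 90)

# Hypothetical: resolution used to take 120 minutes; a 25% reduction leaves three-quarters.
mttr_after = time_after_reduction(120, 25)

print(f"identification: {mtti_after:.0f} min, resolution: {mttr_after:.0f} min")
```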
Dynatrace also helps DevOps to focus on continuous delivery and to shift quality issues to pre-production because we are already seeing things in pre-production. We have Dynatrace in our test environment, so we have a lot of extra information there, and DevOps teams can actually work on that information.
Finally, in terms of uptime, it's signaling whenever something is down and you can react to the fact that it is down a lot faster. That improves the uptime. But the tool itself, of course, doesn't do anything for your uptime. It just signals the fact that it's down faster so you can react to it.
What is most valuable?
The most valuable aspect is the fact that Dynatrace is a correlation tool for all those different layers. It's the correlation from the front-end through to the database. You can see your individual tracks.
One of the aspects that follows from that is the root cause analysis. Because we have these correlations, we can say, "Hey it's going slow on the server side because a database is having connection issues," for example. So the root cause is important, but it's actually based on the correlation between the different layers in your system.
Dynatrace is a single platform. It has all these different tools but they are actually all baked into the OneAgent technology. Within that OneAgent (which is growing quite large, but that's something else) you have the different tool sets. You have thread analysis, memory dumps, Java analysis, the database statements, and so on. It's all included in this OneAgent. So the management is actually quite easy. You have this one tool, and on both the server side and the agent side there are ways of updating it semi-automatically. We don't have to do that much management on it. Even for the quite large environment that we have, the management itself is quite limited. It doesn't take a lot of time. It's quite easy.
The solution's ability to assess the severity of anomalies based on the actual impact to users and business KPIs is great. It's exactly what we need. The severity impact is based on the users, the availability, and the impact it has on your business.
We also use the real-user monitoring and we are using the synthetic monitoring in a limited way, for the moment. We are not using session replay. I would like that, but it's still being considered by councils within the company as to whether we are able to use it.
We are using synthetic monitoring to measure the availability of one of our services. It's a very important service and, if it is down, we want business to be notified about this immediately. So we have set up a synthetic monitor, which is measuring the availability of that single service each minute. Whenever there is a problem, an incident will be immediately created and forwarded to the correct person. This synthetic monitoring is just an availability check in HTTP. It's actually a browser which is calling up a page and we are doing some page checks on this page to be sure that it is available. Next to the availability, which the synthetic monitoring gives us, we also measure the performance of this single page, because it's very important for us that this page is fast enough. If the performance of this single page degrades, an incident is also created for the same person, and he can respond to it immediately.
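The logic of such a check can be sketched in a few lines of Python. This is only an illustration, not how Dynatrace implements it: it uses a plain HTTP request from the standard library rather than a real browser, and the function names, the expected page text, and the time limit are assumptions.

```python
import time
import urllib.error
import urllib.request


def evaluate_check(status_code, body, elapsed_ms, expected_text, max_ms):
    """Classify one check result.

    Returns None when the page is healthy, otherwise a short incident reason.
    """
    if status_code != 200:
        return f"availability: unexpected HTTP status {status_code}"
    if expected_text not in body:
        return "availability: expected page content is missing"
    if elapsed_ms > max_ms:
        return f"performance: page took {elapsed_ms:.0f} ms (limit {max_ms} ms)"
    return None


def run_check(url, expected_text, max_ms, timeout_s=10):
    """Fetch the page once and evaluate it; network failures count as downtime."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        status, body = exc.code, ""
    except OSError as exc:
        # Covers URLError, timeouts, and refused connections.
        return f"availability: request failed ({exc})"
    elapsed_ms = (time.monotonic() - start) * 1000
    return evaluate_check(status, body, elapsed_ms, expected_text, max_ms)
```

A scheduler would call `run_check` once a minute and open an incident for the responsible person whenever the result is not `None`. Keeping `evaluate_check` separate from the network call makes the availability and performance rules easy to test on their own.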
Real-user monitoring is a big part of what we are doing because we are focusing on the actual user experience. I just came from a meeting, 15 minutes ago, where we discussed this issue: a slowdown reported by the users. We didn't see anything on the server side but users are still complaining. We need to see what the users are actually doing. You can do that in debug tools, like Chrome Debugger, to see what your network traffic is and what your page is doing. But you cannot do that in production with your end-users. You cannot request that your end-users open their debug tools and tell you what's going on. That's what Dynatrace offers: insight like the debug tools for your end-user. That's also exactly what we need.
Most of the problems that we can respond to immediately are server problems, but most of the problems that occur, are front-end problems, currently. More and more, performance issues are located on the machine of the end-user, and so you need to have insight into that. A company of our size is obliged to have insight into how its actual users are doing. Otherwise, we're just blind to our user experience.
Dynatrace also provides a really nice representation of your infrastructure. You have all your servers, you have all your services, and you know how they communicate with each other.
What needs improvement?
It gives you a good view of all the services that are instrumented by Dynatrace, which is good, of course, and that's what it can do. But in our case, our infrastructure is a lot bigger than the part that is instrumented by Dynatrace, so we only see a small part of the infrastructure. There are a number of components which are not instrumentable, like the F5 firewalls, switches, etc. So it gives a good overview of your server infrastructure. That's great, we need that. But it's lacking a bit when it comes to network segmentation and switches. So it's not a representation of your entire infrastructure. Not every component is there.
The solution's ability to assess the severity of anomalies based on the actual impact to users and business KPIs is great. In my opinion, it could be extended even more. I would like it to be more configurable for the end-user. It would be nice to have more business rules applicable to the severity. It's already very good as it is now. It is based on the impact on your front-end users. But it would be nice if we could configure it a bit more.
Another area for improvement is that I would like the alerting to be set up a little bit more easily. Currently, it takes a lot of work to add alerting, especially if you have a large environment, and I consider our environment to be quite large. The alerting takes a lot of administration. It could be a lot easier. It would not be that complicated to build in, but it would take some time.
I would also like the visual representation of the graphs to be improved. We have control of the actual measures which are in the graphs, but we are not able to control how the axes are represented or the thresholds are represented. I do know that they are working on that.
For how long have I used the solution?
I have been using the Dynatrace AppMon tool for six years and we changed to the new Dynatrace tool almost three years ago.
What do I think about the stability of the solution?
We haven't had any issues with the stability of Dynatrace, and it's been running for a long time. We use the Managed environment, so it's an on-prem service, but it's quite stable. We are doing the updates pretty regularly. They come in every month but we are doing them every two or three months. First we do them in the test phase and then in the production phase. But we have not experienced any downtime ever.
What do I think about the scalability of the solution?
For us, Dynatrace is scalable and we haven't seen any issues with that. We did need to install a larger server, but that's because we have a managed environment. You don't have that problem if you go with the SaaS environment. We don't see any negative impact on the scale of our products, and we are already quite large. It's quite scalable.
In terms of the cloud-native environments we have scaled Dynatrace to, we are using Dynatrace on an OpenShift platform, which is a Docker Kubernetes implementation from Red Hat. We have Azure for our CRM system, which Dynatrace monitors, but we are not measuring the individual pods in there as it is not a PaaS; it's a SaaS solution of course.
As for the users of the solution, we make a distinction between the users who are deploying stuff and those who are managing the Dynatrace stuff. The latter would be my team, the APM team, and we are four people. The four people are installing the Dynatrace agents, making sure the servers are alright, and making sure the management of the Dynatrace system itself is okay.
The users of the tool are the users of the different business cases. That includes development and business. There are about 500 individual users making use of the different dashboards and abilities within Dynatrace. But we see that number of users, 500, as a bit small. We want to extend that to over 1,000 in the near future. But that will take some advertising inside the company.
How are customer service and technical support?
I use Dynatrace technical support on a daily basis. They have a live chat within the tool and that comes for free with the tool itself. All 500 of our users are able to use this chat functionality. I'm using it very frequently, especially when I need to find out where features or functionalities are located within the tool. They can immediately help you with first-line support for the easy questions and that saves you a lot of time. You just chat and say, "Hey, I want to see where this setting can be activated," and they say, "Just click this button and you will be there."
For the more complex questions, you start with tickets and they will solve them. That takes a little bit longer, depending on how complex your question is.
But that first-line support is really a very easy way to interact with these people, and you get more out of the tool, faster.
Which solution did I use previously and why did I switch?
We purchased the Dynatrace product because we had some issues with our direct channels, our customer-facing applications. There were complaints from the customer side and we couldn't find the solution.
There were also a number of our most important applications that needed more monitoring. We had a lot of monitoring capabilities on the server side and on the database side, but the correlation between all these monitoring tools was not that easy. When they came up with a problem they would say, "Hey, it's not the mainframe, it's not the database, it's not the network." But what was it? That was still hard to find out. And we were missing some monitoring on the front-end. The user experience monitoring was lacking. We investigated a number of products and Dynatrace came out as the best.
How was the initial setup?
We kind of grew into Dynatrace. Our initial scope was quite small, so it was not that complex. Currently, our scope is a lot broader, but it is not complex for us because we have been working with the tool for such a long time. Overall, it's quite straightforward. If you're starting with this product from scratch and you have to find out everything, it can take some time to learn the product. But it's quite straightforward.
We started with the AppMon tool, which was the predecessor to the current tool. Implementing that went quite fast because it was a very small scope. When we changed to the Dynatrace Managed it took us half a year. And that's not including the contract negotiations. That was for the actual implementation: Finding out all business cases and all the use cases that we had, transforming them into the new tool, and launching it live for a big part of our company. That took half a year.
What about the implementation team?
We hired some external experts from a company in Belgium, which is called Realdolmen. They really helped us in the implementation. They had experience in implementing Dynatrace for other companies already, so that really helped. And I would advise that approach. If you're doing it all by yourself, you are focusing on what your problems are, while if you are adding an external person to it, who is also an expert in the product itself, he will give you insights into how the product can benefit you in ways you couldn't have imagined.
What was our ROI?
The issue of whether Dynatrace has saved us money through consolidation of tools is something we are working on. There are a number of things that we are now replacing with things that are already present in Dynatrace. If you currently have a lot of different tools, it will save you money. But Dynatrace is not the cheapest tool. Money-saving should not be your first concern if you buy Dynatrace.
It depends on your business case, but as soon as you are at a reasonable size and you have different channels to connect within your company — mobile and web and so on — you need to have a view into your infrastructure and that's where Dynatrace provides real benefits. It's not for a simple company. It's not for the bakery store around the corner. But as soon as you hit a reasonable size, it gives enough added value and it's hard to imagine not having it or something comparable.
"Reasonable size" depends a bit on your industry. But it is connected with the number of customers you have. We have about 25,000 concurrent customers, at a given moment in time. As soon as you have more than 1,000 concurrent customers, you need this tool to have enough analysis power. It gives you power for tracking the individual user and it gives you the power to aggregate all the data, to see an overview of how your users are doing. This combination really gives you a lot of benefits.
What's my experience with pricing, setup cost, and licensing?
It is quite costly. Dynatrace was the most expensive, compared to the other products we looked at. But it was also a lot better. If you want value for your money, Dynatrace is the way to go.
Which other solutions did I evaluate?
In my opinion, the product is extremely good compared with the alternatives. We compared it to AppDynamics and New Relic and we saw that Dynatrace is actually the best product there is. If you are looking for the best, Dynatrace will be your product.
What other advice do I have?
The biggest lesson that I have learned from Dynatrace is that application performance monitoring is very complex, but the easiest part of it is the technical aspect. The more complex thing is all the internal company politics around it. We see a lot of data and if you target some people and say, "Hey, your database is going slowly," they will respond to it very defensively. If they have their own monitoring tools, they can say, "Oh no, my database is going very fast. See, my screen is green." But we have the insights. It's all data, and gathering the data is the technical aspect. That's easy. But then convincing people and getting people to agree on what is obvious data is far more complex than the technical aspects.
The way to overcome that is talking. Communication is key.
I'm a little bit skeptical about the self-healing. I have heard a lot about it. I have gone through some Dynatrace instances where they have this self-healing prophecy. I think it's difficult to do self-healing. We are not using it in our company. There is a limited range of problems that you can address with it. It's only if you definitely know that this solution will work for this problem. But problems are always different, every time. And if you have specific knowledge that something will work if a particular problem arises, most of the time you can just avoid having the problem. So I'm a little bit skeptical. We are also not using it because we have a lot of governance on our production environment. We cannot immediately change something in production.
We are using dynamic microservices within a Kubernetes environment, but the self-healing is a little bit baked into these microservices. It's a Docker Kubernetes thing, where you have control over how many containers or pods you want to spin up. So you don't need an extra self-healing tool on top of that.
In terms of integrating Dynatrace with our CI/CD and ITSM tools, we are working on both of those directions, but we are not there yet. We have an integration with our ITSM tool in the sense that we are registering incidents from Dynatrace in our ServiceNow. But we are not monitoring it as a component management system.
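As a hedged sketch of what registering a problem as an incident can look like, the Python below maps a problem notification to the fields of a ServiceNow incident record. The input keys (`id`, `title`, `severity`, `impacted_entity`, `url`) and the severity-to-urgency mapping are assumptions for illustration; in practice they depend on how the notification payload is configured. The sketch only builds the record and does not send it anywhere.

```python
import json

# Assumed mapping from problem severity to ServiceNow urgency (1 = high, 3 = low).
URGENCY_BY_SEVERITY = {
    "AVAILABILITY": "1",
    "ERROR": "2",
    "PERFORMANCE": "2",
    "RESOURCE_CONTENTION": "3",
}


def to_servicenow_incident(problem):
    """Translate a (hypothetical) problem notification into incident fields."""
    severity = problem.get("severity", "")
    return {
        "short_description": f"[Dynatrace] {problem['title']}",
        "description": (
            f"Impacted: {problem.get('impacted_entity', 'unknown')}\n"
            f"Details: {problem.get('url', 'n/a')}"
        ),
        # Unknown severities fall back to the lowest urgency.
        "urgency": URGENCY_BY_SEVERITY.get(severity, "3"),
        # Storing the problem ID lets later updates find the same incident.
        "correlation_id": problem["id"],
    }


# Hypothetical notification, as it might arrive from a webhook.
example = {
    "id": "P-12345",
    "title": "Response time degradation",
    "severity": "PERFORMANCE",
    "impacted_entity": "account-service",
}
incident = to_servicenow_incident(example)
print(json.dumps(incident, indent=2))
```

The resulting dictionary is the kind of JSON body one would post to the ServiceNow incident table; authentication, retries, and closing the incident when the problem resolves are left out.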
We are not doing as much as I would want to for these Quality Gates. That can be improved in our company. Dynatrace could help with that, but I would focus on something else like Keptn, or something else that integrates with Dynatrace, to provide that additional functionality. Keptn would be more suitable for that, than the Dynatrace tool itself, but they are closely linked together. For us, that aspect is a work-in-progress.
I would rate Dynatrace a nine out of 10, because it has really added value to my daily business and what I have to do in performance analysis. It can be improved, and I hope it will be improved and updates will be coming. But it's still a very good tool and it's better than other tools that I have seen.
Which deployment model are you using for this solution?