What is our primary use case?
We are using it to monitor our e-commerce applications and the full stack that our e-commerce applications run on. That includes both our Rack Room Shoes domain and our Off Broadway Shoes domain. We use it to monitor the overall health of the entire stack, from the hardware all the way to the user interface. And more specifically, we use it to monitor the real user's experience on the front-end.
How has it helped my organization?
What Dynatrace has really allowed our team to do is focus more on innovation, rather than on monitoring and bug-squashing. Now that we have a tool like Dynatrace, we can continue to do forward-thinking projects while Dynatrace does the monitoring and identifies the root causes. We're spending a lot less time trying to find out what the problem is, because Dynatrace pinpoints where the problem is. We can then validate and remediate much quicker. That's the impact it's had on our business.
The automated discovery and analysis helps us to proactively troubleshoot production and pinpoint the underlying root cause. We recently had some issues with database connections. Our database team was scratching their heads, not really knowing where to look. Because we had some of the Oracle Insights tools built into the database, we were able to use Dynatrace to show, down to the SQL statement, which queries were taking up the most resources on that machine. We provided that to the database team and that gave them a head-start in refactoring the data so it was quicker to query. That really helped us speed up the user experience for that specific issue.
Dynatrace helps DevOps to focus on continuous delivery and to shift quality issues to pre-production. We are just now starting to use it in that way. When we first launched Dynatrace, we only had monitoring in our production environment. At that point we were using it as an up-front, first-alert tool for any issues that were happening. Now we're instrumenting our lower environments with Dynatrace so that we can monitor our load-testing in those environments and find out where our breaking points are. So it does allow us to push out products that are much more stable and much less buggy because we're able to find out where our breaking points are in the lower environments. What this is going to do is allow us to push out, at a faster rate, more solid, less buggy releases and customer features, and allow us to continue to innovate on the next idea. We're just starting that journey. We just got fully instrumented in our lower environments in the last couple of weeks.
In terms of 360-degree visibility into the user experience across channels, we're only monitoring our digital channels right now, specifically our e-commerce channels. But we do have ways, even within the channel, to dissect by the source they came from. Did a given customer come from a digital ad? Did they come from an email? Did they come to us direct? It does allow us to segment our customers and see how each segment of customer performs as well. This is important for us because we want to make sure that we're not driving specific segments of customers into a bad-performing experience or to a slow response time. It also allows us to adequately determine where to spend our marketing dollars.
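As a rough illustration of the segmentation idea, here is a minimal Python sketch that groups sessions by traffic source and compares average load time per segment. It is not Dynatrace's API; the field names (source, load_time_s) and the sample values are hypothetical and exist only to show the shape of the calculation.

```python
from collections import defaultdict
from statistics import mean


def load_time_by_source(sessions):
    """Group sessions by traffic source and return the mean load time per source."""
    grouped = defaultdict(list)
    for session in sessions:
        grouped[session["source"]].append(session["load_time_s"])
    return {source: round(mean(times), 2) for source, times in grouped.items()}


# Hypothetical sample data, not real measurements.
sample_sessions = [
    {"source": "email", "load_time_s": 2.0},
    {"source": "email", "load_time_s": 3.0},
    {"source": "digital_ad", "load_time_s": 4.5},
    {"source": "direct", "load_time_s": 1.5},
]

averages = load_time_by_source(sample_sessions)
# The segment with the highest average load time is the one to investigate first.
slowest = max(averages, key=averages.get)
```

A comparison like this is what lets a team notice that one acquisition channel is landing customers on a slower experience than the others.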
Another benefit is that it has definitely decreased our mean time to identification, with the solution and the Davis AI engine bringing the most probable root cause to the top. And within that, it gives us the ability to drill down into the specific issue or query or line of code that is the issue. So it has saved us a lot of time — I would estimate it has saved us 10 hours a week — in remediating issues and trying to find the root cause.
It has also improved uptime, indirectly. Because it gives us alerts early, we're able to mitigate issues before they're actually bigger issues.
What is most valuable?
The alerting systems are definitely the most valuable feature. The AI engine, "Davis," has proved to be a game-changer for us, as it helps to alert us when there are anomalies found in our applications or in their performance. We find that very helpful. There's still a human element to the self-healing capabilities. I wish I could say, "Oh, it's magic. You just plug it in and it fixes all your problems." I wouldn't say that, but what I would say is that the Davis engine gives us that immediate insight and allows us to cater to our solution so that the next time that problem arises it can mitigate it without a lot of human involvement.
Dynatrace's ability to assess the severity of anomalies, based on the actual impact to users and business KPIs, is really good, out-of-the-box. But it does an even better job when, again, we as humans give more instruction and provide more custom metrics that we're trying to monitor that are key to our business. And then, letting the Davis engine find those anomalies and push them to the top, especially as they relate to business impact, is very valuable to us.
We find the solution's ability to provide the root cause of our major issues, down to the line of code that might be problematic, to be valuable.
And we get a lot of value out of the Session Replay feature that allows us to capture up to 100 percent of our customers' real user experiences. That's helped us a lot in being able to find obscure bugs or make fixes to our applications.
We also use real-user monitoring and Synthetic Monitoring functionalities. We use real-user monitoring for load times, speed index, and overall application index. And we use Synthetic Monitors to make sure that even certain outside, third-party services are available to us at all times. In certain cases, we have been reliant on a third-party service, and our Dynatrace tool has let us know that that service isn't available. We were able to remove that service from our website and reach out to the service provider to find out why it wasn't available.
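The third-party availability check described above can be sketched, in a very simplified form, as a script that requests each dependency and reports the ones that do not respond successfully. This is only an illustration using Python's standard library, not how Dynatrace Synthetic Monitoring is implemented; the function names and the injectable fetch_status parameter are assumptions made for the example.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def unavailable_services(urls, fetch_status=http_status):
    """Return the URLs whose status is missing or outside the 2xx/3xx range."""
    down = []
    for url in urls:
        status = fetch_status(url)
        if status is None or not (200 <= status < 400):
            down.append(url)
    return down
```

Passing the status function in as a parameter keeps the check easy to exercise without real network calls, for example by supplying a dictionary lookup in place of http_status.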
We also find it to be very easy to use, even for some of our business users. Most of the folks who use the Dynatrace tool do tend to be in the technical field, but use is spread across both the business side, what we call our omni-channel group, as well as our IT group. They all use it for different purposes. I'm beginning to use it on the business side to show the impact that performance has on revenue risk. I can then go back and show that when we have bad performance it affects revenue. And I can put a dollar amount on that. So the user interface is very easy to use, even for the business user.
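To make the idea of putting a dollar amount on bad performance concrete, here is a minimal sketch of one possible back-of-the-envelope formula: sessions affected, multiplied by the drop in conversion rate, multiplied by average order value. The formula, the function name, and every number used with it are assumptions for illustration; they are not figures from our business or a calculation that Dynatrace itself performs.

```python
def revenue_at_risk(sessions, baseline_conversion, degraded_conversion, average_order_value):
    """Estimate revenue lost when slow performance lowers the conversion rate.

    All inputs are supplied by the caller; nothing here is measured automatically.
    """
    lost_orders = sessions * (baseline_conversion - degraded_conversion)
    # A negative result would mean conversion improved, so clamp the loss at zero.
    return round(max(lost_orders, 0.0) * average_order_value, 2)


# Hypothetical inputs: 10,000 affected sessions, conversion falling from 3% to 2%,
# and an average order value of 80 dollars.
estimate = revenue_at_risk(10_000, 0.03, 0.02, 80.0)
```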
What needs improvement?
Dynatrace continues to innovate, and that's especially true in the last couple of years. We have continued to provide our feedback, but the one area that we get value out of now, where we would love to see additional features, is the Session Replay. The ability to see how one individual uses a particular feature is great. But what we'd really like to be able to see is how a large group of people uses a particular feature. I believe Dynatrace has some things on its roadmap to add to Session Replay that would allow us those kinds of insights as well.
For how long have I used the solution?
We started using Dynatrace in September of 2017. At that time it was an older product called AppMon. But we quickly upgraded to the current Dynatrace platform the following year. We've been using the SaaS platform ever since.
What do I think about the stability of the solution?
It's been very stable. We've had very little downtime. In the last four years there may have been one outage. Overall, it's been extremely stable. Many times, Dynatrace is our first alert that we have issues with other platforms.
What do I think about the scalability of the solution?
It's extremely scalable. We're one of the small players. We're running with about 70 agents right now. We've been at Dynatrace's conferences and have heard of customers who can deploy 5,000 agents over a weekend and have no issues at all. For our small speck-of-sand space, it's extremely scalable.
We are hosted on Google Cloud. That's where all of our VMs are currently set up. Our database is there, our tax server is there. All of our application and web servers are there, and Dynatrace is monitoring all of that for us. We haven't encountered any limitations at all in scaling to our cloud-native environment. We can spin up new auxiliary servers in a matter of minutes and have Dynatrace agents running on them within 15 minutes. We're starting to play a little bit with migrating a version of our application into a Kubernetes deployment and using Dynatrace to monitor the Kubernetes containers as well.
We have plans to increase our usage of Dynatrace. We just recently updated our hosts. We needed to increase the number of host units so that we could put Dynatrace on more servers, and we've already just about used up all of those. So next year, we'll likely have to increase those host units again. And we're going to start using more pieces of Dynatrace that we haven't used before, like management zones and custom metrics.
How are customer service and technical support?
Technical support has been great. The first line of defense is their chat through the UI, which is really simple. They're super-responsive and usually get back to us within minutes. We have a solutions engineer that we can reach out to as well, and they have been very helpful, even with things like setting up training sessions and screen-sharing sessions to help enable our internal teams to be more productive using the tool.
Which solution did I use previously and why did I switch?
We were using a tool called New Relic and we were really just using it as a synthetic monitor to make sure the application was up and running, but we really weren't getting a lot of insights. When we decided that we wanted a tool that could give us more insights and the ability to monitor more of our customers' behaviors, there just wasn't another tool that we felt could do things as well as Dynatrace, through a "single pane of glass." We chose Dynatrace over New Relic at the time because New Relic just didn't have any solutions like it.
We haven't found another tool that can help us visualize and understand our infrastructure, and do triage, like Dynatrace. We haven't found one that can give us that full visibility into the entire stack from VM all the way to the UI. That was really the reason we picked Dynatrace. There just wasn't another tool that we felt could do it like Dynatrace.
The fact that the solution uses a single agent for automated deployment and discovery was the second reason that we chose Dynatrace. The ease of deployment, the fact that we could use the one agent and deploy it on the host and suddenly light up all of these metrics, and suddenly light up all of these dashboards with insights that we didn't have before, made it extremely attractive. It required a lot less on our part to try to do instrumentation. Now, as we add more Dynatrace agents to more of our back-end servers, we think we'll gain even more value out of it.
How was the initial setup?
We started with AppMon, which was more of an on-premises version that we installed ourselves, although it still used one agent. Then we moved to the SaaS solution. It was very easy for us to migrate from AppMon to the SaaS solution, and it's been extremely easy to instrument new hosts with the agent.
We were up and running within 30 days when we were first engaged with AppMon. When we migrated to the SaaS solution, it maybe took another 30 days and might have even been less. I wasn't involved with that migration, but I worked closely with the guy who was. I don't remember it taking much longer than 30 days to migrate.
We had an implementation strategy. We knew specifically which application we wanted to monitor, and all of the hardware and services and APIs that that application was dependent on. We went in with a strategy to make sure that all of those things were monitored. And now we've progressed that strategy to start monitoring more of our internal back-end systems as well — the systems that support our stores, not just our e-commerce channel — to see if we can't get more value and maybe even realize more cost savings on our brick and mortar side using Dynatrace.
What was our ROI?
We have definitely seen return on our investment. It has come in the form of being able to produce more stable, less buggy applications and features, and in allowing our team to focus more on innovating new ideas that drive revenue and business, versus maintaining and troubleshooting the existing application.
It hasn't yet saved us money through consolidation of tools, but as we continue to find more value in Dynatrace, it does make us look at other tools and see if we are able to use Dynatrace to consolidate them. We have replaced other application monitoring tools with Dynatrace, but we've not yet consolidated tools.
What's my experience with pricing, setup cost, and licensing?
Whatever your budget is, you can manage Dynatrace and get value out of it, but you need to manage it to what your needs are. That's the one thing we found. We did not budget the right amount to begin with. It has cost us more in the long run than if we had been able to negotiate it upfront. But we didn't really know what we didn't know until we'd been using Dynatrace for a while.
Your ability to capture Session Replay is based on the number of what they call DEM units, digital experience monitoring units. That's where we were short to begin with. The expense is determined not just by the platform subscription but also by the number of host units that you want to run and the number of DEM units that you need to capture all of the user experiences that you want. In our case, we wanted the ability to capture 100 percent. Maybe in another business someone would only be worried about capturing a sampling of the traffic.
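For anyone sizing this up front, a minimal sketch of the budgeting arithmetic might look like the following. The per-session unit rates are deliberately left as caller-supplied parameters, because the actual consumption rates come from your own Dynatrace agreement; the function name, parameter names, and example numbers are all hypothetical.

```python
import math


def dem_units_needed(monthly_sessions, capture_fraction,
                     units_per_replay_session, units_per_plain_session):
    """Estimate monthly DEM units for a given share of sessions captured with replay.

    The two unit rates are placeholders: take the real values from your contract.
    """
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be between 0 and 1")
    replay_sessions = monthly_sessions * capture_fraction
    plain_sessions = monthly_sessions - replay_sessions
    total = (replay_sessions * units_per_replay_session
             + plain_sessions * units_per_plain_session)
    # Round up, since a partial unit still has to be budgeted for.
    return math.ceil(total)


# Hypothetical comparison: capturing every session versus sampling half of them,
# with placeholder rates of 1.0 and 0.25 units per session.
full_capture = dem_units_needed(100_000, 1.0, 1.0, 0.25)
half_capture = dem_units_needed(100_000, 0.5, 1.0, 0.25)
```

Running the numbers for both full capture and a sampled rate before negotiating makes the cost of a 100 percent capture goal visible early.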
Which other solutions did I evaluate?
We evaluated New Relic, AppDynamics, AppMon, which was the Dynatrace solution at the time, and we also looked at Rigor.
Dynatrace could do pretty much everything. It wasn't just the real-user monitoring piece of it. It was also the full stack health aspect. The Davis AI engine was probably the biggest differentiator among all of the tools. The Davis AI engine and its ability to surface the root cause was a game-changer.
What other advice do I have?
My advice would be to jump all-in. There doesn't seem to be another tool that can do it like Dynatrace, and from what we've seen the last two times we've gone to their Dynatrace Perform conferences, they are dedicated to innovating and adding features to the platform.
We are not yet using Dynatrace for dynamic microservices within a Kubernetes environment. We are beginning to play in that arena. We're looking at tools that will help us migrate from our current VM architecture to a Kubernetes deployment architecture, to enable us to get more into a no-DevOps type of environment. But today, we're still on a virtual machine deployment architecture.
Similarly, we have not integrated the solution with our CI/CD and/or ITSM tools. That is on our roadmap. As we migrate and transition into a no-DevOps and continuous integration/continuous deployment operation, we'll begin to use Dynatrace as part of our deployment processes.
The solution hasn't yet decreased our time to market for new innovations or capabilities, but we believe that we will realize that benefit going forward, since we'll be leveraging Dynatrace in our lower environments to find the breaking points of new features that we release.
We have half-a-dozen regular users who range from our e-commerce architect to DevOps engineers to front-end software developers. My role as a user is more of a senior-level executive or sponsor role. We also have some IT folks, some database administrators and some CI people, but most of our users are in the IT/technical realm.
We don't have a team dedicated to maintaining the solution. We do have a team responsible for it, though. That is the team that just helped instrument our lower environment with Dynatrace. We've got some shared responsibilities and some deployment instructions that are shared across three different groups: IT; our omnichannel group, which is really our business side; and a third party that we leverage for staff augmentation, which uses Dynatrace to help us monitor during our off-hours.