What is our primary use case?
We have several uses for Dynatrace. Most of the time, we use Dynatrace for looking into potential site problems, investigating reported issues, and trying to replicate those problems in a test environment using the information provided by Dynatrace.
We use Dynatrace for performance monitoring. Each quarter, we specifically look for anything that we can optimize on the front end of our website, that is, what you see and interact with on the web page.
We also use it to get ahead of any potential problems in our stack. E.g., if Dynatrace is indicating a problem, we will look into it and determine if it's affecting users. Depending on its impact, and usually if it's impacting customers, we can use that information to decide on what we need to work on next to benefit the customer experience.
I use the tool as more of an analyst. I will use Dynatrace to show where systems need to be fixed, etc.
This solution is SaaS. We use Google Cloud Platform, where we just use their compute engines for our hosts. We also have a few services that are on-prem. Dynatrace works fine with both of them.
How has it helped my organization?
The solution helps our DevOps to focus on continuous delivery and shift quality issues to pre-production. We recently got a staging environment implemented with Dynatrace. We are mainly using it for load testing at the moment. Dynatrace has been detecting failures, letting us know immediately what types of failures are occurring so we can catch them before releases. Our developers have been able to identify bottlenecks and other types of problems that they would not have been able to before by just using standard logging and analytics tools.
The solution gives us 360-degree visibility of the user experience across channels, which is a great benefit. We're in eCommerce as a retailer. We are selling across multiple channels and platforms. We have a mobile app and a website. We even have other services which we may instrument with Dynatrace in the future. As far as our website and mobile app that we have instrumented with Dynatrace, it has all been very positive.
The solution has decreased our time to market with new innovations/capabilities because we have been able to quickly identify areas that we can improve for new features and gather that data from Dynatrace. Then, we have been able to verify that our new features and releases are working as expected.
What is most valuable?
The User Sessions Query Language has definitely been the most helpful with its key user actions and user session properties. Using those together has completely transformed how we're able to identify customers and their problems on our site. It has made a very big impact over the year.
Using synthetic monitors, we monitor our websites. We have two main domains. There are several plain HTTP monitors, then there are actual browser-based monitors that emulate browser behavior. We use both of those types. We have several mobile browsers emulated under synthetic monitors that we use. Those ping our website every 15 minutes. On some of these synthetic monitors, we use multiple data centers to get an idea of geographic availability. We also monitor some of our third-party providers using our synthetic monitors. We monitor our customer support live chat server, which is hosted by a third party, and we are alerted if that system goes down. We are also monitoring an email capture API that's a part of our website.
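To put the 15-minute interval in perspective, here is a small sketch of what it implies for a single monitor. The helper names and the three-location example are hypothetical, and I am assuming each location runs on the same interval; only the 15-minute figure comes from our setup.

```python
INTERVAL_MINUTES = 15  # how often each synthetic monitor pings the site

def runs_per_day(interval_minutes: int, locations: int = 1) -> int:
    """Checks executed per day for one monitor, assuming every
    location runs on the same interval (an assumption)."""
    return (24 * 60 // interval_minutes) * locations

def worst_case_detection_delay(interval_minutes: int) -> int:
    """An outage that starts right after a check is not seen
    until the next check, one full interval later."""
    return interval_minutes

print(runs_per_day(INTERVAL_MINUTES))                 # 96
print(runs_per_day(INTERVAL_MINUTES, locations=3))    # 288 (3 locations is hypothetical)
print(worst_case_detection_delay(INTERVAL_MINUTES))   # 15
```

In other words, a single monitor gives us 96 data points a day, and an outage could go unnoticed by that monitor for up to 15 minutes.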
The main and most valuable use of user session queries is when we get a problem. If we get some type of report or an obscure problem, or Dynatrace reports a problem, we go straight to the User Sessions Query Language to find sessions with Session Replay. We then replay those sessions to figure out exactly what the customer did and what conditions may have caused the problem, so that we can gauge its impact.
We also save user session queries into dashboards, then create different dashboards based on different projects to gather data. E.g., last year, we redid a part of our website and used Dynatrace user session queries and Session Replay to verify that our customers were not having any problems or being confused by their experience. That is one way that we've used the User Sessions Query Language along with the dashboards. We've also created some other dashboards that return custom metrics for us, which in some cases rely on user session properties and user action properties. In that way, we're able to get a very granular look at certain statistics that would be more difficult to get from our traditional analytics suite.
What needs improvement?
The solution’s ability to assess the severity of anomalies based on the actual impact to users and business KPIs is a bit off. I have found that even though Dynatrace detects a problem and gives you a count and estimate of impacted users, this number is usually much higher than is actually the case and not fully accurate. E.g., I recently noticed an error. Every time someone would experience this error, Dynatrace would create a new problem and it would say, "Several hundred people were impacted." However, using Dynatrace's own tools (Session Replay), then going back and actually tracing through these requests, we found that far fewer people were actually impacted. In some sessions that Dynatrace said were impacted, when you view the Session Replay videos, you could see that the customer was not impacted in any meaningful way.
The solution’s ability to visualize and understand our infrastructure, and to do triage, is helpful. I wish that you could do user session queries with those host-level metrics and create custom graphs the same way you can with user session data. They're both part of Dynatrace, but they don't feel like they're integrated together well. E.g., we're having an issue that has to do with just HTTP codes, and we would like to marry that up with a user session query and turn that into a dashboard. We can't currently do that because the User Sessions Query Language does not have access to the HTTP errors or HTTP status code data that is part of the hosts and infrastructure package. Otherwise, if you're just focusing on the infrastructure part of it, I think it does a good job.
For how long have I used the solution?
I have been using Dynatrace since February 2019.
What do I think about the stability of the solution?
I have noticed a few times where data collection did get interrupted. It was two or three times within the past year. Obviously, it's our monitoring system and we don't want that to go down at all. However, three times for no more than 30 minutes each time is pretty good.
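As a rough back-of-the-envelope check on why I call that pretty good, the sketch below works out the worst case: three interruptions of 30 minutes each over a year. The function name and the 365-day year are my own assumptions for illustration; this is not a figure that Dynatrace reports.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year

def collection_availability(outages: int, max_minutes_each: float) -> float:
    """Worst-case fraction of the year during which data was still collected."""
    downtime = outages * max_minutes_each
    return 1 - downtime / MINUTES_PER_YEAR

# Three interruptions, none longer than 30 minutes.
worst_case = collection_availability(outages=3, max_minutes_each=30)
print(f"{worst_case:.4%}")
```

That is at most 90 minutes of interrupted data collection out of 525,600 minutes, so data collection was available roughly 99.98% of the time even in the worst case.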
What do I think about the scalability of the solution?
The scalability has been able to meet all of our needs. We have not encountered any limitations when scaling Dynatrace with the Google Cloud Platform.
We monitor two websites with Dynatrace, including mobile apps. In the past 365 days, we've recorded over 23 million sessions for Rack Room Shoes and 8.1 million sessions for Off Broadway Shoes.
There are three active users of Dynatrace:
- The user experience architect, who is designing new interactive features and studying customer behavior
- The product owner, whose focus when using Dynatrace is on the metrics, dashboards, and the user experience, using user session queries and Session Replay. They may troubleshoot or look into problems as well.
- The back-end architect, who looks into certain problems and figures out with Dynatrace where they're coming from. They use information from Dynatrace for writing more detailed support tickets.
How are customer service and technical support?
I have noticed a few problems with the service before. When I reached out to support, the system appeared to have resolved itself after the problem occurred, and the support staff couldn't see any further issues. The solution’s self-healing functionality works.
The technical support is below average. They've solved some of the problems that we had, but it took several weeks to resolve almost every problem when they probably should have been fixed within a day or two.
Which solution did I use previously and why did I switch?
There was an initial implementation of AppMon (another Dynatrace offering) before the current Dynatrace SaaS offering.
Dynatrace has definitely made an impact. We were never able to get granular data with any of our other solutions. They were all very disconnected and separate, whereas Dynatrace seems to have good integrations with our entire stack. There haven't been any problems getting additional data now that we have Dynatrace.
How was the initial setup?
It is very easy to use and set up. It did take some customization to get it working for our sites, but after that, it's been pretty easy and straightforward.
The initial setup is complicated, but it's much less complicated than similar systems that I have used in the past. For Dynatrace's setup, maybe there were problems with how our web application was initially developed before I joined Rack Room, because there were a lot of features related to error reporting. It would report errors for things that weren't actual problems, etc. You have to configure it to get around those types of problems, but it's usually fine afterwards.
Over the past year, we've been tweaking Dynatrace. It's been a slow phase-in rollout as far as how much we rely on the data it's giving us back.
What about the implementation team?
I was involved in the initial implementation.
What was our ROI?
The solution has decreased our mean time to identification by about three days.
The solution decreased our mean time to repair by around a week.
There has been a huge increase in uptime. It's hard to say by how much for certain because we've made other development practice changes.
What other advice do I have?
It is a great platform. We found a lot of value in setting up user session properties and user action properties, then being able to use them to identify individual problems/customers. We use that to sort of streamline the whole process of finding and fixing problems.
Biggest lesson learnt: Customers do not always behave as expected.
I would rate Dynatrace as an eight (out of 10).