What is our primary use case?
We wanted performance assurance because we have seasonal spikes in our volume, so the first use case was making sure that we could adjust for those spikes.
Another use case was taking a look at how we increase our density and make more effective use of the assets that we have on the floor.
The third use case was the planning, being able to adjust for mergers, acquisitions, divestitures, and quickly being able to separate out the infrastructure required to support that workload.
We just upgraded and are using the latest on-prem version.
We use Turbonomic for our on-prem hosting: servers, storage, and containers. We also use it in Azure. We are trying to use it across multiple hosting environments. The networking team is not really using it. Instead, I am there from a hosting standpoint, where the main focus is on servers and storage, then the linkage to applications with the resources that they are using.
How has it helped my organization?
It integrates into our other tools that we have been able to stitch together. When I take a look at an infrastructure cluster, I can see what applications are running on it. I can see down to the transaction level who is actually causing a performance constraint. We can then go back to our application teams to get that issue resolved.
When I start to take a look at a cluster level, I can look to see which application is running in that cluster. Then, we can get down into specific transactions. We can then watch to see how workload is trending and identify where we may need to add more hosts into the environment. With our transactions, we use Turbonomic linked into AppDynamics. When it links in and pulls the application data, it also helps us dig down. So, if I see my utilization trending up, then is it something on the infrastructure side or the application side? Is it something the application team needs to address? Or, is it something my infrastructure team can address? This allows us to make fact-based decisions.
In our organization, optimizing application performance is a continuous process that is beyond human scale. We would not be able to do the number of actions that Turbonomic takes on a daily, weekly, and monthly basis. It is humanly impossible with the little micro adjustments that it can make. That is a huge differentiator. If you figure, very conservatively, that each action takes five to 10 minutes to act upon, and then multiply that by thousands of actions every month, you could easily say, "I am saving a couple of FTEs."
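The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the 3,000 actions per month and the 160 working hours per FTE per month are assumed values chosen to stand in for "thousands of actions", not measured figures, and `estimated_fte_savings` is a hypothetical helper name.

```python
def estimated_fte_savings(actions_per_month, minutes_per_action,
                          fte_hours_per_month=160.0):
    """Return how many full-time staff it would take to do the actions by hand.

    fte_hours_per_month is an assumption (roughly 40 hours x 4 weeks).
    """
    manual_hours = actions_per_month * minutes_per_action / 60.0
    return manual_hours / fte_hours_per_month


# Assumed volume: 3,000 actions per month (illustrative only).
low = estimated_fte_savings(3000, 5)    # five minutes per action
high = estimated_fte_savings(3000, 10)  # 10 minutes per action
print(f"Estimated manual effort: {low:.2f} to {high:.2f} FTEs")
```

Under these assumptions the manual effort lands between roughly one and a half and three FTEs, which is in line with "a couple of FTEs."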
With Windows 2008, whenever we did a large-scale OS upgrade, we would take a look at what resources were allocated to each of the applications and server instances, then basically replicate that. Using Turbonomic, we have been able to quickly go through, take a look, and say, "Okay, wow. This may have been what was previously allocated to you. We now realize that your utilization doesn't require that level." We are able to actually downsize as we go through and rebuild. This part, the planning aspect, is really good.
One of the things that we completed this year was starting to tag applications so we can pull up more critical applications and take a look at their resource needs. We can have a specific dashboard per critical application.
What is most valuable?
For performance assurance, I love the dynamic resource allocations. We don't have any nuisance performance issues.
When you take a look at the utilization of our resources, it is great that this solution works both on-prem and in the cloud. We have been able to identify some quick saves in the cloud, and then on-prem, with their algorithm. So, we have been able to go ahead and increase our density by about 35 percent, which has delayed purchases of hardware.
Turbonomic provides specific actions that prevent resource starvation. One of the best features of their algorithm is that it can go through and tell me that a specific server instance or virtual image needs more CPU or memory added. It can also tell us, "These are the ones that aren't using their resources." Then, we can decrease the allocations to those server instances. The nice thing about this is we can schedule which of these activities we want Turbonomic to do automatically for us.
Monitoring and thresholds are very reactive, so somebody would have to be sitting there with eyes on glass, taking action. Whereas, with Turbonomic, we now have our thresholds set, and it automatically takes those actions.
The reporting is good.
What needs improvement?
It would be nice for them to have a way to do something with physical machines, but I know that is not their strength. Thankfully, the majority of our environment is virtual, but it would be nice to see this type of technology across some other platforms. It would be nice to have capacity planning across physical machines.
For how long have I used the solution?
Between my two companies, I have been using it now for about four or five years.
What do I think about the stability of the solution?
The stability has been wonderful. We have never had any issues.
What do I think about the scalability of the solution?
The scalability is great. There is no problem with scaling.
There are about a dozen people from engineering, operations, and capacity who log in and use the data to make decisions. It is a hands-off type of product. You only need a couple of key people from the different use case areas to use it.
How are customer service and technical support?
What is really impressive with the Turbonomic team is that after you sign the deal, they don't disappear. In the two and a half years in my current position, Turbonomic has been right there, whether we have an issue, which is very rare, or we are trying to still complete the objectives of the purchase, such as integrating our use cases. The Turbonomic team is very supportive and hands-on with you. I can't say enough about their customer support because it helps drive the value faster. They are always right there working with my team as part of the team.
Turbonomic is a real partner, which is a really good thing. I have been in IT my whole life, decades, and there are way too many vendors that once you make the sale, that's it. You are now at the bottom of their pile because they are chasing the next sale.
Which solution did I use previously and why did I switch?
Before I came to this company, my previous company was using this tool extensively. At my previous job, I had seen the benefits of the tool. When I came over to this company, it was one of the first things that I started to champion.
I have been with the company for three years, and we have used a tool called VMware DRS. We are a heavy VMware shop, and vROps wasn't anywhere near the level of automation needed. Even though DRS can do some things automatically, it is all based on data pulled from the night before. We didn't have anything in the environment that could do the real-time automated resource moves, like Turbonomic does.
I think DRS is gone now. The engineering team still uses VMware for a couple of things, simply because that is their preference. vROps is still in the environment, but I would love to get to the point where we can continue showing success with Turbonomic and eventually eliminate vROps.
How was the initial setup?
The initial setup was very straightforward. This is one of the very few tools that we were able to stand up and get running within weeks.
It is a very simple product to install, then there are just a couple of configurations to tweak. Then, you are up and running. They literally tell you what you need. It's like, "Here are the requirements: You need X number of virtual images - this level." It has very simple instructions. We probably had it installed in one day, then we had everything reporting within a couple of days. After that, we did the tuning, mapping, and everything else. Within 30 days, we were probably getting useful data out of this tool.
What about the implementation team?
We just worked with Turbonomic. Cisco was our reseller, but they actually provide Turbonomic resources.
We have only two people involved with setup and maintenance. I have one main person with a backup person for him. That is how easy it is to set up and maintain. Our future plans are to migrate to the cloud offering probably later this year. Once we do that, that will free up one person.
The main guy is a Windows Server admin who supports the Turbonomic platform, but this isn't his only job. It is something that just takes up a fraction of his time. Once we go to the cloud offering, then the management of the tool goes back to Turbonomic and we will just be a consumer of the data.
What was our ROI?
When I first put the proposal on the table, we put in the proposal that we would get our payback within three years. We got our payback in 15 months. For example, we went through and increased our density, then we were able to delay the purchase of close to 200 servers.
We are very excited about the fact that it does integrate with ServiceNow, our service management ticketing system. It will go out there, and when it says, "I need to add CPU/memory," then it creates the change ticket for us. So, we can have an automated ticket created and get the approvals in place, then it is automatically executed and the ticket is closed off. This saves my team hundreds of actions every year.
When the application starts to see performance degradation, those tickets will go to their queue, but then they will get escalated to me. I can tell you that I have received almost no calls about, "My application is running slow." Before Turbonomic, during the busy season, it seemed like almost every day that I was receiving calls. So, there is definitely a huge drop in, "My performance is running slow," where you would then kind of scramble to find out, "Okay, why is it running slow?"
We use Turbonomic to help optimize our cloud operations and it has reduced our cloud costs. We have been able to identify unattached premium storage, paying for storage that we weren't using. We have also been able to identify instances that were assigned a larger template than was actually needed. So, we were able to then downsize them. This ended up saving us a significant amount of money by rightsizing those instances.
By increasing our level of density, we have been able to delay hardware purchases. So, we have been able to absorb growth without hardware purchases. Without hardware purchases, we also save money on software licensing.
It has allowed us to redirect where our resources spend their time, so they can focus on other projects or high-value activities with the business. There is less firefighting and more project work.
What's my experience with pricing, setup cost, and licensing?
The pricing and licensing are fair. We purchase based on benchmark pricing, which we have been able to get. There are no surprise charges nor hidden fees.
Which other solutions did I evaluate?
We did have to go through and do a comparison of vROps, DRS, and Turbonomic in order for me to get it on board at the company.
The performance assurance and automatic allocations (the automation that comes with it) really drove our decision to go with Turbonomic. They have a level of automation that the competitors don't.
Turbonomic understands the resource relationships at each of the elements of our environment's layers and the risks to performance for each. That is part of what makes them a key differentiator, especially against something like a vROps. Their algorithm is based on: in the moment, what is being used, and what is needed. It will not make an automated move that may cause another issue. Whereas, VMware DRS would move stuff based on data that it had pulled the night before, which may not be valuable or still valid. At that point, you could move something that needed CPU, but you moved it someplace else where now there is a memory constraint instead of a CPU constraint.
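The failure mode described here, fixing a CPU constraint by creating a memory constraint, comes from checking only one resource or from checking against stale data. The sketch below shows the general idea of validating a move against every resource using current headroom. It is not Turbonomic's or DRS's actual algorithm. The host names, resource keys, and the `fits` and `pick_host` helpers are all hypothetical.

```python
def fits(host_free, vm_demand):
    """True only if the host has enough current headroom for EVERY resource."""
    return all(host_free.get(resource, 0.0) >= needed
               for resource, needed in vm_demand.items())


def pick_host(hosts_free, vm_demand):
    """Return a host that satisfies all constraints, or None if no host does.

    hosts_free maps host name -> current free capacity per resource.
    Ties are broken alphabetically just to keep the example deterministic.
    """
    candidates = sorted(name for name, free in hosts_free.items()
                        if fits(free, vm_demand))
    return candidates[0] if candidates else None


# Current (not last night's) free capacity, with made-up numbers.
hosts_free = {
    "host-a": {"cpu_ghz": 10.0, "mem_gb": 2.0},   # lots of CPU, little memory
    "host-b": {"cpu_ghz": 6.0, "mem_gb": 32.0},   # enough of both
}
vm_demand = {"cpu_ghz": 4.0, "mem_gb": 16.0}

print(pick_host(hosts_free, vm_demand))
```

A CPU-only check would happily choose host-a, which has the most free CPU, and the workload would then be memory constrained. Checking both resources rules it out.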
A big deciding factor with Turbonomic was that you can set how much trending data you want to keep, whether it is 30, 60, 90, or 120 days, etc. You can set your trending there, then you can schedule your actions based on utilization over that time frame, e.g., the last 90 days.
What other advice do I have?
We are using it mainly to manage the resource utilization for our virtual environment. We are using it for project planning, like the Windows 2008 upgrade with the infrastructure that needs to be built out for that. We are using it to manage our cloud expenses and the utilization within the cloud, which then drives cost reductions there. In the last few months, we started to do the application tagging so we can start to get down to specific application dashboards. This year, we want to start to drive more of the automation to reclaim unused resources, so I can then go ahead and delay further purchases. Our plan is to continue driving up the density of the environment.
Right now, we have certain tasks that get done automatically. We are working on the piece that does the scheduling using change tickets. We wanted to ensure there was an audit trail, so we had an interface with our ticketing system worked out, and we are getting ready to turn that on. Adding resources throughout the business day is no big deal, but we want to make sure we don't remove any resources during the business day. We want to do that during a maintenance window to ensure that there will be no business impact. It is just being ultraconservative and sensitive to the business's needs. As they get more comfortable, we will continue ratcheting up the level of automation that we use.
Everything is very specific with Turbonomic. We can take manual action throughout the day, if we see that it is necessary. We can have Turbonomic take certain specific actions automatically, then we can decide which ones we want to actually schedule so we can link them to approved change tickets.
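The rule described above (add resources at any time, remove them only in a maintenance window, and always with a change ticket for the audit trail) can be written down as a small gate. This is a sketch of the policy only, not Turbonomic's or ServiceNow's API. In the product this kind of rule is set through its own policy and scheduling settings. The business hours of 08:00 to 18:00 on weekdays and the name `may_execute_now` are assumptions made for illustration.

```python
from datetime import datetime, time

# Assumed business hours; a real maintenance window would come from policy.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def may_execute_now(action_kind, now, has_approved_ticket):
    """Decide whether a resize action may run at the given moment.

    action_kind is "scale_up" (add CPU/memory) or "scale_down" (remove it).
    """
    if not has_approved_ticket:
        return False  # no approved change ticket, no audit trail, no action
    if action_kind == "scale_up":
        return True   # adding resources is safe during the business day
    if action_kind == "scale_down":
        in_business_hours = (now.weekday() < 5
                             and BUSINESS_START <= now.time() < BUSINESS_END)
        return not in_business_hours  # removals wait for the window
    raise ValueError(f"unknown action kind: {action_kind}")


midday = datetime(2024, 1, 10, 11, 0)   # a Wednesday morning
night = datetime(2024, 1, 10, 23, 0)    # the same day, after hours
print(may_execute_now("scale_up", midday, True))
print(may_execute_now("scale_down", midday, True))
print(may_execute_now("scale_down", night, True))
```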
It will show application metrics and estimate the impact of taking a suggested action from infrastructure resource utilization. I don't know if it will get down into the transaction level performance. I think the new release does that, but we haven't tested that piece out. However, this is the planning piece, e.g., if I were to remove the CPU, what would the performance and utilization look like? Or, in the case of some stuff that I was recently looking at, if I were to add the CPU, what does that do to the overall utilization metrics? You can then decide: Do I want to take that action?
The biggest lesson learnt is probably that people are afraid of change. Our biggest hurdle was getting them to put their faith in automation versus, "We have always done it this way." We have always been oversized so the application teams could be sure that we would never run out of resources, but they needed to be open to change. My favorite analogy to use with them is, "I understand it is hard because instead of you telling me, 'I want this many CPUs or this much memory,' I'm telling you to trust me." It's like the gas gauge in your car. Don't look at the gas gauge when you get in your car. Just trust me that I have put enough gas in the car for you to get where you are going. It's a very difficult mindset for application teams who are used to saying, "Okay, I have eight CPUs over here. Don't touch them." But, Turbonomic actually gives us the data to show them, "You have eight CPUs over here. You'll never get above 40 percent utilization, so you are costing us money." So, it is fact-based decision-making.
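The eight-CPU conversation comes down to simple arithmetic, sketched here in Python. The 70 percent target utilization is an assumed headroom policy picked for illustration, and `rightsized_vcpus` is a hypothetical helper, not how Turbonomic computes its recommendations.

```python
import math


def rightsized_vcpus(allocated, peak_utilization, target_utilization=0.70):
    """Smallest vCPU count that keeps the observed peak at or under the target.

    peak_utilization and target_utilization are fractions (0.40 means 40%).
    """
    peak_demand = allocated * peak_utilization       # vCPUs actually used at peak
    return max(1, math.ceil(peak_demand / target_utilization))


allocated = 8
recommended = rightsized_vcpus(allocated, 0.40)
print(f"Allocated: {allocated} vCPUs, recommended: {recommended} vCPUs, "
      f"reclaimable: {allocated - recommended}")
```

With a peak of 40 percent on eight CPUs, the server never uses more than about 3.2 CPUs, so under the assumed 70 percent target it could run on five and hand three back.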
My advice is, "Go for it." Don't let other teams hold you back because this is how they have always done it. Trust the Turbonomic team because they are great at being able to implement, and they are ready to move fast. Make sure you get all the right stakeholders, because we have had to deal with everything from:
- How do we do an internal chargeback?
- The application team's perception that, "I can't run with anything less than this."
Get ready to be able to put some facts on the table and lean on the Turbonomic team because they are just phenomenal at helping put together business cases and doing the implementation. However, also get ready to tell your people to go for it. Don't be saddled with, "This is how we've always done it," because technology changes. I have seen nothing in my infrastructure career that was as great as this product when it comes to resource utilization.
I would give them a 10 (out of 10). The tool does what it says, and the Turbonomic people don't sell it to you, then disappear. They are always there and a pleasure to work with.