What is our primary use case?
We use it for application log monitoring.
It is a logging product. Our application generates log files, then we upload them to Splunk. We run their agent on our EC2 instances in AWS, then we view the logs through their product, and it is all stored on their infrastructure.
How has it helped my organization?
We have used the alerts for a lot of things. The product makes it simple to create an alert, so we made one for SQL injection. We also had some problematic services that would fail. Once we figured out which log line to look for, it was easy to make an alert for that too.
What is most valuable?
Its usability is the best part. It is easy for our developers to use if they want to search their logs, etc.
What needs improvement?
A problem that we had recently is that our license is based on how much data we upload to them every day. Something changed in one of our applications, and it started generating three to four times as many logs. So now, we are trying to assemble something with parts of the Splunk API to warn ourselves, then turn it off or throttle it back. However, it would be better if they had something built into the product that shuts things down when you are getting close to your license limit. This sort of thing would help out a lot. It would help them out too, because then they wouldn't be hollering at us for going over our license.
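The kind of guard we are trying to assemble can be sketched roughly as below. This is only a minimal illustration of the decision logic, not our actual tooling and not anything Splunk ships. The function name `quota_action` and the 80% and 95% thresholds are hypothetical, and the figure for how much has been ingested so far today is assumed to come from whatever usage reporting you have available. The 100 GB default is our daily license size.

```python
# Hypothetical sketch: decide what to do based on today's ingest volume.
# The thresholds and names here are illustrative assumptions, not Splunk features.

DAILY_LICENSE_GB = 100.0  # our daily license size
WARN_AT = 0.80            # assumed: warn at 80% of the daily license
STOP_AT = 0.95            # assumed: stop or throttle at 95%


def quota_action(ingested_gb: float,
                 license_gb: float = DAILY_LICENSE_GB,
                 warn_at: float = WARN_AT,
                 stop_at: float = STOP_AT) -> str:
    """Return "ok", "warn", or "stop" for the volume ingested so far today."""
    if license_gb <= 0:
        raise ValueError("license_gb must be positive")
    if ingested_gb < 0:
        raise ValueError("ingested_gb cannot be negative")

    used_fraction = ingested_gb / license_gb
    if used_fraction >= stop_at:
        return "stop"   # shut off or throttle log forwarding
    if used_fraction >= warn_at:
        return "warn"   # notify the team, e.g. through an existing alert channel
    return "ok"
```

Run on a schedule, a check like this would give a warning before the limit is reached rather than after the overage has already happened.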
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
Stability has been great. I don't think we have ever had an outage from it.
We don't do a lot of searching. If problems were to show up anywhere, it would probably be under a heavy search load, and we don't have that. We don't have many developers searching every day. It is mostly when there is a problem that we use it for diagnostics. So, we don't put a large search load on it. However, the reliability of it has been great. It hasn't been down for us at any point.
What do I think about the scalability of the solution?
It seems to have worked out great. We haven't had any problems yet.
How are customer service and technical support?
I haven't used the technical support.
Which solution did I use previously and why did I switch?
Before Splunk, we used Kibana and Elasticsearch. Sometimes, with them, logs wouldn't even be there. The time savings feel infinite, because we couldn't use what we had before. Splunk simply being there and working does a lot.
How was the initial setup?
The integration and configuration with the AWS environment was easy. They had the documentation. All we had to do was get their agent running on our EC2 instance, and their documentation was good for that. It worked, which was great.
The product is also integrated with PagerDuty, Slack, and AWS. Those integrations are good and seamless.
What was our ROI?
It has made life easier for us, both in everyday use and when troubleshooting problems. It reduces the cost of the intangibles.
What's my experience with pricing, setup cost, and licensing?
The pricing seems good relative to the other vendors that we have had here. However, they need to find ways to be more flexible with the licensing and be able to deal with situations where we start generating more logs. Having some controls in the Splunk interface to turn ingestion off would help, so we don't have to change anything in our application.
We have an existing contract with Splunk, so it makes sense to stay with them for now. Our license is for 100 GB of logs a day.
Which other solutions did I evaluate?
There are a lot of vendors in the space at the conference this year. We probably talked to six or seven different ones, and the market seems to be consolidating. Metrics and log monitoring all seem to be rolling up into a single provider. It looks like that is what will be happening in the next few years.
Right now, there are a ton of different smaller providers doing little pieces of this and that. All the big players, like Splunk, New Relic, and Datadog, seem to be rolling them all up into one offering.
What other advice do I have?
Implement something and watch how much data you are sending to it, then have some way to shut it off without redeploying your app in case things get hairy.
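One way to get that kind of off switch, sketched here with Python's standard `logging` module, is a filter that drops log records whenever a flag file exists. This is only an illustration under assumptions: the class name `KillSwitchFilter`, the helper `build_handler`, and the flag file path are hypothetical, and it presumes your application writes its logs through `logging` handlers before the agent picks them up.

```python
import logging
import os


class KillSwitchFilter(logging.Filter):
    """Drop every record while the flag file exists (hypothetical kill switch)."""

    def __init__(self, flag_path: str) -> None:
        super().__init__()
        self.flag_path = flag_path

    def filter(self, record: logging.LogRecord) -> bool:
        # True keeps the record, False drops it.
        # The file is checked on every record, so the switch takes effect
        # immediately without a restart or redeploy.
        return not os.path.exists(self.flag_path)


def build_handler(log_path: str, flag_path: str) -> logging.Handler:
    """Create a file handler whose output can be switched off at runtime."""
    handler = logging.FileHandler(log_path)
    handler.addFilter(KillSwitchFilter(flag_path))
    return handler
```

Creating the flag file stops the log output, and deleting it turns logging back on. The check costs one file system lookup per record, so a busy application might prefer to cache the result for a few seconds.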
We use the cloud version of the product.
Which version of this solution are you currently using?