What is our primary use case?
We use it for log aggregation.
If you have a large number of devices, you need to aggregate log data to make more sense of it for parsing, troubleshooting, and metrics. This is all we use it for.
If I need to track logs for certain application, I will push all of those logs to Splunk so I can run reports on those logs. It is more about what you are trying to do with it and what you need from it.
How has it helped my organization?
We use it primarily for troubleshooting. We had an issue with SaltStack recently and were able to look for the same log entry on a thousand servers simultaneously, making the process easy.
What is most valuable?
The ability to create dashboards.
You can run reports against multiple devices at the same time. You are able to troubleshoot a single application on a thousand servers. You can do this with a single query, since it is very easy to do.
What needs improvement?
When you get into large amounts of data, Splunk can get pretty slow. This is the same on-premise or AWS, it doesn't matter. The way that they handle large data sets could be improved.
I would like to see an updated dashboard. The dashboard is a little out-of-date. It could be made prettier.
For how long have I used the solution?
More than five years.
What do I think about the stability of the solution?
It's been very stable for us. Most of our stress in not from Splunk, but from disk I/O, like input and output for the disk that you are writing logs to. We have had more issue with our own hardware than Splunk.
You have to make sure if you're writing an enormous amount of data that you have your I/O sorted out beforehand.
What do I think about the scalability of the solution?
It scales fine. We haven't had any issues scaling it. Our current environment is about 30,000 devices.
How was the initial setup?
The integration of this product in our AWS environment was very simple. We just forwarded our logs to it, and that was about it.
It has agent-base log forwarding, so it is very simple, not complicated at all. This process is the same from on-premise and AWS.
What was our ROI?
If you have a large number of servers, even a few hundred servers, then you need to track specific data and log information from a lot of servers. You can either go to each server individually or set up jobs to ship those logs somewhere with rsync or Syslog. The other option is use Splunk and push them all to Splunk, then from Splunk you can just create alerts and run reports against all that data in one place with a single query rather than having to do all that work repeatedly. It saves us a lot of time, just in man-hours, and being able to look at hundreds or thousands of servers simultaneously.
Which other solutions did I evaluate?
Splunk has no real competition. It is just Splunk, and that is it.
What other advice do I have?
Build your environment a lot bigger than you think you will need it, because you fill it up quickly. We log somewhere in the neighborhood of two to four terabytes a day per data center.
We use both AWS and SaaS versions. With the SaaS version, you don't have as much control, but it functions the same, so there is no real difference. Though, the AWS version is probably easier to scale, because it is AWS.
Disclosure: I am a real user, and this review is based on my own experience and opinions.