Are event correlation and aggregation both needed for effective event monitoring and SIEM?
Both are techniques aimed at reducing the number of active alerts an operator receives from the monitoring tool.
I don't fully agree with the previous descriptions of correlation and aggregation, welcome though they are.
Let's take a typical scenario. Assume a network interface on a large switch fails to result in many systems experiencing a failure. In the 'raw' state, i.e. with no correlation or aggregation, the monitoring system would receive potentially thousands of events - possibly multiple SNMP traps from other network devices or servers, event logs records from Windows servers, Syslog entries from Linux, errors from the database management system, errors from web servers relying on that database and probably lots of incidents raised by users on the help desk. Good correlation algorithms will be able to distinguish between "cause" alarms and "symptom" alarms. In this scenario, the "cause" is the failing network switch port and the symptoms are the database failures and log file entries. Simplistically, fixing the cause will also address the symptoms.
Typically, aggregation is used to "combine" events into a single alarm. Again there are multiple methods to do this. A simple one would be - as previously described - duplicate reduction. In a poorly configured monitoring environment every check that breaches threshold results in an alarm. If monitoring is granular, say every 30 seconds the CPU utilization is measured and an alarm raised if it exceeds 80% then very quickly the operator would be overwhelmed by many meaningless alarms - especially if the CPU is doing some work where high CPU usage is expected. In this case, handling 'duplicates' is helpful when helping operators identify real issues. In this case, it may be enough to update the original alarm with the duration of the threshold breach.
There are many techniques for aggregation and correlation beyond identifying cause and symptoms events or ignoring duplicates. For instance, Time based event handling. Consider a scenario where an event is only considered relevant if another event hasn't happened in a given timeframe before or after the focus event. Or a scenario where avent aggregation occurs based on reset thresholds rather than alarm thresholds.
There are also some solutions that purport to intelligently correlate events using AI. Although, speaking personally, this seems more marketing speak than a one-click feature. In reality, these advanced (i.e. $$$$$$) solutions need to maintain a dynamic infrastructure topology in near-real-time and map events to service components in order to assess root cause correlation. In the days of rapidly flexing and shrinking infrastructures, cloud services, and containerization, it is extremely difficult to maintain an accurate, near-real-time view of an entire IT infrastructure from users through to lines of application code. A degree of machine learning has helped, but the cost-benefit simply isn't there yet for these topology-based event correlation features.
Agree on all the answers posted here, and I especially like Dave's explanation on the more advanced solutions available on the market. Excellent call outs on the need for deep & well maintained relationship mapping to enable an AI's algorithm to connect-the-dots between aggregated alerts firing from multiple separate source tools. Having a mature ITSM implementations with CI-discovery, automated dependency-mapping, and full integration between your correlation engine & CMDB will help too.
Other answers are pretty much sum this up but there is one important point to make. In some technology it's important to take into account the number of events that got are aggregated and for your sim device to be able to treat them as individual events for the purpose of correlation.
As previously mentioned, Correlation is the comparing of the same type of events. In my experience, alerts are created to notify when a series of these occurs and reaches as the prescribed threshold.
Aggregation, based on my experience, is the means of clumping/combining objects of similar nature together and providing a record of the "collection"; of deriving group and subgroup data by analysis of a set of individual data entries. Alerts for this are usually created for prognostication and forecasting. Often the "grouping" is not detailed information so there is a requirement for digging into the substantiating data to determine how this data was summarized.
Alerts/Alarms can be set for both, but usually only for the former and not the latter.
You can not process and generate advanced correlated alerts without aggregation: limiting your correlation to one set of source will let your SIEM blind and unaware
of global context.
So yes, to get an 'EFFECTIVE' event monitoring with the goal to correlate them, you need to aggregate many different sources.
"Aggregation is a mechanism that allows two or more events that are raised to be merged into one for more efficient processing" from https://www.ibm.com/support/knowledgecenter/SSRH32_3.2.0/fxhdesigntutorevtaggreg.html
"Event correlation takes data from either application logs or host logs and then analyzes the data to identify relationships. " from https://digitalguardian.com/blog/what-event-correlation-examples-benefits-and-more
So yes you need both for siem. For simple monitoring you dont. Theres a big difference between what a siem does and that what simple event monitoring does.
Simplly : Correlation is the process to track relation between events based on defined conditions. Aggregation is nothing but to aggregate similiar events. Both are required for effective monitoring.
Aggregation is taking several events and turning them into one single event, while Correlation enables you to find relationships between seemingly unrelated events in data from multiple sources and to understand which events are most relevant.
Both Aggregation and Correlation are needed for effective event monitoring and SIEM; In Enterprise Security (ES) correlation searches can search many types of data sources, including events from any security domain (access, identity, endpoint, network), asset lists, identity lists, threat intelligence, and other data in Splunk platform. The searches then aggregate the results of an initial search with functions in SPL and take action in response to events that match the search conditions with an adaptive response action.
Aggregation example - Splunk Stream lets you apply aggregation to network data at capture-time on the collection endpoint before data is sent to indexers. You can use aggregation to enhance your data with a variety of statistics that provide additional insight into activities on your network. When you apply aggregation to a Stream, only the aggregated data is sent to indexers. Using aggregation can help you decrease both storage requirements and license usage. Splunk Stream supports a subset of the aggregate functions provided by the SPL (Splunk Processing Language) stats command to calculate statistics based on fields in your network event data. You can apply aggregate functions to your data when you configure a stream in the Configure Streams UI.
Correlation example - Identify a pattern of high numbers of authentication failures on a single host, followed by a successful authentication by correlating a list of identities and attempts to authenticate into a host or device. Then, apply a threshold in the search to count the number of authentication attempts.
Aggregation is a combining of the same events into one. So, we can reduce the number of events.
Correlation is an identification of patterns or interdependencies between various events.
Both processes are needed for effective SIEM. The first of them can be replaced by highly scalable computing architecture. And without the second one, we will not have the main analytical purpose of SIEM.
Correlation is a method of consolidating same type of security events from multiple sources and generating single alarm or alerts to reduce multiple security events from each of the devices of similar types. whereas Aggregation is the process generating an alarm after multiple occurrences of an event take place, usually within a fixed timeframe. Example 10 failures login on device happen in 60 seconds will generate single alarm once not on each individual failure.
Event correlation is an analytical process that looks for trends, patterns, thresholds, or sequences of events in your data. Even when they may not be the same event type (ex: a VPN authentication event followed by a door badge access event in a different location). There are many different ways to do this, including the very common real-time stream of events flowing through an analysis engine or batch queries for specific event sequences.
Aggregation is simply the process of collecting log data together in one place so that you can search and analyze it. Think of aggregation as centralized storage, and correlation as analysis. One note: some SIEM and LM vendors also use the term aggregation to mean consolidation of identical events separated only by time. For example, if you fail login 5 times in a row from the same source IP and same username all within one minute, some products "aggregate" that into a single event that shows the same details but a count and start/end time. That's done mostly for storage
optimization but it actually limits the chain of custody by taking away the legally useful original events.
What features should companies look out for when selecting an event monitoring tool?