Apache Spark vs Azure Stream Analytics comparison

Comparison Buyer's Guide
Executive Summary
Updated on Sep 5, 2022

We performed a comparison between Apache Spark and Azure Stream Analytics based on our users’ reviews in five categories. Our conclusion, based on all of the collected data, is below.

  • Ease of Deployment: Users note that both products are straightforward and simple to set up.
  • Features: Users of both products are generally happy with their flexibility, stability, and scalability, although some Azure Stream Analytics users noted stability issues.

    Apache Spark users are particularly satisfied with its AI libraries and batch processing, but note that it has a learning curve and that its stream processing needs further development.

    Azure Stream Analytics users say they’re impressed with the solution's UI, real-time analytics, and its deep integration with other Azure products. Some users mention issues when connecting to Microsoft Power BI and would like to see clearer metrics.
  • Pricing: Apache Spark is open source; users pay only when they use a bundled product, such as Cloudera. Azure Stream Analytics users say that the solution is fairly priced and cheaper than its biggest competitors.
  • ROI: Apache Spark users make no mention of ROI. Azure Stream Analytics users mention being pleased with the ROI.
  • Service and Support: Because Apache Spark is open source, there is no official vendor support. Azure Stream Analytics users report excellent service and support.

Comparison Results: Apache Spark and Azure Stream Analytics come out about equal in this comparison. Some users are more satisfied with Apache Spark’s stability and pricing, but Azure Stream Analytics has an edge when it comes to ROI and technical support.

To learn more, read our detailed Hadoop Report (Updated: March 2024).
765,234 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Apache Spark:
  • "The most valuable feature of this solution is its capacity for processing large amounts of data."
  • "The distribution of tasks, like the seamless map-reduce functionality, is quite impressive."
  • "Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
  • "I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
  • "The main feature that we find valuable is that it is very fast."
  • "It is useful for handling large amounts of data. It is very useful for scientific purposes."
  • "I feel the streaming is its best feature."
  • "It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained."

Azure Stream Analytics:
  • "I like all the connected ecosystems of Microsoft, it is really good with other BI tools that are easy to connect."
  • "The most valuable features of Azure Stream Analytics are the ease of provisioning and the interface is not terribly complex."
  • "We use Azure Stream Analytics for simulation and internal activities."
  • "I like the way the UI looks, and the real-time analytics service is aligned to this. That can be helpful if I have to use this on a production service."
  • "I appreciate this solution because it leverages open-source technologies. It allows us to utilize the latest streaming solutions and it's easy to develop."
  • "The solution's most valuable feature is its ability to create a query using SQL."
  • "The solution has a lot of functionality that can be pushed out to companies."
  • "We find the query editor feature of this solution extremely valuable for our business."

Cons
Apache Spark:
  • "I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
  • "Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
  • "Apache Spark's GUI and scalability could be improved."
  • "I would like to see integration with data science platforms to optimize the processing capability for these tasks."
  • "Apache Spark should add some resource management improvements to the algorithms."
  • "The solution needs to optimize shuffling between workers."
  • "This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
  • "The logging for the observability platform could be better."

Azure Stream Analytics:
  • "The solution’s customer support could be improved."
  • "The only challenge was that the streaming analytics area in Azure Stream Analytics could not meet our company's expectations, making it a component where improvements are required."
  • "The solution doesn't handle large data packets very efficiently, which could be improved upon."
  • "Early in the process, we had some issues with stability."
  • "The solution could be improved by providing better graphics and including support for UI and UX testing."
  • "Sometimes when we connect Power BI, there is a delay or it throws up some errors, so we're not sure."
  • "There may be some issues when connecting with Microsoft Power BI because we are providing the input and output commands, and there's a chance of it being delayed while connecting."
  • "The collection and analysis of historical data could be better."

Pricing and Cost Advice
Apache Spark:
  • "Since we are using the Apache Spark version, not the Databricks version, it is an Apache license version, so the support and the resolution of bugs are actually late or delayed. The Apache license is free."
  • "Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
  • "We are using the free version of the solution."
  • "Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
  • "Apache Spark is an expensive solution."
  • "Spark is an open-source solution, so there are no licensing costs."
  • "The cloud model can be expensive, as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
  • "It is an open-source solution, it is free of charge."

Azure Stream Analytics:
  • "The cost of this solution is less than competitors such as Amazon or Google Cloud."
  • "We pay approximately $500,000 a year. It's approximately $10,000 a year per license."
  • "I rate the price of Azure Stream Analytics a four out of five."
  • "The licensing for this product is payable on a 'pay as you go' basis. This means that the cost is only based on data volume, and the frequency that the solution is used."
  • "There are different tiers based on retention policies. There are four tiers. The pricing varies based on streaming units and tiers. The standard pricing is $10/hour."
  • "The current price is substantial."
  • "Azure Stream Analytics is a little bit expensive."
  • "The product's price is at par with the other solutions provided by the other cloud service providers in the market."

    Questions from the Community
    Top Answer: The product’s most valuable features are lazy evaluation and workload distribution.
    Top Answer: They provide an open-source license for the on-premise version. However, we have to pay for the cloud version including data centers and virtual machines.
    Top Answer: They could improve the issues related to programming language for the platform.
    Top Answer: Databricks is an easy-to-set-up and versatile tool for data management, analysis, and business analytics. For analytics teams that have to interpret data to further the business goals of their…
    Top Answer: Customers need to pay for a license. However, we have a three-year upfront licensing arrangement, which helps to keep the costs relatively low.
    Top Answer: Easier scalability and more detailed job monitoring features would be helpful. Another room for improvement is the ingestion of data.
                                Apache Spark               Azure Stream Analytics
    Ranking                     2nd out of 22 in Hadoop    4th out of 38 in Streaming Analytics
    Views                       2,468                      10,297
    Comparisons                 1,915                      8,660
    Reviews                     20                         12
    Average words per review    387                        379
    Rating                      8.6                        8.3

    Azure Stream Analytics is also known as ASA.
    Overview

    Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. Spark was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
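
The RDD idea described above (lazily evaluated transformations, plus the lineage needed to rebuild a lost partition) can be illustrated with a toy, single-process Python model. `ToyRDD` and its methods are hypothetical names chosen for illustration; they loosely mirror the shape of Spark's `parallelize`, `map`, `filter`, and `reduce`, but this is not Spark's API and there is no cluster, scheduler, or caching here.

```python
from functools import reduce
from typing import Any, Callable, Iterable, List


class ToyRDD:
    """Hypothetical single-process model of an RDD (not Spark's API).

    Instead of storing results, each ToyRDD stores lineage: a function that
    rebuilds any partition from its parent. Replaying that lineage is what
    lets a lost partition be recomputed.
    """

    def __init__(self, compute: Callable[[int], List[Any]], num_partitions: int):
        self._compute = compute
        self.num_partitions = num_partitions

    @staticmethod
    def parallelize(data: Iterable[Any], num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        # Round-robin split of the input into partitions.
        chunks = [items[i::num_partitions] for i in range(num_partitions)]
        return ToyRDD(lambda i: list(chunks[i]), num_partitions)

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Transformation: only records f in the lineage; nothing runs yet.
        parent = self
        return ToyRDD(lambda i: [f(x) for x in parent.partition(i)], self.num_partitions)

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        parent = self
        return ToyRDD(lambda i: [x for x in parent.partition(i) if pred(x)], self.num_partitions)

    def partition(self, i: int) -> List[Any]:
        # (Re)build partition i by replaying its lineage from the source data.
        return self._compute(i)

    def reduce(self, op: Callable[[Any, Any], Any]) -> Any:
        # Action: evaluates every partition, reduces each one locally,
        # then combines the per-partition results.
        parts = (self.partition(i) for i in range(self.num_partitions))
        partials = [reduce(op, p) for p in parts if p]
        return reduce(op, partials)


calls = []


def square(x):
    calls.append(x)  # record work so that laziness is visible
    return x * x


rdd = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares_of_evens = rdd.filter(lambda x: x % 2 == 0).map(square)
print(len(calls))                                    # 0: nothing computed yet
print(squares_of_evens.reduce(lambda a, b: a + b))   # 220
print(squares_of_evens.partition(1))                 # [4, 64], rebuilt from lineage
```

In this toy, no intermediate result is written to disk between the `filter` and `map` steps, in contrast to the MapReduce flow described above, and any partition can be recomputed on demand from the source data.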

    Azure Stream Analytics is a robust real-time analytics service designed for critical business workloads. Users can build an end-to-end serverless streaming pipeline in minutes. Using SQL, users can go from zero to production in a few clicks, and the pipeline can be extended with custom code and built-in machine learning capabilities for more advanced scenarios.
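
As a rough illustration of the SQL-based windowed aggregation described above, the Python sketch below computes a per-device average over 10-second tumbling windows. The comment shows an ASA-style query of the same shape; the input/output aliases, column names, and event format are assumptions, and the Python only models the result, not how Azure Stream Analytics executes queries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event shape: (event_time_in_seconds, device_id, temperature).
Event = Tuple[int, str, float]

# Roughly the Python equivalent of an ASA-style query such as (aliases and
# column names are illustrative):
#   SELECT DeviceId, AVG(Temperature) AS AvgTemp, System.Timestamp() AS WindowEnd
#   INTO [output] FROM [input] TIMESTAMP BY EventTime
#   GROUP BY DeviceId, TumblingWindow(second, 10)


def tumbling_window_avg(events: List[Event], window_s: int) -> List[Tuple[int, str, float]]:
    """Average temperature per device in fixed, non-overlapping windows.

    This sketch uses half-open windows [start, end); the exact boundary
    convention of a particular engine may differ. Returns
    (window_end, device_id, average) rows sorted by window end, then device.
    """
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, device, temp in events:
        window_end = (ts // window_s + 1) * window_s  # end of the window containing ts
        totals[(window_end, device)] += temp
        counts[(window_end, device)] += 1
    return sorted((end, dev, totals[(end, dev)] / counts[(end, dev)]) for end, dev in totals)


events = [(1, "a", 20.0), (4, "a", 22.0), (7, "b", 30.0), (12, "a", 25.0)]
for row in tumbling_window_avg(events, window_s=10):
    print(row)
```

Each event falls into exactly one window, which is what distinguishes tumbling windows from overlapping (hopping or sliding) ones.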

    Azure Stream Analytics can analyze and accurately process very large volumes of high-speed streaming data from numerous sources simultaneously. Patterns and scenarios are quickly identified in information gathered from various input sources, such as social media feeds, applications, clickstreams, sensors, and devices. These patterns can then be used to trigger actions and launch workflows, such as feeding data to a reporting tool, storing data for later use, or creating alerts. Azure Stream Analytics is also offered on the Azure IoT Edge runtime, so data can be processed on IoT devices.
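
The pattern-to-action flow described above can be sketched, under simplifying assumptions, as a function that sends every event to a storage sink and raises an alert when a device reports several consecutive high readings. The event shape, threshold, run length, and sink names are hypothetical; in Azure Stream Analytics this logic would be written in the job's query, with the destinations configured as job outputs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (device_id, reading), in arrival order.
Event = Tuple[str, float]


def route_events(events: Iterable[Event], threshold: float, run_length: int) -> Dict[str, List]:
    """Send every event to a 'storage' sink and add an alert to the 'alerts'
    sink when a device reports `run_length` consecutive readings above
    `threshold`. Sink names are illustrative, not ASA output names."""
    sinks: Dict[str, List] = {"storage": [], "alerts": []}
    streak: Dict[str, int] = defaultdict(int)  # current run of high readings per device
    for device, reading in events:
        sinks["storage"].append((device, reading))  # keep raw data for later use
        streak[device] = streak[device] + 1 if reading > threshold else 0
        if streak[device] == run_length:  # fires once per run, not on every later reading
            sinks["alerts"].append(f"{device}: {run_length} readings above {threshold}")
    return sinks


readings = [("pump1", 81), ("pump1", 85), ("pump2", 90), ("pump1", 88), ("pump1", 70)]
result = route_events(readings, threshold=80, run_length=3)
print(result["alerts"])
print(len(result["storage"]))
```

Tracking the streak per device keeps one device's readings (here `pump2`) from breaking another device's run.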

    Top Benefits

    • User friendly: Azure Stream Analytics is straightforward and easy to use. Out of the box and with a few clicks, users can connect to numerous sources and sinks and develop an end-to-end pipeline. Stream Analytics can connect to Azure IoT Hub and Azure Event Hubs for streaming ingestion, and to Azure Blob storage for historical data ingestion.

    • Flexible deployment: For low-latency analytics, Azure Stream Analytics can run on Azure Stack or IoT Edge. For large-scale analytics, the solution can run in the cloud. Azure Stream Analytics uses the same query language and tools for both the cloud and the edge, making it easier for developers to design hybrid architectures for stream processing.

    • Cost-effective: With Azure Stream Analytics, users only pay for the streaming units they consume; there are no upfront costs. Users can easily scale up or down as needed; there is no commitment or cluster provisioning.

    • Trustworthy: Azure Stream Analytics guarantees 99.99% availability for event processing, measured at a minute level of granularity. Built-in recovery capabilities and checkpoints keep things running smoothly. With at-least-once delivery of events and exactly-once event processing, events are never lost.
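
To make the delivery terms in the list above concrete: with at-least-once delivery, events processed after the last checkpoint can be delivered again when a job recovers, and a sink that ignores already-seen event IDs turns those replays into effectively exactly-once results. The Python sketch below simulates this. It illustrates the general checkpoint-and-replay technique only; it is not how Azure Stream Analytics is implemented, and `IdempotentSink`, `run_with_crash`, and the event IDs are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical event shape: (event_id, value).
Event = Tuple[str, int]


class IdempotentSink:
    """Hypothetical durable sink (think: a table with a unique key on event_id).
    Its state survives the simulated crash, and it ignores events it has
    already stored, so replays do not create duplicate rows."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.rows: List[Event] = []
        self.duplicates_dropped = 0

    def write(self, event: Event) -> None:
        event_id, _ = event
        if event_id in self.seen:
            self.duplicates_dropped += 1  # re-delivered after recovery: ignore
            return
        self.seen.add(event_id)
        self.rows.append(event)


def run_with_crash(events: List[Event], sink: IdempotentSink,
                   crash_after: int, checkpoint_every: int) -> None:
    """Process events, recording the input offset every `checkpoint_every`
    events; stop ("crash") before processing offset `crash_after`, then
    restart from the last checkpoint."""
    checkpoint = 0
    for offset, event in enumerate(events):
        if offset == crash_after:
            break  # simulated failure
        sink.write(event)
        if (offset + 1) % checkpoint_every == 0:
            checkpoint = offset + 1  # offset recorded as a checkpoint
    # Recovery: replay from the last checkpoint. Events written after that
    # checkpoint are delivered again (at-least-once); the sink drops them.
    for event in events[checkpoint:]:
        sink.write(event)


events = [(f"e{i}", i) for i in range(6)]
sink = IdempotentSink()
run_with_crash(events, sink, crash_after=5, checkpoint_every=2)
print(len(sink.rows), sink.duplicates_dropped)  # 6 1
```

Here the last checkpoint is at offset 4, so `e4` is delivered twice; the sink keeps one copy, and no event is lost because replay restarts from the checkpoint rather than from the crash point.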

    Reviews from Real Users

    “Azure Stream Analytics is something that you can use to test out streaming scenarios very quickly in the general sense and it is useful for IoT scenarios. If I was to do a project with IoT and I needed a streaming solution, Azure Stream Analytics would be a top choice. The most valuable features of Azure Stream Analytics are the ease of provisioning and the interface is not terribly complex.” - Olubisi A., Team Lead at a tech services company.

    “It's used primarily for data and mining - everything from the telemetry data side of things. It's great for streaming and makes everything easy to handle. The streaming from the IoT hub and the messaging are aspects I like a lot.” - Sudhendra U., Technical Architect at Infosys

    Sample Customers
    Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
    Azure Stream Analytics: Rockwell Automation, Milliman, Honeywell Building Solutions, Arcoflex Automation Solutions, Real Madrid C.F., Aerocrine, Ziosk, Tacoma Public Schools, P97 Networks
    Top Industries
    Apache Spark, reviewers: Computer Software Company 30%, Financial Services Firm 15%, University 9%, Marketing Services Firm 6%
    Apache Spark, visitors reading reviews: Financial Services Firm 25%, Computer Software Company 13%, Manufacturing Company 7%, Comms Service Provider 6%
    Azure Stream Analytics, reviewers: Computer Software Company 27%, Manufacturing Company 18%, Insurance Company 9%, Government 9%
    Azure Stream Analytics, visitors reading reviews: Computer Software Company 15%, Financial Services Firm 12%, Manufacturing Company 8%, Comms Service Provider 5%

    Company Size
                              Apache Spark               Azure Stream Analytics
                              Reviewers    Visitors      Reviewers    Visitors
    Small Business            40%          17%           24%          20%
    Midsize Enterprise        19%          12%           10%          11%
    Large Enterprise          40%          71%           67%          69%

    Apache Spark is ranked 2nd in Hadoop with 58 reviews while Azure Stream Analytics is ranked 4th in Streaming Analytics with 21 reviews. Apache Spark is rated 8.4, while Azure Stream Analytics is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Azure Stream Analytics writes "Easy to set up and user-friendly, but could be priced better". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Apache NiFi, whereas Azure Stream Analytics is most compared with Amazon Kinesis, Databricks, Amazon MSK, Apache Flink and Apache Spark Streaming.

    We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.