Apache Spark vs Azure Stream Analytics comparison

Apache Spark: 2,430 views | 1,869 comparisons | 89% willing to recommend
Azure Stream Analytics: 9,766 views | 8,235 comparisons | 95% willing to recommend
Comparison Buyer's Guide
Executive Summary
Updated on Sep 5, 2022

We compared Apache Spark and Azure Stream Analytics based on our users' reviews in five categories. After reviewing all of the collected data, you can find our conclusions below.

  • Ease of Deployment: Users note that both products are straightforward and simple to set up.
  • Features: Users of both products are generally happy with their flexibility, stability, and scalability, although some Azure Stream Analytics users noted stability issues.

    Apache Spark users are particularly satisfied with its AI libraries and batch processing, but note that there is a learning curve and that its stream processing needs further development.

    Azure Stream Analytics users say they’re impressed with the solution's UI, real-time analytics, and its deep integration with other Azure products. Some users mention issues when connecting to Microsoft Power BI and would like to see clearer metrics.
  • Pricing: Apache Spark is open source, so you pay only when you use a bundled commercial product, such as Cloudera. Azure Stream Analytics users say that the solution is fairly priced and cheaper than its biggest competitors.
  • ROI: Apache Spark users make no mention of ROI. Azure Stream Analytics users mention being pleased with the ROI.
  • Service and Support: Because Apache Spark is open source, it does not come with vendor support. Azure Stream Analytics users report excellent service and support.

Comparison Results: Apache Spark and Azure Stream Analytics come out about equal in this comparison. Some users are more satisfied with Apache Spark's stability and pricing, but Azure Stream Analytics has an edge when it comes to ROI and technical support.

To learn more, read our detailed Hadoop Report (Updated: April 2024).
770,458 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
  • "We use it for ETL purposes as well as for implementing the full transformation pipelines."
  • "The scalability has been the most valuable aspect of the solution."
  • "With Spark, we parallelize our operations, efficiently accessing both historical and real-time data."
  • "The data processing framework is good."
  • "The fault-tolerant feature is provided."
  • "We use Spark to process data from different data sources."
  • "Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data: customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more."
  • "I like that it can handle multiple tasks parallelly. I also like the automation feature. JavaScript also helps with the parallel streaming of the library."

  • "It's a product that can scale."
  • "We use Azure Stream Analytics for simulation and internal activities."
  • "Provides deep integration with other Azure resources."
  • "The most valuable features are the IoT hub and the Blob storage."
  • "I like the IoT part. We have mostly used Azure Stream Analytics services for it."
  • "The solution's most valuable feature is its ability to create a query using SQL."
  • "The life cycle, report management and crash management features are great."
  • "The way it organizes data into tables and dashboards is very helpful."

Cons
  • "I would like to see integration with data science platforms to optimize the processing capability for these tasks."
  • "Stability in terms of API (things were difficult when transitioning from RDD to DataFrames, then to DataSet)."
  • "The migration of data between different versions could be improved."
  • "The product could improve the user interface and make it easier for new users."
  • "I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
  • "It requires overcoming a significant learning curve due to its robust and feature-rich nature."
  • "When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
  • "In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."

  • "The collection and analysis of historical data could be better."
  • "The only challenge was that the streaming analytics area in Azure Stream Analytics could not meet our company's expectations, making it a component where improvements are required."
  • "If something goes wrong, it's very hard to investigate what caused it and why."
  • "Easier scalability and more detailed job monitoring features would be helpful."
  • "The UI should be a little bit better from a usability perspective."
  • "The solution's interface could be simpler to understand for non-technical people."
  • "Early in the process, we had some issues with stability."
  • "The solution could be improved by providing better graphics and including support for UI and UX testing."

Pricing and Cost Advice
  • "Since we are using the Apache Spark version, not the Databricks version, it is an Apache license version, the support and resolution of the bug are actually late or delayed. The Apache license is free."
  • "Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
  • "We are using the free version of the solution."
  • "Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
  • "Apache Spark is an expensive solution."
  • "Spark is an open-source solution, so there are no licensing costs."
  • "On the cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
  • "It is an open-source solution, it is free of charge."

  • "The cost of this solution is less than competitors such as Amazon or Google Cloud."
  • "We pay approximately $500,000 a year. It's approximately $10,000 a year per license."
  • "I rate the price of Azure Stream Analytics a four out of five."
  • "The licensing for this product is payable on a 'pay as you go' basis. This means that the cost is only based on data volume, and the frequency that the solution is used."
  • "There are different tiers based on retention policies. There are four tiers. The pricing varies based on streaming units and tiers. The standard pricing is $10/hour."
  • "The current price is substantial."
  • "Azure Stream Analytics is a little bit expensive."
  • "The product's price is at par with the other solutions provided by the other cloud service providers in the market."

    Questions from the Community
    Top Answer: We use Spark to process data from different data sources.
    Top Answer: In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, and do the transformation in a subsecond.
    Top Answer: Databricks is an easy-to-set-up and versatile tool for data management, analysis, and business analytics. For analytics teams that have to interpret data to further the business goals of their…
    Top Answer: The product's price is at par with the other solutions provided by the other cloud service providers in the market.
    Top Answer: Azure Stream Analytics was not meeting our company's expectations because it was tedious to change the job, write queries, or if I needed to change something, I needed to stop the entire stream…
    Ranking
                                  Apache Spark          Azure Stream Analytics
    Ranking                       1st of 22 in Hadoop   3rd of 38 in Streaming Analytics
    Views                         2,430                 9,766
    Comparisons                   1,869                 8,235
    Reviews                       26                    14
    Average words per review      444                   430
    Rating                        8.7                   8.2
    Also Known As: ASA (Azure Stream Analytics)
    Overview

    Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD): a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
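    The contrast above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Spark's API: a hypothetical job runs several threshold-count queries over the same dataset stored as a JSON file. The MapReduce-style function rereads its input from disk and writes its reduced result back to disk for every query, while the working-set function loads the data once and reuses it in memory, which is roughly what caching an RDD achieves. The function names and file layout are illustrative only.

```python
import json
import os
import tempfile


def load(path):
    """Read the input dataset from disk (the step MapReduce repeats for every job)."""
    with open(path) as f:
        return json.load(f)


def mapreduce_style(path, thresholds):
    """Each job reads input from disk, maps, reduces, and stores its result on disk."""
    counts, disk_reads = [], 0
    for i, t in enumerate(thresholds):
        data = load(path)                          # read input from disk
        disk_reads += 1
        flags = [1 if x > t else 0 for x in data]  # map
        count = sum(flags)                         # reduce
        with open(f"{path}.out{i}", "w") as f:     # store the reduction result on disk
            json.dump(count, f)
        counts.append(count)
    return counts, disk_reads


def working_set_style(path, thresholds):
    """Load once and keep the working set in memory across jobs (like a cached RDD)."""
    data = load(path)
    disk_reads = 1
    counts = [sum(1 for x in data if x > t) for t in thresholds]
    return counts, disk_reads


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "input.json")
        with open(path, "w") as f:
            json.dump(list(range(10)), f)
        print(mapreduce_style(path, [2, 5, 8]))    # ([7, 4, 1], 3)
        print(working_set_style(path, [2, 5, 8]))  # ([7, 4, 1], 1)
```

    Both functions return the same counts; only the number of disk reads differs. In PySpark, the working-set version would typically call `cache()` on an RDD and then run `rdd.filter(lambda x: x > t).count()` for each threshold.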

    Azure Stream Analytics is a real-time analytics service designed for mission-critical business workloads. Users can build an end-to-end serverless streaming pipeline in minutes and, using SQL, go from zero to production in a few clicks. Pipelines can be extended with custom code and built-in machine learning capabilities for more advanced scenarios.

    Azure Stream Analytics can analyze and process high volumes of fast-moving streaming data from multiple sources simultaneously. It identifies patterns and relationships in information extracted from input sources such as social media feeds, applications, clickstreams, sensors, and devices. These patterns can then be used to trigger actions and launch workflows, such as feeding data to a reporting tool, storing data for later use, or creating alerts. Azure Stream Analytics is also available on the Azure IoT Edge runtime, so data can be processed on IoT devices.
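    As a rough illustration of turning a stream into alerts, the following plain-Python sketch counts events per device in fixed, non-overlapping (tumbling) windows and flags windows that exceed a threshold. In Azure Stream Analytics this kind of logic would normally be written in its SQL-like query language, for example with `GROUP BY TumblingWindow(second, 10)`; the Python functions, the event format (timestamp in seconds, device ID), and the threshold here are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count (timestamp_seconds, device_id) events per device per fixed window."""
    counts = defaultdict(int)
    for ts, device in events:
        window_start = ts - (ts % window_seconds)  # each event falls in exactly one window
        counts[(window_start, device)] += 1
    return dict(counts)


def alerts(counts, threshold):
    """Return the (window_start, device_id) keys whose count exceeds the threshold."""
    return sorted(key for key, n in counts.items() if n > threshold)


if __name__ == "__main__":
    events = [(1, "a"), (3, "a"), (9, "a"), (12, "a"), (4, "b")]
    counts = tumbling_window_counts(events, window_seconds=10)
    print(counts)             # {(0, 'a'): 3, (10, 'a'): 1, (0, 'b'): 1}
    print(alerts(counts, 2))  # [(0, 'a')]
```

    Tumbling windows were chosen because each event is counted exactly once; overlapping (hopping or sliding) windows would assign one event to several windows.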

    Top Benefits

    • User friendly: Azure Stream Analytics is straightforward and easy to use. Out of the box and with a few clicks, users can connect to numerous sources and sinks and develop an end-to-end pipeline. Stream Analytics can connect to Azure IoT Hub and Azure Event Hubs for streaming ingestion, and to Azure Blob storage for historical data ingestion.

    • Flexible deployment: For low-latency analytics, Azure Stream Analytics can run on Azure Stack or Azure IoT Edge. For large-scale analytics, the solution can run in the cloud. Azure Stream Analytics uses the same query language and tools for both the cloud and the edge, making it easier for developers to design hybrid architectures for stream processing.

    • Cost-effective: With Azure Stream Analytics, users only pay for the streaming units they consume; there are no upfront costs. Users can easily scale up or down as needed; there is no commitment or cluster provisioning.

    • Trustworthy: Azure Stream Analytics guarantees event processing with 99.99% availability at a minute level of granularity. It has built-in recovery capabilities and checkpoints to keep jobs running smoothly. With at-least-once delivery of events and exactly-once event processing, events are never lost.
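    At-least-once delivery means an event may arrive more than once; exactly-once processing is commonly built on top of it by discarding duplicates by event ID. The sketch below shows that generic technique in plain Python. It is not a description of how Azure Stream Analytics implements its guarantee, and the function name and (event_id, payload) format are assumptions. A production system would keep the set of processed IDs in durable, checkpointed state rather than in memory.

```python
def process_exactly_once(deliveries, handle):
    """Call handle(payload) once per event ID, even if events are redelivered."""
    processed = set()  # in-memory here; a real system would checkpoint this durably
    for event_id, payload in deliveries:
        if event_id in processed:
            continue            # duplicate delivery: skip it
        handle(payload)         # mark as processed only after handling succeeds
        processed.add(event_id)
    return processed


if __name__ == "__main__":
    # Events 1 and 2 are delivered twice, as at-least-once delivery allows.
    deliveries = [(1, 10), (2, 5), (1, 10), (3, 7), (2, 5)]
    handled = []
    process_exactly_once(deliveries, handled.append)
    print(handled, sum(handled))  # [10, 5, 7] 22
```

    Recording an ID only after `handle` returns means a failure during handling leads to a retry rather than a lost event, which is the trade-off at-least-once delivery implies.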

    Reviews from Real Users

    “Azure Stream Analytics is something that you can use to test out streaming scenarios very quickly in the general sense and it is useful for IoT scenarios. If I was to do a project with IoT and I needed a streaming solution, Azure Stream Analytics would be a top choice. The most valuable features of Azure Stream Analytics are the ease of provisioning and the interface is not terribly complex.” - Olubisi A., Team Lead at a tech services company.

    “It's used primarily for data and mining - everything from the telemetry data side of things. It's great for streaming and makes everything easy to handle. The streaming from the IoT hub and the messaging are aspects I like a lot.” - Sudhendra U., Technical Architect at Infosys

    Sample Customers
    Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
    Azure Stream Analytics: Rockwell Automation, Milliman, Honeywell Building Solutions, Arcoflex Automation Solutions, Real Madrid C.F., Aerocrine, Ziosk, Tacoma Public Schools, P97 Networks
    Top Industries
    Apache Spark
      Reviewers: Computer Software Company 30%, Financial Services Firm 15%, University 9%, Marketing Services Firm 6%
      Visitors reading reviews: Financial Services Firm 25%, Computer Software Company 13%, Manufacturing Company 7%, Comms Service Provider 6%
    Azure Stream Analytics
      Reviewers: Computer Software Company 27%, Manufacturing Company 18%, Insurance Company 9%, Government 9%
      Visitors reading reviews: Computer Software Company 15%, Financial Services Firm 12%, Manufacturing Company 8%, Comms Service Provider 5%
    Company Size
                                                      Small Business   Midsize Enterprise   Large Enterprise
    Apache Spark, reviewers                           40%              18%                  42%
    Apache Spark, visitors reading reviews            17%              12%                  71%
    Azure Stream Analytics, reviewers                 24%              10%                  67%
    Azure Stream Analytics, visitors reading reviews  20%              11%                  69%

    Apache Spark is ranked 1st in Hadoop with 60 reviews while Azure Stream Analytics is ranked 3rd in Streaming Analytics with 22 reviews. Apache Spark is rated 8.4, while Azure Stream Analytics is rated 8.2. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Azure Stream Analytics writes "Easy to set up and user-friendly, but could be priced better". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Apache NiFi, whereas Azure Stream Analytics is most compared with Amazon Kinesis, Databricks, Amazon MSK, Apache Flink and Apache Spark Streaming.

    We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn and personal follow-up with the reviewer when necessary.