We performed a comparison between Apache Spark Streaming and Databricks based on real PeerSpot user reviews.
Find out in this report how the two Streaming Analytics solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Apache Spark Streaming was straightforward in terms of maintenance. It was actively developed, and migrating from an older to a newer version was quite simple."
"The solution is very stable and reliable."
"Apache Spark Streaming's most valuable feature is near real-time analytics. The developers can build APIs easily for a code-streaming pipeline. The solution has an ecosystem of integration with other stock services."
"As an open-source solution, using it is basically free."
"It's the fastest solution on the market with low latency data on data transformations."
"The platform’s most valuable feature for processing real-time data is its ability to handle continuous data streams."
"Apache Spark Streaming has features like checkpointing and Streaming API that are useful."
"The solution is better than average and some of the valuable features include efficiency and stability."
"The most valuable feature of Databricks is the notebook, data factory, and ease of use."
"Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client."
"Databricks is hosted on the cloud. It is very easy to collaborate with other team members who are working on it. It is production-ready code, and scheduling the jobs is easy."
"It's very simple to use Databricks Apache Spark."
"The solution is very simple and stable."
"The load distribution capabilities are good, and you can perform data processing tasks very quickly."
"The simplicity of development is the most valuable feature."
"A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem."
"There could be an improvement in the user configuration section; it should be less developer-focused and more business user-focused."
"We would like to have the ability to do arbitrary stateful functions in Python."
"The cost and load-related optimizations are areas where the tool lacks and needs improvement."
"In terms of improvement, the UI could be better."
"The service structure of Apache Spark Streaming can improve. There are a lot of issues with memory management and latency. There is no real-time analytics. We recommend it for use cases where a five-second latency is acceptable, but not for millisecond-latency, IoT-based, or anomaly-detection use cases. Flink as a service is much better."
"It was resource-intensive, even for small-scale applications."
"Integrating event-level streaming capabilities could be beneficial."
"The initial setup is quite complex."
"The tool should improve its integration with other products."
"If I want to create a Databricks account, I need to have a prior cloud account such as an AWS account or an Azure account. Only then can I create a Databricks account on the cloud. However, if they can make it so that I can still try Databricks even if I don't have a cloud account on AWS or Azure, it would be great. That is, it would be nice if it were possible to create a pseudo account and be provided with a free trial. This is essential for building a workforce on Databricks. For example, students or corporate staff could then explore and learn Databricks."
"The query plan is not easy to work with at the Databricks job level. If I want to tune any of the code, guidance is not easily available in the blogs either."
"Costs can quickly add up if you don't plan for it."
"The data visualization for this solution could be improved. They have started to roll out a data visualization tool inside Databricks but it is in the early stages. It's not comparable to a solution like Power BI, Luca, or Tableau."
"Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster."
"In the future, I would like to see Data Lake support. That is something that I'm looking forward to."
"I have seen better user interfaces, so that is something that can be improved."
Apache Spark Streaming is ranked 8th in Streaming Analytics with 9 reviews while Databricks is ranked 2nd in Streaming Analytics with 78 reviews. Apache Spark Streaming is rated 8.0, while Databricks is rated 8.2. The top reviewer of Apache Spark Streaming writes "Easy integration, beneficial auto-scaling, and good open-sourced support community". On the other hand, the top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing". Apache Spark Streaming is most compared with Amazon Kinesis, Spring Cloud Data Flow, Azure Stream Analytics, Apache Pulsar and SAS Event Stream Processing, whereas Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Dataiku, Dremio and Microsoft Azure Machine Learning Studio.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.