We performed a comparison between Apache Spark and Spark SQL based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The solution has been very stable."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."
"The scalability has been the most valuable aspect of the solution."
"The product’s most valuable features are lazy evaluation and workload distribution."
"The most valuable feature of Apache Spark is its ease of use."
"Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"The solution is easy to understand if you have basic knowledge of SQL commands."
"One of Spark SQL's most beautiful features is running parallel queries to go through enormous data."
"Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline."
"The team members don't have to learn a new language and can implement complex tasks very easily using only SQL."
"I find the Thrift connection valuable."
"Overall the solution is excellent."
"Certain data sets that are very large are very difficult to process with Pandas and Python libraries. Spark SQL has helped us a lot with that."
"Offers a variety of methods to design queries and incorporates the regular SQL syntax within tasks."
"Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use."
"One limitation is that not all machine learning libraries and models support it."
"If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of D allocation."
"The solution’s integration with other platforms should be improved."
"At times during the deployment process, the tool goes down, making it look less robust. To take care of the issues in the deployment process, users need to do manual interventions occasionally."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"The product could improve the user interface and make it easier for new users."
"The migration of data between different versions could be improved."
"I've experienced some incompatibilities when using the Delta Lake format."
"Anything to improve the GUI would be helpful."
"It takes a bit of time to get used to using this solution versus Pandas as it has a steep learning curve."
"It would be useful if Spark SQL integrated with some data visualization tools."
"There are many inconsistencies in syntax for the different querying tasks."
"In terms of improvement, the only thing that could be enhanced is the stability aspect of Spark SQL."
"The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."
"Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Spark SQL is ranked 4th in Hadoop with 14 reviews. Apache Spark is rated 8.4, while Spark SQL is rated 7.8. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Spark SQL writes "Offers the flexibility to handle large-scale data processing". Apache Spark is most compared with Spring Boot, AWS Batch, SAP HANA, Cloudera Distribution for Hadoop and Azure Stream Analytics, whereas Spark SQL is most compared with IBM Db2 Big SQL, Netezza Analytics, SAP HANA and HPE Ezmeral Data Fabric.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.