We performed a comparison between Apache Spark and Spark SQL based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The processing time is very much improved over the data warehouse solution that we were using."
"The distribution of tasks, like the seamless map-reduce functionality, is quite impressive."
"Spark can handle small to huge data and is suitable for any size of company."
"It is useful for handling large amounts of data. It is very useful for scientific purposes."
"The main feature that we find valuable is that it is very fast."
"The solution has been very stable."
"ETL and streaming capabilities."
"Apache Spark can do large volume interactive data analysis."
"The stability was fine. It behaved as expected."
"Overall the solution is excellent."
"Certain data sets that are very large are very difficult to process with Pandas and Python libraries. Spark SQL has helped us a lot with that."
"Data validation and ease of use are the most valuable features."
"The speed of getting data."
"I find the Thrift connection valuable."
"Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline."
"One of Spark SQL's most beautiful features is running parallel queries to go through enormous data."
"Apache Spark provides very good performance. The tuning phase is still tricky."
"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"Spark could be improved by adding support for other open-source storage layers than Delta Lake."
"This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
"At the initial stage, the product provides no container logs to check the activity."
"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
"It takes a bit of time to get used to using this solution versus Pandas as it has a steep learning curve."
"SparkUI could have more advanced versions of the performance and the queries and all."
"There are many inconsistencies in syntax for the different querying tasks."
"I've experienced some incompatibilities when using the Delta Lake format."
"Anything to improve the GUI would be helpful."
"It would be useful if Spark SQL integrated with some data visualization tools."
"The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."
"It would be beneficial for aggregate functions to include a code block or toolbox that explains its calculations or supported conditional statements."
Apache Spark is ranked 2nd in Hadoop with 24 reviews while Spark SQL is ranked 4th in Hadoop with 7 reviews. Apache Spark is rated 8.4, while Spark SQL is rated 7.8. The top reviewer of Apache Spark writes "Offers seamless integration with Azure services and on-premises servers". On the other hand, the top reviewer of Spark SQL writes "Processing solution used for data engineering and transformation with the ability to process large datasets". Apache Spark is most compared with Spring Boot, AWS Batch, SAP HANA, Cloudera Distribution for Hadoop and AWS Lambda, whereas Spark SQL is most compared with IBM Db2 Big SQL, SAP HANA, HPE Ezmeral Data Fabric and Netezza Analytics. See our Apache Spark vs. Spark SQL report.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.