Spark SQL Overview

Spark SQL is ranked #5 in our list of top Hadoop tools. It is most often compared to Apache Spark.

What is Spark SQL?
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. There are several ways to interact with Spark SQL, including SQL and the Dataset API. When computing a result, the same execution engine is used regardless of which API or language you use to express the computation. This unification means that developers can easily switch back and forth between APIs, choosing whichever provides the most natural way to express a given transformation.
Spark SQL Customers
UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, Hitachi Solutions
Spark SQL Reviews

Srinivasan Sugumar
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees
Real User
Top 10
Mar 23, 2020
GUI could be improved. Useful for speedily processing big data.

What is our primary use case?

We do have some use cases, like analysis and risk-based use cases, that we've provided and prepared for companies in order to evaluate, but not many. The business units have so many things that we don't know how to help formulate into another tool and utilize as a use case. They also have so many requirements and costs. I work for a financial institution, so every solution that they need to consider has to be on-premise. I'm actually just evaluating this solution and building up my skill set with it right now.

Pros and Cons

  • "The speed of getting data."
  • "Anything to improve the GUI would be helpful."

What other advice do I have?

We will have a lot of big data, which is why we need it. Otherwise, the solution is not needed. The solution really depends on the size of your data, its complexity, and the analysis that you are doing. Spark is good, but it is not mandatory. Since I don't have experience in production with the solution, the best I can rate it now is a five (out of 10).
DulalMali
Big Data Analytics Practice head at BSE
Real User
Top 5
Leaderboard
Feb 11, 2020
An excellent solution that continues to mature but needs graphing capabilities

What is our primary use case?

We primarily use the solution as our data warehouse. We use it for data science.

Pros and Cons

  • "Overall the solution is excellent."
  • "The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."

What other advice do I have?

We use both the on-premises and cloud deployment models. We have a relationship with Cloudera and use their distribution channels. We don't have a relationship with Apache. Spark SQL is a good product. However, users need to have the capability of implementing the correct tools and efficiencies. I'd rate the solution seven out of ten.
Piotr Kalanski
Cloud Team Leader at TCL
Real User
Top 5
Leaderboard
May 5, 2020
Enables us to build a data pipeline and has good performance

What is our primary use case?

Our primary use case is for building a data pipeline and data analytics.

Pros and Cons

  • "The performance is one of the most important features. It has an API to process the data in a functional manner."
  • "In the next update, we'd like to see better performance for small points of data. It is possible but there are better tools that are faster and cheaper."

What other advice do I have?

I would rate Spark SQL a nine out of ten. My advice would be to read Databricks books about Spark. It's a good source of knowledge. In the next update, we'd like to see better performance for small points of data. It is possible but there are better tools that are faster and cheaper.
Seniorsoen67
Project Manager - Senior Software Engineer at a tech services company with 11-50 employees
Real User
Jul 17, 2019
A good stable and scalable solution for processing big data

What is our primary use case?

The primary use is to process big data. We were connecting into and we were applying sentiment analysis via hardware.

What needs improvement?

In the next release, maybe the visualization of some command-line features could be added.

For how long have I used the solution?

I've been using the solution for two to three weeks.

What do I think about the stability of the solution?

The stability was fine. It behaved as expected.

What do I think about the scalability of the solution?

The scalability of the solution is good.

How are customer service and technical support?

Technical support has been fine.

Which solution did I use previously and why did I switch?

We previously used Apache Hadoop.

How was the initial setup?

The initial setup…
reviewer1427205
Corporate Sales at a financial services firm with 10,001+ employees
Real User
Sep 30, 2020
It is stable, but its partitioning feature isn't that easy to use

What is our primary use case?

We use it to gather all the transaction data. We have Hadoop and Spark in our system, and we use some easy process flows for transport.

Pros and Cons

  • "It is a stable solution."
  • "Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users."

What other advice do I have?

Being a new user, I would rate Spark SQL a four out of ten.