Spark SQL Overview

Spark SQL is the #3 ranked solution among top Hadoop tools. IT Central Station users give Spark SQL an average rating of 8 out of 10. Spark SQL is most commonly compared to IBM Db2 Big SQL. The top industry researching this solution is computer software, whose professionals account for 31% of all views.
What is Spark SQL?
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. There are several ways to interact with Spark SQL including SQL and the Dataset API. When computing a result the same execution engine is used, independent of which API/language you are using to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation.
Buyer's Guide

Download the Hadoop Buyer's Guide including reviews and more. Updated: November 2021

Spark SQL Customers
UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, Hitachi Solutions
Pricing Advice

What users are saying about Spark SQL pricing:
  • "The solution is open-sourced and free."

Spark SQL Reviews

SS
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees
Real User
Top 20
GUI could be improved. Useful for speedily processing big data.

Pros and Cons

  • "The speed of getting data."
  • "Anything to improve the GUI would be helpful."

What is our primary use case?

We do have some use cases, like analysis and risk-based use cases, that we've provided and prepared for companies in order to evaluate, but not many. The business units have so many things that we don't know how to help formulate into another tool and utilize as a use case. They also have so many requirements and costs.

I work for a financial institution, so every solution that they need to consider has to be on-premise.

I'm actually just evaluating this solution and upskilling with it right now.

What is most valuable?

The speed of getting data, as our TBs are big and it's a lot of data. 

What needs improvement?

Anything to improve the GUI would be helpful.

We have experienced a lot of issues, but nothing in the production environment.

For how long have I used the solution?

For a couple of months. However, we have not implemented it in a production environment yet.

What do I think about the stability of the solution?

The solution has not been implemented yet. When it is implemented into the real world and production, that is when I expect to see some challenges.

How are customer service and technical support?

We have worked with the Cloudera support for this solution. They are average.

Which solution did I use previously and why did I switch?

I have more than 10 years of experience with other database tools.

How was the initial setup?

The initial setup is a bit complex.

Which other solutions did I evaluate?

We are also planning to use Informatica, since you can run Spark within Informatica through an option to tie in its big data edition.

What other advice do I have?

We will have a lot of big data, which is why we need it. Otherwise, the solution is not needed. The solution really depends on the size of your data, its complexity, and the analysis that you are doing. Spark is good, but it is not mandatory.

Since I don't have experience in production with the solution, the best I can rate it now is a five (out of 10). 

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
DulalMali
Big Data Analytics Practice head at BSE
Real User
Top 5
An excellent solution that continues to mature but needs graphing capabilities

Pros and Cons

  • "Overall the solution is excellent."
  • "The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."

What is our primary use case?

We primarily use the solution as our data warehouse. We use it for data science.

What is most valuable?

Overall the solution is excellent.

The solution is continuing to evolve and mature over time.

What needs improvement?

The service is complex. This is due to the fact that it's a combination of a lot of technology.

The solution needs to include graphing capabilities. Including financial charts would help improve everything overall.

For how long have I used the solution?

I've been using the solution since 2013.

What do I think about the stability of the solution?

The solution is relatively stable. We haven't had issues with bugs or glitches.

What do I think about the scalability of the solution?

The solution is scalable. We've found it easy to expand as necessary.

Right now, we have about 42 users on the solution. They include IT and ETL staff as well as a business analyst.

How are customer service and technical support?

I've never been in touch with technical support. I can't speak to any experience our company has had with them.

Which solution did I use previously and why did I switch?

I've also worked with Apache SQL and SAP. This solution is a much more scalable and adventurous solution. It's also faster than the others. We used to use IQ, but at the time it couldn't scale well, so we switched to IBM Appliance. Then we switched to Spark. IBM was good, but it also had issues with scalability and it cost us a lot of money. 

How was the initial setup?

The initial setup is straightforward. We found it quite easy.

What about the implementation team?

We handled the implementation ourselves with our in-house team.

What's my experience with pricing, setup cost, and licensing?

The pricing of Apache is much more competitive than IBM's.

What other advice do I have?

We use both the on-premises and cloud deployment models.

We have a relationship with Cloudera and use their distribution channels. We don't have a relationship with Apache.

Spark SQL is a good product. However, users need to have the capability of implementing the correct tools and efficiencies.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Piotr Kalanski
Cloud Team Leader at TCL
Real User
Top 10
Enables us to build a data pipeline and has good performance

Pros and Cons

  • "The performance is one of the most important features. It has an API to process the data in a functional manner."
  • "In the next update, we'd like to see better performance for small points of data. It is possible but there are better tools that are faster and cheaper."

What is our primary use case?

Our primary use case is for building a data pipeline and data analytics. 

What is most valuable?

The performance is one of the most important features. It has an API to process the data in a functional manner. 

What needs improvement?

I would like to have the ability to process small amounts of data without the overhead, so that the same API can handle both terabytes of data and a single gigabyte.

For how long have I used the solution?

I have been using Spark SQL for around four years. 

What do I think about the stability of the solution?

It is very stable.

What do I think about the scalability of the solution?

It is scalable. I use it on and off. I use it mostly daily. 

How was the initial setup?

From an infrastructure perspective, it was easy for us to set up because we used some cloud services, but an on-premises deployment requires more setup. There is a learning curve, especially if you're not a programmer, and the more complex steps take more effort to learn.

I deployed it by myself. We use cloud so we are able to do it. 

The number of people required for deployment will depend on the environment. One person is enough for AWS but not in other places.

If you know how to do it, the deployment can be done in minutes. 

What other advice do I have?

I would rate Spark SQL a nine out of ten. 

My advice would be to read Databricks books about Spark. It's a good source of knowledge. 

In the next update, we'd like to see better performance for small points of data. It is possible but there are better tools that are faster and cheaper. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
QG
Corporate Sales at a financial services firm with 10,001+ employees
Real User
Top 20
It is stable, but its partitioning feature isn't that easy to use

Pros and Cons

  • "It is a stable solution."
  • "Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users."

What is our primary use case?

We use it to gather all the transaction data. We have Hadoop and Spark in our system, and we use some easy process flows for transport. 

What is most valuable?

It is a stable solution. 

What needs improvement?

Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users.

For how long have I used the solution?

I have been using this solution for two months.

What do I think about the scalability of the solution?

Its scalability is okay. We are a big organization. 

What other advice do I have?

Being a new user, I would rate Spark SQL a four out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
KG
Associate Manager at a consultancy with 501-1,000 employees
Real User
Top 5
Easy to use, reliable, and useful data validation

What is our primary use case?

I am using this solution for data validation and writing queries.

What is most valuable?

Data validation and ease of use are the most valuable features.

What needs improvement?

There should be better integration with other solutions.

For how long have I used the solution?

I have been using the solution for approximately two years.

What do I think about the stability of the solution?

The solution has been stable.

What do I think about the scalability of the solution?

I have found the solution to be scalable. We have 20 people using the solution in my organization and we plan to increase usage.

What's my experience with pricing, setup cost, and licensing?

The solution is open-sourced and free.

What other advice do I have?

I rate Spark SQL a ten out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.