Spark SQL Other Solutions Considered

Aria Amini - PeerSpot reviewer
Data Engineer at Behsazan Mellat

We use Python directly, and in some cases, we use Apache Hive alongside Python. We use Hive and Spark SQL simultaneously. We switched to PySpark from plain Python.

Spark SQL is better than Python for running queries in parallel. The main difference between Spark (or PySpark) and Python pandas is that Spark SQL runs in parallel across a cluster. The main difference between Apache Hive and PySpark is that PySpark is more flexible: Apache Hive suits predefined, known scenarios, whereas PySpark lets you build and run queries on the fly.
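As a rough illustration of the parallelism difference described above, here is a minimal PySpark sketch; the file paths and column names are hypothetical, not taken from the reviewer's environment:

from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession.builder.appName("parallel-query-sketch").getOrCreate()

# Spark SQL: the aggregation is planned and split across the cluster's executors.
orders = spark.read.parquet("hdfs:///data/orders")  # hypothetical path
orders.createOrReplaceTempView("orders")
totals = spark.sql(
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
)
totals.show()

# pandas: the same aggregation runs on a single machine, in a single process.
orders_pd = pd.read_parquet("orders.parquet")  # hypothetical path
totals_pd = orders_pd.groupby("customer_id")["amount"].sum()
print(totals_pd)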

KM
Senior Analyst/ Customer Business and Insights Specialist at a tech services company with 501-1,000 employees

Our company gives us the freedom to use Python, R, PySpark, or SQL, so we have many tools available. Our team includes 17 developers, and 25% of them use the solution.

The solution is way better than Oracle SQL because Oracle takes a lot of effort to understand and use. 

The solution is similar in format to MS SQL. With MS SQL, there are defined data sources that restrict what you are supposed to use. Sometimes we had to find a way around those restrictions. For example, if we didn't have access to a physical table, we had to create a duplicate instance or a view of it. We could see the values but couldn't manipulate them because we didn't have access to the physical table. How much the MS SQL restrictions matter depends on the complexity of the project and any privacy-related data constraints.
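The view-based workaround described above looks roughly like the following. It is written in Spark SQL syntax for consistency with the rest of this page rather than in MS SQL, and the table and column names are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("view-workaround-sketch").getOrCreate()

# Expose a read-only view over a table we cannot modify directly.
# "finance.transactions" and the selected columns are hypothetical names.
spark.sql("""
    CREATE OR REPLACE VIEW transactions_view AS
    SELECT transaction_id, customer_id, amount
    FROM finance.transactions
""")

# Analysts can query the view's values without touching the physical table.
spark.sql(
    "SELECT customer_id, SUM(amount) FROM transactions_view GROUP BY customer_id"
).show()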

For the solution, use cases sit within the Spark scope, so you get multiple options for creating them. You can set each use case individually as closed, private, or public. You can run analytics for each use case because the data is contained within it. This process is much easier compared to Oracle SQL or MS SQL.

SS
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees

We are also planning to use Informatica since there is a way to use Spark within Informatica: there is an option to tie in a Big Data edition.
