Jul 16, 2019

What is your primary use case for Spark SQL?

Julia Miller

13 Answers

SurjitChoudhury
Real User
Top 5
Nov 23, 2023

I employ Spark SQL for various tasks. Initially, I gathered data from databases, SAP systems, and external sources via SFTP, storing it in blob storage. Using Spark SQL within Jupyter notebooks, I define and implement business logic for data processing. Our CI/CD process, managed with Azure DevOps, oversees the execution of Spark SQL scripts, facilitating data loading into SQL Server. This structured data is then used by analytics teams, particularly in tools like Power BI, for thorough analysis and reporting. The seamless integration of Spark SQL in this workflow ensures efficient data processing and analysis, contributing to the success of our data-driven initiatives.

SB
Real User
Top 5
Aug 18, 2023

We used the solution for analytics of data and statistical reports from content management platforms.

Aria Amini
Real User
Top 5 Leaderboard
Jul 26, 2023

We have an HDFS environment for archiving data when there is an enormous volume of data, and the solution helps retrieve data from our HDFS archive. Developers use the solution for business analytics.

Sahil Taneja
Real User
Top 5 Leaderboard
May 5, 2023

We are using PySpark for big data processing, such as data on multiple competing stocks. We process it in memory using DataFrames and Spark SQL. We use it along with the database to process big data, especially Azure data. Databricks itself provides an environment that comes pre-installed with Spark.

Lucas Dreyer
Real User
Top 5 Leaderboard
Jan 4, 2023

We use this solution for data engineering, data transformation, preparing data for machine learning, and running queries. We have between 30 and 40 users of this solution.

KM
Real User
Top 10 Leaderboard
Nov 22, 2022

Our company uses the solution to create pipelines and data sets. The ETL process transforms the data, and custom aggregations convert the raw data into data sets. The data sets are then exported to tables for dashboards.

AG
Real User
Dec 2, 2021

The primary use case of this solution is to function within a distributed ecosystem. Spark is part of EMR, a Hadoop distribution, and is one of the tools in that ecosystem. You are not working with Hadoop in a vacuum; you leverage Spark, Hive, and HBase together, because it is a distributed ecosystem and has little value in isolation. This solution can be deployed both on the cloud and on Cloudera distributions.

KG
Real User
May 29, 2021

I am using this solution for data validation and writing queries.

QG
Real User
Sep 27, 2020

We use it to gather all the transaction data. We have Hadoop and Spark in our system, and we use some easy process flows for transport.

PK
Real User
Apr 26, 2020

Our primary use case is for building a data pipeline and data analytics.

SS
Real User
Mar 18, 2020

We do have some use cases, such as analysis and risk-based use cases, that we have prepared for companies to evaluate, but not many. The business units have many needs that we do not yet know how to formulate into use cases for another tool, and they also have many requirements and cost constraints. I work for a financial institution, so every solution they consider has to be on-premises. At the moment I am just evaluating the solution and upskilling myself with it.

DM
Real User
Top 20
Feb 9, 2020

We primarily use the solution as our data warehouse. We use it for data science.

it_user986637
Real User
Jul 16, 2019

The primary use is to process big data. We were connecting into it and applying sentiment analysis via hardware.

Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. There are several ways to interact with Spark SQL, including SQL and the Dataset API. When computing a result, the same execution engine is used, independent of which API or language you use to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation.