Spark SQL Initial Setup

Surjit Choudhury - PeerSpot reviewer
Data engineer at Cocos pt

My experience with the initial setup of Spark SQL was relatively smooth. Understanding the system wasn't overly difficult because the data was structured in databases, and we could use notebooks for coding in Python or Java. Configuring networks and running scripts to load data into the database were routine tasks that didn't pose significant challenges. The flexibility to code in different languages and the ability to process data as key-value pairs in Python made the setup adaptable. Once we received the source data, processing it in Spark SQL involved writing scripts to create dimension and fact tables, which became a standard part of our workflow.

Setting up Spark SQL was reasonably quick, but we sometimes face performance issues, especially when loading data into the SQL Server data warehouse. Sequencing notebooks for efficient job runs is crucial, and managing complex tasks that span multiple notebooks requires careful tracking. Exploring ways to optimize this process could be beneficial. However, once you are familiar with the database architecture and project tools, understanding and adapting to the system becomes more straightforward.

Lucas Dreyer - PeerSpot reviewer
Data Engineer at BBD

In terms of setting up the on-premise cluster, it can be quite complex. I would rate it a six or seven in terms of complexity. Using it on the cloud is very straightforward. 

We're using it mostly on-premise, but we also have cloud instances where we use Spark, so we have a mix of use cases.

The deployment can be done by one person. Typically, we have bigger teams, two or three people at least, but one person can look after it and maintain it.

Aria Amini - PeerSpot reviewer
Data Engineer at Behsazan Mellat

We used Ambari for Spark SQL's installation. We used Ambari to install some IDEs like Zeppelin and altering and Python. We used the tool and the Zeppelin IDE. Installation is easy, but it can get complex if you want to use Spark SQL's cluster feature as well. Overall, though, the installation is not complex. It takes two or three days to deploy the solution if you want to install it with the Zeppelin IDE and the Hadoop cluster. We needed one engineer to install and deploy the solution. Four engineers and some developers are working on the solution and doing development work in this environment.

The solution requires just one person for maintenance because of the Ambari framework.

Buyer's Guide
Spark SQL
April 2024
Learn what your peers think about Spark SQL. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
769,976 professionals have used our research since 2012.
SB
CTO at Dokument IT d.o.o.

The setup process for Spark is not well-documented, but that's expected because the solution is open-source. You have to work around various obstacles, which is usual for an open-source solution. You could hire people from the Databricks center, and they can fix nearly anything.

Once you learn all the tricks, you can deploy the solution very quickly, in one hour. But that applies only to the development environment. We are not in production right now. I tested it on Windows and on Ubuntu, and everything works well. But you have to reinvent the wheel because the documentation is incomplete.

The deployment process is based on bash scripts. I was considering writing Ansible playbooks and custom Ansible roles, but I haven't had the time, though this is the plan. I want to move from the bash scripts to Ansible because I prefer a declarative approach in software engineering. I plan to fully automate the deployment so that one experienced engineer would be enough. The solution's final deployment would be on a Kubernetes cluster, and the infrastructure would be set up with Terraform and Ansible. Everything will be heavily optimized.

Mahdi Sharifmousavi - PeerSpot reviewer
Lecturer at Amirkabir University of Technology

Deployment was carried out by our infrastructure department.

KM
Senior Analyst/ Customer Business and Insights Specialist at a tech services company with 501-1,000 employees

The setup is very straightforward, so I rate it a ten out of ten.

SS
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees

The initial setup is a bit complex.

DM
Data Analytics Practice head at bse

The initial setup is straightforward. We found it quite easy.

PK
Cloud Team Leader at TCL

From an infrastructure perspective, it was easy for us to set up because we used some cloud services. An on-premise deployment requires more setup. There is a learning curve, especially if you're not a programmer, and the more complex steps take more effort to learn.

I deployed it by myself. We use the cloud, so we were able to do it.

The number of people required for deployment varies. One person is enough on AWS, but not in other environments.

If you know how to do it, the deployment can be done in minutes.

AG
Engineering Manager/Solution architect at a computer software company with 201-500 employees

The installation is straightforward because it's a cloud-based solution. 

it_user986637 - PeerSpot reviewer
Project Manager - Senior Software Engineer at a tech services company with 11-50 employees

The initial setup was fine. If you know what to expect, it's okay.
