Apache Spark Initial Setup

Ilya Afanasyev - PeerSpot reviewer
Senior Software Development Engineer at Yahoo!

I didn't handle the initial setup. We were using this pipeline and clusters already. I just installed it on my local server. However, in terms of difficulty, I didn't see any problem. The deployment might only take a few hours. 

I found some documentation on the site, downloaded the archive, unzipped it, and installed it. I can't say that I used any special configuration. I just installed a few nodes for debugging and for running locally, and that's all. In one case, I also used a Docker configuration with Spark. It all worked fine.

SurjitChoudhury - PeerSpot reviewer
Data engineer at Cocos pt

Resource allocation and optimization in the computing tasks are different for on-premise systems. 

In cloud environments, resource allocation is already handled by the cloud provider, so you don't need to worry about it. 

On-prem, if you're using Hadoop with Spark, resource allocation might be handled by Kubernetes or YARN. These tools provide feedback to the Spark driver about available resources, and the driver allocates tasks to worker nodes based on that information.
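As a rough illustration of the resource request described above, the sketch below builds the executor settings a Spark application would hand to YARN or Kubernetes and estimates the container size requested per executor. The property names are standard Spark configuration keys. The executor counts and sizes are made-up example values, and the 10% / 384 MiB overhead rule is Spark's documented default for executor memory overhead as I understand it, so it may differ by version or configuration.

```python
def container_request_mib(executor_memory_mib: int,
                          overhead_factor: float = 0.10,
                          min_overhead_mib: int = 384) -> int:
    """Estimate the memory one executor container asks the cluster manager for.

    Assumes the default overhead rule: max(384 MiB, 10% of executor memory).
    """
    overhead = max(min_overhead_mib, int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + overhead


def resource_conf(instances: int, cores: int, memory_mib: int) -> dict:
    """Return the Spark properties that describe the executors being requested."""
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{memory_mib}m",
    }


if __name__ == "__main__":
    # Example values only: 4 executors, 2 cores each, 4 GiB of heap each.
    conf = resource_conf(instances=4, cores=2, memory_mib=4096)
    for key, value in conf.items():
        print(f"--conf {key}={value}")
    print("MiB requested per executor container:", container_request_mib(4096))
```

The cluster manager grants or queues these container requests, and the driver then schedules tasks onto whichever executors it actually receives.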

Suriya Senthilkumar - PeerSpot reviewer
Analyst at Deloitte

The initial setup complexity depends on whether it's on the cloud or on-premise. For cloud deployments, especially using platforms like Databricks, the process is straightforward and can be configured with ease. However, if the deployment is on-premise, the setup tends to be more time-consuming, although not overly complex.

Buyer's Guide
Apache Spark
April 2024
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,578 professionals have used our research since 2012.
Miodrag Milojevic - PeerSpot reviewer
Senior Data Architect at Yettel

When you install the complete environment, you install Spark as a part of this solution. The setup can be tricky when introducing security, such as connecting Spark using Kerberos. It can be tricky because when you use it, you have to distribute your architecture with many servers, and even then, you have to prepare Kerberos on every server. It's not possible to do this in one place.

Deploying Apache Spark is pretty complex, but that is a problem with the security approach. Our security guys requested this security, so Kerberos authentication is mandatory for us, which can be complex. We had five people for maintenance and deployment, not counting other roles.

Hamid M. Hamid - PeerSpot reviewer
Data architect at Banking Sector

The deployment of the product is easy.

Apache Spark's cluster deployment process is very easy.

Only the application needs a deployment process in order to run on Apache Spark. Apache Spark itself is a setup tool. Deploying an application on Apache Spark is easy for a user: you just write the code in Scala and submit it to the cluster, and the deployment can be done in one step.

The solution is deployed on an on-premises model.
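For readers unfamiliar with the one-step submission mentioned above, a minimal sketch of assembling a `spark-submit` command for a packaged Scala application might look like this. The jar path, class name, and master value are hypothetical placeholders, not details from the review; `--class`, `--master`, and `--deploy-mode` are standard `spark-submit` options.

```python
import shlex


def submit_command(jar: str, main_class: str, master: str = "yarn",
                   deploy_mode: str = "cluster", app_args=()) -> list:
    """Build the argument list for submitting a packaged Scala/Java application."""
    return [
        "spark-submit",
        "--class", main_class,         # entry point inside the jar
        "--master", master,            # which cluster manager to talk to
        "--deploy-mode", deploy_mode,  # run the driver on the cluster or on the client
        jar,
        *app_args,                     # arguments passed through to the application
    ]


if __name__ == "__main__":
    # Hypothetical application: the jar and class names are illustrative only.
    cmd = submit_command("target/my-app.jar", "com.example.Main",
                         app_args=["2024-04-01"])
    print(shlex.join(cmd))
    # To actually submit, the list could be passed to subprocess.run(cmd, check=True)
    # on a machine where spark-submit is installed and on the PATH.
```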

Lucas Dreyer - PeerSpot reviewer
Data Engineer at BBD

I haven't handled the deployment process, but setting it up on the cloud seems relatively straightforward.

Atal Upadhyay - PeerSpot reviewer
AVP at MIDDAY INFOMEDIA LIMITED

The initial setup isn't complicated, but it varies from person to person. For me, it wasn't particularly complex; it was straightforward to use.

Anshuman Kishore - PeerSpot reviewer
Director Product Development at Mycom Osi

The product's deployment phase is easy.

The product's deployment phase involved the CI/CD pipeline and Jenkins pipeline.

Earlier, the solution was deployed on an on-premises model. Later on, the solution was deployed on a cloud model.

Initially, during the product's deployment phase, it took more than four to five hours. With the passage of time, the product's deployment process became easier.

Around 50 to 100 people in my company are involved in the product's deployment process.

VM
Cloud solution architect at 0

The setup I worked on was really complex, not specifically because of Spark but due to the integration with multiple services. 

It took us about a week to finalize the solution, as understanding the entire workflow and brainstorming on how to maintain private traffic was intricate.

Regarding the deployment process, it involved thorough planning and testing to ensure minimal latency. We managed to achieve a latency of around 20 to 30 milliseconds, which was pretty good.

AmitMataghare - PeerSpot reviewer
Associate Director at a consultancy with 10,001+ employees

If Apache Spark is in the cloud, setting it up will require only minutes. If it's on Amazon, GCP, or Microsoft cloud, it'll take minutes to set everything up. However, if you are using the on-premise version, then it might take some time to set up the environment.

Atif Tariq - PeerSpot reviewer
Cloud and Big Data Engineer | Developer at Huawei Cloud Middle East

The solution’s initial setup is very easy.

Lokesh Jayanna - PeerSpot reviewer
Vice President at Goldman Sachs at a computer software company with 10,001+ employees

The complexity of the initial setup depends on the kind of environment an organization is working with. It requires one executive for deployment. I rate the process an eight out of ten.

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis

The solution's setup and installation of Apache Spark can vary in complexity depending on whether it's done in a standalone or cluster environment. The process is generally more straightforward in a standalone setup, especially if you're familiar with the concepts involved. However, setting up in a cluster environment may require more knowledge about clusters and networking, making it potentially more complex.

ML
Information Technology Business Analyst at an aerospace/defense firm with 10,001+ employees

The basic installation is easy. However, we are working in the security business and need a very secure installation. It has been quite difficult. I rate the basic installation a ten out of ten. I rate the ease of setup a two or three out of ten for a more secure installation with all the security features. The solution is deployed on-premises in our organization. The deployment process requires a couple of weeks.

Oscar Estorach - PeerSpot reviewer
Chief Data-strategist and Director at Theworkshop.es

When handling big data systems, the installation is a bit difficult. When you need to deploy the systems, it's better to use services like Databricks.

I am not a professional admin. I am a developer, and I design architecture.

You can use it on a standalone system; however, it's not the best way. It would be okay for small bits of code, not for production.

Armando Becerril - PeerSpot reviewer
Partner / Head of Data & Analytics at Kueski

The initial setup has been simplified over the past few years and is now relatively straightforward. 

KK
Software Architect at Akbank

I don't have any idea about it. We are a big company, and we have another group for setting up Spark.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

The initial setup is straightforward. 

SB
CTO at Hammerknife

The initial setup process is simple if you are a good professional. You have to select a few parameters and press enter. It is already integrated into Databricks platform. One person is enough to manage small and medium implementations.

MA
PLC Programmer at Alzero

The initial setup was straightforward and was conducted on the cloud. The entire deployment process took just 15 minutes. The deployment process involves provisioning the compute resources using Terraform.


Jagannadha Rao - PeerSpot reviewer
Lead Data Scientist at International School of Engineering

Apache Spark's initial setup is slightly complex compared to other solutions. Data scientists could install our previous tools with minimal supervision, whereas Apache Spark requires some IT support. Apache Spark's installation is a time-consuming process because it requires ensuring that all the ports can be accessed properly, following certain guidelines.

FK
Data Engineer at Berief Food GmbH

The deployment was easy.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

If you want to distribute Apache Spark in a certain way, it is simple, but not every engineer can do it. It requires specialized DevOps skills with Spark.

If we are going to deploy the solution in a one-layer laptop installation, it is very straightforward, but this is not what someone is going to deploy in the production site.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

The initial setup is complex. 

Salvatore Campana - PeerSpot reviewer
CEO & Founder at XAUTOMATA TECHNOLOGY GmbH

The initial setup was not easy, but we created a means of asking the user about their needs, making the setup much easier. We can now deploy the platform in thirty minutes using the public cloud or Kubernetes space.

it_user371832 - PeerSpot reviewer
Chief System Architect at a marketing services firm with 501-1,000 employees

Setting up a Spark cluster can be difficult. It depends on your clustering strategy. There are at least four options:

EC2 script: works only on Amazon AWS

Standalone: manual configuration (hard)

YARN: leverages your existing Hadoop environment

Mesos: for use with your other Mesos-ready applications
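To make the options above a little more concrete, each cluster manager corresponds to a different `--master` value passed to `spark-submit`. The sketch below maps the choices to the URL formats Spark documents; the host name is a placeholder, the ports are the commonly used defaults, and Mesos support has been deprecated in recent Spark releases, so check the version you run. The EC2 script is left out because it provisions machines rather than naming a cluster manager.

```python
def master_url(manager: str, host: str = "master.example.com") -> str:
    """Return the --master value for a given cluster manager.

    The host is a placeholder; ports are the usual defaults and may differ
    in a real deployment.
    """
    urls = {
        "local": "local[*]",                   # single machine, all available cores
        "standalone": f"spark://{host}:7077",  # Spark's built-in cluster manager
        "yarn": "yarn",                        # cluster details come from the Hadoop config
        "mesos": f"mesos://{host}:5050",       # Mesos master
    }
    if manager not in urls:
        raise ValueError(f"unknown cluster manager: {manager}")
    return urls[manager]


if __name__ == "__main__":
    for name in ("local", "standalone", "yarn", "mesos"):
        print(f"{name:<10} --master {master_url(name)}")
```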

Onur Tokat - PeerSpot reviewer
Big Data Engineer Consultant at Collective[i]

The initial setup is not complex, but it depends on the components in the architecture. For example, if you use Hadoop, setup may not be easy. Deployment takes about a week, but the Spark cluster can be installed in the virtual architecture in a day.

RV
Director at Nihil Solutions

The initial setup isn't too complex. It's quite straightforward.

We use CI/CD DevOps for deployment. We only use Spark for processing and for the Databricks cluster to spin up and do the job. It's continuously running in the background.

There isn't really any maintenance required per se. We just click the button and it comes up automatically, with the whole cluster and the Spark and everything ready to go.

SA
Technical Consultant at a tech services company with 1-10 employees

The initial setup to get it to Hello World is pretty easy; you just have to install it. But when you want to extract data from your HDFS and other sources, it is kind of tricky because you have to connect with those sources. You can get a lot of help from different sources on the internet, though, so it's great. A lot of people are doing it.

I work with a startup company. You know that in startups you do not have the luxury of different people doing different things, you have to do everything on your own, and it's an opportunity to learn everything. In a typical corporate or big organization you only have restricted SOPs, you have to work within the boundaries. In my organization, I have to set up all the things, configure it, and work on it myself.

NK
Director of Engineering at Sigmoid

The initial setup was a little complex when I was using open-source Spark. I was doing a POC in the on-premise environment, and the initial setup was a little cumbersome. It required a lot of set up on Unix systems. We also had to do a lot of configurations and install a lot of things. 

After I moved to the Cloudera CDH version, it was a little easier. It is a bundled product, so you just install whatever you want and use it.

it_user946074 - PeerSpot reviewer
Principal Architect at a financial services firm with 1,001-5,000 employees

The initial setup was easy. We keep on getting data from different sources so we will keep on porting in little bits. It's not done in a single sitting, so I can't really say how long it takes.

AR
Manager - Data Science Competency at a tech services company with 201-500 employees

With respect to the initial setup, it's neither easy nor very difficult. Our team has experience so it is not difficult for them. However, for a person that is new to using it, the setup might be very difficult.

it_user372393 - PeerSpot reviewer
Big Data Consultant at a tech services company with 501-1,000 employees

The initial set-up is quite complex because you have to set-up many different configuration parameters that are deployment-specific. It is not trivial to set-up the correct configuration with so many variables involved.

it_user74256 - PeerSpot reviewer
Engineer at a tech vendor with 10,001+ employees

Standalone deployment is not that straightforward; there are some tricks that are not mentioned in the docs.

it_user373173 - PeerSpot reviewer
Lead Big Data Engineer at a non-profit with 51-200 employees

The initial setup is not complex. The online documents are pretty good.

it_user371334 - PeerSpot reviewer
CEO at a tech consulting company with 51-200 employees

The initial setup was simple.

it_user374040 - PeerSpot reviewer
Systems Engineering Lead, Mid-Atlantic at a tech company with 10,001+ employees

The initial set-up is straightforward as long as you have picked the right Hadoop distribution.

it_user374028 - PeerSpot reviewer
Core Engine Engineer at a computer software company with 51-200 employees

The initial set-up was easy. I have not explored using this on AWS clusters.

it_user371325 - PeerSpot reviewer
Data Scientist at a tech vendor with 10,001+ employees

The initial setup was complex. It was not easy getting the correct version and dependencies set up.

LC
Snr Security Engineer at a tech vendor with 201-500 employees

The initial setup was complex. It is a complex tool, and a lot depends on how you will use it. There is a lot to set up. You need to write a lot of scripts for it; there are nearly 60 to set up. In the cloud, it takes about a day to set up. If you set it up on-premise, on hardware, it takes about a week.

it_user1223676 - PeerSpot reviewer
Lead Consultant at a tech services company with 51-200 employees

The initial setup is straightforward. 
