Apache Spark Primary Use Case

Ilya Afanasyev

Senior Software Development Engineer at Yahoo!

It's a core product in our pipeline.

We have some input data. For example, one system supplies data to MongoDB; we pull this data from MongoDB, enrich it with additional fields from other systems, and write it to S3 for other systems to use. Since we have a lot of data, we need a parallel process that runs hourly.
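
A minimal sketch of this pull-enrich-write pattern, written in plain Python rather than Spark so it runs without a cluster: one hour of records is split into partitions, and the partitions are enriched in parallel, roughly the way Spark spreads work across executors. The record layout, the `USER_SEGMENTS` lookup table and all function names are hypothetical stand-ins for the MongoDB source and the "other systems" the reviewer mentions; the final write to S3 is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

# Hypothetical lookup table standing in for the "other systems" that add fields.
USER_SEGMENTS = {"u1": "premium", "u2": "free"}


def in_hour(record: dict, hour_start: datetime) -> bool:
    """True if the record's timestamp falls in [hour_start, hour_start + 1 hour)."""
    return hour_start <= record["ts"] < hour_start + timedelta(hours=1)


def enrich(record: dict) -> dict:
    """Return a copy of the record with an extra field looked up from another source."""
    enriched = dict(record)
    enriched["segment"] = USER_SEGMENTS.get(record["user_id"], "unknown")
    return enriched


def enrich_partition(partition: list) -> list:
    """Enrich every record in one partition; each partition is an independent task."""
    return [enrich(r) for r in partition]


def run_hourly_batch(records: list, hour_start: datetime, num_partitions: int = 4) -> list:
    """Select one hour of records, enrich them in parallel, and return the results."""
    batch = [r for r in records if in_hour(r, hour_start)]
    # Round-robin split into partitions, so output order can differ from input order.
    partitions = [batch[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(enrich_partition, partitions))
    # Flatten per-partition results; a real job would write them out (e.g. to S3) instead.
    return [r for part in results for r in part]
```

In an actual Spark job the same shape would typically be a read from the source, a join against the enrichment data, and a write to the target, with Spark handling the partitioning and parallelism that this sketch does by hand.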

SurjitChoudhury

Data engineer at Cocos pt

Our main use cases for Spark are Spark SQL and, sometimes, Spark Streaming for processing streaming data.

Like most companies, we get data from SAP or Azure Data Warehouse, for example when a client uses Azure cloud technology. The data arrives as relational data or sometimes as semi-structured data, such as JSON files.

We process the data with Spark, writing the code in PySpark (Spark's Python API) to create DataFrames and load the data into a tabular format.

We then load it into a database such as SQL Server, where business data scientists and analysts pick it up. The sources vary widely and include e-commerce sites.

Previously, we mostly used structured data, stored in SAP, mainframes, Oracle, or other systems, or provided in structured formats like CSV.

Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark.

Under the hood, Spark is a compute engine with in-memory processing, and features like broadcasting and caching improve its performance further. It has become a crucial tool; I'd estimate that around 90% of companies have adopted it, some for a decade or more.
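
Broadcasting refers to shipping a small dataset to every executor. As a rough plain-Python illustration (not Spark's API), the sketch below contrasts a shuffle join, which hash-partitions both sides by key so matching rows meet in the same partition, with a broadcast join, which copies the small table whole to each partition and leaves the large side where it is. In Spark itself, the planner picks a broadcast join automatically when a table is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default), and `pyspark.sql.functions.broadcast()` hints one explicitly. The data and function names here are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Hash table from join-key value to the rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def shuffle_join(left, right, key, num_partitions=4):
    """Inner join: hash-partition BOTH sides by key, then join partition by partition."""
    left_parts = [[] for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]
    for row in left:  # every row of the large side may move to another partition
        left_parts[hash(row[key]) % num_partitions].append(row)
    for row in right:
        right_parts[hash(row[key]) % num_partitions].append(row)
    joined = []
    for lp, rp in zip(left_parts, right_parts):
        index = build_index(rp, key)
        for l_row in lp:
            for r_row in index.get(l_row[key], []):
                joined.append({**l_row, **r_row})
    return joined


def broadcast_join(large_partitions, small, key):
    """Inner join: copy the small table to every partition; the large side stays put."""
    index = build_index(small, key)  # built once, then "broadcast" to each partition
    joined = []
    for part in large_partitions:  # in Spark, each partition would be its own task
        for l_row in part:
            for r_row in index.get(l_row[key], []):
                joined.append({**l_row, **r_row})
    return joined
```

Both functions return the same rows; the difference is how much data moves. The broadcast version never repartitions the large side, which is why it is preferred when one table is small.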

Before Spark, there was MapReduce, but it was much slower: even running the same query a second time was time-consuming because each run read from and wrote to disk. Spark was introduced to address these issues, offering processing speeds up to a hundred times faster than MapReduce for in-memory workloads, and its development saw contributions from Adobe Systems, among others.

In response to the evolving needs of the industry, Spark has proven to be the solution, efficiently handling the processing requirements we face today.

Sachin Shukre

Sr Manager at a transportation company with 10,001+ employees

We use it for real-time and near-real-time data processing. We use it for ETL purposes as well as for implementing the full transformation pipelines.

Free Report: Apache Spark Reviews and More

Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.

768,578 professionals have used our research since 2012.

Suriya Senthilkumar

Analyst at Deloitte

We use the product in our environment for data processing and performing Data Definition Language (DDL) operations.

Miodrag Milojevic

Senior Data Architect at Yettel

I use the solution for data lakes and big data solutions. I can combine it with other programming languages.

Hamid M. Hamid

Data architect at Banking Sector

In my company, the solution is used for batch processing and real-time processing.

Lucas Dreyer

Data Engineer at BBD

We use it for data engineering and analytics to process and examine extensive datasets.

Atal Upadhyay

AVP at MIDDAY INFOMEDIA LIMITED

We pull data from various sources and process it for reporting purposes, using a prominent visual analytics tool.

Anshuman Kishore

Director Product Development at Mycom Osi

In my company, I use the solution for a use case that involves topology engines and large topology chains.

Vineeth Marar

Cloud solution architect

My contribution primarily focused on the networking aspect, ensuring secure and reliable connections between Azure services and on-premises servers. The solution was complex, involving private links, virtual machines, and custom firewall rules to facilitate secure data transmission.

I use Apache Spark, especially for data processing and analytics. My work involves a broad range of technologies, including PostgreSQL, Apache Kafka, Spark, and various Azure services. Previously, my focus was more on networking, cybersecurity, and Azure's data services like SQL and Active Directory.

AmitMataghare

Associate Director at a consultancy with 10,001+ employees

Apache Spark is a data processing framework that we program in languages such as Java or Python. In my most recent deployment, we used Apache Spark to build data engineering pipelines that move data from sources into the data lake.

Atif Tariq

Cloud and Big Data Engineer | Developer at Huawei Cloud Middle East

Apache Spark is used for data computation, building data pipelines, and building analytics on top of batch data. It handles big data efficiently.

Lokesh Jayanna

Vice President at Goldman Sachs at a computer software company with 10,001+ employees

We use the product for extensive data analysis. It helps us analyze huge amounts of data and pass it on to data scientists in our organization.

UjjwalGupta

Module Lead at Mphasis

We're using Apache Spark primarily to build ETL pipelines. This involves transforming data and loading it into our data warehouse. Additionally, we're working with Delta Lake file formats to manage the contents.

Oscar Estorach

Chief Data-strategist and Director at Theworkshop.es

You can do a lot in terms of data transformation. You can store, transform, and stream data. It's very useful and has many use cases.

Armando Becerril

Partner / Head of Data & Analytics at Kueski

We use Spark for machine learning applications, clustering, and segmentation of customers.

Kürşat Kurt

Software Architect at Akbank

We just finished a central front project called MFY for our in-house fraud team. In this project, we are using Spark along with Cloudera. In front of Spark, we are using Couchbase.

Spark is mainly used for aggregations and, in the future, AI. It gathers data from Couchbase and performs the calculations. We are not actively using Spark's AI libraries at this time, but we plan to use them.

This project is for classifying the transactions and finding suspicious activities, especially those suspicious activities that come from internet channels such as internet banking and mobile banking. It tries to find out suspicious activities and executes rules that are being developed or written by our business team. An example of a rule is that if the transaction count or transaction amount is greater than 10 million Turkish Liras and the user device is new, then raise an exception. The system sends an SMS to the user, and the user can choose to continue or not continue with the transaction.
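
The example rule above can be sketched as a simple predicate. This is a minimal plain-Python version, assuming each transaction carries an amount in Turkish lira and a device ID, and that the user's recent transaction count has already been aggregated (the kind of aggregation the reviewer says Spark performs). The review quotes one 10 million lira threshold, which cannot apply to a count, so `COUNT_LIMIT` is a hypothetical placeholder, as are the other names.

```python
from dataclasses import dataclass
from typing import Set

AMOUNT_LIMIT_TRY = 10_000_000  # 10 million Turkish lira, from the example rule
COUNT_LIMIT = 50               # hypothetical: the review gives no count threshold


@dataclass
class Transaction:
    user_id: str
    amount_try: float  # transaction amount in Turkish lira
    device_id: str


def is_suspicious(txn: Transaction, recent_count: int, known_devices: Set[str]) -> bool:
    """Flag when (count OR amount exceeds its limit) AND the device is new to the user."""
    exceeds_limit = recent_count > COUNT_LIMIT or txn.amount_try > AMOUNT_LIMIT_TRY
    new_device = txn.device_id not in known_devices
    return exceeds_limit and new_device


def decide(txn: Transaction, recent_count: int, known_devices: Set[str]) -> str:
    """Hypothetical outcome: a flagged transaction triggers the SMS confirmation step."""
    if is_suspicious(txn, recent_count, known_devices):
        return "sms_confirmation"
    return "allow"
```

For example, `decide(Transaction("u1", 12_000_000, "new-phone"), 1, {"old-phone"})` returns `"sms_confirmation"`, because the amount exceeds the limit and the device has not been seen before. Keeping each rule as a small, pure predicate makes it easy for a business team to add or change rules without touching the aggregation code.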

Suresh_Srinivasan

Co-Founder at FORMCEPT Technologies

Our primary use case is interactively processing large volumes of data.

SlavenBatnozic

CTO at Hammerknife

We use the product for real-time data analysis.

Marco Amhof

PLC Programmer at Alzero

We are a software solutions company that serves a variety of industries, including banking, insurance, and industrial sectors. The product is specifically employed for managing data platforms for our customers.

Jagannadha Rao

Lead Data Scientist at International School of Engineering

We use Apache Spark for storage and processing.

Farzam Khodaei

Data Engineer at Berief Food GmbH

Our customers configure their software applications, and I use Apache Spark to check them. We use it for data processing.

reviewer2208003

Quantitative Developer at a marketing services firm with 11-50 employees

Predominantly, I use Spark for data analysis on top of datasets containing tens of millions of records.

Mahdi Sharifmousavi

Lecturer at Amirkabir University of Technology

We use this solution for its anti-money laundering and direct marketing features within a banking environment.

Suresh_Srinivasan

Co-Founder at FORMCEPT Technologies

The solution can be deployed on the cloud or on-premises.

Suresh_Srinivasan

Co-Founder at FORMCEPT Technologies

We have built a product called "NetBot." We take any form of data, such as large volumes of email, images, videos, or transactional data, transform unstructured textual data and videos into a structured form that can be read alongside transactional data, and create an enterprise-wide smart data grid. That smart data grid is used by downstream analytics tools. We also provide machine-building for people to get faster insight into their data.

reviewer1283880

CEO International Business at a tech services company with 1,001-5,000 employees

In AI deployment, a key step is aggregating data from various sources, such as customer websites, debt records, and asset information. Apache Spark plays a vital role in this process, efficiently handling continuous streams of data. Its capability enables seamless gathering and feeding of diverse data into the system, facilitating effective processing and analysis for generating alerts and insights, particularly in scenarios like banking.

Sumanth Punyamurthula

Director - Data Management, Governance and Quality at Hilton Worldwide

Ingesting billions of rows of data all day.

Salvatore Campana

CEO & Founder at XAUTOMATA TECHNOLOGY GmbH

I use Spark to run automation processes driven by data.

Onur Tokat

Big Data Engineer Consultant at Collective[i]

I mainly use Spark to prepare data for processing because it has APIs for data evaluation.

Rajendran Veerappan

Director at Nihil Solutions

When we receive data from the messaging queue, we process everything using Apache Spark. Databricks does the processing and writes everything back to the data lake as files. A machine learning program then analyzes the data using an ML prediction algorithm.

reviewer879201

Technical Consultant at a tech services company with 1-10 employees

We are working with a client that has a wide variety of data residing in other structured databases as well. The idea is to first build a database in Hadoop, which we are in the process of doing right now, so there is one place for all kinds of data. Then we are going to use Spark.

NitinKumar

Director of Engineering at Sigmoid

I use it mostly for ETL transformations and data processing. I have used Spark on-premises as well as on the cloud.

reviewer1792824

Senior Test Automation Consultant / Architect at a tech services company with 11-50 employees

We are using it for big data. We are using a small part of it, which is related to using data.

reviewer1535340

Senior Solutions Architect at a retailer with 10,001+ employees

We use Apache Spark to prepare data for transformation and for encryption of specific columns, using AES-256 encryption. We're building a proof of concept at the moment. We prepare batches with Spark on Kubernetes, both on-premises and on Google Cloud Platform.

it_user946074

Principal Architect at a financial services firm with 1,001-5,000 employees

We use the solution for analytics.

reviewer1185906

Manager - Data Science Competency at a tech services company with 201-500 employees

My main task is working on predictive analytics, and Apache Spark is one of the tools that I utilize in this role. Primarily, we work with the predictive analysis of very large amounts of data.

Apache Spark is also helpful for data pre-processing, including data cleaning.

This solution is cloud-agnostic. You can use it with an EC2 instance and you can even install it on-premises. Some environments have it installed in VMs.

reviewer1046250

Senior Consultant & Training at a tech services company with 51-200 employees

We use this solution for information gathering and processing.

I use it myself when I am developing on my laptop.

I am currently using an on-premises deployment model. However, in a few weeks, I will be using the EMR version on the cloud.

reviewer1904019

Chief Technology Officer at a tech services company with 11-50 employees

I am using Apache Spark for the data transition from databases. We have customers who have one database as a data lake.

it_user1059558

Portfolio Manager, Enterprise Solutions Architect at Capgemini

Streaming telematics data.

KamleshKhollam

Managing Consultant at a computer software company with 501-1,000 employees

Our use case for Apache Spark was a retail price prediction project. We were using retail pricing data to build predictive models. To start, the prices were analyzed and we created the dataset to be visualized using Tableau. We then used a visualization tool to create dashboards and graphical reports to showcase the predictive modeling data.

Apache Spark was used to host this entire project.

Mohamed Ghorbel

Director of BigData Offer at IVIDATA

We primarily use the solution to integrate very large datasets from other environments, such as our SQL environment, and extract meaningful data before validating it. We also use the solution for streaming data from very large servers.

reviewer894894

Works at a computer software company with 51-200 employees

Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.

Snrsecengin567

Snr Security Engineer at a tech vendor with 201-500 employees

We primarily use the solution for security analytics.
