Apache Spark Benefits

Atal Upadhyay

AVP at MIDDAY INFOMEDIA LIMITED

Our experience with using Spark for machine learning and big data analytics allows us to consume data from any data source, including freely available data. The processing power of Spark is remarkable, making it our top choice for file-processing tasks.

Utilizing Apache Spark's in-memory processing capabilities significantly enhances our computational efficiency. Unlike with Oracle, where customization is limited, we can tailor Spark to our needs. This allows us to pull data, perform tests, and save processing power. We maintain a historical record by loading intermediate results and retrieving data from previous iterations, ensuring our applications operate seamlessly. With Spark, we parallelize our operations, efficiently accessing both historical and real-time data.

We utilize Apache Spark for our data analysis tasks. Our data processing pipeline starts with receiving data in raw format. We employ a data factory to create pipelines for data processing. This ensures that the data is prepared and made ready for various purposes, such as supporting applications or analysis.

There are instances where we perform data cleansing operations and manage the database, including indexing. We've implemented automated tasks to analyze data and optimize performance, focusing specifically on database operations. These efforts are independent of the Spark platform but contribute to enhancing overall performance.

Vineeth Marar

Cloud solution architect at 0

We've set up a Spark cluster running in Azure to process real-time data. This setup involves connecting Azure applications to the Spark cluster via Azure Private Link, ensuring secure data flow.

The architecture required detailed network design, including routing through Linux firewalls and ensuring data could be securely transmitted to and from on-premises servers.

While I was heavily involved in the network design aspect, the Spark cluster was primarily used for processing and analyzing data streams for various applications.

Moreover, from my experience, I haven't encountered significant challenges with integrations involving Spark. The crucial factor is having established connectivity.

Whether Spark is operating in Azure or on-premises doesn't significantly affect our operations, thanks to high-bandwidth solutions like ExpressRoute. The main consideration then becomes the cost. As long as we maintain performance standards, I don't see any issues, regardless of the deployment environment.

Ensuring the collection of relevant metrics and logs is critical for assessing performance improvements. The specifics of how these are collected or which tools are used might vary, but the goal is to gather comprehensive data for ongoing monitoring and improvement.

Kürşat Kurt

Software Architect at Akbank

Aggregations have been very fast in our project since we started using Spark. We can return results in around 300 milliseconds. Before using Spark, the time was around 700 milliseconds.

Before using Spark, we only used Couchbase. We needed fast results for this project because transactions come from various channels, and we need to decide on and resolve them as early as possible while users are performing the transaction. If our process takes longer, users might stop or cancel their transactions, which means losing money. Therefore, a fast response time is very important for us.
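To put the latency figures above in perspective, the arithmetic can be sketched as follows. The 700 ms and 300 ms values are the ones quoted in the review; the function name is purely illustrative.

```python
def latency_improvement(before_ms, after_ms):
    """Return (speedup factor, percentage reduction) for two latencies."""
    speedup = before_ms / after_ms
    reduction_pct = (before_ms - after_ms) / before_ms * 100
    return speedup, reduction_pct


# Figures quoted in the review: about 700 ms before Spark, about 300 ms after.
speedup, reduction_pct = latency_improvement(700, 300)
print(f"{speedup:.2f}x faster, {reduction_pct:.0f}% less time per aggregation")
```

On those numbers, each aggregation is a little over twice as fast, which matters when a user is waiting for the transaction to complete.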

Free Report: Apache Spark Reviews and More

Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.

768,578 professionals have used our research since 2012.

reviewer2208003

Quantitative Developer at a marketing services firm with 11-50 employees

I have an example. We had a single-threaded application that used to run for about four to five hours, but with Spark, it got reduced to under one hour.

Suresh_Srinivasan

Co-Founder at FORMCEPT Technologies

We are using Apache Spark for large-volume interactive data analysis.

MechBot is an enterprise, one-click installation, trusted data excellence platform. Underneath, I am using Apache Spark, Kafka, Hadoop HDFS, and Elasticsearch.

Sumanth Punyamurthula

Director - Data Management, Governance and Quality at Hilton Worldwide

Spark on AWS is not that cost-effective as memory is expensive and you cannot customize hardware in AWS. If you want more memory, you have to pay for more CPUs too in AWS.

it_user365304

Software Consultant at a tech services company with 10,001+ employees

Apache Spark is a framework that allows an organization to perform business and data analytics at a very low cost compared to Ab Initio or Informatica. By using Apache Spark in place of those tools, an organization can achieve a huge reduction in cost without compromising on data security or other data-related issues, provided it is controlled by an expert Scala programmer. Apache Spark also does not carry Hadoop's overhead of high latency. My organization benefits from all of these points as well.

Salvatore Campana

CEO & Founder at XAUTOMATA TECHNOLOGY GmbH

Apache Spark helped us with horizontal scalability and cost optimizations.

it_user371832

Chief System Architect at a marketing services firm with 501-1,000 employees

Until Spark, we didn't have the ability to analyse this quantity of data; we're talking about two TB/hour. We're now able to produce a lot of reports, and are also able to develop machine learning based analysis to optimize our business.

We have central access to every piece of data in the company, including finance, business, debug, etc., and the ability to join all this data together.

it_user786777

Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees

Organisations can now harness richer data sets and benefit from use cases, which add value to their business functions.

NitinKumar

Director of Engineering at Sigmoid

Spark has been at the forefront of data processing engines. I have used Apache Spark for multiple projects for different clients. It is an excellent tool for processing massive amounts of data.

it_user946074

Principal Architect at a financial services firm with 1,001-5,000 employees

I'm not sure how it has improved my organization but I believe that it's a good product.

it_user372393

Big Data Consultant at a tech services company with 501-1,000 employees

We are able to solve problems, e.g., reporting on big data, that we were not able to tackle in the past.

it_user74256

Engineer at a tech vendor with 10,001+ employees

In the previous version, we used Storm to handle real-time data; however, its performance didn't meet our requirements. Spark Streaming's micro-batch mode helps improve performance. Also, Spark provides lots of high-level APIs, which reduces duplication of work.
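The micro-batch idea mentioned above can be illustrated with a small, plain-Python sketch. This is not the Spark Streaming API: the `micro_batches` function, the event tuples, and the one-second interval are all assumptions made for illustration. It only shows the principle of grouping incoming events into short time windows and processing each window as one small batch instead of one record at a time.

```python
from collections import Counter


def micro_batches(events, interval):
    """Group (timestamp, key) events, assumed sorted by timestamp,
    into consecutive windows of `interval` seconds."""
    batch = []
    window_end = None
    for ts, key in events:
        if window_end is None:
            window_end = ts + interval
        # Emit the current batch (and any empty windows) once this
        # event falls past the end of the current window.
        while ts >= window_end:
            yield batch
            batch = []
            window_end += interval
        batch.append(key)
    if batch:
        yield batch


# Hypothetical event stream: (seconds since start, event type).
events = [(0.0, "click"), (0.4, "view"), (0.9, "click"), (1.2, "click"), (2.5, "view")]

# Each window is aggregated as a whole, which amortizes per-record overhead.
counts = [Counter(batch) for batch in micro_batches(events, interval=1.0)]
for i, c in enumerate(counts):
    print(f"batch {i}: {dict(c)}")
```

In a real deployment the batching, scheduling, and fault tolerance are handled by the streaming engine; the sketch only conveys why per-batch processing can be cheaper than per-event processing.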

it_user746943

Big Data and Cloud Solution Consultant at a financial services firm with 10,001+ employees

We developed a tool for data ingestion from HDFS -> Raw -> L1 layer, with data quality checks, loading data into Elasticsearch, and performing change data capture (CDC).
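The review does not describe how the tool implements its quality checks or CDC. As a rough, plain-Python sketch of one common approach, the following compares two snapshots keyed by primary key after filtering out rows that fail a validation rule. The function names, the row layout, and the rule itself are all hypothetical and are not taken from the reviewer's tool.

```python
def passes_quality_check(row):
    """Hypothetical rule: a row needs a non-empty name and a non-negative amount."""
    return bool(row.get("name")) and row.get("amount", -1) >= 0


def detect_changes(previous, current):
    """Compare two snapshots keyed by primary key and classify the differences."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return {"insert": inserts, "update": updates, "delete": deletes}


# Hypothetical snapshots: the earlier load and the newly ingested raw data.
previous = {1: {"name": "a", "amount": 10}, 2: {"name": "b", "amount": 5}}
current_raw = {
    1: {"name": "a", "amount": 12},   # changed amount
    3: {"name": "c", "amount": 7},    # new row
    4: {"name": "", "amount": 1},     # fails the quality check
}

# Apply the quality check first, then capture changes against the prior load.
current = {k: v for k, v in current_raw.items() if passes_quality_check(v)}
changes = detect_changes(previous, current)
print(changes)
```

Snapshot comparison is simple but requires reading both versions in full; log-based CDC avoids that at the cost of depending on the source system's change log.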

it_user373173

Lead Big Data Engineer at a non-profit with 51-200 employees

I use Spark to process large amounts of data in the energy industry.

it_user371334

CEO at a tech consulting company with 51-200 employees

We have seen a 1000x improvement in performance over other techniques. It has enabled interactive self-service access to data.

it_user326142

Architect at a healthcare company with 51-200 employees

It has made big data processing more convenient, and the uniform framework adds to efficiency of usage since the same framework can be used for batch and stream processing.

it_user374040

Systems Engineering Lead, Mid-Atlantic at a tech company with 10,001+ employees

Apache Spark’s ability to perform batch processing at intervals of one second or less is the most transformative and less pervasive for any data processing application. The ingested data can also be validated and verified for quality early in the data pipeline.

it_user1059558

Portfolio Manager, Enterprise Solutions Architect at Capgemini

It's a better MapReduce (MR), supports streaming and micro-batch, and supports Spark ML and Spark SQL.

it_user374028

Core Engine Engineer at a computer software company with 51-200 employees

Faster time to parse and compute data. It makes web-based queries for plotting data easier.

KamleshKhollam

Managing Consultant at a computer software company with 501-1,000 employees

The processing time is very much improved over the data warehouse solution that we were using.

reviewer894894

Works at a computer software company with 51-200 employees

It provides a scalable machine learning library so that we can train and predict user behavior for promotion purposes.

it_user746673

Sr. Software Engineer at a tech vendor with 1-10 employees

Previously we were using Hadoop MapReduce to reduce the Google Ngrams (3TB), which took us approximately five days on our cluster. After using Spark, we were able to accomplish this task within hours.

it_user371325

Data Scientist at a tech vendor with 10,001+ employees

We're able to perform data discovery on large datasets without too much difficulty.

it_user365301

Software Developer (Product Engineering) at a computer software company with 501-1,000 employees

We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.
