Apache Spark Valuable Features

Ilya Afanasyev - PeerSpot reviewer
Senior Software Development Engineer at Yahoo!

We use batch processing. It works well with our formats and file versions. There's a lot of functionality. 

Each hour, our pipeline copies changes from MongoDB to a specific file. Previously, the pipeline copied all of the data from every table on each run, even when nothing had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000.
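
The "read only the changed data" idea can be sketched in plain Python. This is not the reviewer's pipeline and not a MongoDB or Spark API: it swaps in a simple timestamp watermark for illustration, and the `updated_at` field, the sample rows, and both function names are hypothetical.

```python
from datetime import datetime

def changed_since(records, watermark):
    """Keep only the records modified after the previous run's watermark."""
    return [r for r in records if r["updated_at"] > watermark]

def run_hourly_copy(records, watermark):
    """Copy one incremental batch and return it with the advanced watermark."""
    batch = changed_since(records, watermark)
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in batch), default=watermark)
    return batch, new_watermark

# Hypothetical table contents: only one row changed after the last run.
table = [
    {"id": 1, "updated_at": datetime(2024, 3, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 3, 1, 10, 30)},
]
last_run = datetime(2024, 3, 1, 10, 0)
batch, last_run = run_hourly_copy(table, last_run)
print([r["id"] for r in batch])  # prints [2]
```

Each run then touches only the rows changed since the previous run, which is where the savings described above would come from.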

The solution is scalable.

It's a stable product.

SurjitChoudhury - PeerSpot reviewer
Data engineer at Cocos pt

Spark supports real-time data processing through Spark Streaming, and it also allows for batch processing of data. If you have immediate data, like chat information, that needs to be processed in real time, Spark Streaming is used.

For data that can be evaluated later, batch processing with Apache Spark is suitable. Mostly, batch processing is utilized in our organization, but for streaming data processing, tools like Kafka are often integrated.

In-memory processing in Spark greatly enhances performance, making it up to a hundred times faster than the older MapReduce approach. This improvement is achieved through optimization techniques like caching, broadcasting, and partitioning, which help optimize queries for faster processing.

SS
Sr Manager at a transportation company with 10,001+ employees

No other platform can challenge its features, apart from the restrictions that come with its in-memory implementation.

Buyer's Guide
Apache Spark
March 2024
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: March 2024.
765,386 professionals have used our research since 2012.
Suriya Senthilkumar - PeerSpot reviewer
Analyst at Deloitte

The product’s most valuable features are lazy evaluation and workload distribution.
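
Lazy evaluation can be illustrated with a small plain-Python sketch. It is an analogy only and does not use Spark's API: the `LazyDataset` class and its methods are hypothetical names. Transformations merely record a plan, and nothing runs until an action (`collect` here) asks for results.

```python
class LazyDataset:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new plan and does no work yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only extends the recorded plan.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: only now is the recorded plan executed.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

calls = []

def double(x):
    calls.append(x)  # track when the function actually runs
    return x * 2

plan = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
ran_before_action = list(calls)  # still empty: building the plan ran nothing
result = plan.collect()
print(ran_before_action, result)  # prints [] [6, 8]
```

Deferring the work this way is what lets an engine look at the whole plan before deciding how to run and distribute it.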

Miodrag Milojevic - PeerSpot reviewer
Senior Data Architect at Yettel

One of the reasons we use Spark is so we can use parallelism in data lakes. In our case, we can get many data nodes, and the main power of Hadoop and big data solutions is the number of nodes usable for different operations. It's easy to set up parallelism in Spark, run the solution with specific parameters, and get good performance. Spark also has an option for near real-time loading and processing. We use Spark's micro-batches.

Lucas Dreyer - PeerSpot reviewer
Data Engineer at BBD

It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained.

VM
Cloud solution architect at 0

What I liked about the solution was its uniqueness. We provided the customer with a solution that hadn't been offered by anyone else before. 

It involved multiple components, such as a Spark cluster, CMAX, a backend VM, and a Linux VM for mapping the service processes to the backend, which ran on-premises, where the Kafka service was running.

It was challenging for people to understand how to send traffic through the private link between all these services. Ensuring the traffic was sent to the correct destination with the correct source header without any operation issues was complex, but we achieved it.

We had multiple instances of fault tolerance and scalability.  

AmitMataghare - PeerSpot reviewer
Associate Director at a consultancy with 10,001+ employees

One of Apache Spark's most valuable features is that it supports in-memory processing, so jobs execute very fast compared to traditional tools.

Atif Tariq - PeerSpot reviewer
Cloud and Big Data Engineer | Developer at Huawei Cloud Middle East

The most valuable feature of Apache Spark is its in-memory processing: it processes data in RAM rather than on disk, which is much faster and more efficient.

Lokesh Jayanna - PeerSpot reviewer
Vice President at Goldman Sachs at a computer software company with 10,001+ employees

The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it. It is a useful feature for us.

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis

The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to easily handle large volumes of data with more power.

ML
Information Technology Business Analyst at an aerospace/defense firm with 10,001+ employees

We use it as an ETL tool to gather information from different systems. The product is useful for analytics.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

We use Spark to process data from different data sources. 

Oscar Estorach - PeerSpot reviewer
Chief Data-strategist and Director at Theworkshop.es

Overall, it's a very nice tool.

It is great for transforming data and for micro-streaming or micro-batching.

The product offers an open-source version.

The solution has been very stable.

The scalability is good.

Apache Spark is a huge tool. It has many use cases and is very flexible. You can use it with so many other platforms. 

Spark, as a tool, is easy to work with as you can work with Python, Scala, and Java.

Armando Becerril - PeerSpot reviewer
Partner / Head of Data & Analytics at Kueski

Apache provides a lot of good documentation compared to other solutions. 

KK
Software Architect at Akbank

AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Spark offers a very wide range of connectors. With a Spark cluster, you can get fast results, especially for AI.

SB
CTO at Hammerknife Technologies d.o.o.

Apache Spark provides a very high-quality implementation of distributed data processing. I rate it 20 on a scale of one to ten.

MA
PLC Programmer at Alzero

The solution, as a package, excels across the board. I appreciate everything, not just one or two specific features.


Jagannadha Rao - PeerSpot reviewer
Lead Data Scientist at International School of Engineering

The most valuable feature of Apache Spark is its flexibility.

FK
Data Engineer at Berief Food GmbH

The data processing framework is good. The product is very useful.

JK
Quantitative Developer at a marketing services firm with 11-50 employees

The distribution of tasks, like the seamless map-reduce functionality, is quite impressive. For the user, it appears as simple single-line data manipulations, but behind the scenes, the executor pool intelligently distributes the map and reduce functions.
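
The pattern described here can be sketched in plain Python. This is an analogy rather than Spark code: a standard-library thread pool stands in for the executor pool, and the function names and sample partitions are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def count_words(partition):
    """Map step: count the words in one partition of lines."""
    return Counter(word for line in partition for word in line.split())

def merge_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def word_count(partitions, workers=2):
    # The pool stands in for Spark's executors (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    return reduce(merge_counts, partial_counts, Counter())

# Hypothetical input, already split into two partitions.
partitions = [["spark is fast", "spark scales"], ["hadoop is older"]]
totals = word_count(partitions)
print(totals["spark"], totals["is"])  # prints 2 2
```

From the caller's side this is a single `word_count(partitions)` call; splitting the map work across workers and merging the partial results happens behind it.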

Mahdi Sharifmousavi - PeerSpot reviewer
Lecturer at Amirkabir University of Technology

This solution provides a clear and convenient syntax for our analytical tasks.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

Apache Spark can do large volume interactive data analysis.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

We use all the features. We use it for end-to-end. All of our data analysis and execution happens through Spark.

The features we find most valuable are the: 

  • Machine learning
  • Data learning
  • Spark Analytics.
NB
CEO International Business at a tech services company with 1,001-5,000 employees

The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations.

SP
Director - Data Management, Governance and Quality at Hilton Worldwide

Powerful language.

it_user365304 - PeerSpot reviewer
Software Consultant at a tech services company with 10,001+ employees

The most important feature of Apache Spark is that it provides large-scale data processing with negligible latency on commodity hardware. The Spark framework is a blessing over Hadoop, as the latter does not allow fast processing of data, which Spark accomplishes through in-memory data processing.

Salvatore Campana - PeerSpot reviewer
CEO & Founder at XAUTOMATA TECHNOLOGY GmbH

The most valuable feature is the grid computing.

it_user371832 - PeerSpot reviewer
Chief System Architect at a marketing services firm with 501-1,000 employees

With Spark SQL, we now have the capability to analyse very large quantities of data located in Amazon S3 at a very low cost compared to the other solutions we checked.

We also use our own Spark cluster to aggregate data in near real time and save the results to a MySQL database.

We've started new projects using the machine learning library ML.

Onur Tokat - PeerSpot reviewer
Big Data Engineer Consultant at Collective[i]

The most valuable feature is that Spark uses Scala, which has good data evaluation functions. Spark also supports good distribution on the clusters and provides optimization on the APIs.

RV
Director at Nihil Solutions

The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly.

SA
Technical Consultant at a tech services company with 1-10 employees

I have worked with Hadoop a lot in my career, and you need to do a lot of things to get it to Hello World. In Spark, it is easy. You could say it's an umbrella that puts everything on one shelf. It also has Spark Streaming. I feel streaming is its best feature because I have been able to ingest data and analyze it within Spark Streaming.

it_user786777 - PeerSpot reviewer
Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees

Distributed in-memory processing. Some of the algorithms are resource-heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.

NK
Director of Engineering at Sigmoid

Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica.

Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark.

PE
Senior Test Automation Consultant / Architect at a tech services company with 11-50 employees

It is useful for handling large amounts of data. It is very useful for scientific purposes.

GA
Senior Solutions Architect at a retailer with 10,001+ employees

I like that it can handle multiple tasks in parallel. I also like the automation feature. JavaScript also helps with the parallel streaming of the library.

it_user946074 - PeerSpot reviewer
Principal Architect at a financial services firm with 1,001-5,000 employees

The fast performance is the most valuable aspect of the solution.

AR
Manager - Data Science Competency at a tech services company with 201-500 employees

One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them.

Another feature is memory-based computing. This is unlike Hadoop, which relies on storage. As it uses in-memory data processing, Spark is very fast.

it_user372393 - PeerSpot reviewer
Big Data Consultant at a tech services company with 501-1,000 employees

The good performance. The nice graphical management console. The long list of ML algorithms.

it_user74256 - PeerSpot reviewer
Engineer at a tech vendor with 10,001+ employees

Streaming data processing

it_user746943 - PeerSpot reviewer
Big Data and Cloud Solution Consultant at a financial services firm with 10,001+ employees

DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort.

it_user373173 - PeerSpot reviewer
Lead Big Data Engineer at a non-profit with 51-200 employees

Spark is relatively easy to deploy, with rich features in handling big data. Spark Core, Spark SQL, Spark MLlib are used mostly in our applications.

it_user371334 - PeerSpot reviewer
CEO at a tech consulting company with 51-200 employees

There are several valuable features.

  • Interactive data access (low latency)
  • Batch ETL-style processing
  • Schema-free data models
  • Algorithms
AD
Senior Consultant & Training at a tech services company with 51-200 employees

The most valuable feature of this solution is its capacity for processing large amounts of data.

This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.

it_user326142 - PeerSpot reviewer
Architect at a healthcare company with 51-200 employees

ETL and streaming capabilities.

it_user374040 - PeerSpot reviewer
Systems Engineering Lead, Mid-Atlantic at a tech company with 10,001+ employees

Spark Streaming, which allows you to construct event-driven information systems and respond to the events in near-real time.

SK
Chief Technology Officer at a tech services company with 11-50 employees

The most valuable feature of Apache Spark is its ease of use.

it_user1059558 - PeerSpot reviewer
Portfolio Manager, Enterprise Solutions Architect at Capgemini

It supports streaming and micro-batch.

it_user374028 - PeerSpot reviewer
Core Engine Engineer at a computer software company with 51-200 employees
  • RDDs
  • DataFrames
  • Machine learning libraries
KK
Managing Consultant at a computer software company with 501-1,000 employees

The most valuable features are the storage engine, the memory engine, and the processing engine.

MG
Director of BigData Offer at IVIDATA

It is a very fast solution. It's very easy to use. There are many APIs for many languages, like Scala, Java, R, and Python. The greatest advantage of Spark is that we can run many kinds of analytics, including SQL analytics, graph analytics, etc.

reviewer894894 - PeerSpot reviewer
Works at a computer software company with 51-200 employees

Machine learning, real time streaming, and data processing are fantastic, as well as the resilient or fault tolerant feature.

it_user746673 - PeerSpot reviewer
Sr. Software Engineer at a tech vendor with 1-10 employees

The most valuable features are fault tolerance and easy binding with other processes like machine learning and graph analytics. The community is growing, and executing ML in a distributed fashion works quite well.

it_user371325 - PeerSpot reviewer
Data Scientist at a tech vendor with 10,001+ employees

It allows the loading and investigation of very large data sets, and it has MLlib for machine learning, Spark Streaming, and both the new and old DataFrame APIs.

it_user365301 - PeerSpot reviewer
Software Developer (Product Engineering) at a computer software company with 501-1,000 employees

Spark Streaming, Spark SQL, and MLlib, in that order.

LC
Snr Security Engineer at a tech vendor with 201-500 employees

The scalability has been the most valuable aspect of the solution.

it_user1223676 - PeerSpot reviewer
Lead Consultant at a tech services company with 51-200 employees

The main feature that we find valuable is that it is very fast. In terms of big data, the main point is that the data is spread across many different nodes, so whenever we use the data, Spark enables us to parse it from those different data nodes.
