Apache Spark Other Advice

Ilya Afanasyev - PeerSpot reviewer
Senior Software Development Engineer at Yahoo!

I can recommend the product. It's a nice system for batch processing of huge data volumes.

I'd rate the solution eight out of ten. 

SurjitChoudhury - PeerSpot reviewer
Data engineer at Cocos pt

Overall, I would rate the solution a nine out of ten. 

I would recommend this tool to someone considering it for scalable data processing.

Nowadays, Apache Spark is on the market, and most organizations are using it. There are people with more experience and knowledge than me, and they're confident about this tool. 

That's why it's become a solution for organizations. It's not a one-man decision but rather a group or community effort.

SS
Sr Manager at a transportation company with 10,001+ employees

If your use case involves real-time applications with frequently changing columns or data frames, then Spark is a fantastic option for you. 

However, if you have a batch process and don't need structured data analysis, I would suggest avoiding it. The high cost of cloud infrastructure combined with Apache Spark can be a significant burden in such scenarios.

Overall, I would rate the solution a nine out of ten. 

Buyer's Guide
Apache Spark
April 2024
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
769,236 professionals have used our research since 2012.
Suriya Senthilkumar - PeerSpot reviewer
Analyst at Deloitte

Apache Spark is a good product for processing large volumes of data compared to other distributed systems. It provides efficient integration with Hadoop and other platforms.

I rate it a ten out of ten.

Miodrag Milojevic - PeerSpot reviewer
Senior Data Architect at Yettel

Spark was written in Scala. Scala is a programming language that is fundamentally built on the Java platform (it runs on the JVM) and is useful for data lakes.

We thought about using Flink instead, but it wasn't useful for us and wouldn't have added any value. Besides, Spark's community is much larger, so information is more readily available and better than what exists for Flink.

I rate Apache Spark an eight out of ten.

If you plan to implement Apache Spark on a large-scale system, you should learn to use parallelism, partitioning, and everything from the physical level to get the best performance from Spark. And it will be good to know Python, especially for data scientists using PySpark for analysis. Likewise, it's good to know Scala because you can be very efficient in preparing some datasets since it is Spark's native language.
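The point about partitioning can be made concrete with a small model in plain Python. This is only a sketch and not Spark code: the function `assign_partitions` and the sample keys are hypothetical, and the model assumes the common hash-partitioning idea that a record goes to partition `hash(key) mod number_of_partitions`. It shows why a skewed key can leave most partitions nearly idle while one partition does almost all the work.

```python
import zlib
from collections import Counter


def assign_partitions(keys, num_partitions):
    """Return a Counter mapping partition index -> number of records.

    Mimics hash partitioning: partition = hash(key) mod num_partitions.
    zlib.crc32 is used instead of hash() so results are stable across runs.
    """
    counts = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        partition = zlib.crc32(key.encode("utf-8")) % num_partitions
        counts[partition] += 1
    return counts


# Evenly spread keys: 1,000 distinct (hypothetical) user ids.
uniform_keys = [f"user-{i}" for i in range(1000)]
# Skewed keys: 90% of the records share a single key.
skewed_keys = ["hot-user"] * 900 + [f"user-{i}" for i in range(100)]

uniform = assign_partitions(uniform_keys, 8)
skewed = assign_partitions(skewed_keys, 8)

# The largest partition bounds how long the slowest parallel task runs.
print("uniform max partition size:", max(uniform.values()))
print("skewed  max partition size:", max(skewed.values()))
```

In the skewed case at least 900 of the 1,000 records land in a single partition, so adding more partitions or executors would not help much; the usual remedy is to change the key or spread the hot key over several partitions.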

Hamid M. Hamid - PeerSpot reviewer
Data architect at Banking Sector

The tool is used for real-time data analytics as it is very powerful and reliable. Code written with Apache Spark is stable. Many bugs can appear depending on the code you write, whether in Java or Scala, so this stability is valuable. Apache Spark is very reliable, powerful, and fast as an engine. Compared with a competitor like MapReduce, Apache Spark can perform up to 100 times faster.

The monitoring part of the product is good.

The product offers clusters that are resilient and can run across multiple nodes.

The tool can run with multiple clusters.

The product integrates well with other platforms, which improves our company's workflow.

In terms of the improvements in the product in the data analysis area, new libraries have been launched to support AI and machine learning.

My company is able to process huge datasets with Apache Spark. There is a huge value added to the organization because of the tool's ability to process huge datasets.

I rate the overall solution a nine out of ten.

Lucas Dreyer - PeerSpot reviewer
Data Engineer at BBD

Additional skill requirements are crucial to use the solution and its related features effectively. Training costs and efforts may be necessary to ensure individuals are proficient in using these technologies. Overall, I would rate it nine out of ten.

Atal Upadhyay - PeerSpot reviewer
AVP at MIDDAY INFOMEDIA LIMITED

Given our extensive experience with it and its ability to meet all our requirements over time, I highly recommend it. Overall, I would rate it nine out of ten.

Anshuman Kishore - PeerSpot reviewer
Director Product Development at Mycom Osi

The tool offers functionality that helps my company deal with data processing in projects on a near real-time basis.

The impact of in-memory processing capabilities on the improvement of computational efficiency is one of the reasons why my company chose Apache Spark.

At the moment, my company plans to explore data analysis with Apache Spark. My company primarily used the product for data processing and not for data analysis.

If you buy the product with the capabilities of Azure DevOps and use the tool's dashboard, you will find the solution to be good. The tool has a built-in UI and other good capabilities.

I feel that the product is fine and easy to use for those who plan to use it in the future. I recommended the tool to others based on the performance and scalability features it offers.

I managed data partitioning and distribution with Apache Spark once in my company.

The main benefit of the product is that data processing got done as quickly as possible thanks to its in-memory processing and performance.

I rate the solution an eight and a half to nine out of ten.

VM
Cloud solution architect at 0

My advice is to thoroughly understand your own needs and environment before making a decision. Recommendations should be based on product features, quality, accuracy, and stability. 

Cost is also a factor, but it should not be the only consideration. Depending on whether the priority is performance and scalability or cost-effectiveness, I would suggest a solution that best meets those needs, whether it's a managed service or a more cost-conscious option.

I would rate Spark as ten out of ten. I haven't had any issues with Spark in my experience.

AmitMataghare - PeerSpot reviewer
Associate Director at a consultancy with 10,001+ employees

I rate Apache Spark an eight out of ten.

Atif Tariq - PeerSpot reviewer
Cloud and Big Data Engineer | Developer at Huawei Cloud Middle East

I would recommend Apache Spark to users doing analytics, data computation, or pipelines.

Overall, I rate Apache Spark ten out of ten.

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis

If you're new to Apache Spark, the best way to learn is by using the Databricks Community Edition. It provides a cluster for Apache Spark where you can learn and test. I rate the product an eight out of ten.

ML
Information Technology Business Analyst at an aerospace/defense firm with 10,001+ employees

I would recommend the product. I think it's a good solution for analytics. Overall, I rate the product an eight out of ten.

Oscar Estorach - PeerSpot reviewer
Chief Data-strategist and Director at Theworkshop.es

I have the solution installed on my computer and on our servers. You can use it on-premises or as a SaaS.

I'd rate the solution at a nine out of ten. I've been very pleased with its capabilities. 

I would recommend the solution for the people who need to deploy projects with streaming. If you have many different sources or different types of data, and you need to put everything in the same place - like a data lake - Spark, at this moment, has the right tools. It's an important solution for data science, for data detectors. You can put all of the information in one place with Spark.

Armando Becerril - PeerSpot reviewer
Partner / Head of Data & Analytics at Kueski

This is a good solution for big data use cases and I rate it eight out of 10. 

KK
Software Architect at Akbank

I would advise planning well before implementing this solution. In enterprise corporations like ours, there are a lot of policies. You should first find out your needs, and after that, you or your team should set it up based on your needs. If your needs change during development because of the business requirements, it will be very difficult. 

If you are clear about your needs, it is easier to set it up. If you know how Spark is used in your project, you have to define firewall rules and cluster needs. When you set up Spark, it should be ready for people's usage, especially for remote job execution. 

I would rate Apache Spark a nine out of ten.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

I rate the overall solution a ten out of ten. 

SB
CTO at Hammerknife

I recommend Apache Spark for batch analytics features.

MA
PLC Programmer at Alzero

I recommend using the solution. Overall, I rate the solution a perfect ten.


Lokesh Jayanna - PeerSpot reviewer
Vice President at Goldman Sachs at a computer software company with 10,001+ employees

I advise others to analyze their data and understand their business requirements before purchasing the product. I rate it an eight out of ten.

Jagannadha Rao - PeerSpot reviewer
Lead Data Scientist at International School of Engineering

I would recommend Apache Spark to other users.

Overall, I rate Apache Spark an eight out of ten.

FK
Data Engineer at Berief Food GmbH

Overall, I rate the product more than eight out of ten.

JK
Quantitative Developer at a marketing services firm with 11-50 employees

I would recommend understanding your use case better first. Go for it only if it fits your use case. But it is a great tool.

Overall, I would rate Apache Spark an eight out of ten. 

Mahdi Sharifmousavi - PeerSpot reviewer
Lecturer at Amirkabir University of Technology

I would rate this solution a nine out of ten.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

We are well versed in Spark, the version, the internal structure of Spark, and we know what exactly Spark is doing. 

The solution cannot be made easier. Not everything can be simplified, because it involves core data, computer science, and pro-engineering, and not many people are actually aware of that.

I rate Apache Spark a six out of ten.

Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies

I would rate it a nine out of ten. 

NB
CEO International Business at a tech services company with 1,001-5,000 employees

I would give it a rating of seven out of ten, which, by my standards, is quite high.

it_user365304 - PeerSpot reviewer
Software Consultant at a tech services company with 10,001+ employees

My advice to others would be to use Apache Spark for large-scale data processing, as it provides good performance at low cost, unlike Ab-Initio or Informatica. The main problem is that there are currently not many people in the market who are certified in Apache Spark.

Salvatore Campana - PeerSpot reviewer
CEO & Founder at XAUTOMATA TECHNOLOGY GmbH

I would rate Apache Spark eight out of ten.

Onur Tokat - PeerSpot reviewer
Big Data Engineer Consultant at Collective[i]

Spark can handle small to huge data and is suitable for any size of company. I would rate Spark as eight out of ten. 

RV
Director at Nihil Solutions

We're customers and also partners with Apache.

While we are on version 2.6, we are considering upgrading to version 3.0.

I'd rate the solution nine out of ten. It works very well for us and suits our purposes almost perfectly.

SA
Technical Consultant at a tech services company with 1-10 employees

On a scale of 1 to 10, I'd put it at an eight.

To make it a perfect 10, I'd like to see an improved configuration bot. Sometimes it is a nightmare on Linux trying to figure out what happened with the configuration and the back end. So I think installation and configuration could be improved, perhaps with some other tools. We are technical people and could figure it out, but if aspects like that were improved, then less technical people would use it and it would be more adaptable to the end user.

NK
Director of Engineering at Sigmoid

I would definitely recommend Spark. It is a great product. I like Spark a lot, and most of the features have been quite good. Its initial learning curve is a bit high, but as you learn it, it becomes very easy.

I would rate Apache Spark an eight out of ten.

PE
Senior Test Automation Consultant / Architect at a tech services company with 11-50 employees

I would advise not using it if you don't have experienced users inside your organization. If you have to figure it all out on your own, then you shouldn't start with it.

Overall, I would rate it a six out of 10. For a commercial use case, it is a six out of 10. For scientific purposes, it is an eight out of 10.

GA
Senior Solutions Architect at a retailer with 10,001+ employees

I would recommend Apache Spark to new users, but it depends on the use case. Sometimes, it's not the best solution.

On a scale from one to ten, I would give Apache Spark a ten.

it_user946074 - PeerSpot reviewer
Principal Architect at a financial services firm with 1,001-5,000 employees

I would recommend the solution. I would rate it an eight or nine out of 10.

For some areas, I would give it ten but I cannot use some parts. If you are going to use it for a consumer then I would be able to recommend it and you should go ahead. It doesn't work for me as I have different clients and different engagements.

AR
Manager - Data Science Competency at a tech services company with 201-500 employees

We are not using the current version of this platform, Spark 3. However, we do know that it is used in the market and it has new features. We will eventually move to it.

My advice for anybody who wants to use Apache Spark is that they have two options. The first is to go with Databricks, the company founded by the creators of Apache Spark, and use their proprietary version. If you choose this option then you will have to pay for the product.

If you instead use open-source Apache Spark, then you can rely on your own expert in-house team for support, maintenance, and deployment. With this option, you don't have to pay anything to anybody outside of your company.

I would rate this solution an eight out of ten.

it_user74256 - PeerSpot reviewer
Engineer at a tech vendor with 10,001+ employees

I love Spark over other solutions.

it_user746943 - PeerSpot reviewer
Big Data and Cloud Solution Consultant at a financial services firm with 10,001+ employees

Spark gives the flexibility for developing custom applications.

it_user373173 - PeerSpot reviewer
Lead Big Data Engineer at a non-profit with 51-200 employees

Get to know how Spark works: what a job, a stage, a task, the DAG, and so on are. It will help you write Spark applications.
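For readers new to these terms, here is a toy model in plain Python. It is a deliberately simplified sketch and does not use Spark's API: the `Op` class, the `split_into_stages` function, the operation names, and the partition count are all hypothetical. It assumes the usual picture that an action triggers a job, the job's chain of transformations is cut into stages wherever a shuffle is needed, and each stage runs one task per partition.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    wide: bool  # True if the operation needs a shuffle (e.g. a group-by-key)


def split_into_stages(ops):
    """Group a linear chain of operations into stages.

    Simplified rule: a wide operation starts a new stage, because its input
    has to be shuffled across partitions before it can run.
    """
    stages = [[]]
    for op in ops:
        if op.wide and stages[-1]:
            stages.append([])
        stages[-1].append(op.name)
    return stages


# Hypothetical word-count style lineage; one action would run it as one job.
lineage = [
    Op("read_text", wide=False),
    Op("flat_map_words", wide=False),
    Op("map_to_pairs", wide=False),
    Op("reduce_by_key", wide=True),
    Op("map_format", wide=False),
]

stages = split_into_stages(lineage)
num_partitions = 4  # assumed partition count, kept the same in every stage
tasks_per_stage = [num_partitions for _ in stages]

for i, stage in enumerate(stages):
    print(f"stage {i}: {stage} -> {tasks_per_stage[i]} tasks")
```

Under these assumptions the job splits into two stages of four tasks each. Real Spark plans are DAGs rather than linear chains and the partition count can change at a shuffle, so this only illustrates the vocabulary.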

it_user371334 - PeerSpot reviewer
CEO at a tech consulting company with 51-200 employees

Be sure to use the Apache versions and avoid vendor-specific extensions.

AD
Senior Consultant & Training at a tech services company with 51-200 employees

The work that we are doing with this solution is quite common and is very easy to do.

My advice for anybody who is implementing this solution is to look at their needs and then look at the community. Normally, there are a lot of people who have already done what you need. So, even without experience, it is quite simple to do a lot of things.

I would rate this solution a nine out of ten.

it_user326142 - PeerSpot reviewer
Architect at a healthcare company with 51-200 employees
it_user374040 - PeerSpot reviewer
Systems Engineering Lead, Mid-Atlantic at a tech company with 10,001+ employees

I also suggest having a Chief Technologist who has extensive experience in architecting several Big Data solutions. They should be able to communicate in business as well as technology language. Their expertise should range from infrastructure to application development, and they should have command of Hadoop technologies.

SK
Chief Technology Officer at a tech services company with 11-50 employees

I rate Apache Spark an eight out of ten.

it_user374028 - PeerSpot reviewer
Core Engine Engineer at a computer software company with 51-200 employees

It's easy to use and has a learning curve.

KK
Managing Consultant at a computer software company with 501-1,000 employees

I would rate this solution an eight out of ten.

MG
Director of BigData Offer at IVIDATA

We use both on-premises and public and private cloud deployment models. We're partners with Databricks.

I'm a consultant. Our company works for large enterprises such as banks and energy companies. 17 of our workers use Apache Spark.

With the cloud, there are many companies that integrate Spark. Most projects in big data around the world use Spark, indirectly or directly. 

I'd rate the solution eight out of ten.

it_user746673 - PeerSpot reviewer
Sr. Software Engineer at a tech vendor with 1-10 employees

This is a very good product for big data analytics, and it integrates well with other parts like machine learning and graph analytics.

it_user371325 - PeerSpot reviewer
Data Scientist at a tech vendor with 10,001+ employees

Learn Scala as this will greatly reduce the pain in starting off with Spark.

it_user365301 - PeerSpot reviewer
Software Developer (Product Engineering) at a computer software company with 501-1,000 employees

Have Scala developers at hand. Base Java competency will not be enough during optimization rounds.

LC
Snr Security Engineer at a tech vendor with 201-500 employees

I would rate this solution eight out of 10. 

it_user1223676 - PeerSpot reviewer
Lead Consultant at a tech services company with 51-200 employees

The advice that I would give to someone considering this solution is that, alongside the quality of the data, key streaming characteristics like velocity matter. Velocity means how quickly you are going to refer to the data. These things matter when designing the solution. We need to work these things out. 

I would rate Apache Spark an eight out of ten. 

To make it a ten they should improve the speed. The data storage capacity means we can inject somewhere in the user database in more efficient ways.
