We just raised a $30M Series A: Read our story
KamleshKhollam
Consultant at Exusia
Consultant
Top 20
Good performance and resource management for hosting our data science platform

Pros and Cons

  • "The processing time is very much improved over the data warehouse solution that we were using."
  • "I would like to see integration with data science platforms to optimize the processing capability for these tasks."

What is our primary use case?

Our use case for Apache Spark was a retail price prediction project. We were using retail pricing data to build predictive models. To start, the prices were analyzed and we created the dataset to be visualized using Tableau. We then used a visualization tool to create dashboards and graphical reports to showcase the predictive modeling data.

Apache Spark was used to host this entire project.

How has it helped my organization?

The processing time is very much improved over the data warehouse solution that we were using.

What is most valuable?

The most valuable features are the storage engine, the memory engine, and the processing engine.

What needs improvement?

I would like to see integration with data science platforms to optimize the processing capability for these tasks.

For how long have I used the solution?

I have been using Apache Spark for the past year.

How are customer service and technical support?

We have not been in contact with technical support.

What's my experience with pricing, setup cost, and licensing?

The initial setup is straightforward. It took us around one week to set it up, and then the requirements and creation of the project flow and design needed to be done. The design stage took three to four weeks, so in total, it required between four and five weeks to set up.

What other advice do I have?

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
GA
Senior Solutions Architect at a retailer with 10,001+ employees
Real User
A unified analytics engine with a valuable parallel processing feature

Pros and Cons

  • "I like that it can handle multiple tasks parallelly. I also like the automation feature. JavaScript also helps with the parallel streaming of the library."
  • "The logging for the observability platform could be better."

What is our primary use case?

We use Apache Spark to prepare data for transformation and encryption, depending on the columns. We use AES-256 encryption. We're building a proof of concept at the moment. We prepare patches on Spark for Kubernetes on-premise and Google Cloud Platform.

What is most valuable?

I like that it can handle multiple tasks parallelly. I also like the automation feature. JavaScript also helps with the parallel streaming of the library.

What needs improvement?

The logging for the observability platform could be better.

For how long have I used the solution?

I know about this technology for a long time, maybe for about three years.

Which solution did I use previously and why did I switch?

Because my area is data analytics and analytics solutions, I use BigQuery, SQL, and ETL. I also use Dataproc and DataFlow.

What about the implementation team?

We use an integrator sometimes, but recently we put together a team to support the infrastructural requirements. This is because the proof of concept is self-administered.

What other advice do I have?

I would recommend Apache Spark to new users, but it depends on the use case. Sometimes, it's not the best solution.

On a scale from one to ten, I would give Apache Spark a ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Flag as inappropriate