What is our primary use case?
We have built a product called "NetBot." We take data in any form, such as large volumes of email, images, videos, or transactional data, transform the unstructured content into structured form, and load it into an enterprise-wide smart data grid. That smart data grid is then used by the downstream analytics tools. We also provide model building so people can get faster insights from their data.
What is most valuable?
We use all the features, end-to-end. All of our data analysis and execution happens through Spark.
The features we find most valuable are:
- Machine learning
- Data learning
- Spark analytics
What needs improvement?
We've had problems when a Python process tries to access a large volume of data directly. If somebody writes the code the wrong way, it crashes, because a single Python process cannot handle that much data.
For how long have I used the solution?
I have been using Apache Spark for more than five years.
What do I think about the stability of the solution?
We haven't had any issues with stability so far.
What do I think about the scalability of the solution?
As long as you set it up correctly, it is scalable.
Our users mostly consist of data analysts, engineers, data scientists, and DB admins.
Which solution did I use previously and why did I switch?
Before using this solution, we used Apache Storm.
How was the initial setup?
The initial setup is complex.
What about the implementation team?
We installed it ourselves.
What other advice do I have?
I would rate it a nine out of ten.
Which deployment model are you using for this solution?