2020-08-18T04:51:00Z

What Data Science Platform is best suited to a large-scale enterprise?

Hello community members,

There are many Data Science Platforms available. Which platform would you recommend that can handle large amounts of data? Why?

ITCS user
Guest

User

Dataiku is a great general-purpose data science platform for both supervised and unsupervised learning. It handles big data very well.

2020-10-15T16:18:55Z
Rony_Sklar
Community Manager

@Ziad Chaudhry thanks for your input :)

Anastasia Ant
User

@Ziad Chaudhry I'd also vote for Dataiku. Take a look at their case studies: https://www.dataiku.com/storie...

Top 5, Real User

SparkCognition's Darwin product can handle very large data sets.

2020-08-18T12:47:29Z
Russell Rothstein
Community Manager

@AaronCooke did you compare it with any other solutions? What are the alternatives for large data sets? BTW, thank you for sharing your review of Darwin with the community!

Rony_Sklar
Community Manager

Thanks for your input @AaronCooke :)

Top 5, Vendor

"Data science platform" is a vague term.

It all depends on what you wish to accomplish. Are you talking about fast databases, ETL, a machine learning tool, integration with R or Python, a self-service data visualization tool, or collaboration? No one size fits all.

2021-08-26T12:58:55Z
User

Dataiku, Domino, and RapidMiner are notable candidates for your purpose, I presume.

It has been two years since I evaluated several vendors and shortlisted these as candidates. They all support large-scale data manipulation for data analysis and machine learning development, on a platform that many people can use collaboratively.

2021-08-26T03:42:21Z
Real User

I suspect that I cannot answer this. I have used Knime and RapidMiner with data sets that have had up to about 80,000 rows and 1,500 columns and both have performed well. However, I doubt whether the questioner would classify my usage as "large amounts of data". If my usage is like theirs, then both packages can be recommended.


Both Knime and RapidMiner can link with Python or R, and those languages have modules that offer better performance on large data sets (multiprocessing, GPUs, etc.), so those combinations might serve the questioner's purpose. For example, they might use Knime for ease of use and R for the extra power, or RapidMiner with Python.
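
To give a rough idea of what "multi-processing" means on the Python side, here is a minimal sketch using only the standard library. It is not tied to Knime or RapidMiner in any way; the function names (`chunk_sum`, `parallel_column_sums`) and the toy data are made up for illustration, and a real workload would more likely read from files or a database.

```python
from multiprocessing import Pool


def chunk_sum(rows):
    """Sum each column over one chunk of rows (a list of equal-length lists)."""
    return [sum(col) for col in zip(*rows)]


def parallel_column_sums(rows, n_workers=2, chunk_size=1000):
    """Split rows into chunks, sum each chunk in a worker process, then combine."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with Pool(n_workers) as pool:
        partials = pool.map(chunk_sum, chunks)
    # Combine the per-chunk column sums into overall column sums.
    return [sum(col) for col in zip(*partials)]


if __name__ == "__main__":
    # Hypothetical toy data: 10,000 rows with 3 numeric columns.
    data = [[i, 2 * i, 1] for i in range(10_000)]
    print(parallel_column_sums(data))
```

For small data like this the overhead of starting processes probably outweighs the gain; the pattern only starts to pay off when each chunk involves substantial work.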

2021-08-24T10:48:49Z
User

If you want to handle computer vision data, I recommend the Superb AI Suite. 
https://www.superb-ai.com/

2020-09-09T22:37:17Z
User

The question also needs to specify the domain, the kind of data, and whether public or private platforms are in scope.

For structured/tabular data, Driverless AI / H2O.ai Sparkling Water is my preferred platform.

2020-08-18T17:51:00Z
Rony_Sklar
Community Manager

@Yogesh PARTE Good point - this is a more general question, but I do agree that it's easier to make recommendations with more details. Would you mind sharing more about why H2O.ai Sparkling Water is your preferred choice in this instance?

Top 5, Real User

My experience has not been with large-scale systems, not even multi-terabyte ones. My multi-megabytes would not help. Sorry!

2020-08-18T16:24:46Z
Top 5, Leaderboard, Real User

IBM SPSS Modeler

2020-08-18T10:14:53Z
Russell Rothstein
Community Manager

@EzzAbdelfattah why do you recommend IBM SPSS Modeler?

WalisonAbreu
Real User

@EzzAbdelfattah IMHO it's pretty limited and too outdated to handle the features of the latest frameworks.
