2020-01-09T17:10:00Z

Which are the best end-to-end data science platforms?


I have experience working as a senior integration architect for AI/ML enablement for a manufacturing company with 10,000+ employees.  

We are currently evaluating data science platforms. Which vendor offers an end-to-end solution that really works, from feature management to model deployment?

Thanks! I appreciate the help.

Top 5 Leaderboard Consultant

There are a lot of vendors that offer data science platforms, but it depends on what you mean by end-to-end, and the fact that you wrote "really" makes me think you have already tested many of them. Data science platforms come from a variety of vendors, such as IBM, SAP, Microsoft, Domino Data Lab, and RapidMiner, among others. First, I suggest that you have a person or team ready to test these solutions. If not, remember to line up people with programming and process design skills.

My recommendation: if you already work with IBM, ask about their Data Science Experience. Otherwise, my suggestion is to try RapidMiner, which seems very useful and has a fluid interface for model deployment. You could also try SAS Enterprise Miner, which is strong in model building and model deployment and appears as one of the leaders among these platforms.

I hope this was useful. Regards.

2020-01-10T16:29:28Z
Real User

KNIME or Alteryx is a good choice for a company that wants to deploy AI applications.

They offer:

1. light data processing, such as ETL,

2. AI model development and deployment,

3. and output as simple charts or to databases for further use (API, BI, etc.).

If you deploy in the cloud, you can also use AWS SageMaker or other cloud tools.

2020-01-10T08:45:29Z
Top 20 Real User

There are many vendors offering end-to-end deployment, each with pros and cons. You can evaluate them based on:
- On-prem vs. cloud requirements
- The data volume that you want to process
- Whether you already have ETL processes in place to extract the relevant data from different sources
- How you plan to consume your ML output (API, dashboard, reports, etc.)
- Lastly, the ML algorithms that you intend to use, and whether you are analyzing structured data, unstructured data, or both

If you need further details, I will ask my presales team to get in touch with you. Please provide me with your contact information.

2020-01-10T07:48:10Z
User

DataRobot for OnPrem
SageMaker for AWS

2020-01-10T07:37:16Z
Top 20 User

Another thing to be cognizant of: end-to-end platforms allow you to build and deploy models to production, but that is ML 101. Where the market is moving is building and scaling predictive applications for numerous business processes and use cases. Also, many end-to-end platforms lack the capabilities to deal with data drift, to retrain a model once it is in production, and, for more advanced use cases, to take human-in-the-loop feedback to help retrain the model. A final thought: explainability and interpretability are paramount today. You can build your models in open source and use these other tools to put them into production, but you will have a gaping hole when someone asks how you built the model, what weights you put on your features, how you are dealing with bias, and so on. The majority of platforms out there today help you stitch together disparate open-source solutions, but they don't work once you actually get into productionizing and scaling multiple business processes that are operationalized with machine learning.
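
To make the data drift point concrete, here is a minimal sketch of one common check, the Population Stability Index (PSI), which compares how a feature is distributed in training data versus production data. It is not tied to any platform mentioned in this thread. The function name `psi`, the choice of 10 equal-width bins, and the synthetic Gaussian samples are all assumptions made for illustration.

```python
import math
import random


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`.

    Bins are equal-width over the range of `expected` (an illustrative choice;
    quantile bins are another common option).
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("expected sample has no spread")
    width = (hi - lo) / bins

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data standing in for one numeric feature (hypothetical values).
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]     # same distribution as training
prod_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean has drifted by one unit

print("PSI, no drift:", round(psi(train, prod_same), 4))
print("PSI, shifted: ", round(psi(train, prod_shifted), 4))
```

A frequently quoted rule of thumb treats a PSI below roughly 0.1 as stable and above roughly 0.25 as a shift worth investigating, but those cut-offs are conventions rather than guarantees. Whichever platform you pick, it is worth asking whether it runs a check of this kind per feature and can trigger retraining from it.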

2020-05-26T17:03:50Z
Top 20 User

The issue today with the majority of DS platforms is that they are based on disparate open-source libraries, or you need 5-6 different tools to build your end-to-end ML workflow. Most have never seen production either.

At BigML, we have been around for 10+ years and were the first to market with an MLaaS platform. We can help you and your team accomplish true end-to-end ML (source > dataset > model > predictions > production), all in a single platform. We work with many clients in your space and would be happy to talk with you. You can even sign up for our platform for free and take it for a spin.

2020-02-28T16:59:46Z
User

One potential solution might be the SAS platform https://www.sas.com/en_us/software/platform.html

2020-01-13T11:16:43Z
Vendor

As others have said, there are many options, but add Dataiku, H2O.ai, Alteryx, and Databricks to your list.

2020-01-11T02:28:04Z
User

Check out our system at Novi.Systems. It's an entirely integrated platform that includes hardware and software that performs what you require and much more. We'd be glad to set up a demo for you that allows you to load your data and "test drive" all the capabilities for up to four weeks. Contact me at mike@novi.systems

2020-01-10T17:22:46Z
Top 10 Real User

Please check out H2O.ai, Azure ML, and TensorFlow.

2020-01-10T16:28:48Z
Real User

For an "end-to-end" data science platform, I would prefer KNIME.

I think KNIME is especially good at working with various data sources and at preprocessing, and it is easy to modify, add, or remove flows as situations change.

For analytics, I use KNIME nodes about half the time and write code in a Python node the other half. Either way, it gives you the flexibility to write your own code (I don't write R). And things are much simpler when the data is well preprocessed.

It also provides data visualisation nodes, which are good enough, but for fancy presentations you will want to try other tools such as Tableau.

It is therefore easy to scale up, as KNIME nicely simplifies the process up to and including preprocessing.

2020-01-10T09:09:24Z
User

I would suggest arranging working sessions with DataRobot (if your implementation is on-prem). SageMaker is what I would recommend if you plan to use AWS.

2020-01-10T07:31:43Z
Top 5 Leaderboard Consultant

If you want to perform some ETL along with feature management and model deployment, then I would recommend Alteryx + DataRobot.

2020-04-08T09:59:47Z
Top 5 Leaderboard Consultant

The best data science platform is the one that best fits your requirements: the goal you want to reach, the data you have to feed into the platform, and the results you want in line with your goals. There are a lot of tools to choose from, but if you do not work with one specific vendor, my suggestion is to try the most widely accepted ones. So try RapidMiner, SAS Enterprise Miner, KNIME, or Alteryx.

2020-01-15T15:06:42Z