What is our primary use case?
To run quick data analytics experiments without incurring the time and cost of spinning up servers, setting up Hadoop, etc.
Although MLS makes it very easy to deploy the resulting machine-learning models via REST API, I primarily use MLS as a means to quickly spin up experiments and create proof of concept models.
How has it helped my organization?
MLS was not widely adopted at my old workplace. I only used it to create quick proofs of concept to try to convince management of the viability of a project.
What is most valuable?
MLS allows me to set up data experiments by running through various regression and other machine learning algorithms, with different data cleaning and treatment tools. All of this can be achieved via drag and drop, and a few clicks of the mouse.
The drag-and-drop interface makes it easy to create simple data science experiments. The low barrier to entry allows a large number of people to get started.
The graphical nature of the output makes it very easy to create PowerPoint reports as well.
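For comparison, here is roughly what such an experiment (a few cleaning treatments crossed with a few algorithms) looks like when scripted by hand. This is a minimal sketch using only the Python standard library, not what MLS runs internally. The toy data and the function names (drop_missing, impute_median, fit_mean, fit_linear, run_experiments) are made up for illustration.

```python
from itertools import product
from statistics import mean, median

# Toy data, invented for illustration: y is roughly 2*x + 1, one x is missing.
rows = [(1.0, 3.1), (2.0, 4.9), (None, 7.2), (4.0, 9.0), (5.0, 11.1), (6.0, 12.8)]

def drop_missing(data):
    """Cleaning option 1: discard rows whose feature is missing."""
    return [(x, y) for x, y in data if x is not None]

def impute_median(data):
    """Cleaning option 2: replace a missing feature with the median of the rest."""
    fill = median(x for x, _ in data if x is not None)
    return [(fill if x is None else x, y) for x, y in data]

def fit_mean(data):
    """Baseline model: always predict the mean target."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_linear(data):
    """Ordinary least squares with a single feature."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def mae(model, data):
    """Mean absolute error of a fitted model on the given rows."""
    return mean(abs(model(x) - y) for x, y in data)

def run_experiments(data):
    """Try every cleaner/learner pair and record its training error.

    Note: the error is measured on the training rows themselves (no holdout),
    which is only acceptable for a toy comparison like this one.
    """
    cleaners = {"drop_missing": drop_missing, "impute_median": impute_median}
    learners = {"mean_baseline": fit_mean, "linear": fit_linear}
    results = {}
    for (c_name, clean), (l_name, fit) in product(cleaners.items(), learners.items()):
        cleaned = clean(data)
        results[(c_name, l_name)] = mae(fit(cleaned), cleaned)
    return results

# Print the combinations from lowest to highest error.
for combo, err in sorted(run_experiments(rows).items(), key=lambda kv: kv[1]):
    print(combo, round(err, 3))
```

Each extra cleaning step or algorithm means another function and another entry in the loop. In MLS the same thing is a new box on the canvas.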
What needs improvement?
Make it easier to create ensemble models, and add more machine learning algorithms.
For how long have I used the solution?
Less than one year.
What do I think about the stability of the solution?
Out of about 150-plus MLS experiments I have done, maybe two or three bugged out. Interestingly enough, those are the ones I can’t delete out of the account.
What do I think about the scalability of the solution?
Scalability, in terms of running experiments concurrently: Good. At max, I was able to run three different experiments concurrently.
Scalability in terms of deploying models: Unknown, since I never deployed on Azure. But I would guess the REST API could easily handle a few thousand hits per second, since that is how Microsoft is going to get paid.
How are customer service and technical support?
Which solution did I use previously and why did I switch?
The only other solution beyond this would be standard tools used by data scientists, like R, Python, etc. All of these would have a fairly high barrier to entry, requiring programming experience. The main selling point of MLS is the low barrier to entry, where even tech-savvy business people can use it.
How was the initial setup?
Simple. Create an MLS Live account (preferably a paid one), open MLS, done.
Caveat: Different organizations have different attitudes towards cloud use, especially with sensitive data. At Bridgestone, the hardest part was getting corporate approval to allow me to upload heavily treated, sensitive data to a cloud platform.
What's my experience with pricing, setup cost, and licensing?
MLS is fairly cheap to use. Even the paid account is something like $20/month, unless you are provisioning large numbers of VMs for a Hadoop cluster.
The main way MS makes money with this solution is by forcing the user to deploy their model via REST API and charging each time the API is accessed. There are several pricing tiers for the API.
If you do not use the API, then the value of MLS lies in creating rapid experiments ($20/month). The resulting model cannot be exported, so you'll have to recreate the algorithms in either R or Python, which is what I did. The MLS results gave me a direction to work with; the actual work was mostly done in R and Python outside of MLS.
Which other solutions did I evaluate?
R and Python.
Python + Pandas + scikit-learn:
- scikit-learn offers better performance for extremely large data sets
- Large-data manipulation tools
- Fairly good set of ML algorithms
- High barrier to entry, in terms of skill and knowledge
- Fairly labor intensive to create a large number of experiments
R + caret:
- Very good amount of ML algorithms (so many it may cause paralysis from too much choice, 200-plus algorithms)
- Good performance, unless the data set is extremely large
- High barrier to entry
- Data manipulation is a pain; you probably want to use another tool to pre-treat the data before loading it into R dataframes
What other advice do I have?
For data science professionals or programmers I would rate this solution a four out of 10. A major feature is missing: creating ensemble models. This can be achieved with the tool, but it's clumsy and slow.
For marketing or business professionals I would rate it an eight out of 10. It has a low barrier to entry, and can quickly create models that can be used for proof of concept and justify further investment in a full data science or Big Data project.
R and Python, in my mind, are still the way to go for a true data science/predictive analysis project. MLS's value is its ease of use and low barrier to entry. If one is not a programmer or statistician, MLS is a good way to get a project started and create a proof of concept.
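To make the ensemble complaint concrete, this is about how little code a basic ensemble takes outside MLS. It is a minimal sketch of a plain averaging ensemble in standard-library Python, assuming made-up toy data and three deliberately simple member models. All names here are hypothetical and chosen for illustration. Real projects would more likely use a library implementation.

```python
from statistics import mean

# Toy training data, invented for illustration: y is roughly 2*x + 1.
train = [(1.0, 3.1), (2.0, 4.9), (4.0, 9.0), (5.0, 11.1), (6.0, 12.8)]

def fit_mean(data):
    """Member 1: always predict the mean target."""
    avg = mean(y for _, y in data)
    return lambda x: avg

def fit_linear(data):
    """Member 2: ordinary least squares with a single feature."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest_neighbour(data):
    """Member 3: return the target of the closest training x (1-NN)."""
    return lambda x: min(data, key=lambda row: abs(row[0] - x))[1]

def averaging_ensemble(models):
    """Combine fitted models by averaging their predictions."""
    return lambda x: mean(m(x) for m in models)

# Fit each member on the same data, then wrap them in one ensemble model.
models = [fit(train) for fit in (fit_mean, fit_linear, fit_nearest_neighbour)]
ensemble = averaging_ensemble(models)

print("members: ", [round(m(3.0), 3) for m in models])
print("ensemble:", round(ensemble(3.0), 3))
```

Swapping the average for a weighted average or a vote is a one-line change to averaging_ensemble. In scikit-learn, VotingRegressor and StackingRegressor cover the same ground.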