What is our primary use case?
I was trying to see if Darwin was going to be useful for the company and if it was useful for the project that I was working on. I was working with it, testing it, seeing how it worked, seeing how accessible it was, and if it would be something that would be viable for us to use.
We were hoping to use it on a machine-learning project, to categorize words based on their likeness to each other. I had to find a way to translate that, and encode it, into something that Darwin could actually read.
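To make the encoding step concrete, here is a minimal sketch of one way to turn words into numeric rows that a tabular machine-learning tool could read. Everything in it is an assumption made for illustration: the choice of character bigrams as features, and the hypothetical helper names `char_bigrams`, `encode_words`, and `jaccard_likeness`. None of it comes from the project itself or from Darwin.

```python
from collections import Counter


def char_bigrams(word):
    """Return the adjacent character pairs of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def encode_words(words):
    """Encode each word as a row of bigram counts over a shared vocabulary.

    Returns (vocab, rows), where vocab is the sorted list of bigrams seen
    and rows[i][j] is how often vocab[j] occurs in words[i].
    """
    vocab = sorted({bg for w in words for bg in char_bigrams(w)})
    rows = []
    for w in words:
        counts = Counter(char_bigrams(w))
        rows.append([counts.get(bg, 0) for bg in vocab])
    return vocab, rows


def jaccard_likeness(a, b):
    """Likeness of two words as the overlap of their bigram sets (0.0 to 1.0)."""
    sa, sb = set(char_bigrams(a)), set(char_bigrams(b))
    if not sa and not sb:
        return 1.0  # two words with no bigrams are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    vocab, rows = encode_words(["cat", "cart"])
    print(vocab)
    print(rows)
    print(jaccard_likeness("cat", "cart"))
```

Each row has one column per bigram, so the result can be saved as an ordinary table. Whether spelling-based likeness is the right notion of "likeness" depends on the project; a meaning-based encoding would need a different approach.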
How has it helped my organization?
Darwin is really useful for people who don't necessarily do a lot of data science. A lot of what you're doing in Darwin, while it's a lot more efficient, you could do yourself if you had the knowledge and the time. It's definitely more efficient and it's also useful if you are someone who does not really understand how data science works, but you still want to implement some machine learning.
It gave me ideas for how I would want to implement something on my own. I wasn't going to use Darwin as my long-term solution for what I was going to do, but it gave me the ability to tweak it in different ways, and to do that a lot faster than if I was going to code it myself on Azure Notebook or AWS. I found that really useful. It made my workflow a lot more efficient.
My project, perhaps, was not the best fit for Darwin. It did not necessarily help me find the answer for that project, but it did allow me to work with different ideas of how I would want to set up the project and pursue those answers. So while it didn't hand me the answers, it gave me the sandbox to find that more correct workflow.
It saved time in two ways. One was that it made it more efficient to tweak some options and then run it. I didn't have to put in all the code myself or rewrite the code. And it also made it a lot easier to have multiple machine-learning processes run at once. Darwin has it neatly all in one place, which makes that a lot easier to use and, again, more efficient.
What is most valuable?
I really liked how many ways there were to tweak how it was going to run, such as how many folds to use in cross-validation.
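For readers who have not met the term, the number of folds controls how the data is split for cross-validation: the rows are divided into k parts, and each part takes one turn as the test set while the rest is used for training. The sketch below shows that splitting in plain Python. The function name `k_fold_indices` is hypothetical, and this is not how Darwin implements it, which the tool does not expose.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds contiguous folds.

    Fold sizes differ by at most one row when n_samples is not a
    multiple of n_folds.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds each take one additional row.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

More folds mean more training runs on slightly larger training sets, which is why it is a setting worth experimenting with. In practice the rows are usually shuffled before splitting; that step is left out here to keep the sketch short.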
Also, while it wasn't super-relevant to me, I liked the data checking feature where it looks at your data and sees how viable it is for use. That's a really cool feature. Automatic assessment of the quality of datasets, to me, seems very valuable. I didn't use it too much, but it looked like it worked well the few times that I tried it. That's a great feature to have because cleaning datasets is a pain and there are often errors even after the data has been cleaned. So to have another check to say, "Oh wait, there might be a problem here," is really useful.
Darwin's interactive suggestions are useful in how it could, for example, find what a more appropriate data type might be when you have it in the wrong data type. And sometimes it would tell you, "Oh, maybe you just want to drop this," especially if it was redundant or there was low variance. It's useful to see where there might be issues. I wouldn't necessarily trust it to do all of that itself. I would say it's more of a check rather than a be-all-end-all cleaning tool.
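As a rough illustration of the kind of check described above, the sketch below flags columns whose values are numbers stored as text and numeric columns with very low variance. It is only a sketch under stated assumptions: the function names, the suggestion wording, and the `min_variance` threshold are all hypothetical, and it is not Darwin's actual logic.

```python
from statistics import pvariance


def _is_number_string(value):
    """True if value is a string that parses as a float."""
    if not isinstance(value, str):
        return False
    try:
        float(value)
    except ValueError:
        return False
    return True


def suggest_fixes(columns, min_variance=1e-3):
    """Return {column_name: suggestion} for columns that look problematic.

    `columns` maps a column name to its list of values. The threshold
    `min_variance` is an arbitrary illustrative default.
    """
    suggestions = {}
    for name, values in columns.items():
        if not values:
            continue
        # Numbers stored as text: suggest a more appropriate data type.
        if all(_is_number_string(v) for v in values):
            suggestions[name] = "convert text to numeric"
            values = [float(v) for v in values]
        # Nearly constant numeric column: it carries little information.
        if all(isinstance(v, (int, float)) for v in values):
            if pvariance(values) < min_variance:
                suggestions[name] = "consider dropping (low variance)"
    return suggestions


if __name__ == "__main__":
    table = {
        "age": ["31", "45", "28"],
        "flag": [1, 1, 1],
        "height": [1.6, 1.8, 1.7],
        "name": ["a", "b", "c"],
    }
    print(suggest_fixes(table))
```

The function only returns suggestions and never changes the data, which matches the idea of using this kind of tool as a check rather than as a cleaning step you hand everything over to.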
I found the interface really clean and easy to use.
What needs improvement?
There were a couple of flaws when I was using it that they've probably fixed by now. For example, scroll bars were not sticking, along with various little things that made it feel like it was still in beta.
What I found was most important was the accessibility aspect. To that end, it could have more explanatory tool tips as an option, as a setting you could turn on and off or as a roll-over where they would pop up.
There is also the matter of transparency. There are issues around the ethics of artificial intelligence and machine learning. You need to have a lot of transparency regarding what is going on under the hood in order to trust it. Because so much is done under the hood of Darwin, it is hard to trust how it gets the answers it gets. Of course, if you're too transparent, that's overwhelming for the person who's looking at it, and it could also be an issue for SparkCognition. It's difficult to find the point of appropriate transparency, so I wouldn't blame them if they're not yet at a great point with that. But they should be thinking about a way to be appropriately transparent. That would be helpful in trusting the answers that Darwin provides.
For how long have I used the solution?
I used Darwin for a few months. I haven't used it in the last few months.
What do I think about the stability of the solution?
For the most part, it performed pretty well.
At one point, there were some issues with it functioning correctly. I brought this up to the support team and they said, "We'll work on it immediately." They were really responsive. But they made it impossible to log in the next day while they were fixing it, and they didn't tell me beforehand that they were going to do that. I said, "Okay. Next time you really need to tell me that you're going to make it impossible to log in," because I thought something else had gone wrong. I didn't know that was just part of their fixing what was wrong from the day before. I think they heard me on that comment.
It had a couple of stability issues. Sometimes, if I did something that it didn't like, it would still try to run it and then crash that process, and it was hard to tell beforehand that that was going to happen.
How are customer service and technical support?
The people at SparkCognition were actually super-helpful in helping me figure out how to use it for my project. I'm still a student and I'm still learning a lot of things. Getting some help in how to encode it and make it work was great. The amount of support that the team gave me was really great.
Whenever I had an issue, I could bring it to them and they'd be on it immediately. They were super-responsive and that was really good.
What was our ROI?
ROI is something that I am not super-qualified to answer, but the fact that it saves time, allows for more flexibility, and also helps more people get involved in machine learning, rather than just the people who have studied it, means it probably has a lot of use and return. But I don't know what the actual breakdown of costs and benefits is.
Which other solutions did I evaluate?
I've never used anything else that packages everything together and does a lot of the work on its own. I used Azure Notebook, and I use AWS, coding in their version of Jupyter Notebook. I did use Weka back in college, which is also a machine learning software solution, but it's nowhere near as clever as Darwin. Weka is very clunky and awful to use.
What other advice do I have?
Go into it with the mindset of playing around with it to see how it can work with different tweaks. You would probably want to start with one of their use cases that you know is going to work properly. My project was weird and didn't really want to work properly with any machine learning, whether I put it in or it was Darwin. But when I used their use cases, it worked way better. So start with their use cases, play around with it, and really get familiar with it. I definitely have the millennial mindset of, "Here's a new piece of technology. I'm going to play with it and see how it works and pick it apart in my brain." That's the attitude that you have to bring to this.
Something that I learned from using Darwin was how the question of ethics intersects with how people are actually going to be creating their software. One of the things I've been doing at this company is looking at ethics and AI and their intersection. It was interesting to see how that actually plays out in the real world; how that transparency might be an issue or might not really be an issue.
In terms of the guidance that it provides towards making models operational, I would like to see a "before" and "after": run the data set the same way, but with the changes the guidance suggests, and see how different and how much more accurate the result becomes. I didn't do that kind of before-and-after check myself, so I don't really know how well the guidance worked.
It's more of a check, something to keep you from messing up your data entirely or messing up the process entirely and wasting a ton of time. I don't think it's a be-all-end-all that can totally clean your data set for you, or totally guide you as to what you need to do. I do think it's useful in that it makes working with the data a lot more accessible. And I would say the main thing that I liked about Darwin is that it makes machine learning a lot more accessible to people who don't really know what they're doing.
I would be happy to use it again to see how it's changed since I last used it.
I would rate Darwin at about seven out of ten. It's not perfect. It's definitely valuable but has a ways to go.