If you were talking to someone whose organization is considering Darwin, what would you say?
How would you rate it and why? Any other tips or advice?
Go into it with the mindset of playing around with it to see how it works with different tweaks. You would probably want to start with one of their use cases that you know is going to work properly. My project was weird and didn't really want to work properly with any machine learning, whether I set it up myself or used Darwin. But when I used their use cases, it worked way better. So start with their use cases, play around with it, and really get familiar with it. I definitely have the millennial mindset of, "Here's a new piece of technology. I'm going to play with it and see how it works and pick it apart in my brain." That's the attitude that you have to bring to this. Something that I learned from using Darwin was how the question of ethics intersects with how people are actually going to be creating their software. One of the things I've been doing at this company is looking at ethics and AI and their intersection. It was interesting to see how that actually plays out in the real world, and how transparency might or might not really be an issue. In terms of the guidance that it provides towards making models operational, I would like to see a "before" and "after": the same data set run the same way, but with changes made based on the guidance. It would help to see how different and how much more accurate the result becomes. I didn't do that kind of before-and-after check myself, so I don't really know how well it worked. It's more of a check, something to keep you from messing up your data or the process entirely and wasting a ton of time. I don't think it's a be-all and end-all that can totally clean your data set for you or totally guide you as to what you need to do. I do think it's useful in that it makes working with the data a lot more accessible. I would say that's the main thing that I liked about Darwin: it makes machine learning a lot more accessible to people who don't really know what they're doing.
I would be happy to use it again to see how it's changed since I last used it. I would rate Darwin at about seven out of ten. It's not perfect. It's definitely valuable but has a ways to go.
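The before-and-after comparison this reviewer wished for can be approximated by hand. The following is a minimal sketch, assuming you can export predictions from two runs (one before and one after applying the guidance) and score both against the same held-out labels. It does not call Darwin at all; the function names and the sample labels are hypothetical and exist only for illustration.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labeled example")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def before_after(labels, preds_before, preds_after):
    """Score two runs against the same held-out labels and report the change."""
    before = accuracy(preds_before, labels)
    after = accuracy(preds_after, labels)
    return {"before": before, "after": after, "change": after - before}


# Hypothetical held-out labels and predictions exported from two runs.
labels = ["fail", "ok", "ok", "fail", "ok", "ok", "fail", "ok"]
preds_before = ["ok", "ok", "ok", "fail", "fail", "ok", "ok", "ok"]
preds_after = ["fail", "ok", "ok", "fail", "fail", "ok", "fail", "ok"]

report = before_after(labels, preds_before, preds_after)
print(report)
```

The important design choice is that both runs are scored on the same held-out examples; otherwise the difference in accuracy could come from the data rather than from the changes.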
Machine learning is definitely not pixie dust. Often people fall into the trap of thinking "Oh, I just throw in some AI and it's going to magically make my data better." It's certainly not going to do that. But it does have very specific applicability to problems when you understand what it's good at and what it's not good at. I've worked with so many Fortune 500 companies in the oil industry and they can't keep data scientists around long enough to actually finish a project, solve a problem, and then reintegrate it into their system. Darwin is the perfect tool to solve this issue; it is what the machine-learning industry needs at this point to expand exponentially in the oil and gas market. That's not to take anything away from what data scientists do. Solving these very difficult technical problems needs to be done by data scientists. But there just aren't enough people to practically apply that to the hundreds of thousands of actual use cases around the world in different industries. Having AI build AI models is really the only way to go if it's going to expand beyond larger companies. We've looked at Darwin to create data pipelines in production for models. As I said, SparkCognition has a great partner network. In particular, a partner called Cybersoft has a really interesting tool that wraps around the models that Darwin creates and lets us run them at the edge. We found that this add-on makes it a lot easier. Darwin is great, but SparkCognition's partner network builds on what they have so that it can be applied quickly to other industries. Darwin's connectors to common data repositories cover some 70 to 80 percent of the needs out there. The oil and gas industry has some unique data structures, like WITSML and OPC, that we, as a partner, can help integrate.
In some cases, there are unique data structures where we have to do a little bit of development to bridge that gap or to streamline it so they can use it again without leaving their existing toolset. Darwin is really the only thing that I've seen that does what it does. That alone makes it a 10 out of 10. The alternatives are so different.
My advice is to do extra cleaning of your data. Darwin is good when it has a really nice, clean dataset to generate a model, but you need to work at it to make sure you have that kind of dataset. On our team there are 25 people but there are just two of us using Darwin, my partner and me. He is a data scientist and I am an artificial intelligence engineer. We are using Darwin for the development phase, but we aren't using it for production. It's a fast tool for development. Within our group in the company, we develop solutions. We try to analyze the possibilities for doing so. We need the data, so we extract it and then generate a model. Once the model is ready, we put it in an API, the cloud, or the web. We can then query the model with new data and create a forecast, but it depends on the solution and on the data in the production phase. I believe we have a one-year licensing agreement. We are trying Darwin to see how it works and what its benefits are for us. It's hard to say if we will continue using Darwin. We are trying to determine if Darwin is a high-value tool. We need to use Darwin more. I have only been using it for about 5 or 10 percent of my time. I would rate Darwin at seven out of 10. Darwin saves us some work, but we also have to do extra work. It doesn't do all the work for you. In the beginning, when we started to see how Darwin works, we thought that maybe, from raw, dirty data, we could generate a model really fast, but that's not true. It's good at doing some parts of the work, but you still need to work and to think about the solution to your problem. You need to think about the application in order to generate data that fits your solution. Maybe that's a good thing. If Darwin did everything, perhaps I would not be needed in the company.
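As a rough illustration of the extra cleaning mentioned above, here is a minimal sketch of a pre-flight check you might run on a CSV export before handing it to any automated model builder. It uses only the Python standard library and does not call Darwin. The `audit_rows` helper, the set of missing-value markers, and the sample columns are assumptions made for this example.

```python
import csv
import io

# Strings treated as missing values (an assumption; adjust for your data).
MISSING = {"", "na", "n/a", "null", "none", "nan"}


def audit_rows(rows):
    """Count rows, exact duplicate rows, and missing values per column."""
    rows = list(rows)
    columns = list(rows[0].keys()) if rows else []
    missing = {column: 0 for column in columns}
    seen = set()
    duplicates = 0
    for row in rows:
        # Normalize each cell so whitespace-only values count as empty.
        key = tuple((row.get(column) or "").strip() for column in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
        for column, value in zip(columns, key):
            if value.lower() in MISSING:
                missing[column] += 1
    return {"rows": len(rows), "duplicates": duplicates, "missing": missing}


# Hypothetical sample data standing in for a real export.
sample = """well_id,pressure,status
A1,101.5,ok
A2,,ok
A2,,ok
A3,NA,fail
"""

summary = audit_rows(csv.DictReader(io.StringIO(sample)))
print(summary)
```

A check like this only flags problems; deciding whether to drop, fill, or correct the affected rows still depends on the solution you are building.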
The biggest lesson I learned from using Darwin, honestly, was that they should interface with their clients much more quickly and easily. They should make that process seamless to make sure clients are up and running ASAP so they can get their feet wet instead of wasting about a month of work. We don't have any plans to use it right now, but we're open to using it in the future. We're telling them this because we want them to improve the product; we did see value in it. We did see the idea behind it, but the execution was not done very well, especially when it comes to tools to get people up and running on it quickly instead of spending weeks on end going back and forth to figure out logistics.
One of the most important things we learned, and that we also recommend to other companies, is to have a data lake; to have all their data ready. Without data you cannot use Darwin. You really need the data to start using it and to take advantage of Darwin. You also need people who understand data science. They can help you understand how to use Darwin and how to interpret the results that it gives you. Right now we are not measuring the accuracy of the models. We are using it to give some insight and some answers. We're on our way toward that.
Do not be intimidated by the apparent complexity of it because it is more user-friendly than you think. It makes AI easy. Start testing it because it's very much trial and error. I really do believe people need to have this type of mentality to start using tools like Darwin. Don't be afraid of retesting it. We are using the automated AI model building because we want the AI model to be unique for each customer. We are getting all the data ready so it can be integrated into the modeling. We want to give each client a unique credit model, automated through the AI. We don't have this in production right now, but we are working on it so we can get there. I would rate them a nine out of 10. I wouldn't give them a 10 because there are still a lot of things for them to keep working on. So far, there are a lot of benefits that we could still be getting out of it, but that is part of the learning process.
When Darwin went down, the product went down as well. This was a small issue. As a nontechnical person, I would rate Darwin as an eight or nine out of 10. I would prefer a tool with more control. A more experienced user would probably rate the product as a six out of 10.
You need to have good data sets to get good results. Before using Darwin, you need to work on your data sets so that they are suitable for building correct models. Darwin is a solid solution, but the main advice that I have is that if you don't have the data, you can get Darwin but you're not going to get the results you want. The biggest lesson I have learned from using Darwin is that it makes things faster. We can test faster, not just one at a time. We speak with the team at SparkCognition and they help us to improve our ideas around the use cases that we can apply. That is another important lesson. The biggest problem for us is data sets because, sometimes, they are not suitable for Darwin. It's not a problem on Darwin's side. It's a big problem for us because we have a lot of unstructured data, and we are working with other solutions, not Darwin, to get the data ready for algorithms. For Darwin, as a solution, you need people who understand the business and who understand how to improve the organization with the results of the models.