Please share with the community what you think needs improvement with Darwin.
What are its weaknesses? What would you like to see changed in a future version?
There were a couple of flaws when I was using it that they've probably fixed by now. For example, scroll bars were not sticking, along with various little things that made it feel like it was still in beta. What I found most important was the accessibility aspect. To that end, it could offer more explanatory tooltips, either as a setting you could turn on and off or as rollovers that pop up. There is also the question of transparency. There are ethical issues around artificial intelligence and machine learning, and you need a lot of transparency about what is going on under the hood in order to trust it. Because so much is done under the hood of Darwin, it is hard to trust how it arrives at its answers. Of course, if you're too transparent, that's overwhelming for the person looking at it, and it could also be an issue for SparkCognition. It's difficult to find the point of appropriate transparency, so I wouldn't blame them if they're not yet at a great point with that. But they should be thinking about a way to be appropriately transparent. That would help in trusting the answers that Darwin provides.
There's always room for improvement in the UI and continuing to evolve it to do everything that the rest of AI can do. Because it's so much better than traditional methods, we don't get a ton of complaints of, "Oh, we wish we could do that." Most people are happy to see that they can build models that quickly, and that it can be done by the people who actually understand the problem, i.e. SMEs, rather than having to rely on data scientists. There's a small learning curve, but it's shorter for an SME in a given industry to learn Darwin than it takes for data scientists to learn industry-specific problems. The industry I work in deals with tons and tons of data and a lot of it lends itself to Darwin-created solutions. Initially, there were some limitations around the size of the datasets, the number of rows and number of columns. That was probably the biggest challenge. But we've seen the Darwin product, over time, slowly remove those limitations. We're happy with the progress they've made.
An area where Darwin might be a little weak is its automatic assessment of the quality of datasets. The first results it produces in this area are good, but in our experience, extra analysis is needed to produce a truly clean set of data. Where it's good, for example, is if we have a date column whose values are not very valuable for the model because the differences between the dates are not significant. Darwin will find that kind of thing. But you definitely can't give Darwin a dirty dataset and expect it to generate a really nice model. So we have to do extra analysis of the data, and cleaning the data always consumes a lot of time. Also, Darwin can generate new data, but I didn't find that to be very valuable. Better generation of new data would be an important improvement.
The solution's ability to capture complex relationships over time, and the resulting accuracy of its predictions, could be improved. They could also improve customer relations by providing education on how to use the tool. It took us quite a while to figure out how things were put together so that we could get things to work and provide proper feedback to our leadership. The READMEs and the tutorials need to be greatly improved to help customers understand how things work. It might be helpful to have some sample data sets for people to play around with, as well as some tutorial videos. In the time crunch that we had, it was very hard to find information on how it worked and then make it work, all while interfacing with folks at SparkCognition. These things should be a priority for them, in my opinion. Knowing how to use the tool would have given us more time to play with it instead of just trying to figure it out. They should "game-ify" it more. Darwin could do a few other things better, such as automatically determining which values and which parameters should not be removed. And their UI could use some speeding up.
We have used Darwin as a complement to other tools like R and SPSS to get the accuracy we want. This is one thing we've told the people at SparkCognition and they're working on it. In these kinds of situations, we don't use Darwin 100 percent because of that limitation. The solution does help us toward making models operational, but we are not at the place we want to be. We want models that give the answer to whether we should make a loan or not, but we are not at that point yet. It still has some limitations. These are things we have given as feedback and they're working on them. Also, it would be great to have a way to organize the models. Right now, when there are a lot of models, they are disorganized. In the future, when we have more models, it will be more complicated to find the things that we are working on. It's about the user interface. You have the screen where you can look for other models, but you can't organize models by name or by date. Something they are working on, which is great, is an API that can access data directly from the source. Currently, we have to create a specific dataset, as a specific file, for each model. It would be great to have an API that gives us a connection to our datasets or data lakes instead. Sometimes you find that you have to add a new variable, and you have to create a new file with that variable instead of having a connection via API to your datasets. We have also asked SparkCognition that, instead of automatic suggestions for addressing dataset issues, things be defined by the user. There have been occasions where we have numerical data and Darwin has suggested using a nominal variable. We would prefer to define categories ourselves rather than follow the recommendation that Darwin makes.
Making models operational, or industrializing them, is a very big challenge. For example, we want to build unique credit models for each customer. So we are preparing the types of customers on whom we can try new credit models in Darwin. But I still find it very challenging to get the data sets into a state that Darwin can work with. At this point, we are working to get the data sets ready for Darwin. However, once they are in Darwin, I believe we will not have any problems and will have very good results, just as we have had for risk portfolio management. We are trying to aim it at a more specific group of clients so we can target them more precisely. Right now, we have been using Darwin on clients that we don't want. That's how we have been reducing our delinquency index. Darwin is helping us identify clients that we need to close a relationship with, but now we need it to tell us which clients we should be offering new products and new opportunities, or where to go to the market and reach new clients. The dashboards and the display of the data need improvement. Currently, only IT and business intelligence people are using the results we get from Darwin, but areas that are less technologically sophisticated could also benefit if we had more user-friendly dashboards. People get scared, and they think, "We will need to run something in Python," which is not the case. We could use more user-friendly dashboards so everyone could use them. However, they have let me know that the dashboard implementation is already being worked on, so our commercial areas can have access to the data in a more user-friendly way. This is great because it is a very important area of opportunity. We want to be able to test different updates. For example, we've been waiting for the user-friendly dashboards since August. We really want to start working with them but don't know when they will be released. The people at SparkCognition told me that as soon as the dashboards were ready they would contact us so we could have a workshop on them. However, they haven't contacted us about this yet.
The automatic generation of some models doesn't work. If it were fully automatic, it would accelerate the work that we do. As a data scientist, I find some of the other tools available for new methods much more interesting because they give me more control. However, for someone who is not yet a data scientist, Darwin would be more helpful. The analyze function also takes a lot of time.
Our main data repository is on AWS. The trouble we are having is that we have to download the data from our repository to bring it into Darwin. It would be great if there were an API to connect our repository to Darwin. It would provide great automation, because right now it takes time to download the information and then upload it to Darwin. Another improvement would be for the user interface to support unsupervised models. Right now you can only work with supervised models. Finally, I would recommend that they work on improving the account functionality, because we have had some difficulty logging in.