What is our primary use case?
I provide product management and SME services to oil companies as a consulting service. My company has partnered with SparkCognition to bundle its products into a package of services that I provide to my customers. For the most part, when I'm working with SparkCognition, and Darwin in particular, I'm working with it on behalf of one of my customers.
We do different engagements. We've done PoC projects with customers with versions 1.4 and onward.
The biggest use case we've seen is for automatic classification of data streaming in from oil and gas operations, whether exploration or production. We see the customers using it to quickly and intelligently classify the data. Traditionally, the way that would be done is through a very complicated branching code which is difficult to troubleshoot, or by having it manually done with SMEs or people in the office who know how to interpret the data and then classify it, for analytics.
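The "complicated branching code" described above might look like this toy sketch. The field names and thresholds here are hypothetical, invented purely to illustrate why hand-maintained rules become hard to troubleshoot as edge cases accumulate:

```python
# Toy illustration (hypothetical fields and thresholds) of the brittle
# rule-based classification that a learned model replaces. Every new
# edge case in the field data means another hand-tuned branch here.
def classify_reading(rpm, torque, flow_rate):
    """Classify an operational reading using hand-written rules."""
    if flow_rate < 5:
        if rpm < 10:
            return "idle"
        return "reaming"            # low flow but still rotating
    if torque > 80 and rpm > 60:
        return "drilling"
    if rpm < 10 and torque < 20:
        return "tripping"
    return "unknown"                # falls through when rules miss a case

print(classify_reading(rpm=75, torque=90, flow_rate=40))  # drilling
```

A trained classifier sidesteps this branching entirely: the SME labels historical readings, and the model learns the boundaries that these rules try to encode by hand.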
The customers have looked at using machine learning for that, but they run into challenges — and this is really what Darwin is all about. Typically there is an SME who can look at the data and properly classify it or identify problems, but taking what he knows and what he does instinctively and communicating it to a data scientist who could build a model for that is a very difficult process. Additionally, data scientists are in very high demand, so they're expensive.
SMEs can look at data and quickly make interpretations. They've probably been looking at the data for 10 or 15 years. So it's not a matter of just, "Oh, we can plunk this SME beside a data scientist and in a couple of months they can turn out a model that does this." First, SMEs don't have time to be pulled out of their normal workload to educate the data scientists. And second, even if they do, you end up with something very rigid.
With Darwin, customers can empower the SMEs to build the models themselves without having to go through the process of educating the data scientists, who may leave next week for a better paying job.
Most of the projects that we've done, PoCs, are typically done in the cloud, for ease of use. Because we work in the oil and gas space, public cloud is the preferred option in the U.S., with the simplified administration and a little bit lower cost. Overseas, the customers we've talked to have noted there are laws and restrictions that require their stuff to be on-premise. We've talked to potential customers about it, but we haven't actually done an on-premise project so far.
How has it helped my organization?
The automated AI model-building reduces the time that projects take. Before I started working with SparkCognition, I worked on several projects where it took months, and in some cases years — complex problems — for data scientists to even pick a machine-learning model to use. They might settle on a methodology such as random forest after quite a bit of analysis. When a model is completed, it is a powerful and unique solution that can't be done with traditional programming, but it's almost impossible to tune in the field. Additionally, if you're talking oil and gas, some of the sites where you need to run these, especially on the edge, are very remote. If the model doesn't respond the way you want, you have to take it back to a data scientist and have it tuned.
Darwin lets you rerun the process with new data, with more data, with different tuning parameters. You don't have any of that back-and-forth with a physical person.
The solution has created the opportunity for machine learning to be practically implemented in places where it couldn't be implemented before. The current way that machine learning problems are implemented is with data scientists, usually as IT initiatives or R&D initiatives. Often, a company will say, "Okay, we're going to do machine learning." They have a big initiative, then hire some very expensive data scientists and they create a model that may be 10 or 15 percent better than what's out there. The challenge is that the model exists in MATLAB or Python but it's not integrated into the business systems like an ERP; or it's not integrated into their industrial control system. It ends up being a really cool PoC and it never turns into something that practically affects the business.
Darwin has opened up the places where you can do that.
The potential we see with Darwin, with the REST API and with the easy-to-approach interface, is that we can empower these SMEs to build things and interface them from day one with the existing control systems and other systems the business is using. So they're not stepping out of their traditional workflows to use machine learning. It's integrated.
As far as building models goes, versus hiring people with ML experience it is significantly faster and exponentially cheaper. You can build models, once you've done some initial training in Darwin, that would take a data scientist, with an SME, two or three months to build. With Darwin the SME can do that in a few days or less. For a lot of applications, especially in oil and gas, the savings are huge as far as practical applications of machine learning go, versus the tradition of using data scientists to build them one by one.
For our customers we have primarily looked at use cases around automatic event detection. We hadn't even tried to do that with data scientists because it just wasn't practical with the timeline and because the costs were too high. And using traditional software methods to try to solve that problem, the estimate that the customer had was that it would have taken three to four months of software development. We were able to build a model that provided effectively the same results within a week. A lot of that was just figuring out the data and data quality issues. The actual model building in Darwin took a few hours.
What is most valuable?
The key feature is the automated model-building. It has a good UI that will let people who aren't data scientists get in there and upload datasets and actually start building models, with very little training. They don't need to have any understanding of data science.
It also has the REST API which is used pretty extensively. It's a bit more feature-rich and it is a great tool for customers who actually want to integrate ML models into their business systems, products, and workflows. This is a challenge we see with machine-learning initiatives at a lot of companies: You hire data scientists. You give them a problem. You give them the data. You train them on what they need to do with it. And then they build a model but you can't just drop that model into your ERP. Or if you have supervisory systems, industrial systems like IoT applications, you can't just drop a model into that. Darwin and the REST API it has available abstract all that away and make it very easy to integrate into existing systems.
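The integration pattern described above can be sketched as follows. Note that the endpoint paths, field names, and auth scheme here are all assumptions for illustration, not Darwin's actual API contract; a real integration would follow the vendor's REST documentation:

```python
import json

# Hypothetical sketch of how a business system (ERP, SCADA, IoT platform)
# would call an AutoML REST API for predictions. The base URL, routes,
# and payload shape below are placeholders, not Darwin's real interface.
API_BASE = "https://darwin.example.com/api/v1"   # placeholder host

def build_predict_request(model_id, readings, token):
    """Assemble the HTTP request an integration layer would send."""
    return {
        "url": f"{API_BASE}/models/{model_id}/predict",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"rows": readings}),
    }

# An ERP or industrial control system would POST this with any HTTP
# client, then route the returned class labels back into its workflow.
req = build_predict_request(
    "event-detector", [{"rpm": 75, "torque": 90}], token="..."
)
print(req["url"])  # https://darwin.example.com/api/v1/models/event-detector/predict
```

The point is that the model sits behind a plain HTTP endpoint, so the existing business system only needs an HTTP client, not a data-science toolchain.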
The accuracy, like anything else, is dependent on having a good data set. If you give it the right data — good, clean datasets — Darwin is as good, if not better, than anything out there. Even if, in its automated fashion, it initially returns something that may not be quite as accurate, the fact that you're able to iterate and correct the data quality issues quickly, rather than the traditional process where you work with the data scientists and you start getting results weeks or months later, enables you to iterate quickly to get to a higher level of accuracy.
Darwin's automatic assessment of the quality of those datasets does a good job. Additionally, its partner network provides industry-specific tools that integrate and work alongside Darwin, or wrap around Darwin, and provide a lot of additional capabilities. Darwin does a good job but where it doesn't, SparkCognition has a great partner network that has developed industry-specific things that solve problems that Darwin might not solve out-of-the-box.
The solution's interactive suggestions on how to address dataset issues to make the data ready for algorithmic development are interesting. It depends on the specific dataset. Sometimes they're spot-on and sometimes it's a matter of interpreting the dataset. Overall, they're helpful and they definitely make the machine learning more approachable.
What needs improvement?
There's always room for improvement in the UI and continuing to evolve it to do everything that the rest of AI can do. Because it's so much better than traditional methods, we don't get a ton of complaints of, "Oh, we wish we could do that." Most people are happy to see that they can build models that quickly, and that it can be done by the people who actually understand the problem, i.e. SMEs, rather than having to rely on data scientists.
There's a small learning curve, but it's shorter for an SME in a given industry to learn Darwin than it takes for data scientists to learn industry-specific problems. The industry I work in deals with tons and tons of data and a lot of it lends itself to Darwin-created solutions.
Initially, there were some limitations around the size of the datasets, the number of rows and number of columns. That was probably the biggest challenge. But we've seen the Darwin product, over time, slowly remove those limitations. We're happy with the progress they've made.
For how long have I used the solution?
We started talking to SparkCognition in September or October of 2018, so it's been about a year.
What do I think about the stability of the solution?
The stability has definitely improved. Early on, there were some cases where we would run into the limitation on data size, but those have been resolved. Overall, we haven't had any issues with stability.
What do I think about the scalability of the solution?
The scalability is good. The way that we use it, where we build models and then deploy them to the edge, I don't think that we would run into traditional scalability challenges because of this deployment model. We build a model, tune it, and then we integrate it into a workflow or software. Once it's there, that model is outside of Darwin.
We look at every project that our customers present to see if it's a good candidate for Darwin. I definitely want to increase our use of Darwin in projects because it provides great return on investment for our customers.
How are customer service and technical support?
Tech support is really good and so are the customer success team guys. They're a terrific team to work with. They're always quick to get our team what we need to support our customers. Sales and product support have been outstanding.
Which solution did I use previously and why did I switch?
The route we looked at previously was that of hiring data scientists and having them build a model. That wasn't a business we wanted to be in, as far as our consulting goes. Darwin really opened up the market and let us add more value to our customers without changing the type of staff that we have.
How was the initial setup?
The initial setup is relatively straightforward and, where there are industry-specific needs, that's the value that we, Helio Summit, bring to the table. We connect the dots between a company and SparkCognition and their products. We're there to help customers get to that value really quickly. In our model, our consultants already have experience and training with the product. We've run PoCs and we're there to solve a problem and ensure things go smoothly for our customers.
The projects we do can take from a few weeks to a month. It depends on the size of the customer and how integrated they want it. Each customer's problems are different and each one requires integration to different systems.
Other than project management best practices, we don't really have an implementation strategy. We're a consulting company and each customer engagement is unique. We're going in there and developing something to solve a customer's specific problems. Each one is pretty unique.
On our side, it generally doesn't take more than one or two consultants on a project, depending on the amount of interfacing that needs to be done with existing systems. If we have to tie it into WITSML or an IoT system, there may be a need to have one of our developers assist with the project. But usually, one business analyst is enough in terms of people who need to get in there and actually interface with the customer.
Our goal is to deploy solutions for our customers that are intuitive and reliable and don't require ongoing maintenance contracts. We really want to get the customer trained and using it on their own. Unless they have problems or want to do something else, we try to get to a point where the customer is self-sufficient as quickly as possible.
Our biggest users are usually drilling engineers and geologists, as well as data analysts. These will be people who run centers of excellence for their areas, for their departments, and for their companies. They will be the best SMEs that they have and they're there to advise other people throughout these companies.
What was our ROI?
Due to the nature of our business as a partner, we don't calculate an internal ROI. For our customers, we believe they see a return of between two and three times versus staffing an internal ML team.
It provides more value because we're enabling them to do something with machine learning that would otherwise take a lot of development or data scientists.
What's my experience with pricing, setup cost, and licensing?
Darwin has a great value statement. Customers always want it to be a little bit cheaper, but in the context of otherwise having to hire data scientists the pricing is very cost effective.
There are no additional costs unless you need custom development for system integration. We haven't run into anything beyond the licensing cost. If we need their services group to look at something, the pricing is pretty standard. There haven't been any surprises.
Which other solutions did I evaluate?
We looked at some of the PowerAI stuff. We really got into it because we had a customer with a very specific problem. We started looking at what's out there and we looked at a few including PowerAI and some of the open-source stuff. That's when we came across SparkCognition at an industry event and dug into it. It turned out to be a good fit for what this customer needed and the relationship grew from there.
What stood out to me about Darwin was how approachable it was for people who aren't in data science. For the science consultants we had, it clicked. It made sense. It was easy to articulate why you would use it and how.
What other advice do I have?
Machine learning is definitely not pixie dust. Often people fall into the trap of thinking, "Oh, I just throw in some AI and it's going to magically make my data better." It's certainly not going to do that. It has very specific applicability to problems where you understand what it's good at and what it's not good at. I've worked with so many Fortune 500 companies in the oil industry and they can't keep data scientists around long enough to actually finish a project, solve a problem, and then reintegrate it into their systems. Darwin is the perfect tool to solve this issue, and it's what the machine-learning industry needs at this point to expand exponentially in the oil and gas market.
That's not to take anything away from what data scientists do. Solving these very difficult technical problems needs to be done by data scientists. But there just aren't enough people to practically apply that to the hundreds of thousands of actual use cases around the world in different industries. Having AI building AI models is really the only way to go if it's going to expand beyond larger companies.
We've looked at Darwin to create data pipelines in production for models. As I said, SparkCognition has a great partner network. In particular, a partner called Cybersoft has a really interesting tool that wraps around the models that Darwin creates and lets us run them at the edge. We found this add-on that makes it a lot easier. Darwin is great but SparkCognition's partner network builds on what they have, so that it can be applied quickly to other industries. Darwin's connectors to common data repositories cover some 70 to 80 percent of the needs out there. The oil and gas industry has some very unique data structures, like WITSML and OPC that we, as a partner, can help integrate. In some cases, there are unique data structures where we have to do a little bit of development to bridge that gap or to streamline it so they can use it again without getting out of their existing toolset.
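The bridging work mentioned above — turning an industry format like WITSML into flat rows a model can consume — can be sketched roughly like this. This is a toy, trimmed-down WITSML-like document for illustration, not a full WITSML 1.4.1 parse:

```python
import xml.etree.ElementTree as ET

# Simplified illustration of bridging an oil-and-gas data format into
# flat rows suitable for a model. Real WITSML documents carry far more
# structure (namespaces, units, headers); this keeps only the pattern.
DOC = """
<log>
  <mnemonicList>depth,rpm,torque</mnemonicList>
  <logData>
    <data>1000.5,75,90</data>
    <data>1001.0,74,88</data>
  </logData>
</log>
"""

def witsml_rows(xml_text):
    """Turn comma-packed <data> lines into dicts keyed by mnemonic."""
    root = ET.fromstring(xml_text)
    names = root.findtext("mnemonicList").split(",")
    rows = []
    for node in root.find("logData"):
        values = [float(v) for v in node.text.split(",")]
        rows.append(dict(zip(names, values)))
    return rows

print(witsml_rows(DOC)[0])  # {'depth': 1000.5, 'rpm': 75.0, 'torque': 90.0}
```

Once the data is in this flat row shape, it can be fed to a model endpoint or a training upload without the engineer ever leaving their existing toolset.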
Darwin is really the only thing that I've seen that does what it does. That alone makes it a 10 out of 10. The alternatives are so different.