KNIME Review

Has good machine learning and big data connectivity but the scheduler needs improvement

What is our primary use case?

We use KNIME for basic analytics, mainly to reduce processing time. We found that scripting in the cloud takes a lot of time, so we have been running KNIME locally on our PCs.

How has it helped my organization?

While the product has not yet improved our organization, we expect to use it in full deployments with our clients to greatly reduce their costs and make our services more attractive.  

What is most valuable?

The most valuable part of the solution is the machine learning. The second feature that we use most is big data connectivity. When we deployed the architecture, we pointed our IDS (Intrusion Detection System) server to the location on our servers where the big data will reside. Then we needed some kind of basic machine learning. After that, we connected it with Tableau for visualization. Now we are writing the big data part of our solution along with the overall machine learning. These two parts will be the most important for our business going forward.

I also think that connectivity with hybrid databases and integration with languages like Python are great advantages for what we want to do in our environment. We have been using these features extensively and find them very valuable in achieving what we hoped to achieve with the tool.

What needs improvement?

One thing I found is that scheduling jobs is difficult in the open-source version of the KNIME Analytics Platform. If the scheduler were improved in the open-source version, jobs would be easier to schedule properly and the software would be easier to use efficiently.
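
One possible workaround, sketched here under assumptions rather than as a tested setup, is to let the operating system's own scheduler (cron or Windows Task Scheduler) start a workflow through KNIME's batch application. The snippet below only builds the command line. The executable path and workflow directory are hypothetical placeholders, and the options should be checked against the installed KNIME version.

```python
from typing import List


def build_batch_command(knime_executable: str, workflow_dir: str,
                        reset: bool = True) -> List[str]:
    """Build a command line that runs one KNIME workflow in batch mode.

    Both paths are placeholders supplied by the caller (assumptions,
    not values taken from any real installation).
    """
    command = [
        knime_executable,
        "-nosplash",      # do not show the splash screen
        "-consoleLog",    # write log output to the console
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        "-workflowDir=" + workflow_dir,
    ]
    if reset:
        # Reset the workflow before executing it
        command.append("-reset")
    return command


if __name__ == "__main__":
    # Hypothetical paths, for illustration only
    print(" ".join(build_batch_command("/opt/knime/knime",
                                       "/data/workflows/nightly_report")))
```

An operating-system scheduler entry could then call a script like this at a fixed time, which would cover simple recurring jobs without a built-in scheduler.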

The second difficulty I faced with KNIME was data processing time. When we process large chunks of data locally, the processing is very slow. We do not want to move this big data often. In my experience, moving even one gigabyte of data was very slow. So the second thing I would really like to see is a better ability to handle large amounts of data locally in an efficient manner.
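
To illustrate the general idea of handling a large file locally without loading all of it at once, here is a minimal Python sketch that reads a CSV in fixed-size chunks and keeps only a running total in memory. It is not how KNIME processes data internally. The column name `bytes` and the chunk size are hypothetical choices for the example.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any row iterator."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def sum_column_in_chunks(file_obj, column, chunk_size=10000):
    """Sum one numeric column while holding only one chunk in memory."""
    reader = csv.DictReader(file_obj)
    total = 0.0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
    return total


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a multi-gigabyte file
    sample = io.StringIO("id,bytes\n1,10\n2,20\n3,30\n")
    print(sum_column_in_chunks(sample, "bytes", chunk_size=2))
```

The same pattern works with a real file handle in place of the in-memory sample, so memory use stays bounded by the chunk size rather than the file size.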

The third area that might be improved is that when we have a large amount of data, say five gigabytes, one panel is completely ignored. That has a bad impact on the results of our data processing. So I would really like to see the load balancing improved and the overall processing time substantially reduced.

So the things I would most like to see are the ability to handle large amounts of data and improved performance in processing.  

For how long have I used the solution?

We have been working with KNIME for about six months.  

What do I think about the scalability of the solution?

We do not have many people using the solution in our company at this point because the tool is comparatively new to us. There are around three or four users right now. We plan to increase both the usage and the number of users because we have growth opportunities with some clients. The only potential problem is that right now we are not confident in our ability to implement pure KNIME solutions without more discovery and testing. We are planning to replace Alteryx with KNIME eventually, but as of now it is only a plan and we have not taken any steps yet.

How was the initial setup?

The initial setup was very straightforward. It was not complex at all.  

What about the implementation team?

We deployed and installed it ourselves on our local system server.

What other advice do I have?

We have done a few projects in KNIME with some of our clients. In these cases, we mainly used KNIME because of its ability to work in a data center environment in an enterprise system. This was one of the most important things that we were looking for. The second point is that KNIME is an open-source analytics platform. If a client has less data or a relatively small database, we can use the open-source platform instead of Alteryx, which is fairly expensive. These are the options we advise our clients about.

On a scale from one to ten, where one is the worst and ten is the best, I would rate this product as an eight out of ten. Honestly, I do not feel familiar enough with the product yet to be sure that my rating is accurate; I need more time with it. On the other hand, I have used KNIME and other tools in a similar category, like Informatica and Alteryx. Informatica is purely data warehouse software. Alteryx is something we use frequently. So I have used three ETL tools. Comparing KNIME with Alteryx, which are the two most similar of the three, I think KNIME is much better for our purposes. Strictly as a comparison with Alteryx, I would rate KNIME as an eight.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner