Please share with the community what you think needs improvement with KNIME.
What are its weaknesses? What would you like to see changed in a future version?
We are worried about performance with large datasets that have many rows and columns. On the server side, we are not sure whether KNIME can handle large data volumes without issue. It clearly works for small datasets, but we are concerned about performance as the volume grows.

KNIME also needs more documentation and training materials, including webinars or online seminars; at the moment, the offering is thin compared to some other vendors.

Finally, the user interface needs improvement. It looks quite cluttered and I am not comfortable using it.
I had some difficulty connecting to servers. The setup asked me to configure something on my server and to enter a code that I had to generate there, and I messed up several of the steps. I followed all of the instructions but still couldn't get it to work, and ended up searching several forums to track down the problem. There should be better documentation, and the steps should be simpler.
The areas that I feel need improvement are:

* The Joiner node should support three outputs (left unmatched, matched, right unmatched), as competitors' tools do (I have not checked the 2019/20 releases).
* I need the ability to add extra comparison conditions to a join. In SQL, for example, you can keep only rows whose date falls within a date range from the joined table. In KNIME at the moment, you have to let the join explode and filter afterwards, but sometimes the intermediate output becomes too big.
* More examples of Java code for nodes such as Java Snippet would be helpful.
* Showing row counts on the canvas would improve control and speed when building a workflow.
* The pseudo-code types could be rationalised into one (e.g. Java only).
* Better web scraping: every time I tried it, it was not up to par, although you can fall back on a Python script.
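The two join requests above (split outputs and extra comparison conditions) can be sketched outside KNIME, for instance in a Python Script node using pandas. This is only an illustration; the table and column names here are made up:

```python
import pandas as pd

# Hypothetical sample data: orders matched against promotion windows.
orders = pd.DataFrame({
    "customer": ["a", "b", "c"],
    "order_date": pd.to_datetime(["2020-01-05", "2020-02-10", "2020-03-15"]),
})
promos = pd.DataFrame({
    "customer": ["a", "b", "d"],
    "start": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-01-01"]),
    "end": pd.to_datetime(["2020-01-31", "2020-03-31", "2020-12-31"]),
})

# 1) A full outer join with indicator=True yields all three outputs at once.
full = orders.merge(promos, on="customer", how="outer", indicator=True)
left_unmatched = full[full["_merge"] == "left_only"]    # customer "c"
matched = full[full["_merge"] == "both"]                # customers "a", "b"
right_unmatched = full[full["_merge"] == "right_only"]  # customer "d"

# 2) The extra (non-equi) condition: join on the key first, then filter the
#    matched rows on the date range, i.e. the "explode then filter" pattern.
in_window = matched[matched["order_date"].between(matched["start"],
                                                  matched["end"])]
```

The filter-after-join step is exactly the workaround described above; the point of the feature request is to push that range condition into the join itself so the intermediate table never gets built.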
One thing I found with the open-source version of the KNIME Analytics Platform is that scheduling jobs is difficult. If the scheduler were improved in the open-source version, the software would be easier to schedule properly and to use efficiently.

The second difficulty I faced was data processing time. When we process large chunks of data locally, it is very slow; moving even one gigabyte of data seemed very slow, and we do not want to move such big data often. So the second thing I would really like to see is the ability to handle large amounts of data locally in an efficient manner.

The third area that could be improved: when we have a large amount of data, say five gigabytes, one panel is ignored completely, which hurts the results of our data processing. I would really like to see load balancing and the overall processing time substantially reduced.

In short, what I would most like to see is the ability to handle large amounts of data and improved processing performance.
It needs more examples, use cases, and MOOCs for learning, especially with respect to the algorithms and how to practically create a flow end-to-end. The learning curve is steep.
They could add more detailed examples of each node's functionality, how it works and how to use it, to make getting started easier.