What is our primary use case?
This solution is primarily used for various data analytics in an enterprise environment.
The reality of any data analytics project, including Data Science, is that 90% of the effort goes into data sourcing and preparation. Data usually comes from multiple sources, including data warehouses, web scraping, Excel input, free text, etc. KNIME lets you handle that 90% and also offers predictive functionality.
How has it helped my organization?
It is a free, open-source tool that performs much like expensive commercial tools. KNIME has been great for me over the years. It allows me to connect to various sources, including data warehouses, then put together the processing logic (ETL-like), which can be quite complex, and produce the required output. Ultimately, that output goes into Excel or Tableau for presentation.
What is most valuable?
The features that I find most valuable are:
- The visual workflow tools for custom and complex tasks always beat raw coding on agility, speed of delivery, and ease of subsequent changes.
- Unlimited volume of data; you are only limited by the machine you run on.
- Python and R integration.
- Predictive functionality and text analytics. If that is not enough, you can use custom Python and R scripts.
- Looping functionality.
- Variables allow you to parameterize your flows.
- Run one node at a time, which is something that Alteryx users dream of doing.
- Managing (collapsing) sub-flows, which is another thing that Alteryx container users dream of.
What needs improvement?
The areas that I feel need improvement are:
- The joiner node needs three outputs (left unmatched, matched, right unmatched), as competitors offer (I have not checked the 2019/20 releases).
- I need the ability to add extra comparison conditions to a join. For example, in SQL you can keep only rows whose date falls within a date range from the joined table. At the moment in KNIME, you have to let a join explosion take place and filter what you need afterwards, but sometimes the intermediate output becomes too big.
- It would be helpful to have more examples of Java code for nodes, like Java Snippet.
- I would like row counts to be shown on the canvas, as that would improve control and speed when building a workflow.
- The pseudo-code types could be rationalised into one (e.g. only Java).
- I would like to see better web scraping, because every time I tried it, it was not up to par, although you can use a Python script instead.
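To make the two joiner points above concrete, here is a minimal sketch in pandas rather than in KNIME itself, so it only illustrates the behaviour I am asking for. The table names, column names, and sample values are all hypothetical. The first function splits a full outer join into the three outputs (left unmatched, matched, right unmatched); the second shows the current workaround for a date-range condition, which is to join on the key first and filter afterwards.

```python
import pandas as pd


def three_way_join(left, right, key):
    """Full outer join split into left-unmatched, matched, right-unmatched."""
    merged = left.merge(right, on=key, how="outer", indicator=True)
    left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    right_only = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
    return left_only, matched, right_only


def range_join(events, periods, key):
    """Join on key (the 'explosion'), then keep rows with start <= date <= end."""
    exploded = events.merge(periods, on=key, how="inner")
    in_range = exploded["date"].between(exploded["start"], exploded["end"])
    return exploded[in_range].reset_index(drop=True)


# Hypothetical sample data: orders per customer and dated rate periods.
orders = pd.DataFrame({
    "customer": ["a", "b", "c"],
    "date": pd.to_datetime(["2020-01-15", "2020-02-10", "2020-03-05"]),
})
rates = pd.DataFrame({
    "customer": ["a", "a", "b", "d"],
    "start": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-01-01"]),
    "end": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31", "2020-12-31"]),
    "rate": [1, 2, 3, 4],
})

left_only, matched, right_only = three_way_join(orders, rates, "customer")
in_period = range_join(orders, rates, "customer")
```

Note that `range_join` still builds the full intermediate join before filtering, which is exactly the step that can become too big on large inputs.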
For how long have I used the solution?
I have been using KNIME for between four and five years.
What do I think about the stability of the solution?
It may occasionally crash, as other similar tools do, although autosave is available.
What do I think about the scalability of the solution?
Scalability is limited to what a desktop application can handle.
How are customer service and technical support?
Because it is an open-source application, your support options are limited, but I have found answers on forums when I needed help.
Which solution did I use previously and why did I switch?
Recently I have been using Alteryx, so I have collected a few points on the differences between the two tools. Both are good, and I can conclusively say I could go back to KNIME and be as effective a data professional as I am with Alteryx.
I have to use Alteryx due to my client's tool choice, but I know that what I am doing with Alteryx right now could be done better in KNIME. Of course, Alteryx has its own advantages for certain areas.
How was the initial setup?
It is a relatively simple install. You can even avoid installing it and run from a directory.
What's my experience with pricing, setup cost, and licensing?
KNIME is free as a stand-alone desktop-based platform, but if you want a KNIME server, you can find the cost on their website. The fact that KNIME is open source may create challenges from an IT security point of view in an enterprise environment.
Which other solutions did I evaluate?
For this review, I would include Alteryx and Lavastorm (the latter is no longer available).
What other advice do I have?
If you need good visualisation functionality, you should use Tableau or something of that caliber. However, the data prep can be done in KNIME, which would give you extra confidence that what goes into your visualisation layer is correct.
Overall, KNIME is definitely worth considering.
Which deployment model are you using for this solution?