What is our primary use case?
I've handled several projects with this solution since college. The most recent one was for a company from India. They wanted a classification of cars by the type of engine they have and the pollution levels they produce.
The data included a mixture of text fields that had to be classified. You can't run classification directly on text data, so it first had to be transformed into a data type that could be classified easily. I had to clean the data and set up all the variables to suit the required information.
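As a rough illustration of that kind of preparation, here is a minimal sketch in plain Python (not Weka itself). It assumes the cleaning step is trimming and lower-casing, and the transformation is a one-hot encoding; the engine-type values are hypothetical and not taken from the client's data.

```python
def one_hot_encode(values):
    """Map each distinct text label to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical raw engine-type column with inconsistent spacing and case.
raw = [" Diesel", "petrol ", "Hybrid", "diesel"]

# Cleaning step (assumed): trim whitespace and lower-case every entry.
cleaned = [value.strip().lower() for value in raw]

categories, encoded = one_hot_encode(cleaned)
print(categories)
print(encoded)
```

After this step every row is a list of numbers, which is the form most classifiers expect.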
How has it helped my organization?
The client has not provided me with a review yet.
However, I have benefited greatly, as it has given me the confidence to tackle tedious projects that clients did not want to take on in Python or any other data analytics or data science software. Knowing that I have the analytical skills, and being able to translate them to software-based platforms and move between them, has given me proof of what I can do.
Besides a lack of experience, the fear of taking things on is what might hold you back. Over the years that I've used Weka, confidence has been the most important thing I've gained from the solution.
What is most valuable?
The features that I found most valuable are the classification features. They provide a lot of information and insight. With classification, there's always the option to split the data into two datasets. You can split one main dataset into a training set and a test set, and the performance can easily be measured after you've trained a model.
With clustering, if it's a yes, it's a yes, and if it's a no, it's a no. It reports 100% accuracy for a model that has been trained, and in most cases that is misleading. That is why classification is more valuable than clustering.
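To make the split concrete, here is a minimal sketch in plain Python rather than Weka. The 80/20 ratio, the seed, and the function name are assumptions chosen for illustration only.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut it into a train part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 100 rows, represented here by their row numbers.
rows = list(range(100))
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

The model is trained only on the first part and scored only on the second, so the reported performance reflects rows it has not seen.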
What needs improvement?
When you open the software, there's a section labeled Filter where you choose your filtering. The filter section lacks some specific transformation tools. If you want to change a variable from numeric to categorical, I could not find a feature that lets you do that. This needs to be improved.
Also, under the classification section there are cases where you cannot use test data alone or training data alone. Under both classification and clustering, there should be an option to supply the training data first when you classify a dataset, and then supply the test data later on.
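The transformation described here, turning a numeric variable into a categorical one, amounts to binning. A minimal sketch in plain Python, outside Weka, could look like the following. The cut points, labels, and emission values are hypothetical.

```python
from bisect import bisect_right


def discretize(values, cut_points, labels):
    """Assign each numeric value the label of the bin it falls into.

    cut_points must be sorted in ascending order; a value equal to a
    cut point goes into the higher bin.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(cut_points, value)] for value in values]


# Hypothetical pollution readings and thresholds, for illustration only.
emissions = [95.0, 120.0, 180.5, 250.0]
levels = discretize(emissions, cut_points=[120.0, 200.0],
                    labels=["low", "medium", "high"])
print(levels)
```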
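The workflow being asked for, train first and supply the test data later, can be sketched in plain Python as two separate steps. The majority-class model below is a deliberately trivial stand-in, and the labels are hypothetical; it is not how Weka works internally.

```python
from collections import Counter


class MajorityClassifier:
    """Toy model that always predicts the most common training label."""

    def fit(self, labels):
        self.prediction_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_rows):
        return [self.prediction_] * n_rows


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Step 1: only the training data is supplied.
train_labels = ["yes", "yes", "no", "yes"]
model = MajorityClassifier().fit(train_labels)

# Step 2: the test data is supplied later, to the already trained model.
test_labels = ["yes", "no", "no", "yes"]
score = accuracy(model.predict(len(test_labels)), test_labels)
print(score)
```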
If they went full open-source, like Python and R, it would help the growth of the solution.
For how long have I used the solution?
I used Weka in my undergraduate studies. I've majored in mathematics for the past several years.
I used the solution for three years as an undergraduate, having started with Weka in my second year of school for daily tasks. Then I used it for two years for my master's. I didn't use it for my Ph.D. program, as I was researching how to use Python and integrate it. That makes five years during my studies. In the field, for contracts and employment, I've used it for another two years, so that makes a total of seven years that I've used the solution.
What do I think about the stability of the solution?
The stability depends on the predictor variables, that is, the independent variables, that have been employed. You have to find predictor variables that are a true representation of the response variable. If they are, then you get higher accuracy percentages, which is a true reflection of the classification algorithm and its performance.
If you've selected the most accurate predictor variables, then you will have a highly stable solution. However, if you've selected a predictor variable that does not relate well to the response variable, then the solution will not be very stable.
For example, the reasons why an individual might stay loyal to a telecommunication company, or opt for another one, revolve around things like the subscription fee and the speed of service. We look at things that relate to the company itself and to the financial relationship of the individuals subscribed to its services. Things like age and gender are not very important. When you choose the best variables, you will definitely get a highly stable solution.
What do I think about the scalability of the solution?
I've never tackled very, very big datasets with this solution in the way I've tackled them with other data science software. However, from what I know, it is highly scalable and can handle very big datasets without any complications.
A very big dataset is a dataset that has, for example, more than 100,000 rows, or rows that run into the millions.
How are customer service and technical support?
The technical support is highly responsive. The university developed a 360-degree problem-solving platform. You'll find it entirely manageable and interactive. Alongside that, if you get stuck at any point, someone will get back to you via email or live chat. They're highly interactive, and they're always there. It's why this is a very, very good platform.
Which solution did I use previously and why did I switch?
I have experience with solutions like R, Bacillus, and Python for data science. Python, in itself, has the best visualization ever. You can clearly see graphs nicely plotted, with normal distributions perfectly done, key distribution perfectly done, perfectly elaborated and perfectly labeled.
This is different from Weka, where, when you want to visualize everything all at once, you get tiny graphs. With Python, you can visualize all the graphs, and it doesn't matter whether there are 100 of them. All you have to do is change the scale of the figure, and then you will have a longer chart, but with highly defined graphs. If you want to visualize one particular graph, then the visualization will also be clear.
A pro on the side of Weka is that you do not need to have programming skills. With Weka, you just point, drag, drop, click, and run, and things show up. If you are a data scientist or a data analyst and you don't have enough coding skills, then Weka is the right tool to use. But if you are good at coding, then you can go to Python.
How was the initial setup?
The process is straightforward. They did excellent work in simplifying the installation of Weka. They also have tutorials on different platforms that make it easier for an individual to look things up, go back and forth, and clear errors.
The data and the specifications that I set up when running any machine learning algorithm in Weka are easy. How long a run takes really depends on the size of the dataset. For instance, let's say I have one dataset with 400 rows and another with maybe 50,000 rows, and I want to classify a specific variable that has yes or no entries. With 10-fold cross-validation, the one with 400 rows will be classified faster, while the one with 50,000 rows will be classified slower, as every one of the 10 folds has to work through far more rows.
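A short sketch in plain Python (not Weka) shows why the larger dataset takes longer under 10-fold cross-validation: each fold trains on nine tenths of the rows, so the total number of rows processed grows with the dataset. The function name is hypothetical, and the row counts are the ones from the example above.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


# Count how many rows are used for training across all 10 folds.
for n_rows in (400, 50_000):
    rows_trained = sum(len(train) for train, _ in k_fold_indices(n_rows, k=10))
    print(n_rows, "rows ->", rows_trained, "training rows over 10 folds")
```

In practice the rows are usually shuffled before the folds are cut; that step is left out here to keep the sketch short.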
What's my experience with pricing, setup cost, and licensing?
I use both the paid and the open-source versions of the product. If you're a client and you don't want very many details incorporated in your solution, then we will go full open-source. Open source doesn't have very many solution alignment incorporations. However, the paid version has very many options and stuff that needs to be incorporated when providing a solution. It depends on the specifications of a client which we would use. It's not about the price.
What other advice do I have?
The solution is a desktop application. I did not deploy it on the cloud, actually. It's an application that is on my desktop, on my laptop.
If someone wants their task done faster and does not have enough coding expertise, this is definitely an excellent solution to choose. If they want additional experience, then Python and R might be a good option. Working with Weka feels like using something like Microsoft Power BI. With Python or R, you're actually giving a data scientist a run for his money, as things change and evolve every day, and you have to dig deeper and provide new things.
Overall, I'd rate the solution nine out of ten. It's tied with R in terms of how I would rate it. However, I find Python the best.