What is our primary use case?
Weka is a machine learning tool. We use its supervised and unsupervised learning algorithms for anomaly detection, clustering, and classification.
The deployment method depends on the business's requirements. When I worked at the Air Force, it was all cloud. I deployed it in the cloud, but that was treated as on-premises because the environment was confined within the Air Force. It depends on the requirements of the user. If they want it on-premises, I can provide that. If they want it hosted on AWS or any other cloud service, that can also be done.
How has it helped my organization?
Our customers wanted a SQL-based query to generate anomalies based on the data. We had a good experience with that when there was a small dataset or a known set of attributes. If you have at least a definition of the differences between attributes, then you can use SQL, whereas machine learning is quite different. You don't have explicit cases; a kind of fuzzy logic is used to detect anomalies.
When they were using SQL, they had quality data. With Weka, we used a learning period, meaning the amount of data we used to train the model and generate a condition. The SQL approach was generating thousands of anomalies, and those were not correct, because SQL can only be used when the differences between the attributes are at least defined.
When I used Weka for processing, I used these kinds of algorithms, and the difference was very clear when I tested the output of the algorithm using different techniques. I ran a separate Java program to check whether the anomalies were being properly predicted. That is where I found that Weka is quite helpful compared to other programming techniques or SQL-based solutions.
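The contrast above between a fixed SQL condition and a condition learned from a training period can be illustrated with a small sketch. This is not the method used in the project and it does not use Weka; the class `AnomalyThresholdSketch`, its method names, and the mean-plus-k-standard-deviations rule are all assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: a fixed SQL-style rule versus bounds learned from training data. */
final class AnomalyThresholdSketch {

    /** Fixed rule, comparable to: SELECT value FROM t WHERE value > limit. */
    static List<Double> flagByFixedRule(double[] values, double limit) {
        List<Double> flagged = new ArrayList<>();
        for (double v : values) {
            if (v > limit) {
                flagged.add(v);
            }
        }
        return flagged;
    }

    /**
     * Learns the mean and (population) standard deviation from the training
     * period, then flags values more than k standard deviations from the mean.
     */
    static List<Double> flagByLearnedBounds(double[] training, double[] values, double k) {
        double mean = 0;
        for (double t : training) {
            mean += t;
        }
        mean /= training.length;

        double variance = 0;
        for (double t : training) {
            variance += (t - mean) * (t - mean);
        }
        double std = Math.sqrt(variance / training.length);

        List<Double> flagged = new ArrayList<>();
        for (double v : values) {
            if (Math.abs(v - mean) > k * std) {
                flagged.add(v);
            }
        }
        return flagged;
    }
}
```

The fixed rule only catches what someone thought to write into the condition (here, values above a limit), while the learned bounds also catch unusually low values because they are derived from the data itself.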
What is most valuable?
I like the machine learning algorithms for clustering. Weka has broad capabilities. There are multiple algorithms that can be used for clustering, and the choice depends on the user's requirements. For clustering, I've used DBSCAN, whereas for supervised learning, I've used AVM and RFT.
Weka is useful for analyzing any data set you want to analyze, or if you want to run algorithms on small data sets. When it comes to an enterprise solution, you can use the Weka libraries, or at least the algorithms that are readily available in the Weka libraries. In Java, I can manipulate all these algorithms and the Weka libraries to produce the desired result for a customer.
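For readers unfamiliar with DBSCAN, here is a minimal plain-Java sketch of the algorithm. It is not Weka's implementation and does not call the Weka API; the class `DbscanSketch` and its method names are hypothetical, and the brute-force neighbor search is only suitable for small data sets. Points labeled `NOISE` belong to no dense region, which is why DBSCAN is often a natural fit for anomaly detection.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Minimal DBSCAN sketch (hypothetical class, not Weka's implementation). */
final class DbscanSketch {
    static final int NOISE = -1;
    private static final int UNVISITED = -2;

    /** Returns one label per point: a cluster id (0, 1, ...) or NOISE. */
    static int[] cluster(double[][] points, double eps, int minPts) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;

        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> neighbors = regionQuery(points, i, eps);
            if (neighbors.size() < minPts) {
                labels[i] = NOISE; // may later be claimed as a border point
                continue;
            }
            // i is a core point: start a new cluster and expand it.
            labels[i] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(neighbors);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) {
                    labels[j] = clusterId; // border point, not expanded further
                }
                if (labels[j] != UNVISITED) {
                    continue;
                }
                labels[j] = clusterId;
                List<Integer> jNeighbors = regionQuery(points, j, eps);
                if (jNeighbors.size() >= minPts) {
                    queue.addAll(jNeighbors); // j is also a core point
                }
            }
            clusterId++;
        }
        return labels;
    }

    /** Brute-force search: indices of all points within eps of the center (itself included). */
    private static List<Integer> regionQuery(double[][] points, int center, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            if (distance(points[center], points[i]) <= eps) {
                result.add(i);
            }
        }
        return result;
    }

    /** Euclidean distance between two points of equal dimension. */
    private static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Calling `DbscanSketch.cluster(points, 1.5, 3)` on two tight groups of points plus one distant point should give the groups separate cluster ids and mark the distant point as `NOISE`. The `eps` and `minPts` values have to be tuned per data set.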
What needs improvement?
I believe there are a few newer algorithms that are not present in the Weka libraries. If I want a solution that involves deep learning, I don't think that Weka has that capability. In that case, I have to use Python to make predictions with deep learning algorithms.
What do I think about the stability of the solution?
Weka is a stable solution. It has been working well for the past two years. I spoke to a few of my work colleagues, and even an older solution that was built on Weka still works well. So Weka is definitely a stable solution.
What do I think about the scalability of the solution?
Weka is not horizontally scalable. If I had to run a large dataset through Weka, I would need a very large machine. If I add another node and want a cluster environment for Weka, it will not work. If I have a large amount of data from various sources, and it is possible to split it into several parts, I can install Weka on, say, four machines and then programmatically move the data onto those four machines.
In that way, Weka can be made to scale horizontally, but as a solution, it is not horizontally scalable. It is vertically scalable.
Weka doesn't require maintenance. Once the solution is built and deployed, nobody is required to maintain it. Weka is quite stable and doesn't cause any problems. If you want to deploy it in your enterprise, it needs to be properly implemented. Once it is properly implemented, no maintenance is required.
How are customer service and technical support?
I have never used their technical support.
Which solution did I use previously and why did I switch?
Python is quite a versatile solution. When I get data, it may not be in the format I require to run an analysis. Python is quite handy for that, and it is easier than Weka to implement.
Weka provides a UI. If a person is very new to machine learning, or if somebody just wants to run an analysis, Weka requires minimal programming, but you need to have knowledge of machine learning. If somebody doesn't have it, they can't implement it.
How was the initial setup?
The initial setup was very straightforward. I have been doing Java programming for the last 20 years, so Java is quite easy for me. Weka is written in Java and it is open-source. All of the source code is available with the Weka library.
When I implement a Weka solution in Java for a customer, it is quite straightforward because I just need to add the Weka JAR file as a dependency in the project, and then I can use all the functions and capabilities that Weka provides. That works very well. There is good documentation, and there are examples of how Weka's features can be implemented. It is quite easy to use.
The amount of time it takes to deploy depends on the requirements. For performance, it took me only a day, meaning eight hours of work, and I could provide a solution for the Weka part only. For the UI and for other things, that is different.
Hardware took quite some time because the data was too large. Weka is not capable of handling a large amount of data. They wanted the solution to be in Java, and we didn't have any other libraries to do that. So I split the data into smaller chunks and then ran the algorithms on each of those smaller data sets. I combined the outputs and then manually produced the results. In that case, it took around six months to provide them with a solution. So it can take a day, or it can take up to six months.
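The split, run, and combine approach described above can be sketched in plain Java as follows. This is an illustration under assumptions, not the code used in the project: the class `ChunkedProcessingSketch` and its methods are hypothetical, and the `analysis` function stands in for whatever algorithm is run on each chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch only: process a large data set in chunks and combine the results. */
final class ChunkedProcessingSketch {

    /** Splits rows into consecutive chunks of at most chunkSize rows. */
    static <T> List<List<T>> split(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    /** Runs the analysis on each chunk in turn and concatenates the per-chunk results. */
    static <T, R> List<R> processInChunks(List<T> rows, int chunkSize,
                                          Function<List<T>, List<R>> analysis) {
        List<R> combined = new ArrayList<>();
        for (List<T> chunk : split(rows, chunkSize)) {
            combined.addAll(analysis.apply(chunk));
        }
        return combined;
    }
}
```

One caveat with this design: an algorithm that looks at relationships between rows, such as clustering, only sees one chunk at a time, so the combined output is not guaranteed to match a run over the whole data set. That is likely part of why the combining step needed manual work.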
Implementing the algorithm doesn't take much time. What takes time depends on how much data a customer has and how clean the data is. In terms of performance, it was quite a good data set. Every field of their attributes was available. There was a feature called collation-based features; I used that, and it collated the results within a few minutes. Based on that, I implemented KLN. It all depends on the data set the customer provides, how clean the data is, and what output they want from that data set.
What was our ROI?
I think Weka is definitely a good investment, which is why we still use it. It has performance analytics as well, so I think it is a better solution than others.
What other advice do I have?
Weka is pretty comprehensive and easy to use.
This was the first time that I used machine learning. I have a master's in technology. I analyze small data sets to get insights into the algorithms. I learned a lot from all the files, then I implemented those in a Dell program.
Many features are not available, and there is not much development since it is open source. It should be developed faster. I would rate Weka a six out of ten for these reasons.
Which version of this solution are you currently using?