What is our primary use case?
Zaloni is a big data platform management tool, and it is extensively used. It has different connectors: you can ingest your batch data, and you can also use real-time streaming, but we haven't used that component. We have used it mostly for batch ingestion.
It's not a single product; it has multiple pieces within itself, and I think most of them are plug-and-play.
We are using the enterprise version of Zaloni.
How has it helped my organization?
The benefit of Zaloni is that it is readily deployable: the solution comes within the tool itself. It's mostly a matter of how you integrate it and how you establish your connectors, because it has a lot of connectors built in. Then you run your data pipelines in that span of time.
It's a commercial, off-the-shelf product with minimal configuration, and you can go to production within three months if you are handling data at a small scale. But if the volumes are huge, it takes more time.
Another good advantage of Zaloni is that it has a very good schema evolution process, although that process is tedious to run in the tool. Once you get through that difficult phase, it works very well. For example, when you want to ingest data, it has the ability to automatically find the right schema at that time and place; it then picks up that schema and processes the data accordingly. Historical data integration is another very good feature: you can ingest historical data whenever you want.
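Zaloni's schema detection is proprietary, but the general idea it implements — sampling incoming records and inferring a column type for each field before the pipeline runs — can be sketched generically. Everything below is illustrative and is not Zaloni's actual API:

```python
from datetime import datetime

def infer_type(values):
    """Infer the narrowest type that fits every sampled value."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            pass  # at least one value doesn't fit; try the next type
    try:
        for v in values:
            datetime.strptime(v, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "string"  # fallback: everything is representable as a string

def infer_schema(rows):
    """Sample string records (a list of dicts) and map each column to a type."""
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}

rows = [
    {"id": "1", "amount": "10.5", "day": "2020-01-01", "note": "ok"},
    {"id": "2", "amount": "7",    "day": "2020-01-02", "note": "late"},
]
print(infer_schema(rows))
# {'id': 'int', 'amount': 'float', 'day': 'date', 'note': 'string'}
```

A real platform would sample far more rows and handle nulls and type widening, but the picked-up schema is then used to run the ingestion, as described above.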
What is most valuable?
In terms of the most valuable features, Zaloni has different components. One is its ingestion capability, where you can create a lot of ingestions triggered at the file level or based on time. You have listeners continuously polling certain locations, and as data arrives they start picking it up. That is one use case we have used.
Another is batch ingestion, where you set up a timer. At the set time, it checks whether the file is present, then starts the data pipeline and triggers the jobs.
The advantage with Zaloni is that you can set up either kind of trigger — on data arrival or schedule-based — or you can have both, mixing and matching as needed.
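The mix-and-match trigger logic described above can be sketched as a small decision function. This is a generic illustration of the pattern, not Zaloni's API; the function and parameter names are hypothetical:

```python
def should_start(file_present, now, scheduled_at, mode="either"):
    """Decide whether to launch the pipeline.

    file_present : callable returning True once the expected file has arrived
    now, scheduled_at : comparable timestamps for the schedule-based trigger
    mode : "either" fires on arrival OR schedule; "both" requires both
    """
    arrived = file_present()          # event-based trigger (listener/poller)
    due = now >= scheduled_at         # time-based trigger (timer check)
    return (arrived or due) if mode == "either" else (arrived and due)

# File already there but before the scheduled time:
print(should_start(lambda: True, 10, 20, "either"))  # True
print(should_start(lambda: True, 10, 20, "both"))    # False
```

In practice the platform's scheduler evaluates this kind of condition on each polling interval rather than once.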
Once the ingestion starts, it has connectivity to the rest of the big data components. You can store the data on S3 or wherever you want; it has different connectors for on-premises and cloud targets. We were on AWS, so our primary storage was S3 buckets. It uses Hive as well. It cannot work on its own without big data components: you need the big data stack installed first, and Zaloni works on top of it.
It has the Bedrock architecture, a component that manages these schedules and all the other activities. Bedrock is another tool within Zaloni itself.
There is one more component for metadata management. It's called EMDM, I guess, but I don't remember the exact name. It handles business metadata: you can go in and write whatever fields describe the data, and it manages them.
What needs improvement?
The major pain point with Zaloni is that its exception handling is not good. If an event happens (an event being a job stopping in the middle of the process), it doesn't tell you at which point the job failed, and it doesn't tell the operations team what corrective action to take; you have to call Zaloni to identify the issue. That is one problem.
Another issue is that jobs sometimes fail intermittently: if you run the same job a second time, it goes through.
A third area for improvement is the deployment process, which involves a lot of manual steps. For example, you have to create your password in an encrypted format and then follow a long manual deployment procedure. They should build something around a tool like Jenkins, or otherwise automate the process. I have suggested to them that they improve this, because as it stands everyone runs deployments manually. A single deployment takes about half a day; then you test it, it might not work, and reverting is not easy because the process is manual.
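The automation I suggested could look something like the sketch below: script the deployment, keep the previous release on disk, and revert in one step when a smoke test fails, instead of undoing half a day of manual work. This is a generic sketch, not Zaloni's tooling; the file names and layout are hypothetical:

```python
import shutil
from pathlib import Path

def deploy(artifact: Path, release_dir: Path, smoke_test) -> bool:
    """Copy the new artifact into place, keeping the old one for rollback."""
    current = release_dir / "current.jar"
    backup = release_dir / "previous.jar"
    if current.exists():
        shutil.copy(current, backup)      # snapshot the running release
    shutil.copy(artifact, current)        # deploy the new version
    if smoke_test():                      # automated check replaces manual testing
        return True
    if backup.exists():
        shutil.copy(backup, current)      # one-step revert on failure
    return False
```

A CI server such as Jenkins would run these stages (and inject the encrypted credentials) on every release, rather than a person doing each step by hand.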
I think in recent versions they have added a lot of upgrades and additional features, including many integrations. Before, it was just AWS; later they extended it to Azure. I'm not sure whether they have extended it to GCP.
There are a lot of improvements now, because the UI and the features used not to be easy to navigate. As for showing metrics, it's okay: neither easy nor hard, and it provides whatever is required.
Lastly, on the governance side, it's not very good. We faced some issues with the Ranger version: Ranger is the authorization tool in the big data stack, and Zaloni had some compatibility issues with it at that time. They later said those were all going to be resolved, but at the time there were issues. I'm not sure whether it worked with Sentry or not.
For how long have I used the solution?
I have been using Zaloni Data Platform for the last 18 months.
What do I think about the stability of the solution?
Stability-wise, it's good. I don't see many issues, except one thing: about once a week a job would run into an event. But after that, if you resubmit the job, it goes through. So I don't see an issue once you have it established. It's a good product to continue with.
When you say maintenance, there are two things to look at: one is daily operations, and the other is version upgrades and security patches. At my employer, we relied on the Zaloni team for maintenance activities like version updates.
Our own team handled daily operations. I was part of the team managing them: making sure the cluster was up, jobs were running properly, and the data was updated for the next business day and available to business users.
What do I think about the scalability of the solution?
In terms of scalability, we were the first to implement it at scale. We tried and tested scaling on AWS, and it scales well. The only caveat with auto-scaling Zaloni is that the underlying cluster must itself be scalable.
We implemented it on AWS, and once we defined the threshold, I think we were able to run on five different instances, with auto-scaling enabled on AWS.
That was the platform with Zaloni itself. Initially we implemented a big data solution for massive data, almost one petabyte. The entire need was to ingest it into the big data platform, and Zaloni was the fastest product to test and take into production.
Before that, there had been attempts to build our own clusters without any tools, but those were not successful. So they went and bought Zaloni and implemented the solution.
Because of the complexity involved, my employer was trying to switch to other platforms. They wanted to try different tools rather than stick with Zaloni, given the difficulty of managing it and of version upgrades: every time, you need somebody from Zaloni to look into the issues.
They were identifying different tools and experimenting with them.
How are customer service and technical support?
Their support is very good because there is a dedicated support team for our employer. You can call anytime, 24/7, people are available, and they make sure things are taken care of.
They are quite responsive in that.
Which solution did I use previously and why did I switch?
I had experience with similar solutions, but Zaloni is somewhat different. They market it as a big data management tool, but it isn't really, because it handles only part of data management.
Take the example of Cloudera: it makes your job of managing clusters easier because it has a lot of built-in features and UI capabilities, so you can add a node at any time, decommission a node, or rebalance the cluster. Zaloni doesn't have many of those capabilities. It's better viewed as an ingestion tool — actually a little more than an ingestion tool, because it includes some metadata management and you don't have to procure another license for that.
On the governance side it doesn't help much, but with additional features you can manage data confidentiality through plug-and-play solutions. You can encrypt all the data or only certain fields in the database, and you can do it while ingesting or after the data is ingested. It has the ability to keep the data protected throughout your pipeline.
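The field-level protection described above follows a simple pattern: leave most columns untouched and transform only the sensitive ones during (or after) ingestion. The sketch below illustrates that pattern generically; it uses a salted SHA-256 hash as a stand-in for real encryption, which in a production pipeline would use proper key management — none of this is Zaloni's actual implementation:

```python
import hashlib

SENSITIVE = {"ssn", "email"}  # hypothetical list of confidential columns

def protect(record: dict, fields=SENSITIVE) -> dict:
    """Mask only the sensitive fields of one record; other fields pass through."""
    salt = b"demo-salt"  # illustrative; a real system fetches keys from a KMS
    return {
        k: hashlib.sha256(salt + str(v).encode()).hexdigest() if k in fields else v
        for k, v in record.items()
    }

row = {"id": 1, "email": "a@b.com", "amount": 9.5}
masked = protect(row)
# 'id' and 'amount' are unchanged; 'email' is irreversibly masked
```

Applying a function like this per record during ingestion is what lets the pipeline store certain columns only in protected form.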
How was the initial setup?
For the initial setup, you need to get Zaloni support involved. We had evaluated different tools like Talend and other ingestion tools — even AWS has a tool of this kind, though I forget its name — to see how they could simplify the process.
Whatever workflows you create, you have to create in Zaloni itself. If you create them outside Zaloni, it knows nothing about them, so you use the scheduler built into the platform. It has the ability to integrate your data pipelines, you can deploy your code within it, and you can manage your metadata.
What was our ROI?
I would say there was a return on investment — within about two years — but it depends on how well you can sell your data. It depends on how the organization looks at it; it changes with everyone's perspective.
If your organization is really interested in selling the data and making money by exposing it through APIs, then you can definitely get the revenue much faster.
But that's not the model in my organization. The model there is to give data to business users so they can work with it with less effort.
What's my experience with pricing, setup cost, and licensing?
I don't know exactly how the licensing works, to be honest, but it's quite expensive. We paid around $150,000 to $200,000 per year. There are no special pricing charges; it's just the license, regardless of the number of users and regardless of how much data you're processing. They don't have different licensing structures.
Which other solutions did I evaluate?
We are thinking about Talend, which is a similar product. There are a lot of different products that could be used. In another organization, Apache open-source tools could also be used. And if you are really well funded, you can go with Pentaho Data Integration, a similar tool that is much easier. Informatica's big data component is another one.
What other advice do I have?
I would recommend Zaloni, but before choosing, evaluate the different options so you know which one is better for you. Even among similar products, some are good in one aspect and not good in others.
Take the example of Informatica. Informatica is very good for data warehousing platforms, but when it comes to big data, and you have to load data into the big data platform, it has a lot of difficulties. I had to use another component that handled the data better.
So it's based on the need: what is your objective, and where are you pulling the data from? If it's simple — you're pulling data only from an RDBMS — then you can rely on Zaloni or any other product and use it right away, out of the box; you don't need any other tools. But maybe you are in a specialized environment with a lot of restrictions, like a financial institution, where you're not allowed to pull data from just any source. What they do is offload the data from the database and put it on some server, and you then access the data from that server. There are lots of layers built in to manage security. If that is your situation, you have to look at which product suits you best.
There are areas for improvement. One in particular is the manual deployment process that was in place; that's one of the biggest challenges. Also, creating entities could be made much easier than it was.
On a scale of 1 to 10, 1 being the worst and 10 being the best, I would rate Zaloni a seven.
Which deployment model are you using for this solution?