Please share with the community what you think needs improvement with Collibra Governance.
What are its weaknesses? What would you like to see changed in a future version?
Collibra is very good at talking to modern database systems such as a typical RDBMS (e.g. DB2, SQL Server, or Oracle). Where it isn't great is with the older technologies that you'll typically find in the finance or insurance industries (e.g. VSAM or ISAM). It just doesn't connect with them very easily. They do provide the ability to use a separate product called MuleSoft, which they used to license as a bundle up until last year, after Salesforce bought MuleSoft; that split is happening in 2021. With this 'bolt-on', you could go and get that data, but you had to write that code and maintain it yourself. It wasn't an out-of-box (OOB) feature, and OOB features are what we really liked about the Collibra offering. Our only way to access these older technologies was to create a MuleSoft flow, then maintain and deploy it. This leaves us with technical debt that will need continual maintenance. In fact, we built all our custom MuleSoft flows using Mule 3.x and will soon be pushed to upgrade to Mule 4.x. This will not be a simple upgrade and will likely result in additional cost to bring in consulting resources more familiar with the technology. Since we have a lot of older legacy systems, things that aren't greenfield, if you will, it adds a lot more overhead than we were led to believe when we originally purchased the product. We're not that deep into the Collibra product yet because it's only been a couple of years. We do like the ability to automate workflows. For example, if somebody comes in to say, "I want to request access to this data," you can build your own workflow to automate the approval process. Some workflows are out-of-box, but I think they could go a little further with them, instead of us having to create a workflow manually, get somebody to code it, and implement it. I think they could offer a bit more in that respect.
The second item that I think they could do better at is to offer a taxonomy per industry that says, "Here's what a policy is. Here's what a customer is," that kind of thing. They don't implement that out-of-box in Collibra; you have to do that yourself, whereas other products bring that to the table. Informatica, I believe, has its own insurance-industry or industry-specific taxonomy that comes with the product. Without that, adding new logical constructs to Collibra takes more manual effort. The classification becomes more manual because you don't get something out-of-box that says, "Hey, I recognize that that's a policy, because I know that from the taxonomy." You have to manually make that connection.
I'm fairly new to the product; however, what I generally hear from my clients is a requirement for ways to ingest more metadata. Currently, Collibra provides a catalog platform that helps you integrate or get metadata from a few commonly known platforms, like Tableau, IBM Db2, and Informatica. If they could bring in more connectors to help us ingest metadata from other systems as well, that would be really helpful. That would save us a lot of time and effort. Backward compatibility would also make it much better. I've also worked on other technologies, primarily Java, which is very much backward compatible: any new implementation they bring in does not heavily impact your existing work. It would be helpful if Collibra was similar.
The solution needs to be controlled. It can sometimes get out of hand. The speed, especially now that we have moved to the Collibra Cloud, has not been the best, and the tool does not manage performance very well. This is also partially due to the fact that we need to use a VPN and have a lot of security measures in place. Sometimes it doesn't work well together with everything else. That is the main pain point that we are having. Occasionally we get little bugs; however, this is typical. We would like to have a data lineage feature. It is already available, but in a separate module, as are some advanced connectors. From my perspective, I would like to see improvement in dashboard creation, to make it easier to create a really nice dashboard and to be able to play with the user interface of those dashboards.
The connectors are not very sophisticated. They can handle, for example, Informatica and Tableau, but the connectors themselves could be improved. I recently renewed the Collibra subscription for one more year for another 600K, and the author licenses are not used much. They also keep changing the UI platform; that can be improved too. From an administration perspective, I like the white-glove onboarding part of Collibra. That was actually nice and I really liked it. For administration in general, I like that you can use Collibra however you want. It's more raw and easily adaptable. You can cook it or you can steam it or you can make changes to it in a lot of different ways, but it would also be nice if there were an already-available analytics tool like Tableau at hand. Still, it is easily adaptable, and you end up with a complete end product that you can really leverage.
There are many new aspects of the solution; however, I haven't yet gone through the documentation to see whether they really help solve issues. Many features have recently changed their appearance and I need to re-learn how they work. Sometimes, if a client needs a specific customization, we cannot do it directly. The client needs to reach out to Collibra and request the customization. The technical support is very poor.
Collibra, as far as I know, does not have a connector for systems like Oracle or a mainframe. It's important to have a connector so that you have access to up-to-date information. Sometimes the data can be out of date because the updates are not automatic, so users could be looking at obsolete information. You need to be precise about the names of the fields and you have to develop them yourself. It's my understanding that they are working on a solution where you can import all the information that you need from a data validation tool or from a CRM. It's something they really need to get better at. It would be better if there was a way to import all data and metadata automatically in one block.
I am a business person and a team leader. My duty is to ensure that the data governance processes are set up; that's how I started to use Collibra. There are certain limitations I have observed in Collibra. With regard to our data lake, Collibra doesn't give us direct connectivity to the Azure Data Lake. We have to establish data lineages ourselves. We have to browse those files manually and then connect them via Collibra; that's how data dictionaries get published. Overall, it's quite a manual process that needs a lot of human intervention. I've been hearing that tools like Talend are going to be available soon, which we hope to leverage in the near future. Talend is similar to other ETL or Informatica-type tools. It connects directly to the source system, captures all the transformations, and provides you with a spreadsheet describing the data lineage, which can be fed into Collibra. If this functionality could be improved, it would be a great time-saver. It would require less effort and it would be a more automated kind of system, less dependent on human operation, which means that it would be less prone to errors as well. We create workflows and handle issue management with Collibra. With regard to workflows, I find that they can be made very simple. For example, a request goes directly to the person who is in charge of that particular asset, and some simpler workflows can be assigned to it. However, I find that the default issue management process in Collibra is really complex. It wasn't really helpful to us.
One problem is the data lineage, especially extracting the ETL transformations from different ETL tools and identifying how the data changes from one layer to another and how the transformations are applied. It doesn't support all the ETL tools for extracting the transformation logic. It supports some of the tools, but there are still some that need to be supported. There is also a small pain point in terms of integration. There has been a bit of a change in strategy on Collibra's end. Earlier, they used to offer two solutions. One was out-of-the-box, and one was a custom-built solution for which they used to provide a dual connector. Now the focus from the Collibra side is more on using the out-of-the-box connector. They are discouraging custom integration. That leaves us with two problems. The first problem is that the out-of-the-box connector is not yet enabled for a lot of systems, and the second problem is that the out-of-the-box connector has certain limitations. If we want to tweak those to suit our needs, it is not possible. The custom-built approach is still supported, and you can still build a custom integration by using the API, but it is not really encouraged by Collibra. Its dashboard also needs to be improved. There are options to use HTML code to customize your dashboard, but it has a lot of limitations.
It should have more integrations with things like CyberArk because its main purpose is GDPR implementation. There needs to be more scope for features that strengthen privacy. CyberArk makes sure your credentials are vaulted and your assets are secure when you're creating your integrations or connecting to an application. I do believe that they are working on this feature.
The workflows and the language they use need to be improved. Being able to program each user's needs into the workflows is a key improvement that is required. In addition, they haven't updated their training solution in a while. We need to implement a lot of things ourselves, and they want us to move to the cloud, but there are a lot of glitches in the system. There are three environments: stage, development, and production. Often things work well in the first two environments and then, when you get to production, they don't work. It happens a lot and their response is slow.
We have an issue with metadata history. If someone changes the metadata, we can't see who changed it. They are working on upgrading the system based on this feedback, but we are still waiting for a proper log to maintain the solution.
The issue may be the way it's been implemented in my company, but for Collibra to be really useful, what's missing is an easy way to connect to different data sources and different types of data sources and actually ingest and profile some of that data. That's the trouble we've always had in getting wider adoption of the tool. Unless there's a mandate from the enterprise data office or the like, regular users are not going to use the tool for really robust business use cases without having some actual data in there. I know there is some out-of-the-box capability for this, but I think it needs to be easier for Collibra to actually ingest and run some basic profiling on the data itself. That's currently missing from the tool.
Connecting to a data source is not very easy. If there's a firewall, it is difficult to connect to the database, and configuring the connection on the database side is not easy either. Right now, the client is decommissioning the MuleSoft integration and moving to APIs. Collibra Connect and the MuleSoft integration were there before; however, now there's a move to APIs. Within a year or two, everyone will move to APIs. Whoever is using MuleSoft and Collibra Connect now needs to find another way to connect through the API. I don't think they are providing additional software for MuleSoft integration. Primarily, they are telling us, okay, we will decommission this and move to APIs. The only thing that's lacking with this change is the database connection. Sometimes the connection causes issues when the data has to pass through the firewall during ingestion.
The breadth of available connectors for metadata ingestion needs to grow quickly to support customers as they expand their data governance programs to include a diverse list of source systems from which they want to derive business value. The connectors are needed to bring metadata into Collibra and enable lineage, workflows, definitions, etc. That said, this is not just a Collibra problem; this is an everybody problem. The central challenge is the availability of APIs to ingest text structural metadata, which is a common problem across any data governance platform or even any integration platform, honestly. To be fair, I would say that Collibra's purpose and primary value is as a collaboration platform, which is the core value of business-centric data governance, and not as an integration platform. For this purpose, they are clearly the leading solution.
I'm checking the following two products: Collibra Data Governance and Azure Purview.
I'm looking for your input on the differences between them. What does Azure Purview offer that Collibra Data Governance doesn't?
Thanks in advance for your help!
What are the key differences between MDM and Data Governance?
What are the practical differences in how each of these solutions is applied?