Collibra Catalog vs StreamSets comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Collibra Catalog
Average Rating: 7.8
Number of Reviews: 5
Ranking in other categories: Metadata Management (3rd)
StreamSets
Average Rating: 8.4
Number of Reviews: 24
Ranking in other categories: Data Integration (8th)
 

Featured Reviews

PalakKhaneja - PeerSpot reviewer
Mar 7, 2024
Has an efficient feature for endpoint lineage, but could provide a wider range of connectors
The product’s primary use case is metadata management. It helps us capture different datasets, including images, glossaries, etc. Collibra Catalog has significantly enhanced data governance and compliance for our team, primarily through its valuable feature of endpoint lineage enabling visual…
Kevin Kathiem Mutunga - PeerSpot reviewer
Mar 24, 2023
Enables us to build data pipelines without knowing how to code and helped us break down data silos within our organization
Using StreamSets to create pipelines for batch streaming or ETL is easy and straightforward. However, if one is new to StreamSets, it may not be so simple and may require a lot of documentation for assistance. We utilize StreamSets' ability to connect to enterprise data stores, making it easy to begin trading instantly without needing to be technically skilled. We use StreamSets to move data into analytics platforms. In my experience, it is initially quite easy to move data back if we have a clear understanding of data transit, importation, and exporting from external sources.
This solution enables us to build data pipelines without knowing how to code. The solution includes templates that guide us and help us customize our data easily. It is essential that StreamSets does not necessitate coding, as this saves a considerable amount of time that would otherwise be spent writing code, as well as resources that would be required to hire experts.
Transformer for Snowflake can help with both simple and complex transformation logic. For example, creating a plan to perform ETL and machine learning operations is easy and fast. However, if the same operations are performed on-site, it can be difficult to troubleshoot events due to limited visibility into the results. StreamSets' Transformer for Snowflake is important to us because it saves us a lot of time and enables us to complete a task remotely with only two or three people. It is important that Transformer for Snowflake is a serverless engine embedded within the platform.
We have the capability of creating a data operations platform, so we don't have to worry or even be aware of what we are doing at the moment. We can simply create a device and use it in the pipeline we want it to be in. The solution improved the way we work, benefiting both our customers and our development and retainer teams.
StreamSets helps us develop a platform manually, with a lot of teamwork, either remotely or on-site, depending on which option we use. This has had a significant impact on our organization in terms of how we process and transform data. I would say that it is very easy for us to update the template so that we can have real, actual data in APL claims and in the supply chain.
StreamSets' data drift resilience is very effective and can run in the data grid. The data drift resilience has reduced the time it takes us to fix data drift breakages by approximately 25 percent. StreamSets helped us break down data silos within our organization. The ability to break down data silos helps us gain quick insights. In general, it is a great feature that ensures we have activities or processes in place. We know precisely what to prevent and what to implement.
StreamSets saved us around 30 percent of our time, meaning that a task that would take five hours to complete manually can now be done in around three and a half hours. The reusable assets are reducing workload by 35 percent by allowing different people to use a single platform or resource, regardless of whether they have a similar SKU or a different SKU. This feature can help an organization simplify, implement, and transmit more easily.
It is not only the cost of one packet that we paid for, but now we are implementing a strategy using different people within the company. It would be very expensive if we had to hire a new person to manage that task and it would also take a lot of time. StreamSets is not only saving us money, but it is also ensuring that we complete strategies on time. StreamSets also helped us scale our operations, which has had a significant impact on our business. We now have a better understanding of how to secure data and provide reliable security for the transmission of data from internal servers to external services, as well as meeting our client's application needs.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"Collibra Catalog's best feature is the data quality checker."
"We have had no complaints about the stability."
"Collibra Catalog has significantly enhanced data governance and compliance for our team, primarily through its valuable feature of endpoint lineage enabling visual representation of the data."
"The data lineage capability is valuable as it shows how different sources are connected and how data flows, which is crucial for projects like migrations. Moreover, data lineage visualization in Collibra Catalog aids our data governance initiatives."
"Collibra Catalog is simple to use and user-friendly for those who are not technically inclined, since it is easy to find things and to see data lineage diagrams."
"What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
"It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated."
"The most valuable features are the option of integration with a variety of protocols, languages, and origins."
"The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
 

Cons

"I'd like to see more integration with other reporting sources."
"A key area for improvement in Collibra Catalog lies in its integration capabilities, particularly with a broader range of sources."
"Collibra Catalog could improve its automation to increase the efficiency of the software."
"The tool's overall functionalities need to improve since, nowadays, many tools, from a business perspective, are easy to use."
"The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
"Using ETL pipelines is a bit complicated and requires some technical aid."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingest the information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed."
"There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline."
 

Pricing and Cost Advice

"The product is highly priced compared to other vendors."
"Collibra Catalog is fairly priced - I would rate their pricing seven out of ten."
"Collibra offers a per-user licensing model."
"I think they can bring a few more features and align better with other quality products."
"StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
"StreamSets is expensive, especially for small businesses."
"We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
"I believe the pricing is not equitable."
"StreamSets is an expensive solution."
"It has a CPU core-based licensing, which works for us and is quite good."
"There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
"The overall cost for small and mid-size organizations needs to be better."
787,383 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Collibra Catalog
Financial Services Firm: 26%
Computer Software Company: 14%
Energy/Utilities Company: 7%
Manufacturing Company: 6%
StreamSets
Financial Services Firm: 17%
Computer Software Company: 13%
Manufacturing Company: 8%
Government: 7%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
No data available
 

Questions from the Community

What do you like most about Collibra Catalog?
The data lineage capability is valuable as it shows how different sources are connected and how data flows, which is crucial for projects like migrations. Moreover, data lineage visualization in C...
What needs improvement with Collibra Catalog?
I'd like to see more integration with other reporting sources like Qlik Sense, beyond the currently supported ones like Tableau and Power BI.
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Overview

 

Sample Customers

Collibra Catalog: AXA XL, DNB, Adobe, PMI, Holland America Line, UC Davis Health, Cox Automotive
StreamSets: Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about Informatica, Alation, Collibra and others in Metadata Management. Updated: May 2024.