We performed a comparison between Matillion ETL and StreamSets based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The most valuable feature of Matillion ETL is the ETL. The solution is open-source which provides advantages, such as good performance and high efficiency. Additionally, it supports three data types which eliminates predefining the data, and we can write script models in Python."
"The most valuable feature of Matillion ETL is its ease of use. If you have had some experience with other solutions, such as Snowflake, the use of this solution will be simple."
"It has helped us to get onto the cloud quickly."
"It can scale to a great extent. It can handle the load that we are putting on it, which is about 5TBs."
"The technical support treats us well. They already have a support portal, and they are responsive, which helps."
"The loading of data is the most valuable feature of Matillion ETL."
"Matillion ETL has great Git integration that is perfect and convenient to use."
"It's been able to do everything we require."
"In StreamSets, everything is in one place."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one DataOps solution."
"It is really easy to set up and the interface is easy to use."
"The ability to have a good bifurcation rate and fewer mistakes is valuable."
"I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"The scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy."
"Ideally, I would like it to integrate with Secrets Manager as well as the AWS."
"The tool's lineage is very weak."
"I am looking forward to seeing the expansion of the source range for their data loader product."
"While the UI is good, it could be improved in its efficiency and made easier to use."
"The current version is a bit more limited because it's on a virtual machine, and everything executes on that one virtual machine."
"Our main challenge currently is that Matillion runs on an EC2 instance, limiting us to running only two processes simultaneously at the entry level."
"It needs integration with more data sources."
"The improvement area could be possible if the tool provides better integration capabilities with other ecosystems, including governance tools or data cataloging tools, as it is currently an area where the solution is lacking."
"They need to improve their customer care services. Sometimes it has taken more than 48 hours to resolve an issue. That should be reduced. They are aware of small or generic issues, but not the more technical or deep issues. For those, they require some time, generally 48 to 72 hours to respond. That should be improved."
"We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered."
"The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
Matillion ETL is ranked 4th in Cloud Data Integration with 24 reviews while StreamSets is ranked 8th in Data Integration with 24 reviews. Matillion ETL is rated 8.6, while StreamSets is rated 8.4. The top reviewer of Matillion ETL writes "Efficient data integration and transformation with seamless cloud-native integration". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". Matillion ETL is most compared with Snowflake, Azure Data Factory, AWS Glue, Informatica PowerCenter and SSIS, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and IBM InfoSphere DataStage.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.