StreamSets
Overview
StreamSets Data Collector (SDC) is a lightweight, powerful design and execution engine that streams data in real time.
SDC was released to the open source community in 2015.
SDC is supported by StreamSets, a company founded in 2014 and a 2018 Cloudera Partner Impact Awards winner.
StreamSets Data Collector Features
Web-based user interface
Highly configurable:
“At least once” and “At most once” delivery guarantees are supported.
Deep integration with the Hadoop ecosystem, including connectors for HDFS, HBase, Kafka and Solr.
Flexible deployment of pipelines to edge servers or to clusters.
Deployed as a Spark Streaming application or as a MapReduce job.
Embedded monitoring that provides runtime visibility into data flow performance.
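The difference between the two delivery guarantees listed above comes down to when the read position (offset) is committed relative to the write. The following is a minimal simulation in plain Python, not SDC code: `run_pipeline`, `crash_batch`, and the in-memory `delivered` list are hypothetical names used only for illustration. It assumes a crash that happens partway through writing one batch, followed by a restart from the last committed offset.

```python
def run_pipeline(batches, guarantee, crash_batch=None):
    """Simulate one crash and restart; return the records that reached the destination.

    Illustrative only: this models the general semantics of the two
    guarantees, not how Data Collector is implemented.
    """
    if guarantee not in ("at_least_once", "at_most_once"):
        raise ValueError(f"unknown guarantee: {guarantee}")

    delivered = []   # stands in for the destination system
    committed = 0    # index of the next batch to read after a restart
    start = 0

    while True:
        for i in range(start, len(batches)):
            if guarantee == "at_most_once":
                committed = i + 1               # commit BEFORE writing
            if i == crash_batch:
                delivered.extend(batches[i][:1])  # partial write, then crash
                crash_batch = None              # crash only once
                break
            delivered.extend(batches[i])
            if guarantee == "at_least_once":
                committed = i + 1               # commit AFTER writing
        else:
            return delivered                    # all batches processed
        start = committed                       # restart from committed offset


batches = [["a", "b"], ["c", "d"], ["e", "f"]]
at_least = run_pipeline(batches, "at_least_once", crash_batch=1)
at_most = run_pipeline(batches, "at_most_once", crash_batch=1)
print(at_least)  # ['a', 'b', 'c', 'c', 'd', 'e', 'f']
print(at_most)   # ['a', 'b', 'c', 'e', 'f']
```

With "at least once", the interrupted batch is reprocessed, so no record is lost but `c` appears twice. With "at most once", the interrupted batch is skipped, so nothing is duplicated but `d` is lost.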
A key concept in SDC is the pipeline.
What is a pipeline?
A pipeline describes the flow of data from the origin system to destination systems and defines how to transform the data along the way.
A pipeline consists of a single origin stage to represent the origin system, multiple processor stages to transform data, and multiple destination stages to represent destination systems.
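The structure described above (one origin, several processors, several destinations) can be sketched as a small data model. This is a conceptual illustration in plain Python, not the SDC API: the `Pipeline` class, its field names, and the list-backed sinks are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


@dataclass
class Pipeline:
    """Hypothetical model: a single origin, N processors, N destinations."""
    origin: Callable[[], Iterable[Record]]
    processors: List[Callable[[Record], Record]] = field(default_factory=list)
    destinations: List[Callable[[Record], None]] = field(default_factory=list)

    def run(self) -> int:
        """Pass every origin record through all processors, then to every destination."""
        count = 0
        for record in self.origin():
            for process in self.processors:
                record = process(record)    # transform along the way
            for write in self.destinations:
                write(record)               # fan out to each destination
            count += 1
        return count


# Two in-memory lists stand in for destination systems.
sink_a: List[Record] = []
sink_b: List[Record] = []

pipeline = Pipeline(
    origin=lambda: [{"name": "ada"}, {"name": "grace"}],
    processors=[lambda r: {**r, "name": r["name"].upper()}],
    destinations=[sink_a.append, sink_b.append],
)
processed = pipeline.run()
```

After `run()`, both sinks hold the same transformed records, which mirrors how a single flow from the origin can feed more than one destination.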