What are native data pipelines?
Native data pipelines are pipelines offered directly from a vendor's product. They let customers transform their data and push it straight from a SaaS app to their data warehouse or other preferred destination, without involving a third-party ELT provider.
Why would I offer native data pipelines from my product?
First-movers like Stripe (surprise, surprise) have begun to roll out native data pipelines to their customers over the past few months. As more companies have adopted data warehouses such as Snowflake, demand has increased for a secure and efficient way to get data from SaaS tools into warehouses.
Customers of SaaS products have traditionally built their own pipelines or contracted third-party ELT/ETL providers like Fivetran for this purpose. These options can work, but each has drawbacks, such as limited data access and unreliable pipelines, because both depend on the SaaS tool's external API. On top of that, involving a third-party provider to move data introduces inherent security risks.
Customers want data directly from the source and are asking SaaS vendors for native pipelines that are easy to configure, minimize security risks, and offer the highest-quality data. Native data pipelines not only meet these customer needs but also give SaaS vendors a new revenue stream that third parties previously captured.
We've seen rapid adoption of native pipeline offerings such as Stripe's and believe that every SaaS company will soon follow suit as data-sharing features become table stakes. In this world, SaaS customers will create pipelines and data transformations with a few clicks in their product dashboards.
An example: Using a native pipeline vs a third-party provider
Imagine ACME Inc. wants to push customer data from HubSpot into its Snowflake warehouse. By contracting a third-party tool like Airbyte or Fivetran, ACME Inc. can use a pre-built connector that accesses data made available via the HubSpot API.
Now let's say HubSpot decides to offer native data pipelines to its customers. A team member at ACME Inc. can then activate a secure data pipeline directly from their HubSpot dashboard without involving a third party. Because HubSpot offers the pipeline itself, it can choose to expose more data than its API makes available (Stripe has done this with its data pipeline product). ACME Inc. also doesn't need to worry about the pipeline breaking when the API changes, because data flows directly from the source.
How do I begin offering pipelines to my customers?
To offer native data pipelines, you'll need to build out:
- connectors to all sources and destinations (Snowflake, Redshift, etc.)
- a workflow engine for executing ETL jobs (scheduling, retries, progress tracking, resumption after failure, durable timers, etc.)
- a transform engine that can munge data into the desired format
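To make the workflow-engine requirements more concrete, here is a minimal Python sketch of a single sync job covering two of them: retries with exponential backoff and resumption from a checkpoint after failure. It is not Pipebird's implementation. `Checkpoint`, `with_retries`, `run_sync`, and the `extract`/`load` callables are hypothetical names. The checkpoint lives in memory here, whereas a real engine would persist it. Scheduling and durable timers are left out.

```python
import functools
import time
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class Checkpoint:
    """Progress marker for one sync job (in memory here; durable storage in practice)."""
    offset: int = 0  # number of source rows already loaded into the destination


def with_retries(fn: Callable[[], T], max_attempts: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with a delay that doubles (0.5s, 1s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the scheduler mark the job as failed
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


def run_sync(extract: Callable[[int], List[dict]],
             load: Callable[[List[dict]], None],
             checkpoint: Checkpoint,
             sleep: Callable[[float], None] = time.sleep) -> Checkpoint:
    """Copy batches from source to destination, starting at checkpoint.offset."""
    while True:
        rows = with_retries(functools.partial(extract, checkpoint.offset), sleep=sleep)
        if not rows:
            return checkpoint  # source exhausted: the job is complete
        with_retries(functools.partial(load, rows), sleep=sleep)
        # Advance the checkpoint only after the load succeeds, so a crash
        # replays at most one batch on resumption (at-least-once delivery).
        checkpoint.offset += len(rows)


if __name__ == "__main__":
    source = [{"id": i} for i in range(5)]
    warehouse: List[dict] = []
    calls = {"n": 0}

    def flaky_extract(offset: int) -> List[dict]:
        calls["n"] += 1
        if calls["n"] == 2:  # simulate one transient network error
            raise ConnectionError("transient")
        return source[offset:offset + 2]  # page size of 2 rows

    cp = run_sync(flaky_extract, warehouse.extend, Checkpoint(), sleep=lambda s: None)
    print(cp.offset, len(warehouse))  # 5 5
```

Because a batch can be replayed after a crash, the destination write in a design like this would typically need to be idempotent, for example an upsert keyed on each row's primary key.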
Data-sharing infrastructure can be built internally; however, doing so will most likely require a large investment of engineering resources and ongoing maintenance.
A better option is to use open-source infrastructure like Pipebird. With Pipebird, your company can begin offering native data pipelines from its product in a matter of hours: the only thing your team has to do is set up a source and add a destination through the Pipebird API.
How does Pipebird work?
Pipebird is embeddable infrastructure specifically designed for securely sharing data with customers. With Pipebird you can:
- select sources to push data from (e.g., Postgres, MySQL, CockroachDB).
- let customers configure pipelines and apply transformations (such as type casting).
- periodically sync data directly to customers' warehouses (such as Snowflake).
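A type-casting transformation like the one mentioned above can be pictured as a per-column mapping that a customer configures. The following is an illustrative Python sketch, not Pipebird's transform engine or API. `CASTS`, `validate_schema`, and `apply_casts` are hypothetical names, and the set of supported type names is an assumption.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Dict, List

# Hypothetical customer-facing type names mapped to cast functions.
CASTS: Dict[str, Callable[[Any], Any]] = {
    "integer": int,
    "decimal": lambda v: Decimal(str(v)),  # via str() to avoid binary-float artifacts
    "string": str,
    "date": lambda v: v if isinstance(v, date) else date.fromisoformat(v),
}


def validate_schema(schema: Dict[str, str]) -> None:
    """Reject unknown type names, e.g. when the customer saves the pipeline config."""
    unknown = {t for t in schema.values() if t not in CASTS}
    if unknown:
        raise ValueError(f"unsupported types: {sorted(unknown)}")


def apply_casts(rows: List[Dict[str, Any]], schema: Dict[str, str]) -> List[Dict[str, Any]]:
    """Return copies of rows with each configured column cast to its target type.

    NULLs (None) stay NULL, and columns not named in the schema pass through unchanged.
    """
    validate_schema(schema)
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        for column, type_name in schema.items():
            if new_row.get(column) is not None:
                new_row[column] = CASTS[type_name](new_row[column])
        out.append(new_row)
    return out


if __name__ == "__main__":
    rows = [{"id": "42", "amount": 19.99, "created": "2023-01-15", "note": None}]
    schema = {"id": "integer", "amount": "decimal", "created": "date", "note": "string"}
    print(apply_casts(rows, schema))
    # [{'id': 42, 'amount': Decimal('19.99'), 'created': datetime.date(2023, 1, 15), 'note': None}]
```

Validating the schema before any rows are touched means a typo in a customer's configuration fails fast instead of partway through a sync.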
Pipebird is open source and designed to be hosted on your infrastructure so that you and your customers remain in full control of your data at all times. You can get started with Pipebird by viewing deployment options on our GitHub repo, emailing email@example.com, or joining the Pipebird Slack Community.