The term "data pipeline" refers to a sequence of processes that collect raw data and convert it into a format that applications can use. Pipelines can be batch-based or real-time, deployed in the cloud or on-premises, and built on commercial or open-source software.
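To make the idea concrete, here is a minimal sketch of a batch pipeline in Python: it extracts raw CSV text, transforms each record into a consistent format, and loads the result into a stand-in destination. The function names and sample data are illustrative and not tied to any particular pipeline tool.

```python
# Minimal batch pipeline sketch: extract raw CSV, transform (clean) each
# record, and load it into a destination. All names are illustrative.
import csv
import io

RAW = "name,signup_date\n Alice ,2023-01-05\nBOB,2023-02-17\n"

def extract(raw: str) -> list[dict]:
    # Parse the raw source into row dictionaries.
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    # Normalize whitespace and casing so downstream apps see one format.
    return [
        {"name": r["name"].strip().title(),
         "signup_date": r["signup_date"].strip()}
        for r in rows
    ]

def load(rows: list[dict], destination: list) -> None:
    # Stand-in for writing to a warehouse table or data lake.
    destination.extend(rows)

warehouse: list[dict] = []
load(transform(extract(RAW)), warehouse)
print(warehouse)
# [{'name': 'Alice', 'signup_date': '2023-01-05'},
#  {'name': 'Bob', 'signup_date': '2023-02-17'}]
```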
Data pipelines work like the physical pipes that carry water from a river to your home: they move data from source systems into layers such as data lakes or warehouses, where analytics and insights can be drawn from it. In the past, moving this data was a manual process involving daily uploads and long waits for insights. Data pipelines replace those manual procedures and let organizations transfer data more efficiently and with less risk.
Accelerate development with a virtual data pipeline
A virtual data pipeline offers significant savings on infrastructure: storage costs in the data center and in remote offices, plus the hardware, network, and management costs of deploying non-production environments such as test environments. Automating data refresh, masking, and role-based access control, along with the ability to customize and integrate databases, also shortens provisioning time.
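As a rough illustration, the sketch below shows what automated masking with role-based access might look like during a non-production refresh. The hash-based masking, the column list, and the role names are assumptions made for this example, not a description of any specific product's mechanism.

```python
# Hedged sketch of masking plus role-based access for a data refresh.
# The SHA-256 tokenization and the role names are illustrative choices.
import hashlib

SENSITIVE_COLUMNS = {"email", "ssn"}  # assumed sensitive fields

def mask_value(value: str) -> str:
    # Replace a sensitive value with a deterministic, irreversible token
    # so joins still work but real data never leaves production.
    return "masked_" + hashlib.sha256(value.encode()).hexdigest()[:12]

def refresh_for_role(rows: list[dict], role: str) -> list[dict]:
    # Developers and testers receive masked copies; other roles
    # (e.g. an auditor in this sketch) see the raw values.
    if role in {"developer", "tester"}:
        return [
            {k: mask_value(v) if k in SENSITIVE_COLUMNS else v
             for k, v in row.items()}
            for row in rows
        ]
    return rows

prod = [{"id": 1, "email": "alice@example.com", "plan": "pro"}]
print(refresh_for_role(prod, "developer"))
# [{'id': 1, 'email': 'masked_...', 'plan': 'pro'}]
```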
IBM InfoSphere Virtual Data Pipeline (VDP) is a multi-cloud copy-data-management solution that decouples test and development environments from production infrastructure. It uses patented snapshot and changed-block-tracking technology to capture application-consistent copies of databases and other files. Users can mount fast, masked virtual copies of databases in non-production environments and start testing within minutes, which is particularly valuable for accelerating DevOps and agile practices and shortening time to market.
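The general idea behind changed-block tracking can be sketched as follows: after a full base snapshot, only the blocks that differ are captured, and a virtual copy is materialized by overlaying that small delta on the shared base image. This is a simplified illustration of the technique in general; VDP's patented implementation is not public and is not reproduced here.

```python
# Illustrative changed-block tracking: capture only blocks that differ
# from a base snapshot, then overlay them to mount a "virtual copy".
BLOCK_SIZE = 4  # tiny block size chosen for readability

def to_blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(base: bytes, current: bytes) -> dict[int, bytes]:
    # Record only the block indexes whose contents differ from the base.
    old, new = to_blocks(base), to_blocks(current)
    return {i: blk for i, (prev, blk) in enumerate(zip(old, new)) if prev != blk}

def mount_virtual_copy(base: bytes, delta: dict[int, bytes]) -> bytes:
    # A virtual copy overlays the small delta on the shared base image,
    # so each test environment costs only the changed blocks it owns.
    blocks = to_blocks(base)
    for i, blk in delta.items():
        blocks[i] = blk
    return b"".join(blocks)

base = b"AAAABBBBCCCCDDDD"
current = b"AAAAXXXXCCCCDDDD"
delta = changed_blocks(base, current)   # {1: b'XXXX'} -- one block captured
assert mount_virtual_copy(base, delta) == current
```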