Data lineage

From Wiki @ Karl Jones dot com

Data lineage is defined as the life cycle of data: where the data originates and where it moves over time.

Description

Data lineage describes what happens to data as it passes through various processes. It provides visibility into the analytics pipeline and makes it easier to trace errors back to their sources.
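Tracing an error back to its sources amounts to walking the lineage upstream from the faulty dataset until reaching datasets with no recorded inputs. The sketch below assumes lineage is stored as a plain mapping from each dataset to the datasets it was derived from; the dataset names and the `trace_to_sources` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it was derived from.
LINEAGE = {
    "sales_report": ["sales_clean", "fx_rates"],
    "sales_clean": ["sales_raw"],
    "fx_rates": [],
    "sales_raw": [],
}

def trace_to_sources(dataset, lineage):
    """Return the root datasets (those with no recorded inputs) that feed `dataset`."""
    sources, seen = set(), {dataset}
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        parents = lineage.get(node, [])
        if not parents:
            # No recorded inputs: this is an original source.
            sources.add(node)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources

print(sorted(trace_to_sources("sales_report", LINEAGE)))  # ['fx_rates', 'sales_raw']
```

A wrong figure in `sales_report` therefore only needs to be checked against `fx_rates` and `sales_raw`, plus the transformations in between.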

Lineage also makes it possible to replay specific portions or inputs of the dataflow, either for step-wise debugging or to regenerate lost output. Database systems have already used such information, called data provenance, to address similar validation and debugging challenges.
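Regenerating lost output works if the lineage records, for each derived dataset, both its inputs and the transformation that produced it. The following is a minimal sketch under that assumption; the `RECIPES` table, the dataset names and the `regenerate` helper are hypothetical, and a real pipeline would store far richer metadata.

```python
# Hypothetical recorded lineage: derived dataset -> (input names, transformation).
RECIPES = {
    "clean": (["raw"], lambda raw: [x for x in raw if x is not None]),
    "total": (["clean"], lambda clean: sum(clean)),
}

def regenerate(name, store, recipes):
    """Rebuild `name` by replaying its recorded lineage, recomputing missing inputs first."""
    if name in store:
        return store[name]  # still available, nothing to replay
    if name not in recipes:
        raise KeyError(f"{name} is lost and has no recorded lineage")
    inputs, transform = recipes[name]
    args = [regenerate(i, store, recipes) for i in inputs]
    store[name] = transform(*args)  # cache the regenerated output
    return store[name]

store = {"raw": [3, None, 4, 5]}  # "clean" and "total" have been lost
print(regenerate("total", store, RECIPES))  # 12
```

Only the missing part of the dataflow is replayed: datasets still present in `store` are reused as-is.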

Data lineage provides a visual representation of how data flows from its source to its destination within the enterprise environment, including the changes and hops it undergoes along the way.

Data lineage represents how the data hops between various data points, how it is transformed along the way, how its representation and parameters change, and how it splits or converges after each hop.

A simpler representation of data lineage uses dots and lines: each dot represents a data container holding one or more data points, and the lines connecting the dots represent the transformations the data points undergo between containers.
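The dots-and-lines view maps naturally onto a directed graph whose edges are labelled with transformations. In the sketch below the container names, transformations and helper functions are hypothetical; it prints each line and then finds the dots where data splits (several outgoing lines) or converges (several incoming lines).

```python
from collections import defaultdict

# Hypothetical lineage: dots are data containers, lines are labelled transformations.
LINES = [
    ("orders_db", "orders_stage", "extract"),
    ("orders_stage", "eu_orders", "filter region=EU"),
    ("orders_stage", "us_orders", "filter region=US"),
    ("eu_orders", "global_summary", "aggregate"),
    ("us_orders", "global_summary", "aggregate"),
]

def describe(lines):
    """Print each line as 'dot --[transformation]--> dot'."""
    for src, dst, transform in lines:
        print(f"{src} --[{transform}]--> {dst}")

def splits_and_merges(lines):
    """Return (splits, merges): dots with several outgoing / several incoming lines."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, dst, _ in lines:
        out_deg[src] += 1
        in_deg[dst] += 1
    splits = {n for n, d in out_deg.items() if d > 1}
    merges = {n for n, d in in_deg.items() if d > 1}
    return splits, merges

describe(LINES)
print(splits_and_merges(LINES))  # ({'orders_stage'}, {'global_summary'})
```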

How data lineage is represented depends broadly on the scope of metadata management and on the reference point of interest.
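One way to picture the reference point and scope is a lineage view centred on a chosen container: upstream lineage shows what feeds it, downstream lineage shows what it feeds, and a hop limit bounds how much of the graph is shown. This is an illustrative sketch only; the edge list, `lineage_view` and its `max_hops` parameter are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edges (from_container, to_container); each edge is one transformation.
EDGES = [
    ("crm", "customers"), ("erp", "orders"),
    ("customers", "sales_mart"), ("orders", "sales_mart"),
    ("sales_mart", "dashboard"), ("sales_mart", "forecast"),
]

def lineage_view(edges, focus, max_hops):
    """Return (upstream, downstream) containers within `max_hops` of `focus`."""
    parents, children = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        parents[dst].append(src)

    def walk(neighbours):
        # Breadth-first search from the reference point, stopping at the hop limit.
        found, queue = set(), deque([(focus, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for nxt in neighbours[node]:
                if nxt not in found and nxt != focus:
                    found.add(nxt)
                    queue.append((nxt, depth + 1))
        return found

    return walk(parents), walk(children)

up, down = lineage_view(EDGES, "sales_mart", 1)
print(sorted(up), sorted(down))  # ['customers', 'orders'] ['dashboard', 'forecast']
```

Raising `max_hops` to 2 widens the upstream view to include `crm` and `erp`, which mirrors how a broader metadata scope yields a larger lineage picture.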
