Metadata Management: Lineage Tracking in Git (Part 3 of…
This article is the third in a four-part series on managing metadata assets with Git version control in the IRI Workbench GUI. It focuses on tracking job script histories for metadata lineage analysis. Other articles in the series cover the use of Git as a metadata asset hub, for version control, and for metadata security.
In the previous article, I demonstrated how to submit IRI project metadata assets through Git in IRI Workbench, built on Eclipse™. This article explains why metadata lineage analysis matters for the ETL and data governance tools (like CoSort and the FieldShield data masking tool) in the IRI Voracity data management platform, and how to view the history of an IRI job or project stored in a Git repository.
Metadata lineage shows us the origin of our data and how it has been transformed during processing to reach its current state. Understanding and retaining knowledge of the data’s origins, changes, and destinations is fundamental to data governance. That knowledge also improves confidence in the metadata itself, the data and processes the metadata defines, and the decisions we make from the end results of the processing.
Along with documenting the flow of data, metadata lineage can help us manage and secure our data more appropriately, since lineage also records where and when changes were made, and which data managers made them.
Using Git, lineage can be presented conveniently in the context of metadata version control. The history of each version of your asset can be viewed and analyzed, with a log of the changes and who made them. To access this view, right-click on your project and go to Team > Show in History.
With this selected, the project history appears and details each commit made over time. “HEAD” marks the commit your repository currently points to, and “master” is the main branch of your project.
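The same history can also be read outside the GUI with the Git command line. The following is a minimal sketch rather than part of IRI Workbench: it assumes `git` is installed and on the PATH, and the names `Commit`, `parse_log`, and `file_history` are hypothetical helpers introduced here for illustration.

```python
import subprocess
from typing import NamedTuple


class Commit(NamedTuple):
    """One entry in the history of a file or project (hypothetical helper type)."""
    short_hash: str
    author: str
    date: str
    subject: str


# Tab-separated fields: abbreviated hash, author name, author date, subject line.
LOG_FORMAT = "%h%x09%an%x09%ad%x09%s"


def parse_log(text: str) -> list[Commit]:
    """Turn `git log` output produced with LOG_FORMAT into Commit records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first three tabs only, so tabs in the subject survive.
        short_hash, author, date, subject = line.split("\t", 3)
        commits.append(Commit(short_hash, author, date, subject))
    return commits


def file_history(repo_dir: str, path: str) -> list[Commit]:
    """Return the commits that touched `path`, newest first (requires git)."""
    result = subprocess.run(
        ["git", "log", f"--pretty=format:{LOG_FORMAT}", "--date=iso", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_log(result.stdout)
```

Calling `file_history(".", "some_job.scl")` from inside a repository (the file name is a placeholder) would list who changed that job script and when, which is the same information the History view presents.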
At any time, you can compare the current and past state of a file. To do so, right-click on the file you wish to analyze and select Compare With > Local History; then select the revision for comparison. This will indicate the exact changes between the two versions. Following is an example of a field (column) name change in a SortCL job script managed by an IRI CoSort (FieldShield, NextForm, RowGen or Voracity) product user:
Changes to metadata and other assets are easy to see in this view, as is who made each change (and when). Having those changes in the repository allows others to deliberate over the implications. Searching (see below) for all instances of the changed string across your projects will expose its impact beyond the immediate application.
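A comparison like this can also be scripted. The sketch below uses Python's standard `difflib` module to diff two versions of a script, plus an optional helper that fetches a past version with `git show`. The function names and the two sample script fragments are illustrative assumptions: the fragments only stand in for a field rename and are not taken from a real job.

```python
import difflib
import subprocess


def read_revision(repo_dir: str, rev: str, path: str) -> str:
    """Return a file's content at commit `rev` via `git show <rev>:<path>` (requires git)."""
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def diff_versions(old_text: str, new_text: str, context: int = 1) -> list[str]:
    """Return a unified diff between two versions of a script, one line per entry."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
            n=context,
        )
    )


# Illustrative stand-ins for two revisions of a job script (not a real job).
OLD_SCRIPT = """/INFILE=customers.csv
    /FIELD=(CUST_NAME, POSITION=1, SEPARATOR=",")
    /FIELD=(BALANCE, POSITION=2, SEPARATOR=",")
"""
NEW_SCRIPT = OLD_SCRIPT.replace("CUST_NAME", "CUSTOMER_NAME")

if __name__ == "__main__":
    for line in diff_versions(OLD_SCRIPT, NEW_SCRIPT):
        print(line)
```

Lines prefixed with `-` come from the earlier version and lines prefixed with `+` from the later one, so a renamed field shows up as one removed line and one added line.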
Beyond the metadata lineage information that Git provides, IRI Workbench users can use the extensive file and text search features of Eclipse to analyze lineage, along with the impact of changes (metadata impact analysis). Search for metadata within or across projects; for example, column names that appear in one or more data definition files (.DDF repositories), or transformation rules (like conditions) in IRI job scripts.
Search window results allow you not only to see occurrences at a glance, but also to drill down and open each asset where the item occurs (e.g., where a given field name is specified). This gives you and your team better insight (and foresight) into the impact of changes to that metadata asset before (and after) they are made.
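For a rough equivalent of that search outside Eclipse, the sketch below walks a workspace folder and reports every line that mentions a given name. It is a simplified illustration, not the Workbench search itself: `find_occurrences` is a hypothetical helper, and the set of file extensions is an assumption that you would adjust to the metadata file types your projects actually contain.

```python
from pathlib import Path

# Assumed extensions for data definition files and job scripts; adjust as needed.
METADATA_SUFFIXES = {".ddf", ".scl", ".fcl"}


def find_occurrences(root, needle, suffixes=METADATA_SUFFIXES):
    """Return (file path, line number, line text) for each line containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only inspect regular files whose extension is in the chosen set.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # "." and "CUST_NAME" are placeholders for a workspace folder and a field name.
    for file_path, line_number, line_text in find_occurrences(".", "CUST_NAME"):
        print(f"{file_path}:{line_number}: {line_text}")
```

Running a search like this before renaming a field gives a quick list of every asset the change would touch.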
For more on this series about IRI metadata management using Git, see part two, Version Control, and the article on using Git for Asset Security.