Metadata Management: Using Git as an Asset Hub (Part…
This article is the first in a 4-part series on managing job-specific metadata assets in IRI Workbench. It focuses on the value of a metadata hub in general, and a Git implementation in particular. Subsequent articles in the series cover the use of Git for metadata asset version control, lineage tracking, and security. For a primer on how to share and leverage IRI job projects and their artifacts enterprise-wide via GitHub, see this article.
What is a Metadata Management Hub?
By enabling the management and understanding of both metadata and data assets, a metadata management hub can deliver both insight and control in enterprise information management (EIM) infrastructures.
A metadata management hub is software that enables the collection, storage, maintenance, and distribution of metadata through repositories. Users can then find and analyze information about their metadata, and thus about their data and processing activities as well. It also makes it easier for people and teams to manage and secure their metadata.
The most recognized types of metadata repositories are centralized, distributed, and federated. Each has its own pros and cons. IRI is introducing a distributed metadata management hub strategy in its Eclipse Integrated Development Environment (IRI Workbench) using Git repositories.
What is Git?
Git is open-source software that provides working directories with complete history and version tracking capabilities. Git does not depend on network access or a central server. IRI also considers Git to be a good alternative to a database-driven metadata repository hub and control system because it:
- is purpose-built to manage and track code (and thus also metadata) evolution
- is already familiar to Eclipse users (via the Git integration Plug-in)
- installs and runs seamlessly in IRI Workbench
- does not require a separate database to administer
- is free
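To make the "no network, no central server" point concrete, here is a minimal sketch in Python that creates a local repository, commits two revisions of a job script, and prints the file's history. It assumes the `git` command-line client is installed. The file name `sort_job.scl`, the placeholder file contents, the commit messages, and the demo identity are all hypothetical and are not taken from IRI documentation.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_asset(repo: Path, name: str, text: str, message: str) -> str:
    """Write one metadata file, stage it, commit it, and return the short hash."""
    (repo / name).write_text(text)
    git(repo, "add", name)
    # Identity is passed inline so the sketch does not rely on global Git config.
    git(
        repo,
        "-c", "user.name=Demo User",
        "-c", "user.email=demo@example.com",
        "commit", "-m", message,
    )
    return git(repo, "rev-parse", "--short", "HEAD")


if __name__ == "__main__":
    # Everything below happens on the local disk; no remote is configured.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        commit_asset(repo, "sort_job.scl", "placeholder script text\n", "Add sort job")
        commit_asset(repo, "sort_job.scl", "placeholder script text, revised\n", "Revise sort job")
        # One line per commit that touched the file, newest first.
        print(git(repo, "log", "--oneline", "--", "sort_job.scl"))
```

In IRI Workbench the same steps would normally be performed through the Eclipse Git plug-in rather than a script; the sketch only shows that the full history lives in the local repository.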
Consider also that you can widely distribute metadata information via Git repositories in the cloud. You can create a metadata community and make repositories for your own needs, and other repositories for group needs. Through Git hosting services, repositories can be shared among the group — either in an open-source (public) repository, or a private repository that is only visible to users with permission. Either way, you can choose to manage and use metadata independently or collaboratively.
Using Git for IRI Metadata Asset Management
Through the Eclipse plug-in for Git integration, IRI Workbench users can deploy Git repositories to manage various IRI data definition file (.DDF) and manipulation metadata, including:
- Voracity (.flow) ETL and other multi-table workflows and rules
- CoSort SortCL (.scl) data transformation and reporting scripts, logs, etc.
- FACT (.ini) DB unload scripts
- NextForm (.ncl) data and DB migration, replication and federation scripts
- FieldShield (.fcl) data masking scripts, data class libraries, masking rules, audit logs, etc.
- RowGen (.rcl) test data generation scripts
Check in and manage IRI workflows (e.g. ETL, reorgs, DB test data populations), batch jobs, transformation and generation rules, etc. Use the hub to check in and manage other enterprise metadata assets used in IRI Workbench operations as well, including .ddl files, .sql procedures, database loader control files, and so on, just as Java programmers use Git repositories to manage their source code.
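As a rough illustration of how the asset types above might be inventoried before they are checked in, the following Python sketch groups file paths by extension. The extension list comes from this article; the type labels, the `group_assets` helper, and the sample file names are hypothetical. Note that `.ini` is a generic extension, so a real inventory would need a stronger test than the suffix alone.

```python
from collections import defaultdict
from pathlib import Path

# Extension-to-asset mapping based on the list in this article.
# The label strings are informal descriptions, not official IRI terms.
ASSET_TYPES = {
    ".ddf": "data definition file",
    ".flow": "Voracity workflow",
    ".scl": "CoSort SortCL script",
    ".ini": "FACT unload script",  # assumption: .ini is also used by other tools
    ".ncl": "NextForm script",
    ".fcl": "FieldShield script",
    ".rcl": "RowGen script",
    ".ddl": "DDL file",
    ".sql": "SQL procedure",
}


def group_assets(paths):
    """Group file names by asset type, skipping unrecognized extensions."""
    groups = defaultdict(list)
    for path in map(Path, paths):
        kind = ASSET_TYPES.get(path.suffix.lower())
        if kind:
            groups[kind].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical project contents.
    sample = ["jobs/sort.scl", "jobs/mask.fcl", "notes.txt", "CUSTOMERS.DDF"]
    for kind, names in group_assets(sample).items():
        print(f"{kind}: {', '.join(names)}")
```

A grouping like this could feed a review step or a commit message; it is not part of Git or IRI Workbench itself.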
IRI Workbench users who install and use Git will also see:
- real-time access to (and control of) their metadata assets
- heads-up display of the most current information on each asset*
- continuous synchronization with participants
- fewer development bottlenecks related to metadata version uncertainty, duplication, etc.
- security for their metadata and other assets under Git control
Access to, and information within, the hub is provided to authorized users. The hub controls asset viewing, creation, deletion, and modification rights, and facilitates communication between users about their assets. The hub also controls asset check-in/out and versioning, and facilitates metadata changes, lineage tracking, and impact analysis.
More Reading
The articles in this series elaborate on the use and features of Git for IRI and related metadata management. For purposes of illustration, we will show how CoSort SortCL job script metadata is managed using Git from IRI Workbench. The same hub system can be used for data assets as well, which we will explore separately.
* With installation and use of Git, folders and files in the project explorer view of IRI Workbench will get automatic indicators:
The possible status conditions (as shown above) are:
- staged — change in the file will be included in the next commit
- added to index — staged and under Git version control
- dirty — file has been changed since last commit
- ignored — file will not be affected by Git operations
- tracked — file is committed and has not changed
- untracked — file is not committed or staged
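For readers who also work from the command line, the sketch below approximates how these labels line up with the two-character codes printed by `git status --porcelain` (format v1). The mapping is an assumption made for illustration, not the Eclipse plug-in's own logic. It ignores rename entries, and a tracked file with no changes does not appear in the status output at all. The `label_for` helper and the sample file names are hypothetical.

```python
def label_for(status_line: str) -> tuple[str, str]:
    """Map one `git status --porcelain` (v1) line to (path, label).

    Each line has the form "XY path", where X is the index (staging area)
    state and Y is the working tree state.
    """
    code, path = status_line[:2], status_line[3:]
    index, worktree = code[0], code[1]
    if code == "??":
        label = "untracked"
    elif code == "!!":  # only listed when git status is run with --ignored
        label = "ignored"
    elif worktree != " ":
        label = "dirty"  # working tree differs from what is in the index
    elif index == "A":
        label = "added to index"  # new file staged for its first commit
    else:
        label = "staged"  # other staged change, such as a modification
    return path, label


if __name__ == "__main__":
    # Hand-written sample lines in porcelain v1 format.
    for line in ["?? new_job.scl", "A  mask.fcl", "M  sort.scl", " M load.ddl"]:
        print(label_for(line))
```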
Keep these cues in mind as you continue reading about, and using, IRI metadata management via Git in part two (version control), part three (lineage tracking), and part four (security).