Master Data Management with Git
Master Data Management (MDM) is a strategic enterprise information management (EIM) life cycle initiative designed to foster the consistency and accurate maintenance of master (or reference) data. MDM programs typically support multiple types of data, including information about customers, products, employees, and financials.
According to this article, a good MDM system also unifies and synchronizes master data in a central repository:
“Analysis and reporting is greatly dependent on the quality of an organization’s master data. Master data may either be stored in a central repository, sourced from one or more systems, or referenced centrally using an index. However, when it is used by several functional groups, it may be distributed and stored in different applications across an organization, and this copy [of] data may be inconsistent (and if so, inaccurate).”
The Git integration plug-in for Eclipse can enable and enforce the centralization and consistency of master data. It also works seamlessly with the metadata and ergonomics of IRI Workbench, an integrated development environment (IDE) for EIM. The plug-in supports various team-sharing paradigms, from a local server to a Git hosting service reachable across the organization or the internet, in support of the MDM group’s work.
This system also supports the “CRUD life cycle” for master data (how it is created, read, updated, deleted, and searched), described here by Microsoft’s Roger Wolter and Kirk Haselden. In addition, it:
- provides a method of version control by storing the history of changes in the repositories used for the data, preserving the lineage for future reference or audits
- facilitates the development of “master” master data by merging disparate or prior versions of master data
- secures the master data files with password-protected access rights for specific individuals
- avoids the overhead and maintenance issues of storing master data in a database
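Git’s version history rests on content hashing: every version of a file is stored under an ID computed from its bytes. Any edit to a master data file therefore produces a new ID, while the earlier version remains retrievable for audits. The Python sketch below reproduces the blob ID that `git hash-object` computes in a repository using the default SHA-1 object format (repositories created with the SHA-256 format hash differently). The sample customer rows are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a file's contents (a "blob").

    Git hashes a header of the form b"blob <size in bytes>\\0" followed by the
    raw file bytes; in the default object format the hash is SHA-1.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical master data row before and after an edit.
before = b"CUST-001,Acme Corp,Boston\n"
after = b"CUST-001,Acme Corp,Chicago\n"

print(git_blob_id(before))
print(git_blob_id(after))
print(git_blob_id(before) != git_blob_id(after))  # True: the edit is detectable
```

Because the ID depends only on content, two identical copies of a master data file share one ID, and any divergence between copies shows up immediately.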
Change management is one of Git’s most powerful features for MDM. Communicating when values have changed increases MDM efficiency, improves the accuracy and integrity of master data values, and promotes data governance practices.
When master data values change, the right individuals must know about it, and the system must be able to handle and synchronize these changes. Version control offers a convenient way to track and share those changes with others. Git does this with commits: each commit records a set of changes in the repository, along with a message describing what was changed.
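To make the idea concrete, here is a minimal, hypothetical in-memory model of a commit history. In Git itself the equivalent steps are `git add` followed by `git commit -m "message"`, and Git stores trees of file blobs rather than Python dictionaries. The class and method names (`MasterDataLog`, `commit`, `changes`) are illustrative only. Each commit keeps a snapshot of the records, a description, and its parent’s ID, which is what preserves lineage.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Commit:
    """One recorded change: a snapshot of the records plus a description."""
    id: str
    parent: Optional[str]
    message: str
    snapshot: Dict[str, str]


@dataclass
class MasterDataLog:
    """Hypothetical in-memory model of a commit history (not Git's storage format)."""
    commits: List[Commit] = field(default_factory=list)

    def commit(self, snapshot: Dict[str, str], message: str) -> Commit:
        parent = self.commits[-1].id if self.commits else None
        # Hash parent, message, and content together so each ID depends on the
        # whole history before it, loosely mirroring how Git chains commits.
        payload = json.dumps(
            {"parent": parent, "message": message, "snapshot": snapshot},
            sort_keys=True,
        ).encode("utf-8")
        new = Commit(hashlib.sha1(payload).hexdigest(), parent, message, dict(snapshot))
        self.commits.append(new)
        return new

    @staticmethod
    def changes(old: Commit, new: Commit) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Return {record ID: (old value, new value)} for every record that differs."""
        keys = set(old.snapshot) | set(new.snapshot)
        return {
            k: (old.snapshot.get(k), new.snapshot.get(k))
            for k in sorted(keys)
            if old.snapshot.get(k) != new.snapshot.get(k)
        }


log = MasterDataLog()
v1 = log.commit({"CUST-001": "Boston", "CUST-002": "Denver"}, "Add initial customer list")
v2 = log.commit({"CUST-001": "Chicago", "CUST-002": "Denver"}, "Update CUST-001 city")
for c in log.commits:
    print(c.id[:7], c.message)
print(MasterDataLog.changes(v1, v2))  # {'CUST-001': ('Boston', 'Chicago')}
```

A record that appears on only one side is reported with `None` for the missing value, so additions and deletions show up in the same change list as edits.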
Other users receive these changes by fetching them, and they resolve any differences between versions of the data by merging.
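In Git, this is typically `git fetch` followed by `git merge`. Git merges text files line by line against the common ancestor of the two versions; the sketch below applies the same three-way idea to individual master data records keyed by ID. It is a simplification for illustration, not Git’s own algorithm, and the function name and sample records are hypothetical. A record changed on only one side takes that side’s value, while a record changed differently on both sides is reported as a conflict for a data steward to resolve.

```python
from typing import Dict, List, Optional, Tuple

Records = Dict[str, str]  # record ID -> value; a missing ID means absent/deleted


def three_way_merge(base: Records, ours: Records, theirs: Records) -> Tuple[Records, List[str]]:
    """Merge two edited copies of master data against their common ancestor.

    Returns the merged records and the IDs changed differently on both sides
    (conflicts); a conflicting record keeps our side's version until resolved.
    """
    merged: Records = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(key)
        o: Optional[str] = ours.get(key)
        t: Optional[str] = theirs.get(key)
        if o == t:    # both sides agree, or neither side changed the record
            value = o
        elif o == b:  # only the incoming side changed it
            value = t
        elif t == b:  # only our side changed it
            value = o
        else:         # both sides changed it differently: flag for review
            conflicts.append(key)
            value = o
        if value is not None:  # None means the record was deleted
            merged[key] = value
    return merged, conflicts


base = {"CUST-001": "Boston", "CUST-002": "Denver", "CUST-003": "Austin"}
ours = {"CUST-001": "Chicago", "CUST-002": "Denver", "CUST-003": "Houston"}
theirs = {"CUST-001": "Boston", "CUST-002": "Dallas", "CUST-003": "Miami"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'CUST-001': 'Chicago', 'CUST-002': 'Dallas', 'CUST-003': 'Houston'}
print(conflicts)  # ['CUST-003']
```

Comparing each side to the common ancestor is what lets a merge tell “only one person changed this” apart from “two people changed this differently,” which a plain two-way comparison cannot do.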
IRI Workbench users can also track changes in master data with Git and locate modified data (and metadata) with the standard text search functions in Eclipse. This supports change impact analysis and gives MDM teams more opportunities to reconsider and modify their master data values and infrastructure. These changes, in turn, can be managed in the same place.
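Outside the IDE, the same kind of search can be run with `git grep` or with a short script. The hypothetical helper below walks a working copy and reports every line that mentions a changed value, such as a renamed field or file, so a team can see which data and metadata files a change may affect. It assumes UTF-8 text files, matches whole words only, and skips Git’s internal `.git` folder.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_references(root: str, term: str) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, line text) for each whole-word match of term.

    Hypothetical helper for change impact analysis over a Git working copy.
    """
    root_path = Path(root)
    word = re.compile(r"\b" + re.escape(term) + r"\b")
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root_path.rglob("*")):
        relative = path.relative_to(root_path)
        # Skip directories and anything inside Git's own object store.
        if not path.is_file() or ".git" in relative.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if word.search(line):
                hits.append((relative.as_posix(), number, line))
    return hits
```

For example, `find_references("path/to/working-copy", "CUST_ID")` would list every file and line that still refers to a field named `CUST_ID` before it is renamed.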
For security, a Git server (or Git hosting service) lets administrators designate who can access the master data files checked into the system, and to what extent, from read-only to full read/write permissions. Administrators control who can see, update, delete, and synchronize changes in the data. By default, master data is transferred over SSH (Secure Shell), which provides encryption and authentication for added protection.
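Each Git server or hosting service has its own way of configuring permissions, so the following is only a hypothetical model of the rule described above: access levels are ordered, read access allows fetching the master data, and write access additionally allows pushing updates and deletions. The repository, user, and action names are invented for illustration and do not correspond to any particular server’s configuration.

```python
from enum import IntEnum
from typing import Dict


class Access(IntEnum):
    """Ordered access levels: each level includes everything below it."""
    NONE = 0
    READ = 1   # may fetch (see) the master data
    WRITE = 2  # may also push updates, deletions, and synchronized changes


# Hypothetical grants an administrator might configure per repository.
GRANTS: Dict[str, Dict[str, Access]] = {
    "master-data": {"mdm-steward": Access.WRITE, "analyst": Access.READ},
}

# Minimum level each (hypothetical) action requires.
REQUIRED: Dict[str, Access] = {"fetch": Access.READ, "push": Access.WRITE}


def is_allowed(repo: str, user: str, action: str) -> bool:
    """Return True if the user's grant on repo meets the level the action needs."""
    granted = GRANTS.get(repo, {}).get(user, Access.NONE)
    return granted >= REQUIRED[action]


print(is_allowed("master-data", "analyst", "fetch"))  # True
print(is_allowed("master-data", "analyst", "push"))   # False
```

Ordering the levels means a single comparison answers every check, and users with no grant fall back to no access at all.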
In conclusion, the access and change management capabilities of Git in IRI Workbench make safe, productive team management and version control of master data possible. For more details on using Git to manage metadata and certain data life cycle assets, see our four-part series on distributed metadata management, which describes security protocols and version control for textual sources in greater detail.