UUID Support in IRI Products
A universally unique identifier, or UUID, is a standardized 128-bit identifier used in software construction and data processing. Its main function is to uniquely identify records in registries, databases, or larger data stores such as a data warehouse. Depending on the generation mechanism used, each UUID is either guaranteed to differ, or extremely likely to differ, from every other UUID generated.
Depending on its version, a UUID can contain a reference to the network address of the host that generated it, a precise creation timestamp, and a randomly generated component; some versions consist almost entirely of random bits. The host address and the timestamp together usually make an ID unique for each record of data processed. The randomly generated element adds further protection against two IDs colliding across transactions or identification needs.
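To make these components concrete, here is a minimal sketch using Python's standard uuid module. It is not part of any IRI product; it only illustrates how a time-based (version 1) UUID exposes a timestamp and host node, while a random (version 4) UUID does not. The describe() helper is a hypothetical name introduced for this example, and the sample string is the first value from the job output shown later in this article.

```python
import uuid

def describe(u):
    """Return the fields of a UUID that are meaningful for its version.

    Hypothetical helper for illustration only.
    """
    info = {"text": str(u), "version": u.version, "variant": u.variant}
    if u.version == 1:
        # Version 1 packs a timestamp, a clock sequence, and a node ID.
        info["timestamp_100ns"] = u.time      # 100-ns ticks since 1582-10-15
        info["clock_seq"] = u.clock_seq
        info["node"] = f"{u.node:012x}"       # usually the MAC address; may be
                                              # random if none can be read
    return info

# Time-based UUID: host node + timestamp + clock sequence.
u1 = uuid.uuid1()
# Random UUID: all bits random except the version and variant fields.
u4 = uuid.uuid4()
# First UUID from the sample job output in this article.
sample = uuid.UUID("e359e1dc-3e48-40cf-a919-dac4131e6dfe")

if __name__ == "__main__":
    for u in (u1, u4, sample):
        print(describe(u))
```

The version is readable from the first digit of the third group, so the sample value (third group 40cf) is a version 4, random-based UUID.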
UUIDs have many uses in software architecture. The simplest examples are document autosaves, such as those performed in Microsoft products, which refer to the UUID as a GUID, or globally unique identifier. Various web services likewise use them when autosaving content as a record. When multiple users in a cloud environment work on the same or similar elements, the shared storage area assigns a unique identifier to each record as it is created. A single machine might also use UUIDs to identify data or records coming from the devices plugged into it, whether they are storage media, bridges, or other peripherals.
More complex uses might be found in a data warehouse, where information is culled from multiple databases with overlapping primary IDs and a laundry list of similar fields and data. The UUID acts as a surrogate key so every record has a unique identifier and does not risk creating any data or reference conflicts for keys or fields from previous databases. Other more advanced uses include moving data to a different partition, as well as one-off applications with their own reasons for generating and using a UUID.
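The surrogate-key idea can be sketched in a few lines of Python. This is an illustrative example only, not an IRI job: the source names (crm, billing), the row layout, and the load_with_surrogate_keys() function are all hypothetical. It shows two sources whose primary IDs overlap being merged into one table in which every row receives its own UUID, while the original ID is kept as an ordinary column.

```python
import uuid

# Hypothetical rows from two source databases with overlapping primary IDs.
crm_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
billing_rows = [{"id": 1, "name": "Ada L."}, {"id": 2, "name": "Linus"}]

def load_with_surrogate_keys(sources):
    """Merge rows from several sources, giving each row a UUID surrogate key.

    The source name and original ID are preserved so lineage is not lost.
    """
    warehouse = []
    for source_name, rows in sources.items():
        for row in rows:
            warehouse.append({
                "uuid": str(uuid.uuid4()),   # new warehouse-wide key
                "source": source_name,       # where the row came from
                "source_id": row["id"],      # original, possibly clashing ID
                "name": row["name"],
            })
    return warehouse

warehouse = load_with_surrogate_keys({"crm": crm_rows, "billing": billing_rows})

if __name__ == "__main__":
    for record in warehouse:
        print(record)
```

Because the key is generated independently of the source IDs, rows 1 and 2 from both systems can coexist without renumbering either source.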
IRI RowGen and other IRI software now share a method for generating UUIDs, which suits a number of processes that IRI products handle, including indexing generated test data records. A single function call, create_uuid(), generates a new value for every record, as in this RowGen job script:
# Generated with the IRI Workbench - New Test Data Job
#
# Author: donp
# Created: 2015-06-26 08:45:51
#
/INFILE=random_file_placeholder
    /PROCESS=RANDOM
    /LENGTH=0
    /INCOLLECT=6
    /FIELD=(UUID=create_uuid(), TYPE=ALPHA, POSITION=1, SEPARATOR="\t")
    /FIELD=(GUID=create_uuid("{}"), TYPE=ALPHA, POSITION=2, SEPARATOR="\t")
/REPORT
/OUTFILE=stdout
    /PROCESS=RECORD
    /FIELD=(UUID=create_uuid(), TYPE=ALPHA, POSITION=1, SEPARATOR="\t")
    /FIELD=(GUID=create_uuid("{}"), TYPE=ALPHA, POSITION=2, SEPARATOR="\t")
The output from this job script is a tab-separated, two-field (two-column) display of random UUID and GUID values:
e359e1dc-3e48-40cf-a919-dac4131e6dfe	{85b08e51-7133-468f-b493-bc7892840c09}
9d62043d-f956-40d3-b218-ef3b7c5957db	{acb7c310-f032-4a9b-b493-7487ff1e41ae}
486580be-912e-4436-9008-f8ee1cedede1	{0fc44370-3851-43b9-984f-9abc79dcaec6}
57997636-6f7f-4bd2-887b-3be5158a0ef9	{b2041312-b6fb-4826-9b91-b5dfe29bc32f}
3b058f7c-5068-45ca-bba7-6b74c273859f	{8eb6922f-8aea-4a3e-8622-9ed93e4d00af}
ee826617-ca8c-4e08-9838-f4a5090b172a	{7844b5fb-e3e0-49d7-849f-6ef02fcda9b5}
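For readers who want to reproduce the shape of this output outside IRI software, the following Python sketch emits the same two-column layout: a plain UUID, a tab, and a brace-wrapped GUID. It assumes random (version 4) values, which matches the version digit in the sample values, but it says nothing about how create_uuid() is implemented internally. The uuid_guid_rows() function is a hypothetical name used only here.

```python
import uuid

def uuid_guid_rows(count):
    """Build `count` tab-separated rows: a plain UUID and a braced GUID.

    Assumption: random version-4 values, as the sample output suggests.
    """
    rows = []
    for _ in range(count):
        plain = str(uuid.uuid4())                 # e.g. 8-4-4-4-12 hex digits
        braced = "{" + str(uuid.uuid4()) + "}"    # GUID-style, wrapped in {}
        rows.append(f"{plain}\t{braced}")
    return rows

if __name__ == "__main__":
    # Six rows, mirroring /INCOLLECT=6 in the job script.
    print("\n".join(uuid_guid_rows(6)))
```

Each run prints different values, since every call to uuid.uuid4() draws fresh random bits.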
Generated UUIDs are useful in a number of cases. IRI RowGen users generating test database tables, or merging data into a simulated data warehouse, can create UUID columns alongside the others; the job script specifies the generation function for each field that needs a UUID value. This function joins the earlier SEQUENCER and ROWID functions, which remain available in RowGen.
IRI NextForm users might apply the UUIDs to migrate data from multiple databases when they need to identify each record as a unique entry in the new database. IRI FieldShield users might assign UUIDs to masked targets for obfuscating, tokenizing, de-identifying or auditing efforts, and IRI CoSort (SortCL) users can assign UUIDs to records in data integration (ETL) jobs.