Migrating COBOL Vision Files – The Metadata
In the course of a legacy platform or application migration, COBOL users often need to convert their binary and indexed files into a human-readable, ASCII-numeric target. One of these older formats is the ACUCOBOL Vision file,¹ which we discussed in a previous article.
To convert and use those files, you need a program or utility that can read and process them, plus metadata that defines your data source and target layouts. Two IRI programs can process COBOL Vision files natively:
- IRI NextForm: The “COBOL edition” of this program is primarily used to convert from one file type to another, and/or convert from one data type to another. NextForm also has report formatting features, and can be used to federate and replicate data.
- IRI CoSort: The “SortCL” program within CoSort does everything NextForm does, plus many other functions, including data transformations (sort, join, aggregate, pivot, etc.) and data masking (encrypt, hash, pseudonymize, etc.).
Note that both NextForm and CoSort are supported in the IRI Workbench, a free graphical integrated development environment (IDE) built on Eclipse™. The Workbench automates metadata discovery and definition, as well as the creation and execution of IRI jobs.
You will also need access to either the COBOL copybook or the Identification Section of the XFD file associated with the Vision file. If you have neither, life will be much harder, though you may still be able to infer the layout after a ‘blind’ conversion.²
Both the XFD and the COBOL copybook contain the metadata for the data file, including:
- record size
- column positions or offset for the fields
- byte size of the fields
- data type of the fields
- how many times a particular group of definitions occur
- numeric format (such as packed-decimal or signed), for any numeric fields
- field names
Let’s assume you have a Vision file that is accompanied by the following copybook:
FD  CLIENT-PROCEDURE-FILE.
*01  FILLER                      PIC X(186).
01  CLIENT-RECORD.
    05  PROCEDURE-KEY.
        10  PROCEDURE-CODE       PIC X(60).
    05  PROCEDURE-DATA.
        10  CATEGORY             PIC X(5).
        10  PROCEDURE            PIC X(10).
        10  PROCEDURE-DESC       OCCURS 3 TIMES.
            15  PROCEDURE-PART   PIC X(30).
            15  PROCEDURE-CHARGE PIC S9(11)V9(2) COMP-3.
The field section of an XFD file for the same Vision file might contain the following (XFD layouts differ between versions):
00000,00186,16,00186,+00,000,999,CLIENT-RECORD
00000,00060,16,00060,+00,000,999,PROCEDURE-KEY
00000,00060,16,00060,+00,000,000,PROCEDURE-CODE
00060,00126,16,00126,+00,000,999,PROCEDURE-DATA
00060,00005,16,00005,+00,000,000,CATEGORY
00065,00010,16,00010,+00,000,000,PROCEDURE
90001,00003,00037,START-OCCURS
00075,00037,16,00030,+00,000,999,PROCEDURE-DESC
00075,00030,16,00020,+00,000,000,PROCEDURE-PART
00105,00007,09,00014,-02,000,000,PROCEDURE-CHARGE
90002,END-OCCURS
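Lines that are actual (elementary) field definitions have 000 in the 7th column; group items have 999, and the 90001/90002 lines mark the start and end of an OCCURS group. The first column holds the zero-based start offset (add 1 to get the one-based POSITION used in IRI metadata), the second the size in bytes, the third the data type code, and the last the field name. The more common data type codes are: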
| Code | Data type |
|---|---|
| 0 | Numeric edited |
| 1 | Unsigned numeric |
| 2 | Signed numeric (trailing separate) |
| 3 | Signed numeric (trailing combined) |
| 4 | Signed numeric (leading separate) |
| 5 | Signed numeric (leading combined) |
| 6 | Signed computational |
| 7 | Unsigned computational |
| 8 | Positive packed-decimal |
| 9 | Signed packed-decimal |
| 10 | Computational-6 |
| 16 | Alphanumeric |
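As a rough illustration of the column rules above, the following Python sketch reads XFD field-section lines, keeps only elementary fields, and converts each zero-based offset to a one-based position. It is not an IRI tool: `parse_xfd_fields` and `XFD_TYPES` are hypothetical names, it assumes the comma-separated layout shown in the sample (XFD versions differ), and it does not expand OCCURS groups.

```python
# Hypothetical sketch: pull elementary field definitions out of an XFD field
# section. Assumes the comma-separated layout shown above; XFD versions vary.
# OCCURS groups are NOT expanded: repeated fields are listed once.

XFD_TYPES = {
    0: "Numeric edited",
    1: "Unsigned numeric",
    2: "Signed numeric (trailing separate)",
    3: "Signed numeric (trailing combined)",
    4: "Signed numeric (leading separate)",
    5: "Signed numeric (leading combined)",
    6: "Signed computational",
    7: "Unsigned computational",
    8: "Positive packed-decimal",
    9: "Signed packed-decimal",
    10: "Computational-6",
    16: "Alphanumeric",
}


def parse_xfd_fields(lines):
    fields = []
    for line in lines:
        cols = [c.strip() for c in line.split(",")]
        # 90001/90002 lines mark the start and end of an OCCURS group
        if cols[0] in ("90001", "90002"):
            continue
        # Group items carry 999 in the 7th column; elementary fields carry 000
        if len(cols) < 8 or cols[6] != "000":
            continue
        offset, size, type_code, scale = (int(cols[i]) for i in (0, 1, 2, 4))
        fields.append({
            "name": cols[-1].replace("-", "_"),  # match the underscore style of IRI field names
            "position": offset + 1,              # XFD offset is zero-based
            "size": size,
            "type": XFD_TYPES.get(type_code, f"type code {type_code}"),
            "implied_decimal": abs(scale),       # e.g. -02 -> 2 decimal places
        })
    return fields


sample = [
    "00000,00186,16,00186,+00,000,999,CLIENT-RECORD",
    "00060,00005,16,00005,+00,000,000,CATEGORY",
    "90001,00003,00037,START-OCCURS",
    "00075,00030,16,00020,+00,000,000,PROCEDURE-PART",
    "00105,00007,09,00014,-02,000,000,PROCEDURE-CHARGE",
    "90002,END-OCCURS",
]

if __name__ == "__main__":
    for f in parse_xfd_fields(sample):
        print(f)
```

The offset-plus-one step is why CATEGORY, at XFD offset 60, appears at POSITION=61 in the IRI layouts.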
Either the copybook or the XFD can be translated into the Data Definition File (.ddf) format used by all IRI software. For copybooks, IRI also offers a standalone conversion utility, ‘cob2ddf’, which creates DDF repositories holding the same information in IRI’s metadata syntax. The tool also runs in the IRI Workbench to produce the metadata for your NextForm or CoSort SortCL jobs.
The copybook-equivalent layout ready for IRI software (produced by cob2ddf) looks like this:
/FIELD=(PROCEDURE_CODE, POSITION=1, SIZE=60)
/FIELD=(CATEGORY, POSITION=61, SIZE=5)
/FIELD=(PROCEDURE, POSITION=66, SIZE=10)
/FIELD=(PROCEDURE_PART_1, POSITION=76, SIZE=30)
/FIELD=(PROCEDURE_CHARGE_1, POSITION=106, SIZE=7, MF_CMP3, IMPLIED_DECIMAL=2)
/FIELD=(PROCEDURE_PART_2, POSITION=113, SIZE=30)
/FIELD=(PROCEDURE_CHARGE_2, POSITION=143, SIZE=7, MF_CMP3, IMPLIED_DECIMAL=2)
/FIELD=(PROCEDURE_PART_3, POSITION=150, SIZE=30)
/FIELD=(PROCEDURE_CHARGE_3, POSITION=180, SIZE=7, MF_CMP3, IMPLIED_DECIMAL=2)
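The positions in this layout follow from simple arithmetic: each field starts where the previous one ends, and the OCCURS 3 TIMES group is flattened into numbered copies (_1, _2, _3). The Python sketch below reproduces that arithmetic from a hand-coded layout; it does not parse COBOL and is not how cob2ddf works internally. It assumes PROCEDURE-CHARGE is a 13-digit packed-decimal item with two implied decimals, which is what a 7-byte MF_CMP3 field with IMPLIED_DECIMAL=2 can hold; `Field`, `comp3_size`, and `flatten` are illustrative names.

```python
# Illustrative only: recompute DDF positions for the CLIENT-RECORD layout.
# The layout is hand-coded (no COBOL parsing); names are hypothetical.
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    size: int          # size in bytes
    attrs: str = ""    # extra attributes copied into the /FIELD statement


def comp3_size(digits: int) -> int:
    # Packed decimal stores two digits per byte plus one half-byte for the sign.
    return digits // 2 + 1


# Assumption: PROCEDURE-CHARGE holds 13 digits, 2 of them after the implied decimal.
CHARGE = Field("PROCEDURE_CHARGE", comp3_size(13), "MF_CMP3, IMPLIED_DECIMAL=2")

# A tuple (times, [fields]) stands for an OCCURS group.
LAYOUT = [
    Field("PROCEDURE_CODE", 60),
    Field("CATEGORY", 5),
    Field("PROCEDURE", 10),
    (3, [Field("PROCEDURE_PART", 30), CHARGE]),
]


def flatten(layout):
    """Return (name, position, size, attrs) tuples and the record length."""
    out, pos = [], 1  # DDF positions are one-based
    for item in layout:
        if isinstance(item, tuple):
            times, group = item
            for i in range(1, times + 1):
                for f in group:
                    out.append((f"{f.name}_{i}", pos, f.size, f.attrs))
                    pos += f.size
        else:
            out.append((item.name, pos, item.size, item.attrs))
            pos += item.size
    return out, pos - 1


if __name__ == "__main__":
    fields, record_length = flatten(LAYOUT)
    for name, pos, size, attrs in fields:
        extra = f", {attrs}" if attrs else ""
        print(f"/FIELD=({name}, POSITION={pos}, SIZE={size}{extra})")
    print("record length:", record_length)
```

The computed record length of 186 bytes matches both the commented-out FILLER PIC X(186) in the copybook and the XFD's CLIENT-RECORD size.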
The data contains COBOL computational fields; these are the MF_CMP3 (packed-decimal, COMP-3) fields. They store two digits per byte, with the final half-byte holding the sign, so they are not human readable. To translate MF_CMP3 to a regular NUMERIC data type, make the target field twice the size of the original field, plus 1: a 7-byte MF_CMP3 field holds up to 13 digits and a sign, so 2 × 7 + 1 = 15 bytes leaves room for the digits, the sign, and the decimal point.
The IMPLIED_DECIMAL parameter means that the original data contains no decimal point, but one is understood (implied), with the given number of digits to its right. In the COBOL copybook, this number is the count of 9s to the right of the V in the field’s PICTURE clause (for example, V9(2) means two). In the XFD file, it comes from the fifth column of the field definition (the scale, where -02 means two implied decimal places).
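To make the MF_CMP3 sizing rule and IMPLIED_DECIMAL concrete, here is a minimal Python sketch that decodes a packed-decimal field and right-aligns it in a NUMERIC-style text field of 2 × size + 1 bytes. It illustrates the encoding only and is not IRI’s implementation; `unpack_comp3` and `to_numeric` are hypothetical names, and the sketch assumes the common sign conventions (D or B negative, anything else positive).

```python
# Sketch: decode a packed-decimal (COMP-3 / MF_CMP3) field. Not IRI code.
from decimal import Decimal


def unpack_comp3(data: bytes, implied_decimal: int) -> Decimal:
    nibbles = []
    for b in data:
        nibbles += [b >> 4, b & 0x0F]      # two half-bytes per byte
    sign = nibbles.pop()                   # the last half-byte is the sign
    if any(n > 9 for n in nibbles):
        raise ValueError("not a valid packed-decimal field")
    magnitude = int("".join(map(str, nibbles)))
    # IMPLIED_DECIMAL: shift the decimal point left; no '.' is stored in the data
    value = Decimal(magnitude).scaleb(-implied_decimal)
    return -value if sign in (0xB, 0xD) else value


def to_numeric(value: Decimal, source_size: int, precision: int) -> str:
    # Target width per the rule above: twice the packed size, plus 1
    width = 2 * source_size + 1
    text = f"{value:>{width}.{precision}f}"
    if len(text) > width:
        raise ValueError("value does not fit the target field")
    return text


if __name__ == "__main__":
    raw = bytes.fromhex("0000000012345D")  # 7 bytes: 13 digits + sign nibble D
    amount = unpack_comp3(raw, implied_decimal=2)
    print(amount)                          # -123.45
    print(repr(to_numeric(amount, 7, 2)))
```

The largest 13-digit negative value with two implied decimals, -99999999999.99, is exactly 15 characters long, which is why the doubling-plus-one rule is sufficient.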
Once your fields are defined, you can use them in NextForm or other programs to convert the fields to new data types and to remap them to other positions in readable, portable output files.
Once you have the .DDF layouts, you can then use the NextForm COBOL edition or CoSort’s SortCL program to convert the data to something human readable, with a job script (that either product can use) like this:
/INFILE=CLIENT-PROCEDURE.dat /PROCESS=VISION /FIELD=(PROCEDURE_CODE, POSITION=1, SIZE=60) /FIELD=(CATEGORY, POSITION=61, SIZE=5) /FIELD=(PROCEDURE, POSITION=66, SIZE=10) /FIELD=(PROCEDURE_PART_1, POSITION=76, SIZE=30) /FIELD=(PROCEDURE_CHARGE_1, POSITION=106, SIZE=7, MF_CMP3, IMPLIED_DECIMAL=2) /FIELD=(PROCEDURE_PART_2, POSITION=113, SIZE=30) /FIELD=(PROCEDURE_CHARGE_2, POSITION=143, SIZE=7, MF_CMP3, IMPLIED_DECIMAL=2 ) /FIELD=(PROCEDURE_PART_3, POSITION=150, SIZE=30) /FIELD=(PROCEDURE_CHARGE_3, POSITION=180, SIZE=7, MF_CMP3, IMPLIED_DECIMAL=2 ) /REPORT /OUTFILE= CLIENT-PROCEDURE-ASCII.dat /FIELD=(PROCEDURE_CODE, POSITION=1, SIZE=60) /FIELD=(CATEGORY, POSITION=61, SIZE=5) /FIELD=(PROCEDURE, POSITION=66, SIZE=10) /FIELD=(PROCEDURE_PART_1, POSITION=76, SIZE=30) /FIELD=(PROCEDURE_CHARGE_1, POSITION=106, SIZE=15, PRECISION=2, NUMERIC) /FIELD=(PROCEDURE_PART_2, POSITION=121, SIZE=30) /FIELD=(PROCEDURE_CHARGE_2, POSITION=151, SIZE=15, PRECISION=2, NUMERIC) /FIELD=(PROCEDURE_PART_3, POSITION=166, SIZE=30) /FIELD=(PROCEDURE_CHARGE_3, POSITION=196, SIZE=15, PRECISION=2, NUMERIC) /OUTFILE= CLIENT-PROCEDURE-FIXED.dat /LENGTH=186 /FIELD=(PROCEDURE_CODE, POSITION=1, SIZE=60) /FIELD=(CATEGORY, POSITION=61, SIZE=5) /FIELD=(PROCEDURE, POSITION=66, SIZE=10) /FIELD=(PROCEDURE_PART_1, POSITION=76, SIZE=30) /FIELD=(PROCEDURE_CHARGE_1, POSITION=106, SIZE=7, MF_CMP3) /FIELD=(PROCEDURE_PART_2, POSITION=113, SIZE=30) /FIELD=(PROCEDURE_CHARGE_2, POSITION=143, SIZE=7, MF_CMP3) /FIELD=(PROCEDURE_PART_3, POSITION=150, SIZE=30) /FIELD=(PROCEDURE_CHARGE_3, POSITION=180, SIZE=7, MF_CMP3)
Notice that there are two output files, each with its own metadata:
- The first converts the MF_CMP3 fields to regular NUMERIC, adjusts the field sizes and positions accordingly, and uses PRECISION to set the number of decimal places. Its records are terminated by a linefeed (LF) or carriage return/linefeed (CRLF).
- The second keeps the original field layout, but notice the /LENGTH statement. It means the records are not terminated by LF or CRLF; instead, the LENGTH value (186 bytes) determines where each record ends. This is the safer choice when records contain binary data types, such as the computationals, because their bytes can be mistaken for record terminators.
IRI job scripts like the one above can specify any number of data sources and targets. Instead of two output files, there could have been 200, mixing files, database tables, pipes, and/or procedures in one or more formats (including customized, detail, and summary reports).
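The point about /LENGTH can be seen with a few bytes. In packed decimal, a negative value whose last digit is 0 ends in the byte 0x0D, which is also a carriage return. The Python sketch below uses a hypothetical 10-byte record (a 7-byte packed field followed by 3 text bytes) to show that splitting such data on line terminators breaks records apart, while slicing by a fixed record length keeps them intact. `split_fixed` is an illustrative name, not an IRI function.

```python
# Why fixed-length records (/LENGTH) are safer for binary data. Illustrative.


def split_fixed(data: bytes, length: int) -> list:
    """Cut data into records of exactly `length` bytes (no terminators)."""
    if len(data) % length:
        raise ValueError("data size is not a multiple of the record length")
    return [data[i:i + length] for i in range(0, len(data), length)]


# Hypothetical record: packed -0.10 with 2 implied decimals (its last byte,
# 0x0D, is the same as b"\r"), followed by 3 text bytes
record = bytes.fromhex("0000000000010D") + b"XYZ"
data = record * 2  # two records written back to back, with no terminators

by_length = split_fixed(data, 10)
by_lines = data.splitlines()  # treats the 0x0D inside the packed field as a line break

if __name__ == "__main__":
    print(len(by_length), by_length == [record, record])  # 2 True
    print(len(by_lines), by_lines)                        # 3 pieces, records corrupted
```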
Another consideration is the source computer for the data. For some binary data types, if the source and destination computers differ in endianness (byte order), the same value will have a different hex representation on each. IRI software handles endianness at both the field and the file level. Contact nextform@iri.com if you have any questions.
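Here is a small illustration using Python’s standard `struct` module: the same 4-byte integer (a hypothetical binary field holding 186) has different hex representations in big-endian and little-endian order, and reading it with the wrong byte order produces a very different number. This only demonstrates the problem; it does not show IRI’s own endian settings.

```python
# Same value, two byte orders: why endianness matters for binary fields.
import struct

value = 186  # hypothetical contents of a 4-byte binary integer field

big_endian = struct.pack(">i", value)     # most significant byte first
little_endian = struct.pack("<i", value)  # least significant byte first

# Reading big-endian bytes as if they were little-endian gives the wrong number
misread = struct.unpack("<i", big_endian)[0]

if __name__ == "__main__":
    print(big_endian.hex(), little_endian.hex())  # 000000ba ba000000
    print(misread)
```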
1. Vision files use an index format developed for ACUCOBOL by Acucorp, which was later acquired by Micro Focus. Micro Focus has other COBOL index formats that IRI NextForm and CoSort SortCL can also process. The supported MF-ISAM formats include IDX3, IDX4, IDX8, ESDS, XESDS, and C-ISAM. IRI software also supports Micro Focus Variable Length (MFVL) long and short records, and many other data source formats.
2. If you need help creating a DDF from an XFD, or if you do not have either an XFD or a copybook, contact IRI for help. Without an XFD or copybook, you can use NextForm or SortCL to convert the Vision file ‘in the blind’ to another file format, and then define the fields with the metadata discovery wizard in the IRI Workbench. You will have to parse the fields by hand to assign names, offsets, and data types to them, at which point you may (if successful) have a new file and metadata repository to work with. But as you can imagine, especially with binary fields, this is much more difficult than working from an XFD or copybook, and arriving at accurate field definitions may remain impossible.