This is a generic file comparison program that compares the masked file with the original one and reports any exceptions pertaining to masking. Note that this program will be used for comparing fixed-length and CSV files only.
1. The program will compare sensitive data fields to ensure they have been successfully masked. It will read three inputs: the unmasked data file, the masked data file and a parameter control file. We need to reuse the control files that we are creating for masking the standard extract files.
The control file for fixed-length files has a fixed record length of 80 bytes. Each record gives the starting position, the ending position and, optionally, the type of one data field that needs to be masked. A sample control file looks like this:
41,60
61,70
110,119,DOB *Special +/- 30 days logic is required for DOB scrambling
200,219
260,275,PAN *Special logic is required for PAN
326,333
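To make the fixed-length control file concrete, here is a rough sketch of how its records could be parsed, written in Python purely to show the logic (SCRMCMPR itself would be COBOL). It assumes each record holds start,end[,type] with 1-based inclusive positions, and it treats anything after the first blank (such as the *Special notes above) as a comment. FixedField and parse_fixed_control are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional


class FixedField(NamedTuple):
    start: int             # 1-based starting byte position
    end: int               # 1-based ending byte position (inclusive)
    ftype: Optional[str]   # special-logic type such as "DOB" or "PAN", else None


def parse_fixed_control(records: Iterable[str]) -> List[FixedField]:
    """Parse 80-byte fixed-length control records of the form start,end[,type]."""
    fields: List[FixedField] = []
    for recno, raw in enumerate(records, start=1):
        text = raw.strip()  # also removes the padding up to 80 bytes
        # Assumption: blank records and records starting with '*' are ignored.
        if not text or text.startswith("*"):
            continue
        # Assumption: anything after the first blank is a free-text note.
        parts = text.split()[0].split(",")
        if len(parts) not in (2, 3):
            raise ValueError(f"control record {recno}: expected start,end[,type]: {text!r}")
        start, end = int(parts[0]), int(parts[1])
        if start < 1 or end < start:
            raise ValueError(f"control record {recno}: invalid range {start}-{end}")
        ftype = parts[2].upper() if len(parts) == 3 and parts[2] else None
        fields.append(FixedField(start, end, ftype))
    return fields
```

Validating the ranges while loading the control file means a bad specification stops the run before any data is compared.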
The control file for CSV files also has a fixed record length of 80 bytes. The first record holds the delimiter character. Each subsequent record contains the relative position (field number) and, optionally, the type of one data field that needs to be masked. A sample control file looks like this:
| *This is the delimiter character.
5
7
10,DOB *Special +/- 30 days logic is required for DOB scrambling
15
19,PAN *Special logic is required for PAN
25
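The CSV control file can be read in a similar way. This sketch assumes the delimiter is the first byte of the first record, that each later record holds number[,type] with 1-based field numbers, and that text after the first blank is a note. CsvField and parse_csv_control are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class CsvField(NamedTuple):
    number: int            # 1-based relative position of the field in the record
    ftype: Optional[str]   # special-logic type such as "DOB" or "PAN", else None


def parse_csv_control(records: Iterable[str]) -> Tuple[str, List[CsvField]]:
    """Return (delimiter, fields) from CSV-style control records."""
    recs = list(records)
    # Assumption: the delimiter is the first byte of the first record.
    if not recs or not recs[0]:
        raise ValueError("first control record must contain the delimiter")
    delimiter = recs[0][0]
    fields: List[CsvField] = []
    for recno, raw in enumerate(recs[1:], start=2):
        text = raw.strip()
        if not text or text.startswith("*"):
            continue
        # Assumption: anything after the first blank is a free-text note.
        parts = text.split()[0].split(",")
        if len(parts) not in (1, 2):
            raise ValueError(f"control record {recno}: expected number[,type]: {text!r}")
        number = int(parts[0])
        if number < 1:
            raise ValueError(f"control record {recno}: field number must be >= 1")
        ftype = parts[1].upper() if len(parts) == 2 and parts[1] else None
        fields.append(CsvField(number, ftype))
    return delimiter, fields
```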
Wherever special logic is required (for example for DOB or PAN), it has to be indicated against the field specification in the control file, as shown above.
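The requirements do not spell out what the comparison should check for a DOB, so the following is only one possible reading: it assumes the masked DOB must differ from the original, keep the same layout and fall within +/- 30 days of it. The default format %Y-%m-%d (10 bytes, matching the width of positions 110 to 119 in the fixed-length sample) is an assumption, as is the helper name check_dob. The PAN rule is not described, so no sketch is given for it.

```python
from datetime import datetime
from typing import Optional


def check_dob(original: str, masked: str,
              fmt: str = "%Y-%m-%d", window_days: int = 30) -> Optional[str]:
    """Return an exception message for a masked DOB, or None if it looks acceptable."""
    if masked == original:
        return "DOB did not change"
    # A different length means the layout of the field changed.
    if len(masked) != len(original):
        return "DOB format changed (length differs)"
    try:
        masked_date = datetime.strptime(masked, fmt).date()
    except ValueError:
        return "DOB format changed (not a valid date)"
    try:
        original_date = datetime.strptime(original, fmt).date()
    except ValueError:
        return None  # original is not a valid date, so the window cannot be checked
    # Assumption: the +/- 30 days scrambling rule keeps the new date within the window.
    if abs((masked_date - original_date).days) > window_days:
        return f"masked DOB is more than {window_days} days from the original"
    return None
```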
2. We need to pass a PARM to the comparison program. It tells the program what type of extract file is being processed and how to read the control file. The valid PARM values are FIXED and CSV.
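A minimal sketch of the PARM validation, in Python for illustration. In the COBOL module the PARM would arrive through the LINKAGE SECTION as a halfword length followed by the text; here it is simply a string. Upper-casing the value and the name validate_parm are assumptions.

```python
VALID_PARMS = ("FIXED", "CSV")


def validate_parm(parm: str) -> str:
    """Normalise the PARM value and reject anything other than FIXED or CSV."""
    value = parm.strip().upper()  # assumption: case and surrounding blanks are not significant
    if value not in VALID_PARMS:
        raise ValueError(f"invalid PARM {parm!r}: expected one of {', '.join(VALID_PARMS)}")
    return value
```

Rejecting an unknown PARM up front avoids reading the control file with the wrong layout.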
3. The COBOL compare module SCRMCMPR should do the following:
a. Read the PARM and decide what type of input extract file and control file it needs to process.
b. Read the control file and populate the list of fields to be masked:
i. for a fixed-length file, the start position, end position and type (if any) of each field;
ii. for a CSV file, the field number and type (if any) of each field.
c. Read the input extract file and the masked file, and for each pair of records:
i. Compare the original and masked values of the fields identified in b.
ii. Report exceptions, i.e. where a value did not change or where its format changed. Each exception should include the record number.
d. Repeat c until all records in the input extract and masked files have been processed.
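Steps c and d could look roughly like the sketch below, in Python for illustration only. It assumes records are read in step, that "format changed" means the pattern of digits and letters differs (the shape helper is a hypothetical stand-in for whatever format rule you actually need), and that a difference in record counts should stop the run. extract_fixed, extract_csv and compare_files are hypothetical names.

```python
import csv
from itertools import zip_longest
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def extract_fixed(record: str, ranges: Sequence[Tuple[int, int]]) -> List[str]:
    # Control-file positions are 1-based and inclusive: start..end -> [start-1:end].
    return [record[start - 1:end] for start, end in ranges]


def extract_csv(record: str, delimiter: str, numbers: Sequence[int]) -> List[Optional[str]]:
    # csv.reader copes with quoted values; a missing field comes back as None.
    values = next(csv.reader([record], delimiter=delimiter), [])
    return [values[n - 1] if n <= len(values) else None for n in numbers]


def shape(value: str) -> str:
    # Hypothetical "format" signature: digits -> 9, letters -> A, others unchanged.
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def compare_files(original: Iterable[str], masked: Iterable[str],
                  extract: Callable[[str], List[Optional[str]]],
                  labels: Sequence[str]) -> List[Tuple[int, str]]:
    """Return (record number, message) for every masking exception found."""
    report: List[Tuple[int, str]] = []
    for recno, (orig_rec, mask_rec) in enumerate(zip_longest(original, masked), start=1):
        if orig_rec is None or mask_rec is None:
            report.append((recno, "record counts differ between the two files"))
            break
        orig_vals = extract(orig_rec.rstrip("\r\n"))
        mask_vals = extract(mask_rec.rstrip("\r\n"))
        for label, o, m in zip(labels, orig_vals, mask_vals):
            if o is None or m is None:
                report.append((recno, f"{label}: field missing"))
            elif o == m:
                report.append((recno, f"{label}: value did not change"))
            elif shape(o) != shape(m):
                report.append((recno, f"{label}: format changed"))
    return report
```

For the fixed-length case you would pass something like lambda rec: extract_fixed(rec, ranges) with the start and end pairs from the control file, and for CSV lambda rec: extract_csv(rec, delimiter, numbers). Fields flagged as DOB or PAN would get their extra check on top of this. In SCRMCMPR itself the fixed-length slice maps naturally onto reference modification, WS-REC(start:end - start + 1) with WS-REC being a hypothetical record area, and the CSV split onto UNSTRING ... DELIMITED BY the delimiter; note that csv.reader also honours double-quoted values, which a plain UNSTRING does not.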
Can someone please shed some light on how to approach this?