IBM Mainframe Forum

by **avinashmusic** » Thu Dec 26, 2013 12:35 pm

This is a generic file comparison program that compares the masked file with the original one and reports any exceptions related to masking. Note that this program will be used only for comparing fixed-length and CSV files.

1. The program will be used to compare sensitive data fields to ensure they have been successfully masked. It will read the unmasked data file, the masked data file and a parameter control file as input. We need to reuse the control files that we are creating for data masking of the standard extract files.

The control file for fixed-length files is a fixed 80-byte record length file. In this file you specify the starting position, the ending position and (optionally) the type of each data field you need to mask. A sample control file looks like this:
41,60
61,70
110,119,DOB *Special +/- 30 days logic is required for DOB scrambling
200,219
260,275,PAN *Special logic is required for PAN
326,333
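As a minimal sketch of how these fixed-length control records could be read (written in Python for brevity rather than COBOL, so it only illustrates the logic), the code below parses each `start,end[,TYPE]` record into a field definition. It assumes positions are 1-based and inclusive and that anything after `*` is a comment, as in the sample above; `FixedField`, `parse_fixed_control` and `extract` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class FixedField(NamedTuple):
    start: int            # 1-based starting position, as in the control file
    end: int              # 1-based ending position (inclusive)
    ftype: Optional[str]  # optional special type such as "DOB" or "PAN"


def parse_fixed_control(lines: List[str]) -> List[FixedField]:
    """Parse 80-byte control records of the form 'start,end[,TYPE]'."""
    fields = []
    for raw in lines:
        text = raw.split("*", 1)[0].strip()   # assumption: '*' starts a comment
        if not text:
            continue                          # skip blank records
        parts = [p.strip() for p in text.split(",")]
        start, end = int(parts[0]), int(parts[1])
        if not 1 <= start <= end:
            raise ValueError(f"bad field range in control record: {raw!r}")
        ftype = parts[2].upper() if len(parts) > 2 and parts[2] else None
        fields.append(FixedField(start, end, ftype))
    return fields


def extract(record: str, field: FixedField) -> str:
    """Return the field's bytes from a record (positions 1-based, inclusive)."""
    return record[field.start - 1:field.end]
```

For example, `parse_fixed_control(["41,60", "110,119,DOB"])` yields two definitions, the second one typed `DOB`.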
The control file for CSV files is also a fixed 80-byte record length file. Its first record specifies the delimiter character. Each subsequent record contains the relative position (field number) and, optionally, the type of each data field you want to mask. A sample control file looks like this:
| *This is the delimiter character.
5
7
10,DOB *Special +/- 30 days logic is required for DOB scrambling
15
19,PAN *Special logic is required for PAN
25
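A comparable sketch for the CSV control file, under the same assumptions (Python, `*` starts a comment, hypothetical names `CsvField`, `parse_csv_control` and `extract_csv`): the delimiter is taken as the first character of record 1 and field numbers are 1-based. Python's standard `csv` module splits the data record so that quoted fields are handled.

```python
import csv
from typing import List, NamedTuple, Optional, Tuple


class CsvField(NamedTuple):
    number: int           # 1-based field number within the delimited record
    ftype: Optional[str]  # optional special type such as "DOB" or "PAN"


def parse_csv_control(lines: List[str]) -> Tuple[str, List[CsvField]]:
    """Record 1 holds the delimiter; later records hold 'number[,TYPE]'."""
    # Assumption: the delimiter is the first byte of record 1 and the rest
    # of that record (e.g. '*This is the delimiter character.') is a comment.
    delimiter = lines[0][0]
    fields = []
    for raw in lines[1:]:
        text = raw.split("*", 1)[0].strip()   # assumption: '*' starts a comment
        if not text:
            continue                          # skip blank 80-byte records
        parts = [p.strip() for p in text.split(",")]
        ftype = parts[1].upper() if len(parts) > 1 and parts[1] else None
        fields.append(CsvField(int(parts[0]), ftype))
    return delimiter, fields


def extract_csv(record: str, delimiter: str, field: CsvField) -> str:
    """Return the field's value, or '' if the record has too few fields."""
    values = next(csv.reader([record], delimiter=delimiter), [])
    return values[field.number - 1] if field.number <= len(values) else ""
```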
Wherever special logic is required, such as for DOB or PAN, the field type has to be specified against the field entry in the control file, as shown above.
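The +/- 30 days DOB rule could be checked along these lines. This is a sketch only: it assumes the DOB is stored as `YYYY-MM-DD` (which fits the 10-byte field at positions 110 to 119) and treats a shift of 0 days as "not masked"; `check_dob` is a hypothetical name, and the format string would need to match the real extract layout.

```python
from datetime import datetime


def check_dob(original: str, masked: str, fmt: str = "%Y-%m-%d") -> str:
    """Return '' if the masked DOB passes, else a short reason.

    Assumption: DOBs are stored as text in `fmt`; change it to match the
    real extract layout.
    """
    try:
        orig_d = datetime.strptime(original.strip(), fmt).date()
    except ValueError:
        return "original DOB is not a valid date"
    try:
        mask_d = datetime.strptime(masked.strip(), fmt).date()
    except ValueError:
        return "masked DOB is not a valid date"
    if mask_d.strftime(fmt) != masked.strip():
        return "masked DOB format changed"      # e.g. '1980-2-5' vs '1980-02-05'
    shift = abs((mask_d - orig_d).days)
    if shift == 0:
        return "DOB not masked"
    if shift > 30:
        return f"DOB shifted {shift} days (limit is 30)"
    return ""
```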

2. We need to pass a PARM value to the comparison program. It tells the program what type of extract file is being processed and how to read the control file. The valid PARM values are FIXED and CSV.
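On z/OS the PARM would typically be coded on the EXEC statement, e.g. `//STEP1 EXEC PGM=SCRMCMPR,PARM='FIXED'`, and the COBOL program receives it in its LINKAGE SECTION as a halfword length followed by the text. The Python sketch below only shows the validation step, with a command-line argument standing in for the PARM; `read_parm` is a hypothetical name.

```python
from typing import List

VALID_PARMS = ("FIXED", "CSV")


def read_parm(argv: List[str]) -> str:
    """Return the PARM (FIXED or CSV), upper-cased, or stop the run."""
    if len(argv) < 2:
        raise SystemExit("PARM missing: expected FIXED or CSV")
    parm = argv[1].strip().upper()
    if parm not in VALID_PARMS:
        raise SystemExit(f"invalid PARM {argv[1]!r}: expected FIXED or CSV")
    return parm
```

It would be called as `read_parm(sys.argv)`, and the returned value decides whether the fixed-length or the CSV control-file logic is used.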

3. The COBOL compare module SCRMCMPR should do the following:

a. Read the PARM and decide which type of input extract file and control file it needs to process.
b. Read the control file and populate the list of fields to be masked:
i. For a fixed-length file, the start position, end position and type of each field
ii. For a CSV file, the field number and type of each field
c. Read the input extract file and the masked file and do the following:
i. Compare the original and masked values of the fields identified in b.
ii. Report exceptions, i.e. where a value did not change or its format changed, including the record number for each exception.
d. Repeat c. until all records in the input extract and masked files are processed.
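Steps c and d could be sketched as the loop below. It assumes the masking program keeps the record order and record count unchanged (it overlays positions in place), so record N of the extract is paired with record N of the masked file; if that does not hold, a key-based match/merge is needed instead. The `extract` and `check` callables are hypothetical hooks that would be built from the control file and the per-field rules.

```python
from itertools import zip_longest
from typing import Callable, Iterable, List, Optional, Tuple

# Returns the sensitive-field values of one record (built from the control file).
Extractor = Callable[[str], List[str]]
# Returns a reason string if a masked value is wrong, or None if it is fine.
Checker = Callable[[str, str], Optional[str]]


def compare_files(original: Iterable[str], masked: Iterable[str],
                  extract: Extractor, check: Checker
                  ) -> List[Tuple[int, int, str]]:
    """Pair record N of each file and check every sensitive field.

    Returns (record number, field number, reason) for each exception;
    field number 0 marks a record-level problem.
    """
    exceptions = []
    pairs = zip_longest(original, masked)       # None pads the shorter file
    for recno, (orig, mask) in enumerate(pairs, start=1):
        if orig is None or mask is None:
            exceptions.append((recno, 0, "record count differs"))
            break                               # files are out of step
        orig_vals, mask_vals = extract(orig), extract(mask)
        for fieldno, (o, m) in enumerate(zip(orig_vals, mask_vals), start=1):
            reason = check(o, m)
            if reason:
                exceptions.append((recno, fieldno, reason))
    return exceptions
```

For a fixed-length file, `extract` could slice each `(start, end)` pair from the control file, and `check` could return a reason whenever the original and masked values are equal or differ in format.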

Can someone please throw some light?

by **enrico-sorichetti** » Thu Dec 26, 2013 2:11 pm

Can someone please throw some light?

Frankly, it is not clear at all what you are asking.

by **avinashmusic** » Thu Dec 26, 2013 2:30 pm

Hi Enrico,

Thanks for getting back to me. I think explaining why I am writing the program will clear the air.

We are part of a data masking team, wherein we copy data from production to our test environment. Our production data contains sensitive fields that have to be masked; for example, a customer's PAN in production has to be replaced with a random value to keep it secret. So, we have written a program to mask such fields.
I am writing this program to validate the masking. For example:

There is an original file which is an extract of a table. We have identified that a few of the table fields are sensitive, and we give the starting and ending position of each such field to the masking program. It reads the file, overlays those positions with random data and writes the result to a masked file.

Now, the program I am going to write will compare the masked file with the original file to validate the masking.
E.g., if there is a PAN in the original file from position 41 to position 50, then I have to read the masked file, check that the masked value is not equal to the value in the original file, and also check whether the format of that field has changed. Hope this helps.
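One way to make "format change" concrete for a field such as the PAN at positions 41 to 50 is to compare character-class signatures: each digit becomes `9`, each upper-case letter `A`, each lower-case letter `a`, and spaces and punctuation stay as they are. This definition of format is an assumption (it treats a change of case as a format change), and `shape` and `check_field` are hypothetical names.

```python
from typing import List


def shape(value: str) -> str:
    """Character-class signature of a value, used here as its 'format'.

    Digits -> '9', upper-case letters -> 'A', lower-case letters -> 'a';
    spaces and punctuation are kept, so their positions must not move.
    """
    out = []
    for c in value:
        if c.isdigit():
            out.append("9")
        elif c.isupper():
            out.append("A")
        elif c.islower():
            out.append("a")
        else:
            out.append(c)
    return "".join(out)


def check_field(orig_rec: str, mask_rec: str, start: int, end: int) -> List[str]:
    """Check one field given 1-based inclusive positions (e.g. 41 to 50)."""
    o, m = orig_rec[start - 1:end], mask_rec[start - 1:end]
    problems = []
    if o == m:
        problems.append("value not changed")
    elif shape(o) != shape(m):
        problems.append("format changed")
    return problems
```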

by **NicC** » Thu Dec 26, 2013 5:38 pm

Well, you still have not asked a question or indicated where you are having a problem.

by **dick scherrer** » Sat Dec 28, 2013 5:22 am

Hello,

If any of the fields to be masked are all (or part of) the sort key(s), your task will be more difficult.

To do what you want, write a two-file match/merge so that the code keeps the two files positioned in step.

Additionally, create an array holding the definition of the sensitive and non-sensitive data (length and displacement within the record). An external file loaded at run time would be easy to implement.

When records match (on the key(s)), compare them first on the common fields and then on the sensitive fields, so that the code can flag anything that is invalid.
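The two-file match/merge suggested above could look roughly like this in Python. Both files must be sorted on the same key, and the `key` function must use only positions that the masking leaves untouched, which is why masked sort keys make the task harder. `match_merge` and `compare_matched` are hypothetical names, and field positions are taken as 1-based and inclusive.

```python
from typing import Callable, Iterator, List, Tuple

KeyFn = Callable[[str], str]


def match_merge(original: List[str], masked: List[str],
                key: KeyFn) -> Iterator[Tuple[str, str, str]]:
    """Two-file match/merge; both lists must already be sorted on `key`.

    Yields ('MATCH', orig, mask), ('ORIG-ONLY', orig, '') or
    ('MASK-ONLY', '', mask).
    """
    i = j = 0
    while i < len(original) or j < len(masked):
        orig_left, mask_left = i < len(original), j < len(masked)
        if orig_left and (not mask_left or key(original[i]) < key(masked[j])):
            yield ("ORIG-ONLY", original[i], "")
            i += 1
        elif mask_left and (not orig_left or key(masked[j]) < key(original[i])):
            yield ("MASK-ONLY", "", masked[j])
            j += 1
        else:                                   # equal keys: a matched pair
            yield ("MATCH", original[i], masked[j])
            i += 1
            j += 1


def compare_matched(orig: str, mask: str,
                    common: List[Tuple[int, int]],
                    sensitive: List[Tuple[int, int]]) -> List[str]:
    """Common fields must be identical; sensitive fields must differ."""
    problems = []
    for s, e in common:
        if orig[s - 1:e] != mask[s - 1:e]:
            problems.append(f"common field {s}-{e} changed")
    for s, e in sensitive:
        if orig[s - 1:e] == mask[s - 1:e]:
            problems.append(f"sensitive field {s}-{e} not masked")
    return problems
```

Unmatched records (`ORIG-ONLY` or `MASK-ONLY`) would themselves be reported as exceptions, since the masking should not add or drop records.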

If the data volume is large, expect this to use a high amount of CPU time in addition to the I/O needed.

COBOL file compare module
