Hello Billy,
a. The number of records in DS-A is 30 million and up (these are received from an external system, so we have no control over the record count or the record-build logic); DS-B has 23k records, each of which is a key-combination record.
b. The mainframe is accessed via a remote server, so I am unable to copy the data as-is; for representation, the relevant parts of the data look something like this:
Dataset-A (the one from which records are to be dropped)
<xml informational tag><identifier-1>data</identifier-1><identifier-2>data</identifier-2><identifier-3><identifier-4><identifier-5>data</identifier-5><key-1>data</key-1></identifier-4><identifier-6>data</identifier-6><identifier-7>data</identifier-7><key-2>data</key-2></identifier-3><identifier-8><identifier-9><identifier-10>data</identifier-10><key-1>data</key-1></identifier-9><identifier-11>data</identifier-11><identifier-12>data</identifier-12><key-3>data</key-3></identifier-8>................</xml informational tag>
<xml informational tag><identifier-1>data</identifier-1><identifier-2>data</identifier-2><identifier-3><identifier-4><identifier-5>data</identifier-5><key-1>data</key-1></identifier-4><identifier-6>data</identifier-6><identifier-7>data</identifier-7><key-4>data</key-4></identifier-3><identifier-8><identifier-9><identifier-10>data</identifier-10><key-1>data</key-1></identifier-9><identifier-11>data</identifier-11><identifier-12>data</identifier-12><key-5>data</key-5></identifier-8>.......</xml informational tag>
<xml informational tag><identifier-1>data</identifier-1><identifier-2>data</identifier-2><identifier-3><identifier-4><identifier-5>data</identifier-5><key-1>data</key-1></identifier-4><identifier-6>data</identifier-6><identifier-7>data</identifier-7><key-6>data</key-6></identifier-3><identifier-8><identifier-9><identifier-10>data</identifier-10><key-1>data</key-1></identifier-9><identifier-11>data</identifier-11><identifier-12>data</identifier-12><key-7>data</key-7></identifier-8>.......</xml informational tag>
...
...
Dataset-B (the keys that are to be dropped)
key-2 - 16 bytes (3 bytes space) key-1 - 8 bytes
key-4 - 16 bytes (3 bytes space) key-1 - 8 bytes
key-6 - 16 bytes (3 bytes space) key-1 - 8 bytes
...
...
In the above samples, records in DS-A can vary in length and in key combinations. Some records do contain errors, meaning they might have key-1 or might have a different value altogether in place of key-1; the position of key-1 is also not fixed. The representational data shows a simpler record than the real one.
Duplicate records are possible; repeated key combinations on the same record are possible; repeated key combinations on different records are possible. In simple terms, the data is virtually free-form, with key combinations placed here and there.
The aim is to drop the entire XML record from DS-A if a combination of key-1 and key-n given in DS-B is present anywhere in that record; if both values are present in the same record, drop it, regardless of where the values sit in the record. If none of the DS-B combinations is found in a DS-A record, copy that record as-is from DS-A to the output dataset.
So basically, if one were to write a COBOL program for this, and had the logic for parsing the data in place, it would be a simple:
    03  WS-KEY-COMBINATIONS        PIC X(27) VALUE SPACES.
        88  88-KEY-COMBINATION     VALUE
            'key-2 - 16 bytes (3 bytes space) key-1 - 8 bytes'
            'key-4 - 16 bytes (3 bytes space) key-1 - 8 bytes'
            'key-6 - 16 bytes (3 bytes space) key-1 - 8 bytes'
---
---
.
---
---
    IF NOT 88-KEY-COMBINATION
        MOVE DS-A-RECORD TO OUTPUT-RECORD
    END-IF
---
---
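To make the intended drop rule concrete, here is a minimal sketch in Python. It is only an illustration of the logic, not a mainframe solution. It assumes the DS-B layout shown above (key-n in 16 bytes, 3 spaces, key-1 in 8 bytes) and, as a simplification, that key values appear as complete element text between '>' and '<'. The names load_pairs, should_drop and filter_records are made up for this example.

```python
import re

# Assumed DS-B layout, per the sample above:
# key-n (16 bytes), 3 bytes of spaces, key-1 (8 bytes).
KEY_N_LEN, GAP_LEN, KEY_1_LEN = 16, 3, 8

# Assumption: key values appear as complete element text, between '>' and '<'.
VALUE_RE = re.compile(r">([^<>]+)<")


def load_pairs(ds_b_lines):
    """Index DS-B as {key-n value: set of key-1 values paired with it}."""
    pairs = {}
    for line in ds_b_lines:
        line = line.rstrip("\r\n")
        key_n = line[:KEY_N_LEN].strip()
        start = KEY_N_LEN + GAP_LEN
        key_1 = line[start:start + KEY_1_LEN].strip()
        if key_n and key_1:
            pairs.setdefault(key_n, set()).add(key_1)
    return pairs


def should_drop(record, pairs):
    """True if any DS-B (key-n, key-1) pair has both values in this record."""
    values = {v.strip() for v in VALUE_RE.findall(record)}
    return any(v in pairs and not pairs[v].isdisjoint(values) for v in values)


def filter_records(ds_a_lines, pairs):
    """Yield only the DS-A records that match no DS-B combination."""
    for record in ds_a_lines:
        if not should_drop(record, pairs):
            yield record
```

Because DS-B is small (23k records), it is held in memory as a lookup, while DS-A is read one record at a time, so the 30 million records never need to be loaded together. Position within the record does not matter here, since the values of each record are collected into a set before the lookup.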
I really hope I was able to state the requirement clearly; any pointers to simplify the approach would be really helpful.
Thank you.