IBM Mainframe Forum

by **Dmitriy** » Wed May 24, 2017 6:25 pm

Hello colleagues!
I need to add a sequence number to each group of records, like unique word IDs...

input:

output:

630774221 0001
630774221 0001
963495850 0002
963495850 0002
963495850 0002
345695561 0003
678609548 0004
678609548 0004
678609548 0004
918367402 0005
279702180 0006
 

The file contains billions of records, so how can I do this with maximum performance?
Can you help me, please? Thanks in advance!

by **Aki88** » Mon May 29, 2017 11:29 am

Hello,

A few questions before we look at the solution:
a. You do not want the records to be sorted while adding the ID? The output you've shown retains the original order of records.
b. Is it possible for a group's key to appear again somewhere further down the file? If so, how do you want that handled? For example:

630774221 
630774221 
963495850 
963495850 
963495850 
345695561 
678609548 
678609548 
678609548 
630774221 --> here this appears again
630774221 --> here this appears again  
918367402 
279702180 
 

c. You've mentioned that there can be billions of records in the input, but the identifier you've shown is only 4 bytes long, which means it can accommodate a maximum of 9999 unique identifiers.

The solution is fairly straightforward as long as the aforementioned complexities do not apply; you need to group the records and PUSH an ID onto every record of each group. DFSORT allows a zoned decimal ID of up to 15 bytes to be pushed, which means the maximum value is 999,999,999,999,999:

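//* Step name STEP01 is illustrative; add a JOB card to suit your site.
//STEP01   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD *
630774221
630774221
963495850
963495850
963495850
345695561
678609548
678609548
678609548
918367402
279702180
/*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
* Copy the records in their original order (no sorting).
 SORT FIELDS=COPY
* Start a new group whenever the key in columns 1-9 changes and
* push a 15-digit group ID into columns 11-25 of each record.
 INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,9),
                          PUSH=(11:ID=15))
/*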
 

Output:

630774221 000000000000001
630774221 000000000000001
963495850 000000000000002
963495850 000000000000002
963495850 000000000000002
345695561 000000000000003
678609548 000000000000004
678609548 000000000000004
678609548 000000000000004
918367402 000000000000005
279702180 000000000000006
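
A note on question (b): KEYBEGIN starts a new group every time the value in columns 1-9 changes, so with the COPY above a key that reappears later in the file would get a new ID rather than its earlier one. If reappearing keys must share one ID and the original record order does not need to be kept, one possible variation is to sort on the key first and do the grouping in OUTREC, which is applied after the sort. This is only a sketch under those assumptions; sorting billions of records will cost considerably more than a plain copy.

```jcl
//SYSIN    DD *
* Sketch only: assumes reordering the file by key is acceptable.
* EQUALS keeps records with equal keys in their input order.
  OPTION EQUALS
  SORT FIELDS=(1,9,CH,A)
* After the sort all records of a key are adjacent, so each key
* forms exactly one group and receives exactly one ID.
  OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,9),
                           PUSH=(11:ID=15))
/*
```

Note that the IDs then follow the sorted key order, not the order in which the keys first appear in the input.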
 

by **enrico-sorichetti** » Mon May 29, 2017 11:40 am

the number of records is NOT related to the number of groups/identifiers

by **Aki88** » Mon May 29, 2017 11:53 am

Hello Mr. Sorichetti,

enrico-sorichetti wrote: the number of records is NOT related to the number of groups/identifiers

Yes, I completely agree; but going by the representative data, there are certain records which have only one entry (instead of paired/grouped entries).
Hence the SORT card as written allows for the maximum possible number of groups; the TS is expected to tweak it to fit his needs.
I'd be very, very surprised if ONLY 9999 groups were possible in the actual 'billions of records'.
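
As an illustration of the kind of tweak meant here: if the 4-digit format from the original post is really wanted and the number of groups is known not to exceed 9999, the ID length in PUSH could be reduced. A minimal sketch, assuming the same 9-byte key in columns 1-9:

```jcl
//SYSIN    DD *
* Sketch: same grouping as before, but with a 4-digit ID
* in columns 11-14, as in the requested output.
  SORT FIELDS=COPY
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,9),
                           PUSH=(11:ID=4))
/*
```

With the sample input above this should give IDs 0001 through 0006 in columns 11-14. For more groups, choose a larger ID length, up to 15.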

Best regards.

by **prino** » Mon May 29, 2017 12:56 pm

Dmitriy wrote: ... and file contains billions of records ...

And if my uncle was a woman he'd be my aunt...

Which PHB has come up with this ludicrous time-wasting requirement?

How to add numbers to groups of records...