Removing duplicates from sorted dataset when using JOINKEYS

Post by aaa » Wed May 11, 2011 8:12 pm

I'm using JOINKEYS to keep every record from one dataset (F1) that has a matching record in another (F2). I'd like to remove duplicate keys from F2 so that duplicate F1 records are not generated in the output file. Since F2 is already sorted, I have specified the SORTED option for F2, and I'm using a SUM FIELDS=NONE statement to remove the duplicates before the join phase.

DFSORT selects MERGE for the F2 subtask, but it seems to ignore the SUM FIELDS=NONE statement. If I don't use the SORTED option it behaves as expected, but I'd like to avoid sorting an already sorted dataset. Is there a way to remove duplicates from F2 without using another job step?

Job step
Code:
//STEP010  EXEC PGM=SORT                     
//SYSOUT   DD SYSOUT=*                       
//MASTER   DD *                             
1AAAAAAAAA
5BBBBBBBBB
3CCCCCCCCC
4DDDDDDDDD
//PULL     DD *                             
3                                           
5                                           
5                                           
5                                           
7                                           
//SORTOUT  DD SYSOUT=*                       
//DFSPARM  DD *                             
* CONTROL STATEMENTS FOR JOINKEYS APPLICATION
  JOINKEYS F1=MASTER,FIELDS=(1,1,A)         
  JOINKEYS F2=PULL,FIELDS=(1,1,A),SORTED     
  REFORMAT FIELDS=(F1:1,10)                 
* CONTROL STATEMENT FOR MAIN TASK           
  OPTION COPY                               
//JNF2CNTL DD *                             
* CONTROL STATEMENT FOR SUBTASK2 (F2)               
 SUM FIELDS=NONE                             
/*                                           
//*           


Actual OUTPUT
Code:
3CCCCCCCCC
5BBBBBBBBB
5BBBBBBBBB
5BBBBBBBBB


Expected OUTPUT
Code:
3CCCCCCCCC
5BBBBBBBBB
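
The duplicated output follows from how a join pairs records: every F1 record is paired with each F2 record that carries the same key, so three F2 records with key 5 yield three copies of the matching F1 record. A minimal sketch in Python (not DFSORT; `join_paired` is a hypothetical helper written only for this illustration) reproduces the actual output above from the same sample data:

```python
# Rough model of a join on the first byte of each record.
# Plain Python, not DFSORT; join_paired is a hypothetical helper.

def join_paired(f1_records, f2_records, key=lambda rec: rec[0]):
    """Return one F1 record for every (F1, F2) pair with equal keys."""
    out = []
    # F1 is not marked SORTED in the job, so it is put in key order first.
    for r1 in sorted(f1_records, key=key):
        for r2 in f2_records:
            if key(r1) == key(r2):
                out.append(r1)  # only the F1 record is kept, as in REFORMAT FIELDS=(F1:1,10)
    return out

master = ["1AAAAAAAAA", "5BBBBBBBBB", "3CCCCCCCCC", "4DDDDDDDDD"]
pull = ["3", "5", "5", "5", "7"]

joined = join_paired(master, pull)
print("\n".join(joined))
```

The record with key 5 appears three times because PULL holds three records with that key, which is why the duplicates have to be dealt with on the F2 side or filtered after the join.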

Re: Removing duplicates from sorted dataset when using JOINK

Post by aaa » Wed May 11, 2011 8:25 pm

I just read this in the DFSORT manual:

If you use the SORTED operand, statements and options only available for a sort application, such as SUM, will be ignored for the subtask that copies the input file.


It looks like it's not possible to remove duplicates without sorting the dataset. But since the subtask is running a MERGE application, shouldn't the SUM statement be supported?

Re: Removing duplicates from sorted dataset when using JOINK

Post by skolusu » Wed May 11, 2011 9:21 pm

aaa,

When you use the SORTED operand, DFSORT simply copies the records from the file. The SUM statement does not apply to a COPY operation, because there is no key to compare in order to remove the duplicates. Use the following DFSORT control cards, which will give you the desired results:

Code:
//DFSPARM  DD *
* CONTROL STATEMENTS FOR JOINKEYS APPLICATION
  JOINKEYS F1=MASTER,FIELDS=(1,1,A)
  JOINKEYS F2=PULL,FIELDS=(1,1,A),SORTED
* CARRY THE F2 SEQUENCE NUMBER (F2:3,5) INTO POSITIONS 11-15
  REFORMAT FIELDS=(F1:1,10,F2:3,5)
* CONTROL STATEMENT FOR MAIN TASK
  OPTION COPY
* KEEP ONLY RECORDS WITH SEQUENCE NUMBER 1, THEN DROP THE NUMBER
  OUTFIL BUILD=(1,10),INCLUDE=(11,5,ZD,EQ,1)
//JNF2CNTL DD *
* CONTROL STATEMENT FOR SUBTASK2 (F2)
* NUMBER THE F2 RECORDS, RESTARTING AT 1 WHEN THE KEY (1,1) CHANGES
  INREC OVERLAY=(3:SEQNUM,5,ZD,RESTART=(1,1))
/*
//*
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
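
As a rough illustration of the logic behind the control cards in this reply, the sketch below models the same three steps in Python rather than DFSORT: number the F2 records within each key (the role of SEQNUM with RESTART), join on the key, and keep only the pairs whose number is 1 (the role of the OUTFIL INCLUDE). The helper names are hypothetical, and the sketch assumes F2 is already in key order, as the SORTED operand implies.

```python
from itertools import groupby

def number_within_key(records, key=lambda rec: rec[0]):
    """Pair each record with a counter that restarts at 1 when the key changes.

    Assumes the records are already in key order.
    """
    numbered = []
    for _, group in groupby(records, key=key):
        for seq, rec in enumerate(group, start=1):
            numbered.append((rec, seq))
    return numbered

def join_first_only(f1_records, f2_records, key=lambda rec: rec[0]):
    """Join F1 to F2 on the key, keeping only pairs where the F2 number is 1."""
    f2_numbered = number_within_key(f2_records, key)
    out = []
    for r1 in sorted(f1_records, key=key):
        for r2, seq in f2_numbered:
            if key(r1) == key(r2) and seq == 1:
                out.append(r1)  # the sequence number itself is dropped from the output
    return out

master = ["1AAAAAAAAA", "5BBBBBBBBB", "3CCCCCCCCC", "4DDDDDDDDD"]
pull = ["3", "5", "5", "5", "7"]

result = join_first_only(master, pull)
print("\n".join(result))
```

Only the first F2 record of each key passes the filter, so each matching F1 record is written once and F2 is never sorted.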