ICETOOL - Matching with duplicates




Post by pulcinella » Fri May 09, 2008 4:39 pm

Hi,

I need to compare two files to produce two output files. The key for both files is the first 8 positions.

file 1 (23 positions)

uuuuuuuu AAAAAAAA 001 A
uuuuuuuu AAAAAAAA 002 T
uuuuuuuu AAAAAAAA 003 G
uuuuuuuu AAAAAAAA 004 A
uuuuuuuu AAAAAAAA 005 A
vvvvvvvv BBBBBBBB 001 B
vvvvvvvv BBBBBBBB 003 B
vvvvvvvv BBBBBBBB 004 B
vvvvvvvv BBBBBBBB 006 Y
xxxxxxxx AAAAAAAA 001 A
xxxxxxxx AAAAAAAA 006 B
xxxxxxxx AAAAAAAA 008 C
xxxxxxxx AAAAAAAA 009 D
xxxxxxxx AAAAAAAA 010 E
yyyyyyyy CCCCCCCC 001 F
yyyyyyyy CCCCCCCC 002 A
yyyyyyyy CCCCCCCC 003 R
yyyyyyyy CCCCCCCC 004 Y
yyyyyyyy CCCCCCCC 005 E
yyyyyyyy CCCCCCCC 006 B
zzzzzzzz AAAAAAAA 001 A
zzzzzzzz AAAAAAAA 002 A
zzzzzzzz AAAAAAAA 005 A
zzzzzzzz AAAAAAAA 006 A
zzzzzzzz AAAAAAAA 007 A
zzzzzzzz AAAAAAAA 008 B

file 2 (17 positions)

uuuuuuuu A CCCCCC
vvvvvvvv B RRRRRR
xxxxxxxx G HHHHHH
yyyyyyyy E DDDDDD
zzzzzzzz A AAAAAA

output file 1 (29 positions)

uuuuuuuu AAAAAAAA 001 A CCCCCC
uuuuuuuu AAAAAAAA 004 A CCCCCC
uuuuuuuu AAAAAAAA 005 A CCCCCC
vvvvvvvv BBBBBBBB 001 B RRRRRR
vvvvvvvv BBBBBBBB 003 B RRRRRR
vvvvvvvv BBBBBBBB 004 B RRRRRR
yyyyyyyy CCCCCCCC 001 F
yyyyyyyy CCCCCCCC 002 A
yyyyyyyy CCCCCCCC 003 R
yyyyyyyy CCCCCCCC 004 Y
yyyyyyyy CCCCCCCC 005 E DDDDDD
yyyyyyyy CCCCCCCC 006 B
zzzzzzzz AAAAAAAA 001 A AAAAAA
zzzzzzzz AAAAAAAA 002 A AAAAAA
zzzzzzzz AAAAAAAA 005 A AAAAAA
zzzzzzzz AAAAAAAA 006 A AAAAAA
zzzzzzzz AAAAAAAA 007 A AAAAAA
zzzzzzzz AAAAAAAA 008 B


output file 2 (29 positions)

uuuuuuuu AAAAAAAA 001 A CCCCCC
uuuuuuuu AAAAAAAA 002 T
uuuuuuuu AAAAAAAA 003 G
uuuuuuuu AAAAAAAA 004 A CCCCCC
uuuuuuuu AAAAAAAA 005 A CCCCCC
vvvvvvvv BBBBBBBB 001 B RRRRRR
vvvvvvvv BBBBBBBB 003 B RRRRRR
vvvvvvvv BBBBBBBB 004 B RRRRRR
vvvvvvvv BBBBBBBB 006 Y
xxxxxxxx AAAAAAAA 001 A
xxxxxxxx AAAAAAAA 006 B
xxxxxxxx AAAAAAAA 008 C
xxxxxxxx AAAAAAAA 009 D
xxxxxxxx AAAAAAAA 010 E
yyyyyyyy CCCCCCCC 001 F
yyyyyyyy CCCCCCCC 002 A
yyyyyyyy CCCCCCCC 003 R
yyyyyyyy CCCCCCCC 004 Y
yyyyyyyy CCCCCCCC 005 E DDDDDD
yyyyyyyy CCCCCCCC 006 B
zzzzzzzz AAAAAAAA 001 A AAAAAA
zzzzzzzz AAAAAAAA 002 A AAAAAA
zzzzzzzz AAAAAAAA 005 A AAAAAA
zzzzzzzz AAAAAAAA 006 A AAAAAA
zzzzzzzz AAAAAAAA 007 A AAAAAA
zzzzzzzz AAAAAAAA 008 B

Thanks
pulcinella
 

Re: ICETOOL - Matching with duplicates

Post by Frank Yaeger » Fri May 09, 2008 8:51 pm

What is the RECFM and LRECL for the two input files?

What are the "rules" for getting the two output files from the two input files? What are you trying to do exactly? I don't have time to guess - you need to tell me.
Frank Yaeger - DFSORT Development Team (IBM) - yaeger@us.ibm.com
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort

Re: ICETOOL - Matching with duplicates

Post by pulcinella » Mon May 12, 2008 8:41 pm

Excuse me,

I asked the question badly. Let me explain it better:

input file 1 --> 23 positions (LRECL)
input file 2 --> 17 positions (LRECL)

uuuuuuuu AAAAAAAA 001 A
uuuuuuuu AAAAAAAA 002 T
uuuuuuuu AAAAAAAA 003 G
uuuuuuuu AAAAAAAA 004 A
uuuuuuuu AAAAAAAA 005 A
vvvvvvvv BBBBBBBB 001 B
vvvvvvvv BBBBBBBB 003 B
vvvvvvvv BBBBBBBB 004 B
vvvvvvvv BBBBBBBB 006 Y
xxxxxxxx AAAAAAAA 001 A
xxxxxxxx AAAAAAAA 006 B
xxxxxxxx AAAAAAAA 008 C
xxxxxxxx AAAAAAAA 009 D
xxxxxxxx AAAAAAAA 010 E
yyyyyyyy CCCCCCCC 001 F
yyyyyyyy CCCCCCCC 002 A
yyyyyyyy CCCCCCCC 003 R
yyyyyyyy CCCCCCCC 004 Y
yyyyyyyy CCCCCCCC 005 E
yyyyyyyy CCCCCCCC 006 B
zzzzzzzz AAAAAAAA 001 A
zzzzzzzz AAAAAAAA 002 A
zzzzzzzz AAAAAAAA 005 A
zzzzzzzz AAAAAAAA 006 A
zzzzzzzz AAAAAAAA 007 A
zzzzzzzz AAAAAAAA 008 B

file 2 (17 positions)

uuuuuuuu A CCCCCC
vvvvvvvv B RRRRRR
xxxxxxxx G HHHHHH
yyyyyyyy E DDDDDD
zzzzzzzz A AAAAAA

I want to match the two input files to produce two output files of 29 positions each, as follows:

output file 1 --> file 1 + file 2 where the third column of file 2 matches the fourth column of file 1 (column 1, column 2, column 3 of file 1 and column 3 of file 2). Matches only.

output 1 (lrecl = 29)

uuuuuuuu AAAAAAAA 001 A CCCCCC
uuuuuuuu AAAAAAAA 004 A CCCCCC
uuuuuuuu AAAAAAAA 005 A CCCCCC
vvvvvvvv BBBBBBBB 001 B RRRRRR
vvvvvvvv BBBBBBBB 003 B RRRRRR
vvvvvvvv BBBBBBBB 004 B RRRRRR
yyyyyyyy CCCCCCCC 005 E DDDDDD
zzzzzzzz AAAAAAAA 001 A AAAAAA
zzzzzzzz AAAAAAAA 002 A AAAAAA
zzzzzzzz AAAAAAAA 005 A AAAAAA
zzzzzzzz AAAAAAAA 006 A AAAAAA
zzzzzzzz AAAAAAAA 007 A AAAAAA

output file 2 --> file 1 + file 2 (column 1, column 2, column 3 of file 1 and column 3 of file 2). Union of all records.

output 2 (lrecl = 29)

uuuuuuuu AAAAAAAA 001 A CCCCCC
uuuuuuuu AAAAAAAA 002 T
uuuuuuuu AAAAAAAA 003 G
uuuuuuuu AAAAAAAA 004 A CCCCCC
uuuuuuuu AAAAAAAA 005 A CCCCCC
vvvvvvvv BBBBBBBB 001 B RRRRRR
vvvvvvvv BBBBBBBB 003 B RRRRRR
vvvvvvvv BBBBBBBB 004 B RRRRRR
vvvvvvvv BBBBBBBB 006 Y
xxxxxxxx AAAAAAAA 001 A
xxxxxxxx AAAAAAAA 006 B
xxxxxxxx AAAAAAAA 008 C
xxxxxxxx AAAAAAAA 009 D
xxxxxxxx AAAAAAAA 010 E
yyyyyyyy CCCCCCCC 001 F
yyyyyyyy CCCCCCCC 002 A
yyyyyyyy CCCCCCCC 003 R
yyyyyyyy CCCCCCCC 004 Y
yyyyyyyy CCCCCCCC 005 E DDDDDD
yyyyyyyy CCCCCCCC 006 B
zzzzzzzz AAAAAAAA 001 A AAAAAA
zzzzzzzz AAAAAAAA 002 A AAAAAA
zzzzzzzz AAAAAAAA 005 A AAAAAA
zzzzzzzz AAAAAAAA 006 A AAAAAA
zzzzzzzz AAAAAAAA 007 A AAAAAA
zzzzzzzz AAAAAAAA 008 B

Re: ICETOOL - Matching with duplicates

Post by Frank Yaeger » Mon May 12, 2008 11:23 pm

Sorry, but what you're saying is not making sense to me.

output 1 file --> file 1 + file 2 where the third column of file 2 match with the four column of file 1


You show file1 records like this:

uuuuuuuu AAAAAAAA 001 A CCCCCC

"four column of file 1" is the 'A'?

You show file2 records like this:

uuuuuuuu A CCCCCC

"third column of file 2" is CCCCCC - do you mean second column of file 2 which would be 'A'? Or do you really mean third column and if so, do you mean the fifth column of file1?

If you're trying to match on the 'A' column, then why in output file1 do you show the u records that have an 'A' but not the 'x' records that have an 'A'?

Hopefully, you can see why I'm confused.

Are you trying to match on one field ('A') or two fields or what? If two fields, which two fields exactly?

You need to do a better job of explaining the "rules" using your example before I can help you.

Re: ICETOOL - Matching with duplicates

Post by pulcinella » Thu May 22, 2008 9:32 pm

Hi Frank,

I want to join the two input files to produce two output files.

The first output file contains the first 23 positions of input file 1 plus the last column of input file 2, for records where
positions 1-8 (uuuuuuuu) and position 23 (A) of the first file match positions 1-8 (uuuuuuuu) and position 10 (A) of
the second file. It contains only the matches:

uuuuuuuu AAAAAAAA 001 A (first record of input file 1) + CCCCCC (last column of input file 2)
uuuuuuuu AAAAAAAA 004 A (fourth record of input file 1) + CCCCCC (last column of input file 2)
uuuuuuuu AAAAAAAA 005 A (fifth record of input file 1) + CCCCCC (last column of input file 2)
vvvvvvvv BBBBBBBB 001 B (sixth record of input file 1) + RRRRRR (last column of input file 2)
vvvvvvvv BBBBBBBB 003 B (seventh record of input file 1) + RRRRRR (last column of input file 2)
vvvvvvvv BBBBBBBB 004 B (eighth record of input file 1) + RRRRRR (last column of input file 2)

The second output file contains the first 23 positions of input file 1 plus the last column of input file 2, using the same
match: positions 1-8 (uuuuuuuu) and position 23 (A) of the first file against positions 1-8 (uuuuuuuu) and position 10 (A) of
the second file. It contains all records of input file 1, matched or not:

uuuuuuuu AAAAAAAA 001 A (first record of input file 1) + CCCCCC (last column of input file 2)
uuuuuuuu AAAAAAAA 002 T (second record of input file 1) + blanks (no match)
uuuuuuuu AAAAAAAA 003 G (third record of input file 1) + blanks (no match)
uuuuuuuu AAAAAAAA 004 A (fourth record of input file 1) + CCCCCC (last column of input file 2)
uuuuuuuu AAAAAAAA 005 A (fifth record of input file 1) + CCCCCC (last column of input file 2)
vvvvvvvv BBBBBBBB 001 B (sixth record of input file 1) + RRRRRR (last column of input file 2)
vvvvvvvv BBBBBBBB 003 B (seventh record of input file 1) + RRRRRR (last column of input file 2)
vvvvvvvv BBBBBBBB 004 B (eighth record of input file 1) + RRRRRR (last column of input file 2)
vvvvvvvv BBBBBBBB 006 Y (ninth record of input file 1) + blanks (no match)
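
The matching rule described in this post can be sketched in Python as follows. This is only an illustration of the rule, not DFSORT code: the function name join_files and the variable names are made up for the example, and it assumes that file 2 has at most one record per key and flag and that the 6-byte data field is appended directly after position 23 (23 + 6 = 29 positions).

```python
# Illustrative sketch of the matching rule; names are hypothetical.
# Positions come from the post: file 1 key = 1-8, flag = 23;
# file 2 key = 1-8, flag = 10, data = 12-17.

def join_files(file1, file2):
    """Return (matches_only, all_records) as lists of 29-character strings."""
    # Look-up table built from file 2: (key, flag) -> 6-character data field.
    lookup = {(rec[0:8], rec[9]): rec[11:17] for rec in file2}

    matches_only = []
    all_records = []
    for rec in file1:
        base = rec[0:23]
        data = lookup.get((rec[0:8], rec[22]))
        if data is not None:
            # Key and flag both match: goes to both output files.
            matches_only.append(base + data)
            all_records.append(base + data)
        else:
            # No match: only in the second output, padded with blanks.
            all_records.append(base + " " * 6)
    return matches_only, all_records


file1 = [
    "uuuuuuuu AAAAAAAA 001 A",
    "uuuuuuuu AAAAAAAA 002 T",
    "vvvvvvvv BBBBBBBB 006 Y",
    "xxxxxxxx AAAAAAAA 001 A",
]
file2 = [
    "uuuuuuuu A CCCCCC",
    "vvvvvvvv B RRRRRR",
    "xxxxxxxx G HHHHHH",
]

matched, everything = join_files(file1, file2)
for line in matched:
    print(line)
```

Only the first sample record matches on both key and flag, so it is the only one in the first output; the second output keeps all four records in their original order.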

Re: ICETOOL - Matching with duplicates

Post by skolusu » Fri May 23, 2008 1:43 am

The following DFSORT/ICETOOL JCL will give you the desired results.

//STEP0100 EXEC PGM=ICETOOL           
//TOOLMSG  DD SYSOUT=*                 
//DFSMSG   DD SYSOUT=*                 
//IN1      DD *                       
UUUUUUUU A CCCCCC                     
VVVVVVVV B RRRRRR                     
XXXXXXXX G HHHHHH                     
YYYYYYYY E DDDDDD                     
ZZZZZZZZ A AAAAAA                     
//IN2      DD *                       
UUUUUUUU AAAAAAAA 001 A               
UUUUUUUU AAAAAAAA 002 T               
UUUUUUUU AAAAAAAA 003 G               
UUUUUUUU AAAAAAAA 004 A               
UUUUUUUU AAAAAAAA 005 A               
VVVVVVVV BBBBBBBB 001 B               
VVVVVVVV BBBBBBBB 003 B               
VVVVVVVV BBBBBBBB 004 B               
VVVVVVVV BBBBBBBB 006 Y               
XXXXXXXX AAAAAAAA 001 A               
XXXXXXXX AAAAAAAA 006 B               
XXXXXXXX AAAAAAAA 008 C               
XXXXXXXX AAAAAAAA 009 D               
XXXXXXXX AAAAAAAA 010 E               
YYYYYYYY CCCCCCCC 001 F               
YYYYYYYY CCCCCCCC 002 A               
YYYYYYYY CCCCCCCC 003 R               
YYYYYYYY CCCCCCCC 004 Y               
YYYYYYYY CCCCCCCC 005 E               
YYYYYYYY CCCCCCCC 006 B               
ZZZZZZZZ AAAAAAAA 001 A               
ZZZZZZZZ AAAAAAAA 002 A               
ZZZZZZZZ AAAAAAAA 005 A               
ZZZZZZZZ AAAAAAAA 006 A               
ZZZZZZZZ AAAAAAAA 007 A               
ZZZZZZZZ AAAAAAAA 008 B               
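//T1       DD DSN=&&T1,DISP=(MOD,PASS),SPACE=(CYL,(1,1),RLSE)
//T2       DD DSN=&&T2,DISP=(,PASS),SPACE=(CYL,(1,1),RLSE)
//OUT      DD SYSOUT=*
//TOOLIN   DD *
* Reformat both inputs to a common 46-byte layout in T1 (DISP=MOD)
  COPY FROM(IN1) USING(CTL1)
  COPY FROM(IN2) USING(CTL2)
* Match on key + flag (32-40); take 1-31 from each IN2 record
  SPLICE FROM(T1) TO(T2) ON(32,09,CH) -
  WITHALL WITH(01,31) USING(CTL3) KEEPNODUPS
* Restore the original IN2 order and build the 29-byte output
  SORT FROM(T2) USING(CTL4)
//CTL1CNTL DD *
* 1-31 blank, 32-39 key, 40 flag, 41-46 data
  OUTFIL FNAMES=T1,BUILD=(31X,1,8,10,1,12,6)
//CTL2CNTL DD *
* 1-23 record, 24-31 seqnum, 32-39 key, 40 flag, 41-46 blank
  OUTFIL FNAMES=T1,BUILD=(1,23,SEQNUM,8,ZD,1,8,23,1,6X)
//CTL3CNTL DD *
* Drop unmatched IN1 records (1-31 blank); move data to 32-37
  OUTFIL FNAMES=T2,OMIT=(1,31,CH,EQ,C' '),
  BUILD=(1,31,41,6)
//CTL4CNTL DD *
* Sort on the sequence number; keep 1-23 plus the 6-byte data
  SORT FIELDS=(24,8,CH,A)
  OUTFIL FNAMES=OUT,BUILD=(1,23,32,6)
/*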


The output from this job (all IN2 records, matched or not) is:
UUUUUUUU AAAAAAAA 001 ACCCCCC
UUUUUUUU AAAAAAAA 002 T       
UUUUUUUU AAAAAAAA 003 G       
UUUUUUUU AAAAAAAA 004 ACCCCCC
UUUUUUUU AAAAAAAA 005 ACCCCCC
VVVVVVVV BBBBBBBB 001 BRRRRRR
VVVVVVVV BBBBBBBB 003 BRRRRRR
VVVVVVVV BBBBBBBB 004 BRRRRRR
VVVVVVVV BBBBBBBB 006 Y       
XXXXXXXX AAAAAAAA 001 A       
XXXXXXXX AAAAAAAA 006 B       
XXXXXXXX AAAAAAAA 008 C       
XXXXXXXX AAAAAAAA 009 D       
XXXXXXXX AAAAAAAA 010 E       
YYYYYYYY CCCCCCCC 001 F       
YYYYYYYY CCCCCCCC 002 A       
YYYYYYYY CCCCCCCC 003 R       
YYYYYYYY CCCCCCCC 004 Y       
YYYYYYYY CCCCCCCC 005 EDDDDDD
YYYYYYYY CCCCCCCC 006 B       
ZZZZZZZZ AAAAAAAA 001 AAAAAAA
ZZZZZZZZ AAAAAAAA 002 AAAAAAA
ZZZZZZZZ AAAAAAAA 005 AAAAAAA
ZZZZZZZZ AAAAAAAA 006 AAAAAAA
ZZZZZZZZ AAAAAAAA 007 AAAAAAA
ZZZZZZZZ AAAAAAAA 008 B       
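
For readers who want to follow the record flow through the job above, here is a rough Python model of the four steps. It is a sketch under assumptions, not a description of DFSORT internals: the function names simply mirror the CTLx control data sets, SPLICE is modelled as "the first record of each key group is the base, and each later record contributes positions 1-31", and the first record of a group is not itself written, so the model assumes every group with more than one record starts with an IN1 record, as in the sample data.

```python
# Rough model of the ICETOOL job's record flow; function names are
# illustrative and only mirror the CTL1..CTL4 control statements.

def ctl1(rec):
    # BUILD=(31X,1,8,10,1,12,6): 31 blanks, key, flag, data
    return " " * 31 + rec[0:8] + rec[9] + rec[11:17]

def ctl2(rec, seq):
    # BUILD=(1,23,SEQNUM,8,ZD,1,8,23,1,6X): record, seqnum, key, flag, blanks
    return rec[0:23] + f"{seq:08d}" + rec[0:8] + rec[22] + " " * 6

def splice(t1):
    # ON(32,09,CH) WITHALL WITH(01,31) KEEPNODUPS, modelled on bytes 32-40
    groups = {}
    for rec in t1:  # dict preserves arrival order inside each group
        groups.setdefault(rec[31:40], []).append(rec)
    out = []
    for key in sorted(groups):
        base, *rest = groups[key]
        if not rest:
            out.append(base)  # KEEPNODUPS: unique keys pass through
        for rec in rest:      # WITHALL: one output per later record
            out.append(rec[0:31] + base[31:46])
    return out

def ctl3(recs):
    # OMIT=(1,31,CH,EQ,C' ') then BUILD=(1,31,41,6)
    return [r[0:31] + r[40:46] for r in recs if r[0:31] != " " * 31]

def ctl4(recs):
    # SORT FIELDS=(24,8,CH,A) then BUILD=(1,23,32,6)
    return [r[0:23] + r[31:37] for r in sorted(recs, key=lambda r: r[23:31])]

def run_job(in1, in2):
    t1 = [ctl1(r) for r in in1]
    t1 += [ctl2(r, n) for n, r in enumerate(in2, start=1)]
    return ctl4(ctl3(splice(t1)))


in1 = ["UUUUUUUU A CCCCCC", "XXXXXXXX G HHHHHH"]
in2 = [
    "UUUUUUUU AAAAAAAA 001 A",
    "UUUUUUUU AAAAAAAA 002 T",
    "UUUUUUUU AAAAAAAA 004 A",
]
result = run_job(in1, in2)
for line in result:
    print(line)
```

The sequence number added in the second COPY is what lets the final SORT put the records back in their original IN2 order after SPLICE has regrouped them by key and flag.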
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

