XML Field parsing using DFSORT




Postby Swapnilkumar » Wed May 28, 2014 6:47 pm

Hello, I have a requirement as described below.
I have several input fields in XML format. Each field has the form <Delimiter>Field value</Delimiter>. The fields are msgid, sysid, src and msgtyp, and I have developed the SORT card below for them. (Please don't look too closely at the FINDREP part; I have shown it only to indicate that the general XML header is skipped.)

 OPTION COPY
 INREC FINDREP=(INOUT=(C'<?xml version="1.0msg>',C'',                  -
 C'</message>',C''))
 OUTREC PARSE=(%01=(STARTAFT=C'<msgid>',ENDBEFR=C'</msgid>',FIXLEN=10),
               %02=(STARTAFT=C'<sysid>',ENDBEFR=C'</sysid>',FIXLEN=10),
               %03=(STARTAFT=C'<src>',ENDBEFR=C'</src>',FIXLEN=10),
          %04=(STARTAFT=C'<msgtyp>',ENDBEFR=C'</msgtyp>',FIXLEN=10)),
        BUILD=(%01,21:%02,31:%03,41:%04)

This works perfectly, but only when the XML tags appear in the order msgid ==> sysid ==> src ==> msgtyp.
If the tags appear in a different order, the output is wrong, e.g. msgid ==> sysid ==> msgtyp ==> src
or
src ==> sysid ==> msgid ==> msgtyp
or any order other than "msgid ==> sysid ==> src ==> msgtyp".

Can anyone suggest how to make this SORT card work for every order? That would be a great help. Also, with XML input it is not mandatory for every field to be present in every record: some records may have all 4 fields, while others may have only one or two of them.
____________________________________________
Regards,
- Swapnilkumar.

Postby enrico-sorichetti » Wed May 28, 2014 7:38 pm

try with something along the lines of

//S1      EXEC PGM=SORT
//SYSPRINT  DD SYSOUT=*
//SYSOUT    DD SYSOUT=*
//SORTIN    DD *
<MS>MSG1</MS><SY>SYS1</SY><SR>SRC1</SR><TY>TYP1</TY>
<MS>MSG2</MS><SY>SYS2</SY><SR>SRC2</SR>
<MS>MSG3</MS><SY>SYS3</SY>
<MS>MSG4</MS>
<MS>MSG1</MS><SY>SYS1</SY><SR>SRC1</SR><TY>TYP1</TY>
<SY>SYS1</SY><SR>SRC1</SR><TY>TYP1</TY>
<SR>SRC1</SR><TY>TYP1</TY>
<TY>TYP1</TY>
<SY>SYS1</SY><SR>SRC1</SR><TY>TYP1</TY>
<MS>MSG1</MS><SR>SRC1</SR><TY>TYP1</TY>
<MS>MSG1</MS><SY>SYS1</SY><TY>TYP1</TY>
<MS>MSG1</MS><SY>SYS1</SY><SR>SRC1</SR>
//SORTOUT   DD SYSOUT=*
//SYSIN     DD *
  OPTION COPY
  INREC  IFTHEN=(WHEN=INIT,
         PARSE=(%1=(STARTAFT=C'<MS>',ENDBEFR=C'</MS>',FIXLEN=10)),
         OVERLAY=(61:%1)),
         IFTHEN=(WHEN=INIT,
         PARSE=(%2=(STARTAFT=C'<SY>',ENDBEFR=C'</SY>',FIXLEN=10)),
         OVERLAY=(71:%2)),
         IFTHEN=(WHEN=INIT,
         PARSE=(%3=(STARTAFT=C'<SR>',ENDBEFR=C'</SR>',FIXLEN=10)),
         OVERLAY=(81:%3)),
         IFTHEN=(WHEN=INIT,
         PARSE=(%4=(STARTAFT=C'<TY>',ENDBEFR=C'</TY>',FIXLEN=10)),
         OVERLAY=(91:%4))
 

to obtain ...

********************************* TOP OF DATA **********************************
1</SY><SR>SRC1</SR><TY>TYP1</TY>        MSG1      SYS1      SRC1      TYP1
2</SY><SR>SRC2</SR>                     MSG2      SYS2      SRC2
3</SY>                                  MSG3      SYS3
                                        MSG4
1</SY><SR>SRC1</SR><TY>TYP1</TY>        MSG1      SYS1      SRC1      TYP1
1</SR><TY>TYP1</TY>                               SYS1      SRC1      TYP1
1</TY>                                                      SRC1      TYP1
                                                                      TYP1
1</SR><TY>TYP1</TY>                               SYS1      SRC1      TYP1
1</SR><TY>TYP1</TY>                     MSG1                SRC1      TYP1
1</SY><TY>TYP1</TY>                     MSG1      SYS1                TYP1
1</SY><SR>SRC1</SR>                     MSG1      SYS1      SRC1
******************************** BOTTOM OF DATA ********************************
 
cheers
enrico
When I tell somebody to RTFM or STFW I usually have the page open in another tab/window of my browser,
so that I am sure that the information requested can be reached with a very small effort

Postby BillyBoyo » Wed May 28, 2014 8:20 pm

Multiple PARSE statements is one way. Another would be to look at using ABSPOS. Check in the manual what it can do for you in this case.

Is the "-" on the second line there for a reason?
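
A rough way to see why separate PARSE operands help is to model the search in a few lines of Python. This is only a sketch of the behaviour reported in this thread, not DFSORT itself: it assumes that, within one PARSE, each %nn search resumes where the previous one stopped, and that a tag which is not found leaves the remaining fields blank. The function names are made up for the illustration.

```python
def parse_sequential(record, tags, fixlen=10):
    """Model ONE PARSE with several fields: each search starts at the cursor.

    Assumption (not taken from the manual): a tag that is not found moves
    the cursor to the end of the record, so later fields come out blank.
    """
    cursor = 0
    fields = []
    for tag in tags:
        start_tok, end_tok = f"<{tag}>", f"</{tag}>"
        start = record.find(start_tok, cursor)
        if start < 0:
            fields.append(" " * fixlen)
            cursor = len(record)
            continue
        start += len(start_tok)
        end = record.find(end_tok, start)
        if end < 0:
            end = len(record)
        # FIXLEN behaviour: truncate or pad with blanks to a fixed width
        fields.append(record[start:end][:fixlen].ljust(fixlen))
        cursor = min(end + len(end_tok), len(record))
    return fields


def parse_independent(record, tags, fixlen=10):
    """Model one PARSE per tag: every search starts at the record start."""
    return [parse_sequential(record, [tag], fixlen)[0] for tag in tags]


if __name__ == "__main__":
    rec = "<SR>SRC1</SR><SY>SYS1</SY><MS>MSG1</MS><TY>TYP1</TY>"
    order = ["MS", "SY", "SR", "TY"]
    print(parse_sequential(rec, order))
    print(parse_independent(rec, order))
```

With the tags out of order, the sequential model finds MSG1 and then nothing else, while the independent model finds all four values. Whether ABSPOS gives the same effect inside a single PARSE is something to confirm in the manual.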

Postby Ed Goodman » Wed May 28, 2014 9:09 pm

If I were stuck using DFSORT for this, I'd write a COBOL exit routine with an XML parser in it.

If you want to handle changing XML, you have to handle changing XML.

Postby Swapnilkumar » Thu May 29, 2014 3:06 pm

Thank you, enrico-sorichetti, the SORT card works perfectly.
Thanks again for the prompt response; I was busy with a client and so could not reply sooner.
1) I could not understand the logic behind the position values used for OVERLAY.
2) There are 93 different fields in total that may occur in a single input XML record, each with a different PIC clause size.
3) So do I need to code 93 PARSE statements for them?
4) Also, I do not want the input string in the output at all.
Could you please answer the above? That would be a great help, Enrico!

@BillyBoyo,
Regarding the question "Is the "-" on the second line there for a reason?": the "-" indicates that the SORT instruction continues on the next line.

@Ed Goodman,
True, I will also proceed with a COBOL program. At first I thought, why not DFSORT? So I am trying to solve it that way, as a change.

Postby enrico-sorichetti » Thu May 29, 2014 3:41 pm

just try with
  OPTION COPY
  INREC  IFTHEN=(WHEN=INIT,
         PARSE=(%1=(STARTAFT=C'<MS>',ENDBEFR=C'</MS>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         PARSE=(%2=(STARTAFT=C'<SY>',ENDBEFR=C'</SY>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         PARSE=(%3=(STARTAFT=C'<SR>',ENDBEFR=C'</SR>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         PARSE=(%4=(STARTAFT=C'<TY>',ENDBEFR=C'</TY>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         BUILD=(%1,%2,%3,%4))
cheers
enrico
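
On the OVERLAY positions in the earlier job: 61, 71, 81 and 91 are simply a starting column plus the lengths of the fields already placed (each FIXLEN there is 10). With fields of differing lengths the same arithmetic applies. A small Python sketch, using a hypothetical helper name and taking the starting column 61 from that earlier example:

```python
def overlay_columns(lengths, first_col=61):
    """Return the 1-based start column of each fixed-length field.

    Each field starts where the previous one ended, so the column is
    first_col plus the sum of all preceding lengths.
    """
    columns = []
    col = first_col
    for length in lengths:
        columns.append(col)
        col += length
    return columns


if __name__ == "__main__":
    # Four fields of FIXLEN=10, as in the OVERLAY example
    for number, col in enumerate(overlay_columns([10, 10, 10, 10]), start=1):
        print(f"OVERLAY=({col}:%{number})")
```

BUILD=(%1,%2,%3,%4) avoids this bookkeeping because the parsed fields are laid out one after another, and it also leaves the input string out of the output record.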

Postby BillyBoyo » Thu May 29, 2014 4:21 pm

I thought you may think that, but it is not true. Remove it, please, and show an error message caused by the removal. I know that you can't. It is a comment, and I just wanted to check that you knew as much as your manner seems to indicate.

If you have 93 elements to parse, and you want to use individual PARSE statements, then you will need 93 of them.

You will need 93 PARSE fields even if doing only one PARSE, unless you have DFSORT V2.1, in which case some shortening may be possible.

Did you look at ABSPOS as I suggested, or were you just too keen on being wrong about the "continuation"?

Postby Swapnilkumar » Thu May 29, 2014 4:37 pm

Perfect Enrico!! Thanks for the help!!! :)

Postby enrico-sorichetti » Thu May 29, 2014 5:17 pm

Here is a little REXX toy to generate the control statements from a list of tags and lengths.

INPUT
MS    20
SY    20
SR    20
TY    20

OUTPUT
  OPTION COPY
  INREC  IFTHEN=(WHEN=INIT,
         PARSE=(%01=(STARTAFT=C'<MS>',
                     ENDBEFR=C'</MS>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         PARSE=(%02=(STARTAFT=C'<SY>',
                     ENDBEFR=C'</SY>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         PARSE=(%03=(STARTAFT=C'<SR>',
                     ENDBEFR=C'</SR>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         PARSE=(%04=(STARTAFT=C'<TY>',
                     ENDBEFR=C'</TY>',FIXLEN=20))),
         IFTHEN=(WHEN=INIT,
         BUILD=(%01,%02,%03,%04))


the toy
/*REXX */
Trace "O"
/* input: tag name and length per line; output: control statements */
tags = "'ENRICO.SORT.CNTL(CTL04IN)'"
cntl = "'ENRICO.SORT.CNTL(CTL04OU)'"

zrc =  $alloc("TAGS", tags, "SHR REUSE" )
if  zrc \= 0 then do
    say "alloc error for file" tags
    exit
end
zrc =  $alloc("CNTL", cntl, "SHR REUSE" )
if  zrc \= 0 then do
    say "alloc error for file" cntl
    exit
end

zrc =  $tsoex("EXECIO * DISKR TAGS (STEM TAG. FINIS")
if  zrc \= 0 then do
    say "EXECIO error for file" tags
    /* both files are allocated here, so free both */
    signal frcntl
end
ctl.0 = 0
call put  "  OPTION COPY"
lws = "  INREC  "
/* one IFTHEN=(WHEN=INIT,PARSE=...) clause per tag */
do t = 1 to tag.0
    tag = strip(word(tag.t,1))
    len = strip(word(tag.t,2))
    call put lws"IFTHEN=(WHEN=INIT,"
    lws = "         "
    call put lws"PARSE=(%"right(t,2,"0")"=(STARTAFT=C'<"tag">',"
    call put lws"            ENDBEFR=C'</"tag">',FIXLEN="len"))),"
end

/* final clause: BUILD with all parsed fields, wrapped near column 65 */
call put lws"IFTHEN=(WHEN=INIT,"
bld = lws"BUILD=("
do t = 1 to tag.0
   if ( length(bld) < 65 ) then do
      bld = bld || "%" || right(t,2,"0") || ","
      iterate
   end
   call put bld
   bld = lws"       "
   bld = bld || "%" || right(t,2,"0") || ","
end
/* drop the trailing comma and close BUILD and IFTHEN */
bld = left(bld, length(bld)-1)"))"
call put bld

zrc =  $tsoex("EXECIO" ctl.0 "DISKW CNTL (STEM CTL. FINIS")

frcntl :
call   $free "cntl"
frtags :
call   $free "TAGS"


Exit 0

/* append one line to the CTL. stem */
put:
   c = ctl.0 +1
   ctl.c= arg(1)
   ctl.0 = c
   return
/* */
novalue:
say  "*********************************"
say  "**                             **"
say  "** novalue trapped at line" || right(sigl,4) || " **"
say  "**                             **"
say  "*********************************"
exit

/* run a TSO command with tracing off, return its rc */
$tsoex:
   tso_0tr = trace("O")
   Address TSO arg(1)
   tso_0rc = rc
   trace value(tso_0tr)
   return tso_0rc
/* free and (re)allocate a ddname to a dataset */
$alloc:procedure
   alc_0tr = trace("O")
   parse upper arg ddnm, dsnm, misc
   ddnm = strip(ddnm)
   dsnm = strip(dsnm)
   dsnm = strip(dsnm,,"'")
   dsnm = "DA('"dsnm"') "
   misc = space(misc)
   alc_0ms = msg("OFF")
   Address TSO "FREE  FI("ddnm") "
   Address TSO "ALLOC FI("ddnm") " dsnm misc
   alc_0rc = rc
   z = msg(alc_0ms)
   trace value(alc_0tr)
   Return alc_0rc
/* free a ddname */
$free:procedure
   alc_0tr = trace("O")
   parse upper arg ddnm
   ddnm = strip(ddnm)
   alc_0ms = msg("OFF")
   Address TSO "FREE  DD("ddnm") "
   alc_0rc = rc
   z = msg(alc_0ms)
   trace value(alc_0tr)
   Return alc_0rc


As is, it works for up to 99 tags.
For more, change the 2 and the 65 appropriately,
and for readability shift the ENDBEFR as well.
cheers
enrico
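
For anyone without TSO at hand, the same generation step can be sketched off-host in Python. This is a loose translation of the REXX toy above, not a replacement for it: the function name is invented, the tag list is passed in rather than read from a dataset, and it keeps the same two-digit %nn naming and column-65 wrap.

```python
def build_control_cards(fields, wrap_at=65):
    """Return DFSORT control statements for a list of (tag, fixlen) pairs.

    Mirrors the REXX toy: one IFTHEN=(WHEN=INIT,PARSE=...) per tag,
    then a final IFTHEN with BUILD listing every parsed field.
    """
    if not fields:
        raise ValueError("at least one (tag, length) pair is required")
    if len(fields) > 99:
        raise ValueError("two-digit %nn names cover at most 99 tags here")

    cards = ["  OPTION COPY"]
    lead = "  INREC  "
    for n, (tag, fixlen) in enumerate(fields, start=1):
        cards.append(f"{lead}IFTHEN=(WHEN=INIT,")
        lead = " " * 9  # continuation lines are indented under INREC
        cards.append(f"{lead}PARSE=(%{n:02d}=(STARTAFT=C'<{tag}>',")
        cards.append(f"{lead}{' ' * 12}ENDBEFR=C'</{tag}>',FIXLEN={fixlen}))),")

    cards.append(f"{lead}IFTHEN=(WHEN=INIT,")
    line = f"{lead}BUILD=("
    for n in range(1, len(fields) + 1):
        if len(line) >= wrap_at:
            # line is full: emit it (it ends with a comma) and continue
            cards.append(line)
            line = lead + " " * 7
        line += f"%{n:02d},"
    # drop the trailing comma, close BUILD and IFTHEN
    cards.append(line[:-1] + "))")
    return cards


if __name__ == "__main__":
    sample = [("MS", 20), ("SY", 20), ("SR", 20), ("TY", 20)]
    print("\n".join(build_control_cards(sample)))
```

For the four sample tags this produces the same 15 statements as the OUTPUT listing above.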