ABAP Trapdoors: Size Does Matter

This is a repost of an article published on the SAP Community Network.


Welcome to another ABAP Trapdoors article. If you are interested in the older articles, you can find a list of links at the bottom of this post.

There are various ways to handle XML data in ABAP, all of them more or less well-documented. If you need a downwards-compatible event-based parsing approach, for example, you might want to use the iXML library with its built-in SAX-style parser. (Note that iXML still constructs the entire document, so it's more like a DOM parser with a SAX event output attached to it. If you're looking for a strictly serial processing facility, check out the relatively new sXML library instead.)
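For comparison, here is a minimal sketch of what strictly serial, token-based reading with sXML might look like. It is not part of the original example; the report name and the sample document are made up for illustration, and it assumes a release that provides CL_SXML_STRING_READER, CL_ABAP_CODEPAGE and CL_DEMO_OUTPUT.

```abap
REPORT ztest_sxml_token_reader. " hypothetical report name

START-OF-SELECTION.
  " the sXML string reader works on an xstring, so convert the text to UTF-8 bytes first
  DATA(l_xml) = cl_abap_codepage=>convert_to(
      `<foo name="bar"><baz number="1"/><baz number="2"/></foo>` ).
  DATA(lr_reader) = cl_sxml_string_reader=>create( l_xml ).
  DATA(lr_out)    = cl_demo_output=>new( ).

  TRY.
      DO.
        " advance to the next node; no document tree is built along the way
        lr_reader->next_node( ).
        CASE lr_reader->node_type.
          WHEN if_sxml_node=>co_nt_final.
            EXIT. " end of input reached
          WHEN if_sxml_node=>co_nt_element_open.
            lr_out->write_text( |new element '{ lr_reader->name }'| ).
            " iterate over the attributes of the element just opened
            DO.
              lr_reader->next_attribute( ).
              IF lr_reader->node_type <> if_sxml_node=>co_nt_attribute.
                EXIT.
              ENDIF.
              lr_out->write_text( |attribute '{ lr_reader->name }' = '{ lr_reader->value }'| ).
            ENDDO.
          WHEN if_sxml_node=>co_nt_element_close.
            lr_out->write_text( |end of element '{ lr_reader->name }'| ).
        ENDCASE.
      ENDDO.
    CATCH cx_sxml_parse_error INTO DATA(lx_parse_error).
      lr_out->write_text( lx_parse_error->get_text( ) ).
  ENDTRY.
  lr_out->display( ).
```

Because the reader receives an xstring, there is no separate size parameter to get wrong here.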

The iXML documentation has a, let's say, distinctive writing style, and the library proudly distinguishes itself from the remaining ABAP ecosystem (for example, by using zero-based indexes instead of one-based lists in various places), but all things considered, it's a viable and stable solution. That is, if you observe the first rule of SAX: Size Does Matter. Consider the following example:

REPORT ztest_ixml_sax_parser.  

CLASS lcl_test_ixml_sax_parser DEFINITION CREATE PRIVATE.  
  PUBLIC SECTION.  
    CLASS-METHODS run.  
ENDCLASS.  

CLASS lcl_test_ixml_sax_parser IMPLEMENTATION.  
  METHOD run.  
    CONSTANTS: co_line_length TYPE i VALUE 100.  
    TYPES: t_line   TYPE c LENGTH co_line_length,  
           tt_lines TYPE TABLE OF t_line.  
    DATA: lt_xml_data       TYPE tt_lines,  
          l_xml_size        TYPE i,  
          lr_ixml           TYPE REF TO if_ixml,  
          lr_stream_factory TYPE REF TO if_ixml_stream_factory,  
          lr_istream        TYPE REF TO if_ixml_istream,  
          lr_document       TYPE REF TO if_ixml_document,  
          lr_parser         TYPE REF TO if_ixml_parser,  
          lr_event          TYPE REF TO if_ixml_event,  
          l_num_errors      TYPE i,  
          lr_error          TYPE REF TO if_ixml_parse_error.  
    DATA: lr_ostream TYPE REF TO cl_demo_output_stream.  
    " prepare the output stream and display  
    lr_ostream = cl_demo_output_stream=>open( ).  
    SET HANDLER cl_demo_output_html=>handle_output FOR lr_ostream.  
    " prepare the data to be parsed  
    lt_xml_data = VALUE #( ( '<?xml version="1.0"?>' )  
                           ( '<foo name="bar">' )  
                           ( '  <baz number="1"/>' )  
                           ( '  <baz number="2"/>' )  
                           ( '  <baz number="4"/>' )  
                           ( '</foo>' ) ).  
    " determine the size of the table - since the lines have a fixed length, that should be easy  
    l_xml_size = co_line_length * lines( lt_xml_data ).  
    " initialize the iXML objects  
    lr_ixml = cl_ixml=>create( ).  
    lr_stream_factory = lr_ixml->create_stream_factory( ).  
    lr_istream = lr_stream_factory->create_istream_itable( table = lt_xml_data  
                                                           size  = l_xml_size ).  
    lr_document = lr_ixml->create_document( ).  
    lr_parser = lr_ixml->create_parser( stream_factory = lr_stream_factory  
                                        istream        = lr_istream  
                                        document       = lr_document ).  
    lr_parser->set_event_subscription( if_ixml_event=>co_event_attribute_post +  
                                       if_ixml_event=>co_event_element_pre +  
                                       if_ixml_event=>co_event_element_post ).  
    " the actual event handling loop.  
    lr_ostream->write_text(  
        iv_text   = 'iXML Parser Events'  
        iv_format = if_demo_output_formats=>heading  
        iv_level  = 1  
    ).  
    DO.  
      lr_event = lr_parser->parse_event( ).  
      IF lr_event IS INITIAL. " if either the end of the document is reached or an error occurred  
        EXIT.  
      ENDIF.  
      CASE lr_event->get_type( ).  
        WHEN if_ixml_event=>co_event_element_pre.  
          lr_ostream->write_text( |new element '{ lr_event->get_name( ) }'| ).  
        WHEN if_ixml_event=>co_event_attribute_post.  
          lr_ostream->write_text( |attribute '{ lr_event->get_name( ) }' = '{ lr_event->get_value( ) }'| ).  
        WHEN if_ixml_event=>co_event_element_post.  
          lr_ostream->write_text( |end of element '{ lr_event->get_name( ) }'| ).  
      ENDCASE.  
    ENDDO.  
    " error handling  
    l_num_errors = lr_parser->num_errors( ).  
    IF l_num_errors > 0.  
      lr_ostream->write_text(  
          iv_text   = 'iXML Parser Errors'  
          iv_format = if_demo_output_formats=>heading  
          iv_level  = 1  
      ).  
      DO l_num_errors TIMES.  
        lr_error = lr_parser->get_error( sy-index - 1 ). " because iXML is 0-based  
        lr_ostream->write_text( |{ lr_error->get_severity_text( ) } at offset { lr_error->get_offset( ) }: { lr_error->get_reason( ) }| ).  
      ENDDO.  
    ENDIF.  
    lr_ostream->close( ).  
  ENDMETHOD.  
ENDCLASS.  

START-OF-SELECTION.  
  lcl_test_ixml_sax_parser=>run( ).  

You can copy this program into your system and execute it; it doesn't do anything harmful. It simply assembles a small XML document (in a real application, you would get this from a file, a database, a network source - whatever), constructs an input stream around it, passes it to a parser and executes a parse-evaluate-print loop until either the end of the document is reached or something bad happens.

If your system is a non-Unicode (NUC) system (you can easily check this using System --> Status), the program will run just fine, producing output similar to the following image:

[Image: AT-Size-OutputNormal_mw.png, parser event output on a NUC system]

If your system happens to be a Unicode (UC) system, the program won't behave quite the same way - you will get a rather nondescript error message (error at offset 0: unexpected symbol; expected '<', '</', entity reference, character data, CDATA section, processing instruction or comment).

[Image: AT-Size-OutputError_mw.png, output including the parser error on a UC system]

It certainly does not help that the parser does not return a usable offset (or a line and column number) when assembling the error message; it just reports offset 0. However, the events logged prior to the error messages provide a hint: The error always occurs after half of the lines of the table have been processed. You can easily verify this by changing the number of baz elements in the sample above. Since I've already mentioned that this issue occurs on UC systems only, it's now easy to deduce what went wrong here:

[Image: AT-Size-iXMLInterface_mw.png]

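The iXML stream factory expects the size to be the number of bytes, not the number of characters. The code works as long as a character is represented by a single byte, but in UC systems, that's not the case: each character occupies two bytes, so a size of 600 (six lines of 100 characters) covers only the first three lines of the table, and the parser runs out of input halfway through the document. The solution - or maybe one of the solutions - is relatively simple: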

    " determine the size of the table for both UC and NUC systems  
    l_xml_size = co_line_length * lines( lt_xml_data ) * cl_abap_char_utilities=>charsize.  
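Another possible way around the problem is to avoid the size parameter altogether. The following is a minimal sketch, not taken from the original program: it assumes that the XML is available (or can be assembled) as a string and hands it to the parser via create_istream_string, which takes no size. The report name and the shortened sample document are made up for illustration.

```abap
REPORT ztest_ixml_istream_string. " hypothetical report name

START-OF-SELECTION.
  " a shortened sample document, held in a string instead of a table of fixed-length lines
  DATA(l_xml) = `<?xml version="1.0"?>` &&
                `<foo name="bar"><baz number="1"/><baz number="2"/></foo>`.

  DATA(lr_ixml)           = cl_ixml=>create( ).
  DATA(lr_stream_factory) = lr_ixml->create_stream_factory( ).
  " no size to calculate here: the stream is built directly from the string
  DATA(lr_istream)        = lr_stream_factory->create_istream_string( l_xml ).
  DATA(lr_document)       = lr_ixml->create_document( ).
  DATA(lr_parser)         = lr_ixml->create_parser( stream_factory = lr_stream_factory
                                                    istream        = lr_istream
                                                    document       = lr_document ).

  " parse( ) processes the whole document in one go and returns 0 on success
  IF lr_parser->parse( ) = 0.
    cl_demo_output=>display( |root element: '{ lr_document->get_root_element( )->get_name( ) }'| ).
  ELSE.
    cl_demo_output=>display( |number of parser errors: { lr_parser->num_errors( ) }| ).
  ENDIF.
```

This sketch parses the whole document at once for brevity; the event subscription and the parse_event( ) loop from the original program should work with a string-based stream in the same way.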

This trapdoor is a rather devious contraption because it will not be detected by the standard Unicode checks and the error message is about as misleading as it can get. Also, whether you get to see the message at all depends on the actual implementation of the parsing program. If the original developer thought that error handling might be left to be implemented by those who follow - well, it's a long way down...
