ABAP Trapdoors: LOOP Dizziness

This is a repost of an article published on the SAP Community Network.

First off, I'd like to apologize for the delay since my last ABAP trapdoor article. I'm really grateful for all of the kind responses I'm getting for these articles. However, having a 14-month-old toddler around invariably implies having all kinds of germs around, and some of them apparently teamed up to knock me out for some time...

As you certainly all know, there are numerous ways to iterate through an internal table. Probably the most popular ones are LOOP AT table INTO line, LOOP AT table ASSIGNING <fs> and LOOP AT table REFERENCE INTO dref. Everyone should be aware of the performance issues and side effects that can arise when using the different versions – most importantly, LOOP AT … INTO … copies every single table line into the work area, while the other variants only move around pointers or addresses. On the one hand, this means that for ‘wide’ tables a lot of time is spent pushing data around. On the other hand, it implies that you have to copy the data back into the table (using MODIFY) after changing it, whereas the other variants allow in-place editing of the table contents.
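To make the difference concrete, here is a minimal sketch of the three variants changing the same table. The report name, the structure ty_item and all variable names are made up for this illustration; only the LOOP variants themselves are taken from the text above.

```abap
REPORT zloop_variants_sketch.

* Hypothetical line type, only used for this illustration.
TYPES: BEGIN OF ty_item,
         id     TYPE i,
         amount TYPE i,
       END OF ty_item.

DATA: items TYPE TABLE OF ty_item,
      item  TYPE ty_item,
      dref  TYPE REF TO ty_item.

FIELD-SYMBOLS: <item> TYPE ty_item.

DO 3 TIMES.
  item-id     = sy-index.
  item-amount = sy-index * 10.
  APPEND item TO items.
ENDDO.

* INTO: item is a copy of the line, so the change has to be
* written back to the table explicitly.
LOOP AT items INTO item.
  item-amount = item-amount + 1.
  MODIFY items FROM item INDEX sy-tabix.
ENDLOOP.

* ASSIGNING: <item> points at the table line, the change happens in place.
LOOP AT items ASSIGNING <item>.
  <item>-amount = <item>-amount + 1.
ENDLOOP.

* REFERENCE INTO: dref references the table line, also in place.
LOOP AT items REFERENCE INTO dref.
  dref->amount = dref->amount + 1.
ENDLOOP.
```

After the three loops, every amount has been increased by 3. Leaving out the MODIFY in the first loop would silently drop that loop's changes, because only the copy in item would be touched.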

I’m a curious person. I wanted to know how much slower it would be to copy the contents of the table around instead of editing it in-place, so I hacked together a small program that performs a number of iterations on tables of various sizes and compares the time taken for the different variants. Now I’m not an artist, but my graphical capabilities are sufficient to generate basic diagrams that unfortunately have not survived one of the disasters the SCN team calls "content migration"...

This chart shows how much time it takes to iterate over tables of varying width (10k lines each, only the size of the line changes!) up to approximately 32k characters. I’ve added moving average lines to the charts because for various reasons (probably alignment, page sizes and other memory management effects) the data points bounce around rather wildly. (Feel free to e-mail me for the full-size images, raw data or the source code of the program.) As you can see, the ASSIGNING (green) and REFERENCE INTO (blue) variants are nearly identical when it comes to time consumption, whereas the INTO (red) variant can be much more expensive.
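If you want to repeat a measurement like this on your own system, a stripped-down version could look roughly like the following sketch. This is not the original measurement program: the report name, the line type ty_wide with its 1000-character payload and all variable names are assumptions made for the illustration, and it measures a single table width only.

```abap
REPORT zloop_timing_sketch.

* Hypothetical 'wide' line type: one counter plus 1000 characters of payload.
TYPES: BEGIN OF ty_wide,
         counter TYPE i,
         payload TYPE c LENGTH 1000,
       END OF ty_wide.

DATA: wide_tab TYPE TABLE OF ty_wide,
      wide     TYPE ty_wide,
      sum      TYPE i,
      t_start  TYPE i,
      t_end    TYPE i,
      t_diff   TYPE i.

FIELD-SYMBOLS: <wide> TYPE ty_wide.

DO 10000 TIMES.
  wide-counter = sy-index.
  APPEND wide TO wide_tab.
ENDDO.

* INTO: every line is copied into the work area.
GET RUN TIME FIELD t_start.
LOOP AT wide_tab INTO wide.
  sum = sum + wide-counter.
ENDLOOP.
GET RUN TIME FIELD t_end.
t_diff = t_end - t_start.
WRITE: / 'INTO      (microseconds):', t_diff.

* ASSIGNING: only the field symbol is repositioned.
CLEAR sum.
GET RUN TIME FIELD t_start.
LOOP AT wide_tab ASSIGNING <wide>.
  sum = sum + <wide>-counter.
ENDLOOP.
GET RUN TIME FIELD t_end.
t_diff = t_end - t_start.
WRITE: / 'ASSIGNING (microseconds):', t_diff.
```

A single run like this is noisy, so repeat it several times and vary the payload length before drawing any conclusions.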

Again just out of curiosity, here’s a detail shot of the line sizes between 1 and 1000 characters, which unfortunately has not survived one of the disasters the SCN team calls "content migration" either...

I’m very well aware that these charts are rather subjective and that the results may vary depending on machine type, Unicode/non-Unicode systems and other factors. When in doubt, measure – just keep in mind that it’s usually not necessary to shovel around tons of data just to add up a few numbers. And while we’re at it, there’s also the TRANSPORTING NO FIELDS variant that can be rather handy to count lines or check for the existence of values.
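For the record, here is a small sketch of what TRANSPORTING NO FIELDS looks like in practice. The report name and the variable names are invented for the example, and the READ TABLE at the end is an addition of this sketch for the existence check, not something the measurements above covered.

```abap
REPORT zloop_no_fields_sketch.

DATA: numbers TYPE TABLE OF i,
      hits    TYPE i.

DO 100 TIMES.
  APPEND sy-index TO numbers.
ENDDO.

* Count the lines greater than 90 without copying any line content.
LOOP AT numbers TRANSPORTING NO FIELDS WHERE table_line > 90.
  hits = hits + 1.
ENDLOOP.
WRITE: / 'Lines > 90:', hits.

* Existence check: only sy-subrc (and sy-tabix) are of interest here.
READ TABLE numbers TRANSPORTING NO FIELDS WITH KEY table_line = 42.
IF sy-subrc = 0.
  WRITE: / '42 found at index', sy-tabix.
ENDIF.
```

Neither statement needs a work area, a field symbol or a reference, so there is nothing left over afterwards that could be reused by accident.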

After this rather lengthy introduction, here is today’s ABAP trapdoor. Actually, it’s more like an oversized mouse trap – dangerous enough to hurt your foot, but not large enough to swallow an entire ABAP developer. Take a look at the following program:

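FIELD-SYMBOLS: <number> TYPE i.

DATA: numbers TYPE TABLE OF i,
      sum     TYPE i.

* Fill the table with the numbers 1 to 100.
DO 100 TIMES.
  APPEND sy-index TO numbers.
ENDDO.

* Variant 1: sum up the values with LOOP ... ASSIGNING.
sum = 0.
LOOP AT numbers ASSIGNING <number>.
  sum = sum + <number>.
ENDLOOP.
WRITE: / 'Sum 1:', sum.

* Variant 2: sum up the values with LOOP ... INTO and ADD.
CLEAR sum.
LOOP AT numbers INTO <number>.
  ADD <number> TO sum.
ENDLOOP.
WRITE: / 'Sum 2:', sum.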

At first glance, these are just two variations on how to sum up the values contained in a table – the first one is slightly more C-like, the second one is for COBOL lovers. But let’s take a look at the output:

Sum 1: 5.050

Sum 2: 5.049

Wha…? Okay, you have probably already spotted the problem. After the first LOOP is completed, the field symbol stays assigned to the last line of the table, and by using the field symbol as the target of the second LOOP, the last line of the table is overwritten with each iteration. By the time the second LOOP reaches the last line, that line contains 99 instead of 100, so the sum comes out exactly one short. This leads to the incorrect sum as well as involuntarily changed table content. The same thing would happen with a REFERENCE INTO loop, but that’s not as likely to occur because the syntax is different.

The best way to prevent this from happening is to UNASSIGN the field symbol used right after the ENDLOOP statement. This way, the program will at least crash in a controlled fashion instead of silently producing incorrect results.
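Applied to the example program, this could look like the following sketch. The report name is made up, and the separate work area number for the second loop is an addition of this sketch, not part of the original program.

```abap
REPORT zloop_unassign_sketch.

FIELD-SYMBOLS: <number> TYPE i.

DATA: numbers TYPE TABLE OF i,
      number  TYPE i,
      sum     TYPE i.

DO 100 TIMES.
  APPEND sy-index TO numbers.
ENDDO.

LOOP AT numbers ASSIGNING <number>.
  sum = sum + <number>.
ENDLOOP.
* Detach the field symbol from the last table line.
UNASSIGN <number>.
WRITE: / 'Sum 1:', sum.

* A dedicated work area for the INTO variant. Using <number> as the
* target here would now end in a runtime error for an unassigned
* field symbol instead of overwriting the last table line.
CLEAR sum.
LOOP AT numbers INTO number.
  ADD number TO sum.
ENDLOOP.
WRITE: / 'Sum 2:', sum.
```

With the UNASSIGN in place, accidentally writing LOOP AT numbers INTO <number> produces a short dump (typically GETWA_NOT_ASSIGNED) on the first iteration, which is a lot easier to track down than a sum that is off by one.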
