Things They Don't Tell You In CS

The title is a lie because I was, in fact, told this in CS. I don't think it's part of the curriculum, though, so here it is anyway:

Depending on how 2D arrays are represented internally, you can get different performance depending on how you iterate over them. If you iterate in an order that does not match the way the data is laid out in memory, you may end up flipping between pages more often than strictly necessary; extreme cases can trigger thrashing. In practice, of course, it is rarely an issue: the working sets and data volumes of most programs don't get large enough for this to be anything but an insignificant performance hit, although it can noticeably improve the speed of matrix operations on a large matrix stored as a 2D array.

For example, many languages store multidimensional arrays in contiguous blocks of memory, row by row. So an n × m 2D array would be stored in a block of memory that is n × m × elementSize.
That is, the 2D array conceptualized by the matrix
1  2  3  4  5
6  7  8  9  10
11 12 13 14 15
would be stored internally like {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15}.
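The row-major layout above can be sketched in Python by treating a plain list as the contiguous block; the `at` helper is my own illustrative name, and the offset arithmetic `i * n_cols + k` is the point:

```python
# Simulate a 3x5 row-major 2D array stored in one contiguous block.
n_rows, n_cols = 3, 5
block = list(range(1, n_rows * n_cols + 1))  # {1 2 3 ... 15}

def at(block, i, k, n_cols):
    """Element (i, k) of a row-major array lives at offset i*n_cols + k."""
    return block[i * n_cols + k]

print(at(block, 0, 0, n_cols))  # 1
print(at(block, 1, 2, n_cols))  # 8
print(at(block, 2, 4, n_cols))  # 15
```

Walking k in the inner loop visits the block sequentially; walking i in the inner loop jumps n_cols elements on every access.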

Let's say we're iterating over our matrix above. Suppose that the values 1 through 8 are on page1 and the values 9 through 15 are on page2, and that either page1 or page2 can be in memory, but not both, due to memory constraints such as too few frames being available to hold the full working set. Ignoring the terrible simplicity of this scenario, which fails to take into account more intelligent paging algorithms and modern RAM amounts, we can actually see a notable performance difference between these two pseudocode loops:

for i = 0 to numRows {
   for k = 0 to numColumns {
       myArray(i,k) = myArray(i,k) * 2;
   }
}

for k = 0 to numColumns {
   for i = 0 to numRows {
      myArray(i,k) = myArray(i,k) * 2;
   }
}
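As a hedged sketch (names are illustrative), here are those two loops in Python over a 3x5 matrix; both compute the same result, they just touch memory in different orders:

```python
n_rows, n_cols = 3, 5
# Build the matrix from the post: rows [1..5], [6..10], [11..15].
my_array = [list(range(r * n_cols + 1, r * n_cols + n_cols + 1))
            for r in range(n_rows)]

# Version 1 - row-major order: walks the underlying storage sequentially.
for i in range(n_rows):
    for k in range(n_cols):
        my_array[i][k] *= 2

# Version 2 - column-major order: jumps a whole row ahead on every step.
for k in range(n_cols):
    for i in range(n_rows):
        my_array[i][k] *= 2

print(my_array[0])  # each element doubled twice: [4, 8, 12, 16, 20]
```

The results are identical; only the access pattern differs, which is exactly where the paging behavior below comes from.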

In the first version, we would have an execution like this:
1 2 3 4 5 6 7 8     (page fault to page2) 
9 10 11 12 13 14 15
while the second loop would have something like this:
1 6   (page fault to page2) 
11    (page fault to page1) 
2 7   (page fault to page2) 
12    (page fault to page1) 
3 8   (page fault to page2) 
13    (page fault to page1) 
4     (page fault to page2)
9 14  (page fault to page1)
5     (page fault to page2)
10 15
In this example, we can avoid 8 page faults (9 versus 1) by simply swapping our inner and outer loops: a performance boost from changing nothing about the scenario, architecture, or programming language.
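The fault counts can be checked with a tiny simulation, assuming the same deliberately crude model as the scenario: one resident page, values 1 through 8 on page1, 9 through 15 on page2, page1 resident at the start (helper names are my own):

```python
def count_faults(order, page_of, resident=None):
    """Count faults when only one page can be resident at a time."""
    faults = 0
    for value in order:
        page = page_of(value)
        if page != resident:
            faults += 1
            resident = page
    return faults

n_rows, n_cols = 3, 5
page_of = lambda v: 1 if v <= 8 else 2  # values 1-8 on page1, 9-15 on page2

# The values visited by each loop order over the row-major block.
row_major = [i * n_cols + k + 1 for i in range(n_rows) for k in range(n_cols)]
col_major = [i * n_cols + k + 1 for k in range(n_cols) for i in range(n_rows)]

print(count_faults(row_major, page_of, resident=1))  # 1
print(count_faults(col_major, page_of, resident=1))  # 9
```

One fault versus nine, matching the traces above.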

The moral of the story is that there is lots of room for optimization with arrays, and that it pays to know how your programming language handles data structures internally and to make intelligent design choices with that in mind.

2 things about Things They Don't Tell You In CS
  1. Such a great point and yeah (I mean "no" - they never told us that in class..)

    Still, I'm happier when there are convenient language features that let you express the fact that the element-wise array multiplication could all happen "at the same time" (at least logically), and let the compiler work out how the load/op/store instructions should go for optimal efficiency with regard to data flow across the memory hierarchy.

  2. I get the feeling that you work with parallel programs a lot, eh? It was so hard for me to switch to a parallel mindset from a serial one... how to build loops in a way that all the loop iterations could happen at the same time without causing bad side effects.

    But in all cases, I really *would* hope that a compiler would be intelligent enough to handle a rather formulaic optimization like your example, or in my post's example!


Copyright 2012 Phile not Found. See About