Friday, February 25, 2011

Problem Solving in PowerPC Assembly

When you're working with other people's code, you don't always have the source, and sometimes this means you have to dive into the assembly code to figure out what's going on.

If you're lucky, it will be a small function with well-defined inputs and outputs, and you can see what's going on...

In this case, someone reported a bug where creating two different streaming textures made the program crash deep inside glTexSubImage2D(), in a routine called gleCopy(). They helpfully sent a small test program, and sure enough, it crashed on my iMac.

So, not having any leads, I started stepping through the assembly code:

...
0x0424a3fc <gleCopy+88>:        mtctr   r4
0x0424a400 <gleCopy+92>:        rlwinm  r0,r9,2,0,29
0x0424a404 <gleCopy+96>:        addi    r9,r9,1
0x0424a408 <gleCopy+100>:       lwzx    r2,r11,r0
0x0424a40c <gleCopy+104>:       stwx    r2,r3,r0
0x0424a410 <gleCopy+108>:       bdnz+   0x424a400 <gleCopy+92>
...
0x0424a464 <gleCopy+192>:       add     r11,r11,r7
0x0424a46c <gleCopy+200>:       add     r3,r3,r8
... 

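This is a tight copy loop. The iteration count is moved from register r4 into the count register (mtctr), and each iteration loads one 32-bit word at a byte offset from pointer r11 (lwzx) and stores it at the same offset from pointer r3 (stwx). The offset is the loop index in r9 shifted left by two, i.e. multiplied by 4 (rlwinm). After each row is copied, r11 is incremented by r7 and r3 is incremented by r8.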

In C this might look like:
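while (rows_to_copy--) {              /* one pass per row */
    for (i = 0; i < count; ++i) {     /* count 32-bit words per row */
        dst[i] = src[i];
    }
    src += src_pitch;                 /* advance each pointer by its own pitch */
    dst += dst_pitch;
}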
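
Fleshing that out a little, here is a self-contained sketch of the same loop with the assembly instructions noted next to the lines they roughly correspond to. The function name, the parameter names, and the choice to express pitches in bytes are my own assumptions for illustration; this is not the actual gleCopy() source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the copy loop: copies `rows` rows of
 * `words_per_row` 32-bit words. Pitches are in bytes and are allowed
 * to differ between source and destination. */
static void copy_rows(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *src, size_t src_pitch,
                      size_t words_per_row, size_t rows)
{
    assert(dst != NULL && src != NULL);

    while (rows--) {
        for (size_t i = 0; i < words_per_row; ++i) {
            size_t offset = i << 2;                    /* rlwinm r0,r9,2,0,29 */
            uint32_t word;
            memcpy(&word, src + offset, sizeof word);  /* lwzx r2,r11,r0 */
            memcpy(dst + offset, &word, sizeof word);  /* stwx r2,r3,r0  */
        }
        src += src_pitch;                              /* add r11,r11,r7 */
        dst += dst_pitch;                              /* add r3,r3,r8   */
    }
}
```

Note that nothing in the loop checks the pitches against the size of either buffer; it trusts whatever values it is handed.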

When I printed out the registers, I noticed an interesting thing. The source and destination pointers started out the same, but the source and destination pitches were completely different!

To understand why the source and destination pointers were the same, I looked at the code that creates streaming textures and saw that I was using an Apple extension to have OpenGL use application memory instead of internally allocated memory. Since I use that same pointer when updating the streaming textures, it makes sense that if the system thought copying needed to be done, it would copy from the pointer I passed in to the pointer I told it to use for data storage.

But why would it think it needed to do copying? I looked at the original web page describing how to optimize Mac OS X texture upload and noticed a note saying that only textures whose width is a multiple of 32 bytes would bypass the copying step. I figured that it shouldn't crash if they didn't have an aligned width, but what the heck. I resized the textures and, as expected, the program still crashed.

So what would cause the pitches to be different? Well, in a normal texture upload, the pitch is controlled by the GL_UNPACK_ROW_LENGTH attribute. In a flash, I realized that's probably what the extension uses to determine the original pitch of the texture. Sure enough, I was missing a call to set that attribute when creating the texture. Looking more closely at the values in assembly, the pitch that it was using for the destination was indeed the pitch value that was set for the first texture upload. And of course, if the pitch values don't match in a pixel copy operation, you have to copy row by row, just like the assembly code above was doing.
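
To make the danger of a stale pitch concrete, here is a minimal sketch that computes how many bytes a row-by-row walk touches. The helper names and the texture sizes are made up for illustration, and I'm assuming (not confirming) that walking past the end of the buffer is what triggered the crash.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes spanned by walking `rows` rows of `row_bytes` bytes each,
 * stepping `pitch` bytes between row starts. */
static size_t bytes_spanned(size_t rows, size_t row_bytes, size_t pitch)
{
    assert(row_bytes <= pitch);
    return rows == 0 ? 0 : (rows - 1) * pitch + row_bytes;
}

/* Returns 1 if such a walk runs past the end of a buffer of
 * `buffer_bytes` bytes, 0 otherwise. */
static int walk_overruns(size_t buffer_bytes, size_t rows,
                         size_t row_bytes, size_t pitch)
{
    return bytes_spanned(rows, row_bytes, pitch) > buffer_bytes;
}
```

For a hypothetical 64x64 texture at 4 bytes per pixel, the buffer is 16384 bytes and the correct pitch is 256 bytes, so the walk fits exactly. If the copy instead uses a leftover pitch of 1024 bytes from a wider texture, the walk spans 63 * 1024 + 256 = 64768 bytes, far past the end of the buffer.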

Adding a call to set GL_UNPACK_ROW_LENGTH in texture creation fixed the problems!
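
One detail worth remembering when adding that call: GL_UNPACK_ROW_LENGTH is specified in pixels, not bytes. Below is a small sketch of the conversion; the helper name is hypothetical, and the glPixelStorei() line in the comment only shows where the value would go, not my exact code.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a pitch in bytes to the pixel count that
 * GL_UNPACK_ROW_LENGTH expects. Hypothetical helper. */
static int row_length_in_pixels(size_t pitch_bytes, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    assert(pitch_bytes % bytes_per_pixel == 0);  /* pitch must hold whole pixels */
    return (int)(pitch_bytes / bytes_per_pixel);
}

/* Assumed usage at texture creation time, before the upload call:
 *
 *   glPixelStorei(GL_UNPACK_ROW_LENGTH, row_length_in_pixels(pitch, 4));
 */
```

Since pixel store state persists until it is changed, setting it for every texture avoids inheriting the value left behind by an earlier upload.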

So, even though the bug was in my code, and a pretty obvious one in retrospect, it was really helpful to be able to look at the assembly and understand WHY my code was wrong.

Cheers!

1 comment:

  1. That's great of solving problems in power pc assembly. I am glad to see that you shared the code as well. :-)
