If you're lucky, the problem will be in a small function with well-defined inputs and outputs, where you can see what's going on...
In this case, someone reported a bug: when they created two different streaming textures, the program crashed deep inside glTexSubImage2D(), in a routine called gleCopy(). They helpfully sent a small test program, and sure enough, it crashed on my iMac.
So, not having any leads, I started stepping through the assembly code:
...
0x0424a3fc <gleCopy+88>: mtctr r4
0x0424a400 <gleCopy+92>: rlwinm r0,r9,2,0,29
0x0424a404 <gleCopy+96>: addi r9,r9,1
0x0424a408 <gleCopy+100>: lwzx r2,r11,r0
0x0424a40c <gleCopy+104>: stwx r2,r3,r0
0x0424a410 <gleCopy+108>: bdnz+ 0x424a400 <gleCopy+92>
...
0x0424a464 <gleCopy+192>: add r11,r11,r7
0x0424a46c <gleCopy+200>: add r3,r3,r8
...
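This is a tight copy loop. The loop count is moved from register r4 into the count register (mtctr), the index in r9 is shifted left by 2 to turn it into a byte offset (rlwinm), and each 32-bit word is loaded from that offset in pointer r11 (lwzx) and stored at the same offset in pointer r3 (stwx); bdnz decrements the count and loops until it reaches zero. Once the inner loop finishes, r11 is incremented by r7 and r3 by r8, advancing both pointers to the next row.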
In C this might look like:
while (rows_to_copy--) {
    for (i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
    src += src_pitch;
    dst += dst_pitch;
}
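To make the pitch arithmetic concrete, here is a self-contained C version of that loop. It is a sketch, not Apple's gleCopy(): the function name copy_rows and its parameters are hypothetical, pitches are taken in bytes (matching the add instructions on r7 and r8), and each 32-bit word is moved with memcpy so the code stays well-defined regardless of alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the copy loop above: copy `rows` rows of
 * `words_per_row` 32-bit words. Pitches are in bytes, like r7 and r8. */
static void copy_rows(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *src, size_t src_pitch,
                      size_t words_per_row, size_t rows)
{
    assert(dst != NULL && src != NULL);
    while (rows--) {                                    /* one row per pass */
        for (size_t i = 0; i < words_per_row; ++i) {    /* mtctr r4 ... bdnz */
            uint32_t word;
            memcpy(&word, src + i * 4, sizeof word);    /* lwzx r2,r11,r0 with r0 = i << 2 */
            memcpy(dst + i * 4, &word, sizeof word);    /* stwx r2,r3,r0 */
        }
        src += src_pitch;                               /* add r11,r11,r7 */
        dst += dst_pitch;                               /* add r3,r3,r8 */
    }
}
```

With, say, a 16-byte source pitch and a 12-byte destination pitch, each row lands at a different offset in the two buffers; only when the pointers and the pitches both match does every word get copied onto itself.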
When I printed out the registers, I noticed something interesting: the source and destination pointers started out the same, but the source and destination pitches were completely different!
To understand why the source and destination pointers were the same, I looked at the code that creates streaming textures and saw that I was using an Apple extension that makes OpenGL use application memory instead of internally allocated memory. Since I pass that same pointer when updating the streaming textures, it makes sense that if the system thought a copy was needed, it would copy from the pointer I passed in to the pointer I had told it to use for data storage.
But why would it think it needed to copy at all? I went back to the original web page describing how to optimize texture uploads on Mac OS X and noticed a note saying that only textures whose width is a multiple of 32 bytes bypass the copying step. I figured it shouldn't crash even without an aligned width, but what the heck. I resized the textures and, as expected, the program still crashed.
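The 32-byte rule from that note is simple arithmetic on the size of one row in bytes. A minimal sketch: the helper name is hypothetical, and the rule is taken from the note as described above rather than from Apple's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if one texture row is a whole multiple of
 * 32 bytes, the condition the note gives for skipping the copy step. */
static bool row_is_32_byte_multiple(size_t width_pixels, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return (width_pixels * bytes_per_pixel) % 32 == 0;
}
```

For example, a 640-pixel row of 4-byte pixels is 2560 bytes (80 times 32) and qualifies, while a 100-pixel row of 3-byte RGB is 300 bytes and does not.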
So what would cause the pitches to be different? Well, in a normal texture upload, the pitch of the source pixels is controlled by the GL_UNPACK_ROW_LENGTH attribute. In a flash, I realized that's probably what the extension uses to determine the original pitch of the texture. Sure enough, I was missing a call to set that attribute when creating the texture. Looking more closely at the values in the registers, I saw that the pitch used for the destination was indeed the value that had been set for the first texture upload. That makes sense, because GL_UNPACK_ROW_LENGTH is global pixel-store state, not per-texture state, so it stays in effect until something changes it. And of course, if the pitches don't match in a pixel copy operation, you have to copy row by row, just like the assembly code above was doing.
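The post doesn't say exactly how the mismatch turned into a crash, but the arithmetic suggests one plausible path: source and destination are the same client buffer, sized for that texture's own pitch, so if the stale destination pitch happens to be larger, later rows are written past the end. A sketch of that check, with hypothetical helper names and example numbers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes from the start of the buffer touched by a row-by-row copy:
 * the last row starts at (rows - 1) * pitch and spans row_bytes. */
static size_t bytes_spanned(size_t rows, size_t pitch, size_t row_bytes)
{
    assert(rows > 0);
    return (rows - 1) * pitch + row_bytes;
}

/* Hypothetical check: true if writing with dst_pitch runs past a buffer
 * that was allocated as rows * own_pitch bytes. */
static bool copy_overruns(size_t rows, size_t own_pitch,
                          size_t dst_pitch, size_t row_bytes)
{
    return bytes_spanned(rows, dst_pitch, row_bytes) > rows * own_pitch;
}
```

With a 256 by 256 texture of 4-byte pixels (1024-byte pitch, 262144-byte buffer) and a stale 2048-byte pitch left over from a 512-pixel-wide texture, the copy spans 255 * 2048 + 1024 = 523264 bytes, roughly twice the buffer.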
Adding a call to set GL_UNPACK_ROW_LENGTH when creating the texture fixed the problem!
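One detail worth keeping in mind when making that call: GL_UNPACK_ROW_LENGTH is measured in pixels, not bytes, so a byte pitch has to be divided by the pixel size first. A minimal sketch, assuming the pitch is tracked in bytes; the helper name is hypothetical, and the OpenGL calls appear only in a comment because they need a live GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a row pitch in bytes into the value
 * GL_UNPACK_ROW_LENGTH expects, which is a count of pixels. */
static int unpack_row_length(size_t pitch_bytes, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    assert(pitch_bytes % bytes_per_pixel == 0);  /* pitch must hold whole pixels */
    return (int)(pitch_bytes / bytes_per_pixel);
}

/* Intended use in the texture-creation path (pitch, w, h and pixels are
 * placeholders; not compiled here):
 *
 *   glPixelStorei(GL_UNPACK_ROW_LENGTH, unpack_row_length(pitch, 4));
 *   glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
 *                GL_RGBA, GL_UNSIGNED_BYTE, pixels);
 */
```

Setting the value explicitly before each upload, instead of relying on whatever the previous texture left behind, keeps one texture's pitch from leaking into the next.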
So, even though the bug was in my code, and a pretty obvious one in retrospect, it was really helpful to be able to look at the assembly and understand WHY my code was wrong.
Cheers!