Thursday, February 17, 2011

Ninja hacking on the iPhone

I'm tracking down a crash in SDL on the iPhone, and the path is not yet clear to me, but I thought some people would enjoy the view along the way.

The crash itself happens only on the real phone, not on the simulator, and it's a crash initializing an SDL_uikitopenglview, which is a view deriving from SDL_uikitview, which in turn derives from UIView.

The callstack for the crash looks like this:
_class_getMeta ()
_class_isInitialized ()
_class_initialize ()
...
objc_msgSend_uncached ()
UIKit_GL_CreateContext () at SDL_uikitopengles.m:146

Of course everything past my code is ARM assembly, which makes it a little tricky to debug.  Luckily Apple has published the source to their Objective C runtime, so I can disassemble the functions using gdb and follow along:
http://www.opensource.apple.com/source/objc4/objc4-437.1

First, there are a couple of useful things to know if you're poking around at this level:

Under the ARM calling convention, the first four parameters to a function are passed in registers r0 through r3, in order from left to right. The return value of the function is also passed back through r0.
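To make the register mapping concrete, here is a small sketch in C. The function names are made up for illustration, and the register comments describe what a 32-bit ARM build would be expected to do with word-sized arguments. The second function mimics the shape of an Objective C message send, where the receiver comes first and the selector second, which is why r0 is the interesting register to look at when stepping into objc_msgSend().

```c
#include <stddef.h>

/* Plain C function: a, b, c, d are expected in r0, r1, r2, r3,
 * and the sum travels back to the caller in r0. */
int sum4(int a /* r0 */, int b /* r1 */, int c /* r2 */, int d /* r3 */)
{
    return a + b + c + d;
}

/* Hypothetical stand-in with the same argument order as a message send:
 * receiver in r0, selector in r1, so the first method argument lands in r2. */
const void *toy_msg_send(const void *receiver /* r0 */,
                         const char *selector /* r1 */,
                         int arg0             /* r2 */,
                         int arg1             /* r3 */)
{
    (void)selector;
    (void)arg0;
    (void)arg1;
    return receiver; /* handed back in r0 */
}
```

Calling sum4(1, 2, 3, 4) puts 1 in r0 on entry and leaves the result in r0 on return, so printing $r0 at the first instruction and again after 'fin' shows the first argument and then the return value.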

The Xcode debugging window has a nice interface with the code right there along with the local variables and registers. On the far right is a button to bring up the gdb console where you can do some pretty advanced things.

gdb quick reference:
b <name> - set a breakpoint at the beginning of the named function
s - go to the next line of code, stepping into function calls
n - go to the next line of code, skipping over function calls
si - go to the next assembly instruction
fin - run until the function returns
c - continue running until the next breakpoint
p <var> - print the value of a variable or register (e.g. $r0, $r1, etc.)
x <address> - examine memory at an address (gdb also shows the symbol the address belongs to)
display <var> - print the value of a variable or register after each command 
list - list the code around the current execution

Most of these we don't need since the Xcode UI is pretty nice, but a really handy one is 'si', since that will let us step into the assembly and then use the UI to continue tracing the execution.

So first, I set a breakpoint at the line that crashes:
view = [[SDL_uikitopenglview alloc]

Then, I bring up the gdb console and use the 'si' command a few times until I get into assembly, just to see what things look like.

I'm curious what the first parameter to objc_msgSend() is, so I use 'x $r0' and it shows that it's "OBJC_CLASS_$_SDL_uikitopenglview", which is the Objective C class definition for my custom view.

Then I use the 'b' command to set a breakpoint in the _class_initialize() function, and bring up the code so I can follow along with the assembly.  When the breakpoint hits, I step into the first instruction in the function, a call to _class_getNonMetaClass(). I double check r0, and it's still my view class definition, but on return from the function, it's been set to 0!

The code that was executed is this:
static class_t *getNonMetaClass(class_t *cls)
{
    rwlock_assert_locked(&runtimeLock);
    if (isMetaClass(cls)) {
        cls = NXMapGet(uninitializedClasses(), cls);
    }
    return cls;
}
which means that somehow the class for my view didn't get into the map of classes that my program has loaded.
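
As a toy illustration of that failure mode (this is not the runtime's code, and every name in it is made up), a lookup that only knows about classes that were registered hands back NULL for anything else:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime class record. */
typedef struct toy_class {
    const char *name;
} toy_class;

#define TOY_MAX_CLASSES 8
static const toy_class *toy_registry[TOY_MAX_CLASSES];
static size_t toy_registry_count = 0;

/* Record a class, the way loading the program is expected to. */
void toy_register(const toy_class *cls)
{
    assert(toy_registry_count < TOY_MAX_CLASSES);
    toy_registry[toy_registry_count++] = cls;
}

/* Return the class if it was registered, otherwise NULL. */
const toy_class *toy_lookup(const toy_class *cls)
{
    size_t i;
    for (i = 0; i < toy_registry_count; ++i) {
        if (toy_registry[i] == cls) {
            return cls;
        }
    }
    return NULL;
}
```

A class that never went through toy_register() comes back as NULL, and a caller that goes on to dereference the result crashes, which is the same shape as r0 turning into 0 above.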

I did a little googling and found that Apple has a set of APIs for managing and interacting with the Objective C classes, and so I wrote a function to print them out and look for anything with SDL in it:
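#include <stdlib.h>
#include <objc/runtime.h>
#include "SDL_stdinc.h" /* SDL_strstr */

void print_classes(void)
{
    int i, numClasses;
    Class *classes;

    /* Passing NULL just returns the number of registered classes. */
    numClasses = objc_getClassList(NULL, 0);
    if (numClasses <= 0) {
        return;
    }
    classes = malloc(sizeof(Class) * numClasses);
    if (!classes) {
        return;
    }
    numClasses = objc_getClassList(classes, numClasses);
    for (i = 0; i < numClasses; ++i) {
        const char *name = class_getName(classes[i]);
        if (SDL_strstr(name, "SDL_")) {
            (void)name; /* Yay, found it! (handy spot for a breakpoint) */
        }
    }
    free(classes);
}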
Sure enough, when I run it on the simulator I find the SDL view classes, and when I run it on the device they don't show up. If I use nm on the application binary in the app folder, I see the classes are there, in both the simulator and device binaries:
nm -m Happy | fgrep SDL_uikitopenglview
0008a287 (__TEXT,__text) non-external -[SDL_uikitopenglview context]
...
000cd06c (__DATA,__objc_data) external _OBJC_CLASS_$_SDL_uikitopenglview
000cd058 (__DATA,__objc_data) external _OBJC_METACLASS_$_SDL_uikitopenglview

So, at this point I know why the crash is happening, but I don't know why the classes aren't being loaded on the device, or how to fix it yet.

Update:  Eric Wing figured this out.  The problem is that the Objective C class definitions were in a static library and the linker wasn't bringing in all the code necessary to construct the classes.  The solution is to add -ObjC to the "Other Linker Flags" setting for your application.
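
For projects that keep their build settings in an .xcconfig file rather than in the Xcode UI, the same fix could be sketched as below. OTHER_LDFLAGS is the build setting behind "Other Linker Flags"; the assumption here is that the file is attached to the application target, not to the static library.

```
// Hypothetical xcconfig fragment for the application target
OTHER_LDFLAGS = $(inherited) -ObjC
```

The $(inherited) part keeps any linker flags already set at a higher level.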

Thanks Eric! :)

6 comments:

  1. I think the problem is that the #define SDL_IPHONE_KEYBOARD is inconsistently defined. Because ivars are conditionally added to the class, we are susceptible to the fragile base class problem. The size of objects gets out of whack and that's why the stack trace looks weird.

    For testing, I put
    #define SDL_IPHONE_KEYBOARD 1

    at the top of SDL_UIKitview.h. The crash problem went away for me, but I was testing on an iPhone 1st gen with iOS 3.1 and no GLES 2.0 so the "Happy" demo didn't like my device much :P

  2. My first thought is that you may have some inconsistent settings for linking your binaries depending on your target/architecture. I would double-check those first; Xcode can make things confusing if you're not used to it. Maybe it's as simple as not having the library properly linked for the device.

  3. Eric, I tried that and it didn't fix it for me. I still have the crash on both iOS 4.2 and 3.1.3.

    Stephane, is there anything specific I should look for? I verified that the symbols are there...

  4. Figured it out. The answer is we need to supply the -ObjC linker flag.

    I hate this flag. This is the third time I've been bitten by this (once very badly).
    Sometimes you also need the -all_load flag in addition to this, though I am unclear on exactly when you need it. I think it was tied to the use of class categories. But in both cases, it comes down to the use of Objective-C categories in static libraries.

    Anyway, this is more BS nonsense that wouldn't exist if Apple would just open up dynamic linking on iOS for 3rd parties. Obj-C was never really designed for static linking. In fact, I think Mach-O was the first OS to introduce dynamic libraries. To work around the problems/bugs, you start throwing these flags at the problem. More info here:

    http://www.dribin.org/dave/blog/archives/2006/03/13/static_objc_lib/
    http://developer.apple.com/library/mac/#qa/qa2006/qa1490.html

    Anyhow, for us, the -ObjC flag should be added to Other Linker Flags for each build target. (Since we have multiple targets, it makes sense to put it at the highest level if possible.)

  5. You wrote that you are tracking down a crash in SDL on the iPhone, and you also presented the details of how you proceeded. That is really very informative. Thanks for sharing all this.
