I recently bought Game Programming Gems 2 and was captivated by an article titled ‘Micro-Threads for Game Object AI’.
A really neat technique is describes that can performing very fast cooperative multitasking, also sometimes referred to as co routines. The heart of this technique involves manipulating the processor stack pointer to switch between multiple contexts of execution. Its very similar in concept to true multitasking, however each thread decides when to share the processor. The concept is simple enough, but really so powerful and radically different from many other techniques for AI I have used. I knew I would have to program up a text to play with this idea. Using this method, each object in my game can pretend it is running in its own thread and I am sure this can find application elsewhere too.
The article provides a nice description, but doesn’t provide enough information to directly copy an implementation, and my copy arrived without the CD, but that is fine however and I am having fun of working it out my own. I have made a few projects in the past which manipulated the CPU registers to perform various tricks, so I can build upon that knowledge.
My first implementation appeared to operate correctly until exit where my call to delete threw up an assert. This is because the fake stack was getting sprayed with bad data somehow from within my virtual thread. I will be spending lots of time in the debugger until I find out the cause.
I believe that the debug run time version of printf() is writing past the end of the 1kb stack, causing a buffer overflow. Hmm….
I went back and read the article all over again and there were many hints that I simply didn’t have enough stack space at all. The author also proposed a very nice improvement, where we effectively extend the stack size by copying each micro thread into much larger stack before execution. Thus the Micro Thread code can make use of a large stack space, but individually they only take up a small amount of space. After execution, we extract the smaller part of the stack again. This places the limitation that _yield() cant be called deep in some code in the thread, but that is hardly a restriction at all.
Here is my demo code for anyone that want to have a look at its implementation. I wrapped it up with a little API as well so it should be somewhat readable.
My brain must have been thinking of improvements overnight since I woke up with three ideas I had to experiment with. In each case it was to address the stability and protect against stack overflows. The most major improvement is the use of VirtualProtect(), which is used here to set the bottom of the shared stack region to a guard page. This wastes 4k of stack space, but it is well worth it since now a nice exception is thrown if we write to the stack anywhere in the bottom 4k.
The second improvement, is to check the value of the stack pointer upon return from a micro thread. If it falls within the space allocated to each thread then everything is fine, otherwise we will need to allocate more stack space for that thread. If this check fails, and esp lies outside of our thread space, then we can be sure that we wont be saving all of that threads state during the task switch. That would lead to very very bad things.
A final improvement is to address what happen when the micro thread function returns. Currently, there is no return address on the stack, so execution could jump literally anywhere.
The solution is to ensure that we have a valid address for this function to return to. We can set the return address to a custom function that can handle the situation on one of two ways. We can setup the stack and processor state so that it will execute the micro thread function once more, thus it will feel like our micro thread function is permanently called. Alternatively we can mark that thread as finished, switch back to our main thread, and then kill the micro thread cleanly. Since both options have merit, I will implement both of then and let the user specify the handling during thread creation.
I am now playing with Vectored Exception handlers to see if I can recover from stack overflows, and automatically terminate a malfunctioning thread. I have confirmed that an exception is being thrown when I intentionally overflow the stack from within a micro thread. It shouldn’t be too hard to switch the context back from within a vectored exception handler. I remember having played around with something similar, having self modifying code from an exception handler if a debugger is detected. According to the original article, it states that structured exceptions no longer work correctly from micro threads since no handlers are on the stack, perhaps this is something I can look into addressing also.
Here is the latest version of the library provided as a source release. The vectored exception handler really works amazingly well, terminating any micro thread that attempts to overwrite its stack. If a micro thread returns from its function, the thread will terminate as well. There are many more stability improvements and validity check making this quite a viable option for use in any project now. I have included a small test program also so that the functionality can be demonstrated.