Monday 22 August 2016

What killed my message loop?

Every once in a while, I find myself in situations where I have to put most of my debugging skills into action. The reasons vary, but one of the worst is making decisions based on false assumptions. These are extremely dangerous: not only do they make debugging take much longer, they can also lead to wrong conclusions and dead ends.
Some colleagues of mine asked for my help the other day. The software we develop displays a login dialog at startup. Pressing Cancel is supposed to close the application, but the process reportedly remained in memory afterwards. While the login screen is visible, several components start initializing, so that the application is ready sooner after successful authentication. The application itself is very complex, but it's basically a mixture of WinForms and Silverlight. Again for performance reasons, the app uses two UI threads: one for the WinForms stuff (Thread A from now on) and another for the Silverlight plugins (Thread B). The latter is created with code very similar to this:
using System.Threading;

Thread t = new Thread(() =>
{
    //Do some initialization stuff, like creating a WebBrowser control for the Silverlight content

    //Enter message loop
    System.Windows.Forms.Application.Run();
});

t.SetApartmentState(ApartmentState.STA);
t.Start();


Of course, the message loop has to be stopped eventually, which we do when our component is disposed:
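protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        //Run this on the message loop thread (Thread B):
        //ExitThread() only exits the message loop of the calling thread
        Application.ExitThread();
    }
    base.Dispose(disposing);
}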
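Since Application.ExitThread() only ends the loop of the thread it runs on, the call in Dispose() has to be marshaled to Thread B. Here is a minimal sketch of one way to do that, assuming the component keeps a reference to a control created on Thread B. SilverlightHost and threadBControl are hypothetical names, not our actual code. Keep in mind that Control.Invoke() blocks, so a synchronous call like this can wait forever if Thread B's loop is already gone.

```csharp
using System.ComponentModel;
using System.Windows.Forms;

// Hypothetical component owning a control that was created on Thread B,
// e.g. the WebBrowser hosting the Silverlight content.
public class SilverlightHost : Component
{
    private readonly Control threadBControl;

    public SilverlightHost(Control controlCreatedOnThreadB)
    {
        threadBControl = controlCreatedOnThreadB;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing && threadBControl.IsHandleCreated)
        {
            // Control.Invoke marshals the delegate to the thread that owns the
            // control's handle (Thread B) and blocks until it has run there.
            // ExitThread() then ends the message loop of that thread only.
            threadBControl.Invoke(new MethodInvoker(Application.ExitThread));
        }
        base.Dispose(disposing);
    }
}
```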


Okay, so as I said, cancelling the login dialog did not terminate the process, which implied that a foreground thread was stuck. So first I checked the threads and their call stacks with WinDbg and found two points of interest.
First of all, there was no sign of Thread B. What the heck?
Secondly, Thread A seemed to be waiting for an operation to complete:
0:008> !clrstack
OS Thread Id: 0xe24 (8)
Child SP       IP Call Site
05a2e4cc 7757019d [HelperMethodFrame_1OBJ: 05a2e4cc] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
05a2e5b0 6ebac7c1 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
05a2e5c8 6ebac788 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
05a2e5dc 69414e7e System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
05a2e61c 697e3b96 System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
05a2e620 6941722b [InlinedCallFrame: 05a2e620]
05a2e6a4 6941722b System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
05a2e6d8 694171dc System.Windows.Forms.Control.Invoke(System.Delegate)
...
05a2e77c 6e0b501a System.ComponentModel.Component.Dispose()
05a2e788 697df58c System.Windows.Forms.Form.Dispose(Boolean)
...
05a2e7f8 6e0b501a System.ComponentModel.Component.Dispose()
05a2e804 7086d5df Microsoft.Practices.ObjectBuilder.LifetimeContainer.Dispose(Boolean)
05a2e84c 7086d541 Microsoft.Practices.ObjectBuilder.LifetimeContainer.Dispose()
05a2e854 6a5a99ca Microsoft.Practices.CompositeUI.WorkItem.Dispose(Boolean)
...


Okay, so it seems we try to invoke a delegate synchronously with Control.Invoke(), but what is the runtime type of this Control?
0:008> !dso
OS Thread Id: 0xe24 (8)
ESP/REG  Object   Name
05A2E3D0 024a72d8 System.Windows.Forms.WindowsFormsSynchronizationContext
05A2E428 12302034 Microsoft.Win32.SafeHandles.SafeWaitHandle
05A2E4C0 12302034 Microsoft.Win32.SafeHandles.SafeWaitHandle
05A2E4F0 12302034 Microsoft.Win32.SafeHandles.SafeWaitHandle
05A2E53C 0275c894 System.Windows.Forms.WebBrowser
...

0:008> !clrstack -a
OS Thread Id: 0xe24 (8)
Child SP       IP Call Site
...

05a2e5dc 69414e7e System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
    PARAMETERS:
        this (0x05a2e5e8) = 0x0275c894
        waitHandle (0x05a2e5e4) = 0x1230201c
    LOCALS:
        <no data>
        0x05a2e5f4 = 0x00000d18
        <no data>
        <no data>
        <no data>
        0x05a2e5e0 = 0x00000000
        0x05a2e5ec = 0x00000000
        <no data>
...


It’s a WebBrowser control (its address, 0x0275c894, matches the this parameter of WaitForWaitHandle), which makes sense as it hosts the Silverlight plugin. Now we know that Thread A is waiting for a synchronous call to finish on the thread that created the WebBrowser control. Any guess which thread that is? It’s Thread B! The one that disappeared! It doesn’t really matter which method is waiting to be executed; the problem is obviously that Thread B is gone and can’t execute anything anymore.
So what killed the message loop of Thread B? I reproduced the use case with the following breakpoint set:
!bpmd System.Windows.Forms.dll System.Windows.Forms.Application.ExitThread

But no luck: it was not hit at all. And at this point, I made a mistake. I knew about another API, Application.Exit(), but I made the following two false assumptions about it:
1. It does not kill my message loop if invoked from a thread different from Thread B –> FALSE
2. No one calls this, because it’s a rather aggressive way of exiting an application –> FALSE
Based on these, I was chasing ghosts for a while, e.g. looking for ThreadAbortExceptions and other exceptions. That did help a little, as it revealed the following call stack while Thread B was still alive:
0:019> !clrstack
OS Thread Id: 0x3e80 (19)
Child SP       IP Call Site
0babe250 773b5b68 [InlinedCallFrame: 0babe250]
0babe24c 6a77425c DomainBoundILStubClass.IL_STUB_CLRtoCOM()
0babe250 6a9db707 [InlinedCallFrame: 0babe250] System.Windows.Forms.UnsafeNativeMethods+IOleInPlaceObject.InPlaceDeactivate()
0babe2a8 6a9db707 System.Windows.Forms.WebBrowserBase.TransitionFromInPlaceActiveToRunning()
0babe2b8 6a9db321 System.Windows.Forms.WebBrowserBase.TransitionDownTo(AXState)
0babe2e0 6ab3bd26 System.Windows.Forms.WebBrowserBase.WndProc(System.Windows.Forms.Message ByRef)
0babe310 6a24e33e System.Windows.Forms.WebBrowser.WndProc(System.Windows.Forms.Message ByRef)
0babe320 6a237201 System.Windows.Forms.Control+ControlNativeWindow.OnMessage(System.Windows.Forms.Message ByRef)
0babe328 6a2371e9 System.Windows.Forms.Control+ControlNativeWindow.WndProc(System.Windows.Forms.Message ByRef)
0babe33c 6a237130 System.Windows.Forms.NativeWindow.Callback(IntPtr, Int32, IntPtr, IntPtr)
0babe560 004aa0e1 [InlinedCallFrame: 0babe560]
0babe55c 6a289d9b DomainBoundILStubClass.IL_STUB_PInvoke(System.Runtime.InteropServices.HandleRef)
0babe560 6a288a7c [InlinedCallFrame: 0babe560] System.Windows.Forms.UnsafeNativeMethods.IntDestroyWindow(System.Runtime.InteropServices.HandleRef)
0babe598 6a288a7c System.Windows.Forms.UnsafeNativeMethods.DestroyWindow(System.Runtime.InteropServices.HandleRef)
0babe5a8 6a288993 System.Windows.Forms.NativeWindow.DestroyHandle()
0babe5ec 6a2891d8 System.Windows.Forms.Control.DestroyHandle()
0babe5f0 6aa0337b [InlinedCallFrame: 0babe5f0]
0babe664 6aa0337b System.Windows.Forms.Application+ParkingWindow.Destroy()
0babe66c 6a7c0b77 System.Windows.Forms.Application+ThreadContext.DisposeParkingWindow()
0babe670 6a7c0bf0 [InlinedCallFrame: 0babe670]
0babe6a4 6a7c0bf0 System.Windows.Forms.Application+ThreadContext.DisposeThreadWindows()
0babe6c8 6a245f75 System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
0babe6cc 6a245bc9 [InlinedCallFrame: 0babe6cc]
0babe754 6a245bc9 System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
0babe7a4 6a245a42 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0babe7d0 6a7bfbca System.Windows.Forms.Application.Run()



Hm… Checking the source code of System.Windows.Forms.dll, I found that DisposeThreadWindows() is only invoked when the message loop retrieves message 18 (0x12), which turns out to be WM_QUIT as defined in WinUser.h. You do have the Windows SDK installed, don’t you? :-)

So what is sending WM_QUIT to our message loop? There are several Win32 functions that can do this, so I set native breakpoints on them to get my hands on the evil call stack:
bu user32!PostMessageW "dd [esp+8] L1;.if (poi(@esp+8)!=0x12) {gc;}"
bu user32!PostThreadMessageW "dd [esp+8] L1;.if (poi(@esp+8)!=0x12) {gc;}"
bu user32!SendMessageW  "dd [esp+8] L1;.if (poi(@esp+8)!=0x12) {gc;}"
bu user32!SendNotifyMessageW  "dd [esp+8] L1;.if (poi(@esp+8)!=0x12) {gc;}"
bu user32!SendMessageCallbackW  "dd [esp+8] L1;.if (poi(@esp+8)!=0x12) {gc;}"


Okay, this requires some explanation. The debuggee is a 32-bit process, so these Win32 functions use the standard calling convention (__stdcall): parameters are passed on the stack, pushed right to left, and the callee cleans up the stack. When one of these breakpoints is hit, i.e. right on entry to the function, the stack looks like this:
Slot       Offset from esp
RetAddr    0x0     <- top of stack = [ss:esp]
hWnd       0x4     (idThread for PostThreadMessageW)
Msg        0x8
wParam     0xC
lParam     0x10

So [ss:esp+8] holds the message number we are interested in. Each breakpoint prints this value and, unless it is WM_QUIT, continues execution.
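To double-check the offsets the breakpoint conditions rely on, here is a small C# model of that frame. It is an illustration only: the struct and helper names are made up, and the stack bytes are assumed to be copied out little-endian, starting at esp. The message is the second parameter of all five hooked functions, so offset 8 works for each of them.

```csharp
using System.Runtime.InteropServices;

// Model of the x86 stack on entry to a __stdcall function such as
// user32!PostThreadMessageW(idThread, Msg, wParam, lParam).
// Every slot is 4 bytes on x86; uint keeps the offsets the same even if this
// model itself is compiled for x64.
[StructLayout(LayoutKind.Sequential)]
public struct StdCallEntryFrame32
{
    public uint RetAddr; // [esp+0x0]  pushed by the CALL instruction
    public uint Arg1;    // [esp+0x4]  hWnd, or idThread for PostThreadMessageW
    public uint Msg;     // [esp+0x8]  the message number
    public uint WParam;  // [esp+0xC]
    public uint LParam;  // [esp+0x10]
}

public static class QuitBreakpoint
{
    public const uint WM_QUIT = 0x0012;

    // Offset of a stack slot relative to esp, taken from the struct layout.
    public static int OffsetOf(string slot) =>
        Marshal.OffsetOf<StdCallEntryFrame32>(slot).ToInt32();

    // Mirrors ".if (poi(@esp+8)!=0x12) {gc;}": read the little-endian dword
    // at esp+8 and report whether the debugger should stay broken in.
    public static bool ShouldStop(byte[] stackFromEsp)
    {
        int o = OffsetOf(nameof(StdCallEntryFrame32.Msg));
        uint msg = (uint)(stackFromEsp[o]
                          | stackFromEsp[o + 1] << 8
                          | stackFromEsp[o + 2] << 16
                          | stackFromEsp[o + 3] << 24);
        return msg == WM_QUIT;
    }
}
```

For example, QuitBreakpoint.OffsetOf("Msg") yields 8, the same offset used in the bu commands.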
I reproduced the use case and voilà, I got lucky with PostThreadMessageW:
0:007> !clrstack
OS Thread Id: 0x42fc (7)
Child SP       IP Call Site
05ece9b8 774cddc0 [InlinedCallFrame: 05ece9b8] System.Windows.Forms.UnsafeNativeMethods.PostThreadMessage(Int32, Int32, IntPtr, IntPtr)
05ece9b4 6acb12eb System.Windows.Forms.Application+ThreadContext.PostQuit()
05ece9e8 6acb08da System.Windows.Forms.Application+ThreadContext.Dispose(Boolean)
05ece9ec 6acb10ea [InlinedCallFrame: 05ece9ec]
05ecea40 6acb10ea System.Windows.Forms.Application+ThreadContext.OnAppThreadExit(System.Object, System.EventArgs)
05ecea48 6acbffe7 System.Windows.Forms.ApplicationContext.ExitThreadCore()
05ecea54 6acb0df1 System.Windows.Forms.Application+ThreadContext.ExitCommon(Boolean)
05ecea88 6acb0269 System.Windows.Forms.Application.ExitInternal()
05eceac0 6acaf81e System.Windows.Forms.Application.Exit(System.ComponentModel.CancelEventArgs)
05ecead8 6bbb4aac ***LoginDialog.OnCancelButtonClick(System.Object, System.EventArgs)
05eceb10 012ba9b7 [MulticastFrame: 05eceb10] System.EventHandler.Invoke(System.Object, System.EventArgs)
05eceb3c 6a719366 System.Windows.Forms.Control.OnClick(System.EventArgs)
05eceb50 6a71ba1c System.Windows.Forms.Button.OnClick(System.EventArgs)
05eceb60 6acde020 System.Windows.Forms.Button.OnMouseUp(System.Windows.Forms.MouseEventArgs)
...
05ecebc8 6acba8ec System.Windows.Forms.Control.WmMouseUp(System.Windows.Forms.Message ByRef, System.Windows.Forms.MouseButtons, Int32)
05ecec28 6b02381a System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message ByRef)
...


Gotcha! This was the point where I finally checked the docs of Exit(). It was called on Thread A, yet it exited the message loop of Thread B and thus eventually destroyed that thread. A little later, Thread A got stuck because it wanted to run something synchronously on Thread B.
The docs on MSDN proved to be right: Exit() “Informs all message pumps that they must terminate, and then closes all application windows after the messages have been processed”.
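False assumption #1 is easy to disprove outside our app. This hypothetical, Windows-only demo (CrossThreadQuitDemo is a made-up name, and the P/Invoke declarations are hand-written for it) runs a bare GetMessage loop on one thread and ends it by posting WM_QUIT from another thread. That is the same PostThreadMessage call ThreadContext.PostQuit() makes in the stack above.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

public static class CrossThreadQuitDemo
{
    const uint WM_QUIT = 0x0012;
    const uint PM_NOREMOVE = 0x0000;

    [StructLayout(LayoutKind.Sequential)]
    struct MSG
    {
        public IntPtr hwnd;
        public uint message;
        public IntPtr wParam;
        public IntPtr lParam;
        public uint time;
        public int ptX, ptY;
    }

    [DllImport("user32.dll")]
    static extern bool PeekMessageW(out MSG msg, IntPtr hWnd, uint min, uint max, uint remove);
    [DllImport("user32.dll")]
    static extern int GetMessageW(out MSG msg, IntPtr hWnd, uint min, uint max);
    [DllImport("user32.dll")]
    static extern bool TranslateMessage(ref MSG msg);
    [DllImport("user32.dll")]
    static extern IntPtr DispatchMessageW(ref MSG msg);
    [DllImport("user32.dll")]
    static extern bool PostThreadMessageW(uint idThread, uint msg, IntPtr wParam, IntPtr lParam);
    [DllImport("kernel32.dll")]
    static extern uint GetCurrentThreadId();

    // Returns true if the loop thread ended after WM_QUIT was posted to it
    // from the calling thread.
    public static bool Run()
    {
        uint loopThreadId = 0;
        uint lastMessage = 0;
        using (var queueReady = new ManualResetEvent(false))
        {
            var loop = new Thread(() =>
            {
                loopThreadId = GetCurrentThreadId();
                // PeekMessage creates this thread's message queue;
                // PostThreadMessage fails for threads that have none yet.
                PeekMessageW(out _, IntPtr.Zero, 0, 0, PM_NOREMOVE);
                queueReady.Set();

                MSG msg;
                // GetMessage returns 0 when it retrieves WM_QUIT, -1 on error.
                while (GetMessageW(out msg, IntPtr.Zero, 0, 0) > 0)
                {
                    TranslateMessage(ref msg);
                    DispatchMessageW(ref msg);
                }
                lastMessage = msg.message;
            });
            loop.SetApartmentState(ApartmentState.STA);
            loop.Start();
            queueReady.WaitOne();

            // Posted from this thread, not from the loop's own thread.
            PostThreadMessageW(loopThreadId, WM_QUIT, IntPtr.Zero, IntPtr.Zero);
            return loop.Join(5000) && lastMessage == WM_QUIT;
        }
    }
}
```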

Possible solutions

So how do we fix this? There are many approaches, but these two seemed the most appropriate:
1. Use ExitThread() instead of Exit(): this way only the message loop of the calling thread is affected.
2. Use the WPF way to enter the message loop, i.e. Dispatcher.Run(). Of course, you’ll then have to replace ExitThread() with something like Dispatcher.ExitAllFrames() (a static method that has to be called on Thread B itself) or Dispatcher.[Begin]InvokeShutdown() to exit the loop during shutdown.
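For the second option, here is a rough sketch of Thread B built on a WPF Dispatcher. It needs a reference to WindowsBase.dll, and SilverlightUiThread, Start and Stop are hypothetical names. The idea, presumably, is that Application.Exit() only posts WM_QUIT to loops started through WinForms' own Application.Run(), so a Dispatcher loop is left alone. Stop() uses BeginInvokeShutdown() so that disposal on Thread A does not block waiting for Thread B.

```csharp
using System.Threading;
using System.Windows.Threading; // WindowsBase.dll (WPF)

// Hypothetical helper: runs Thread B's message loop with Dispatcher.Run()
// instead of System.Windows.Forms.Application.Run().
public static class SilverlightUiThread
{
    // Starts the STA thread and returns its Dispatcher once it exists.
    public static Dispatcher Start()
    {
        Dispatcher dispatcher = null;
        using (var created = new ManualResetEvent(false))
        {
            var t = new Thread(() =>
            {
                dispatcher = Dispatcher.CurrentDispatcher; // creates it for this thread
                // Initialization (e.g. creating the WebBrowser) would go here.
                created.Set();
                Dispatcher.Run(); // pumps messages until the dispatcher shuts down
            });
            t.SetApartmentState(ApartmentState.STA);
            t.Start();
            created.WaitOne();
        }
        return dispatcher;
    }

    // Meant to be called from Thread A during disposal: queues the shutdown
    // on Thread B's dispatcher and returns without waiting for it.
    public static void Stop(Dispatcher dispatcher)
    {
        dispatcher.BeginInvokeShutdown(DispatcherPriority.Normal);
    }
}
```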

Conclusion

Creating a managed breakpoint for Application.Exit() (!bpmd System.Windows.Forms.dll System.Windows.Forms.Application.Exit) would have shown the problematic call stack much faster. A quick search on message loop shutdown would surely have pointed this out, too. Oh well. That’d have been pretty boring, don’t you agree? ;-)