About Archive Tags RSS Feed

 

Failing to debug a crash with epiphany-browser and webkit

20 August 2012 21:50

Today I'm in bed, because I have le sniffles, and a painful headache. I'm using epiphany to write this post, via VNC to my main desktop, but I'm hating it as I've somehow evolved into a state where the following crashes my browser:

  • Open browser.
  • Navigate to gmail.com
  • Login.
  • Wait for page to complete loading, showing my empty inbox.
  • Click "signout".

Running under GDB shows nothing terribly helpful:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff51a0a46 in ?? () from /usr/lib/libwebkit-1.0.so.2
#2  0x00007ffff3d8f79d in ?? () from /usr/lib/libsoup-2.4.so.1
#3  0x00007ffff2a4947e in g_closure_invoke () from /usr/lib/libgobject-2.0.so.0
#4  0x00007ffff2a5f7f4 in ?? () from /usr/lib/libgobject-2.0.so.0
...

To get more detail I ran "apt-get install epiphany-browser-dbg" - this narrows down the crash, but not in a useful way:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff51a0a46 in finishedCallback (session=<value optimized out>, msg=0x7fffd801d9c0, data=) at ../WebCore/platform/network/soup/ResourceHandleSoup.cpp:329
#2  0x00007ffff3d8f79d in ?? () from /usr/lib/libsoup-2.4.so.1
#3  0x00007ffff2a4947e in g_closure_invoke () from /usr/lib/libgobject-2.0.so.0
#4  0x00007ffff2a5f7f4 in ?? () from /usr/lib/libgobject-2.0.so.0
..

So this crash happens in ResourceHandleSoup.cpp. Slowly I realized that this came from the webkit package.

We see that the last call by name is to the function in line ResourceHandleSoup.cpp:329, that puts us at the last line of this function:

// Called at the end of the message, with all the necessary about the last informations.
// Doesn't get called for redirects.
static void finishedCallback(SoupSession *session, SoupMessage* msg, gpointer data)
{
    RefPtr<ResourceHandle> handle = adoptRef(static_cast<ResourceHandle*>(data));

    // TODO: maybe we should run this code even if there's no client?
    if (!handle)
        return;

    ResourceHandleInternal* d = handle->getInternal();

    ResourceHandleClient* client = handle->client();
    if (!client)
       return;

..
..
    client->didFinishLoading(handle.get());
}

So we see there is some validation that happens, then a call to "didFinishLoading" and somewhere shortly after that it dies. didFinishLoading looks trivial:

void WebCoreSynchronousLoader::didFinishLoading(ResourceHandle*)
{
      g_main_loop_quit(m_mainLoop);
      m_finished = true;
}

So my mental-debugging is stymied. I blame my headache. It looks like there is no obvious NULL-pointer deference, if we pretend client cannot be NULL. So the next step is to get the source, the build-dependencies and then build a debug version of webkit. I ran "apt-get source webkit", then editted the file ./debian/rules to add --enable-debug and rebuilt it:

skx@precious:~/Debian/epiphany/webkit-1.2.7$ DEB_BUILD_OPTIONS="nostrip noopt" debuild -sa

*time passes*

The build fails:

  CXX    WebCore/svg/libwebkit_1_0_la-SVGUseElement.lo
../WebCore/svg/SVGUseElement.cpp: In member function ‘virtual void WebCore::SVGUseElement::insertedIntoDocument()’:
../WebCore/svg/SVGUseElement.cpp:125: error: ‘class WebCore::Document’ has no member named ‘isXHTMLDocument’
../WebCore/svg/SVGUseElement.cpp:125: error: ‘class WebCore::Document’ has no member named ‘parser’
make[2]: *** [WebCore/svg/libwebkit_1_0_la-SVGUseElement.lo] Error 1

Ugh. So I guess we disable that "--enable-debug", and hope that "nostrip noopt" helps instead.

*Thorin sits down and starts singing about gold*

Finally the debugging build has finished and I've woken up again. Let us do this thing. I'd looked over the webkit tracker and the crashing bugs list in the meantime, but nothing jumped out at me as being similar to my issue.

Anyway without the --enable-debug flag present in the call to ../configure the Debian webkit packages were built, eventually, and installed:

skx@precious:~/Debian/epiphany$ mv libwebkit-dev_1.2.7-0+squeeze2_amd64.deb x.deb
skx@precious:~/Debian/epiphany$ sudo dpkg --install libweb*deb
[sudo] password for skx:
(Reading database ... 173767 files and directories currently installed.)
Preparing to replace libwebkit-1.0-2 1.2.7-0+squeeze2 (using libwebkit-1.0-2_1.2.7-0+squeeze2_amd64.deb) ...
Unpacking replacement libwebkit-1.0-2 ...
Preparing to replace libwebkit-1.0-2-dbg 1.2.7-0+squeeze2 (using libwebkit-1.0-2-dbg_1.2.7-0+squeeze2_amd64.deb) ...
Unpacking replacement libwebkit-1.0-2-dbg ...
Preparing to replace libwebkit-1.0-common 1.2.7-0+squeeze2 (using libwebkit-1.0-common_1.2.7-0+squeeze2_all.deb) ...
Unpacking replacement libwebkit-1.0-common ...
Setting up libwebkit-1.0-common (1.2.7-0+squeeze2) ...
Setting up libwebkit-1.0-2 (1.2.7-0+squeeze2) ...
Setting up libwebkit-1.0-2-dbg (1.2.7-0+squeeze2) ...
skx@precious:~/Debian/epiphany$

Good news everybody: The crash still happens!

Firing up GDB should hopefully reveal more details - but sadly it didn't.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff51a0a46 in finishedCallback (session=, msg=0xb03420, data=) at ../WebCore/platform/network/soup/ResourceHandleSoup.cpp:329
..
(gdb) up
#1  0x00007ffff51a0a46 in finishedCallback (session=, msg=0xb03420, data=) at ../WebCore/platform/network/soup/ResourceHandleSoup.cpp:329
329     client->didFinishLoading(handle.get());
(gdb) p client
$1 = <value optimized out>

At this point my head hurts too much, and I'm stuck. No quote today, I cannot be bothered.

| 9 comments

 

Comments on this entry

icon Steven C at 13:21 on 20 August 2012
http://pyro.eu.org/

If you haven't already, you could try to reproduce it using /usr/lib/webkit-*/libexec/GtkLauncher just to rule out anything epiphany-browser itself might be doing.

Does "bt full" give any other clues? It seems as if "client" or "handle" somehow becomes zero even after the code was supposed to check for that. But optimisation has made things incomprehensible.

What might the optimizer not see? Perhaps a buffer being overrun and thus wiping out client or handle?

The new webkit for Wheezy ought to give you lots more fun of this nature.

icon Steve Kemp at 13:30 on 20 August 2012
http://steve.org.uk/

Thanks for the pointer to that script, it still fails, which I guess means the bug is in webkit itself.

There are no further, useful, details in the full backtrace though:

skx@precious:~$ gdb /usr/lib/webkit-1.0-2/libexec/GtkLauncher
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
..
[New Thread 0x7fffdbb82700 (LWP 20566)]
[Thread 0x7fffdbb82700 (LWP 20566) exited]
..
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x00007ffff73a5a46 in finishedCallback (session=&lt;value optimized out&gt;, msg=0x804ad0, data=&lt;value optimized out&gt;) at ../WebCore/platform/network/soup/ResourceHandleSoup.cpp:329
        d = 0x7fffd7d4d400
        client = &lt;value optimized out&gt;
#2  0x00007ffff381d79d in ?? () from /usr/lib/libsoup-2.4.so.1
No symbol table info available.

icon Anonymous at 14:20 on 20 August 2012

Your blog software has done strange things to the backtrace: it turned 'value optimized out' into 'value optimized="optimized" out="out"'

icon Anonymous at 14:22 on 20 August 2012

How about installing debug info for libsoup and libgobject?

icon Steven C at 14:28 on 20 August 2012
http://pyro.eu.org/

'd' has a value, so the handle->getInternal dereference must have been okay. I think that somehow the 'client' variable is zero, either because the optimiser assumed it could never happen, and/or it got changed within that function, sometime after it was tested.

I wonder if there's a way to stop that variable getting optimised out, without turning off optimisations altogether? Then you might be able to see it's value on entry into the function, or step through and watch for it being changed?

icon Sami Liedes at 15:23 on 20 August 2012

At this point I usually try to run the crashing program under Valgrind. Just do 'valgrind $binary' and read the output. It's likely to give a good hint about what goes wrong.

icon Psychon at 15:34 on 20 August 2012

Try printf("%p\n", client->didFinishLoading);

My guess from the backtrace would be that this tries to call a function at address 0x0 and doesn't really dereference a NULL pointer, hence this guess.

icon Steve Kemp at 15:47 on 20 August 2012
http://steve.org.uk/

I tried wrapping the function to disable optimizations - but that resulted in no changes.

So the backtrace is still the unhelpful:

#1  0x00007ffff73a5178 in finishedCallback (session=[value optimized out], msg=0x8244f0, data=[value optimized out]) at ../WebCore/platform/network/soup/ResourceHandleSoup.cpp:332

Trying to do things mostly is painful :

(gdb) call printf("%p\n", client-&gt;didFinishLoading)
0x7fffd903fbe8
$1 = 15
(gdb) call client-&gt;didFinishLoading()
Cannot access memory at address 0x0

So it seems that "client->didFinishLoading" points to an address which is non-NULL, but calling that function results in the NULL pointer exception. The code again is this:

(>void WebCoreSynchronousLoader::didFinishLoading(ResourceHandle*)
  {
      g_main_loop_quit(m_mainLoop);
      m_finished = true;
  }

But attempting to printf() on "m_mainLoop" or "m_finished" shows no such symbol. Ditto for client->m_finished,etc. I'm out of depth.

I'll try valgrind, but I expect it will take forever and result in a million error-reports.

icon Steven Chamberlain at 18:47 on 20 August 2012
http://pyro.eu.org/

Maybe it's doing that from the wrong thread?

Might be worth looking into these:
https://bugs.webkit.org/attachment.cgi?id=129994&action=review
https://bugs.webkit.org/show_bug.cgi?id=68238