Entries tagged public bugfixing

Related tags: bugs, chronicle, debian, ftp, gnomad2, less, make.

Guess he wasn't too popular at the end, huh?

Tuesday, 23 December 2008

Previously I've posted a couple of running commentries describing how I've examined and gone about fixing a couple of bugs in Debian packages I use, enjoy, or have stumbled upon by accident.

Each of these commentries has resulted in a change or two to the affected software to make the bug vanish.

Fixing the bug is usually the hard part, but obviously it isn't the only thing that you need to do. While it is fun to have a locally fixed piece of software if you don't share it then the very next release will have the same bug again - and you'll spend your life doing nothing more than fixing the same bug again and again.

Generally the way that I report my fixes is by sending email to the Debian bug tracker - because I usually only try to fix bugs that I see reported already. Specificially I tend to only care about:

  • Bugs in packages that I maintain.
  • Bugs that affect me personally. I'm selfish.
  • Bugs in packages I use, even if they don't affect me.
  • Segfaults. Because segfaults and security issues often go hand in hand.

So my starting point is generally an existing bug report, such as the last bug I attempted to fix:

This bug was pretty simple to track down, and once I had added a couple of lines to the source to fix it creating the patch and reporting it was pretty simple.

The way I generally work is to download the source tree of the Debian package to my local system and work with it in-place until I think I've fixed the issue. I generally get the current sources to a package by running:

apt-get source "package"

Once I'm done fixing I'll want to create a patch. A patch is just a simple way of saying:

  • open file foo.c
  • Change "xxx" on line 12 to be "yyy".
  • Add "blah blah" after line 25

Assuming I made my changes in the local source in /tmp/gnomad2-2.9.1 I'll move that somewhere safe, and download another copy of the unmodified source, so I can create a diff and ensure that I've got the change recorded correctly:

skx@gold:/tmp$ mv gnomad2-2.9.1 gnomad2-2.9.1.new
skx@gold:/tmp$ apt-get source gnomad2

Now I have two trees:

  • /tmp/gnomad2-2.9.1.new - My modified, fixed, and updated directory.
  • /tmp/gnomad2-2.9.1 - The current source in Debian's unstable branch.

Creating the patch just means running:

 diff --recursive \
      --ignore-space-change \
      --context \
 gnomad2-2.9.1 gnomad2-2.9.1.new/

This will output the diff to the console. You can save it to a file too by rusing tee:

 diff --recursive \
      --ignore-space-change \
      --context \
 gnomad2-2.9.1 gnomad2-2.9.1.new/  | tee /tmp/patch.diff

If the new directory isn't clean your patch will include things like:

Only in gnomad2-2.9.1.new/src: player.o
Only in gnomad2-2.9.1.new/src: playlists.o
Only in gnomad2-2.9.1.new/src: prefs.o
Only in gnomad2-2.9.1.new/src: tagfile.o
Only in gnomad2-2.9.1.new/src: util.o
Only in gnomad2-2.9.1.new/src: wavfile.o

Just edit those lines out of the diff.

So the next step is to submit that to the bug report. Simply mail "bugnumber@bugs.debian.org" (509288@bugs.debian.org in our example) including your patch in the mail. If all goes well you'll receive an auto-reply after a while.

Finally to make things neat you can manipulate the bug tracker by email by sending a mail :

To: control@bugs.debian.org
Subject: updates

tags 12344 + patch
end
stop
bored now

Here is one I sent earlier.

This will ensure that the bug is reported as having a patch present in it.

Job done.

In an ideal world the next time the package is uploaded to Debian the bug will be fixed, marked as closed, and the world will be a little happier.

In a non-ideal world your patch will sit in the bug tracker for years with no further comment. If that happens there is not too much you can do, except send reminders by email, or distract yourself with a nice curry.

Happily Debian maintainers really do seem to appreciate bug fixes, and I'd say it is rare that my fixes have been ignored. It happens, but not often enough to make me give up.

ObFilm: Ghostbusters 2

| 7 comments.

 

What can you do? Sparta will need sons.

Tuesday, 23 December 2008

Since I've ranted a little recently lets do another public bugfix. The last few times people seemed to like them, and writing things down helps me keep track of where I am, what I'm doing, and how soon it will be "beer o'clock".

So I looked over the release critical bug list, looking for things that might be easy to fix.

One bug jumped out at me:

I installed the package:

skx@gold:~$ apt-get install gnomad2
..
The following NEW packages will be installed
  gnomad2 libmtp7 libnjb5 libtagc0
..

Once I copied an .mp3 file to my home directory, with the .ogg suffix I got a segfault on startup:

skx@gold:~$ cp /home/music/Audio/RedDwarf-back-to-reality.mp3 ~/foo.ogg
skx@gold:~$ gnomad2
..
LIBMTP_Get_First_Device: No Devices Attached
PDE device NULL.
TagLib: Ogg::File::packet() -- Could not find the requested packet.
TagLib: Vorbis::File::read() - Could not find the Vorbis comment header.
[segfault]
skx@gold:~$

So I downloaded the source ("apt-get source gnomad2"), and the dependencies for rebuilding ("apt-get build-dep gnomad2"). This allowed me to rebuild it locally:

skx@gold:/tmp/gnomad2-2.9.1$ ./configure --enable-debug && make
skx@gold:/tmp/gnomad2-2.9.1$ cd src/

And now it can be run under GDB.

skx@gold:/tmp/gnomad2-2.9.1/src$ gdb ./gnomad2
GNU gdb 6.8-debian
...
(gdb) run
Starting program: /tmp/gnomad2-2.9.1/src/gnomad2
...
LIBMTP_Get_First_Device: No Devices Attached
PDE device NULL.
[New Thread 0x41dcd950 (LWP 23593)]
TagLib: Ogg::File::packet() -- Could not find the requested packet.
TagLib: Vorbis::File::read() - Could not find the Vorbis comment header.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41dcd950 (LWP 23593)]
0x00007fc89e0340b0 in taglib_tag_artist () from /usr/lib/libtag_c.so.0
(gdb)

Interestingly the crash comes from a library, libtag_c.so.0. So either:

  • The bug is realy in the libtag module.
  • The gnomad2 package doesn't handle a failure case it should.

Time to guess which that is. Charitably we'll assume any library that segfaults will be quickly fixed, because it will have more users than a single program..

Moving on we can look at the gnomad2 source for mentions of taglib. Several mentions are found, but src/tagfile.c is clearly the correct place to look. That file contains the following code:

void
get_tag_for_file (metadata_t *meta)
{
  gchar *tmppath = filename_fromutf8(meta->path);
  TagLib_Tag *tag;
  const TagLib_AudioProperties *properties;
  TagLib_File *file = taglib_file_new(tmppath);

  if(file == NULL) {
    g_printf("could not open file %s", tmppath);
    g_free(tmppath);
    return;
  }
  g_free(tmppath);

  tag = taglib_file_tag(file);
  properties = taglib_file_audioproperties(file);

  gchar* artist = taglib_tag_artist(tag);
  ..

This looks like a great place to explore because opening a file to read the tags is most likely where the crash is going to be coming from.

Interestingly we see the code:

  • Calls taglib_file_new to get a handle of some kind relating to the file.
  • Tests for errors with that operation.
  • Calls taglib_file_tag to fetch tag information using the handle, and indirectly the file.
  • But then uses taglib_tag_artist to fetch the artist (?I guess?) without testing the handle previously obtained was valid.

Let us guess that the file opening is succeeding but that the tag structure fetched via taglib_file_tag is NULL - and this causes the crash.

A quick edit:

  g_free(tmppath);

  tag = taglib_file_tag(file);
  if(tag == NULL) {
    g_printf("tag was null");
    return;
  }

  properties = taglib_file_audioproperties(file);
  gchar* artist = taglib_tag_artist(tag);

Rebuild, and the segfault is gone. We have a winner. Now we just need to file a patch...

ObFilm: 300

| 7 comments.

 

Don't you just hate loose ends?

Friday, 21 March 2008

Today I spent a while fixing some more segfault bugs. I guess that this work qualifies as either fixing RC bugs, or potential security bugs.

Anyway I did an NMU of libpam-tmpdir a while back to fix all but one of the open bugs against it.

I provided a patch for #461625 yelp: segfault while loading info documentation, which fixes the symptoms of bad info-parsing, and avoids the segfault.

I also looked into the #466771 busybox cpio: double free or corruption during cpio extraction of hardlinks - but it turns out that was already fixed in Sid.

Finally I found a segfault bug open against ftp:

To reproduce this bug run:

skx@gold:~$ ftp ftp.debian.org
220 saens.debian.org FTP server (vsftpd)
Name (ftp.debian.org:skx): anonymous
331 Please specify the password
Password: foo@bar.com
ftp> cd debian/doc
250 Directory successfully changed.
ftp> get dedication-2.2.cn.txt dedication-2.2.de.txt dedication-2.2.es.txt ..
local: dedication-2.2.de.txt remote: dedication-2.2.cn.txt
Segmentation fault

You need to repeat the arguments about 50 times. But keep adding more and more copies of the three files to the line until you get the crash.

It isn't interesting as a security issue as it is client side only; but as a trivially reproducable issue it becomes fun to solve.

Lets build it with debugging information, and run it again. Here is what we see:

Core was generated by `./ftp/ftp ftp.debian.org'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b85ad77f1cf in fwrite () from /lib/libc.so.6
(gdb) up
#1  0x0000000000408c3e in command (fmt=0x40dd15 "TYPE %s") at ftp.c:366
366		fputs("\r\n", cout);
(gdb) up
#2  0x0000000000402c3e in changetype (newtype=3, show=)
    at cmds.c:348
348			comret = command("TYPE %s", p->t_mode);
(gdb) up
#3  0x000000000040a569 in recvrequest (cmd=,
    local=0x623d10 "dedication-2.2.de.txt",
    remote=0x6238d4 "dedication-2.2.cn.txt", lmode=0x40e310 "w",
    printnames=) at ftp.c:935
935			changetype(type, 0);

OK so things look trashed, and not in the middle of a copy/sprintf/similar. i.e. there is no obvious problem.

Lets take a step back from things. We know that the crash occurs when we send a long command line. Looking over the code we see the fucntion main.c:makeargv(). This attempts to split the input command line string into an array of tokens.

Interestingly we see this:

char **
makeargv(int *pargc, char **parg)
{
	static char *rargv[20];
	int rargc = 0;
	char **argp;

I wonder what happens if we set the 20 to 2048? Good guess. The crash no longer occurs. (Though I'm sure it would if you entered enough tokens...)

So we know that the crash is relating to the number of space-separated tokens upon the command line. If we increase the limit we'll be fine. But of course we want to fix it properly. There are two ways forward:

  • Abort handling a line if there are >15 "space" characters on the line.
  • Recode the makeargv function to work properly.

I did eventually submit a patch to the bug report which uses dynamic memory allocation, and should always work. Job done.

I mailed the maintainer of FTP and said unless I heard differently I'd NMU and cleanup the package in a week.

All being well this entry will be nicely truncated in the RSS feeds as support for the <cut> tag was the main new feature in my previous upload of chronicle - the blog compiler I use/wrote/maintain.

ObQuote: Razor Blade Smile

| No comments

 

Some people get by with a little understanding

Sunday, 9 March 2008

Since my last example of fixing a bug received some interesting feedback (although I notice no upload of the package in question ..) we'll have another go.

Looking over my ~/.bash_history file one command I use multiple times a day is make. Happily GNU make has at least one interesting bug open:

I verified this bug by saving the Makefile in the report and running make:

skx@gold:~$ make
make: file.c:84: lookup_file: Assertion `*name != '\0'' failed.
Aborted

(OK so this isn't a segfault; but an assertion failure is just as bad. Honest!)

So I downloaded the source to make, and rebuilt it. This left me with a binary with debugging symbols. The execution was much more interesting this time round:

skx@gold:~$ ./make
*** glibc detected ***
  /home/skx/./make: double free or corruption (fasttop): 0x00000000006327b0 ***
======= Backtrace: =========
/lib/libc.so.6[0x2b273dbdd8a8]
/lib/libc.so.6(cfree+0x76)[0x2b273dbdf9b6]
/home/skx/./make[0x4120a5]
/home/skx/./make[0x4068ee]
/home/skx/./make[0x406fb2]
...
[snip mucho texto]

And once I'd allowed core-file creation ("ulimit -c 9999999") I found I had a core file to help debugging.

Running the unstripped version under gdb showed this:

(gdb) up
#5  0x00000000004120a5 in multi_glob (chain=0x1c, size=40) at read.c:3106
3106			    free (memname);

So it seems likely that this free is causing the abort. There are two simple things to do here:

  • Comment out the free() call - to see if the crash goes away (!)
  • Understand the code to see why this pointer might be causing us pain.

To get started I did the first of these: Commenting out the free() call did indeed fix the problem, or at least mask it (at the cost of a memory leak):

skx@gold:~$ ./make
make: *** No rule to make target `Erreur_Lexicale.o', needed by `compilateur'.  Stop.

So, now we need to go back to read.c and see why that free was causing problems.

The function containing the free() is "multi_glob". It has scary pointer magic in it, and it took me a lot of tracing to determine the source of the bug. In short we need to change this:

free (memname);

To this:

free (memname);
memname = 0;

Otherwise the memory is freed multiple times, (once each time through the loop in that function. See the source for details).

Patch mailed.

| 5 comments.

 

Listen to me when I'm telling you

Thursday, 14 February 2008

So today I'm a little bit lazy and I've got the day off work. As my previous plan suggested I wanted to spend at least some of the day tackling semi-random bugs. Earlier I picked a victim: less.

less rocks, and I use it daily. I even wrote an introduction to less once upon a time.

So lets take a look at two bugs from the long-neglected pile. These two issues are basically the same:

They seem like simple ones to fix, with the same root cause. Here's an example if you want to play along at home:

 cp /dev/null testing
 gzip testing
 zless testing.gz

What do you see? I see this:

"testing.gz" may be a binary file.  See it anyway?

When I select "y" I see the raw binary of the compressed file.

So, we can reproduce it. Now to see why it happens. /bin/zless comes from the gzip package and is a simple shell script:

#!/bin/sh
# snipped a lot of text
LESSOPEN="|gzip -cdfq -- %s"; export LESSOPEN
exec less "$@"

So what happens if we run that?

$ LESSOPEN="|gzip -cdfq -- ~/testing.gz" /usr/bin/less ~/testing.gz
"/home/skx/testing.gz" may be a binary file.  See it anyway?

i.e. it fails in the same way. Interestingly this works just fine:

gzip -cdfq -- ~/testing.gz | less

So we've learnt something interesting and useful. We've learnt that when LESSOPEN is involved we get the bug. Which suggests we should "apt-get source less" and then "rgrep LESSOPEN ~/less-*/".

Doing so reveals the following function in filename.c:

	public char *
open_altfile(filename, pf, pfd)
	char *filename;
	int *pf;
	void **pfd;
{

/* code to test whether $LESSOPEN is set, and attempt to run the
   command if it is */

		/*
		 * Read one char to see if the pipe will produce any data.
		 * If it does, push the char back on the pipe.
		 */
		f = fileno(fd);
		SET_BINARY(f);

		if (read(f, &c, 1) != 1)
		{
			/*
			 * Pipe is empty.  This means there is no alt file.
			 */
			pclose(fd);
			return (NULL);
		}
		ch_ungetchar(c);
		*pfd = (void *) fd;
		*pf = f;
		return (save("-"));

That might not be digestible, but basically less runs the command specified in $LESSOPEN. If it may read a single character of output from that command it replaces the file it was going to read with the output of the command instead!

(i.e. Here less failed to read a single character, because our gzipped file was zero-bytes long! So instead it reverted to showing the binary gzipped file.)

So we have a solution: If we want this to work we merely remove the "read a single character test". I can't think of circumstance in which that would do the wrong thing, so I've submitted a patch to do that.

Bug(s) fixed.

Incidentally if you like these kind of "debuggin by example" posts, or hate them, do let me know. So I'll know whether to take notes next time or not..

| 22 comments.

 

Recent Posts

Recent Tags