
Entries tagged source-searching

Seek & destroy

20 June 2007 21:50

Debconf7 continues very well, with a nice trip to a local sauna this evening.

Unfortunately our party had to split into two, with the ladies going to one section and the gentlemen to another. Still, it was fun, and I'm glad I went. It was probably just as well that Megan didn't get to see me being terrified of the cold water!

Apart from that, things went well today. I saw half a talk on Xen, enough to see my name in lights, then half a talk on security, where I again managed to see my name on the big screen.

If people at Debconf7 would like to learn more about security, I've volunteered myself on the "skills exchange" page of the wiki to demonstrate the process which Moritz described as "manual and complicated" (i.e. releasing a DSA and updating the webpage). I think this will happen sometime on Friday.

I also need to track down AJ and talk about debootstrap work.

One more thing, before I go to sleep. During the security talk Moritz did mention the idea of being able to grep through the source code of the entire archive. This is a topic which has been raised before.

Right now I'm keen to make this possible, so overnight I'm syncing the latest sid archive and I think I have a plan to make it work.

  • Sync the sources of the given distribution.
  • Once that has happened, recursively unpack any archive we can understand (.tar.bz2, .tar.gz, etc.).
  • Either:
    • write a simple script with "grep",
    • or import the unpacked root into a text indexing system.
  • It would also be useful to recognize common files via SHA + MD5 checksums.
  • Make some simple GUI.
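
As a rough illustration of the unpack, grep and checksum steps, here is a minimal Python sketch. It is only one way the plan might look, not a finished tool: the function names, the ".unpacked" directory suffix and the list of archive suffixes are all assumptions made for the example, and only tar-based archives are handled.

```python
import hashlib
import os
import re
import tarfile

# Suffixes treated as tar archives; an assumed, incomplete list.
TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2")


def unpack_recursively(root):
    """Unpack every tar archive under root, including archives nested in others."""
    # Use the safer "data" extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    pending = [root]
    while pending:
        for dirpath, _dirs, files in os.walk(pending.pop()):
            for name in files:
                path = os.path.join(dirpath, name)
                dest = path + ".unpacked"  # hypothetical naming convention
                if not name.endswith(TAR_SUFFIXES) or os.path.isdir(dest):
                    continue
                try:
                    with tarfile.open(path) as archive:
                        os.makedirs(dest)
                        archive.extractall(dest, **kwargs)
                except (tarfile.TarError, OSError):
                    continue  # unreadable or rejected archive: skip it
                pending.append(dest)  # look for nested archives later


def grep_tree(root, pattern):
    """Yield (path, line_number, line) for each line under root matching pattern."""
    regex = re.compile(pattern)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # e.g. a dangling symlink


def group_by_checksum(root):
    """Map each SHA-1 digest to the list of files under root with that content."""
    groups = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha1()
            try:
                with open(path, "rb") as handle:
                    for chunk in iter(lambda: handle.read(65536), b""):
                        digest.update(chunk)
            except OSError:
                continue
            groups.setdefault(digest.hexdigest(), []).append(path)
    return groups
```

Calling unpack_recursively("/srv/sources") followed by grep_tree("/srv/sources", r"strcpy\(") would walk the whole tree on every query (the path here is made up). That is fine for an experiment, but it is exactly the cost a real text indexer would be expected to avoid.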

I think that the sync will take a while for me on my home connection, and I also believe that the unpacking will be grossly CPU-intensive. Still it should be a worthwhile job even if I can't get it done, because it will tell us the kind of machine which is required to actually do it.

I'd like to use a project machine because I'm not entirely sure I have the necessary space, but that should become apparent fairly quickly. (The archive I can handle; the unpacked trees might be a little bit much for me. Still, I have 100GB free and I guess that is a good starting point.)
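
One way to get a feel for the space needed before unpacking anything is to sum the member sizes recorded in each tarball. The sketch below assumes tar-based archives only, and the function name is made up for the example. It does not account for archives nested inside other archives or for filesystem overhead, so the result is a lower bound at best.

```python
import tarfile


def unpacked_size(path):
    """Return the total bytes of regular files in a tar archive, without extracting."""
    total = 0
    with tarfile.open(path) as archive:
        for member in archive:
            if member.isreg():  # ignore directories, links and so on
                total += member.size
    return total
```

Reading a compressed tarball this way still has to decompress the stream, so it costs CPU time, but it writes nothing to disk.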

If anybody has any tips for full-text indexers that would be appropriate for fast queries against a large directory tree of source code, I'd be interested in hearing about them.
