About Archive Tags RSS Feed


Entries tagged planet-search

Let the bells ring out for Christmas

2 December 2007 21:50

In the next week I intend to drop the search engine which archives content posted to Planet Debian.

It appears to have very little use, except for myself, and I'm significantly better at bookmarking posts of interest these days.

If you'd like to run your own copy the code is available and pretty trivial to reimplement regardless. There are only two parts:

  • Poll and archive content from the planet RSS feed - taking care of duplicates.
  • Scanning for /robots.txt upon the source-host, to avoid archiving content which should be "private".

Once you've done that you'll have a database populated with blog entries, and you just need to write a little search script.

ObRandom: In the time it has been running it has archived 15,464 posts!

| No comments


You can't hide the knives

6 December 2007 21:50

After recently intending to drop the Planet Debian search and recieving complaints that it was/is still useful it looks like there is a good solution.

The code will be made live and official upon the planet debian in the near future.

The DSA team promptly installed the SQLite3 package for me, and I've ported the code to work with it. Once Apache us updated to allow me to execute CGI scripts it'll be moved over, and I'll export the current data to the new database.

In other news I'm going to file an ITP bug against asql as I find myself using it more and more...

| No comments


I still got the blues for you

11 January 2008 21:50

Been a week since I posted. I've not done much, though I did complete most of the migration of my planet-searching code to gluck.debian.org.

This is now logging to a local SQLite database, and available online.

I've updated the blog software so that I can restrict comments to posts made within the past N days - which has helped with spam.

My other comment-spam system is the use of the crm114 mail filter. I have a separate database now for comments (distinct from that I use for email), and after a training on previous comments all is good.

Other than being a little busy over the past week life is good. Especially when I got to tell a recruitment agent that I didn't consider London to be within "Edinburgh & Surrounding Region". Muppets.

The biggest downside of the week was "discovering" a security problem in Java, which had been reported in 2003 and is still unfixed. Grr. (CVE-2003-1156 for those playing along at home).

Heres the code:

#  Grep for potentially unsafe /tmp usage in shared libraries

find /lib -name '*.so' -type f -print > /tmp/$$
find /usr/lib -name '*.so' -type f -print >> /tmp/$$
for i in $(cat /tmp/$$ ); do
    out=$(strings $i | grep '/tmp')
    if [ ! -z "$out" ]; then
        echo "$i"
        echo "$out"
rm /tmp/$$



I am the Earl of Preston

6 September 2009 21:50

Paul Wise recently reported that the Planet Debian search index hadn't updated since the 7th of June. The search function is something I added to the setup, and although I don't use it very often when I do find it enormously useful.

Anyway normal service should now be restored, but the search index will be missing the content of anything posted for the two months the indexer wasn't running.

Recently I tried to use this search functionality to find a post that I knew I'd written upon my blog a year or so ago, which I'd spectacularly failed to find via grep and my tag list.

Ultimately this lead to my adding a search interface to my own blog entries using the namazu2 package. If I get some free time tomorrow I'll write a brief guide to setting this up for the Debian Administration website - something that has been a little neglected recently.

ObFilm: Bill & Ted's Excellent Adventure