About Archive Tags RSS Feed


Entries tagged webstats

This is the part where you tell me what matters is on the inside

24 November 2009 21:50

Some technology evolves very quickly, for example the following things are used by probably 80% of readers of this page:

  • A web browser.
  • A mail client.
  • A webserver.

But other technology is stuck in the past and only sees laclustre updates and innovations (not that inovation is mandatory or automatically a good thing)

Right now I'm looking at my webserver logs, trying to see who is viewing my sites, where they came from, and what their favourite pie is.

In the free world we have the choice of awstats, webalizer, and visitors (possibly more that I'm unaware of). In the commercial world everybody and their dog uses Google's analytics.

On the face of it a web analysis package is trivial:

  • Read in some access.log files.
  • Process to some internal database representaqtion.
  • Generate static/dynamic HTML output from your intermediate form, optionally including graphs, images, and pie-charts.

If you add javascript-fu to each of your pages you can track page titles, exit links, screen resolutions, and other data to record too. (Though I guess thats a seperate problem; trying to merge that data in with the data you have in your access log without making nasty links like "GET /trackin.gif?x_res=800;y_res=600". Anyway I guess with cookies you could correlate reasonably carefully.)

In conclusion why are my web statistics so dull, boring, and less educational than I desire?

I'd be tempted to experiment, but I suspect this is a problem which has subtle issues I'm overlooking and requires an artistic slant to make pretty.

(ObLink: asql is my semi-solution to logfile analysis.)

ObFilm: Bound



I work alone like you. We always work alone.

27 November 2009 21:50

A couple of days ago I was lamenting the state of webstats, although I was a little vague as to my purpose. Specifically I was wanting to find out about the screen resolutions and user-agents viewing a couple of sites.

To get screen resolutions you really need to inject javascript into your pages, which is icky. Still its a small price to pay, and chances are most people won't notice.

Of course there are drawbacks:

Javascript dependency.

If the visitors don't use/enable javascript you see nothing.

You cannot capture everything.

e.g. HTTP status code isn't available.

To solve this problem completely you therefore need to have access to both your apache logs and your javascript-captured information. Probably.

As a proof of concept I've injected the following javascript into most pages of three sites. This code:

  • Finds the screen resolution.
  • Finds the HTTP referer.
  • Finds the current page's title.
  • Then submits that to a server-side collection script, via a one-by-one pixel IMG

The script that receives the data writes out the data to a small per-domain SQLite database, which I can then use to generate prettyness. However I suck at being pretty, in most ways, so I've only got functional:

All of this is dynamic and most of the data is anchored to "today", as thats proof of concept enough. Were piwik not written in vile PHP I'd use that - I don't see anything similar out there which is Perl..

The big decision is now "Keep it dynamic" vs. "Output static pages". (vs. call off the experiment now I know that I'm safe to assume "big resolutions").

(Naming software is hard. Recent stuff I've done has had an skx prefix primarily for google-juice. e.g. Randomly I notice that if you search for my personal site on Google's UK engine I come top. Cool.)

ObSubject: The Bourne Identity