All being well, the Debian Administration website now fully supports UTF-8.
This change was a long time coming, considering the amount of time the site has been live.
Most of the changes have been present for a while:
- Correctly setting the database to store UTF-8 internally, rather than latin1.
- Correctly setting the charset of the generated pages.
The only missing part was ensuring that the text input by visitors/users was correctly decoded and treated as UTF-8. This was handled by updating the site's use of the Perl CGI module to call charset explicitly.
Since the code behind the site masks the database, memcached, and CGI handles behind singletons, the change itself was pretty trivial.
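As a rough illustration of why the singleton approach keeps this change small, here is a minimal sketch of a CGI singleton that sets the charset in one place. The package name Example::Singleton::CGI and the instance method are hypothetical, not the site's real code; the only CGI.pm calls used are new and charset.

```perl
package Example::Singleton::CGI;    # hypothetical name, not the site's real module

use strict;
use warnings;
use CGI;

my $_cgi;    # the single shared CGI handle

# Return the shared CGI object, creating and configuring it on first use.
sub instance {
    my ($class) = @_;

    if ( !defined $_cgi ) {
        $_cgi = CGI->new();

        # Declare UTF-8 explicitly; CGI.pm otherwise defaults to ISO-8859-1.
        $_cgi->charset('utf-8');
    }

    return $_cgi;
}

1;
```

Because every part of the code would obtain its handle via Example::Singleton::CGI->instance(), the charset only has to be set once, here.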
I made more changes this evening to tie it all together, and to ensure that my database connection is always forced to use UTF-8, but I think that was less important.
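Forcing the connection to UTF-8 might look something like the sketch below. It assumes the database is MySQL accessed through DBI and DBD::mysql, which the post does not actually state; the helper names utf8_connect_attrs and connect_utf8 are made up for illustration.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper: the connection attributes we always want.
# mysql_enable_utf8 is a DBD::mysql attribute (assumes a MySQL backend).
sub utf8_connect_attrs {
    return { RaiseError => 1, mysql_enable_utf8 => 1 };
}

# Hypothetical helper: open a connection that always talks UTF-8.
sub connect_utf8 {
    my ( $dsn, $user, $pass ) = @_;

    my $dbh = DBI->connect( $dsn, $user, $pass, utf8_connect_attrs() );

    # Also set the connection character set explicitly on the server side.
    $dbh->do('SET NAMES utf8');

    return $dbh;
}
```

Hidden behind a database singleton, a helper like connect_utf8 would be the only place the encoding needs to be specified.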
I hope this is vaguely useful the next time I have to fight with character sets & encodings. It is just all so nasty. Failing that, these pages are vaguely useful:
ObFilm: Run Lola Run