January 6, 2008

Postgresql lc_collate aggravation

Posted in Linux, Open Source tagged , , , , , at 11:14 pm by ilanco

Implementing tearch2 on an existing postgres database : not as simple as it seems.

When trying to install tsearch2 into an existing database I stumbled upon a problem. It seems that tsearch2 takes its default locale from the lc_collate setting of the postgres cluster.

In my setup this was set to “en_US.ISO-8859-1”.

You can check your collation by sending this query to postgres :
show lc_collate;

This database was upgraded to 8.2 from an older version and the collation was set a long time ago when the database was created. Because of this setting tsearch2 refused to accept any characters outside this encoding, which was very annoying since we have our data utf-8 encoded.

The only solution was to create a new cluster with a UTF-8 collation and import all the data.

I hope postgres will support more dynamic collation settings in the future.