Collation Drivers for InterBase
Release 0.99 of 6 June 2002
ibCollate provides a set of supplemental collation (sorting) orders for InterBase v4, v5, and v6.5 / Firebird 1.0. ibCollate collation drivers are based on the pre-supplied InterBase collation drivers – altered to ignore case distinctions and accent distinctions between characters.
NOTE: InterBase version 6.0, 6.01 and Firebird Beta versions are NOT supported due to bugs in the InterBase codebase from 6.0 until corrected in 6.5 .
Changes since Beta 2 Release
1. InterBase 6.5 & Firebird 1.0 supported. Clunky way of supporting InterBase 6.0 and 6.01 removed.
2. Accent ignoring collation added.
3. Columns up to 130 characters in length can be indexed in NOCASE & NOACCENT collations.
4. “Codepoint order” collations, such as ISO8849_1, are now also supported in NOCASE variants.
2. Only Windows platforms are currently supported.
NOTE: $INTERBASE is the installation directory,
commonly
C:/Program Files/InterBase Corp/InterBase on Windows.
Test the installation by running the script intl2tst.sql. This script creates a temporary database and accesses each driver supported by ibCollate. Results can be compared to the baseline results in the test subdirectory.
You may need to edit the test script to change the userid/password from sysdba/masterkey – or to set the database location.
For each database that will access the collation drivers, run the script ibCollate.sql (using ISQL or WISQL). This script declares the collation drivers so they can be accessed by the database.
Delete the file gdsintl2.dll from the installed location (as above). You may need to stop your InterBase Server for the delete to succeed.
For any databases where you declared ibCollate collations, execute the following:
Delete * from rdb$collations
where rdb$collation_name like ‘BETA%NOCASE’
Or rdb$collation_name like ‘BETA%NOACCENT’;
Commit;
This release is compatible with Windows 95/98/NT/2000.
This release is compatible with InterBase 4.x, 5.x and 6.5, and Firebird 1.0
This release has been tested on Windows 2000 with InterBase 6.5 and Firebird 1.0
After installation, ibCollate drivers are used just like InterBase provided collations. The InterBase engine automatically locates and loads the gdsintl2.dll when it is necessary.
create table example (
Name character(30) character set iso8859_1 collate DE_DE_NOCASE
Address character(30) character set iso8849_1 );
These select statements automatically use DE_DE_NOCASE for the relational operation.
select * from example where Name = “McDonald”;
select * from example where Name > “Joe”;
This statement contains an explicit use of the collation.
select * from example where Address = “Main Street” collate DE_DE_NOCASE;
The ORDER BY clause also automatically follows case insensitive order:
select * from example order by name;
or
select * from example order by address collate DE_DE_NOCASE;
For all NOCASE drivers, if upper(f1) = upper(f2) then f1 = f2.
Table T
|
Field |
|
Á |
|
a |
|
á |
|
b |
|
A |
SELECT Field FROM T WHERE Field = ‘A’ COLLATE EN_UK_NOCASE;
Returns ‘A’ and ‘a’
All NOACCENT drivers are also NOCASE.
SELECT Field FROM T WHERE Field = ‘A’ COLLATE EN_UK_NOACCENT;
Returns ‘A’, ‘Á’, ‘á’ and ‘a’
Collation Drivers for the NeXTStep character set are not supported.
At this time, only Collation Drivers for Windows platforms are supported. (Additional platforms can be supported if someone can give me telnet access to a suitable development machine with InterBase & C installed).
Backups of databases using the Collation Drivers in ibCollate can only be restored to machines which have ibCollate installed.
Each ibCollate driver has a higher collation ID than the based-upon driver. This results in the InterBase engine preferring the ibCollate driver to the InterBase provided driver when there is an ambiguity in collation. For example:
create table test (
First char(30) collate DE_DE,
Last char(30) collate DE_DE_NOCASE );
The following query has an ambiguity in which collation to use:
select * from test where First = Last;
InterBase will chose DE_DE_NOCASE as it has a higher collation ID. To control the collation choice alter your SQL statement to specify the preferred collation order:
Force DE_DE_NOCASE:
… where First collate DE_DE_NOCASE = Last;
… where First collate DE_DE_NOCASE = Last collate DE_DE_NOCASE;
Force DE_DE:
… where First = Last collate DE_DE;
… where First collate DE_DE = Last collate DE_DE;
Force ISO8859_1:
… where First collate ISO8859_1 = Last collate ISO8859_1;
Please contact me at dave@BrookstoneSystems.com.
Future development work planned for ibCollate includes:
- Porting to Linux and Solaris
- Better key compression – allowing columns up to approx 150 characters to be indexed.
- Variants of collations that treat SPACE as a primary difference, instead of a 4-th level difference
- NOCASE variants of SJIS & other Asian character set collations
- Support of LATIN2 & additional LATIN* character sets
- Unicode collation orders
Priorities of these projects will be based on email feedback I receive!
NOTE:
During the Beta period, all collations are named BETA_<Base Name>_NOCASE or BETA_<Base Name>_NOACCENT. After Beta concludes, the “BETA_” portion of the name will be dropped. (This enables bugs to be fixed without me worrying about anyone depending on indices with these collations).
The following drivers are provided with ibCollate
|
DOS437_NOCASE |
|
|
PDOX_ASCII_NOCASE |
|
|
PDOX_INTL_NOCASE |
Case Insensitive version of PDOX_INTL |
|
PDOX_SWEDFIN_NOCASE |
Case Insensitive version of PDOX_SWEDFIN |
|
DB_DEU437_NOCASE |
Case Insensitive version of DB_DEU437 |
|
DB_ESP437_NOCASE |
Case Insensitive version of DB_ESP437 |
|
DB_FIN437_NOCASE |
Case Insensitive version of DB_FIN437 |
|
DB_FRA437_NOCASE |
Case Insensitive version of DB_FRA437 |
|
DB_ITA437_NOCASE |
Case Insensitive version of DB_ITA437 |
|
DB_NLD437_NOCASE |
Case Insensitive version of DB_NLD437 |
|
DB_SVE437_NOCASE |
Case Insensitive version of DB_SVE437 |
|
DB_UK437_NOCASE |
Case Insensitive version of DB_UK437 |
|
DB_US437_NOCASE |
Case Insensitive version of DB_US437 |
|
DOS850_NOCASE |
|
|
DB_FRC850_NOCASE |
Case Insensitive version of DB_FRC850 |
|
DB_DEU850_NOCASE |
Case Insensitive version of DB_DEU850 |
|
DB_ESP850_NOCASE |
Case Insensitive version of DB_ESP850 |
|
DB_FRA850_NOCASE |
Case Insensitive version of DB_FRA850 |
|
DB_ITA850_NOCASE |
Case Insensitive version of DB_ITA850 |
|
DB_NLD850_NOCASE |
Case Insensitive version of DB_NLD850 |
|
DB_PTB850_NOCASE |
Case Insensitive version of DB_PTB850 |
|
DB_SVE850_NOCASE |
Case Insensitive version of DB_SVE850 |
|
DB_UK850_NOCASE |
Case Insensitive version of DB_UK850 |
|
DB_US850_NOCASE |
Case Insensitive version of DB_US850 |
|
DOS865_NOCASE |
|
|
PDOX_NORDAN4_NOCASE |
Case Insensitive version of PDOX_NORDAN4 |
|
DB_DAN865_NOCASE |
Case Insensitive version of DB_DAN865 |
|
DB_NOR865_NOCASE |
Case Insensitive version of DB_NOR865 |
|
DOS860_NOCASE |
|
|
DB_PTG860_NOCASE |
Case Insensitive version of DB_PTG860 |
|
DOS863_NOCASE |
|
|
DB_FRC863_NOCASE |
Case Insensitive version of DB_FRC863 |
|
ISO8859_1_NOCASE |
|
|
DA_DA_NOCASE |
Case Insensitive version of DA_DA |
|
DU_NL_NOCASE |
Case Insensitive version of DU_NL |
|
FI_FI_NOCASE |
Case Insensitive version of FI_FI |
|
FR_FR_NOCASE |
Case Insensitive version of FR_FR |
|
FR_CA_NOCASE |
Case Insensitive version of FR_CA |
|
DE_DE_NOCASE |
Case Insensitive version of DE_DE |
|
IS_IS_NOCASE |
Case Insensitive version of IS_IS |
|
IT_IT_NOCASE |
Case Insensitive version of IT_IT |
|
NO_NO_NOCASE |
Case Insensitive version of NO_NO |
|
ES_ES_NOCASE |
Case Insensitive version of ES_ES |
|
SV_SV_NOCASE |
Case Insensitive version of SV_SV |
|
EN_UK_NOCASE |
Case Insensitive version of EN_UK |
|
EN_US_NOCASE |
Case Insensitive version of EN_US |
|
PT_PT_NOCASE |
Case Insensitive version of PT_PT |
|
DOS852_NOCASE |
|
|
DB_CSY_NOCASE |
Case Insensitive version of DB_CSY |
|
DB_PLK_NOCASE |
Case Insensitive version of DB_PLK |
|
DB_SLO_NOCASE |
Case Insensitive version of DB_SLO |
|
PDOX_CSY_NOCASE |
Case Insensitive version of PDOX_CSY |
|
PDOX_PLK_NOCASE |
Case Insensitive version of PDOX_PLK |
|
PDOX_HUN_NOCASE |
Case Insensitive version of PDOX_HUN |
|
PDOX_SLO_NOCASE |
Case Insensitive version of PDOX_SLO |
|
DOS857_NOCASE |
|
|
DB_TRK_NOCASE |
Case Insensitive version of DB_TRK |
|
DOS861_NOCASE |
|
|
PDOX_ISL_NOCASE |
Case Insensitive version of PDOX_ISL |
|
CYRL_NOCASE |
|
|
DB_RUS_NOCASE |
Case Insensitive version of DB_RUS |
|
PDOX_CYRL_NOCASE |
Case Insensitive version of PDOX_CYRL |
|
WIN1250_NOCASE |
|
|
PXW_CSY_NOCASE |
Case Insensitive version of PXW_CSY |
|
PXW_HUNDC_NOCASE |
Case Insensitive version of PXW_HUNDC |
|
PXW_PLK_NOCASE |
Case Insensitive version of PXW_PLK |
|
PXW_SLOV_NOCASE |
Case Insensitive version of PXW_SLOV |
|
WIN1251_NOCASE |
|
|
PXW_CYRL_NOCASE |
Case Insensitive version of PXW_CYRL |
|
WIN1252_NOCASE |
|
|
PXW_INTL_NOCASE |
Case Insensitive version of PXW_INTL |
|
PXW_INTL850_NOCASE |
Case Insensitive version of PXW_INTL850 |
|
PXW_NORDAN4_NOCASE |
Case Insensitive version of PXW_NORDAN4 |
|
PXW_SPAN_NOCASE |
Case Insensitive version of PXW_SPAN |
|
PXW_SWEDFIN_NOCASE |
Case Insensitive version of PXW_SWEDFIN |
|
WIN1253_NOCASE |
|
|
PXW_GREEK_NOCASE |
Case Insensitive version of PXW_GREEK |
|
WIN1254_NOCASE |
|
|
PXW_TURK_NOCASE |
Case Insensitive version of PXW_TURK |
Also NOACCENT versions of each driver above, except drivers where the base driver name is the same as a character set name. (I’m too lazy to update the table).