ibCollate

Collation Drivers for InterBase

 

Release 0.99 of 6 June 2002

 

ibCollate provides a set of supplemental collation (sorting) orders for InterBase v4, v5, and v6.5 / Firebird 1.0.  ibCollate collation drivers are based on the pre-supplied InterBase collation drivers – altered to ignore case distinctions and accent distinctions between characters.

 

 

NOTE:       InterBase version 6.0, 6.01 and Firebird Beta versions are NOT supported due to bugs in the InterBase codebase from 6.0 until corrected in 6.5 .

 

Changes since Beta 2 Release

 

1.      InterBase 6.5 & Firebird 1.0 supported.  Clunky way of supporting InterBase 6.0 and 6.01 removed.

2.      Accent ignoring collation added.

3.      Columns up to 130 characters in length can be indexed in NOCASE & NOACCENT collations.

4.      “Codepoint order” collations, such as ISO8849_1, are now also supported in NOCASE variants.

 

Changes since Beta 1 Release

 

  1. Resolved compatibility problems with InterBase v6.0 release.  This necessitated creating separate binaries for InterBase 6.0 vs. earlier InterBase versions.  This will be resolved in a future InterBase release. [This was resolved in InterBase 6.5 and Firebird 1.0]

 

  1. Corrected problems with the NOCASE versions of all PDW (Paradox Windows) and DB (dBase) compatible drivers.

 

  1. Corrected problems with the NOCASE variants of multi-character sequences.  For instance, the sequences “ae”, “Ae”, “aE” and “AE” were sorting as case-distinct.  In NOCASE sorting these sequences are now considered equal.

 

  1. Corrected problems with the ˙ (lowercase-y-umlaut) character in all Latin1 character set drivers.  lowercase-y-umlaut uppercases to uppercase-Y for most locales.  It now sorts equal to uppercase-Y in NOCASE drivers.

 

  1. Enhanced test suite.

 

Known Problems

 

  1. ISO88591_NOACCENT – and other “binary collation” NOACCENT drivers are not supported at this time.

 

2.   Only Windows platforms are currently supported.

 

Installation:

 

NOTE:  $INTERBASE is the installation directory, commonly
C:/Program Files/InterBase Corp/InterBase on Windows.

 

Windows 95/98/NT/2000 (InterBase versions 4.x, 5.x):

 

  1. Unzip ibCollate.zip.

 

  1. If you are re-installing, you will need to stop your InterBase server.

 

  1. Copy gdsintl2.dll to the same directory as the InterBase provided gdsintl.dll ($INTERBASE/bin for InterBase 4.x & 5.x) ($INTERBASE/intl for InterBase 6.5 and Firebird 1.0)

 

  1. If you stopped your server, restart the InterBase server.

 

  1. Test your installation with the intl2tst.sql script described below.

 

  1. For each database hosted on your server, use the script ibCollate.sql to declare the collation orders.

 

Testing your installation:

 

Test the installation by running the script intl2tst.sql.  This script creates a temporary database and accesses each driver supported by ibCollate.  Results can be compared to the baseline results in the test subdirectory.

 

You may need to edit the test script to change the userid/password from sysdba/masterkey – or to set the database location.

Declaring collation orders to the database:

 

For each database that will access the collation drivers, run the script ibCollate.sql (using ISQL or WISQL).  This script declares the collation drivers so they can be accessed by the database.

 

Uninstallation:

 

Windows (all InterBase / Firebird versions)

 

Delete the file gdsintl2.dll from the installed location (as above).  You may need to stop your InterBase Server for the delete to succeed.

 

For any databases where you declared ibCollate collations, execute the following:

Delete * from rdb$collations

where rdb$collation_name like ‘BETA%NOCASE’

     Or rdb$collation_name like ‘BETA%NOACCENT’;

Commit;

 

Compatibility:

 

This release is compatible with Windows 95/98/NT/2000.

This release is compatible with InterBase 4.x, 5.x and 6.5, and Firebird 1.0

 

This release has been tested on Windows 2000 with InterBase 6.5 and Firebird 1.0

 

Usage:

 

After installation, ibCollate drivers are used just like InterBase provided collations.  The InterBase engine automatically locates and loads the gdsintl2.dll when it is necessary.

 

Example:

 

create table example (

      Name     character(30) character set iso8859_1 collate DE_DE_NOCASE

      Address character(30) character set iso8849_1 );

 

These select statements automatically use DE_DE_NOCASE for the relational operation.

 

select * from example where Name = “McDonald”;

select * from example where Name > “Joe”;

 

This statement contains an explicit use of the collation.

 

select * from example where Address = “Main Street” collate DE_DE_NOCASE;

 

The ORDER BY clause also automatically follows case insensitive order:

 

select * from example order by name;

 

or

 

select * from example order by address collate DE_DE_NOCASE;

 

NOCASE drivers

 

For all NOCASE drivers, if upper(f1) = upper(f2) then f1 = f2.

 

Table T

Field

Á

a

á

b

A

 

SELECT Field FROM T WHERE Field = ‘A’ COLLATE EN_UK_NOCASE;

 

Returns ‘A’ and ‘a’

 

NOACCENT drivers

 

All NOACCENT drivers are also NOCASE.

 

SELECT Field FROM T WHERE Field = ‘A’ COLLATE EN_UK_NOACCENT;

 

Returns ‘A’, ‘Á’, ‘á’ and ‘a’

Notes:

 

Collation Drivers for the NeXTStep character set are not supported.

 

At this time, only Collation Drivers for Windows platforms are supported.  (Additional platforms can be supported if someone can give me telnet access to a suitable development machine with InterBase & C installed).

 

Backups of databases using the Collation Drivers in ibCollate can only be restored to machines which have ibCollate installed.

 

Each ibCollate driver has a higher collation ID than the based-upon driver.  This results in the InterBase engine preferring the ibCollate driver to the InterBase provided driver when there is an ambiguity in collation.  For example:

 

create table test (

    First  char(30) collate DE_DE,

    Last  char(30) collate DE_DE_NOCASE );

 

The following query has an ambiguity in which collation to use:

 

select * from test where First = Last;

 

InterBase will chose DE_DE_NOCASE as it has a higher collation ID.  To control the collation choice alter your SQL statement to specify the preferred collation order:

 

Force DE_DE_NOCASE:

… where First collate DE_DE_NOCASE = Last;

… where First collate DE_DE_NOCASE = Last collate DE_DE_NOCASE;

 

Force DE_DE:

… where First = Last collate DE_DE;

… where First collate DE_DE = Last collate DE_DE;

 

Force ISO8859_1:

… where First collate ISO8859_1 = Last collate ISO8859_1;

 

Support:

 

Please contact me at dave@BrookstoneSystems.com.

 

Future Projects

 

Future development work planned for ibCollate includes:

 

-         Porting to Linux and Solaris

-         Better key compression – allowing columns up to approx 150 characters to be indexed.

-         Variants of collations that treat SPACE as a primary difference, instead of a 4-th level difference

-         NOCASE variants of SJIS & other Asian character set collations

-         Support of LATIN2 & additional LATIN* character sets

-         Unicode collation orders

 

Priorities of these projects will be based on email feedback I receive!

Provided Drivers:

 

NOTE: 

During the Beta period, all collations are named BETA_<Base Name>_NOCASE or BETA_<Base Name>_NOACCENT.   After Beta concludes, the “BETA_” portion of the name will be dropped.  (This enables bugs to be fixed without me worrying about anyone depending on indices with these collations).

 

The following drivers are provided with ibCollate

 

DOS437_NOCASE

 

PDOX_ASCII_NOCASE

 

PDOX_INTL_NOCASE

Case Insensitive version of PDOX_INTL

PDOX_SWEDFIN_NOCASE

Case Insensitive version of PDOX_SWEDFIN

DB_DEU437_NOCASE

Case Insensitive version of DB_DEU437

DB_ESP437_NOCASE

Case Insensitive version of DB_ESP437

DB_FIN437_NOCASE

Case Insensitive version of DB_FIN437

DB_FRA437_NOCASE

Case Insensitive version of DB_FRA437

DB_ITA437_NOCASE

Case Insensitive version of DB_ITA437

DB_NLD437_NOCASE

Case Insensitive version of DB_NLD437

DB_SVE437_NOCASE

Case Insensitive version of DB_SVE437

DB_UK437_NOCASE

Case Insensitive version of DB_UK437

DB_US437_NOCASE

Case Insensitive version of DB_US437

DOS850_NOCASE

 

DB_FRC850_NOCASE

Case Insensitive version of DB_FRC850

DB_DEU850_NOCASE

Case Insensitive version of DB_DEU850

DB_ESP850_NOCASE

Case Insensitive version of DB_ESP850

DB_FRA850_NOCASE

Case Insensitive version of DB_FRA850

DB_ITA850_NOCASE

Case Insensitive version of DB_ITA850

DB_NLD850_NOCASE

Case Insensitive version of DB_NLD850

DB_PTB850_NOCASE

Case Insensitive version of DB_PTB850

DB_SVE850_NOCASE

Case Insensitive version of DB_SVE850

DB_UK850_NOCASE

Case Insensitive version of DB_UK850

DB_US850_NOCASE

Case Insensitive version of DB_US850

DOS865_NOCASE

 

PDOX_NORDAN4_NOCASE

Case Insensitive version of PDOX_NORDAN4

DB_DAN865_NOCASE

Case Insensitive version of DB_DAN865

DB_NOR865_NOCASE

Case Insensitive version of DB_NOR865

DOS860_NOCASE

 

DB_PTG860_NOCASE

Case Insensitive version of DB_PTG860

DOS863_NOCASE

 

DB_FRC863_NOCASE

Case Insensitive version of DB_FRC863

ISO8859_1_NOCASE

 

DA_DA_NOCASE

Case Insensitive version of DA_DA

DU_NL_NOCASE

Case Insensitive version of DU_NL

FI_FI_NOCASE

Case Insensitive version of FI_FI

FR_FR_NOCASE

Case Insensitive version of FR_FR

FR_CA_NOCASE

Case Insensitive version of FR_CA

DE_DE_NOCASE

Case Insensitive version of DE_DE

IS_IS_NOCASE

Case Insensitive version of IS_IS

IT_IT_NOCASE

Case Insensitive version of IT_IT

NO_NO_NOCASE

Case Insensitive version of NO_NO

ES_ES_NOCASE

Case Insensitive version of ES_ES

SV_SV_NOCASE

Case Insensitive version of SV_SV

EN_UK_NOCASE

Case Insensitive version of EN_UK

EN_US_NOCASE

Case Insensitive version of EN_US

PT_PT_NOCASE

Case Insensitive version of PT_PT

DOS852_NOCASE

 

DB_CSY_NOCASE

Case Insensitive version of DB_CSY

DB_PLK_NOCASE

Case Insensitive version of DB_PLK

DB_SLO_NOCASE

Case Insensitive version of DB_SLO

PDOX_CSY_NOCASE

Case Insensitive version of PDOX_CSY

PDOX_PLK_NOCASE

Case Insensitive version of PDOX_PLK

PDOX_HUN_NOCASE

Case Insensitive version of PDOX_HUN

PDOX_SLO_NOCASE

Case Insensitive version of PDOX_SLO

DOS857_NOCASE

 

DB_TRK_NOCASE

Case Insensitive version of DB_TRK

DOS861_NOCASE

 

PDOX_ISL_NOCASE

Case Insensitive version of PDOX_ISL

CYRL_NOCASE

 

DB_RUS_NOCASE

Case Insensitive version of DB_RUS

PDOX_CYRL_NOCASE

Case Insensitive version of PDOX_CYRL

WIN1250_NOCASE

 

PXW_CSY_NOCASE

Case Insensitive version of PXW_CSY

PXW_HUNDC_NOCASE

Case Insensitive version of PXW_HUNDC

PXW_PLK_NOCASE

Case Insensitive version of PXW_PLK

PXW_SLOV_NOCASE

Case Insensitive version of PXW_SLOV

WIN1251_NOCASE

 

PXW_CYRL_NOCASE

Case Insensitive version of PXW_CYRL

WIN1252_NOCASE

 

PXW_INTL_NOCASE

Case Insensitive version of PXW_INTL

PXW_INTL850_NOCASE

Case Insensitive version of PXW_INTL850

PXW_NORDAN4_NOCASE

Case Insensitive version of PXW_NORDAN4

PXW_SPAN_NOCASE

Case Insensitive version of PXW_SPAN

PXW_SWEDFIN_NOCASE

Case Insensitive version of PXW_SWEDFIN

WIN1253_NOCASE

 

PXW_GREEK_NOCASE

Case Insensitive version of PXW_GREEK

WIN1254_NOCASE

 

PXW_TURK_NOCASE

Case Insensitive version of PXW_TURK

 

Also NOACCENT versions of each driver above, except drivers where the base driver name is the same as a character set name. (I’m too lazy to update the table).