International Language Support (INTL)
Adriano dos Santos Fernandes
Table of Contents
New INTL Interface for Non-ASCII Character Sets
Metadata Text Conversion
Supported Character Sets
This chapter describes the new international language sup...
New INTL Interface for Non-ASCII Character Sets
A. dos Santos Fernandes
Architecture
Enhancements
INTL Plug-ins
New Character Sets/Collations
Developments in V.2.1
ICU Character Sets
The UNICODE Collations
Specific Attributes for Collations
Collation Changes in V.2.1
Originally described by N. Samofatov, Firebird 2's new in...
Architecture
Firebird allows character sets and collations to be decla...
At attachment time you normally specify the character set...
Two special character sets, NONE and OCTETS, can be used ...
With other character sets, conversion is performed as CHA...
With NONE/OCTETS the bytes are just copied: NONE/OCTETS->...
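The two paths described above can be illustrated outside Firebird. The following is a minimal sketch, not Firebird's implementation: it assumes Python's `cp850` and `cp1252` codecs as stand-ins for DOS850 and WIN1252, and the function names `transliterate` and `copy_octets` are hypothetical.

```python
def transliterate(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between character sets, using Unicode as the pivot."""
    text = data.decode(source)   # CHARSET1 -> UNICODE
    return text.encode(target)   # UNICODE -> CHARSET2


def copy_octets(data: bytes) -> bytes:
    """NONE/OCTETS behaviour: the bytes are copied, with no conversion."""
    return bytes(data)


# "\u00e1" is the letter á; cp850/cp1252 are assumed stand-ins for
# DOS850/WIN1252.
dos850_bytes = "\u00e1".encode("cp850")
win1252_bytes = transliterate(dos850_bytes, "cp850", "cp1252")
print(dos850_bytes, win1252_bytes)   # the same letter, different byte values
print(copy_octets(dos850_bytes))     # unchanged bytes
```

The point of the pivot is that each character set only needs a mapping to and from Unicode, rather than a mapping to every other character set.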
Enhancements
Enhancements that the new system brings include:
Well-formedness checks
Some character sets (especially multi-byte) do not accept...
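As an illustration of what a well-formedness check means for a multi-byte character set, here is a small sketch using UTF-8 and Python's standard decoder. It only shows the idea; the helper name `is_well_formed` is hypothetical and the check Firebird performs internally may differ.

```python
def is_well_formed(data: bytes, charset: str = "utf-8") -> bool:
    """Return True if the bytes form a valid string in the given charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


print(is_well_formed(b"\xc3\xa9"))  # complete two-byte UTF-8 sequence for é
print(is_well_formed(b"\xc3"))      # lead byte with no continuation byte
```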
Uppercasing
In Firebird 1.5.x, only the ASCII-equivalent characters a...
For example,
isql -q -ch dos850
SQL> create database 'test.fdb';
SQL> create table t (c char(1) character set dos850);
SQL> insert into t values ('a');
SQL> insert into t values ('e');
SQL> insert into t values ('á');
SQL> insert into t values ('é');
SQL>
SQL> select c, upper(c) from t;
C UPPER
====== ======
a A
e E
á á
é é
In Firebird 2 the result is:
C UPPER
====== ======
a A
e E
á Á
é É
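The difference between the two result sets can be mimicked with a short sketch. The function `upper_ascii_only` is a hypothetical helper that models the ASCII-only behaviour shown for 1.5.x, while Python's built-in `str.upper()` stands in for character-set-aware uppercasing; neither is Firebird code.

```python
def upper_ascii_only(text: str) -> str:
    """Uppercase only a-z and leave every other character untouched."""
    return "".join(ch.upper() if "a" <= ch <= "z" else ch for ch in text)


# "\u00e1" is á and "\u00e9" is é.
for ch in ["a", "e", "\u00e1", "\u00e9"]:
    print(ch, upper_ascii_only(ch), ch.upper())
```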
Maximum String Length
In v.1.5.x the engine does not verify the logical length ...
This has been retained for compatibility with legacy chara...
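The distinction between logical length (characters) and physical length (bytes) is easy to see with a multi-byte encoding. The sketch below uses UTF-8 and a hypothetical helper `fits`; it illustrates the concept only and does not reproduce the engine's checks.

```python
def fits(text: str, declared_chars: int) -> bool:
    """Check a string against a declared size counted in characters."""
    return len(text) <= declared_chars


sample = "\u00e1\u00e9"               # the two characters á and é
print(len(sample))                    # logical length in characters
print(len(sample.encode("utf-8")))    # physical length in bytes
print(fits(sample, 2))
```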
sqlsubtype and Attachment Character Set
When the character set of a CHAR or VARCHAR column is any...
Enhancements for BLOBs
Several character set-related enhancements have been adde...
COLLATE clauses for BLOBs
A DML COLLATE clause is now allowed with BLOBs.
Example
select blob_column from table
where blob_column collate unicode = 'foo';
Full equality comparisons between BLOBs
Comparison can be performed on the entire content of a te...
Character set conversion for BLOBs
Conversion between character sets is now possible when as...
INTL Plug-ins
Character sets and collations are installed using a manif...
The manifest file should be put in the $rootdir/intl with...
The file /intl/fbintl.conf is an example of a manifest fi...
<intl_module fbintl>
filename $(this)/fbintl
</intl_module>
<charset ISO8859_1>
intl_module fbintl
collation ISO8859_1
collation DA_DA
collation DE_DE
collation EN_UK
collation EN_US
collation ES_ES
collation PT_BR
collation PT_PT
</charset>
<charset WIN1250>
intl_module fbintl
collation WIN1250
collation PXW_CSY
collation PXW_HUN
collation PXW_HUNDC
</charset>
Note
The symbol $(this) is used to indicate the same directory...
New Character Sets/Collations
Two character sets introduced in Firebird 2 will be of pa...
UTF8 character set
The UNICODE_FSS character set has a number of problems: i...
Now, UTF8 is a new character set, without the inherent pr...
UNICODE collations (for UTF8)
UCS_BASIC works identically to UTF8 with no collation spe...
Sort order sample:
isql -q -ch dos850
SQL> create database 'test.fdb';
SQL> create table t (c char(1) character set utf8);
SQL> insert into t values ('a');
SQL> insert into t values ('A');
SQL> insert into t values ('á');
SQL> insert into t values ('b');
SQL> insert into t values ('B');
SQL> select * from t order by c collate ucs_basic;
C
======
A
B
a
b
á
SQL> select * from t order by c collate unicode;
C
======
a
A
á
b
B
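The UCS_BASIC result above is plain code-point order, which can be reproduced with a short sketch: Python compares strings by code point, so a default sort gives the same sequence. The UNICODE result depends on collation tables (such as those provided by ICU) and is not reproduced here.

```python
# "\u00e1" is á; the values match the rows inserted in the sample above.
values = ["a", "A", "\u00e1", "b", "B"]

by_code_point = sorted(values)  # default str comparison is by code point
print(by_code_point)
print([hex(ord(ch)) for ch in by_code_point])
```

Uppercase ASCII letters have lower code points than lowercase ones, and á lies outside the ASCII range, which is why it sorts last.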
Developments in V.2.1
The 2.1 release sees further capabilities implemented for
using ICU charsets through fbintl
UNICODE collation (charset_UNICODE) being available for a...
using collation attributes
CREATE/DROP COLLATION statements
SHOW COLLATION and collation extraction in ISQL
Verifying that text blobs are well-formed
Transliterating text blobs automatically
ICU Character Sets
All non-wide and ASCII-based character sets present in IC...
If the character set you need is not included, you can re...
Registering an ICU Character Set Module
To use an alternative character set module, you need to r...
in the server's language configuration file, intl/fbintl....
in each database that is going to use it
Registering a Character Set on the Server
Using a text editor, register the module in intl/fbintl.c...
<charset NAME>
intl_module fbintl
collation NAME [REAL-NAME]
</charset>
Registering a Character Set in a Database
To register the module in a database, run the procedure s...
Using the Stored Procedure
A Sample
Here is the sample declaration in fbintl.conf:
<charset GB>
intl_module fbintl
collation GB GB18030
</charset>
The stored procedure takes two arguments: a string that i...
execute procedure sp_register_character_set ('GB', 4);
The CREATE COLLATION Statement
Syntax for CREATE COLLATION
CREATE COLLATION <name>
FOR <charset>
[ FROM <base> | FROM EXTERNAL ('<name>') ]
[ NO PAD | PAD SPACE ]
[ CASE SENSITIVE | CASE INSENSITIVE ]
[ ACCENT SENSITIVE | ACCENT INSENSITIVE ]
[ '<specific-attributes>' ]
Note
Specific attributes should be separated by semicolon and ...
Examples
/* 1 */
CREATE COLLATION UNICODE_ENUS_CI
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=en_US';
/* 2 */
CREATE COLLATION NEW_COLLATION
FOR WIN1252
PAD SPACE;
/* NEW_COLLATION should be declared in .conf file
in the $root/intl directory */
The UNICODE Collations
The UNICODE collations (case sensitive and case insensiti...
Naming Conventions
The naming convention you should use is charset_collation...
create collation win1252_unicode
for win1252;
create collation win1252_unicode_ci
for win1252
from win1252_unicode
case insensitive;
Note
The character set name should be as in fbintl.conf (i.e. ...
Specific Attributes for Collations
Note
Some attributes may not work with some collations, even t...
DISABLE-COMPRESSIONS
Disable compressions (aka contractions) changing the orde...
Valid for collations of narrow character sets.
Format: DISABLE-COMPRESSIONS={0 | 1}
Example
DISABLE-COMPRESSIONS=1
DISABLE-EXPANSIONS
Disable expansions changing the order of a character to s...
Valid for collations of narrow character sets.
Format: DISABLE-EXPANSIONS={0 | 1}
Example
DISABLE-EXPANSIONS=1
ICU-VERSION
Specify which version of the ICU library will be used. Valid v...
Valid for UNICODE and UNICODE_CI.
Format: ICU-VERSION={default | major.minor}
Example
ICU-VERSION=3.0
LOCALE
Specify the collation locale.
Valid for UNICODE and UNICODE_CI. Requires complete versi...
Format: LOCALE=xx_XX
Example
LOCALE=en_US
MULTI-LEVEL
Use more than one level for ordering purposes.
Valid for collations of narrow character sets.
Format: MULTI-LEVEL={0 | 1}
Example
MULTI-LEVEL=1
SPECIALS-FIRST
Order special characters (spaces, symbols, etc.) before al...
Valid for collations of narrow character sets.
Format: SPECIALS-FIRST={0 | 1}
Example
SPECIALS-FIRST=1
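When several attributes are combined in one semicolon-separated string, a small helper can make the string easier to inspect before it is used in a CREATE COLLATION statement. The function `parse_attributes` is hypothetical and not part of Firebird, and the combination of attributes shown is chosen only for illustration.

```python
def parse_attributes(spec: str) -> dict:
    """Split 'NAME=value;NAME=value' into a dictionary of attributes."""
    attributes = {}
    for item in spec.split(";"):
        if not item:
            continue  # ignore an empty segment, e.g. after a trailing ';'
        name, _, value = item.partition("=")
        attributes[name] = value
    return attributes


print(parse_attributes("DISABLE-COMPRESSIONS=1;SPECIALS-FIRST=1"))
```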
Collation Changes in V.2.1
Spanish
ES_ES (as well as the new ES_ES_CI_AI) collation automati...
Note
The attributes are stored at database creation time, so t...
The ES_ES_CI_AI collation was standardised to current usage.
UTF-8
Case-insensitive collation for UTF-8. See feature request...
Metadata Text Conversion
Repairing Your Metadata Text
Firebird versions 2.0.x had two problems related to chara...
When creating or altering objects, text associated with m...
The types of text affected were PSQL sources, description...
Note
Even in the current version (2.1.x) the problem can still...
In reads from text BLOBs, transliteration from the BLOB c...
Repairing Your Metadata Text
If your metadata text was created with non-ASCII encoding...
Important
The procedure involves multiple passes through the databa...
The database should already have been converted to ODS11....
Before doing anything, make a copy of the database.
In the examples that follow, the string $fbroot$ represen...
Create the procedures in the database
isql /path/to/your/database.fdb
SQL> input '$fbroot$/misc/upgrade/metadata/metadata_c...
Check your database
isql /path/to/your/database.fdb
SQL> select * from rdb$check_metadata;
The rdb$check_metadata procedure will return all objects ...
If no exception is raised, your metadata is OK and you ca...
Otherwise, the first bad object is the last one listed be...
Fixing the metadata
To fix the metadata, you need to know in what character s...
isql /path/to/your/database.fdb
SQL> input '$fbroot$/misc/upgrade/metadata/metadata_...
SQL> select * from rdb$fix_metadata('WIN1252'); -- r...
SQL> commit;
The rdb$fix_metadata procedure will return the same data ...
Important
Run rdb$fix_metadata only once!
After this, you can remove the upgrade procedures.
Remove the upgrade procedures
isql /path/to/your/database.fdb
SQL> input '$fbroot$/misc/upgrade/metadata/metadata_c...
Supported Character Sets
See Appendix B at the end of these notes, for a full list...