Character encoding on Transactd
To handle character codes correctly, you need to specify both the character code defined in the database and the one used in the client program.
If you use Japanese in Transactd, the MySQL/MariaDB variable character-set-server must be one of utf8, utf8mb4 or cp932.
Kinds of character code in a database used with Transactd
There are three kinds of character code in a database used with Transactd.
- Character code of field values
- Character code of field names and table names
- Character code of meta-information other than the above two (e.g. database names)
(1) Character code of field values
The character code of a field value can be confirmed with the SHOW CREATE TABLE SQL statement.
It can also be confirmed with fielddef::charsetIndex() in Transactd.
It is determined by the design of the table.
(2) Character code of field names and table names
This is the character code of the schema table for Transactd.
It can be confirmed with tabledef::schemaCodePage.
If the schema table is generated automatically, utf8 is used.
If you created the schema table yourself, it is whichever character code was specified in tabledef::schemaCodePage.
(3) Character code of meta-information other than the above 2
The character code of meta-information other than (2) is utf8.
The default character code used by the client program
If the character code used in the client program is not known, the Transactd API cannot encode results correctly.
To specify the character code, use nsdatabase::setExecCodePage(unsigned int codepage).
Its default value is CP_ACP on Windows and CP_UTF8 on other platforms.
In the UNICODE version of the library on Windows, the character code for the client program is fixed to utf16.
Any value specified with nsdatabase::setExecCodePage is ignored.
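The rules above can be restated as a small standalone sketch. It does not call Transactd: `Platform`, `ClientEncoding`, `resolveClientEncoding`, `kCpAcp` and `kCpUtf8` are hypothetical names introduced only for illustration. The two constants copy the values of the Windows constants CP_ACP (0) and CP_UTF8 (65001) so that the sketch compiles on any platform.

```cpp
#include <cassert>

// Values of the Windows constants CP_ACP and CP_UTF8, copied under
// hypothetical names so the sketch builds outside Windows too.
constexpr unsigned int kCpAcp = 0;
constexpr unsigned int kCpUtf8 = 65001;

enum class Platform { Windows, Other };

struct ClientEncoding {
    bool utf16;             // true: client strings are utf16
    unsigned int codePage;  // meaningful only when utf16 is false
};

// requested == nullptr models "setExecCodePage was never called".
ClientEncoding resolveClientEncoding(Platform platform, bool unicodeBuild,
                                     const unsigned int* requested) {
    // UNICODE build on Windows: fixed to utf16, any requested value is ignored.
    if (platform == Platform::Windows && unicodeBuild) return {true, 0};
    // An explicitly requested code page wins over the platform default.
    if (requested != nullptr) return {false, *requested};
    // Defaults described in the text: CP_ACP on Windows, CP_UTF8 elsewhere.
    return {false, platform == Platform::Windows ? kCpAcp : kCpUtf8};
}
```

In this sketch a caller on Linux that never sets a code page ends up with 65001, while a UNICODE build on Windows reports utf16 regardless of what was requested.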
Using the API
(1) get/set field values
Field values are converted automatically from nsdatabase::execCodePage() to fielddef::charsetIndex(), and vice versa.
For example, this applies to the data parameter of table::setFV(short index, const char* data).
If the two character codes are the same, no conversion occurs.
In the UNICODE version of the library on Windows, use utf16.
Field values are converted automatically from utf16 to fielddef::charsetIndex(), and vice versa.
(2) get/set field names and table names
Table names and field names are searched for or collated against the values stored in the schema table for Transactd.
For example, this applies to the name parameter of table::setFV(const char* name, const char* data).
If the schema table was generated automatically, specify strings encoded in utf8.
If you created the schema table yourself, you can specify any code page in tabledef::schemaCodePage().
In that case, use strings encoded in the code page specified with tabledef::schemaCodePage() for table names and field names.
In the UNICODE version of the library on Windows, use utf16.
Table names and field names are converted automatically from utf16 to tabledef::schemaCodePage(), and vice versa.
(3) get/set meta-information other than the above two
Use utf8 for meta-information other than field names and table names, such as database names.
For example, this applies to the uri parameter of database::open().
This information is passed to MySQL/MariaDB internal functions through the Transactd plugin.
These functions use utf8.
In the UNICODE version of the library on Windows, use utf16.
The meta-information is converted automatically from utf16 to utf8, and vice versa.
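The three cases above can be summarized as a decision table. The following standalone sketch only restates the text and does not call Transactd: `StringKind`, `Settings`, `Plan`, `targetFor` and `planFor` are hypothetical names, and the string members merely stand in for the values of nsdatabase::execCodePage(), fielddef::charsetIndex() and tabledef::schemaCodePage(). It assumes, as the text reads, that the non-UNICODE library converts field values only, and expects names and meta-information to arrive already encoded.

```cpp
#include <cassert>
#include <string>

enum class StringKind { FieldValue, SchemaName, MetaInfo };

struct Settings {
    bool unicodeBuild;           // UNICODE version of the library on Windows
    std::string execCodePage;    // stands in for nsdatabase::execCodePage()
    std::string fieldCharset;    // stands in for fielddef::charsetIndex()
    std::string schemaCodePage;  // stands in for tabledef::schemaCodePage()
};

struct Plan {
    std::string clientSide;    // encoding the caller's string must use
    std::string databaseSide;  // encoding the value is stored or matched in
    bool converts;             // whether the client library converts
};

// The database-side encoding depends only on the kind of string.
std::string targetFor(StringKind kind, const Settings& s) {
    switch (kind) {
        case StringKind::FieldValue: return s.fieldCharset;
        case StringKind::SchemaName: return s.schemaCodePage;
        case StringKind::MetaInfo:   return "utf8";
    }
    return "utf8";  // not reached; keeps compilers quiet
}

Plan planFor(StringKind kind, const Settings& s) {
    const std::string target = targetFor(kind, s);
    // UNICODE build: the caller always passes utf16 and the library converts.
    if (s.unicodeBuild) return {"utf16", target, target != "utf16"};
    // Field values are converted unless both character codes are the same.
    if (kind == StringKind::FieldValue)
        return {s.execCodePage, target, s.execCodePage != target};
    // Names and meta-information must already be in the target encoding.
    return {target, target, false};
}
```

For example, with a utf8 client and a cp932 field, `planFor(StringKind::FieldValue, ...)` reports a utf8 to cp932 conversion, while schema names are expected in the schema code page without conversion.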
Easy settings for each environment
(a) C++ / PHP / Ruby on Linux, Mac OS X
(1) Write the program in utf8.
nsdatabase::execCodePage() is utf8 by default. You do not have to specify it.
(2) Even if you create the table schema yourself, use utf8.
tabledef::schemaCodePage is utf8 by default. You do not have to specify it.
(3) Use utf8 for meta-information other than (2).
In short, use utf8 for everything: field values, field names, table names and database names.
However, if field values are encoded in a multibyte character code other than cp932, utf16 and utf8, see "Details of character code converting" and "Details of DDL operation" for more details.
(b) C++ / COM (JScript, C#) on Windows
Use the UNICODE version of the library with C++. The library for COM is the UNICODE version by default.
Write the program in utf16 or cp932; either is fine.
In C++, set the compiler option to use the UNICODE character set.
The UNICODE version of the library converts character codes automatically in all of (1), (2) and (3). This is the easiest way.
(c) PHP / Ruby on Windows
(1) Write the program in utf8.
In the library for PHP/Ruby, nsdatabase::execCodePage() is set to utf8 in the constructor of the database object. You do not have to specify it.
(2) Even if you create the table schema yourself, use utf8.
tabledef::schemaCodePage is utf8 by default. You do not have to specify it.
(3) Use utf8 for meta-information other than (2).
In short, use utf8 for everything: field values, field names, table names and database names.
(d) Writing the program in cp932
* Please USE OTHER SETTINGS unless you have a special reason, such as an existing database that uses cp932.
This setting is not recommended because it is complex.
(1) Set the MySQL/MariaDB variable character-set-server to cp932.
Call nsdatabase::setExecCodePage(CHARSET_CP932) in the program.
If you use PHP/Ruby, write the program in cp932.
(2) If you use multibyte characters in field names or table names, generate the schema table with the Transactd API and set tabledef::schemaCodePage to cp932.
Alternatively, open the auto-generated schema table, set tabledef::schemaCodePage to cp932, and set the table names and field names to values encoded in cp932.
(3) Specify database names as strings encoded in utf8.
Details of character code converting
The Transactd client converts the character code of strings, but the Transactd plugin does not convert any strings. All conversion happens on the client side.
The OS functions MultiByteToWideChar and WideCharToMultiByte are used to convert strings on Windows.
iconv is used to convert strings on Linux.
Transactd caches a converter for each combination of character codes so that strings are converted faster.
The combinations of mbcs, utf16 and utf8 are cached by default.
SHIFT-JIS is defined as the default mbcs character code by the MBC_CHARSETNAME macro in mbcswchrLinux.h.
If you want to change the default mbcs, modify MBC_CHARSETNAME and recompile.
Also, if you want to use multiple mbcs at the same time, you need to modify mbcswchrLinux.h and mbcswchrLinux.cpp.
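To illustrate the idea of caching converters per character code combination on Linux, here is a minimal standalone sketch built on the standard iconv functions. It is not Transactd's implementation: the class name `ConverterCache` and its members are hypothetical, and the actual code in mbcswchrLinux.cpp may be organized quite differently. The sketch opens one iconv descriptor per (from, to) pair on first use and reuses it afterwards.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper, not part of Transactd.
class ConverterCache {
public:
    ConverterCache() = default;
    ConverterCache(const ConverterCache&) = delete;
    ConverterCache& operator=(const ConverterCache&) = delete;
    ~ConverterCache() {
        for (auto& entry : cache_) iconv_close(entry.second);
    }

    std::string convert(const std::string& from, const std::string& to,
                        const std::string& input) {
        if (from == to) return input;  // same character code: no conversion
        iconv_t cd = descriptor(from, to);
        iconv(cd, nullptr, nullptr, nullptr, nullptr);  // reset cached state

        std::string output;
        char buffer[256];
        char* in = const_cast<char*>(input.data());
        std::size_t inLeft = input.size();
        // Stateful encodings would also need a final flush call; omitted here.
        while (inLeft > 0) {
            char* out = buffer;
            std::size_t outLeft = sizeof(buffer);
            std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
            int err = errno;
            output.append(buffer, sizeof(buffer) - outLeft);
            // E2BIG only means the buffer filled up; loop again for the rest.
            if (rc == static_cast<std::size_t>(-1) && err != E2BIG)
                throw std::runtime_error("iconv failed");
        }
        return output;
    }

    std::size_t size() const { return cache_.size(); }

private:
    iconv_t descriptor(const std::string& from, const std::string& to) {
        const auto key = std::make_pair(from, to);
        const auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // reuse cached descriptor
        iconv_t cd = iconv_open(to.c_str(), from.c_str());
        if (cd == (iconv_t)(-1))
            throw std::invalid_argument("unsupported conversion");
        cache_[key] = cd;
        return cd;
    }

    std::map<std::pair<std::string, std::string>, iconv_t> cache_;
};
```

A caller would create one `ConverterCache` and call, for example, `cache.convert("UTF-8", "UTF-16LE", text)`. Which encoding names are accepted depends on the iconv implementation installed on the system.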
Details of DDL operation
Some Transactd DDL operations use SQL-layer operations.
utf8 is used throughout MySQL/MariaDB internal processing, but strings passed to the SQL layer must be encoded in the character code specified in the MySQL/MariaDB variable character-set-server.
Usually, programmers do not have to be aware of the character code used to encode strings passed to the SQL layer.
However, on Linux, encoding will not work correctly if character-set-server is not one of the cached character codes: mbcs (SHIFT-JIS by default), utf16 or utf8.
Codepage list
The code pages that you can specify for execCodePage or schemaCodePage are listed in characterset.cpp.