Character encoding in Transactd

To handle character codes correctly, you need to specify both the character code defined in the database and the one used by the client program.

If you use Japanese in Transactd, the MySQL/MariaDB variable character-set-server must be one of utf8, utf8mb4, or cp932.

Kinds of character code in a database used with Transactd

There are three kinds of character code in a database used with Transactd.

  1. Character code of field value
  2. Character code of field names and table names
  3. Character code of meta-information other than the above 2 (e.g. database names)

(1) Character code of field value

The character code of a field value can be checked with the SHOW CREATE TABLE SQL statement. In Transactd, it can also be checked with fielddef::charsetIndex().

This is determined by the design of the table.

(2) Character code of field names and table names

This is the character code of the schema table for Transactd. It can be checked with tabledef::schemaCodePage. If the schema table is generated automatically, utf8 is used.

If you created the schema table yourself, it is whichever character code was specified in tabledef::schemaCodePage.

(3) Character code of meta-information other than the above 2

The character code of meta-information other than (2) is utf8.

The default character code which is used by the client program

If the character code used in the client program is not known, the Transactd API cannot encode results correctly. To specify the character code, use nsdatabase::setExecCodePage(unsigned int codepage).

The default value is CP_ACP on Windows and CP_UTF8 on other platforms.

In the UNICODE version of the library on Windows, the character code for the client program is fixed to utf16. The value specified with nsdatabase::setExecCodePage is ignored.
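
The rule above can be modelled in a few lines of C++. This is only a sketch of the rule as described in this document, not Transactd code: the ClientBuild struct, both function names and the k-prefixed constants are hypothetical. The numeric values 0 and 65001 are those of the Windows constants CP_ACP and CP_UTF8; 1200 is used here merely as a label for utf16.

```cpp
#include <cassert>

// Numeric values of the Windows constants CP_ACP and CP_UTF8, redefined under
// hypothetical names so that the sketch also compiles on non-Windows systems.
constexpr unsigned int kCpAcp = 0;
constexpr unsigned int kCpUtf8 = 65001;
// Illustrative label for utf16 (1200 is the Windows code page id of UTF-16LE).
constexpr unsigned int kCpUtf16 = 1200;

// Hypothetical description of how the client library was built.
struct ClientBuild {
    bool windows; // running on Windows
    bool unicode; // UNICODE version of the library
};

// Default client code page when setExecCodePage has not been called.
unsigned int defaultExecCodePage(const ClientBuild& build)
{
    return build.windows ? kCpAcp : kCpUtf8;
}

// Code page that is actually in effect for the client program.
unsigned int effectiveClientCodePage(const ClientBuild& build,
                                     unsigned int requested)
{
    if (build.windows && build.unicode)
        return kCpUtf16; // fixed to utf16; the requested value is ignored
    return requested;
}
```

For example, under these assumptions a UNICODE build on Windows reports utf16 even when cp932 (932) was requested, while a Linux build simply keeps the requested value.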

Using the API

(1) get/set field value

Field values are converted automatically from the character code of nsdatabase::execCodePage() to that of fielddef::charsetIndex() when they are set, and in the reverse direction when they are read. This applies, for example, to the data parameter of table::setFV(short index, const char* data).

If the two character codes are the same, no conversion occurs.

In the UNICODE version of the library on Windows, use utf16. Field values are converted automatically from utf16 to fielddef::charsetIndex(), and in the reverse direction.
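
The following sketch restates the field value rules as a small decision function. It is a model of the behaviour described here, not the library's implementation; Direction, ConversionPlan and planFieldValueConversion are hypothetical names, and character codes are represented as plain strings for readability.

```cpp
#include <cassert>
#include <string>

// Hypothetical model: which way a field value travels.
enum class Direction { Set, Get };

// Hypothetical result type: source and target character codes, and whether
// any conversion has to take place.
struct ConversionPlan {
    std::string from;
    std::string to;
    bool needed;
};

// execCodePage stands in for nsdatabase::execCodePage(), fieldCharset for the
// character code reported by fielddef::charsetIndex().
ConversionPlan planFieldValueConversion(Direction direction,
                                        bool unicodeWindowsBuild,
                                        const std::string& execCodePage,
                                        const std::string& fieldCharset)
{
    // The UNICODE version on Windows always uses utf16 on the client side.
    const std::string client = unicodeWindowsBuild ? "utf16" : execCodePage;

    ConversionPlan plan;
    plan.from = (direction == Direction::Set) ? client : fieldCharset;
    plan.to = (direction == Direction::Set) ? fieldCharset : client;
    plan.needed = (plan.from != plan.to); // same code on both sides: no-op
    return plan;
}
```

With a utf8 client and a cp932 field, setting a value plans a utf8 to cp932 conversion and reading plans the reverse; with utf8 on both sides, nothing is converted.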

(2) get/set field names and table names

Table names and field names are looked up and compared against the values stored in the schema table for Transactd. This applies, for example, to the name parameter of table::setFV(const char* name, const char* data).

If the schema table was generated automatically, specify names as utf8-encoded strings.

If you created the schema table yourself, you can specify any code page in tabledef::schemaCodePage(). In that case, table names and field names must be passed as strings encoded in the code page specified in tabledef::schemaCodePage().

In the UNICODE version of the library on Windows, use utf16. Table names and field names are converted automatically from utf16 to tabledef::schemaCodePage(), and in the reverse direction.

(3) get/set meta-information other than the above 2

Use utf8 for meta-information other than field names and table names, such as database names. This applies, for example, to the uri parameter of database::open().

This information is passed to MySQL/MariaDB internal functions through the Transactd plugin. These functions use utf8.

In the UNICODE version of the library on Windows, use utf16. The meta-information is converted automatically from utf16 to utf8, and in the reverse direction.
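
Taken together, cases (1) to (3) answer one question: in which character code must the caller supply a given string? The sketch below summarises that as a lookup. It is an illustration of the rules in this section only; StringKind, ClientConfig and encodingToPass are hypothetical names that do not exist in the Transactd API.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the three kinds of strings passed to the API.
enum class StringKind { FieldValue, FieldOrTableName, OtherMetaInfo };

// Hypothetical description of the client configuration.
struct ClientConfig {
    bool unicodeWindowsBuild;   // UNICODE version of the library on Windows
    std::string execCodePage;   // stands in for nsdatabase::execCodePage()
    std::string schemaCodePage; // stands in for tabledef::schemaCodePage
};

// Returns the character code in which the caller should encode the string.
std::string encodingToPass(StringKind kind, const ClientConfig& config)
{
    if (config.unicodeWindowsBuild)
        return "utf16"; // the library converts to the database-side code

    switch (kind) {
    case StringKind::FieldValue:
        return config.execCodePage;   // converted to the field's code
    case StringKind::FieldOrTableName:
        return config.schemaCodePage; // compared with the schema table
    case StringKind::OtherMetaInfo:
        return "utf8";                // passed through to MySQL/MariaDB
    }
    assert(false && "unknown StringKind");
    return std::string();
}
```

For a non-UNICODE client with execCodePage set to cp932 and an auto-generated (utf8) schema table, this gives cp932 for field values but utf8 for field names, table names and database names.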

Easy settings for each environment

(a) C++ / PHP / Ruby on Linux, Mac OS X

(1) Write the program in utf8. nsdatabase::execCodePage() is utf8 by default. You do not have to specify it.

(2) Even if you create the table schema on your own, please use utf8. tabledef::schemaCodePage is utf8 by default. You do not have to specify it.

(3) Use utf8 in meta-information other than (2).

In short, use utf8 everywhere: field values, field names, table names and database names.

However, if field values are encoded in a multibyte character code other than cp932, utf16 or utf8, see "Details of character code converting" and "Details of DDL operation" for more details.

(b) C++ / COM(JScript, C#) on Windows

Use the UNICODE version of the library with C++. The library for COM is the UNICODE version by default.

Write the program in utf16 or cp932; either is fine. In C++, set the compiler option to use the UNICODE character set.

The UNICODE version of the library converts character codes automatically in all of (1), (2) and (3). This is the easiest way.

(c) PHP / Ruby on Windows

(1) Write the program in utf8. In the library for PHP/Ruby, nsdatabase::execCodePage() is set to utf8 in the constructor of the database object. You do not have to specify it.

(2) Even if you create the table schema on your own, please use utf8. tabledef::schemaCodePage is utf8 by default. You do not have to specify it.

(3) Use utf8 in meta-information other than (2).

In short, use utf8 everywhere: field values, field names, table names and database names.

(d) Writing the program in cp932

* Please USE OTHER SETTINGS unless you have a special reason, such as an existing database that uses cp932. This setting is not recommended because it is complex.

(1) Set the MySQL/MariaDB variable character-set-server to cp932. Call nsdatabase::setExecCodePage(CHARSET_CP932) in the program. If you use PHP/Ruby, write the program in cp932.

(2) If you use multibyte characters in field names or table names, generate the schema table with the Transactd API and set tabledef::schemaCodePage to cp932. Alternatively, open the auto-generated schema table, set tabledef::schemaCodePage to cp932, and set the table names and field names to cp932-encoded values.

(3) Specify database names as utf8-encoded strings.

Details of character code converting

The Transactd client converts the character codes of strings; the Transactd plugin does not convert any strings. All conversion happens on the client side.

On Windows, the OS functions MultiByteToWideChar and WideCharToMultiByte are used to convert strings.

On Linux, iconv is used to convert strings. To convert strings faster, Transactd caches the converters for each combination of character codes. The combinations of mbcs, utf16 and utf8 are cached by default.

SHIFT-JIS is defined as the default mbcs character code by the MBC_CHARSETNAME macro in mbcswchrLinux.h. To change the default mbcs, modify MBC_CHARSETNAME and recompile.

Also, if you want to use multiple mbcs at the same time, you need to modify mbcswchrLinux.h and mbcswchrLinux.cpp.
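
To illustrate the idea of caching converters per character code combination, here is a minimal iconv-based sketch. It is not the code in mbcswchrLinux.cpp: the ConverterCache class and its methods are hypothetical, and it creates converters on first use rather than preloading a fixed set. The encoding names "UTF-8" and "SHIFT_JIS" are the spellings accepted by common iconv implementations such as glibc; other platforms may need different names.

```cpp
#include <cassert>
#include <cstddef>
#include <iconv.h>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical cache: one iconv descriptor per (from, to) combination.
class ConverterCache {
public:
    ConverterCache() = default;
    ConverterCache(const ConverterCache&) = delete;
    ConverterCache& operator=(const ConverterCache&) = delete;

    ~ConverterCache()
    {
        for (auto& entry : cache_)
            iconv_close(entry.second);
    }

    // Converts input from one character code to another.
    std::string convert(const std::string& from, const std::string& to,
                        const std::string& input)
    {
        if (from == to)
            return input; // same character code: no conversion

        iconv_t cd = descriptor(from, to);
        iconv(cd, nullptr, nullptr, nullptr, nullptr); // reset shift state

        std::string output(input.size() * 4 + 4, '\0'); // generous buffer
        char* in = const_cast<char*>(input.data());
        std::size_t inLeft = input.size();
        char* out = &output[0];
        std::size_t outLeft = output.size();

        if (iconv(cd, &in, &inLeft, &out, &outLeft) ==
            static_cast<std::size_t>(-1))
            throw std::runtime_error("iconv conversion failed");
        assert(inLeft == 0); // on success, all input has been consumed

        output.resize(output.size() - outLeft);
        return output;
    }

    // Number of cached descriptors.
    std::size_t size() const { return cache_.size(); }

private:
    // Returns the cached descriptor, opening it on first use.
    iconv_t descriptor(const std::string& from, const std::string& to)
    {
        const auto key = std::make_pair(from, to);
        const auto it = cache_.find(key);
        if (it != cache_.end())
            return it->second;

        iconv_t cd = iconv_open(to.c_str(), from.c_str());
        if (cd == (iconv_t)(-1)) // iconv_open reports failure this way
            throw std::runtime_error("unsupported conversion: " + from +
                                     " -> " + to);
        cache_.emplace(key, cd);
        return cd;
    }

    std::map<std::pair<std::string, std::string>, iconv_t> cache_;
};
```

Repeated calls such as cache.convert("UTF-8", "SHIFT_JIS", text) reuse the same descriptor, so iconv_open is paid only once per combination. The sketch is not thread-safe; sharing one cache across threads would need external locking.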

Details of DDL operation

Some Transactd DDL operations go through the SQL layer. MySQL/MariaDB uses utf8 throughout its internal processing, but strings passed to the SQL layer must be encoded in the character code specified by the MySQL/MariaDB variable character-set-server.

Usually, programmers do not have to be aware of the character code used to encode strings passed to the SQL layer. On Linux, however, encoding will not work correctly if character-set-server is not one of the cached character codes: mbcs (SHIFT-JIS by default), utf16 or utf8.

Codepage list

The code pages that you can specify for execCodePage or schemaCodePage are listed in characterset.cpp.