
Unicode Wide Character Support in DTL

Requests for Unicode support usually concern two distinct pieces of functionality:

Support for writing and reading wide character strings to or from a database.
Full-blown wide character mode: wide character query strings, table names, field names, error messages, connection strings, etc.

The support for reading and writing wide character strings was generously donated by Dale Peakall. To read and write wide character strings, we now support binding to class member fields of type wstring. In addition, we have upgraded the variant_row class to automatically detect wide character columns in a database and represent these columns using a wstring.

The support for full-blown wide character mode is more involved. To enable Unicode mode you will need to set two pre-processor defines called "UNICODE" and "_UNICODE"; these defines can be found in the example_unicode project. To implement wide character support, we have taken a Windows-style approach and defined the typedefs and macros listed below. (Under Windows we actually include the TCHAR.H header file, which defines more than we need; we list here only what the program requires, leaving aside implementation details in clib_fwd.cpp.)

_TEXT() macro. Usage: _TEXT("I am a string literal"). If UNICODE is defined, this macro turns the literal into a wide character array, L"I am a string literal"; otherwise it leaves the literal unchanged.
TCHAR typedef. Under UNICODE this resolves to wchar_t; otherwise it represents a char.
_TUCHAR typedef. Under UNICODE this resolves to wchar_t; otherwise it represents an unsigned char.
BYTE typedef. This always resolves to an unsigned char.
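
As a rough illustration of the list above, the following sketch shows how such a macro and typedefs could be defined without TCHAR.H. The names DEMO_TEXT, demo_tchar, demo_tuchar, demo_byte, demo_length and demo_literal_length are hypothetical stand-ins chosen for this example; they are not part of DTL, and the sketch assumes that defining either UNICODE or _UNICODE selects wide character mode.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for _TEXT, TCHAR, _TUCHAR and BYTE. The real names
// are avoided so the sketch cannot clash with <tchar.h> on Windows.
#if defined(UNICODE) || defined(_UNICODE)
    typedef wchar_t demo_tchar;
    typedef wchar_t demo_tuchar;
    #define DEMO_TEXT(s) L##s   // pastes an L prefix onto the literal
#else
    typedef char demo_tchar;
    typedef unsigned char demo_tuchar;
    #define DEMO_TEXT(s) s      // leaves the literal unchanged
#endif
typedef unsigned char demo_byte;  // the same in both modes

// Counts characters up to the terminator, whatever the character width is.
inline std::size_t demo_length(const demo_tchar *s) {
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != demo_tchar()) ++n;
    return n;
}

// The same source line compiles to a narrow or a wide literal.
inline std::size_t demo_literal_length() {
    const demo_tchar *literal = DEMO_TEXT("I am a string literal");
    return demo_length(literal);
}
```

Code written this way never mentions char or wchar_t directly, so the same source builds in both modes.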

Typedefs for standard C++ library constructs:


#ifdef  _UNICODE
BEGIN_DTL_NAMESPACE 
	typedef unsigned char BYTE;
	typedef STD_::wstring tstring;
	typedef STD_::wostream tostream;
	static STD_::wostream &tcout = STD_::wcout; 
	typedef STD_::wostringstream tostringstream;

	// also we define -- pseudo-code since details are messy
	tostream_iterator<typename X> ... resolves to ostream_iterator<X, wchar_t>
END_DTL_NAMESPACE 
#else
BEGIN_DTL_NAMESPACE 
	typedef unsigned char BYTE;
	typedef STD_::string tstring;
	typedef STD_::ostream tostream;
	static STD_::ostream &tcout = STD_::cout; 
	typedef STD_::ostringstream tostringstream;

	// also we define -- pseudo-code
	tostream_iterator<typename X> ... resolves to ostream_iterator<X, char>
END_DTL_NAMESPACE
#endif
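
To show how typedefs of this kind are typically used, here is a minimal self-contained sketch. It declares its own local equivalents (demo_tstring, demo_tostringstream and DEMO_TEXT, all hypothetical names) rather than including DTL, and the helper demo_select_all is invented purely for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical local typedefs mirroring tstring and tostringstream above;
// DTL declares its own versions inside its namespace.
#if defined(UNICODE) || defined(_UNICODE)
    typedef std::wstring demo_tstring;
    typedef std::wostringstream demo_tostringstream;
    #define DEMO_TEXT(s) L##s
#else
    typedef std::string demo_tstring;
    typedef std::ostringstream demo_tostringstream;
    #define DEMO_TEXT(s) s
#endif

// Builds a query string for the given table name. The body is identical in
// narrow and wide mode because it only uses the mode-neutral typedefs.
inline demo_tstring demo_select_all(const demo_tstring &table) {
    demo_tostringstream out;
    out << DEMO_TEXT("SELECT * FROM ") << table;
    return out.str();
}
```

A call such as demo_select_all(DEMO_TEXT("employees")) yields a std::string in the default build and a std::wstring when UNICODE or _UNICODE is defined.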

Finally, we have changed all the functions in the library that accept or return strings to use tstring instead. For example, if we request a find from an IndexedDBView using an alternate key, e.g. find_AK(tstring &s, key &k), the field name is now represented as a tstring. This means that in UNICODE mode we can fully support wide field names.

There is one final wrinkle worth noting in how we implemented the UNICODE version: our exception structure. All of our exceptions inherit from std::exception, which provides a what() method to extract the exception message. Unfortunately, this what() method is standardized to return a const char *, which is unsatisfactory if one wants to retrieve wide character error messages. For this reason, all of our exception classes now have an additional method called TCHAR *twhat(), which returns a char * in ASCII mode and a wchar_t * in Unicode mode.
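
The idea can be sketched with a small stand-alone class. This is not DTL's actual exception hierarchy: demo_db_exception, demo_tchar, demo_tstring, DEMO_TEXT and demo_catch_message are hypothetical names, and the sketch assumes the class simply stores one narrow and one mode-dependent copy of the message.

```cpp
#include <cassert>
#include <cstring>
#include <exception>
#include <string>

#if defined(UNICODE) || defined(_UNICODE)
    typedef wchar_t demo_tchar;
    #define DEMO_TEXT(s) L##s
#else
    typedef char demo_tchar;
    #define DEMO_TEXT(s) s
#endif
typedef std::basic_string<demo_tchar> demo_tstring;

// Hypothetical exception class illustrating the what()/twhat() pair.
class demo_db_exception : public std::exception {
public:
    demo_db_exception(const std::string &narrow, const demo_tstring &text)
        : narrow_(narrow), text_(text) {}

    // Required by std::exception: always a narrow string.
    const char *what() const noexcept override { return narrow_.c_str(); }

    // Mode-dependent message: char in ASCII mode, wchar_t in Unicode mode.
    const demo_tchar *twhat() const noexcept { return text_.c_str(); }

private:
    std::string narrow_;
    demo_tstring text_;
};

// Throws and catches the exception, returning the mode-dependent message.
inline demo_tstring demo_catch_message() {
    demo_tstring message;
    try {
        throw demo_db_exception("Connection failed",
                                DEMO_TEXT("Connection failed"));
    } catch (const demo_db_exception &ex) {
        message = ex.twhat();
    }
    return message;
}
```

Callers that only know about std::exception can still use what(), while code built in Unicode mode can catch the derived type and read the wide message through twhat().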

Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Corwin Joy and Michael Gradman make no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.
