Wednesday, August 16, 2006

Making a Unicode Configuration


To support Unicode in VC++, two things need to be done. First, open the Project menu and choose Settings. On the C/C++ tab, choose the General category, add the preprocessor definitions _UNICODE (note the leading underscore) and UNICODE, and REMOVE the _MBCS (multi-byte character set) definition. Second, on the Link tab, choose the Output category and set the entry-point symbol to wWinMainCRTStartup. This start-up routine calls wWinMain, the Unicode counterpart of WinMain, in a GUI application.

When building the Unicode version of an application, both the Win32 compile-time flag UNICODE and the C run-time compile-time flag _UNICODE must be defined.
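A minimal sketch of a compile-time guard for this rule, assuming it is placed in a header shared by the whole project (the helper name is_unicode_build is hypothetical). UNICODE is read by the Win32 headers and _UNICODE by the C run-time headers such as TCHAR.H, so defining only one of them leaves the two sets of headers disagreeing about the character type:

```c
#include <assert.h>

/* Stop the build if only one of the two Unicode flags is defined.
   UNICODE drives the Win32 headers; _UNICODE drives the C run-time headers. */
#if defined(UNICODE) && !defined(_UNICODE)
#error "UNICODE is defined but _UNICODE is not; define both or neither."
#endif
#if defined(_UNICODE) && !defined(UNICODE)
#error "_UNICODE is defined but UNICODE is not; define both or neither."
#endif

/* Hypothetical helper: 1 in a Unicode build, 0 in an ANSI/MBCS build. */
int is_unicode_build(void)
{
#if defined(UNICODE)
    return 1;
#else
    return 0;
#endif
}
```

In an ANSI or MBCS build (neither flag defined) is_unicode_build() returns 0; a build that defines exactly one of the two flags stops at the #error.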

GENERAL GUIDELINES TO BE FOLLOWED.

Once _UNICODE and UNICODE have been defined for the project, a few further steps are needed to make sure strings are handled properly.

The following steps (digested from an MS Press book) should be taken:

1. The code should be modified to use generic data types. For example, replace char and char* with TCHAR and TCHAR*, which are defined in the Win32 header WINDOWS.H, or with _TCHAR, defined in the Visual C++ header TCHAR.H. Replace instances of LPSTR and LPCH with LPTSTR and LPTCH.

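2. The code should be modified to use generic function prototypes. For example, use the C run-time call _tcslen instead of strlen, and the Win32 API SetWindowText instead of SetWindowTextA. SetWindowText is a macro that expands to SetWindowTextW when UNICODE is defined and to SetWindowTextA otherwise.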

3. Every character or string literal should be wrapped in the TEXT or _T macro. TEXT places an "L" in front of the literal when UNICODE is defined, and _T does the same when _UNICODE is defined, turning it into a wide-character literal; otherwise the literal is left unchanged.

4. Pointer arithmetic should be adjusted. Subtracting two pointers yields a count of elements, not bytes: for char* values the two are the same, but subtracting wchar_t* values yields a count of 16-bit units (wchar_t is 2 bytes in Visual C++). When determining the number of bytes (for example, when allocating memory for a string), multiply the length of the string in characters, plus one for the terminating null, by sizeof(TCHAR). When determining the number of characters from a number of bytes, divide by sizeof(TCHAR).

5. Character != byte.

A character is not necessarily one byte. In Asian "multibyte" character encodings, some characters take two or more bytes while others take one. Do not jump directly into the middle of a byte array, and do not increment a char * pointer by one to move to the next character; use a function such as _tcsinc or the Win32 CharNext instead.

Check for any code that assumes a character is always 1 byte long. Code that assumes a character's value is always less than 256 (for example, code that uses a character value as an index into a table of size 256) must be changed. Make sure your string terminator is a full character wide: in a Unicode build it is the 16-bit value _T('\0'), not a single zero byte.
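Steps 4 and 5 can be sketched in portable C under two stated assumptions. The wide-string part uses wchar_t directly, which is what _TCHAR becomes in a _UNICODE build; wchar_t is 2 bytes in Visual C++ but usually 4 on Linux, which is why the code only ever multiplies by sizeof. The multibyte part uses UTF-8 as its example encoding rather than an Asian code page such as Shift-JIS, where Windows code would step with _tcsinc or CharNext instead. All function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Step 4: allocate characters * sizeof(element), never characters alone.
   Returns a heap copy of s (caller frees it), or NULL if malloc fails. */
wchar_t *dup_wide(const wchar_t *s)
{
    assert(s != NULL);
    size_t chars = wcslen(s) + 1;            /* +1 for the terminating L'\0' */
    size_t bytes = chars * sizeof(wchar_t);  /* characters -> bytes */
    wchar_t *copy = malloc(bytes);
    if (copy != NULL)
        memcpy(copy, s, bytes);
    return copy;
}

/* Step 4: pointer subtraction already counts elements (characters), not bytes. */
size_t chars_between(const wchar_t *begin, const wchar_t *end)
{
    return (size_t)(end - begin);
}

/* Step 5: count characters in a UTF-8 string by skipping continuation
   bytes (bit pattern 10xxxxxx) instead of assuming one byte per character. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
        if ((*p & 0xC0) != 0x80)   /* ASCII or lead byte starts a character */
            ++count;
    return count;
}

/* Step 5: a 256-entry table is only valid for values below 256. */
static const unsigned char hex_table[256] = {
    ['0'] = 1, ['1'] = 1, ['2'] = 1, ['3'] = 1, ['4'] = 1,
    ['5'] = 1, ['6'] = 1, ['7'] = 1, ['8'] = 1, ['9'] = 1,
    ['A'] = 1, ['B'] = 1, ['C'] = 1, ['D'] = 1, ['E'] = 1, ['F'] = 1,
    ['a'] = 1, ['b'] = 1, ['c'] = 1, ['d'] = 1, ['e'] = 1, ['f'] = 1,
};

int is_hex_digit_wide(wchar_t c)
{
    unsigned long v = (unsigned long)c;
    if (v >= 256)            /* outside the table: never index with it */
        return 0;
    return hex_table[v];
}
```

For example, utf8_char_count("h\xC3\xA9llo") counts 5 characters even though strlen reports 6 bytes, because é is encoded as two bytes.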

1. Data types in ANSI and their Unicode (generic-text) equivalents:

S.No.  ANSI                     Unicode
1      "abc" (plain literal)    _T("abc")
2      LPCSTR (const char *)    LPCTSTR (const _TCHAR *)
3      char                     _TCHAR
4      unsigned char            _TUCHAR
5      LPSTR (char *)           LPTSTR (_TCHAR *)
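The table works because TCHAR.H selects each generic name at compile time. The sketch below is a simplified, hypothetical model of that selection: the MY_ names are illustrative stand-ins, not the real definitions, which also lets it compile where TCHAR.H does not exist. The two functions are then written once against the generic types and work in either build.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for _TCHAR, _TUCHAR, _T, LPTSTR and LPCTSTR.
   TCHAR.H keys the real definitions off _UNICODE in a similar way. */
#ifdef _UNICODE
typedef wchar_t MY_TCHAR;
typedef wchar_t MY_TUCHAR;
#define MY_T(x) L##x
#else
typedef char MY_TCHAR;
typedef unsigned char MY_TUCHAR;
#define MY_T(x) x
#endif
typedef MY_TCHAR *MY_LPTSTR;
typedef const MY_TCHAR *MY_LPCTSTR;

/* Read-only access through the const generic pointer type. */
size_t count_spaces(MY_LPCTSTR s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != MY_T('\0'); ++s)
        if (*s == MY_T(' '))
            ++n;
    return n;
}

/* In-place modification through the non-const generic pointer type. */
void spaces_to_underscores(MY_LPTSTR s)
{
    assert(s != NULL);
    for (; *s != MY_T('\0'); ++s)
        if (*s == MY_T(' '))
            *s = MY_T('_');
}
```

Because every literal goes through MY_T, a literal such as MY_T("ab") always occupies three MY_TCHAR elements (two characters plus the terminator), whichever build is selected.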

2. Functions in ANSI and their Unicode (generic-text) equivalents:

S.No.  ANSI                    Unicode
1      sprintf                 _stprintf
2      atoi                    _ttoi
3      _atoi64                 _ttoi64
4      strcpy                  _tcscpy
5      strcat                  _tcscat
6      strlen                  _tcslen
7      fopen                   _tfopen
8      fprintf                 _ftprintf
9      atol                    _ttol
10     strstr                  _tcsstr
11     ltoa                    _ltot
12     atof (as strtod)        _tcstod
13     itoa                    _itot
14     strncpy                 _tcsncpy
15     strcmp                  _tcscmp
16     sscanf                  _stscanf
17     strchr                  _tcschr
18     stricmp                 _tcsicmp
19     strcspn                 _tcscspn
20     printf                  _tprintf
21     fgets                   _fgetts
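In a _UNICODE build the generic names in the right-hand column resolve to the wide-character C run-time functions (for example _tcslen to wcslen, _tcscpy to wcscpy, _tcschr to wcschr, _ttol to _wtol). The sketch below calls the standard C wide functions directly so it runs without TCHAR.H; it substitutes the standard wcstol for the Microsoft-specific _wtol, and build_label and label_value are hypothetical helpers. The format uses %ls because plain %s means a narrow string in standard C but a wide string in Microsoft's wide printf family.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: writes "<name>=<value>" into buf. cap is the buffer
   capacity in characters, not bytes. Returns characters written, or -1. */
int build_label(wchar_t *buf, size_t cap, const wchar_t *name, long value)
{
    assert(buf != NULL && name != NULL);
    /* Standard swprintf takes a character count; %ls is portable for wchar_t*. */
    return swprintf(buf, cap, L"%ls=%ld", name, value);
}

/* Hypothetical helper: parses the number after '='. Returns 0 if none. */
long label_value(const wchar_t *label)
{
    assert(label != NULL);
    const wchar_t *eq = wcschr(label, L'=');   /* generic name: _tcschr */
    if (eq == NULL)
        return 0;
    return wcstol(eq + 1, NULL, 10);           /* standard stand-in for _ttol */
}
```

With a 32-character buffer, build_label(buf, 32, L"count", 42) writes L"count=42" and returns 8, and label_value(buf) parses 42 back.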
