User's Guide For Thai Input Method



Table of Contents:

 
1. Thai Character Set Standard (TIS 620-2533)
2. Thai Character Classification
3. WTT 2.0 Input Sequence Checking
4. Thai Keyboard Layouts
5. Solaris Thai Input Method
 

1.   Thai Character Set Standard (TIS 620-2533)


In Thailand, the Thai Character Set standard, TIS 620-2533, is a national standard for a primary set of graphic characters for Thai information interchange. It was defined by the Thai Industrial Standards Institute (TISI), Ministry of Industry, Royal Thai Government in 1986 (Buddhist year 2529) and was revised in 1990 (Buddhist year 2533).
TIS 620 defines an eight-bit character environment. The assigned character values are given below in a character table originally published by the National Electronics and Computer Technology Center (NECTEC):

TIS620 Charset
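
As a small illustration of the eight-bit encoding, the sketch below decodes TIS 620 bytes with Python's built-in "tis_620" codec. The helper names are illustrative and not part of any standard API. It also shows the fixed offset between the Thai byte values and the Unicode Thai block, where byte 0xA1 corresponds to U+0E01.

```python
def tis620_to_unicode(data: bytes) -> str:
    """Decode TIS 620 bytes using Python's built-in 'tis_620' codec."""
    return data.decode("tis_620")


def thai_byte_to_codepoint(byte: int) -> int:
    """Map an assigned Thai byte value to its Unicode code point.

    Byte 0xA1 corresponds to U+0E01, so the offset is constant.
    Only meaningful for assigned Thai codes, not reserved ones.
    """
    return byte - 0xA0 + 0x0E00


# 0xA1 is KO KAI; 0xD2 is one of the following vowels.
text = tis620_to_unicode(bytes([0xA1, 0xD2]))
print(text, [hex(ord(ch)) for ch in text])
```

Reserved codes such as 0xDB have no mapping, so a real decoder would need an error policy for them.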

2.  Thai Character Classification


The character classification in TIS 620 is meant to ease computer processing for display and input sequence checking; it is not related to Thai linguistics.

The Thai characters are classified into six classes:

(1). Control Characters,
(2). Consonants,
(3). Vowels,
(4). Tonemarks,
(5). Diacritics,
(6). Non-composibles.

2.1  Control characters(CTRL)

Control characters are non-displayable characters used as control codes for output display or data communication. There are 66 control characters in total.

They are (0x00) to (0x1F), (0x7F), (0x80) to (0x9F), (0xFF).

2.2  Consonants(CONS)

The TIS 620 character set contains 44 consonants, in the range (0xA1) to (0xCE) excluding (0xC4) and (0xC6), as shown in the table:
 
Hexadecimal Character Name Thai Character
A1 KO KAI ก
A2 KHO KHAI ข
A3 KHO KHUAT ฃ
A4 KHO KHWAI ค
A5 KHO KHON ฅ
A6 KHO RAKHANG ฆ
A7 NGO NGU ง
A8 CHO CHAN จ
A9 CHO CHING ฉ
AA CHO CHANG ช
AB SO SO ซ
AC CHO CHOE ฌ
AD YO YING ญ
AE DO CHADA ฎ
AF TO PATAK ฏ
B0 THO THOTHAN ฐ
B1 THO NANGMONTHO ฑ
B2 THO PHU THAO ฒ
B3 NOR NANE ณ
B4 DOR DEK ด
B5 TO TAO ต
B6 THO THUNG ถ
B7 THO THAHAN ท
B8 THO THONG ธ
B9 NO NU น
BA BO BAIMAI บ
BB PO PLA ป
BC PHO PHERNG ผ
BD FO FA ฝ
BE PO PAN พ
BF FO FAN ฟ
C0 PO SAMPOW ภ
C1 MO MA ม
C2 YO YAK ย
C3 RO RUA ร
C5 LO LING ล
C7 WO WAEN ว
C8 SO SALA ศ
C9 SO RUSI ษ
CA SO SUA ส
CB HO HEEP ห
CC LO CHULA ฬ
CD O ANG อ
CE HO NOKHUK ฮ

2.3 Vowels (-V).

The TIS 620 character set contains 18 vowels, divided into four groups.

(1). Leading vowels (LV):

These vowels are placed before consonants. There are 5 leading vowels, as shown in the table below:
 
Hexadecimal Character Name Thai Character
E0 SARA E เ
E1 SARA AE แ
E2 SARA O โ
E3 SARA AI MAIMUAN ใ
E4 SARA AI MAIMALAI ไ

(2). Following Vowels (FV):

These vowels are placed after consonants. There are 6 following vowels, further divided into two groups.

Normal following vowels:
 
Hexadecimal Character Name Thai Character
D0 SARA A ะ
D2 SARA AAT า
D3 SARA AM ำ
E5 LAKKHANGYAO ๅ

Special following vowels:
 
Hexadecimal Character Name Thai Character
C4 RU ฤ
C6 LU ฦ

(3). Below Vowels (BV).

These vowels are placed below consonants. There are 2 below vowels, as shown in the table below:
 
Hexadecimal Character Name Thai Character
D8 SARA U ุ
D9 SARA UU ู

(4). Above vowels (AV).

These vowels are placed above consonants. There are 5 above vowels, as shown in the table below:
 
Hexadecimal Character Name Thai Character
D1 MAI HAN-AKAT ั
D4 SARA E ิ
D5 SARA EE ี
D6 SARA UR ึ
D7 SARA UUR ื

2.4  Tonemarks (TONE)

The TIS 620 character set contains 4 tone marks:
 
Hexadecimal Character Name Thai Character
E8 MAI EK ่
E9 MAI THO ้
EA MAI TRIE ๊
EB MAI CHATTAWA ๋


2.5  Diacritics (-D)

The TIS 620 character set contains 5 diacritics divided into two groups.

(1) Above diacritics (AD).

These diacritics are placed above initial or final consonants. There are 4 above diacritics, as shown in the table below:
 
Hexadecimal Character Name Thai Character
E7 MAITAIKHU ็
EC THANTHAKHAT ์
ED NIKHAHIT ํ
EE YAMAKKAN ๎

(2) Below diacritic (BD).

The below diacritic is placed below final or clustered consonants. There is only one below diacritic, as shown in the table below:
 
Hexadecimal Character Name Thai Character
DA PHINTHU ฺ

2.6  Non-composibles (NON)

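Non-composible characters cannot be composed with above vowels, below vowels, tone marks, above diacritics, or the below diacritic. The TIS 620 character set contributes 18 such characters of its own (one no-break space, ten Thai digits, six Thai special characters, and one word separator). Together with the ASCII characters and the reserved codes, non-composible characters are divided into seven groups.
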
(1) Graphic characters.

There are 94 graphic characters, subdivided into 52 English alphabetic characters (A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z), 10 digits (0 1 2 3 4 5 6 7 8 9), and 32 special characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

(2) Space (0x20).

(3) One no-break space:
 
Hexadecimal Character Name Thai Character
A0 NO-BREAK SPACE

(4) Ten Thai digits:
 
Hexadecimal Character Name Thai Character
F0 THAI ZERO ๐
F1 THAI ONE ๑
F2 THAI TWO ๒
F3 THAI THREE ๓
F4 THAI FOUR ๔
F5 THAI FIVE ๕
F6 THAI SIX ๖
F7 THAI SEVEN ๗
F8 THAI EIGHT ๘
F9 THAI NINE ๙

(5) Six Thai special characters:
 
Hexadecimal Character Name Thai Character
CF PAYANGNOI ฯ
DF BAHT (Thai currency sign) ฿
E6 MAIYAMOK ๆ
EF FONGMAN ๏
FA ANGKHANKHU ๚
FB KHOMUT ๛

(6) One word separator:
 
Hexadecimal Character Name Thai Character
DC WORD SEPARATOR

This character is a nonprintable character. It is used for separating words in Thai sentences. Applications can make use of it to simplify Thai word processing.
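
As a minimal sketch of how an application might use the word separator, assuming the text is held as raw TIS 620 bytes and that 0xDC is used as described above, the hypothetical helpers below split a byte string into words and strip the separators before display.

```python
WORD_SEPARATOR = 0xDC  # word separator code from the table above


def split_words(data: bytes) -> list[bytes]:
    """Split TIS 620 text on the word separator, dropping empty pieces."""
    return [word for word in data.split(bytes([WORD_SEPARATOR])) if word]


def strip_separators(data: bytes) -> bytes:
    """Remove word separators, for example before sending text to a display."""
    return data.replace(bytes([WORD_SEPARATOR]), b"")


sample = b"\xa1\xd2\xdc\xc1\xd2"  # two short words joined by 0xDC
print(split_words(sample))
print(strip_separators(sample))
```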

(7) Reserved Characters.

There are 6 reserved characters: (0xDB), (0xDD), (0xDE), (0xFC), (0xFD), (0xFE).

To describe Thai input/output methods, some classes (FV, BV, AD, and AV) have been further divided into subclasses such as FV1, FV2, and FV3. This gives a total of 17 classes and subclasses (Thaweesak et al. (1991)), detailed in the table below:
 
Class Number Description
CTRL 66 Control characters: (0x00) - (0x1F), (0x7F), (0x80) - (0x9F), (0xFF)
NON 119 Non-composible characters, including:
(1) Space and all ASCII graphic characters, (0x20) - (0x7E).
(2) TIS 620-2533 characters, such as 
       (0xA0),  (0xDC),
       (0xCF),  (0xDF), (0xE6), (0xEF), 
       (0xF0) - (0xF9),
       (0xFA), (0xFB)
(3) Reserved characters,  (0xDB), (0xDD), (0xDE), (0xFC), (0xFD), (0xFE)
CONS 44 Thai consonants, (0xA1) - (0xC3), (0xC5), (0xC7) - (0xCE)
LV 5 (0xE0), (0xE1), (0xE2), (0xE3), (0xE4)
FV1 3 (0xD0), (0xD2), (0xD3)
FV2 1 (0xE5)
FV3 2 (0xC4), (0xC6) 
BV1 1 (0xD8)
BV2 1 (0xD9)
BD 1 (0xDA)
TONE 4 (0xE8), (0xE9), (0xEA), (0xEB)
AD1 2 (0xED), (0xEC)
AD2 1 (0xE7)
AD3 1 (0xEE)
AV1 1 (0xD4)
AV2 2 (0xD1), (0xD6)
AV3 2 (0xD5), (0xD7)
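
The table above can be turned into a lookup directly. The following is a minimal sketch, not a reference implementation: the names `CLASSES`, `CLASS_OF`, and `classify` are illustrative, and the code values are copied from the table. Building a 256-entry array also makes it easy to confirm that the 17 classes cover every byte value exactly once.

```python
def _codes(*items):
    """Expand single codes and (first, last) inclusive ranges into a set."""
    out = set()
    for item in items:
        if isinstance(item, tuple):
            out.update(range(item[0], item[1] + 1))
        else:
            out.add(item)
    return out


CLASSES = {
    "CTRL": _codes((0x00, 0x1F), 0x7F, (0x80, 0x9F), 0xFF),
    "NON": _codes((0x20, 0x7E), 0xA0, 0xDC, 0xCF, 0xDF, 0xE6, 0xEF,
                  (0xF0, 0xF9), 0xFA, 0xFB,
                  0xDB, 0xDD, 0xDE, 0xFC, 0xFD, 0xFE),
    "CONS": _codes((0xA1, 0xC3), 0xC5, (0xC7, 0xCE)),
    "LV": _codes((0xE0, 0xE4)),
    "FV1": _codes(0xD0, 0xD2, 0xD3),
    "FV2": _codes(0xE5),
    "FV3": _codes(0xC4, 0xC6),
    "BV1": _codes(0xD8),
    "BV2": _codes(0xD9),
    "BD": _codes(0xDA),
    "TONE": _codes((0xE8, 0xEB)),
    "AD1": _codes(0xED, 0xEC),
    "AD2": _codes(0xE7),
    "AD3": _codes(0xEE),
    "AV1": _codes(0xD4),
    "AV2": _codes(0xD1, 0xD6),
    "AV3": _codes(0xD5, 0xD7),
}

# Invert into a 256-entry table indexed by byte value.
CLASS_OF = [None] * 256
for name, codes in CLASSES.items():
    for code in codes:
        assert CLASS_OF[code] is None, "classes must not overlap"
        CLASS_OF[code] = name


def classify(byte: int) -> str:
    """Return the class name for one TIS 620 byte value."""
    return CLASS_OF[byte]


print(classify(0xA1), classify(0xE8), classify(0x41))
```

A flat array keeps the per-keystroke lookup constant-time, which suits an input method that classifies every typed character.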

Thai characters can also be classified according to character levels. There are five character levels:

3.  WTT 2.0 Input Sequence Checking.


Prior to Unicode, vendors agreed on a common convention for implementing Thai, called WTT 2.0, based on the TIS 620 eight-bit character set. (WTT, pronounced Wor Thor Thor, is a Thai abbreviation of Wing Thook Thee, which means "Runs Everywhere".) It comprises three parts, defining the general facilities, the Thai input/output method, and the printer identification number, respectively.

According to WTT 2.0, there are some basic rules concerning the Thai input sequence:

The above basic rules are used to construct the syntax diagram shown below:

syntax diagram

WTT 2.0 defines 3 levels of syntactic strictness for the input method, as follows:

1).  Level 0 (Passthrough mode) does not filter at all. This allows the application program to handle the checking of the input sequence.
2).  Level 1 (BasicCheck mode) only ensures that the input sequence complies with the above basic rules and syntax diagram.
3).  Level 2 (Strict mode) adds more conditions to filter out some obviously illegal input sequences.


For strict check mode, the input sequence checking rules follow the syntax diagram below:

Strict checking

A single table, shared by the input method and output method, is defined for describing the character sequence conditions:
 

      CTRL NON  CONS LV   FV1  FV2  FV3  BV1  BV2  BD   TONE AD1  AD2  AD3  AV1  AV2  AV3
CTRL  X    A    A    A    A    A    A    R    R    R    R    R    R    R    R    R    R
NON   X    A    A    A    S    S    A    R    R    R    R    R    R    R    R    R    R
CONS  X    A    A    A    A    S    A    C    C    C    C    C    C    C    C    C    C
LV    X    S    A    S    S    S    S    R    R    R    R    R    R    R    R    R    R
FV1   X    S    A    S    A    S    A    R    R    R    R    R    R    R    R    R    R
FV2   X    A    A    A    A    S    A    R    R    R    R    R    R    R    R    R    R
FV3   X    A    A    A    S    A    S    R    R    R    R    R    R    R    R    R    R
BV1   X    A    A    A    A    S    A    R    R    R    C    C    R    R    R    R    R
BV2   X    A    A    A    S    S    A    R    R    R    C    R    R    R    R    R    R
BD    X    A    A    A    S    S    A    R    R    R    R    R    R    R    R    R    R
TONE  X    A    A    A    A    A    A    R    R    R    R    R    R    R    R    R    R
AD1   X    A    A    A    S    S    A    R    R    R    R    R    R    R    R    R    R
AD2   X    A    A    A    S    S    A    R    R    R    R    R    R    R    R    R    R
AD3   X    A    A    A    S    S    A    R    R    R    R    R    R    R    R    R    R
AV1   X    A    A    A    S    S    A    R    R    R    C    C    R    R    R    R    R
AV2   X    A    A    A    S    S    A    R    R    R    C    R    R    R    R    R    R
AV3   X    A    A    A    S    S    A    R    R    R    C    R    C    R    R    R    R

The rows are for types of previous character, and the columns are for types of following character. The codes in table cells determine the condition of the order:
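
A table lookup of this kind might be sketched as follows. The cell letters are copied from the table, but their interpretation here is an assumption: A is read as accept, C as compose with the previous character, R as reject, S as accept except in strict mode, and X as a non-displayable character that is passed through. The function names and the mapping of levels 0 to 2 onto these codes are likewise illustrative rather than taken from the WTT 2.0 text.

```python
CLASS_NAMES = ["CTRL", "NON", "CONS", "LV", "FV1", "FV2", "FV3", "BV1", "BV2",
               "BD", "TONE", "AD1", "AD2", "AD3", "AV1", "AV2", "AV3"]

# One string per row (previous character), one letter per column (next character).
ROWS = {
    "CTRL": "XAAAAAARRRRRRRRRR",
    "NON":  "XAAASSARRRRRRRRRR",
    "CONS": "XAAAASACCCCCCCCCC",
    "LV":   "XSASSSSRRRRRRRRRR",
    "FV1":  "XSASASARRRRRRRRRR",
    "FV2":  "XAAAASARRRRRRRRRR",
    "FV3":  "XAAASASRRRRRRRRRR",
    "BV1":  "XAAAASARRRCCRRRRR",
    "BV2":  "XAAASSARRRCRRRRRR",
    "BD":   "XAAASSARRRRRRRRRR",
    "TONE": "XAAAAAARRRRRRRRRR",
    "AD1":  "XAAASSARRRRRRRRRR",
    "AD2":  "XAAASSARRRRRRRRRR",
    "AD3":  "XAAASSARRRRRRRRRR",
    "AV1":  "XAAASSARRRCCRRRRR",
    "AV2":  "XAAASSARRRCRRRRRR",
    "AV3":  "XAAASSARRRCRCRRRR",
}


def cell(prev_class: str, next_class: str) -> str:
    """Return the table code for next_class typed after prev_class."""
    return ROWS[prev_class][CLASS_NAMES.index(next_class)]


def is_accepted(prev_class: str, next_class: str, level: int) -> bool:
    """Check one keystroke; level 0 = passthrough, 1 = basic, 2 = strict."""
    if level == 0:
        return True              # passthrough: no filtering at all
    code = cell(prev_class, next_class)
    if code == "R":
        return False             # rejected in both checking modes
    if code == "S":
        return level < 2         # assumed: rejected only in strict mode
    return True                  # A, C and X are let through


print(cell("CONS", "TONE"), is_accepted("NON", "TONE", 1), is_accepted("LV", "NON", 2))
```

Storing each row as a 17-letter string keeps the code visually close to the printed table, which makes transcription errors easier to spot.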

Input method should behave as follows:

4. Thai Keyboard Layouts


The original Thai keyboard layout closely follows the layout of the popular Thai typewriter.

There are several keyboard layouts popular in Thailand:

(1) Kedmanee (TIS 820-2531) keyboard layout

In 1986, the Thai Industrial Standards Institute (TISI) announced TIS 620-2529, the Thai standard character code for computers. Two years later, TISI announced the Kedmanee layout as the standard layout for computers (TIS 820-2531).

The TIS 820-2531 (1988) keyboard layout is shown below:

TIS2531 keyboard

 

This Kedmanee keyboard layout was designed for typewriters. Because a typewriter has a limited number of keys, some Thai special characters were left out.

(2) TIS 820-2538 (1995) keyboard layout

The TIS 820-2538 keyboard layout is an updated version of TIS 820-2531. Some Thai special characters that were left out of Kedmanee are restored in this version.

The TIS 820-2538 (1995) keyboard layout is shown below:

TIS2538 keyboard

(3) Pattachote keyboard layout

The Pattachote keyboard was also designed for typewriters, but with better finger-load distribution. Its design used statistics of Thai keystroke distributions to create a new keyboard layout using the following principles:

The Pattachote keyboard layout is shown below:

Pattachote Keyboard