This file is provided as-is by Unicode, Inc. (The Unicode Consortium).
No claims are made as to fitness for any particular purpose.  No
warranties of any kind are expressed or implied.  The recipient
agrees to determine applicability of information provided.  If this
file has been provided on magnetic media by Unicode, Inc., the sole
remedy for any claim will be exchange of defective media within 90
days of receipt.

Unicode Technical Report #1
Draft Proposals
ASCII plain text version without charts

Copyright 1992 Unicode Inc.
All Rights Reserved

Until the end of the review period in August 1993: Permission is
granted to freely reproduce this report in small quantities for
purposes of review provided this notice remains affixed.

Review period closes August 15, 1993

Introduction

This Technical Report comprises three concrete proposals to
which the Unicode Technical Committee is strongly committed in
their current form.  These are: Ethiopian, Burmese, and Khmer.
These proposals have been reviewed internally and have been relatively
stable over a period of time.  The committee believes they represent
good technical solutions for the proposed scripts, and therefore
also recommends that specific codepoints within the body of Unicode
be allocated for them, as follows:

		Burmese		U+0F00	to	U+0F7F
		Khmer		U+0F80	to	U+0FFF
		Ethiopian	U+1200	to	U+125F

Specific open issues for each of these are addressed in the respective
draft block introductions.  These open issues do not detract
substantially from the solidity of the proposals.

Acknowledgements

 * The Burmese proposal was written by Andy Daniels, with contributions
	by Lloyd Anderson and Glenn Adams.
 * The Khmer proposal was written by Andy Daniels.
 * The Ethiopian proposal was written by Joe Becker.

The following individuals contributed greatly to the production of
this report:
	Lloyd Anderson, Glenn Adams, Lee Collins


Burmese U+0F00 -> 0F7F

The Burmese script is used to write Burmese, the majority language
of Myanmar (formerly Burma), and Pali. Variations and extensions of
the script are used to write other languages of the region, such
as Shan and Mon, and also to write Sanskrit.

The Burmese script derives from 11th century Mon. The Mon script
itself is probably borrowed directly from South India. The earliest
Mon inscription, found at Lopburi in Thailand, dates from the eighth
century and is written in the Pallava script used at the Hinayana
Buddhist center of Conjeeveram in the area of Madras on the east
coast of India. In A.D. 1057 one of the first Burmese kings,
Aniruddha, conquered Thaton, a major Mon center, and brought back
with him to Pagan the most learned monks, artists and artisans of
the Mon. The first inscription in Burmese dates from the following
year and is written in an alphabet almost identical with that of
the Mon inscriptions. Aside from rounding of the originally square
characters, this script has remained largely unchanged to the
present.

The Burmese script therefore ultimately derives from Brahmi, and so
shares the structural features of its relatives: Consonant symbols
include an inherent vowel; various signs are placed before, above,
below and after a consonant to indicate a vowel other than the
inherent one; ligatures and conjuncts are used to indicate consonant
clusters.

In the course of its adaptation to non-Indo-Aryan languages, the
Burmese script has acquired some features that distinguish it from
other Indic scripts. The killer, or virama, participates in some
common constructions that would be clumsy to handle the way they
would be in the other Indic scripts, so the control function of
the virama is separated from the diacritic function of the killer.
The virama, 0F4D, is used to form conjunct consonants, while the
killer, 0F52, is a simple diacritic and has no effect on character
shaping. The killer is also combined with the VOWEL SIGN O (0F4B)
to form the low level tone vowel "o." When used this way, this
symbol is known as hyei htou, or "thrust forward."

Burmese distinguishes a set of "medial" consonants. Originally
conjunct forms of YA PALE, YA GAU, WA and HA, they are used in
modern Burmese to form new letters and to spell certain vowel and
consonant combinations. They are treated here as no different from
any other conjunct and should be coded using the virama.

ISSUE: There's no reason from the point of view of the rendering
engine to have separate codes for the medials.  Some implementors
feel that the medials should nevertheless have separate codes.
Including them introduces alternate spellings for the same syllables,
something that should be avoided. If there are compelling reasons
for including the medials, there is certainly room to add them.

When a syllable has more than one medial, it is recommended that
they appear in the order that such syllables are traditionally
spelled: HA HTOU first, then YA PIN or YA YI, then WA HSWE.
Note that YA PIN and YA YI cannot appear in the same syllable in
Burmese. For example, "cwei" ("to drop off") is coded as
0F15+0F5C+0F5E+0F47.  "Hmyu" ("to delight, allure") is coded as
0F2E+0F5F+0F5D+0F42. This differs from the order in which medials
are normally written.
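
The ordering rule can be sketched in a few lines of Python. This is
only an illustration: it works on medial names rather than code
points, and the names MEDIAL_RANK and order_medials are hypothetical,
not part of the proposal.

```python
# Recommended storage order of Burmese medials, per the rule above:
# HA HTOU first, then YA PIN or YA YI, then WA HSWE.
# The rank table and function name are illustrative assumptions.
MEDIAL_RANK = {"HA HTOU": 0, "YA PIN": 1, "YA YI": 1, "WA HSWE": 2}


def order_medials(medials):
    """Return the medials of one syllable in the recommended coding order."""
    unknown = [m for m in medials if m not in MEDIAL_RANK]
    if unknown:
        raise ValueError(f"not a medial: {unknown}")
    if "YA PIN" in medials and "YA YI" in medials:
        # The text notes that these two never share a syllable in Burmese.
        raise ValueError("YA PIN and YA YI cannot appear in the same syllable")
    return sorted(medials, key=MEDIAL_RANK.__getitem__)


print(order_medials(["WA HSWE", "YA PIN"]))
print(order_medials(["YA PIN", "HA HTOU"]))
```

Normalizing the order once, at input time, is what lets later string
comparison treat two spellings of the same syllable as equal.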

ISSUE: This rule is not strictly necessary, but regularizes the
spelling and simplifies rendering, string comparison and other
functions.

Burmese has several glyphs that are used with varying semantics;
each distinct usage is given its own code point here.
The following pairs of letters look the same, but must be distinguished
in the text stream:

	EHKAYA U (0F09) and NYA GALEI (0F5B)
	GA NGE (0F17) and DIGIT HYI (0F6E)
	WA (0F35) and DIGIT THOUN NYA (0F66)
	YA GAU (0F30) and DIGIT HKUN NI (0F6D)
	DIGIT LEI (0F6A) and SYMBOL LAGAUN (0F73)

The last two pairs are distinguished in some fonts but not in
others.

Also, the LETTER O (0F13) is distinguished from the sequence
0F48+0F5D, and the ZA MYINZWE (0F1D) is distinguished from 0F1A+0F5C.

Symbols not found as single characters are formed from sequences
of the basic characters given here. For example, tha ji ("great
tha") is coded by the sequence 0F38+0F4D+0F38, i.e., it is a conjunct
formed from two THAs.  Kinzi is a conjunct formed from LETTER NGA
followed by some other consonant, that is, the sequence
0F19+0F4D+Consonant. Low level tone "o" has already been noted.
Level tone "ou" is to be coded as 0F41+0F3F. Other combinations
follow similarly.
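
As a rough illustration of how such sequences are assembled, the
following Python sketch builds the tha ji and kinzi sequences from
the draft code points quoted above. The helper names (conjunct,
kinzi, show) are hypothetical and exist only for this example.

```python
# Draft code points quoted in the text above.
VIRAMA = 0x0F4D      # forms conjunct consonants
LETTER_NGA = 0x0F19
LETTER_THA = 0x0F38


def conjunct(first, second):
    """Code sequence for a conjunct: first consonant, virama, second consonant."""
    return [first, VIRAMA, second]


def kinzi(consonant):
    """Kinzi: LETTER NGA + virama + the consonant that follows."""
    return conjunct(LETTER_NGA, consonant)


def show(seq):
    """Format a code sequence the way this report writes it."""
    return "+".join(f"{cp:04X}" for cp in seq)


tha_ji = conjunct(LETTER_THA, LETTER_THA)
print(show(tha_ji))          # 0F38+0F4D+0F38
print(show(kinzi(0x0F15)))   # 0F19+0F4D+0F15 (NGA + virama + KA JI)
```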

The LETTER A, though classified here as a vowel, is actually a
consonant. Thus it can combine with any of the vowel symbols.

The tone mark AUKA MYI is often written to the left of a subscript
vowel sign or medial consonant. It should nevertheless come after
the vowel or medial in the text stream. It is also used with killed
consonants in writing closed syllables. In this case, too, the AUKA
MYI should come after the ATHA in the text stream. For example,
the word /hyun./ (short, high falling tone) should be represented
as 0F30+0F5F+0F5E+0F02+0F51.

The SYMBOL HNAI is only used in the literary combination
0F73+0F19+0F52+0F03, meaning "the aforementioned."

Burmese does not use any whitespace between words. If word boundary
indications are desired, for example for the use of automatic line
layout algorithms, U+200B, ZERO WIDTH SPACE, is to be used.
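
A minimal Python sketch of this convention follows. It assumes the
text has already been segmented into words by some other means; the
function names are hypothetical, and Latin placeholders stand in for
Burmese words.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def mark_word_boundaries(words):
    """Join already-segmented words with invisible U+200B break opportunities."""
    return ZWSP.join(words)


def split_on_boundaries(text):
    """Recover the words from text that carries U+200B boundary marks."""
    return text.split(ZWSP)


marked = mark_word_boundaries(["aaa", "bbb", "ccc"])
print(len(marked))                  # 11: nine letters plus two U+200B
print(split_on_boundaries(marked))  # ['aaa', 'bbb', 'ccc']
```

A line layout algorithm can then break at any U+200B without the
boundary being visible in the rendered text.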

Block Structure:  Burmese characters are mapped to their corresponding
ISCII slots whenever possible. Gaps in the block result mainly from
this mapping. Several ranges of code points are reserved for future
expansion. A notable exception is the pair NYA GALEI and NYA JI.
Historically, NYA GALEI is a simple palatal nasal, while NYA JI is
a ligature representing a double NYA GALEI. NYA JI, however, has
come to be regarded as the primary form of the letter in Burmese,
so it is assigned to the "preferred" ISCII slot for the palatal
nasal (U+0F1E), and NYA GALEI is placed at U+0F5F.

	U+0F00	to	U+0F01	Unassigned
	U+0F02	to	U+0F03	Various signs
	U+0F04				Unassigned
	U+0F05	to	U+0F14	Independent vowels
	U+0F15	to	U+0F39	Consonants
	U+0F3A	to	U+0F3E	Unassigned
	U+0F3F	to	U+0F4C	Dependent vowel signs
	U+0F4D				Virama
	U+0F4E	to	U+0F50	Unassigned
	U+0F51	to	U+0F52	Tone marks
	U+0F53	to	U+0F5F	Unassigned, reserved for extensions
	U+0F60	to	U+0F63	Additional dependent vowel signs
	U+0F64	to	U+0F65	Unassigned
	U+0F66	to	U+0F6F	Digits
	U+0F70	to	U+0F73	Special symbols
	U+0F74	to	U+0F77	Unassigned, reserved for additional symbols
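
For implementors who want to test a draft code point against this
table, a lookup might be sketched as follows in Python. The ranges
are copied from the table as printed; the names BURMESE_RANGES and
burmese_range are hypothetical.

```python
from bisect import bisect_right

# (first code point, last code point, description), copied from the table above.
BURMESE_RANGES = [
    (0x0F00, 0x0F01, "Unassigned"),
    (0x0F02, 0x0F03, "Various signs"),
    (0x0F04, 0x0F04, "Unassigned"),
    (0x0F05, 0x0F14, "Independent vowels"),
    (0x0F15, 0x0F39, "Consonants"),
    (0x0F3A, 0x0F3E, "Unassigned"),
    (0x0F3F, 0x0F4C, "Dependent vowel signs"),
    (0x0F4D, 0x0F4D, "Virama"),
    (0x0F4E, 0x0F50, "Unassigned"),
    (0x0F51, 0x0F52, "Tone marks"),
    (0x0F53, 0x0F5F, "Unassigned, reserved for extensions"),
    (0x0F60, 0x0F63, "Additional dependent vowel signs"),
    (0x0F64, 0x0F65, "Unassigned"),
    (0x0F66, 0x0F6F, "Digits"),
    (0x0F70, 0x0F73, "Special symbols"),
    (0x0F74, 0x0F77, "Unassigned, reserved for additional symbols"),
]
_STARTS = [first for first, _, _ in BURMESE_RANGES]


def burmese_range(cp):
    """Return the table's description for a draft code point, or None if unlisted."""
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= BURMESE_RANGES[i][1]:
        return BURMESE_RANGES[i][2]
    return None


print(burmese_range(0x0F4D))  # Virama
print(burmese_range(0x0F19))  # Consonants
```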

Note: The transliteration used here follows D. Haigh Roop, An
Introduction to the Burmese Writing System (1972). Tone indications
are left out of the character names.

ISSUE: As with Khmer, if there is a more standard transliteration,
it should be used.

ISSUE: Old Burmese has a small subscript LETTER A, which is the
precursor of the tone mark AUKA MYI and appears exactly where modern
Burmese would use the latter. This can probably be treated as a
font difference. There is also a superscript form of YA GAU, similar
in use to the Indic repha. This can probably be accommodated in
the shaping rules. This is not a major issue as there is plenty of
room to add these characters. Further investigation is required.

DRAFT 03 Nov 1992; rev 92/11/25

		DRAFT BURMESE CHARACTER NAMES
		
		0F00
		0F01
		
		@		Various Signs
		0F02	BURMESE THEIDHEI TIN
				= little thing put on
				anusvara, niggahita
		0F03	BURMESE HYEIGA PAU
				= dots ahead
				visarga
		0F04
		
		@		Independent Vowels
		0F05	BURMESE LETTER A
		0F06
		0F07	BURMESE PALI EHKAYA I
				= letter pali I
		0F08	BURMESE EHKAYA I
				= letter I
		0F09	BURMESE EHKAYA U
				= letter U
				x Burmese nya galei -> 0F5B
		0F0A
		0F0B	BURMESE LETTER VOCALIC R
				Sanskrit
		0F0C	BURMESE LETTER VOCALIC L
				Sanskrit
		0F0D
		0F0E
		0F0F	BURMESE EHKAYA EI
				= letter EI
		0F10
		0F11
		0F12
		0F13	BURMESE LETTER O
				x sra
		0F14
		
		@		Consonants
		0F15	BURMESE KA JI
				= great ka
		0F16	BURMESE HKA GWEI
				= curved hka
		0F17	BURMESE GA NGE
				= small ga
				x Burmese digit hyi -> 0F6E
		0F18	BURMESE GA JI
				= great ga
		0F19	BURMESE LETTER NGA
		0F1A	BURMESE SA LOUN
				= round sa
		0F1B	BURMESE HSA LEIN
				= twisted hsa
		0F1C	BURMESE ZA GWE
				= split za
		0F1D	BURMESE ZA MYINZWE
				= bridle za
				x cya
		0F1E	BURMESE NYA JI
				= great nya
		0F1F	BURMESE TA TALINJEI
				= bier-hook ta
		0F20	BURMESE HTA WUNBE
				= duck hta
		0F21	BURMESE DA YINGAU
				= crooked-breasted da
		0F22	BURMESE DA YEIHMOU
				= water-dipper da
		0F23	BURMESE NA JI
				= great na
		0F24	BURMESE TA WUNBU
				= pot-bellied ta
		0F25	BURMESE HTA HSINDU
				= elephant-fetter hta
		0F26	BURMESE DA DWEI
				= twisted da
		0F27	BURMESE DA AUHCAI
				= bottom-indented da
		0F28	BURMESE NA NGE
				= small na
		0F29
		0F2A	BURMESE PA ZAU
				= steep-sided pa
		0F2B	BURMESE HPA OUHTOU
				= capped hpa
		0F2C	BURMESE BA LAHCAI
				= top-indented ba
		0F2D	BURMESE BA GOUN
				= hump-backed ba
		0F2E	BURMESE LETTER MA
		0F2F	BURMESE YA PALE
				= supine ya
		0F30	BURMESE YA GAU
				= crooked ya
				x Burmese digit hkun ni -> 0F6D
		0F31
		0F32	BURMESE LETTER LA
		0F33	BURMESE LA JI
				= great la
		0F34
		0F35	BURMESE LETTER WA
				x Burmese digit thoun nya -> 0F66
		0F36	BURMESE LETTER SANSKRIT SHA
				Sanskrit
		0F37	BURMESE LETTER SANSKRIT SSA
				Sanskrit
		0F38	BURMESE LETTER THA
		0F39	BURMESE LETTER HA
		
		0F3A
		0F3B
		0F3C
		0F3D
		
		@		Vowel Signs
		0F3E	BURMESE YEI HCA
				= line drawn down
		0F3F	BURMESE LOUNJI TIN
				= big circle put on
		0F40	BURMESE LOUNJI TIN HSAN HKA
				= big circle put on with a grain of rice
		0F41	BURMESE TAHCAUN NGIN
				= one stroke drawn out
		0F42	BURMESE HNAHCAUN NGIN
				= two strokes drawn out
		0F43	BURMESE VOWEL SIGN VOCALIC R
				Sanskrit
		0F44	BURMESE VOWEL SIGN VOCALIC RR
				Sanskrit
		0F45
		0F46
		0F47	BURMESE THAWEI HTOU
				= thrust in front
		0F48	BURMESE NAU PYI
				= thrown backwards
		0F49
		0F4A
		0F4B	BURMESE VOWEL SIGN O
		0F4C
		
		@		Virama
		0F4D	BURMESE VIRAMA
				x Burmese atha -> 0F52
		0F4E
		0F4F
		0F50
		
		@		Tone Marks
		0F51	BURMESE AUKA MYI
				= stopped below
		0F52	BURMESE ATHA
				= killer
				= hyei htou, "thrust forward"
				x Burmese virama -> 0F4D
		0F53
		0F54
		0F55
		0F56
		0F57
		0F58
		0F59
		0F5A
		0F5B
		0F5C
		0F5D
		0F5E
		
		@		Consonants
		0F5F	BURMESE NYA GALEI
				= little nya
				x Burmese ehkaya u -> 0F09
		
		@		Vowel Signs
		0F60	BURMESE LETTER VOCALIC RR
				Sanskrit
		0F61	BURMESE LETTER VOCALIC LL
				Sanskrit
		0F62	BURMESE VOWEL SIGN VOCALIC L
				Sanskrit
		0F63	BURMESE VOWEL SIGN VOCALIC LL
				Sanskrit
		0F64
		0F65
		
		@		Digits
		0F66	BURMESE DIGIT THOUN NYA
				= digit zero
				x Burmese wa -> 0F35
		0F67	BURMESE DIGIT TI
				= digit one
		0F68	BURMESE DIGIT HNI
				= digit two
		0F69	BURMESE DIGIT THOUN
				= digit three
		0F6A	BURMESE DIGIT LEI
				= digit four
				x Burmese symbol lagaun -> 0F73
		0F6B	BURMESE DIGIT NGA
				= digit five
		0F6C	BURMESE DIGIT HCAU
				= digit six
		0F6D	BURMESE DIGIT HKUN NI
				= digit seven
				x Burmese ya gau -> 0F30
		0F6E	BURMESE DIGIT HYI
				= digit eight
				x Burmese ga nge -> 0F17
		0F6F	BURMESE DIGIT KOU
				= digit nine
		
		@		Various symbols
		0F70	BURMESE SYMBOL YWEI
		0F71	BURMESE SYMBOL EHKAYA I
		0F72	BURMESE SYMBOL HNAI
		0F73	BURMESE SYMBOL LAGAUN
				x Burmese digit lei -> 0F6A
		0F74
		0F75
		0F76
		0F77
		0F78
		0F79
		0F7A
		0F7B
		0F7C
		0F7D
		0F7E
		0F7F

Khmer U+0F80 -> 0FDF

Cambodian, also known as Khmer, is the official language of Cambodia.
Mutually intelligible dialects are also spoken in northeastern
Thailand and the Mekong Delta region of Vietnam. While Khmer is not
itself an Indo-European language, much of its administrative, military
and literary vocabulary is borrowed from Sanskrit.  With
the advent of Theravada Buddhism at the beginning of the fifteenth
century, Khmer began to borrow Pali words, and continues to use
Pali as a major source of neologisms today.  There is also much
cross-borrowing between Thai and Khmer, as well as a relatively
recent infusion of French words and a smattering of Chinese and
Vietnamese loanwords in colloquial speech.

The Khmer script, called a'saa kmae ("Khmer letters"), is descended,
like Thai, Lao, Burmese, Old Mon and others, from the Brahmi script
of South India. The exact geographical source, or
possibly sources, has not been determined, but there is a great
similarity between the earliest inscriptions in the region and the
Pallava script of the Coromandel coast of India.

Structurally, the Khmer script stays very close to its southern
Brahmi origins. There is a set of 35 consonants, each with an
inherent vowel sound. Additional signs are placed before, above,
below and after the consonants to indicate vowels other than the
inherent one. Consonant clusters are represented by conjunct
consonants, where the first consonant of the cluster maintains its
full form and succeeding consonants are written as subscripts.

The Khmer language has a much richer set of vowels than the Indo-Aryan
languages for which the ancestral script was used. By the same
token, there is a much smaller set of consonant sounds. The Khmer
script is adapted to the language by adding extra vowel signs and
various diacritic marks, and by using the choice of consonant as
well as of vowel signs to determine the particular vowel sound
represented. Thus most vowel signs do not have a single value but
must be interpreted in the context of the associated consonant.
This is very similar to the situation in Thai and Lao, where
different consonant symbols have the same sound but encode different
tones.

There are two basic styles of script in modern Khmer, each with
two major variations. They are the a'saa criang ("slanted script")
and the a'saa muul ("round script"). There is no fundamental
structural difference between them, however, so the "standing"
variant of the slanted script is chosen here as representative.

Representation:

The Khmer script follows the model of Devanagari and other Indic
scripts. The basic unit is the syllabic cluster consisting of a
series of consonants separated by WIRIAM (0FC5), followed by one
or both of the pronunciation shifters MUSEKATOAN (0FCA) and TRUYSAP
(0FCB), followed by an optional vowel, followed by diacritics and
quality marks. For example, the word /knyom/, "I," is coded as the
string 0F81+0FC5+0F89+0FB5+0FC2.

In cases where there is already some other superscript in the
cluster, the two pronunciation shifters are written as the subscript
symbol kbiah kraom, which looks much like VOWEL SIGN O. This vowel
sign is not to be used for this purpose. It is the responsibility
of the presentation software to select the correct appearance of
the shifter. For example, /sii/, "to eat," should be coded as
0F9F+0FCB+0FB4, not as 0F9F+0FB7+0FB4.
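
The cluster structure described above can be checked mechanically.
The Python sketch below is one possible reading, not a definitive
grammar: it assumes the shifters are optional (the /knyom/ example
has none), takes the character classes from the draft block ranges,
and works on lists of draft code points. All function names are
hypothetical.

```python
WIRIAM = 0x0FC5
SHIFTERS = {0x0FCA, 0x0FCB}  # MUSEKATOAN, TRUYSAP


def is_consonant(cp):
    return 0x0F80 <= cp <= 0x0FA2


def is_vowel_sign(cp):
    return 0x0FB2 <= cp <= 0x0FC1


def is_mark(cp):
    # Quality marks (0FC2-0FC4) and diacritics (0FC8-0FCF) other than the shifters.
    return 0x0FC2 <= cp <= 0x0FC4 or (0x0FC8 <= cp <= 0x0FCF and cp not in SHIFTERS)


def is_syllabic_cluster(seq):
    """Check: consonant (WIRIAM consonant)* shifter* vowel? mark*."""
    i, n = 0, len(seq)
    if i == n or not is_consonant(seq[i]):
        return False
    i += 1
    while i + 1 < n and seq[i] == WIRIAM and is_consonant(seq[i + 1]):
        i += 2
    while i < n and seq[i] in SHIFTERS:
        i += 1
    if i < n and is_vowel_sign(seq[i]):
        i += 1
    while i < n and is_mark(seq[i]):
        i += 1
    return i == n


knyom = [0x0F81, 0x0FC5, 0x0F89, 0x0FB5, 0x0FC2]
sii = [0x0F9F, 0x0FCB, 0x0FB4]
sii_wrong = [0x0F9F, 0x0FB7, 0x0FB4]  # VOWEL SIGN O misused as a shifter
print(is_syllabic_cluster(knyom), is_syllabic_cluster(sii),
      is_syllabic_cluster(sii_wrong))
```

Under this reading the discouraged spelling of /sii/ is rejected
because it carries two vowel signs in one cluster.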

RAWBAT (0FCC) historically corresponds to the Devanagari repha,
that is, to an initial /r/. It has lost this function in Khmer and
instead is considered a simple diacritic similar to TOANDAKHIAT in
both reading and sorting. There are also many cases of consonant
clusters with initial /r/ that should be written with a full RAW
and not a RAWBAT, so a separate character is provided for it.

Khmer writing does not normally separate words with white space as
European languages do. If it is desirable to represent word boundaries
in the text stream, for example, for use by automatic line layout
algorithms, U+200B, ZERO WIDTH SPACE, should be used.

Two relatively rare symbols in modern usage are not included here.
They are pnek moan, the "cock's eye," and "komout." They are
identical in form and function to the Thai characters FONGMAN and
KHOMUT, respectively, so the latter two should be used when these
symbols are needed.

Block Structure:

	U+0F80	to	U+0FA2	Consonants
	U+0FA3	to	U+0FB1	Independent Vowels
	U+0FB2	to	U+0FC1	Vowel Signs
	U+0FC2	to	U+0FC4	Quality Marks
	U+0FC5				Virama
	U+0FC6	to	U+0FC7	Unassigned
	U+0FC8	to	U+0FCF	Diacritics
	U+0FD0	to	U+0FD9	Digits
	U+0FDA	to	U+0FDE	Symbols and Punctuation
	U+0FDF				Unassigned

ISSUES:  The independent vowels LETTER AO TYPE 2 and LETTER AW TYPE
2 are variant forms of LETTER AO TYPE 1 and LETTER AW TYPE 1,
respectively. It is not believed that they are in free variation:
LETTER AO TYPE 2 occurs only in the combination "aoy," while LETTER
AW TYPE 2 is only cited in a few references, but not used. There
is an opportunity to unify these pairs. Note that LETTER UW and
LETTER OU are also listed as variants, but they are actually not
in free variation, so both must be provided.

It may be desirable to add the vowel sign AM instead of using the
combination AA+NIKAHAT. This would simplify a common special case
in sorting.

The punctuation marks KHAN and BARIYAOSAN may be unified with some
other characters, just as Indic dandas have been. A likely candidate
for the former is Thai PAI YAN NOI. Such a unification, as well as
that of the "cock's eye" and "cow piss" characters, presents an
interesting challenge to the font mechanism of a Unicode rendering
engine: Different glyphs may be required for the same character
when used in conjunction with different scripts. This seems like
a needless complication for what are otherwise simple, non-combining
characters.

It may be more desirable from a political standpoint to follow
either the Thai or the ISCII coding schemes. Sample charts have
been produced showing how this may be done. If this is indeed the
path taken, those charts should be expanded to include all characters
in this proposal.

The vowel encoding takes an ISCII-like approach, coding as single
characters vowels that consist of two or more disjoint glyphs. If
vowel symbols are instead decomposed into their constituent glyphs
and those coded separately, there is then no advantage to the code
point assignments made here. In such a case, the assignments should
be made according to the Thai pattern.

The romanization scheme here is rather ad hoc. If a more commonly
accepted one exists, the character names should be changed accordingly.

Draft 03 October 1992; rev 92/11/25

		DRAFT KHMER CHARACTER NAMES

		@		Consonants
		0F80	KHMER LETTER KAA
		0F81	KHMER LETTER KHAA
		0F82	KHMER LETTER KAW
		0F83	KHMER LETTER KHAW
		0F84	KHMER LETTER NGAW
		0F85	KHMER LETTER CAA
		0F86	KHMER LETTER CHAA
		0F87	KHMER LETTER CAW
		0F88	KHMER LETTER CHAW
		0F89	KHMER LETTER NYAW
		0F8A	KHMER LETTER DAA
		0F8B	KHMER LETTER RETROFLEX THAA
		0F8C	KHMER LETTER DAW
		0F8D	KHMER LETTER RETROFLEX THAW
		0F8E	KHMER LETTER NAA
		0F8F	KHMER LETTER TAA
		0F90	KHMER LETTER THAA
		0F91	KHMER LETTER TAW
		0F92	KHMER LETTER THAW
		0F93	KHMER LETTER NAW
		0F94	KHMER LETTER BAA
		0F95	KHMER LETTER PHAA
		0F96	KHMER LETTER PAW
		0F97	KHMER LETTER PHAW
		0F98	KHMER LETTER MAW
		0F99	KHMER LETTER YAW
		0F9A	KHMER LETTER RAW
		0F9B	KHMER LETTER LAW
		0F9C	KHMER LETTER WAW
		0F9D	KHMER LETTER SHAA
				Sanskrit
		0F9E	KHMER LETTER SSAA
				Sanskrit
		0F9F	KHMER LETTER SAA
		0FA0	KHMER LETTER HAA
		0FA1	KHMER LETTER LAA
		0FA2	KHMER LETTER QAA
				glottal stop
		
		@		Independent Vowels
		0FA3	KHMER LETTER E
		0FA4	KHMER LETTER EY
		0FA5	KHMER LETTER O
		0FA6	KHMER LETTER UW
		0FA7	KHMER LETTER OU
		0FA8	KHMER LETTER AE
		0FA9	KHMER LETTER AY
		0FAA	KHMER LETTER AO TYPE 1
		0FAB	KHMER LETTER AO TYPE 2
		0FAC	KHMER LETTER AW TYPE 1
		0FAD	KHMER LETTER AW TYPE 2
		0FAE	KHMER LETTER RIK
		0FAF	KHMER LETTER RII
		0FB0	KHMER LETTER LIK
		0FB1	KHMER LETTER LII
		
		@		Vowel Signs
		0FB2	KHMER VOWEL SIGN AA
		0FB3	KHMER VOWEL SIGN E
		0FB4	KHMER VOWEL SIGN EY
		0FB5	KHMER VOWEL SIGN U
		0FB6	KHMER VOWEL SIGN UI
		0FB7	KHMER VOWEL SIGN O
				x kbiah kraom
		0FB8	KHMER VOWEL SIGN OU
		0FB9	KHMER VOWEL SIGN UA
		0FBA	KHMER VOWEL SIGN AU
		0FBB	KHMER VOWEL SIGN IE
		0FBC	KHMER VOWEL SIGN IU
		0FBD	KHMER VOWEL SIGN EI
		0FBE	KHMER VOWEL SIGN AE
		0FBF	KHMER VOWEL SIGN AY
		0FC0	KHMER VOWEL SIGN AO
		0FC1	KHMER VOWEL SIGN AW
		
		@		Quality Marks
		0FC2	KHMER SIGN NIKAHAT
				= sra am
				= damla
		0FC3	KHMER SIGN REAHMUK
				= wihsakea
				= wihsancani
		0FC4	KHMER SIGN YUKALEAPINTU
				= coc pi
		
		@		Virama
		0FC5	KHMER SIGN WIRIAM
				virama
		0FC6
		0FC7
		
		@		Diacritics
		0FC8	KHMER VOWEL SIGN BANTA
				= sangkat
				= reahsannya
		0FC9	KHMER VOWEL SIGN SANYOK SANNYA
		0FCA	KHMER SIGN MUSEKATOAN
				= tmin kandao
				vowel pronunciation shifter
		0FCB	KHMER SIGN TRUYSAP
				vowel pronunciation shifter
		0FCC	KHMER SIGN RAWBAT
				= rephea
		0FCD	KHMER SIGN TOANDAKHIAT
				= samlap
				= patdesaet
		0FCE	KHMER SIGN KAKABAT
				= caung kaek
		0FCF	KHMER SIGN AHSDA
				= leik prabuy
		
		@		Digits
		0FD0	KHMER DIGIT ZERO
		0FD1	KHMER DIGIT ONE
		0FD2	KHMER DIGIT TWO
		0FD3	KHMER DIGIT THREE
		0FD4	KHMER DIGIT FOUR
		0FD5	KHMER DIGIT FIVE
		0FD6	KHMER DIGIT SIX
		0FD7	KHMER DIGIT SEVEN
		0FD8	KHMER DIGIT EIGHT
		0FD9	KHMER DIGIT NINE
		
		@		Symbols and Punctuation
		0FDA	KHMER CURRENCY SYMBOL RIAL
		0FDB	KHMER LEIK TO
				= amendit sannya
				repetition sign
		0FDC	KHMER CAMNOC PI KUH
				x (division sign -> 00F7)
				x (tibetan comma -> 1038)
				colon, semicolon
		0FDD	KHMER KHAN
				full stop, ellipsis, abbreviation
		0FDE	KHMER BARIYAOSAN
				end of section
		0FDF

Proposal for Ethiopian Encoding

The Ethiopian proposal consists of a list of questions/issues, a
chart, a character names list, and a block introduction.  The
content is based on UTC/1991-026 On the Extended Ethiopic Alphabet
of February 26, 1991 and its later adjustments by Lloyd Anderson,
unioned with features of the Xerox Amharic implementation by Joe
Becker.  The character names are based on those in DP 10646, which
came from WG2/N459 "Ethiopian character sets" by Michael Mann.

QUESTIONS FOR REVIEWERS:

1. Is this collection missing any important, well-established
"extension" letters for writing less-common languages?

2. Are the glyphs in the charts appropriate?

3. Can you supply documentation to support the specification of
the following two characters?
	121D            ETHIOPIAN CONSONANT GG
	1237            ETHIOPIAN VOWEL PHONETIC AE
In particular, does U+1237 occur (as a vowel, not as a mark of "w"
rounding) on any consonant other than U+1211?  Should the combination
of U+1237 with U+1211 simply be encoded as a distinct consonant (to
be added between current U+1211 and U+1212)?

4. Are the following characters specified correctly?
	1256            ETHIOPIAN COMMA
				modern usage like colon
	1257            ETHIOPIAN COLON
				modern usage like semicolon
	1259            ETHIOPIAN NEW COMMA
				modern usage

5. Do syllable glyph variants ever occur distinctively within the
same text, or are they merely font design choices like the glyph
variants of Latin "a" or "g"?

ISSUES:

* In this design, no provision is made for coding the syllable
glyphs; it is intended that they be excluded from Unicode/10646
BMP.  If we learn that glyph variants may occur distinctively, then
we may need to define some additional means for specifying glyph
variants within plain text.

* Should we define an Ethiopian White Space character which can be
easily guaranteed to have the same (minimum) width as U+1255
ETHIOPIAN WORDSPACE? Current opinion is that this is unnecessary.

Ethiopian  (U+1200 -> U+125F)

The Ethiopian script, which originally evolved for the archaic
language Ge'ez, is currently used to write several languages of
Eastern Africa, including Amharic, Tigre, and Oromo.  The script
continues to be extended for writing languages that have little
tradition of printed typography; new characters to cover such
extensions may be added to the standard later as definitive information
about them becomes available.

Encoding Principles.  The visible glyphs of the Ethiopian script
are not the objects shown in the encoding chart.  The elements of
the encoding are the letters of the underlying alphabet; thus the
encoding is (roughly) phonetic rather than glyphic.  These alphabetic
letters are expected to be the units of keyboard input and all text
representation short of rendering.

Rendering.  Each visible glyph of the Ethiopian script represents
a syllable rather than a single letter.  The syllables can all be
treated as simple (consonant + vowel) pairs, so that each glyph
can be thought of as a ligature of two underlying letters.  Thus
the syllable "MA" would be represented in the encoding as U+1203
ETHIOPIAN CONSONANT M plus U+1233 ETHIOPIAN VOWEL A.  The syllable
glyphs themselves are not intended to be incorporated in this
encoding.  The individual consonant or vowel codes should not be
isolated (i.e. unpaired) in normal final text, and their rendering
in such circumstances is an option of the implementation.  One
possibility is to use special symbols for the individual letters,
as is done in the code charts here.

Chart Symbols Representing Individual Letters.  Since the Ethiopian
glyphs are normally syllabic, the script provides no unambiguous
way of representing the underlying individual letters.  Therefore
in the code charts and names list, a convention has been adopted
in which consonant letters are represented by their "first" form
surrounded by a dotted circle, and vowel letters are represented
by a typical glyph fragment attached to a dotted circle.  This is
not intended to imply direct glyphic composition of those forms,
but merely to signify the underlying letters.

Encoding/Rendering of "First Form" Syllables.  The circled consonants
in the charts U+1200 -> U+1224 are underlying letters; they should
not be confused with rendered full first form syllable glyphs.  As
with all glyphs in the script, the first form syllables are encoded
as simple (consonant + vowel) pairs.  Thus the glyph "MAE" would
be represented in the encoding as U+1203 ETHIOPIAN CONSONANT M plus
U+1230 ETHIOPIAN VOWEL AE.  This pair would then be rendered via
a "ligature" MAE whose appearance would resemble the chart symbol
for U+1203 ETHIOPIAN CONSONANT M without the circle.

Encoding/Rendering of Lone Consonants ("Sixth Form" Syllables).
The sixth form syllable glyphs are sometimes pronounced as though
they were lone consonants (i.e. the vowel is dropped in speech),
but this does not change their encoding. As with all glyphs in the
script, the sixth form syllables are encoded as simple (consonant
+ vowel) pairs.  Thus the spoken lone consonant "M" would be
represented in the encoding as U+1203 ETHIOPIAN CONSONANT M plus
U+1235 ETHIOPIAN VOWEL SCHWA.
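
The pairing model behind the MA, MAE and lone M examples above can
be sketched in Python as follows. This is an illustration only: it
handles letters, not numbers, punctuation or diacritical marks, and
the function names are hypothetical. The ranges come from the draft
block structure.

```python
CONSONANTS = range(0x1200, 0x1225)  # U+1200..U+1224
VOWELS = range(0x1230, 0x123E)      # U+1230..U+123D
VOWEL_GAP = 0x1239                  # intentional gap in the vowel range


def is_consonant(cp):
    return cp in CONSONANTS


def is_vowel(cp):
    return cp in VOWELS and cp != VOWEL_GAP


def syllables(seq):
    """Group a letter sequence into (consonant, vowel) pairs, one per glyph.

    Raises ValueError on an isolated consonant or vowel, since such
    letters should not occur unpaired in normal final text.
    """
    if len(seq) % 2:
        raise ValueError("odd number of letters: some letter is unpaired")
    pairs = []
    for c, v in zip(seq[0::2], seq[1::2]):
        if not (is_consonant(c) and is_vowel(v)):
            raise ValueError(f"not a consonant+vowel pair: {c:04X} {v:04X}")
        pairs.append((c, v))
    return pairs


ma = [0x1203, 0x1233]      # CONSONANT M + VOWEL A
mae = [0x1203, 0x1230]     # CONSONANT M + VOWEL AE ("first form")
lone_m = [0x1203, 0x1235]  # CONSONANT M + VOWEL SCHWA ("sixth form")
print(syllables(ma + mae + lone_m))
```

A renderer would then map each pair to a single syllable glyph,
treating the pair as a ligature.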

Variant Glyph Forms.  The script sometimes provides different glyph
forms to represent the same syllables.  It is assumed that these
alternatives do not vary freely, in other words, that it is appropriate
for a given font to contain only one selected glyph form for each
syllable.  Therefore no mechanism is provided for specifying glyph
variants within a plain text stream of characters.  The situation
is analogous to that of the glyph variants of Latin "a" or "g".

Letter Names.  The Ethiopian script often has multiple letters
corresponding to the same Latin letter, making it difficult to
assign unique Latin names. Therefore the names list makes use of
certain devices (such as doubling a Latin letter in the name) merely
to create uniqueness; this has no relation to the phonetics of the
Ethiopian letters.

Encoding Order and Sorting.  The order of the letters in the encoding
is based on the traditional alphabetical order.  This order differs
from the sort order used for one or another language, if only
because in many languages various pairs or triplets of letters are
treated as equivalent in the first sorting pass.  For example, an
Amharic dictionary is likely to start out with a section headed by
three letters:

    U+1200 ETHIOPIAN CONSONANT H
    U+1202 ETHIOPIAN CONSONANT HH
    U+120E ETHIOPIAN CONSONANT X

Thus the encoding order cannot and does not implement a collation
procedure for any particular language using this script.
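
To make the point concrete, the Python sketch below folds the three
letters named above into one first-pass key. It is a toy, not a real
Amharic collation: only the H / HH / X equivalence is taken from the
text, the tie-break by code point is an assumption, and the names
PRIMARY_EQUIV, primary_key and sort_words are hypothetical.

```python
# First-pass equivalences; only the H / HH / X group is given in the text.
PRIMARY_EQUIV = {0x1202: 0x1200, 0x120E: 0x1200}


def primary_key(seq):
    """First-pass sort key: fold equivalent consonants together."""
    return [PRIMARY_EQUIV.get(cp, cp) for cp in seq]


def sort_words(words):
    """Sort by primary key, breaking ties by raw code point order (assumed)."""
    return sorted(words, key=lambda w: (primary_key(w), list(w)))


words = [
    (0x120E, 0x1233),  # X + A
    (0x1200, 0x1231),  # H + U
    (0x1201, 0x1233),  # L + A
    (0x1202, 0x1230),  # HH + AE
]
for w in sort_words(words):
    print("+".join(f"{cp:04X}" for cp in w))
```

With the folding applied, all three of the H, HH and X words sort
ahead of the L word, whereas plain code point order would place L
between them.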

Space Characters.  The traditional word separator is U+1255 ETHIOPIAN
WORDSPACE ( : ), but in modern usage a plain white wordspace is
becoming common.  The ASCII character U+0020 SPACE is suitable for
the latter usage, although its (minimum) width is not guaranteed
to be the same as that of the traditional wordspace.

Diacritical Marks.  The mark U+030E NON-SPACING DOUBLE VERTICAL
LINE ABOVE may occasionally be used to indicate emphasis or
gemination.  If this or other diacritical marks are used, they
follow the vowel letter of the syllable to which they apply.

Encoding Structure.  The Unicode block for the Ethiopian script is
divided into the following ranges:

    U+1200	to	U+1224	Consonant phonetic letters
    U+1225	to	U+122F	Currently unassigned
    U+1230	to	U+123D	Vowel phonetic letters (U+1239 is an intentional gap)
    U+123E	to	U+123F	Currently unassigned
    U+1240	to	U+1254	Numbers (U+1240 is an intentional gap)
    U+1255	to	U+125B	Punctuation
    U+125C	to	U+125F	Currently unassigned

Draft October 30, 1992; rev 93/01/08

	ETHIOPIAN CHARACTER NAMES LIST
	
	@		Consonant phonetic letters
	1200	ETHIOPIAN CONSONANT H
	1201	ETHIOPIAN CONSONANT L
	1202	ETHIOPIAN CONSONANT HH
	1203	ETHIOPIAN CONSONANT M
	1204	ETHIOPIAN CONSONANT SZ
	1205	ETHIOPIAN CONSONANT R
	1206	ETHIOPIAN CONSONANT S
	1207	ETHIOPIAN CONSONANT SH
	1208	ETHIOPIAN CONSONANT Q
	1209	ETHIOPIAN CONSONANT QH
	120A	ETHIOPIAN CONSONANT B
	120B	ETHIOPIAN CONSONANT V
	120C	ETHIOPIAN CONSONANT T
	120D	ETHIOPIAN CONSONANT C
	120E	ETHIOPIAN CONSONANT X
	120F	ETHIOPIAN CONSONANT N
	1210	ETHIOPIAN CONSONANT NY
	1211	ETHIOPIAN CONSONANT GLOTTAL
	1212	ETHIOPIAN CONSONANT K
	1213	ETHIOPIAN CONSONANT XX
	1214	ETHIOPIAN CONSONANT W
	1215	ETHIOPIAN CONSONANT NULL
	1216	ETHIOPIAN CONSONANT Z
	1217	ETHIOPIAN CONSONANT ZH
	1218	ETHIOPIAN CONSONANT Y
	1219	ETHIOPIAN CONSONANT D
	121A	ETHIOPIAN CONSONANT DD
			Oromo
	121B	ETHIOPIAN CONSONANT J
	121C	ETHIOPIAN CONSONANT G
	121D	ETHIOPIAN CONSONANT GG
			Bilen
	121E	ETHIOPIAN CONSONANT TH
	121F	ETHIOPIAN CONSONANT CH
	1220	ETHIOPIAN CONSONANT PH
	1221	ETHIOPIAN CONSONANT TS
	1222	ETHIOPIAN CONSONANT TZ
	1223	ETHIOPIAN CONSONANT F
	1224	ETHIOPIAN CONSONANT P
	1225
	1226
	1227
	1228
	1229
	122A
	122B
	122C
	122D
	122E
	122F
	
	@		Vowel phonetic letters
	1230	ETHIOPIAN VOWEL AE
	1231	ETHIOPIAN VOWEL U
	1232	ETHIOPIAN VOWEL I
	1233	ETHIOPIAN VOWEL A
	1234	ETHIOPIAN VOWEL E
	1235	ETHIOPIAN VOWEL SCHWA
	1236	ETHIOPIAN VOWEL O
	1237	ETHIOPIAN VOWEL PHONETIC AE
			used primarily with U+1211 ETHIOPIAN CONSONANT GLOTTAL
	1238	ETHIOPIAN VOWEL WAE
	1239
	123A	ETHIOPIAN VOWEL WI
	123B	ETHIOPIAN VOWEL WA
	123C	ETHIOPIAN VOWEL WE
	123D	ETHIOPIAN VOWEL W
	123E
	123F
	
	@		Numbers
	1240
	1241	ETHIOPIAN NUMBER ONE
	1242	ETHIOPIAN NUMBER TWO
	1243	ETHIOPIAN NUMBER THREE
	1244	ETHIOPIAN NUMBER FOUR
	1245	ETHIOPIAN NUMBER FIVE
	1246	ETHIOPIAN NUMBER SIX
	1247	ETHIOPIAN NUMBER SEVEN
	1248	ETHIOPIAN NUMBER EIGHT
	1249	ETHIOPIAN NUMBER NINE
	124A	ETHIOPIAN NUMBER TEN
	124B	ETHIOPIAN NUMBER TWENTY
	124C	ETHIOPIAN NUMBER THIRTY
	124D	ETHIOPIAN NUMBER FORTY
	124E	ETHIOPIAN NUMBER FIFTY
	124F	ETHIOPIAN NUMBER SIXTY
	1250	ETHIOPIAN NUMBER SEVENTY
	1251	ETHIOPIAN NUMBER EIGHTY
	1252	ETHIOPIAN NUMBER NINETY
	1253	ETHIOPIAN NUMBER HUNDRED
	1254	ETHIOPIAN NUMBER TEN THOUSAND
	
	@		Punctuation
	1255	ETHIOPIAN WORDSPACE
	1256	ETHIOPIAN COMMA
			modern usage like colon
	1257	ETHIOPIAN COLON
			modern usage like semicolon
	1258	ETHIOPIAN PERIOD
	1259	ETHIOPIAN NEW COMMA
			modern usage
	125A	ETHIOPIAN QUESTION MARK
			archaic
	125B	ETHIOPIAN PARAGRAPH SEPARATOR
			archaic