-sequence>.
-
-Whether to call these combining character sequences "characters"
-depends on your point of view. If you are a programmer, you probably
-would tend towards seeing each element in the sequences as one unit,
-or "character". The whole sequence could be seen as one "character",
-however, from the user's point of view, since that's probably what it
-looks like in the context of the user's language.
-
-With this "whole sequence" view of characters, the total number of
-characters is open-ended. But in the programmer's "one unit is one
-character" point of view, the concept of "characters" is more
-deterministic. In this document, we take that second point of view:
-one "character" is one Unicode code point, be it a base character or
-a combining character.
-
-For some combinations, there are I<precomposed> characters.
-C<LATIN CAPITAL LETTER A WITH ACUTE>, for example, is defined as
-a single code point. These precomposed characters are, however,
-only available for some combinations, and are mainly
-meant to support round-trip conversions between Unicode and legacy
-standards (like the ISO 8859). In the general case, the composing
-method is more extensible. To support conversion between
-different compositions of the characters, various I<normalization
-forms> to standardize representations are also defined.
+sequence>. Some non-Western languages require more complicated
+models, so Unicode created the I<grapheme cluster> concept, which was
+later refined into the I<extended grapheme cluster>. For
+example, a Korean Hangul syllable is considered a single logical
+character, but most often consists of three actual
+Unicode characters: a leading consonant followed by an interior vowel followed
+by a trailing consonant.
+
+Whether to call these extended grapheme clusters "characters" depends on your
+point of view. If you are a programmer, you probably would tend towards seeing
+each element in the sequences as one unit, or "character". However, from
+the user's point of view, the whole sequence could be seen as one
+"character" since that's probably what it looks like in the context of the
+user's language. In this document, we take the programmer's point of
+view: one "character" is one Unicode code point.
+
+For some combinations of base character and modifiers, there are
+I<precomposed> characters. There is a single character equivalent, for
+example, for the sequence C<LATIN CAPITAL LETTER A> followed by
+C<COMBINING ACUTE ACCENT>. It is called C<LATIN CAPITAL LETTER A WITH
+ACUTE>. These precomposed characters are, however, only available for
+some combinations, and are mainly meant to support round-trip
+conversions between Unicode and legacy standards (like ISO 8859). Using
+sequences, as Unicode does, means that fewer basic building blocks
+(code points) are needed to express many more potential grapheme clusters. To
+support conversion between equivalent forms, various I<normalization
+forms> are also defined. Thus, C<LATIN CAPITAL LETTER A WITH ACUTE> is
+in I<Normalization Form Composed> (abbreviated NFC), and the sequence
+C<LATIN CAPITAL LETTER A> followed by C<COMBINING ACUTE ACCENT>
+represents the same character in I<Normalization Form Decomposed> (NFD).