Mapping XML Named Character References to Unicode Characters


Characters & Code Points

Character sets provide mappings between numeric code points and the semantics of characters at each code point.

  • ISO 10646 (1990): Universal Character Set/UCS.
  • Unicode 1.0 (1991).
  • Unicode 2.0 (1996).
  • Unicode 5.2 (2009).

    Unicode 2.0 (1996) exactly matches the characters/code points defined in ISO 10646-1 (UCS, 1993), and since then their development has stayed aligned.
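The code point/character relationship described above can be inspected directly. The following is a minimal sketch using Python's standard library; the helper name `describe` is hypothetical, chosen only for illustration, and the character names come from whichever Unicode version the Python build ships with.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    code_point = ord(ch)                       # character -> numeric code point
    name = unicodedata.name(ch, "<unnamed>")   # code point -> character name
    return f"U+{code_point:04X} {name}"


if __name__ == "__main__":
    print(describe("π"))   # U+03C0 GREEK SMALL LETTER PI
    print(chr(0x03C0))     # numeric code point -> character
```

The character set supplies both halves of the mapping: `ord`/`chr` move between a character and its number, while `unicodedata.name` exposes the meaning attached to that number.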

Mapping named entity references to characters

Markup languages (e.g. DocBook, TEI, MathML, (X)HTML) define mappings between named character entity references (e.g. &pi;) and Unicode code points.

Named character reference | Numeric character reference (hex) | Numeric character reference (dec) | Unicode character
&pi;                      | &#x3C0;                           | &#960;                            | π
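The four equivalent forms can be derived from one another mechanically. The sketch below assumes Python's bundled HTML 4 entity table (`html.entities.name2codepoint`) as a stand-in for whatever entity set a given markup language defines; the function name `reference_forms` is hypothetical.

```python
from html import unescape
from html.entities import name2codepoint


def reference_forms(name: str) -> dict:
    """Return the named, hex, decimal and literal forms for an entity name.

    Assumes the name exists in Python's HTML 4 entity table; a KeyError is
    raised otherwise.
    """
    code_point = name2codepoint[name]
    return {
        "named": f"&{name};",
        "hex": f"&#x{code_point:X};",
        "dec": f"&#{code_point};",
        "char": chr(code_point),
    }


if __name__ == "__main__":
    forms = reference_forms("pi")
    print(forms)
    # All three reference forms resolve to the same character.
    print({unescape(forms[k]) for k in ("named", "hex", "dec")})
```

Only the named form depends on an entity set being declared; the two numeric forms refer to the code point directly and work in any XML document.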

Entity sets

The W3C's "XML Entity Definitions for Characters" specification is at Last Call Working Draft status, and the review deadline passed last week. With luck it will advance to an official W3C Recommendation soon.