HOME arrow SOLUTIONS arrow Glossary
Glossary

Character Encoding

A character encoding consists of a code that pairs a sequence of characters from a given character set (sometimes referred to as code page) with something else, such as a sequence of natural numbers, octets or electrical pulses in order to facilitate the storage of text in computers and the transmission of text through telecommunication networks. Common examples include Morse code, which encodes letters of the Latin alphabet as series of long and short depressions of a telegraph key; and ASCII, which encodes letters, numerals, and other symbols, both as integers and as 7-bit binary versions of those integers, generally extended with an extra zero-bit to facilitate storage in 8-bit bytes (octets).
In earlier days of computing, the introduction of character sets such as ASCII (1963) and EBCDIC (1964) began the process of standardization. The limitations of such sets soon became apparent, and a number of ad-hoc methods developed to extend them. The need to support multiple writing systems (Languages), including the CJK family of East Asian scripts, required support for a far larger number of characters and demanded a systematic approach to character encoding rather than the previous ad hoc approaches.
DocFamily supports all possible types of encoding to process input-data in the right form. XML and Java delivers the core technology to DocFamily. The most commonly used encoding is the Unicode encoding (UTF-8, UTF-16, etc.).


< back