Learn what the GSM-7 Extension Table (GSM_7BIT_EX) is, how the escape mechanism works, which extra characters are included, and how these affect SMS message length and encoding on GSM networks.
GSM-7 Extension Table (GSM_7BIT_EX): Full Guide
GSM-7 is the standard character encoding for SMS messages on GSM networks, efficiently packing the most common Latin letters and symbols into 7 bits per character. However, some characters needed for messaging—such as curly brackets, square brackets, and the Euro sign—are not present in the main GSM-7 alphabet. To address this, the GSM-7 Default Alphabet Extension Table (often called GSM_7BIT_EX) was introduced.
What is the GSM-7 Extension Table?
The GSM-7 extension table adds a small set of 10 additional characters that are not part of the main 128-character GSM-7 alphabet. These characters are:
- Form feed (FF)
- Caret/circumflex (^)
- Left curly bracket ({)
- Right curly bracket (})
- Backslash (\)
- Left square bracket ([)
- Tilde (~)
- Right square bracket (])
- Vertical bar (|)
- Euro sign (€)
How Does the Escape Mechanism Work?
To encode these extra characters, GSM-7 uses an escape character (hexadecimal value 0x1B) followed by a specific code for the desired character. This two-septet sequence (two 7-bit codes, not two full bytes) signals to the receiving device that the second code should be looked up in the extension table rather than the main alphabet.
Example:
The Euro sign (€) is represented by the sequence 0x1B65, where 0x1B is the escape character and 0x65 is the code for the Euro sign.
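If a device does not support the extension mechanism, it typically renders the escape character as a space and then reads the following code from the main alphabet instead. The Euro sign, for example, appears as a space followed by "e", because 0x65 is the code for "e" in the main table.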
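The escape mechanism can be sketched in a few lines of Python. This is a minimal illustration, not a complete SMS codec: it works on unpacked 7-bit codes and skips bit packing. The names `to_septets`, `from_septets`, and `supports_extension` are hypothetical helpers invented for this example, and the handling of reserved extension codes is not modeled.

```python
# Basic GSM-7 alphabet: a character's index in this string is its 7-bit code.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
ESCAPE = 0x1B

# Extension table: character -> code that follows the escape septet.
GSM7_EXT = {
    "\f": 0x0A, "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65,
}
EXT_BY_CODE = {code: ch for ch, code in GSM7_EXT.items()}


def to_septets(text: str) -> list[int]:
    """Return the unpacked 7-bit codes for text (hypothetical helper)."""
    septets = []
    for ch in text:
        if ch in GSM7_EXT:
            # Extension character: escape septet, then the table code.
            septets += [ESCAPE, GSM7_EXT[ch]]
        elif ch != "\x1b" and ch in GSM7_BASIC:
            septets.append(GSM7_BASIC.index(ch))
        else:
            raise ValueError(f"{ch!r} is not encodable in GSM-7")
    return septets


def from_septets(septets: list[int], supports_extension: bool = True) -> str:
    """Decode 7-bit codes; optionally mimic a device without extension support."""
    out = []
    codes = iter(septets)
    for code in codes:
        if code != ESCAPE:
            out.append(GSM7_BASIC[code])
        elif not supports_extension:
            # Escape shown as a space; the next code is then read
            # from the basic table on the following loop iteration.
            out.append(" ")
        else:
            nxt = next(codes, None)
            if nxt is None:
                out.append(" ")  # dangling escape at end of message
            else:
                # Assumption: unknown codes fall back to the basic table.
                out.append(EXT_BY_CODE.get(nxt, GSM7_BASIC[nxt]))
    return "".join(out)
```

Under these assumptions, `to_septets("€")` yields the codes 0x1B and 0x65, and decoding those two codes with `supports_extension=False` gives a space followed by "e".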
Encoding Impact on SMS Length
A standard SMS message using GSM-7 encoding can contain up to 160 characters, as 140 bytes (octets) are available and each character takes 7 bits: 140 × 8 = 1120 bits, and 1120 / 7 = 160 characters.
However, each character from the extension table takes up two character slots: one for the escape character and one for the actual character code. This means that every time you use an extension character (such as ‘{’, ‘}’, ‘[’, ‘]’, ‘\’, ‘~’, ‘|’, ‘^’, ‘€’, or FF), it counts as two characters toward your SMS’s 160-character limit. If your message contains several of these, the maximum message length decreases proportionally.
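The length arithmetic above can be checked with a short sketch. It assumes the text has already been validated as GSM-7 and only counts septets; `septets_used` and `fits_in_single_sms` are illustrative names, not part of any library.

```python
# The ten extension-table characters; each costs two septets.
GSM7_EXTENSION_CHARS = set("\f^{}\\[~]|€")

# 140 octets * 8 bits = 1120 bits; 1120 / 7 bits per septet = 160.
SINGLE_SMS_SEPTETS = (140 * 8) // 7


def septets_used(text: str) -> int:
    """Count septets, assuming every character is valid GSM-7."""
    return sum(2 if ch in GSM7_EXTENSION_CHARS else 1 for ch in text)


def fits_in_single_sms(text: str) -> bool:
    """True if the text fits in one unconcatenated GSM-7 message."""
    return septets_used(text) <= SINGLE_SMS_SEPTETS
```

For instance, the four visible characters of `[ok]` use six septets, and a message made only of Euro signs fills a single SMS after 80 of them.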
Why Does This Matter?
- Cost: SMS billing is often based on the number of messages sent. If your message exceeds 160 GSM-7 characters (or fewer if you use extension characters), it will be split into multiple messages, increasing cost.
- Compatibility: Not all devices or networks handle the extension table perfectly. Some may display unsupported characters incorrectly if they don’t recognize the escape mechanism.
- Language Support: The extension table makes it possible to include certain symbols without switching to UCS-2 encoding, which would halve the message length to 70 characters per SMS.
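A sender can decide up front which encoding a message needs. The sketch below is one possible check, assuming the default GSM-7 alphabet and extension table with no national language tables; `pick_encoding` is a hypothetical helper, and the returned limits are the single-message figures quoted above.

```python
# Basic GSM-7 alphabet (the escape code at 0x1B is excluded below,
# since it is not a character a sender would type).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENSION_CHARS = set("\f^{}\\[~]|€")

GSM7_ALLOWED = (set(GSM7_BASIC) - {"\x1b"}) | GSM7_EXTENSION_CHARS


def pick_encoding(text: str) -> tuple[str, int]:
    """Return (encoding name, single-SMS limit) for the given text."""
    if all(ch in GSM7_ALLOWED for ch in text):
        # Limit is in septets; extension characters use two each.
        return "GSM-7", 160
    return "UCS-2", 70
```

With this check, a message such as `Total: 5€ [paid]` stays in GSM-7, while any character outside both tables forces the whole message into UCS-2.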
List of GSM-7 Extension Table Characters
Character | Description | Escape Sequence (Hex) |
---|---|---|
FF | Form feed | 0x1B0A |
^ | Circumflex | 0x1B14 |
{ | Left curly brace | 0x1B28 |
} | Right curly brace | 0x1B29 |
\ | Backslash | 0x1B2F |
[ | Left square bracket | 0x1B3C |
~ | Tilde | 0x1B3D |
] | Right square bracket | 0x1B3E |
\| | Vertical bar | 0x1B40 |
€ | Euro sign | 0x1B65 |
(These codes are the ones defined in GSM 03.38 and its successor, 3GPP TS 23.038. Each sequence is written as the escape code 0x1B followed by the character code; in the transmitted message both are 7-bit values.)
Summary
- The GSM-7 extension table allows 10 extra characters to be used in SMS.
- Each extension character uses an escape sequence, counting as two characters in the message.
- Using these characters reduces the number of characters you can fit in a single SMS.
- If unsupported, escape characters may be displayed as spaces or misinterpreted.
For developers and businesses sending SMS: Always be mindful of which characters are in your messages, as using extension table characters can affect both message length and cost.