EBS: GSM-7 and UCS-2 Character Encodings and Their Importance When Sending SMS Messages

What is GSM-7?

Encoding Character Set

GSM-7 is the standard alphabet (more precisely, the character encoding) used for SMS messages. It consists of 140 characters and covers most of the characters used in Latin-based languages such as English and French (including accented characters), as well as Nordic characters. Most common punctuation marks and many commonly used special characters are also part of the GSM-7 encoding. See the table on the right for the full GSM-7 character encoding. More details on GSM-7 can be found at GSM 03.38.


Table/image source: https://en.wikipedia.org/wiki/GSM_03.38#GSM_7-bit_default_alphabet_and_extension_table_of_3GPP_TS_23.038_/_GSM_03.38

What is UCS-2?

UCS-2 is the character encoding used for SMS messages when the more commonly used GSM-7 encoding cannot be used. It can represent more than 65,000 characters, including the commonly used East Asian (Chinese, Japanese, Korean) characters, Arabic characters, and most special characters not covered by GSM-7. More details on UCS-2 can be found at Universal Coded Character Set.

Why are the GSM-7 and UCS-2 encodings important when sending SMS messages?

Example GSM-7 and UCS-2

Most mobile phones and operators worldwide use GSM-7 as the default encoding for transferring SMS messages. However, if the message contains even a single non-GSM-7 character, the entire message is transmitted using the UCS-2 encoding instead.
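The fallback rule above can be sketched in a few lines of Python. This is only an illustration, not Everbridge's implementation: the function name choose_encoding is hypothetical, and the character sets are transcribed from the GSM 03.38 default alphabet and its basic extension table (the escape and form-feed control codes are left out).

```python
# Printable characters of the GSM-7 default alphabet (GSM 03.38).
# The ESC control code is intentionally omitted.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Printable characters of the basic extension table.
GSM7_EXTENSION = set("^{}\\[~]|€")


def choose_encoding(text):
    """Return "GSM-7" if every character fits GSM-7, otherwise "UCS-2"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENSION for ch in text):
        return "GSM-7"
    # One unsupported character is enough to switch the whole message.
    return "UCS-2"
```

A common surprise is typographic punctuation pasted from a word processor: choose_encoding("Meet at \u201cGate 4\u201d") returns "UCS-2" because curly quotation marks are not part of GSM-7, while the same text with straight quotes stays in GSM-7.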

The SMS protocol allows up to 160 GSM-7 characters, or 70 UCS-2 characters, to be transmitted per message segment. Both limits come from the same 140-byte payload: 160 characters at 7 bits each and 70 characters at 16 bits each both occupy 1,120 bits. When a message exceeds the allowed number of characters, the mobile operator splits it and sends multiple message segments, which the handheld device then combines and displays as a single message. This segmentation process consumes part of each segment for a message header that allows the handheld device to reassemble the segments in the correct order. As a result, when segmentation occurs, a GSM-7 message segment holds 153 characters and a UCS-2 message segment holds 67 characters. See the diagram on the right for examples.
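The four limits can be reproduced with a little arithmetic. The sketch below assumes the 140-byte SMS payload and the common 6-byte concatenation header; both figures come from the GSM specifications rather than from this article, and the names used are illustrative only.

```python
PAYLOAD_BYTES = 140        # user-data size of one SMS (assumption from the GSM spec)
CONCAT_HEADER_BYTES = 6    # assumed size of the common concatenation header


def chars_per_segment(bits_per_char, header_bytes=0):
    """Whole characters that fit in the payload left after the header."""
    usable_bits = (PAYLOAD_BYTES - header_bytes) * 8
    return usable_bits // bits_per_char


# (single-segment limit, per-segment limit once the message is split)
LIMITS = {
    "GSM-7": (chars_per_segment(7), chars_per_segment(7, CONCAT_HEADER_BYTES)),
    "UCS-2": (chars_per_segment(16), chars_per_segment(16, CONCAT_HEADER_BYTES)),
}

print(LIMITS)  # {'GSM-7': (160, 153), 'UCS-2': (70, 67)}
```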

Additional Considerations with GSM-7

  1. A few characters, even though they are part of the standard GSM-7 encoding, consume two character spaces instead of one. These characters include the opening and closing square and curly brackets ('[', ']', '{', '}'). The full set of such characters is known as the Basic Character Set Extension, as shown in the GSM-7 character table above.
  2. GSM-7 supports a concept known as shift tables to accommodate additional characters beyond the standard 140. However, this mechanism is not widely supported by mobile operators globally. As a result, Everbridge only considers the standard characters when calculating the length of a message.
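The double-counting rule in item 1 can be illustrated as follows. This is a minimal sketch that assumes the text has already been confirmed to contain only GSM-7 characters; gsm7_length is a hypothetical helper, and the extension set is transcribed from the GSM 03.38 basic extension table.

```python
# Printable characters of the GSM-7 basic extension table.
# Each one is sent as an escape code plus the character itself.
GSM7_EXTENSION = set("^{}\\[~]|€")


def gsm7_length(text):
    """Character spaces a GSM-7 message consumes (extension characters count twice)."""
    return sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)


print(gsm7_length("Room 12"))    # 7
print(gsm7_length("[Room 12]"))  # 11: the two brackets cost two spaces each
```

The practical consequence is that a message that looks like 160 characters on screen can still spill into a second segment if it contains brackets or a euro sign.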

How does SMS message splitting affect my usage?

Each message segment counts towards your usage credit. For example, a 500-character message that uses only GSM-7 characters will be split into 4 message segments (500 divided by 153, rounded up) and count as 4 SMS messages in your usage report.
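To estimate usage for other message lengths, the same calculation can be written as a small function. This is a rough sketch using the segment sizes quoted in this article; segments_needed is a hypothetical name, the length passed in is assumed to already count extension characters twice, and actual billing may differ.

```python
import math

# (single-segment limit, per-segment limit once the message is split)
LIMITS = {"GSM-7": (160, 153), "UCS-2": (70, 67)}


def segments_needed(length, encoding):
    """Number of message segments for a message of the given length."""
    single, multi = LIMITS[encoding]
    if length <= single:
        return 1
    return math.ceil(length / multi)


print(segments_needed(500, "GSM-7"))  # 4
print(segments_needed(500, "UCS-2"))  # 8
```

The second line shows why encoding matters for cost: the same 500 characters take twice as many segments once a single non-GSM-7 character forces the message into UCS-2.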

How can I keep my costs associated with multi-segmented SMS messages to a minimum?

One of the features in the Everbridge SmartPath suite is Single SMS. With Single SMS enabled for your Organization, SMS messages are kept to a single segment whenever possible, while your contacts can still access the message in its entirety.

