
utf16.h File Reference

C API: 16-bit Unicode handling macros.

#include "unicode/utf.h"


Defines

#define U16_IS_SINGLE(c)   !U_IS_SURROGATE(c)
 Does this code unit alone encode a code point (BMP, not a surrogate)?
#define U16_IS_LEAD(c)   (((c)&0xfffffc00)==0xd800)
 Is this code unit a lead surrogate (U+d800..U+dbff)?
#define U16_IS_TRAIL(c)   (((c)&0xfffffc00)==0xdc00)
 Is this code unit a trail surrogate (U+dc00..U+dfff)?
#define U16_IS_SURROGATE(c)   U_IS_SURROGATE(c)
 Is this code unit a surrogate (U+d800..U+dfff)?
#define U16_IS_SURROGATE_LEAD(c)   (((c)&0x400)==0)
 Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate?
#define U16_SURROGATE_OFFSET   ((0xd800<<10UL)+0xdc00-0x10000)
 Helper constant for U16_GET_SUPPLEMENTARY.
#define U16_GET_SUPPLEMENTARY(lead, trail)   (((lead)<<10UL)+(trail)-U16_SURROGATE_OFFSET)
 Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates.
#define U16_LEAD(supplementary)   (UChar)(((supplementary)>>10)+0xd7c0)
 Get the lead surrogate (0xd800..0xdbff) for a supplementary code point (0x10000..0x10ffff).
#define U16_TRAIL(supplementary)   (UChar)(((supplementary)&0x3ff)|0xdc00)
 Get the trail surrogate (0xdc00..0xdfff) for a supplementary code point (0x10000..0x10ffff).
#define U16_LENGTH(c)   ((uint32_t)(c)<=0xffff ? 1 : 2)
 How many 16-bit code units are used to encode this Unicode code point (1 or 2)? The result is undefined if c is not a Unicode code point (U+0000..U+10ffff).
#define U16_MAX_LENGTH   2
 The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff).
#define U16_GET_UNSAFE(s, i, c)
 Get a code point from a string at a random-access offset, without changing the offset.
#define U16_GET(s, start, i, length, c)
 Get a code point from a string at a random-access offset, without changing the offset.
#define U16_NEXT_UNSAFE(s, i, c)
 Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.
#define U16_NEXT(s, i, length, c)
 Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.
#define U16_APPEND_UNSAFE(s, i, c)
 Append a code point to a string, overwriting 1 or 2 code units.
#define U16_APPEND(s, i, capacity, c, isError)
 Append a code point to a string, overwriting 1 or 2 code units.
#define U16_FWD_1_UNSAFE(s, i)
 Advance the string offset from one code point boundary to the next.
#define U16_FWD_1(s, i, length)
 Advance the string offset from one code point boundary to the next.
#define U16_FWD_N_UNSAFE(s, i, n)
 Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.
#define U16_FWD_N(s, i, length, n)
 Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.
#define U16_SET_CP_START_UNSAFE(s, i)
 Adjust a random-access offset to a code point boundary at the start of a code point.
#define U16_SET_CP_START(s, start, i)
 Adjust a random-access offset to a code point boundary at the start of a code point.
#define U16_PREV_UNSAFE(s, i, c)
 Move the string offset from one code point boundary to the previous one and get the code point between them.
#define U16_PREV(s, start, i, c)
 Move the string offset from one code point boundary to the previous one and get the code point between them.
#define U16_BACK_1_UNSAFE(s, i)
 Move the string offset from one code point boundary to the previous one.
#define U16_BACK_1(s, start, i)
 Move the string offset from one code point boundary to the previous one.
#define U16_BACK_N_UNSAFE(s, i, n)
 Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.
#define U16_BACK_N(s, start, i, n)
 Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.
#define U16_SET_CP_LIMIT_UNSAFE(s, i)
 Adjust a random-access offset to a code point boundary after a code point.
#define U16_SET_CP_LIMIT(s, start, i, length)
 Adjust a random-access offset to a code point boundary after a code point.


Detailed Description

C API: 16-bit Unicode handling macros.

This file defines macros to deal with 16-bit Unicode (UTF-16) code units and strings. utf16.h is included by utf.h after unicode/umachine.h and some common definitions.

For more information see utf.h and the ICU User Guide Strings chapter (http://oss.software.ibm.com/icu/userguide/).

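Usage: Follow the ICU coding guidelines for if() statements when using these macros. Use compound statements (curly braces {}) for all if, else, and while bodies, and terminate every macro statement with a semicolon. Most of these macros expand to a brace-enclosed block { ... }, so in an unbraced if(x) U16_NEXT(s, i, length, c); else ... the semicolon ends the if statement and the else no longer compiles.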


Define Documentation

#define U16_APPEND(s, i, capacity, c, isError)

Value:

{ \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff && (i)+1<(capacity)) { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } else /* c>0x10ffff or not enough space */ { \
        (isError)=TRUE; \
    } \
}
Append a code point to a string, overwriting 1 or 2 code units.

The offset points to the current end of the string contents and is advanced (post-increment). "Safe" macro, checks for a valid code point. If a surrogate pair is written, checks for sufficient space in the string. If the code point is not valid or a trail surrogate does not fit, then isError is set to TRUE.

Parameters:
s UChar * string buffer
i string offset, must be i<capacity
capacity size of the string buffer
c code point to append
isError output UBool set to TRUE if an error occurs, otherwise not modified
See also:
U16_APPEND_UNSAFE
Stable:
ICU 2.4
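A minimal, self-contained sketch of U16_APPEND in use, assuming ICU headers are not available: it copies the Value: body above into a local macro named EX_U16_APPEND, with UChar replaced by uint16_t and TRUE by true. The helper encode_utf16 is hypothetical. Because the single-unit branch writes without checking capacity, the loop checks i<capacity itself before each append.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the U16_APPEND body shown above (not the ICU header). */
#define EX_U16_APPEND(s, i, capacity, c, isError) { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff && (i)+1<(capacity)) { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } else /* c>0x10ffff or not enough space */ { \
        (isError)=true; \
    } \
}

/* Hypothetical helper: encodes n code points into buf.
   Returns the number of code units written, or -1 if a code point is
   out of range or buf (capacity units) is too small. */
int32_t encode_utf16(const int32_t *cps, int32_t n,
                     uint16_t *buf, int32_t capacity) {
    int32_t i = 0;
    bool isError = false;
    for (int32_t k = 0; k < n; ++k) {
        if (i >= capacity) {  /* precondition i<capacity for the BMP branch */
            return -1;
        }
        EX_U16_APPEND(buf, i, capacity, cps[k], isError);
        if (isError) {
            return -1;
        }
    }
    return i;
}
```

With cps = {0x41, 0x1F600} and a 4-unit buffer this writes 0x0041 0xD83D 0xDE00 and returns 3; with a 2-unit buffer the surrogate pair does not fit and it returns -1.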

#define U16_APPEND_UNSAFE(s, i, c)

Value:

{ \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } \
}
Append a code point to a string, overwriting 1 or 2 code units.

The offset points to the current end of the string contents and is advanced (post-increment). "Unsafe" macro, assumes a valid code point and sufficient space in the string. Otherwise, the result is undefined.

Parameters:
s UChar * string buffer
i string offset
c code point to append
See also:
U16_APPEND
Stable:
ICU 2.4

#define U16_BACK_1(s, start, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[--(i)]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}
Move the string offset from one code point boundary to the previous one.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, must be start<i (the offset is pre-decremented)
See also:
U16_BACK_1_UNSAFE
Stable:
ICU 2.4
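A small sketch of safe backward iteration, assuming no ICU headers: the macros below are local copies of U16_IS_LEAD, U16_IS_TRAIL, and the U16_BACK_1 body above, renamed with an EX_ prefix and using uint16_t for UChar. The helper count_backward is hypothetical; it counts code points from the end, treating each unpaired surrogate as one code point.

```c
#include <stdint.h>

/* Local copies of U16_IS_LEAD, U16_IS_TRAIL and U16_BACK_1 from this page. */
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_BACK_1(s, start, i) { \
    if(EX_U16_IS_TRAIL((s)[--(i)]) && (i)>(start) && EX_U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}

/* Hypothetical helper: counts code points in s[0..length) by walking
   backward from the end; each unpaired surrogate counts as one. */
int32_t count_backward(const uint16_t *s, int32_t length) {
    int32_t i = length;
    int32_t count = 0;
    while (i > 0) {  /* the macro pre-decrements, so it needs start<i */
        EX_U16_BACK_1(s, 0, i);
        ++count;
    }
    return count;
}
```

For {0x41, 0xD83D, 0xDE00, 0xDC00} the result is 3: the trailing unpaired 0xDC00, the pair for U+1F600, and 'A'.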

#define U16_BACK_1_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[--(i)])) { \
        --(i); \
    } \
}
Move the string offset from one code point boundary to the previous one.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_BACK_1
Stable:
ICU 2.4

#define U16_BACK_N(s, start, i, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0 && (i)>(start)) { \
        U16_BACK_1(s, start, i); \
        --__N; \
    } \
}
Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start start of string
i string offset, start<=i
n number of code points to skip
See also:
U16_BACK_N_UNSAFE
Stable:
ICU 2.4

#define U16_BACK_N_UNSAFE(s, i, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_BACK_1_UNSAFE(s, i); \
        --__N; \
    } \
}
Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
n number of code points to skip
See also:
U16_BACK_N
Stable:
ICU 2.4

#define U16_FWD_1(s, i, length)

Value:

{ \
    if(U16_IS_LEAD((s)[(i)++]) && (i)<(length) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
Advance the string offset from one code point boundary to the next.

(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
i string offset, i<length
length string length
See also:
U16_FWD_1_UNSAFE
Stable:
ICU 2.4
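A minimal sketch of counting code points with U16_FWD_1, assuming ICU headers are not available: EX_U16_FWD_1 and the two EX_U16_IS_* macros are local copies of the definitions on this page with uint16_t in place of UChar, and count_code_points is a hypothetical helper. An unpaired lead surrogate at the end of the string is counted as one code point because the macro checks (i)<(length) before reading the trail.

```c
#include <stdint.h>

/* Local copies of U16_IS_LEAD, U16_IS_TRAIL and U16_FWD_1 from this page. */
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_FWD_1(s, i, length) { \
    if(EX_U16_IS_LEAD((s)[(i)++]) && (i)<(length) && EX_U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}

/* Hypothetical helper: number of code points in s[0..length). */
int32_t count_code_points(const uint16_t *s, int32_t length) {
    int32_t i = 0;
    int32_t count = 0;
    while (i < length) {  /* U16_FWD_1 requires i<length */
        EX_U16_FWD_1(s, i, length);
        ++count;
    }
    return count;
}
```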

#define U16_FWD_1_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_LEAD((s)[(i)++])) { \
        ++(i); \
    } \
}
Advance the string offset from one code point boundary to the next.

(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_FWD_1
Stable:
ICU 2.4

#define U16_FWD_N(s, i, length, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0 && (i)<(length)) { \
        U16_FWD_1(s, i, length); \
        --__N; \
    } \
}
Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
i string offset, i<length
length string length
n number of code points to skip
See also:
U16_FWD_N_UNSAFE
Stable:
ICU 2.4
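A sketch of using U16_FWD_N to find how many code units the first n code points occupy, for example to cut a "first n characters" prefix without splitting a surrogate pair. It assumes no ICU headers: the macros are local copies of the bodies on this page, with the counter __N renamed ex_n_ and uint16_t in place of UChar. prefix_units is a hypothetical helper.

```c
#include <stdint.h>

/* Local copies of U16_IS_LEAD, U16_IS_TRAIL, U16_FWD_1 and U16_FWD_N. */
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_FWD_1(s, i, length) { \
    if(EX_U16_IS_LEAD((s)[(i)++]) && (i)<(length) && EX_U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
#define EX_U16_FWD_N(s, i, length, n) { \
    int32_t ex_n_=(n); \
    while(ex_n_>0 && (i)<(length)) { \
        EX_U16_FWD_1(s, i, length); \
        --ex_n_; \
    } \
}

/* Hypothetical helper: code units spanned by the first n code points of
   s[0..length), stopping early at the end of the string. */
int32_t prefix_units(const uint16_t *s, int32_t length, int32_t n) {
    int32_t i = 0;
    EX_U16_FWD_N(s, i, length, n);
    return i;
}
```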

#define U16_FWD_N_UNSAFE(s, i, n)

Value:

{ \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_FWD_1_UNSAFE(s, i); \
        --__N; \
    } \
}
Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
n number of code points to skip
See also:
U16_FWD_N
Stable:
ICU 2.4

#define U16_GET(s, start, i, length, c)

Value:

{ \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1<(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \
                (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
            } \
        } else { \
            if((i)-1>=(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
                (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
            } \
        } \
    } \
}
Get a code point from a string at a random-access offset, without changing the offset.

"Safe" macro, handles unpaired surrogates and checks for string boundaries.

The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well. If the offset points to a single, unpaired surrogate, then that itself will be returned as the code point. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, start<=i<length
length string length
c output UChar32 variable
See also:
U16_GET_UNSAFE
Stable:
ICU 2.4
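A self-contained sketch of random access with U16_GET, assuming no ICU headers and an int of at least 32 bits (the lead<<10 shift needs it). The macros copy this page's definitions with an EX_ prefix, __c2 renamed ex_c2_, and uint16_t in place of UChar; EX_U_IS_SURROGATE reproduces U_IS_SURROGATE from utf.h, which is not shown on this page. code_point_at is a hypothetical helper. Both offsets of a pair yield the same code point, and an unpaired surrogate is returned as itself.

```c
#include <stdint.h>

/* U_IS_SURROGATE as defined in utf.h; the rest are copies from this page. */
#define EX_U_IS_SURROGATE(c) (((c)&0xfffff800)==0xd800)
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_IS_SURROGATE_LEAD(c) (((c)&0x400)==0)
#define EX_U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define EX_U16_GET_SUPPLEMENTARY(lead, trail) \
    (((lead)<<10UL)+(trail)-EX_U16_SURROGATE_OFFSET)
#define EX_U16_GET(s, start, i, length, c) { \
    (c)=(s)[i]; \
    if(EX_U_IS_SURROGATE(c)) { \
        uint16_t ex_c2_; \
        if(EX_U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1<(length) && EX_U16_IS_TRAIL(ex_c2_=(s)[(i)+1])) { \
                (c)=EX_U16_GET_SUPPLEMENTARY((c), ex_c2_); \
            } \
        } else { \
            if((i)-1>=(start) && EX_U16_IS_LEAD(ex_c2_=(s)[(i)-1])) { \
                (c)=EX_U16_GET_SUPPLEMENTARY(ex_c2_, (c)); \
            } \
        } \
    } \
}

/* Hypothetical helper: code point at offset i of s[0..length), 0<=i<length. */
int32_t code_point_at(const uint16_t *s, int32_t length, int32_t i) {
    int32_t c;
    EX_U16_GET(s, 0, i, length, c);
    return c;
}
```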

#define U16_GET_SUPPLEMENTARY(lead, trail)   (((lead)<<10UL)+(trail)-U16_SURROGATE_OFFSET)

Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates.

The result is undefined if the input values are not lead and trail surrogates.

Parameters:
lead lead surrogate (U+d800..U+dbff)
trail trail surrogate (U+dc00..U+dfff)
Returns:
supplementary code point (U+10000..U+10ffff)
Stable:
ICU 2.4
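The single constant U16_SURROGATE_OFFSET folds both surrogate bases and the 0x10000 shift into one subtraction. Treating <<10 as multiplication by 2^10 (no overflow occurs for valid surrogates with a 32-bit int), the macro rearranges as below; the pair 0xD83D 0xDE00 (U+1F600) is only an illustrative input.

```latex
\begin{aligned}
\text{U16\_GET\_SUPPLEMENTARY}(\ell, t)
  &= 2^{10}\ell + t - \bigl(2^{10}\cdot\texttt{0xD800} + \texttt{0xDC00} - \texttt{0x10000}\bigr) \\
  &= 2^{10}(\ell - \texttt{0xD800}) + (t - \texttt{0xDC00}) + \texttt{0x10000}, \\
\text{U16\_GET\_SUPPLEMENTARY}(\texttt{0xD83D}, \texttt{0xDE00})
  &= 2^{10}\cdot\texttt{0x3D} + \texttt{0x200} + \texttt{0x10000} \\
  &= \texttt{0xF400} + \texttt{0x200} + \texttt{0x10000} = \texttt{0x1F600}.
\end{aligned}
```

So lead minus 0xD800 supplies the high 10 bits and trail minus 0xDC00 the low 10 bits of c minus 0x10000.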

#define U16_GET_UNSAFE(s, i, c)

Value:

{ \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)+1]); \
        } else { \
            (c)=U16_GET_SUPPLEMENTARY((s)[(i)-1], (c)); \
        } \
    } \
}
Get a code point from a string at a random-access offset, without changing the offset.

"Unsafe" macro, assumes well-formed UTF-16.

The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well. The result is undefined if the offset points to a single, unpaired surrogate. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.

Parameters:
s const UChar * string
i string offset
c output UChar32 variable
See also:
U16_GET
Stable:
ICU 2.4

#define U16_IS_LEAD(c)   (((c)&0xfffffc00)==0xd800)

Is this code unit a lead surrogate (U+d800..U+dbff)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE
Stable:
ICU 2.4

#define U16_IS_SINGLE(c)   !U_IS_SURROGATE(c)

Does this code unit alone encode a code point (BMP, not a surrogate)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE
Stable:
ICU 2.4

#define U16_IS_SURROGATE(c)   U_IS_SURROGATE(c)

Is this code unit a surrogate (U+d800..U+dfff)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE
Stable:
ICU 2.4

#define U16_IS_SURROGATE_LEAD(c)   (((c)&0x400)==0)

Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE
Stable:
ICU 2.4

#define U16_IS_TRAIL(c)   (((c)&0xfffffc00)==0xdc00)

Is this code unit a trail surrogate (U+dc00..U+dfff)?

Parameters:
c 16-bit code unit
Returns:
TRUE or FALSE
Stable:
ICU 2.4
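A sketch of classifying a code unit with the predicates above, assuming no ICU headers: EX_U_IS_SURROGATE reproduces U_IS_SURROGATE from utf.h (not shown on this page), the other macros copy this page's definitions, and unit_kind, classify_unit, and classification_is_consistent are hypothetical names. It tests U16_IS_SINGLE first because U16_IS_SURROGATE_LEAD checks only bit 10 and also returns TRUE for many non-surrogates, such as 0x41.

```c
#include <stdint.h>

/* U_IS_SURROGATE as defined in utf.h; the rest are copies from this page. */
#define EX_U_IS_SURROGATE(c) (((c)&0xfffff800)==0xd800)
#define EX_U16_IS_SINGLE(c) !EX_U_IS_SURROGATE(c)
#define EX_U16_IS_SURROGATE_LEAD(c) (((c)&0x400)==0)
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)

typedef enum { UNIT_SINGLE, UNIT_LEAD, UNIT_TRAIL } unit_kind;

unit_kind classify_unit(uint16_t c) {
    if (EX_U16_IS_SINGLE(c)) {
        return UNIT_SINGLE;
    }
    /* c is a surrogate here, so the single-bit test is meaningful. */
    return EX_U16_IS_SURROGATE_LEAD(c) ? UNIT_LEAD : UNIT_TRAIL;
}

/* Hypothetical self-check: the two-step classification agrees with the
   direct U16_IS_LEAD / U16_IS_TRAIL masks for every 16-bit code unit. */
int classification_is_consistent(void) {
    for (uint32_t c = 0; c <= 0xffff; ++c) {
        unit_kind k = classify_unit((uint16_t)c);
        if ((k == UNIT_LEAD) != EX_U16_IS_LEAD(c)) return 0;
        if ((k == UNIT_TRAIL) != EX_U16_IS_TRAIL(c)) return 0;
    }
    return 1;
}
```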

#define U16_LEAD(supplementary)   (UChar)(((supplementary)>>10)+0xd7c0)

Get the lead surrogate (0xd800..0xdbff) for a supplementary code point (0x10000..0x10ffff).

Parameters:
supplementary 32-bit code point (U+10000..U+10ffff)
Returns:
lead surrogate (U+d800..U+dbff) for supplementary
Stable:
ICU 2.4
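Where 0xd7c0 comes from: UTF-16 defines the lead surrogate as 0xD800 plus the high 10 bits of c minus 0x10000. Since 0x10000 = 0x40 times 2^10, the subtraction can be moved outside the shift; for non-negative c, (c>>10) in the macro equals the floor division below. The value U+1F600 is only an illustrative input.

```latex
\begin{aligned}
\text{lead}(c) &= \texttt{0xD800} + \left\lfloor \frac{c - \texttt{0x10000}}{2^{10}} \right\rfloor
  = \texttt{0xD800} + \left\lfloor \frac{c}{2^{10}} \right\rfloor - \texttt{0x40}
  = \left\lfloor \frac{c}{2^{10}} \right\rfloor + \texttt{0xD7C0}
  \qquad (\texttt{0x10000} \le c \le \texttt{0x10FFFF}), \\
\text{lead}(\texttt{0x1F600}) &= \texttt{0x7D} + \texttt{0xD7C0} = \texttt{0xD83D}.
\end{aligned}
```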

#define U16_LENGTH(c)   ((uint32_t)(c)<=0xffff ? 1 : 2)

How many 16-bit code units are used to encode this Unicode code point (1 or 2)? The result is undefined if c is not a Unicode code point (U+0000..U+10ffff).

Parameters:
c 32-bit code point
Returns:
1 or 2
Stable:
ICU 2.4
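A sketch of sizing a UTF-16 buffer before encoding, assuming no ICU headers: EX_U16_LENGTH and EX_U16_MAX_LENGTH copy U16_LENGTH and U16_MAX_LENGTH from this page, and the two helpers are hypothetical. utf16_length gives the exact size but, like U16_LENGTH, is only meaningful for valid code points, so callers should validate first; worst_case_units gives an upper bound without scanning.

```c
#include <stdint.h>

/* Local copies of U16_LENGTH and U16_MAX_LENGTH from this page. */
#define EX_U16_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2)
#define EX_U16_MAX_LENGTH 2

/* Hypothetical helper: exact number of code units for n valid code points. */
int32_t utf16_length(const int32_t *cps, int32_t n) {
    int32_t total = 0;
    for (int32_t k = 0; k < n; ++k) {
        total += EX_U16_LENGTH(cps[k]);
    }
    return total;
}

/* Hypothetical helper: capacity that always suffices for n code points. */
int32_t worst_case_units(int32_t n) {
    return n * EX_U16_MAX_LENGTH;
}
```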

#define U16_MAX_LENGTH   2
 

The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff).

Returns:
2
Stable:
ICU 2.4

#define U16_NEXT(s, i, length, c)

Value:

{ \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        uint16_t __c2; \
        if((i)<(length) && U16_IS_TRAIL(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
        } \
    } \
}
Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

(Post-incrementing forward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate or to a single, unpaired lead surrogate, then that itself will be returned as the code point.

Parameters:
s const UChar * string
i string offset, i<length
length string length
c output UChar32 variable
See also:
U16_NEXT_UNSAFE
Stable:
ICU 2.4
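A sketch of the usual forward decoding loop with U16_NEXT, assuming no ICU headers and an int of at least 32 bits: the macros are local copies of this page's definitions with an EX_ prefix, __c2 renamed ex_c2_, and uint16_t in place of UChar. decode_utf16 is hypothetical; an output array with length entries is always large enough, since each code point consumes at least one unit.

```c
#include <stdint.h>

/* Local copies of the macros from this page needed by U16_NEXT. */
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define EX_U16_GET_SUPPLEMENTARY(lead, trail) \
    (((lead)<<10UL)+(trail)-EX_U16_SURROGATE_OFFSET)
#define EX_U16_NEXT(s, i, length, c) { \
    (c)=(s)[(i)++]; \
    if(EX_U16_IS_LEAD(c)) { \
        uint16_t ex_c2_; \
        if((i)<(length) && EX_U16_IS_TRAIL(ex_c2_=(s)[(i)])) { \
            ++(i); \
            (c)=EX_U16_GET_SUPPLEMENTARY((c), ex_c2_); \
        } \
    } \
}

/* Hypothetical helper: decodes s[0..length) into out and returns the
   number of code points; unpaired surrogates are passed through as-is. */
int32_t decode_utf16(const uint16_t *s, int32_t length, int32_t *out) {
    int32_t i = 0;
    int32_t n = 0;
    while (i < length) {
        int32_t c;
        EX_U16_NEXT(s, i, length, c);
        out[n++] = c;
    }
    return n;
}
```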

#define U16_NEXT_UNSAFE(s, i, c)

Value:

{ \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)++]); \
    } \
}
Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

(Post-incrementing forward iteration.) "Unsafe" macro, assumes well-formed UTF-16.

The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate, then that itself will be returned as the code point. The result is undefined if the offset points to a single, unpaired lead surrogate.

Parameters:
s const UChar * string
i string offset
c output UChar32 variable
See also:
U16_NEXT
Stable:
ICU 2.4

#define U16_PREV(s, start, i, c)

Value:

{ \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        uint16_t __c2; \
        if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
        } \
    } \
}
Move the string offset from one code point boundary to the previous one and get the code point between them.

(Pre-decrementing backward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate or behind a single, unpaired trail surrogate, then that itself will be returned as the code point.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, must be start<i (the offset is pre-decremented)
c output UChar32 variable
See also:
U16_PREV_UNSAFE
Stable:
ICU 2.4
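A sketch of decoding from the end of a string with U16_PREV, assuming no ICU headers and an int of at least 32 bits: the macros copy this page's definitions with an EX_ prefix, __c2 renamed ex_c2_, and uint16_t for UChar. decode_reversed is a hypothetical helper that emits code points in reverse order; a leading unpaired trail surrogate is returned as itself because the (i)>(start) check stops the macro from reading before the string.

```c
#include <stdint.h>

/* Local copies of the macros from this page needed by U16_PREV. */
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define EX_U16_GET_SUPPLEMENTARY(lead, trail) \
    (((lead)<<10UL)+(trail)-EX_U16_SURROGATE_OFFSET)
#define EX_U16_PREV(s, start, i, c) { \
    (c)=(s)[--(i)]; \
    if(EX_U16_IS_TRAIL(c)) { \
        uint16_t ex_c2_; \
        if((i)>(start) && EX_U16_IS_LEAD(ex_c2_=(s)[(i)-1])) { \
            --(i); \
            (c)=EX_U16_GET_SUPPLEMENTARY(ex_c2_, (c)); \
        } \
    } \
}

/* Hypothetical helper: writes the code points of s[0..length) to out in
   reverse order and returns how many were written. */
int32_t decode_reversed(const uint16_t *s, int32_t length, int32_t *out) {
    int32_t i = length;
    int32_t n = 0;
    while (i > 0) {  /* U16_PREV requires start<i */
        int32_t c;
        EX_U16_PREV(s, 0, i, c);
        out[n++] = c;
    }
    return n;
}
```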

#define U16_PREV_UNSAFE(s, i, c)

Value:

{ \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((s)[--(i)], (c)); \
    } \
}
Move the string offset from one code point boundary to the previous one and get the code point between them.

(Pre-decrementing backward iteration.) "Unsafe" macro, assumes well-formed UTF-16.

The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate, then that itself will be returned as the code point. The result is undefined if the offset is behind a single, unpaired trail surrogate.

Parameters:
s const UChar * string
i string offset
c output UChar32 variable
See also:
U16_PREV
Stable:
ICU 2.4

#define U16_SET_CP_LIMIT(s, start, i, length)

Value:

{ \
    if((start)<(i) && (i)<(length) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
}
Adjust a random-access offset to a code point boundary after a code point.

If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, start<=i<=length
length string length
See also:
U16_SET_CP_LIMIT_UNSAFE
Stable:
ICU 2.4

#define U16_SET_CP_LIMIT_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_LEAD((s)[(i)-1])) { \
        ++(i); \
    } \
}
Adjust a random-access offset to a code point boundary after a code point.

If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_SET_CP_LIMIT
Stable:
ICU 2.4

#define U16_SET_CP_START(s, start, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[i]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}
Adjust a random-access offset to a code point boundary at the start of a code point.

If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters:
s const UChar * string
start starting string offset (usually 0)
i string offset, start<=i
See also:
U16_SET_CP_START_UNSAFE
Stable:
ICU 2.4
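A sketch of truncating a UTF-16 string to a unit budget without cutting a surrogate pair in half, using U16_SET_CP_START on the first unit that would be dropped. It assumes no ICU headers: the macros are local copies of this page's definitions with uint16_t for UChar, and safe_truncate is a hypothetical helper that expects 0<=max_units.

```c
#include <stdint.h>

/* Local copies of U16_IS_LEAD, U16_IS_TRAIL and U16_SET_CP_START. */
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_SET_CP_START(s, start, i) { \
    if(EX_U16_IS_TRAIL((s)[i]) && (i)>(start) && EX_U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
}

/* Hypothetical helper: largest new length <= max_units that does not end
   between the lead and trail units of a surrogate pair. */
int32_t safe_truncate(const uint16_t *s, int32_t length, int32_t max_units) {
    if (max_units >= length) {
        return length;
    }
    int32_t i = max_units;  /* first unit to drop; i<length, so s[i] is readable */
    EX_U16_SET_CP_START(s, 0, i);
    return i;
}
```

For {0x41, 0xD83D, 0xDE00, 0x42}, a budget of 2 units would keep only the lead surrogate, so the helper backs up to length 1.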

#define U16_SET_CP_START_UNSAFE(s, i)

Value:

{ \
    if(U16_IS_TRAIL((s)[i])) { \
        --(i); \
    } \
}
Adjust a random-access offset to a code point boundary at the start of a code point.

If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Unsafe" macro, assumes well-formed UTF-16.

Parameters:
s const UChar * string
i string offset
See also:
U16_SET_CP_START
Stable:
ICU 2.4

#define U16_SURROGATE_OFFSET   ((0xd800<<10UL)+0xdc00-0x10000)
 

Helper constant for U16_GET_SUPPLEMENTARY.

For internal use only.

#define U16_TRAIL(supplementary)   (UChar)(((supplementary)&0x3ff)|0xdc00)

Get the trail surrogate (0xdc00..0xdfff) for a supplementary code point (0x10000..0x10ffff).

Parameters:
supplementary 32-bit code point (U+10000..U+10ffff)
Returns:
trail surrogate (U+dc00..U+dfff) for supplementary
Stable:
ICU 2.4
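U16_TRAIL can apply the 0x3ff mask to c directly because subtracting 0x10000 leaves the low 10 bits unchanged and 0xdc00 has those bits clear. The sketch below checks, under the assumptions of no ICU headers and an int of at least 32 bits, that splitting every supplementary code point with U16_LEAD and U16_TRAIL and recombining with U16_GET_SUPPLEMENTARY is lossless. The macros are local copies of this page's definitions with uint16_t for UChar; surrogate_roundtrip_ok is a hypothetical helper.

```c
#include <stdint.h>

/* Local copies of the surrogate macros from this page. */
#define EX_U16_IS_LEAD(c) (((c)&0xfffffc00)==0xd800)
#define EX_U16_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00)
#define EX_U16_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)
#define EX_U16_GET_SUPPLEMENTARY(lead, trail) \
    (((lead)<<10UL)+(trail)-EX_U16_SURROGATE_OFFSET)
#define EX_U16_LEAD(supplementary) (uint16_t)(((supplementary)>>10)+0xd7c0)
#define EX_U16_TRAIL(supplementary) (uint16_t)(((supplementary)&0x3ff)|0xdc00)

/* Hypothetical check: every code point in U+10000..U+10ffff splits into a
   valid lead/trail pair that recombines to the original value. */
int surrogate_roundtrip_ok(void) {
    for (int32_t c = 0x10000; c <= 0x10ffff; ++c) {
        uint16_t lead = EX_U16_LEAD(c);
        uint16_t trail = EX_U16_TRAIL(c);
        if (!EX_U16_IS_LEAD(lead) || !EX_U16_IS_TRAIL(trail)) {
            return 0;
        }
        if (EX_U16_GET_SUPPLEMENTARY(lead, trail) != c) {
            return 0;
        }
    }
    return 1;
}
```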


Generated on Wed May 18 17:29:16 2005 for ICU 2.8 by  doxygen 1.4.2