
uset.h File Reference

C API: Unicode Set.

#include "unicode/utypes.h"


Data Structures

struct  USerializedSet
 A serialized form of a Unicode set.


Typedefs

typedef struct USet USet
 A UnicodeSet.

typedef struct USerializedSet USerializedSet
 A serialized form of a Unicode set.


Enumerations

enum  { USET_IGNORE_SPACE = 1, USET_CASE_INSENSITIVE = 2, USET_CASE = 2, USET_SERIALIZED_STATIC_ARRAY_CAPACITY = 8 }
 Bitmask values to be passed to uset_openPatternOptions() or uset_applyPattern() taking an option parameter.


Functions

U_CAPI USet *U_EXPORT2 uset_open (UChar32 start, UChar32 end)
 Creates a USet object that contains the range of characters start..end, inclusive.

U_CAPI USet *U_EXPORT2 uset_openPattern (const UChar *pattern, int32_t patternLength, UErrorCode *ec)
 Creates a set from the given pattern.

U_CAPI USet *U_EXPORT2 uset_openPatternOptions (const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *ec)
 Creates a set from the given pattern.

U_CAPI void U_EXPORT2 uset_close (USet *set)
 Disposes of the storage used by a USet object.

U_CAPI int32_t U_EXPORT2 uset_applyPattern (USet *set, const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *status)
 Modifies the set to represent the set specified by the given pattern.

U_CAPI int32_t U_EXPORT2 uset_toPattern (const USet *set, UChar *result, int32_t resultCapacity, UBool escapeUnprintable, UErrorCode *ec)
 Returns a string representation of this set.

U_CAPI void U_EXPORT2 uset_add (USet *set, UChar32 c)
 Adds the given character to the given USet.

U_CAPI void U_EXPORT2 uset_addAll (USet *set, const USet *additionalSet)
 Adds all of the elements in the specified set to this set if they're not already present.

U_CAPI void U_EXPORT2 uset_addRange (USet *set, UChar32 start, UChar32 end)
 Adds the given range of characters to the given USet.

U_CAPI void U_EXPORT2 uset_addString (USet *set, const UChar *str, int32_t strLen)
 Adds the given string to the given USet.

U_CAPI void U_EXPORT2 uset_remove (USet *set, UChar32 c)
 Removes the given character from the given USet.

U_CAPI void U_EXPORT2 uset_removeRange (USet *set, UChar32 start, UChar32 end)
 Removes the given range of characters from the given USet.

U_CAPI void U_EXPORT2 uset_removeString (USet *set, const UChar *str, int32_t strLen)
 Removes the given string from the given USet.

U_CAPI void U_EXPORT2 uset_complement (USet *set)
 Inverts this set.

U_CAPI void U_EXPORT2 uset_clear (USet *set)
 Removes all of the elements from this set.

U_CAPI UBool U_EXPORT2 uset_isEmpty (const USet *set)
 Returns TRUE if the given USet contains no characters and no strings.

U_CAPI UBool U_EXPORT2 uset_contains (const USet *set, UChar32 c)
 Returns TRUE if the given USet contains the given character.

U_CAPI UBool U_EXPORT2 uset_containsRange (const USet *set, UChar32 start, UChar32 end)
 Returns TRUE if the given USet contains all characters c where start <= c && c <= end.

U_CAPI UBool U_EXPORT2 uset_containsString (const USet *set, const UChar *str, int32_t strLen)
 Returns TRUE if the given USet contains the given string.

U_CAPI int32_t U_EXPORT2 uset_size (const USet *set)
 Returns the number of characters and strings contained in the given USet.

U_CAPI int32_t U_EXPORT2 uset_getItemCount (const USet *set)
 Returns the number of items in this set.

U_CAPI int32_t U_EXPORT2 uset_getItem (const USet *set, int32_t itemIndex, UChar32 *start, UChar32 *end, UChar *str, int32_t strCapacity, UErrorCode *ec)
 Returns an item of this set.

U_CAPI int32_t U_EXPORT2 uset_serialize (const USet *set, uint16_t *dest, int32_t destCapacity, UErrorCode *pErrorCode)
 Serializes this set into an array of 16-bit integers.

U_CAPI UBool U_EXPORT2 uset_getSerializedSet (USerializedSet *fillSet, const uint16_t *src, int32_t srcLength)
 Given a serialized array, fill in the given serialized set object.

U_CAPI void U_EXPORT2 uset_setSerializedToOne (USerializedSet *fillSet, UChar32 c)
 Set the USerializedSet to contain the given character (and nothing else).

U_CAPI UBool U_EXPORT2 uset_serializedContains (const USerializedSet *set, UChar32 c)
 Returns TRUE if the given USerializedSet contains the given character.

U_CAPI int32_t U_EXPORT2 uset_getSerializedRangeCount (const USerializedSet *set)
 Returns the number of disjoint ranges of characters contained in the given serialized set.

U_CAPI UBool U_EXPORT2 uset_getSerializedRange (const USerializedSet *set, int32_t rangeIndex, UChar32 *pStart, UChar32 *pEnd)
 Returns a range of characters contained in the given serialized set.


Detailed Description

C API: Unicode Set.

This is a C wrapper around the C++ UnicodeSet class.


Typedef Documentation

typedef struct USerializedSet USerializedSet
 

A serialized form of a Unicode set.

Limited manipulations are possible directly on a serialized set. See below.

ICU 2.4

typedef struct USet USet
 

A UnicodeSet.

Use the uset_* API to manipulate. Create with uset_open*, and destroy with uset_close.

ICU 2.4


Enumeration Type Documentation

anonymous enum
 

Bitmask values to be passed to uset_openPatternOptions() or uset_applyPattern() taking an option parameter.

ICU 2.4

Enumeration values:
USET_IGNORE_SPACE  Ignore white space within patterns unless quoted or escaped.

ICU 2.4

USET_CASE_INSENSITIVE  Enable case insensitive matching.

E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'.

ICU 2.4

USET_CASE  Bitmask for UnicodeSet::closeOver() indicating letter case.

This may be ORed together with other selectors.

For internal use only.

USET_SERIALIZED_STATIC_ARRAY_CAPACITY  Enough for any single-code point set.

For internal use only.
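
As an illustration of how the public option values combine, the following minimal sketch ORs them into the uint32_t bitmask taken by uset_openPatternOptions() and uset_applyPattern(). The SKETCH_* names and both helper functions are hypothetical stand-ins that mirror the values listed above so the snippet compiles without the ICU headers; real code would use USET_IGNORE_SPACE and USET_CASE_INSENSITIVE from unicode/uset.h.

```c
#include <stdint.h>

/* Local mirror of the public option values listed above (an assumption made
   so the sketch builds without ICU; not part of the ICU API). */
enum {
    SKETCH_IGNORE_SPACE     = 1,
    SKETCH_CASE_INSENSITIVE = 2
};

/* Combine the requested options into one bitmask. */
static uint32_t sketch_make_options(int ignoreSpace, int caseInsensitive) {
    uint32_t options = 0;
    if (ignoreSpace) {
        options |= SKETCH_IGNORE_SPACE;
    }
    if (caseInsensitive) {
        options |= SKETCH_CASE_INSENSITIVE;
    }
    return options;
}

/* Test whether a given option bit is present in a bitmask. */
static int sketch_has_option(uint32_t options, uint32_t flag) {
    return (options & flag) != 0;
}
```

Because the two values occupy distinct bits, requesting both yields the mask 3, and each option can be tested independently.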


Function Documentation

U_CAPI void U_EXPORT2 uset_add (USet *set, UChar32 c)
 

Adds the given character to the given USet.

After this call, uset_contains(set, c) will return TRUE.

Parameters:
set the object to which to add the character
c the character to add
ICU 2.4

U_CAPI void U_EXPORT2 uset_addAll (USet *set, const USet *additionalSet)
 

Adds all of the elements in the specified set to this set if they're not already present.

This operation effectively modifies this set so that its value is the union of the two sets. The behavior of this operation is unspecified if the specified collection is modified while the operation is in progress.

Parameters:
set the object to which to add the set
additionalSet the source set whose elements are to be added to this set.
ICU 2.6

U_CAPI void U_EXPORT2 uset_addRange (USet *set, UChar32 start, UChar32 end)
 

Adds the given range of characters to the given USet.

After this call, uset_containsRange(set, start, end) will return TRUE.

Parameters:
set the object to which to add the range
start the first character of the range to add, inclusive
end the last character of the range to add, inclusive
ICU 2.2

U_CAPI void U_EXPORT2 uset_addString (USet *set, const UChar *str, int32_t strLen)
 

Adds the given string to the given USet.

After this call, uset_containsString(set, str, strLen) will return TRUE.

Parameters:
set the object to which to add the string
str the string to add
strLen the length of the string, or -1 if null terminated.
ICU 2.4
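
The (str, strLen) convention used by the string functions can be illustrated with a small helper. This is only a sketch: SketchUChar is a hypothetical stand-in for ICU's 16-bit UChar so the snippet compiles without unicode/utypes.h, and sketch_resolve_length() is not an ICU function. The documentation specifies only -1 for "null terminated"; treating every negative value the same way is a choice made here.

```c
#include <stdint.h>

/* Stand-in for ICU's UChar, a 16-bit code unit (assumption for this sketch). */
typedef uint16_t SketchUChar;

/* Resolve a (str, strLen) pair to an explicit length in code units.
   A negative strLen means the string is terminated by a 0 code unit,
   which is not counted in the result. */
static int32_t sketch_resolve_length(const SketchUChar *str, int32_t strLen) {
    int32_t n = 0;
    if (strLen >= 0) {
        return strLen;          /* caller supplied an explicit length */
    }
    while (str[n] != 0) {       /* count up to the terminator */
        ++n;
    }
    return n;
}
```

An explicit length lets a caller pass a substring of a larger buffer without copying it, while -1 is convenient for string constants.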

U_CAPI int32_t U_EXPORT2 uset_applyPattern (USet *set, const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *status)
 

Modifies the set to represent the set specified by the given pattern.

See the UnicodeSet class description for the syntax of the pattern language. See also the User Guide chapter about UnicodeSet. Empties the set passed before applying the pattern.

Parameters:
set The set to which the pattern is to be applied.
pattern A pointer to UChar string specifying what characters are in the set. The character at pattern[0] must be a '['.
patternLength The length of the UChar string. -1 if NUL terminated.
options A bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
status Returns an error if the pattern cannot be parsed.
Returns:
Upon successful parse, the value is the index of the character after the closing ']' of the parsed pattern. If the status code indicates failure, the return value is the index of the error in the source.
ICU 2.8

U_CAPI void U_EXPORT2 uset_clear (USet *set)
 

Removes all of the elements from this set.

This set will be empty after this call returns.

Parameters:
set the set
ICU 2.4

U_CAPI void U_EXPORT2 uset_close (USet *set)
 

Disposes of the storage used by a USet object.

This function should be called exactly once for objects returned by uset_open().

Parameters:
set the object to dispose of
ICU 2.4

U_CAPI void U_EXPORT2 uset_complement (USet *set)
 

Inverts this set.

This operation modifies this set so that its value is its complement. This operation does not affect the multicharacter strings, if any.

Parameters:
set the set
ICU 2.4

U_CAPI UBool U_EXPORT2 uset_contains (const USet *set, UChar32 c)
 

Returns TRUE if the given USet contains the given character.

Parameters:
set the set
c The codepoint to check for within the set
Returns:
true if set contains c
ICU 2.4

U_CAPI UBool U_EXPORT2 uset_containsRange (const USet *set, UChar32 start, UChar32 end)
 

Returns TRUE if the given USet contains all characters c where start <= c && c <= end.

Parameters:
set the set
start the first character of the range to test, inclusive
end the last character of the range to test, inclusive
Returns:
TRUE if set contains the range
ICU 2.2

U_CAPI UBool U_EXPORT2 uset_containsString (const USet *set, const UChar *str, int32_t strLen)
 

Returns TRUE if the given USet contains the given string.

Parameters:
set the set
str the string
strLen the length of the string or -1 if null terminated.
Returns:
true if set contains str
ICU 2.4

U_CAPI int32_t U_EXPORT2 uset_getItem (const USet *set, int32_t itemIndex, UChar32 *start, UChar32 *end, UChar *str, int32_t strCapacity, UErrorCode *ec)
 

Returns an item of this set.

An item is either a range of characters or a single multicharacter string.

Parameters:
set the set
itemIndex a non-negative integer in the range 0.. uset_getItemCount(set)-1
start pointer to variable to receive first character in range, inclusive
end pointer to variable to receive last character in range, inclusive
str buffer to receive the string, may be NULL
strCapacity capacity of str, or 0 if str is NULL
ec error code
Returns:
the length of the string (>= 2), or 0 if the item is a range, in which case it is the range *start..*end, or -1 if itemIndex is out of range
ICU 2.4
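
The three-way return convention of uset_getItem() can be made explicit with a small classifier. This is a sketch only: SketchItemKind and sketch_classify_item() are hypothetical names, not part of ICU, and the function takes the value returned by uset_getItem() rather than calling ICU itself. Values the documentation does not describe (such as 1) are treated here as "no item".

```c
#include <stdint.h>

typedef enum {
    SKETCH_ITEM_NONE,   /* -1 (itemIndex out of range) or an undocumented value */
    SKETCH_ITEM_RANGE,  /* 0: the item is the range *start..*end                */
    SKETCH_ITEM_STRING  /* >= 2: the item is a string of that length            */
} SketchItemKind;

/* Interpret the int32_t returned by uset_getItem(). */
static SketchItemKind sketch_classify_item(int32_t getItemResult) {
    if (getItemResult == 0) {
        return SKETCH_ITEM_RANGE;
    }
    if (getItemResult >= 2) {
        return SKETCH_ITEM_STRING;
    }
    return SKETCH_ITEM_NONE;
}
```

A caller iterating from 0 to uset_getItemCount(set)-1 would read *start and *end for a range item and the str buffer for a string item.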

U_CAPI int32_t U_EXPORT2 uset_getItemCount (const USet *set)
 

Returns the number of items in this set.

An item is either a range of characters or a single multicharacter string.

Parameters:
set the set
Returns:
a non-negative integer counting the character ranges and/or strings contained in set
ICU 2.4

U_CAPI UBool U_EXPORT2 uset_getSerializedRange (const USerializedSet *set, int32_t rangeIndex, UChar32 *pStart, UChar32 *pEnd)
 

Returns a range of characters contained in the given serialized set.

Parameters:
set the serialized set
rangeIndex a non-negative integer in the range 0.. uset_getSerializedRangeCount(set)-1
pStart pointer to variable to receive first character in range, inclusive
pEnd pointer to variable to receive last character in range, inclusive
Returns:
true if rangeIndex is valid, otherwise false
ICU 2.4

U_CAPI int32_t U_EXPORT2 uset_getSerializedRangeCount (const USerializedSet *set)
 

Returns the number of disjoint ranges of characters contained in the given serialized set.

Ignores any strings contained in the set.

Parameters:
set the serialized set
Returns:
a non-negative integer counting the character ranges contained in set
ICU 2.4

U_CAPI UBool U_EXPORT2 uset_getSerializedSet (USerializedSet *fillSet, const uint16_t *src, int32_t srcLength)
 

Given a serialized array, fill in the given serialized set object.

Parameters:
fillSet pointer to result
src pointer to start of array
srcLength length of array
Returns:
true if the given array is valid, otherwise false
ICU 2.4

U_CAPI UBool U_EXPORT2 uset_isEmpty (const USet *set)
 

Returns TRUE if the given USet contains no characters and no strings.

Parameters:
set the set
Returns:
true if set is empty
ICU 2.4

U_CAPI USet* U_EXPORT2 uset_open (UChar32 start, UChar32 end)
 

Creates a USet object that contains the range of characters start..end, inclusive.

Parameters:
start first character of the range, inclusive
end last character of the range, inclusive
Returns:
a newly created USet. The caller must call uset_close() on it when done.
ICU 2.4

U_CAPI USet* U_EXPORT2 uset_openPattern (const UChar *pattern, int32_t patternLength, UErrorCode *ec)
 

Creates a set from the given pattern.

See the UnicodeSet class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
patternLength the length of the pattern, or -1 if null terminated
ec the error code
ICU 2.4

U_CAPI USet* U_EXPORT2 uset_openPatternOptions (const UChar *pattern, int32_t patternLength, uint32_t options, UErrorCode *ec)
 

Creates a set from the given pattern.

See the UnicodeSet class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
patternLength the length of the pattern, or -1 if null terminated
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
ec the error code
ICU 2.4

U_CAPI void U_EXPORT2 uset_remove (USet *set, UChar32 c)
 

Removes the given character from the given USet.

After this call, uset_contains(set, c) will return FALSE.

Parameters:
set the object from which to remove the character
c the character to remove
ICU 2.4

U_CAPI void U_EXPORT2 uset_removeRange (USet *set, UChar32 start, UChar32 end)
 

Removes the given range of characters from the given USet.

After this call, uset_containsRange(set, start, end) will return FALSE.

Parameters:
set the object from which to remove the range
start the first character of the range to remove, inclusive
end the last character of the range to remove, inclusive
ICU 2.2

U_CAPI void U_EXPORT2 uset_removeString (USet *set, const UChar *str, int32_t strLen)
 

Removes the given string from the given USet.

After this call, uset_containsString(set, str, strLen) will return FALSE.

Parameters:
set the object from which to remove the string
str the string to remove
strLen the length of the string, or -1 if null terminated.
ICU 2.4

U_CAPI int32_t U_EXPORT2 uset_serialize (const USet *set, uint16_t *dest, int32_t destCapacity, UErrorCode *pErrorCode)
 

Serializes this set into an array of 16-bit integers.

Serialization (currently) only records the characters in the set; multicharacter strings are ignored.

The array has the following format (each line is one 16-bit integer):

    length = (n+2*m) | (m!=0?0x8000:0)
    bmpLength = n; present if m!=0
    bmp[0]
    bmp[1]
    ...
    bmp[n-1]
    supp-high[0]
    supp-low[0]
    supp-high[1]
    supp-low[1]
    ...
    supp-high[m-1]
    supp-low[m-1]

The array starts with a header. After the header are n bmp code points, then m supplementary code points. Either n or m or both may be zero. n+2*m is always <= 0x7FFF.

If there are no supplementary characters (if m==0) then the header is one 16-bit integer, 'length', with value n.

If there are supplementary characters (if m!=0) then the header is two 16-bit integers. The first, 'length', has value (n+2*m)|0x8000. The second, 'bmpLength', has value n.

After the header the code points are stored in ascending order. Supplementary code points are stored as most significant 16 bits followed by least significant 16 bits.
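
The storage of a supplementary code point as "most significant 16 bits followed by least significant 16 bits" can be sketched as below. As the text reads, this is a plain high/low split of the 21-bit value, which differs from a UTF-16 surrogate pair. The function names are hypothetical and not part of ICU; uint32_t is used instead of UChar32 only to keep the bit operations simple.

```c
#include <stdint.h>

/* Split a supplementary code point (0x10000..0x10FFFF) into two 16-bit
   integers: the most significant 16 bits first, then the least significant. */
static void sketch_split_supp(uint32_t c, uint16_t *high, uint16_t *low) {
    *high = (uint16_t)(c >> 16);
    *low  = (uint16_t)(c & 0xFFFF);
}

/* Reassemble the code point from the two stored 16-bit integers. */
static uint32_t sketch_join_supp(uint16_t high, uint16_t low) {
    return ((uint32_t)high << 16) | low;
}
```

For example, U+1F600 is stored as 0x0001 followed by 0xF600, whereas its UTF-16 surrogate pair would be 0xD83D 0xDE00.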

Parameters:
set the set
dest pointer to buffer of destCapacity 16-bit integers. May be NULL only if destCapacity is zero.
destCapacity size of dest, or zero. Must not be negative.
pErrorCode pointer to the error code. Will be set to U_INDEX_OUTOFBOUNDS_ERROR if n+2*m > 0x7FFF. Will be set to U_BUFFER_OVERFLOW_ERROR if n+2*m+(m!=0?2:1) > destCapacity.
Returns:
the total length of the serialized format, including the header, that is, n+2*m+(m!=0?2:1), or 0 on error other than U_BUFFER_OVERFLOW_ERROR.
ICU 2.4
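
To make the header rules described for uset_serialize() concrete, the following sketch reads the header of a serialized array and recovers n, m, and the total length n+2*m+(m!=0?2:1). It is an illustration written from the format description only, not ICU's implementation; SketchSerializedHeader and sketch_read_header() are hypothetical names, and the consistency checks are choices made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t bmpCount;    /* n: number of 16-bit BMP entries         */
    int32_t suppCount;   /* m: number of supplementary entries      */
    int32_t headerSize;  /* 1 if the flag bit is clear, otherwise 2 */
    int32_t totalLength; /* n + 2*m + headerSize                    */
} SketchSerializedHeader;

/* Fills *out from the header of src. Returns 1 if the header is consistent
   and the whole serialized form fits in srcLength 16-bit integers, else 0. */
static int sketch_read_header(const uint16_t *src, int32_t srcLength,
                              SketchSerializedHeader *out) {
    int32_t length, n;
    if (src == NULL || out == NULL || srcLength < 1) {
        return 0;
    }
    length = src[0] & 0x7FFF;            /* n + 2*m */
    if (src[0] & 0x8000) {               /* flag set: bmpLength follows */
        if (srcLength < 2) {
            return 0;
        }
        n = src[1];
        if (n > length || ((length - n) & 1) != 0) {
            return 0;                    /* 2*m must be even and non-negative */
        }
        out->headerSize = 2;
    } else {                             /* no supplementary entries */
        n = length;
        out->headerSize = 1;
    }
    out->bmpCount = n;
    out->suppCount = (length - n) / 2;
    out->totalLength = length + out->headerSize;
    return out->totalLength <= srcLength;
}
```

The totalLength computed here corresponds to the value uset_serialize() is documented to return, so the same arithmetic indicates how large dest must be.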

U_CAPI UBool U_EXPORT2 uset_serializedContains (const USerializedSet *set, UChar32 c)
 

Returns TRUE if the given USerializedSet contains the given character.

Parameters:
set the serialized set
c The codepoint to check for within the set
Returns:
true if set contains c
ICU 2.4

U_CAPI void U_EXPORT2 uset_setSerializedToOne (USerializedSet *fillSet, UChar32 c)
 

Set the USerializedSet to contain the given character (and nothing else).

Parameters:
fillSet pointer to result
c The codepoint to set
ICU 2.4

U_CAPI int32_t U_EXPORT2 uset_size (const USet *set)
 

Returns the number of characters and strings contained in the given USet.

Parameters:
set the set
Returns:
a non-negative integer counting the characters and strings contained in set
ICU 2.4

U_CAPI int32_t U_EXPORT2 uset_toPattern (const USet *set, UChar *result, int32_t resultCapacity, UBool escapeUnprintable, UErrorCode *ec)
 

Returns a string representation of this set.

If the result of calling this function is passed to uset_openPattern(), it will produce another set that is equal to this one.

Parameters:
set the set
result the string to receive the rules, may be NULL
resultCapacity the capacity of result, may be 0 if result is NULL
escapeUnprintable if TRUE then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E.
ec error code.
Returns:
length of string, possibly larger than resultCapacity
ICU 2.4


Generated on Wed Jul 28 09:15:55 2004 for ICU 2.8 by doxygen 1.3.7