net.sf.saxon.codenorm

Class Normalizer


public class Normalizer
extends java.lang.Object

Implements Unicode Normalization Forms C, D, KC, and KD. Copyright (c) 1991-2005 Unicode, Inc. For terms of use, see http://www.unicode.org/terms_of_use.html. For documentation, see UAX #15.
The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information here.
Author:
Mark Davis. Updates for supplementary code points: Vladimir Weinstein & Markus Scherer. Modified to remove dependency on ICU code: Michael Kay.

Field Summary

static byte
C
Selector for Normalization Form C (NFC)
static byte
D
Selector for Normalization Form D (NFD)
static byte
KC
Selector for Normalization Form KC (NFKC)
static byte
KD
Selector for Normalization Form KD (NFKD)
static byte
NO_ACTION
Selector indicating that no normalization is applied

Constructor Summary

Normalizer(CharSequence formCS)
Create a normalizer for a given form, expressed as a character string.
Normalizer(byte form)
Create a normalizer for a given form.

Method Summary

CharSequence
normalize(CharSequence source)
Normalizes text according to the chosen form.

Field Details

C

public static final byte C
Selector for Normalization Form C (NFC): canonical decomposition followed by canonical composition.
Field Value:
2

D

public static final byte D
Selector for Normalization Form D (NFD): canonical decomposition.
Field Value:
0

KC

public static final byte KC
Selector for Normalization Form KC (NFKC): compatibility decomposition followed by canonical composition.
Field Value:
3

KD

public static final byte KD
Selector for Normalization Form KD (NFKD): compatibility decomposition.
Field Value:
1

NO_ACTION

public static final byte NO_ACTION
Selector indicating that no normalization is applied.
Field Value:
8
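
The four form selectors take the values 0 to 3, which is consistent with a two-bit encoding: one bit for compatibility (K) mapping and one bit for composition. The sketch below illustrates that reading using local copies of the documented values. The mask names and the helper class FormSelectorBits are assumptions made for illustration and are not part of this class's documented API; NO_ACTION (8) falls outside the scheme and is rejected here.

```java
// Illustrative sketch only: local copies of the documented selector values.
final class FormSelectorBits {
    static final byte D = 0, KD = 1, C = 2, KC = 3;

    // Assumed mask names; the documentation above does not define them.
    static final int COMPATIBILITY_MASK = 1;
    static final int COMPOSITION_MASK = 2;

    // True for KD and KC, which apply compatibility mappings.
    static boolean isCompatibility(byte form) {
        return (form & COMPATIBILITY_MASK) != 0;
    }

    // True for C and KC, which recompose after decomposing.
    static boolean isComposed(byte form) {
        return (form & COMPOSITION_MASK) != 0;
    }

    // Builds the conventional form name ("NFC", "NFD", "NFKC", "NFKD").
    static String describe(byte form) {
        if (form < D || form > KC) {
            throw new IllegalArgumentException("Not one of D, KD, C, KC: " + form);
        }
        return "NF" + (isCompatibility(form) ? "K" : "") + (isComposed(form) ? "C" : "D");
    }
}
```

For example, FormSelectorBits.describe(FormSelectorBits.KC) yields "NFKC", since both bits are set in the value 3.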

Constructor Details

Normalizer

public Normalizer(CharSequence formCS)
            throws XPathException
Create a normalizer for a given form, expressed as a character string.
Parameters:
formCS - the normalization form required: for example "NFC" or "NFD"
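
One way to picture this constructor is as a lookup from a form name to one of the byte selectors. The following is a minimal sketch under that assumption, not the actual implementation: the class FormNames and the method selectorFor are hypothetical, only the four standard names are handled, and an IllegalArgumentException stands in for the XPathException that the real constructor declares.

```java
// Hypothetical helper: maps a form name to the documented selector value.
final class FormNames {
    // Local copies of the documented field values.
    static final byte D = 0, KD = 1, C = 2, KC = 3;

    static byte selectorFor(CharSequence formCS) {
        switch (formCS.toString()) {
            case "NFC":  return C;
            case "NFD":  return D;
            case "NFKC": return KC;
            case "NFKD": return KD;
            default:
                // The real constructor declares XPathException; the conditions
                // under which it is thrown are not stated in this documentation.
                throw new IllegalArgumentException("Unknown normalization form: " + formCS);
        }
    }
}
```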

Normalizer

public Normalizer(byte form)
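Create a normalizer for a given form.
Parameters:
form - the normalization form required: one of the selector constants C, D, KC, KD, or NO_ACTION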

Method Details

normalize

public CharSequence normalize(CharSequence source)
Normalizes text according to the chosen form.
Parameters:
source - the original text, unnormalized
Returns:
the resulting normalized text
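
To show what the four forms do to actual text, the sketch below uses the JDK's own java.text.Normalizer rather than this class, so that it runs without Saxon on the classpath. The wrapper class NormalizeDemo and its method toForm are hypothetical names; the normalization forms themselves are defined by UAX #15, so a conforming implementation should produce the same strings.

```java
import java.text.Normalizer;

// Illustration with the JDK normalizer, not net.sf.saxon.codenorm.Normalizer.
final class NormalizeDemo {
    // Normalizes the source text to the requested Unicode normalization form.
    static String toForm(CharSequence source, Normalizer.Form form) {
        return Normalizer.normalize(source, form);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        String ligature = "\uFB01";    // LATIN SMALL LIGATURE FI

        // NFC composes the pair into the single code point U+00E9.
        System.out.println(toForm(decomposed, Normalizer.Form.NFC).length());  // 1
        // NFD splits U+00E9 back into base letter plus combining mark.
        System.out.println(toForm("\u00E9", Normalizer.Form.NFD).length());    // 2
        // NFKC applies the compatibility mapping, giving the two letters "fi".
        System.out.println(toForm(ligature, Normalizer.Form.NFKC));            // fi
        // NFC leaves the ligature untouched: it has no canonical decomposition.
        System.out.println(toForm(ligature, Normalizer.Form.NFC).equals(ligature)); // true
    }
}
```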