Package org.apache.solr.analysis

Interface Summary
TokenFilterFactory A TokenFilterFactory creates a TokenFilter to transform one TokenStream into another.
TokenizerFactory A TokenizerFactory breaks up a stream of characters into tokens.
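
The division of labor between the two interfaces can be illustrated with a minimal, self-contained sketch. It does not use the Solr or Lucene classes: `SketchTokenizerFactory`, `SketchTokenFilterFactory`, and `FactorySketch` are hypothetical stand-ins, and a token stream is modeled as a plain `Iterator<String>`. The real interfaces may differ in method names and signatures.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a tokenizer factory: characters in, tokens out.
interface SketchTokenizerFactory {
    Iterator<String> create(Reader input) throws IOException;
}

// Hypothetical stand-in for a token filter factory: one token stream in, another out.
interface SketchTokenFilterFactory {
    Iterator<String> create(Iterator<String> input);
}

class FactorySketch {
    // Reads the whole input and splits it on runs of whitespace.
    static final SketchTokenizerFactory WHITESPACE = input -> {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            text.append((char) c);
        }
        List<String> tokens = new ArrayList<>();
        for (String t : text.toString().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens.iterator();
    };

    // Wraps an existing stream and lower-cases each token as it passes through.
    static final SketchTokenFilterFactory LOWERCASE = input -> new Iterator<String>() {
        @Override public boolean hasNext() { return input.hasNext(); }
        @Override public String next() { return input.next().toLowerCase(Locale.ROOT); }
    };

    // Chains the tokenizer and the filter, then collects the resulting tokens.
    static List<String> analyze(String text) {
        try {
            Iterator<String> stream = LOWERCASE.create(WHITESPACE.create(new StringReader(text)));
            List<String> out = new ArrayList<>();
            while (stream.hasNext()) {
                out.add(stream.next());
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the tokenizer is the only component that sees raw characters; every later stage consumes and produces tokens, which is what lets filters be stacked in any order.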
 

Class Summary
BaseTokenFilterFactory Simple abstract implementation that handles init arg processing.
BaseTokenizerFactory Simple abstract implementation that handles init arg processing.
BufferedTokenStream Handles input and output buffering of TokenStream.
EdgeNGramTokenizerFactory Creates new instances of EdgeNGramTokenizer.
EnglishPorterFilterFactory  
HTMLStripReader A Reader that wraps another reader and attempts to strip out HTML constructs.
HTMLStripStandardTokenizerFactory  
HTMLStripWhitespaceTokenizerFactory  
HyphenatedWordsFilter When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.
HyphenatedWordsFilterFactory Factory for HyphenatedWordsFilter.
ISOLatin1AccentFilterFactory Factory for ISOLatin1AccentFilter.
KeywordTokenizerFactory  
LengthFilter  
LengthFilterFactory  
LetterTokenizerFactory  
LowerCaseFilterFactory  
LowerCaseTokenizerFactory  
NGramTokenizerFactory Creates new instances of NGramTokenizer.
PatternReplaceFilter A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string.
PatternReplaceFilterFactory  
PatternTokenizerFactory This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
PhoneticFilter Create tokens for phonetic matches.
PhoneticFilterFactory Create tokens based on phonetic encoders (http://jakarta.apache.org/commons/codec/api-release/org/apache/commons/codec/language/package-summary.html). This takes two arguments: "encoder" (required), one of "DoubleMetaphone", "Metaphone", "Soundex", "RefinedSoundex"; and "inject" (default=true), which adds tokens to the stream with the offset=0.
PorterStemFilterFactory  
RemoveDuplicatesTokenFilter A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.
RemoveDuplicatesTokenFilterFactory  
SnowballPorterFilterFactory Factory for SnowballFilters, with configurable language. Browsing the code, SnowballFilter uses reflection to adapt to Lucene...
SolrAnalyzer  
StandardFilterFactory  
StandardTokenizerFactory  
StopFilterFactory  
SynonymFilter SynonymFilter handles multi-token synonyms with variable position increment offsets.
SynonymFilterFactory  
SynonymMap Mapping rules for use with SynonymFilter.
TokenizerChain  
TrimFilter Trims leading and trailing whitespace from Tokens in the stream.
TrimFilterFactory  
WhitespaceTokenizerFactory  
WordDelimiterFilterFactory  
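
The behavior described above for TrimFilter, PatternReplaceFilter, and RemoveDuplicatesTokenFilter can be sketched in plain Java. This is an illustration under stated assumptions, not the Solr implementation: `FilterSketch` and `Tok` are hypothetical names, a token is reduced to its term text plus a position, and the pattern sketch replaces every match (the real filter may offer other options).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FilterSketch {
    // Hypothetical token: term text plus its position in the stream.
    static final class Tok {
        final String term;
        final int position;
        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // TrimFilter idea: strip leading and trailing whitespace from each term.
    static List<Tok> trim(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(t.term.trim(), t.position));
        }
        return out;
    }

    // PatternReplaceFilter idea: replace every match of the pattern in each term.
    static List<Tok> patternReplace(List<Tok> in, Pattern pattern, String replacement) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(pattern.matcher(t.term).replaceAll(replacement), t.position));
        }
        return out;
    }

    // RemoveDuplicatesTokenFilter idea: drop a token whose position and term
    // text both equal those of the previous token in the stream.
    static List<Tok> removeDuplicates(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        Tok previous = null;
        for (Tok t : in) {
            boolean duplicate = previous != null
                    && previous.position == t.position
                    && previous.term.equals(t.term);
            if (!duplicate) {
                out.add(t);
            }
            previous = t;
        }
        return out;
    }

    // Convenience helper: extract just the term text.
    static List<String> terms(List<Tok> in) {
        List<String> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t.term);
        }
        return out;
    }
}
```

Applying `trim`, then `patternReplace` with the pattern `-` and an empty replacement, then `removeDuplicates` to the tokens `" a-b "` (position 0), `"ab"` (position 0), and `"c"` (position 1) leaves the terms `ab` and `c`. Duplicate removal is placed last because the earlier stages can turn distinct terms into identical ones.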
 



Copyright © 2006 - 2008 The Apache Software Foundation