org.apache.solr.analysis
Interface TokenizerFactory

All Known Implementing Classes:
BaseTokenizerFactory, EdgeNGramTokenizerFactory, HTMLStripStandardTokenizerFactory, HTMLStripWhitespaceTokenizerFactory, KeywordTokenizerFactory, LetterTokenizerFactory, LowerCaseTokenizerFactory, NGramTokenizerFactory, PatternTokenizerFactory, StandardTokenizerFactory, WhitespaceTokenizerFactory

public interface TokenizerFactory

A TokenizerFactory breaks up a stream of characters into tokens.

TokenizerFactories are registered for FieldTypes with the IndexSchema through the schema.xml file.

Example schema.xml entry that registers a TokenizerFactory implementation to tokenize fields of type "cool":

  <fieldtype name="cool" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      ...
    </analyzer>
  </fieldtype>

A single instance of any registered TokenizerFactory is created via the default constructor and is reused for each FieldType.
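The lifecycle described above can be sketched as follows: the factory is built once through its default constructor, `init` is called once, and the same instance is then reused. Everything in this sketch is hypothetical.

* `SimpleTokenizerFactory` stands in for TokenizerFactory. It returns a `List<String>` in place of Lucene's `TokenStream`, so the code compiles without Lucene.
* `SketchSchemaLoader` is an illustrative loader. It is not Solr's IndexSchema code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TokenizerFactory; List<String> replaces TokenStream
// so the sketch compiles without Lucene on the classpath.
interface SimpleTokenizerFactory {
    void init(Map<String, String> args);
    Map<String, String> getArgs();
    List<String> create(Reader input);
}

// Emits the whole input as a single token, in the spirit of a keyword tokenizer.
class KeywordSketchFactory implements SimpleTokenizerFactory {
    static int instances = 0; // counts constructor calls to show the instance is reused

    private Map<String, String> args = Collections.emptyMap();

    public KeywordSketchFactory() {
        instances++;
    }

    public void init(Map<String, String> args) {
        this.args = args;
    }

    public Map<String, String> getArgs() {
        return args;
    }

    public List<String> create(Reader input) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Collections.singletonList(sb.toString());
    }
}

// Hypothetical loader: builds one instance per schema declaration through the
// default constructor, calls init exactly once, and returns it for reuse.
class SketchSchemaLoader {
    static SimpleTokenizerFactory load(Class<? extends SimpleTokenizerFactory> type,
                                       Map<String, String> args) {
        try {
            SimpleTokenizerFactory factory = type.getDeclaredConstructor().newInstance();
            factory.init(args);
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }
}
```

Calling `create` repeatedly on the loaded factory never constructs a second instance. One instance therefore serves many calls, so it seems prudent for an implementation to avoid keeping per-call state in fields.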

Version:
$Id: TokenizerFactory.java 472574 2006-11-08 18:25:52Z yonik $
Author:
yonik

Method Summary
 org.apache.lucene.analysis.TokenStream create(java.io.Reader input)
          Creates a TokenStream from the specified input.
 java.util.Map<java.lang.String,java.lang.String> getArgs()
          Accessor method for reporting the args used to initialize this factory.
 void init(java.util.Map<java.lang.String,java.lang.String> args)
          init will be called just once, immediately after creation.
 

Method Detail

init

void init(java.util.Map<java.lang.String,java.lang.String> args)
init will be called just once, immediately after creation.

The args are user-level initialization parameters that may be specified when declaring the factory in schema.xml.
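`init` is a natural place to read and validate these parameters. The sketch below makes two assumptions:

* It uses a hypothetical parameter named `maxTokenLength` with an arbitrary default of 255. Neither the name nor the default comes from Solr.
* The values arrive as strings, as the `Map<String, String>` signature implies.

```java
import java.util.Map;

// Sketch of the init() half of a factory. "maxTokenLength" and its default
// of 255 are hypothetical, chosen only for illustration.
class InitSketchFactory {
    static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

    // Called once, immediately after the default constructor.
    public void init(Map<String, String> args) {
        String raw = args.get("maxTokenLength");
        if (raw != null) {
            // Integer.parseInt throws NumberFormatException for non-numeric input.
            int parsed = Integer.parseInt(raw.trim());
            if (parsed <= 0) {
                throw new IllegalArgumentException("maxTokenLength must be positive: " + raw);
            }
            maxTokenLength = parsed;
        }
        // When the parameter is absent, the default is kept.
    }

    public int getMaxTokenLength() {
        return maxTokenLength;
    }
}
```

Validating in `init` surfaces a bad schema.xml value once, when the factory is created, rather than on every call to `create`.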


getArgs

java.util.Map<java.lang.String,java.lang.String> getArgs()
Accessor method for reporting the args used to initialize this factory.

Implementations are strongly encouraged to return the contents of the Map passed to the init method.
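One way to follow this recommendation is to store the map once in `init` and return it from `getArgs`. The abstract class below is a hypothetical helper, not the actual BaseTokenizerFactory source. It copies the map and exposes a read-only view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstract base: stores the init() map so every subclass
// reports its configuration through getArgs() without extra code.
abstract class ArgsReportingFactory {
    private Map<String, String> args = Collections.emptyMap();

    public void init(Map<String, String> args) {
        // Copy, then wrap read-only: later changes to the caller's map, or
        // attempts to mutate the returned view, cannot alter what is reported.
        this.args = Collections.unmodifiableMap(new HashMap<>(args));
    }

    public Map<String, String> getArgs() {
        return args;
    }
}

// A concrete subclass inherits init/getArgs unchanged.
class ExampleReportingFactory extends ArgsReportingFactory { }
```

Returning a copy still reports the same contents that were passed to `init`. It also keeps the reported configuration stable if the schema loader reuses or mutates its own map.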


create

org.apache.lucene.analysis.TokenStream create(java.io.Reader input)
Creates a TokenStream from the specified input.
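The `create` contract can be sketched without the Lucene jar. The sketch makes two simplifications:

* It assumes a hypothetical stand-in interface, `SimpleTokenizerFactory`, whose `create` eagerly returns a `List<String>` in place of a lazily consumed `org.apache.lucene.analysis.TokenStream`.
* Its whitespace splitting resembles what WhitespaceTokenizerFactory produces conceptually, but it is not that class's code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TokenizerFactory: List<String> replaces TokenStream
// so the sketch compiles without Lucene on the classpath.
interface SimpleTokenizerFactory {
    void init(Map<String, String> args);
    Map<String, String> getArgs();
    List<String> create(Reader input);
}

// Splits the character stream on whitespace.
class WhitespaceSketchFactory implements SimpleTokenizerFactory {
    private Map<String, String> args = Collections.emptyMap();

    public void init(Map<String, String> args) {
        this.args = args;
    }

    public Map<String, String> getArgs() {
        return args;
    }

    public List<String> create(Reader input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    // Whitespace ends the current token, if one is in progress.
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flush the final token when the input does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For example, `create(new StringReader("hello  solr world"))` yields `[hello, solr, world]`. Runs of whitespace produce no empty tokens.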



Copyright © 2006 - 2008 The Apache Software Foundation