org.apache.solr.analysis
Class HyphenatedWordsFilter
java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.TokenFilter
          extended by org.apache.solr.analysis.HyphenatedWordsFilter
public final class HyphenatedWordsFilter
- extends org.apache.lucene.analysis.TokenFilter
When plain text is extracted from documents, many words are often hyphenated and broken across two lines.
This is common in documents that use narrow text columns, such as newsletters.
To improve search results, this filter puts words that were hyphenated across a line break back together, so that a query for the whole word can match them.
This filter should be used at indexing time only.
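The joining rule described above can be sketched in plain Java. This is not the Solr source: it assumes tokens arrive as plain strings already split on whitespace, and the class HyphenJoinSketch and its method joinHyphenated are hypothetical names used only for illustration. A token ending in '-' is treated as the first half of a word broken at a line end and is glued to the following token with the hyphen dropped. A word with an internal hyphen such as "well-known" is left alone, and a hyphen at the very end of the input is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Solr or Lucene.
class HyphenJoinSketch {

    // Joins a token ending in '-' with the token that follows it, dropping the hyphen.
    // Plain strings stand in for Lucene Token objects (an assumption of this sketch).
    static List<String> joinHyphenated(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            boolean hasNext = i + 1 < tokens.size();
            if (tok.endsWith("-") && hasNext) {
                // First half of a word broken at a line end: keep the text before the hyphen
                pending.append(tok, 0, tok.length() - 1);
                continue;
            }
            // Either an ordinary token or the last fragment of a broken word
            pending.append(tok);
            out.add(pending.toString());
            pending.setLength(0);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("the", "eco-", "logical", "impact", "of", "well-known", "news-", "letters");
        // prints [the, ecological, impact, of, well-known, newsletters]
        System.out.println(joinHyphenated(in));
    }
}
```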
Example field definition in schema.xml:
- Author:
- Boris Vitez
Fields inherited from class org.apache.lucene.analysis.TokenFilter
    input

Method Summary
    org.apache.lucene.analysis.Token    next()

Methods inherited from class org.apache.lucene.analysis.TokenFilter
    close

Methods inherited from class org.apache.lucene.analysis.TokenStream
    reset

Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

HyphenatedWordsFilter
public HyphenatedWordsFilter(org.apache.lucene.analysis.TokenStream in)
Method Detail

next
public final org.apache.lucene.analysis.Token next()
throws java.io.IOException
- Specified by:
next
in class org.apache.lucene.analysis.TokenStream
- Throws:
java.io.IOException
- See Also:
TokenStream.next()
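next() follows the pull-style contract of TokenStream: each call returns the following token, or null once the input is exhausted, and a TokenFilter reads from the upstream stream held in its inherited input field. The sketch below mirrors that shape without depending on Lucene. SimpleTokenStream, ListTokenStream and HyphenJoiningStream are hypothetical stand-ins that pass String values rather than Lucene Token objects, so offsets and position increments, which a real filter also has to carry over, are ignored here.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for Lucene's TokenStream: next() returns null at end of stream.
interface SimpleTokenStream {
    String next();
}

// Hypothetical source stream backed by a fixed list of pre-split words.
class ListTokenStream implements SimpleTokenStream {
    private final Iterator<String> it;

    ListTokenStream(String... words) {
        this.it = Arrays.asList(words).iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Hypothetical pull-style filter shaped like HyphenatedWordsFilter (not the Solr code).
final class HyphenJoiningStream implements SimpleTokenStream {
    private final SimpleTokenStream input; // upstream stream, like TokenFilter's inherited 'input'

    HyphenJoiningStream(SimpleTokenStream input) {
        this.input = input;
    }

    public String next() {
        String tok = input.next();
        if (tok == null) {
            return null; // end of stream
        }
        StringBuilder joined = new StringBuilder();
        while (tok.endsWith("-")) {
            String following = input.next();
            if (following == null) {
                // Trailing hyphen at end of input: emit what was read, hyphen intact
                return joined.append(tok).toString();
            }
            joined.append(tok, 0, tok.length() - 1); // drop the line-break hyphen
            tok = following;
        }
        return joined.append(tok).toString();
    }

    public static void main(String[] args) {
        SimpleTokenStream ts = new HyphenJoiningStream(
                new ListTokenStream("news-", "letter", "columns", "are", "nar-", "row"));
        // prints newsletter, columns, are, narrow on separate lines
        for (String t = ts.next(); t != null; t = ts.next()) {
            System.out.println(t);
        }
    }
}
```

Reading ahead only when a token ends in '-' keeps the filter streaming: it holds at most the fragments of one broken word at a time, and chained fragments such as "a-", "b-", "c" collapse into a single token.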
Copyright © 2006 - 2008 The Apache Software Foundation