org.apache.solr.analysis
Class HyphenatedWordsFilter

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.TokenFilter
          extended by org.apache.solr.analysis.HyphenatedWordsFilter

public final class HyphenatedWordsFilter
extends org.apache.lucene.analysis.TokenFilter

When plain text is extracted from documents, many words are often hyphenated and broken across two lines. This is common in documents with narrow text columns, such as newsletters. To improve search effectiveness, this filter joins hyphenated words that were split across two lines back together. This filter should be used at index time only. Example field definition in schema.xml:
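
A minimal sketch of such a field definition, assuming the companion factory solr.HyphenatedWordsFilterFactory and a whitespace tokenizer. The field type name and the other filters are illustrative choices, not requirements. Following the advice above, the filter appears only in the index-time analyzer:

```xml
<!-- Illustrative sketch: field type name and surrounding filters are assumptions -->
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Joins words split by end-of-line hyphenation; index time only -->
    <filter class="solr.HyphenatedWordsFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```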


Author:
Boris Vitez

Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
HyphenatedWordsFilter(org.apache.lucene.analysis.TokenStream in)
           
 
Method Summary
 org.apache.lucene.analysis.Token next()
           
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HyphenatedWordsFilter

public HyphenatedWordsFilter(org.apache.lucene.analysis.TokenStream in)

Method Detail

next

public final org.apache.lucene.analysis.Token next()
                                            throws java.io.IOException
Specified by:
next in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException
See Also:
TokenStream.next()
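
The joining behavior described in the class comment can be illustrated without Lucene. The following is a simplified sketch on plain strings, not the filter's actual implementation: the class HyphenJoinSketch and its join method are hypothetical names, and it assumes the tokens come from a whitespace tokenizer, so that a word broken across lines arrives as one token ending in a hyphen followed by the remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
final class HyphenJoinSketch {

    // Joins each token that ends in '-' with the token that follows it.
    static List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String token : tokens) {
            if (token.length() > 1 && token.endsWith("-")) {
                // Drop the trailing hyphen and wait for the continuation.
                pending.append(token, 0, token.length() - 1);
            } else {
                // Ordinary token: emit it, prefixed by any pending fragment.
                out.add(pending.append(token).toString());
                pending.setLength(0);
            }
        }
        // Assumption of this sketch: a fragment with no continuation
        // is emitted with its hyphen removed.
        if (pending.length() > 0) {
            out.add(pending.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [ecological, newsletter, layout]
        System.out.println(join(List.of("eco-", "logical", "news-", "letter", "layout")));
    }
}
```

A token consisting of a single hyphen is treated as an ordinary token here. How the real filter handles that case, or a dangling fragment at the end of the stream, is not stated on this page.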


Copyright © 2006 - 2008 The Apache Software Foundation