weka.classifiers.meta.ensembleSelection
Class EnsembleSelectionLibrary

java.lang.Object
  extended by weka.classifiers.EnsembleLibrary
      extended by weka.classifiers.meta.ensembleSelection.EnsembleSelectionLibrary
All Implemented Interfaces:
java.io.Serializable

public class EnsembleSelectionLibrary
extends EnsembleLibrary
implements java.io.Serializable

This class represents an ensemble library, that is, a collection of models that will be combined via the ensemble selection algorithm. It is responsible for tracking all of the unique model specifications in the current library and training them when asked. There are also methods to save and load library model list files.

Version:
$Revision: 1.1 $
Author:
Robert Jung, David Michael
See Also:
Serialized Form

Field Summary
 boolean m_Debug
          Whether we should print debug messages.
 
Fields inherited from class weka.classifiers.EnsembleLibrary
FLAT_FILE_EXTENSION, m_Models, XML_FILE_EXTENSION
 
Constructor Summary
EnsembleSelectionLibrary()
          Creates a default library.
EnsembleSelectionLibrary(java.io.InputStream stream)
          This constructor will create a library from the given XML stream.
EnsembleSelectionLibrary(java.lang.String libraryFileName)
          This constructor will create a library from a model list file given by the file name argument.
EnsembleSelectionLibrary(java.lang.String dir, int seed, int folds, double validationRatio)
          Creates a default library associated with a working directory.
 
Method Summary
 void addWorkingDirectoryListener(java.beans.PropertyChangeListener listener)
          Adds an object to the list of those that wish to be informed when the working directory changes.
 EnsembleLibraryModel createModel(Classifier classifier)
          Creates a library model from the given classifier.
 EnsembleLibraryModel createModel(java.lang.String modelString)
          This method takes a String argument defining a classifier and uses it to create a base Classifier.
 void createWorkingDirectory(java.lang.String dirName)
          Creates the working directory associated with this library.
static java.lang.String getDataDirectoryName(Instances instances)
          Returns the unique name for the set of instances supplied.
 double[][][] getHillclimbPredictions()
          This method will get the predictions for all the models in the ensemble library.
static java.lang.String getInstancesChecksum(Instances instances)
          This method takes an Instances object and returns a checksum of its toString() output - that is, the checksum of the .arff file that would be created if the Instances object were written to the file system as an ARFF file.
 java.lang.String getModelListFile()
          Gets the model list file that holds the list of models in the ensemble library.
 java.util.Set getModelNames()
          This method will return a Set object containing all the String representations of the models.
 java.io.File getWorkingDirectory()
          Gets the working directory of the ensemble library.
 void removeModel(java.lang.String modelKey)
          Removes the model associated with the given String from the model library.
 void setDebug(boolean debug)
          Set debug flag for the library and all its models.
 void setModelListFile(java.lang.String modelListFile)
          Sets the model list file that holds the list of models in the ensemble library.
 void setNumFolds(int numFolds)
          Set the number of folds for cross validation.
 void setValidationRatio(double validationRatio)
          Sets the validation-set ratio.
 void setWorkingDirectory(java.io.File workingDirectory)
          Sets the working directory of the ensemble library.
 Instances trainAll(Instances data, java.lang.String directory, int algorithm)
          This method will iterate through the TreeMap of models and train all models that do not currently exist (are not yet trained).
 
Methods inherited from class weka.classifiers.EnsembleLibrary
addModel, addModel, addPropertyChangeListener, clearModels, getModels, loadLibrary, loadLibrary, loadLibrary, removeModel, saveLibrary, setModels, size
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_Debug

public transient boolean m_Debug
Whether we should print debug messages.

Constructor Detail

EnsembleSelectionLibrary

public EnsembleSelectionLibrary()
Creates a default library. The library should be associated with a working directory.


EnsembleSelectionLibrary

public EnsembleSelectionLibrary(java.lang.String dir,
                                int seed,
                                int folds,
                                double validationRatio)
Creates a default library. The library should be associated with a working directory.

Parameters:
dir - the working directory for the ensemble library
seed - the random seed value
folds - the number of cross-validation folds
validationRatio - the validation-set ratio to use (ignored if the number of folds is > 1)

EnsembleSelectionLibrary

public EnsembleSelectionLibrary(java.lang.String libraryFileName)
This constructor will create a library from a model list file given by the file name argument.

Parameters:
libraryFileName - the library filename

EnsembleSelectionLibrary

public EnsembleSelectionLibrary(java.io.InputStream stream)
This constructor will create a library from the given XML stream.

Parameters:
stream - the XML library stream
Method Detail

setDebug

public void setDebug(boolean debug)
Set debug flag for the library and all its models. The debug flag determines whether we print debugging information to stdout.

Parameters:
debug - if true debug mode is on

setValidationRatio

public void setValidationRatio(double validationRatio)
Sets the validation-set ratio. This is the portion of the training set that is set aside for hillclimbing. Note that this value is ignored if we are doing cross-validation (indicated by the number of folds being > 1).

Parameters:
validationRatio - the new ratio

setNumFolds

public void setNumFolds(int numFolds)
Set the number of folds for cross validation. If the number of folds is > 1, the validation ratio is ignored.

Parameters:
numFolds - the number of folds to use

trainAll

public Instances trainAll(Instances data,
                          java.lang.String directory,
                          int algorithm)
                   throws java.lang.Exception
This method will iterate through the TreeMap of models and train all models that do not currently exist (are not yet trained).

Returns the data set which should be used for hillclimbing.

If training a model fails, an error is printed to stdout and that model is removed from the TreeMap. FIXME: Should we raise an exception instead?

Parameters:
data - the data to work on
directory - the working directory
algorithm - the type of algorithm
Returns:
the data that should be used for hillclimbing
Throws:
java.lang.Exception - if something goes wrong
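The failure handling described above (report on stdout, then drop the model from the map) can be illustrated with a minimal, self-contained sketch. It does not use the Weka API: the `Model` interface, `DemoModel`, and the return value are hypothetical stand-ins introduced only for this example.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

final class TrainAllSketch {

    /** Hypothetical stand-in for a library model; not the Weka EnsembleLibraryModel API. */
    interface Model {
        boolean isTrained();
        void train() throws Exception;
    }

    /**
     * Trains every model that is not yet trained. A model whose training throws
     * is reported on stdout and removed from the map.
     * Returns the number of removed models (an addition for this sketch only).
     */
    static int trainAll(TreeMap<String, Model> models) {
        int removed = 0;
        Iterator<Map.Entry<String, Model>> it = models.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Model> entry = it.next();
            if (entry.getValue().isTrained()) {
                continue; // already trained, nothing to do
            }
            try {
                entry.getValue().train();
            } catch (Exception e) {
                System.out.println("Training failed for " + entry.getKey() + ": " + e.getMessage());
                it.remove(); // remove through the iterator to avoid ConcurrentModificationException
                removed++;
            }
        }
        return removed;
    }

    /** Demo model that can be told to fail during training. */
    static final class DemoModel implements Model {
        private final boolean fails;
        private boolean trained;

        DemoModel(boolean fails) {
            this.fails = fails;
        }

        public boolean isTrained() {
            return trained;
        }

        public void train() throws Exception {
            if (fails) {
                throw new Exception("simulated failure");
            }
            trained = true;
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Model> models = new TreeMap<>();
        models.put("modelA", new DemoModel(false));
        models.put("modelB", new DemoModel(true));
        int removed = trainAll(models);
        System.out.println(removed + " removed, remaining: " + models.keySet());
    }
}
```

Removing through the iterator keeps the loop safe while the map shrinks. Raising an exception instead, as the FIXME suggests, would mean replacing the `catch` body with a rethrow.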

createWorkingDirectory

public void createWorkingDirectory(java.lang.String dirName)
Creates the working directory associated with this library.

Parameters:
dirName - the new directory

removeModel

public void removeModel(java.lang.String modelKey)
Removes the model associated with the given String from the model library.

Parameters:
modelKey - the key of the model

getModelNames

public java.util.Set getModelNames()
This method will return a Set object containing all the String representations of the models. The iterator across this Set object will return the model names in alphabetical order.

Returns:
all model representations
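A sorted key set gives the ordering guarantee described above. The sketch below assumes the names are keys of a `java.util.TreeMap`, whose key set iterates in natural String order (alphabetical for same-case ASCII names). The class `ModelNamesSketch` and the model strings are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

final class ModelNamesSketch {

    /** Copies the names into a list in the order the set's iterator returns them. */
    static List<String> namesInOrder(Set<String> names) {
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        // Keys are illustrative model strings; values are placeholders.
        TreeMap<String, Object> models = new TreeMap<>();
        models.put("weka.classifiers.trees.J48 -C 0.25", new Object());
        models.put("weka.classifiers.bayes.NaiveBayes", new Object());

        // keySet() of a TreeMap iterates in ascending key order.
        for (String name : namesInOrder(models.keySet())) {
            System.out.println(name);
        }
    }
}
```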

getHillclimbPredictions

public double[][][] getHillclimbPredictions()
This method will get the predictions for all the models in the ensemble library. If cross-validation is used, then predictions will be returned for the entire training set. If cross-validation is not used, then predictions will only be returned for the portion of the training set reserved for validation.

Returns:
the predictions
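The rule for how many instances receive hillclimb predictions can be sketched as follows. This is a simplified illustration, not the library's code: the method name `hillclimbSetSize` is hypothetical, and truncating the product to an int is an assumption about rounding.

```java
final class HillclimbSizeSketch {

    /**
     * Number of training instances that get hillclimb predictions.
     * With cross-validation (numFolds > 1) the whole training set is covered
     * and the validation ratio is ignored; otherwise only the held-out portion is.
     */
    static int hillclimbSetSize(int numTrain, int numFolds, double validationRatio) {
        if (numFolds > 1) {
            return numTrain;
        }
        // Assumption: the held-out count is truncated toward zero.
        return (int) (numTrain * validationRatio);
    }

    public static void main(String[] args) {
        System.out.println(hillclimbSetSize(1000, 10, 0.25)); // cross-validation: ratio ignored
        System.out.println(hillclimbSetSize(1000, 1, 0.25));  // single hold-out split
    }
}
```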

getWorkingDirectory

public java.io.File getWorkingDirectory()
Gets the working directory of the ensemble library.

Returns:
the working directory.

setWorkingDirectory

public void setWorkingDirectory(java.io.File workingDirectory)
Sets the working directory of the ensemble library.

Parameters:
workingDirectory - the working directory to use.

getModelListFile

public java.lang.String getModelListFile()
Gets the model list file that holds the list of models in the ensemble library.

Returns:
the model list file.

setModelListFile

public void setModelListFile(java.lang.String modelListFile)
Sets the model list file that holds the list of models in the ensemble library.

Parameters:
modelListFile - the model list file to use

createModel

public EnsembleLibraryModel createModel(Classifier classifier)
Creates a library model from the given classifier.

Overrides:
createModel in class EnsembleLibrary
Parameters:
classifier - the classifier to use
Returns:
the generated library model

createModel

public EnsembleLibraryModel createModel(java.lang.String modelString)
This method takes a String argument defining a classifier and uses it to create a base Classifier. WARNING! This method is only called when trying to create models from flat files (.mlf). This method is highly untested and will foreseeably cause problems when trying to nest arguments within multiple meta classifiers. To avoid any problems we recommend using only XML serialization, via saving to .model.xml, and using only the createModel(Classifier) method above.

Overrides:
createModel in class EnsembleLibrary
Parameters:
modelString - the classifier definition
Returns:
the generated library model

getInstancesChecksum

public static java.lang.String getInstancesChecksum(Instances instances)
This method takes an Instances object and returns a checksum of its toString() output - that is, the checksum of the .arff file that would be created if the Instances object were written to the file system as an ARFF file.

Parameters:
instances - the data to get the checksum for
Returns:
the checksum
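The idea of checksumming the textual form of a data set can be sketched with the JDK alone. The documentation above does not name the checksum algorithm, so CRC32 over UTF-8 bytes with hexadecimal output is an assumption made for illustration. The class `ChecksumSketch` and the sample ARFF text are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ChecksumSketch {

    /**
     * Returns a hexadecimal checksum of the given text.
     * Assumption: CRC32 over the UTF-8 bytes; the real method may use another algorithm.
     */
    static String checksum(String text) {
        CRC32 crc = new CRC32();
        crc.update(text.getBytes(StandardCharsets.UTF_8));
        return Long.toHexString(crc.getValue());
    }

    public static void main(String[] args) {
        // Stand-in for the toString() output of an Instances object.
        String arffText = "@relation demo\n@attribute x numeric\n@data\n1.0\n";
        System.out.println(checksum(arffText));
    }
}
```

Because the checksum depends only on the textual content, two data sets that render to the same ARFF text produce the same value, which is what makes it usable as a stable identifier.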

getDataDirectoryName

public static java.lang.String getDataDirectoryName(Instances instances)
Returns the unique name for the set of instances supplied. This is used to create a directory for all of the models corresponding to that set of instances. This was intended as a way to keep working directories "organized".

Parameters:
instances - the data to get the directory for
Returns:
the directory

addWorkingDirectoryListener

public void addWorkingDirectoryListener(java.beans.PropertyChangeListener listener)
Adds an object to the list of those that wish to be informed when the working directory changes.

Parameters:
listener - a new listener to add to the list
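The listener mechanism can be illustrated with the standard `java.beans.PropertyChangeSupport` class. This is a minimal sketch, not the library's implementation: the class `WorkingDirectorySketch` and the property name "workingDirectory" are assumptions made for the example.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.File;

final class WorkingDirectorySketch {

    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private File workingDirectory;

    /** Registers a listener to be notified when the working directory changes. */
    void addWorkingDirectoryListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    File getWorkingDirectory() {
        return workingDirectory;
    }

    /** Updates the directory and notifies listeners of the change. */
    void setWorkingDirectory(File newDirectory) {
        File old = workingDirectory;
        workingDirectory = newDirectory;
        // No event is fired when old and new values are equal and non-null.
        support.firePropertyChange("workingDirectory", old, newDirectory);
    }

    public static void main(String[] args) {
        WorkingDirectorySketch library = new WorkingDirectorySketch();
        library.addWorkingDirectoryListener(
            event -> System.out.println("Working directory is now " + event.getNewValue()));
        library.setWorkingDirectory(new File("models"));
    }
}
```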