weka.classifiers.meta.ensembleSelection
Class EnsembleSelectionLibraryModel

java.lang.Object
  extended by weka.classifiers.EnsembleLibraryModel
      extended by weka.classifiers.meta.ensembleSelection.EnsembleSelectionLibraryModel
All Implemented Interfaces:
java.io.Serializable

public class EnsembleSelectionLibraryModel
extends EnsembleLibraryModel
implements java.io.Serializable

This class represents a library model that is used for EnsembleSelection. At this level the concept of cross validation is abstracted away. This class keeps track of the performance statistics and bookkeeping information for its "model type" across all the CV folds. By "model type", I mean the combination of both the Classifier type (e.g. J48) and its set of parameters (e.g. -C 0.5 -X 1 -Y 5). So, for example, if you are using 5-fold cross validation, this model will keep an array of classifiers[] of length 5 and will keep track of their performance accordingly. This class also has methods to serialize all of this information into the .elm file that represents this model.

Another important function of this class is to track all of the dataset information that was used to create this model. This protects users from foreseeable mistakes, e.g., trying to build an ensemble for a dataset with models that were trained on the wrong partitioning of the dataset. Such a mismatch could lead to artificially high performance, because instances in the test set used to gauge performance could have accidentally been used to train the base classifiers. So in a nutshell, we are preventing people from unintentionally "cheating" by enforcing that the seed, number of folds, validation ratio, and the checksum of the Instances.toString() method ALL match exactly. Otherwise we throw an exception.
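
The matching rule described above can be sketched in plain Java. This is only an illustration of the idea, not the Weka implementation: the names DatasetInfoCheck, DatasetInfo, and requireMatch are hypothetical, and the exception type is an assumption.

```java
// Hypothetical sketch; none of these names exist in Weka.
class DatasetInfoCheck {

    /** The four values that must agree between a stored model and the current run. */
    static final class DatasetInfo {
        final int seed;
        final int folds;
        final double validationRatio;
        final String checksum;

        DatasetInfo(int seed, int folds, double validationRatio, String checksum) {
            this.seed = seed;
            this.folds = folds;
            this.validationRatio = validationRatio;
            this.checksum = checksum;
        }
    }

    /** Returns normally when all four values match exactly; throws otherwise. */
    static void requireMatch(DatasetInfo model, DatasetInfo current) {
        if (model.seed != current.seed) {
            throw new IllegalArgumentException("seed mismatch");
        }
        if (model.folds != current.folds) {
            throw new IllegalArgumentException("fold count mismatch");
        }
        if (Double.compare(model.validationRatio, current.validationRatio) != 0) {
            throw new IllegalArgumentException("validation ratio mismatch");
        }
        if (!model.checksum.equals(current.checksum)) {
            throw new IllegalArgumentException("data checksum mismatch");
        }
    }

    public static void main(String[] args) {
        DatasetInfo stored = new DatasetInfo(1, 5, 0.25, "abc123");
        DatasetInfo sameRun = new DatasetInfo(1, 5, 0.25, "abc123");
        requireMatch(stored, sameRun); // passes silently

        try {
            // A model trained with 10 folds must not be reused in a 5-fold run.
            requireMatch(new DatasetInfo(1, 10, 0.25, "abc123"), sameRun);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```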

Version:
$Revision: 1.1 $
Author:
Robert Jung (mrbobjung@gmail.com)
See Also:
Serialized Form

Field Summary
static java.lang.String FILE_EXTENSION
          The default file extension for ensemble library models
 boolean m_Debug
          The debug flag as propagated from the main EnsembleSelection class.
 
Constructor Summary
EnsembleSelectionLibraryModel()
          Default Constructor
EnsembleSelectionLibraryModel(Classifier classifier)
          Basic Constructor
EnsembleSelectionLibraryModel(Classifier classifier, int seed, java.lang.String checksum, double validationRatio, int folds)
          Constructor for LibraryModel
 
Method Summary
 void createModel(Instances[] data, Instances[] hillclimbData, java.lang.String dataDirectoryName, int algorithm)
          Creates the model.
 double[] getAveragePrediction(Instance instance)
          Returns the average of the prediction of the models across all folds.
 java.lang.String getChecksum()
          get the checksum
static java.lang.String getFileName(java.lang.String stringRepresentation)
          Gets an appropriate file name for a model based on its string representation.
 double[] getFoldPrediction(Instance instance, int fold)
          Returns prediction of the classifier for the specified fold.
 int getFolds()
          get the number of folds
 Classifier[] getModels()
          Returns the array of classifiers
 int getSeed()
          Get the seed
static java.lang.String getStringChecksum(java.lang.String string)
          Gets a checksum for the string defining this classifier.
 double[][] getValidationPredictions()
          getter for validation predictions
 double getValidationRatio()
          get validationRatio
static EnsembleSelectionLibraryModel loadModel(java.lang.String modelFilePath)
          loads the specified model
 void rehydrateModel(java.lang.String workingDirectory)
          The purpose of this method is to "rehydrate" the classifier object for this library model from the filesystem.
 void releaseModel()
          Releases the model from memory.
static void saveModel(java.lang.String directory, EnsembleSelectionLibraryModel model)
          Saves the given model to the specified directory.
 void setChecksum(java.lang.String instancesChecksum)
          set the checksum
 void setDebug(boolean debug)
          This is used to propagate the m_Debug flag of the EnsembleSelection classifier to this class.
 void setFileName(java.lang.String fileName)
          Sets the .elm file name for this library model
 void setFolds(int folds)
          Set the number of folds for cross validation.
 void setSeed(int seed)
          Set the seed
 void setValidationPredictions(double[][] predictions)
          setter for validation predictions
 void setValidationRatio(double validationRatio)
          Sets the validation set ratio (only meaningful if folds == 1)
 void train(Instances trainData, int fold)
          Train the classifier for the specified fold on the given data
 
Methods inherited from class weka.classifiers.EnsembleLibraryModel
getClassifier, getDescriptionText, getErrorText, getModelClass, getOptions, getOptionsWereValid, getStringRepresentation, setDescriptionText, setErrorText, setOptionsWereValid, testOptions, toString, updateDescriptionText
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

FILE_EXTENSION

public static final java.lang.String FILE_EXTENSION
The default file extension for ensemble library models

See Also:
Constant Field Values

m_Debug

public transient boolean m_Debug
The debug flag as propagated from the main EnsembleSelection class.
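
Because m_Debug is declared transient, default Java serialization does not write it, so a model read back from disk starts with the flag set to false until setDebug is called again. The sketch below demonstrates that behaviour with a stand-in class; TransientRoundTrip and its Model class are hypothetical and only mimic the relevant field layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in; not the Weka class.
class TransientRoundTrip {

    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        int folds = 5;                 // written to the stream
        transient boolean debug;       // skipped by default serialization
    }

    /** Serializes the model to an in-memory buffer and reads it back. */
    static Model roundTrip(Model model) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(model);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Model) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Model model = new Model();
        model.folds = 10;
        model.debug = true;

        Model copy = roundTrip(model);
        // folds survives the round trip; debug falls back to its default.
        System.out.println("folds=" + copy.folds + " debug=" + copy.debug);
    }
}
```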

Constructor Detail

EnsembleSelectionLibraryModel

public EnsembleSelectionLibraryModel()
Default Constructor


EnsembleSelectionLibraryModel

public EnsembleSelectionLibraryModel(Classifier classifier,
                                     int seed,
                                     java.lang.String checksum,
                                     double validationRatio,
                                     int folds)
Constructor for LibraryModel

Parameters:
classifier - the classifier to use
seed - the random seed value
checksum - the checksum
validationRatio - the validation ratio to use
folds - the number of folds to use

EnsembleSelectionLibraryModel

public EnsembleSelectionLibraryModel(Classifier classifier)
Basic Constructor

Parameters:
classifier - the classifier to use
Method Detail

setDebug

public void setDebug(boolean debug)
This is used to propagate the m_Debug flag of the EnsembleSelection classifier to this class. There are things we would want to print out here also.

Parameters:
debug - if true additional information is output

getAveragePrediction

public double[] getAveragePrediction(Instance instance)
                              throws java.lang.Exception
Returns the average of the prediction of the models across all folds.

Parameters:
instance - the instance to get predictions for
Returns:
the average prediction
Throws:
java.lang.Exception - if something goes wrong
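
Assuming each fold's classifier returns a prediction array of the same length, averaging across folds could look like the following sketch. It works on plain double[][] arrays instead of Weka's Classifier and Instance types, and FoldAverage.average is a hypothetical helper, not the actual method body.

```java
import java.util.Arrays;

// Hypothetical helper; illustrates element-wise averaging only.
class FoldAverage {

    /** Averages per-fold prediction arrays element by element. */
    static double[] average(double[][] foldPredictions) {
        if (foldPredictions.length == 0) {
            throw new IllegalArgumentException("at least one fold is required");
        }
        int numClasses = foldPredictions[0].length;
        double[] avg = new double[numClasses];
        for (double[] prediction : foldPredictions) {
            if (prediction.length != numClasses) {
                throw new IllegalArgumentException("folds disagree on class count");
            }
            for (int c = 0; c < numClasses; c++) {
                avg[c] += prediction[c];
            }
        }
        for (int c = 0; c < numClasses; c++) {
            avg[c] /= foldPredictions.length;
        }
        return avg;
    }

    public static void main(String[] args) {
        // Two folds, two classes.
        double[][] perFold = {
            {1.0, 0.0},
            {0.5, 0.5},
        };
        System.out.println(Arrays.toString(average(perFold))); // [0.75, 0.25]
    }
}
```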

getFoldPrediction

public double[] getFoldPrediction(Instance instance,
                                  int fold)
                           throws java.lang.Exception
Returns prediction of the classifier for the specified fold.

Parameters:
instance - instance for which to make a prediction.
fold - fold number of the classifier to use.
Returns:
the prediction for the classes
Throws:
java.lang.Exception - if prediction fails

createModel

public void createModel(Instances[] data,
                        Instances[] hillclimbData,
                        java.lang.String dataDirectoryName,
                        int algorithm)
                 throws java.lang.Exception
Creates the model. If there are n folds, it constructs n classifiers using the current Classifier class and options. If the model has already been created or loaded, starts fresh.

Parameters:
data - the data to work with
hillclimbData - the data for hillclimbing
dataDirectoryName - the directory to use
algorithm - the type of algorithm
Throws:
java.lang.Exception - if something goes wrong

rehydrateModel

public void rehydrateModel(java.lang.String workingDirectory)
The purpose of this method is to "rehydrate" the classifier object for this library model from the filesystem.

Parameters:
workingDirectory - the working directory to use

releaseModel

public void releaseModel()
Releases the model from memory. TODO - need to be saving these so we can retrieve them later!!


train

public void train(Instances trainData,
                  int fold)
           throws java.lang.Exception
Train the classifier for the specified fold on the given data

Parameters:
trainData - the data to train with
fold - the fold number
Throws:
java.lang.Exception - if something goes wrong, e.g., out of memory

setSeed

public void setSeed(int seed)
Set the seed

Parameters:
seed - the seed value

getSeed

public int getSeed()
Get the seed

Returns:
the seed value

setValidationRatio

public void setValidationRatio(double validationRatio)
Sets the validation set ratio (only meaningful if folds == 1)

Parameters:
validationRatio - the new ratio

getValidationRatio

public double getValidationRatio()
get validationRatio

Returns:
the current ratio

setFolds

public void setFolds(int folds)
Set the number of folds for cross validation. The number of folds also indicates how many classifiers will be built to represent this model.

Parameters:
folds - the number of folds to use

getFolds

public int getFolds()
get the number of folds

Returns:
the current number of folds

setChecksum

public void setChecksum(java.lang.String instancesChecksum)
set the checksum

Parameters:
instancesChecksum - the new checksum

getChecksum

public java.lang.String getChecksum()
get the checksum

Returns:
the current checksum

getModels

public Classifier[] getModels()
Returns the array of classifiers

Returns:
the current models

setFileName

public void setFileName(java.lang.String fileName)
Sets the .elm file name for this library model

Parameters:
fileName - the new filename

getStringChecksum

public static java.lang.String getStringChecksum(java.lang.String string)
Gets a checksum for the string defining this classifier. This is used to preserve uniqueness in the classifier names.

Parameters:
string - the classifier definition
Returns:
the checksum string
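
One way to turn a classifier definition into a short checksum string is to hash its UTF-8 bytes and print the result in hexadecimal. The algorithm is not named in this documentation, so the use of java.util.zip.Adler32 below is an assumption, and StringChecksum.checksum is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical sketch; Adler-32 is an assumed choice of checksum.
class StringChecksum {

    /** Returns the Adler-32 checksum of the string's UTF-8 bytes as lowercase hex. */
    static String checksum(String definition) {
        Adler32 adler = new Adler32();
        adler.update(definition.getBytes(StandardCharsets.UTF_8));
        return Long.toHexString(adler.getValue());
    }

    public static void main(String[] args) {
        // The same definition always maps to the same checksum string.
        String definition = "weka.classifiers.trees.J48 -C 0.25 -M 2";
        System.out.println(checksum(definition));
        System.out.println(checksum(definition).equals(checksum(definition)));
    }
}
```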

getFileName

public static java.lang.String getFileName(java.lang.String stringRepresentation)
Gets an appropriate file name for a model based on its string representation. All generated file names are limited to fewer than 128 characters, and each ends with a 64-bit checksum value of its string representation to help keep file names unique.

Parameters:
stringRepresentation - string representation of model
Returns:
unique filename
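
The naming scheme can be sketched as: make the string representation safe for a file name, truncate it, and append a checksum so that two long representations with the same prefix still get different names. Everything specific below is an assumption made for illustration: the class ModelFileName, the character replacement rule, the underscore separator, appending the .elm extension, and Adler-32 as the checksum (the documentation does not name the algorithm).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical sketch; not the Weka implementation.
class ModelFileName {

    static final int MAX_LENGTH = 127;        // keeps names under 128 characters
    static final String EXTENSION = ".elm";   // extension mentioned in the class description

    static String fileNameFor(String stringRepresentation) {
        // Checksum of the full, untruncated representation (assumed: Adler-32).
        Adler32 adler = new Adler32();
        adler.update(stringRepresentation.getBytes(StandardCharsets.UTF_8));
        String suffix = "_" + Long.toHexString(adler.getValue()) + EXTENSION;

        // Assumed rule: replace anything outside a conservative character set.
        String safe = stringRepresentation.replaceAll("[^A-Za-z0-9._-]", "_");

        // Truncate the readable part so the suffix always fits.
        int room = MAX_LENGTH - suffix.length();
        if (safe.length() > room) {
            safe = safe.substring(0, room);
        }
        return safe + suffix;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("weka.classifiers.trees.J48 -C 0.25 -M 2"));
    }
}
```

Computing the checksum before truncation is what keeps names distinct when the readable prefix is cut off.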

saveModel

public static void saveModel(java.lang.String directory,
                             EnsembleSelectionLibraryModel model)
Saves the given model to the specified directory.

Parameters:
directory - the directory to save the model to
model - the model to save

loadModel

public static EnsembleSelectionLibraryModel loadModel(java.lang.String modelFilePath)
loads the specified model

Parameters:
modelFilePath - the path of the model
Returns:
the model

getValidationPredictions

public double[][] getValidationPredictions()
getter for validation predictions

Returns:
the current validation predictions

setValidationPredictions

public void setValidationPredictions(double[][] predictions)
setter for validation predictions

Parameters:
predictions - the new validation predictions