weka.classifiers.misc.monotone
Class InstancesUtil

java.lang.Object
  extended by weka.classifiers.misc.monotone.InstancesUtil

public class InstancesUtil
extends java.lang.Object

This class contains some methods for working with objects of type Instance and Instances that are not provided by their respective classes.

This implementation is part of the master's thesis "Studie en implementatie van instantie-gebaseerde algoritmen voor gesuperviseerd rangschikken" (Study and implementation of instance-based algorithms for supervised ranking), Stijn Lievens, Ghent University, 2004.

Version:
$Revision: 1.2 $
Author:
Stijn Lievens (stijn.lievens@ugent.be)

Constructor Summary
InstancesUtil()
           
 
Method Summary
static void classifyInstances(Instances instances, Classifier classifier)
          Classify a set of instances using a given classifier.
static boolean comparable(Instance i1, Instance i2)
          Checks if two instances are comparable in the data space, that is, ignoring the class attribute.
static int containsIgnoreClass(Instances instances, Instance instance)
          Get the index of an instance in a set of instances, where instances are compared ignoring the class attribute.
static DiscreteEstimator countValues(Instances instances, int attributeIndex)
          Return a histogram of the values for the specified attribute.
static boolean doubt(Instance i1, Instance i2)
          Checks if two instances give rise to doubt.
static boolean equalIgnoreClass(Instance i1, Instance i2)
          Compares two instances, ignoring the class attribute (if any).
static Instances generateRandomSample(Instances headerInfo, int numberOfExamples, java.util.Random random)
          Generates a random sample of instances.
static BooleanBitMatrix getBitMatrix(Instances instances)
          Calculates the relation (poset) formed by the instances.
static boolean isHomogeneous(Instances instances)
          Check if all instances have the same class value.
static boolean isMonotone(Instances instances)
          Checks if the given data set is monotone.
static boolean isQuasiMonotone(Instances ground, Instances other)
          Test if a set of instances is quasi monotone.
static double maximalExtension(Instances instances, Instance instance)
          Computes the maximal extension for a given instance.
static double maximalExtension(Instances instances, Instance instance, double maxValue)
          Computes the maximal extension of a given instance, but the maximal value returned is maxValue.
static double minimalExtension(Instances instances, Instance instance)
          Computes the minimal extension for a given instance.
static double minimalExtension(Instances instances, Instance instance, double minValue)
          Computes the minimal extension of a given instance, but the minimal value returned is minValue.
static int nextOccurenceIgnoreClass(Instances instances, Instance instance, int index)
          Find the next occurrence of an instance, ignoring the class, whose index in the dataset is at least index.
static int nrOfRedundant(Instances instances)
          Counts the number of redundant pairs in the sense of OLM.
static int[] nrOfReversedPreferences(Instances instances)
          Gather some statistics regarding reversed preferences.
static int[] nrStochasticReversedPreference(Instances instances)
          Find the number of stochastic reversed preferences in the dataset.
static double numberInInterval(Instance low, Instance up)
          Calculates the number of elements in the closed interval [low,up].
static double numberOfGreaterVectors(Instance instance)
          Calculates the number of vectors in the data space that are greater than or equal to the given instance.
static double numberOfSmallerVectors(Instance instance)
          Calculates the number of vectors in the data space that are smaller than or equal to the given instance.
static boolean reversedPreference(Instance i1, Instance i2)
          Checks if two instances give rise to reversed preference.
static Instances sampleWithoutReplacement(Instances instances, int size, java.util.Random random)
          Create, without replacement, a random subsample of the given size from the given instances.
static boolean smallerOrEqual(Instance i1, Instance i2)
          Compares two instances in the data space, that is, ignoring the class attribute.
static boolean strictlySmaller(Instance i1, Instance i2)
          Compares two instances in the data space, that is, ignoring the class attribute.
static double[] toDataDouble(Instance instance)
          Returns an array containing the attribute values (in internal floating point format) of the given instance in data space, that is, with the class attribute (if any) removed.
static double totalLoss(Classifier classifier, Instances instances, NominalLossFunction lossFunction)
          Calculates the total loss over the instances, using the trained classifier and the specified lossFunction.
static void write(Instances instances, java.io.BufferedWriter file)
          Write the instances in ARFF format to the indicated BufferedWriter.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InstancesUtil

public InstancesUtil()
Method Detail

equalIgnoreClass

public static boolean equalIgnoreClass(Instance i1,
                                       Instance i2)
Compares two instances, ignoring the class attribute (if any).

Parameters:
i1 - the first instance
i2 - the second instance
Returns:
true if both instances are equal (ignoring the class attribute), false otherwise

containsIgnoreClass

public static int containsIgnoreClass(Instances instances,
                                      Instance instance)
Get the index of an instance in a set of instances, where instances are compared ignoring the class attribute.

Parameters:
instances - the set of instances
instance - the instance to be found in the given set of instances
Returns:
the index of the first instance that equals the given instance (ignoring the class attribute), -1 if the instance was not found

nextOccurenceIgnoreClass

public static int nextOccurenceIgnoreClass(Instances instances,
                                           Instance instance,
                                           int index)
Find the next occurrence of an instance, ignoring the class, whose index in the dataset is at least index.

Parameters:
instances - the set of instances to be searched
instance - the instance to be found
index - the minimum index that might be returned
Returns:
the index of the first instance with index at least index that equals the given instance (ignoring the class attribute), -1 if the instance was not found
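The three lookups above (equalIgnoreClass, containsIgnoreClass and nextOccurenceIgnoreClass) all reduce to an equality test that skips the class column. The following is a minimal sketch, not Weka's source: it assumes each instance is a plain double[] of internal values, and the class IgnoreClassSearch, its method names and its classIndex parameter (-1 when no class is set) are hypothetical stand-ins for Instance, Instances and the class attribute.

```java
/**
 * Hypothetical sketch: rows are double[] arrays of internal attribute values
 * and classIndex is the column to ignore (-1 if no class attribute is set).
 */
final class IgnoreClassSearch {

    /** True if a and b agree on every attribute except the class column. */
    static boolean equalIgnoreClass(double[] a, double[] b, int classIndex) {
        if (a.length != b.length) {
            return false;
        }
        for (int k = 0; k < a.length; k++) {
            // A missing value (NaN) never compares equal with this test.
            if (k != classIndex && a[k] != b[k]) {
                return false;
            }
        }
        return true;
    }

    /** Index of the first row at position >= from that equals target, or -1. */
    static int nextOccurrenceIgnoreClass(double[][] rows, double[] target,
                                         int classIndex, int from) {
        for (int i = Math.max(from, 0); i < rows.length; i++) {
            if (equalIgnoreClass(rows[i], target, classIndex)) {
                return i;
            }
        }
        return -1;
    }

    /** First occurrence anywhere in the data set, or -1. */
    static int containsIgnoreClass(double[][] rows, double[] target, int classIndex) {
        return nextOccurrenceIgnoreClass(rows, target, classIndex, 0);
    }
}
```

Calling nextOccurrenceIgnoreClass again with from set to one past the previous hit enumerates every instance that shares the same Coordinates.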

isHomogeneous

public static boolean isHomogeneous(Instances instances)
Check if all instances have the same class value.

Parameters:
instances - the instances to be checked for homogeneity
Returns:
true if the instances have the same class value, false otherwise

strictlySmaller

public static boolean strictlySmaller(Instance i1,
                                      Instance i2)
Compares two instances in the data space, that is, ignoring the class attribute. An instance is strictly smaller than another instance if the same holds for the Coordinates based on these instances.

Parameters:
i1 - the first instance
i2 - the second instance
Returns:
true if the first instance is strictly smaller than the second instance, false otherwise

smallerOrEqual

public static boolean smallerOrEqual(Instance i1,
                                     Instance i2)
Compares two instances in the data space, that is, ignoring the class attribute. An instance is smaller than or equal to another instance if the same holds for the Coordinates based on these instances.

Parameters:
i1 - the first instance
i2 - the second instance
Returns:
true if the first instance is smaller than or equal to the second instance, false otherwise

comparable

public static boolean comparable(Instance i1,
                                 Instance i2)
                          throws java.lang.IllegalArgumentException
Checks if two instances are comparable in the data space, that is, ignoring the class attribute. Two instances are comparable if the first is smaller than or equal to the second, or the other way around.

Parameters:
i1 - the first instance
i2 - the second instance
Returns:
true if the given instances are comparable, false otherwise
Throws:
java.lang.IllegalArgumentException - if the two instances don't have the same length
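All three comparisons are instances of the componentwise (product) order on the data space. The minimal sketch below works on plain double[] vectors with the class attribute already removed. It assumes, as an interpretation rather than a quote from the source, that "strictly smaller" is the strict part of this partial order: smaller or equal in every coordinate and strictly smaller in at least one. The class ProductOrder is hypothetical.

```java
/**
 * Hypothetical sketch of the componentwise (product) order on data-space vectors.
 * Assumption: "strictly smaller" = smaller or equal everywhere and
 * strictly smaller in at least one coordinate.
 */
final class ProductOrder {

    private static void checkLengths(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors have different lengths");
        }
    }

    /** x <= y iff x[k] <= y[k] for every coordinate k. */
    static boolean smallerOrEqual(double[] x, double[] y) {
        checkLengths(x, y);
        for (int k = 0; k < x.length; k++) {
            if (x[k] > y[k]) {
                return false;
            }
        }
        return true;
    }

    /** x < y iff x <= y and x[k] < y[k] for at least one k. */
    static boolean strictlySmaller(double[] x, double[] y) {
        if (!smallerOrEqual(x, y)) {
            return false;
        }
        for (int k = 0; k < x.length; k++) {
            if (x[k] < y[k]) {
                return true;
            }
        }
        return false;
    }

    /** x and y are comparable iff x <= y or y <= x. */
    static boolean comparable(double[] x, double[] y) {
        return smallerOrEqual(x, y) || smallerOrEqual(y, x);
    }
}
```

For example, (0, 1) and (1, 0) are incomparable: neither vector lies below the other in both coordinates, which is what makes the order partial rather than total.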

doubt

public static boolean doubt(Instance i1,
                            Instance i2)
Checks if two instances give rise to doubt. There is doubt between two instances if their Coordinates are equal but their class values are different.

Parameters:
i1 - the first instance
i2 - the second instance
Returns:
true if there is doubt between the two given instances, false otherwise

reversedPreference

public static boolean reversedPreference(Instance i1,
                                         Instance i2)
                                  throws java.lang.IllegalArgumentException
Checks if two instances give rise to reversed preference. Two instances give rise to reversed preference in the data space if their Coordinates are comparable but different, and their class values are not related in the same way, that is, the instance with the strictly smaller Coordinates has the strictly larger class value.

Parameters:
i1 - the first instance
i2 - the second instance
Returns:
true if i1 and i2 give rise to reversed preference, false otherwise
Throws:
java.lang.IllegalArgumentException - if the two instances don't have the same length

isMonotone

public static boolean isMonotone(Instances instances)
Checks if the given data set is monotone. We say that a data set is monotone if it contains neither doubt nor reversed preferences.

Parameters:
instances - the data set to be checked
Returns:
true if the given data set is monotone, false otherwise
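The definitions of doubt, reversed preference and monotonicity translate directly into a pairwise scan. The sketch below is not Weka's implementation: it assumes each instance is split into a data-space vector xs[i] and a class value labels[i] in internal format, and the class MonotoneCheck is hypothetical. The quadratic loop over all pairs follows the definition and is not tuned for large data sets.

```java
/**
 * Hypothetical sketch: xs[i] is the data-space vector of instance i
 * (class removed) and labels[i] its class value in internal format.
 */
final class MonotoneCheck {

    /** Componentwise x <= y. */
    static boolean leq(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors have different lengths");
        }
        for (int k = 0; k < x.length; k++) {
            if (x[k] > y[k]) {
                return false;
            }
        }
        return true;
    }

    /** Doubt: equal coordinates, different class values. */
    static boolean doubt(double[] x, double cx, double[] y, double cy) {
        return leq(x, y) && leq(y, x) && cx != cy;
    }

    /** Reversed preference: one vector strictly below the other but with a strictly larger class. */
    static boolean reversedPreference(double[] x, double cx, double[] y, double cy) {
        boolean xBelowY = leq(x, y);
        boolean yBelowX = leq(y, x);
        if (xBelowY && yBelowX) {
            return false;        // equal coordinates: possibly doubt, never reversed preference
        }
        if (xBelowY) {
            return cx > cy;      // x < y but class(x) > class(y)
        }
        if (yBelowX) {
            return cy > cx;      // y < x but class(y) > class(x)
        }
        return false;            // incomparable vectors impose no constraint
    }

    /** Monotone: no pair of instances shows doubt or reversed preference. */
    static boolean isMonotone(double[][] xs, double[] labels) {
        for (int i = 0; i < xs.length; i++) {
            for (int j = i + 1; j < xs.length; j++) {
                if (doubt(xs[i], labels[i], xs[j], labels[j])
                        || reversedPreference(xs[i], labels[i], xs[j], labels[j])) {
                    return false;
                }
            }
        }
        return true;
    }
}
```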

isQuasiMonotone

public static boolean isQuasiMonotone(Instances ground,
                                      Instances other)
Test if a set of instances is quasi monotone. We say that a set of instances S is quasi monotone with respect to a set of instances D iff, for all $x, y \in S$, $[x,y] \cap D \neq \emptyset \implies \mathrm{class}(x) \leq \mathrm{class}(y)$, where $[x,y]$ is the closed interval between x and y in the data space. This implies that D itself is monotone.

Parameters:
ground - the instances playing the role of D
other - the instances playing the role of S
Returns:
true if the instances are quasi monotone, false otherwise

nrOfReversedPreferences

public static int[] nrOfReversedPreferences(Instances instances)
Gather some statistics regarding reversed preferences.

Parameters:
instances - the instances to be examined
Returns:
array of length 3; position 0 indicates the number of couples that have reversed preference, position 1 the number of couples that are comparable, and position 2 the total number of couples
See Also:
reversedPreference(Instance, Instance)
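A sketch of how the three counts could be gathered, assuming each instance is split into a data-space vector xs[i] and a class value labels[i]. Counting each unordered pair once (i < j) and treating equal Coordinates as comparable, as comparable(Instance, Instance) does, are assumptions of this sketch; the class ReversedPreferenceStats is hypothetical.

```java
/** Hypothetical sketch: counts over unordered pairs {i, j} with i < j. */
final class ReversedPreferenceStats {

    /** Componentwise x <= y; assumes vectors of equal length. */
    private static boolean leq(double[] x, double[] y) {
        for (int k = 0; k < x.length; k++) {
            if (x[k] > y[k]) {
                return false;
            }
        }
        return true;
    }

    /** Returns {reversed pairs, comparable pairs, total pairs}. */
    static int[] nrOfReversedPreferences(double[][] xs, double[] labels) {
        int[] stats = new int[3];
        for (int i = 0; i < xs.length; i++) {
            for (int j = i + 1; j < xs.length; j++) {
                boolean ij = leq(xs[i], xs[j]);
                boolean ji = leq(xs[j], xs[i]);
                if (ij || ji) {
                    stats[1]++;                       // comparable (equal vectors included)
                    boolean reversed = (ij && !ji && labels[i] > labels[j])
                            || (ji && !ij && labels[j] > labels[i]);
                    if (reversed) {
                        stats[0]++;
                    }
                }
                stats[2]++;                           // every pair counts toward the total
            }
        }
        return stats;
    }
}
```

One natural summary is stats[0] / (double) stats[1], the fraction of comparable couples that violate monotonicity.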

nrStochasticReversedPreference

public static int[] nrStochasticReversedPreference(Instances instances)
                                            throws java.lang.IllegalArgumentException
Find the number of stochastic reversed preferences in the dataset.

Parameters:
instances - the instances to be examined
Returns:
an array of integers containing at position
  • 0: the number of different coordinates, that is, the size of $S_X$
  • 1: the number of couples showing reversed preference: $x < y$ and $\neg(F_x \leq_{\mathrm{stoch}} F_y)$
  • 2: the number of couples with $x < y$, $F_y \leq_{\mathrm{stoch}} F_x$ and $F_x \neq F_y$
  • 3: the number of couples that are comparable: $|\{(x,y) \in S_X \times S_X \mid x < y\}|$
  • 4: the number of couples in $S_X$
Throws:
java.lang.IllegalArgumentException - if there are no instances with a non-missing class value, or if the class is not set
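The central test in these counts is the stochastic order between the empirical class distributions F_x and F_y of two Coordinates. The sketch below assumes the usual first-order stochastic dominance definition (F_x is stochastically smaller than or equal to F_y when the cumulative distribution of x lies on or above that of y at every class label) and represents each distribution by hypothetical per-label counts; it is not Weka's code. Comparing cross-multiplied cumulative counts avoids floating-point division.

```java
/**
 * Hypothetical sketch: counts[k] = number of instances with given Coordinates
 * and class label k. Assumes F_x <=stoch F_y iff F_x(k) >= F_y(k) for all k.
 */
final class StochasticOrder {

    static boolean leqStoch(long[] countsX, long[] countsY) {
        if (countsX.length != countsY.length) {
            throw new IllegalArgumentException("different number of class labels");
        }
        long totalX = sum(countsX);
        long totalY = sum(countsY);
        if (totalX == 0 || totalY == 0) {
            throw new IllegalArgumentException("empty class distribution");
        }
        long cumX = 0;
        long cumY = 0;
        for (int k = 0; k < countsX.length; k++) {
            cumX += countsX[k];
            cumY += countsY[k];
            // F_x(k) >= F_y(k)  <=>  cumX / totalX >= cumY / totalY (totals are positive)
            if (cumX * totalY < cumY * totalX) {
                return false;
            }
        }
        return true;
    }

    /** Equal distributions: each is stochastically below the other. */
    static boolean sameDistribution(long[] countsX, long[] countsY) {
        return leqStoch(countsX, countsY) && leqStoch(countsY, countsX);
    }

    private static long sum(long[] counts) {
        long s = 0;
        for (long c : counts) {
            s += c;
        }
        return s;
    }
}
```

With these helpers, a couple x < y counts toward position 1 when leqStoch(F_x, F_y) is false, and toward position 2 when leqStoch(F_y, F_x) holds but sameDistribution(F_x, F_y) does not.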

nrOfRedundant

public static int nrOfRedundant(Instances instances)
Counts the number of redundant pairs in the sense of OLM. Two instances are redundant if they are comparable and have the same class value.

Parameters:
instances - the instances to be checked
Returns:
the number of redundant pairs in the given set of instances
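Following the definition above, a sketch that counts each unordered pair once. The split of every instance into a data-space vector xs[i] and a class value labels[i], and the class name Redundancy, are assumptions for illustration, not Weka's API.

```java
/** Hypothetical sketch: counts unordered pairs that are comparable and share the class value. */
final class Redundancy {

    /** Componentwise x <= y; assumes vectors of equal length. */
    private static boolean leq(double[] x, double[] y) {
        for (int k = 0; k < x.length; k++) {
            if (x[k] > y[k]) {
                return false;
            }
        }
        return true;
    }

    static int nrOfRedundant(double[][] xs, double[] labels) {
        int redundant = 0;
        for (int i = 0; i < xs.length; i++) {
            for (int j = i + 1; j < xs.length; j++) {
                boolean comparable = leq(xs[i], xs[j]) || leq(xs[j], xs[i]);
                if (comparable && labels[i] == labels[j]) {
                    redundant++;
                }
            }
        }
        return redundant;
    }
}
```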

totalLoss

public static double totalLoss(Classifier classifier,
                               Instances instances,
                               NominalLossFunction lossFunction)
Calculates the total loss over the instances, using the trained classifier and the specified lossFunction. The instances should not contain missing values in the class attribute.

Parameters:
classifier - the trained classifier to use
instances - the test instances
lossFunction - the loss function to use
Returns:
the total loss of all the instances using the given classifier and loss function

classifyInstances

public static void classifyInstances(Instances instances,
                                     Classifier classifier)
                              throws java.lang.Exception
Classify a set of instances using a given classifier. The class value of each instance is set to the value predicted by the classifier.

Parameters:
instances - the instances to be classified
classifier - a built classifier
Throws:
java.lang.Exception - if one of the instances could not be classified

getBitMatrix

public static BooleanBitMatrix getBitMatrix(Instances instances)
Calculates the relation (poset) formed by the instances.

Parameters:
instances - the instances for which the poset is to be formed
Returns:
a BooleanBitMatrix for which position bm.get(i,j) == true iff InstancesUtil.strictlySmaller(instances.instance(i), instances.instance(j)) == true
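The poset can be materialised as an n-by-n boolean matrix. This sketch uses boolean[][] in place of BooleanBitMatrix and assumes that strictlySmaller means smaller or equal in every coordinate and strictly smaller in at least one; the class PosetMatrix is hypothetical.

```java
/** Hypothetical sketch: the strict product order as a boolean matrix. */
final class PosetMatrix {

    /** True iff x <= y componentwise and x[k] < y[k] for some k (assumes equal lengths). */
    static boolean strictlySmaller(double[] x, double[] y) {
        boolean someLess = false;
        for (int k = 0; k < x.length; k++) {
            if (x[k] > y[k]) {
                return false;
            }
            if (x[k] < y[k]) {
                someLess = true;
            }
        }
        return someLess;
    }

    /** m[i][j] == true iff instance i is strictly smaller than instance j. */
    static boolean[][] getBitMatrix(double[][] xs) {
        int n = xs.length;
        boolean[][] m = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = strictlySmaller(xs[i], xs[j]);
            }
        }
        return m;
    }
}
```

Because the relation is a strict partial order, the diagonal is always false and m[i][j] and m[j][i] are never both true; two instances with equal Coordinates give false in both directions.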

numberInInterval

public static double numberInInterval(Instance low,
                                      Instance up)
                               throws java.lang.IllegalArgumentException
Calculates the number of elements in the closed interval [low,up]. If the class index is set, the class attribute does not take part in the calculations, that is, we work in the data space. The code also works with numeric attributes, but is primarily intended for ordinal attributes.

Parameters:
low - the lower bound of the interval
up - the upper bound of the interval
Returns:
the size of the interval (in floating point format)
Throws:
java.lang.IllegalArgumentException - if the given instances do not constitute an interval.
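For ordinal attributes whose internal values are consecutive integers (the index of the nominal value), the closed interval [low,up] is a box, so its size is the product over the attributes of (up[k] - low[k] + 1). The minimal sketch below relies on that assumption, takes vectors with the class attribute already removed, and does not model how numeric attributes are treated; the class IntervalSize is hypothetical.

```java
/** Hypothetical sketch: size of the closed box [low, up] for integer-coded ordinal attributes. */
final class IntervalSize {

    static double numberInInterval(double[] low, double[] up) {
        if (low.length != up.length) {
            throw new IllegalArgumentException("vectors have different lengths");
        }
        double size = 1.0;
        for (int k = 0; k < low.length; k++) {
            if (low[k] > up[k]) {
                throw new IllegalArgumentException("low is not below up in coordinate " + k);
            }
            size *= up[k] - low[k] + 1;   // values low[k], low[k] + 1, ..., up[k]
        }
        return size;
    }
}
```

Returning a double, as the documented signature does, also avoids integer overflow when many attribute ranges are multiplied.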

numberOfSmallerVectors

public static double numberOfSmallerVectors(Instance instance)
                                     throws java.lang.IllegalArgumentException
Calculates the number of vectors in the data space that are smaller than or equal to the given instance.

Parameters:
instance - the given instance
Returns:
the number of vectors in the data space smaller than or equal to the given instance
Throws:
java.lang.IllegalArgumentException - if there are numeric attributes

numberOfGreaterVectors

public static double numberOfGreaterVectors(Instance instance)
                                     throws java.lang.IllegalArgumentException
Calculates the number of vectors in the data space that are greater than or equal to the given instance.

Parameters:
instance - the given instance
Returns:
the number of vectors in the data space greater than or equal to the given instance
Throws:
java.lang.IllegalArgumentException - if there are numeric attributes
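If attribute k is nominal with numValues[k] values coded 0, ..., numValues[k] - 1, the vectors below a given instance v form the box [0, v] and those above it the box [v, numValues - 1], giving the product of (v[k] + 1) and the product of (numValues[k] - v[k]) respectively. The sketch below works under that assumption; the numValues array is a hypothetical stand-in for the attribute headers, with 0 marking a numeric attribute, and the class VectorCounts is hypothetical.

```java
/** Hypothetical sketch: counts vectors in the data space below or above a given one. */
final class VectorCounts {

    private static void check(double[] v, int[] numValues) {
        if (v.length != numValues.length) {
            throw new IllegalArgumentException("one cardinality per attribute is required");
        }
        for (int n : numValues) {
            if (n <= 0) {
                throw new IllegalArgumentException("numeric attributes are not supported");
            }
        }
    }

    /** Number of vectors w with w <= v componentwise. */
    static double numberOfSmallerVectors(double[] v, int[] numValues) {
        check(v, numValues);
        double count = 1.0;
        for (int k = 0; k < v.length; k++) {
            count *= v[k] + 1;               // choices 0, 1, ..., v[k]
        }
        return count;
    }

    /** Number of vectors w with w >= v componentwise. */
    static double numberOfGreaterVectors(double[] v, int[] numValues) {
        check(v, numValues);
        double count = 1.0;
        for (int k = 0; k < v.length; k++) {
            count *= numValues[k] - v[k];    // choices v[k], ..., numValues[k] - 1
        }
        return count;
    }
}
```

For v = (1, 0) in a space with 3 and 2 values, the smaller vectors are (0, 0) and (1, 0), and the greater ones are (1, 0), (1, 1), (2, 0) and (2, 1).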

write

public static void write(Instances instances,
                         java.io.BufferedWriter file)
                  throws java.io.IOException
Write the instances in ARFF format to the indicated BufferedWriter.

Parameters:
instances - the instances to write
file - the BufferedWriter to write to
Throws:
java.io.IOException - if something goes wrong while writing the instances

countValues

public static DiscreteEstimator countValues(Instances instances,
                                            int attributeIndex)
                                     throws java.lang.IllegalArgumentException
Return a histogram of the values for the specified attribute.

Parameters:
instances - the instances
attributeIndex - the attribute to consider
Returns:
a DiscreteEstimator where the ith
Throws:
java.lang.IllegalArgumentException - if the attribute at the specified index is numeric
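A histogram of a nominal attribute is a count per value index. This sketch returns plain, unweighted counts in an int[] instead of a DiscreteEstimator; skipping missing values (stored as NaN), using numValues == 0 to flag a numeric attribute, and the class name Histogram are assumptions for illustration.

```java
/** Hypothetical sketch: unweighted histogram of one nominal attribute. */
final class Histogram {

    static int[] countValues(double[][] rows, int attributeIndex, int numValues) {
        if (numValues <= 0) {
            throw new IllegalArgumentException("attribute " + attributeIndex + " is numeric");
        }
        int[] counts = new int[numValues];
        for (double[] row : rows) {
            double value = row[attributeIndex];
            if (!Double.isNaN(value)) {           // skip missing values
                counts[(int) value]++;            // internal value = index of the nominal value
            }
        }
        return counts;
    }
}
```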

sampleWithoutReplacement

public static Instances sampleWithoutReplacement(Instances instances,
                                                 int size,
                                                 java.util.Random random)
Create, without replacement, a random subsample of the given size from the given instances.

Parameters:
instances - the instances to sample from
size - the requested size of the sample
random - the random generator to use
Returns:
a sample of the requested size, drawn from the given instances without replacement
Throws:
java.lang.IllegalArgumentException - if the size exceeds the number of instances
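A common way to draw such a sample is a partial Fisher-Yates shuffle: only the first size positions of a copy are shuffled, each chosen uniformly among the items not yet picked. The sketch below applies it to a generic List rather than to Instances; the class Sampling is hypothetical, and the documentation does not state which technique Weka uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: uniform sample without replacement via a partial Fisher-Yates shuffle. */
final class Sampling {

    static <T> List<T> sampleWithoutReplacement(List<T> items, int size, Random random) {
        if (size < 0 || size > items.size()) {
            throw new IllegalArgumentException("invalid sample size " + size
                    + " for " + items.size() + " items");
        }
        List<T> pool = new ArrayList<>(items);      // copy so the caller's list is left untouched
        for (int i = 0; i < size; i++) {
            // choose uniformly among the positions i..n-1 that have not been picked yet
            int j = i + random.nextInt(pool.size() - i);
            Collections.swap(pool, i, j);
        }
        return new ArrayList<>(pool.subList(0, size));
    }
}
```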

generateRandomSample

public static Instances generateRandomSample(Instances headerInfo,
                                             int numberOfExamples,
                                             java.util.Random random)
                                      throws java.lang.IllegalArgumentException
Generates a random sample of instances. Each attribute must be nominal, and the class labels are not set.

Parameters:
headerInfo - Instances whose header information is used to determine how the set of returned instances will look
numberOfExamples - the desired size of the returned set
random - the random number generator to use
Returns:
a set of Instances containing the random sample.
Throws:
java.lang.IllegalArgumentException - if numeric attributes are given

toDataDouble

public static double[] toDataDouble(Instance instance)
Returns an array containing the attribute values (in internal floating point format) of the given instance in data space, that is, with the class attribute (if any) removed.

Parameters:
instance - the instance to get the attribute values from
Returns:
array of doubles containing the attribute values
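Removing the class attribute amounts to copying the value array around one column. The sketch assumes the values are already available as a double[] (for instance from Instance.toDoubleArray()) together with the class index, -1 meaning no class is set; the class DataSpace is hypothetical.

```java
/** Hypothetical sketch: drops the class column from a row of internal values. */
final class DataSpace {

    static double[] toDataDouble(double[] values, int classIndex) {
        if (classIndex < 0) {
            return values.clone();                 // no class attribute set
        }
        double[] data = new double[values.length - 1];
        // copy the attributes before and after the class column
        System.arraycopy(values, 0, data, 0, classIndex);
        System.arraycopy(values, classIndex + 1, data, classIndex,
                values.length - classIndex - 1);
        return data;
    }
}
```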

minimalExtension

public static double minimalExtension(Instances instances,
                                      Instance instance)
Computes the minimal extension for a given instance.

Parameters:
instances - the set of instances
instance - the instance for which the minimal extension is to be calculated
Returns:
the value of the minimal extension, in internal floating point format

minimalExtension

public static double minimalExtension(Instances instances,
                                      Instance instance,
                                      double minValue)
Computes the minimal extension of a given instance, but the minimal value returned is minValue. This method can be useful when the training set is divided over multiple Instances objects.

Parameters:
instances - the set of instances
instance - the instance for which the minimal extension is to be calculated
minValue - a double indicating the minimal value that should be returned
Returns:
the label of the minimal extension, in internal floating point format

maximalExtension

public static double maximalExtension(Instances instances,
                                      Instance instance)
Computes the maximal extension for a given instance.

Parameters:
instances - the set of instances
instance - the instance for which the maximal extension is to be calculated
Returns:
the value of the maximal extension, in internal floating point format

maximalExtension

public static double maximalExtension(Instances instances,
                                      Instance instance,
                                      double maxValue)
Computes the maximal extension of a given instance, but the maximal value returned is maxValue. This method can be useful when the training set is divided over multiple Instances objects.

Parameters:
instances - the set of instances
instance - the instance for which the maximal extension is to be calculated
maxValue - a double indicating the maximal value that should be returned
Returns:
the value of the maximal extension, in internal floating point format
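The minimal and maximal extensions bound every monotone labelling that is consistent with a monotone data set: the minimal extension at x is the largest class value among the instances below or equal to x, and the maximal extension is the smallest class value among those above or equal to x. The sketch below assumes these standard definitions from monotone classification and a split of the data into vectors xs[i] and class values labels[i]; minValue and maxValue serve as starting bounds, which is what lets the result for one Instances object seed the search in the next. The class Extensions is hypothetical.

```java
/** Hypothetical sketch: minimal and maximal extension of a data set at a point x. */
final class Extensions {

    /** Componentwise a <= b; assumes vectors of equal length. */
    private static boolean leq(double[] a, double[] b) {
        for (int k = 0; k < a.length; k++) {
            if (a[k] > b[k]) {
                return false;
            }
        }
        return true;
    }

    /** max(minValue, max{ labels[i] : xs[i] <= x }). */
    static double minimalExtension(double[][] xs, double[] labels, double[] x, double minValue) {
        double result = minValue;
        for (int i = 0; i < xs.length; i++) {
            if (leq(xs[i], x) && labels[i] > result) {
                result = labels[i];
            }
        }
        return result;
    }

    /** min(maxValue, min{ labels[i] : x <= xs[i] }). */
    static double maximalExtension(double[][] xs, double[] labels, double[] x, double maxValue) {
        double result = maxValue;
        for (int i = 0; i < xs.length; i++) {
            if (leq(x, xs[i]) && labels[i] < result) {
                result = labels[i];
            }
        }
        return result;
    }
}
```

For example, with the training data split into two hypothetical arrays, minimalExtension(xs2, labels2, x, minimalExtension(xs1, labels1, x, 0)) gives the minimal extension over their union, assuming 0 is the lowest class label.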