# Protocols

This section of the manual describes all the details of the Swift protocols used by the AIToolbox framework.

## AlphaBetaNode

The AlphaBetaNode protocol defines the two required functions for a node in an alpha-beta pruning search problem. The nodes need to generate the child nodes (the 'moves' from the current node state), and be able to be evaluated so the most advantageous path can be determined. The protocol has the following required functions:

#### generateMoves

| | |
|---|---|
| Template | `generateMoves(_ forMaximizer: Bool) -> [AlphaBetaNode]` |
| Description | This method returns the complete list of child nodes (the moves available starting from this node's state) |
| Inputs | `forMaximizer` (`Bool`): whether these moves are for the player you are trying to maximize (pass `true`), or for the opponent, who is trying to minimize that player's score (pass `false`) |
| Output | `[AlphaBetaNode]` - the nodes whose states are all the possible moves starting from this node's state |
| Throws | No |

#### staticEvaluation

| | |
|---|---|
| Template | `staticEvaluation() -> Double` |
| Description | This method returns the evaluated worth of the node's state |
| Inputs | None |
| Output | `Double` - the estimated worth of the state for the node |
| Throws | No |
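Because the protocol asks only for move generation and static evaluation, a generic search can drive any conforming node. The following is a minimal sketch, not AIToolbox's own search code: it redeclares a local copy of the protocol so it compiles on its own, and the `NimNode` take-away game (remove one or two stones; whoever takes the last stone wins) and the `alphaBeta` function are hypothetical illustrations of textbook alpha-beta minimax.

```swift
// Local copy of the protocol so this sketch compiles without AIToolbox.
protocol AlphaBetaNode {
    func generateMoves(_ forMaximizer: Bool) -> [AlphaBetaNode]
    func staticEvaluation() -> Double
}

// Hypothetical take-away game: remove 1 or 2 stones; taking the last stone wins.
struct NimNode: AlphaBetaNode {
    let stones: Int
    let maximizerToMove: Bool

    func generateMoves(_ forMaximizer: Bool) -> [AlphaBetaNode] {
        // Each legal removal produces a child in which the other player moves next.
        return [1, 2].filter { $0 <= stones }
            .map { NimNode(stones: stones - $0, maximizerToMove: !forMaximizer) }
    }

    func staticEvaluation() -> Double {
        guard stones == 0 else { return 0 }   // undecided position scores neutral
        // The player who just moved took the last stone and won.
        return maximizerToMove ? -1 : 1
    }
}

// Textbook alpha-beta minimax over any AlphaBetaNode; scores are from the maximizer's view.
func alphaBeta(_ node: AlphaBetaNode, depth: Int, alpha: Double, beta: Double,
               maximizing: Bool) -> Double {
    let children = depth > 0 ? node.generateMoves(maximizing) : []
    if children.isEmpty { return node.staticEvaluation() }   // leaf or depth limit

    var alpha = alpha
    var beta = beta
    var best = maximizing ? -Double.infinity : Double.infinity
    for child in children {
        let score = alphaBeta(child, depth: depth - 1, alpha: alpha, beta: beta,
                              maximizing: !maximizing)
        if maximizing {
            best = max(best, score)
            alpha = max(alpha, best)
        } else {
            best = min(best, score)
            beta = min(beta, best)
        }
        if alpha >= beta { break }   // prune: the opponent will never allow this line
    }
    return best
}

let value = alphaBeta(NimNode(stones: 4, maximizerToMove: true), depth: 10,
                      alpha: -.infinity, beta: .infinity, maximizing: true)
print(value)   // 1.0
```

Scores are from the maximizer's point of view, so four stones is a forced win (+1) for the player to move. Leaf nodes, and nodes reached at the depth limit, fall back to `staticEvaluation`.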

## Classifier

The Classifier protocol is an interface used by the majority of classification models in AIToolbox. Because all the models implement the same protocol, they behave the same way, so Validation methods can be used for parameter tuning and model selection without custom code. The protocol has the following required functions:

#### getInputDimension

| | |
|---|---|
| Template | `getInputDimension() -> Int` |
| Description | This method gets the required dimension of the input vectors used by the model |
| Inputs | None |
| Output | `Int` - the dimension of the input vector used by the model |
| Throws | No |

#### getParameterDimension

| | |
|---|---|
| Template | `getParameterDimension() -> Int` |
| Description | This method gets the number of parameters used by the model. It may only be valid after training |
| Inputs | None |
| Output | `Int` - the number of parameters used by the model |
| Throws | No |

#### getNumberOfClasses

| | |
|---|---|
| Template | `getNumberOfClasses() -> Int` |
| Description | This method gets the number of class labels that the model understands. It may only be valid after training |
| Inputs | None |
| Output | `Int` - the number of class labels known by the model |
| Throws | No |

#### setParameters

| | |
|---|---|
| Template | `setParameters(_ parameters: [Double]) throws` |
| Description | This method will set the parameters of the model from the passed-in array of values |
| Inputs | `parameters` (`[Double]`): the values to set the parameters to. Should be sized based on the getParameterDimension result |
| Output | None |
| Throws | `MachineLearningError.notEnoughData`: thrown if the size of the passed-in array is smaller than that required. Use the getParameterDimension method to get the required size |

#### setCustomInitializer

| | |
|---|---|
| Template | `setCustomInitializer(_ function: ((_ trainData: MLDataSet)->[Double])!)` |
| Description | This method sets a function that will be called on parameter initialization, with the initial training data set as its argument. If the function is nil (the default state), the parameters will be set to random values. If it is not nil, the results of the function are used to initialize the parameters |
| Inputs | `function` (`((_ trainData: MLDataSet)->[Double])!`): the function that receives the initial training data set and returns the values used to initialize the parameters |
| Output | None |
| Throws | No |

#### getParameters

| | |
|---|---|
| Template | `getParameters() throws -> [Double]` |
| Description | This method will get the parameters of the model. The model may need to be trained first |
| Inputs | None |
| Output | `[Double]` - the parameters of the model |
| Throws | `MachineLearningError.notTrained`: thrown if the model has not been trained and its parameters are only created by training |

#### trainClassifier

| | |
|---|---|
| Template | `trainClassifier(_ trainData: MLClassificationDataSet) throws` |
| Description | This trains the classification model on the data set passed in. The model parameters will be initialized by the method before training begins. |
| Inputs | `trainData` (`MLClassificationDataSet`): the data set to train on. The data set dimensions (input) must match those of the model |
| Output | None |
| Throws | `MachineLearningError.dataNotClassification`: thrown if the data set used for training is not a classification set<br>`MachineLearningError.dataWrongDimension`: thrown if the data set used for training does not match the model<br>`MachineLearningError.notEnoughData`: thrown if the data set used for training does not have enough data to train the model (usually because the number of parameters in the model exceeds the data set size)<br>Additional exceptions may be thrown by individual model classes |

#### continueTrainingClassifier

| | |
|---|---|
| Template | `continueTrainingClassifier(_ trainData: MLClassificationDataSet) throws` |
| Description | This trains the classification model on the data set passed in. The model parameters are not initialized first, so training continues with the current parameter set. Not all classification models support this. |
| Inputs | `trainData` (`MLClassificationDataSet`): the data set to train on. The data set dimensions (input) must match those of the model |
| Output | None |
| Throws | `MachineLearningError.dataNotClassification`: thrown if the data set used for training is not a classification set<br>`MachineLearningError.dataWrongDimension`: thrown if the data set used for training does not match the model<br>`MachineLearningError.continuationNotSupported`: thrown if the classification model does not support training continuation<br>Additional exceptions may be thrown by individual model classes |

#### classifyOne

| | |
|---|---|
| Template | `classifyOne(_ inputs: [Double]) throws -> Int` |
| Description | This gets the classification result for a single input vector. The input vector size must match the model input size. |
| Inputs | `inputs` (`[Double]`): the input vector to get the class label for |
| Output | `Int` - the resulting class label |
| Throws | `DataTypeError.wrongDimensionOnInput`: thrown if the input vector dimension does not match the model<br>`MachineLearningError.notTrained`: thrown if the model has not yet been trained<br>Additional exceptions may be thrown by individual model classes |

#### classify

| | |
|---|---|
| Template | `classify(_ testData: MLClassificationDataSet) throws` |
| Description | This gets the classification results for all points in a data set. The data set type and dimensions must match the model requirements. |
| Inputs | `testData` (`MLClassificationDataSet`): the data set to get results for. The data set dimensions (input) must match those of the model |
| Output | None (the data set passed in is modified to contain the output results) |
| Throws | `MachineLearningError.dataNotClassification`: thrown if the data set passed in is not a classification set<br>`MachineLearningError.dataWrongDimension`: thrown if the data set passed in does not match the model<br>`MachineLearningError.notTrained`: thrown if the model has not yet been trained<br>Additional exceptions may be thrown by individual model classes |
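To illustrate the shape of a conforming model, here is a sketch of a nearest-centroid classifier, a technique chosen for brevity rather than one of AIToolbox's models. It conforms to a trimmed, locally declared `MiniClassifier` protocol that covers only a few of the requirements above; `SimpleClassificationData` is a hypothetical stand-in for MLClassificationDataSet, and the error enums are redeclared with only the cases this sketch throws.

```swift
// Trimmed local error enums (only the cases this sketch throws).
enum MachineLearningError: Error { case notTrained, notEnoughData, dataWrongDimension }
enum DataTypeError: Error { case wrongDimensionOnInput }

// Hypothetical stand-in for MLClassificationDataSet.
struct SimpleClassificationData {
    var inputs: [[Double]]
    var classes: [Int]
}

// Trimmed local protocol: a subset of the Classifier requirements.
protocol MiniClassifier {
    func getInputDimension() -> Int
    func getNumberOfClasses() -> Int
    mutating func trainClassifier(_ trainData: SimpleClassificationData) throws
    func classifyOne(_ inputs: [Double]) throws -> Int
}

extension MiniClassifier {
    // Like classify(_:): writes a class label for every point into the data set.
    func classify(_ testData: inout SimpleClassificationData) throws {
        testData.classes = try testData.inputs.map { try classifyOne($0) }
    }
}

// Nearest-centroid classifier: each class is represented by the mean of its training inputs.
struct NearestCentroidClassifier: MiniClassifier {
    let inputDimension: Int
    private var centroids: [Int: [Double]] = [:]

    init(inputDimension: Int) { self.inputDimension = inputDimension }

    func getInputDimension() -> Int { inputDimension }
    func getNumberOfClasses() -> Int { centroids.count }

    mutating func trainClassifier(_ trainData: SimpleClassificationData) throws {
        guard !trainData.inputs.isEmpty else { throw MachineLearningError.notEnoughData }
        var sums: [Int: [Double]] = [:]
        var counts: [Int: Int] = [:]
        for (x, label) in zip(trainData.inputs, trainData.classes) {
            guard x.count == inputDimension else { throw MachineLearningError.dataWrongDimension }
            var sum = sums[label] ?? [Double](repeating: 0, count: inputDimension)
            for i in 0..<inputDimension { sum[i] += x[i] }
            sums[label] = sum
            counts[label, default: 0] += 1
        }
        // Divide each per-class sum by its count to get the centroid.
        centroids = [:]
        for (label, sum) in sums {
            centroids[label] = sum.map { $0 / Double(counts[label]!) }
        }
    }

    func classifyOne(_ inputs: [Double]) throws -> Int {
        guard inputs.count == inputDimension else { throw DataTypeError.wrongDimensionOnInput }
        guard !centroids.isEmpty else { throw MachineLearningError.notTrained }
        // Squared Euclidean distance from the input to a centroid.
        func distance(_ c: [Double]) -> Double {
            zip(c, inputs).reduce(0.0) { acc, pair in acc + (pair.0 - pair.1) * (pair.0 - pair.1) }
        }
        let nearest = centroids.min(by: { distance($0.value) < distance($1.value) })!
        return nearest.key
    }
}

var model = NearestCentroidClassifier(inputDimension: 2)
try! model.trainClassifier(SimpleClassificationData(inputs: [[0, 0], [0, 1], [5, 5], [6, 5]],
                                                    classes: [0, 0, 1, 1]))
print(try! model.classifyOne([0.5, 0.5]))   // 0
```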

## ConstraintProblemConstraint

The ConstraintProblemConstraint protocol defines the required interface for a class used as a custom constraint in a constraint-propagation problem. The protocol contains the following required member variable:

| var | Type | Access | Description |
|---|---|---|---|
| isSelfConstraint | Bool | get | the flag indicating whether the constraint is for the current node or for a connected node |

The ConstraintProblemConstraint protocol has only one defined function:

#### enforceConstraint

| | |
|---|---|
| Template | `enforceConstraint(_ graphNodeList: [ConstraintProblemNode], forNodeIndex: Int) -> [EnforcedConstraint]` |
| Description | This method is called to enforce the constraint on the graph node at the given index in the passed-in graph node list. The function should return a list of EnforcedConstraint structures, each giving a node that was changed and the domain index that was removed from that node |
| Inputs | `graphNodeList` (`[ConstraintProblemNode]`): the array of graph nodes that make up the problem<br>`forNodeIndex` (`Int`): index of the graph node that the constraint is to be enforced on |
| Output | `[EnforcedConstraint]` - the array of EnforcedConstraint structures that make up the graph change when the constraint is enforced, each giving a node that was changed and the domain index that was removed from that node |
| Throws | No |
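A sketch of a custom constraint, assuming hypothetical shapes for the types this section does not define: `ConstraintProblemNode` is modelled here as a class holding a Boolean domain array, and `EnforcedConstraint` as a struct with `nodeIndex` and `domainIndexRemoved` fields. Because `enforceConstraint` receives the node list by value, the sketch assumes the nodes are reference types so that pruning a domain is visible to the caller. `NotEqualConstraint` (as used in graph coloring) is illustrative only.

```swift
// Hypothetical stand-ins for types this section does not define; AIToolbox's may differ.
final class ConstraintProblemNode {
    var domain: [Bool]   // domain[i] is true while value i is still possible
    init(domainSize: Int) { domain = Array(repeating: true, count: domainSize) }

    // The single remaining value, or nil if the node is not yet narrowed to one.
    var assignedValue: Int? {
        let open = domain.indices.filter { domain[$0] }
        return open.count == 1 ? open[0] : nil
    }
}

struct EnforcedConstraint {
    let nodeIndex: Int            // node whose domain changed
    let domainIndexRemoved: Int   // value removed from that node's domain
}

protocol ConstraintProblemConstraint {
    var isSelfConstraint: Bool { get }
    func enforceConstraint(_ graphNodeList: [ConstraintProblemNode], forNodeIndex: Int) -> [EnforcedConstraint]
}

// Illustrative "must differ from node otherIndex" constraint, as in graph coloring.
struct NotEqualConstraint: ConstraintProblemConstraint {
    let otherIndex: Int
    var isSelfConstraint: Bool { false }   // it prunes a connected node, not this one

    func enforceConstraint(_ graphNodeList: [ConstraintProblemNode], forNodeIndex: Int) -> [EnforcedConstraint] {
        // Nothing to propagate until this node is down to a single value.
        guard let value = graphNodeList[forNodeIndex].assignedValue else { return [] }
        let other = graphNodeList[otherIndex]
        guard other.domain[value] else { return [] }   // already removed earlier
        other.domain[value] = false
        return [EnforcedConstraint(nodeIndex: otherIndex, domainIndexRemoved: value)]
    }
}

let nodes = [ConstraintProblemNode(domainSize: 3), ConstraintProblemNode(domainSize: 3)]
nodes[0].domain = [false, true, false]   // node 0 is assigned value 1
let changes = NotEqualConstraint(otherIndex: 1).enforceConstraint(nodes, forNodeIndex: 0)
print(changes.map { $0.domainIndexRemoved }, nodes[1].domain)   // [1] [true, false, true]
```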

## MLDataSet

The MLDataSet protocol defines the required interface for the 'input' side of a data set. The protocol contains the following required member variables:

| var | Type | Access | Description |
|---|---|---|---|
| dataType | DataSetType | get | the type of data set. This is a DataSetType enumeration that declares the type of output from this data set |
| inputDimension | Int | get | the dimension of the input vector for the data set |
| outputDimension | Int | get | the dimension of the output vector if the dataType is .regression or .realAndClass |
| size | Int | get | the number of data points defined in the set |
| optionalData | AnyObject? | get, set | an optional object that can be attached to the data set by an algorithm for additional data storage during computation |

The MLDataSet protocol has only one required function:

#### getInput

| | |
|---|---|
| Template | `getInput(_ index: Int) throws -> [Double]` |
| Description | This method gets the input vector for the specified index |
| Inputs | `index` (`Int`): index of the input vector to get. The index should be between zero and size-1 |
| Output | `[Double]` - the input vector for the specified index |
| Throws | `DataIndexError.negative`: thrown if the index is less than zero<br>`DataIndexError.indexAboveDataSetSize`: thrown if the index is outside the valid range |

The MLDataSet protocol also has one function that is already implemented in a protocol extension:

    public func getRandomIndexSet() -> [Int]

This function returns a random set of indices into the data set. This array of random indices is useful for stochastic batch training routines.
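One plausible way such an extension can be written, assuming the set should contain every index exactly once, is a random permutation of `0..<size` built with the standard library's `shuffled()`. The trimmed `MLDataSet` protocol and the `ArrayDataSet` type below are local stand-ins, not AIToolbox's declarations; the loop at the end shows how the shuffled indices might be cut into mini-batches.

```swift
enum DataIndexError: Error { case negative, indexAboveDataSetSize }

// Trimmed local protocol (the full MLDataSet also declares dataType, outputDimension, optionalData).
protocol MLDataSet {
    var inputDimension: Int { get }
    var size: Int { get }
    func getInput(_ index: Int) throws -> [Double]
}

extension MLDataSet {
    // One plausible implementation: every index in 0..<size, in random order.
    func getRandomIndexSet() -> [Int] {
        return Array(0..<size).shuffled()
    }
}

// Hypothetical in-memory data set used to exercise the extension.
struct ArrayDataSet: MLDataSet {
    let inputs: [[Double]]
    var inputDimension: Int { inputs.first?.count ?? 0 }
    var size: Int { inputs.count }

    func getInput(_ index: Int) throws -> [Double] {
        if index < 0 { throw DataIndexError.negative }
        if index >= size { throw DataIndexError.indexAboveDataSetSize }
        return inputs[index]
    }
}

// Visit the data in a fresh random order, two points per mini-batch.
let data = ArrayDataSet(inputs: [[1], [2], [3], [4], [5]])
let order = data.getRandomIndexSet()
for start in stride(from: 0, to: order.count, by: 2) {
    let batch = order[start..<min(start + 2, order.count)].map { try! data.getInput($0) }
    print(batch)   // a training step on `batch` would go here
}
```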

## MLClassificationDataSet

The MLClassificationDataSet protocol is the required interface for a data set that will be used by classification algorithms (unless it will also have an output vector of reals for other purposes). It inherits the MLDataSet protocol, so all input requirements from that protocol are already defined. The MLClassificationDataSet defines the following functions regarding the outputs of a data set:

#### getClass

| | |
|---|---|
| Template | `getClass(_ index: Int) throws -> Int` |
| Description | This method gets the output class label for the specified index |
| Inputs | `index` (`Int`): index of the output class label to get. The index should be between zero and size-1 |
| Output | `Int` - the class label for the specified index |
| Throws | `DataIndexError.negative`: thrown if the index is less than zero<br>`DataIndexError.indexAboveDataSetSize`: thrown if the index is outside the valid range<br>`DataTypeError.dataWrongForType`: thrown if the data set is not a classification data set |
DataTypeError.dataWrongForType thrown if the data set is not a classification data set

#### setClass

| | |
|---|---|
| Template | `setClass(_ index: Int, newClass: Int) throws` |
| Description | This method sets the output class label for the specified index |
| Inputs | `index` (`Int`): index of the class label to set. The index should be between zero and size-1<br>`newClass` (`Int`): class label to be set on the point indicated by the index parameter |
| Output | None |
| Throws | `DataIndexError.negative`: thrown if the index is less than zero<br>`DataIndexError.indexAboveDataSetSize`: thrown if the index is outside the valid range<br>`DataTypeError.dataWrongForType`: thrown if the data set is not a classification data set |
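A minimal in-memory implementation makes the index and type checks concrete. Everything here is a local sketch: the protocols are trimmed to the members used, `DataSetType` is redeclared with an assumed `.classification` case alongside the `.regression` and `.realAndClass` cases named in the MLDataSet section, and `SimpleDataSet` with its `addPoint` helper is hypothetical. A class is used because algorithms such as `classify` write results back into the data set they are given.

```swift
enum DataIndexError: Error { case negative, indexAboveDataSetSize }
enum DataTypeError: Error { case dataWrongForType }

// `.classification` is an assumed case; `.regression` and `.realAndClass` are named in the docs.
enum DataSetType { case regression, classification, realAndClass }

// Trimmed local copies of the protocols.
protocol MLDataSet: AnyObject {
    var dataType: DataSetType { get }
    var size: Int { get }
    func getInput(_ index: Int) throws -> [Double]
}
protocol MLClassificationDataSet: MLDataSet {
    func getClass(_ index: Int) throws -> Int
    func setClass(_ index: Int, newClass: Int) throws
}

// Hypothetical in-memory data set; a class so algorithms can write labels back into it.
final class SimpleDataSet: MLClassificationDataSet {
    let dataType: DataSetType
    private var inputs: [[Double]] = []
    private var classes: [Int] = []

    init(dataType: DataSetType) { self.dataType = dataType }

    var size: Int { inputs.count }

    func addPoint(input: [Double], classLabel: Int = 0) {
        inputs.append(input)
        classes.append(classLabel)
    }

    // Index validation shared by every accessor.
    private func checkIndex(_ index: Int) throws {
        if index < 0 { throw DataIndexError.negative }
        if index >= size { throw DataIndexError.indexAboveDataSetSize }
    }

    // Class labels only exist for classification-style data sets.
    private func checkHasClasses() throws {
        if dataType == .regression { throw DataTypeError.dataWrongForType }
    }

    func getInput(_ index: Int) throws -> [Double] {
        try checkIndex(index)
        return inputs[index]
    }

    func getClass(_ index: Int) throws -> Int {
        try checkHasClasses()
        try checkIndex(index)
        return classes[index]
    }

    func setClass(_ index: Int, newClass: Int) throws {
        try checkHasClasses()
        try checkIndex(index)
        classes[index] = newClass
    }
}

let points = SimpleDataSet(dataType: .classification)
points.addPoint(input: [0.0, 1.0], classLabel: 2)
try! points.setClass(0, newClass: 5)
print(try! points.getClass(0))   // 5
```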

## MLCombinedDataSet

The MLCombinedDataSet protocol is the required interface for a data set that will be used by classification algorithms that also have an output vector of reals for other purposes. It inherits the MLRegressionDataSet and MLClassificationDataSet protocols, so all input and output requirements from those protocols are already defined. No non-inherited functionality is defined by the protocol.

## MLPersistence

The MLPersistence protocol defines methods for reading and writing a machine-learning model to a dictionary. The dictionary uses string keys and AnyObject values that conform to the requirements for a PList file. The dictionary can be part of a larger dictionary containing multiple models if needed. The following two functions are required by the protocol: an initializer that creates the machine learning object from a dictionary, and a method to get the dictionary:

#### init?

| | |
|---|---|
| Template | `init?(fromDictionary: [String: AnyObject])` |
| Description | This is a failable initializer that takes a dictionary (typically created by the getPersistenceDictionary method) and initializes an instance of the machine learning object |
| Inputs | `fromDictionary` (`[String: AnyObject]`): dictionary containing the PList representation of the object |
| Output | the object, or nil if initialization failed |
| Throws | No |

The following code snippet can be used to read a dictionary from a given path that can be used by the init function:

        import Foundation
        // nil if the file is missing or is not a [String: AnyObject] PList dictionary
        guard let dictionary = NSDictionary(contentsOfFile: path) as? [String: AnyObject] else {
            return  /* error handling */
        }

#### getPersistenceDictionary

| | |
|---|---|
| Template | `getPersistenceDictionary() -> [String: AnyObject]` |
| Description | This method returns a dictionary containing a PList representation of all data required to reconstruct the machine learning object. |
| Inputs | None |
| Output | `[String: AnyObject]` - the dictionary containing the PList representation |
| Throws | No |

The dictionary returned can be written to a file using the following code:

        import Foundation
        // modelDictionary is the result of getPersistenceDictionary()
        let pList = NSDictionary(dictionary: modelDictionary)
        if !pList.write(toFile: path, atomically: false) { /* do error handling */ }
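An in-memory round trip through the dictionary might look like the sketch below. `LinearModel` is a hypothetical one-input model and the protocol is redeclared locally; wrapping the values in `NSNumber` keeps them PList-compatible `AnyObject`s, which is one reasonable encoding rather than a documented AIToolbox format. The key names are arbitrary.

```swift
import Foundation

// Local copy of the protocol as documented above.
protocol MLPersistence {
    init?(fromDictionary: [String: AnyObject])
    func getPersistenceDictionary() -> [String: AnyObject]
}

// Hypothetical model y = slope * x + intercept.
final class LinearModel: MLPersistence {
    let slope: Double
    let intercept: Double

    init(slope: Double, intercept: Double) {
        self.slope = slope
        self.intercept = intercept
    }

    // Fails (returns nil) if either key is missing or is not a number.
    init?(fromDictionary: [String: AnyObject]) {
        guard let s = fromDictionary["slope"] as? NSNumber,
              let b = fromDictionary["intercept"] as? NSNumber else { return nil }
        slope = s.doubleValue
        intercept = b.doubleValue
    }

    // NSNumber values keep the dictionary PList-compatible.
    func getPersistenceDictionary() -> [String: AnyObject] {
        return ["slope": NSNumber(value: slope), "intercept": NSNumber(value: intercept)]
    }
}

let saved = LinearModel(slope: 2.5, intercept: -1).getPersistenceDictionary()
if let restored = LinearModel(fromDictionary: saved) {
    print(restored.slope, restored.intercept)   // 2.5 -1.0
}
```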

## MLRegressionDataSet

The MLRegressionDataSet protocol is the required interface for a data set that will be used by regression algorithms. It inherits the MLDataSet protocol, so all input requirements from that protocol are already defined. The MLRegressionDataSet defines the following functions regarding the outputs of a data set:

#### getOutput

| | |
|---|---|
| Template | `getOutput(_ index: Int) throws -> [Double]` |
| Description | This method gets the output vector for the specified index |
| Inputs | `index` (`Int`): index of the output vector to get. The index should be between zero and size-1 |
| Output | `[Double]` - the output vector for the specified index |
| Throws | `DataIndexError.negative`: thrown if the index is less than zero<br>`DataIndexError.indexAboveDataSetSize`: thrown if the index is outside the valid range<br>`DataTypeError.dataWrongForType`: thrown if the data set is not a regression data set |

#### setOutput

| | |
|---|---|
| Template | `setOutput(_ index: Int, newOutput: [Double]) throws` |
| Description | This method sets the output vector for the specified index |
| Inputs | `index` (`Int`): index of the output vector to set. The index should be between zero and size-1<br>`newOutput` (`[Double]`): output vector to be set on the point indicated by the index parameter |
| Output | None |
| Throws | `DataIndexError.negative`: thrown if the index is less than zero<br>`DataIndexError.indexAboveDataSetSize`: thrown if the index is outside the valid range<br>`DataTypeError.dataWrongForType`: thrown if the data set is not a regression data set |
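The sketch below fills a regression data set in place, which is how `predict` on a Regressor is described as returning its results. `SimpleRegressionData` and `fillOutputs` are hypothetical, and the protocol is trimmed to the members used; since this class only ever holds regression data, it never needs `DataTypeError.dataWrongForType`.

```swift
enum DataIndexError: Error { case negative, indexAboveDataSetSize }

// Trimmed local protocol (the real one inherits MLDataSet and declares more members).
protocol MLRegressionDataSet: AnyObject {
    var size: Int { get }
    func getInput(_ index: Int) throws -> [Double]
    func getOutput(_ index: Int) throws -> [Double]
    func setOutput(_ index: Int, newOutput: [Double]) throws
}

// Hypothetical in-memory regression data set.
final class SimpleRegressionData: MLRegressionDataSet {
    private var inputs: [[Double]]
    private var outputs: [[Double]]

    init(inputs: [[Double]], outputDimension: Int) {
        self.inputs = inputs
        // Outputs start as zero vectors until something writes to them.
        outputs = Array(repeating: [Double](repeating: 0, count: outputDimension), count: inputs.count)
    }

    var size: Int { inputs.count }

    private func checkIndex(_ index: Int) throws {
        if index < 0 { throw DataIndexError.negative }
        if index >= size { throw DataIndexError.indexAboveDataSetSize }
    }

    func getInput(_ index: Int) throws -> [Double] { try checkIndex(index); return inputs[index] }
    func getOutput(_ index: Int) throws -> [Double] { try checkIndex(index); return outputs[index] }
    func setOutput(_ index: Int, newOutput: [Double]) throws {
        try checkIndex(index)
        outputs[index] = newOutput
    }
}

// Writes f(input) into every point of the data set, the way `predict` reports results.
func fillOutputs(_ data: MLRegressionDataSet, using f: ([Double]) -> [Double]) throws {
    for i in 0..<data.size {
        let input = try data.getInput(i)
        try data.setOutput(i, newOutput: f(input))
    }
}

let samples = SimpleRegressionData(inputs: [[1.0], [2.0]], outputDimension: 1)
try! fillOutputs(samples) { [2 * $0[0] + 1] }
print(try! samples.getOutput(1))   // [5.0]
```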

## MLViewItem

The MLViewItem protocol defines the functions required of an item added to an MLView class. These include initialization routines such as scale setting, data set routines (specifying the axes being plotted at that time), and the draw function.

#### setColor

| | |
|---|---|
| Template | `setColor(_ color: NSColor)` |
| Description | This method sets the default color for the item |
| Inputs | `color` (`NSColor`): the color to set the item's default color to |
| Output | None |
| Throws | No |

#### setScale

| | |
|---|---|
| Template | `setScale(_ scale: (minX: Double, maxX: Double, minY: Double, maxY: Double))` |
| Description | This method sets the scale to the provided factors, or the item can calculate its own. One item in the MLView is the master of the scale: that item has the getScale method called on it, and the values it returns are passed to all the other items with this method |
| Inputs | `scale` (`(minX: Double, maxX: Double, minY: Double, maxY: Double)` tuple): the minimum and maximum scale values for the two axes |
| Output | None |
| Throws | No |

#### draw

| | |
|---|---|
| Template | `draw(_ bounds: CGRect)` |
| Description | This method is called to have the item draw itself into the MLView. The bounds of the view are passed in to the method. The drawing context will be set to that of the MLView before this method is called. |
| Inputs | `bounds` (`CGRect`): the bounding rectangle of the MLView that the item should draw itself into |
| Output | None |
| Throws | No |

#### getScale

| | |
|---|---|
| Template | `getScale() -> (minX: Double, maxX: Double, minY: Double, maxY: Double)?` |
| Description | This method returns the scale factors used by the item. The 'master' item provides the scaling for the entire view: it is the designated item that gets this method called on it, while the other items get their scales set using the setScale method. |
| Inputs | None |
| Output | `(minX: Double, maxX: Double, minY: Double, maxY: Double)?` - the optional scale values for the X and Y axes that this item wants to have. Return nil if no scaling information is available from the item (as is the case for Legend items) |
| Throws | No |
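The master-scale arrangement can be sketched without AppKit by keeping only the two scale requirements. `ScaledItem`, `PointsItem`, `LegendItem` and `distributeScale` are hypothetical stand-ins for MLViewItem implementations and the MLView logic; `viewX` shows one assumed way an item's `draw` could map a data value into the view's width using the shared scale.

```swift
// Same tuple shape as the setScale/getScale signatures above.
typealias Scale = (minX: Double, maxX: Double, minY: Double, maxY: Double)

// Trimmed stand-in for MLViewItem: only the scale requirements.
protocol ScaledItem: AnyObject {
    func getScale() -> Scale?
    func setScale(_ scale: Scale)
}

// Hypothetical scatter item that can compute a scale from its own points.
final class PointsItem: ScaledItem {
    let points: [(x: Double, y: Double)]
    private(set) var scale: Scale?
    init(points: [(x: Double, y: Double)]) { self.points = points }

    func getScale() -> Scale? {
        guard !points.isEmpty else { return nil }
        let xs = points.map { $0.x }, ys = points.map { $0.y }
        return (minX: xs.min()!, maxX: xs.max()!, minY: ys.min()!, maxY: ys.max()!)
    }
    func setScale(_ scale: Scale) { self.scale = scale }
}

// Hypothetical legend item: it has no data, so it offers no scale.
final class LegendItem: ScaledItem {
    private(set) var scale: Scale?
    func getScale() -> Scale? { nil }
    func setScale(_ scale: Scale) { self.scale = scale }
}

// View-side logic: ask the master for its scale, then push it to every other item.
func distributeScale(master: ScaledItem, items: [ScaledItem]) -> Bool {
    guard let scale = master.getScale() else { return false }
    for item in items where item !== master { item.setScale(scale) }
    return true
}

// One way draw(_:) could map a data-space x value into a view of the given width.
func viewX(_ x: Double, scale: Scale, width: Double) -> Double {
    return (x - scale.minX) / (scale.maxX - scale.minX) * width
}

let scatter = PointsItem(points: [(0, 1), (10, -2)])
let legend = LegendItem()
_ = distributeScale(master: scatter, items: [scatter, legend])
print(legend.scale!.maxX, viewX(5, scale: legend.scale!, width: 200))   // 10.0 100.0
```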

#### setInputVector

| | |
|---|---|
| Template | `setInputVector(_ vector: [Double]) throws` |
| Description | This method sets the input vector that the view currently has. Vector elements that are used as axes should be ignored as needed when plotting/drawing the data; this method supplies the non-plotted input values for the current update. |
| Inputs | `vector` (`[Double]`): the input vector |
| Output | None |
| Throws | `MLViewError.inputVectorNotOfCorrectSize`: thrown if the input vector size does not match the item's requirements |

#### setXAxisSource

| | |
|---|---|
| Template | `setXAxisSource(_ source: MLViewAxisSource, index: Int) throws` |
| Description | This method sets the item that should be used for the X axis variable. The axis source is identified by its type (input, output, or class label) and by the index of the item within the model being displayed. |
| Inputs | `source` (`MLViewAxisSource`): the type of item (input value, output value, or class label) to be used as the X axis source<br>`index` (`Int`): the index of the item to be used as the X axis source. If the type is 'class label', this index is ignored |
| Output | None |
| Throws | `MLViewError.inputIndexOutsideOfRange`: thrown if the type is 'input' and the index is outside the range of the input vector<br>`MLViewError.dataSetNotRegression`: thrown if the data set is not a regression data set and an 'output' type is specified<br>`MLViewError.outputIndexOutsideOfRange`: thrown if the type is 'output' and the index is outside the range of the output vector<br>`MLViewError.dataSetNotClassification`: thrown if the data set is not a classification data set and a 'class label' type is specified |

#### setYAxisSource

| | |
|---|---|
| Template | `setYAxisSource(_ source: MLViewAxisSource, index: Int) throws` |
| Description | This method sets the item that should be used for the Y axis variable. The axis source is identified by its type (input, output, or class label) and by the index of the item within the model being displayed. |
| Inputs | `source` (`MLViewAxisSource`): the type of item (input value, output value, or class label) to be used as the Y axis source<br>`index` (`Int`): the index of the item to be used as the Y axis source. If the type is 'class label', this index is ignored |
| Output | None |
| Throws | `MLViewError.inputIndexOutsideOfRange`: thrown if the type is 'input' and the index is outside the range of the input vector<br>`MLViewError.dataSetNotRegression`: thrown if the data set is not a regression data set and an 'output' type is specified<br>`MLViewError.outputIndexOutsideOfRange`: thrown if the type is 'output' and the index is outside the range of the output vector<br>`MLViewError.dataSetNotClassification`: thrown if the data set is not a classification data set and a 'class label' type is specified |
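The checks listed for setXAxisSource and setYAxisSource can be expressed as one shared validation routine, sketched below. The `MLViewAxisSource` case names and the `PlotDataInfo` summary of the plotted data set are assumptions for illustration; only the error names come from the error lists above.

```swift
// Assumed case names; the text only describes "input", "output" and "class label" sources.
enum MLViewAxisSource { case input, output, classLabel }

enum MLViewError: Error {
    case inputIndexOutsideOfRange, outputIndexOutsideOfRange
    case dataSetNotRegression, dataSetNotClassification
}

// Hypothetical summary of the data set being plotted.
struct PlotDataInfo {
    let inputDimension: Int
    let outputDimension: Int
    let isRegression: Bool
    let isClassification: Bool
}

// The documented checks, shared by setXAxisSource and setYAxisSource.
func validateAxisSource(_ source: MLViewAxisSource, index: Int, data: PlotDataInfo) throws {
    switch source {
    case .input:
        guard (0..<data.inputDimension).contains(index) else { throw MLViewError.inputIndexOutsideOfRange }
    case .output:
        guard data.isRegression else { throw MLViewError.dataSetNotRegression }
        guard (0..<data.outputDimension).contains(index) else { throw MLViewError.outputIndexOutsideOfRange }
    case .classLabel:
        guard data.isClassification else { throw MLViewError.dataSetNotClassification }
        // The index is ignored for class labels.
    }
}

let info = PlotDataInfo(inputDimension: 2, outputDimension: 1, isRegression: true, isClassification: false)
do {
    try validateAxisSource(.classLabel, index: 0, data: info)
} catch {
    print(error)   // dataSetNotClassification
}
```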

## NonLinearEquation

The NonLinearEquation protocol is for a class that wraps a non-linear equation for non-linear regression algorithms. The equation is assumed to contain a set of parameters that are learnable by the regression model. The protocol defines methods to get the equation's output values and its gradient with respect to the parameters, for use by the regression model. If the output dimension is greater than one, the parameters form a matrix, with each row holding the parameters for one of the outputs. The NonLinearEquation protocol has one required variable:

| var | Type | Access | Description |
|---|---|---|---|
| parameters | [Double] | get, set | the parameters that define the learnable portion of the non-linear equation |

The NonLinearEquation protocol has the following required functions:

#### getInputDimension

| | |
|---|---|
| Template | `getInputDimension() -> Int` |
| Description | This method gets the required dimension of the input vectors used by the non-linear equation |
| Inputs | None |
| Output | `Int` - the dimension of the input vector used by the non-linear equation |
| Throws | No |

#### getOutputDimension

| | |
|---|---|
| Template | `getOutputDimension() -> Int` |
| Description | This method gets the dimension of the output vectors returned by the non-linear equation |
| Inputs | None |
| Output | `Int` - the dimension of the output vector returned by the non-linear equation |
| Throws | No |

#### getParameterDimension

| | |
|---|---|
| Template | `getParameterDimension() -> Int` |
| Description | This method gets the number of parameters used by the non-linear equation. This must be an integer multiple of the output dimension. |
| Inputs | None |
| Output | `Int` - the number of parameters used by the non-linear equation |
| Throws | No |

#### setParameters

| | |
|---|---|
| Template | `setParameters(_ parameters: [Double]) throws` |
| Description | This method will set the parameters of the non-linear equation from the passed-in array of values |
| Inputs | `parameters` (`[Double]`): the values to set the parameters to. Should be sized based on the getParameterDimension result |
| Output | None |
| Throws | `MachineLearningError.notEnoughData`: thrown if the size of the passed-in array is smaller than that required. Use the getParameterDimension method to get the required size |

#### getOutputs

| | |
|---|---|
| Template | `getOutputs(_ inputs: [Double]) throws -> [Double]` |
| Description | This method gets the resulting output vector given the input vector |
| Inputs | `inputs` (`[Double]`): the input vector to get the non-linear result values for |
| Output | `[Double]` - the resulting non-linear equation values |
| Throws | `DataTypeError.wrongDimensionOnInput`: thrown if the input vector dimension does not match the equation<br>Additional exceptions may be thrown by individual equation classes |

#### getGradient

| | |
|---|---|
| Template | `getGradient(_ inputs: [Double]) throws -> [Double]` |
| Description | This method gets the resulting parameter gradient vector given the input vector |
| Inputs | `inputs` (`[Double]`): the input vector to get the gradient values for |
| Output | `[Double]` - the resulting gradient values for the parameters. This vector will be sized to the parameter dimension |
| Throws | `DataTypeError.wrongDimensionOnInput`: thrown if the input vector dimension does not match the equation<br>Additional exceptions may be thrown by individual equation classes |
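As an example of a conforming equation, the sketch below wraps `y = a * exp(b * x)` with parameters `[a, b]`, whose gradient with respect to the parameters is `[exp(b * x), a * x * exp(b * x)]`. The equation is a hypothetical choice, the protocol and error enums are redeclared locally, and the equation is a class so that the non-mutating `setParameters` requirement can update `parameters`.

```swift
import Foundation

enum MachineLearningError: Error { case notEnoughData }
enum DataTypeError: Error { case wrongDimensionOnInput }

// Local copy of the protocol as documented above.
protocol NonLinearEquation {
    var parameters: [Double] { get set }
    func getInputDimension() -> Int
    func getOutputDimension() -> Int
    func getParameterDimension() -> Int
    func setParameters(_ parameters: [Double]) throws
    func getOutputs(_ inputs: [Double]) throws -> [Double]
    func getGradient(_ inputs: [Double]) throws -> [Double]
}

// Hypothetical equation y = a * exp(b * x), with parameters [a, b].
final class ExponentialEquation: NonLinearEquation {
    var parameters: [Double] = [1, 0]

    func getInputDimension() -> Int { 1 }
    func getOutputDimension() -> Int { 1 }
    func getParameterDimension() -> Int { 2 }   // an integer multiple of the output dimension

    func setParameters(_ parameters: [Double]) throws {
        guard parameters.count >= getParameterDimension() else { throw MachineLearningError.notEnoughData }
        self.parameters = Array(parameters.prefix(getParameterDimension()))
    }

    func getOutputs(_ inputs: [Double]) throws -> [Double] {
        guard inputs.count == getInputDimension() else { throw DataTypeError.wrongDimensionOnInput }
        return [parameters[0] * exp(parameters[1] * inputs[0])]
    }

    func getGradient(_ inputs: [Double]) throws -> [Double] {
        guard inputs.count == getInputDimension() else { throw DataTypeError.wrongDimensionOnInput }
        let e = exp(parameters[1] * inputs[0])
        return [e, parameters[0] * inputs[0] * e]   // [dy/da, dy/db]
    }
}

let curve = ExponentialEquation()
try! curve.setParameters([2.0, 0.0])
print(try! curve.getOutputs([3.0]), try! curve.getGradient([3.0]))   // [2.0] [1.0, 6.0]
```

A quick sanity check for any implementation is to compare `getGradient` against a finite difference of `getOutputs` for each parameter.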

## Regressor

The Regressor protocol is an interface used by the majority of regression models in AIToolbox. Because all the models implement the same protocol, they behave the same way, so Validation methods can be used for parameter tuning and model selection without custom code. The protocol has the following required functions:

#### getInputDimension

| | |
|---|---|
| Template | `getInputDimension() -> Int` |
| Description | This method gets the required dimension of the input vectors used by the model |
| Inputs | None |
| Output | `Int` - the dimension of the input vector used by the model |
| Throws | No |

#### getOutputDimension

| | |
|---|---|
| Template | `getOutputDimension() -> Int` |
| Description | This method gets the required dimension of the output vectors used by the model |
| Inputs | None |
| Output | `Int` - the dimension of the output vector used by the model |
| Throws | No |

#### getParameterDimension

| | |
|---|---|
| Template | `getParameterDimension() -> Int` |
| Description | This method gets the number of parameters used by the model. It may only be valid after training |
| Inputs | None |
| Output | `Int` - the number of parameters used by the model |
| Throws | No |

#### setParameters

| | |
|---|---|
| Template | `setParameters(_ parameters: [Double]) throws` |
| Description | This method will set the parameters of the model from the passed-in array of values |
| Inputs | `parameters` (`[Double]`): the values to set the parameters to. Should be sized based on the getParameterDimension result |
| Output | None |
| Throws | `MachineLearningError.notEnoughData`: thrown if the size of the passed-in array is smaller than that required. Use the getParameterDimension method to get the required size |

#### setCustomInitializer

| | |
|---|---|
| Template | `setCustomInitializer(_ function: ((_ trainData: MLDataSet)->[Double])!)` |
| Description | This method sets a function that will be called on parameter initialization, with the initial training data set as its argument. If the function is nil (the default state), the parameters will be set to random values. If it is not nil, the results of the function are used to initialize the parameters |
| Inputs | `function` (`((_ trainData: MLDataSet)->[Double])!`): the function that receives the initial training data set and returns the values used to initialize the parameters |
| Output | None |
| Throws | No |

#### getParameters

| | |
|---|---|
| Template | `getParameters() throws -> [Double]` |
| Description | This method will get the parameters of the model. The model may need to be trained first |
| Inputs | None |
| Output | `[Double]` - the parameters of the model |
| Throws | `MachineLearningError.notTrained`: thrown if the model has not been trained and its parameters are only created by training |

#### trainRegressor

| | |
|---|---|
| Template | `trainRegressor(_ trainData: MLRegressionDataSet) throws` |
| Description | This trains the regression model on the data set passed in. The model parameters will be initialized by the method before training begins. |
| Inputs | `trainData` (`MLRegressionDataSet`): the data set to train on. The data set dimensions (input and output) must match those of the model |
| Output | None |
| Throws | `MachineLearningError.dataNotRegression`: thrown if the data set used for training is not a regression set<br>`MachineLearningError.dataWrongDimension`: thrown if the data set used for training does not match the model<br>`MachineLearningError.notEnoughData`: thrown if the data set used for training does not have enough data to train the model (usually because the number of parameters in the model exceeds the data set size)<br>Additional exceptions may be thrown by individual model classes |

#### continueTrainingRegressor

| | |
|---|---|
| Template | `continueTrainingRegressor(_ trainData: MLRegressionDataSet) throws` |
| Description | This trains the regression model on the data set passed in. The model parameters are not initialized first, so training continues with the current parameter set. Not all regression models support this. |
| Inputs | `trainData` (`MLRegressionDataSet`): the data set to train on. The data set dimensions (input and output) must match those of the model |
| Output | None |
| Throws | `MachineLearningError.dataNotRegression`: thrown if the data set used for training is not a regression set<br>`MachineLearningError.dataWrongDimension`: thrown if the data set used for training does not match the model<br>`MachineLearningError.continuationNotSupported`: thrown if the regression model does not support training continuation<br>Additional exceptions may be thrown by individual model classes |

#### predictOne

| | |
|---|---|
| Template | `predictOne(_ inputs: [Double]) throws -> [Double]` |
| Description | This gets the regression results for a single input vector. The input vector size must match the model input size. |
| Inputs | `inputs` (`[Double]`): the input vector to get the regression values for |
| Output | `[Double]` - the resulting regression values |
| Throws | `DataTypeError.wrongDimensionOnInput`: thrown if the input vector dimension does not match the model<br>`MachineLearningError.notTrained`: thrown if the model has not yet been trained<br>Additional exceptions may be thrown by individual model classes |

#### predict

| | |
|---|---|
| Template | `predict(_ testData: MLRegressionDataSet) throws` |
| Description | This gets the regression results for all points in a data set. The data set type and dimensions must match the model requirements. |
| Inputs | `testData` (`MLRegressionDataSet`): the data set to get results for. The data set dimensions (input and output) must match those of the model |
| Output | None (the data set passed in is modified to contain the output results) |
| Throws | `MachineLearningError.dataNotRegression`: thrown if the data set passed in is not a regression set<br>`MachineLearningError.dataWrongDimension`: thrown if the data set passed in does not match the model<br>`MachineLearningError.notTrained`: thrown if the model has not yet been trained<br>Additional exceptions may be thrown by individual model classes |
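A compact sketch of a conforming regressor: one-input linear regression fitted by full-batch gradient descent, a technique chosen for illustration rather than a specific AIToolbox model. It shows the split described above, where `trainRegressor` initializes the parameters (through the custom initializer when one is set) and `continueTrainingRegressor` resumes from the current ones. `MiniRegressor` is a trimmed local protocol that uses a plain optional in place of the implicitly unwrapped closure type, and `RegressionData` is a hypothetical stand-in for MLRegressionDataSet.

```swift
enum MachineLearningError: Error { case notTrained, notEnoughData, dataWrongDimension }
enum DataTypeError: Error { case wrongDimensionOnInput }

// Hypothetical stand-in for MLRegressionDataSet.
struct RegressionData {
    var inputs: [[Double]]
    var outputs: [[Double]]
}

// Trimmed local protocol covering part of the Regressor requirements.
protocol MiniRegressor: AnyObject {
    func getInputDimension() -> Int
    func getOutputDimension() -> Int
    func setCustomInitializer(_ function: ((RegressionData) -> [Double])?)
    func trainRegressor(_ trainData: RegressionData) throws
    func continueTrainingRegressor(_ trainData: RegressionData) throws
    func predictOne(_ inputs: [Double]) throws -> [Double]
}

// y = w * x + b fitted by full-batch gradient descent on mean squared error.
final class LinearGDRegressor: MiniRegressor {
    private var parameters: [Double] = []   // [w, b]; empty until trained
    private var initializer: ((RegressionData) -> [Double])?
    let learningRate: Double
    let epochs: Int

    init(learningRate: Double = 0.05, epochs: Int = 2000) {
        self.learningRate = learningRate
        self.epochs = epochs
    }

    func getInputDimension() -> Int { 1 }
    func getOutputDimension() -> Int { 1 }
    func setCustomInitializer(_ function: ((RegressionData) -> [Double])?) { initializer = function }

    func trainRegressor(_ trainData: RegressionData) throws {
        guard trainData.inputs.count >= 2 else { throw MachineLearningError.notEnoughData }
        // Fresh start: custom initializer if one is set, otherwise small random values.
        parameters = initializer?(trainData)
            ?? [Double.random(in: -0.1...0.1), Double.random(in: -0.1...0.1)]
        try continueTrainingRegressor(trainData)
    }

    func continueTrainingRegressor(_ trainData: RegressionData) throws {
        let shapesOK = trainData.inputs.allSatisfy { $0.count == 1 }
            && trainData.outputs.allSatisfy { $0.count == 1 }
        guard shapesOK, trainData.inputs.count == trainData.outputs.count else {
            throw MachineLearningError.dataWrongDimension
        }
        if parameters.count != 2 { parameters = [0, 0] }   // this sketch's choice for an untrained model
        let n = Double(trainData.inputs.count)
        for _ in 0..<epochs {
            var gradW = 0.0, gradB = 0.0
            for (x, y) in zip(trainData.inputs, trainData.outputs) {
                let residual = parameters[0] * x[0] + parameters[1] - y[0]
                gradW += 2 * residual * x[0] / n
                gradB += 2 * residual / n
            }
            parameters[0] -= learningRate * gradW
            parameters[1] -= learningRate * gradB
        }
    }

    func predictOne(_ inputs: [Double]) throws -> [Double] {
        guard inputs.count == 1 else { throw DataTypeError.wrongDimensionOnInput }
        guard parameters.count == 2 else { throw MachineLearningError.notTrained }
        return [parameters[0] * inputs[0] + parameters[1]]
    }
}

let line = LinearGDRegressor()
line.setCustomInitializer { _ in [0, 0] }   // deterministic start instead of random values
try! line.trainRegressor(RegressionData(inputs: [[0], [1], [2], [3]], outputs: [[1], [3], [5], [7]]))
print(try! line.predictOne([4]))   // approximately [9.0]
```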