
The AMORE package was born to provide the user with an unusual neural network simulator: a highly flexible environment that gives the user direct access to the network parameters, provides fine control over the learning details, and allows the available functions to be customized to suit the user's needs. The current version (0.2-9) of the package is capable of training a multilayer feedforward network according to both the adaptive and the batch versions of the gradient descent with momentum backpropagation algorithm. Thanks to the structure adopted, expanding the set of available error criteria is no more difficult than programming the corresponding cost functions in `R`. Everyone is invited to grow the number of available functions and share their experiences.

End users who merely want to use common neural networks already have various good simulators to choose from. Among them, those familiar with `R` may wish to be able to train their nets without having to leave the `R` environment. The *AMORE* package may be useful for that aim, though in truth these users may find other, faster alternatives. But for researchers interested in neural networks and robust modeling, those wanting to gain fine control over the operating procedures of the training algorithms, this package offers a useful platform on which to program their own algorithms with relative ease.

Users may choose among three different degrees of involvement with the package: using the already available functions; programming their own functions in the friendly `R` arena during the first steps and trials; and using the C programming language to speed up the heavy training tasks.

We hope that a careful reading of the functions may serve as a way of gaining a greater understanding of the training methods already programmed, as well as an inspiration for expanding the current possibilities.

In order to ease the understanding of those readings, the following sections describe the basic objects and their corresponding functions, using the net shown in the following figure for exemplification purposes: a multilayer feedforward network featuring two hidden layers with four and two neurons, which takes two input variables and provides a bidimensional output.

We have modeled the ANN using lists wherever a complex structure was needed. The preferred method to create the network is the `newff` function. For our example we would use something like:

```
> net.start <- newff(n.neurons=c(2,4,2,2), learning.rate.global=1e-2,
+                    momentum.global=0.5, error.criterium="LMS", Stao=NA,
+                    hidden.layer="tansig", output.layer="purelin",
+                    method="ADAPTgdwm")
```

The resulting net contains the following elements, in the following order:

A list of as many numerical vectors as layers are needed. Each vector contains the indexes of the neurons that belong to that layer. In the previous line of our example, we specified the value `c(2,4,2,2)` for the `n.neurons` parameter: that is to say, we want our net to have two input neurons, a first hidden layer with four neurons, a second hidden layer with two neurons and, finally, an output layer with two output neurons. The `layers` element of `net.start` should have the following contents:

```
> net.start$layers
[[1]]
[1] -1 -2

[[2]]
[1] 1 2 3 4

[[3]]
[1] 5 6

[[4]]
[1] 7 8
```

Thus, we have four layers: the first containing the neurons numbered -1 and -2, the second with neurons 1, 2, 3 and 4, the third with neurons 5 and 6, and the last with neurons 7 and 8. The reader may notice that the indexes of the two input neurons are negative. The input neurons are merely *virtual* neurons; they are conceptual supports used to access the different `input` vector components. The absolute value of the index of an input neuron indicates which particular component of the `input` vector it refers to.
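To make the indexing convention concrete, a small helper of our own (hypothetical; the package resolves inputs internally in its own way) could look like this:

```r
# Hypothetical helper (not part of AMORE): resolve one input value for a
# neuron. A negative index addresses the external input vector directly;
# a positive index refers to another neuron's stored output (its v0 slot).
get.input.value <- function(net, index, input) {
  if (index < 0) {
    input[abs(index)]        # virtual input neuron: component of `input`
  } else {
    net$neurons[[index]]$v0  # real neuron: its last computed output
  }
}
```

With this convention, `get.input.value(net, -2, input)` returns the second component of the input vector, while a positive index reads a real neuron's output.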

A list containing the neurons, which are lists themselves and deserve a deeper explanation; we provide it in the following section.

A numerical vector that contains the values of the input variables for a single case from the data set under consideration. It contains the input signals to be propagated through the network. In our example, it would contain the value of *Input 1* and *Input 2*.

A numerical vector that contains the resulting predictions once the input signals have been propagated.

A numerical vector that contains the expected target values towards which the network’s output must approach.

The cost function used to measure the errors between the outputs and the targets. It is written in R code so as to ease its modification according to the user's needs. Currently we provide cost functions for the Least Mean Squares, Least Mean Log Squares and TAO robust criteria. In our example we specified the option `error.criterium="LMS"`, thus we are using the Least Mean Squares cost function:

```
> net.start$deltaE
function (arguments)
{
    prediction <- arguments[[1]]
    target <- arguments[[2]]
    residual <- prediction - target
    return(residual)
}
```

As the reader may see, it is remarkably easy to modify this function so as to use a different criterion.
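For instance, a sketch of an alternative `deltaE` following the same calling convention, here the derivative of a Least Mean Log Squares style cost E = log(1 + residual^2/2). The exact form shipped by the package may differ, so treat this as an illustration only:

```r
# Illustrative custom deltaE (an assumption of ours, not the package's
# own LMLS code): derivative of E = log(1 + residual^2 / 2).
deltaE.custom <- function(arguments) {
  prediction <- arguments[[1]]
  target     <- arguments[[2]]
  residual   <- prediction - target
  return(residual / (1 + residual^2 / 2))
}
```

Note how large residuals are damped relative to the plain LMS derivative, which is the motivation behind the robust criteria.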

This element is a list that contains auxiliary elements. Currently only the Stao parameter used by the TAO robust cost function is contained in this list.

We have chosen to represent an artificial neuron as a list whose elements contain the neuron weights, bias term, activation function (written in `R` code), and the necessary properties to allow the propagation of the signal from the inputs of the neuron to its output. The `newff` function calls the `init.neuron` function in order to properly create the network neurons. It allows the user to choose whether the neuron is an *output* neuron or a *hidden* one, and whether its activation function is linear (*purelin*), a hyperbolic tangent (*tansig*) or a sigmoid (*sigmoid*); it may even be defined as *custom*, in which case the user programs the `R` code of the activation function and its derivative. The neuron contains the following elements, in the following order:

```
> names(net.start$neurons[[1]])
 [1] "id"                   "type"
 [3] "activation.function"  "output.links"
 [5] "output.aims"          "input.links"
 [7] "weights"              "bias"
 [9] "v0"                   "v1"
[11] "f0"                   "f1"
[13] "method"               "method.dep.variables"
```

We will spend a few lines describing the meaning of each element by means of the neurons of `net.start`.

The index of the neuron. It is redundant by design, since it should be numerically equal to the `R` index of the element in the `neurons` list. Not surprisingly, for the neurons considered:

```
> net.start$neurons[[1]]$id
[1] 1
> net.start$neurons[[5]]$id
[1] 5
> net.start$neurons[[8]]$id
[1] 8
```

Information about the nature of the layer this neuron belongs to: it can be either *hidden* or *output*. We remind the reader that the input neurons do not "really" exist.

```
> net.start$neurons[[1]]$type
[1] "hidden"
> net.start$neurons[[5]]$type
[1] "hidden"
> net.start$neurons[[8]]$type
[1] "output"
```

The name of the activation function that characterizes the neuron. Available functions are *tansig*, *purelin*, *sigmoid* and *hardlim*. The *custom* case allows users willing to perform the needed changes on the `f0` and `f1` functions to use their own ones.

```
> net.start$neurons[[1]]$activation.function
[1] "tansig"
> net.start$neurons[[5]]$activation.function
[1] "tansig"
> net.start$neurons[[8]]$activation.function
[1] "purelin"
```

The indexes of the neurons towards which the output of this neuron will be propagated, that is to say, the indexes of the neurons that use this neuron’s output as one of their inputs.

```
> net.start$neurons[[1]]$output.links
[1] 5 6
> net.start$neurons[[5]]$output.links
[1] 7 8
> net.start$neurons[[8]]$output.links
[1] NA
```

Most frequently, output neurons do not point to any other neuron, and an `NA` value reflects this.

A neuron may use the outputs of many other neurons as inputs, and each input has to be weighted. This requires the inputs to be ordered, and this element accounts for that order: it stores the position this neuron's output occupies among each subsequent neuron's inputs, following the order given by `output.links`.

```
> net.start$neurons[[1]]$output.aims
[1] 1 1
> net.start$neurons[[3]]$output.aims
[1] 3 3
> net.start$neurons[[8]]$output.aims
[1] 2
```

The indexes of those neurons that feed this one.

```
> net.start$neurons[[1]]$input.links
[1] -1 -2
> net.start$neurons[[5]]$input.links
[1] 1 2 3 4
> net.start$neurons[[8]]$input.links
[1] 5 6
```
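The `output.links`, `output.aims` and `input.links` elements must stay mutually consistent. A small sanity check of our own (not a package function) makes the invariant explicit:

```r
# For every neuron n and every k: the neuron that n points to must list
# n among its inputs, exactly at the position given by output.aims[k].
check.aims <- function(neurons) {
  for (n in neurons) {
    if (all(is.na(n$output.links))) next  # output neurons point nowhere
    for (k in seq_along(n$output.links)) {
      to  <- n$output.links[k]
      aim <- n$output.aims[k]
      stopifnot(neurons[[to]]$input.links[aim] == n$id)
    }
  }
  invisible(TRUE)
}
```

Running `check.aims(net.start$neurons)` on a freshly created net should pass silently.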

The weights of the connections indicated by `input.links`.

```
> net.start$neurons[[1]]$weights
[1] -0.32384606  0.09150842
> net.start$neurons[[5]]$weights
[1]  0.31000780 -0.03621645
[3]  0.31094491 -0.25087121
> net.start$neurons[[8]]$weights
[1] -0.24677257  0.07988028
```

The value of the neuron’s bias term.

```
> net.start$neurons[[1]]$bias
[1] 0.07560576
> net.start$neurons[[5]]$bias
[1] -0.2522307
> net.start$neurons[[8]]$bias
[1] -0.04253238
```

It stores the last value obtained by applying `f0`.

```
> net.start$neurons[[1]]$v0
[1] 0
> net.start$neurons[[5]]$v0
[1] 0
> net.start$neurons[[8]]$v0
[1] 0
```

It stores the last value obtained by applying `f1`.

```
> net.start$neurons[[1]]$v1
[1] 0
> net.start$neurons[[5]]$v1
[1] 0
> net.start$neurons[[8]]$v1
[1] 0
```

The activation function. In our example, the neurons at the hidden layers have been defined as *tansig*, while the output neurons are linear (*purelin*); thus, not surprisingly:

```
> net.start$neurons[[1]]$f0
function (v)
{
    a.tansig <- 1.71590470857554
    b.tansig <- 0.666666666666667
    return(a.tansig * tanh(v * b.tansig))
}
> net.start$neurons[[5]]$f0
function (v)
{
    a.tansig <- 1.71590470857554
    b.tansig <- 0.666666666666667
    return(a.tansig * tanh(v * b.tansig))
}
> net.start$neurons[[8]]$f0
function (v)
{
    return(v)
}
```

The derivative of the activation function. Following our example,

```
> net.start$neurons[[1]]$f1
function (v)
{
    a.tansig <- 1.71590470857554
    b.tansig <- 0.666666666666667
    return(a.tansig * b.tansig * (1 - tanh(v * b.tansig)^2))
}
> net.start$neurons[[5]]$f1
function (v)
{
    a.tansig <- 1.71590470857554
    b.tansig <- 0.666666666666667
    return(a.tansig * b.tansig * (1 - tanh(v * b.tansig)^2))
}
```

whilst the outputs are linear, so

```
> net.start$neurons[[8]]$f1
function (v)
{
    return(1)
}
```
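As a quick sanity check of our own (not part of the package), `f1` should match a numerical derivative of `f0`. For the *tansig* pair shown above:

```r
# Reproduce the tansig pair and compare f1 against a central difference.
a.tansig <- 1.71590470857554
b.tansig <- 0.666666666666667
f0 <- function(v) a.tansig * tanh(v * b.tansig)
f1 <- function(v) a.tansig * b.tansig * (1 - tanh(v * b.tansig)^2)

v <- 0.3
h <- 1e-6
numeric.deriv <- (f0(v + h) - f0(v - h)) / (2 * h)
stopifnot(abs(numeric.deriv - f1(v)) < 1e-6)
```

The same check is a convenient safeguard when programming a *custom* `f0`/`f1` pair.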

The training method. Currently, the user may choose among the adaptive and the batch modes of the gradient descent backpropagation training method, both with or without momentum. The names of the methods are `ADAPTgd`, `ADAPTgdwm`, `BATCHgd` and `BATCHgdwm`.

```
> net.start$neurons[[1]]$method
[1] "ADAPTgdwm"
> net.start$neurons[[5]]$method
[1] "ADAPTgdwm"
> net.start$neurons[[8]]$method
[1] "ADAPTgdwm"
```

Those variables specifically needed by the training method. They are shown in the table below.

**delta**: This element stores the correction effect due to the derivative of the cost, `deltaE`, over the weights and the bias. It is obtained differently depending on the neuron's type: for output neurons, simply by multiplying the value of `deltaE` times `v1`; for hidden neurons, by multiplying `v1` times the sum, over the neurons this one points to, of their weights times their `delta` values.

**learning.rate**: It contains the learning rate value for this particular neuron. It is usually set equal to `learning.rate.global`, but more sophisticated training methods may use this variable to assign a different learning rate to each neuron.

**sum.delta.x**: Used by the batch methods to accumulate the individual error effects during the forward pass over the whole training set, so as to calculate the weight corrections later during the backward pass.

**sum.delta.bias**: Similar to `sum.delta.x`, but concerning the correction of the bias term.

**momentum**: Similarly to the `learning.rate` variable, this variable is usually equal to `momentum.global`, but again, new training methods may use it to assign a different momentum value to each neuron.

**former.weight.change**: It contains the value of the weight change that was applied during the previous iteration.

**former.bias.change**: It contains the value of the bias change that was applied during the previous iteration.

| ADAPTgd | ADAPTgdwm | BATCHgd | BATCHgdwm |
|---|---|---|---|
| delta | delta | delta | delta |
| learning.rate | learning.rate | learning.rate | learning.rate |
| | momentum | sum.delta.x | sum.delta.x |
| | former.weight.change | sum.delta.bias | sum.delta.bias |
| | former.bias.change | | momentum |
| | | | former.weight.change |
| | | | former.bias.change |

Training methods and method dependent variables.

Looking at our example:

```
> net.start$neurons[[1]]$method.dep.variables
$delta
[1] 0

$learning.rate
[1] 0.01

$momentum
[1] 0.5

$former.weight.change
[1] 0 0

$former.bias.change
[1] 0

> net.start$neurons[[5]]$method.dep.variables
$delta
[1] 0

$learning.rate
[1] 0.01

$momentum
[1] 0.5

$former.weight.change
[1] 0 0 0 0

$former.bias.change
[1] 0

> net.start$neurons[[8]]$method.dep.variables
$delta
[1] 0

$learning.rate
[1] 0.01

$momentum
[1] 0.5

$former.weight.change
[1] 0 0

$former.bias.change
[1] 0
```
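The delta and update rules described above can be sketched in plain `R`. This is an illustration of the formulas, not the package's actual implementation; the sign convention assumes `delta` carries the cost derivative, so weights move against the gradient:

```r
# Output neuron: delta = deltaE * v1 (cost derivative times the
# derivative of the activation function).
delta.output <- function(deltaE, v1) deltaE * v1

# Hidden neuron: delta = v1 * sum of (downstream weight * downstream
# delta), taken over the neurons listed in output.links.
delta.hidden <- function(v1, downstream.weights, downstream.deltas) {
  v1 * sum(downstream.weights * downstream.deltas)
}

# Gradient descent with momentum step for a single weight, combining the
# method dependent variables listed in the table above.
update.weight <- function(weight, delta, x, learning.rate, momentum,
                          former.weight.change) {
  weight.change <- momentum * former.weight.change -
                   learning.rate * delta * x
  list(weight = weight + weight.change,
       former.weight.change = weight.change)
}
```

The returned `former.weight.change` would be stored back into `method.dep.variables`, ready for the next iteration's momentum term.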

Only a few functions are needed. We have the `newff` function to correctly create the net, the `train` function to train it, and the `sim.MLPnet` function to simulate the response. We can also make use of the `training.report` function, editing and customizing it so as to provide our preferred results during the training process, or even to plot some graphics. This can be done, for example, while editing the function with `fix(training.report)`, by just substituting line 2, which reads:

```
P.sim <- sim.MLPnet(net, P)
```

for the corresponding plotting commands:

```
P.sim <- sim.MLPnet(net, P)
plot(P, T, pch="+")
points(P, P.sim, col="red", pch="+")
```

That change will provide, dimension of the data permitting, the desired graphical output every `show.step` iterations, as defined in the `train` function. In the following paragraph we provide the commands to train a simple network:

```
require(AMORE)
## We create two artificial data sets. "P" is the input data set.
## "target" is the output.
P <- matrix(sample(seq(-1,1,length=500), 500, replace=FALSE), ncol=1)
target <- P^2 + rnorm(500, 0, 0.5)
## We create the neural network object.
net.start <- newff(n.neurons=c(1,3,1), learning.rate.global=1e-2,
                   momentum.global=0.5, error.criterium="LMS", Stao=NA,
                   hidden.layer="tansig", output.layer="purelin",
                   method="ADAPTgdwm")
## We train the network according to P and target.
result <- train(net.start, P, target, error.criterium="LMS",
                report=TRUE, show.step=100, n.shows=5)
## Several graphs, mainly to remark that the trained network is now
## an element of the resulting list.
y <- sim(result$net, P)
plot(P, y, col="blue", pch="+")
points(P, target, col="red", pch="x")
```
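To put a number on the quality of the fit above, one can compute the Least Mean Squares figure by hand from the simulated outputs (a small convenience of our own; the training error is also reported during training when `report=TRUE`):

```r
# Mean squared residual between simulated outputs and targets;
# after the training block above, call it as lms(y, target).
lms <- function(prediction, target) mean((prediction - target)^2)
```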

The following figures show two resulting plots, at epoch number 750 and epoch number 1000.

The natural development would be to provide the package with more sophisticated training methods and to speed up the training functions, mostly through the C code, while keeping the flexibility. Right now, we are developing an extension of *AMORE* so as to provide support for RBF networks, which we hope to deliver soon.

The authors gratefully acknowledge the financial support of the Ministerio de Educación y Ciencia through project grants DPI2006–14784, DPI-2006-02454, DPI2006-03060 and DPI2007-61090; of the European Union through project grant RFSR-CT-2008-00034; and of the Autonomous Government of La Rioja for its support through the 3rd Plan Riojano de I+D+i.

**Version:** 0.2-11
**Date:** 2009-02-19
**Authors:** Manuel Castejón Limas, Joaquín B. Ordieres Meré, Francisco Javier Martínez de Pisón Ascacibar, Alpha V. Pernía Espinoza, Fernando Alba Elías, Ana González Marcos, Eliseo P. Vergara González
**Maintainer:** Manuel Castejón Limas
**License:** GPL version 2 or newer.
**Correspondence Author:**

Manuel Castejón Limas, Área de Proyectos de Ingeniería. Universidad de León. Escuela de Ingenierías Industrial e Informática. Campus de Vegazana sn. León. Castilla y León. Spain. e-mail: manuel.castejon at unileon.es