USAGE:
#import nnet
SYNOPSIS:
Provides routines to create, train, use and inspect
neural networks of arbitrary topology.
Neurons can be grouped into layers. Different
layers can have different activation functions. Learning
rules for connections between layers include backpropagation
(on-line, epoch-wise, resilient), Hebbian learning,
the delta rule, and no learning (constant weights).
In addition, the package supports training of partially
recurrent networks (Elman and Jordan-type) and training
of fully recurrent networks with backpropagation through
time. To support localized receptive fields (in block-rectangular
areas) and weight sharing, network layers can be given
a 1d, 2d, or 3d shape.
Iterators for neurons and weights, a contiguous arrangement
of all weights and activities in memory, and default positions
of neurons in 2d and 3d space offer a simple yet flexible
and convenient interface for visualization
and optimization via external algorithms.
CONSTRUCTORS:
The first constructor allows the construction of networks
with a general topology. The remaining two constructors
facilitate the specification of layered networks.
A newly constructed network will have its weights
initialized to values in the range -0.01..0.01.
- mlp(int nodetype,int *topology)
- create a network with
general topology, whose inner and/or output nodes are of type
nodetype (one of TANHxd,FERMIxd,LINEARxd,RATIOxd,LOGxd,
x=1,2 or 3).
Input nodes are always linear.
Each connection is specified by a pair
of successive integers in array topology, i.e.,
topology[]=i1,j1,i2,j2,...ik,jk specifies connections
i1->j1,i2->j2,..ik->jk. Use neuron indices i,j from
the range [-1,0,1,2,..n], where -1 represents an
invisible bias neuron that has the constant value of 1.
(for a network with general topology, bias connections are
not automatically generated but must be explicitly
included in the list).
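Example (a sketch): a 2-2-1 network with neurons 0,1 as inputs,
2,3 hidden and 4 as output, with explicit bias connections (how the
constructor detects the end of topology[] is not shown in this synopsis):
    int topology[] = { -1,2, 0,2, 1,2,    /* bias and inputs -> hidden 2 */
                       -1,3, 0,3, 1,3,    /* bias and inputs -> hidden 3 */
                       -1,4, 2,4, 3,4 };  /* bias and hidden -> output 4 */
    mlp net(TANH1d, topology);            /* inner/output: tanh, inputs: linear */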
- mlp(int n1, n2, ... nk)
- create a network of k
layers. The i-th layer has ni neurons; successive
layers are fully connected. For each neuron, an
additional connection to an invisible bias
neuron with constant activity 1 is generated
automatically.
- mlp(int *layers, int *connections=NULL)
- create layered
network with layer sizes and layer node types
specified in layers[]. If array connections is NULL,
the default topology of a full set of connections between
successive layers is chosen.
Otherwise, each element pair i,j in connections[]
specifies the insertion of a full connection bundle from
layer i towards layer j. Example:
layers[] = {4, 10, 5}
connections[] = {0,1, 1,2, 0,2}
specifies a (4,10,5) network with an additional set
of "skip connections" that go directly from input layer 0
to the output layer 2.
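A construction sketch for this example (how the constructor detects
the end of the arrays is not shown in this synopsis):
    int layers[]      = {4, 10, 5};
    int connections[] = {0,1, 1,2, 0,2};  /* includes the 0->2 skip bundle */
    mlp net(layers, connections);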
SPECIFYING ACTIVATION FUNCTIONS AND LAYER TOPOLOGY:
The entries in layers[] may optionally be delimited with type constants
LINEAR1d..LINEAR3d, FERMI1d..FERMI3d, TANH1d..TANH3d,
LOG1d..LOG3d or
RATIO1d..RATIO3d to define for each layer a (multi-)rectangular
shape and its own type of activation function, e.g.,
layers[] = {LINEAR2d,10,20, FERMI3d,5,5,5, TANH1d,100}
defines an input layer of 200 neurons with linear activation
function, topologically arranged as a 10*20 grid, a hidden
layer of 125 fermi neurons, arranged as a 5*5*5 grid, and an
output layer of 100 tanh neurons (if the last type token
were omitted, the preceding FERMI3d would carry over as the default).
The layer topology becomes important when layers are connected
by spatially structured connection bundles (cf. below).
When no layer type tokens are used, each layer is considered
one-dimensional, and the default choices for the activation
functions are LINEAR1d for the input layer and
FERMI1d for the remaining layers.
Predefined activation function types are:
- FERMI1d,FERMI2d,FERMI3d:
- 1d,2d,3d layers of fermi function neurons
- TANH1d,TANH2d,TANH3d:
- 1d,2d,3d layers of tanh function neurons
- LINEAR1d,LINEAR2d,LINEAR3d:
- 1d,2d,3d layers of linear function neurons
(the output layer is always linear)
- RATIO1d,RATIO2d,RATIO3d:
- 1d,2d,3d layers of neurons with the
slowly saturating rational activation function x/(1+abs(x)).
- LOG1d,LOG2d,LOG3d:
- 1d,2d,3d layers of neurons with the
slowly growing, unbounded activation function sgn(x)*log(1+abs(x)).
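For reference, the two less common activation functions written out
as plain C, directly from the formulas above:
    #include <math.h>
    float ratio_act(float x) { return x / (1.0f + fabsf(x)); }    /* RATIOxd */
    float log_act(float x)                                        /* LOGxd  */
    { return copysignf(logf(1.0f + fabsf(x)), x); }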
SPECIFYING TIME DELAY LAYERS FOR TDNN NETWORKS:
For layered networks, each layer can be created in n
separate copies (default n=1) which
are treated as time-delay copies: the first copy (k=0) holds the current
layer activity, the k-th copy (k>0) stores the k-th previous activity
of the k=0-copy. The value of n can be set for each layer individually
by adding the predefined constant TDL to the layer type and providing
n as an additional integer argument that follows after the size
parameters. Example:
layers[] = {LINEAR1d+TDL,4,2, FERMI1d+TDL,5,3, FERMI1d,1}
will create a 4-5-1 perceptron in which the input layer exists
in two copies (of LINEAR1d neurons) and the middle layer in three
copies (of FERMI1d neurons), while the output layer is a normal
layer with a single neuron.
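A construction sketch for this example:
    int layers[] = {LINEAR1d+TDL,4,2, FERMI1d+TDL,5,3, FERMI1d,1};
    mlp net(layers);   /* delay copies are generated automatically */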
Outgoing connections of the additional copies need not (in fact
cannot) be defined explicitly: each copy will get a similar
(but independent) set of outgoing connections, with the same
types and targets as for the unshifted copy for index k=0.
The time delay copies will own no input weights, since they get
their inputs from the corresponding previous copy. This copying
is implemented without using any weights. Therefore, the
weight block that pertains to each copy is somewhat smaller
than the weight block of the k=0 copy (which may have input
weights while the remaining copies only have output weights).
Apart from this difference,
activities and weights of the copies are laid out in memory as
contiguous blocks, organized analogously to the k=0 layer and
placed after the k=0 block.
SPECIFYING CONNECTION TYPES:
Similarly, each pair in connections can enclose an optional
connection type specifier (from a set of predefined type constants)
to specify the type of the connection bundle further (type constants
are always negative so that they cannot be confused with layer
indices). Example:
layers[] = {5, 10, 10, 2}
connections[] = {0,HEBB_W,2, 2,ELMAN_W,1, 2,BPROP_W,3}
This specifies an Elman network of four layers: the feedforward
part consists of layers 0,2 and 3, which are connected
in simple forward order, but using the Hebb rule for the
weights between layers 0,2 and the backpropagation rule for the
weights between layer 2 and 3. Layer 1 is an Elman layer
which receives ELMAN-type feedback connections from layer 2
(the order of the triples is arbitrary).
SPATIALLY RESTRICTED CONNECTIONS AND WEIGHT SHARING:
By adding the predefined constant LOCAL_W to a weight
type, the corresponding connection bundle will become
spatially restricted to a (hyper)rectangular region
(localized receptive field). The size of the region will
(for each axis separately) be inferred from the edge lengths
SrcLen and DestLen of the connected layers:
if DestLen<SrcLen, the receptive field will have an
extension of SrcLen-DestLen+1 along the considered axis
direction. If DestLen>=SrcLen, the connection (along this
direction) will be unrestricted, with the only exception
of layers with equal numbers of neurons connected by ELMAN
weights: in this case, the connectivity will be one-to-one.
By adding the predefined constant SHARED_W to a weight
type, the corresponding connection bundle will be
shared among all neurons of the destination layer.
E.g., BPROP_W+LOCAL_W+SHARED_W makes localized shared
input weights.
NOTE: SHARED_W full connections make no sense; therefore,
SHARED_W should always be combined with LOCAL_W (an automatic
implication is not yet implemented).
As a shortcut, connections[] may also consist of
a list of weight layer types only. In this case, each weight
layer type specifies the type of the connections between
successive layers specified with the
layers[] array. Example:
layers[] = {10,8,5}
connections[] = {BPROP_W+LOCAL_W, BPROP_W+SHARED_W}
specifies a 10-8-5 MLP with localized connections in the
first weight layer and shared weights in the second
weight layer.
USING AND TRAINING FEEDFORWARD NETWORKS:
When one of the constructor functions for layered networks
is used, the network topology is not analyzed automatically.
When only the layers[] array is used, the resulting network
is guaranteed to be feedforward. Otherwise, when specifying
connections with the connections[] array, one should keep
in mind that layers will be executed in the order of their
numbering, i.e., any connection entry i,j with i>=j will
destroy the feedforward property of the created network, which
may lead to incorrect results (unless recurrent backpropagation
training is used, cf. below).
When the constructor for a general network topology is used,
there is an automatic topology analysis. When the structure
is feedforward, the constructor will then automatically determine
a suitable update sequence. Otherwise, the update sequence
will be in increasing index order.
In both cases, the function feedforward() indicates
whether the network topology is feedforward or not,
and the public array order[] lets the user specify any
desired update order for the created network
(assign the desired index sequence).
- void exec(float *inp=NULL)
- evaluate the network for input
data inp (if inp is NULL, the current input activities
will be used without change). The array order[] can
be filled with the desired sequence in which neuron
activities shall be updated.
- void adapt(float *target)
- perform one adaptation step
(after a previous exec() call), using target values
target. During online learning, each adapt() call
will cause an immediate weight change.
During epoch learning, error gradient results
will be accumulated until a full epoch is completed.
Only then will the accumulated values lead to weight
changes.
- void mode(int mode, float f1=0, f2=0, f3=0)
-
Choose learning mode (currently, one of BPROP_M or
RPROP_M). For BPROP_M, any nonzero f1 will redefine
the global learning rate, f2 the momentum alpha,
f3 the flat spot elimination constant (the latter
not yet implemented). For RPROP_M, any nonzero f1
and/or f2 will set the learning step decay and growth
factors, and f3 the maximal step size.
- void init(void)
- initialize learning constants
to values set with mode() function, clear all
memorized gradients and start a new epoch, but
don't change the current weights or activities
(they can be changed explicitly through the
weight[] and act[] arrays).
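Putting exec(), adapt(), mode() and init() together, a minimal
online-training sketch (the data arrays inp[], tgt[] and the constant
NPAT are hypothetical):
    float inp[NPAT][2], tgt[NPAT][1];  /* training patterns and targets */
    mlp net(2, 5, 1);                  /* 2-5-1 perceptron */
    net.mode(BPROP_M, 0.05f);          /* backprop, learning rate 0.05 */
    net.init();                        /* clear gradients, new epoch */
    net.epochlen = 0;                  /* 0 = online learning (cf. below) */
    for (int ep = 0; ep < 1000; ++ep)
        for (int p = 0; p < NPAT; ++p) {
            net.exec(inp[p]);          /* forward pass for pattern p */
            net.adapt(tgt[p]);         /* online: immediate weight update */
        }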
PARTIALLY RECURRENT NETWORKS:
Layers that have the same number of neurons (they need not
be arranged in the same topology) may be connected by
connections of type ELMAN. This will produce a set
of unmodifiable 1:1-connections of value 1 between
corresponding (in their linear ordering sequence)
neurons of both layers. An ELMAN weight will not participate
in adaptation, but during execution, it will transmit to
its destination neuron the activity value that its source
neuron had in the previous time step. With this behavior,
the standard backpropagation algorithm (exec/adapt pair)
generalizes correctly to the training of partially recurrent
networks, which are characterized by the weaker criterion
that only their non-ELMAN weights must be free of recurrent
connections. Note that this latter condition requires
that the index of an ELMAN layer must always be lower than
that of any layer to which it is connected with non-ELMAN weights.
RECURRENT BACKPROPAGATION:
Recurrent networks can be trained with backpropagation through time
(BPTT). This requires enclosing each training sequence in a
bgn_bptt()...end_bptt() function call pair. For good efficiency,
the (maximal) length of the training sequence should be passed as an
argument to bgn_bptt() (this avoids repeated resizing of internal
buffers). If all training sequences have the same length len,
one can also use the method train() (and similarly, test(),
cf. below) in
the following manner:
net.bgn_bptt(len);
net.train(...);
net.end_bptt();
This will cause the train() method to consider its training data
as the concatenation of a number of training sequences (with each
sequence consisting of len successive input-output blocks).
Technically, train() will insert an end_bptt(); bgn_bptt(len)
pair after every len exec/adapt calls, so that the desired
BPTT for the entire data set results.
- bgn_bptt(int iSteps=0)
- begin a backpropagation-through-time
sequence and prepare buffers for a time series of at most iSteps.
Assumes that up to iSteps exec()/adapt() calls follow (longer
time sequences are permitted; in this case, the necessary
buffer enlargement occurs automatically, but it is more efficient
to give the appropriate buffer size in advance). For iSteps=0,
a default of 32 steps will be assumed.
- end_bptt(void)
- end the current backpropagation-through-time sequence
and perform one learning step for the time series seen through
the exec/adapt sequence since the most recent bgn_bptt() call.
For BPTT training, this call is essential: without it, the preceding
adapt() calls will have no effect!
- no_bptt(void)
- end BPTT mode altogether (frees all BPTT specific
buffers).
Currently, BPTT works only with fixed step sizes that must be set in
the eta[] array (cf. next section).
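An explicit BPTT loop over a set of sequences might then look like
this (seq[][], tgt[][], NSEQ and LEN are hypothetical):
    for (int s = 0; s < NSEQ; ++s) {
        net.bgn_bptt(LEN);             /* size buffers for LEN steps */
        for (int t = 0; t < LEN; ++t) {
            net.exec(seq[s][t]);       /* forward pass for step t */
            net.adapt(tgt[s][t]);      /* recorded only, no weight change yet */
        }
        net.end_bptt();                /* one learning step per sequence */
    }
    net.no_bptt();                     /* release BPTT buffers when done */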
PUBLIC CLASS VARIABLES:
- float act[]
- activities of all network neurons
(when the bias neuron has been automatically created,
it will not be included in act[]).
- float weight[]
- network weights, ordered in the
same manner as the pairs in array topology, or
layerwise if the constructors for layered networks
are used (in this case, each neuron's bias weight
comes first).
- float eta[]
- for each weight a learning parameter.
For learning mode BPROP_M, this is the learning rate,
for RPROP_M, it is the learning step size.
- float alpha
- momentum term for BPROP_M learning mode:
alpha=0 (the default) will yield standard backprop learning.
A value 0<alpha<1 will modify the weight update to
Delta(t) = eta*Dw(t) + alpha*Delta(t-1)
Here, Delta(t) is the change of a weight at update step t,
and eta*Dw(t) is the change computed from the gradient according
to the unmodified backpropagation rule with learning rates eta
(for the single step in online learning, for the entire epoch in epoch learning).
NOTE: during online learning, t increments at each adaptation
step, while during epoch learning, t increments only at each epoch, i.e.,
the momentum alpha refers to correspondingly different time scales!
Useful values of alpha are somewhat below 1.
In RPROP_M mode, alpha is ignored.
- float lambda
- a global weight decay parameter.
After each update, all weights are multiplied by a factor
1-lambda (again we have two different timescales for online
and epoch learning here!).
- float mse[]
- mean square error for each output
value, averaged over the current epoch.
During online learning, the average
is taken since the last init() call.
- float margin[]
- has one element for each output and
lets the user specify an error margin below which that output
is considered error free (i.e., it then does not
contribute to the computation of the average error).
However, for a positive margin value the associated error
(even if below the margin) will still be backpropagated to
drive the adaptation process. To omit below-margin errors
in all or some output dimensions from the adaptation process,
specify the corresponding margin values with a negative sign
(the sign serves only as a flag; the magnitude defines the margin).
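A sketch of the sign convention:
    net.margin[0] =  0.1f;  /* |error|<0.1 omitted from mse[0], still adapted */
    net.margin[1] = -0.1f;  /* same margin, below-margin errors not adapted */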
- int order[]
- order in which neuron activities
will be recomputed by the exec() method (the adapt()
method will use the reverse order). For pure feedforward
networks, the constructor function will fill order[]
with a suitable sequence automatically. For networks
with feedback connections, order[] must be specified
by the user.
- int epochlen
- if 0, adapt() will use online
learning, otherwise epoch learning with epoch length
epochlen.
- int wtype[]
- for each weight a type specifier.
Usually set by the constructor function, but can
be overridden with user-supplied values. Note, however,
that changing a weight type will not
change the connection topology in any way and thus
may lead to inconsistencies in some cases (e.g., when
specifying type ELMAN, which assumes 1:1 connections).
FURTHER METHODS FOR TRAINING OR TESTING:
- float train(int epochs, int dim, float *data, int num, int first=0)
-
perform epochs training epochs on the dim-dimensional rows of array data.
The initial numinputs elements in each row will be used as input data,
the next numoutputs elements as target data; any further row
elements are ignored (the condition numinputs+numoutputs<=dim is checked).
For num>=0, training will use num successive rows, starting at row
first. For num<0, training will use the entire data set except for
abs(num) rows starting at row first (this is convenient for
cross-validation). The return value is the root mean square error over
the epochs (the elementwise errors can be found in array mse[],
but note that the latter are still squared!). For epochs<0,
training will run for at most abs(epochs) epochs, but will terminate
as soon as the error in one epoch has reached 0.
- float train(int dim, float *data, int num, int first=0)
-
Shorthand for the above with epochs=1.
- float test(int dim, float *data, int num, int first=0)
-
Similar to train(), but without adaptation, i.e., only the error
statistics are computed and the root mean square error is returned.
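A sketch of simple hold-out validation using the num<0 convention
(data and dim are hypothetical):
    float etrain = net.train(100, dim, data, -10, 0); /* all rows but 0..9 */
    float etest  = net.test(dim, data, 10, 0);        /* exactly rows 0..9 */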
- int schedule(void)
- find update order. When the network
is purely feedforward, this function has already been called
by the constructor. For recurrent networks, it should be called
when weight types for a partially recurrent topology have been
assigned to the wtype[] array. A nonzero return then
indicates that the network can be trained with the usual
exec/adapt methods (i.e., it is partially recurrent).
(The schedule() function ignores any ELMAN connections
for its check, i.e., it checks whether the network can be trained
with the standard backprop method.)
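A sketch (the weight index w is hypothetical):
    net.wtype[w] = ELMAN_W;   /* reinterpret an existing 1:1 bundle as feedback */
    if (net.schedule() == 0) {
        /* not trainable with plain exec/adapt; use BPTT instead */
    }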
INSPECTION METHODS:
- int neurons()
- return nr of neurons (bias neuron not included)
- int layers()
- return nr of layers (includes input layer
0 in its count).
- int layer_of(int id)
- get layer of neuron id.
- int for_inp_of(int n,char *fmt,void(*f)(float*),int *layers=NULL)
-
- int for_out_of(int n,char *fmt,void (*f)(float*),int *layers=NULL)
-
Iterate function f() over inputs (outputs) of neuron n. When
layers!=NULL, the iteration will be restricted to those inputs
(outputs) that originate (end) in layers specified by the entries
in layers. There is a second pair of routines whose last
arguments are up to three ints, allowing a simpler specification when
the number of included layers is small.
Return value is the number of iteration steps done.
On each call, f() is passed a vector v whose elements are assembled
according to the format string fmt: each character in fmt specifies
a particular data element for the passed vector v. The order and number
of elements in v will match the order and number of (non-blank)
characters in fmt. In the following, the selected neuron n will
be referred to as the proximal neuron, the neuron at the other end
of an input/output connection as the distal neuron.
The following format characters are defined:
- #
- index of weight between distal and proximal neuron
- w
- value of connection weight between distal and proximal neuron
- r
- learning rate of weight between distal and proximal neuron
- ?
- type of weight between distal and proximal neuron
- a,A
- activity of distal (proximal) neuron
- e,E
- error value at distal (proximal) neuron
- x,y,z,X,Y,Z
- coordinates of distal (upper case: proximal) neuron
- l,L
- layer to which distal (proximal) neuron belongs
- n,N
- index of distal (proximal) neuron (the latter coincides with the argument n)
- f,F
- activity function type of distal (proximal) neuron
- i,j,k,I,J,K
- index positions of distal (upper case: proximal)
neuron in its layer
- <
- nr of inputs of distal neuron
- >
- nr of outputs of distal neuron
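A usage sketch: print weight and source activity of every input
connection of neuron n (the callback receives one vector element per
non-blank format character; printf() requires <stdio.h>):
    void show(float *v) { printf("w=%g a=%g\n", v[0], v[1]); }
    /* ... */
    int steps = net.for_inp_of(n, "wa", show);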
- float *for_inp_of(int n, char *fmt, int *layers=NULL)
-
- float *for_out_of(int n, char *fmt, int *layers=NULL)
-
Similar to the above, but instead of iterating, returns a newly allocated
vector consisting of the concatenation of the argument vectors that would
have been passed to the iterated function. E.g., to get the input
weights of neuron n concatenated into a single vector v, call
v = for_inp_of(n,"w");
- int for_neurons_of(int lay, char *fmt, void (*f)(float*))
-
- float *for_neurons_of(int lay, char *fmt)
-
Similar iterators for iterating over all neurons of a layer.
Format characters to be used for fmt are a subset of the above, i.e.:
- a
- activity of neuron
- b
- bias weight of neuron
- e
- error value at neuron
- n
- index of neuron
- f
- activity function type of neuron
- i,j,k
- index positions of neuron in its layer
- x,y,z
- coordinates of neuron
- l
- layer of neuron (always equal to the argument lay)
- <
- nr of inputs
- >
- nr of outputs
- void query_neuron(int neuron, char *fmt, float *vec)
-
Similar to for_neurons_of, but queries data only for a single neuron
and writes the result into vector vec (clipped if vec is too short).
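Two query sketches (layer 1 and neuron index id are hypothetical):
    float *v = net.for_neurons_of(1, "ax"); /* (activity,x) per neuron of layer 1 */
    float buf[2];
    net.query_neuron(id, "ax", buf);        /* the same data for neuron id only */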
AUXILIARY METHODS FOR VISUALIZATION:
- void bbox(int l, float x,y,z=0,dx=0,dy=0,dz=0)
-
specify a bounding box geometry (position and edge lengths) for layer l.
- void bbox(int l, float* x,y,z,dx,dy,dz)
- similar, but with
pointer arguments to specify (or retrieve) a bounding box for layer l.
Zero delta values (the last args) leave the current delta settings unchanged.
- void bbox(int l, void (*fun)(float x1,y1,x2,y2))
-
- void bbox(int l, void (*fun)(float x1,y1,z1,x2,y2,z2))
-
Iterate drawing function fun over edges of bounding box of layer l.
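A drawing sketch (draw_line() stands for any 2d graphics primitive):
    void edge2d(float x1, float y1, float x2, float y2)
    { draw_line(x1, y1, x2, y2); }  /* hypothetical toolkit call */
    /* ... */
    net.bbox(0, edge2d);            /* outline the bounding box of layer 0 */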
- int pick(int l, int i,j=0,k=0)
-
get neuron id from layer number l and within-layer index coordinates i,j,k.
- int pick(float x, float y)
-
get neuron id from the 2d location (x,y).
- void layout2d(float fGap=1,xll=0,yll=0,xur=1,yur=1,int *layers=NULL)
-
- void layout3d(float fGap=1,xll=0,yll=0,zll=0,xur=1,yur=1,zur=0,int *layers=NULL)
-
Specify the arrangement of layers in a rectangular 2d (3d) box. The box is specified
by its lower left and upper right corner. fGap specifies the ratio of the
gap between adjacent layers to the gap between adjacent neurons within a layer (i.e.,
fGap=2 makes the gap between layers twice as large as within a layer).
If layers=NULL, all layers will be chosen, otherwise the positioning
will be restricted to the layers whose numbers are listed in layers[].
Presently, there is only limited support for visualization of temporal delay
layers: when spatial dimensionality of a layer is lower than three, the
presence of a temporal delay dimension is treated as if the next higher
spatial dimension were present. I.e., when dx>1, dy=dz=1, a dt>1 will
be treated in the layout as if it had the place of dy; similarly, when
dx>1,dy>1,dz=1 a dt>1 will be treated as if it had the place of dz.
Therefore, TDNN networks with layer dimensionalities not larger than 2
can be visualized adequately; however, there is currently no support
for visualizing temporal delays of three-dimensional layers.
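A layout sketch:
    net.layout2d(2.0f);          /* unit box; layer gaps twice the neuron gap */
    int id = net.pick(1, 3, 4);  /* neuron at grid position (3,4) in layer 1 */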
REMARKS:
Remove the rule that the default for the output layer is linear.
ALGORITHMS:
NOT YET IMPLEMENTED:
Later, one may add connection bundles one by one, using the
syntax [this reallocates w]:
join(i,j,rule,type1,type2,type3)
- int pick(float x,y,z)
-
get neuron whose position is closest to x,y,z.
FILE
nnet.c