USAGE:

#import nnet

SYNOPSIS:

Provides routines to create, train, use and inspect neural networks of arbitrary topology. Neurons can be grouped into layers, and different layers can have different activation functions. Learning rules for connections between layers include backpropagation (on-line, epoch-wise, resilient), Hebbian learning, the delta rule and no learning (constant weights). In addition, the package supports training of partially recurrent networks (Elman- and Jordan-type) and training of fully recurrent networks with backpropagation through time. To support localized receptive fields (in block-rectangular areas) and weight sharing, network layers can be given a 1-, 2- or 3-dimensional shape. Iterators for neurons and weights, a contiguous arrangement of all weights and activities in memory, and default positions of neurons in 2d and 3d space offer a simple yet flexible and convenient interface for visualization and for optimization via external algorithms.

CONSTRUCTORS:

The first constructor allows the construction of networks with a general topology. The remaining two constructors facilitate the specification of layered networks. A newly constructed network will have its weights initialized to values in the range -0.01..0.01.
mlp(int nodetype,int *topology)
create a network with general topology, with inner and/or output nodes of type nodetype (one of TANHxd, FERMIxd, LINEARxd, RATIOxd, LOGxd, with x = 1, 2 or 3). Input nodes will be chosen as linear. Each connection is specified by a pair of successive integers in array topology, i.e., topology[] = i1,j1, i2,j2, ... ik,jk specifies the connections i1->j1, i2->j2, ... ik->jk. Neuron indices i,j are taken from the range -1,0,1,2,...,n, where -1 denotes an invisible bias neuron with the constant value 1. (For a network with general topology, bias connections are not generated automatically but must be included explicitly in the list.)
mlp(int n1, n2, ... nk)
create a network of k layers. The i-th layer has ni neurons, and successive layers are fully connected. For each neuron, an additional connection to an invisible bias neuron with constant activity 1 is generated automatically.
mlp(int *layers, int *connections=NULL)
create layered network with layer sizes and layer node types specified in layers[]. If array connections is NULL, the default topology of a full set of connections between successive layers is chosen.
Otherwise, each element pair i,j in connections[] specifies the insertion of a full connection bundle from layer i towards layer j. Example:

   layers[] = {4, 10, 5}
   connections[] = {0,1, 1,2, 0,2}

specifies a (4,10,5) network with an additional set of "skip connections" that go directly from input layer 0 to the output layer 2.
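
A minimal construction sketch follows; it assumes the class is declared in a header named nnet.h (an assumption, cf. the FILE section), and it does not cover how the constructors determine the lengths of the passed arrays, which may require an additional convention of the library:

    #include "nnet.h"                       /* assumed header name */

    /* layered 4-10-5 network with extra skip connections 0->2,
       exactly as in the example above */
    int layers[]      = {4, 10, 5};
    int connections[] = {0,1, 1,2, 0,2};
    mlp net(layers, connections);

    /* fully connected 4-10-5 network; bias connections are added automatically */
    mlp net2(4, 10, 5);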

SPECIFYING ACTIVATION FUNCTIONS AND LAYER TOPOLOGY:

The entries in layers[] may optionally be preceded by the type constants LINEAR1d..LINEAR3d, FERMI1d..FERMI3d, TANH1d..TANH3d, LOG1d..LOG3d or RATIO1d..RATIO3d to define for each layer a (multi-)rectangular shape and its own type of activation function, e.g.,

   layers[] = {LINEAR2d,10,20, FERMI3d,5,5,5, TANH1d,100}

defines an input layer of 200 neurons with linear activation function, topologically arranged as a 10*20 grid, a hidden layer of 125 fermi neurons, arranged as a 5*5*5 grid, and a 1d layer of 100 tanh neurons (if the last type token were omitted, the preceding FERMI3d would be used as the default). The layer topology becomes important when layers are connected by spatially structured connection bundles (cf. below). When no layer type tokens are used, each layer is considered one-dimensional, and the default activation functions are LINEAR1d for the input layer and FERMI1d for the remaining layers.

Predefined activation function types are:

FERMI1d,FERMI2d,FERMI3d:
1d,2d,3d layers of fermi function neurons
TANH1d,TANH2d,TANH3d:
1d,2d,3d layers of tanh function neurons
LINEAR1d,LINEAR2d,LINEAR3d:
1d,2d,3d layers of linear function neurons (the output layer is always linear)
RATIO1d,RATIO2d,RATIO3d:
1d,2d,3d layers of neurons with the weakly saturating rational activation function x/(1+abs(x)).
LOG1d,LOG2d,LOG3d:
1d,2d,3d layers of neurons with the unbounded, slowly growing activation function sgn(x)*log(1+abs(x)).
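
For reference, the activation functions listed above correspond to the following plain C++ functions (the fermi function is assumed here to be the standard logistic function; any internal scaling applied by the library is not reflected):

    #include <cmath>

    float fermi_act (float x) { return 1.0f / (1.0f + std::exp(-x)); }      /* FERMI*  */
    float tanh_act  (float x) { return std::tanh(x); }                      /* TANH*   */
    float linear_act(float x) { return x; }                                 /* LINEAR* */
    float ratio_act (float x) { return x / (1.0f + std::fabs(x)); }         /* RATIO*  */
    float log_act   (float x) { return (x >= 0 ? 1.0f : -1.0f)
                                       * std::log(1.0f + std::fabs(x)); }   /* LOG*    */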

SPECIFYING TIME DELAY LAYERS FOR TDNN NETWORKS:

For layered networks, each layer can be created in n separate copies (default n=1) that are treated as time-delay copies: the first copy (k=0) holds the current layer activity, and the k-th copy (k>0) stores the k-th previous activity of the k=0 copy. The value of n can be set for each layer individually by adding the predefined constant TDL to the layer type and providing n as an additional integer argument following the size parameters. Example:

    layers[] = {LINEAR1d+TDL,4,2, FERMI1d+TDL,5,3, FERMI1d,1}

will create a 4-5-1 perceptron in which the input layer has two and the middle layer has three time-delay copies, while the output layer is a normal layer with a single neuron. Outgoing connections of the additional copies need not (in fact cannot) be defined explicitly: each copy gets a similar (but independent) set of outgoing connections, with the same types and targets as the unshifted k=0 copy. The time-delay copies own no input weights, since they receive their inputs from the corresponding previous copy; this copying is implemented without any weights. Therefore, the weight block that pertains to each copy is somewhat smaller than the weight block of the k=0 copy (which may have input weights, while the remaining copies have only output weights). Apart from this difference, the activities and weights of the copies are laid out in memory as contiguous blocks organized analogously to the k=0 block, which they follow.
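
A construction sketch for the example above (same header assumption as before; input data are placeholders, and the delay copies are assumed to be shifted automatically during execution):

    int layers[] = {LINEAR1d+TDL,4,2, FERMI1d+TDL,5,3, FERMI1d,1};
    mlp tdnn(layers);                  /* default: full connections between successive layers */

    float x[4];                        /* current 4-dimensional input frame */
    for (int t = 0; t < 100; ++t) {    /* 100 time steps of a (placeholder) input stream */
        /* ... fill x with the input frame for time t ... */
        tdnn.exec(x);                  /* forward pass; delay copies accumulate the recent history */
    }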

SPECIFYING CONNECTION TYPES:

Similarly, each pair in connections[] can enclose an optional connection type specifier (from a set of predefined type constants) to further specify the type of the connection bundle (type constants are always negative, so that they cannot be confused with layer indices). Example:

   layers[] = {5, 10, 10, 2}
   connections[] = {0,HEBB_W,2, 2,ELMAN_W,1, 2,BPROP_W,3}

This specifies an Elman network of four layers: the feedforward part consists of layers 0, 2 and 3, which are connected in simple forward order, but using the Hebb rule for the weights between layers 0 and 2 and the backpropagation rule for the weights between layers 2 and 3. Layer 1 is an Elman layer which receives ELMAN-type feedback connections from layer 2 (the order of the triples is arbitrary).

SPATIALLY RESTRICTED CONNECTIONS AND WEIGHT SHARING:

By adding the predefined constant LOCAL_W to a weight type, the corresponding connection bundle becomes spatially restricted to a (hyper)rectangular region (a localized receptive field). The size of the region is (for each axis separately) inferred from the edge lengths SrcLen and DestLen of the connected layers: if DestLen<SrcLen, the receptive field will have an extension of SrcLen-DestLen+1 along the considered axis direction. If DestLen>=SrcLen, the connection (along this direction) will be unrestricted, with the only exception of layers with equal numbers of neurons connected by ELMAN weights: in this case, the connectivity will be one-to-one. By adding the predefined constant SHARED_W to a weight type, the corresponding connection bundle will be shared among all neurons of the destination layer. E.g., BPROP_W+LOCAL_W+SHARED_W makes localized, shared input weights. NOTE: SHARED_W with full connections makes no sense; therefore SHARED_W should always imply LOCAL_W (this implication is not yet implemented). As a shortcut, connections[] may also consist of a list of weight layer types only. In this case, each weight layer type specifies the type of the connections between successive layers specified in the layers[] array. Example:

    layers[] = {10,8,5}
    connections[] = {BPROP_W+LOCAL_W, BPROP_W+SHARED_W}

specifies a 10-8-5 MLP with localized connections in the first weight layer and shared weights in the second weight layer.
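
A worked example of the size rule for LOCAL_W fields (layer sizes chosen only for illustration):

    layers[]      = {LINEAR2d,10,20, FERMI2d,8,20}
    connections[] = {0,BPROP_W+LOCAL_W,1}

Each destination neuron then receives inputs from a 3*20 region of the source layer: along the first axis DestLen=8 < SrcLen=10, so the field extension is 10-8+1 = 3; along the second axis DestLen=20 >= SrcLen=20, so the connection is unrestricted in that direction.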

USING AND TRAINING OF FEEDFORWARD NETWORKS:

When one of the constructor functions for layered networks is used, the network topology is not analyzed automatically. When only the layers[] array is used, the resulting network is guaranteed to be feedforward. Otherwise, when specifying connections with the connections[] array, keep in mind that layers will be executed in the order of their numbering, i.e., any connection entry i,j with i>=j will destroy the feedforward property of the created network, which may lead to incorrect results (unless recurrent backpropagation training is used, cf. below). When the constructor for a general network topology is used, the topology is analyzed automatically. If the structure is feedforward, the constructor determines a suitable update sequence; otherwise the update sequence will be in increasing index order. In both cases, the indicator function feedforward() reports whether the network topology is feedforward, and the public array order[] allows the user to specify any desired update order for the created network (assign the desired index sequence).
void exec(float *inp=NULL)
evaluate the network for input data inp (if inp is NULL, the current input activities will be used without change). The array order[] can be filled with the desired sequence in which neuron activities shall be updated.
void adapt(float *target)
perform one adaptation step (after a previous exec() call), using target values target. During online learning, each adapt() call will cause an immediate weight change. During epoch learning, error gradient results will be accumulated until a full epoch is completed. Only then will the accumulated values lead to weight changes.
void mode(int mode, float f1=0, f2=0, f3=0)
Choose learning mode (currently, one of BPROP_M or RPROP_M). For BPROP_M, any nonzero f1 will redefine the global learning rate, f2 the momentum alpha, f3 the flat spot elimination constant (the latter not yet implemented). For RPROP_M, any nonzero f1 and/or f2 will set the learning step decay and growth factors, and f3 the maximal step size.
void init(void)
initialize learning constants to the values set with the mode() function, clear all memorized gradients and start a new epoch, but do not change the current weights or activities (they can be changed explicitly through the weight[] and act[] arrays).
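
A minimal online training sketch using these calls; the 2-5-1 topology, learning rate, pattern count and loop bounds are placeholders:

    mlp net(2, 5, 1);                    /* 2-5-1 MLP, fully connected, bias added automatically */
    net.mode(BPROP_M, 0.1f);             /* backprop with global learning rate 0.1 */
    net.init();                          /* reset gradients, start a new epoch */

    const int NPATTERNS = 100;           /* hypothetical number of training patterns */
    float inp[2], tgt[1];
    for (int ep = 0; ep < 1000; ++ep) {
        for (int p = 0; p < NPATTERNS; ++p) {
            /* ... copy the p-th training pattern into inp and tgt ... */
            net.exec(inp);               /* forward pass */
            net.adapt(tgt);              /* online weight update (epochlen==0) */
        }
    }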

PARTIALLY RECURRENT NETWORKS:

Layers that have the same number of neurons (they need not be arranged in the same topology) may be connected by connections of type ELMAN. This produces a set of unmodifiable 1:1 connections of value 1 between corresponding neurons (in their linear ordering) of the two layers. An ELMAN weight does not participate in adaptation, but during execution it transmits to its destination neuron the activity value that its source neuron had in the previous time step. With this behavior, the standard backpropagation algorithm (exec/adapt pair) generalizes correctly to the training of partially recurrent networks, which are characterized by the weaker criterion that recurrent connections are absent only among their non-ELMAN weights. Note that this condition requires the index of an ELMAN layer to be lower than that of any layer to which it is connected with non-ELMAN weights.
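
The following sketch (not taken from the example above; layer sizes are illustrative) sets up a conventional Elman network with an input, a context, a hidden and an output layer. Note that the context layer 1 has a lower index than the hidden layer 2, which it feeds with ordinary trainable weights, as required above:

    int layers[]      = {8, 10, 10, 3};       /* input, context, hidden, output */
    int connections[] = {0,BPROP_W,2,         /* input   -> hidden               */
                         1,BPROP_W,2,         /* context -> hidden (trainable)   */
                         2,ELMAN_W,1,         /* hidden  -> context (1:1 copy)   */
                         2,BPROP_W,3};        /* hidden  -> output               */
    mlp elman(layers, connections);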

RECURRENT BACKPROPAGATION:

Recurrent networks can be trained with backpropagation through time (BPTT). This requires enclosing each training sequence in a bgn_bptt()...end_bptt() function call pair. For good efficiency, the (maximal) length of the training sequence should be passed as an argument to bgn_bptt() (this avoids repeated resizing of internal buffers). If all training sequences have the same length len, one can also use the method train() (and similarly test(), cf. below) in the following manner:

    net.bgn_bptt(len);
    net.train(...);
    net.end_bptt();

This will cause the train() method to treat its training data as the concatenation of a number of training sequences (each sequence consisting of len successive input-output blocks). Technically, train() will insert an end_bptt();bgn_bptt(len) pair after every len exec/adapt calls, so that the desired BPTT over the entire data set results.
bgn_bptt(int iSteps=0)
begin a backpropagation-through-time sequence and prepare buffers for a time series of at most iSteps steps. Assumes that up to iSteps exec()/adapt() calls follow (longer time sequences are permitted; in this case the necessary buffer enlargement occurs automatically, but it is more efficient to specify the appropriate buffer size in advance). For iSteps=0, a default of 32 steps is assumed.
end_bptt(void)
end the current backpropagation-through-time sequence and perform one learning step for the time series seen through the exec/adapt calls since the most recent bgn_bptt() call. For BPTT training, this call is essential: without it, the preceding adapt() calls have no effect!
no_bptt(void)
end BPTT mode altogether (frees all BPTT specific buffers).
Currently, BPTT works only with fixed step sizes that must be set in the eta[] array (cf. next section).
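
A per-sequence BPTT sketch; net is an mlp instance as in the earlier sketches, the sequence buffers are placeholders, and the step sizes in eta[] are assumed to have been set beforehand:

    const int seqlen = 20;                    /* length of one training sequence (placeholder) */
    float inp[seqlen][4], tgt[seqlen][1];     /* placeholder input/target buffers */

    net.bgn_bptt(seqlen);                     /* size BPTT buffers in advance */
    for (int t = 0; t < seqlen; ++t) {
        net.exec(inp[t]);                     /* forward step at time t */
        net.adapt(tgt[t]);                    /* record gradient contribution of step t */
    }
    net.end_bptt();                           /* one weight update for the whole sequence */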

PUBLIC CLASS VARIABLES:

float act[]
activities of all network neurons (an automatically created bias neuron is not included in act[]).
float weight[]
network weights, ordered in the same manner as the pairs in array topology, or layerwise if one of the constructors for layered networks is used (in this case, each neuron's bias weight comes first).
float eta[]
for each weight a learning parameter. For learning mode BPROP_M, this is the learning rate, for RPROP_M, it is the learning step size.
float alpha
momentum term for BPROP_M learning mode: alpha=0 (the default) will yield standard backprop learning. A value 0<alpha<1 will modify the weight update to

   Delta(t) = eta*Dw(t) + alpha*Delta(t-1)

Here, Delta(t) is the change of a weight at update step t, and eta*Dw(t) is the change computed from the gradient according to the unmodified backpropagation rule with learning rates eta (for the single step in online learning, for the entire epoch in epoch learning). NOTE: during online learning, t increments at each adaptation step, while during epoch learning, t increments only at each epoch, i.e., the momentum alpha refers to correspondingly different time scales! Useful values of alpha are somewhat below 1. In RPROP_M mode, alpha is ignored.
float lambda
a global weight decay parameter. After each update, all weights are multiplied by a factor 1-lambda (again we have two different timescales for online and epoch learning here!).
float mse[]
mean square error for each output value. During epoch learning, the average is taken over the current epoch; during online learning, the average is taken since the last init() call.
float margin[]
has one element for each output and allows specifying an error margin below which that output is considered error free (i.e., it then does not contribute to the computation of the average error). However, for a positive margin value the associated error (even if below the margin) will still be backpropagated to drive the adaptation process. To omit below-margin errors from the adaptation process in all or some output dimensions, specify the corresponding margin values with a negative sign (the sign is interpreted only as a flag).
int order[]
order in which neuron activities will be recomputed by the exec() method (the adapt() method will use the reverse order). For pure feedforward networks, the constructor function will fill order[] with a suitable sequence automatically. For networks with feedback connections, order[] must be specified by the user.
int epochlen
if 0, adapt() will use online learning, otherwise epoch learning with epoch length epochlen.
int wtype[]
for each weight a type specifier. Usually set by the constructor function, but can be overridden with user-supplied values. Note, however, that changing a weight type will not change the connection topology in any way and may therefore lead to inconsistencies in some cases (e.g., when specifying type ELMAN, which assumes 1:1 connections).
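
A sketch of typical settings through these public variables; all numerical values are purely illustrative, and at least one output neuron is assumed for the margin[] entry:

    net.epochlen  = 50;        /* epoch learning: accumulate gradients over 50 adapt() calls */
    net.alpha     = 0.9f;      /* momentum for BPROP_M mode                                  */
    net.lambda    = 1e-4f;     /* mild global weight decay                                   */
    net.margin[0] = 0.05f;     /* treat errors of output 0 below 0.05 as error free          */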

FURTHER METHODS FOR TRAINING OR TESTING:

float train(int epochs, int dim, float *data, int num, int first=0)
perform epochs training epochs on the dim-dimensional rows of array data. The initial numinputs elements of each row are used as input data, the next numoutputs elements as target data; any further row elements are ignored (the condition numinputs+numoutputs<=dim is checked). For num>=0, training uses num successive rows, starting at row first. For num<0, training uses the entire data set except for abs(num) rows starting at row first (this is convenient for cross-validation). The return value is the root mean square error over the epochs (the elementwise errors can be found in array mse[], but note that these are still squared!). For epochs<0, training uses at most abs(epochs) epochs, but terminates as soon as the error in one epoch has reached 0.
float train(int dim, float *data, int num, int first=0)
Shorthand for the above with epochs=1.
float test(int dim, float *data, int num, int first=0)
Similar to train(), but without adaptation, i.e., only error statistics are computed and the root mean square error is returned.
int schedule(void)
find the update order. When the network is purely feedforward, this function has already been called by the constructor. For recurrent networks, it should be called after weight types for a partially recurrent topology have been assigned to the wtype[] array. A nonzero return value then indicates that the network can be trained with the usual exec/adapt methods (i.e., it is partially recurrent). The schedule() function ignores any ELMAN connections for its check, i.e., it checks whether the network can be trained with the standard backprop method.
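
A simple hold-out evaluation sketch using the sign convention of num; array contents, sizes and the network topology are placeholders:

    const int dim = 5, nrows = 200, holdout = 40;
    static float data[nrows * dim];           /* rows of 3 inputs followed by 2 targets, filled elsewhere */

    mlp net(3, 10, 2);                        /* 3 inputs, 2 outputs: 3+2 <= dim */

    /* train for 100 epochs on all rows except the 40-row block starting at row 0,
       then report the RMS error on that held-out block */
    float rmseTrain = net.train(100, dim, data, -holdout, 0);
    float rmseTest  = net.test(dim, data, holdout, 0);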

INSPECTION METHODS:

int neurons()
return nr of neurons (bias neuron not included)
int layers()
return nr of layers (includes input layer 0 in its count).
int layer_of(int id):
get layer of neuron id.
int for_inp_of(int n,char *fmt,void(*f)(float*),int *layers=NULL)
int for_out_of(int n,char *fmt,void (*f)(float*),int *layers=NULL)
Iterate function f() over the inputs (outputs) of neuron n. When layers!=NULL, the iteration is restricted to those inputs (outputs) that originate (end) in the layers listed in layers[]. There is a second pair of routines in which the last argument is replaced by up to three ints, to allow a simpler specification when the number of included layers is small. The return value is the number of iteration steps performed. A short usage sketch follows the list of format characters below.
On each call, f() is passed a vector v whose elements are assembled according to the format string fmt: each character in fmt specifies a particular data element for the passed vector v. The order and number of elements in v will match the order and number of (non-blank) characters in fmt. In the following, the selected neuron n will be referred to as the proximal neuron, the neuron at the other end of an input/output connection as the distal neuron.

The following format characters are defined:

#
index of weight between distal and proximal neuron
w
value of connection weight between distal and proximal neuron
r
learning rate of weight between distal and proximal neuron
?
type of weight between distal and proximal neuron
a,A
activity of distal (proximal) neuron
e,E
error value at distal (proximal) neuron
x,y,z,X,Y,Z
coordinates of distal (upper case: proximal) neuron
l,L
layer to which distal (proximal) neuron belongs
n,N
index of distal (proximal) neuron (the latter coincides with the argument n)
f,F
activity function type of distal (proximal) neuron
i,j,k,I,J,K
index positions of distal (upper case: proximal) neuron in its layer
<
nr of inputs of distal neuron
>
nr of outputs of distal neuron
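
An iteration sketch: print, for each input connection of a neuron, the weight value and the activity of the distal (source) neuron, using the format string "wa". Here net is an mlp instance as in the earlier sketches, and the neuron index 7 is a placeholder:

    #include <cstdio>

    /* callback: v[0] = weight value ('w'), v[1] = activity of the distal neuron ('a') */
    void print_input(float *v)
    {
        std::printf("weight = %g   source activity = %g\n", v[0], v[1]);
    }

    /* ... with net an mlp instance constructed as in the earlier sketches: */
    char fmt[] = "wa";
    int steps = net.for_inp_of(7, fmt, print_input);  /* inputs of neuron 7; returns nr of steps */
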
float *for_inp_of(int n, char *fmt, int *layers=NULL)
float *for_out_of(int n, char *fmt, int *layers=NULL)
Similar to the above, but instead of iterating, returns a newly allocated vector consisting of the concatenation of the argument vectors that would have been passed to the iterated function. E.g., to get the input weights of neuron n concatenated into a single vector v, call

    v = for_inp_of(n,"w");

int for_neurons_of(int lay, char *fmt, void (*f)(float*))
float *for_neurons_of(int lay, char *fmt)
Similar iterators for iterating over all neurons of a layer. The format characters to be used in fmt are a subset of the above, i.e.,
a
activity of neuron
b
bias weight of neuron
e
error value at neuron
n
index of neuron
f
activity function type of neuron
i,j,k
index positions of neuron in its layer
x,y,z
coordinates of neuron
l
layer of neuron (always equal to the argument lay)
<
nr of inputs
>
nr of outputs
void query_neuron(int neuron, char *fmt, float *vec)
Similar to for_neurons_of(), but queries data only for a single neuron and writes the result into vector vec (clipping if vec is too short).

AUXILIARY METHODS FOR VISUALIZATION:

void bbox(int l, float x,y,z=0,dx=0,dy=0,dz=0)
specify a bounding box geometry (position and edge lengths) for layer l.
void bbox(int l, float* x,y,z,dx,dy,dz)
similar, but with pointer arguments, to specify or retrieve the bounding box of layer l. Zero delta values (the last arguments) leave the current delta settings unchanged.
void bbox(int l, void (*fun)(float x1,y1,x2,y2))
void bbox(int l, void (*fun)(float x1,y1,z1,x2,y2,z2))
Iterate drawing function fun over edges of bounding box of layer l.
int pick(int l, int i,j=0,k=0):
get neuron id from layer number l and within-layer index coordinates i,j,k.
int pick(float x, float y):
get neuron id from location (x,y).
void layout2d(float fGap=1,xll=0,yll=0,xur=1,yur=1,int *layers=NULL)
void layout3d(float fGap=1,xll=0,yll=0,zll=0,xur=1,yur=1,zur=0,int *layers=NULL)
Specify the arrangement of layers in a rectangular 2d (3d) box. The box is specified by its lower left and upper right corner. fGap specifies the ratio of the gap between adjacent neurons of different layers to the gap between adjacent neurons within the same layer (i.e., fGap=2 makes the gap between layers twice as large as the gap within a layer). If layers=NULL, all layers will be positioned; otherwise the positioning is restricted to the layers whose numbers are listed in layers[].
Presently, there is only limited support for the visualization of time-delay layers: when the spatial dimensionality of a layer is lower than three, a time-delay dimension is treated as if it were the next higher spatial dimension. I.e., when dx>1 and dy=dz=1, a dt>1 is treated in the layout as if it took the place of dy; similarly, when dx>1, dy>1 and dz=1, a dt>1 is treated as if it took the place of dz. Therefore, TDNN networks with layer dimensionality not larger than 2 can be visualized adequately; however, there is currently no support for visualizing time-delay copies of three-dimensional layers.
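
A sketch combining the layout and inspection routines for a simple 2d visualization: position all layers in the unit square, then fetch coordinates and activity of each neuron for drawing. The drawing call itself is left as a comment, and ownership of the returned vector is not specified here:

    net.layout2d();                                 /* default placement in the unit square */

    for (int l = 0; l < net.layers(); ++l) {
        char fmt[] = "xya";
        float *v = net.for_neurons_of(l, fmt);      /* x, y, activity for every neuron of layer l */
        /* ... draw one marker per neuron, reading consecutive (x,y,a) triples from v ... */
    }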

REMARKS:

Remove rule that default for output layer is Linear.

ALGORITHMS:

NOT YET IMPLEMENTED:

Later, one may add connection bundles one-by-one, using the syntax [this reallocates w]

   join(i,j,rule,type1,type2,type3)

int pick(float x,y,z):
get neuron whose position is closest to x,y,z.

FILE

nnet.c