NAME

bp_layer -- create a layer of bp-nodes

PROTOTYPE

unittype *bp_layer( int din, int nodes, float(*f)(float), float(*df)(float), char *pcOpt, unittype *dest)

ARGUMENTS

int din
number of inputs per node
int nodes
number of nodes
float(*f)(float)
activation function (NULL selects the Fermi function)
float(*df)(float)
derivative of the activation function (NULL selects the derivative of the Fermi function)
char *pcOpt
option string [see below]; NULL selects defaults.
unittype *dest
host unit
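
For illustration, a minimal creation call might look as follows (a sketch only; dest is assumed to be a host unit provided by the surrounding NST code):

    /* sketch: a layer of 5 nodes, each with 10 inputs; NULL for f and df
       selects the Fermi function and its derivative, NULL for pcOpt
       selects the default options */
    unittype *u = bp_layer(10, 5, NULL, NULL, NULL, dest);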

INTERFACE:

Inputs:

X_in[din]
(input field 0) input vector x
Y_in[nodes]
(input field 1) error input vector d=target-X_out
Z_in[nodes*(1+din)]
(optional packed input field 2) This input field is present only when the %S option for weight sharing was specified. In that case it receives the set of shared weight values (see below).
Eps_in[3]
adaptation gain Eta, momentum parameter Alpha and decay rate Gamma
Ctl_in[1]
(control input) The integer part of this value specifies the epoch length T: error gradients are accumulated internally over an epoch, and a weight change using the accumulated gradients is made only at the last step of the epoch. A value of 1 yields on-line learning; a value of 0 skips all execution and adaptation operations.

Outputs:

X_out[nodes]
(output field 0) network output (=node activity vector) y
Y_out[din]
(output field 1) backpropagated error vector (NOTE: if several bp_layer units are connected, this field must be connected to Y_in[] of the preceding bp_layer).
Z_out[nodes*(1+din)]
weight values. For each node a block of 1+din weight values: the first value is the node's bias, the remaining din values are the weights from inputs 0..din-1. NOTE: if option %S (weight sharing mode) is specified, this output field is absent and the weights are instead received at input field Z_in[nodes*(1+din)].

DESCRIPTION:

Creates a bp_layer of nodes nodes, each having din inputs fed from the common input field X_in[din].

EXECUTION:

Computes output vector y from input x according to

    y[i] = f(s[i])

where

    s[i] = bias[i] + sum_{j=0..din-1} w[i][j] x[j]
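
In C terms, this forward pass corresponds to the following sketch (an illustration of the formulas above, not the library's internal code; C99, using variable-length array parameters):

    /* illustrative forward pass: y[i] = f(bias[i] + sum_j w[i][j]*x[j]) */
    void bp_forward(int din, int nodes, const float w[nodes][din],
                    const float bias[nodes], const float x[din],
                    float y[nodes], float (*f)(float))
    {
        for (int i = 0; i < nodes; i++) {
            float s = bias[i];             /* s[i] = bias[i] + sum_j w[i][j] x[j] */
            for (int j = 0; j < din; j++)
                s += w[i][j] * x[j];
            y[i] = f(s);                   /* y[i] = f(s[i]) */
        }
    }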

ADAPTATION:

Computes from error input d[]=Y_in[] a back-propagated error vector e[]=Y_out[] according to:

    e[j] = sum_{i=0..nodes-1} d[i] f'(s[i]) w[i][j]

where s[i] are the scalar products given above, and f' denotes the derivative (argument df) of the transfer function f. In addition, for each weight and bias the following error derivatives are computed:

    dE/dw[i][j] = d[i] f'(s[i]) x[j]
    dE/dbias[i] = d[i] f'(s[i])
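
As a sketch in C (illustrating the formulas above, not the library's internal code; d[], w[][], s[], x[] as defined in this section):

    /* illustrative adaptation step: back-propagated error e[] and
       per-pattern error derivatives dw[][], dbias[] */
    void bp_backward(int din, int nodes, const float w[nodes][din],
                     const float s[nodes], const float x[din],
                     const float d[nodes], float (*df)(float),
                     float e[din], float dw[nodes][din], float dbias[nodes])
    {
        for (int j = 0; j < din; j++)
            e[j] = 0.0f;
        for (int i = 0; i < nodes; i++) {
            float delta = d[i] * df(s[i]);   /* d[i] f'(s[i]) */
            dbias[i] = delta;                /* dE/dbias[i] */
            for (int j = 0; j < din; j++) {
                e[j]    += delta * w[i][j];  /* e[j] += d[i] f'(s[i]) w[i][j] */
                dw[i][j] = delta * x[j];     /* dE/dw[i][j] */
            }
        }
    }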

These derivatives are summed over the epoch t=0,1,..,T-1, and momentum-weighted accumulations are computed:

   DE/DW(nT+T) = Alpha*DE/DW(nT) + Eta*sum_{t=0..T-1} dE/dw[i][j](nT+t)
   DE/DB(nT+T) = Alpha*DE/DB(nT) + Eta*sum_{t=0..T-1} dE/dbias[i](nT+t)

(here, DE/DW := dE/dw[i][j] and DE/DB := dE/dbias[i]). These accumulated values are then used to update the weights. For standard backprop, this is done according to

   Delta w[i][j] = - DE/DW(nT+T)
   Delta bias[i] = - DE/DB(nT+T)
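
As a per-weight sketch of this epoch-end update (illustrative names; G[i][j] denotes the gradient sum accumulated over the T steps of the epoch):

    /* epoch-end update for standard backprop with momentum */
    void bp_update(int din, int nodes, float w[nodes][din],
                   float DEDW[nodes][din], float G[nodes][din],
                   float Eta, float Alpha)
    {
        for (int i = 0; i < nodes; i++)
            for (int j = 0; j < din; j++) {
                DEDW[i][j] = Alpha * DEDW[i][j] + Eta * G[i][j];
                w[i][j]   -= DEDW[i][j];   /* Delta w[i][j] = -DE/DW(nT+T) */
                G[i][j]    = 0.0f;         /* restart accumulation */
            }
    }

(The biases are updated analogously with DE/DB.)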

(Note that for T=1 all equations specialize to on-line backprop.) In Rprop, internal step sizes Dij, which are independent dynamical variables, are used. They are updated according to:

          Dij *= Eta+    if dE/dwij(nT-T) * dE/dwij(nT) > 0    (1)
          Dij *= Eta-    if dE/dwij(nT-T) * dE/dwij(nT) < 0    (2)
          no change      otherwise                             (3)

Currently, the parameters Eta+ and Eta- have fixed values of 1.2 and 0.5. The weight changes Delta w[i][j] are computed from Dij as:

          if (1) or (3) above: Dwij = - sgn(dE/dwij(nT)) Dij
          if (2) above: Dwij = - Dwij(nT-T) and dE/dwij(nT):=0

and for the biases analogously.
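
The following C sketch illustrates the Rprop step for a single weight (illustrative names: g = dE/dwij(nT), g_prev = dE/dwij(nT-T), D = Dij, dw_prev = Dwij(nT-T)):

    /* Rprop step for one weight; Eta+ = 1.2 and Eta- = 0.5 as stated above */
    void rprop_step(float *w, float *g, float g_prev,
                    float *D, float *dw_prev)
    {
        float sgn = (float)((*g > 0.0f) - (*g < 0.0f));
        float dw;
        if (g_prev * *g > 0.0f) {        /* case (1) */
            *D *= 1.2f;                  /* Eta+ */
            dw  = -sgn * *D;
        } else if (g_prev * *g < 0.0f) { /* case (2): revert previous step */
            *D *= 0.5f;                  /* Eta- */
            dw  = -*dw_prev;
            *g  = 0.0f;                  /* dE/dwij(nT) := 0 */
        } else {                         /* case (3) */
            dw  = -sgn * *D;
        }
        *w      += dw;
        *dw_prev = dw;
    }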

TREATMENT OF MISSING VALUES:

The input vector and the error input vector may contain elements with the non-value NaN. Such elements are treated as zero: they affect neither the activity computation nor the error backpropagation, i.e. they act as missing values.
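
In C terms, the masking amounts to something like (a sketch; isnan() from <math.h>):

    #include <math.h>

    /* a NaN element contributes nothing to the sums */
    float xj = isnan(x[j]) ? 0.0f : x[j];   /* masked input element */
    float di = isnan(d[i]) ? 0.0f : d[i];   /* masked error element */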

OPTIONS:

pcOpt can contain one of the following control tokens (example calls are given after the list):
%aF
flat spot elimination: adds a constant a to the derivative of the selected transfer function. If no parameter a is specified, a default of a=0.1 is used. This option is mainly useful for binary input patterns that assume the asymptotic values of the transfer function; it may degrade performance in other cases.
%aR
use the resilient backpropagation (Rprop) algorithm. The optional parameter a specifies an initial value for the step sizes (if a is absent, the default 0.1 is used). NOTE: resilient backpropagation is a batch algorithm that requires epoch learning; thus Ctl_in[0] must be set to the number of patterns per epoch.
%S
operate in weight sharing mode: this mode allows the unit to share its weights with one or several other bp_layers (which must all have the same size nodes and fan-in din). In this case the unit does not save any weights, since these are identical to those of the weight host unit. For details, see below.
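
For example, plausible option strings following the %aF and %aR templates above are shown below (the exact numeric syntax of the parameter a is an assumption):

    bp_layer(din, nodes, NULL, NULL, "%F",     dest);  /* flat spot elimination, a = 0.1 */
    bp_layer(din, nodes, NULL, NULL, "%0.05R", dest);  /* Rprop with initial step size 0.05 */
    bp_layer(din, nodes, NULL, NULL, "%S",     dest);  /* weight sharing mode */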

WEIGHT SHARING WITH OTHER BP-LAYERS:

If k units (k >= 2) are to share their weights, k-1 units must be created with the %S option, and one unit (the 'weight host') must be created in normal (non-sharing) mode. The k-1 units created in sharing mode have, instead of the usual weight output field Z_out[], an additional input field Z_in[] which must be connected to the output field Z_out[] of the common weight host. A unit in weight sharing mode uses the weights of the attached weight host for all its computations.

In particular, whenever a weight update step is due, a weight sharing unit adds its computed weight change into the internal array that the weight host uses to compute its own weight change. In this way the weight host automatically collects the weight change contributions from all attached 'weight clients' in addition to its own. When it finally adds the accumulated weight change to its weights, the update includes its own contribution and those of the k-1 other units that collectively share its weights. Since the contributions of all k units are summed, each unit's contribution is weighted by its individual learning rate; this makes it possible to control the relative contribution of the different layers. A creation sketch is given below.
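
As a creation sketch (hypothetical; the connection of the host's Z_out[] to the clients' Z_in[] is made with the toolkit's usual field-connection mechanism, which is not shown here):

    /* sketch: three layers sharing one weight set */
    unittype *host = bp_layer(din, nodes, NULL, NULL, NULL, dest);  /* weight host */
    unittype *c1   = bp_layer(din, nodes, NULL, NULL, "%S", dest);  /* weight client */
    unittype *c2   = bp_layer(din, nodes, NULL, NULL, "%S", dest);  /* weight client */
    /* ... connect Z_out[] of host to Z_in[] of c1 and c2 ... */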

INITIALIZATION:

The weights may be initialized or re-initialized by calling ctrl_unit(cmode,u), where cmode is one of the following:

       NST_I_RND uniformly between RND_LIMIT1 and RND_LIMIT2
       NST_I_NOR uniformly between RND_LIMIT1 and RND_LIMIT2
                     and subsequent normalization to unit length.
       NST_I_ZERO set all weights to zero [THIS IS NOT USEFUL!]
       NST_I_USER put the unit into "initialization mode" (see below)

In addition, each of these calls sets all output values X_out[] and all biases to zero. The default initialization is NST_I_NOR. If initialization of the weights w[i][j] to user-defined values is desired, a special 'initialization mode' can be turned on by calling ctrl_unit(NST_I_USER,u). Initialization mode then persists for the next nodes calls of exec_unit(u) that follow the call ctrl_unit(NST_I_USER,u). The i-th of these calls sets the weight vector w[i][.] of the i-th internal bp_unit to the values of the current input pattern. No adaptation and no computation of output values occurs while the unit is in initialization mode. After the call for i=nodes-1 (i.e., after a total of nodes calls), initialization mode ends and the unit reverts to normal operation (see the sketch below).

LOAD/SAVE:

Saves the internal weights, unless the unit operates in weight sharing mode (%S option). In this case no weights need to be saved, since the weight values are obtained from the attached 'weight host unit' (whose weights are saved, since it operates in normal mode).
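
A sketch of the user-defined initialization sequence referred to above (the mechanism for presenting the desired weight vector at X_in[] depends on the surrounding network code and is only indicated by a comment):

    ctrl_unit(NST_I_USER, u);          /* enter initialization mode */
    for (int i = 0; i < nodes; i++) {
        /* ... present the desired weight vector for node i at X_in[] ... */
        exec_unit(u);                  /* the i-th call stores it as w[i][.] */
    }
    /* after 'nodes' calls the unit reverts to normal operation */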

SEE ALSO:

mlp_net

FILE

/local/homes/rhaschke/nst7/man/../o.linx86//../foldersrc/nst_adaptive.c