NAME
bp_layer -- create a layer of bp-nodes
PROTOTYPE
unittype *bp_layer(int din, int nodes, float (*f)(float), float (*df)(float), char *pcOpt, unittype *dest)
ARGUMENTS
- int din
- nr of inputs per node
- int nodes
- nr of nodes
- float (*f)(float)
- activation function (NULL selects the fermi function)
- float (*df)(float)
- derivative of the activation function (NULL selects the derivative of the fermi function)
- char *pcOpt
- option string [see below]. NULL selects defaults.
- unittype *dest
- host unit
INTERFACE:
Inputs:
- X_in[din]
- (input field 0) input vector x
- Y_in[nodes]
- (input field 1) error input vector d=target-X_out
- Z_in[nodes*(1+din)]
- (optional packed input field 2) this input field is only
present when the %S option for weight sharing was specified.
In this case, this input field expects to receive a set of
weight values (see below).
- Eps_in[3]
- adaptation gain Eta, momentum parameter
Alpha and decay rate Gamma
- Ctl_in[1]
- (control input) The integer part of this value specifies
the epoch length (error gradients are accumulated internally
over an epoch, and a weight change using the accumulated
gradients is made only at the last step of the epoch).
A value of 1 yields on-line learning. A value of 0 skips
all execution and adaptation operations.
Outputs:
- X_out[nodes]
- (output field 0) network output (=node activity vector) y
- Y_out[din]
- (output field 1) backpropagated error vector (NOTE: if
several bp_layer units are connected, this field
must be connected to Y_in[] of the preceding bp_layer).
- Z_out[nodes*(1+din)]
- weight values. For each node there is a block of 1+din
weight values: the first value is the node's bias, the
remaining din values are the weights from inputs 0..din-1.
NOTE: if option %S (weight sharing mode) is specified,
this field is absent and the weights are instead received
at input field Z_in[nodes*(1+din)].
DESCRIPTION:
Creates a bp_layer with nodes nodes, each having din inputs
fed from a common input field X_in[din].
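For illustration, a minimal creation call might look as follows (a
sketch, assuming that passing NULL for dest allocates a new unit;
the variable name is a placeholder):

    /* a layer of 10 nodes with 4 inputs each; NULL selects the
       fermi function, its derivative, and the default options */
    unittype *layer = bp_layer(4, 10, NULL, NULL, NULL, NULL);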
EXECUTION:
Computes output vector y from input x according to
y[i] = f(s[i])
where
s[i] = bias[i] + sum_{j=0..din-1} w[i][j] x[j]
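The following C sketch illustrates this computation (illustration
only, not part of the NST API; the field names follow the interface
described above):

    /* forward pass: y[i] = f(bias[i] + sum_j w[i][j]*x[j]) */
    static void forward_sketch(int din, int nodes,
                               const float *x,    /* X_in[din]     */
                               float **w,         /* w[nodes][din] */
                               const float *bias, /* bias[nodes]   */
                               float (*f)(float),
                               float *y)          /* X_out[nodes]  */
    {
        int i, j;
        for (i = 0; i < nodes; i++) {
            float s = bias[i];
            for (j = 0; j < din; j++)
                s += w[i][j] * x[j];
            y[i] = f(s);
        }
    }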
ADAPTATION:
Computes from error input d[]=Y_in[] a back-propagated error vector
e[]=Y_out[] according to:
e[j] = sum_{i=0..nodes-1} d[i] f'(s[i]) w[i][j]
where s[i] are the scalar products given above, and f' denotes the
derivative (argument df) of the transfer function f.
In addition, for each weight/bias, the following error derivative is
computed:
dE/dw[i][j] = d[i] f'(s[i]) x[j]
dE/dbias[i] = d[i] f'(s[i])
These are summed over an epoch t=0,1,...,T-1 and weighted
averages over the epoch are computed:
DE/DW(nT+T) = Alpha*DE/DW(nT) + Eta*sum_{t=0..T-1} dE/dw[i][j](nT+t)
DE/DB(nT+T) = Alpha*DE/DB(nT) + Eta*sum_{t=0..T-1} dE/dbias[i](nT+t)
(here, DE/DW := dE/dw[i][j] and DE/DB := dE/dbias[i]).
These are then used to update the weights. For standard backprop,
this is done according to
Delta w[i][j] = - DE/DW[i][j]
Delta bias[i] = - DE/DB[i]
(Note that for T=1 all equations specialize into on-line backprop)
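As an illustration of the on-line case (T=1, Alpha=0), the following
sketch computes the backpropagated error vector and applies the
weight change in one pass. Note that with the convention
d = target - X_out, descending the squared error amounts to adding
Eta*d[i]*f'(s[i])*x[j] to w[i][j]; the sketch (not part of the NST
API) uses that sign:

    /* one on-line backprop step (T=1, Alpha=0, Gamma ignored) */
    static void adapt_sketch(int din, int nodes,
                             const float *x,  /* X_in[din]       */
                             const float *s,  /* scalar products */
                             const float *d,  /* Y_in[nodes]     */
                             float (*df)(float),
                             float **w, float *bias,
                             float eta,       /* Eps_in[0]       */
                             float *e)        /* Y_out[din]      */
    {
        int i, j;
        for (j = 0; j < din; j++)
            e[j] = 0.0f;
        for (i = 0; i < nodes; i++) {
            float g = d[i] * df(s[i]);      /* d[i]*f'(s[i])        */
            for (j = 0; j < din; j++) {
                e[j] += g * w[i][j];        /* backpropagated error */
                w[i][j] += eta * g * x[j];  /* weight change        */
            }
            bias[i] += eta * g;             /* bias change          */
        }
    }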
In Rprop, internal step sizes Dij, which are independent dynamical
variables, are used. They are updated according to:
Dij *= Eta+ if dE/dwij(nT-T) * dE/dwij(nT) > 0 (1)
Dij *= Eta- if dE/dwij(nT-T) * dE/dwij(nT) < 0 (2)
no change otherwise (3)
Currently, the parameters Eta+ and Eta- have fixed values of 1.2 and
0.5. The weight changes Delta w[i][j] are computed from Dij as:
if (1) or (3) above: Dwij = - sgn(dE/dwij(nT)) Dij
if (2) above: Dwij = - Dwij(nT-T) and dE/dwij(nT):=0
and for the biases analogously.
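A per-weight sketch of this update scheme (Eta+ = 1.2 and
Eta- = 0.5 as stated above; illustration only, not part of the NST
API):

    /* Rprop update for a single weight: D is the step size Dij,
       g the accumulated gradient dE/dwij(nT), g_prev the previous
       gradient dE/dwij(nT-T), dw_prev the previous change
       Dwij(nT-T); returns the weight change Dwij(nT) */
    static float rprop_sketch(float *D, float *g,
                              float g_prev, float dw_prev)
    {
        float sgn = (*g > 0.0f) ? 1.0f : (*g < 0.0f) ? -1.0f : 0.0f;
        if (g_prev * (*g) > 0.0f) {        /* case (1) */
            *D *= 1.2f;
            return -sgn * (*D);
        }
        if (g_prev * (*g) < 0.0f) {        /* case (2) */
            *D *= 0.5f;
            *g = 0.0f;                     /* dE/dwij(nT) := 0 */
            return -dw_prev;               /* revert previous step */
        }
        return -sgn * (*D);                /* case (3) */
    }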
TREATMENT OF MISSING VALUES:
The input vector and the error difference vector
may contain elements with the non-value NaN.
In both cases, such elements are treated as if zero;
thus they neither affect the activity computation nor the
error backpropagation and act as if they were missing
values.
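This masking can be pictured as the following sketch (isnan() is
from <math.h>; the function is for illustration and not part of the
NST API):

    #include <math.h>

    /* treat a NaN element as a missing value, i.e. as zero */
    static float masked(float v)
    {
        return isnan(v) ? 0.0f : v;
    }

    /* inside the sums above:   s += w[i][j] * masked(x[j]);
       and for the error input: g  = masked(d[i]) * df(s[i]); */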
OPTIONS:
pcOpt can contain one of the following control tokens:
- %aF
- flat spot elimination: adds a constant a to the derivative
of the selected transfer function. If no parameter a is specified,
a default of a=0.1 is used. This option is mainly useful for
binary input patterns that assume the asymptotic values of the
transfer function. It may lead to deterioration in other cases.
- %aR
- use the resilient backpropagation (Rprop) algorithm. The optional
parameter a specifies an initial value for the step sizes (if a
is absent, the default 0.1 is used). NOTE: resilient bp is a batch
algorithm that requires epoch learning. Thus, Ctl_in[0] must be set
to the number of patterns per epoch.
- %S
- operate in weight sharing mode: this mode allows the unit to share its
weights with one or several other bp_layers (which must all be of the
same size nodes and fan-in din). In this case, the unit does not
save any weights (since these are the same as in the weight host unit).
For details, see below.
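For example (assuming the numeric parameter a is written between
the '%' and the option letter, as in the %aF / %aR patterns above):

    /* Rprop with an initial step size of 0.05 */
    unittype *u1 = bp_layer(4, 10, NULL, NULL, "%0.05R", NULL);

    /* flat spot elimination with the default a=0.1 */
    unittype *u2 = bp_layer(4, 10, NULL, NULL, "%F", NULL);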
WEIGHT SHARING WITH OTHER BP-layers:
If k units (k >= 2) are to share their weights, k-1 units must be
created with the %S option, and one unit (the "weight host") must be
created in "normal" (non-sharing) mode. Instead of the usual weight
output field Z_out[], each of the k-1 units created in sharing mode
has the additional input field Z_in[], which must be connected to
the output field Z_out[] of the common "weight host". A unit in
weight sharing mode uses the weights of the attached weight host for
all its computations. In particular, whenever a weight update step
is due, a weight sharing unit adds its computed weight change into
the internal array that the weight host uses to compute its own
weight change. In this way, the weight host automatically collects
the weight change contributions from all attached "weight clients"
in addition to its own. When it finally adds the accumulated weight
change to its weights, this update includes its own contribution and
those of the k-1 other units that collectively share the weights in
the weight host.
Since the contributions of all k units are summed, each unit's
contribution is weighted by its individual learning rate. This makes
it possible to control the relative contributions of different
layers.
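Sketched as code, the wiring could look like this (a sketch only:
connect_field() is a hypothetical helper, since the actual NST
connection call is not specified on this page):

    /* weight host: normal mode, owns and saves the weights */
    unittype *host   = bp_layer(4, 10, NULL, NULL, NULL, NULL);

    /* weight client: sharing mode (%S), no weights of its own */
    unittype *client = bp_layer(4, 10, NULL, NULL, "%S", NULL);

    /* hypothetical call: route the host's weight output field
       Z_out[] (field 2) into the client's weight input field
       Z_in[] (field 2) */
    connect_field(host, 2, client, 2);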
INITIALIZATION:
The weights may be initialized or re-initialized by calling
ctrl_unit(cmode,u), where cmode is one of the following:
NST_I_RND uniformly between RND_LIMIT1 and RND_LIMIT2
NST_I_NOR uniformly between RND_LIMIT1 and RND_LIMIT2
and subsequent normalization to unit length.
NST_I_ZERO set all weights to zero [THIS IS NOT USEFUL!]
NST_I_USER put the unit into "initialization mode" (see below)
In addition, each of these calls sets all output values X_out[] and
all biases to zero.
Default initialization is with NST_I_NOR. If initialization of
the weights w[i][j] to user-defined values is desired, a special
"initialization mode" can be turned on. This is achieved by
ctrl_unit(NST_I_USER,u). Initialization mode then persists
for the next nodes calls of exec_unit(u) that follow the
call ctrl_unit(NST_I_USER,u). The i-th of these calls sets
the weight vector w[i][.] of the i-th internal bp_unit to
the values of the current input pattern.
No adaptation and no computation of output values occurs while
the unit is in initialization mode. After the i=nodes-1 call
(i.e., after a total of nodes calls), initialization mode ends
and the unit reverts to normal operation.
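For example (a sketch; u denotes the unit and it is assumed that
each input pattern is presented at X_in[] before the corresponding
exec_unit() call):

    int i;
    ctrl_unit(NST_I_USER, u);       /* enter initialization mode  */
    for (i = 0; i < nodes; i++) {
        /* ...present the desired weight vector for node i
           at the unit's input field X_in[]... */
        exec_unit(u);               /* copies X_in[] into w[i][.] */
    }
    /* after nodes calls the unit is back in normal operation */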
LOAD/SAVE:
Saves the internal weights, unless the unit operates in
weight sharing mode (%S-option). In this case, no weights
need to be saved, since weight values are obtained from
the attached "weight host unit" (whose weights are saved,
since it does operate in normal mode).
SEE ALSO:
mlp_net
FILE
/local/homes/rhaschke/nst7/man/../o.linx86//../foldersrc/nst_adaptive.c