NAME
som_op -- operator for self-organizing map algorithm
PROTOTYPE
unitptr som_op( int iDim, int iNum, int *(*VisitNode)(int*), int *piVisitNodeInfo, char *pcOpt, unitptr uHost)
ARGUMENTS
- int iDim
- dimension of a node
- int iNum
- number of nodes
- int *(*VisitNode)(int*)
- function to specify topology (see below)
- int *piVisitNodeInfo
- data array to specify topology (see below)
- char *pcOpt
- option string [may be NULL]
- unitptr uHost
- host unit
RETURN VALUE:
A pointer to the created unit or NULL in the case of an error.
INTERFACE OF CREATED UNIT:
- X_in[1]:
- (input field 0) auxiliary unit 1 should
provide a scalar similarity measure (if
CTL_in[0]>0) or a scalar distance measure
(if CTL_in[0]<0) between the current
input and the weight vector of the current
node (at X_out[]) here.
- Y_in[iDim]:
- (input field 1) auxiliary unit 2 should
provide weight change for currently selected
node here (packed field, if iDim<0).
- Z_in[iDim*iNum]:
- (packed input field 2) weight vectors for
the iNum nodes, concatenated into one large
array.
- EPS_in[3]:
- (input field 3) element 0: learning rate,
element 1: factor by which current adaptation radius
is multiplied to obtain cut-off radius of adapted
neighborhood zone.
element 2: adaptation radius (default = sigma*sqrt(2) of
gaussian adaptation profile, see below)
- CTL_in[1]:
- (control field) no operation, if 0; otherwise,
sign determines whether auxiliary unit 1 must
provide similarity (for positive sign)
or distance measure (for negative sign).
Default value = -1, i.e., expect distance measure.
- X_out[iDim]:
- (output field 0) weight vector of best match node
found during exec_unit. Packed field, if
iDim<0.
- Y_out[1]:
- (output field 1) index of best match node.
- Z_out[iDim]:
- (output field 2) weight vector of node selected
for current iteration during exec/adapt.
Packed field, if iDim<0.
- out_3[1]:
- (output field 3) index of node selected for
current iteration during exec/adapt.
- out_4[1]:
- (output field 4) (squared) distance between selected
and best match node.
- out_5[iNum]:
- (output field 5) similarity/distance values
for all nodes as computed during last exec call.
(packed, if iNum<0)
SYNOPSIS:
The som_op unit makes it possible to execute and/or adapt self-organizing maps
with arbitrary topology, distance metric and learning rule.
This is achieved by implementing the som_op unit as an "operator
unit" that provides only the
control structure for a general self-organizing map. To implement
a complete self-organizing map, the user must connect the
som_op operator unit with two additional "operand units":
one (the "distance metric unit", DMU)
for implementing the distance metric and the other
(the "adaptation step unit", ASU) for implementing
the adaptation step of the self-organizing map.
Finally, the desired topology is specified in the form of a
selector function that must be passed (together with an
initialization array) as parameter for the constructor function.
DESCRIPTION:
The som_op unit uses an array of iNum*iDim float elements
as the weight set of a self-organizing map of iNum nodes, each
with an iDim-dimensional weight vector. The default weight set
consists of the variables of packed input field 2. By connecting this
field to the output of another unit, this default weight set
can be changed and at the same time be made read-accessible
for other units (this allows even the sharing of one and the
same weight set among different som_op units).
To obtain a self-organizing map, the som_op unit must
be connected to two auxiliary units (see below).
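The layout of the weight set can be illustrated with a small helper. This is only a sketch: the function name node_weights is hypothetical and not part of the som_op interface, and it assumes that "concatenated" means node k occupies elements k*iDim .. k*iDim+iDim-1 of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an NST function): return a pointer to the
 * weight vector of node k inside the concatenated array w, which holds
 * iNum*iDim floats. Assumes node-major layout. */
static float *node_weights(float *w, int iDim, int iNum, int k)
{
    assert(w != NULL && iDim > 0 && k >= 0 && k < iNum);
    return w + (size_t)k * (size_t)iDim;
}
```

With iDim=2 and iNum=3, node 2 would then start at element 4 of the array.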
EXECUTION:
When the som_op unit is executed, it will iterate the
first auxiliary unit over all nodes, providing at each
iteration the weight vector of the current node to the
auxiliary unit and expecting back the distance (or overlap
measure) between the input and the passed weight vector.
When all iterations have completed, the som_op unit will
set its first two output fields to the weight vector
of the best match node and its node index.
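The control flow of an execution step can be sketched in plain C as follows. The names exec_sketch and sq_dist are hypothetical; sq_dist merely stands in for whatever the first auxiliary unit computes, and the sketch assumes the distance case (smallest value wins) and a node-major weight array.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the first auxiliary unit (assumption): squared
 * euclidean distance between input x and weight vector w. */
static float sq_dist(const float *x, const float *w, int iDim)
{
    float s = 0.0f;
    int i;
    for (i = 0; i < iDim; i++) {
        float d = x[i] - w[i];
        s += d * d;
    }
    return s;
}

/* Iterate over all iNum nodes, store one value per node in values[]
 * and return the index of the node with the smallest value. */
static int exec_sketch(const float *x, const float *w,
                       int iDim, int iNum, float *values)
{
    int k, best = 0;
    assert(iDim > 0 && iNum > 0);
    for (k = 0; k < iNum; k++) {
        values[k] = sq_dist(x, w + (size_t)k * (size_t)iDim, iDim);
        if (values[k] < values[best])
            best = k;   /* ties keep the earlier node */
    }
    return best;
}
```

The returned index corresponds to output field 1, and the weight vector at that index to output field 0.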
ADAPTATION:
When the som_op unit is adapted, it will iterate the
second auxiliary unit over the subset of nodes that lie
within a neighborhood of the best match node selected
in the previous exec_unit step. The cut-off radius of this
neighborhood is given by trunc(1+EPS_in(u,1)*EPS_in(u,2)), i.e.,
this is the cut-off distance passed to routine VisitNode().
VisitNode() is free to interpret this cut-off radius in
any way; however, the expectation is that VisitNode() returns
successively all node numbers for which the distance to the
best match node is less than or equal to this distance.
Note also that the distance return value of VisitNode()
is used by the som_op unit as if it were a squared distance
measure. This value is also made available
at its output field 4, so that it can be used by the
auxiliary unit for computing the weight change.
Again, at each iteration the som_op will pass
the weight vector of the currently iterated node to
the auxiliary unit, this time expecting back from the
auxiliary unit the required weight change for this node.
The weight change that is returned by the auxiliary unit
is multiplied internally by the som_op unit
by a factor e0*exp(-d2/(e2*e2)),
where d2 is the (squared) distance to the current
best match node (as computed by VisitNode()),
and ei=EPS_in(u,i), i=0,1,2. This multiplication is
usually convenient, since it frees the provider of the
adaptation auxiliary unit from caring about this.
If a different profile is desired, set EPS_in(u,1)=0.
This switches the multiplication off and chooses a
flat profile within the cut-off radius.
OPERAND UNIT FOR DISTANCE COMPUTATION:
This is the first auxiliary unit for the som_op unit.
It will be called iNum times during each execution
step of the som_op unit. At the i-th call, the som_op
unit will provide at its outputs out_2[iDim] and out_3[1]
the weight vector and the index of one of the iNum nodes.
This information, together with the current input vector
x can be fed to the distance computation unit which enables
it to compute a distance or similarity measure between
x and the weight vector. The result must be fed to
input in_0 of the som_op unit.
After the current
exec call has processed all iNum nodes, the som_op unit
will place in out_0[iDim] and out_1[1] the weight
vector and the index of the best match node that was
found. If CTL_in[0]>0, the best match node is taken
as the node for which in_0 has received the largest
value from the distance computation unit (i.e., the
distance computation unit should compute a similarity
measure, such as a scalar product), if CTL_in[0]<0,
the best match node is taken
as the node for which in_0 has received the smallest
value from the distance computation unit (i.e., the
distance computation unit should compute a difference
measure, such as the euclidean distance).
The distance computation unit is recognized by the
som_op unit through its connection with in_0 of
the som_op unit. The distance computation unit
must be in the same subunit as the som_op unit.
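The effect of the sign of CTL_in[0] on the choice of the best match node can be sketched as follows. The function select_best is hypothetical; it takes the per-node values (as they would appear in output field 5) and returns -1 for the "no operation" case, which is a convention chosen for this sketch only.

```c
#include <assert.h>

/* Pick the best match index from values[0..iNum-1].
 * ctl > 0: values are similarities, the largest wins.
 * ctl < 0: values are distances, the smallest wins.
 * ctl == 0: no operation (this sketch returns -1). */
static int select_best(const float *values, int iNum, float ctl)
{
    int k, best = 0;
    assert(iNum > 0);
    if (ctl == 0.0f)
        return -1;
    for (k = 1; k < iNum; k++) {
        int better = (ctl > 0.0f) ? (values[k] > values[best])
                                  : (values[k] < values[best]);
        if (better)
            best = k;
    }
    return best;
}
```

With values {0.3, 0.9, 0.1}, a positive control value selects node 1 and a negative one selects node 2.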
OPERAND UNIT FOR ADAPTATION STEP COMPUTATION:
This is the second auxiliary unit for the som_op unit.
It will be called iNeighbors times during each adaptation
step of the som_op unit. iNeighbors is the number of nodes in
the neighborhood region that is centered around the best match
node found in the preceding exec_unit call and that needs
adaptation (all nodes outside this region will remain
unaffected). At the i-th call, the som_op
unit will provide at its outputs out_2[iDim] and out_3[1]
the weight vector and the index of one of the iNeighbors nodes
in the neighborhood.
This information, together with the current input vector
x can be fed to the adaptation step unit which enables
it to compute a delta for the current weight vector.
The result u must be fed to input in_1 of the som_op unit
and will lead to the following update of the weight vector
w[] of the currently iterated node:
w_new[i] = w_old[i] + e0*exp(-d2/(e2*e2))*u[i]
Here, e0, e1, e2, d2 and u[i] are shorthands for the values
at EPS_in[0], EPS_in[1], EPS_in[2], out_4[0] and
Y_in[i], and i ranges from 0 to iDim-1.
This enforces a gaussian adaptation profile (which is adequate
for many cases). If a different profile is desired, set
e1=0. In this case, the above update equation will be
replaced by the simpler
w_new[i] = w_old[i] + e0*u[i]
Since u[i] is provided by the second auxiliary unit, any
desired adaptation profile can now be implemented therein.
After the current
adapt_unit call has processed all iNeighbors
nodes in the current neighborhood in this way, the adaptation
step of the som_op unit is completed.
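The update of a single node can be sketched in C as follows. Both function names are hypothetical. delta_toward_input assumes a simple adaptation step unit that returns u = x - w (the classical SOM step); apply_delta takes the already computed factor, i.e. e0*exp(-d2/(e2*e2)) for the gaussian profile or e0 when e1=0.

```c
#include <assert.h>

/* Assumed adaptation step: delta pointing from weight vector w
 * toward input x. */
static void delta_toward_input(const float *x, const float *w,
                               float *u, int iDim)
{
    int i;
    assert(iDim > 0);
    for (i = 0; i < iDim; i++)
        u[i] = x[i] - w[i];
}

/* w_new[i] = w_old[i] + factor * u[i] for i = 0 .. iDim-1. */
static void apply_delta(float *w, const float *u, int iDim, float factor)
{
    int i;
    assert(iDim > 0);
    for (i = 0; i < iDim; i++)
        w[i] += factor * u[i];
}
```

With w = (0, 0), x = (1, 2) and a factor of 0.5, the node moves halfway toward the input, to (0.5, 1).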
TOPOLOGY SPECIFICATION:
In order to allow a
flexible and still reasonably efficient specification of
arbitrary topologies, the following procedure has been
adopted: The constructor function is passed the address
of a user-specified function of type
int *VisitNode(int*)
and an integer array int *aiStartVisitNode (argument
piVisitNodeInfo of the prototype). Both together
must specify the desired topology in the following way:
The function will be called repeatedly by the som_op unit in order
to visit all nodes in a neighborhood of radius iRadius
of a specified "center node" iCenter. When the som_op
unit wants to visit the nodes of a new neighborhood,
it requests a first node of the new neighborhood with the
call
VisitNode(piArg)
and passes in the first two elements of the integer array piArg[]
iCenter ( =piArg[0] ) and iRadius ( =piArg[1] )
for the new neighborhood.
The remaining elements are the values that were passed in array
aiStartVisitNode to the creation function. [Note that the first
element of aiStartVisitNode is required to specify the number
of elements to follow; this is now the element piArg[2]
that follows after iRadius.]
The piArg[2] elements piArg[3..2+piArg[2]] thus are available
to pass information (such as size parameters) about the
chosen topology to function VisitNode to enable it to do
its task.
As long as unvisited nodes remain in the currently traversed
neighborhood, VisitNode must use this information to return
with each call an int pointer to
an int pair (iNode,iDistance2). The first element iNode must specify the
next node in the currently traversed neighborhood, and
iDistance2 is expected to be the (squared) distance to the current center
node iCenter. When no unvisited node remains,
VisitNode must return NULL (and should set an internal
flag in order to know that the next call will start a new
neighborhood).
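A user-defined topology function following this protocol might look like the sketch below, which implements an open 1d chain. The name VisitChainSketch is hypothetical, and the sketch assumes that the same piArg array is passed on every call and that piArg[3] holds the number of nodes in the chain. It is not the implementation of the predefined VisitNode1d.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a topology function for an open 1d chain.
 * piArg[0] = iCenter, piArg[1] = iRadius, piArg[2] = number of
 * following parameters, piArg[3] = chain length (assumption). */
static int *VisitChainSketch(int *piArg)
{
    static int started = 0;   /* 0: next call starts a new neighborhood */
    static int next, last;    /* remaining node range [next, last] */
    static int pair[2];       /* (iNode, iDistance2) handed to the caller */

    if (!started) {
        int n;
        assert(piArg[2] >= 1);
        n = piArg[3];
        next = piArg[0] - piArg[1];
        last = piArg[0] + piArg[1];
        if (next < 0) next = 0;           /* clip to chain ends */
        if (last > n - 1) last = n - 1;
        started = 1;
    }
    if (next > last) {
        started = 0;                      /* neighborhood exhausted */
        return NULL;
    }
    pair[0] = next;
    pair[1] = (next - piArg[0]) * (next - piArg[0]);
    next++;
    return pair;
}
```

For a chain of 5 nodes, center 1 and radius 2, successive calls yield the pairs (0,1), (1,0), (2,1), (3,4) and then NULL. The static state makes the function non-reentrant, which matches the "internal flag" suggested above but would need rethinking if several maps traverse neighborhoods concurrently.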
PREDEFINED TOPOLOGY SPECIFICATION:
There are a number of predefined functions that can be specified
for VisitNode, if certain simple topologies are desired:
- VisitNode0d:
- no neighborhood (vector quantization)
- VisitNode1d:
- 1d chain
- VisitLoop1d:
- 1d closed loop
- VisitNode2d:
- rectangular 2d mesh
- VisitNodeHex2d:
- as before, but hexagonal neighborhood
- VisitNode3d:
- rectangular 3d mesh
They all expect aiStartVisitNode[0] to be the number iNumDim of
dimensions, and aiStartVisitNode[1..iNumDim] to be
the number of nodes along the iNumDim
dimensions.
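As an illustration of this array format, the sketch below derives the node count of a mesh from such an array. The helper node_count is hypothetical, and it rests on the assumption that for the mesh topologies the iNum passed to the constructor should equal the product of the per-dimension sizes.

```c
#include <assert.h>

/* aiInfo[0] = number of dimensions iNumDim,
 * aiInfo[1..iNumDim] = number of nodes along each dimension.
 * Returns the product of the sizes (assumed to match iNum). */
static int node_count(const int *aiInfo)
{
    int d, n = 1;
    assert(aiInfo[0] >= 1);
    for (d = 1; d <= aiInfo[0]; d++) {
        assert(aiInfo[d] > 0);
        n *= aiInfo[d];
    }
    return n;
}
```

An array {2, 10, 8} would thus describe a 10 x 8 mesh with 80 nodes.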
WEIGHT MATRIX:
The som_op unit accesses the elements in its input field
in_2[iNum*iDim] as the weight values of the self-organizing
map. If this input field is connected with another field
of a matching type and dimension, the som_op unit will free
its current weights and will use the elements in the newly
connected field instead. The new field need not be connected
only to a single som_op unit. Further units may access
the field too, either for display of the current weight values
(at least under Neo, the values in the original input field
of the som_op unit could not be accessed), or for imposing
additional modifications, e.g., if several som_op units
work on a shared weight set.
CONTROL MODES:
The usual initialization modes as defined for the
som2d-unit apply also here (current status of implementation:
NST_I_GAUSS, NST_I_RND, NST_I_UNIFORM, NST_I_BINARY,
NST_I_SIGN).
EXAMPLE:
The standard SOM results when CTL_in(u,0)=-1 (specifying
that a distance measure [instead of a match measure] is
expected from the first auxiliary unit), the first
(distance computing) auxiliary unit is chosen to
be the dif_len unit, and the second (weight change
computing) unit is chosen to be the dif_vec unit.
EPS_in[0] then specifies the usual learning rate,
EPS_in[2] the adaptation radius (sigma*sqrt(2)) of the gaussian
adaptation modulation function, and trunc(1+EPS_in[1]*EPS_in[2])
the cut-off radius of the neighborhood zone from which
at each step nodes are included in the adaptation
computation. If EPS_in[1]=0, the
gaussian is replaced by a flat profile, and a user-defined
profile can be implemented within the second auxiliary
unit.
EFFICIENCY:
Usage of packed fields ( iDim<0, iNum<0)
will lead to faster execution.
STATUS:
alpha.
SEE ALSO:
som2d
FILE
/amnt/loge/users/nistaff02/nistaff/rhaschke/nst7/man/../o.linux//../foldersrc/nst_som.c