NAME

som_op -- operator for self-organizing map algorithm

PROTOTYPE

unitptr som_op( int iDim, int iNum, int *(*VisitNode)(int*), int *piVisitNodeInfo, char *pcOpt, unitptr uHost)

ARGUMENTS

int iDim
dimension of a node
int iNum
number of nodes
int *(*VisitNode)(int*)
function to specify topology (see below)
int *piVisitNodeInfo
data array to specify topology (see below)
char *pcOpt
option string [may be NULL]
unitptr uHost
host unit

RETURN VALUE:

A pointer to the created unit or NULL in the case of an error.

INTERFACE OF CREATED UNIT:

X_in[1]:
(input field 0) auxiliary unit 1 should provide a scalar similarity measure (if CTL_in[0]>0) or a scalar distance measure (if CTL_in[0]<0) between the current input and the weight vector of the current node (at X_out[]) here.
Y_in[iDim]:
(input field 1) auxiliary unit 2 should provide weight change for currently selected node here (packed field, if iDim<0).
Z_in[iDim*iNum]:
(packed input field 2) weight vectors for the iNum nodes, concatenated into one large array.
EPS_in[3]:
(input field 3) element 0: learning rate; element 1: factor by which the current adaptation radius is multiplied to obtain the cut-off radius of the adapted neighborhood zone; element 2: adaptation radius (default = sigma*sqrt(2) of the Gaussian adaptation profile, see below)
CTL_in[1]:
(control field) no operation, if 0; otherwise, sign determines whether auxiliary unit 1 must provide similarity (for positive sign) or distance measure (for negative sign). Default value = -1, i.e., expect distance measure.
X_out[iDim]:
(output field 0) weight vector of best match node found during exec_unit. Packed field, if iDim<0.
Y_out[1]:
(output field 1) index of best match node.
Z_out[iDim]:
(output field 2) weight vector of node selected for current iteration during exec/adapt. Packed field, if iDim<0.
out_3[1]:
(output field 3) index of node selected for current iteration during exec/adapt.
out_4[1]:
(output field 4) (squared) distance between selected and best match node.
out_5[iNum]:
(output field 5) similarity/distance values for all nodes as computed during last exec call. (packed, if iNum<0)

SYNOPSIS:

The som_op unit makes it possible to execute and/or adapt self-organizing maps with arbitrary topology, distance metric and learning rule. This is achieved by implementing the som_op unit as an "operator unit" that provides only the control structure for a general self-organizing map. To implement a complete self-organizing map, the user must connect the som_op operator unit with two additional "operand units": one (the "distance metric unit", DMU) for implementing the distance metric and the other (the "adaptation step unit", ASU) for implementing the adaptation step of the self-organizing map. Finally, the desired topology is specified in the form of a selector function that must be passed (together with an initialization array) as a parameter to the constructor function.

DESCRIPTION:

The som_op unit uses an array of iNum*iDim float elements as the weight set of a self-organizing map of iNum nodes, each with an iDim-dimensional weight vector. The default weight set consists of the variables of packed input field 2. By connecting this field to the output of another unit, this default weight set can be replaced and at the same time be made read-accessible to other units (this even allows one and the same weight set to be shared among different som_op units). To obtain a self-organizing map, the som_op unit must be connected to two auxiliary units (see below).
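
The concatenated layout can be pictured with the following minimal C sketch. It assumes that node k occupies elements k*iDim to k*iDim+iDim-1 of the array (the text only states that the vectors are concatenated); the helper name node_weights is hypothetical and not part of NST.

```c
#include <stddef.h>

/* Hypothetical helper (not part of NST): return a pointer to the weight
 * vector of node iNode inside a concatenated weight array.
 * Assumption: node k occupies pfWeights[k*iDim .. k*iDim + iDim - 1]. */
static float *node_weights(float *pfWeights, int iDim, int iNode)
{
    return pfWeights + (size_t)iNode * (size_t)iDim;
}
```

With iDim=2, for example, the second component of node 2 would be node_weights(w, 2, 2)[1], i.e., element 5 of the array.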

EXECUTION:

When the som_op unit is executed, it will iterate the first auxiliary unit over all nodes, providing at each iteration the weight vector of the current node to the auxiliary unit and expecting back the distance (or overlap measure) between the input and the passed weight vector. When all iterations have been completed, the som_op unit will set its first two output fields to the weight vector of the best match node and its node index.

ADAPTATION:

When the som_op unit is adapted, it will iterate the second auxiliary unit over the subset of nodes that lie within a neighborhood of the best match node selected in the previous exec_unit step. The cut-off radius of this neighborhood is given by trunc(1+EPS(u,1)*EPS(u,2)), i.e., this is the cut-off distance passed to routine VisitNode(). VisitNode() is free to interpret this cut-off radius in any way; however, the expectation is that VisitNode() successively returns all node numbers for which the distance to the best match node is less than or equal to this distance. Note also that the distance return value of VisitNode() is used by the som_op unit as if it were a squared distance measure. This value is also made available at output field 4 of the som_op unit, so that it can be used by the auxiliary unit for computing the weight change. Again, at each iteration the som_op unit will pass the weight vector of the currently iterated node to the auxiliary unit, this time expecting back from the auxiliary unit the required weight change for this node. The weight change returned by the auxiliary unit is multiplied internally by the som_op unit by a factor e0*exp(-d2/(e2*e2)), where d2 is the (squared) distance to the current best match node (as computed by VisitNode()), and ei=EPS_in(u,i), i=0,1,2. This multiplication is usually convenient, since it relieves the provider of the adaptation auxiliary unit of this task. If a different profile is desired, set EPS_in(u,1)=0. This switches the Gaussian factor off and chooses a flat profile within the cut-off radius.
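
As a small worked illustration of the cut-off formula, the sketch below evaluates trunc(1+e1*e2) in C. The function name cutoff_radius is hypothetical; it only mirrors the formula quoted above and is not taken from the NST sources.

```c
/* Cut-off distance as described in the text: trunc(1 + e1*e2), where e1 and
 * e2 stand for EPS_in[1] and EPS_in[2]. Hypothetical helper, not NST code.
 * The float-to-int cast truncates toward zero. */
static int cutoff_radius(float e1, float e2)
{
    return (int)(1.0f + e1 * e2);
}
```

For e1=2 and e2=1.5 this yields a cut-off of 4; for e1=0 (flat profile) it yields 1 regardless of e2.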

OPERAND UNIT FOR DISTANCE COMPUTATION:

This is the first auxiliary unit for the som_op unit. It will be called iNum times during each execution step of the som_op unit. At the i-th call, the som_op unit will provide at its outputs out_2[iDim] and out_3[1] the weight vector and the index of one of the iNum nodes. This information, together with the current input vector x, can be fed to the distance computation unit, which enables it to compute a distance or similarity measure between x and the weight vector. The result must be fed to input in_0 of the som_op unit. After the current exec call has processed all iNum nodes, the som_op unit will place in out_0[iDim] and out_1[1] the weight vector and the index of the best match node that was found. If CTL_in[0]>0, the best match node is taken as the node for which in_0 has received the largest value from the distance computation unit (i.e., the distance computation unit should compute a similarity measure, such as a scalar product). If CTL_in[0]<0, the best match node is taken as the node for which in_0 has received the smallest value from the distance computation unit (i.e., the distance computation unit should compute a difference measure, such as the Euclidean distance). The distance computation unit is recognized by the som_op unit through its connection with in_0 of the som_op unit. The distance computation unit must be in the same subunit as the som_op unit.
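
The selection rule can be summarized by the following C sketch. It is not the NST implementation: the function name best_match is hypothetical, the array pfMeasure stands for the values received at in_0 for the individual nodes, and keeping the first node in case of ties is an assumption that the text does not address.

```c
/* Sketch of the best-match selection described above (not NST code).
 * pfMeasure[k] stands for the value received at in_0 for node k,
 * fCtl stands for CTL_in[0].
 * Returns the index of the best match node, or -1 if fCtl == 0
 * (no operation) or iNum <= 0. Ties keep the first node (assumption). */
static int best_match(const float *pfMeasure, int iNum, float fCtl)
{
    int iBest = 0;

    if (fCtl == 0.0f || iNum <= 0)
        return -1;
    for (int k = 1; k < iNum; k++) {
        int bBetter = (fCtl > 0.0f)
            ? (pfMeasure[k] > pfMeasure[iBest])   /* similarity: largest wins */
            : (pfMeasure[k] < pfMeasure[iBest]);  /* distance: smallest wins */
        if (bBetter)
            iBest = k;
    }
    return iBest;
}
```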

OPERAND UNIT FOR ADAPTATION STEP COMPUTATION:

This is the second auxiliary unit for the som_op unit. It will be called iNeighbors times during each adaptation step of the som_op unit. iNeighbors is the number of nodes in the neighborhood region that is centered around the best match node found in the preceding exec_unit call and that needs adaptation (all nodes outside this region will remain unaffected). At the i-th call, the som_op unit will provide at its outputs out_2[iDim] and out_3[1] the weight vector and the index of one of the iNeighbors nodes in the neighborhood. This information, together with the current input vector x, can be fed to the adaptation step unit, which enables it to compute a delta for the current weight vector. The result u must be fed to input in_1 of the som_op unit and will lead to the following update of the weight vector w[] of the currently iterated node:

   w_new[i] = w_old[i] + e0*exp(-d2/(e2*e2))*u[i]

Here, e0, e1, e2, d2 and u[i] are shorthands for the values at EPS_in[0], EPS_in[1], EPS_in[2], out_4[0] and Y_in[i], and i ranges from 0 to iDim-1. This enforces a Gaussian adaptation profile (which is adequate for many cases). If a different profile is desired, set e1=0. In this case, the above update equation will be replaced by the simpler

   w_new[i] = w_old[i] + e0*u[i]

Since u[i] is provided by the second auxiliary unit, any desired adaptation profile can now be implemented therein. After the current adapt_unit call has processed all iNeighbors nodes in the current neighborhood in this way, the adaptation step of the som_op unit is completed.

TOPOLOGY SPECIFICATION:

In order to allow a flexible and still reasonably efficient specification of arbitrary topologies, the following procedure has been adopted: The constructor function is passed the address of a user-specified function of type

   int *VisitNode(int*)

and an integer array int *aiStartVisitNode (the constructor argument piVisitNodeInfo). Both together must specify the desired topology in the following way: The function will be called repeatedly by the som_op unit in order to visit all nodes in a neighborhood of radius iRadius of a specified "center node" iCenter. When the som_op unit wants to visit the nodes of a new neighborhood, it requests a first node of the new neighborhood with the call

   VisitNode(piArg)

and passes iCenter (=piArg[0]) and iRadius (=piArg[1]) for the new neighborhood in the first two elements of the integer array piArg[]. The remaining elements are the values that were passed in array aiStartVisitNode to the creation function. [Note that the first element of aiStartVisitNode is required to specify the number of elements to follow; this is now the element piArg[2] that follows iRadius.] The piArg[2] elements piArg[3..2+piArg[2]] are thus available to pass information (such as size parameters) about the chosen topology to function VisitNode to enable it to do its task. As long as unvisited nodes remain in the currently traversed neighborhood, VisitNode must use this information to return with each call an int pointer to an int pair (iNode,iDistance2). The first element iNode must specify the new node in the currently traversed neighborhood, and iDistance2 is expected to be the (squared) distance to the current center node iCenter. When no unvisited node remains, VisitNode must return NULL (and should set an internal flag in order to know that the next call will start a new neighborhood).
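
To make the calling protocol concrete, here is a minimal C sketch of a user-written selector function for a 1d chain. It is not the predefined VisitNode1d; the name my_visit_chain is hypothetical, the chain length is assumed to be passed as the single info element piArg[3], and the visiting order (ascending node index) is an arbitrary choice, since the text does not prescribe one.

```c
#include <stddef.h>

/* Hypothetical VisitNode-style function for a 1d chain (a sketch, not the
 * predefined VisitNode1d). Assumed argument layout, following the text:
 *   piArg[0] = iCenter, piArg[1] = iRadius,
 *   piArg[2] = number of info elements (here 1), piArg[3] = chain length. */
static int *my_visit_chain(int *piArg)
{
    static int iNext;        /* next candidate node */
    static int bActive = 0;  /* 0: the next call starts a new neighborhood */
    static int aiResult[2];  /* (iNode, iDistance2) handed to the caller */
    int iCenter = piArg[0], iRadius = piArg[1], iLen = piArg[3];
    int iLast = iCenter + iRadius;

    if (!bActive) {          /* first call for this neighborhood */
        iNext = iCenter - iRadius;
        if (iNext < 0)
            iNext = 0;       /* clip at the start of the chain */
        bActive = 1;
    }
    if (iLast > iLen - 1)
        iLast = iLen - 1;    /* clip at the end of the chain */
    if (iNext > iLast) {     /* neighborhood exhausted */
        bActive = 0;
        return NULL;
    }
    aiResult[0] = iNext;
    aiResult[1] = (iNext - iCenter) * (iNext - iCenter);
    iNext++;
    return aiResult;
}
```

A caller would invoke the function repeatedly with the same piArg until it returns NULL. For iCenter=1, iRadius=2 and a chain of 5 nodes, the sketch yields the pairs (0,1), (1,0), (2,1), (3,4) and then NULL. Because the state is kept in static variables, the sketch is neither reentrant nor usable by two som_op units at the same time.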

PREDEFINED TOPOLOGY SPECIFICATION:

A number of predefined functions can be specified for VisitNode if certain simple topologies are desired:
VisitNode0d:
no neighborhood (vector quantization)
VisitNode1d:
1d chain
VisitLoop1d:
1d closed loop
VisitNode2d:
rectangular 2d mesh
VisitNodeHex2d:
as before, but hexagonal neighborhood
VisitNode3d:
rectangular 3d mesh
They all expect aiStartVisitNode[0] to be the number iNumDim of dimensions, and aiStartVisitNode[1..iNumDim] to be the number of nodes along the iNumDim dimensions.
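
As an illustration of this array layout, the following C sketch sets up the info array for a 10 x 8 rectangular mesh. The sizes, the array name aiMesh and the helper mesh_node_count are made up for the example; that iNum should equal the product of the per-dimension sizes is a plausible assumption, not something the text states.

```c
/* Info array for a 2d mesh of 10 x 8 nodes (illustrative sizes):
 * element 0 is iNumDim, followed by the number of nodes per dimension. */
static int aiMesh[] = { 2, 10, 8 };

/* Hypothetical helper: total number of nodes described by such an array.
 * Assumption: this product is the value to pass as iNum. */
static int mesh_node_count(const int *aiInfo)
{
    int iCount = 1;

    for (int d = 1; d <= aiInfo[0]; d++)
        iCount *= aiInfo[d];
    return iCount;
}
```

Following the prototype at the top of this page, a constructor call for 3-dimensional weight vectors would then presumably read som_op(3, mesh_node_count(aiMesh), VisitNode2d, aiMesh, NULL, uHost).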

WEIGHT MATRIX:

The som_op unit accesses the elements in its input field in_2[iNum*iDim] as the weight values of the self-organizing map. If this input field is connected with another field of a matching type and dimension, the som_op unit will free its current weights and will use the elements in the newly connected field instead. The new field need not be connected to only a single som_op unit. Further units may access the field too, either for display of the current weight values (at least under Neo, the values in the original input field of the som_op unit could not be accessed), or for imposing additional modifications, e.g., if several som_op units work on a shared weight set.

CONTROL MODES:

The usual initialization modes as defined for the som2d unit also apply here (current status of implementation: NST_I_GAUSS, NST_I_RND, NST_I_UNIFORM, NST_I_BINARY, NST_I_SIGN).

EXAMPLE:

The standard SOM results when CTL_in(u,0)=-1 (specifying that a distance measure [instead of a match measure] is expected from the first auxiliary unit), the first (distance computing) auxiliary unit is chosen to be the dif_len unit, and the second (weight change computing) auxiliary unit is chosen to be the dif_vec unit. EPS_in[0] then specifies the usual learning rate, EPS_in[2] the adaptation radius (sigma*sqrt(2) of the Gaussian adaptation modulation function), and EPS_in[1]*EPS_in[2] the radius of the neighborhood zone from which at each step nodes are included in the adaptation computation. If EPS_in[1]=0, the Gaussian is replaced by a flat profile, and a user-defined profile can be implemented within the second auxiliary unit.

EFFICIENCY:

Usage of packed fields (iDim<0, iNum<0) will lead to faster execution.

STATUS:

alpha.

SEE ALSO:

som2d

FILE

/amnt/loge/users/nistaff02/nistaff/rhaschke/nst7/man/../o.linux//../foldersrc/nst_som.c