NAME

cluster -- create unit to cluster a number of objects

PROTOTYPE

unitptr cluster(int iOps, int iNrPts, int iDataDim, int iExitMode, float fExitVal, float fMink, int iMode, char * pcFmt, unitptr uHost)

ARGUMENTS

int iOps
number of operands
int iNrPts
number of objects to be clustered
int iDataDim
dimension of one object
int iExitMode
stopping mode, one of: 0 (number of clusters) or 1 (maximum distance)
float fExitVal
stopping value: num. of clusters or max dist
float fMink
Minkowski exponent m (must be != 0)
int iMode
method used for clustering (0 to 6)
char * pcFmt
further options
unitptr uHost
host unit

RETURN VALUE:

A pointer to the created unit or NULL in the case of an error.

INTERFACE OF CREATED UNIT:

X_in[iNrPts*iDataDim]:
packed input field holding iNrPts objects with dimension iDataDim in the following order: (point1-dim1, point1-dim2, .., point1-dimN, .., pointM-dim1, pointM-dim2, .., pointM-dimN)
Y_in[2]:
input field receiving iExitMode at the first pin (0 = number of clusters, 1 = maximum distance). This value defines the criterion for stopping the clustering process. The second pin receives the value for the stopping criterion (fExitVal = number of clusters or maximum distance).
Z_in[2]:
input field receiving at the first pin the Minkowski m (fMink) used for distance calculations. At the second pin it receives the method used for clustering (iMode = 0 single linkage, 1 complete linkage, 2 group average, 3 weighted group average, 4 centroid, 5 median, 6 Ward).
CTL_in[1]:
control pin: enables (1) or disables (0) execution of the unit
X_out[iNrPts]:
packed output field holding a list of cluster labels after clustering. The following order is used: (clusterlabel-point1, .., clusterlabel-pointM) with clusterlabel in [0..#clusters-1]
Y_out[3]:
output field holding the number iNrPts of objects at the first pin, the number of clusters remaining after clustering at the second pin, and the maximum distance that occurred (the distance at which the last cluster was joined) at the third pin.
Z_out[2]:
output field holding the number iNrPts of objects at the first pin and, at the second pin, the maximum distance that occurs when clustering continues until only one cluster remains (used in operator mode only, e.g. for rescaling the dendrogram output window).
out_3[8]:
output field holding 4 x,y-coordinates used for drawing the dendrogram. These points form a horseshoe made of two points and the node where they are joined. While cycling in operator mode, the tree is built up in postorder tree traversal.
out_4[.out_4_0 , out_4_1]:
combined output field consisting of two subfields that hold features of the left object of the horseshoe (only valid during execution of operands!). The first field, out_4_0[iDataDim], holds the object features themselves. The second field, out_4_1[4], holds the x-coordinate of the object in the dendrogram (1st pin), its y-coordinate (2nd pin), the object's cluster label (3rd pin) and the object number (4th pin).
out_5[.out_5_0 , out_5_1]:
combined output field consisting of two subfields that hold features of the right object of the horseshoe (only valid during execution of operands!). The first field, out_5_0[iDataDim], holds the object features themselves. The second field, out_5_1[4], holds the x-coordinate of the object in the dendrogram (1st pin), its y-coordinate (2nd pin), the object's cluster label (3rd pin) and the object number (4th pin).
out_6[1]:
output field holding a control output (0 in non-operator mode, 1 in operator mode). In operator mode this flag is set to zero after the first execution cycle of the iOps operands and set back to 1 after the last cycle (useful, e.g., for rescaling graphical output).
The values of the fields Y_in and Z_in are initialized when the unit is created but can be overwritten by wired inputs. The fields Z_out, out_3, out_4 and out_5 are used in operator mode only.
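X_in is packed point by point, so dimension d of point i is stored at index i*iDataDim + d. The sketch below shows how a caller might fill such a buffer and evaluate the label list read back from X_out. The helper names are hypothetical, and storing field values as float is an assumption about the NST field type.

```c
#include <stddef.h>

/* Position of dimension d of point i in the packed X_in field:
 * (point1-dim1, .., point1-dimN, point2-dim1, .., pointM-dimN). */
static size_t xin_index(size_t i, size_t d, size_t data_dim)
{
    return i * data_dim + d;
}

/* Packs nr_pts rows of data_dim features into one buffer of length
 * nr_pts * data_dim, ready to be copied into X_in. */
void pack_points(const float *const *rows, size_t nr_pts, size_t data_dim,
                 float *packed)
{
    for (size_t i = 0; i < nr_pts; ++i)
        for (size_t d = 0; d < data_dim; ++d)
            packed[xin_index(i, d, data_dim)] = rows[i][d];
}

/* Counts the members of each cluster from the X_out label list,
 * where every label lies in [0, n_clusters - 1]. */
void cluster_sizes(const float *labels, size_t nr_pts, size_t n_clusters,
                   size_t *sizes)
{
    for (size_t c = 0; c < n_clusters; ++c)
        sizes[c] = 0;
    for (size_t i = 0; i < nr_pts; ++i)
        sizes[(size_t)labels[i]]++;
}
```

In this sketch, n_clusters would be the value read from the second pin of Y_out.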

EXECUTION OF CREATED UNIT:

The unit can work in two modes. In normal mode (pcFmt == NULL) the unit clusters the given objects and then returns the label list. In operator mode (pcFmt == "%I") the objects are clustered first, and then all operands are executed (iNrPts - 1) times until the whole dendrogram has been output.
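Each operand cycle corresponds to one internal node of the dendrogram, and a tree with n leaves has n - 1 internal nodes. The following sketch illustrates such a postorder walk. It uses a hypothetical merge list in which merge m creates internal node n + m. Leaves are placed at x = 0, 1, 2, ... in traversal order, and each parent sits at the midpoint of its children at the height of its merge. For every internal node the sketch writes one horseshoe of four x,y points (left child, left corner, right corner, right child). This ordering of the 8 values is an assumption, not the documented layout of out_3[].

```c
/* Hypothetical merge record: merge m creates internal node n + m by
 * joining children left and right (leaves 0..n-1) at distance height. */
struct merge { int left, right; double height; };

/* Postorder placement: both subtrees first, then the node itself.
 * Appends one 8-value horseshoe per internal node to out. */
static void place(int node, int n, const struct merge *m, double *x, double *y,
                  int *next_leaf, double *out, int *count)
{
    if (node < n) {                        /* leaf: next free slot on x-axis */
        x[node] = (*next_leaf)++;
        y[node] = 0.0;
        return;
    }
    const struct merge *mg = &m[node - n];
    place(mg->left, n, m, x, y, next_leaf, out, count);   /* left subtree  */
    place(mg->right, n, m, x, y, next_leaf, out, count);  /* right subtree */
    x[node] = 0.5 * (x[mg->left] + x[mg->right]);
    y[node] = mg->height;
    double *hs = out + 8 * (*count)++;     /* horseshoe: 4 (x, y) pairs */
    hs[0] = x[mg->left];  hs[1] = y[mg->left];
    hs[2] = x[mg->left];  hs[3] = y[node];
    hs[4] = x[mg->right]; hs[5] = y[node];
    hs[6] = x[mg->right]; hs[7] = y[mg->right];
}

/* Walks a dendrogram of n leaves whose root is node 2n - 2; x and y need
 * 2n - 1 entries, out needs 8 * (n - 1). Returns the horseshoe count. */
int dendrogram_postorder(int n, const struct merge *m, double *x, double *y,
                         double *out)
{
    int next_leaf = 0, count = 0;
    if (n > 1)
        place(2 * n - 2, n, m, x, y, &next_leaf, out, &count);
    return count;
}
```

For four objects the walk emits exactly three horseshoes, one per operand cycle. The root's horseshoe comes last, as postorder requires.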

INITIALIZATION:

No initialization is required. If pcFmt is NULL, the unit works in normal mode.

DEFAULTS:

Invalid parameters result in NST warnings. In addition, an invalid fMink or iMode is replaced: fMink by the Euclidean value (2) and iMode by single linkage (0). If the unit is to work in operator mode (pcFmt == "%I") and the number of operands is invalid (iOps <= 0), iOps is set to 1.
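The fallback rules above can be sketched as follows. The struct cluster_params and the helper functions are hypothetical. Three further points are assumptions: fMink and iMode are corrected independently of each other, operator mode is detected with strstr(), and warnings go to stderr. The real unit reports problems as NST warnings instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical grouping of the creation parameters of cluster(). */
struct cluster_params {
    int ops;          /* iOps  */
    float mink;       /* fMink */
    int mode;         /* iMode */
    const char *fmt;  /* pcFmt */
};

/* Assumption: operator mode is selected whenever pcFmt contains "%I". */
static int is_operator_mode(const char *fmt)
{
    return fmt != NULL && strstr(fmt, "%I") != NULL;
}

/* Applies the fallbacks from DEFAULTS, printing a warning for each one;
 * returns the number of corrected parameters. */
int apply_defaults(struct cluster_params *p)
{
    int fixed = 0;
    if (p->mink == 0.0f) {
        fprintf(stderr, "warning: fMink must be != 0, using 2 (Euclidean)\n");
        p->mink = 2.0f;
        ++fixed;
    }
    if (p->mode < 0 || p->mode > 6) {
        fprintf(stderr, "warning: iMode must be in 0..6, using 0 (single linkage)\n");
        p->mode = 0;
        ++fixed;
    }
    if (is_operator_mode(p->fmt) && p->ops <= 0) {
        fprintf(stderr, "warning: iOps <= 0 in operator mode, using 1\n");
        p->ops = 1;
        ++fixed;
    }
    return fixed;
}
```

In normal mode (fmt == NULL) the sketch leaves a non-positive iOps untouched, because the operands are never executed in that mode.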

DESCRIPTION:

The unit performs hierarchical clustering. It not only labels each object with its cluster membership but also provides the dendrogram for graphical visualization of the objects' topology.

The user can choose between two modes to stop the clustering process. With iExitMode == 0, clustering stops when fExitVal clusters remain. With iExitMode == 1, clustering stops when the distance between the next two clusters to be merged is greater than fExitVal. Distances between objects are calculated with the Minkowski distance formula using the parameter fMink (fMink = 1 gives the city-block distance, fMink = 2 the Euclidean distance, and so on). The available clustering methods are: iMode = 0 single linkage, 1 complete linkage, 2 group average, 3 weighted group average, 4 centroid, 5 median, 6 Ward. The implementation follows a recursive calculation scheme documented in the source.

If the unit has been started in operator mode, the clustering is first performed normally, providing a label for each of the iNrPts objects. Afterwards the unit executes its iOps operands iNrPts-1 times and provides information about the dendrogram (binary) tree at its output fields. The tree is output in postorder. Each execution step shows a ''horseshoe'' made of two objects and their parent node. The 8 values provided at output field out_3[] can be used for visualization, e.g. with a draw-sym unit.

Another useful output is the last output pin of the unit, which is normally zero but is set to one in operator mode. This pin is one when the unit executes its operands for the first time. It is then set to zero until all operand executions are done, which is useful for rescaling, e.g., a plot-xy unit using the values at output field Z_out[] (number of objects, maximum distance). After all executions the last output pin is set to one again.
Further information is provided during the operator phase at the two output fields out_4[] and out_5[]. These fields hold information about the current two nodes (features, x,y position, label and number of the object), which can be drawn in an output window with a draw-str unit.
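The source is said to use a recursive calculation scheme. One common scheme of that kind is the Lance-Williams update, which computes the distance from every cluster k to a newly merged cluster (ij) from d(k,i), d(k,j) and d(i,j). The sketch below assumes that scheme and maps iMode to the textbook coefficients. It is a naive O(n^3) implementation with both stopping rules and is not taken from nst_dm.c. The function names are hypothetical. Note that the centroid, median and Ward coefficients are normally meant for squared Euclidean distances.

```c
#include <math.h>
#include <stdlib.h>

/* Assumption: iMode maps to the textbook Lance-Williams coefficients for
 * merging clusters i and j (sizes ni, nj), seen from cluster k (size nk). */
static void lw_coeffs(int method, double ni, double nj, double nk,
                      double *ai, double *aj, double *b, double *g)
{
    double s = ni + nj;
    *ai = 0.5; *aj = 0.5; *b = 0.0; *g = 0.0;
    switch (method) {
    case 0: *g = -0.5; break;                          /* single linkage   */
    case 1: *g = 0.5; break;                           /* complete linkage */
    case 2: *ai = ni / s; *aj = nj / s; break;         /* group average    */
    case 3: break;                                     /* weighted average */
    case 4: *ai = ni / s; *aj = nj / s;                /* centroid         */
            *b = -ni * nj / (s * s); break;
    case 5: *b = -0.25; break;                         /* median           */
    case 6: *ai = (ni + nk) / (s + nk);                /* Ward             */
            *aj = (nj + nk) / (s + nk);
            *b = -nk / (s + nk); break;
    }
}

/* Clusters n objects given a symmetric n x n distance matrix d (modified
 * in place). exit_mode 0: stop when exit_val clusters remain; exit_mode 1:
 * stop when the next merge distance exceeds exit_val. Writes labels in
 * [0, k-1], stores the last merge distance in *max_dist, returns k. */
int agglomerate(double *d, int n, int method, int exit_mode,
                double exit_val, int *labels, double *max_dist)
{
    int *owner = malloc(n * sizeof *owner);   /* surviving cluster of point */
    int *size = malloc(n * sizeof *size);     /* 0 marks an absorbed cluster */
    int clusters = n;
    *max_dist = 0.0;
    for (int i = 0; i < n; ++i) { owner[i] = i; size[i] = 1; }

    while (clusters > 1) {
        if (exit_mode == 0 && clusters <= (int)exit_val) break;
        int bi = -1, bj = -1;
        double best = INFINITY;
        for (int i = 0; i < n; ++i)            /* closest pair of clusters */
            for (int j = i + 1; j < n; ++j)
                if (size[i] && size[j] && d[i * n + j] < best) {
                    best = d[i * n + j]; bi = i; bj = j;
                }
        if (bi < 0 || (exit_mode == 1 && best > exit_val)) break;

        for (int k = 0; k < n; ++k) {          /* Lance-Williams update */
            if (!size[k] || k == bi || k == bj) continue;
            double ai, aj, b, g;
            lw_coeffs(method, size[bi], size[bj], size[k], &ai, &aj, &b, &g);
            double dki = d[k * n + bi], dkj = d[k * n + bj];
            double nd = ai * dki + aj * dkj + b * best + g * fabs(dki - dkj);
            d[k * n + bi] = d[bi * n + k] = nd;
        }
        size[bi] += size[bj];
        size[bj] = 0;                          /* cluster bj merged into bi */
        for (int p = 0; p < n; ++p)
            if (owner[p] == bj) owner[p] = bi;
        *max_dist = best;
        --clusters;
    }

    int *map = malloc(n * sizeof *map);        /* relabel to 0..clusters-1 */
    int next = 0;
    for (int i = 0; i < n; ++i) map[i] = -1;
    for (int p = 0; p < n; ++p) {
        if (map[owner[p]] < 0) map[owner[p]] = next++;
        labels[p] = map[owner[p]];
    }
    free(map); free(size); free(owner);
    return clusters;
}
```

Consider the one-dimensional points 0, 1, 5 and 6 with single linkage, stopping at 2 clusters. The sketch merges {0, 1} and {5, 6} at distance 1 and returns the labels 0, 0, 1, 1. Because the update works only on the distance matrix, the point features are never revisited after the initial distance computation.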

OPTIONS:

The option string pcFmt allows customizing the behaviour of the cluster unit. The following options are provided:
%I
unit works in operator mode: it first clusters the objects and then cycles through all iOps operands until the whole dendrogram has been output.

CONTROL MODES:

Calling ctrl_unit(iMode,u) with iMode set to one of the following values performs further operations:
NST_RESET
clear all output fields (set all values to zero!)

EXAMPLES:

There is a small example at /vol/nst/man/examples/cluster.NST.gz that clusters the points in cluster_objects.dat. The user can choose the clustering method, and the dendrogram is then visualized in a plot-xy unit.

STATUS:

Preliminary.

SEE ALSO:

FILE

nst_dm.c