NAME
cluster -- create unit to cluster a number of objects
PROTOTYPE
unitptr cluster(int iOps, int iNrPts, int iDataDim, int iExitMode, float fExitVal, float fMink, int iMode, char * pcFmt, unitptr uHost)
ARGUMENTS
- int iOps
- number of operands
- int iNrPts
- number of objects to be clustered
- int iDataDim
- dimension of one object
- int iExitMode
- stopping mode, one of: 0 (number of clusters) or 1 (max. distance)
- float fExitVal
- stopping value: num. of clusters or max dist
- float fMink
- Minkowski exponent m (must be != 0)
- int iMode
- method used for clustering (0 to 6)
- char * pcFmt
- further options
- unitptr uHost
- host unit
RETURN VALUE:
A pointer to the created unit or NULL in the case of an error.
INTERFACE OF CREATED UNIT:
- X_in[iNrPts*iDataDim]:
- packed input field holding iNrPts objects
with dimension iDataDim in the following order: (point1-dim1,
point1-dim2, .., point1-dimN, .., pointM-dim1, pointM-dim2, ..,
pointM-dimN)
- Y_in[2]:
- input field receiving iExitMode at the first pin
(0 = number of clusters, 1 = max. distance). This value defines the
criterion for stopping the cluster process. The second pin receives the
value for the stopping criterion (fExitVal = number of clusters or max.
distance).
- Z_in[2]:
- input field receiving at the first pin the Minkowski m
(fMink) used for distance calculations. At the second pin it receives
the method used for clustering (iMode = 0 single linkage, 1 complete
linkage, 2 group average, 3 weighted group average, 4 centroid, 5
median, 6 Ward).
- CTL_in[1]:
- control pin: enables (1) or disables (0) execution of the unit
- X_out[iNrPts]:
- packed output field holding a list of cluster labels
after clustering. The following order is used: (clusterlabel-point1,
.., clusterlabel-pointM) with clusterlabel in [0..#clusters-1]
- Y_out[3]:
- output field holding at the first pin the number iNrPts
of objects, at the second pin the number of clusters remaining after
clustering, at the third pin the maximum distance that occurred
(distance at which the last cluster was joined).
- Z_out[2]:
- output field holding at the first pin the number iNrPts
of objects, at the second pin the maximum distance that occurs when
clustering continues until only one cluster remains (only used in
operator mode, e.g. for rescaling the dendrogram output window).
- out_3[8]:
- output field holding 4 x,y-coordinates used for output of
the dendrogram. These points form a horseshoe consisting of
two points and the node where they are joined. During cycling in
operator mode the tree is built up in postorder tree traversal.
- out_4[.out_4_0 , out_4_1]:
- combined output field consisting of two
subfields holding features of the left object of the horseshoe (only
valid during execution of operands!): the first field out_4_0[iDataDim]
holds the object features themselves. The second field out_4_1[4] holds:
x-coordinate of object in dendrogram (1. pin), y-coordinate (2. pin),
object cluster label (3. pin), object number (4. pin)
- out_5[.out_5_0 , out_5_1]:
- combined output field consisting of two
subfields holding features of the right object of the horseshoe (only
valid during execution of operands!): the first field out_5_0[iDataDim]
holds the object features themselves. The second field out_5_1[4] holds:
x-coordinate of object in dendrogram (1. pin), y-coordinate (2. pin),
object cluster label (3. pin), object number (4. pin)
- out_6[1]:
- output field holding a control output (0 in non-operator
mode, 1 in operator mode). In operator mode this flag is set to zero
after the first execution cycle of the iOps operands. After the last
cycle it is set back to 1 (useful, e.g., for rescaling graphical
output).
The values of fields Y_in and Z_in are initialized after creation of
the unit but can be overwritten with wired inputs. The fields Z_out,
out_2, out_3, out_4 are used in operator mode only.
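The packed layout of X_in described above can be illustrated with a
small accessor. This is only a sketch: packed_get is a hypothetical
helper, not part of NST, and it assumes zero-based object and dimension
indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of NST): return feature `dim` of object
 * `pt` from a packed field laid out as
 * (point1-dim1, .., point1-dimN, .., pointM-dim1, .., pointM-dimN).
 * Both indices are zero-based here. */
static float packed_get(const float *x_in, int iNrPts, int iDataDim,
                        int pt, int dim)
{
    assert(pt >= 0 && pt < iNrPts);
    assert(dim >= 0 && dim < iDataDim);
    /* All features of one object are stored contiguously. */
    return x_in[(size_t)pt * (size_t)iDataDim + (size_t)dim];
}
```

With iNrPts = 2 and iDataDim = 3, the third feature of the second
object is therefore the sixth value in the field.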
EXECUTION OF CREATED UNIT:
The unit can work in two modes. In normal mode (pcFmt == NULL) the
unit clusters the given objects and then returns with the label
list. In operator mode (pcFmt == %I) all operands are executed
(iNrPts - 1) times until the dendrogram is printed out.
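Why operator mode needs one pass per join can be seen from a postorder
walk over a binary merge tree: a tree with N leaves has N-1 inner
nodes, and each inner node corresponds to one horseshoe. The sketch
below is an illustration only; the Node type and postorder_joins are
hypothetical and do not reflect the data structures used in nst_dm.c.

```c
#include <stddef.h>

/* Hypothetical dendrogram node: leaves have left == right == NULL. */
typedef struct Node {
    struct Node *left;
    struct Node *right;
    int id;
} Node;

/* Visit the inner nodes (joins) in postorder and store their ids in
 * out[], starting at index `count`. Returns the new number of entries.
 * Each stored entry stands for one horseshoe, i.e. one operand cycle. */
static int postorder_joins(const Node *n, int *out, int count)
{
    if (n == NULL || n->left == NULL || n->right == NULL)
        return count;                   /* leaf: nothing to draw */
    count = postorder_joins(n->left, out, count);
    count = postorder_joins(n->right, out, count);
    out[count] = n->id;                 /* children first, then the join */
    return count + 1;
}
```

For three objects where the first two are joined first and the result
is then joined with the third, the walk yields two joins, with the
inner join before the root.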
INITIALIZATION:
No initializations, but if pcFmt is NULL the unit runs in normal mode.
DEFAULTS:
Wrong parameters result in nst warnings! Only if a wrong fMink or
iMode is given, fMink is set to the Euclidean length (2) and iMode is
set to single linkage (0). If the unit should work in operator mode
(pcFmt == %I) and the number of operands is wrong (iOps <= 0),
iOps is set to 1.
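The fallback rules above might be sketched as follows. ClusterParams
and apply_defaults are hypothetical names used for illustration only.
The sketch also assumes that fMink and iMode are corrected
independently of each other, which the text above does not state
explicitly.

```c
/* Hypothetical parameter record mirroring the documented arguments. */
typedef struct {
    int   iOps;           /* number of operands */
    float fMink;          /* Minkowski m, must be != 0 */
    int   iMode;          /* clustering method, 0 to 6 */
    int   operator_mode;  /* nonzero if pcFmt selects %I */
} ClusterParams;

/* Replace invalid values by the documented defaults.
 * Assumption: each parameter is checked on its own. */
static void apply_defaults(ClusterParams *p)
{
    if (p->fMink == 0.0f)
        p->fMink = 2.0f;                 /* Euclidean */
    if (p->iMode < 0 || p->iMode > 6)
        p->iMode = 0;                    /* single linkage */
    if (p->operator_mode && p->iOps <= 0)
        p->iOps = 1;
}
```

Valid values are left untouched, so calling the function on an already
correct parameter set has no effect.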
DESCRIPTION:
The unit is provided for hierarchical clustering. It does not only
label each object with its cluster affiliation but also provides the
dendrogram for graphical visualization of the objects' topology. The
user can choose between two modes to stop the clustering process: with
iExitMode == 0 the cluster process is stopped when fExitVal clusters
remain; with iExitMode == 1 it is stopped when the distance for
joining the next two clusters is bigger than fExitVal. The distance
between objects is calculated with the Minkowski distance formula
d(x,y) = (sum_k |x_k - y_k|^m)^(1/m) with m = fMink (fMink = 1 city
block distance, fMink = 2 Euclidean distance, and so on). The methods
for the clustering process are: iMode = 0 single linkage, 1 complete
linkage, 2 group average, 3 weighted group average, 4 centroid, 5
median, 6 Ward. The implementation follows a recursive calculation
scheme documented in the source. If the user has started the unit in
operator mode, the cluster process is first done normally, providing a
list of labels for the iNrPts objects. Afterwards the unit executes
its iOps operands iNrPts-1 times, providing at its output fields
information about the dendrogram (binary) tree. The tree is printed
out in postorder, showing a "horseshoe" of two objects and their
parent node at each execution step. The 8 values (4 x,y points)
provided at output field out_3[] can be used for visualization, e.g.
with a draw-sym unit. Another useful output is the last output pin of
the unit, which is normally set to zero but is set to one in operator
mode. When the unit executes its operands for the first time this pin
is one, but then it is set to zero until all operand executions are
done. This is useful for rescaling, e.g., a plot-xy unit using the
output values at field Z_out[] (number of objects, max. distance).
After all executions the last output pin is again set to one. Further
information is provided during the operator phase at the two output
fields out_4[] and out_5[]. These fields hold information about the
current two nodes (features, xy-position, label and number of the
object), which can be drawn in an output window with a draw-str unit.
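The two stopping rules can be summarized in a small predicate. This is
a sketch under assumptions: should_stop is a hypothetical function, it
uses the 0/1 encoding of iExitMode given in this description, and it
assumes that fExitVal is truncated to an integer when it is used as a
cluster count.

```c
/* Hypothetical predicate (not part of NST): decide whether the
 * agglomeration loop should stop before the next join.
 *   iExitMode == 0: stop when fExitVal clusters remain.
 *   iExitMode == 1: stop when the next join distance exceeds fExitVal.
 * Returns 1 to stop, 0 to continue. */
static int should_stop(int iExitMode, float fExitVal,
                       int clusters_remaining, float next_join_dist)
{
    if (iExitMode == 0)
        return clusters_remaining <= (int)fExitVal;
    return next_join_dist > fExitVal;
}
```

A clustering loop would call such a predicate once before every join,
passing the current cluster count and the smallest inter-cluster
distance under the selected linkage method.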
OPTIONS:
The option string pcFmt allows customizing the behaviour
of the cluster unit. The following options are provided:
- %I
- unit works in operator mode: it first clusters the objects, then
cycles all iOps operands until the whole dendrogram is printed out.
CONTROL MODES:
A call of ctrl_unit(iMode,u) with iMode one of the following
can be used to achieve a number of further operations:
- NST_RESET
- clear all output fields (set all values to zero!)
EXAMPLES:
There is a small example at: /vol/nst/man/examples/cluster.NST.gz
using the points in cluster_objects.dat for clustering. The user can
choose the cluster method; afterwards the dendrogram is visualized in a
plot-xy unit.
STATUS:
Preliminary.
SEE ALSO:
FILE
nst_dm.c