NAME
dependency_matrix -- create unit to calculate column dependencies
PROTOTYPE
unitptr dependency_matrix( int iNumRecords, int iDim, char *pcOptions, unitptr uHost)
ARGUMENTS
- int iNumRecords
- number of records
- int iDim
- dimension of a record
- char *pcOptions
- - not documented in source --
- unitptr uHost
- host unit
RETURN VALUE:
A pointer to the created unit or NULL in the case of an error.
INTERFACE OF CREATED UNIT:
All fields below are packed float fields.
- inp_0[iNumRecords*iDim]:
- (packed input field 0)
the data set to be analyzed as a matrix of iNumRecords
rows of iDim elements each
- inp_1[iDim]:
- for each dimension, the number of bins to use.
Initialized with the value sqrt(iNumRecords)
as a default.
- inp_2[iDim]:
- for each dimension, a min value for the binning range
- inp_3[iDim]:
- for each dimension, a max value for the binning range
- inp_4[1]:
- control pin: 0 = skip execution; default is 1
- out_0[iDim*iDim]:
- the linear correlation between columns i and j
- out_1[iDim*iDim]:
- the significance level associated with the linear correlation in out_0
- out_2[iDim*iDim]:
- the association between columns i and j as
measured by Cramer's V.
- out_3[iDim*iDim]:
- the significance level associated with the chi-square statistic behind out_2
- out_4[iDim*iDim]:
- a matrix of cross entropy values. Matrix element
ij gives the uncertainty coefficient between columns i and j in the
data set.
- out_5[iDim*iDim]:
- (only if pcOptions contains "%n" token) support (number of data pairs)
- out_6[iDim*iDim]:
- (only if pcOptions contains "%n" token) regression slope
- out_7[iDim*iDim]:
- (only if pcOptions contains "%n" token) DOF (effective rows*cols-rows-cols+1)
EXECUTION OF CREATED UNIT:
Computes various measures of correlation for each pair ij
of columns in the data set. For those dimensions whose
min and max values of the binning range are set to the
same value f, the binning range is determined from the
minimum and maximum of the data.
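The range selection described above can be sketched as follows. The
function names are hypothetical, and the equal-width bins with clamping of
out-of-range values are assumptions; the source does not say how values
are mapped to bins.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum and maximum of one column of the packed data set. */
void column_range(const float *data, int nRec, int iDim, int col,
                  float *lo, float *hi)
{
    assert(nRec > 0 && iDim > 0 && col >= 0 && col < iDim);
    *lo = *hi = data[col];
    for (int r = 1; r < nRec; ++r) {
        float v = data[(size_t)r * (size_t)iDim + (size_t)col];
        if (v < *lo) *lo = v;
        if (v > *hi) *hi = v;
    }
}

/* Binning range for one column: if the given min and max are equal,
 * fall back to the data limits, otherwise use them as given. */
void binning_range(const float *data, int nRec, int iDim, int col,
                   float minIn, float maxIn, float *lo, float *hi)
{
    if (minIn == maxIn) {
        column_range(data, nRec, iDim, col, lo, hi);
    } else {
        *lo = minIn;
        *hi = maxIn;
    }
}

/* Bin index in [0, nBins-1] for value v.
 * ASSUMPTION: equal-width bins; out-of-range values are clamped
 * to the first or last bin. */
int bin_index(float v, float lo, float hi, int nBins)
{
    assert(nBins > 0);
    if (!(hi > lo)) return 0;            /* degenerate range: one bin */
    double t = ((double)v - (double)lo) / ((double)hi - (double)lo)
               * (double)nBins;
    if (!(t >= 0.0)) return 0;           /* below range (or NaN) */
    if (t >= (double)nBins) return nBins - 1;
    return (int)t;
}
```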
out_0[] returns the Pearson linear correlation coefficient r.
out_1[] returns the Student-t test result for the null hypothesis that
i and j are uncorrelated, assuming that i and j have convergent higher
moments (tails die off), that N is large (typically >500), and that i
and j have a binormal joint distribution (a strong assumption). Under
the null hypothesis, r has a mean of 0 with a standard deviation of
1/sqrt(N).
out_2[] returns Cramer's V, a normalized chi-square statistic that tests
the binned columns i and j for independence based on the row and column sums.
out_3[] returns the significance level of the pure chi-square statistic.
out_4[] returns the uncertainty coefficients, i.e., 0 means there is no
association and 1 means that knowledge of j fully predicts the outcome of i.
Not yet implemented: determining the binning range while omitting
on both sides a fraction f of the entire data (i.e., for f=0, all
data enter into the determination of the binning range; for
f >=1, no data are used and no computation is done).
OPTIONS:
Currently unused.
SEE ALSO:
FILE
/local/homes/rhaschke/nst7/man/../o.linx86//../foldersrc/nst_datcor.c