NAME

dependency_matrix -- create unit to calculate column dependencies

PROTOTYPE

unitptr dependency_matrix( int iNumRecords, int iDim, char *pcOptions, unitptr uHost)

ARGUMENTS

int iNumRecords
number of records
int iDim
dimension of a record
char *pcOptions
option string (not documented in source); a "%n" token enables the additional outputs out_5 to out_7
unitptr uHost
host unit

RETURN VALUE:

A pointer to the created unit or NULL in the case of an error.

INTERFACE OF CREATED UNIT:

All fields below are packed float fields.
inp_0[iNumRecords*iDim]:
(packed input field 0) the data set to be analyzed as a matrix of iNumRecords rows of iDim elements each
inp_1[iDim]:
for each dimension, the number of bins to use. Initialized with the value sqrt(iNumRecords) as a default.
inp_2[iDim]:
for each dimension, a min value for the binning range
inp_3[iDim]:
for each dimension, a max value for the binning range
inp_4[1]:
control pin: 0 = skip execution; default is 1.
out_0[iDim*iDim]:
the linear correlation between columns i and j
out_1[iDim*iDim]:
the significance level associated with the linear correlation in out_0
out_2[iDim*iDim]:
the association between columns i and j as measured by Cramer's V.
out_3[iDim*iDim]:
the significance level associated with the association in out_2
out_4[iDim*iDim]:
a matrix of cross entropy values. Matrix element ij gives the uncertainty coefficient between columns i and j in the data set.
out_5[iDim*iDim]:
(only if pcOptions contains the "%n" token) support (number of data pairs)
out_6[iDim*iDim]:
(only if pcOptions contains the "%n" token) regression slope
out_7[iDim*iDim]:
(only if pcOptions contains the "%n" token) degrees of freedom (effective rows*cols-rows-cols+1)

EXECUTION OF CREATED UNIT:

Computes various measures of correlation for each pair ij of columns in the data set. For those dimensions whose min and max values of the binning range are set to the same value f, the binning range is found by determining the min and max limits of the data.
Not yet implemented: omitting on both sides a fraction f of the entire data when determining these limits (i.e., for f=0, all data enter into the determination of the binning range; for f>=1, no data are used and no computation is done).
out_0[]:
returns the Pearson linear correlation coefficient r.
out_1[]:
returns the Student's t-test result for the null hypothesis that i and j are uncorrelated. The test assumes that i and j have convergent higher moments (tails die off), that N is large (typically >500), and that i and j have a binormal joint distribution (a strong assumption). Under the null hypothesis, r has a mean of 0 with a standard deviation of 1/sqrt(N).
out_2[]:
returns Cramer's V, a normalized chi-square test that checks the row and column sums for independence.
out_3[]:
returns the significance level of the pure chi-square statistic.
out_4[]:
returns the uncertainty coefficients: 0 means that there is no association, 1 means that knowledge of j fully predicts the outcome of i.

OPTIONS:

Currently unused, apart from the "%n" token that enables out_5 to out_7 (see INTERFACE OF CREATED UNIT).

SEE ALSO:

FILE

/amnt/loge/users/nistaff02/nistaff/rhaschke/nst7/man/../o.linux//../foldersrc/nst_datcor.c