NAME

read_table -- create unit to load a data file

PROTOTYPE

unitptr nst_read_table( char *pcFile, int iFirstRecord, int iMaxRecords, char *pcOptions, unitptr uHost)

ARGUMENTS

char *pcFile
file name
int iFirstRecord
first record to use
int iMaxRecords
max nr of records
char *pcOptions
option string (may be NULL). %L uses column 0 for record labels; %C provides labels in concatenated form; %M makes the parsing of date strings expect the month first.
unitptr uHost
host unit

RETURN VALUE:

A pointer to the created unit or NULL in the case of an error.

INTERFACE:

X_out[]:
(dyn packed output field 0) the data, concatenated as a long vector
Y_out[]:
(output field 1) the dimension of each data record
Z_out[]:
(output field 2) total nr of data records read

SYNOPSIS:

This unit reads a table of numerical values from a data file. It skips comment lines starting with a '#' character and parses the remaining lines into TITLE, HEADER, AXISINFO and DATA parts, the contents of each of which can be retrieved with the correspondingly named methods. The only assumption about the file format is that the DATA part must consist of a number n of lines ("records"), each containing the same number d ("record dimension") of numerical values (suitably formatted date or time specifications are also admissible). Each line may optionally start with an additional token that is then interpreted as a "record label" for the subsequent d values (to enable this, the option string pcOptions must contain the token %L). Neither n nor d needs to be specified; the unit determines them by inspecting the file. To make this possible, there are a few restrictions on the admissible format of the non-data parts of the file (explained in more detail below). Usually, the following format is fine: if there is a title, put it in the first line, optionally followed by header text lines, optionally followed by a line with a non-numeric axis label for each column, followed by the data lines, e.g.

   # this is a comment
   This is the title
   The next two lines are the header part. The file
   describes some physical properties of three liquids.
             density melting_point boiling_point
   water 1.0 0 100
   alcohol 0.8 -36 72
   mercury 13.6 -34 336
   # the file ends here


INTERFACE OF CREATED UNIT:

out_0[]
single pin, holding dynamic float array of n*d elements. It provides the read data records, concatenated into a single, long array. The data of record i (i=0,1,...) starts at element position d*i.
out_1[1]
single scalar float pin, holding record dimension d
out_2[1]
single scalar float pin, holding total number n of records read.
NOTE: the output values are meaningful only after the read method of the present unit has been executed.
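
The layout of out_0 can be illustrated with a small C helper. This is only a sketch: record_value and its parameters are hypothetical names that are not part of NST, and it assumes the n*d values have already been copied into a plain float array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of NST): return value j of record i
 * from the flat array provided at out_0, given record dimension d. */
static float record_value(const float *data, size_t d, size_t i, size_t j)
{
    assert(data != NULL && j < d);  /* j must address a value inside the record */
    return data[d * i + j];         /* record i starts at element position d*i */
}
```

For the example file above (d = 3), record_value(data, 3, 1, 0) would address the density of the second record.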

EXPECTED FILE FORMAT:

Data Part:

The unit expects the data to be purely numeric (or date/time strings, cf. below) and to consist of n records, each given on a single line and containing the same number d ("record dimension") of numerical values (the tokens NaN, +Inf and -Inf are also considered numerical values; NaN can, e.g., be used to indicate missing values). When scanning the file, the unit interprets the last group of such lines as the data portion and sets n and d accordingly.
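
A token test of the kind described above could look roughly as follows. This is a sketch, not the unit's actual parser: is_numeric_token is a hypothetical name, and the sketch relies on the C99 behaviour of strtod, which accepts optionally signed "nan" and "inf" spellings in any letter case. Date and time strings are not covered here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of NST): returns 1 if the whole token
 * parses as a number, 0 otherwise. C99 strtod() also accepts "NaN",
 * "+Inf" and "-Inf", so these count as numerical values. */
static int is_numeric_token(const char *tok)
{
    char *end;

    assert(tok != NULL);
    if (*tok == '\0')
        return 0;                 /* an empty token is not a number */
    (void)strtod(tok, &end);      /* only the end position matters here */
    return end != tok && *end == '\0';
}
```

Note that strtod also accepts forms such as hexadecimal floats; whether the unit accepts them is not stated in this page.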

Record Labels:

If the %L option is specified, the first token on each data line is used as a label for that data line (and may, in this case, also be non-numeric). Thus, the value of d then is one less than the number of tokens per data line.

Axis Info:

If the data portion is preceded by a line with the same number d of tokens as the record dimension, this line becomes the axisinfo part: its tokens will be interpreted as axis labels. (Note that it is therefore impossible to specify d axis labels that are all numeric; the corresponding line could not be distinguished from a data line and would be read as the first data line by the present unit.) Alternatively, the axisinfo part may be given as d separate lines that precede the Data Part. In this case, each of these lines must start with the label of the corresponding axis. Up to 3 numeric values may follow, which can be used to control data processing along that record dimension. If no values are specified, defaults are set as follows: the first two values are set to the min and max data value encountered for that dimension; the last value is set to 1. These defaults are also used when the axisinfo part is given only as a single line (as explained above). The presence of an axisinfo part is optional.
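
The default rule for the three per-axis values can be sketched as follows. AxisDefaults and axis_defaults are hypothetical names, the data is assumed to be available as a flat array of n*d floats, and NaN entries are not treated specially here (how the unit handles them for the min/max computation is not stated).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the three numeric axisinfo values. */
typedef struct {
    float lower;      /* default: min data value of this dimension */
    float upper;      /* default: max data value of this dimension */
    float intervals;  /* default: 1 */
} AxisDefaults;

/* Fill axes[0..d-1] with the defaults described above. */
static void axis_defaults(const float *data, size_t n, size_t d,
                          AxisDefaults *axes)
{
    size_t i, j;

    assert(data != NULL && axes != NULL && n > 0 && d > 0);
    for (j = 0; j < d; ++j) {
        axes[j].lower = data[j];          /* start with record 0 */
        axes[j].upper = data[j];
        axes[j].intervals = 1.0f;
        for (i = 1; i < n; ++i) {
            float v = data[d * i + j];    /* value j of record i */
            if (v < axes[j].lower) axes[j].lower = v;
            if (v > axes[j].upper) axes[j].upper = v;
        }
    }
}
```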

Header Part:

The header part comprises the remaining portion of the file that precedes the axisinfo part (or the data part, if the axisinfo part is absent), except possibly for the first non-comment line in the file, which may be used as a title (see below). Consequently, if the file specifies no axisinfo part, the tail of the header part must not conform to the axisinfo format explained in the previous section, since it would then be interpreted as an axisinfo part. Usually, this boils down to a simple rule: in the absence of an axisinfo part, the last line of the header text must not contain d tokens, since these would otherwise be misinterpreted as axis label specifications (or as data, if they are all numeric). If the %L option is specified, a last header line of d+1 tokens with the last d of them numeric must also be avoided, to prevent misinterpretation as the first data line. The presence of a header part is optional.

Title Part:

The first non-comment line in the file that is neither a data line nor an axisinfo line is interpreted as the title of the file, provided its first token starts with an alphanumeric character or with an apostrophe (many files start with some auxiliary numerical parameters; the above restriction prevents them from being assigned odd numerical titles). If the first line cannot be interpreted as the title, it becomes the first line of the header. The presence of a title is optional.

Record Grouping by Blank Lines:

Blank lines in the data part do not affect the contents of the returned data array; however, they are regarded as subdivision specifiers, indicating a subdivision of the entire set of n records into smaller groups of n1, n2, ... nk records. The values ni are returned at the i-th position of a vector returned by the groups method. The first element (at position 0) of that vector gives the total number k of groups that were found. Pairs of adjacent blank lines enclose 0 record lines and can thus be recognized by a corresponding value of ni=0. Other routines can (but need not) make use of this information, e.g., to group data in the same way as the gnuplot tool.
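
The grouping rule can be sketched in C as follows. All names are hypothetical, and the sketch assumes that the data part neither begins nor ends with a blank line, since the treatment of leading or trailing blank lines is not specified here. The second function shows how a consumer might locate the first record of a group.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the line contains only white space. */
static int is_blank(const char *line)
{
    for (; *line != '\0'; ++line)
        if (!isspace((unsigned char)*line))
            return 0;
    return 1;
}

/* Hypothetical sketch: fill groups[0] with k and groups[1..k] with the
 * group sizes n1..nk; returns k. groups needs nlines + 2 elements. */
static size_t group_sizes(const char **lines, size_t nlines,
                          float *groups, size_t capacity)
{
    size_t k = 0, count = 0, i;

    assert(capacity >= nlines + 2);
    for (i = 0; i < nlines; ++i) {
        if (is_blank(lines[i])) {
            groups[++k] = (float)count;   /* a blank line closes a group */
            count = 0;
        } else {
            ++count;                      /* one more record in this group */
        }
    }
    groups[++k] = (float)count;           /* end of data closes the last group */
    groups[0] = (float)k;
    return k;
}

/* Index of the first record of group g (g = 1..k). Multiply by d to get
 * the element position in the flat data array. */
static size_t group_first_record(const float *groups, size_t g)
{
    size_t start = 0, i;

    assert(g >= 1 && g <= (size_t)groups[0]);
    for (i = 1; i < g; ++i)
        start += (size_t)groups[i];
    return start;
}
```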

Handling of dates and times:

Date strings given in the format DD.MM.YY or DD.MM.YYYY are also accepted as numerical data and are converted into the number of days elapsed since 1.1.1900 (introducing a famous problem when the short form DD.MM.YY is used). The given day is always included in the count (i.e., the date 1.1.1900 yields the count 1). If option %M (=month first) is specified, the expected format is MM.DD.YY or MM.DD.YYYY instead. Instead of a dot, ':' or '-' can also be used as a separator. Time strings given in the form Hrs:Min:Secs are converted to seconds (which precludes the absence of Hrs, but allows Secs to be omitted).
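
The conversions can be sketched in C as follows. This is not the unit's code: all function names are hypothetical, only four-digit years are handled (the mapping of two-digit years is not specified here), Gregorian leap-year rules are assumed (so 1900 is not a leap year), and the sketch does not address how a ':'-separated date is told apart from a time string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Day count with the given day included, i.e. 1.1.1900 yields 1. */
static long day_count(int day, int month, int year)
{
    static const int month_days[12] =
        {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    long days = 0;
    int y, m;

    assert(year >= 1900 && month >= 1 && month <= 12 && day >= 1);
    for (y = 1900; y < year; ++y)
        days += is_leap(y) ? 366 : 365;
    for (m = 1; m < month; ++m)
        days += month_days[m - 1] + ((m == 2 && is_leap(year)) ? 1 : 0);
    return days + day;
}

/* Parse DD.MM.YYYY, or MM.DD.YYYY if month_first is set (as with %M).
 * Separators may be '.', ':' or '-'. Returns -1 on failure. */
static long parse_date(const char *s, int month_first)
{
    int a, b, year, day, month;
    char sep1, sep2, extra;

    if (sscanf(s, "%d%c%d%c%d%c", &a, &sep1, &b, &sep2, &year, &extra) != 5)
        return -1;                        /* wrong shape or trailing text */
    if (strchr(".:-", sep1) == NULL || strchr(".:-", sep2) == NULL)
        return -1;
    day = month_first ? b : a;
    month = month_first ? a : b;
    if (year < 1900 || month < 1 || month > 12 || day < 1 || day > 31)
        return -1;
    return day_count(day, month, year);
}

/* Parse Hrs:Min or Hrs:Min:Secs into seconds. Returns -1 on failure. */
static long parse_time(const char *s)
{
    int hrs, min, secs = 0;

    if (sscanf(s, "%d:%d:%d", &hrs, &min, &secs) < 2)
        return -1;                        /* Hrs and Min are both required */
    return 3600L * hrs + 60L * min + secs;
}
```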

METHODS:

Access to the read data is provided by the methods num, dim, axis_info, label, header, title, groups and limits. These methods return meaningful information only after a file has actually been read, which requires executing the read method first.
read:
will read the file whose name is specified at input field 0 (if the file name consists of blanks only, read will do nothing). Values at input fields 1 and 2 can be used to skip a number of initial data records (specify at input field 1) and to limit the number of records read (specify at input field 2, 0 = no limit). A repeated call of read will cause re-reading of the file only when at least one of the input values has changed, or when the file modification time has changed after the last read that was effective. Input values will be initialized to values given when the unit was created.
num
single float output field, provides number n of data records found.
dim
similarly, provides number d of values per record.
title
single string output field, provides title string (empty string, if no title was found).
header
similarly, provides a string containing the header material.
axis_info
specifies for each of the d axes its axis label, limits for the axis range and an optional parameter for the recommended nr of discretization intervals along an axis. Defaults will be substituted if no values were found in the file. Two different interfaces can be chosen. If the %C option was specified ("concatenated" format), the unit will have four output fields: the first holds a string with all axis labels, separated by white space; the remaining three are dynamic float arrays of d elements, specifying the lower and upper range limits and the recommended nr of discretization intervals, respectively. If the %C option is absent, the unit will instead have two output fields and one input field of a single pin. The input field expects the index of an axis; the first output field returns the label of that axis, while the second output field returns the lower and upper range limit and the recommended nr of discretization intervals.
label
specifies for each record a label at the single string output field 0. Again, two different interfaces can be chosen: if the %C option was specified, the output field will provide all labels concatenated into a single long string (using blanks as separators). Otherwise, there will be an additional input field with a single float pin, which allows the label of a particular record to be obtained at the output field by specifying its positional index at the input. If the %L option is absent, the label is the empty string "".
groups
has a single output field with a dynamic float array pin, holding in array element 0 the number k of record groups formed by intervening blank lines, and in the remaining elements 1..k the size n1...nk of each group.
limits:
two output fields with a dynamic float array pin each, holding min and max values for each axis dimension.
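
The re-read rule described for the read method above can be sketched as a small decision function. ReadState and needs_read are hypothetical names, not part of NST. The modification time is passed in as a parameter; on a POSIX system it could be obtained from the st_mtime field filled in by stat().

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical snapshot of the inputs used by the last effective read. */
typedef struct {
    char   file[256];      /* file name at input field 0 */
    int    first_record;   /* input field 1: records to skip */
    int    max_records;    /* input field 2: 0 = no limit */
    time_t mtime;          /* file modification time at that read */
    int    valid;          /* 0 until a read has actually happened */
} ReadState;

static int only_blanks(const char *s)
{
    for (; *s != '\0'; ++s)
        if (*s != ' ')
            return 0;
    return 1;
}

/* Returns 1 if read should (re-)read the file, 0 if it may do nothing. */
static int needs_read(const ReadState *last, const char *file,
                      int first_record, int max_records, time_t mtime)
{
    assert(last != NULL && file != NULL);
    if (only_blanks(file))
        return 0;                          /* blank file name: do nothing */
    if (!last->valid)
        return 1;                          /* nothing has been read yet */
    return strcmp(last->file, file) != 0   /* any input value changed ... */
        || last->first_record != first_record
        || last->max_records != max_records
        || last->mtime != mtime;           /* ... or the file was modified */
}
```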

EXECUTION:

Execution of the created unit does nothing. However, execution of any of its named subunits (via a use_named unit) invokes the corresponding C class method. The output fields of the unit provide the data (output field 0, dynamic float array of n*d elements) and the values n and d.

LOAD/SAVE:

Currently, the unit does not save its data to file. Instead, it relies on the persistence of the data file that it was using (future versions may admit an option to save a copy of the used data file; however, for large files this is not advisable anyway). To this end, the unit saves only the name of the data file from which it most recently did a read operation, together with the nr of records to skip, the limit on the number of records to read, and some checksums for the file contents. Later, when a LOAD operation is executed, it reads back this information but, for efficiency reasons, defers restoration of its state until the unit itself or one of its methods is executed. If the first executed method is read, the current inputs for the read method override the loaded information and determine where the new data comes from. Otherwise, the unit does a read with the parameters that were previously saved. This restores the state the unit had when it was saved, provided the data file that was used before has not changed in the meantime. This is verified by means of the stored checksums, and a warning is issued if the verification fails.

CTRL OPERATIONS:

Currently, the unit does not react to any of the NST control messages.

SEE ALSO:

FILE

/local/homes/rhaschke/nst7/man/../o.linx86//../foldersrc/nst_read_table.c