NAME
read_table -- create unit to load a data file
PROTOTYPE
unitptr nst_read_table( char *pcFile, int iFirstRecord, int iMaxRecords, char *pcOptions, unitptr uHost)
ARGUMENTS
- char *pcFile
- file name
- int iFirstRecord
- first record to use
- int iMaxRecords
- max nr of records
- char *pcOptions
- option string (may be NULL). %L causes column 0 to be used for labels; %C provides labels in concatenated form; %M sets parsing of date strings such that the month is expected first.
- unitptr uHost
- host unit
RETURN VALUE:
A pointer to the created unit or NULL in the case of an error.
INTERFACE:
- X_out[]:
- (dyn packed output field 0) the data, concatenated as a long vector
- Y_out[]:
- (output field 1) the dimension of each data record
- Z_out[]:
- (output field 2) total nr of data records read
SYNOPSIS:
This unit will read a table of numerical values from
a data file, skipping comment lines starting
with a '#' character and parsing the remaining lines into TITLE,
HEADER, AXISINFO and DATA parts, the contents of each of which can be
retrieved with the correspondingly named methods. The only assumptions
about the file format are that the DATA part must consist of
a number n of lines (``records''), each containing the same number d
(``record dimension'') of numerical values (suitably formatted
date or time specifications are also admissible), optionally preceded by
an additional token that can then be interpreted as a ``record
label'' for the subsequent d values (to enable that option, the
option string pcOptions must contain the token %L).
Neither n nor d needs to be specified; the unit will
determine them by scanning the file. To make this possible,
there are a few restrictions on the admissible format of the
non-data parts of the file (explained in more detail below).
Usually, the following format is ok: if there is a title, put
it in the first line, optionally followed by header text lines,
optionally followed by a line with a non-numeric axis label
for each column, followed by the data lines, e.g.
# this is a comment
This is the title
The next two lines are the header part. The file
describes some physical properties of three liquids.
density melting_point boiling_point
water 1.0 0 100
alcohol 0.8 -36 72
mercury 13.6 -34 336
# the file ends here
INTERFACE OF CREATED UNIT:
- out_0[]
- single pin, holding dynamic float array of n*d elements.
It provides the read data records, concatenated into a single, long
array. The data of record i (i=0,1,...) starts at element position
d*i.
- out_1[1]
- single scalar float pin, holding record dimension d
- out_2[1]
- single scalar float pin, holding total number n of
records read.
NOTE: the output values are meaningful only after the read method
of the present unit has been executed.
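The layout of out_0 can be illustrated with a small C sketch. It does
not use the NST API: record_value and its parameters are hypothetical
names, and the array is assumed to be a copy of out_0 taken after the
read method has been executed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of NST): return value j of record i
 * from a flat array laid out as described for out_0, i.e. record i
 * starts at element position d*i.
 * The caller must ensure 0 <= i < n and 0 <= j < d. */
float record_value(const float *data, size_t d, size_t i, size_t j)
{
    return data[d * i + j];
}
```

With the example file above (d = 3), record_value(data, 3, 1, 0) would
yield the density of the second record.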
EXPECTED FILE FORMAT:
Data Part:
The unit expects the data to be purely numeric (or date/time strings,
cf. below), and consisting of n
records, each given on a single line and consisting of the same number
d (``record dimension'') of numerical values (the tokens NaN,
+Inf and -Inf will also be considered as numerical
values; NaN can, e.g., be used to indicate missing values).
When scanning the file, it will interpret the last group of such lines
as the data portion, setting n and d correspondingly.
Record Labels:
If the %L option is specified, the first token on each data line
is used as a label for that data line (and may, in this case, also
be non-numeric). Thus, the value of d then is one less than the
number of tokens per data line.
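The rules for data lines and record labels can be sketched in C as
follows. This is not the unit's actual parser: record_dim is a
hypothetical name, tokens are assumed to be separated by white space,
and date/time tokens are not handled. Note also that strtod is more
permissive than the unit may be (it accepts, e.g., hexadecimal
numbers).

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not part of NST): return the record dimension d
 * of one candidate data line, or -1 if the line is not a data line.
 * With has_labels != 0 (the %L case) the first token is taken as the
 * record label and is neither checked nor counted. */
int record_dim(const char *line, int has_labels)
{
    const char *p = line;
    int ntok = 0;

    for (;;) {
        const char *tok_end;
        char *num_end;

        while (isspace((unsigned char)*p)) p++;      /* skip separators */
        if (*p == '\0') break;
        tok_end = p;
        while (*tok_end != '\0' && !isspace((unsigned char)*tok_end))
            tok_end++;

        if (!(has_labels && ntok == 0)) {
            /* strtod (C99) also accepts NaN, +Inf and -Inf */
            (void)strtod(p, &num_end);
            if (num_end != tok_end) return -1;       /* not fully numeric */
        }
        ntok++;
        p = tok_end;
    }
    if (has_labels) ntok--;                          /* label is not data */
    return ntok > 0 ? ntok : -1;
}
```

For the line "water 1.0 0 100" the sketch returns 3 with has_labels
set, and -1 without it, since "water" is not numeric.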
Axis Info:
If the data portion is preceded by a line with the same number
d of tokens as the record dimension, this line becomes the
axisinfo part: its tokens will be interpreted as axis labels
(Note that it, therefore, is impossible to specify
d axis labels that are all numeric; the corresponding line could not
be distinguished from a data line and would, therefore, be read as
the first data line by the present unit).
Alternatively, the axisinfo part may also be given as d separate
lines that precede the Data Part. In this case, each of these lines
must start with the axis label of the corresponding axis. Then,
there may follow up to 3 numeric values that may be used to control
data processing along that record dimension. If no values are specified,
defaults will be set as follows: the first two values will be set
to the min and max data value encountered for that dimension; the
last value is set to 1. These defaults are also used, when the
axisinfo part is only given as a single line (as explained above).
The presence of an axisinfo part is optional.
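The default axis values described above (min, max, 1) could be
computed along the following lines. The type AxisDefaults and the
function axis_defaults are hypothetical names, and skipping NaN
entries is an assumption; the text does not say how missing values
enter the min/max computation.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical container for the three per-axis values. */
typedef struct {
    float lo;         /* first value: min data value on this axis  */
    float hi;         /* second value: max data value on this axis */
    float intervals;  /* last value, default 1                     */
} AxisDefaults;

/* Fill out[0..d-1] from n records of dimension d stored as a flat
 * array (record i starts at element d*i). NaN entries are skipped
 * (assumption). An axis without any valid value gets lo = hi = NaN. */
void axis_defaults(const float *data, size_t n, size_t d, AxisDefaults *out)
{
    for (size_t j = 0; j < d; j++) {
        out[j].lo = NAN;
        out[j].hi = NAN;
        out[j].intervals = 1.0f;
        for (size_t i = 0; i < n; i++) {
            float v = data[d * i + j];
            if (isnan(v)) continue;
            if (isnan(out[j].lo) || v < out[j].lo) out[j].lo = v;
            if (isnan(out[j].hi) || v > out[j].hi) out[j].hi = v;
        }
    }
}
```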
Header Part:
The header part comprises the remaining portion of the file that
precedes the axisinfo part (or the data part, if the axisinfo part
is absent), except possibly for the first non-comment line in the file,
which may be used as a title (see below). Therefore, if the file specifies
no axisinfo part, the tail of the header part must not
conform to the format explained in the previous section, since it
would then be interpreted as an axisinfo part. Usually, this boils
down to the simple rule that in the absence of an axisinfo part
the last line of the header text must not contain d tokens, since
these would otherwise be misinterpreted as axis label specifications
(or as data, if they are all numeric; if the %L option is specified,
also d+1 tokens, with the last d of them numeric, must be avoided
to prevent misinterpretation as the first data line).
The presence of a header part is optional.
Title Part:
The first non-comment line in the file that is neither a data line
nor an axisinfo line is interpreted as specifying a title of the file,
provided its first token starts with an alphanumeric character or
with an apostrophe (many files start with some auxiliary numerical
parameters; the above restriction prevents them from being assigned
funny numerical titles). If the first line cannot be interpreted
as the title, it will become the first line of the header.
The presence of a title is optional.
Record Grouping by Blank Lines:
Blank lines in the data part do not affect the contents of the returned
data array; however, they are regarded as subdivision specifiers,
indicating a subdivision of the entire set of n records into smaller groups
of n1, n2, ... nk records. The values ni are returned at the i-th
position of a vector returned by the groups method. The first element
(at position 0) of that vector gives the total number k of groups
that were found. Pairs of adjacent blank lines enclose 0 record lines
and can thus be recognized by the occurrence of a corresponding
value of ni=0. Other routines can (but need not) make use of this
information, e.g., to group data in the same way as the gnuplot tool.
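The layout of the groups vector can be illustrated with the following
C sketch. It is not the unit's implementation: group_sizes is a
hypothetical name, the input is assumed to be the lines of the data
part with comment lines already removed, and every blank line is
assumed to close one group and open the next (so a trailing blank line
would yield a final group of size 0).

```c
#include <ctype.h>
#include <stddef.h>

static int is_blank(const char *s)
{
    while (isspace((unsigned char)*s)) s++;
    return *s == '\0';
}

/* Hypothetical helper (not part of NST): fill groups[0] with the
 * number k of groups and groups[1..k] with the group sizes n1..nk.
 * Returns the number of elements written (k+1), or 0 if the capacity
 * cap of groups[] is too small. */
size_t group_sizes(const char *const *lines, size_t nlines,
                   float *groups, size_t cap)
{
    size_t k = 1;

    if (cap < 2) return 0;
    groups[1] = 0.0f;
    for (size_t i = 0; i < nlines; i++) {
        if (is_blank(lines[i])) {
            if (k + 1 >= cap) return 0;   /* no room for a new group */
            groups[++k] = 0.0f;           /* blank line: start next group */
        } else {
            groups[k] += 1.0f;            /* record line: count it */
        }
    }
    groups[0] = (float)k;
    return k + 1;
}
```

Two records, two adjacent blank lines and one further record thus give
the vector 3, 2, 0, 1.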
Handling of dates and times:
Date strings given in the format DD.MM.YY or DD.MM.YYYY are also
accepted as numerical data and are converted into the number of
days elapsed since 1.1.1900 (note that the short form DD.MM.YY
leaves the century ambiguous). The given day will always be
included in the count (i.e., the date 1.1.1900 yields the count 1).
If option %M (=month first) is specified, the expected format
will be MM.DD.YY or MM.DD.YYYY instead. Instead of a dot,
':' or '-' can also be used as a separator.
Time strings given in the form Hrs:Min:Secs are converted to
seconds (which precludes the absence of Hrs, but allows
Secs to be omitted).
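The conversions described above can be sketched in C as follows. The
function names parse_date and parse_time are hypothetical, the
Gregorian leap-year rule is assumed (the text does not state which
calendar rule the unit applies), and the short form with a two-digit
year is rejected because the text does not say how the century is
chosen.

```c
#include <stdio.h>
#include <string.h>

static int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Days per month; February is adjusted for leap years. */
static int month_len(int month, int year)
{
    static const int mdays[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
    return mdays[month - 1] + (month == 2 && is_leap(year));
}

/* Parse DD.MM.YYYY (or MM.DD.YYYY if month_first != 0); the separator
 * may be '.', ':' or '-'. Returns the day count with 1.1.1900 == 1,
 * or -1 if the string is not a valid date of that form. */
long parse_date(const char *s, int month_first)
{
    int a, b, year, pos = 0;
    char s1, s2;
    long count = 0;

    if (sscanf(s, "%d%c%d%c%d%n", &a, &s1, &b, &s2, &year, &pos) != 5)
        return -1;
    if (s[pos] != '\0' || !strchr(".:-", s1) || !strchr(".:-", s2))
        return -1;

    int day = month_first ? b : a;
    int month = month_first ? a : b;
    if (year < 1900 || month < 1 || month > 12) return -1;
    if (day < 1 || day > month_len(month, year)) return -1;

    for (int y = 1900; y < year; y++) count += is_leap(y) ? 366 : 365;
    for (int m = 1; m < month; m++) count += month_len(m, year);
    return count + day;            /* the given day itself is included */
}

/* Parse Hrs:Min or Hrs:Min:Secs and return the number of seconds,
 * or -1 on malformed input. */
long parse_time(const char *s)
{
    int h, m, sec = 0, pos = 0;
    int n = sscanf(s, "%d:%d%n:%d%n", &h, &m, &pos, &sec, &pos);

    if (n < 2 || s[pos] != '\0') return -1;
    if (h < 0 || m < 0 || m > 59 || sec < 0 || sec > 59) return -1;
    return h * 3600L + m * 60L + sec;
}
```

Since ':' is admissible both as a date separator and in time strings,
the sketch leaves it to the caller to decide which parser to apply.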
METHODS:
Access to the read data is provided by the methods
num, dim, axis_info, label, header, title and groups.
These methods return meaningful information only after a
file has actually been read, which requires executing
the read method first.
- read:
- will read the file whose name is specified at input field 0
(if the file name consists of blanks only, read will do nothing).
Values at input fields 1 and 2 can be used to skip a number of initial
data records (specify at input field 1) and to limit the number of
records read (specify at input field 2, 0 = no limit). A repeated
call of read will cause re-reading of the file only when at least
one of the input values has changed, or when the file modification
time has changed after the last read that was effective. Input values
will be initialized to values given when the unit was created.
- num
- single float output field, provides number n of data
records found.
- dim
- similarly, provides number d of values per record.
- title
- single string output field, provides title string
(empty string, if no title was found).
- header
- similarly, string contains header material
- axis_info
- specifies for each of the d axes its axis label,
limits for the axis range and an optional parameter
for the recommended nr of discretization intervals along an axis.
Defaults will be substituted, if no values were found in the file.
Two different interfaces can be chosen: If the %C option was
specified (``concatenated'' format), the unit will have four output
fields: the first holds a string with all axis labels, separated by
white space. The remaining three are dynamic float arrays of
d elements, specifying the lower and upper range limits, and the
recommended nr of discretization intervals, resp.
If the %C option is absent, the unit will have two output fields
and one input field of a single pin instead. The input field expects the
index of an axis, and the first output field returns the label
of that axis, while the second output field returns the
lower and upper range limit and the recommended nr of
discretization intervals.
- label
- specifies for each record a label (or "", if the
%L option was not specified) at the single string output field 0.
Again, two different interfaces can be chosen: if the %C option
was specified, the output field will provide all labels
concatenated into a single long string (using blanks as separators).
Otherwise, there will be an additional input field with a single
float pin, allowing the label of a particular record to be obtained
at the output field by specifying its positional index at the
input. If the %L option is absent, the returned label is "" (see above).
- groups
- has a single output field with a dynamic float array
pin, holding in array element 0 the number k of record groups formed by
intervening blank lines, and in the remaining elements 1..k
the size n1...nk of each group.
- limits:
- two output fields with a dynamic float array pin each,
holding min and max values for each axis dimension.
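The re-read rule of the read method (re-read only if an input value or
the file modification time has changed; do nothing for a blank file
name) can be sketched as follows. ReadCache, needs_reread and
remember_read are hypothetical names and do not reflect the unit's
internal data structures; the modification time is assumed to be
supplied by the caller, e.g. from the st_mtime field filled in by
stat() on POSIX systems.

```c
#include <ctype.h>
#include <string.h>
#include <time.h>

/* Hypothetical record of the last effective read. */
typedef struct {
    char   file[256];
    int    first;        /* nr of initial records skipped      */
    int    max_records;  /* record limit, 0 = no limit         */
    time_t mtime;        /* file modification time at the read */
    int    valid;        /* 0 until a first read has happened  */
} ReadCache;

/* Return 1 if the file has to be (re-)read, 0 otherwise. */
int needs_reread(const ReadCache *c, const char *file,
                 int first, int max_records, time_t mtime)
{
    const char *p = file;

    while (isspace((unsigned char)*p)) p++;
    if (*p == '\0') return 0;            /* blank name: read does nothing */
    if (!c->valid) return 1;             /* nothing read so far */
    return strcmp(c->file, file) != 0
        || c->first != first
        || c->max_records != max_records
        || c->mtime != mtime;
}

/* Store the parameters of a read that was actually carried out. */
void remember_read(ReadCache *c, const char *file,
                   int first, int max_records, time_t mtime)
{
    strncpy(c->file, file, sizeof c->file - 1);
    c->file[sizeof c->file - 1] = '\0';
    c->first = first;
    c->max_records = max_records;
    c->mtime = mtime;
    c->valid = 1;
}
```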
EXECUTION:
Execution of the created unit does nothing. However, execution of
any of its named subunits (via a use_named unit) invokes the
corresponding C class method. The output fields of the unit
provide the data (output field 0, dynamic float array of
n*d elements) and the values n and d.
LOAD/SAVE:
Currently, the unit will not save its data to file. Instead,
it relies on the persistence of the data file that it was
using (future versions may admit an option to save a copy
of the used data file; however, for large files this is not
advisable anyway).
To this end, the unit only saves the name of the data file
(together with the nr of records to skip and the limit on the
number of records to read and some check sums for the file
contents) from which it most recently did a
read operation. Later then, when a LOAD operation is executed,
it will read back this information, but for efficiency reasons
defer restoration of its state until it itself or one of its
methods is executed. When the first executed method is read,
the current inputs for the read method will override the loaded
information and determine where the new data comes from.
Otherwise, the unit does a read, but with the parameters that
were previously saved. This will restore the state when the unit
was saved, provided the data file that was used before
has not changed in the meantime. This is verified by means
of the stored check sums, and a warning is issued, if the
verification failed.
CTRL OPERATIONS:
Currently, the unit does not react to any of the NST control
messages.
SEE ALSO:
FILE
/amnt/loge/users/nistaff02/nistaff/rhaschke/nst7/man/../o.linux//../foldersrc/nst_read_table.c