This chapter covers the reading and writing of tabular data from text (ASCII) files. The first section lays out some considerations for users working with text files, and explains some of the concepts and terms that apply to handling ASCII data in HIPE. This is followed by several sections of "worked examples" with data formats you may typically encounter. The remaining sections each address a particular task that you may want to accomplish.
You can choose to first go to the "worked examples" as a quick start to working with the ASCII I/O tasks in HIPE, or to jump to individual task-based sections of interest, or to read straight through the chapter from beginning to end.
Points to consider about ASCII I/O.
The best ASCII format to use with HIPE is CSV. HIPE can automatically open comma-separated values (CSV) files with the .csv file extension when you double-click them in the Navigator view. Files with delimiter characters usually require less effort to parse than blank-space-separated files, which can require fine-tuning and configuration of the parser class. See Section 2.14 and the sections that follow it.
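The difference between delimited and space-separated files can be illustrated with a minimal sketch in plain Python. It does not use the HIPE ASCII I/O tasks; the sample data and variable names are hypothetical. The point is that an explicit delimiter preserves an empty field, while splitting on blank space loses it:

```python
import csv

# Hypothetical sample data: the second record has an empty "flux" field.
csv_text = "name,flux,unit\nsrcA,1.5,Jy\nsrcB,,Jy\n"
space_text = "name flux unit\nsrcA 1.5 Jy\nsrcB  Jy\n"

# A delimiter-aware reader keeps the empty field in its column.
csv_rows = list(csv.reader(csv_text.splitlines()))

# Splitting on runs of blank space silently drops the empty field,
# so the remaining values shift into the wrong columns.
space_rows = [line.split() for line in space_text.splitlines()]

print(csv_rows[2])    # three fields, the middle one empty
print(space_rows[2])  # only two fields
```

This is one reason why space-separated files tend to need extra parser configuration, for example fixed column widths or an explicit placeholder for missing values.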
FITS files are often a better exchange format. Products in HIPE are easily exported to FITS files, which are easily read back into HIPE with metadata and history preserved. There is no general way to save a data product or a product context to text files, aside from the Spectrum products. If you must save a product to file, save it into FITS format. See Section 1.16.1 for more details.
The ASCII I/O tasks work, in general, with table datasets. Table datasets are by far the most common data structure for Herschel data. Any Herschel data product is ultimately a collection of table datasets. There are dedicated HIPE tasks to exchange data in table dataset form with text files. See the next sections for details.
For more information on table datasets, see the Scripting Guide: Section 2.4, “Table datasets”.
You can save table datasets directly to FITS format. This is the recommended way to save table datasets to file. See Section 1.16.1 for more details.
The ASCII I/O tasks are tools that often require manual configuration. Aside from a few automatically supported formats, the tasks require some setup in order to handle all cases of data in text files. To set up all the column information in a table dataset, such as name, unit, type and description, you will typically have to perform some configuration on the command line.
The ASCII I/O tasks do not automatically detect the format of the data in a text file, with the exception of certain .tbl (space-separated) files.
In the Navigator view of HIPE, you can double-click on such files to open them.
Spectra have their own dedicated task for writing to text files. There is a dedicated exportSpectrumToAscii task for exporting spectra to text files. This task accepts as input all the most common data types describing spectra in HIPE, including SpectralSimpleCube. An example of using the exportSpectrumToAscii task is given in Section 2.2. For more information on the task, see Section 2.12.
Jython in HIPE includes a rich set of functionality for handling text files. There are different ways to exchange data with text files, depending on the type of data you want to exchange:
Jython lists, tuples and dictionaries. You can write these data structures to file using Jython commands, as explained in the Scripting Guide: Section 1.25, “Writing numeric values to file”.
Note that Herschel data is never distributed as plain Jython data structures, so it is unlikely you will have to write them to file.
For more information on lists, dictionaries and tuples, see the Scripting Guide: Section 1.10, “Lists, dictionaries and tuples”.
Numeric arrays, such as Double1d. You can wrap 1-dimensional numeric arrays into a table dataset and write the table dataset to file, as explained later in this chapter. Assuming you have a Double1d array called myArray, this is how you create a table dataset containing it:
myTableDataset = TableDataset()
myTableDataset["myColumn"] = Column(myArray)
Example 2.1. Creating a TableDataset with a Column made up of array data.
Numeric arrays may also be written to a file using the print statement. Consider two Double1d arrays named wavelength and flux with equal lengths:
wavelength = Double1d(5, 1.0)
flux = Double1d(5, 1.0)
fh = open('myspectrum.txt', 'w')
# Write one "wavelength flux" pair per line in fixed-width columns
for i in range(len(wavelength)):
    print >> fh, '%13.6f %13.6f' % (wavelength[i], flux[i])
fh.close()
Example 2.2. Write two numeric arrays to a file by looping over their values.
You can read back the values as follows:
fh = open('myspectrum.txt')
lines = fh.readlines()
fh.close()
wave = Double1d()
fl = Double1d()
for line in lines:
    # Split the line on blank space: token 0 is the wavelength, token 1 the flux
    lsplit = line.split()
    wave.append(float(lsplit[0]))
    fl.append(float(lsplit[1]))
Example 2.3. Read a numeric array from a file and tokenise its values in a loop.
For more information on numeric arrays, see the Scripting Guide: Section 2.2, “Numeric arrays”.
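The write and read steps above can also be tried outside HIPE. The following is a minimal sketch in plain Python, assuming ordinary lists as stand-ins for the Double1d arrays and an in-memory list of lines in place of the file; the sample values are hypothetical. It shows how the fixed-width format string lays out each line and how split() recovers the two tokens:

```python
# Plain Python lists stand in for the Double1d arrays (an assumption
# made so that the sketch runs without HIPE).
wavelength = [1.0, 2.5, 10.0]
flux = [0.5, 0.25, 0.125]

# Format each pair as two right-aligned columns, 13 characters wide,
# with 6 decimal places, separated by a single space.
lines = ['%13.6f %13.6f' % (w, f) for w, f in zip(wavelength, flux)]

# Tokenise each line: token 0 is the wavelength, token 1 is the flux.
wave, fl = [], []
for line in lines:
    tokens = line.split()
    wave.append(float(tokens[0]))
    fl.append(float(tokens[1]))

print(repr(lines[0]))
print(wave, fl)
```

Note that a format with 6 decimal places only round-trips values that need no more precision than that; for other data, choose the format width and precision accordingly.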
Concepts in working with the ASCII I/O tasks. There are several concepts and terms that you need to know to work with the full functionality of the ASCII I/O tasks.
Parsers. A parser defines rules to read a text file into HIPE. See Section 2.26 for the available types of parser, their features and how to configure them.
Formatters. A formatter defines rules to write data from HIPE into a text file. See Section 2.27 for the available types of formatter, their features and how to configure them.
Table templates. A table template describes the data to be read from, or written to, a text file. It defines the number of columns in the file, their name, the type and description of the data. While the parser defines general formatting rules, such as the character used to separate data values, the table template describes the data themselves. See Section 2.25 for how to create and configure a table template.
Configuration files. You can use a configuration file to store a particular configuration of the tasks for reading and writing text files. You can then load the configuration file for subsequent executions of the task. See Section 2.18 and Section 2.23 for instructions.
Delimiters. A delimiter is a character that denotes a boundary between fields in a text file. The most common delimiter is a comma. For more information on specifying delimiters, see Section 2.17.
Regular expressions. A regular expression is a concise and flexible means to match strings of text, such as particular characters or patterns of characters. Regular expressions are used to specify which lines of a file to skip, as discussed in Section 2.15, and with the RegexParser for specifying the delimiter between data fields (for example, to specify multiple spaces or tabs). A discussion of regular expressions is outside the scope of this manual, but Section 2.28 contains a few examples.
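Several of the concepts above can be illustrated together with a minimal sketch in plain Python. It uses the standard re module rather than the HIPE parser classes, and the sample text, the patterns and the columns list (a stand-in for a table template) are all hypothetical. One regular expression selects the lines to skip, another acts as the delimiter, and the column descriptions give each field a name and a type:

```python
import re

# Hypothetical file content: comment lines start with '#', and fields
# are separated by a mix of spaces and tabs.
text = "# wavelength flux\n# units: micron Jy\n60.0   1.5\n70.0\t\t2.5\n"

skip = re.compile(r"^\s*#")   # lines matching this pattern are ignored
delim = re.compile(r"\s+")    # one or more spaces or tabs between fields

# Stand-in for a table template: a name and a type for each column.
columns = [("wavelength", float), ("flux", float)]

table = dict((name, []) for name, _ in columns)
for line in text.splitlines():
    if skip.match(line) or not line.strip():
        continue  # skip comments and blank lines
    fields = delim.split(line.strip())
    for (name, convert), field in zip(columns, fields):
        table[name].append(convert(field))

print(table)
```

In HIPE, the corresponding roles are played by the parser, its skip and delimiter settings, and the table template, which are configured as described in the sections referenced above.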