Chapter 2. Saving data as text files

Table of Contents

2.1. Considerations and concepts for working with text files
2.2. Worked example: Saving a Spectrum product as a text file
2.3. Worked example: Saving a SourceListProduct as a text file
2.4. Worked example: Reading a Spitzer spectrum into a table dataset
2.5. Worked example: Reading a VizieR catalogue into a table dataset
2.6. Reading a comma-separated-value (CSV) file into a table dataset
2.7. Reading a space-separated file into a table dataset
2.8. Reading an IPAC, SExtractor or Topcat file into a table dataset
2.9. Reading a generic ASCII table file into a table dataset
2.10. Writing a table dataset to a comma-separated-values (CSV) file
2.11. Writing a table dataset into a space-separated-value file
2.12. Writing a spectrum to an ASCII table file
2.13. Writing a table dataset to a generic ASCII table file
2.14. Reading column names from a file
2.15. Defining which lines to ignore when reading a file
2.16. Specifying the data types when reading a file
2.17. Specifying how data values are separated when reading a file
2.18. Saving and loading a configuration for reading from file
2.19. Adding a header to an ASCII table file
2.20. Adding table dataset metadata to an ASCII table file
2.21. Defining a custom prefix for commented lines
2.22. Choosing how to separate data values
2.23. Saving and loading options for writing to file
2.24. Parsers, formatters and templates
2.25. Creating and configuring table templates
2.26. Creating and configuring parsers for reading in data
2.27. Creating and configuring formatters for writing data
2.28. Regular expressions

This chapter covers the reading and writing of tabular data from text (ASCII) files. The first section lays out some considerations for users working with text files, and explains some of the concepts and terms that apply to handling ASCII data in HIPE. This is followed by several sections of "worked examples" with data formats you may typically encounter. The remaining sections each address a particular task that you may want to accomplish.

You can choose to first go to the "worked examples" as a quick start to working with the ASCII I/O tasks in HIPE, or to jump to individual task-based sections of interest, or to read straight through the chapter from beginning to end.

2.1. Considerations and concepts for working with text files

Points to consider about ASCII I/O. 

  • The best ASCII format to use with HIPE is CSV. HIPE automatically opens comma-separated values (CSV) files with the .csv extension when you double-click them in the Navigator view. Files that use a delimiter character usually take less effort to parse than space-separated files, which can require fine-tuning and configuration of the parser class. See Section 2.14 and the following sections.

  • FITS files are often a better exchange format. Products in HIPE are easily exported to FITS files, which can be read back into HIPE with their metadata and history preserved. Apart from Spectrum products, there is no general way to save a data product or a product context to text files. If you need to save a product to file, save it in FITS format. See Section 1.16.1 for more details.

  • The ASCII I/O tasks work, in general, with table datasets. Table datasets are by far the most common data structure for Herschel data. Any Herschel data product is ultimately a collection of table datasets. There are dedicated HIPE tasks to exchange data in table dataset form with text files. See the next sections for details.

    For more information on table datasets, see the Scripting Guide: Section 2.4, “Table datasets”.

    Tip

    You can save table datasets directly to FITS format. This is the recommended way to save table datasets to file. See Section 1.16.1 for more details.

  • The ASCII I/O tasks are tools that often require manual configuration. Aside from a few automatically supported formats, the tasks need some setup to handle all the ways data can be laid out in text files. To set up the full column information in a table dataset, such as name, unit, type and description, you will typically have to do some configuration on the command line.

    The ASCII I/O tasks do not automatically detect the format of the data in a text file, with the exception of certain .csv (comma-separated-values) and .tbl (space-separated) files.

    Tip

    In the Navigator view of HIPE, you can double-click on files ending in .csv or .tbl, to read these in as, respectively, comma-separated-value or space-separated tables.

  • Spectra have their own dedicated task for writing to text files. There is a dedicated exportSpectrumToAscii task for exporting spectra to text files. This task accepts as input all the most common data types describing spectra in HIPE, including Spectrum1d, Spectrum2d and SpectralSimpleCube. An example of using the exportSpectrumToAscii task is given in Section 2.2. For more information on the exportSpectrumToAscii task, see Section 2.12.

  • Jython in HIPE includes a rich set of functionality for handling text files. There are different ways to exchange data with text files, depending on the type of data you want to exchange:

    • Jython lists, tuples and dictionaries. You can write these data structures to file using Jython commands, as explained in the Scripting Guide: Section 1.25, “Writing numeric values to file”.

      Note that Herschel data is never distributed as plain Jython data structures, so it is unlikely you will have to write them to file.

      For more information on lists, dictionaries and tuples, see the Scripting Guide: Section 1.10, “Lists, dictionaries and tuples”.

    • Numeric arrays, such as Double1d. You can wrap 1-dimensional numeric arrays into a table dataset and write the table dataset to file, as explained later in this chapter. Assuming you have a Double1d array called myArray, this is how you create a table dataset containing it:

      myTableDataset = TableDataset()
      myTableDataset["myColumn"] = Column(myArray)

      Example 2.1. Creating a TableDataset with a Column made up of array data.

      Numeric arrays can be written to a file using the print statement. Consider two Double1d arrays named wavelength and flux, of equal length:

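      # Two example arrays of equal length
      wavelength = Double1d(5, 1.0)
      flux = Double1d(5, 1.0)
      fh = open('myspectrum.txt', 'w')
      # Write one wavelength/flux pair per line, each value in a 13-character field
      for i in range(len(wavelength)):
          print >> fh, '%13.6f %13.6f' % (wavelength[i], flux[i])
      fh.close()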

      Example 2.2. Write two numeric arrays to a file, one pair of values per line.

      For more information about formatting strings and writing them to file, see the Scripting Guide, Section 1.8, “Formatting strings” and Section 1.23, “Writing strings to file”, respectively.

      You can read back the values as follows:

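      fh = open('myspectrum.txt')
      lines = fh.readlines()
      fh.close()
      # Start from empty arrays and append one value per line
      wave = Double1d()
      fl = Double1d()
      for line in lines:
          lsplit = line.split()  # split the line on whitespace into tokens
          wave.append(float(lsplit[0]))
          fl.append(float(lsplit[1]))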

      Example 2.3. Read the numeric arrays back from the file, splitting each line into tokens.

      For more information on Numeric arrays, see the Scripting Guide: Section 2.2, “Numeric arrays”.
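
The advice to prefer delimited files over space-separated ones can be illustrated in plain Jython, without any HIPE classes. The sketch below assumes a small made-up three-column table in which one value is missing, and splits the same rows both ways: the csv module keeps the empty field in place, while splitting on whitespace silently drops it, so the remaining value ends up in the wrong column. The sample data and the function names split_csv and split_spaces are hypothetical and only illustrate the problem; this is not how the ASCII I/O tasks are implemented.

```python
import csv

# Hypothetical sample data: the same table (name, wavelength, flux)
# with the wavelength of the second data row missing.
CSV_LINES = [
    "name,wavelength,flux",
    "lineA,63.18,12.5",
    "lineB,,7.25",
]
SPACE_LINES = [
    "name wavelength flux",
    "lineA 63.18 12.5",
    "lineB  7.25",
]


def split_csv(lines):
    """Split comma-delimited lines; an empty field is kept as ''."""
    return list(csv.reader(lines))


def split_spaces(lines):
    """Split on runs of whitespace; an empty field simply disappears."""
    return [line.split() for line in lines]


csv_rows = split_csv(CSV_LINES)
space_rows = split_spaces(SPACE_LINES)
# csv_rows[2]   -> ['lineB', '', '7.25']  (flux stays in the third column)
# space_rows[2] -> ['lineB', '7.25']      (flux would be read as wavelength)
```

With a delimiter, every row keeps the same number of fields even when a value is missing; with whitespace alone, the parser has to be told more about the layout to recover the columns.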

Concepts in working with the ASCII I/O tasks. There are several concepts and terms that you need to know to work with the full functionality of the ASCII I/O tasks.

  • Parsers. A parser defines rules to read a text file into HIPE. See Section 2.26 for the available types of parser, their features and how to configure them.

  • Formatters. A formatter defines rules to write data from HIPE into a text file. See Section 2.27 for the available types of formatter, their features and how to configure them.

  • Table templates. A table template describes the data to be read from, or written to, a text file. It defines the number of columns in the file, their names, and the type and description of the data in each column. While the parser defines general formatting rules, such as the character used to separate data values, the table template describes the data themselves. See Section 2.25 for how to create and configure a table template.

  • Configuration files. You can use a configuration file to store a particular configuration of the tasks for reading and writing text files. You can then load the configuration file for subsequent executions of the task. See Section 2.18 and Section 2.23 for instructions.

  • Delimiters. A delimiter is a character that denotes a boundary between fields in a text file. The most common delimiter is a comma. For more information on specifying delimiters, see Section 2.17.

  • Regular expressions. A regular expression is a concise and flexible means to match strings of text, such as particular characters or patterns of characters. Regular expressions are used to specify which lines of a file to skip, as discussed in Section 2.15, and with the RegexParser for specifying the delimiter between data fields (for example, to specify multiple spaces or tabs). A discussion of regular expressions is outside the scope of this manual, but Section 2.28 contains a few examples.
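
How these concepts divide the work can be sketched in plain Jython. In the sketch below, one regular expression marks lines to skip (here, lines starting with #), a second regular expression treats any run of spaces or tabs as a single delimiter, and a short list of (name, type) pairs plays the role of a table template by naming the columns and converting each value. This is a hypothetical illustration of the concepts only, not the HIPE parser or template API; the sample lines and the names SKIP, DELIMITER, TEMPLATE and parse_lines are assumptions made for this example.

```python
import re

# Hypothetical file content: a comment line, then rows whose values are
# separated by a mix of spaces and tabs.
LINES = [
    "# wavelength flux  (comment line, to be skipped)",
    "63.18   12.5",
    "145.5\t3.0",
    "157.7 \t 41.25",
]

# Lines to skip: optional leading blanks followed by '#'.
SKIP = re.compile(r"^\s*#")
# Delimiter: one or more spaces or tabs.
DELIMITER = re.compile(r"[ \t]+")
# Stand-in for a table template: column names and value types.
TEMPLATE = [("wavelength", float), ("flux", float)]


def parse_lines(lines, skip=SKIP, delimiter=DELIMITER, template=TEMPLATE):
    """Return a dict mapping each column name to a list of converted values."""
    columns = dict((name, []) for name, _ in template)
    for line in lines:
        if skip.match(line) or not line.strip():
            continue  # comment or blank line
        fields = delimiter.split(line.strip())
        if len(fields) != len(template):
            raise ValueError("expected %d fields, got %d in line %r"
                             % (len(template), len(fields), line))
        for (name, convert), field in zip(template, fields):
            columns[name].append(convert(field))
    return columns


table = parse_lines(LINES)
# table["wavelength"] -> [63.18, 145.5, 157.7]
# table["flux"]       -> [12.5, 3.0, 41.25]
```

Keeping the skip rule, the delimiter and the column description separate means each can be changed on its own, for example to read a file with a different comment prefix but the same columns.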