Typical usage

This page gives examples of the typical ways in which the scripts in this package are invoked, and addresses some common pitfalls.

Unknown Data File Columns

Once you have created your sample annotation spreadsheet, you are ready to run the tab2mage.pl script. In an ideal world, you could simply invoke the following command:

tab2mage.pl -e <spreadsheet filename> -t <target directory to be created>

This command should be executed from within the directory containing the data files and the annotation spreadsheet.

In reality, the situation is more complicated. Manufacturers of scanning and image processing software change the format of their output data files from time to time, most obviously by altering the column headings (quantitation types) within a data file. Data files may also have been modified in subsequent analyses, adding customized columns. This creates a problem, because accurate encoding in MAGE-ML requires more information about each quantitation type than just its name.

Tab2MAGE stores this information in a file named QT_list.txt within the installed library directory. This file contains the information gathered so far by ArrayExpress curators, and an updated version is included with each Tab2MAGE release as more information becomes available. Even so, it is entirely possible that your data files will contain unknown quantitation types.

The default action for Tab2MAGE is to ignore unknown quantitation types. The package was designed this way to allow very precise control over which data columns are exported to MAGE-ML and which are discarded. Tab2MAGE also includes a mechanism to customize this behavior, so that a given site installation can define a default set of data columns which are then always exported.

Any quantitation types which are discarded during Tab2MAGE processing will be reported in the tab2mage.log file, created in the target directory. To determine which columns are at risk before running the tab2mage.pl script, we recommend using the experiment checker script in Tab2MAGE mode:

expt_check.pl -e <Tab2MAGE spreadsheet filename>

This script will run some checks on the data files and generate a series of log files. One of these, the expt_columnheadings.log file, lists the unknown columns which would be discarded in a Tab2MAGE run.

For Tab2MAGE to export unknown quantitation types, you must override the built-in quantitation type information using the -q or -Q command-line options:

expt_check.pl -q <new QT file> -e <spreadsheet filename>

tab2mage.pl -q <new QT file> -e <spreadsheet filename> -t <target directory to be created>

Note that the -q option causes the script to ignore the default QT information supplied with the package; your new QT file must therefore include every QT you wish to export. Alternatively, the -Q option uses the new QTs alongside those embedded in the Tab2MAGE package (but see the note below). As a last resort in case of difficulties, the tab2mage.pl script also accepts a -k option, which keeps all the unknown QTs and includes them in the output MAGE-ML as generic SpecializedQuantitationTypes.
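The difference between these options can be pictured as simple set operations on column names. The following Python sketch is only a model of the behavior described above, not code from the Tab2MAGE package: the function name, the mode labels and the example column names are hypothetical, and the sketch ignores the grouping of QTs by software type.

```python
def exported_columns(data_columns, builtin_qts, custom_qts=(), mode="default"):
    """Return the data-file columns that would be kept, in file order.

    mode is a label used by this sketch only:
      "default"  built-in QTs only; unknown columns are discarded
      "q"        the custom QT file replaces the built-in list
      "Q"        the custom QT file is used alongside the built-in list
      "k"        unknown columns are kept as well
    """
    if mode == "k":
        return list(data_columns)
    if mode == "q":
        known = set(custom_qts)
    elif mode == "Q":
        known = set(builtin_qts) | set(custom_qts)
    elif mode == "default":
        known = set(builtin_qts)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Preserve the column order found in the data file.
    return [col for col in data_columns if col in known]


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    columns = ["Flags", "Signal", "MyScore"]
    builtin = {"Flags", "Signal"}
    custom = {"MyScore"}
    for mode in ("default", "q", "Q", "k"):
        print(mode, exported_columns(columns, builtin, custom, mode))
```

In this model the "q" case drops "Flags" and "Signal" because they are absent from the custom list, which is why a QT file passed with -q needs to name every column you want exported.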

The format of the QT file is described in the QT file format notes, and the other_QTs.txt file included with the package provides a set of examples. At its simplest, however, the file can be laid out in this form:

>>>Software name[Manufacturer]
QT name 1
QT name 2
QT name 3
QT name 4
QT name 5
QT name 6...

(This example will create every quantitation type as a SpecializedQuantitationType object.)

For correct creation of the appropriate QuantitationType subclasses, with suitable associations and attributes, please see the QT file format notes.
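If you have many column names to list, the simplest layout shown above can be generated rather than typed by hand. The following Python sketch is one possible way to do so; it is not part of Tab2MAGE, and the function name, software name, manufacturer and column names are all hypothetical.

```python
def format_qt_group(software, manufacturer, qt_names):
    """Return QT-file text for one software group in the simplest layout."""
    lines = [f">>>{software}[{manufacturer}]"]
    seen = set()
    for name in qt_names:
        name = name.strip()
        # Skip blank entries and duplicates, keeping the original order.
        if name and name not in seen:
            seen.add(name)
            lines.append(name)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical software, manufacturer and column names.
    print(format_qt_group("MyScanSoft", "MyCompany",
                          ["Signal Cy3", "Signal Cy5", "Flags"]), end="")
```

The printed text can be saved to a file and supplied to the scripts with the -q or -Q option. Every QT listed this way would be created as a SpecializedQuantitationType.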

Note: scanning software packages often use column headings which are ambiguous in their origin. Some column headings (e.g., "Flags") are used by many different software manufacturers, and this can lead to ambiguities in assigning QTs to software type. As a result, the Tab2MAGE scripts have to assume that there is only one software type per data file. The desired quantitation types for a given data file must all be grouped together under a single software group. This prevents the creation of incorrect quantitation types for ambiguous column headings which are used by more than one software type.
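Before running the scripts, it can be useful to confirm that the columns you want from a data file really do sit together under one software group. The following Python sketch shows one way to check this for a QT file in the simple layout. It is not part of Tab2MAGE; the function names and example data are hypothetical, and it assumes that on fuller QT lines the QT name is the first tab-separated field.

```python
def parse_qt_groups(text):
    """Map each '>>>' header line to the set of QT names listed under it."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">>>"):
            current = line[3:]
            groups.setdefault(current, set())
        elif current is not None:
            # Assumption: the QT name is the first tab-separated field.
            groups[current].add(line.split("\t")[0].strip())
    return groups


def groups_covering(wanted_columns, groups):
    """Return the headers of software groups that list every wanted column."""
    wanted = set(wanted_columns)
    return [header for header, names in groups.items() if wanted <= names]


if __name__ == "__main__":
    # Hypothetical QT file content with an ambiguous "Flags" heading.
    qt_text = ">>>SoftA[X]\nFlags\nSignal\n\n>>>SoftB[Y]\nFlags\nArea\n"
    groups = parse_qt_groups(qt_text)
    print(groups_covering(["Flags", "Signal"], groups))
    print(groups_covering(["Signal", "Area"], groups))
```

An empty result means the wanted columns are split across groups, so they would need to be listed together under a single software heading in your QT file.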
