Supported data files

Data file support in Tab2MAGE can be divided into two categories: Affymetrix data files, and everything else. Non-Affymetrix data must be supplied as plain ASCII tab-delimited text, and the column headings must be left intact. Ideally, you should be able to use the raw unprocessed data files from any of the supported software types without having to edit the data files in any way. Tab2MAGE uses the column headings within the file to identify which kind of file it is dealing with, and what the quantitation types are. The script will reformat recognized file types into MetaColumn-MetaRow format, and then strip the feature coordinates (or reporter identifiers) and the column headings, in the process generating the necessary MAGE objects to describe the data.

The following formats are supported:

Unprocessed (raw) data files

The following list gives a brief overview of how Tab2MAGE recognizes different file formats. In each case, the data file row containing the column headings is identified by matching it to these sets of known column headings.

Generic
MetaColumn/MetaRow format files are recognized using the following column headings:

MetaColumn    MetaRow    Column    Row

Affymetrix
Tab2MAGE recognizes and parses CEL and EXP files using both the old GDAC formats and the newer GCOS/XDA formats. These file formats are detected using the Affymetrix data file parser incorporated into the Tab2MAGE package. See below for notes on Affymetrix normalized data file formats.

GenePix
GenePix format files are recognized using the following column headings:

Block    Column    Row    X    Y

Agilent
A file containing these headings is recognized as an Agilent format file:

Row    Col    PositionX    PositionY

ScanAlyze
The following column headings are recognized as being from a ScanAlyze format file:

GRID    COL    ROW    LEFT    TOP    RIGHT    BOT

ScanArray/QuantArray
ScanArray Express files are recognized from the following headings:

Array Column    Array Row    Spot Column    Spot Row    X    Y

while the older QuantArray format has these headings:

Array Column    Array Row    Column    Row

ArrayVision
The following column headings are recognized as indicating an ArrayVision format file:

Primary    Secondary

Newer "lg2" ArrayVision files are identified by the following column heading:

Spot labels

Spotfinder
Spotfinder files are recognized by the following column headings:

MC    MR    SC    SR

BlueFuse
A file containing the following headings is recognized as a BlueFuse file:

COL    ROW    SUBGRIDCOL    SUBGRIDROW

UCSF Spot
UCSF Spot files are recognized by the following column headings:

Arr-colx    Arr-coly    Spot-colx    Spot-coly

NimbleScan
NimbleScan files (Feature, Probe and Pair) all contain the following headings:

PROBE_ID    X    Y

Applied Biosystems
Files generated by Applied Biosystems software have the following headings:

Probe_ID    Gene_ID

CodeLink
CodeLink Expression Analysis files are identified using the following:

Logical_row    Logical_col    Center_X    Center_Y

ImaGene
ImaGene files are recognized using the following columns:

Meta Column    Meta Row    Column    Row    Field    Gene ID

The ImaGene 3.0 format is also supported:

Meta_col    Meta_row    Sub_col    Sub_row    Name    Selected

CSIRO Spot
CSIRO Spot files contain the following columns:

grid_c    grid_r    spot_c    spot_r    indexs

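This method of determining which file type is being processed is not infallible: a file from an unsupported package that happens to contain one of these sets of headings will be misidentified, and a supported file whose headings have been edited will not be recognized. You are therefore encouraged to test your data files with Tab2MAGE and to report any problems you encounter.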
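The heading-matching approach described in this section can be sketched in Python as follows. This is a minimal illustration and not Tab2MAGE's own code: the function name detect_format, the KNOWN_FORMATS table (a subset of the formats listed above) and the use of exact, case-sensitive matching are all assumptions made for the example.

```python
# Heading sets copied from the list above. The dictionary layout and the
# function name are illustrative assumptions, not Tab2MAGE internals.
KNOWN_FORMATS = {
    "Generic": {"MetaColumn", "MetaRow", "Column", "Row"},
    "GenePix": {"Block", "Column", "Row", "X", "Y"},
    "Agilent": {"Row", "Col", "PositionX", "PositionY"},
    "ScanAlyze": {"GRID", "COL", "ROW", "LEFT", "TOP", "RIGHT", "BOT"},
    "BlueFuse": {"COL", "ROW", "SUBGRIDCOL", "SUBGRIDROW"},
}


def detect_format(lines):
    """Return (format_name, row_index) for the first row whose tab-separated
    cells include every heading of a known format, or None if no row matches.

    Matching is exact and case-sensitive (an assumption of this sketch).
    """
    for index, line in enumerate(lines):
        cells = {cell.strip() for cell in line.rstrip("\n").split("\t")}
        for name, headings in KNOWN_FORMATS.items():
            if headings <= cells:  # all required headings are present
                return name, index
    return None
```

Because the search scans every row, any preamble lines that precede the column headings are skipped automatically, and the returned row index marks where the data table begins.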


Normalized data files

Normalized data files may be submitted in any of the above formats. In addition, Tab2MAGE recognizes several special column headings that designate a column containing reporter or composite sequence identifiers:

Generic normalized data
If you have normalized data mapped to the identifiers used in your array design, you can simply use a single column containing those identifiers to include your data in the final MAGE-ML. Tab2MAGE supports the use of either Reporter Identifiers or CompositeSequence Identifiers for this purpose. Please see the ADF help notes for a discussion of these identifier types. Either of the following sets of column headers may be used:

Reporter Identifier    <QT1>    <QT2>    <QT3>

CompositeSequence Identifier    <QT1>    <QT2>    <QT3>

where <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).

Affymetrix normalized data
Tab2MAGE recognizes and parses CHP files using both the old GDAC formats and the newer GCOS/XDA formats. In addition, Affymetrix data normalized by non-Affymetrix methods (e.g. RMA normalization) can be parsed. Such files may use the CompositeSequence Identifier heading (above) or either of the following sets of column headers:

ProbeSet ID    <QT1>    <QT2>    <QT3>

ProbeSet Name    <QT1>    <QT2>    <QT3>

Again, <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).
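As a rough illustration of the layouts described in this section, the Python sketch below reads a tab-delimited normalized data file that has one identifier column and any number of quantitation type columns. The function name read_normalized and the sample values are hypothetical, and the sketch assumes that every data value is numeric; it is not a description of how Tab2MAGE itself parses these files.

```python
import csv
import io

# Identifier headings named in this section.
IDENTIFIER_HEADINGS = {
    "Reporter Identifier",
    "CompositeSequence Identifier",
    "ProbeSet ID",
    "ProbeSet Name",
}


def read_normalized(handle):
    """Return (identifier_heading, quantitation_types, rows).

    rows maps each identifier to a {quantitation_type: value} dictionary.
    Assumes exactly one identifier column and numeric data values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = [cell.strip() for cell in next(reader)]
    matches = [h for h in header if h in IDENTIFIER_HEADINGS]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one identifier column, found %d" % len(matches)
        )
    id_index = header.index(matches[0])
    qts = [h for i, h in enumerate(header) if i != id_index]
    rows = {}
    for record in reader:
        if not record:  # skip blank lines
            continue
        values = [v for i, v in enumerate(record) if i != id_index]
        rows[record[id_index]] = dict(zip(qts, (float(v) for v in values)))
    return matches[0], qts, rows


# Made-up sample data, for illustration only.
SAMPLE = "ProbeSet ID\tRMA\n1007_s_at\t8.5\n1053_at\t6.25\n"
id_heading, qts, rows = read_normalized(io.StringIO(SAMPLE))
```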


Final data matrix files

(Tab2MAGE only) Often it is desirable to store the normalized data from all your hybridizations in a single file. This file must then contain information on the quantitation types measured (log ratio, RMA values etc.) and the hybridizations to which these values apply. This information must be encoded in the data file column headings, for example:

Reporter Identifier    RMA(Hyb1)(Hyb2)    RMA(Hyb3)(Hyb4)
A102340                0.147              0.473
A102341                0.53               0.484
A102342                0.169              0.188
A102343                0.742              0.684
A102344                0.479              0.514

Any of the identifier headings described in the normalized data section can be used in the final data matrix file. Please note that the identifier column must be the first column in the file.
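The column heading convention used in the example above (a quantitation type name followed by one or more hybridization names in parentheses) could be decoded along the following lines. This Python sketch infers the heading grammar from that single example, so treat it as an assumption rather than a specification; in particular it assumes that neither quantitation type names nor hybridization names contain parentheses. The function names are hypothetical.

```python
import re

# "<quantitation type>(<hyb>)(<hyb>)..." as inferred from the example above.
HEADING_RE = re.compile(r"^(?P<qt>[^()]+)(?P<hybs>(?:\([^()]+\))+)$")
HYB_RE = re.compile(r"\(([^()]+)\)")


def parse_matrix_heading(heading):
    """Split one data column heading into (quantitation_type, [hyb names])."""
    match = HEADING_RE.match(heading.strip())
    if match is None:
        raise ValueError("unrecognized data matrix heading: %r" % heading)
    return match.group("qt").strip(), HYB_RE.findall(match.group("hybs"))


def parse_matrix_header(line):
    """Return (identifier_heading, [(quantitation_type, [hyb names]), ...]).

    The identifier column is taken to be the first column, as required above.
    """
    cells = line.rstrip("\n").split("\t")
    return cells[0], [parse_matrix_heading(cell) for cell in cells[1:]]
```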

In addition, for Tab2MAGE to construct the mapping correctly, the hybridization names must also be included in the annotation spreadsheet, using the "Hybridization" column (in the Hybridization section):

Hybridization section
File[raw]    File[transformed]      Array[accession]    Hybridization
Data1.txt    FinalDataMatrix.txt    A-EXML-1            Hyb1
Data2.txt    FinalDataMatrix.txt    A-EXML-1            Hyb2
Data3.txt    FinalDataMatrix.txt    A-EXML-1            Hyb3
Data4.txt    FinalDataMatrix.txt    A-EXML-1            Hyb4
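
Before running Tab2MAGE it can be useful to confirm that every hybridization name used in the data matrix headings also appears in the spreadsheet. The short Python sketch below shows one way to do this. It assumes the spreadsheet rows have already been read into dictionaries keyed by column heading; the function name and that data layout are hypothetical and are not part of Tab2MAGE.

```python
def undeclared_hybridizations(matrix_hybs, spreadsheet_rows):
    """Return the sorted hybridization names that occur in the data matrix
    headings but not in the spreadsheet's "Hybridization" column."""
    declared = {row["Hybridization"] for row in spreadsheet_rows}
    return sorted(set(matrix_hybs) - declared)


# Rows mirroring the Hybridization section example above.
EXAMPLE_ROWS = [
    {"File[raw]": "Data%d.txt" % n,
     "File[transformed]": "FinalDataMatrix.txt",
     "Array[accession]": "A-EXML-1",
     "Hybridization": "Hyb%d" % n}
    for n in range(1, 5)
]
```

An empty result means that every data matrix column can be mapped to a declared hybridization.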

Please note: There are currently some limitations imposed by Tab2MAGE when parsing final data matrix files. Firstly, each file must correspond to a single array design. In experiments where multiple array designs have been used, separate final data matrix files should be included for each. For example:

Hybridization section
File[raw]    File[transformed]       Array[accession]    Hybridization
Data1.txt    FinalDataMatrix1.txt    A-EXML-1            Hyb1
Data2.txt    FinalDataMatrix1.txt    A-EXML-1            Hyb2
Data3.txt    FinalDataMatrix2.txt    A-EXML-2            Hyb3
Data4.txt    FinalDataMatrix2.txt    A-EXML-2            Hyb4
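
The one-array-design-per-file limitation can be checked with a few lines of Python. As a sketch only, the function below assumes that the Hybridization section has been loaded as a list of dictionaries keyed by column heading; that representation and the function name are hypothetical.

```python
from collections import defaultdict


def files_spanning_designs(rows):
    """Return {file name: sorted array accessions} for every final data
    matrix file that is associated with more than one array design."""
    designs = defaultdict(set)
    for row in rows:
        designs[row["File[transformed]"]].add(row["Array[accession]"])
    return {name: sorted(accs) for name, accs in designs.items() if len(accs) > 1}
```

An empty dictionary indicates that each final data matrix file corresponds to a single array design, as in the example above.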

Secondly, the columns in the final data matrix are currently mapped to hybridizations, rather than scanning events. As a result, this system does not readily support experiment designs where hybridizations are scanned multiple times, e.g., for quality control purposes. To accurately reflect such treatments, please include your normalized data as one file per scan, and add a "Scan" column to explicitly specify which scan events belong to which hybridization, as in the following example:

Hybridization section
Hybridization    Scan      File[raw]     File[normalized]
Hyb1             Scan1a    Data1a.txt    NormData1a.txt
Hyb1             Scan1b    Data1b.txt    NormData1b.txt
Hyb2             Scan2a    Data2a.txt    NormData2a.txt
Hyb2             Scan2b    Data2b.txt    NormData2b.txt

Please see the MIAMExpress data file help notes (Affymetrix, two-channel) for further discussion of final data matrix files. There is also an example included with the Tab2MAGE package of how to incorporate a final data matrix into your MAGE-ML output.

For MAGE-TAB users: This Tab2MAGE/MIAMExpress final data matrix format is not supported by MAGE-TAB, which defines its own data matrix format (see the MAGE-TAB Data Matrix notes).


Illumina data files

The Illumina BeadStudio software (see www.illumina.com) generates data files in several closely related formats. Of these, Tab2MAGE supports only files that report probe-level (PROBE_ID) data, exported from BeadStudio v3 or later in tab-delimited format. These files are characterised by having PROBE_ID as the first column, with subsequent column headers following the pattern "Hybridization name.QuantitationType name".
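
The "Hybridization name.QuantitationType name" pattern can be unpacked as in the Python sketch below. Two assumptions are made that the text above does not state: headings are split at the last period, so that hybridization names may themselves contain periods, and any heading without a period is skipped as an annotation column. The function name and the sample heading values used in testing are hypothetical.

```python
def parse_illumina_header(line):
    """Map each hybridization name to {quantitation_type: column index}.

    Assumes PROBE_ID is the first column and that data headings follow
    "<hybridization>.<quantitation type>", split at the last period.
    """
    cells = line.rstrip("\n").split("\t")
    if cells[0] != "PROBE_ID":
        raise ValueError("first column must be PROBE_ID")
    columns = {}
    for index, heading in enumerate(cells[1:], start=1):
        hyb, sep, qt = heading.rpartition(".")
        if not (sep and hyb and qt):
            continue  # not a data column under this sketch's assumptions
        columns.setdefault(hyb, {})[qt] = index
    return columns
```

The hybridization names returned as keys are the values that would need to match the "Hybridization" column of the spreadsheet.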

Illumina have recently released the 'ArrayExpress Data Submission Report Plug-in' for BeadStudio which will generate a raw data file, plus a basic Tab2MAGE spreadsheet for you to fill in with your experiment and sample information. This plug-in is available from the downloads section of the Illumina iCom website.

BeadStudio files may contain data columns arranged per sample, or per group of samples (so-called "Grouped data"). In contrast to the other scanning software packages noted above, whose files each describe a single hybridization, Illumina files may contain data from single or multiple hybridizations.

Sample-level data is supported in "File[raw]", "File[normalized]" or "File[transformed]" columns of the Tab2MAGE spreadsheet Hybridization section. The file name should be included in all of the rows corresponding to the hybridizations it covers. The array designator for each hybridization (found in the data file column headings) must be entered into the "Hybridization" column of the spreadsheet in each appropriate row.

Grouped data is supported only as "File[normalized]" or "File[transformed]". The column headings in these data files contain user-defined tags for each group, and these tags must be included in the "Normalization" column of the spreadsheet. This allows the Tab2MAGE software to map the grouped data back to the original hybridizations.

There is one further data file type exported by BeadStudio, which is not yet supported by Tab2MAGE: the "differential expression" file. This file contains either grouped or ungrouped data reported at the gene level, but with extra columns reporting on "DiffScore" and "Concordance" between samples or groups. Such data need to be split into multiple data cubes when coding in MAGE, and it is this capability which Tab2MAGE does not currently support.

An example of a Tab2MAGE spreadsheet describing sample data at the probe level and grouped data at the gene level can be found in the examples section.



Last modified: Tue Jun 10 15:14:04 BST 2008