Supported data files

Data file support in Tab2MAGE can be divided into two categories: Affymetrix data files, and everything else. Non-Affymetrix data must be supplied as plain ASCII tab-delimited text, and the column headings must be left intact. Ideally, you should be able to use the raw unprocessed data files from any of the supported software types without having to edit the data files in any way. Tab2MAGE uses the column headings within the file to identify which kind of file it is dealing with, and what the quantitation types are. The script will reformat recognized file types into MetaColumn-MetaRow format, and then strip the feature coordinates (or reporter identifiers) and the column headings, in the process generating the necessary MAGE objects to describe the data.
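As a rough illustration of that last step, the Python sketch below splits a MetaColumn/MetaRow tab-delimited file into feature coordinates, quantitation type names and the bare data values. It is not Tab2MAGE's own code. The helper name `split_metacolumn_file` is hypothetical, and it assumes that the first line is the heading row and that every column other than the four coordinate columns holds a quantitation type.

```python
import csv
import io

# Coordinate headings of the generic MetaColumn/MetaRow format.
COORDINATE_HEADINGS = ("MetaColumn", "MetaRow", "Column", "Row")


def split_metacolumn_file(text):
    """Split tab-delimited MetaColumn/MetaRow text into
    (coordinates, quantitation_types, data_rows).

    Hypothetical helper: coordinate columns are located by name and every
    other column is assumed to be a quantitation type.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    coord_idx = [header.index(h) for h in COORDINATE_HEADINGS]
    qt_idx = [i for i in range(len(header)) if i not in coord_idx]

    quantitation_types = [header[i] for i in qt_idx]
    # One (MetaColumn, MetaRow, Column, Row) tuple per feature.
    coordinates = [tuple(int(r[i]) for i in coord_idx) for r in body]
    # The remaining values, with headings and coordinates stripped.
    data_rows = [[r[i] for i in qt_idx] for r in body]
    return coordinates, quantitation_types, data_rows
```

Locating the coordinate columns by name rather than by position means the sketch does not depend on column order.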

Unprocessed (raw) data files

The following list gives a brief overview of how Tab2MAGE recognizes different file formats. In each case, the data file row containing the column headings is identified by matching it to these sets of known column headings.

Generic
MetaColumn/MetaRow format files are recognized using the following column headings:

MetaColumn

MetaRow

Column

Row

Affymetrix
Tab2MAGE recognizes and parses CEL and EXP files in both the older GDAC formats and the newer GCOS/XDA formats. These formats are detected by the Affymetrix data file parser included in the Tab2MAGE package. See below for notes on Affymetrix normalized data file formats.

GenePix
GenePix format files are recognized using the following column headings:

Block

Column

Row

Agilent
A file containing these headings is recognized as an Agilent format file:

Row

Col

PositionX

PositionY

ScanAlyze
The following column headings are recognized as being from a ScanAlyze format file:

GRID

COL

ROW

LEFT

TOP

RIGHT

BOT

ScanArray/QuantArray
ScanArray Express files are recognized by the following headings:

Array Column

Array Row

Spot Column

Spot Row

while the older QuantArray format has these headings:

Array Column

Array Row

Column

Row

ArrayVision
The following column headings are recognized as indicating an ArrayVision format file:

Primary

Secondary

Newer "lg2" ArrayVision files are identified by the following column headings:

Spot labels

Spotfinder
Spotfinder files are recognized by the following column headings:

BlueFuse
A file containing the following headings is recognized as a BlueFuse file:

COL

ROW

SUBGRIDCOL

SUBGRIDROW

UCSF Spot
UCSF Spot files are recognized by the following column headings:

Arr-colx

Arr-coly

Spot-colx

Spot-coly

NimbleScan
NimbleScan files (Feature, Probe and Pair) all contain the following heading:

PROBE_ID

Applied Biosystems
Files generated by Applied Biosystems software have the following headings:

Probe_ID

Gene_ID

CodeLink
CodeLink Expression Analysis files are identified using the following:

Logical_row

Logical_col

Center_X

Center_Y

ImaGene
ImaGene files are recognized using the following columns:

Meta Column

Meta Row

Column

Row

Field

Gene ID

The ImaGene 3.0 format is also supported:

Meta_col

Meta_row

Sub_col

Sub_row

Name

Selected

CSIRO Spot
CSIRO Spot files contain the following columns:

grid_c

grid_r

spot_c

spot_r

indexs

Obviously, this method of determining which file type is being processed is not infallible. You are therefore encouraged to test your data files with Tab2MAGE and report any problems to .
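The heading-matching approach described above can be sketched in Python as follows. This is an illustration only, not the Tab2MAGE implementation: the signature table is transcribed from the list above (Affymetrix and Spotfinder are omitted), and case-sensitive matching, quote stripping and the "largest signature first" tie-breaking rule are all assumptions.

```python
# Heading signatures transcribed from the list above. Matching is assumed
# to be case-sensitive; Affymetrix (binary parser) and Spotfinder are omitted.
SIGNATURES = {
    "ScanAlyze": {"GRID", "COL", "ROW", "LEFT", "TOP", "RIGHT", "BOT"},
    "ImaGene": {"Meta Column", "Meta Row", "Column", "Row", "Field", "Gene ID"},
    "ImaGene 3.0": {"Meta_col", "Meta_row", "Sub_col", "Sub_row", "Name", "Selected"},
    "CSIRO Spot": {"grid_c", "grid_r", "spot_c", "spot_r", "indexs"},
    "Generic": {"MetaColumn", "MetaRow", "Column", "Row"},
    "Agilent": {"Row", "Col", "PositionX", "PositionY"},
    "ScanArray Express": {"Array Column", "Array Row", "Spot Column", "Spot Row"},
    "QuantArray": {"Array Column", "Array Row", "Column", "Row"},
    "BlueFuse": {"COL", "ROW", "SUBGRIDCOL", "SUBGRIDROW"},
    "UCSF Spot": {"Arr-colx", "Arr-coly", "Spot-colx", "Spot-coly"},
    "CodeLink": {"Logical_row", "Logical_col", "Center_X", "Center_Y"},
    "GenePix": {"Block", "Column", "Row"},
    "ArrayVision": {"Primary", "Secondary"},
    "Applied Biosystems": {"Probe_ID", "Gene_ID"},
    "ArrayVision lg2": {"Spot labels"},
    "NimbleScan": {"PROBE_ID"},
}


def detect_format(lines):
    """Return (format_name, heading_line_index) for the first line whose
    tab-separated fields include every heading of some signature, else None.

    Larger signatures are tried first so that a specific format is not
    mistaken for a more general one (e.g. ImaGene versus GenePix).
    """
    ordered = sorted(SIGNATURES.items(), key=lambda kv: len(kv[1]), reverse=True)
    for index, line in enumerate(lines):
        # Strip surrounding quotes, which some exporters put around headings.
        fields = {f.strip().strip('"') for f in line.rstrip("\r\n").split("\t")}
        for name, required in ordered:
            if required <= fields:
                return name, index
    return None
```

Because the function scans line by line, any preamble lines above the heading row (as in GenePix files) are simply skipped until a signature matches.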


Normalized data files

Normalized data files may be submitted in any of the above formats. In addition, files may be parsed using a number of special column headings which can be used to designate a column containing reporter or composite sequence identifiers:

Generic normalized data
If you have normalized data mapped to the identifiers used in your array design, you can simply use a single column containing those identifiers to include your data in the final MAGE-ML. Tab2MAGE supports the use of either Reporter Identifiers or CompositeSequence Identifiers for this purpose. Please see the ADF help notes for a discussion of these identifier types. Thus, either of the following sets of column headings may be used:

Reporter Identifier	<QT1>	<QT2>	<QT3>
CompositeSequence Identifier	<QT1>	<QT2>	<QT3>

where <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).

Affymetrix normalized data
Tab2MAGE recognizes and parses CHP files in both the older GDAC formats and the newer GCOS/XDA formats. Affymetrix data normalized by non-Affymetrix methods (e.g. RMA normalization) can also be parsed. For such files you may use the CompositeSequence Identifier heading described above, or either of the following sets of column headings:

ProbeSet ID	<QT1>	<QT2>	<QT3>
ProbeSet Name	<QT1>	<QT2>	<QT3>

Again, <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).
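A minimal Python sketch of how the identifier headings listed in the generic and Affymetrix normalized data subsections above could be recognized is shown below. The function name is hypothetical, and treating the identifier as the first column is an assumption borrowed from the final data matrix rules, not a documented requirement for normalized files.

```python
# Identifier headings accepted for normalized data (generic and Affymetrix).
IDENTIFIER_HEADINGS = (
    "Reporter Identifier",
    "CompositeSequence Identifier",
    "ProbeSet ID",
    "ProbeSet Name",
)


def parse_normalized_header(header_line):
    """Split a normalized-data heading row into (identifier_heading, qt_names).

    Hypothetical helper; assumes the identifier column comes first and that
    every remaining heading names a quantitation type.
    """
    fields = header_line.rstrip("\r\n").split("\t")
    if fields[0] not in IDENTIFIER_HEADINGS:
        raise ValueError(f"unrecognized identifier heading: {fields[0]!r}")
    return fields[0], fields[1:]
```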


Final data matrix files

(Tab2MAGE only) Often it is desirable to store the normalized data from all your hybridizations in a single file. This file must then contain information on the quantitation types measured (log ratio, RMA values etc.) and the hybridizations to which these values apply. This information must be encoded in the data file column headings, for example:

Reporter Identifier	RMA(Hyb1)(Hyb2)	RMA(Hyb3)(Hyb4)
A102340	0.147	0.473
A102341	0.53	0.484
A102342	0.169	0.188
A102343	0.742	0.684
A102344	0.479	0.514

Any of the identifier headings described in the normalized data section can be used in the final data matrix file. Please note that the identifier column must be the first column in the file.
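To make the heading convention concrete, the following Python sketch decodes a final data matrix heading row such as the one in the example above, turning `RMA(Hyb1)(Hyb2)` into the quantitation type `RMA` and the hybridizations `Hyb1` and `Hyb2`. The function names are hypothetical, and the sketch assumes that neither quantitation type names nor hybridization names contain parentheses.

```python
import re

# <QT name> followed by one or more (<hybridization name>) groups.
# Assumes neither part contains parentheses.
HEADING_RE = re.compile(r"^([^()]+)((?:\([^()]+\))+)$")


def parse_matrix_heading(heading):
    """'RMA(Hyb1)(Hyb2)' -> ('RMA', ['Hyb1', 'Hyb2'])."""
    match = HEADING_RE.match(heading.strip())
    if match is None:
        raise ValueError(f"not a QT(hyb)... heading: {heading!r}")
    qt_name, hyb_part = match.groups()
    return qt_name.strip(), re.findall(r"\(([^()]+)\)", hyb_part)


def parse_matrix_header(header_line):
    """Return the identifier heading (first column) and the decoded data headings."""
    fields = header_line.rstrip("\r\n").split("\t")
    identifier, data_headings = fields[0], fields[1:]
    return identifier, [parse_matrix_heading(h) for h in data_headings]
```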

In addition, for Tab2MAGE to construct the mapping correctly, the hybridization names must also be included in the annotation spreadsheet, using the "Hybridization" column (in the Hybridization section):

Hybridization section
File[raw]	File[transformed]	Array[accession]	Hybridization
Data1.txt	FinalDataMatrix.txt	A-EXML-1	Hyb1
Data2.txt	FinalDataMatrix.txt	A-EXML-1	Hyb2
Data3.txt	FinalDataMatrix.txt	A-EXML-1	Hyb3
Data4.txt	FinalDataMatrix.txt	A-EXML-1	Hyb4
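The cross-check implied above, that every hybridization named in the matrix headings also appears in the spreadsheet, could be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the Hybridization section is available on its own as tab-delimited text with its heading row first.

```python
import csv
import io


def hybridizations_by_matrix_file(sheet_text, file_column="File[transformed]"):
    """Map each final data matrix file named in the Hybridization section
    to the set of hybridization names listed on the same rows."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(sheet_text), delimiter="\t"):
        mapping.setdefault(row[file_column], set()).add(row["Hybridization"])
    return mapping


def unmapped_hybridizations(matrix_hybs, sheet_hybs):
    """Hybridization names used in matrix headings but absent from the sheet."""
    return sorted(set(matrix_hybs) - set(sheet_hybs))
```

A non-empty result from `unmapped_hybridizations` would indicate a heading that cannot be mapped back to a hybridization.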

Please note: There are currently some limitations imposed by Tab2MAGE when parsing final data matrix files. Firstly, each file must correspond to a single array design. In experiments where multiple array designs have been used, a separate final data matrix file should be included for each array design. For example:

Hybridization section
File[raw]	File[transformed]	Array[accession]	Hybridization
Data1.txt	FinalDataMatrix1.txt	A-EXML-1	Hyb1
Data2.txt	FinalDataMatrix1.txt	A-EXML-1	Hyb2
Data3.txt	FinalDataMatrix2.txt	A-EXML-2	Hyb3
Data4.txt	FinalDataMatrix2.txt	A-EXML-2	Hyb4
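The one-design-per-file rule can be checked before submission with a short Python sketch such as the one below. The function name is hypothetical, and it assumes the Hybridization section is supplied as tab-delimited text with the column headings shown in the example.

```python
import csv
import io


def mixed_design_files(sheet_text):
    """Return {matrix_file: sorted accessions} for every File[transformed]
    value that is associated with more than one Array[accession]."""
    designs = {}
    for row in csv.DictReader(io.StringIO(sheet_text), delimiter="\t"):
        designs.setdefault(row["File[transformed]"], set()).add(row["Array[accession]"])
    return {f: sorted(a) for f, a in designs.items() if len(a) > 1}
```

An empty dictionary means every final data matrix file refers to a single array design.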

Secondly, the columns in the final data matrix are currently mapped to hybridizations, rather than scanning events. As a result, this system does not readily support experiment designs where hybridizations are scanned multiple times, e.g., for quality control purposes. To accurately reflect such treatments, please include your normalized data as one file per scan, and add a "Scan" column to explicitly specify which scan events belong to which hybridization, as in the following example:

Hybridization section
Hybridization	Scan	File[raw]	File[normalized]
Hyb1	Scan1a	Data1a.txt	NormData1a.txt
Hyb1	Scan1b	Data1b.txt	NormData1b.txt
Hyb2	Scan2a	Data2a.txt	NormData2a.txt
Hyb2	Scan2b	Data2b.txt	NormData2b.txt

Please see the MIAMExpress data file help notes (Affymetrix, two-channel) for further discussion of final data matrix files. There is also an example included with the Tab2MAGE package of how to incorporate a final data matrix into your MAGE-ML output.

For MAGE-TAB users: This Tab2MAGE/MIAMExpress final data matrix format is not supported by MAGE-TAB, which defines its own data matrix format (see the MAGE-TAB Data Matrix notes).


Illumina data files

The Illumina BeadStudio software (see www.illumina.com) generates data files in several closely related formats. Tab2MAGE only supports such files reporting Probe-level (PROBE_ID) data, exported from BeadStudio v3 and above, in tab-delimited format. These files are characterised by having PROBE_ID as the first column, with subsequent column headers following the pattern "Hybridization name.QuantitationType name".
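A minimal Python sketch of how such a heading row could be split into per-hybridization columns is given below. It is an illustration, not Tab2MAGE's parser: the function name and the example quantitation type names are hypothetical, and splitting each heading on its last "." is an assumption that tolerates dots in hybridization names but not in quantitation type names.

```python
from collections import defaultdict


def parse_beadstudio_header(header_line):
    """Group the data columns of a probe-level BeadStudio export by hybridization.

    Returns {hybridization: {quantitation_type: column_index}}.
    Assumes the first column is PROBE_ID and each later heading is
    '<hybridization>.<quantitation type>', split on the last '.'.
    """
    fields = header_line.rstrip("\r\n").split("\t")
    if fields[0] != "PROBE_ID":
        raise ValueError("first column must be PROBE_ID")
    by_hyb = defaultdict(dict)
    for col, heading in enumerate(fields[1:], start=1):
        hyb, sep, qt = heading.rpartition(".")
        if not sep:
            raise ValueError(f"heading lacks a '.': {heading!r}")
        by_hyb[hyb][qt] = col
    return dict(by_hyb)
```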

Illumina have recently released the 'ArrayExpress Data Submission Report Plug-in' for BeadStudio which will generate a raw data file, plus a basic Tab2MAGE spreadsheet for you to fill in with your experiment and sample information. This plug-in is available from the downloads section of the Illumina iCom website.

BeadStudio files may contain data columns arranged per sample, or per group of samples (so-called "Grouped data"). In contrast to the other scanning software packages noted above, a single Illumina file may contain data from one or more hybridizations.

Sample-level data is supported in "File[raw]", "File[normalized]" or "File[transformed]" columns of the Tab2MAGE spreadsheet Hybridization section. The file name should be included in all of the rows corresponding to the hybridizations it covers. The array designator for each hybridization (found in the data file column headings) must be entered into the "Hybridization" column of the spreadsheet in each appropriate row.

Grouped data is supported only as "File[normalized]" or "File[transformed]". The column headings in these data files contain user-defined tags for each group, and these tags must be included in the "Normalization" column of the spreadsheet. This allows the Tab2MAGE software to map the grouped data back to the original hybridizations.
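The mapping from group tags back to hybridizations could be sketched in Python as below. The function name and the group tags in the test data are hypothetical, and the sketch assumes the Hybridization section is available as tab-delimited text with "Hybridization" and "Normalization" columns.

```python
import csv
import io
from collections import defaultdict


def hybridizations_by_group(sheet_text):
    """Map each group tag in the 'Normalization' column to the
    hybridizations listed on the same spreadsheet rows, in sheet order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(sheet_text), delimiter="\t"):
        groups[row["Normalization"]].append(row["Hybridization"])
    return dict(groups)
```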

There is one further data file type exported by BeadStudio which is not yet supported by Tab2MAGE: the "differential expression" file. This file contains either grouped or ungrouped data reported at the gene level, but with extra columns reporting on "DiffScore" and "Concordance" between samples or groups. Such data need to be split into multiple data cubes when encoded in MAGE, and Tab2MAGE does not currently support this.

An example of a Tab2MAGE spreadsheet describing sample data at the probe level and grouped data at the gene level can be found in the examples section.
