Data file support in Tab2MAGE can be divided into two categories: Affymetrix data files, and everything else. Non-Affymetrix data must be supplied as plain ASCII tab-delimited text, and the column headings must be left intact. Ideally, you should be able to use the raw unprocessed data files from any of the supported software types without having to edit the data files in any way. Tab2MAGE uses the column headings within the file to identify which kind of file it is dealing with, and what the quantitation types are. The script will reformat recognized file types into MetaColumn-MetaRow format, and then strip the feature coordinates (or reporter identifiers) and the column headings, in the process generating the necessary MAGE objects to describe the data.
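The final reformatting step can be pictured with a short sketch. The Python below uses a hypothetical `split_data_table` helper, not code from Tab2MAGE. It assumes a tab-delimited table that is already in MetaColumn/MetaRow layout, treats the four Generic coordinate headings as feature coordinates, and treats every other heading as a quantitation type name.

```python
import csv
import io

# Assumed coordinate headings: the Generic MetaColumn/MetaRow set.
COORDINATE_HEADINGS = ["MetaColumn", "MetaRow", "Column", "Row"]


def split_data_table(text):
    """Return (quantitation_types, feature_coordinates, values) for a
    tab-delimited table whose heading row is the first line."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Positions of the coordinate columns (raises ValueError if one is missing).
    coord_idx = [header.index(h) for h in COORDINATE_HEADINGS]
    # Every remaining column is assumed to hold one quantitation type.
    qt_idx = [i for i, h in enumerate(header) if h not in COORDINATE_HEADINGS]
    quantitation_types = [header[i] for i in qt_idx]
    coords, values = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        coords.append(tuple(int(row[i]) for i in coord_idx))
        values.append([row[i] for i in qt_idx])
    return quantitation_types, coords, values


sample = (
    "MetaColumn\tMetaRow\tColumn\tRow\tSignal\tBackground\n"
    "1\t1\t1\t1\t512\t40\n"
    "1\t1\t1\t2\t830\t42\n"
)
qts, coords, vals = split_data_table(sample)
print(qts)  # ['Signal', 'Background']
```

Values are kept as strings here; converting them and building the corresponding MAGE objects are outside the scope of the sketch.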
The following list gives a brief overview of the supported formats and of how Tab2MAGE recognizes each one. In each case, the data file row containing the column headings is identified by matching it against the set of known column headings for that format.
Generic
MetaColumn/MetaRow format files are recognized using the following column headings:
MetaColumn | MetaRow | Column | Row |
Affymetrix
Tab2MAGE recognizes and parses CEL and EXP files using both the old GDAC formats and the newer GCOS/XDA formats. These file formats are detected using the Affymetrix data file parser incorporated into the Tab2MAGE package. See below for notes on Affymetrix normalized data file formats.
GenePix
GenePix format files are recognized using the following column headings:
Block | Column | Row | X | Y |
Agilent
A file containing these headings is recognized as an Agilent format file:
Row | Col | PositionX | PositionY |
ScanAlyze
The following column headings are recognized as being from a ScanAlyze format file:
GRID | COL | ROW | LEFT | TOP | RIGHT | BOT |
ScanArray/QuantArray
ScanArray Express files are recognized from the following headings:
Array Column | Array Row | Spot Column | Spot Row | X | Y |
while the older QuantArray format has these headings:
Array Column | Array Row | Column | Row |
ArrayVision
The following column headings are recognized as indicating an ArrayVision format file:
Primary | Secondary |
Newer "lg2" ArrayVision files are identified by the following column headings:
Spot labels |
Spotfinder
Spotfinder files are recognized by the following column headings:
MC | MR | SC | SR |
BlueFuse
A file containing the following headings is recognized as a BlueFuse file:
COL | ROW | SUBGRIDCOL | SUBGRIDROW |
UCSF Spot
UCSF Spot files are recognized by the following column headings:
Arr-colx | Arr-coly | Spot-colx | Spot-coly |
NimbleScan
NimbleScan files (Feature, Probe and Pair) all contain the following headings:
PROBE_ID | X | Y |
Applied Biosystems
Files generated by Applied Biosystems software have the following headings:
Probe_ID | Gene_ID |
CodeLink
CodeLink Expression Analysis files are identified using the following column headings:
Logical_row | Logical_col | Center_X | Center_Y |
ImaGene
ImaGene files are recognized using the following columns:
Meta Column | Meta Row | Column | Row | Field | Gene ID |
The ImaGene 3.0 format is also supported:
Meta_col | Meta_row | Sub_col | Sub_row | Name | Selected |
CSIRO Spot
CSIRO Spot files contain the following columns:
grid_c | grid_r | spot_c | spot_r | indexs |
Obviously, this method of determining which file type is being processed is not infallible. You are therefore encouraged to test your data files with Tab2MAGE and report any problems to .
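Header matching of this kind can be sketched in a few lines. The hypothetical `find_heading_row` function below assumes tab-delimited input, strips surrounding double quotes from cells, and treats a line as the heading row when it contains every heading in at least one of the sets listed above. Affymetrix files are omitted because they are detected by a dedicated parser. The function returns all matching formats rather than the first one, so a file that fits more than one set shows up as ambiguous instead of being silently misclassified.

```python
# Heading sets copied from the list above; a line matches a format when it
# contains every heading in that format's set (extra columns are ignored).
KNOWN_FORMATS = {
    "Generic": {"MetaColumn", "MetaRow", "Column", "Row"},
    "GenePix": {"Block", "Column", "Row", "X", "Y"},
    "Agilent": {"Row", "Col", "PositionX", "PositionY"},
    "ScanAlyze": {"GRID", "COL", "ROW", "LEFT", "TOP", "RIGHT", "BOT"},
    "ScanArray Express": {"Array Column", "Array Row", "Spot Column", "Spot Row", "X", "Y"},
    "QuantArray": {"Array Column", "Array Row", "Column", "Row"},
    "ArrayVision": {"Primary", "Secondary"},
    "ArrayVision lg2": {"Spot labels"},
    "Spotfinder": {"MC", "MR", "SC", "SR"},
    "BlueFuse": {"COL", "ROW", "SUBGRIDCOL", "SUBGRIDROW"},
    "UCSF Spot": {"Arr-colx", "Arr-coly", "Spot-colx", "Spot-coly"},
    "NimbleScan": {"PROBE_ID", "X", "Y"},
    "Applied Biosystems": {"Probe_ID", "Gene_ID"},
    "CodeLink": {"Logical_row", "Logical_col", "Center_X", "Center_Y"},
    "ImaGene": {"Meta Column", "Meta Row", "Column", "Row", "Field", "Gene ID"},
    "ImaGene 3.0": {"Meta_col", "Meta_row", "Sub_col", "Sub_row", "Name", "Selected"},
    "CSIRO Spot": {"grid_c", "grid_r", "spot_c", "spot_r", "indexs"},
}


def find_heading_row(lines):
    """Return (line_index, matching_format_names) for the first line that
    matches at least one known heading set, or None if no line matches."""
    for index, line in enumerate(lines):
        # Trim whitespace (including the newline) and any surrounding quotes.
        cells = {cell.strip().strip('"') for cell in line.split("\t")}
        matches = sorted(
            name for name, required in KNOWN_FORMATS.items() if required <= cells
        )
        if matches:
            return index, matches
    return None


# Illustrative GenePix-style file with two preamble lines before the headings.
lines = [
    "ATF\t1.0",
    '"Type=GenePix Results 3"',
    '"Block"\t"Column"\t"Row"\t"Name"\t"X"\t"Y"',
    "1\t1\t1\tspot1\t100\t200",
]
print(find_heading_row(lines))  # (2, ['GenePix'])
```

A result listing more than one format is a signal to check the file by hand, in the spirit of the advice above.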
Normalized data files may be submitted in any of the above formats. In addition, Tab2MAGE recognizes a number of special column headings that designate a column of reporter or composite sequence identifiers:
Generic normalized data
If you have normalized data mapped to the identifiers used in your array design, you can simply use a single column containing those identifiers to include your data in the final MAGE-ML. Tab2MAGE supports either Reporter identifiers or CompositeSequence identifiers for this purpose; please see the ADF help notes for a discussion of these identifier types. Thus, either of the following sets of column headings may be used:
Reporter Identifier | <QT1> | <QT2> | <QT3> |
CompositeSequence Identifier | <QT1> | <QT2> | <QT3> |
where <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).
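A minimal reading of such a file might look like the following. The `read_normalized` helper is hypothetical. It assumes the identifier column comes first and that every remaining heading is a quantitation type name, and it returns the values unconverted.

```python
import csv
import io

# The two identifier headings described above for generic normalized data.
IDENTIFIER_HEADINGS = ("Reporter Identifier", "CompositeSequence Identifier")


def read_normalized(text):
    """Return (identifier_heading, quantitation_types, {identifier: {qt: value}})."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    id_heading, quantitation_types = header[0], header[1:]
    if id_heading not in IDENTIFIER_HEADINGS:
        raise ValueError(f"unrecognized identifier heading: {id_heading!r}")
    data = {}
    for row in reader:
        if row:  # skip blank lines
            # Pair each value with the quantitation type heading above it.
            data[row[0]] = dict(zip(quantitation_types, row[1:]))
    return id_heading, quantitation_types, data


text = (
    "Reporter Identifier\tLog Ratio\tP Value\n"
    "R001\t0.52\t0.01\n"
    "R002\t-1.2\t0.003\n"
)
id_heading, qts, data = read_normalized(text)
print(data["R002"])  # {'Log Ratio': '-1.2', 'P Value': '0.003'}
```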
Affymetrix normalized data
Tab2MAGE recognizes and parses CHP files using both the old GDAC formats and the newer GCOS/XDA formats. In addition, Affymetrix data normalized by non-Affymetrix methods (e.g. RMA normalization) can be parsed. The identifier column may use the CompositeSequence Identifier heading described above, or one of the following sets of column headings:
ProbeSet ID | <QT1> | <QT2> | <QT3> |
ProbeSet Name | <QT1> | <QT2> | <QT3> |
Again, <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).
(Tab2MAGE only) Often it is desirable to store the normalized data from all your hybridizations in a single file. This file must then contain information on the quantitation types measured (log ratio, RMA values etc.) and the hybridizations to which these values apply. This information must be encoded in the data file column headings, for example:
Reporter Identifier | RMA(Hyb1)(Hyb2) | RMA(Hyb3)(Hyb4) |
A102340 | 0.147 | 0.473 |
A102341 | 0.53 | 0.484 |
A102342 | 0.169 | 0.188 |
A102343 | 0.742 | 0.684 |
A102344 | 0.479 | 0.514 |
Any of the identifier headings described in the normalized data section can be used in the final data matrix file. Please note that the identifier column must be the first column in the file.
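The heading convention can be parsed with a pair of regular expressions. This sketch is illustrative rather than Tab2MAGE's own parser. It assumes that quantitation type names contain no parentheses and that each bracketed name is a hybridization, and it accepts any of the identifier headings from the normalized data section as the first column. The function names are hypothetical.

```python
import re

# "QT(Hyb1)(Hyb2)...": a quantitation type name followed by one or more
# parenthesized hybridization names. Assumes no parentheses inside the QT name.
HEADING_RE = re.compile(r"^([^()]+)((?:\([^()]+\))+)$")
HYB_RE = re.compile(r"\(([^()]+)\)")

ID_HEADINGS = (
    "Reporter Identifier",
    "CompositeSequence Identifier",
    "ProbeSet ID",
    "ProbeSet Name",
)


def parse_matrix_heading(heading):
    """Split 'RMA(Hyb1)(Hyb2)' into ('RMA', ['Hyb1', 'Hyb2'])."""
    match = HEADING_RE.match(heading.strip())
    if match is None:
        raise ValueError(f"not a data matrix heading: {heading!r}")
    quantitation_type, hyb_part = match.groups()
    return quantitation_type.strip(), HYB_RE.findall(hyb_part)


def parse_matrix_header(header_line, id_headings=ID_HEADINGS):
    """Check that the identifier column comes first, then parse the rest."""
    cells = header_line.rstrip("\r\n").split("\t")
    if cells[0] not in id_headings:
        raise ValueError("the identifier column must be the first column")
    return [parse_matrix_heading(cell) for cell in cells[1:]]


print(parse_matrix_header("Reporter Identifier\tRMA(Hyb1)(Hyb2)\tRMA(Hyb3)(Hyb4)"))
# [('RMA', ['Hyb1', 'Hyb2']), ('RMA', ['Hyb3', 'Hyb4'])]
```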
In addition, for Tab2MAGE to construct the mapping correctly, the hybridization names must also be entered in the "Hybridization" column of the Hybridization section of the annotation spreadsheet:
Hybridization section | | | |
File[raw] | File[transformed] | Array[accession] | Hybridization |
Data1.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb1 |
Data2.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb2 |
Data3.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb3 |
Data4.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb4 |
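Before building MAGE-ML, it can be worth checking that every hybridization named in the matrix headings has a matching spreadsheet row. This sketch assumes the Hybridization section has already been read into a list of dictionaries keyed by column heading (for example with `csv.DictReader`). The function name is hypothetical.

```python
def unmatched_hybridizations(matrix_file, matrix_hybs, spreadsheet_rows):
    """Return the hybridization names from the matrix headings that have no
    spreadsheet row listing both this matrix file (File[transformed]) and
    that name (Hybridization)."""
    listed = {
        row["Hybridization"]
        for row in spreadsheet_rows
        if row.get("File[transformed]") == matrix_file
    }
    return sorted(set(matrix_hybs) - listed)


# Rows mirroring the spreadsheet example above.
rows = [
    {
        "File[raw]": f"Data{i}.txt",
        "File[transformed]": "FinalDataMatrix.txt",
        "Array[accession]": "A-EXML-1",
        "Hybridization": f"Hyb{i}",
    }
    for i in range(1, 5)
]
print(unmatched_hybridizations("FinalDataMatrix.txt", ["Hyb1", "Hyb5"], rows))  # ['Hyb5']
```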
Please note: Tab2MAGE currently imposes some limitations when parsing final data matrix files. First, each file must correspond to a single array design. In experiments where multiple array designs have been used, a separate final data matrix file should be included for each design. For example:
Hybridization section | | | |
File[raw] | File[transformed] | Array[accession] | Hybridization |
Data1.txt | FinalDataMatrix1.txt | A-EXML-1 | Hyb1 |
Data2.txt | FinalDataMatrix1.txt | A-EXML-1 | Hyb2 |
Data3.txt | FinalDataMatrix2.txt | A-EXML-2 | Hyb3 |
Data4.txt | FinalDataMatrix2.txt | A-EXML-2 | Hyb4 |
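The single-design rule lends itself to a simple check. This hypothetical helper assumes the spreadsheet rows have been read into dictionaries keyed by column heading. It reports any File[transformed] value that appears alongside more than one Array[accession].

```python
from collections import defaultdict


def designs_per_matrix(spreadsheet_rows):
    """Map each File[transformed] value to the set of Array[accession]
    values found on its rows."""
    designs = defaultdict(set)
    for row in spreadsheet_rows:
        matrix_file = row.get("File[transformed]")
        if matrix_file:
            designs[matrix_file].add(row["Array[accession]"])
    return dict(designs)


def mixed_design_matrices(spreadsheet_rows):
    """Return the matrix files linked to more than one array design."""
    return sorted(
        matrix_file
        for matrix_file, accessions in designs_per_matrix(spreadsheet_rows).items()
        if len(accessions) > 1
    )


# Rows mirroring the two-design example above.
rows = [
    {"File[transformed]": "FinalDataMatrix1.txt", "Array[accession]": "A-EXML-1", "Hybridization": "Hyb1"},
    {"File[transformed]": "FinalDataMatrix1.txt", "Array[accession]": "A-EXML-1", "Hybridization": "Hyb2"},
    {"File[transformed]": "FinalDataMatrix2.txt", "Array[accession]": "A-EXML-2", "Hybridization": "Hyb3"},
    {"File[transformed]": "FinalDataMatrix2.txt", "Array[accession]": "A-EXML-2", "Hybridization": "Hyb4"},
]
print(mixed_design_matrices(rows))  # []
```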
Secondly, the columns in the final data matrix are currently mapped to hybridizations, rather than scanning events. As a result, this system does not readily support experiment designs where hybridizations are scanned multiple times, e.g., for quality control purposes. To accurately reflect such treatments, please include your normalized data as one file per scan, and add a "Scan" column to explicitly specify which scan events belong to which hybridization, as in the following example:
Hybridization section | | | |
Hybridization | Scan | File[raw] | File[normalized] |
Hyb1 | Scan1a | Data1a.txt | NormData1a.txt |
Hyb1 | Scan1b | Data1b.txt | NormData1b.txt |
Hyb2 | Scan2a | Data2a.txt | NormData2a.txt |
Hyb2 | Scan2b | Data2b.txt | NormData2b.txt |
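Under this layout, each hybridization owns one or more scans, each with its own normalized file. Assuming the spreadsheet rows are available as dictionaries keyed by column heading, a hypothetical helper can recover that grouping.

```python
from collections import defaultdict


def scans_by_hybridization(spreadsheet_rows):
    """Group (Scan, File[normalized]) pairs under their Hybridization name.

    Rows are visited in spreadsheet order, so scans keep the order in which
    they were listed.
    """
    grouped = defaultdict(list)
    for row in spreadsheet_rows:
        grouped[row["Hybridization"]].append((row["Scan"], row["File[normalized]"]))
    return dict(grouped)


# Rows mirroring the multiple-scan example above.
rows = [
    {"Hybridization": "Hyb1", "Scan": "Scan1a", "File[raw]": "Data1a.txt", "File[normalized]": "NormData1a.txt"},
    {"Hybridization": "Hyb1", "Scan": "Scan1b", "File[raw]": "Data1b.txt", "File[normalized]": "NormData1b.txt"},
    {"Hybridization": "Hyb2", "Scan": "Scan2a", "File[raw]": "Data2a.txt", "File[normalized]": "NormData2a.txt"},
    {"Hybridization": "Hyb2", "Scan": "Scan2b", "File[raw]": "Data2b.txt", "File[normalized]": "NormData2b.txt"},
]
print(scans_by_hybridization(rows)["Hyb1"])
# [('Scan1a', 'NormData1a.txt'), ('Scan1b', 'NormData1b.txt')]
```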
Please see the MIAMExpress data file help notes (Affymetrix, two-channel) for further discussion of final data matrix files. There is also an example included with the Tab2MAGE package of how to incorporate a final data matrix into your MAGE-ML output.
For MAGE-TAB users: this Tab2MAGE/MIAMExpress final data matrix format is not supported by MAGE-TAB, which defines its own data matrix format (see the MAGE-TAB Data Matrix notes).
The Illumina BeadStudio software (see www.illumina.com) generates data files in several closely related formats. Tab2MAGE supports only files that report probe-level (PROBE_ID) data and that were exported in tab-delimited format from BeadStudio v3 or later. These files are characterised by having PROBE_ID as the first column, with subsequent column headings following the pattern "Hybridization name.QuantitationType name".
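Headings of this form can be split back into hybridization and quantitation type names. The sketch below is a hypothetical helper that assumes the separator is the last "." in each heading, so hybridization names may contain dots but quantitation type names may not. The example headings are illustrative.

```python
def parse_beadstudio_header(header_line):
    """Return {hybridization: [quantitation types]} from a BeadStudio
    probe-level heading row."""
    cells = header_line.rstrip("\r\n").split("\t")
    if cells[0] != "PROBE_ID":
        raise ValueError("expected PROBE_ID as the first column")
    layout = {}
    for cell in cells[1:]:
        # Assumed split point: the last "." separates hybridization from QT.
        hyb, sep, qt = cell.rpartition(".")
        if not sep:
            raise ValueError(f"heading lacks a '.' separator: {cell!r}")
        layout.setdefault(hyb, []).append(qt)
    return layout


header = "PROBE_ID\tSample1.AVG_Signal\tSample1.Detection Pval\tSample2.AVG_Signal"
print(parse_beadstudio_header(header))
# {'Sample1': ['AVG_Signal', 'Detection Pval'], 'Sample2': ['AVG_Signal']}
```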
Illumina have recently released the 'ArrayExpress Data Submission Report Plug-in' for BeadStudio, which generates a raw data file plus a basic Tab2MAGE spreadsheet for you to complete with your experiment and sample information. The plug-in is available from the downloads section of the Illumina iCom website.
BeadStudio files may contain data columns arranged per sample or per group of samples (so-called "Grouped data"). Unlike files from the other scanning software packages described above, Illumina files may contain data from either single or multiple hybridizations.
Sample-level data is supported in the "File[raw]", "File[normalized]" or "File[transformed]" columns of the Hybridization section of the Tab2MAGE spreadsheet. The file name should be entered in every row corresponding to a hybridization that the file covers. The array designator for each hybridization (found in the data file column headings) must be entered in the "Hybridization" column of the appropriate row.
Grouped data is supported only as "File[normalized]" or "File[transformed]". The column headings in these data files contain user-defined tags for each group, and these tags must be included in the "Normalization" column of the spreadsheet. This allows the Tab2MAGE software to map the grouped data back to the original hybridizations.
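The mapping from group tags back to hybridizations can be sketched as follows. The helpers are hypothetical. They assume spreadsheet rows are dictionaries keyed by column heading and, as a further assumption not stated in these notes, that grouped headings follow the same "tag.QuantitationType" pattern as sample-level headings.

```python
from collections import defaultdict


def hybridizations_by_group(spreadsheet_rows):
    """Map each tag in the Normalization column to the hybridizations carrying it."""
    groups = defaultdict(list)
    for row in spreadsheet_rows:
        tag = row.get("Normalization")
        if tag:  # rows without a tag do not contribute to any group
            groups[tag].append(row["Hybridization"])
    return dict(groups)


def resolve_grouped_heading(heading, groups):
    """Split an assumed 'tag.QuantitationType' heading and look up its hybridizations."""
    tag, sep, quantitation_type = heading.rpartition(".")
    if not sep or tag not in groups:
        raise ValueError(f"no group tag found for heading {heading!r}")
    return quantitation_type, groups[tag]


rows = [
    {"Hybridization": "Hyb1", "Normalization": "Control"},
    {"Hybridization": "Hyb2", "Normalization": "Control"},
    {"Hybridization": "Hyb3", "Normalization": "Treated"},
]
groups = hybridizations_by_group(rows)
print(resolve_grouped_heading("Control.AVG_Signal", groups))
# ('AVG_Signal', ['Hyb1', 'Hyb2'])
```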
There is one further data file type exported by BeadStudio that is not yet supported by Tab2MAGE: the "differential expression" file. This file contains grouped or ungrouped data reported at the gene level, with extra columns reporting "DiffScore" and "Concordance" between samples or groups. Such data must be split into multiple data cubes when encoded in MAGE, and Tab2MAGE does not currently support this.
An example of a Tab2MAGE spreadsheet describing sample data at the probe level and grouped data at the gene level can be found in the examples section.