Data file support in Tab2MAGE can be divided into two categories: Affymetrix data files, and everything else. Non-Affymetrix data must be supplied as plain ASCII tab-delimited text, and the column headings must be left intact. Ideally, you should be able to use the raw unprocessed data files from any of the supported software types without having to edit the data files in any way. Tab2MAGE uses the column headings within the file to identify which kind of file it is dealing with, and what the quantitation types are. The script will reformat recognized file types into MetaColumn-MetaRow format, and then strip the feature coordinates (or reporter identifiers) and the column headings, in the process generating the necessary MAGE objects to describe the data.
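The requirement for plain ASCII, tab-delimited text can be checked before submission. The following is a minimal sketch, not part of Tab2MAGE itself; the function name `check_plain_ascii_tsv` is hypothetical, and the check is deliberately loose because raw files from scanning software may carry free-text header blocks above the column headings.

```python
def check_plain_ascii_tsv(raw):
    """Return a list of problems found in the raw bytes of a data file.

    An empty list means both checks passed. This is an illustrative
    pre-flight check, not the validation Tab2MAGE itself performs.
    """
    problems = []
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte outside the ASCII range.
        problems.append(f"non-ASCII byte at offset {err.start}")
        text = raw.decode("ascii", errors="replace")
    # At least one line should contain a tab if the file is tab-delimited.
    if not any("\t" in line for line in text.splitlines()):
        problems.append("no tab-delimited lines found")
    return problems
```

To use it, read the data file in binary mode and pass the bytes to the function; any returned messages point to content that should be fixed at export time rather than by hand-editing the column headings.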
The following formats are supported:
The following list gives a brief overview of how Tab2MAGE recognizes different file formats. In each case, the data file row containing the column headings is identified by matching it to these sets of known column headings.
Tab2MAGE recognizes and parses CEL and EXP files using both the old GDAC formats and the newer GCOS/XDA formats. These file formats are detected using the Affymetrix data file parser incorporated into the Tab2MAGE package. See below for notes on Affymetrix normalized data file formats.
GenePix format files are recognized using the following column headings:
A file containing these headings is recognized as Agilent format file:
The following column headings are recognized as being from a ScanAlyze format file:
ScanArray Express files are recognized from the following headings:
| Array Column | Array Row | Spot Column | Spot Row | X | Y |
while the older QuantArray format has these headings:
| Array Column | Array Row | Column | Row |
The following column headings are recognized as indicating an ArrayVision format file:
Newer "lg2" ArrayVision files are identified by the following column headings:
Spotfinder files are recognized by the following column headings:
A file containing the following headings is recognized as a BlueFuse file:
UCSF Spot files are recognized by the following column headings:
NimbleScan files (Feature, Probe and Pair) all contain the following headings:
Files generated by Applied Biosystems software have the following headings:
CodeLink Expression Analysis files are identified using the following:
ImaGene files are recognized using the following columns:
| Meta Column | Meta Row | Column | Row | Field | Gene ID |
The ImaGene 3.0 format is also supported:
CSIRO Spot files contain the following columns:
This method of determining which file type is being processed is not infallible. You are therefore encouraged to test your data files with Tab2MAGE and to report any problems.
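The heading-based recognition described above can be sketched as follows. This is an illustration only, not the Tab2MAGE implementation: the table `KNOWN_HEADINGS` and the function `find_heading_row` are hypothetical names, only three of the heading sets listed on this page are included, and the sketch assumes a row is recognized when it contains every heading in a set.

```python
# Heading sets copied from this page; the real parser knows many more formats.
KNOWN_HEADINGS = {
    "ScanArray Express": ("Array Column", "Array Row", "Spot Column", "Spot Row", "X", "Y"),
    "QuantArray": ("Array Column", "Array Row", "Column", "Row"),
    "ImaGene": ("Meta Column", "Meta Row", "Column", "Row", "Field", "Gene ID"),
}


def find_heading_row(lines):
    """Return (format_name, row_index) for the first row matching a known set.

    Returns (None, None) when no row contains a complete heading set.
    """
    for index, line in enumerate(lines):
        # Split on tabs and drop surrounding whitespace and double quotes.
        cells = {cell.strip().strip('"') for cell in line.rstrip("\r\n").split("\t")}
        for name, signature in KNOWN_HEADINGS.items():
            if set(signature) <= cells:
                return name, index
    return None, None
```

Because matching relies on exact heading text, renaming or deleting a coordinate column in the data file would prevent recognition, which is why the headings must be left intact.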
Normalized data files may be submitted in any of the above formats. In addition, files may be parsed using a number of special column headings which can be used to designate a column containing reporter or composite sequence identifiers:
If you have normalized data mapped to the identifiers used in your array design, you can simply use a single column containing those identifiers to include your data in the final MAGE-ML. Tab2MAGE supports the use of either Reporter Identifiers or CompositeSequence Identifiers for this purpose. Please see the ADF help notes for a discussion of these identifier types. Either of the following sets of column headers may therefore be used:
where <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).
Tab2MAGE recognizes and parses CHP files using both the old GDAC formats and the newer GCOS/XDA formats. In addition, Affymetrix data normalized by non-Affymetrix methods (e.g. RMA normalization) can be parsed. For such files you may use CompositeSequence identifiers (above), or either of the following sets of column headers:
Again, <QT1>, <QT2> etc. are the names of your quantitation types (see the usage notes for more information on including novel quantitation types).
(Tab2MAGE only) Often it is desirable to store the normalized data from all your hybridizations in a single file. This file must then contain information on the quantitation types measured (log ratio, RMA values etc.) and the hybridizations to which these values apply. This information must be encoded in the data file column headings, for example:
Any of the identifier headings described in the normalized data section can be used in the final data matrix file. Please note that the identifier column must be the first column in the file.
In addition, for Tab2MAGE to correctly construct the mapping the hybridization names must also be included in the annotation spreadsheet, using the "Hybridization" column (in the Hybridization section):
Please note: There are currently some limitations imposed by Tab2MAGE when parsing final data matrix files. Firstly, each file must correspond to a single array design. In experiments where multiple array designs have been used, separate final data matrix files should be included for each. For example:
Secondly, the columns in the final data matrix are currently mapped to hybridizations, rather than scanning events. As a result, this system does not readily support experiment designs where hybridizations are scanned multiple times, e.g., for quality control purposes. To accurately reflect such treatments, please include your normalized data as one file per scan, and add a "Scan" column to explicitly specify which scan events belong to which hybridization, as in the following example:
Please see the MIAMExpress data file help notes (Affymetrix, two-channel) for further discussion of final data matrix files. There is also an example included with the Tab2MAGE package of how to incorporate a final data matrix into your MAGE-ML output.
For MAGE-TAB users: This Tab2MAGE/MIAMExpress final data matrix format is not supported by MAGE-TAB, which defines its own data matrix format; see the MAGE-TAB Data Matrix notes.
The Illumina BeadStudio software (see www.illumina.com) generates data files in several closely related formats. Tab2MAGE only supports such files reporting Probe-level (PROBE_ID) data, exported from BeadStudio v3 and above, in tab-delimited format. These files are characterised by having PROBE_ID as the first column, with subsequent column headers following the pattern "Hybridization name.QuantitationType name".
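The column heading pattern described above can be illustrated with a short sketch. It is not the Tab2MAGE parser: the function name `parse_beadstudio_header` is hypothetical, and splitting each heading at its last "." is an assumption made here, since this page does not say how names that themselves contain a dot are handled.

```python
from collections import defaultdict


def parse_beadstudio_header(header_line):
    """Group quantitation type names by hybridization name.

    Returns (by_hyb, unmatched): a dict mapping each hybridization name to
    its quantitation types, and a list of headings that did not fit the
    "Hybridization name.QuantitationType name" pattern.
    """
    columns = header_line.rstrip("\r\n").split("\t")
    if not columns or columns[0] != "PROBE_ID":
        raise ValueError("first column must be PROBE_ID")
    by_hyb = defaultdict(list)
    unmatched = []
    for heading in columns[1:]:
        # Assumption: the last "." separates the two names.
        hyb, sep, qt = heading.rpartition(".")
        if sep and hyb and qt:
            by_hyb[hyb].append(qt)
        else:
            unmatched.append(heading)
    return dict(by_hyb), unmatched
```

The hybridization names recovered this way are the values that need to appear in the Tab2MAGE spreadsheet so that each data column can be mapped to a hybridization.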
Illumina have recently released the 'ArrayExpress Data Submission Report Plug-in' for BeadStudio which will generate a raw data file, plus a basic Tab2MAGE spreadsheet for you to fill in with your experiment and sample information. This plug-in is available from the downloads section of the Illumina iCom website.
BeadStudio files may contain data columns arranged per sample, or per group of samples (so-called "Grouped data"). In contrast to files from the other scanning software packages noted above, Illumina files may contain data from single or multiple hybridizations.
Sample-level data is supported in "File[raw]", "File[normalized]" or "File[transformed]" columns of the Tab2MAGE spreadsheet Hybridization section. The file name should be included in all of the rows corresponding to the hybridizations it covers. The array designator for each hybridization (found in the data file column headings) must be entered into the "Hybridization" column of the spreadsheet in each appropriate row.
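The consistency rule for sample-level data can be checked mechanically. The sketch below assumes the spreadsheet's Hybridization section has already been read into a list of dictionaries keyed by column heading; the function name `missing_hybridizations` and that in-memory representation are hypothetical and are not part of Tab2MAGE.

```python
def missing_hybridizations(header_hybs, spreadsheet_rows, data_file, file_column="File[raw]"):
    """Return hybridization names found in the data file headings that have
    no spreadsheet row listing both the name and the data file.

    header_hybs: names taken from the data file column headings.
    spreadsheet_rows: one dict per row of the Hybridization section.
    """
    listed = {
        row.get("Hybridization", "").strip()
        for row in spreadsheet_rows
        if row.get(file_column, "").strip() == data_file
    }
    return sorted(set(header_hybs) - listed)
```

An empty result means every hybridization named in the data file has a row that names the same file; any names returned indicate rows where either the "Hybridization" value or the file name still needs to be filled in.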
Grouped data is supported only as "File[normalized]" or "File[transformed]". The column headings in these data files contain user-defined tags for each group, and these tags must be included in the "Normalization" column of the spreadsheet. This allows the Tab2MAGE software to map the grouped data back to the original hybridizations.
There is one further data file type exported by BeadStudio, which is not yet supported by Tab2MAGE: the "differential expression" file. This file contains either grouped or ungrouped data reported at the gene level, but with extra columns reporting on "DiffScore" and "Concordance" between samples or groups. Such data need to be split into multiple data cubes when coding in MAGE, and it is this capability which Tab2MAGE does not currently support.
An example of a Tab2MAGE spreadsheet describing sample data at the probe level and grouped data at the gene level can be found in the examples section.