Tab2MAGE is a software package written and supported by the ArrayExpress curation team, which aims to ease the process of submitting large microarray experiment datasets to our public repository database. To this end, Tab2MAGE currently includes two tools: the tab2mage.pl script itself and a data file checking script, expt_check.pl. With these scripts it is possible to perform an initial data file validation against an array design (e.g., in the form of an "Array Description File" or ADF), and then to generate MAGE-ML using these data files alongside a separate spreadsheet providing MIAME-compliant sample annotation.
As of version 1.9.9, the package also includes tools to validate MAGE-TAB documents and convert them into MAGE-ML.
The tab2mage.pl script is currently under active development. The latest version released on SourceForge is aimed at the moderately experienced user. For the time being it is still recommended that small to medium-sized experiment data submissions be provided via the MIAMExpress web interface. The primary aim of the Tab2MAGE project is to provide an easy-to-use method for submitting large datasets (e.g., greater than around 50 hybridizations) to ArrayExpress.
This script can currently parse data files in a number of formats. These include tab-delimited text files such as those produced by GenePix, BlueFuse, ScanAlyze, ScanArray or Agilent software, and also the data files produced by Affymetrix scanning software. The generic MetaColumn/MetaRow format favoured by ArrayExpress is also supported. As of version 1.9.4, Tab2MAGE also supports most data files produced by the Illumina BeadStudio software package.
To use the script a sample annotation spreadsheet must be provided. This experiment summary file has a flexible format based around a set of predefined column headings. Here is an example of the format of this file, which corresponds to the experiment design depicted here. The spreadsheet consists of three sections, separated by one or more blank lines: Experiment, Protocol and Hybridization.
Experiment section
This section contains top-level information about the experiment, such as the title, description and accession number, organized by row.
Protocol section
Protocols are defined as needed in this section, with one protocol per row. If all of the protocols used in the experiment have previously been loaded into ArrayExpress and given accession numbers, this whole section can be omitted.
Hybridization section
This section contains the bulk of the experiment information. At its simplest, each row describes the route taken from BioSource to output data file. Multichannel (e.g., two-colour) data can be described by entering each channel as a separate line. Pooling can be described at multiple levels by using as many lines as necessary to describe all the relationships between upstream and downstream samples. In a sense this table can be compared to an SQL database table in the way that it provides links between MAGE objects.
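As a rough illustration of the three-section layout described above, the following Python sketch splits tab-delimited spreadsheet text into blocks separated by one or more blank lines. It is not part of the Tab2MAGE package and does not reproduce how tab2mage.pl parses its input. The function name split_sections, the section title lines, the column headings and all values in the example are placeholders chosen for illustration; the allowed column and row names are listed in the package documentation.

```python
import csv
import io


def split_sections(text):
    """Split tab-delimited text into blocks separated by blank lines.

    A row counts as blank when every cell is empty or whitespace, which
    also covers rows of bare tabs left behind by spreadsheet exports.
    """
    sections, current = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if any(cell.strip() for cell in row):
            current.append(row)
        elif current:
            # First blank row after some content closes the current block.
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


# Placeholder spreadsheet: headings and values are illustrative only.
example = (
    "Experiment section\n"
    "name\tExample experiment\n"
    "\n"
    "Protocol section\n"
    "accession\ttext\n"
    "P-EXMP-1\tHypothetical growth protocol\n"
    "\t\t\n"
    "\n"
    "Hybridization section\n"
    "BioSource\tDye\tHybridization\tFile[raw]\n"
    "source 1\tCy3\thyb 1\thyb1.gpr\n"
    "source 2\tCy5\thyb 1\thyb1.gpr\n"
)

sections = split_sections(example)
for block in sections:
    print(block[0][0], "-", len(block) - 1, "further row(s)")
```

In the placeholder Hybridization block, the two data rows share one hybridization name and one data file, which is how a two-colour hybridization would be entered as one line per channel.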
Further examples of the Tab2MAGE spreadsheet format are included in the online package documentation, alongside detailed information on allowed column and row names. A set of tab2mage.pl usage notes is also included, giving examples of how to invoke the script.
New in version 1.9.7, the Tab2MAGE package includes a script, magetab.pl, for generating MAGE-ML from a MAGE-TAB document. Most, but not all, of the requirements from the specification have now been implemented. Please see the MAGETAB.txt file included in the package for implementation notes and a discussion of current limitations. The script is typically invoked as follows:
magetab.pl -i <IDF file> -n <experiment accession> -t <output directory>
The MAGE-TAB parser is currently based on the version 1.0 specification for MAGE-TAB, available for download from the EBI website. See these MAGE-TAB help notes for information on submitting MAGE-TAB documents to ArrayExpress.
See also the Additional packages section for information on converting from Tab2MAGE to MAGE-TAB format.
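When several MAGE-TAB documents need converting, it can be convenient to assemble the magetab.pl command line programmatically. The Python sketch below only builds the argument list from the three options shown above (-i, -n and -t); the helper name magetab_command, the file name, the accession and the output directory are placeholders, and the sketch assumes magetab.pl is available on the PATH.

```python
import shlex


def magetab_command(idf_file, accession, output_dir, script="magetab.pl"):
    """Return the argument list for the magetab.pl invocation shown above."""
    return [script, "-i", idf_file, "-n", accession, "-t", output_dir]


# Placeholder values, not real files or accessions.
cmd = magetab_command("experiment.idf.txt", "E-XXXX-1", "magetab_output")
print(shlex.join(cmd))

# To actually run the conversion (assumes magetab.pl is on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.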
The expt_check.pl script can be used to validate microarray experiment data files and associated MIAME metadata for common errors. The script can be used in any of four different modes, depending on your needs: Tab2MAGE, MAGE-TAB, MIAMExpress or full stand-alone mode. Full documentation is available; in summary, the four forms are shown here:
Tab2MAGE mode
expt_check.pl -e <Tab2MAGE spreadsheet file>
MAGE-TAB mode
expt_check.pl -i <MAGE-TAB IDF file>
Unless the "-s" ("stand-alone") option is used, the script will attempt to connect to the ArrayExpress database to retrieve information on the array design(s) used in the experiment. The array information is used to check the data files for consistency. This behaviour may also be suppressed by supplying the name of an ADF file to the script, using the -a option (see below).
MIAMExpress mode (local installations only)
expt_check.pl -l <login username> -t <experiment title>
Note that if you wish to use the expt_check.pl script with a local installation of MIAMExpress, you will need to edit the database connection parameters in the included ArrayExpress::Curator::Config module.
Full stand-alone mode
expt_check.pl -s -a <ADF file> <list of data files>
If an ADF is supplied to the script in stand-alone mode, the data files will be checked for features missing from that ADF (please see the MIAMExpress ADF help notes for information on the ADF format). A full list of the tests performed by the experiment checker script is available in the documentation.
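The four invocation forms above can be summarised in a small Python sketch that maps a mode name to the corresponding expt_check.pl arguments. Only the options quoted in this section (-e, -i, -l, -t, -s and -a) are used. The function name expt_check_args, the mode names, the keyword names and the example file names are assumptions made for illustration, and the full documentation remains the authority on which option combinations the script accepts.

```python
def expt_check_args(mode, adf=None, data_files=(), **opts):
    """Build an expt_check.pl argument list for one of the four modes."""
    if mode == "tab2mage":
        args = ["-e", opts["spreadsheet"]]
    elif mode == "magetab":
        args = ["-i", opts["idf"]]
    elif mode == "miamexpress":
        # Local MIAMExpress installations only: login name and title.
        return ["expt_check.pl", "-l", opts["login"], "-t", opts["title"]]
    elif mode == "standalone":
        args = ["-s"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if adf is not None:
        # Supplying an ADF replaces the ArrayExpress database lookup.
        args += ["-a", adf]
    if mode == "standalone":
        args += list(data_files)
    return ["expt_check.pl", *args]


# Placeholder file names, for illustration only.
print(expt_check_args("tab2mage", spreadsheet="experiment.txt"))
print(expt_check_args("magetab", idf="experiment.idf.txt", adf="array.adf.txt"))
print(expt_check_args("standalone", adf="array.adf.txt",
                      data_files=["hyb1.gpr", "hyb2.gpr"]))
```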
There are several additional packages related to Tab2MAGE which are available for download from the SourceForge project pages:
MAGE-ML Visualize
A popular download, this MAGE-ML visualization script developed by Anna Farne at the EBI will read any MAGE-ML file and generate a graph showing the links between BioMaterials all the way through to BioAssays, and include information on sample characteristics and experimental factor values in the output. Here is an example of a MAGE-ML Visualize graph.
Alternatively, the expt_check.pl script may be used to visualize either Tab2MAGE or MAGE-TAB encoded experiments. A command-line option, "-x", is provided to skip data file checking, allowing quick visualization of the graph contained in a spreadsheet. The output from this script is a more minimalistic version of that given by MAGE-ML Visualize (expt_check.pl graph example).
Both these scripts use the Graphviz software package to generate the graph output. Please see the Graphviz website for installation instructions.
ADF Checker
The ADF checker script, also developed by Anna Farne, can be used to validate the format and content of an Array Description Format file (used to describe an array design in tabular format). An online version of this tool is also available on the EBI website.
Tab2MAGE to MAGE-TAB converter
Faisal Ibne Rezwan at the EBI has written this script to convert Tab2MAGE spreadsheets to MAGE-TAB IDF and SDRF component documents.
GEOImport
Faisal Ibne Rezwan has also written this package in collaboration with Margus Lukk (also at the EBI) to import experiments from the NCBI GEO database into Tab2MAGE format. These scripts are being used to facilitate data transfer between GEO and ArrayExpress. At the time of writing, over 250 experiments have been processed using this package. Currently only Affymetrix-based experiments are supported, but we hope to add support for other platforms shortly.
OE2magetab
A MAGE-TAB post-hoc ontology matching utility. Xiangqun Zheng Bradley at the EBI has created this package to match sample annotation terms from an input MAGE-TAB document to an ontology supplied in either OBO or flat-file format. This script generates an output MAGE-TAB document containing the appropriate "Term Source REF" and "Term Source Accession" values.
mage2tab
The mage2tab project is maintained by Junmin Liu at CBIL. Its aim is to provide a simple means to convert MAGE-ML documents to MAGE-TAB format. Further information on mage2tab is available on the CBIL wiki page.
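For readers unfamiliar with Graphviz, which the MAGE-ML Visualize and expt_check.pl scripts described earlier in this section use for their graph output, the Python sketch below writes a small directed graph in the Graphviz DOT language. It does not reproduce the output of either script; the function name to_dot and the node names are placeholders describing a hypothetical two-colour hybridization.

```python
def quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges, name="experiment"):
    """Return DOT text for a left-to-right directed graph of the edges."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for source, target in edges:
        lines.append(f"  {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-colour experiment: two samples meet in one hybridization.
edges = [
    ("source 1", "extract 1"),
    ("extract 1", "labeled extract 1 (Cy3)"),
    ("source 2", "extract 2"),
    ("extract 2", "labeled extract 2 (Cy5)"),
    ("labeled extract 1 (Cy3)", "hyb 1"),
    ("labeled extract 2 (Cy5)", "hyb 1"),
    ("hyb 1", "hyb1.gpr"),
]

dot = to_dot(edges)
print(dot)
```

If the printed text is saved as experiment.dot, the Graphviz command `dot -Tpng experiment.dot -o experiment.png` renders it as an image.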