Tab2MAGE is a software package written and supported by the ArrayExpress curation team, which aims to ease the process of submitting large microarray experiment datasets to our public repository database. To this end, Tab2MAGE currently includes two tools: the tab2mage.pl script itself and a data file checking script, expt_check.pl. With these scripts it is possible to perform an initial data file validation against an array design (e.g., in the form of an "Array Description File" or ADF), and then to generate MAGE-ML using these data files alongside a separate spreadsheet providing MIAME-compliant sample annotation.
The tab2mage.pl script is currently under active development. The latest version released on SourceForge is aimed at the moderately experienced user. For the time being it is still recommended that small to medium-sized experiment data submissions be provided via the MIAMExpress web interface. The primary aim of the Tab2MAGE project is to provide an easy-to-use method for submitting large datasets (e.g., greater than around 50 hybridizations) to ArrayExpress.
This script can currently parse data files in a number of formats. These include tab-delimited text files such as those produced by GenePix, BlueFuse, ScanAlyze, ScanArray or Agilent software, and also the data files produced by Affymetrix scanning software. The generic MetaColumn/MetaRow format favoured by ArrayExpress is of course also supported. As of version 1.9.4, Tab2MAGE also supports most data files produced by the Illumina BeadStudio software package.
To use the script a sample annotation spreadsheet must be provided. This experiment summary file has a flexible format based around a set of predefined column headings. Here is an example of the format of this file, which corresponds to the experiment design depicted here. The spreadsheet consists of three sections, separated by one or more blank lines: Experiment, Protocol and Hybridization.
The Experiment section contains top-level information about the experiment, such as the title, description and accession number, organized by row.
In the Protocol section, protocols are defined as needed, with one protocol per row. If all of the protocols used in the experiment have previously been loaded into ArrayExpress and given accession numbers, this whole section can be omitted.
The Hybridization section contains the bulk of the experiment information. At its simplest, each row describes the route taken from BioSource to output data file. Multichannel (e.g., two-colour) data can be described by entering each channel as a separate line. Pooling can be described at multiple levels by using as many lines as necessary to describe all the relationships between upstream and downstream samples. In a sense this table can be compared to an SQL database table in the way that it provides links between MAGE objects.
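As a rough illustration of the three-section layout, the following Python sketch splits a tab-delimited spreadsheet into sections at blank lines and then groups the channels of a two-colour hybridization. It is not part of the Tab2MAGE package and does not reproduce how tab2mage.pl parses its input. The column headings, row names and values in SAMPLE are placeholders chosen for illustration only; the allowed names are listed in the package documentation.

```python
import csv
import io

# Placeholder spreadsheet text: headings and values are illustrative
# assumptions, not the official Tab2MAGE column or row names.
SAMPLE = (
    "title\tExample experiment\n"
    "description\tPlaceholder description\n"
    "\n"
    "accession\tname\ttext\n"
    "P-TEST-1\tlabeling\tPlaceholder protocol text\n"
    "\n"
    "\n"
    "BioSource\tDye\tHybridization\tFile\n"
    "source1\tCy3\thyb1\tdata1.txt\n"
    "source2\tCy5\thyb1\tdata1.txt\n"
)


def split_sections(text):
    """Split tab-delimited text into sections separated by blank lines."""
    sections, current = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # A row with no non-empty cells is treated as a separator line.
        if any(cell.strip() for cell in row):
            current.append(row)
        elif current:
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


sections = split_sections(SAMPLE)
experiment, protocol, hybridization = sections

# Each channel of a two-colour hybridization is its own row, so rows
# sharing a Hybridization value are collected together here.
header, *rows = hybridization
channels = {}
for row in rows:
    record = dict(zip(header, row))
    channels.setdefault(record["Hybridization"], []).append(record["Dye"])

print(channels)
```

Treating the hybridization rows as records keyed by column heading mirrors the comparison with a database table: shared values in a column are what link one row to another.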
Further examples of the Tab2MAGE spreadsheet format are included in the online package documentation, alongside detailed information on allowed column and row names. A set of tab2mage.pl usage notes is also included, giving examples of how to invoke the script.
As of version 1.9.7, the Tab2MAGE package includes a script, magetab.pl, for generating MAGE-ML from a MAGE-TAB document. Most, but not all, of the requirements from the specification have now been implemented. Please see the MAGETAB.txt file included in the package for implementation notes and a discussion of current limitations. The script is typically invoked as follows:
magetab.pl -i <IDF file> -n <experiment accession> -t <output directory>
The MAGE-TAB parser is currently based on the version 1.0 specification for MAGE-TAB, available for download from the EBI website. See these MAGE-TAB help notes for information on submitting MAGE-TAB documents to ArrayExpress.
See also the Additional packages section for information on converting from Tab2MAGE to MAGE-TAB format.
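When many experiments need converting, the magetab.pl invocation shown in this section can be assembled from a wrapper script. The Python sketch below only builds and prints the command line, using the -i, -n and -t options given above. The file name, accession and output directory are placeholders, and the helper function is hypothetical rather than part of the package.

```python
import shlex


def magetab_command(idf_file, accession, output_dir, script="magetab.pl"):
    """Build the argument list for a magetab.pl run (hypothetical helper)."""
    return [script, "-i", idf_file, "-n", accession, "-t", output_dir]


# Placeholder values, for illustration only.
cmd = magetab_command("experiment.idf.txt", "E-TEST-1", "mage_output")
print(shlex.join(cmd))
```

Assuming magetab.pl is installed and on the PATH, the list could then be passed to `subprocess.run(cmd, check=True)` to execute it.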
The expt_check.pl script can be used to validate microarray experiment data files and associated MIAME metadata for common errors. The script can be used in any of four different modes, depending on your needs: Tab2MAGE, MAGE-TAB, MIAMExpress or full stand-alone mode. Full documentation is available; in summary, the four forms are shown here:
Tab2MAGE mode
expt_check.pl -e <Tab2MAGE spreadsheet file>
MAGE-TAB mode
expt_check.pl -i <MAGE-TAB IDF file>
Unless the "-s" ("stand-alone") option is used, the script will attempt to connect to the ArrayExpress database to retrieve information on the array design(s) used in the experiment. The array information is used to check the data files for consistency. This behaviour may also be suppressed by supplying the name of an ADF file to the script, using the -a option (see below).
MIAMExpress mode (local installations only)
expt_check.pl -l <login username> -t <experiment title>
Note that if you wish to use the expt_check.pl script with a local installation of MIAMExpress, you will need to edit the database connection parameters in the included ArrayExpress::Curator::Config module.
Full stand-alone mode
expt_check.pl -s -a <ADF file> <list of data files>
If an ADF is supplied to the script in stand-alone mode, the data files will be checked for features missing from that ADF (please see the MIAMExpress ADF help notes for information on the ADF format). A full list of the tests performed by the experiment checker script is available in the documentation.
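The four invocation forms in this section can be summarized in one small helper. This is a sketch only: the mode labels ("tab2mage", "magetab", "miamexpress", "standalone") and the function itself are hypothetical names introduced here, while the options (-e, -i, -l, -t, -s, -a) are those shown above. The ADF is treated as optional in stand-alone mode, on the assumption that the -a option may be left out there.

```python
def expt_check_command(mode, *, spreadsheet=None, idf=None, login=None,
                       title=None, adf=None, data_files=()):
    """Build an expt_check.pl argument list for one of the four modes."""
    script = "expt_check.pl"
    if mode == "tab2mage":
        cmd = [script, "-e", spreadsheet]
    elif mode == "magetab":
        cmd = [script, "-i", idf]
    elif mode == "miamexpress":
        cmd = [script, "-l", login, "-t", title]
    elif mode == "standalone":
        cmd = [script, "-s"]
        if adf is not None:
            cmd += ["-a", adf]
        cmd += list(data_files)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Refuse to build a command with a required value left unset.
    if any(arg is None for arg in cmd):
        raise ValueError(f"missing argument for mode {mode!r}")
    return cmd


# Placeholder file names, for illustration only.
print(expt_check_command("tab2mage", spreadsheet="experiment.txt"))
print(expt_check_command("standalone", adf="array.adf.txt",
                         data_files=["data1.txt", "data2.txt"]))
```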
There are several additional packages related to Tab2MAGE which are available for download from the SourceForge project pages:
A popular download, this MAGE-ML visualization script developed by Anna Farne at the EBI will read any MAGE-ML file and generate a graph showing the links between BioMaterials all the way through to BioAssays, and include information on sample characteristics and experimental factor values in the output. Here is an example of a MAGE-ML Visualize graph.
Alternatively, the expt_check.pl script may be used to visualize either Tab2MAGE or MAGE-TAB encoded experiments. A command-line option, "-x", has been provided to omit data file checking, allowing quick visualization of the graph contained in a spreadsheet. The output from this script is a more minimal version of that given by MAGE-ML Visualize (expt_check.pl graph example).
Both these scripts use the Graphviz software package to generate the graph output. Please see the Graphviz website for installation instructions.
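To give a feel for the kind of input Graphviz works from, the Python sketch below writes a chain of sample links as text in the Graphviz DOT language. It does not reproduce the output of either script described above; the node names and the helper function are placeholders invented for illustration.

```python
def to_dot(edges, name="experiment"):
    """Render (source, target) pairs as a left-to-right DOT digraph."""
    def quote(label):
        # Escape embedded double quotes so each label stays one DOT string.
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    # dict.fromkeys drops repeated edges while keeping their first-seen order.
    for src, dst in dict.fromkeys(edges):
        lines.append(f"  {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


# Placeholder chain from a biological source through to a hybridization.
edges = [
    ("source1", "sample1"),
    ("sample1", "extract1"),
    ("extract1", "labeled1"),
    ("labeled1", "hyb1"),
    ("source1", "sample1"),  # duplicate, dropped from the output
]
dot_text = to_dot(edges)
print(dot_text)
```

With Graphviz installed, saving this text to a file such as experiment.dot and running `dot -Tpng experiment.dot -o experiment.png` would render it as an image.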
The ADF checker script, also developed by Anna Farne, can be used to validate the format and content of an Array Description Format file (used to describe an array design in tabular format). An online version of this tool is also available on the EBI website.
Tab2MAGE to MAGE-TAB converter
Faisal Ibne Rezwan at the EBI has written this script to convert Tab2MAGE spreadsheets to MAGE-TAB IDF and SDRF component documents.
Faisal Ibne Rezwan has also written this package in collaboration with Margus Lukk (also at the EBI) to import experiments from the NCBI GEO database into Tab2MAGE format. These scripts are being used to facilitate data transfer between GEO and ArrayExpress. At the time of writing, over 250 experiments have been processed using this package. Currently only Affymetrix-based experiments are supported, but we hope to add support for other platforms shortly.
A MAGE-TAB post-hoc ontology matching utility. Xiangqun Zheng Bradley at the EBI has created this package to match sample annotation terms from an input MAGE-TAB document to an ontology supplied in either OBO or flat-file format. This script generates an output MAGE-TAB document containing the appropriate "Term Source REF" and "Term Source Accession" values.
The mage2tab project is maintained by Junmin Liu at CBIL. Its aim is to provide a simple means to convert MAGE-ML documents to MAGE-TAB format. Further information on mage2tab is available on the CBIL wiki page.