magetab.pl - a script to generate valid MAGE-ML from an input MAGE-TAB document.
magetab.pl -i <IDF filename> -n <experiment accession> -t <target directory>
This script will process a MAGE-TAB document and a set of data files to generate valid MAGE-ML. The supported data file formats are listed in the accompanying documentation. The MAGE-TAB format specification may be downloaded from this link:
http://www.ebi.ac.uk/systems-srv/mp/file-exchange/MAGE-TABv1.0.tar.gz
There is one auxiliary file which can be supplied alongside the data files. This file defines the QuantitationTypes which are to be extracted from the data files. The QuantitationType file is optional, as the script is supplied with a set of defaults determined by a survey of incoming QuantitationTypes by ArrayExpress curators. The script writes a log file to the target directory listing the columns that have been ignored. To keep all unrecognized QuantitationTypes, invoke the script with the -k option.
-i <IDF filename>
The MAGE-TAB IDF file to be parsed.
-n <experiment accession>
The experiment accession to be used as the top-level Experiment identifier in the generated MAGE-ML.
-t <target directory>
The target directory to be created. This directory will contain the MAGE-ML file and external data files ready for validation.
QT filename
QuantitationType file. This option allows you to specify a custom QuantitationType definition file to override those defined as part of the Tab2MAGE package. See the ArrayExpress::Datafile::QT_list manpage for more information.
QT filename
QuantitationType file. This option will add the new QuantitationType definitions to those included with the Tab2MAGE package. See the ArrayExpress::Datafile::QT_list manpage for more information.
-k
Keep all columns in the data files, whether or not they are recognized. Unrecognized QuantitationTypes will be created as generic SpecializedQuantitationTypes in the output MAGE-ML.
-K
If the autosubmissions system is configured, magetab.pl will automatically reassign protocol accessions to fit a local convention. Use the -K option to suppress this behaviour.
Standalone option. This prevents the script from attempting to connect to ArrayExpress to retrieve array information.
Skip data file processing during MAGE-ML generation. The script will attempt to generate MAGE-ML for the metadata contained in the IDF and SDRF only.
namespace
Reporter identifier prefix. By default the script uses the MIAMExpress convention for generating reporter identifiers. This option allows you to override this behaviour by supplying an alternate prefix for identifiers.
namespace
CompositeSequence identifier prefix. By default the script uses the MIAMExpress convention for generating composite sequence identifiers. This option allows you to override this behaviour by supplying an alternate prefix for identifiers.
directory
Source directory containing all the data files referenced in the SDRF. If this is omitted, the current working directory will be searched for data files.
By default, native (usually binary) file formats are used for Affymetrix CEL files and all NimbleScan (NimbleGen) files. Native encoding carries far less overhead and keeps the files in their original formats, which end-users often prefer. For ArrayExpress submissions, it also allows such files to be downloaded directly from the ArrayExpress web interface. Use this option in the unusual case where plain-text encoding is wanted for these data files instead.
Ignore the data file size limit as configured in Config.yml (i.e., MAX_DATAFILE_SIZE).
The parser will support MAGE-TAB documents in which a single IDF and SDRF have been combined (in that order), with the start of each section marked by [IDF] and [SDRF] respectively. Note that such documents are not compliant with the MAGE-TAB format specification; this format is used by ArrayExpress to simplify data submissions.
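As an illustration of the combined layout described above, the following Python sketch splits such a document into its IDF and SDRF parts. It is not the code used by magetab.pl (which is a Perl script); the function name split_combined_magetab is hypothetical, and the sketch assumes that each marker sits on a line of its own, possibly followed by trailing tabs left by a spreadsheet export.

```python
import re

# Assumption: a section marker occupies its own line, e.g. "[IDF]" or "[SDRF]".
SECTION_MARKER = re.compile(r"^\[(IDF|SDRF)\]$")


def split_combined_magetab(text):
    """Return a dict mapping 'IDF' and 'SDRF' to the lines of each section."""
    sections = {}
    current = None
    for line in text.splitlines():
        # Strip surrounding whitespace so trailing tabs do not hide a marker.
        match = SECTION_MARKER.match(line.strip())
        if match:
            current = match.group(1)
            if current in sections:
                raise ValueError(f"duplicate [{current}] section")
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        elif line.strip():
            raise ValueError("content found before the first section marker")
    # The combined format places the IDF first and the SDRF second.
    if list(sections) != ["IDF", "SDRF"]:
        raise ValueError("expected an [IDF] section followed by an [SDRF] section")
    return sections
```

Each returned section can then be written to its own file or handed to a tab-delimited reader. A document with the sections in the wrong order, or with either section missing, is rejected with a ValueError.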
font name
Name of the font to be used for Graphviz-generated PNGs.
Overwrite preexisting files ("clobber" option).
Print the version number of the script.
Print a short help text summarizing these options.
Please see the ArrayExpress::Datafile::QT_list manpage for a description of the format of the QuantitationType file.
Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2007.
Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.