ArrayExpress::MAGETAB - A parser class for MAGE-TAB documents.
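    use ArrayExpress::MAGETAB;

    # Create a parser for a MAGE-TAB document, starting from its IDF file.
    my $magetab = ArrayExpress::MAGETAB->new({
        idf              => $idf_file,
        target_directory => $dir,
        expt_accession   => $accn,
    });

    # Parse the document (if not already parsed) and write out MAGE-ML.
    $magetab->write_mageml();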
This module acts as a front end to the MAGE-TAB parsing API. Parser objects are instantiated with a number of attributes to control how the MAGE-TAB document is parsed. Support is only provided for IDF and SDRF documents at present; it is anticipated that the parser will be extended to support ADF at a later date.
Currently the parser is built using a MAGEv1.1 object model to store the MAGE-TAB metadata. It is envisioned that this dependence on the Bio::MAGE modules may be removed once a full MAGE-TAB object model is agreed upon by the community.
To simplify the process of data submission, ArrayExpress has introduced a new flavour of MAGE-TAB in which the IDF and SDRF sections are combined into a single worksheet. This parser supports both MAGE-TAB v1.1 documents (with separate IDF and SDRF) and these combined documents.
new
Object constructor. This recognises the following attributes:
idf
The path of the IDF file with which to start parsing.
magetab_doc
The path of a combined IDF+SDRF file to parse.
output_file
The name of the output MAGE-ML file.
namespace
The namespace to use in MAGE identifier creation.
authority
The authority to use in MAGE identifier creation.
expt_accession
The accession number assigned to the experiment.
target_directory
The directory into which to write the output files.
source_directory
The directory which contains all data and SDRF files.
is_standalone
A flag indicating whether the parser should run without connecting to ArrayExpress to retrieve array design information. It is sometimes desirable to skip these downloads, which can be quite large.
qt_filename
The path of a custom QuantitationType definition file, which overrides the definitions supplied as part of the Tab2MAGE package. See the ArrayExpress::Datafile::QT_list manpage for more information.
include_default_qts
This option can be used in conjunction with qt_filename to indicate that the QuantitationType listing from the Tab2MAGE package itself should be included in the lists of known QuantitationTypes used in data file parsing. The default behaviour is to deactivate these known QTs if a custom QT file is to be used.
keep_all_qts
A flag indicating whether unrecognised QuantitationTypes in data files should be kept or not. The default behaviour is to strip unrecognised columns out of the data files.
reporter_prefix
The prefix to be used during Reporter identifier construction. This prefix is prepended to the identifiers listed in the data files.
compseq_prefix
The prefix to be used during CompositeElement (CompositeSequence) identifier construction. This prefix is prepended to the identifiers listed in the data files.
protocol_accession_service
A code reference used to reassign protocol accessions. See PROTOCOL_ACCESSIONS, below.
protocol_accession_prefix
The prefix to be used for protocol accession creation, when the autosubmissions system is in use. See PROTOCOL_ACCESSIONS, below.
keep_protocol_accns
A flag indicating whether the protocols in the IDF should keep their existing accession numbers instead of being assigned new ones. This option overrides protocol_accession_service.
use_plain_text
Some file formats are only supported in their native forms by ArrayExpress. Nonetheless, this package can parse some of these data formats (for example NimbleGen data and Affymetrix CEL files) into tab-delimited representations fully encoded in the MAGE-ML document.
skip_datafiles
This option tells the parser not to read the data files referenced by a given MAGE-TAB document, and instead to attempt to generate MAGE in their absence. This option is particularly useful for unsupported data file formats.
ignore_size_limits
The Tab2MAGE configuration file allows the user to set maximum data file sizes for parsing and web download, to provide some protection from overloading the system in a production pipeline setting. To temporarily ignore these size limits, use this option.
in_relaxed_mode
A flag indicating whether to allow minor errors during parsing. At the moment the only errors ignored by this option are Term Source REF, Protocol REF, Parameter Value [] and Factor Value [] columns which reference Names that have not been defined in the IDF.
clobber
A flag indicating whether to overwrite existing files without prompting the user.
parse
Starts the MAGE-TAB parse and loads the document into memory.
write_mageml
Writes out MAGE-ML corresponding to the input MAGE-TAB document. If the MAGE-TAB has not yet been parsed, parse() is called automatically.
PROTOCOL_ACCESSIONS
The parser provides a set of callbacks which can be used to assign MAGE-TAB Protocol Names to unique accessions at the point of parsing. If the autosubmissions system has been set up and configured, then the parser will default to using that mechanism to assign protocol accessions. If you wish to use your own service, you may use the protocol_accession_service and protocol_accession_prefix attributes to control this. The protocol_accession_service attribute should be a code reference which accepts two arguments: (a) the Protocol Name as given in the IDF, and (b) the experiment accession. The code reference should return a unique accession, which will then be assigned to the protocol in the output MAGE-ML.
If the autosubmissions system is to be used, the protocol_accession_prefix attribute must be set, e.g. to "P-MTAB-".
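The callback contract described above (Protocol Name and experiment accession in, unique accession out) can be illustrated with a small closure. This is only a sketch under stated assumptions: the make_accession_service helper, the "P-XMPL-" prefix, the "E-XMPL-1" accession and the in-memory counter are all hypothetical and not part of this module; a real service would more likely consult a database or registry.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ArrayExpress::MAGETAB): builds a code
# reference with the two-argument signature described above. The prefix
# and the in-memory counter are illustrative assumptions only.
sub make_accession_service {
    my ($prefix) = @_;
    my %assigned;       # experiment accession + protocol name => accession
    my $counter = 0;

    return sub {
        my ( $protocol_name, $expt_accession ) = @_;
        my $key = join "\0", $expt_accession, $protocol_name;

        # Reuse the accession if this protocol has already been seen;
        # otherwise mint the next one in sequence.
        $assigned{$key} //= sprintf '%s%d', $prefix, ++$counter;
        return $assigned{$key};
    };
}

my $service = make_accession_service('P-XMPL-');

# Repeated requests for the same protocol yield the same accession.
print $service->( 'Hybridization', 'E-XMPL-1' ), "\n";
print $service->( 'Labeling',      'E-XMPL-1' ), "\n";
print $service->( 'Hybridization', 'E-XMPL-1' ), "\n";
```

A code reference built this way would be supplied to the constructor as the value of the protocol_accession_service attribute. Caching by experiment accession and Protocol Name keeps the mapping stable within a single run; it does not persist between runs.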
Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2008.
Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.