ArrayExpress::MAGETAB - A parser class for MAGE-TAB documents.
use ArrayExpress::MAGETAB;

my $magetab = ArrayExpress::MAGETAB->new({
    idf              => $idf_file,
    target_directory => $dir,
    expt_accession   => $accn,
});

$magetab->write_mageml();
This module acts as a front end to the MAGE-TAB parsing API. Parser objects are instantiated with a number of attributes to control how the MAGE-TAB document is parsed. Support is only provided for IDF and SDRF documents at present; it is anticipated that the parser will be extended to support ADF at a later date.
Currently the parser is built using a MAGEv1.1 object model to store the MAGE-TAB metadata. It is envisioned that this dependence on the Bio::MAGE modules may be removed once a full MAGE-TAB object model is agreed upon by the community.
To simplify the process of data submission, ArrayExpress has introduced a new flavour of MAGE-TAB in which the IDF and SDRF sections are combined into a single worksheet. This parser supports both MAGE-TAB v1.1 documents (with separate IDF and SDRF) and these combined documents.
new: Object constructor. This recognises the following attributes:
    idf: The path of the IDF file with which to start parsing.
    magetab_doc: The path of a combined IDF+SDRF file to parse.
    output_file: The name of the output MAGE-ML file.
    namespace: The namespace to use in MAGE identifier creation.
    authority: The authority to use in MAGE identifier creation.
    expt_accession: The accession number assigned to the experiment.
    target_directory: The directory into which to write the output files.
    source_directory: The directory which contains all data and SDRF files.
    is_standalone: A flag indicating whether the parser should run in standalone mode, without connecting to ArrayExpress to retrieve array design information. It is sometimes desirable to skip these downloads, which can be quite large.
    qt_filename: QuantitationType file. This option allows you to specify a custom QuantitationType definition file to override those defined as part of the Tab2MAGE package. See the ArrayExpress::Datafile::QT_list manpage for more information.
    include_default_qts: This option can be used in conjunction with qt_filename to indicate that the QuantitationType listing from the Tab2MAGE package itself should be included in the lists of known QuantitationTypes used in data file parsing. The default behaviour is to deactivate these known QTs if a custom QT file is used.
    keep_all_qts: A flag indicating whether unrecognised QuantitationTypes in data files should be kept or not. The default behaviour is to strip unrecognised columns out of the data files.
    reporter_prefix: The prefix to be used during Reporter identifier construction. This prefix is prepended to the identifiers listed in the data files.
    compseq_prefix: The prefix to be used during CompositeElement (CompositeSequence) identifier construction. This prefix is prepended to the identifiers listed in the data files.
    protocol_accession_service: A code reference used to reassign protocol accessions. See the description of protocol accessions below.
    protocol_accession_prefix: The prefix to be used for protocol accession creation when the autosubmissions system is in use. See the description of protocol accessions below.
    keep_protocol_accns: A flag indicating that the protocols in the IDF should keep the identifiers given there, rather than being assigned new accession numbers. This option overrides protocol_accession_service.
    use_plain_text: Some file formats are only supported by ArrayExpress in their native forms. Nonetheless, this package can parse some of these data formats (for example Nimblegen data and Affymetrix CEL files) into tab-delimited representations fully encoded in the MAGE-ML document. This flag controls whether that conversion is performed.
    skip_datafiles: This option tells the parser not to read the data files referenced by a given MAGE-TAB document, and instead to generate MAGE in their absence. This option is particularly useful for unsupported data file formats.
    ignore_size_limits: The Tab2MAGE configuration file allows the user to set maximum data file sizes for parsing and web download, to provide some protection from overloading the system in a production pipeline setting. To temporarily ignore these size limits, use this option.
    in_relaxed_mode: A flag indicating whether to allow minor errors during parsing. At present the only errors ignored by this option are Term Source REF, Protocol REF, Parameter Value [] and Factor Value [] columns which reference Names that have not been defined in the IDF.
    clobber: A flag indicating whether or not to overwrite existing files without prompting the user.
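The interaction between qt_filename, include_default_qts and keep_all_qts can be pictured with the following minimal sketch of QuantitationType column filtering. It is not the package's own implementation: the QT names, the subroutines known_qts and columns_to_keep, and the assumption that the first data column holds the identifiers are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical QuantitationType names, for illustration only. In the real
# package the defaults come from Tab2MAGE and the custom list from the
# file named by qt_filename.
my @default_qts = ( 'Signal', 'Detection' );
my @custom_qts  = ('MyLab:Intensity');

# Recognised QTs: a custom list replaces the defaults unless
# include_default_qts is true (or no custom list was supplied).
sub known_qts {
    my ( $custom, $default, $include_default ) = @_;
    my %known = map { $_ => 1 } @{$custom};
    if ( !@{$custom} || $include_default ) {
        $known{$_} = 1 for @{$default};
    }
    return \%known;
}

# Indices of data file columns to keep. Column 0 is assumed to hold the
# identifiers and is always kept; other columns are kept only if their
# QT is recognised, unless keep_all_qts is true.
sub columns_to_keep {
    my ( $header, $known, $keep_all ) = @_;
    return [ 0 .. $#{$header} ] if $keep_all;
    return [ 0, grep { $known->{ $header->[$_] } } 1 .. $#{$header} ];
}

my @header = ( 'Reporter', 'Signal', 'Unknown', 'MyLab:Intensity' );
my $known  = known_qts( \@custom_qts, \@default_qts, 0 );
print join( ',', @{ columns_to_keep( \@header, $known, 0 ) } ), "\n";    # 0,3
```

With include_default_qts set, the same header keeps columns 0, 1 and 3; with keep_all_qts set, every column is kept.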
parse: Starts the MAGE-TAB parse and loads the document into memory.
write_mageml: Writes out MAGE-ML corresponding to the input MAGE-TAB document. If the MAGE-TAB has not yet been parsed, parse() is called automatically.
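The automatic parse performed by write_mageml follows a common parse-on-first-use pattern. The sketch below illustrates it with a hypothetical stand-in class, LazyWriter, and an assumed is_parsed flag; it does not reproduce the internals of ArrayExpress::MAGETAB.

```perl
use strict;
use warnings;

# Hypothetical stand-in class: it only mimics the documented
# "parse on first use" behaviour of write_mageml.
package LazyWriter;

sub new {
    my ( $class, $args ) = @_;
    my $self = { %{$args}, is_parsed => 0, parse_count => 0 };
    return bless $self, $class;
}

sub parse {
    my ($self) = @_;
    $self->{parse_count}++;    # stands in for loading the MAGE-TAB document
    $self->{is_parsed} = 1;
    return $self;
}

sub write_mageml {
    my ($self) = @_;
    $self->parse() unless $self->{is_parsed};    # parse only if not yet done
    return "MAGE-ML for $self->{idf}";           # placeholder output
}

package main;

my $writer = LazyWriter->new( { idf => 'example.idf.txt' } );
$writer->write_mageml();    # triggers parse()
$writer->write_mageml();    # reuses the already parsed document
print "parse() ran $writer->{parse_count} time(s)\n";    # parse() ran 1 time(s)
```

In this sketch, calling parse() explicitly before write_mageml() also results in a single parse.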
The parser provides a set of callbacks which can be used to assign MAGE-TAB Protocol Names to unique accessions at the point of parsing. If the autosubmissions system has been set up and configured, the parser will default to using that mechanism to assign protocol accessions. If you wish to use your own service, you may use the protocol_accession_service and protocol_accession_prefix attributes to control this. The protocol_accession_service should be a code reference which accepts two arguments: (a) the Protocol Name as given in the IDF, and (b) the experiment accession. The code reference should return a unique accession, which will then be assigned to the protocol in the output MAGE-ML.
If the autosubmissions system is to be used, the protocol_accession_prefix attribute must be set, e.g. to "P-MTAB-".
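A protocol_accession_service code reference receives the Protocol Name and the experiment accession and returns an accession. The following minimal sketch assumes a hypothetical scheme of a prefix followed by an in-memory counter, reusing an accession when the same Protocol Name recurs within an experiment. The variable names and the E-TEST-1 accession are made up; a production service would need persistent storage to keep accessions unique across runs.

```perl
use strict;
use warnings;

# Hypothetical accession scheme: prefix plus an in-memory counter.
# A real service would draw numbers from a persistent store.
my $prefix      = 'P-MTAB-';
my $next_number = 1;
my %assigned;    # "experiment accession <TAB> Protocol Name" => accession

# Receives (a) the Protocol Name from the IDF and (b) the experiment
# accession, and returns an accession for that protocol.
my $accession_service = sub {
    my ( $protocol_name, $expt_accession ) = @_;
    my $key = join( "\t", $expt_accession, $protocol_name );

    # //= only evaluates the right-hand side (and so only advances the
    # counter) when this protocol has not been seen before.
    $assigned{$key} //= $prefix . $next_number++;
    return $assigned{$key};
};

print $accession_service->( 'Extraction protocol', 'E-TEST-1' ), "\n";    # P-MTAB-1
print $accession_service->( 'Labeling protocol',   'E-TEST-1' ), "\n";    # P-MTAB-2
print $accession_service->( 'Extraction protocol', 'E-TEST-1' ), "\n";    # P-MTAB-1

# The code reference would then be passed to the constructor, e.g.:
#   ArrayExpress::MAGETAB->new({
#       idf                        => $idf_file,
#       protocol_accession_service => $accession_service,
#   });
```

Note that keep_protocol_accns, if set, overrides any service supplied this way.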
Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2008.
Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.