ArrayExpress::ArrayMAGE.pm - an OO module providing methods for writing MAGE-ML for array designs.
    # Import all the ArrayMAGE classes.
    use ArrayExpress::ArrayMAGE qw(:ALL);

    # Instantiate the objects we require.
    my @classes = qw(
        Feature
        Reporter
        BioSequence
        FeatureReporterMap
        ArrayDesign
        Zone
    );
    my %object;
    foreach my $type (@classes) {
        if ( $type eq 'ArrayDesign' ) {

            # ArrayDesign requires an accession.
            $object{$type} = "ArrayExpress::ArrayMAGE::$type"->new({
                accession => $acc,
            });
        }
        else {
            $object{$type} = "ArrayExpress::ArrayMAGE::$type"->new();
        }
    }

    # ReporterGroup and CompositeGroup are typically handled
    # slightly differently, because multiple groups may be
    # instantiated during a run (see below).
    my ( %reporter_group, %composite_group );

    # Add objects, e.g. while scanning through an ADF or CDF file.
    foreach my $row ( @list_of_lines ) {

        $object{Feature}->add({
            metacolumn => $row->{'MetaColumn'},
            metarow    => $row->{'MetaRow'},
            column     => $row->{'Column'},
            row        => $row->{'Row'},
        });

        # ReporterGroups and CompositeGroups are handled in similar
        # ways (only ReporterGroup is shown here):
        while ( my ( $rname, $rvalue ) = each %$row ) {
            if ( my ($type) = ( $rname =~ m!ReporterGroup\[(.*?)\]!i ) ) {
                my $key = "$type.$rvalue";
                $reporter_group{$key}
                    ||= ArrayExpress::ArrayMAGE::ReporterGroup->new({
                        identifier => $key,
                        tag        => $rvalue,
                        is_species => ( $type =~ m!species!i ) || 0,
                    });

                # ReporterGroup add() takes id_ref (see the ReporterGroup
                # attributes documented below).
                $reporter_group{$key}->add({
                    id_ref => $row->{'Reporter Identifier'},
                });
            }
        }
    }

    # Write out the MAGE-ML to STDOUT.
    ArrayExpress::ArrayMAGE->combine_parts(
        \%object,
        \%reporter_group,
        \%composite_group,
        \*STDOUT,
    );
This module allows MAGE-ML for array designs to be created in a
memory-efficient fashion. It is the abstract superclass for a series
of MAGE-like classes, each representing a part of the MAGE-ML
document to be written. When instantiated, each of these classes
points to a temporary filehandle into which the XML describing MAGE
object instances is written. Each subclass except ArrayDesign
implements an add() method in addition to the methods inherited from
this base class. Instantiating an object with new() opens the XML
tags for a list of objects, and each call to add() writes one object
based on the values it is passed. Finally, the XML tags are closed
and the MAGE-ML is written out by calling the class method
combine_parts() in this top-level class.
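The temporary-filehandle approach described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the PartSketch package, its write_to() method, and the ItemList and Item element names are hypothetical stand-ins. The real classes emit MAGE-ML elements and are combined by combine_parts().

```perl
use strict;
use warnings;

# Hypothetical stand-in for one ArrayMAGE-style part (not the module's code).
package PartSketch;

sub new {
    my ( $class, $list_tag ) = @_;

    # Anonymous temporary file: records are spooled to disk, not held in memory.
    open( my $fh, '+>', undef ) or die "Cannot open temporary file: $!";
    my $self = bless { fh => $fh, tag => $list_tag }, $class;

    # new() opens the enclosing list element.
    print {$fh} "<$list_tag>\n";
    return $self;
}

sub add {
    my ( $self, $args ) = @_;
    defined $args->{identifier} or die "identifier is required";
    print { $self->{fh} } qq{  <Item identifier="$args->{identifier}"/>\n};
    return;
}

# Close the list element, rewind, and copy the spooled XML to the output handle.
sub write_to {
    my ( $self, $out ) = @_;
    my $fh = $self->{fh};
    print {$fh} "</$self->{tag}>\n";
    seek( $fh, 0, 0 ) or die "Cannot rewind temporary file: $!";
    while ( my $line = <$fh> ) {
        print {$out} $line;
    }
    return;
}

package main;

my $part = PartSketch->new('ItemList');
$part->add( { identifier => "R$_" } ) for 1 .. 3;

# Capture the output in a string here; a real script might pass \*STDOUT.
my $xml = q{};
open( my $out, '>', \$xml ) or die "Cannot open in-memory handle: $!";
$part->write_to($out);
close $out;
print $xml;
```

Because each add() call prints straight to the temporary file, memory use stays roughly constant however many objects are added; only the final copy to the output handle reads the data back.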
combine_parts( \%obj, \%rg, \%cg, $fh )
Class method. Takes: a hashref of ArrayMAGE objects, a hashref of ReporterGroups, a hashref of CompositeGroups, and an open output filehandle, and writes the completed MAGE-ML to that filehandle.
set_namespace("namespace")
Class method. Allows the user to set the identifier namespace for the output MAGE objects.
set_design("design")
Class method. Allows the user to set the design name used in identifiers for the output MAGE objects. Typically this will be the same as the array accession, but e.g. for Affymetrix arrays will correspond to the design name.
set_separator("separator")
Class method. Allows the user to set the identifier separator for the output MAGE objects. Usually this will be q{:} or q{.}.
set_programname("programname")
Class method. Allows the user to set the name of the program (this is not particularly useful).
ArrayExpress::ArrayMAGE::ArrayDesign
new()
required attribute: accession
The accession attribute becomes the MAGE PhysicalArrayDesign identifier.
add()
is not implemented for this class.
ArrayExpress::ArrayMAGE::BioSequence
new()
requires no attributes.
add()
required attribute: identifier
This identifier is combined with the class namespace to create the BioSequence identifier.
add()
optional attribute: dbrefs
A hashref with db accessions as keys and db tags as values (e.g. { A12345 => 'embl' }). Typically used for Reporter BioSequence Database Entry ADF columns.
ArrayExpress::ArrayMAGE::CompositeGroup
new()
required attribute: identifier
This identifier is combined with the class namespace to create a CompositeGroup identifier.
new()
required attribute: tag
This tag is used as the CompositeGroup name. If is_species is set, it is also used as the species name in a Species_assn element.
new()
optional attribute: is_species
A flag indicating whether this is a grouping based on species.
add()
required attribute: id_ref
This identifier is combined with the class namespace to create a CompositeSequence_ref identifier.
ArrayExpress::ArrayMAGE::CompositeSequence
new()
requires no attributes.
add()
required attribute: identifier
This identifier is combined with the class namespace to create the CompositeSequence identifier.
add()
optional attribute: name
The name of the CompositeSequence.
add()
optional attribute: biosequences
An arrayref of BioSequence identifiers to be attached to this CompositeSequence.
add()
optional attribute: dbrefs
A hashref with db accessions as keys and db tags as values (e.g. { A12345 => 'embl' }). Typically used for CompositeSequence Description Database Entry ADF columns.
add()
optional attribute: controltype
A ControlType ontology term to be attached to this CompositeSequence.
add()
optional attribute: comment
A comment to be added to this CompositeSequence as a Description_assn.
add()
optional attribute: cs_only
A flag indicating that no Reporters are being created (so ReporterCompositeMap creation can be skipped).
ArrayExpress::ArrayMAGE::Database
new()
requires no attributes.
add()
required attribute: tag
This tag is combined with the class namespace to create the Database identifier. Typically this will be a standard database tag, e.g. embl.
ArrayExpress::ArrayMAGE::Feature
new()
requires no attributes.
add()
required attributes: metacolumn, metarow, column, row
Feature coordinates, used in the Feature and Zone_ref identifiers and in FeatureLocation.
ArrayExpress::ArrayMAGE::FeatureReporterMap
new()
requires no attributes.
add()
required attribute: identifier
The Reporter identifier to which Features should be mapped.
add()
required attribute: features
An arrayref of Feature identifiers mapped to the Reporter.
add()
optional attribute: mismatch_info
A hashref of hashrefs, keyed first by Feature identifier and then by start_coord, new_sequence and replaced_length, used to create a MismatchInformation_assnlist (useful for Affymetrix array designs).
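The nested mismatch_info structure can be easier to read as data. The following is a sketch under stated assumptions: the key names (start_coord, new_sequence, replaced_length) come from the description above, but the Feature and Reporter identifiers and all values are hypothetical, and the loop only demonstrates how such a structure can be traversed, not how the module itself consumes it.

```perl
use strict;
use warnings;

# Hypothetical Feature identifiers and values, for illustration only.
my %mismatch_info = (
    'feature_1' => {
        start_coord     => 13,
        new_sequence    => 'A',
        replaced_length => 1,
    },
    'feature_2' => {
        start_coord     => 13,
        new_sequence    => 'C',
        replaced_length => 1,
    },
);

# Arguments in the shape described for FeatureReporterMap add().
my %add_args = (
    identifier    => 'reporter_1',                      # hypothetical Reporter
    features      => [ sort keys %mismatch_info ],
    mismatch_info => \%mismatch_info,
);

# Walk the structure: first by Feature identifier, then by attribute key.
my @summary;
foreach my $feature ( @{ $add_args{features} } ) {
    my $info = $add_args{mismatch_info}{$feature};
    my @pairs;
    foreach my $key (qw(start_coord new_sequence replaced_length)) {
        push @pairs, "$key=$info->{$key}";
    }
    push @summary, join q{,}, $feature, @pairs;
}
print "$_\n" for @summary;
```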
ArrayExpress::ArrayMAGE::Reporter
new()
requires no attributes.
add()
required attribute: identifier
This identifier is combined with the class namespace to create the Reporter identifier.
add()
optional attribute: name
The name of the Reporter.
add()
optional attribute: biosequences
An arrayref of BioSequence identifiers to be attached to this Reporter.
add()
optional attribute: dbrefs
A hashref with db accessions as keys and db tags as values (e.g. { A12345 => 'embl' }). Typically used for Reporter Description Database Entry ADF columns.
add()
optional attribute: controltype
A ControlType ontology term to be attached to this Reporter.
add()
optional attribute: warningtype
A WarningType ontology term to be attached to this Reporter.
add()
optional attribute: failtype
A FailType ontology term to be attached to this Reporter.
add()
optional attribute: comment
A comment to be added to this Reporter as a Description_assn.
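As an illustration of the Reporter add() attributes listed above, the sketch below builds an argument hashref and checks it against the documented attribute names before it would be passed on. The attribute names are taken from this documentation; the values and the check_args() helper are hypothetical and are not part of ArrayExpress::ArrayMAGE.

```perl
use strict;
use warnings;

# Attribute names as documented for Reporter add(); values are hypothetical.
my %reporter_args = (
    identifier   => 'reporter_1',
    name         => 'Example reporter',
    biosequences => ['bioseq_1'],
    dbrefs       => { A12345 => 'embl' },
    controltype  => 'control_biosequence',
    comment      => 'Spotted in duplicate',
);

# Hypothetical helper: list required attributes that are missing, and any
# keys that are not documented for this add() call.
sub check_args {
    my ( $args, $required, $optional ) = @_;
    my %known   = map { ( $_ => 1 ) } @{$required}, @{$optional};
    my @missing = grep { !defined $args->{$_} } @{$required};
    my @unknown = grep { !$known{$_} } sort keys %{$args};
    return ( \@missing, \@unknown );
}

my ( $missing, $unknown ) = check_args(
    \%reporter_args,
    ['identifier'],
    [qw(name biosequences dbrefs controltype warningtype failtype comment)],
);
print 'missing: ', scalar @{$missing}, ', unknown: ', scalar @{$unknown}, "\n";
```

A check of this kind catches misspelled attribute names early, which matters when add() is called once per row of a large ADF or CDF file.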
ArrayExpress::ArrayMAGE::ReporterCompositeMap
new()
requires no attributes.
add()
required attribute: identifier
The CompositeSequence identifier to which Reporters should be mapped.
add()
required attribute: reporters
An arrayref of Reporter identifiers mapped to the CompositeSequence.
ArrayExpress::ArrayMAGE::ReporterGroup
new()
required attribute: identifier
This identifier is combined with the class namespace to create a ReporterGroup identifier.
new()
required attribute: tag
This tag is used as the ReporterGroup name. If is_species is set, it is also used as the species name in a Species_assn element.
new()
optional attribute: is_species
A flag indicating whether this is a grouping based on species.
add()
required attribute: id_ref
This identifier is combined with the class namespace to create a Reporter_ref identifier.
ArrayExpress::ArrayMAGE::Zone
new()
requires no attributes.
add()
required attributes: column, row
Zone coordinates, used in the Zone identifier and in its row and column attributes.
Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2008.
Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.