ArrayExpress::Curator::MAGE::Definitions.pm - a module providing a central location for managing EDF column names, OntologyEntry categories and values.
use ArrayExpress::Curator::MAGE::Definitions qw($EDF_EXPTACCESSION validate_hybridization_section $OE_CAT_PROTOCOLTYPE $OE_VAL_GROW);
This module provides a central location for various ontology terms and column heading definitions. In other words, if you want to change the parsing of your EDF or the output OntologyEntries, edit this module. Also provided is a series of optional validation subroutines for checking column or row names against the expected EDF headers.
A string unique to the experiment. Used in all identifiers, and as a top-level Experiment identifier. Synonymous with the ArrayExpress accession number for an experiment. Example: E-MEXP-100
A string identifying the origin of the information provided. Typically this will be an internet domain name such as "ebi.ac.uk". This string is used to define the namespace of the MAGE identifiers created.
The name of the experiment. This forms the name attribute of the top-level Experiment object.
A short description of the experiment. This is inserted into a Description text attribute attached to the top-level Experiment object.
In the form YYYY-MM-DDThh:mm:ssZ. This is currently used in a NameValueType (name ArrayExpressReleaseDate) in the top-level Experiment object.
In the form YYYY-MM-DDThh:mm:ssZ. This is currently used in a NameValueType (name ArrayExpressSubmissionDate) in the top-level Experiment object.
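The date format used for both fields can be produced and checked with core Perl. The following is a minimal sketch rather than the code used by this module: the subroutine names are hypothetical, and it assumes the trailing Z denotes UTC, as in ISO 8601.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time as YYYY-MM-DDThh:mm:ssZ.
# Assumption: the trailing "Z" means UTC, so gmtime is used.
sub format_edf_date {
    my ($epoch) = @_;
    return strftime( '%Y-%m-%dT%H:%M:%SZ', gmtime($epoch) );
}

# Check that a string has the expected shape. Only digits and
# separators are checked, not calendar validity.
sub looks_like_edf_date {
    my ($string) = @_;
    return $string =~ /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\z/ ? 1 : 0;
}

print format_edf_date(0), "\n";    # 1970-01-01T00:00:00Z
```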
A comma-separated list of MO ExperimentDesignType terms used in the top-level ExperimentDesign object.
A comma-separated list of MO QualityControlDescriptionType terms used in the top-level ExperimentDesign object.
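A comma-separated field of this kind can be split into individual terms as sketched below. The subroutine name is hypothetical, and trimming whitespace around each term is an assumption; the module's own handling may differ. The terms in the usage line are illustrative values only.

```perl
use strict;
use warnings;

# Split a comma-separated field into a list of terms.
# Assumption: whitespace around terms is insignificant and
# empty entries (e.g. from a trailing comma) are dropped.
sub split_term_list {
    my ($field) = @_;
    return () unless defined $field;

    my @terms;
    for my $term ( split /,/, $field ) {
        $term =~ s/\A\s+//;
        $term =~ s/\s+\z//;
        push @terms, $term if length $term;
    }
    return @terms;
}

my @design_types = split_term_list('time_series_design, dose_response_design');
print join( "\n", @design_types ), "\n";
```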
The name of the person submitting the experiment. This is used to create a Person object in the AuditAndSecurity package with a Roles:submitter OntologyEntry. The Person object is referenced in the Experiment package.
The email address of the person submitting the experiment.
The name of the person responsible for coding the experiment into MAGE-ML. This is used to create a Person object in the AuditAndSecurity package with a Roles:data_coder OntologyEntry. The Person object is referenced in the Experiment package.
The email address of the person who coded the experiment in MAGE-ML.
The name of the person who curated the experiment. This is used to create a Person object in the AuditAndSecurity package with a Roles:curator OntologyEntry. The Person object is referenced in the Experiment package.
The email address of the person who curated the experiment.
The name of the primary investigator on the experiment. This is used to create a Person object in the AuditAndSecurity package with a Roles:investigator OntologyEntry. The Person object is referenced in the Experiment package.
The email address of the primary investigator.
The name of the organisation to which the submitter is affiliated. This is used to create an Organization MAGE object, which is then associated with each Person object.
The address of the organisation to which the submitter is affiliated.
The name of the experiment to be displayed in the ArrayExpress repository interface. This will typically be added by the ArrayExpress curators after data submission.
A URI pointing to an alternative location for the experimental data, e.g. in a public repository database.
For processing of GEO experiments. This value is used to create a "GEOReleaseDate" NameValueType object associated with the top-level Experiment, which is used internally by the ArrayExpress database.
This value is used to populate the "SecondaryAccession" NameValueType object associated with the top-level Experiment. This is used internally by ArrayExpress to represent e.g. GEO accession numbers.
The following are used in a BibliographicReference associated with a second Description object in the top-level Experiment object:
The free-text title of any associated publication. This is currently assumed to have PublicationType journal_article (MO), although more control over this may be added in future.
A free-text list of publication authors.
The name of the journal. This should be a standard Pubmed abbreviation.
Year of publication.
Journal volume.
Journal issue.
Page range of journal article.
Any URI associated with the publication.
The Pubmed ID associated with the publication.
The following are all attached to individual protocols defined by successive lines in this section:
Database (e.g. ArrayExpress) accession no.
Protocol name.
MO ProtocolType term; this is an optional field. In its absence, Tab2MAGE will use default ProtocolType terms based on how the protocol is used within the Hybridization section.
Protocol text.
Protocol parameters, listed in the following form: name1(unit1);name2(unit2);...
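The parameter list format can be parsed as sketched below. This is an illustration under stated assumptions, not the module's own parser: the subroutine name is hypothetical, and accepting a parameter with no parenthesised unit is an assumption. The parameter names and units in the usage line are made up for the example.

```perl
use strict;
use warnings;

# Parse "name1(unit1);name2(unit2);..." into a list of hash
# references with "name" and "unit" keys.
sub parse_parameter_list {
    my ($field) = @_;
    my @parameters;

    for my $item ( split /;/, $field ) {
        $item =~ s/\A\s+//;
        $item =~ s/\s+\z//;
        next unless length $item;

        if ( $item =~ /\A(.+?)\s*\(([^()]*)\)\z/ ) {
            push @parameters, { name => $1, unit => $2 };
        }
        else {
            # Assumption: a bare name with no unit is tolerated.
            push @parameters, { name => $item, unit => undef };
        }
    }
    return @parameters;
}

for my $param ( parse_parameter_list('temperature(degrees_C);time(hours)') ) {
    print "$param->{name} measured in $param->{unit}\n";
}
```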
Arbitrary name for a BioSource. This term is used as a unique identifier within a Tab2MAGE run to determine correct linking between objects. It may however be omitted, in favour of using the set of BioMaterialCharacteristics associated with a BioMaterial as the sole indicator of BioSource identity.
Arbitrary name for a BioSample (associated with a BioSampleType:not_extract OntologyEntry). Used to control linking between objects; may be omitted if desired, in which case a BioSample name constructed from the raw data filename is used instead.
Arbitrary name for a BioSample (associated with a BioSampleType:extract OntologyEntry). Used to control linking between objects; may be omitted if desired, in which case an Extract name constructed from the raw data filename is used instead.
Arbitrary name for a BioSample (associated with a BioSampleType:extract OntologyEntry). Used to control linking between objects; may be omitted if desired, in which case an Immunoprecipitate name constructed from the raw data filename is used instead. These objects are used in ChIP experiments and may be ignored for expression or CGH studies.
Arbitrary name for a LabeledExtract. Used to control linking between objects; may be omitted if desired, in which case a LabeledExtract name constructed from the raw data filename and the label dye name is used instead.
Name of the dye linked to the labeled extract (e.g., Cy3, Cy5, biotin).
MO MaterialType term to be attached to the BioSource. Default: whole_organism
MO MaterialType term to be attached to the BioSample. Default: organism_part
MO MaterialType term to be attached to the Extract. Default: total_RNA
MO MaterialType term to be attached to the Immunoprecipitate. Default: genomic_DNA
MO MaterialType term to be attached to the LabeledExtract. Default: synthetic_DNA
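The MaterialType defaults listed above amount to a simple lookup with an optional override. The sketch below restates them as a hash; the hash and subroutine names are hypothetical, and the way the module stores these defaults internally may differ.

```perl
use strict;
use warnings;

# Default MO MaterialType terms, as documented above.
my %DEFAULT_MATERIAL_TYPE = (
    BioSource         => 'whole_organism',
    BioSample         => 'organism_part',
    Extract           => 'total_RNA',
    Immunoprecipitate => 'genomic_DNA',
    LabeledExtract    => 'synthetic_DNA',
);

# Return the supplied term if there is one, else the default.
sub material_type_for {
    my ( $material, $supplied ) = @_;
    return $supplied if defined $supplied && length $supplied;
    return $DEFAULT_MATERIAL_TYPE{$material};
}

print material_type_for('Extract'), "\n";                 # total_RNA
print material_type_for( 'Extract', 'polyA_RNA' ), "\n";  # polyA_RNA
```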
Free-text description to be attached to the BioSource (this should be used sparingly, if at all).
Arbitrary name for a hybridization. Used to control linking between objects; may be omitted if desired, in which case a Hybridization name constructed from the raw data filename is used instead.
Arbitrary name for a scanning event. Used to control linking between objects; may be omitted if desired, in which case a Scan name constructed from the raw data filename is used instead.
Arbitrary name for a normalization procedure. Used to control linking between objects; may be omitted if desired, in which case a Normalization name constructed from the normalized data filename is used instead.
MO DerivedBioAssayType term, attached to the relevant DerivedBioAssayData object.
Arbitrary name for a data transformation procedure. Used to control linking between objects; may be omitted if desired.
MO DerivedBioAssayType term, attached to the relevant DerivedBioAssayData object.
MO ImageFormat term. Only used if File[image] columns have been included in the spreadsheet. Used to create Image objects.
Accession number for the "growth" protocol (BioSource->BioSample Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: grow
Accession number for the "treatment" protocol (BioSource->BioSample Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: specified_biomaterial_action
Accession number for the "extraction" protocol (BioSample->Extract Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: nucleic_acid_extraction
Accession number for the "pooling" protocol (BioSample->Extract Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: pool
Accession number for the "labeling" protocol (Extract->LabeledExtract Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: labeling
Accession number for the "immunoprecipitation" protocol (Extract->Immunoprecipitate Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Omit if Immunoprecipitates are not used in the experiment. Default ProtocolType: immunoprecipitate
Accession number for the "hybridization" protocol (PhysicalBioAssay BioAssayCreation). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: hybridization
Accession number for the "image acquisition" protocol (PhysicalBioAssay BioAssayTreatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: image_acquisition; note however that if Protocol[image_analysis] is not specified then the scanning protocol defaults to feature_extraction. This is an ArrayExpress-specific behaviour and relates to the appearance of the experiment in the ArrayExpress web interface.
Accession number for the "feature extraction" protocol (MeasuredBioAssay FeatureExtraction). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: feature_extraction
Accession number for the "normalization" protocol (DerivedBioAssayData ProducerTransformation). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: bioassay_data_transformation
Accession number for the "transformation" protocol (DerivedBioAssayData ProducerTransformation). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: bioassay_data_transformation
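The default ProtocolType terms described above, including the special case for the scanning protocol, can be summarised as a lookup. This is a sketch only: the hash keys are the descriptive protocol names used in this documentation rather than actual column headings, and the hash and subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Default MO ProtocolType for each kind of protocol, as documented above.
my %DEFAULT_PROTOCOL_TYPE = (
    'growth'              => 'grow',
    'treatment'           => 'specified_biomaterial_action',
    'extraction'          => 'nucleic_acid_extraction',
    'pooling'             => 'pool',
    'labeling'            => 'labeling',
    'immunoprecipitation' => 'immunoprecipitate',
    'hybridization'       => 'hybridization',
    'image acquisition'   => 'image_acquisition',
    'feature extraction'  => 'feature_extraction',
    'normalization'       => 'bioassay_data_transformation',
    'transformation'      => 'bioassay_data_transformation',
);

# Look up the default type. The second argument says whether a
# Protocol[image_analysis] column was supplied; without one, the
# scanning protocol falls back to feature_extraction.
sub default_protocol_type {
    my ( $step, $has_image_analysis ) = @_;
    if ( $step eq 'image acquisition' && !$has_image_analysis ) {
        return 'feature_extraction';
    }
    return $DEFAULT_PROTOCOL_TYPE{$step};
}

print default_protocol_type( 'image acquisition', 0 ), "\n";  # feature_extraction
print default_protocol_type( 'image acquisition', 1 ), "\n";  # image_acquisition
```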
Software is created in the Protocol package, and referenced as SoftwareApplication in the appropriate ProtocolApplication (see above).
Name of the scanning software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.
Name of the feature extraction software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.
Name of the normalization software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.
Name of the data transformation software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.
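Each of the software fields above follows the same "name (version)" layout, which can be split as sketched below. The subroutine name is hypothetical, and returning an undefined version when no parentheses are present is an assumption. The software name in the usage line is an illustrative value.

```perl
use strict;
use warnings;

# Split "Name (version)" into its two parts.
# Assumption: a field without a parenthesised version is returned
# as a name with an undefined version.
sub parse_software {
    my ($field) = @_;
    if ( $field =~ /\A\s*(.*?)\s*\(([^()]*)\)\s*\z/ ) {
        return ( $1, $2 );
    }
    ( my $name = $field ) =~ s/\A\s+|\s+\z//g;
    return ( $name, undef );
}

my ( $software_name, $software_version ) = parse_software('GenePix Pro (4.1)');
print "name=$software_name version=$software_version\n";
```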
ArrayExpress array accession number (e.g., A-MEXP-1). This is used in the Array package, and also to create Feature and Reporter identifiers for the DesignElementDimensions.
Serial or lot number of the array. This is used to define ArrayManufacture objects within the Array package.
Name of the raw data file for a given hybridization.
Name of the normalized data file for a given hybridization/normalization.
Name of the transformed final data matrix file for the experiment.
URI locating the Image object associated with an ImageAcquisition (scanning) event. ArrayExpress does not store images, although we can link to images stored on external web sites, where desired.
Affymetrix array library file pertaining to a given hybridization. Required for correct parsing of Affymetrix data files, although for standard Affymetrix arrays the actual CDF file does not need to be supplied.
Affymetrix experiment description file pertaining to a given hybridization. Required for correct parsing of Affymetrix data files.
MO term describing some feature of the BioSource (e.g., Genotype, Sex, DiseaseState, etc.). As many BioMaterialCharacteristics columns may be used as are desired. Each column should contain values from a different category within the MGED Ontology.
MO term describing a FactorValue associated with a given hybridization (e.g., Genotype, Sex, DiseaseState, etc.). As many FactorValue columns may be used as are desired. Each column should contain values from a different category within the MGED Ontology.
Parameter values for the parameters declared in the Protocol section of the spreadsheet. Note that a protocol must be declared in the spreadsheet in order to be able to use parameters with it.
Each of these subroutines simply takes a reference to an array containing the list of headings to be checked, and prints warnings on STDERR (and on the optional error filehandle) if it finds unrecognized headings. No further action is taken; these subroutines merely serve as a warning to the user.
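The calling convention described above can be sketched as follows. This is a stand-in written for illustration, not one of the module's own validation subroutines: the subroutine name and the list of recognized headings are hypothetical (only the two headings mentioned elsewhere in this documentation are included), and returning the number of unrecognized headings is an assumption added to make the sketch easy to check.

```perl
use strict;
use warnings;

# Illustrative subset of recognized headings; the real module
# defines the full set.
my %RECOGNIZED = map { $_ => 1 } ( 'File[image]', 'Protocol[image_analysis]' );

# Takes a reference to an array of headings and an optional error
# filehandle. Warns about each unrecognized heading on STDERR and,
# if given, on the error filehandle.
sub check_headings {
    my ( $headings, $error_fh ) = @_;
    my @unrecognized = grep { !$RECOGNIZED{$_} } @{$headings};

    for my $heading (@unrecognized) {
        my $message = "Warning: unrecognized heading '$heading'\n";
        print STDERR $message;
        print {$error_fh} $message if $error_fh;
    }

    # Assumption: a count is returned purely for convenience.
    return scalar @unrecognized;
}

check_headings( [ 'File[image]', 'Protcol[image_analysis]' ] );
```

The misspelled second heading in the usage line triggers a single warning on STDERR; processing would continue regardless, matching the warn-only behaviour described above.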
Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2004.
Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.