Module detail: Definitions.pm

NAME

ArrayExpress::Curator::MAGE::Definitions.pm - a module providing a central location for managing EDF column names, OntologyEntry categories and values.


SYNOPSIS

 use ArrayExpress::Curator::MAGE::Definitions 
 qw($EDF_EXPTACCESSION 
    validate_hybridization_section
    $OE_CAT_PROTOCOLTYPE 
    $OE_VAL_GROW);

DESCRIPTION

This module provides a central location for various ontology terms and column heading definitions. In other words, if you want to change the parsing of your EDF or the output OntologyEntries, edit this module. Also provided is a series of optional validation subroutines for checking column or row names against the expected EDF headers.


SUPPORTED HEADINGS

Experiment section: row names

accession

A string unique to the experiment. Used in all identifiers, and as a top-level Experiment identifier. Synonymous with the ArrayExpress accession number for an experiment. Example: E-MEXP-100

domain

A string identifying the origin of the information provided. Typically this will be an internet domain name such as ``ebi.ac.uk''. This string is used to define the namespace of the MAGE identifiers created.

name

The name of the experiment. This forms the name attribute of the top-level Experiment object.

description

A short description of the experiment. This is inserted into a Description text attribute attached to the top-level Experiment object.

release_date

In the form YYYY-MM-DDThh:mm:ssZ. This is currently used in a NameValueType (name ArrayExpressReleaseDate) in the top-level Experiment object.

submission_date

In the form YYYY-MM-DDThh:mm:ssZ. This is currently used in a NameValueType (name ArrayExpressSubmissionDate) in the top-level Experiment object.
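
As an illustration of the date format expected by release_date and submission_date, the following minimal sketch formats a Unix epoch time as a UTC timestamp. The subroutine name edf_timestamp is hypothetical and is not part of this module.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not provided by Definitions.pm): format an epoch
# time as YYYY-MM-DDThh:mm:ssZ, always expressed in UTC.
sub edf_timestamp {
    my ($epoch) = @_;
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));
}

print edf_timestamp(0), "\n";    # prints 1970-01-01T00:00:00Z
```

Using gmtime rather than localtime keeps the value consistent with the trailing Z, which denotes UTC.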

experiment_design_type

A comma-separated list of MO ExperimentDesignType terms used in the top-level ExperimentDesign object.

quality_control

A comma-separated list of MO QualityControlDescriptionType used in the top-level ExperimentDesign object.

submitter

The name of the person submitting the experiment. This is used to create a Person object in the AuditAndSecurity package with a Roles:submitter OntologyEntry. The Person object is referenced in the Experiment package.

submitter_email

The email address of the person submitting the experiment.

data_coder

The name of the person responsible for coding the experiment into MAGE-ML. This is used to create a Person object in the AuditAndSecurity package with a Roles:data_coder OntologyEntry. The Person object is referenced in the Experiment package.

data_coder_email

The email address of the person who coded the experiment in MAGE-ML.

curator

The name of the person who curated the experiment. This is used to create a Person object in the AuditAndSecurity package with a Roles:curator OntologyEntry. The Person object is referenced in the Experiment package.

curator_email

The email address of the person who curated the experiment.

investigator

The name of the primary investigator on the experiment. This is used to create a Person object in the AuditAndSecurity package with a Roles:investigator OntologyEntry. The Person object is referenced in the Experiment package.

investigator_email

The email address of the primary investigator.

organization

The name of the organisation to which the submitter is affiliated. This is used to create an Organization MAGE object, which is then associated with each Person object.

address

The address of the organisation to which the submitter is affiliated.

ae_display_name

The name of the experiment to be displayed in the ArrayExpress repository interface. This will typically be added by the ArrayExpress curators after data submission.

URI

A URI pointing to an alternative location for the experimental data, e.g. in a public repository database.

geo_release_date

For processing of GEO experiments; this value is used to create a ``GEOReleaseDate'' NameValueType object associated with the top-level Experiment; this is used internally by the ArrayExpress database.

secondary_accession

This value is used to populate the ``SecondaryAcession'' NameValueType object associated with the top-level Experiment. This is used internally by ArrayExpress to represent e.g. GEO accession numbers.

The following are used in a BibliographicReference associated with a second Description object in the top-level Experiment object:

publication_title

The free-text title of any associated publication. This is currently assumed to have PublicationType journal_article (MO), although more control over this may be added in future.

authors

A free-text list of publication authors.

journal

The name of the journal. This should be a standard Pubmed abbreviation.

year

Year of publication.

volume

Journal volume.

issue

Journal issue.

pages

Page range of journal article.

publication_URI

Any URI associated with the publication.

pubmed_id

The Pubmed ID associated with the publication.

Protocol section: column names

The following are all attached to individual protocols defined by successive lines in this section:

accession

Database (e.g. ArrayExpress) accession no.

name

Protocol name.

type

MO ProtocolType term; this is an optional field. In its absence, Tab2MAGE will use default ProtocolType terms based on how the protocol is used within the Hybridization section.

text

Protocol text.

parameters

Protocol parameters, listed in the following form: name1(unit1);name2(unit2);...
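
To make the parameters format concrete, here is a minimal sketch of how such a string could be split into names and units. It is not Tab2MAGE's own parser: the subroutine name parse_parameters, the example parameter names, and the treatment of an entry without a parenthesised unit are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: split "name1(unit1);name2(unit2)" into a list of
# hash references, each with a name and a unit.
sub parse_parameters {
    my ($string) = @_;
    my @params;
    for my $item (grep { length } split /\s*;\s*/, $string) {
        if ($item =~ /\A \s* (.+?) \s* \( ([^()]*) \) \s* \z/x) {
            push @params, { name => $1, unit => $2 };
        }
        else {
            # Assumption: an entry with no "(unit)" part has no unit.
            $item =~ s/\A\s+|\s+\z//g;
            push @params, { name => $item, unit => undef };
        }
    }
    return @params;
}

# Illustrative values only.
for my $p (parse_parameters('temperature(degrees_C);time(hours)')) {
    printf "%s measured in %s\n", $p->{name}, $p->{unit} // 'no unit';
}
```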

Hybridization section: column names

BioSource

Arbitrary name for a BioSource. This term is used as a unique identifier within a Tab2MAGE run to determine correct linking between objects. It may however be omitted, in favour of using the set of BioMaterialCharacteristics associated with a BioMaterial as the sole indicator of BioSource identity.

Sample

Arbitrary name for a BioSample (associated with a BioSampleType:not_extract OntologyEntry). Used to control linking between objects; may be omitted if desired, in which case a BioSample name constructed from the raw data filename is used instead.

Extract

Arbitrary name for a BioSample (associated with a BioSampleType:extract OntologyEntry). Used to control linking between objects; may be omitted if desired, in which case an Extract name constructed from the raw data filename is used instead.

Immunoprecipitate

Arbitrary name for a BioSample (associated with a BioSampleType:extract OntologyEntry). Used to control linking between objects; may be omitted if desired, in which case an Immunoprecipitate name constructed from the raw data filename is used instead. These objects are used in ChIP experiments and may be ignored for expression or CGH studies.

LabeledExtract

Arbitrary name for a LabeledExtract. Used to control linking between objects; may be omitted if desired, in which case a LabeledExtract name constructed from the raw data filename and the label dye name is used instead.

Dye

Name of the dye linked to the labeled extract (e.g., Cy3, Cy5, biotin).

BioSourceMaterial

MO MaterialType term to be attached to the BioSource. Default: whole_organism

SampleMaterial

MO MaterialType term to be attached to the BioSample. Default: organism_part

ExtractMaterial

MO MaterialType term to be attached to the Extract. Default: total_RNA

ImmunoprecipitateMaterial

MO MaterialType term to be attached to the Immunoprecipitate. Default: genomic_DNA

LabeledExtractMaterial

MO MaterialType term to be attached to the LabeledExtract. Default: synthetic_DNA

BioSourceDescription

Free-text description to be attached to the BioSource (this should be used sparingly, if at all).

Hybridization

Arbitrary name for a hybridization. Used to control linking between objects; may be omitted if desired, in which case a Hybridization name constructed from the raw data filename is used instead.

Scan

Arbitrary name for a scanning event. Used to control linking between objects; may be omitted if desired, in which case a Scan name constructed from the raw data filename is used instead.

Normalization

Arbitrary name for a normalization procedure. Used to control linking between objects; may be omitted if desired, in which case a Normalization name constructed from the normalized data filename is used instead.

NormalizationType

MO DerivedBioAssayType term, attached to the relevant DerivedBioAssayData object.

Transformation

Arbitrary name for a data transformation procedure. Used to control linking between objects; may be omitted if desired.

TransformationType

MO DerivedBioAssayType term, attached to the relevant DerivedBioAssayData object.

ImageFormat

MO ImageFormat term. Only used if File[image] columns have been included in the spreadsheet. Used to create Image objects.

Protocol[grow]

Accession number for the ``growth'' protocol (BioSource->BioSample Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: grow

Protocol[treatment]

Accession number for the ``treatment'' protocol (BioSource->BioSample Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: specified_biomaterial_action

Protocol[extraction]

Accession number for the ``extraction'' protocol (BioSample->Extract Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: nucleic_acid_extraction

Protocol[pool]

Accession number for the ``pooling'' protocol (BioSample->Extract Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: pool

Protocol[labeling]

Accession number for the ``labeling'' protocol (Extract->LabeledExtract Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: labeling

Protocol[immunoprecipitate]

Accession number for the ``immunoprecipitation'' protocol (Extract->Immunoprecipitate Treatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Omit if Immunoprecipitates are not used in the experiment. Default ProtocolType: immunoprecipitate

Protocol[hybridization]

Accession number for the ``hybridization'' protocol (PhysicalBioAssay BioAssayCreation). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: hybridization

Protocol[scanning]

Accession number for the ``image acquisition'' protocol (PhysicalBioAssay BioAssayTreatment). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: image_acquisition; note however that if Protocol[image_analysis] is not specified then the scanning protocol defaults to feature_extraction. This is an ArrayExpress-specific behaviour and relates to the appearance of the experiment in the ArrayExpress web interface.

Protocol[image_analysis]

Accession number for the ``feature extraction'' protocol (MeasuredBioAssay FeatureExtraction). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: feature_extraction

Protocol[normalization]

Accession number for the ``normalization'' protocol (DerivedBioAssayData ProducerTransformation). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: bioassay_data_transformation

Protocol[transformation]

Accession number for the ``transformation'' protocol (DerivedBioAssayData ProducerTransformation). The accession number should either be present in the protocol section of the spreadsheet, or pre-existing in ArrayExpress. Default ProtocolType: bioassay_data_transformation

Software is created in the Protocol package, and referenced as SoftwareApplication in the appropriate ProtocolApplication (see above).

Software[scanning]

Name of the scanning software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.

Software[image_analysis]

Name of the feature extraction software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.

Software[normalization]

Name of the normalization software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.

Software[transformation]

Name of the data transformation software, followed by its version in parentheses. The Software name is used to create a Software object in the Protocol package, and the version is inserted into the relevant SoftwareApplication objects.
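
The "name followed by version in parentheses" convention used by the Software columns can be illustrated with a short sketch. The subroutine name parse_software and the example value are assumptions made for illustration, and returning an undefined version when no parentheses are present is a choice of this sketch rather than documented Tab2MAGE behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Name (version)" into its two parts.
sub parse_software {
    my ($value) = @_;
    if ($value =~ /\A \s* (.*?) \s* \( ([^()]*) \) \s* \z/x) {
        return ($1, $2);
    }
    # Assumption: no parentheses means no version was given.
    (my $name = $value) =~ s/\A\s+|\s+\z//g;
    return ($name, undef);
}

my ($name, $version) = parse_software('GenePix Pro (4.1)');
print "name=$name version=$version\n";    # prints name=GenePix Pro version=4.1
```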

Array[accession]

ArrayExpress array accession number (e.g., A-MEXP-1). This is used in the Array package, and also to create Feature and Reporter identifiers for the DesignElementDimensions.

Array[serial]

Serial or lot number of the array. This is used to define ArrayManufacture objects within the Array package.

File[raw]

Name of the raw data file for a given hybridization.

File[normalized]

Name of the normalized data file for a given hybridization/normalization.

File[transformed]

Name of the transformed final data matrix file for the experiment.

File[image]

URI locating the Image object associated with an ImageAcquisition (scanning) event. ArrayExpress does not store images, although we can link to images stored on external web sites, where desired.

File[cdf]

Affymetrix array library file pertaining to a given hybridization. Required for correct parsing of Affymetrix data files, although for standard Affymetrix arrays the actual CDF file does not need to be supplied.

File[exp]

Affymetrix experiment description file pertaining to a given hybridization. Required for correct parsing of Affymetrix data files.

BioMaterialCharacteristics[<category>]

MO term describing some feature of the BioSource (e.g., Genotype, Sex, DiseaseState, etc.). As many BioMaterialCharacteristics columns may be used as are desired. Each column should contain values from a different category within the MGED Ontology.

FactorValue[<category>]

MO term describing a FactorValue associated with a given hybridization (e.g., Genotype, Sex, DiseaseState, etc.). As many FactorValue columns may be used as are desired. Each column should contain values from a different category within the MGED Ontology.

Parameter[<parameter name>]

Parameter values for the parameters declared in the Protocol section of the spreadsheet. Note that a protocol must be declared in the spreadsheet in order to be able to use parameters with it.
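
Many Hybridization section headings carry a qualifier in square brackets, such as Protocol[grow], File[raw] or FactorValue[<category>]. The sketch below shows one way such a heading could be separated into its type and qualifier. The subroutine name parse_heading is hypothetical, and the code is not taken from Tab2MAGE.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Type[qualifier]" into (type, qualifier).
# Headings without brackets are returned with an undefined qualifier.
sub parse_heading {
    my ($heading) = @_;
    if ($heading =~ /\A \s* (\w+) \s* \[ \s* (.*?) \s* \] \s* \z/x) {
        return ($1, $2);
    }
    (my $plain = $heading) =~ s/\A\s+|\s+\z//g;
    return ($plain, undef);
}

for my $heading ('FactorValue[Genotype]', 'Protocol[image_analysis]', 'Dye') {
    my ($type, $qualifier) = parse_heading($heading);
    printf "%-25s type=%s qualifier=%s\n",
        $heading, $type, $qualifier // '(none)';
}
```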


VALIDATION SUBROUTINES

validate_experiment_section($test_arrayref, $error_fh)
validate_protocol_section($test_arrayref, $error_fh)
validate_hybridization_section($test_arrayref, $error_fh)

Each of these subroutines simply takes a reference to an array containing the list of headings to be checked, and prints warnings on STDERR (and on the optional error filehandle) if it finds unrecognized headings. No further action is taken; these subroutines merely serve as a warning to the user.
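
The behaviour described above can be sketched with a self-contained stand-in. This is not the module's own code: the subroutine name check_protocol_headings is hypothetical, the heading list is copied from the Protocol section above, exact case-sensitive matching is an assumption, and the return value is added here only to make the sketch easy to check.

```perl
use strict;
use warnings;

# Protocol section column names, as listed under SUPPORTED HEADINGS.
my %PROTOCOL_HEADING = map { $_ => 1 } qw(accession name type text parameters);

# Hypothetical stand-in for a validation subroutine: warn on STDERR, and
# on the optional error filehandle, about each unrecognized heading.
sub check_protocol_headings {
    my ($test_arrayref, $error_fh) = @_;
    my @unknown = grep { !$PROTOCOL_HEADING{$_} } @$test_arrayref;
    for my $heading (@unknown) {
        my $message = "Warning: unrecognized protocol heading '$heading'\n";
        print STDERR $message;
        print {$error_fh} $message if $error_fh;
    }
    # Not part of the documented interface; returned for convenience only.
    return scalar @unknown;
}

# "txt" is a deliberate misspelling of "text", so one warning is printed.
check_protocol_headings([qw(accession name txt)]);
```

With the real module, the equivalent call would pass the same kind of array reference to validate_protocol_section, optionally followed by an open filehandle for the error log.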


AUTHOR

Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2004.

Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.

