Experiment section
This is a required tag indicating the start
of the Experiment section.
domain
The domain tag provides information on the originator of the
output MAGE-ML document. This field can contain any suitable
string, such as the originating internet domain name (e.g.,
"ebi.ac.uk").
accession
The
experiment accession number is a unique identifier assigned to
each experiment. Accession numbers for experiments submitted to
ArrayExpress have the format E-XXXX-n.
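As a rough illustration, an accession string can be checked against the E-XXXX-n pattern before submission. The sketch below assumes that "XXXX" is four uppercase letters and "n" is a positive integer; the text above only gives the overall shape, so treat the exact character classes as a guess. The function name and the example accession are hypothetical.

```python
import re

# Assumption: XXXX = four uppercase letters, n = positive integer without
# leading zeros. The documented format is only "E-XXXX-n" (or "P-XXXX-n").
ACCESSION_RE = re.compile(r"([EP])-([A-Z]{4})-([1-9][0-9]*)")


def parse_accession(text, prefix="E"):
    """Return (code, number) for a well-formed accession, else None."""
    match = ACCESSION_RE.fullmatch(text.strip())
    if match is None or match.group(1) != prefix:
        return None
    return match.group(2), int(match.group(3))


if __name__ == "__main__":
    print(parse_accession("E-ABCD-12"))  # hypothetical accession
```

The same helper can be called with prefix="P" for accessions that begin with P rather than E.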
quality_control
This is a
comma-separated list of terms taken from the MGED
ontology. The terms should be instances of QualityControlDescriptionType. Typical
terms which have been used here include "biological_replicate",
"technical_replicate" and "dye_swap_quality_control".
experiment_design_type
This is a
comma-separated list of terms taken from the MGED
ontology. The terms should be instances of ExperimentDesignType. Examples
of terms which have been used here include
"disease_state_design", "stimulus_or_stress_design" and
"compound_treatment_design".
name
The name of the experiment.
description
A short paragraph (contained within this single spreadsheet
cell) describing the purpose of the experiment.
release_date
Date for public release, in the format YYYY-MM-DD.
submission_date
Date of
submission, in the format YYYY-MM-DD.
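A minimal sketch of checking the YYYY-MM-DD format used by release_date and submission_date follows. It is deliberately strict (zero-padded month and day), which is an assumption about how the format should be read; the function name is hypothetical.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_date(text):
    """Return a datetime.date for a valid YYYY-MM-DD string, else None."""
    match = DATE_RE.fullmatch(text.strip())
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Well-formed but impossible, e.g. the 30th of February.
        return None


if __name__ == "__main__":
    print(parse_date("2006-01-05"))
```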
submitter
The name of the person responsible for submitting the experiment
to the database.
organization
The organization with which the submitter is affiliated.
publication_title, authors, journal, volume, issue, pages, year, pubmed_id
Publication details for any
manuscript associated with the experiment. The journal
field should contain the standard PubMed abbreviated journal
name. The authors field is a semicolon-delimited list of
names.
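For illustration only, the semicolon-delimited authors cell could be split into individual names as sketched below. The function name and the example names are hypothetical.

```python
def split_authors(cell):
    """Split a semicolon-delimited authors cell into a list of names."""
    # Surrounding whitespace and empty entries (e.g. from a trailing
    # semicolon) are dropped.
    return [name.strip() for name in cell.split(";") if name.strip()]


if __name__ == "__main__":
    print(split_authors("Smith J; Jones A;"))
```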
Protocol section
This is a required tag indicating the start of
the Protocol section.
accession
The protocol accession number
is a unique identifier assigned to each protocol. Accession
numbers for protocols submitted to ArrayExpress have the format
P-XXXX-n.
text
The full text of the protocol, inserted into a single
spreadsheet cell. Note that as of version 1.8.1, Tab2MAGE
now supports line breaks in protocol text when exported from MS
Excel or OpenOffice.org. For this to work correctly, the
spreadsheet should be exported using the default settings for
tab-delimited text, i.e., enclosing all text fields in double
quotes ("). These breaks will then be automatically converted
into <br> tags for correct HTML formatting. Similarly, the
ampersand (&) character is now automatically quoted
correctly as an HTML entity. Protocol text may also be formatted
using standard HTML tags such as <br>. Inequality signs
< and > should be represented by the HTML entities
&lt; and &gt; respectively. Please be aware when using
MS Excel that this application may truncate long text in a
spreadsheet cell, without warning the user.
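The handling described above can be sketched roughly as follows. This is not Tab2MAGE's own code (Tab2MAGE is a separate tool, and its internals are not shown here); it is a small Python illustration of reading a double-quoted, tab-delimited export with embedded line breaks, converting those breaks to <br> tags, and quoting bare ampersands. Leaving existing entities such as &lt; untouched is an assumption, and all function names are hypothetical.

```python
import csv
import io
import re

# An ampersand that does not already begin an entity such as &lt; or &#38;
BARE_AMP_RE = re.compile(r"&(?!#?\w+;)")


def read_rows(tab_text):
    """Parse tab-delimited text whose fields are enclosed in double quotes.

    Quoted fields may contain line breaks; csv.reader keeps them intact.
    """
    buffer = io.StringIO(tab_text, newline="")
    return list(csv.reader(buffer, delimiter="\t", quotechar='"'))


def format_protocol_text(text):
    """Quote bare ampersands and turn line breaks into <br> tags."""
    text = BARE_AMP_RE.sub("&amp;", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "<br>")


if __name__ == "__main__":
    rows = read_rows('P-ABCD-1\t"Wash & dry\nAdd 5 &lt; x"\r\n')
    print(format_protocol_text(rows[0][1]))
```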
name
The
name of the protocol.
type
This (very) optional column should contain terms taken from the MGED
ontology. The term should be an instance of ExperimentalProtocolType
or DataTransformationProtocolType. Examples
of terms which have been used here include "grow",
"compound_based_treatment" and "hybridization". Note:
use of this column is not recommended in the majority of
cases. If this column is not present, suitable defaults will be
used based on how the protocols are referenced in the
Hybridization section.
parameters
A list of parameters,
separated by semicolons, in the format "name(units)". Each
parameter name should be unique within the spreadsheet. Provided
that the "units" string matches one of those which are
hard-coded into the MAGE object model (e.g., "ug" for microgram
MassUnit, "degree_C" for degrees Celsius TemperatureUnit etc.), the
correct unit subclass will be created. Otherwise the script will
create a generic QuantityUnit object. If no unit is given, no
Unit object is created in the output MAGE-ML. Note: There
is a clash between using "m" for minutes and meters. Tab2MAGE only
supports using "m" for minutes in this context.
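The parameter rules above can be illustrated with a short parser. The unit table here contains only the two unit strings quoted in the text plus "m" for minutes (mapped to a TimeUnit class, which is an assumption about the MAGE object model naming); the real list of hard-coded units is longer. The function name is hypothetical, and uniqueness is only checked within a single cell rather than across the whole spreadsheet.

```python
import re

# Assumption: a tiny subset of the unit strings known to the MAGE object model.
KNOWN_UNITS = {
    "ug": "MassUnit",
    "degree_C": "TemperatureUnit",
    "m": "TimeUnit",  # "m" is read as minutes, never meters
}

# "name" optionally followed by "(units)".
PARAM_RE = re.compile(r"\s*([^()]+?)\s*(?:\(\s*([^()]*?)\s*\))?\s*")


def parse_parameters(cell):
    """Return a list of (name, unit, unit_class) tuples for a parameters cell."""
    params = []
    seen = set()
    for item in cell.split(";"):
        if not item.strip():
            continue
        match = PARAM_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed parameter: {item!r}")
        name, unit = match.group(1), match.group(2) or None
        if name in seen:
            raise ValueError(f"duplicate parameter name: {name!r}")
        seen.add(name)
        if unit is None:
            unit_class = None  # no Unit object is created
        else:
            unit_class = KNOWN_UNITS.get(unit, "QuantityUnit")  # generic fallback
        params.append((name, unit, unit_class))
    return params


if __name__ == "__main__":
    print(parse_parameters("Amount(ug); Temperature(degree_C); Cycles"))
```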
Hybridization section
This is a
required tag indicating the start of the Hybridization
section.
BioSource
The name of the
biosource. This is typically the biological material as it was
originally brought into the lab.
Sample
The name of the sample derived from the biosource.
Extract
The name of the extract made from the sample.
LabeledExtract
The name of the labeled
extract.
Dye
The
name of the dye used in labeling the extract.
Hybridization
The name of the hybridization event. Note: the same
name must be used in the final data matrix
file, where one is supplied.
Scan
The name of the array scanning event.
Normalization
The name of the normalization event.
BioSourceMaterial, SampleMaterial, ExtractMaterial, LabeledExtractMaterial
The types of
material used in the experiment. The terms should be taken from
the MGED
ontology, and should be instances of the MaterialType
class.
BioMaterialCharacteristics[<category>]
These columns contain annotation for each of your biosources. The terms in these columns
should be from one or more standard ontologies or other
controlled vocabularies. Examples of suitable ontologies and/or
vocabularies may be found on the following web sites:
The <category> field should contain any of the BioMaterialCharacteristics category terms from the MGED ontology, such as "DiseaseState", "Genotype", "Age" or "Histology". The spreadsheet may contain as many BioMaterialCharacteristics columns as required to fully describe your biosources.
Protocol[grow]
The accession number of
the protocol used to propagate the biosource.
Protocol[treatment]
This is a
general-purpose protocol type describing any treatments of the
sample prior to extraction (e.g., administration of drugs,
tissue dissection etc.).
Protocol[extraction]
The method by which
nucleic acid is extracted from the sample.
Protocol[labeling]
The protocol used to
derive the labeled extract from the extract.
Protocol[hybridization]
The protocol
used to hybridize the labeled extract to the array.
Protocol[scanning]
The protocol by which
the hybridized array is imaged.
Protocol[image_analysis]
The protocol
used in the feature extraction step in which the scanned image
is converted into raw numerical data.
Protocol[normalization]
The procedure
used to derive normalized data from the raw data.
Parameter[<parameter name>]
This class of column allows you to specify the parameter values for
each of your protocols on a per-hybridization basis. The
<parameter name> field must refer to a predeclared
parameter defined in the Protocol section. You should use as many
Parameter[] columns as you have predefined parameters.
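As a sketch of the rule that every Parameter[] column must refer to a predeclared parameter, the following compares a list of column headings against a set of declared parameter names. The heading pattern and the function name are assumptions made for illustration.

```python
import re

# Matches headings of the form Kind[argument], e.g. Parameter[Amount].
HEADING_RE = re.compile(r"(\w+)\[([^\[\]]*)\]")


def undeclared_parameters(headings, declared):
    """Return the Parameter[] names that are missing from `declared`."""
    missing = []
    for heading in headings:
        match = HEADING_RE.fullmatch(heading.strip())
        if match is None or match.group(1) != "Parameter":
            continue  # not a Parameter[] column
        name = match.group(2).strip()
        if name not in declared:
            missing.append(name)
    return missing


if __name__ == "__main__":
    headings = ["Hybridization", "Parameter[Amount]", "Parameter[Time]"]
    print(undeclared_parameters(headings, {"Amount"}))
```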
FactorValue[<category>]
These columns are used to indicate which of your experimental
factors apply to each hybridization. The terms in these
columns should be from one or more standard ontologies or
other controlled vocabularies. Examples of suitable ontologies
and/or vocabularies may be found in the notes on the BioMaterialCharacteristics[] columns. The
<category> field should contain a subclass of the
ExperimentalFactorCategory
class from the MGED
ontology. The relevant categories may be taken from any
depth of the ExperimentalFactorCategory hierarchy, and you may
find that you have to descend several levels to obtain a
biologically useful category. Examples of suitable categories
include "DiseaseState", "OrganismPart", "Compound" or
"EnvironmentalHistory". The spreadsheet may contain as many
FactorValue columns as required to fully describe your
experiment.
FactorValues may be linked to either "Values" or "Measurements". As a rule of thumb, Tab2MAGE currently assumes that a purely numerical value in a FactorValue column should be encoded as a Measurement. In such situations the <category> field should indicate the class and name of the measurement unit, in the form:
FactorValue[<unit class>(<unit name>)]
For example:
FactorValue[Temperature(degree_C)]
In each case the unit class and name should be derived from the values defined by the MAGE object model. If a match is not found, Tab2MAGE will generate a generic QuantityUnit as a placeholder. Note: There is a clash between using "m" for minutes and meters. Tab2MAGE only supports using "m" for minutes in this context.
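The rule of thumb above might be sketched as follows: a purely numerical cell is treated as a Measurement, and anything else as a Value. What exactly counts as "purely numerical" is an assumption here (an optionally signed decimal number, with optional exponent), and the function name is hypothetical.

```python
import re

# FactorValue[<category>] or FactorValue[<unit class>(<unit name>)]
FACTOR_RE = re.compile(r"FactorValue\[([^\[\]()]+)(?:\(([^()]+)\))?\]")

# Assumption: what "purely numerical" means for a cell value.
NUMBER_RE = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def classify_factor_value(heading, value):
    """Return (kind, category_or_unit_class, unit_name, value)."""
    match = FACTOR_RE.fullmatch(heading.strip())
    if match is None:
        raise ValueError(f"not a FactorValue heading: {heading!r}")
    value = value.strip()
    kind = "Measurement" if NUMBER_RE.fullmatch(value) else "Value"
    return kind, match.group(1).strip(), match.group(2), value


if __name__ == "__main__":
    print(classify_factor_value("FactorValue[Temperature(degree_C)]", "37"))
    print(classify_factor_value("FactorValue[DiseaseState]", "normal"))
```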
File[raw]
The name of the file containing the raw data. For Affymetrix
submissions this should be the name of the CEL file. A single
hybridization may be linked to multiple raw data files (e.g.,
derived from different scanning events) using multiple rows
within the Tab2MAGE spreadsheet.
File[normalized]
The name of the file
containing the normalized data. For Affymetrix submissions this
should be the name of the CHP file. A single hybridization may
be linked to multiple normalized data files (e.g., derived using
different normalization protocols) using multiple rows within
the Tab2MAGE spreadsheet.
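To illustrate how several rows can link one hybridization to several data files, the sketch below groups file names by hybridization name. Rows are represented as dictionaries keyed by column heading, which is an assumption made for the example; the function name and file names are hypothetical.

```python
from collections import defaultdict


def files_by_hybridization(rows, column="File[raw]"):
    """Group the file names in `column` by the Hybridization they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        hyb = row["Hybridization"].strip()
        name = row.get(column, "").strip()
        # Skip empty cells and avoid listing the same file twice.
        if name and name not in grouped[hyb]:
            grouped[hyb].append(name)
    return dict(grouped)


if __name__ == "__main__":
    rows = [
        {"Hybridization": "H1", "File[raw]": "scan1.txt"},
        {"Hybridization": "H1", "File[raw]": "scan2.txt"},
        {"Hybridization": "H2", "File[raw]": "scan3.txt"},
    ]
    print(files_by_hybridization(rows))
```

The same function can be pointed at the File[normalized] column by passing column="File[normalized]".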
File[transformed]
The name of the combined
transformed final data matrix file. The format of this file is described in
the data file notes. The column
headings in this file must contain the name(s) of the
hybridization(s) with which each column is associated.
File[exp]
(Affymetrix submissions only). The name of the EXP file for
a given hybridization.
File[cdf]
(Affymetrix submissions only). The name of the library CDF
file provided by Affymetrix. The actual CDF file(s) need only be
supplied to ArrayExpress if you are using custom arrays.
Array[accession]
The ArrayExpress
accession number for the array design used in the
hybridization. Please note that if your array design has not
yet been submitted to ArrayExpress, then you should be prepared
to submit it yourself using the MIAMExpress submission
tool. In cases where you have used commercial arrays, please
contact ArrayExpress to ask about adding the array
design to the database.
Array[serial]
The serial number of the
array. For Affymetrix submissions this should be the Chip Lot
number.
Please see the MAGE mappings for other supported column headings, and further information on how the spreadsheet information is used to create the MAGE-ML document.