Spreadsheet construction

Experiment section
This is a required tag indicating the start of the Experiment section.

domain
The domain tag provides information on the originator of the output MAGE-ML document. This field can contain any suitable string, such as the originating internet domain name (e.g., "ebi.ac.uk").

accession
The experiment accession number is a unique identifier assigned to each experiment. Accession numbers for experiments submitted to ArrayExpress have the format E-XXXX-n.
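
As a rough illustration, the accession format above could be checked before submission along the following lines. This is only a sketch: it assumes that "XXXX" stands for four uppercase letters and that "n" is a positive integer, neither of which is stated explicitly here, and the function name is hypothetical.

```python
import re

# Assumption: "XXXX" is four uppercase letters and "n" is a positive integer.
EXPERIMENT_ACCESSION = re.compile(r"E-[A-Z]{4}-[1-9][0-9]*")


def looks_like_experiment_accession(value: str) -> bool:
    """Return True if value matches the assumed E-XXXX-n pattern."""
    return EXPERIMENT_ACCESSION.fullmatch(value.strip()) is not None
```

A value such as "E-ABCD-12" (a made-up accession) would pass this check, whereas "P-ABCD-12" or "E-ABCD-" would not.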

quality_control
This is a comma-separated list of terms taken from the MGED ontology. The terms should be instances of QualityControlDescriptionType. Typical terms which have been used here include "biological_replicate", "technical_replicate" and "dye_swap_quality_control".

experiment_design_type
This is a comma-separated list of terms taken from the MGED ontology. The terms should be instances of ExperimentDesignType. Examples of terms which have been used here include "disease_state_design", "stimulus_or_stress_design" and "compound_treatment_design".

name
The name of the experiment.

description
A short paragraph (contained within this single spreadsheet cell) describing the purpose of the experiment.

release_date
Date for public release, in the format YYYY-MM-DD.

submission_date
Date of submission, in the format YYYY-MM-DD.
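
The release_date and submission_date cells can be checked against the YYYY-MM-DD format with a few lines of Python. The sketch below is not part of Tab2MAGE; the function name is hypothetical, and the strict requirement for zero-padded months and days is an assumption based on the format string given above.

```python
import re
from datetime import datetime

# Four-digit year, two-digit month and day, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_date_field(value: str):
    """Return a date for a strict YYYY-MM-DD string, or None if invalid."""
    text = value.strip()
    if not DATE_PATTERN.fullmatch(text):
        return None
    try:
        # strptime rejects impossible dates such as 2006-02-30.
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None
```

The pattern check comes first because strptime on its own would also accept unpadded values such as "2006-3-5".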

submitter
The name of the person responsible for submitting the experiment to the database.

organization
The organization with which the submitter is affiliated.

publication_title, authors, journal, volume, issue, pages, year, pubmed_id
Publication details for any manuscript associated with the experiment. The journal field should contain the standard PubMed abbreviated journal name. The authors field is a semicolon-delimited list of names.


Protocol section

Protocol section
This is a required tag indicating the start of the Protocol section.

accession
The protocol accession number is a unique identifier assigned to each protocol. Accession numbers for protocols submitted to ArrayExpress have the format P-XXXX-n.

text
The full text of the protocol, inserted into a single spreadsheet cell. As of version 1.8.1, Tab2MAGE supports line breaks in protocol text exported from MS Excel or OpenOffice.org. For this to work correctly, the spreadsheet should be exported using the default settings for tab-delimited text, i.e., enclosing all text fields in double quotes ("). These line breaks will then be automatically converted into <br> tags for correct HTML formatting. Similarly, the ampersand (&) character is now automatically converted to its HTML entity (&amp;). Protocol text may also be formatted using standard HTML tags such as <br>. Inequality signs < and > should be represented by the HTML entities &lt; and &gt; respectively. Please be aware that MS Excel may truncate long text in a spreadsheet cell without warning the user.
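
The line-break and ampersand handling described above can be pictured roughly as follows. This is an illustrative sketch only, not the Tab2MAGE implementation: the function name is hypothetical, and leaving ampersands that already begin an HTML entity untouched is an assumption.

```python
import re

# Matches "&" unless it already begins an entity such as "&lt;" or "&#60;".
# Assumption: existing entities in the protocol text should be left alone.
BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")


def protocol_text_to_html(text: str) -> str:
    """Escape bare ampersands and turn line breaks into <br> tags."""
    escaped = BARE_AMPERSAND.sub("&amp;", text)
    # Normalise Windows and old Mac line endings before inserting <br> tags.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br>")
```

Inequality signs are deliberately not escaped here, because the text may legitimately contain HTML tags such as <br>.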

name
The name of the protocol.

type
This (very) optional column should contain terms taken from the MGED ontology. The term should be an instance of ExperimentalProtocolType or DataTransformationProtocolType. Examples of terms which have been used here include "grow", "compound_based_treatment" and "hybridization". Note: use of this column is not recommended in the majority of cases. If this column is not present, suitable defaults will be used based on how the protocols are referenced in the Hybridization section.

parameters
A list of parameters, separated by semicolons, in the format "name(units)". Each parameter name should be unique within the spreadsheet. Provided that the "units" string matches one of those which are hard-coded into the MAGE object model (e.g., "ug" for microgram MassUnit, "degree_C" for degrees Celsius TemperatureUnit etc.), the correct unit subclass will be created. Otherwise the script will create a generic QuantityUnit object. If no unit is given, no Unit object is created in the output MAGE-ML. Note: There is a clash between using "m" for minutes and meters. Tab2MAGE only supports using "m" for minutes in this context.
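
To make the "name(units)" convention concrete, the sketch below splits a parameters cell into name and unit pairs and reports repeated names. It is a minimal illustration rather than the parser used by Tab2MAGE; both function names are hypothetical, and it checks uniqueness only within the list it is given.

```python
import re

# A parameter is a name optionally followed by units in parentheses.
PARAMETER = re.compile(r"(?P<name>[^()]+?)\s*(?:\((?P<units>[^()]*)\))?")


def parse_parameters(cell: str):
    """Split a parameters cell into (name, units) pairs; units is None if absent."""
    parsed = []
    for item in cell.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        match = PARAMETER.fullmatch(item)
        if match is None:
            raise ValueError(f"Unrecognised parameter: {item!r}")
        units = match.group("units")
        parsed.append((match.group("name").strip(), units.strip() if units else None))
    return parsed


def duplicated_names(parameters):
    """Return parameter names that occur more than once, in first-seen order."""
    seen, repeats = set(), []
    for name, _units in parameters:
        if name in seen and name not in repeats:
            repeats.append(name)
        seen.add(name)
    return repeats
```

For example, the cell "Temperature(degree_C); Amount(ug); Notes" yields three parameters, the last of which has no unit and so would produce no Unit object.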


Hybridization section

Hybridization section
This is a required tag indicating the start of the Hybridization section.

BioSource
The name of the biosource. This is typically the biological material as it was originally brought into the lab.

Sample
The name of the sample derived from the biosource.

Extract
The name of the extract made from the sample.

LabeledExtract
The name of the labeled extract.

Dye
The name of the dye used in labeling the extract.

Hybridization
The name of the hybridization event. Note: The same name must be used in the final data matrix file, if one is supplied.

Scan
The name of the array scanning event.

Normalization
The name of the normalization event.

BioSourceMaterial, SampleMaterial, ExtractMaterial, LabeledExtractMaterial
The types of material used in the experiment. The terms should be taken from the MGED ontology, and should be instances of the MaterialType class.

BioMaterialCharacteristics[<category>]
These columns contain annotation for each of your biosources. The terms in these columns should be from one or more standard ontologies or other controlled vocabularies. Examples of suitable ontologies and/or vocabularies may be found on the following web sites:

The <category> field should contain any of the BioMaterialCharacteristics category terms from the MGED ontology, such as "DiseaseState", "Genotype", "Age" or "Histology". The spreadsheet may contain as many BioMaterialCharacteristics columns as required to fully describe your biosources.

Protocol[grow]
The accession number of the protocol used to propagate the biosource.

Protocol[treatment]
This is a general-purpose protocol type describing any treatments of the sample prior to extraction (e.g., administration of drugs, tissue dissection etc.).

Protocol[extraction]
The method by which nucleic acid is extracted from the sample.

Protocol[labeling]
The protocol used to derive the labeled extract from the extract.

Protocol[hybridization]
The protocol used to hybridize the labeled extract to the array.

Protocol[scanning]
The protocol by which the hybridized array is imaged.

Protocol[image_analysis]
The protocol used in the feature extraction step in which the scanned image is converted into raw numerical data.

Protocol[normalization]
The procedure used to derive normalized data from the raw data.

Parameter[<parameter name>]
This class of column allows you to specify the parameter values for each of your protocols on a per-hybridization basis. The <parameter name> field must refer to a parameter already declared in the Protocol section. You should use as many Parameter[] columns as you have declared parameters.
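
The requirement that every Parameter[] column refers to a parameter from the Protocol section can be checked mechanically. The following is a hedged sketch, not Tab2MAGE code: the function name is hypothetical, and it assumes the column headings and the declared parameter names have already been read into Python lists.

```python
import re

# Captures the text between the brackets of a Parameter[...] heading.
PARAMETER_COLUMN = re.compile(r"Parameter\[(?P<name>.+)\]")


def undeclared_parameters(headings, declared_names):
    """Return names used in Parameter[] headings but absent from declared_names."""
    declared = set(declared_names)
    missing = []
    for heading in headings:
        match = PARAMETER_COLUMN.fullmatch(heading.strip())
        if match is None:
            continue  # not a Parameter[] column
        name = match.group("name").strip()
        if name not in declared:
            missing.append(name)
    return missing
```

An empty result means that every Parameter[] heading corresponds to a declared parameter.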

FactorValue[<category>]
These columns are used to indicate which of your experimental factors apply to each hybridization. The terms in these columns should be from one or more standard ontologies or other controlled vocabularies. Examples of suitable ontologies and/or vocabularies may be found in the notes on the BioMaterialCharacteristics[] columns. The <category> field should contain a subclass of the ExperimentalFactorCategory class from the MGED ontology. The relevant categories may be taken from any depth of the ExperimentalFactorCategory hierarchy, and you may find that you have to descend several levels to obtain a biologically useful category. Examples of suitable categories include "DiseaseState", "OrganismPart", "Compound" or "EnvironmentalHistory". The spreadsheet may contain as many FactorValue columns as required to fully describe your experiment.

FactorValues may be linked to either "Values" or "Measurements". As a rule of thumb, Tab2MAGE currently assumes that a purely numerical value in a FactorValue column should be encoded as a Measurement. In such situations the <category> field should indicate the class and name of the measurement unit, in the form:

FactorValue[<unit class>(<unit name>)]

For example:

FactorValue[Temperature(degree_C)]

In each case the unit class and name should be derived from the values defined by the MAGE object model. If a match is not found, Tab2MAGE will generate a generic QuantityUnit as a placeholder. Note: There is a clash between using "m" for minutes and meters. Tab2MAGE only supports using "m" for minutes in this context.
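
The heading syntax and the "purely numerical" rule of thumb can be sketched as below. This is an approximation for illustration only: the function names are hypothetical, and the exact test Tab2MAGE applies to decide whether a value is numerical is not documented here, so the regular expression is an assumption.

```python
import re

# FactorValue[<category>] or FactorValue[<unit class>(<unit name>)]
FACTOR_HEADING = re.compile(
    r"FactorValue\[(?P<category>[^\[\]()]+?)\s*(?:\((?P<unit>[^()]*)\))?\]"
)
# Assumption: "purely numerical" means an optionally signed decimal number.
NUMERIC = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def parse_factor_heading(heading: str):
    """Return (category or unit class, unit name or None), or None if not a FactorValue column."""
    match = FACTOR_HEADING.fullmatch(heading.strip())
    if match is None:
        return None
    unit = match.group("unit")
    return match.group("category").strip(), (unit.strip() if unit else None)


def is_measurement(value: str) -> bool:
    """Return True if the cell value would be treated as a Measurement under this sketch."""
    return NUMERIC.fullmatch(value.strip()) is not None
```

Under these assumptions, a cell containing "37.5" in a FactorValue[Temperature(degree_C)] column would be treated as a Measurement, while "37 C" would be treated as an ordinary Value.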

File[raw]
The name of the file containing the raw data. For Affymetrix submissions this should be the name of the CEL file. A single hybridization may be linked to multiple raw data files (e.g., derived from different scanning events) using multiple rows within the Tab2MAGE spreadsheet.

File[normalized]
The name of the file containing the normalized data. For Affymetrix submissions this should be the name of the CHP file. A single hybridization may be linked to multiple normalized data files (e.g., derived using different normalization protocols) using multiple rows within the Tab2MAGE spreadsheet.
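
The multiple-row convention for File[raw] and File[normalized] can be illustrated by grouping file names by hybridization. The sketch below assumes that the text passed in contains only the heading row and data rows of the Hybridization section (the section tag line is not handled), and the function name and example file names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def files_per_hybridization(tsv_text: str, column: str = "File[raw]"):
    """Map each hybridization name to the distinct file names listed for it."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    files = defaultdict(list)
    for row in reader:
        hyb = (row.get("Hybridization") or "").strip()
        name = (row.get(column) or "").strip()
        # Skip incomplete rows and file names already recorded for this hyb.
        if hyb and name and name not in files[hyb]:
            files[hyb].append(name)
    return dict(files)


# Made-up example: hyb1 was scanned twice, so it occupies two rows.
example = (
    "Hybridization\tScan\tFile[raw]\n"
    "hyb1\tscan1a\thyb1_scan_a.txt\n"
    "hyb1\tscan1b\thyb1_scan_b.txt\n"
    "hyb2\tscan2\thyb2_scan.txt\n"
)
print(files_per_hybridization(example))
```

Passing column="File[normalized]" applies the same grouping to normalized data files.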

File[transformed]
The name of the combined transformed final data matrix file. The format of this file is described in the data file notes. The column headings in this file must contain the name(s) of the hybridization(s) with which each column is associated.

File[exp]
(Affymetrix submissions only). The name of the EXP file for a given hybridization.

File[cdf]
(Affymetrix submissions only). The name of the library CDF file provided by Affymetrix. The actual CDF file(s) need only be supplied to ArrayExpress if you are using custom arrays.

Array[accession]
The ArrayExpress accession number for the array design used in the hybridization. Please note that if your array design has not yet been submitted to ArrayExpress, then you should be prepared to submit it yourself using the MIAMExpress submission tool. In cases where you have used commercial arrays, please contact ArrayExpress to ask about adding the array design to the database.

Array[serial]
The serial number of the array. For Affymetrix submissions this should be the Chip Lot number.
