Creating your spreadsheet

The Tab2MAGE spreadsheet structure is split into three main sections: Experiment, Protocol and Hybridization. The sections are separated from each other by one or more blank lines. Please note that blank lines are used to demarcate the ends of sections, and as such they should not be used within the body of any section. We will discuss each of these three sections in turn.
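The blank-line rule above can be illustrated with a short Python sketch. It is not part of Tab2MAGE itself; it assumes the spreadsheet has been saved as tab-delimited text and that each section starts with a header row such as "Experiment section". The function name split_sections is hypothetical.

```python
def split_sections(text):
    """Split tab-delimited spreadsheet text into sections at blank lines."""
    sections = []
    current = []
    for line in text.splitlines():
        # A line holding only whitespace (or only empty cells) counts as blank.
        if line.strip() == "":
            if current:
                sections.append(current)
                current = []
        else:
            current.append(line.split("\t"))
    if current:
        sections.append(current)
    # Key each section by the first cell of its header row.
    return {rows[0][0].strip(): rows[1:] for rows in sections}


example = (
    "Experiment section\n"
    "domain\tebi.ac.uk\n"
    "accession\tE-EXML-1\n"
    "\n"
    "\n"
    "Protocol section\n"
    "accession\ttext\tname\n"
    "P-EXML-1\t<protocol text>\tYeast growth\n"
)
sections = split_sections(example)
print(list(sections))
```

Because any blank line ends a section, a stray blank row inside the Hybridization section would cut that section short, which is why blank lines must not appear within a section body.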

A series of real-world and conceptual examples is given at the end of this page.

Experiment section

This section holds all of the top-level information about an experiment. It consists of two columns; the left column contains a series of predefined row tags, while the right column contains the actual values pertaining to your experiment. An example is shown below:

Experiment section
domain | ebi.ac.uk
accession | E-EXML-1
quality_control | dye_swap_quality_control
experiment_design_type | strain_or_line_design
name | <your experiment title>
description | <short description of your experiment>
release_date | 2004-08-30
submission_date | 2004-07-28
submitter | John Falstaff
organization | Windsor Laboratories
publication_title | <your manuscript title>
authors | John Falstaff; Robin Goodfellow
journal | Nature Genetics
volume | 12
issue | 4
pages | 123-456
year | 2004
pubmed_id | 12345678

With the exception of the section header, all the tags above are optional. However, if the domain or accession tags are not present, suitable placeholders will be used instead. A more detailed explanation of the meaning of each of these tags may be obtained by clicking on them.
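As a rough illustration of the two-column layout, the Python sketch below turns the rows of an Experiment section into a dictionary and fills in missing domain or accession tags. The function name and the placeholder strings are hypothetical; the placeholders that Tab2MAGE actually generates are not documented on this page.

```python
# Hypothetical placeholder values, for illustration only.
PLACEHOLDERS = {"domain": "unknown.domain", "accession": "E-UNKNOWN-1"}


def read_experiment(rows, placeholders=PLACEHOLDERS):
    """Map each row tag (left column) to its value (right column)."""
    info = {}
    for row in rows:
        # Skip rows that lack a tag or a value cell.
        if len(row) >= 2 and row[0].strip():
            info[row[0].strip()] = row[1].strip()
    # Every tag is optional; only domain and accession receive placeholders.
    for tag, value in placeholders.items():
        info.setdefault(tag, value)
    return info


info = read_experiment([
    ["name", "<your experiment title>"],
    ["submitter", "John Falstaff"],
])
print(info["accession"])
```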

Protocol section

The Protocol section allows the user to define the protocols used in the experiment. The section consists of three required columns (accession, text and name) with an optional parameters column. A fifth column, "type", is available for adding custom ProtocolType terms from the MGED Ontology to your protocols, but its use is not recommended in most cases. This column should generally be omitted; the tab2mage script will then use a set of default protocol types based on how the protocols are used in the Hybridization section.

Protocol section
accession | text | name | parameters
P-EXML-1 | Cells were grown in YPD (1% yeast extract/2% peptone/2% glucose) to an OD600 of approximately 0.8 | Yeast growth | growth temperature (degree_C); pH
P-EXML-2 | <protocol text> | Yeast cell harvesting | pellet weight (mg)
P-EXML-3 | <protocol text> | Cell lysis and RNA prep |
P-EXML-4 | <protocol text> | cDNA labeling |
P-EXML-5 | <protocol text> | Hybridization | hyb temp (degree_C); hyb volume (uL)
P-EXML-6 | <protocol text> | Scanning |
P-EXML-7 | <protocol text> | Image analysis |
P-EXML-8 | <protocol text> | Normalization |

Note that you may omit protocols which have been previously submitted to ArrayExpress, and simply refer to those protocols directly in the Hybridization section. If no new protocols are needed then the entire Protocol section may be omitted. A more detailed explanation of the meaning of each of the fields in this section may be obtained by clicking on the column headings.
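In the example table, the parameters column holds a semicolon-separated list in which a unit may follow the parameter name in parentheses. The Python sketch below shows one way such a cell could be split. It assumes the parenthesised text is always the unit, which is inferred from the example rather than stated; parse_parameters is a hypothetical helper, not a Tab2MAGE function.

```python
def parse_parameters(cell):
    """Split a parameters cell into (name, unit) pairs; unit is None if absent."""
    params = []
    for item in cell.split(";"):
        item = item.strip()
        if not item:
            continue
        # Assumption: anything in trailing parentheses is the unit.
        name, sep, rest = item.partition("(")
        unit = rest.rstrip(")").strip() if sep else None
        params.append((name.strip(), unit))
    return params


print(parse_parameters("growth temperature (degree_C); pH"))
```

The names recovered this way ("growth temperature", "pH") match the bracketed names used in the "Parameter[]" headings of the Hybridization section.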

Hybridization section

The Hybridization section describes how each sample links to each hybridization, scanning and subsequent normalization, and as such is the largest and most complex section. There is a simple principle underlying this section, however: Everything which appears on a given line must be related in some way.

The column headings available in this section can be divided into several different types for the purpose of clarity. Note that the various columns in this section may appear in any order, and so you may find that re-ordering the columns makes the spreadsheet more legible for your own application.

For each of the following tables, a more detailed explanation of the meaning of each of the column headings may be obtained by clicking on them.

Names of materials and processes
These columns are provided so that you can give unique names to each of the materials used in your experiment (e.g., samples, extracts, labeled extracts). Hybridization and normalization events are also given names. The use of these names is entirely optional, as Tab2MAGE will attempt to link all of these things together for you. If, however, your experiment involves complex pooling or splitting operations then it is recommended that you make use of the names. Doing so will allow you a much greater degree of control over the output MAGE-ML.

The following table is an example of a simple reference design in which individual samples are hybridized against a common reference pool:

Hybridization section
BioSource | Sample | Extract | LabeledExtract | Dye | Hybridization | Normalization
S288C | S288C sample | S288C extract | S288C LE | Cy3 | S288C Hyb | S288C Norm
S288C | S288C sample | Reference extract | Reference LE | Cy5 | S288C Hyb | S288C Norm
Sigma1278b | Sigma1278b sample | Sigma1278b extract | Sigma1278b LE | Cy3 | Sigma1278b Hyb | Sigma1278b Norm
Sigma1278b | Sigma1278b sample | Reference extract | Reference LE | Cy5 | Sigma1278b Hyb | Sigma1278b Norm
W303a | W303a sample | W303a extract | W303a LE | Cy3 | W303a Hyb | W303a Norm
W303a | W303a sample | Reference extract | Reference LE | Cy5 | W303a Hyb | W303a Norm

Note that Tab2MAGE does not constrain you to use the standard experimental layout:

biosource ----------> sample ----------> extract ----------> labeled extract ----------> hybridization event
    [growth, treatment]     [extraction]         [labeling]               [hybridization]

If, for instance, your experiment starts with a series of extracts provided by an external supplier, the sample and extract stages may be omitted:

biosource ----------> labeled extract ----------> hybridization event
          [labeling]               [hybridization]

Material types
These columns simply indicate the type of each material used. The terms should all be instances of MaterialType from the MGED ontology. Typical terms are shown in this example:

Hybridization section
BioSourceMaterial | SampleMaterial | ExtractMaterial | LabeledExtractMaterial
whole_organism | whole_organism | total_RNA | synthetic_DNA
whole_organism | organism_part | total_RNA | synthetic_RNA
organism_part | cell | polyA_RNA | synthetic_RNA

These columns are not absolutely required for Tab2MAGE to run. However, the strictures of the MAGE model mean that if these values are not given, then suitable default values must be generated. It is therefore wise to use these columns wherever possible.

Material characteristics
Each biosource used in the experiment can have an arbitrary number of characteristics. The "BioMaterialCharacteristics[]" heading provides a way to define as many of these characteristics as are needed. The general form of this heading is:

BioMaterialCharacteristics[<category>]

where <category> is a subclass of the BioMaterialCharacteristics class within the MGED ontology. Examples are given in the table below:

Hybridization section
BioMaterialCharacteristics[Genotype] | BioMaterialCharacteristics[Organism] | BioMaterialCharacteristics[StrainOrLine]
CAD1::myc9:TRP1 | Saccharomyces cerevisiae | S288C
CAD1::myc9:TRP1 | Saccharomyces cerevisiae | W303a
RTG3::myc18:TRP1 | Saccharomyces cerevisiae | S288C
RTG3::myc18:TRP1 | Saccharomyces cerevisiae | W303a

If the BioSource name column is not used, then the set of characteristics of each biosource is used to determine how many biosources should be created. This means that if all your biosources share identical characteristics, and none are given names, then they will be treated as a single biosource in the output MAGE-ML. To circumvent this behaviour, please name your biosources.
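The effect described above can be mimicked with a small Python sketch. It only models the behaviour as stated on this page and is not the actual Tab2MAGE implementation; rows are represented as dictionaries keyed by column heading, and count_biosources is a hypothetical name.

```python
def count_biosources(rows):
    """Count distinct biosources: by name if given, else by characteristics."""
    keys = set()
    for row in rows:
        name = row.get("BioSource", "").strip()
        if name:
            keys.add(("name", name))
        else:
            # Unnamed rows are told apart only by their characteristics.
            traits = tuple(sorted(
                (col, val) for col, val in row.items()
                if col.startswith("BioMaterialCharacteristics[")
            ))
            keys.add(("traits", traits))
    return len(keys)


traits = {
    "BioMaterialCharacteristics[Organism]": "Saccharomyces cerevisiae",
    "BioMaterialCharacteristics[StrainOrLine]": "S288C",
}
unnamed = [dict(traits), dict(traits)]
named = [dict(traits, BioSource="culture A"), dict(traits, BioSource="culture B")]
print(count_biosources(unnamed), count_biosources(named))
```

In this sketch, two unnamed rows with identical characteristics collapse into one biosource, while giving each row a distinct name keeps them separate.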

Links to defined protocols and parameters
This set of columns allows you to link the protocols defined in the previous section to your hybridizations:

Hybridization section
Protocol[grow] | Protocol[treatment] | Protocol[extraction] | Protocol[labeling] | Protocol[hybridization] | Protocol[scanning]
P-EXML-1 | P-EXML-2 | P-EXML-3 | P-EXML-4 | P-EXML-5 | P-EXML-6
P-EXML-1 | P-EXML-2 | P-EXML-3 | P-EXML-4 | P-EXML-5 | P-EXML-6
P-EXML-1 | P-EXML-2 | P-EXML-3 | P-EXML-4 | P-EXML-5 | P-EXML-6
P-EXML-1 | P-EXML-2 | P-EXML-3 | P-EXML-4 | P-EXML-5 | P-EXML-6

If you have assigned parameters to your protocols, you may set the parameter values here:

Hybridization section
Parameter[growth temperature] | Parameter[pH] | Parameter[pellet weight] | Parameter[hyb temp] | Parameter[hyb volume]
30 | 6 | 10 | 56 | 50
31 | 6 | 12 | 56 | 50
30 | 6 | 10 | 55 | 50
31 | 6 | 12 | 55 | 50

The general scheme linking protocols with experimental stage is as shown below:

biosource ----------> sample --------> extract -------> labeled extract -------> hybridization 
      [grow, treatment]     [extraction]      [labeling]            [hybridization]    |
                                                                                       |
                                                                                       |
      ----------------------------------------------------------------------------------
      |
      |
      v
hybridization ------------> image -----------> raw data file -----------> normalized data file 
               [scanning]       [image_analysis]           [normalization]

Note that for protocols linked to the biological materials, the existence of the protocol implies the existence of the target material. For example, if a growth or treatment protocol is used in a given row then a sample object will be automatically created for that row in the output MAGE-ML.
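The rule that a protocol implies its target material can be sketched in Python as follows. The mapping is read off the diagram above and is only an illustration of the idea, not the logic used by the tab2mage script; implied_materials is a hypothetical name.

```python
# Target material for each material-related protocol column, per the diagram.
PROTOCOL_TARGET = {
    "grow": "sample",
    "treatment": "sample",
    "extraction": "extract",
    "labeling": "labeled extract",
}


def implied_materials(row):
    """List the materials implied by the protocol columns filled in on a row."""
    implied = []
    for heading, value in row.items():
        if heading.startswith("Protocol[") and heading.endswith("]") and value.strip():
            step = heading[len("Protocol["):-1]
            target = PROTOCOL_TARGET.get(step)
            if target and target not in implied:
                implied.append(target)
    return implied


row = {"Protocol[grow]": "P-EXML-1", "Protocol[labeling]": "P-EXML-4"}
print(implied_materials(row))
```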

Experimental factors
Each of your hybridizations should have one or more experimental factor values associated with it. These factors refer to the different conditions used in the preparation of each hybridization. They may reflect variation in the starting biosources (BioMaterialCharacteristics) or differences in the way those biosources have been treated. The general form of this heading is:

FactorValue[<category>]

where <category> is a subclass of the ExperimentalFactorCategory class within the MGED ontology (Note: you are likely to need to descend several subclasses into the hierarchy to find a biologically meaningful category). Examples are given in the table below:

Hybridization section
FactorValue[Organism] | FactorValue[Sex] | FactorValue[OrganismPart]
Homo sapiens | male | liver
Homo sapiens | male | kidney
Homo sapiens | female | liver
Homo sapiens | female | kidney
Mus musculus | male | liver
Mus musculus | male | kidney
Mus musculus | female | liver
Mus musculus | female | kidney

Your spreadsheet may contain as many "FactorValue[]" columns as necessary to fully describe your experiment.
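Headings of the form Name[<category>] recur throughout the Hybridization section (BioMaterialCharacteristics[], Protocol[], Parameter[], FactorValue[]). The Python sketch below shows one way to split such a heading into its two parts. It is illustrative only; parse_heading is a hypothetical helper and does not check the category against the MGED ontology.

```python
import re

# A heading is a word followed by a bracketed category, e.g. FactorValue[Sex].
HEADING = re.compile(r"^(?P<kind>[A-Za-z]+)\[(?P<category>[^\]]+)\]$")


def parse_heading(heading):
    """Return (kind, category); category is None for plain headings."""
    match = HEADING.match(heading.strip())
    if match is None:
        return heading.strip(), None
    return match.group("kind"), match.group("category").strip()


print(parse_heading("FactorValue[OrganismPart]"))
print(parse_heading("Hybridization"))
```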

Data files and array information
The final class of headings is used to specify which data files relate to which hybridization. There are also columns in this section for linking the data files to the array design used. Here is an example of these columns for a spotted array:

Hybridization section
File[raw] | File[normalized] | Array[accession] | Array[serial]
Data1.txt | NormData1.txt | A-EXML-1 | 244532
Data2.txt | NormData2.txt | A-EXML-1 | 244533
Data3.txt | NormData3.txt | A-EXML-1 | 244534
Data4.txt | NormData4.txt | A-EXML-1 | 244535

Below is an example of what is required for an Affymetrix submission. Note that for spreadsheet submissions to ArrayExpress we do not require you to upload the actual CDF library file, unless you are using a custom array. For such submissions the "Array[serial]" column should contain Affymetrix Chip Lot numbers:

Hybridization section
File[raw] | File[normalized] | File[exp] | File[cdf] | Array[accession] | Array[serial]
Data1.CEL | Data1.CHP | Data1.EXP | YG_S98.cdf | A-AFFY-27 | 2125576
Data2.CEL | Data2.CHP | Data2.EXP | YG_S98.cdf | A-AFFY-27 | 2103523
Data3.CEL | Data3.CHP | Data3.EXP | YG_S98.cdf | A-AFFY-27 | 2468372
Data4.CEL | Data4.CHP | Data4.EXP | YG_S98.cdf | A-AFFY-27 | 2542678

Currently supported data file formats include GenePix, Affymetrix, Agilent, ScanArray/QuantArray, ScanAlyze and Arrayvision. The MetaColumn/MetaRow format favoured by ArrayExpress is of course also supported.

The "Array[accession]" column is required in any spreadsheets referencing data files. If Tab2MAGE is used in the absence of data files then the array information may be omitted, although such usage of Tab2MAGE will obviously not be acceptable as an ArrayExpress data submission.
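The requirement above lends itself to a simple pre-submission check. The Python sketch below flags rows that reference a data file but leave "Array[accession]" empty. It is a hypothetical helper written for illustration, and is not a substitute for the experiment checker that accompanies Tab2MAGE.

```python
def missing_array_accession(rows):
    """Return 1-based numbers of rows that name a data file but no array."""
    problems = []
    for number, row in enumerate(rows, start=1):
        has_file = any(
            col.startswith("File[") and val.strip()
            for col, val in row.items()
        )
        if has_file and not row.get("Array[accession]", "").strip():
            problems.append(number)
    return problems


rows = [
    {"File[raw]": "Data1.txt", "Array[accession]": "A-EXML-1"},
    {"File[raw]": "Data2.txt", "Array[accession]": ""},
    {"File[raw]": "", "Array[accession]": ""},
]
print(missing_array_accession(rows))
```

The third row passes because it references no data files, matching the case in which array information may be omitted.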

If you prefer to process your normalized data in a single file mapping to multiple hybridizations, Tab2MAGE also supports including this data as a Final Data Matrix file. This file can be included in an experiment using the "File[transformed]" column. Please note that the final data matrix file must be formatted as described in the data file notes.

Hybridization section
File[raw] | File[transformed] | Array[accession] | Hybridization
Data1.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb1
Data2.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb2
Data3.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb3
Data4.txt | FinalDataMatrix.txt | A-EXML-1 | Hyb4

Please see the MAGE mappings for other supported column headings, and further information on how the spreadsheet information is used to create the MAGE-ML document.

Real-world Examples

Below is a selection of some of the real-world curated Tab2MAGE spreadsheets submitted to ArrayExpress over the past few months:

Spreadsheet | Graph | Experiment name | Platform | Link to experiment
E-TABM-16.txt | PNG | FDA-CDER MTS RNA reagent cross platform test | Multiple platforms | ArrayExpress
E-TABM-18.txt | PNG | Transcription profiling of 35 different Arabidopsis thaliana ecotypes | Affymetrix | ArrayExpress
E-TABM-22.txt | PNG | Transcription profiling of human lung cancers and lung cancer cell lines miRNA expression | One-channel custom array | ArrayExpress
E-TABM-33.txt | PNG | Transcription profiling of zebrafish development | Affymetrix | ArrayExpress
E-TABM-35.txt | PNG | Comparative genomic hybridization of 25 Coxiella burnetii isolates relative to the Nine Mile (RSA493) reference isolate | Affymetrix | ArrayExpress
E-TABM-54.txt | PNG | Comparative genome hybridization of 137 Bordetella pertussis strains | Two-color custom array | ArrayExpress
E-TABM-66.txt | PNG | Transcription profiling of normal and malignant human breast epithelial cells | Multiple platforms | ArrayExpress
E-TABM-70.txt | PNG | Transcription profiling of human cell lines treated with cytochalasin D and nocodazole with the aim of characterising tetraploid clones | Agilent | ArrayExpress
E-TABM-102.txt | PNG | Transcription profiling of wild type and ATF3 -/-mouse bone marrow macrophages stimulated with lipopolysaccharide over time | Affymetrix | ArrayExpress
E-TABM-134.txt | PNG | WGA-LCM and Genomewide Survey of Lung Cancer | Affymetrix | ArrayExpress
E-TABM-136.txt | PNG | Transcription profiling of human and chimpanzee heart, brain, testis and lymphblastoid cell lines to study functionality of intergenic transcription | Affymetrix | ArrayExpress
E-TABM-140.txt | PNG | Chromatin immunoprecipitation (ChIP-chip) of human erythroleukemia cell line K-562 with anti-histone antibodies using an ENCODE array | Two-color custom array | ArrayExpress
E-TABM-163.txt | PNG | Transcription profiling of murine presomitic mesoderms of 17 samples at various time points to identify cyclic genes of the mouse segmentation clock | Affymetrix | ArrayExpress
E-MEXP-880.txt | PNG | Methylation profiling of normal and cancerous breast cells from human patients and cell lines in a 125 kB region of the HOXA cluster | Two-color custom array | ArrayExpress

Conceptual Examples

A series of very simple conceptual examples is given here:

Please see the experiment checker and tab2mage.pl help notes for more information.


Last modified: Tue Feb 26 14:12:46 GMT 2008