Module detail: Datafile.pm

NAME

ArrayExpress::Datafile - an object-oriented module providing methods for parsing data files.


SYNOPSIS

 use ArrayExpress::Datafile;
 my $file = ArrayExpress::Datafile->new({
     name            => 'data.txt',
     data_type       => 'raw',
     array_design_id => 'A-MEXP-123',
 });

DESCRIPTION

This is a module providing methods for data file parsing. See also the ArrayExpress::Datafile::Parser manpage for Datafile handler objects.

Accessor methods

set_row_count

Setter method for the total row count.

get_row_count

Getter method for the total row count.

increment_row_count

Adds one (or the passed value) to the overall row count. Returns the new row count.

get_parse_errors

Getter method for the total number of parse errors.

increment_parse_errors

Adds one (or the passed value) to the overall parsing error count. Returns the new error count.

get_not_null

Getter method for the number of non-null data file cells in columns corresponding to known QTs.

increment_not_null

Increments the count of non-null data file cells in columns corresponding to known QTs. Returns the new ``not null'' count.
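
The counter methods above share one contract: add one (or, for rows and parse errors, a supplied amount) and return the new total. The following Perl sketch illustrates that contract with a hypothetical stand-in class, CounterSketch, rather than the real module, whose internals are not shown here; the hash-based storage and the _increment helper are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in class mimicking the documented counter behaviour;
# it is not the real ArrayExpress::Datafile implementation.
package CounterSketch;

sub new {
    my ($class) = @_;
    return bless { row_count => 0, parse_errors => 0, not_null => 0 }, $class;
}

# Add $by (default 1) to the named counter and return the new total.
sub _increment {
    my ($self, $field, $by) = @_;
    $by = 1 unless defined $by;
    return $self->{$field} += $by;
}

sub set_row_count { my ($self, $n) = @_; return $self->{row_count} = $n }
sub get_row_count { my ($self) = @_; return $self->{row_count} }

sub increment_row_count {
    my ($self, $by) = @_;
    return $self->_increment('row_count', $by);
}

sub get_parse_errors { my ($self) = @_; return $self->{parse_errors} }

sub increment_parse_errors {
    my ($self, $by) = @_;
    return $self->_increment('parse_errors', $by);
}

# The text only describes a step of one for the not-null counter.
sub get_not_null       { my ($self) = @_; return $self->{not_null} }
sub increment_not_null { my ($self) = @_; return $self->_increment('not_null', 1) }

package main;

my $file = CounterSketch->new();
$file->increment_row_count();                 # row count is now 1
my $rows = $file->increment_row_count(5);     # returns 6
$file->increment_parse_errors();
$file->increment_not_null() for 1 .. 2;       # two non-null cells seen
print "rows=$rows errors=", $file->get_parse_errors,
      " not_null=", $file->get_not_null, "\n";
```

Returning the new total lets a caller update a counter and test it in a single expression.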

set_hyb_identifier

Setter method for the Tab2MAGE or MIAMExpress hybridization identifier associated with a raw or normalized data file. Does not apply to FGEM files.

get_hyb_identifier

Getter method for the Tab2MAGE or MIAMExpress hybridization identifier associated with a raw or normalized data file. Does not apply to FGEM files.

set_hyb_sysuid

Setter method for the internal MIAMExpress SYSUID value associated with a raw or normalized data file. Does not apply to FGEM files.

get_hyb_sysuid

Getter method for the internal MIAMExpress SYSUID value associated with a raw or normalized data file. Does not apply to FGEM files.

set_array_design_id

Setter method for an identifier linking the data file to an array design. For Tab2MAGE this identifier is the ArrayExpress array accession number. For MIAMExpress this identifier is the internal ArrayExpress Oracle database identifier for the array design.

get_array_design_id

Getter method for an identifier linking the data file to an array design. For Tab2MAGE this identifier is the ArrayExpress array accession number. For MIAMExpress this identifier is the internal ArrayExpress Oracle database identifier for the array design.

set_ded_identifier

Setter method for the MAGE identifier string representing the DesignElementDimension which has been associated with the file.

get_ded_identifier

Getter method for the MAGE identifier string representing the DesignElementDimension which has been associated with the file.

set_array_design

Setter method for the ArrayDesign object associated with this data file. See the ArrayExpress::Datafile::ArrayDesign manpage for information on this class.

get_array_design

Getter method for the ArrayDesign object associated with this data file.

set_data_metrics

Setter method for a hashref relating each column heading to its datatype, scale and subclass. The keys are QT names as defined in the ArrayExpress::Datafile::QT_list manpage. Every value must be a daughter hashref, so if no datatype, scale or subclass information is available for a column heading (e.g. MetaRow, MetaColumn), that column should not be represented. In practice, the hashref should only describe the QTs for a single software type (see $self->check_column_headings).

get_data_metrics

Getter method for the data metrics hashref (see set_data_metrics).
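
To make the expected shape of the data metrics hashref concrete, the following sketch builds one and drops any entry whose value is not a daughter hashref. The QT names, the inner key names (datatype, scale, subclass) and their values are illustrative assumptions, not taken from the QT_list module.

```perl
use strict;
use warnings;

# Candidate metrics keyed by QT name. Names and values are illustrative.
my %candidate = (
    'F635 Mean' => { datatype => 'float', scale => 'linear_scale', subclass => 'MeasuredSignal' },
    'B635 Mean' => { datatype => 'float', scale => 'linear_scale', subclass => 'MeasuredSignal' },
    'MetaRow'   => undef,    # coordinate column: no metrics available
);

# Keep only entries whose value is a daughter hashref; coordinate
# columns such as MetaRow are therefore not represented.
my %data_metrics = map  { $_ => $candidate{$_} }
                   grep { ref $candidate{$_} eq 'HASH' } keys %candidate;

for my $qt (sort keys %data_metrics) {
    my $m = $data_metrics{$qt};
    print "$qt: $m->{datatype}, $m->{scale}, $m->{subclass}\n";
}
```

A reference to %data_metrics would then be the kind of value passed to set_data_metrics.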

set_intensity_vector

Setter method for a hashref linking a datafile row identifier (e.g. ``1.1.4.1'') to a measured data value. Used in Pearson correlation coefficient calculation.

get_intensity_vector

Getter method for a hashref linking a datafile row identifier (e.g. ``1.1.4.1'') to a measured data value. Used in Pearson correlation coefficient calculation.
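
Two intensity vectors can be compared with the standard Pearson formula, restricted to the row identifiers both vectors share. The sketch below is a textbook implementation written for illustration; it is not the module's own correlation code, and the pearson() function name and the shared-identifier rule are assumptions.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Pearson correlation between two intensity vectors (hashrefs mapping a
# row identifier to a value), over the identifiers they share.
# Returns undef if fewer than two rows are shared or a variance is zero.
sub pearson {
    my ($x, $y) = @_;
    my @ids = grep { exists $y->{$_} } keys %$x;
    my $n   = scalar @ids;
    return if $n < 2;

    my $mean_x = sum(map { $x->{$_} } @ids) / $n;
    my $mean_y = sum(map { $y->{$_} } @ids) / $n;

    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $id (@ids) {
        my $dx = $x->{$id} - $mean_x;
        my $dy = $y->{$id} - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    return if $sxx == 0 || $syy == 0;
    return $sxy / sqrt($sxx * $syy);
}

# Illustrative vectors; the row 1.1.4.1 exists only in %hyb_b and is ignored.
my %hyb_a = ('1.1.1.1' => 100, '1.1.1.2' => 200, '1.1.1.3' => 300);
my %hyb_b = ('1.1.1.1' => 110, '1.1.1.2' => 190, '1.1.1.3' => 310, '1.1.4.1' => 50);

my $r = pearson(\%hyb_a, \%hyb_b);
printf "r = %.4f\n", $r if defined $r;
```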

set_index_columns

Setter method for an arrayref describing the array indices of the coordinate columns (MetaColumn, MetaRow etc.) in $self->get_column_headings.

get_index_columns

Getter method for an arrayref describing the array indices of the coordinate columns (MetaColumn, MetaRow etc.) in $self->get_column_headings.
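
As an illustration of how index columns might be derived and used, this sketch locates the coordinate headings in a column-headings arrayref and joins the matching fields of one data line into a dotted row identifier such as ``1.1.4.1''. The Column and Row heading names, their order, and this way of building the identifier are assumptions.

```perl
use strict;
use warnings;

# Coordinate heading names: MetaColumn and MetaRow come from the text;
# Column and Row, and this order, are assumptions for the sketch.
my @coord_names = qw(MetaColumn MetaRow Column Row);

# Example column headings; the two signal columns are illustrative.
my $headings = [ 'F635 Mean', 'MetaColumn', 'MetaRow', 'Column', 'Row', 'B635 Mean' ];

# Map each heading to its array index, then pick out the coordinates.
my %pos;
@pos{@$headings} = (0 .. $#$headings);
my $index_columns = [ map { $pos{$_} } grep { exists $pos{$_} } @coord_names ];

# Join the coordinate fields of one tab-delimited data line into a
# dotted row identifier.
my @fields = split /\t/, "523\t1\t1\t4\t1\t87";
my $row_id = join '.', @fields[@$index_columns];

print "index columns: @$index_columns\nrow id: $row_id\n";
```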

set_column_headings

Setter method for the actual column headings found in the data file (arrayref).

get_column_headings

Getter method for the actual column headings found in the data file (arrayref).

set_heading_qts

Setter method for an arrayref containing a list of the actual recognized QTs in the file. This is not a uniqued list: a repeated QT will appear multiple times (as, for example, in an FGEM data file).

get_heading_qts

Getter method for an arrayref containing a list of the actual recognized QTs in the file. This is not a uniqued list: a repeated QT will appear multiple times (as, for example, in an FGEM data file).

set_heading_hybs

Setter method for an arrayref containing a list of (potential) hyb ids derived from the column headings of an FGEM file. These are checked elsewhere.

get_heading_hybs

Getter method for an arrayref containing a list of (potential) hyb ids derived from the column headings of an FGEM file. These are checked elsewhere.

add_fail_columns

Adds the passed argument to the list of unrecognized column headings.

get_fail_columns

Returns an arrayref listing the unrecognized column headings (uniqued and sorted).

add_fail_hybs

Adds the passed argument to the list of unrecognized hybridization identifiers parsed from FGEM column headings. Hybridization identifiers are the user-supplied Tab2MAGE or MIAMExpress names for the hybridizations.

get_fail_hybs

Returns an arrayref listing the unrecognized hyb identifiers (uniqued and sorted).
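
Both fail lists are documented as returning uniqued, sorted arrayrefs. A minimal way to get that behaviour, sketched below with a hypothetical FailListSketch class, is to store each value as a hash key and sort the keys on retrieval; the real module may store them differently, and the heading and hyb names are invented examples.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; not the real ArrayExpress::Datafile code.
package FailListSketch;

sub new {
    my ($class) = @_;
    return bless { fail_columns => {}, fail_hybs => {} }, $class;
}

# Record each value as a hash key so repeats collapse automatically.
sub add_fail_columns { my ($self, $heading) = @_; $self->{fail_columns}{$heading} = 1; return }
sub add_fail_hybs    { my ($self, $hyb)     = @_; $self->{fail_hybs}{$hyb}         = 1; return }

# Return uniqued, sorted arrayrefs.
sub get_fail_columns { my ($self) = @_; return [ sort keys %{ $self->{fail_columns} } ] }
sub get_fail_hybs    { my ($self) = @_; return [ sort keys %{ $self->{fail_hybs} } ] }

package main;

my $file = FailListSketch->new();
$file->add_fail_columns($_) for 'Spot Size', 'Flags', 'Spot Size';
$file->add_fail_hybs($_)    for 'hyb_3', 'hyb_1';

print join(', ', @{ $file->get_fail_columns }), "\n";
print join(', ', @{ $file->get_fail_hybs }), "\n";
```

Deduplicating on insertion keeps the getters cheap, whereas the heading QT list described above deliberately keeps repeats.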

set_is_exp

Setter method for the boolean flag indicating whether or not the file is an EXP file.

get_is_exp

Getter method for the EXP file flag.

get_is_binary

Getter method for the binary file flag.

set_is_miamexpress

Setter method for the boolean flag indicating whether the file is part of a MIAMExpress submission, for which slightly different validation rules are used. More typically, this flag is set via the is_miamexpress argument to new().

get_is_miamexpress

Getter method for the MIAMExpress file flag.

add_factor_value

Mutator method for the experimental factor values associated with the file. Takes a category and a value as arguments and adds the value to the list for that category.

get_factor_value

Getter method for the factor values associated with the file; returns a hashref in the form:

 {category => [value1, value2, ...], ...}
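
The following sketch shows how repeated add_factor_value calls could build the structure returned by get_factor_value. FactorSketch is a hypothetical stand-in for the real class, and the category and value strings are invented examples.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; not the real ArrayExpress::Datafile code.
package FactorSketch;

sub new { my ($class) = @_; return bless { factor_value => {} }, $class }

# Append $value to the list kept for $category, creating it on first use.
sub add_factor_value {
    my ($self, $category, $value) = @_;
    push @{ $self->{factor_value}{$category} }, $value;
    return;
}

# Return the {category => [values]} hashref.
sub get_factor_value { my ($self) = @_; return $self->{factor_value} }

package main;

my $file = FactorSketch->new();
$file->add_factor_value('Time', '0 h');
$file->add_factor_value('Time', '24 h');
$file->add_factor_value('Compound', 'aspirin');

my $fv = $file->get_factor_value;
for my $category (sort keys %$fv) {
    print "$category: @{ $fv->{$category} }\n";
}
```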

set_ded_type

Setter method for the type of DesignElementDimension associated with the file (Feature, Reporter or CompositeSequence).

get_ded_type

Getter method for the type of DesignElementDimension associated with the file.

set_format_type

Setter method for the format type (e.g., Affymetrix, GenePix, BlueFuse etc.). See the ArrayExpress::Curator::Config manpage for the enumerated types.

get_format_type

Getter method for the format type.

set_data_type

Setter method for the data type (e.g., raw, normalized, transformed). See the ArrayExpress::Curator::Config manpage for the enumerated types. The Affymetrix 'EXP' file type is also allowed.

get_data_type

Getter method for the data type.

set_qt_type

Setter method for the QT type associated with the file. This is derived from the software names defined in the ArrayExpress::Datafile::QT_list manpage.

get_qt_type

Getter method for the QT type associated with the file.

set_path

Setter method for the full filesystem path of the data file.

get_path

Getter method for the full filesystem path of the data file.

set_name

Setter method for the name of the file. If the path has not otherwise been set, this also sets it (via $self->set_path). Note that this differs from the ``name'' argument to the object constructor, which does not change the path at all.
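
The difference between set_name and the constructor's name argument can be sketched as follows. NameSketch is a hypothetical stand-in; in particular, using the bare file name as the path is an assumption, since the text only says that the path is set if it has not otherwise been set.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; not the real ArrayExpress::Datafile code.
package NameSketch;

# The constructor's "name" argument leaves the path untouched.
sub new {
    my ($class, $args) = @_;
    my $self = bless {}, $class;
    $self->{name} = $args->{name} if defined $args->{name};
    return $self;
}

sub set_path { my ($self, $path) = @_; return $self->{path} = $path }
sub get_path { my ($self) = @_; return $self->{path} }
sub get_name { my ($self) = @_; return $self->{name} }

# set_name also fills in the path, but only if no path is set yet.
sub set_name {
    my ($self, $name) = @_;
    $self->{name} = $name;
    $self->set_path($name) unless defined $self->get_path;
    return $name;
}

package main;

my $via_new = NameSketch->new({ name => 'data.txt' });    # path stays undef

my $via_set = NameSketch->new({});
$via_set->set_name('data.txt');                           # path becomes data.txt

my $kept = NameSketch->new({});
$kept->set_path('/tmp/upload/data.txt');
$kept->set_name('data.txt');                              # existing path kept
```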

get_name

Getter method for the name of the file.

set_target_filename

Setter method for the name of the output stripped data file.

get_target_filename

Getter method for the name of the output stripped data file.

get_linebreak_type

Getter method for the line-ending character(s). Can be \n, \r\n, \r or Unknown.

get_line_format

Getter method for the line-ending format (Mac, Unix, DOS or Unknown). Used in reports.
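
A possible way to derive these two values is sketched below: detect the line ending in a chunk of raw file content, then map it to the report label. The detect_linebreak() function and its precedence rule (checking \r\n first) are assumptions, not the module's documented algorithm.

```perl
use strict;
use warnings;

# Classify the line ending found in a chunk of file content. Checking
# "\r\n" before "\n" and "\r" keeps DOS files from being misreported.
sub detect_linebreak {
    my ($chunk) = @_;
    return "\r\n" if $chunk =~ /\r\n/;
    return "\n"   if $chunk =~ /\n/;
    return "\r"   if $chunk =~ /\r/;
    return 'Unknown';
}

# Map each line ending to the format label used in reports.
my %line_format = (
    "\n"      => 'Unix',
    "\r\n"    => 'DOS',
    "\r"      => 'Mac',
    'Unknown' => 'Unknown',
);

for my $chunk ("a\tb\nc\td\n", "a\tb\r\nc\td\r\n", "a\tb\rc\td\r", "a\tb") {
    my $lb = detect_linebreak($chunk);
    print "$line_format{$lb}\n";
}
```

When the chunk is read from disk, calling binmode on the filehandle first prevents the I/O layer from translating \r\n on platforms that do so by default.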

set_exp_data

Setter method for storing the output of the $self->parse_exp_file method.

get_exp_data

Getter method for retrieving the output of the $self->parse_exp_file method.

set_mage_qtd

Setter method for the Bio::MAGE::BioAssayData::QuantitationTypeDimension object associated with the file.

get_mage_qtd

Getter method for the Bio::MAGE::BioAssayData::QuantitationTypeDimension object associated with the file.


AUTHOR

Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2004.

Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.

