ArrayExpress::Datafile.pm - an OO module providing methods for parsing data files.
    use ArrayExpress::Datafile;

    my $file = ArrayExpress::Datafile->new({
        name            => 'data.txt',
        data_type       => 'raw',
        array_design_id => 'A-MEXP-123',
    });
This is a module providing methods for data file parsing. See also the ArrayExpress::Datafile::Parser manpage for Datafile handler objects.
Setter method for the total row count.
Getter method for the total row count.
Adds one (or the passed value) to the overall row count. Returns the new row count.
Getter method for the total number of parse errors.
Adds one (or the passed value) to the overall parsing error count. Returns the new error count.
The number of non-null data file cells relating to known QTs.
Incremental counter for the number of non-null data file cells relating to known QTs. Returns the new "not null" count.
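The three increment methods above share one contract: add one, or the passed value, and return the new total. The following is a minimal sketch of that contract only. The package My::Counter and its method names are hypothetical stand-ins and are not taken from ArrayExpress::Datafile.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; not the real ArrayExpress::Datafile code.
package My::Counter;

sub new {
    my ($class) = @_;
    return bless { count => 0 }, $class;
}

# Add one, or the passed value, and return the new total.
sub increment {
    my ( $self, $amount ) = @_;
    $amount = 1 unless defined $amount;
    $self->{count} += $amount;
    return $self->{count};
}

# Return the current total without changing it.
sub get {
    my ($self) = @_;
    return $self->{count};
}

package main;

my $rows = My::Counter->new();
$rows->increment();      # one more row
$rows->increment(10);    # a batch of ten rows
print $rows->get(), "\n";    # prints 11
```

Returning the new total lets a caller update and test the count in a single expression.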
Setter method for the Tab2MAGE or MIAMExpress hybridization identifier associated with a raw or normalized data file. Does not apply to FGEM files.
Getter method for the Tab2MAGE or MIAMExpress hybridization identifier associated with a raw or normalized data file. Does not apply to FGEM files.
Setter method for the internal MIAMExpress SYSUID value associated with a raw or normalized data file. Does not apply to FGEM files.
Getter method for the internal MIAMExpress SYSUID value associated with a raw or normalized data file. Does not apply to FGEM files.
Setter method for an identifier linking the data file to an array design. For Tab2MAGE this identifier is the ArrayExpress array accession number. For MIAMExpress this identifier is the internal ArrayExpress Oracle database identifier for the array design.
Getter method for an identifier linking the data file to an array design. For Tab2MAGE this identifier is the ArrayExpress array accession number. For MIAMExpress this identifier is the internal ArrayExpress Oracle database identifier for the array design.
Setter method for the MAGE identifier string representing the DesignElementDimension which has been associated with the file.
Getter method for the MAGE identifier string representing the DesignElementDimension which has been associated with the file.
Setter method for the ArrayDesign object associated with this data file. See the ArrayExpress::Datafile::ArrayDesign manpage for information on this class.
Getter method for the ArrayDesign object associated with this data file.
The set_data_metrics method is a setter method for a hashref which relates actual column headings to datatype, scale and subclass. The keys are QT names as defined in the ArrayExpress::Datafile::QT_list manpage. Note that the hashref should only have daughter hashrefs as values, so if no datatype, scale or subclass information is available for a column heading (e.g. MetaRow, MetaColumn) then that column should not be represented. In practice, the hashref should only hold information on the QTs for a single software type (see $self->check_column_headings).
Getter method for data metrics hashref (see set_data_metrics).
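As an illustration of the hashref shape described for set_data_metrics, the sketch below builds one from a list of column headings and leaves out any heading with no QT information. The lookup table, its key names (datatype, scale, subclass), its values and the build_data_metrics helper are all assumptions made for this example. The real definitions live in ArrayExpress::Datafile::QT_list and may differ.

```perl
use strict;
use warnings;

# Hypothetical QT lookup; names and values are illustrative only.
my %known_qt = (
    'Signal' => {
        datatype => 'float',
        scale    => 'linear_scale',
        subclass => 'MeasuredSignal',
    },
    'Detection' => {
        datatype => 'string_datatype',
        scale    => 'unscaled',
        subclass => 'PresentAbsent',
    },
);

# Build a metrics hashref, skipping headings with no QT information
# so that every value in the result is itself a hashref.
sub build_data_metrics {
    my (@headings) = @_;
    my %metrics;
    for my $heading (@headings) {
        next unless ref $known_qt{$heading} eq 'HASH';
        $metrics{$heading} = { %{ $known_qt{$heading} } };    # shallow copy
    }
    return \%metrics;
}

my $metrics = build_data_metrics( 'MetaRow', 'MetaColumn', 'Signal' );
print join( ', ', sort keys %$metrics ), "\n";    # prints Signal
```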
Setter method for a hashref linking a datafile row identifier (e.g. "1.1.4.1") to a measured data value. Used in Pearson correlation coefficient calculation.
Getter method for a hashref linking a datafile row identifier (e.g. "1.1.4.1") to a measured data value. Used in Pearson correlation coefficient calculation.
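How the module itself computes the coefficient is not described here. As a rough sketch under that caveat, two hashrefs of this shape (row identifier to measured value) could be compared over their shared row identifiers as follows. The pearson function and the sample values are assumptions made for this example.

```perl
use strict;
use warnings;

# Pearson correlation over the row identifiers shared by two hashrefs.
# Returns undef if fewer than two rows are shared or either set is constant.
sub pearson {
    my ( $x, $y ) = @_;
    my @ids = grep { exists $y->{$_} } keys %$x;
    my $n   = scalar @ids;
    return undef if $n < 2;

    my ( $sum_x, $sum_y ) = ( 0, 0 );
    for my $id (@ids) {
        $sum_x += $x->{$id};
        $sum_y += $y->{$id};
    }
    my ( $mean_x, $mean_y ) = ( $sum_x / $n, $sum_y / $n );

    my ( $cov, $var_x, $var_y ) = ( 0, 0, 0 );
    for my $id (@ids) {
        my $dx = $x->{$id} - $mean_x;
        my $dy = $y->{$id} - $mean_y;
        $cov   += $dx * $dy;
        $var_x += $dx * $dx;
        $var_y += $dy * $dy;
    }
    return undef if $var_x == 0 || $var_y == 0;
    return $cov / sqrt( $var_x * $var_y );
}

my %file_a = ( '1.1.4.1' => 120, '1.1.4.2' => 340, '1.1.4.3' => 560 );
my %file_b = ( '1.1.4.1' => 240, '1.1.4.2' => 680, '1.1.4.3' => 1120 );
printf "%.3f\n", pearson( \%file_a, \%file_b );    # prints 1.000
```

Keying both data sets by row identifier means rows are matched by position on the array rather than by their order in the file.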
Setter method for an arrayref describing the array indices of the coordinate columns (MetaColumn, MetaRow etc.) in $self->get_column_headings.
Getter method for an arrayref describing the array indices of the coordinate columns (MetaColumn, MetaRow etc.) in $self->get_column_headings.
Setter method for the actual column headings found in the data file (arrayref).
Getter method for the actual column headings found in the data file (arrayref).
Setter method for an arrayref containing a list of the actual recognized QTs in the file. This is not a uniqued list - a repeated QT will appear multiple times (as, for example, in a FGEM data file).
Getter method for an arrayref containing a list of the actual recognized QTs in the file. This is not a uniqued list - a repeated QT will appear multiple times (as, for example, in a FGEM data file).
Setter method for an arrayref containing a list of (potential) hyb ids derived from the column headings of a FGEM file. These are checked elsewhere.
Getter method for an arrayref containing a list of (potential) hyb ids derived from the column headings of a FGEM file. These are checked elsewhere.
Method which adds the passed argument to a list of column headings which are unrecognized.
Returns an arrayref listing the unrecognized column headings (uniqued and sorted).
Method which adds the passed argument to a list of hybridization identifiers, parsed from FGEM column headings, which are unrecognized. Hybridization identifiers are either the Tab2MAGE or MIAMExpress user-supplied names for the hybridizations.
Returns an arrayref listing the unrecognized hyb identifiers (uniqued and sorted).
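Both pairs of methods above follow the same add-then-report pattern, with the reported list uniqued and sorted. One common way to get that behaviour in Perl is to store the items as hash keys, as in this sketch. The function names and sample headings are hypothetical and do not come from the module.

```perl
use strict;
use warnings;

my %seen;    # hash keys keep each heading only once

# Record a heading; repeats collapse onto the same key.
sub add_unrecognized {
    my ($heading) = @_;
    $seen{$heading}++;
    return;
}

# Return the recorded headings as a sorted arrayref.
sub get_unrecognized {
    return [ sort keys %seen ];
}

add_unrecognized($_) for ( 'Notes', 'Comment', 'Notes' );
print join( ', ', @{ get_unrecognized() } ), "\n";    # prints Comment, Notes
```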
Setter method for boolean flag indicating whether the file is an EXP file or not.
Getter method for EXP file flag.
Getter method for binary file flag.
Setter method for boolean flag indicating whether the file is part of a MIAMExpress submission, for which slightly different validation rules are used. More typically the is_miamexpress argument to new() is used.
Getter method for MIAMExpress file flag.
Mutator method for the experimental factor values associated with the file. Takes (category, value) as arguments and adds them to the list.
Getter method for factor values associated with the file; returns a hashref in the form:
{category => [value1, value2, ...], ...}
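A minimal sketch of how (category, value) pairs accumulate into a hashref of that form is given below. The function names, categories and values are made up for illustration. Whether the module removes duplicate values is not stated here, and this sketch does not.

```perl
use strict;
use warnings;

my %factor_value;

# Append a value to the list held under its category.
# Perl creates the inner arrayref on first use (autovivification).
sub add_factor_value {
    my ( $category, $value ) = @_;
    push @{ $factor_value{$category} }, $value;
    return;
}

# Return the whole structure: { category => [ value1, value2, ... ], ... }
sub get_factor_value {
    return \%factor_value;
}

add_factor_value( 'Compound', 'aspirin' );
add_factor_value( 'Dose',     '10 mg' );
add_factor_value( 'Dose',     '20 mg' );

my $fv = get_factor_value();
for my $category ( sort keys %$fv ) {
    print "$category: ", join( ', ', @{ $fv->{$category} } ), "\n";
}
```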
Setter method for the type of DesignElementDimension associated with the file (Feature, Reporter or CompositeSequence).
Getter method for the type of DesignElementDimension associated with the file.
Setter method for the format type (e.g., Affymetrix, GenePix, BlueFuse etc.). See the ArrayExpress::Curator::Config manpage for the enumerated types.
Getter method for the format type.
Setter method for the data type (e.g., raw, normalized, transformed). See the ArrayExpress::Curator::Config manpage for the enumerated types. Also allowed is the 'EXP' file type (Affymetrix).
Getter method for the data type.
Setter method for the QT type associated with the file. This is derived from the software names defined in the ArrayExpress::Datafile::QT_list manpage.
Getter method for the QT type associated with the file.
Setter method for the full filesystem path of the data file.
Getter method for the full filesystem path of the data file.
Setter method for the name of the file. Also sets $self->set_path if it has not been otherwise set. Note that this is not the same behaviour as the "name" argument to the object constructor, which does not change the path at all.
Getter method for the name of the file.
Setter method for the name of the output stripped data file.
Getter method for the name of the output stripped data file.
Getter method for the line-ending character(s). Can be \n, \r\n, \r or Unknown.
Getter method for the line-ending format (Mac, Unix, DOS or Unknown). Used in reports.
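The two getters above report the same property in two forms: the raw character(s) and a readable label. The sketch below shows one way the two could be related, assuming the usual convention that \n is Unix, \r\n is DOS and a lone \r is classic Mac. The detect_line_ending function is hypothetical, and the module's own detection logic may differ, for example in how it treats files with mixed endings.

```perl
use strict;
use warnings;

# Map a line-ending string to the format label used in reports.
my %format_for = ( "\n" => 'Unix', "\r\n" => 'DOS', "\r" => 'Mac' );

# Inspect a chunk of text and return (line_ending, format).
# \r\n is tested first so that a DOS file is not mistaken for Mac or Unix.
sub detect_line_ending {
    my ($text) = @_;
    my $eol =
        $text =~ /\r\n/ ? "\r\n"
      : $text =~ /\r/   ? "\r"
      : $text =~ /\n/   ? "\n"
      :                   'Unknown';
    return ( $eol, $format_for{$eol} || 'Unknown' );
}

my ( $eol, $format ) = detect_line_ending("ID\tSignal\r\n1\t200\r\n");
print "$format\n";    # prints DOS
```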
Setter method for storing the output of the $self->parse_exp_file method.
Getter method for retrieving the output of the $self->parse_exp_file method.
Setter method for the Bio::MAGE::BioAssayData::QuantitationTypeDimension object associated with the file.
Getter method for the Bio::MAGE::BioAssayData::QuantitationTypeDimension object associated with the file.
Tim Rayner (rayner@ebi.ac.uk), ArrayExpress team, EBI, 2004.
Acknowledgements go to the ArrayExpress curation team for feature requests, bug reports and other valuable comments.