Matlab script for parsing Agilent output (microarrays)

This Matlab script parses the tab-separated text files written by Agilent's Feature Extraction software for microarrays. It assumes that the file is a sequence of blocks, each made of:

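  • one line of field types (integer, float, text, boolean), tab-separated
  • one line of field names, tab-separated, whose first entry is the block name
  • the data, one record per line, ending with a line containing only "*" (or with the end of the file)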

The parsed information is returned as a structure with one field per block, named after the block (e.g. FEPARAMS). Each field holds a struct array with one element per data line.
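To make the layout concrete, here is a made-up miniature block split into its parts with strsplit. The values, and the TYPE and DATA markers in the first column, are placeholders assumed for illustration; the script itself ignores the first column of the type and data lines and only uses the first column of the name line, as the block name.

```matlab
% Made-up miniature block in the assumed layout (placeholder values).
% First column: TYPE / block name / DATA markers (assumed, not from a real file).
txt = sprintf(['TYPE\ttext\tinteger\n' ...
               'FEPARAMS\tProtocol_Name\tScan_NumChannels\n' ...
               'DATA\tGE1_demo\t2\n' ...
               '*']);

lines = strsplit(txt, sprintf('\n'));        % one cell per line
types = strsplit(lines{1}, sprintf('\t'));   % field types (column 1 ignored)
names = strsplit(lines{2}, sprintf('\t'));   % field names
row   = strsplit(lines{3}, sprintf('\t'));   % one data record
blockName = names{1};                        % would become data.FEPARAMS
```

The last line, "*", is the separator that tells the parser the block is over.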

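% parse_agilent_txt: parse Agilent txt files into a Matlab struct
% C.Ladroue
%
% data = parse_agilent_txt(fn) returns a struct with one field per block
% (e.g. FEPARAMS, STATS, FEATURES); each field is a struct array with one
% element per data line of that block.

function data=parse_agilent_txt(fn)

    data=struct();
    fh=fopen(fn,'r');
    if fh<0
        error('parse_agilent_txt:open','Cannot open file %s',fn);
    end

    while ~feof(fh)
        % Each block starts with a type line and a name line.
        [format,names,info]=parse_fields(fh);
        cpt=0;
        s=struct([]);
        line=fgetl(fh);
        % Read data lines until the '*' separator or the end of the file
        % (fgetl returns -1, not a char array, at end of file).
        while ischar(line) && ~strcmp(line,'*')
            u=textscan(line,format,'Delimiter','\t');   % '\t' = tab
            cpt=cpt+1;
            for k=2:length(names)        % column 1 is the row marker, skipped
                if iscell(u{k}), u{k}=u{k}{1};end   % %s columns come back as cells
                s(cpt).(names{k})=u{k};  % dynamic field name, no eval needed
            end
            line=fgetl(fh);
        end

        data.(info)=s;
    end
    fclose(fh);

%% Parse the fields' type and name.
function [format,names,info]=parse_fields(fh)
    % First line: field types, mapped to textscan conversion specifiers.
    ftype=textscan(fgetl(fh),'%s','Delimiter','\t');
    ftype=ftype{1};
    format='%s';                         % column 1 (row marker) read as text
    for k=2:length(ftype)
        switch ftype{k}
            case {'integer','boolean'}
                format=[format '%d'];
            case 'float'
                format=[format '%f'];
            otherwise                    % 'text' and any unknown type
                format=[format '%s'];
        end
    end

    % Second line: field names; the first entry is the block name.
    names=textscan(fgetl(fh),'%s','Delimiter','\t');
    names=names{1};
    info=names{1};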

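The core of the inner loop, one data line becoming one struct element, can be tried in isolation. In this sketch the format string, field names and values are made up; the format is what the type-to-specifier mapping would produce for the types text, integer and float. It sets each field with a dynamic field name, s(1).(name), which does the same job as building the assignment with sprintf and eval.

```matlab
% Standalone sketch: one tab-separated data line -> one struct element.
% Names, types and values below are placeholders, not real Agilent output.
fmt     = '%s%s%d%f';      % row marker + text + integer + float
names   = {'FEPARAMS','Protocol_Name','Scan_NumChannels','Grid_RowSpacing'};
txtline = sprintf('DATA\tGE1_demo\t2\t65.5');

u = textscan(txtline, fmt, 'Delimiter', '\t');   % '\t' = tab delimiter
s = struct([]);
for k = 2:numel(names)     % column 1 is the row marker, skipped
    v = u{k};
    if iscell(v), v = v{1}; end   % %s columns come back as 1x1 cells
    s(1).(names{k}) = v;          % dynamic field name
end
```

Note that %d columns are returned as int32, so integer fields keep that class in the output struct.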
Example:

>> data=parse_agilent_txt('Data/xxxxxxx.txt')
 
data = 
 
    FEPARAMS: [1x1 struct]
       STATS: [1x1 struct]
    FEATURES: [1x41444 struct]
>> fieldnames(data.FEPARAMS)  
 
ans = 
 
    'Protocol_Name'
    'Protocol_date'
    'Scan_ScannerName'
    'Scan_NumChannels'
    'Scan_Date'
    'Scan_MicronsPerPixelX'
    'Scan_MicronsPerPixelY'
    'Scan_OriginalGUID'
    'Scan_NumScanPass'
    'Grid_Name'
    'Grid_Date'
    'Grid_NumSubGridRows'
    'Grid_NumSubGridCols'
    'Grid_NumRows'
    'Grid_NumCols'
    'Grid_RowSpacing'
    'Grid_ColSpacing'
    'Grid_OffsetX'
    'Grid_OffsetY'
    'Grid_NomSpotWidth'
    'Grid_NomSpotHeight'
    'FeatureExtractor_Barcode'
    'FeatureExtractor_Sample'
    'FeatureExtractor_ScanFileName'
    'FeatureExtractor_ArrayName'
    'FeatureExtractor_ScanFileGUID'
    'FeatureExtractor_DesignFileName'
    'FeatureExtractor_ExtractionTime'
    'FeatureExtractor_UserName'
    'FeatureExtractor_ComputerName'
    'FeatureExtractor_Version'
    'FeatureExtractor_IsXDRExtraction'
    'FeatureExtractor_ColorMode'
    'FeatureExtractor_QCReportType'
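
Because FEATURES comes back as a struct array (1x41444 above), a single column is easiest to analyse as a vector, and MATLAB's comma-separated list expansion, [s.field], gathers it in one step. The sketch below builds a tiny stand-in struct array instead of reading a real file; the field names FeatureNum and gSignal are placeholders, so check fieldnames(data.FEATURES) for the actual column names in your file.

```matlab
% Tiny stand-in for data.FEATURES: a 1x2 struct array, one element per row.
% Field names are placeholders (hypothetical), not guaranteed Agilent columns.
feat = struct('FeatureNum', {int32(1), int32(2)}, ...
              'gSignal',    {120.5, 98.0});

sig     = [feat.gSignal];              % 1x2 double row vector
ids     = double([feat.FeatureNum]);   % int32 (from %d) -> double
meanSig = mean(sig);
```

The same pattern applies directly to data.FEATURES once the file has been parsed.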
