Virtual Expression Arrays and Expression Array Cross Hybridization

    In anticipation of expanded application of gene expression technology at UTSW (i.e., DNA microarrays and DNA hybridization chips), we have computed virtual expression arrays (VEAs) for the 6,217 open reading frames (ORFs) of S. cerevisiae by searching the GenBank expressed sequence tag (EST) database for sequences similar to each ORF. We envision two basic uses for the VEA: first, as a means of comparing experimental expression data with compiled database information; second, as a method of quantifying possible cross-hybridization of products on microarrays.
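The text does not describe the data structure behind a VEA. A minimal Python sketch, assuming the similarity search has already been run (e.g., with BLAST) and its output parsed into one record per ORF/EST alignment carrying a P(N) value, could reduce those records to two per-ORF quantities: the number of significant EST hits and the best (smallest) P(N). The `Hit` record, the `build_vea` function, the cutoff default, and the toy ORF/EST names are hypothetical illustrations, not the code used to build the arrays described here.

```python
from dataclasses import dataclass

P_CUTOFF = 1e-10  # significance threshold quoted in the text: P(N) < 10^-10


@dataclass(frozen=True)
class Hit:
    orf: str        # yeast ORF name (hypothetical toy labels below)
    est: str        # EST accession (hypothetical toy labels below)
    p_value: float  # BLAST-style P(N) for the ORF/EST alignment


def build_vea(hits, orfs, cutoff=P_CUTOFF):
    """Hypothetical sketch: return {orf: (hit_count, best_p)} for every ORF on the chip.

    ORFs with no significant hit get (0, None) so the virtual array stays complete.
    """
    vea = {orf: (0, None) for orf in orfs}
    seen = set()  # count each ORF/EST pair once even if the search reports it twice
    for h in hits:
        if h.p_value >= cutoff or h.orf not in vea or (h.orf, h.est) in seen:
            continue
        seen.add((h.orf, h.est))
        count, best = vea[h.orf]
        best = h.p_value if best is None else min(best, h.p_value)
        vea[h.orf] = (count + 1, best)
    return vea


if __name__ == "__main__":
    # Toy data only, not results from this study.
    toy = [Hit("ORF1", "est_a", 1e-30), Hit("ORF1", "est_b", 1e-12), Hit("ORF2", "est_c", 1e-5)]
    vea = build_vea(toy, ["ORF1", "ORF2", "ORF3"])
    print(vea["ORF1"])  # (2, 1e-30)
    fraction_hit = sum(1 for count, _ in vea.values() if count > 0) / len(vea)
    print(round(fraction_hit, 2))  # 0.33
```

The fraction of ORFs with at least one significant hit is the kind of genome-wide summary reported in the first finding below.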

    In the visualized arrays, color encodes the degree of similarity and intensity encodes the number of similar ESTs (hits). We have found that:
 
    1) ~60% of all yeast ORF sequences have similarity (with P(N) < 10^-10) to some mammalian ESTs in the database
    2) the density of hits is uniform across the yeast chromosomes, except that it is anomalously low on chromosome 3 and anomalously high on chromosome 11
    3) when hits to yeast ORFs are sorted by human tissue type, they are overrepresented in colon, neuron, thyroid, and testis cDNAs and underrepresented in fetus, prostate, and spleen cDNAs.
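Findings 2 and 3 are statements about over- and under-representation, but the text does not say what baseline was used. One common normalization, assumed here rather than taken from the source, compares each category's share of hits with its share of a baseline: ORFs per chromosome for finding 2, or ESTs sequenced per tissue library for finding 3. The `enrichment` function and the toy counts are hypothetical.

```python
def enrichment(observed, baseline):
    """Hypothetical sketch: each category's share of hits divided by its share of a baseline.

    observed: {category: number of hits in that category}
    baseline: {category: expected size, e.g. ORFs on a chromosome or ESTs in a tissue library}
    Returns {category: ratio}; > 1 means over-represented, < 1 under-represented.
    """
    total_obs = sum(observed.values())
    total_base = sum(baseline.values())
    if total_obs == 0 or total_base == 0:
        return {}
    ratios = {}
    for category, base in baseline.items():
        if base == 0:
            continue  # no baseline, so no meaningful expectation for this category
        obs_share = observed.get(category, 0) / total_obs
        base_share = base / total_base
        ratios[category] = obs_share / base_share
    return ratios


# Toy counts only, not data from this study.
print(enrichment({"colon": 30, "spleen": 10}, {"colon": 100, "spleen": 100}))
# {'colon': 1.5, 'spleen': 0.5}
```

Normalizing by a baseline matters because a tissue with many sequenced ESTs will collect many hits by chance alone; a raw hit count would confuse library size with biology.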
 
Pictures of various VEAs are available.
    We have compiled VEAs using the yeast ORFs as the "virtual chip" and "captured" cDNAs from various cancer types. For example, as of 1/98, 173 different yeast genes had EST hits after probing with lung cancer cDNA sequences. Thus, from our WWW site (http://www.pompous.swmed.edu/estarray.htm), the VEA can give researchers an up-to-date view, mapped onto the yeast genome, of the entire CGAP effort for all deposited normal tissue, cancer, and preneoplasia cDNAs. By extension, it is possible to develop virtual chips based on other platforms (e.g., a particular human chromosome region) to interrogate the EST database. Finally, it will be important to validate virtual expression levels by comparing them with actual expression levels for the various cancers.
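A count such as the 173 lung-cancer genes amounts to filtering the hit table by the library each EST came from and collecting distinct ORFs. A sketch, assuming each EST accession has already been mapped to a library or tissue label (for example from its GenBank source annotation) and assuming the same P(N) cutoff as the genome-wide VEA; the `genes_hit_by` name, the tuple layout, and the toy data are hypothetical:

```python
P_CUTOFF = 1e-10  # assumed: same P(N) threshold as the genome-wide VEA


def genes_hit_by(hits, est_library, library_type, cutoff=P_CUTOFF):
    """Hypothetical sketch: distinct ORFs with a significant hit from one EST library type.

    hits: iterable of (orf, est_accession, p_value) tuples
    est_library: {est_accession: library label such as "lung cancer"}
    """
    return {
        orf
        for orf, est, p_value in hits
        if p_value < cutoff and est_library.get(est) == library_type
    }


# Toy data only, not results from this study.
toy_hits = [("ORF1", "e1", 1e-20), ("ORF1", "e2", 1e-15),
            ("ORF2", "e3", 1e-12), ("ORF3", "e4", 1e-3)]
toy_labels = {"e1": "lung cancer", "e2": "lung cancer", "e3": "colon", "e4": "lung cancer"}
print(sorted(genes_hit_by(toy_hits, toy_labels, "lung cancer")))  # ['ORF1']
```

Returning a set means an ORF hit by many ESTs from the same library is counted once, which matches "different yeast genes" rather than total hits.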

After computing the VEA, we also conducted validation research. For example, above is a VEA for mouse cardiac sequences against a yeast array. The graph shows actual data from a cross-hybridization experiment. The data indicate that most of the 6,200 yeast genes can cross-hybridize with mouse sequences (under low stringency). On the graph, the mRNA expression signal from normal mouse heart is on the Y-axis and the mRNA signal from hypertrophic mouse heart is on the X-axis. Importantly, this method has helped us identify many homologous genes with varying expression that could potentially be used to construct dedicated arrays for heart disease analysis.
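A normal-versus-hypertrophic scatter plot is usually read by looking for points far from the diagonal. A minimal sketch of that step, assuming background-subtracted signals keyed by gene and an illustrative two-fold cutoff that the text does not specify; `varying_genes`, `min_fold`, and `floor` are hypothetical names:

```python
import math


def varying_genes(normal, hypertrophic, min_fold=2.0, floor=1.0):
    """Hypothetical sketch: genes whose signal differs by at least min_fold.

    normal, hypertrophic: {gene: background-subtracted hybridization signal}
    floor: signals below this are raised to it, so near-zero spots do not
           produce huge, meaningless ratios.
    Returns {gene: log2(hypertrophic / normal)}.
    """
    cutoff = math.log2(min_fold)
    changed = {}
    for gene in normal.keys() & hypertrophic.keys():  # genes measured in both samples
        log_ratio = math.log2(max(hypertrophic[gene], floor) / max(normal[gene], floor))
        if abs(log_ratio) >= cutoff:
            changed[gene] = log_ratio
    return changed


# Toy signals only, not data from this study.
normal = {"g1": 100, "g2": 100, "g3": 0.2, "g4": 80}
hypertrophic = {"g1": 400, "g2": 120, "g3": 0.5, "g4": 20}
print(sorted(varying_genes(normal, hypertrophic).items()))
# [('g1', 2.0), ('g4', -2.0)]
```

Using log2 ratios makes up- and down-regulation symmetric (a four-fold increase is +2, a four-fold decrease is -2), so one absolute cutoff covers both directions.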

    The use of expression and DNA re-sequencing arrays is expanding. To date, these arrays have consisted of the genes of whole genomes (yeast, mTB, etc.) or of collections of cDNA clones, often selected at random. To make maximum use of available information, and to construct purpose-built arrays with fewer members that control cost and avoid collecting unused data, we have begun to design these arrays by computer. With code that designs arrays for any purpose from keyword searches across many databases, and that links the results to our PRIMO oligo design code and others, we will be able to design arrays automatically and rapidly, ready for construction. On the left is one component of the analysis: identifying array members that could cross-hybridize among themselves and thereby dilute the value of the data. This shows the results of an analysis of one yeast gene for sequence homology to other genes in the yeast genome. Our code seeks to minimize this overlap.
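Overlap minimization can be illustrated with a greedy filter: walk the candidates in priority order and keep one only if it is not too similar to any member already kept. This is a swapped-in illustrative technique, not the PRIMO code or the authors' algorithm; the similarity function, the 0.5 threshold, and `pick_array_members` are assumptions.

```python
def pick_array_members(candidates, similarity, max_sim=0.5, max_size=None):
    """Hypothetical sketch: greedy choice of array members that should not cross-hybridize.

    candidates: gene names in priority order (e.g. ranked by keyword relevance)
    similarity: function (a, b) -> sequence identity in [0, 1]
    A candidate is kept only if its similarity to every member already kept is below max_sim.
    """
    chosen = []
    for gene in candidates:
        if max_size is not None and len(chosen) >= max_size:
            break
        if all(similarity(gene, other) < max_sim for other in chosen):
            chosen.append(gene)
    return chosen


# Toy pairwise identities only, not data from this study.
sim = {frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.1, frozenset(("B", "C")): 0.2}


def lookup(a, b):
    return sim.get(frozenset((a, b)), 0.0)


print(pick_array_members(["A", "B", "C"], lookup))  # ['A', 'C']
```

Greedy selection is simple and fast, but it is not guaranteed to find the largest possible non-overlapping set; the candidate order decides which member of a similar pair survives.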