This page is the web supplement to the paper:

  • Wren JD and Garner HR. "Shared Relationship Analysis: Ranking set cohesion and commonalities within a literature-derived relationship network." Bioinformatics 2004 Jan;20(2):191-8

Click the following link to download the dataset of genes predicted to belong to each Gene Ontology category analyzed. The file is a Microsoft Excel workbook with the data spread across 3 worksheets. It is rather large (27 MB), so please be patient if your connection is slow.

Predicted Gene Ontology Relationships

Details on the generation of this dataset

Gene Ontology records were downloaded on 11/11/2002 and contained a total of 13,414 unique ontological identification numbers and 13,106 unique descriptions. On the same day, 115,303 LocusLink records were downloaded and processed so that only entries representing actual known genes were included in the database (excluding, e.g., genes whose existence is predicted from phenotype and tentative assignments based on weak homology, ORFs, or predictive methods), leaving 42,345 entries. Of these LocusLink entries, 21,452 had at least one existing ontology category.

Genes associated with GO categories were output if their observed-to-expected ratio (Obs/Exp) was at least 2 standard deviations (2σ) above the average Obs/Exp value for the same number of relationships and the same set size, as determined by random network simulations, and if the object was related to at least a minimal portion (5%) of the set. Only GO categories with at least 3 members were processed. Entries were manually deleted from the output when the gene name was ambiguous or identical to a common word or phrase. These gene names were: autoantigen, cell surface protein, G protein-coupled receptor, membrane protein, unknown function, unassigned, transcription factor, transcriptional co-activator, nuclear protein, inflated, and copper.
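
The output criteria described above can be illustrated with a short Python sketch. This is not the code used to generate the dataset. The names Candidate, passes_filter, sim_mean, and sim_sd are hypothetical, and the sketch assumes that the mean and standard deviation of Obs/Exp from the random network simulations (for the matching number of relationships and set size) have already been computed elsewhere. In the original work the ambiguous names were removed by hand, which the sketch approximates with a simple exclusion list.

```python
from dataclasses import dataclass

# Gene names removed from the output, as listed on this page (lowercased here).
EXCLUDED_NAMES = frozenset({
    "autoantigen", "cell surface protein", "g protein-coupled receptor",
    "membrane protein", "unknown function", "unassigned",
    "transcription factor", "transcriptional co-activator",
    "nuclear protein", "inflated", "copper",
})


@dataclass
class Candidate:
    """Hypothetical record for one gene scored against one GO category."""
    gene: str
    obs_exp: float         # observed/expected ratio (Obs/Exp)
    related_members: int   # number of set members the gene is related to


def passes_filter(cand, set_size, sim_mean, sim_sd,
                  min_sd=2.0, min_fraction=0.05, min_set_size=3):
    """Return True if the candidate meets all output criteria.

    sim_mean and sim_sd are assumed inputs: the average and standard
    deviation of Obs/Exp from random network simulations for the same
    number of relationships and the same set size.
    """
    if set_size < min_set_size:                 # GO category too small
        return False
    if cand.gene.lower() in EXCLUDED_NAMES:     # ambiguous or common-word name
        return False
    if cand.obs_exp < sim_mean + min_sd * sim_sd:   # less than 2 SD above average
        return False
    # The gene must be related to at least 5% of the set.
    return cand.related_members / set_size >= min_fraction
```

For example, with a simulated average of 1.0 and a standard deviation of 0.5, a gene needs an Obs/Exp of at least 2.0 and, for a 20-member category, a relationship to at least one member in order to be output.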

Supplemental information may be found here.