The computational biology group at UTSW has built a number of programs, databases, and tools for archiving, analyzing, and interpreting biomedical information. The focus has been on tools that integrate and visualize diverse data for gene discovery, predict sequence variations to enable directed study of polymorphisms, and cluster and mine the available biomedical literature for hidden knowledge. The goal is to give biomedical researchers computational tools that accelerate their research. These tools run on our servers, and some of them, or their results, can be accessed via the web.

Areas of Emphasis in computational tools

PathoGene, Eremorph, siRNA, PCR Now, HomologeneP, ARGH, Nome della Proteina, SpinOut, SPOREbase


DNA Sequence Analysis Computational Tools:

  • PathoGene - A tool for designing PCR primers for every sequenced microbial, viral, and fungal organism or custom sequence. Products and primers are BLASTable. Validated primers may be stored in a database for future retrieval. Potential promoter identification is also possible.
  • Eremorph - A gene-centric, web-accessible database that offers specialized DNA sequence annotation, combining unique computations and measurements with selected publicly available annotation. Eremorph overlays these annotations on genomic slices to which RefSeq genes are aligned, providing the data for coding, intron, untranslated, and promoter regions.
  • siRNA - A tool that aids in designing siRNA target sequences and provides a database of validation information about designed siRNAs.
  • PCR Now - A tool for designing Real-Time PCR primers. It supplements PathoGene for RT-PCR primer design in organisms not featured in the PathoGene database.
  • PHIG - A database that contains 632 genes related to the human immune system as identified by NCBI. A description of their function, location, and links to known SNPs and homologs in mouse and rat are also included.
  • Primer DB - A database for storing and retrieving validated PathoGene PCR primers. Parameters under which they were generated will also be stored.
  • PANORAMA - A genetic features computation and visualization server with interactive Java and PDF file output. A submitted sequence is searched for similarity, exons, predicted polymorphisms, CpG islands, and much more.
  • PRIMO - A DNA primer design code for primer-based DNA sequence walking, PCR, and oligonucleotide arrays. This code works with DNA quality values to optimize oligo design.
  • POMPOUS and Rep-X - These codes have analyzed the GenBank and UniGene databases to identify highly probable simple sequence repeat polymorphisms. A catalog of results is available, and this predictive analysis is also integrated into PANORAMA.
  • Signal - A downloadable tool for DNA or protein sequence analysis including dotplot comparison of two sequences, methylation analysis, and much, much more.
  • PathoBLAST - BLAST your gene of interest against the GenBank genomes collected for many of the biothreat pathogens listed on this site.
  • ABI 377 sequencer analysis utilities - Macintosh C programs for making gel-specific matrices, merging two partially run gel files, or cutting a single gel image in half for two subsequent analyses.
  • SNPCEQer - A multiple alignment code that uses the Beckman CEQ2000 DNA sequencer output to identify heterozygote and homozygote single nucleotide polymorphisms. This software runs on the PC NT platform with the Beckman CEQ acquisition software.
  • SNPCEQerII - A GUI-based application that integrates SNP detection, SNP analysis, and SNP editing in the Microsoft Windows environment. SNPCEQer II detects SNPs in DNA sequences generated by the Beckman CEQ™ 2000 XL DNA analysis system and provides tools to analyze SNPs by inspecting trace data (chromatograms) around putative SNPs, and by comparing the trace data with that of other related DNA sequences. The SNP report can be edited and printed, as can the chromatograms. SNPCEQer II is implemented in Visual C++.
  • SEE-SCAPE - A server that performs a BLAST similarity search and then provides an enhanced 3D visualization of the pairwise comparison of all returned 'hits'. This enables the biologist to better identify distinct clusters of relationships among sequences with similarity.
  • Phred/Phrap/Consed - This suite of programs created by Phil Green for shotgun DNA sequence assembly and finishing is available on our servers for projects at UTSW that require de novo sequencing.
  • Phred/Phrap/Polyphred/Consed - This suite of programs created by Phil Green and Debbie Nickerson for identification of SNPs using sequences from multiple individuals is available on our servers for projects at UTSW that require SNP hunting.
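
As an illustration of the kind of scan behind simple sequence repeat prediction (the POMPOUS and Rep-X entry above), the following is a minimal sketch that finds perfect tandem repeats in a DNA sequence. It is not the POMPOUS or Rep-X code. The function name find_simple_repeats and the thresholds (unit length 1 to 6, at least 5 copies) are assumptions chosen for the example.

```python
import re


def find_simple_repeats(seq, min_unit=1, max_unit=6, min_copies=5):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Hypothetical helper for illustration only; thresholds are assumptions.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit of unit_len bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter motif
            # (for example "CACA", which is already reported as "CA").
            redundant = any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if redundant:
                continue
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), unit, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGT" + "CA" * 8 + "TTG"
    for start, unit, copies in find_simple_repeats(example):
        print(start, unit, copies)
```

Long perfect repeats such as these are only candidates for polymorphism; deciding which ones are "highly probable" requires additional criteria that this sketch does not attempt to model.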

Gene/Protein collection and array Computational Tools:

  • HomologeneP - A web-accessible database resource of computed homologs and alignments for every bacterial, viral, and fungal coding sequence. These were computed using reciprocal similarity, and the results, with various gene annotations, are available in a searchable, browsable database.
  • Nome della Proteina (GI to Locus Link Convertor) - A new protein identification resolution database. This database provides easy conversion of protein GI numbers to Locus Link IDs and vice versa. GO terms, products, AfCS IDs and analyses, and official names and symbols are also provided.
  • Bad Bug Base - A web-accessible database resource that provides computationally determined homologs for every bacterial and viral coding sequence. HUGO gene names, Entrez Gene IDs, RefSeq accessions (NC_xxxxxx), or relevant keywords can be used as input. This database has been superseded by HomologeneP but remains available for the time being.
  • MAD (microarray data management and processing) - A set of integrated Windows software for DNA microarray data management and processing.
  • MarC-V - Excel VBA software for managing, analyzing, plotting, and normalizing data from individual microarray experiments.
  • Virtual Expression Arrays (VEA) - Any predefined array, for example the yeast genome, can be compared against the EST sequences stored in GenBank, using the number of ESTs for each gene as a computational estimate of expression level or differential expression level. Our codes have been used to calculate this.
  • Expression Array Cross Hybridization - Genes in gene families or those that share significant homology because of simple sequence repeats within their coding and UTR regions can cross hybridize, leading to a false co-expression dependence.  We have experimentally verified our codes that determine candidates for this effect.
  • Gene Traffic 2.0 - Gene Traffic is a Bioinformatics Server Appliance for microarray data - a complete microarray data storage and analysis system.  Gene Traffic allows the user to view scanned images, scatter plots, hybridization statistics, and much more.
  • A new software pipeline for analysis of high-performance mass spectrometer data allows rapid and accurate identification of potential biomarkers for detection of diseases such as cancer or other patient conditions. Collaboration is available; click on the link to email the project leader.
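
The HomologeneP entry above says its homologs were computed using reciprocal similarity. One common reading of that phrase is the reciprocal best hit rule: gene A in one organism and gene B in another are paired only if each is the other's top-scoring match. The sketch below illustrates that rule under this assumption. It is not the HomologeneP code, and the function names and the score dictionaries (which in practice might be filled from BLAST bit scores) are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query_id, subject_id): similarity_score}
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 90, ("a3", "b1"): 120}
    b_vs_a = {("b1", "a1"): 210, ("b2", "a2"): 80, ("b2", "a1"): 40}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, a3 is not paired with b1 even though b1 is its best hit, because b1's own best hit is a1.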

Biomedical eText Data Mining:

  • ARGH - a comprehensive catalog of biomedical acronyms and abbreviations extracted from MedLine abstracts.
  • eTBLAST - A text similarity engine that accepts a query and compares it to a collection of other text.
  • FRISC - Faculty Research Interests Science Comparator, a pre-computed set of MedLine abstracts that are maximally similar to the research interests of UTSW investigators. FRISC uses eTBLAST as its engine. (experimental version for UTSW faculty)
  • TRITE - Topical Research Interests Comparator.  TRITE is a pre-computed set of Medline similarity hits that are topical. TRITE uses the eTBLAST engine, operating on an edited set of topics selected from the Encyclopedia of Molecular Biology, Blackwell Science, Ltd. (experimental version)
  • eTSNAP - Computes a pairwise similarity matrix for a set of user-supplied text records using our text similarity engine, eTBLAST. The data can then be viewed in several interactive ways so the user can identify clusters of similar information and explore them for meaning and new relationships.
  • IRIDESCENT - A knowledge discovery engine designed for comprehensive identification and analysis of literature trends.
  • GeneAlert - A UNIX-based system that culls returned BLAST results to remove excess information clutter and arranges the information in a more efficient and user-friendly way. It has been written and applied to genomic research.
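
To make the idea of a pairwise text similarity matrix (the eTSNAP entry above) concrete, here is a minimal sketch. The scoring used by eTBLAST is not described on this page, so the sketch substitutes plain cosine similarity over word counts as a stand-in. The function names are hypothetical, and the example records are invented.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(records):
    """Return an n x n list of lists holding all pairwise similarities."""
    vectors = [word_counts(record) for record in records]
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    records = [
        "primer design for PCR",
        "PCR primer design",
        "microarray data normalization",
    ]
    for row in similarity_matrix(records):
        print(["%.2f" % value for value in row])
```

A matrix like this is the input to any clustering or interactive view: records whose rows have high off-diagonal values form candidate clusters.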

Biomedical Communications Tools

  • SpinOut - "A researcher’s guide to corporate identity" was developed to help researchers with potential spinout companies from university laboratories access information on effective practices for corporate identity design. On this site (http://innovation.swmed.edu/IDGuide) we present concepts and examples of each component of an effective corporate identity package.

Databases kept locally:

  • GenBank
  • SPORE data
  • PGA1 data
  • PGA2 / Proteomics data
  • Instant Array - A pre-computed microarray probe database for all completely annotated microbial, viral, and fungal organisms.
  • Locus Link
  • Rep-X
  • PIR
  • UniGene
  • Medline
  • KEGG Gene Database
  • KEGG Pathway Database
  • Homologene
  • Research Genetics Clone Database
  • And many, many more.

Developmental Biology: