GeneAlert - A keyword parser for sequence search results.

    With the ever-increasing amount of sequence data being produced, and with researchers now investigating larger sequenced regions for gene and feature content, the results returned by similarity search programs can be overwhelming.  Gene Alert is a UNIX-based system that culls returned BLAST results to remove clutter and arranges the remaining information in a more efficient and user-friendly way; it has been applied to genomic research.  The investigator uses Gene Alert to reprocess the raw output returned from BLAST, re-ordering it from a list ranked by similarity score to a list ranked by keyword score.  The user specifies keywords and a weighting for each, and these are used to assign importance to the many ‘hits’ returned by a BLAST search.  The system can also exclude certain classes of information, such as particular types of organisms, and ‘hits’ that are of no interest to the biologist/geneticist, such as vector contamination ‘hits’.  The system runs automatically from two inputs: a configuration file containing keywords and other parameters, customized to each researcher, and a user query sequence file.  BLAST and Gene Alert can be re-run periodically and automatically, with significant results e-mailed to the user.
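The re-ordering step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the original Gene Alert Perl script. The `Hit` record, the function names, the substring matching rule, the `min_score` cutoff and the sample data are all assumptions made for illustration; the actual configuration file format and scoring rules of Gene Alert are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record for one search result; not Gene Alert's own format.
    description: str   # definition line of the database match
    similarity: float  # score reported by the similarity search


def keyword_score(description, weights):
    """Sum the weights of every keyword found in the description."""
    text = description.lower()
    return sum(w for kw, w in weights.items() if kw.lower() in text)


def rerank(hits, weights, exclude=(), min_score=0.0):
    """Drop excluded hits, then order the rest by keyword score."""
    kept = []
    for hit in hits:
        text = hit.description.lower()
        # Skip unwanted classes of hits, e.g. vector contamination.
        if any(term.lower() in text for term in exclude):
            continue
        score = keyword_score(hit.description, weights)
        if score >= min_score:
            kept.append((score, hit))
    # Highest keyword score first; the similarity score breaks ties.
    kept.sort(key=lambda pair: (pair[0], pair[1].similarity), reverse=True)
    return kept


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    hits = [
        Hit("Cloning vector pUC19, complete sequence", 950.0),
        Hit("Homo sapiens tumor suppressor candidate mRNA", 420.0),
        Hit("Mus musculus kinase domain protein", 610.0),
        Hit("Homo sapiens protein kinase, tumor associated", 300.0),
    ]
    weights = {"tumor": 5.0, "kinase": 3.0}
    for score, hit in rerank(hits, weights, exclude=["vector"], min_score=1.0):
        print(f"{score:5.1f}  {hit.similarity:6.1f}  {hit.description}")
```

In this sketch the hit with the highest similarity score (the cloning vector) is removed by the exclusion list, while the hit with the lowest similarity score rises to the top because it matches both weighted keywords. That is the kind of re-ordering the paragraph above describes.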

    This system is of particular utility for a researcher working on a large number of projects, because the computer automatically looks only for items of interest to the scientist.  It is also of value to individual research efforts: by amplifying the importance of certain results within a myriad of returned similarity ‘hits’, it brings to the attention of the biologist/geneticist information that might ordinarily have been overlooked.  The system has been applied to several sequenced genomic regions on human chromosomes 3, 11 and 15 for scientists actively involved in hunts for biologically or medically relevant genes.

   This system is no longer supported by our group.

   Flow chart of the Gene Alert Perl script: