SNPCEQer, a SNP finding code for the Beckman CEQ 2000 and 8000 sequencers

A brief description of the code:

SNPCEQer detects single nucleotide polymorphisms (SNPs) in sequences obtained from the Beckman CEQ™ 2000 DNA Analysis System, an automated system that determines the base sequence of DNA samples. Version 4.0 of the CEQ™ analysis software includes an algorithm that tags heterozygous base positions in individual sequences and computes a quality value for each base call. SNPCEQer aligns sequences obtained with the Beckman CEQ™ 2000 and reports high-quality discrepancies between individual sequences and the SNPCEQer-generated consensus as putative heterozygous and homozygous SNPs. For a test set of 204 sequences, SNPCEQer correctly tagged 13 putative homozygous polymorphisms, as did the Unix-based PolyPhred. In each case, the sequence was homozygous at that position for a nucleotide other than the one most commonly found. SNPCEQer was designed to run under Windows NT, the same platform as the computer used with the CEQ™ 2000, which makes it more accessible to users without Unix systems. For information on SNPCEQer, contact elizabeth.flood@utsouthwestern.edu.

 

Please download the file here. (Version 1.0, posted March 14, 2002.)

 

Here are the ReadMe instructions:

Introduction:

SNPCEQer detects homozygous and heterozygous single nucleotide polymorphisms (SNPs) in sequences generated by the Beckman CEQ™ 2000 DNA Analysis System.

Input files:

         Sequence files in .phd format.

         Wild type sequence in text format (optional).
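
As an illustration of the .phd input, the sketch below reads base calls and quality values from PHD-format text. It assumes the standard phred PHD layout (a BEGIN_SEQUENCE line, then a BEGIN_DNA ... END_DNA block with one "base quality peak-position" triple per line); whether CEQ-exported files follow that layout in every detail is an assumption, and the function name read_phd is hypothetical, not part of SNPCEQer.

```python
from typing import List, Tuple


def read_phd(text: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (sequence name, [(base, quality), ...]) from PHD-format text.

    Assumes the standard phred PHD layout; this is not SNPCEQer's own reader.
    """
    name = ""
    calls: List[Tuple[str, int]] = []
    in_dna = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("BEGIN_SEQUENCE"):
            parts = line.split(None, 1)
            name = parts[1] if len(parts) > 1 else ""
        elif line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif in_dna and line:
            # Each DNA line is: base, quality value, peak position.
            fields = line.split()
            calls.append((fields[0].upper(), int(fields[1])))
    return name, calls
```

Calling read_phd(open(path).read()) on one of the sequence files gives the base calls together with the quality values that SNP detection relies on.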

Output files:

1.      SNP file: all the information about the SNPs identified by the program.

2.      myalign: alignments of all the sequences processed.

3.      cluster: sequence data of all the sequences processed.

4.      AllSeq file: names of all the sequence files processed.

5.      badsequence file: names of all the low-quality sequences that are excluded from the SNP calculation.

Working platform:

Windows 98, 2000.

To install the software:

Copy all the files in the SNPCEQer folder into a target folder on your computer.

To run the program:

  1. Click on the SNPCEQer application (one of the three SNPCEQer files).
  2. The program prompts for the path to the folder that contains the sequence data.
  3. Enter the data path, for example "c:\mydata\TestSeq".
  4. Press the "Enter" key.
  5. The program asks whether a wild type sequence is included.
  6. Press "Y" or "N" as instructed by the program, then press the "Enter" key.
  7. If "Y" is entered, the program asks for the name of the wild type sequence file.
  8. Enter the wild type sequence file name, for example "wild_type.txt".
  9. The program displays a message showing that it is running.
  10. The program finishes the calculation and waits for the user's instruction.
  11. Press "Y" to run the program again or "N" to quit.
  12. The results (the output files listed above) are placed in the folder that contained the data (TestSeq).
  13. After successful execution, the files in "TestSeq" should be the same as those in the "sampleData" folder.

Note:

  1. If a wild type sequence file is provided, it should contain only sequence data and nothing else. The file should be in the same directory as the other test sequences.
  2. When providing the wild type sequence name, give the full file name including the extension, e.g. wild_type.txt.
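
The two notes above can be checked before a run with a small script. This is only a sketch: the helper name check_wild_type is hypothetical, and the allowed alphabet (A, C, G, T, N, upper or lower case, with whitespace ignored) is an assumption, since the ReadMe says only that the file must contain sequence data and nothing else.

```python
import os

# Assumed alphabet; the ReadMe does not list the characters SNPCEQer accepts.
ALLOWED = set("ACGTN")


def check_wild_type(data_dir, file_name):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if os.path.basename(file_name) != file_name:
        problems.append("give the file name only; the file must sit in the data folder")
    if not os.path.splitext(file_name)[1]:
        problems.append("file name has no extension")
    path = os.path.join(data_dir, os.path.basename(file_name))
    if not os.path.isfile(path):
        problems.append("file not found in data folder")
        return problems
    with open(path) as handle:
        # Drop whitespace and line breaks, then compare against the alphabet.
        text = "".join(handle.read().split()).upper()
    bad = sorted(set(text) - ALLOWED)
    if bad:
        problems.append("non-sequence characters: " + "".join(bad))
    return problems
```

A header line (such as a FASTA ">" line) or a missing extension shows up in the returned list, which makes it easier to correct the file before starting SNPCEQer.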

 

To read the results:

Go to the folder containing your sequence data; you will find all the output files mentioned above.

The SNP file contains all the information about the SNPs identified, namely:

1.      Disclaimer

2.      Summary: Lists the date of execution, number of sequences analyzed, data source, and number of SNPs identified.

3.      Distribution of SNPs in consensus sequence: Lists the SNP position in the consensus sequence, number of heterozygous SNPs, number of homozygous SNPs, and total number of SNPs.

4.      Distribution of SNPs among sequences tested: Lists the sequence name, number of heterozygous SNPs, number of homozygous SNPs, and total number of SNPs.

5.      Consensus sequence: Sequence data.

6.      SNP information

SNPs are listed in descending order of score. The highest score marks the most probable true SNP.

“Position: Cons./Aligned 233/400” means that the SNP is at position 233 in the consensus sequence and at position 400 in the tested sequence (before trimming).

“Qual. Val.: Cons./Aligned 42/30” means that the quality value of the base at the SNP is 42 in the consensus sequence and 30 in the tested sequence.

SNPs are given in NCBI dbSNP format.

7.      Sequence trimming: The number of bases trimmed from both ends of the sequences.

8.      Parameters: Lists the parameters used for the calculation.
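
The "Position" and "Qual. Val." fields described under SNP information can be pulled out of the text with a short script. The sketch below relies only on the two quoted examples; the rest of the SNP file layout is not shown in this ReadMe, so the function name parse_snp_entry and the dictionary keys are hypothetical.

```python
import re

# Both fields share the "Cons./Aligned  <number>/<number>" shape.
_PAIR = r"Cons\./Aligned\s+(\d+)/(\d+)"
POSITION_RE = re.compile(r"Position:\s*" + _PAIR)
QUALITY_RE = re.compile(r"Qual\.\s*Val\.:\s*" + _PAIR)


def parse_snp_entry(text):
    """Extract consensus/aligned positions and quality values from one entry."""
    fields = {}
    pos = POSITION_RE.search(text)
    if pos:
        fields["cons_position"] = int(pos.group(1))
        fields["aligned_position"] = int(pos.group(2))
    qual = QUALITY_RE.search(text)
    if qual:
        fields["cons_quality"] = int(qual.group(1))
        fields["aligned_quality"] = int(qual.group(2))
    return fields
```

For the examples quoted above, the function returns positions 233 and 400 and quality values 42 and 30. A field that is absent from the text is simply left out of the result.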