Calculating Genetic Distances between Protein Sequences

Procedure for using the simulator

  1. Consider three random protein sequences: Protein Sequence 1, Protein Sequence 2, Protein Sequence 3.
 

Amino acid sequences in FASTA (.fasta) format were aligned by multiple sequence alignment using the SeqinR package. The dist.alignment() function takes a multiple alignment as input and calculates the genetic distance between each pair of proteins in the multiple alignment.
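To make the distance calculation concrete, here is a minimal base-R sketch of an identity-based distance of the kind dist.alignment() reports with matrix = "identity", namely the square root of (1 - fraction of identical sites). The three toy sequences, the names `aln`, `pair_distance` and `dist_matrix`, and the choice to skip columns containing a gap are assumptions made for illustration; this is not the simulator's own code.

```r
# Three toy, already-aligned protein sequences (hypothetical data, equal length).
aln <- c(
  Protein1 = "MKTAYIAKQR",
  Protein2 = "MKTAYIARQR",
  Protein3 = "MRTVYLAKQK"
)

# Identity-based distance for one pair: sqrt(1 - fraction of identical sites).
# Assumption: columns where either sequence has a gap ("-") are skipped.
pair_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  keep <- x != "-" & y != "-"
  identity <- mean(x[keep] == y[keep])
  sqrt(1 - identity)
}

# Fill a symmetric matrix with all pairwise distances.
n <- length(aln)
dist_matrix <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dist_matrix[i, j] <- pair_distance(aln[[i]], aln[[j]])
  }
}
print(round(dist_matrix, 3))
```

With SeqinR installed, the corresponding call would be dist.alignment(alignment, matrix = "identity"), where alignment is an object returned by read.alignment().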

  2. Click the Run button to execute the simulator.

 

The numerical values in the output matrix indicate the genetic distance between each pair of proteins in the multiple alignment. The larger the genetic distance between two sequences, the more amino acid changes or indels have occurred since they shared a common ancestor, and the longer ago their common ancestor probably lived.
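As an illustration of how such a matrix can be read, the following base-R sketch lists every pair of sequences with its distance and picks out the closest and the most distant pair. The matrix values and the names `d`, `pair_table`, `closest` and `farthest` are hypothetical and are not output from the simulator.

```r
# Hypothetical pairwise distance matrix for three proteins.
labels <- c("Protein1", "Protein2", "Protein3")
d <- matrix(c(0.00, 0.32, 0.63,
              0.32, 0.00, 0.71,
              0.63, 0.71, 0.00),
            nrow = 3, byrow = TRUE, dimnames = list(labels, labels))

# The matrix is symmetric, so each pair appears once in the upper triangle.
pairs <- which(upper.tri(d), arr.ind = TRUE)
pair_table <- data.frame(
  seq_a = rownames(d)[pairs[, "row"]],
  seq_b = colnames(d)[pairs[, "col"]],
  distance = d[pairs],
  stringsAsFactors = FALSE
)

# Sort so the most similar pair comes first and the most divergent pair last.
pair_table <- pair_table[order(pair_table$distance), ]
closest <- pair_table[1, ]
farthest <- pair_table[nrow(pair_table), ]
print(pair_table)
```

In this toy matrix, Protein1 and Protein2 have the smallest distance, so under the interpretation above they would be read as sharing the most recent common ancestor.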

 

DIY

 

  1. Follow the instructions at https://vlab.amrita.edu/index.php?sub=3&brch=311&sim=1835&cnt=2 to install R on your personal computer.

  2. Install the SeqinR package.

    Load the “seqinr” library into the R workspace

  3. Create a function for retrieving multiple sequences from a database

    Function 1

             create a list to store the sequences

             connect to the database

             for each element in the array of sequence names

                 query the sequence and assign it to a variable

                 append the sequence to the list

             end

             close the connection

             return the list

             end of function 1

            
  4. Create a vector of sequence names (database identifiers)

  5. Retrieve the sequences from the database

  6. Write the sequences into a file

  7. Align the sequences and assign the alignment, in PHYLIP format, to a variable
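
The DIY steps above can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not the simulator's actual code: the function names `retrieve_seqs` and `distances_from_phylip`, the bank name "swissprot", the "AC=" query form, the placeholder accessions and all file names are hypothetical. The multiple alignment itself is assumed to be produced by an external alignment tool and saved in PHYLIP format. The SeqinR calls (choosebank(), query(), getSequence(), closebank(), write.fasta(), read.alignment(), dist.alignment()) need the package installed, and retrieval needs a reachable ACNUC server.

```r
# Install once (needs internet access): install.packages("seqinr")

# Retrieve several sequences from an ACNUC database (mirrors "Function 1").
retrieve_seqs <- function(accessions, bank = "swissprot") {
  seqs <- list()                            # list to store the sequences
  seqinr::choosebank(bank)                  # connect to the database
  on.exit(seqinr::closebank(), add = TRUE)  # close the connection on exit
  for (acc in accessions) {
    # Query one accession; assumes query() returns the result list.
    res <- seqinr::query("res", paste0("AC=", acc))
    seqs[[acc]] <- seqinr::getSequence(res$req[[1]])  # append to the list
  }
  seqs
}

# Read a PHYLIP alignment into a variable and compute pairwise distances.
distances_from_phylip <- function(phylip_file) {
  aln <- seqinr::read.alignment(file = phylip_file, format = "phylip")
  seqinr::dist.alignment(aln, matrix = "identity")
}

# Hypothetical usage, commented out because it needs seqinr, network access
# and an external aligner:
# accessions <- c("ACCESSION_1", "ACCESSION_2", "ACCESSION_3")
# seqs <- retrieve_seqs(accessions)
# seqinr::write.fasta(seqs, names = names(seqs), file.out = "proteins.fasta")
# ... align proteins.fasta externally and save the result as proteins.phy ...
# distances_from_phylip("proteins.phy")
```

Registering closebank() with on.exit() is a design choice in this sketch so that the connection is closed even if one of the queries fails partway through the loop.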