Computing scoring matrices for amino acids and long pairwise alignment in R

Procedure to run the simulator  

  1. Enter the two sequences to be compared in the “Sequence 1” and “Sequence 2” boxes.
  2. Retrieve the sequence data in FASTA format from NCBI. Consider the following examples:

Sequence 1 - keratin [Homo sapiens]  

AAB59562.1 keratin [Homo sapiens] 

  MTTCSRQFTSSSSMKGSCGIGGGIGAGSSRISSVLAGGSCRAPNTYGGGLSVSSSRFSSGGAYGLGGGYG
  GGFSSSSSSFGSGFGGGYGGGLGAGLGGGFGGGFAGGDGLLVGSEKVTMQNLNDRLASYLDKVRALEEAN
  ADLEVKIRDWYQRQRPAEIKDYSPYFKTIEDLRNKILTATVDNANVLLQIDNARLAADDFRTKYETELNL
  RMSVEADINGLRRVLDELTLARADLEMQIESLKEELAYLKKNHEEEMNALRGQVGGDVNVEMDAAPGVDL
  SRILNEMRDQYEKMAEKNRKDAEEWFFTKTEELNREVATNSELVQSGKSEISELRRTMQNLEIELQSQLS
  MKASLENSLEETKGRYCMQLAQIQEMIGSVEEQLAQLRCEMEQQNQEYKILLDVKTRLEQEIATYRRLLE
  GEDAHLSSSQFSSGSQSSRDVTSSSRQIRTKVMDVHDGKVVSTHEQVLRTKN
        

 

Sequence 2 - keratin [Rattus norvegicus]  

AAA41473.1 keratin [Rattus norvegicus] 

  MDFSRRSFHRSLSSSSQGPALSTSGSLYRKGTMQRLGLHSVYGGWRHGTRISVSKTTMSYGNHLSNGGDL
  FGGNEKLAMQNLNDRLASYLEKVRSLEQSNSKLEAQIKQWYETNAPSTIRDYSSYYAQIKELQDQIKDAQ
  IENARCVLQIDNAKLAAEDFRLKFETERGMRITVEADLQGLSKVYDDLTLQKTDLEIQIEELNKDLALLK
  KEHQEEVEVLRRQLGNNVNVEVDAAPGLNLGEIMNEMRQKYEILAQKNLQEAKEQFERQTQTLEKQVTVN
  IEELRGTEVQVTELRRSYQTLEIELQSQLSMKESLERTLEETKARYASQLAAIQEMLSSLEAQLMQIRSD
  TERQNQEYNILLDIKTRLEQEIATYRRLLEGEDIKTTEYQLNTLEAKDIKKTRKIKTVVEEVVDGKVVSS
  EVKEIEENI
        

 

 
  1. Then click the Align Seq tab to run the simulator.  

     
  2. The alignment score for the given amino acid sequences is given as output. It is built from the scoring matrix, whose entries indicate the relative score obtained by matching two characters in a sequence alignment.  

  3. You can also download the sample file provided in the GUI and follow Steps 1 and 2 to get the output.
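
As a rough illustration of how such a score is put together, the sketch below scores one already-aligned pair of strings in base R. The three-residue matrix, its values, the gap penalty and the function name score_alignment are made up for illustration; they are not BLOSUM50 values and not the simulator's own code. It uses a simple per-column gap penalty and only scores a given alignment; it does not search for the best one.

```r
# Toy substitution matrix over three residues (illustrative values only,
# NOT taken from BLOSUM50): +2 for a match, -1 for a mismatch.
residues <- c("A", "G", "S")
toy_matrix <- matrix(-1, nrow = 3, ncol = 3,
                     dimnames = list(residues, residues))
diag(toy_matrix) <- 2

# Score two already-aligned strings of equal length; "-" marks a gap.
# Each column containing a gap costs `gap_penalty` (simple linear model).
score_alignment <- function(aligned1, aligned2, scores, gap_penalty = -2) {
  a <- strsplit(aligned1, "")[[1]]
  b <- strsplit(aligned2, "")[[1]]
  stopifnot(length(a) == length(b))
  total <- 0
  for (i in seq_along(a)) {
    if (a[i] == "-" || b[i] == "-") {
      total <- total + gap_penalty
    } else {
      total <- total + scores[a[i], b[i]]   # look up the residue pair
    }
  }
  total
}

# Columns: A/A (+2), G/G (+2), S/G (-1), -/S (-2), A/A (+2)
score_alignment("AGS-A", "AGGSA", toy_matrix)  # 3
```

A higher total means the two sequences match better under the chosen matrix and gap penalty.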

     

DIY    

  1. Follow the instructions at https://vlab.amrita.edu/index.php?sub=3&brch=311&sim=1835&cnt=2 to install R on your personal computer.

  2. Install the SeqinR package.

  3. To load the “SeqinR” R package, run:

    library("seqinr")

    Import the “seqinr” library into the R workspace.

    Connect to the ACNUC database.

    Query the sequences by accession number and assign each result to a variable.

    Import the “Biostrings” library into the R workspace.

    Load the BLOSUM50 scoring matrix.

    Convert the amino acid sequences into strings.

    Convert the strings to uppercase.

    Pairwise align the sequences using the chosen gap penalties.

    Assign the alignment data to a matrix.
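
The steps above can be sketched in R roughly as follows. This is a minimal sketch under several assumptions, not the simulator's actual script. It assumes seqinr and the Bioconductor alignment packages are installed; in recent Bioconductor releases pairwiseAlignment() and BLOSUM50 are provided by the pwalign package rather than Biostrings, so the code tries pwalign first. The helper fetch_protein, the bank name and the gap penalties are illustrative choices. To run without a network connection, the example aligns only the first line of each keratin sequence shown earlier.

```r
library("seqinr")
# pairwiseAlignment() and BLOSUM50 live in pwalign in newer Bioconductor
# releases and in Biostrings in older ones (assumption: one is installed).
if (requireNamespace("pwalign", quietly = TRUE)) {
  library("pwalign")      # also attaches Biostrings
} else {
  library("Biostrings")
}

# Hypothetical helper: fetch one protein from an ACNUC bank by accession.
# Needs network access; bank and accession are supplied by the caller.
fetch_protein <- function(bank, accession) {
  choosebank(bank)                         # connect to the ACNUC database
  on.exit(closebank())                     # close the connection when done
  hit <- query("hit", paste0("AC=", accession))
  getSequence(hit$req[[1]])                # vector of single characters
}
# Example call (not run here):
# seq1_chars <- fetch_protein("swissprot", "<accession>")

# Offline stand-ins: first line of each sequence from the procedure above.
seq1_chars <- s2c("MTTCSRQFTSSSSMKGSCGIGGGIGAGSSRISSVLAGGSCRAPNTYGGGLSVSSSRFSSGGAYGLGGGYG")
seq2_chars <- s2c("MDFSRRSFHRSLSSSSQGPALSTSGSLYRKGTMQRLGLHSVYGGWRHGTRISVSKTTMSYGNHLSNGGDL")

# Convert the character vectors into single uppercase strings.
seq1 <- toupper(c2s(seq1_chars))
seq2 <- toupper(c2s(seq2_chars))

# Load the BLOSUM50 scoring matrix.
data(BLOSUM50)

# Global pairwise alignment; the gap penalties are illustrative values.
aln <- pairwiseAlignment(seq1, seq2,
                         substitutionMatrix = BLOSUM50,
                         gapOpening = 10, gapExtension = 4,
                         type = "global")
score(aln)   # alignment score

# Store the aligned sequences (with "-" for gaps) as a 2-row matrix.
aln_matrix <- rbind(
  seq1 = strsplit(as.character(pattern(aln)), "")[[1]],
  seq2 = strsplit(as.character(subject(aln)), "")[[1]]
)
dim(aln_matrix)
```

Each column of aln_matrix is one position of the alignment, so mismatches and gaps can be inspected column by column.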