Multiple Sequence Alignment

Protocol

pp 143–161

Walter Pirovano &
Jaap Heringa

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 452)

Abstract

Multiple sequence alignment (MSA) has assumed a key role in comparative structure and function analysis of biological sequences. It often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments. It should be stressed, however, that many complex biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, so that this chapter might constitute a helpful guide or starting point for researchers who aim to construct a reliable MSA.

Author information

Authors and Affiliations

Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
Walter Pirovano & Jaap Heringa

Authors

Walter Pirovano
Jaap Heringa

Editor information

Editors and Affiliations

School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Jonathan M. Keith PhD

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Pirovano, W., Heringa, J. (2008). Multiple Sequence Alignment. In: Keith, J.M. (eds) Bioinformatics. Methods in Molecular Biology™, vol 452. Humana Press. https://doi.org/10.1007/978-1-60327-159-2_7

DOI: https://doi.org/10.1007/978-1-60327-159-2_7
Publisher Name: Humana Press
Print ISBN: 978-1-58829-707-5
Online ISBN: 978-1-60327-159-2
eBook Packages: Springer Protocols
