Species-specific analysis of protein sequence motifs using mutual information
Background: Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly, protein sequence profiles derived from multiple sequence alignments provide an alternative description ...
Weckwerth Wolfram +3 more
doaj +1 more source
Filtering Degenerate Patterns with Application to Protein Sequence Analysis
In biology, the notion of degenerate pattern plays a central role for describing various phenomena. For example, protein active site patterns, like those contained in the PROSITE database, e.g., [FY]DPC[LIM][ASG]C[ASG], are, in general, represented by ...
Matteo Comin, Davide Verzotto
doaj +1 more source
Dynamic network analysis improves protein 3D structural classification [PDF]
Protein structural classification (PSC) is a supervised problem of assigning proteins into pre-defined structural (e.g., CATH or SCOPe) classes based on the proteins' sequence or 3D structural features. We recently proposed PSC approaches that model protein 3D structures as protein structure networks (PSNs) and analyze PSN-based protein features, which ...
arxiv
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets ...
Fabian Sievers +11 more
semanticscholar +1 more source
BEAST: Bayesian evolutionary analysis by sampling trees [PDF]
Background: The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics ...
Drummond, Alexei J.; Rambaut, Andrew
core +3 more sources
Three dimensional chaos game representation of protein sequences [PDF]
A new three dimensional approach to the chaos game representation of protein sequences is explored in this thesis. The basics of DNA, the synthesis of proteins from DNA, protein structure and functionality and sequence alignment techniques are presented.
arxiv
MMseqs2: sensitive protein sequence searching for the analysis of massive data sets
Sequencing costs have dropped much faster than Moore's law in the past decade, and sensitive sequence searching has become the main bottleneck in the analysis of large (meta)genomic datasets. While previous methods sacrificed sensitivity for speed gains, ...
Martin Steinegger, J. Söding
semanticscholar +1 more source
Structure-Guided Recombination Creates an Artificial Family of Cytochromes P450 [PDF]
Creating artificial protein families affords new opportunities to explore the determinants of structure and biological function free from many of the constraints of natural selection.
Arnold, Frances H. +5 more
core +1 more source
The Phylogeny of Osteopontin—Analysis of the Protein Sequence [PDF]
Osteopontin (OPN) is important for tissue remodeling, cellular immune responses, and calcium homeostasis in milk and urine. In pathophysiology, the biomolecule contributes to the progression of multiple cancers. Phylogenetic analysis of 202 osteopontin protein sequences identifies a core block of integrin-binding sites in the center of the protein ...
openaire +3 more sources
Modeling Protein Using Large-scale Pretrain Language Model [PDF]
Protein is linked to almost every life process. Therefore, analyzing the biological structure and property of protein sequences is critical to the exploration of life, as well as disease detection and drug discovery. Traditional protein analysis methods tend to be labor-intensive and time-consuming.
arxiv