Multiple sequence alignment-based RNA language model and its application to structural inference. [PDF]
Compared to proteins, DNA and RNA are more difficult languages to interpret because 4-letter-coded DNA/RNA sequences have less information content than 20-letter-coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)
Zhang Y +13 more
europepmc +2 more sources
MAGUS: Multiple sequence Alignment using Graph clUStering. [PDF]
Motivation: The estimation of large multiple sequence alignments (MSAs) is a basic bioinformatics challenge. Divide-and-conquer is a useful approach that has been shown to improve the scalability and accuracy of MSA estimation in established methods such ...
Smirnov V, Warnow T.
europepmc +2 more sources
Scalable long read self-correction and assembly polishing with multiple sequence alignment. [PDF]
Third-generation sequencing technologies make it possible to sequence long reads of tens of kbp, which are expected to solve various problems. However, these reads display high error rates, currently capped around 10%.
Morisse P +4 more
europepmc +2 more sources
ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference. [PDF]
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference.
Steenwyk JL +4 more
europepmc +2 more sources
Recursive MAGUS: Scalable and accurate multiple sequence alignment. [PDF]
Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy.
Vladimir Smirnov
doaj +2 more sources
MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. [PDF]
This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large ...
Katoh K, Rozewicki J, Yamada KD.
europepmc +2 more sources
Kalign 3: multiple sequence alignment of large data sets. [PDF]
Motivation: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences are exceeding Kalign’s original design ...
Lassmann T.
europepmc +2 more sources
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability [PDF]
We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained ...
Kazutaka Katoh, Daron M. Standley
openalex +2 more sources
Accelerated large-scale multiple sequence alignment [PDF]
Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of ...
Lloyd Scott, Snell Quinn O
doaj +2 more sources
Advances in post-processing methods for multiple sequence alignment [PDF]
The reliability of multiple sequence alignment (MSA) results directly determines the credibility of the conclusions drawn from biological research. However, MSA is inherently an NP-hard problem, making it theoretically impossible to guarantee a globally ...
Yixiao Zhai +3 more
doaj +2 more sources

