Towards an Encyclopaedia of Sequence Biology [PDF]
In this review, I have presented several topics relevant to the present state and to the future state of the scientific field that I propose to call sequence biology (SB). In some pertinent publications, this field was called DNA linguistics.
Alexander Bolshoy
openalex +2 more sources
Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence [PDF]
Stewart T. Cole +41 more
openalex +2 more sources
Strainberry: automated strain separation in low-complexity metagenomes using long reads
Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and ...
Riccardo Vicedomini +3 more
doaj +1 more source
decOM: similarity-based microbial source tracking of ancient oral samples using k-mer-based methods
Background: The analysis of ancient oral metagenomes from archaeological human and animal samples is largely confounded by contaminant DNA sequences from modern and environmental sources.
Camila Duitama González +5 more
doaj +1 more source
Mapping-friendly sequence reductions: Going beyond homopolymer compression
Summary: Sequencing errors continue to pose algorithmic challenges to methods working with sequencing data. One of the simplest and most prevalent techniques for ameliorating the detrimental effects of homopolymer expansion/contraction errors present in ...
Luc Blassel, Paul Medvedev, Rayan Chikhi
doaj +1 more source
Mega-scale experimental analysis of protein folding stability in biology and design
Large-scale assays using cDNA display proteolysis are used to measure the folding stabilities of protein domains, providing a method to quantify the effects of mutations on protein folding, with applications in protein design.
Kotaro Tsuboyama +8 more
semanticscholar +1 more source
Predicting multiple conformations via sequence clustering and AlphaFold2
AlphaFold2 (ref. 1) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein’s biological function often depends on multiple conformational substates (ref. 2), and disease-causing point mutations often ...
Hannah K. Wayment-Steele +8 more
semanticscholar +1 more source
100th Anniversary of Macromolecular Science Viewpoint: Opportunities in the Physics of Sequence-Defined Polymers [PDF]
Polymer science has been driven by ever-increasing molecular complexity, as polymer synthesis expands an already-vast palette of chemical and architectural parameter space.
Sarah L. Perry, Charles E. Sing
core +3 more sources
BERTology Meets Biology: Interpreting Attention in Protein Language Models [PDF]
Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability.
Jesse Vig +5 more
semanticscholar +1 more source
SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction [PDF]
Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a set of characters. Time series are similar to biological sequences in terms of both representation and mechanism.
Wu Yan +5 more
doaj +2 more sources

