Fast mapping of short sequences with mismatches, insertions and deletions using index structures.
With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions.
Steve Hoffmann +7 more
doaj +1 more source
Scalable Parallel Suffix Array Construction
Suffix arrays are a simple and powerful data structure for text processing that can be used for full-text indexes, data compression, and many other applications, in particular in bioinformatics. We describe the first implementation and experimental evaluation of a scalable parallel algorithm for suffix array construction.
Kulla, F., Sanders, P.
openaire +3 more sources
Accelerated preprocessing in task of searching substrings in a string
Introduction. The rapid development of systems such as Yandex and Google has made the task of searching for substrings in a string highly relevant, and approaches to solving it are being actively investigated. This task is used to create database
A. V. Mazurenko, N. V. Boldyrikhin
doaj +1 more source
These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure.
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation.
Qingpeng Zhang +4 more
doaj +1 more source
Gclust: A Parallel Clustering Tool for Microbial Genomic Data
The accelerating growth of public microbial genomic data imposes a substantial burden on the research community that uses such resources. Building databases for non-redundant reference sequences from massive microbial genomic data based on clustering ...
Ruilin Li +17 more
doaj +1 more source
Computing Maximal Lyndon Substrings of a String
There are two reasons to have an efficient algorithm for identifying all right-maximal Lyndon substrings of a string: firstly, in 2015 Bannai et al. introduced a linear algorithm to compute all runs of a string that relies on knowing all right-maximal ...
Frantisek Franek, Michael Liut
doaj +1 more source
Deterministic sub-linear space LCE data structures with efficient construction
Given a string $S$ of $n$ symbols, a longest common extension query $\mathsf{LCE}(i,j)$ asks for the length of the longest common prefix of the $i$th and $j$th suffixes of $S$. LCE queries have several important applications in string processing, perhaps
Bannai, Hideo +5 more
core +2 more sources
Approximate String Matching with Compressed Indexes
A compressed full-text self-index for a text T is a data structure that requires reduced space and can search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of interest in compressed
Pedro Morales +3 more
doaj +1 more source
EERTREE: An Efficient Data Structure for Processing Palindromes in Strings
We propose a new linear-size data structure which provides fast access to all palindromic substrings of a string or a set of strings. This structure inherits some ideas from the construction of both the suffix trie and suffix tree. Using this structure,
Rubinchik, Mikhail, Shur, Arseny M.
core +1 more source
Efficient computation of absent words in genomic sequences
Background: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on.
Herold Julia +2 more
doaj +1 more source

