Suffix arrays - Open Access .click

Fast mapping of short sequences with mismatches, insertions and deletions using index structures. [PDF]

PLoS Computational Biology, 2009
With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions.
Steve Hoffmann +7 more
doaj +1 more source

Scalable Parallel Suffix Array Construction [PDF]

Parallel Computing, 2006
Suffix arrays are a simple and powerful data structure for text processing that can be used for full-text indexes, data compression, and many other applications, in particular in bioinformatics. We describe the first implementation and experimental evaluation of a scalable parallel algorithm for suffix array construction.
Kulla, F., Sanders, P.
openaire +3 more sources
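
As background for the entry above, a minimal Python sketch of what a suffix array is and how it supports full-text lookup. It builds the array by naively sorting suffixes, which is not the scalable parallel algorithm the paper describes; the function names `build_suffix_array` and `find_occurrences` are illustrative assumptions, not taken from the paper.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting the suffixes directly; for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern in text using binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Because all suffixes sharing a prefix are adjacent in the sorted order, every occurrence of a pattern falls in one contiguous range of the array, which is what the two binary searches locate.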

Accelerated preprocessing in task of searching substrings in a string

Advanced Engineering Research, 2019
Introduction. The rapid development of systems such as Yandex and Google has made the task of searching for substrings in a string highly relevant, and approaches to solving it are being actively investigated. This task is used to create database
A. V. Mazurenko, N. V. Boldyrikhin
doaj +1 more source

These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. [PDF]

PLoS ONE, 2014
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation.
Qingpeng Zhang +4 more
doaj +1 more source
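
To illustrate the idea of counting k-mers with a probabilistic data structure, here is a minimal Python sketch based on a Count-Min Sketch. This is an assumption chosen for illustration: the listing does not say which structure the paper uses, and the class name `CountMinSketch`, the helper `count_kmers`, and the default `width` and `depth` values are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates may be too high, but are never too low."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, item: str):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str) -> None:
        for row, col in self._slots(item):
            self.tables[row][col] += 1

    def count(self, item: str) -> int:
        # The minimum over all rows is the least-contaminated estimate.
        return min(self.tables[row][col] for row, col in self._slots(item))


def count_kmers(sequence: str, k: int, sketch: CountMinSketch) -> None:
    """Feed every length-k substring of sequence into the sketch, one pass, online."""
    for i in range(len(sequence) - k + 1):
        sketch.add(sequence[i:i + k])


if __name__ == "__main__":
    sketch = CountMinSketch()
    count_kmers("ACGTACGTAC", 4, sketch)
    print(sketch.count("ACGT"), sketch.count("TACG"))
```

The memory footprint is fixed by `width * depth` regardless of how many distinct k-mers appear, at the cost of occasional overestimates when different k-mers collide in every row.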

Gclust: A Parallel Clustering Tool for Microbial Genomic Data

Genomics, Proteomics & Bioinformatics, 2019
The accelerating growth of the public microbial genomic data imposes substantial burden on the research community that uses such resources. Building databases for non-redundant reference sequences from massive microbial genomic data based on clustering ...
Ruilin Li +17 more
doaj +1 more source

Computing Maximal Lyndon Substrings of a String

Algorithms, 2020
There are two reasons to have an efficient algorithm for identifying all right-maximal Lyndon substrings of a string: firstly, Bannai et al. introduced in 2015 a linear algorithm to compute all runs of a string that relies on knowing all right-maximal ...
Frantisek Franek, Michael Liut
doaj +1 more source

Deterministic sub-linear space LCE data structures with efficient construction [PDF]

2016
Given a string $S$ of $n$ symbols, a longest common extension query $\mathsf{LCE}(i,j)$ asks for the length of the longest common prefix of the $i$th and $j$th suffixes of $S$. LCE queries have several important applications in string processing, perhaps
Bannai, Hideo +5 more
core +2 more sources
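
The LCE query defined in the abstract above can be stated directly in code. The following Python sketch is a naive baseline that compares characters one by one, using no preprocessing and no extra space; it is not the sub-linear space data structure of the paper. Indices are 0-based here, whereas the abstract counts suffixes from 1, and the function name `lce` is an illustrative choice.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes s[i:] and s[j:].

    Naive scan: O(answer) time per query, no auxiliary index.
    """
    n = len(s)
    length = 0
    while i + length < n and j + length < n and s[i + length] == s[j + length]:
        length += 1
    return length


if __name__ == "__main__":
    s = "abracadabra"
    print(lce(s, 0, 7))  # suffixes "abracadabra" and "abra"
    print(lce(s, 0, 3))  # suffixes "abracadabra" and "acadabra"
```

A single query can take time linear in the length of the string in the worst case (for example on a string of one repeated symbol), which is the cost that dedicated LCE data structures aim to reduce.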

Approximate String Matching with Compressed Indexes

Algorithms, 2009
A compressed full-text self-index for a text T is a data structure that requires reduced space and can search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of interest in compressed
Pedro Morales +3 more
doaj +1 more source

EERTREE: An Efficient Data Structure for Processing Palindromes in Strings [PDF]

2015
We propose a new linear-size data structure that provides fast access to all palindromic substrings of a string or a set of strings. This structure inherits some ideas from the construction of both the suffix trie and suffix tree. Using this structure,
Rubinchik, Mikhail, Shur, Arseny M.
core +1 more source

Efficient computation of absent words in genomic sequences

BMC Bioinformatics, 2008
Background: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on.
Herold Julia, Kurtz Stefan, Giegerich Robert +2 more
doaj +1 more source
