Accurate long read mapping using enhanced suffix arrays [PDF]
With the rise of high throughput sequencing, new programs have been developed for dealing with the alignment of a huge amount of short read data to reference genomes.
Dawyndt, Peter +4 more
core +1 more source
Universal Compressed Text Indexing [PDF]
The rise of repetitive datasets has lately generated a lot of interest in compressed self-indexes based on dictionary compression, a rich and heterogeneous family that exploits text repetitions in different ways. For each such compression scheme, several
Navarro, Gonzalo, Prezza, Nicola
core +2 more sources
Distributed enhanced suffix arrays [PDF]
Suffix arrays and trees are important and fundamental string data structures which lie at the foundation of many string algorithms, with important applications in computational biology, text processing, and information retrieval. Recent work enables the efficient parallel construction of suffix arrays and trees requiring at most O(n/p) memory per ...
Patrick Flick, Srinivas Aluru
openaire +1 more source
Geoseq: a tool for dissecting deep-sequencing datasets
Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ).
Homann Robert +6 more
doaj +1 more source
Speeding up index construction with GPU for DNA data sequences [PDF]
The advancement of technology in the scientific community has produced terabytes of biological data. This data includes DNA sequences. String matching algorithms, which are traditionally used to match DNA sequences, now take much longer to execute because ...
Abdul Rashid, Nur’aini +1 more
core
On Maximal Unbordered Factors [PDF]
Given a string $S$ of length $n$, its maximal unbordered factor is the longest factor which does not have a border. In this work we investigate the relationship between $n$ and the length of the maximal unbordered factor of $S$.
A Ehrenfeucht +11 more
core +5 more sources
String Comparison in $V$-Order: New Lexicographic Properties & On-line Applications [PDF]
$V$-order is a global order on strings related to Unique Maximal Factorization Families (UMFFs), which are themselves generalizations of Lyndon words. $V$-order has recently been proposed as an alternative to lexicographical order in the computation of ...
Alatabbi, Ali +3 more
core +1 more source
Handling Massive N-Gram Datasets Efficiently [PDF]
This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is
Pibiri, Giulio Ermanno +1 more
core +3 more sources
Background: The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs).
Narechania Apurva +3 more
doaj +1 more source
Finding patterns in strings using suffix arrays [PDF]
Finding regularities in large data sets requires systems that are efficient in both time and space. Here, we describe a newly developed system that exploits the internal structure of the enhanced suffix array to find ...
Stehouwer, H., Van Zaanen, M.
core

