Comparing neural- and N-gram-based language models for word segmentation. [PDF]
Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language.
Doval Y, Gómez-Rodríguez C.
europepmc +4 more sources
Variable word rate N-grams [PDF]
4 pages, 4 figures, ICASSP ...
Gotoh, Yoshihiko, Renals, Steve
openaire +4 more sources
Stemmer and phonotactic rules to improve n-gram tagger-based Indonesian phonemicization
Phonemicization, or grapheme-to-phoneme conversion (G2P), is the process of converting a word into its pronunciation. It is one of the essential components in speech synthesis, speech recognition, and natural language processing.
Suyanto Suyanto +4 more
doaj +1 more source
Small world of the miRNA science drives its publication dynamics
Many scientific articles are now available in digital form, which allows article data to be queried and, in particular, metadata such as affiliation data to be gathered automatically.
A. B. Firsov, I. I. Titov
doaj +1 more source
Computing n-gram statistics in MapReduce [PDF]
Statistics about n-grams (i.e., sequences of contiguous words or other tokens in text documents or other string data) are an important building block in information retrieval and natural language processing. In this work, we study how n-gram statistics, optionally restricted by a maximum n-gram length and minimum collection frequency, can be computed ...
Berberich, K., Bedathur, S.
openaire +4 more sources
Handling Massive N-Gram Datasets Efficiently [PDF]
This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is
Pibiri, Giulio Ermanno +1 more
core +3 more sources
Fast Structural Alignment of Biomolecules Using a Hash Table, N-Grams and String Descriptors
This work presents a generalized approach for the fast structural alignment of thousands of macromolecular structures. The method uses string representations of a macromolecular structure and a hash table that stores n-grams of a certain size for ...
Robert Preissner +6 more
doaj +1 more source
Human assessments of document similarity [PDF]
Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA).
Belkin +28 more
core +1 more source
Domain generation algorithms (DGAs) play an important role in network attacks and can be mainly divided into two types: dictionary-based and character-based.
Shaojie Chen +3 more
doaj +1 more source
Recursive n-gram hashing is pairwise independent, at best [PDF]
Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values.
Carter +12 more
core +2 more sources
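
The idea in the last abstract above, computing each n-gram's hash by updating the previous value rather than rehashing all n symbols, can be illustrated with a polynomial rolling hash. This is only a minimal sketch of the general technique, not the construction analysed in the paper: the function names `rolling_hashes` and `direct_hash`, the base 257, and the modulus 2^61 - 1 are all assumptions chosen for illustration.

```python
def rolling_hashes(symbols, n, base=257, mod=(1 << 61) - 1):
    """Yield a polynomial hash for every n-gram of `symbols` (a sequence of ints).

    After the first window, each hash is derived from the previous one in
    constant time. `base` and `mod` are illustrative assumptions.
    """
    if n <= 0 or len(symbols) < n:
        return
    top = pow(base, n - 1, mod)  # weight of the symbol that leaves the window
    h = 0
    for s in symbols[:n]:        # hash the first window from scratch
        h = (h * base + s) % mod
    yield h
    for i in range(n, len(symbols)):
        # Remove the oldest symbol, shift the rest, append the new symbol.
        h = ((h - symbols[i - n] * top) * base + symbols[i]) % mod
        yield h


def direct_hash(gram, base=257, mod=(1 << 61) - 1):
    """Reference hash of a single n-gram, computed without any reuse."""
    h = 0
    for s in gram:
        h = (h * base + s) % mod
    return h


data = list(b"abracadabra")      # byte values used as symbols
hashes = list(rolling_hashes(data, 3))
```

Each value in `hashes` equals `direct_hash` applied to the corresponding 3-symbol window, so the recursive update changes only the cost per n-gram, not the hash values themselves.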

