Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition.

Pakoci, Edvin; Popović, Branislav; Pekar, Darko

Computational Intelligence and Neuroscience, 2019, Vol. 2019, Article ID 5072918

Abstract

Serbian belongs to a group of highly inflective, morphologically rich languages that use many different word suffixes to express grammatical, syntactic, or semantic features. This behaviour usually produces many recognition errors, especially in large vocabulary systems: even when good acoustic matching lets the automatic speech recognition system predict the correct lemma, the word ending is often wrong, and the word is nevertheless counted as an error. The effect is larger for contexts not present in the language model training corpus. This manuscript examines an approach to language modeling that takes into account different morphological categories of words, and presents the benefits in terms of word error rate and perplexity. The categories include word type, word case, grammatical number, and gender, and all of them were assigned to words in the system vocabulary, where applicable. These additional word features produced significant improvements over the baseline system, for both n-gram-based and neural network-based language models. The proposed system can help overcome many tedious errors in large vocabulary applications such as dictation, both for Serbian and for other languages with similar characteristics.

Access

DOI: 10.1155/2019/5072918

Citation

ID: 49854

Ref Key: pakoci2019usingcomputational
