EVALUATION OF SEMANTIC SIMILARITY FOR SENTENCES IN NATURAL LANGUAGE BY MATHEMATICAL STATISTICS METHODS

EVALUATION OF SEMANTIC SIMILARITY FOR SENTENCES IN NATURAL LANGUAGE BY MATHEMATICAL STATISTICS METHODS

Pismak, A. E.;Kharitonova, A. E.;Tsopa, E. A.;Klimenkov, S. V.;
naučno-tehničeskij vestnik informacionnyh tehnologij, mehaniki i optiki 2016 Vol. 16 pp. 324-330
279
pismak2016evaluationnaucnotehniceskij

Abstract

Subject of Research. The paper is focused on Wiktionary articles structural organization in the aspect of its usage as the base for semantic network. Wiktionary community references, article templates and articles markup features are analyzed. The problem of numerical estimation for semantic similarity of structural elements in Wiktionary articles is considered. Analysis of existing software for semantic similarity estimation of such elements is carried out; algorithms of their functioning are studied; their advantages and disadvantages are shown. Methods. Mathematical statistics methods were used to analyze Wiktionary articles markup features. The method of semantic similarity computing based on statistics data for compared structural elements was proposed.Main Results. We have concluded that there is no possibility for direct use of Wiktionary articles as the source for semantic network. We have proposed to find hidden similarity between article elements, and for that purpose we have developed the algorithm for calculation of confidence coefficients proving that each pair of sentences is semantically near. The research of quantitative and qualitative characteristics for the developed algorithm has shown its major performance advantage over the other existing solutions in the presence of insignificantly higher error rate. Practical Relevance. The resulting algorithm may be useful in developing tools for automatic Wiktionary articles parsing. The developed method could be used in computing of semantic similarity for short text fragments in natural language in case of algorithm performance requirements are higher than its accuracy specifications.

Citation

ID: 96778
Ref Key: pismak2016evaluationnaucnotehniceskij
Use this key to autocite in SciMatic or Thesis Manager

References

Blockchain Verification

Account:
NFT Contract Address:
0x95644003c57E6F55A65596E3D9Eac6813e3566dA
Article ID:
96778
Unique Identifier:
Network:
Scimatic Chain (ID: 481)
Loading...
Blockchain Readiness Checklist
Authors
Abstract
Journal Name
Year
Title
5/5
Creates 1,000,000 NFT tokens for this article
Token Features:
  • ERC-1155 Standard NFT
  • 1 Million Supply per Article
  • Transferable via MetaMask
  • Permanent Blockchain Record
Blockchain QR Code
Scan with Saymatik Web3.0 Wallet

Saymatik Web3.0 Wallet