Variation benchmark datasets: update, criteria, quality and applications.

Sarkar, Anasua; Yang, Yang; Vihinen, Mauno

Database: the journal of biological databases and curation, 2020, Vol. 2020

Abstract

Developing new computational methods and testing their performance requires experimental data, because method performance can be assessed only in comparison to existing knowledge. For that purpose, benchmark datasets with known and verified outcomes are needed. High-quality benchmark datasets are valuable but can be difficult, laborious and time-consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is considerable redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on the information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding regions, and structure-mapped, synonymous and benign variants. Effect-specific datasets cover DNA regulatory elements, RNA splicing, and protein properties (aggregation, binding free energy, disorder and stability). There are also several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level, including relevant cross-references and variant descriptions. The updated VariBench facilitates the development and testing of new methods and the comparison of obtained performances to those of previously published methods.
We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies and show that such comparisons are feasible and useful; however, there may be limitations due to a lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench.
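
As a rough illustration of the kind of comparison the abstract describes, the sketch below merges two labelled variant lists, removes redundant entries and scores a predictor against the result. Everything in it is an assumption made for illustration: the record layout (`gene`, `variant`, `pathogenic`), the toy variants, the predictions and the choice of the Matthews correlation coefficient as the score. It does not reflect the actual VariBench file formats or the evaluation protocol used for PON-P2.

```python
from math import sqrt


def deduplicate(variants):
    """Drop repeated variants, keeping the first occurrence of each (gene, variant) key."""
    seen = set()
    unique = []
    for v in variants:
        key = (v["gene"], v["variant"])
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique


def confusion_counts(variants, predictions):
    """Count true/false positives and negatives; 'positive' means pathogenic."""
    tp = tn = fp = fn = 0
    for v in variants:
        predicted = predictions[(v["gene"], v["variant"])]
        actual = v["pathogenic"]
        if predicted and actual:
            tp += 1
        elif not predicted and not actual:
            tn += 1
        elif predicted and not actual:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy datasets; one variant appears in both and is redundant.
dataset_a = [
    {"gene": "G1", "variant": "p.A10V", "pathogenic": True},
    {"gene": "G1", "variant": "p.R20Q", "pathogenic": False},
]
dataset_b = [
    {"gene": "G1", "variant": "p.A10V", "pathogenic": True},
    {"gene": "G2", "variant": "p.L5P", "pathogenic": True},
    {"gene": "G2", "variant": "p.S7T", "pathogenic": False},
]

# Hypothetical predictor output (True = predicted pathogenic).
predictions = {
    ("G1", "p.A10V"): True,
    ("G1", "p.R20Q"): False,
    ("G2", "p.L5P"): False,
    ("G2", "p.S7T"): False,
}

merged = deduplicate(dataset_a + dataset_b)
counts = confusion_counts(merged, predictions)
score = mcc(*counts)
print(len(merged), counts, round(score, 3))
```

Removing the redundant entry before scoring keeps a variant shared by both lists from being counted twice, which is one reason redundancy between datasets matters when performances are compared.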


Access

DOI: baz117

