Controlling for population structure and genotyping platform bias in the eMERGE multi-institutional biobank linked to electronic health records

David Russell Crosslin; Gerard Tromp; Amber Burt; Daniel Seung Kim; Shefali S. Verma; Anastasia M. Lucas; Yuki Bradford; Dana C. Crawford; Sebastian M. Armasu; John A. Heit; M. Geoffrey Hayes; Helena Kuivaniemi; Marylyn D. Ritchie; Gail P. Jarvik; Mariza De Andrade
Frontiers in Genetics 2014 Vol. 5

Abstract

Combining samples across multiple cohorts in large-scale scientific research programs is often required to achieve the necessary power for genome-wide association studies. Controlling for genomic ancestry through principal component analysis (PCA) to address the effect of population stratification is a common practice. In addition to local genomic variation, such as copy number variation and inversions, other factors directly related to combining multiple studies, such as platform and site recruitment bias, can drive the correlation patterns in PCA. In this report, we describe the combination and analysis of a multi-ethnic cohort with biobanks linked to electronic health records for large-scale genomic association discovery analyses. First, we outline the observed site and platform bias, in addition to ancestry differences. Second, we outline a general protocol for selecting variants for input into the subject variance-covariance matrix, the conventional PCA approach. Finally, we introduce an alternative approach to PCA by deriving components from subject loadings calculated from a reference sample. This alternative approach to generating principal components controlled for site and platform bias, in addition to ancestry differences, with the advantage of requiring fewer covariates and degrees of freedom.

Keywords: principal component analysis, ancestry, biobank, loadings, genetic association study
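
The reference-sample idea in the abstract can be illustrated with a minimal sketch. It assumes one common reading of the approach: per-variant loadings are estimated by PCA on a reference panel only, and study subjects are then projected onto those fixed axes, so site and platform differences within the study cohort cannot shape the components. The function names, the 0/1/2 genotype coding, and the simulated data are all hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np


def reference_loadings(ref_genotypes, n_components=2):
    """Estimate variant loadings from a reference panel (hypothetical helper).

    ref_genotypes: (n_ref_samples, n_variants) array of 0/1/2 allele counts.
    Returns per-variant means, per-variant scales, and an
    (n_variants, n_components) loading matrix.
    """
    mean = ref_genotypes.mean(axis=0)
    scale = ref_genotypes.std(axis=0)
    scale[scale == 0] = 1.0  # leave monomorphic variants unscaled
    z = (ref_genotypes - mean) / scale
    # Rows of vt are the principal axes of the standardized reference panel.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, scale, vt[:n_components].T


def project(genotypes, mean, scale, loadings):
    """Project samples onto reference-derived axes using reference scaling."""
    return ((genotypes - mean) / scale) @ loadings


# Simulated data only: a reference panel and a separate study cohort.
rng = np.random.default_rng(0)
ref = rng.integers(0, 3, size=(40, 200)).astype(float)
study = rng.integers(0, 3, size=(10, 200)).astype(float)

mean, scale, loadings = reference_loadings(ref, n_components=2)
study_pcs = project(study, mean, scale, loadings)  # one row of PCs per subject
```

The resulting columns of `study_pcs` would then enter an association model as covariates. Because the axes are fixed by the reference panel, adding or removing study sites does not change them, which is the property this sketch is meant to show.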

Citation

ID: 156225
Ref Key: crosslin2014frontierscontrolling
DOI: 10.3389/fgene.2014.00352