Automatic Feature Selection for Imbalanced Echocardiogram Data Using Event-Based Self-Similarity.

Automatic Feature Selection for Imbalanced Echocardiogram Data Using Event-Based Self-Similarity.

Huang, Huang-Nan; Chen, Hong-Min; Lin, Wei-Wen; Wiryasaputra, Rita; Chen, Yung-Cheng; Wang, Yu-Huei; Yang, Chao-Tung
Diagnostics (Basel, Switzerland) 2025 Vol. 15
7
huang2025automatic

Abstract

Using echocardiogram data for cardiovascular disease (CVD) can lead to difficulties due to imbalanced datasets, leading to biased predictions. Machine learning models can enhance prognosis accuracy, but their effectiveness is influenced by optimal feature selection and robust classification techniques. This study introduces an event-based self-similarity approach to enhance automatic feature selection approach for imbalanced echocardiogram data. Critical features correlated with disease progression were identified by leveraging self-similarity patterns. This study used an echocardiogram dataset, visual presentations of high-frequency sound wave signals, and data of patients with heart disease who are treated using three treatment methods: catheter ablation, ventricular defibrillator, and drug control-over the course of three years. The dataset was classified into nine categories and Recursive Feature Elimination (RFE) was applied to identify the most relevant features, reducing model complexity while maintaining diagnostic accuracy. Machine learning classification models, including XGBoost and CATBoost, were trained and evaluated. Both models achieved comparable accuracy values, 84.3% and 88.4%, respectively, under different normalization techniques. To further optimize performance, the models were combined into a voting ensemble, improving feature selection and predictive accuracy. Four essential features-age, aorta (AO), left ventricular (LV), and left atrium (LA)-were identified as critical for prognosis and were found in Random Forest (RF)-voting ensemble classifier. The results underscore the importance of feature selection techniques in handling imbalanced datasets, improving classification robustness, and reducing bias in automated prognosis systems. Our findings highlight the potential of machine learning-driven echocardiogram analysis to enhance patient care by providing accurate, data-driven assessments.

Citation

ID: 281982
Ref Key: huang2025automatic
Use this key to autocite in SciMatic or Thesis Manager

References

Blockchain Verification

Account:
NFT Contract Address:
0x95644003c57E6F55A65596E3D9Eac6813e3566dA
Article ID:
281982
Unique Identifier:
10.3390/diagnostics15080976
Network:
Scimatic Chain (ID: 481)
Loading...
Blockchain Readiness Checklist
Authors
Abstract
Journal Name
Year
Title
5/5
Creates 1,000,000 NFT tokens for this article
Token Features:
  • ERC-1155 Standard NFT
  • 1 Million Supply per Article
  • Transferable via MetaMask
  • Permanent Blockchain Record
Blockchain QR Code
Scan with Saymatik Web3.0 Wallet

Saymatik Web3.0 Wallet