Automatic Feature Selection for Imbalanced Echocardiogram Data Using Event-Based Self-Similarity.

Huang, Huang-Nan; Chen, Hong-Min; Lin, Wei-Wen; Wiryasaputra, Rita; Chen, Yung-Cheng; Wang, Yu-Huei; Yang, Chao-Tung


Diagnostics (Basel, Switzerland) 2025 Vol. 15

10

huang2025automatic

Abstract

Using echocardiogram data to assess cardiovascular disease (CVD) is difficult when datasets are imbalanced, which leads to biased predictions. Machine learning models can enhance prognosis accuracy, but their effectiveness depends on optimal feature selection and robust classification techniques. This study introduces an event-based self-similarity approach to enhance automatic feature selection for imbalanced echocardiogram data. Critical features correlated with disease progression were identified by leveraging self-similarity patterns. This study used an echocardiogram dataset, visual presentations of high-frequency sound wave signals, and data from patients with heart disease who were treated using three methods (catheter ablation, ventricular defibrillator, and drug control) over the course of three years. The dataset was classified into nine categories, and Recursive Feature Elimination (RFE) was applied to identify the most relevant features, reducing model complexity while maintaining diagnostic accuracy. Machine learning classification models, including XGBoost and CatBoost, were trained and evaluated. The two models achieved comparable accuracies of 84.3% and 88.4%, respectively, under different normalization techniques. To further optimize performance, the models were combined into a voting ensemble, improving feature selection and predictive accuracy. Four essential features (age, aorta (AO), left ventricular (LV), and left atrium (LA)) were identified as critical for prognosis and were found with the Random Forest (RF)-voting ensemble classifier. The results underscore the importance of feature selection techniques in handling imbalanced datasets, improving classification robustness, and reducing bias in automated prognosis systems. Our findings highlight the potential of machine learning-driven echocardiogram analysis to enhance patient care by providing accurate, data-driven assessments.
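
The pipeline named in the abstract (RFE followed by a voting ensemble) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic and merely imbalanced (not the study's echocardiogram dataset), three classes stand in for the nine categories, and scikit-learn's RandomForestClassifier and GradientBoostingClassifier stand in for the XGBoost and CatBoost models so the example needs only one library. The event-based self-similarity step is not reproduced, because the abstract does not describe it in enough detail.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (hypothetical; not the study's dataset).
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=4,
    n_redundant=2,
    n_classes=3,
    weights=[0.7, 0.2, 0.1],  # skewed class proportions
    random_state=0,
)

# A stratified split keeps the minority classes represented in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# RFE refits the estimator and drops the least important feature each round
# until four features remain (four mirrors the count reported in the abstract).
selector = RFE(
    RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
    n_features_to_select=4,
    step=1,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)

# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=0
        )),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train[:, selected], y_train)
pred = ensemble.predict(X_test[:, selected])

# Balanced accuracy averages per-class recall, so the majority class
# cannot dominate the score the way it can with plain accuracy.
score = balanced_accuracy_score(y_test, pred)
print("selected feature indices:", selected)
print("balanced accuracy:", round(score, 3))
```

Balanced accuracy is used here instead of plain accuracy as one common way to report results on skewed classes; the abstract itself reports accuracy, so the two numbers are not directly comparable.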

Keywords

Machine learning; cardiovascular disease; classification; feature selection; echocardiogram; voting ensemble

Access

DOI: 10.3390/diagnostics15080976

Citation

ID: 281982

Ref Key: huang2025automatic


