🫀Heart Disease Predictor

A Machine Learning Tool for Early Intervention 💖

🫀`About`

Heart disease is a leading cause of death worldwide, and accurate prediction of heart disease remains a significant challenge.

This project aims to develop a machine learning model capable of predicting heart disease using a comprehensive dataset of key indicators.

`What's in this project?` 🫶

A dataset of over 400,000 adult profiles, capturing the diverse health status of individuals across various demographics and risk factors
A range of machine learning models, including Decision Trees, Random Forests, Gradient Boosting, and more
Hyperparameter tuning using BayesSearchCV and RandomizedSearchCV
Model evaluation using classification reports and accuracy scores
Prediction on new, unseen patient data from a random sample from the test set
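
The evaluation and prediction steps listed above could look roughly like the sketch below. It is not the project's actual code: the data is a synthetic stand-in generated with `make_classification`, and the model, split size, and variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease table (assumption, not the real data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = DecisionTreeClassifier(max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out test set with an accuracy score and a classification report.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.4f}")
print(report)

# Predict for one randomly chosen row of the test set, treated as a new patient.
rng = np.random.default_rng(0)
idx = int(rng.integers(len(X_test)))
sample_pred = int(model.predict(X_test[idx:idx + 1])[0])
print(f"Sample {idx}: predicted={sample_pred}, actual={int(y_test[idx])}")
```

Slicing with `X_test[idx:idx + 1]` keeps the sample two-dimensional, which is the shape `predict` expects.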

`Dataset` 📊

Dataset URL: 💖 Indicators of Heart Disease
License: CC0-1.0
Number of samples: 400,000
Number of factors: 40

Category	Number of Images
Training	12594
Validation	500
Testing	500

`Methodology` 🔍

Requirements

Python 3.x
XGBoost
Keras
Scikit-learn
NumPy
Pandas
Matplotlib
Seaborn
Plotly

Data Preprocessing 🔀

Data Scaling: Apply PCA to the training features, normalize the categorical labels, and shuffle the dataset to increase randomness and reduce bias. 🔀
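
A minimal sketch of these preprocessing steps, assuming scikit-learn is used throughout. The random feature matrix, the `"Yes"`/`"No"` label values, the number of components, and the standardization step before PCA are all assumptions for illustration and are not taken from the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.utils import shuffle

# Hypothetical training data: 200 rows, 8 numeric features, string labels.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 8))
y_train_raw = rng.choice(["No", "Yes"], size=200)

# Encode the categorical labels as integers (classes are sorted, so No -> 0, Yes -> 1).
encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train_raw)

# Standardize the features (an added assumption), then fit PCA on the training set only.
scaler = StandardScaler()
pca = PCA(n_components=5, random_state=42)
X_train_pca = pca.fit_transform(scaler.fit_transform(X_train))

# Shuffle features and labels together so the rows stay aligned.
X_train_pca, y_train = shuffle(X_train_pca, y_train, random_state=42)

print(X_train_pca.shape)
print(pca.explained_variance_ratio_.sum())
```

At prediction time the same fitted `scaler` and `pca` objects would be applied with `transform`, not refitted, so that test data does not leak into the projection.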

Models 🤖

The following models are implemented and compared:

DecisionTreeClassifier
RandomForestClassifier
ExtraTreesClassifier
GradientBoostingClassifier
HistGradientBoostingClassifier
XGBClassifier
LGBMClassifier
CatBoostClassifier
SVC
LogisticRegression
MLPClassifier
AdaBoostClassifier
GaussianNB
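
One way to compare several of these classifiers is to fit each on the same split and collect the test accuracy. The sketch below uses only a few scikit-learn models from the list and synthetic data; the settings and the `models` and `scores` names are assumptions, not the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A subset of the listed models, with small settings to keep the run short.
models = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB": GaussianNB(),
}

# Fit every model on the same training split and score it on the same test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst test accuracy.
for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:28s} {acc:.4f}")
```

The boosted-tree libraries in the list (XGBoost, LightGBM, CatBoost) expose scikit-learn-compatible `fit` and `predict` methods, so they could be added to the same dictionary.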

Model Performance 📊

The best-performing model, MLPClassifier, achieves a test accuracy of 94.66%. 🎉 I have also identified the best hyperparameters for the RandomForestClassifier and XGBClassifier models using BayesSearchCV and RandomizedSearchCV.

Training accuracy: 0.9996
Validation accuracy: 0.9420
Test accuracy: 0.9600

	Predicted Positive	Predicted Negative
Actual Positive	232	12
Actual Negative	15	213
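
The usual summary metrics can be derived directly from the four counts in the confusion matrix above. The short calculation below uses only those counts; the variable names are illustrative.

```python
# Counts taken from the confusion matrix above.
tp, fn = 232, 12   # actual positive: predicted positive / predicted negative
fp, tn = 15, 213   # actual negative: predicted positive / predicted negative

total = tp + fn + fp + tn

accuracy = (tp + tn) / total        # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are truly positive
recall = tp / (tp + fn)             # share of actual positives that are detected
f1 = 2 * precision * recall / (precision + recall)

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"f1:        {f1:.4f}")
```

For a screening task such as heart disease prediction, recall is often the metric to watch, since a false negative means a missed case.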

Hyperparameter Tuning 🔧

BayesSearchCV
RandomizedSearchCV
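
As an illustration of the tuning step, here is a small RandomizedSearchCV sketch for a RandomForestClassifier. The search space, the number of iterations, the fold count, and the synthetic data are assumptions; the project's actual search ranges are not stated.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical search space: each entry is a distribution to sample integers from.
param_distributions = {
    "n_estimators": randint(20, 60),
    "max_depth": randint(2, 8),
    "min_samples_leaf": randint(1, 5),
}

# Sample 5 random configurations and score each with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.4f}")
```

After fitting, `search.best_estimator_` holds a model refitted on all of the supplied data with the best parameters found.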

`Acknowledgments` 🙏

Kaggle dataset: 💖 Indicators of Heart Disease (2022 UPDATE)
Scikit-learn and XGBoost libraries for model training
Matplotlib and Seaborn libraries for data visualization

`🙅‍♂️Disclaimer`

This project is licensed under the AGPL-3.0 License. It is intended for personal use only and should not be used for commercial purposes. The pre-trained model may not always produce accurate results.

`Get Involved!` 😌

This project demonstrates the potential of machine learning for heart disease prediction. The model achieves high accuracy and can be used as a starting point for further research and development in this field.

I hope you found this project informative and engaging! 😊
If you’re interested in collaborating and contributing to the project, please let me know! I’d love to hear from you.

`Getting Started` 🚀

To get started with this project, you’ll need to:

Install the required libraries: `pip install pandas numpy scikit-learn tensorflow xgboost lightgbm catboost` 📦
Download the dataset from Kaggle 📈
Run the code to train and evaluate the model 🤖

Enjoy working with the content! 😊
