Ambeone Students’ Projects Gallery
Based on the Big Data Analytics & Machine Learning Techniques taught in Ambeone’s Programs
This gallery offers a glimpse of some Data Science projects completed by recent Ambeone students as part of their program. If you would like to know more about a particular project, you may contact us for details.
Predicting Diabetes With Machine Learning Techniques
Using Data Science & Machine Learning to Predict Diabetes
A person is considered to be Diabetic if their glycosylated hemoglobin reading (glyhb) is >= 7. Health attributes such as cholesterol, gender, height, weight and body frame are used to predict the glyhb reading with a Predictive Model, which can then be used to classify the person as Diabetic or Non-Diabetic.
- To create a Predictive Model for identifying Diabetic and Non-Diabetic patients based on a set of health parameters.
- Dataset used: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/diabetes.csv
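The labelling rule described above (glyhb >= 7 means Diabetic) can be sketched in a few lines of Python. This is only an illustration: the function name `label_from_glyhb` and the sample readings are assumptions made for this example and do not come from the students' project.

```python
# Threshold taken from the project description: glyhb >= 7 is treated as Diabetic.
DIABETES_THRESHOLD = 7.0


def label_from_glyhb(glyhb):
    """Return 'Diabetic' or 'Non-Diabetic' for a single glyhb reading."""
    return "Diabetic" if glyhb >= DIABETES_THRESHOLD else "Non-Diabetic"


# Hypothetical readings, for illustration only (not taken from the dataset).
readings = [4.3, 6.9, 7.0, 10.2]
labels = [label_from_glyhb(g) for g in readings]
print(labels)
```

The same rule can be applied to a predicted glyhb value, which is how a model that predicts the reading can be turned into a two-group classifier.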
Procedure and Techniques Covered
- The target is to classify patients into 2 groups, Diabetic & Non-Diabetic, based on their body parameters by predicting the glyhb readings
- A correlation plot is used to find the variables correlated to glyhb
- A Logistic Regression model is built to predict whether a person falls under the Diabetic or Non-Diabetic group
- Decision Tree & Random Forest methods are used for the classification
- A Neural Network algorithm is used to predict the glyhb reading
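The correlation step in the procedure above can be sketched as follows. This is a minimal example that assumes the Pearson correlation coefficient is the measure behind the correlation plot, which the project description does not state. The column names other than glyhb and all of the values are hypothetical and chosen only to make the example runnable.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical rows, for illustration only (not taken from the dataset).
data = {
    "glyhb": [4.3, 4.6, 5.2, 7.5, 9.8, 12.1],
    "glucose": [80, 85, 95, 150, 210, 300],
    "age": [25, 60, 35, 40, 58, 50],
    "height": [70, 64, 68, 66, 69, 65],
}

target = data["glyhb"]

# Rank every other variable by the absolute strength of its correlation with glyhb.
ranking = sorted(
    ((name, pearson(values, target)) for name, values in data.items() if name != "glyhb"),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranking:
    print(f"{name}: {r:.3f}")
```

Sorting by absolute value keeps strongly negative correlations near the top of the list, since they can be as informative for prediction as positive ones.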
Correlation Plot – Glucose is most highly correlated with glyhb, followed by age, ratio, waist and cholesterol
Logistic Regression – Glucose level is the most important variable, followed by age and waist ratio
Decision Tree – Glucose, followed by the time at which the sample was taken (after food), are the most important variables
Random Forest – Glucose is the most important variable, followed by age, ratio, waist, hdl and cholesterol
Neural Network – The mean squared error (MSE) was reduced from 0.073 to 0.013
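For reference, the MSE reported for the Neural Network is the average of the squared differences between actual and predicted values. The sketch below shows the calculation on hypothetical numbers. It does not reproduce the students' network or its results, and the helper name `mean_squared_error` is defined here for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, for illustration only.
actual = [0.2, 0.5, 0.9]
predicted = [0.1, 0.5, 0.7]

mse = mean_squared_error(actual, predicted)
print(round(mse, 4))
```

A lower MSE means the predictions lie closer to the actual readings on average.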
The Predictive Model based on Logistic Regression is able to predict diabetes with an accuracy of 93.87% and a precision of 96.3%.
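Accuracy and precision, as quoted above, are commonly computed from the actual and predicted labels as shown in this sketch. The labels below are hypothetical and are not the project's test data. Treating "Diabetic" as the positive class is also an assumption, since the project description does not say which class was used.

```python
def accuracy_and_precision(actual, predicted, positive="Diabetic"):
    """Return (accuracy, precision) for two equal-length lists of labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(actual, predicted))
    correct = sum(1 for a, p in pairs if a == p)
    true_pos = sum(1 for a, p in pairs if p == positive and a == positive)
    false_pos = sum(1 for a, p in pairs if p == positive and a != positive)
    accuracy = correct / len(pairs)
    # Precision: of everyone predicted positive, the share that really is positive.
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Hypothetical labels, for illustration only.
actual = ["Diabetic", "Non-Diabetic", "Diabetic", "Non-Diabetic", "Non-Diabetic"]
predicted = ["Diabetic", "Non-Diabetic", "Non-Diabetic", "Diabetic", "Non-Diabetic"]

accuracy, precision = accuracy_and_precision(actual, predicted)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}")
```

Accuracy counts every correct prediction, while precision looks only at the cases predicted as positive, so the two figures can differ noticeably when one group is much larger than the other.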