Ambeone Students’ Projects Gallery
Based on the Big Data Analytics & Machine Learning Techniques taught in Ambeone’s Programs
This gallery offers glimpses of some Data Science projects completed by recent Ambeone students as part of their program. If you would like to know more about a particular project, you may contact us for details.
Analyzing & Predicting Customer Churn in the Telecom Industry using Machine Learning Models
Submitted by: Reeka
Overview
- Churn (the loss of customers to competitors) is a problem for telecom companies because acquiring a new customer is expensive, so companies want to retain their existing customers.
- For Telecom Company “X”, churn is a business problem, and churn rates have been increasing steadily over the last year.
- The company wants to predict its customers’ propensity to churn, which would help it determine the right engagement or intervention plan.
- The company also wants to identify the factors influencing customer churn and to target those factors with offers more in line with those of other service providers, which could help it retain customers.
Objectives
- To predict Customer Churn.
- To highlight the main variables/factors influencing Customer Churn.
- To use various Machine Learning algorithms to build prediction models and evaluate their accuracy and performance.
- To find the best model and provide a final conclusion.
Model Building Steps
1. Data Visualization & Analysis:
- A lot of people with phone service churned.
- People with fibre optic internet churned much more than people with DSL or no internet at all.
- People without Value Added Services churn frequently.
- Those with Paperless Billing tend to churn more frequently than those without Paperless Billing.
- Those with month-to-month contracts tend to churn more frequently than those with one- or two-year contracts.
- Customers who pay by Electronic Check tend to churn more frequently than those using other payment methods.
- All of the categorical variables seem to have a reasonably broad distribution; therefore, all of them are kept for further analysis.
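Observations like the ones above usually come from comparing the churn rate across the categories of one variable. The snippet below is a minimal sketch of that comparison in plain Python. The records, the field names (`contract`, `churn`) and the helper `churn_rate_by` are hypothetical and are not taken from the project's data or code.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
customers = [
    {"contract": "Month-to-month", "churn": True},
    {"contract": "Month-to-month", "churn": True},
    {"contract": "Month-to-month", "churn": False},
    {"contract": "One year", "churn": False},
    {"contract": "One year", "churn": True},
    {"contract": "Two year", "churn": False},
    {"contract": "Two year", "churn": False},
]


def churn_rate_by(records, key):
    """Return {category: share of customers in that category who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        if row["churn"]:
            churned[row[key]] += 1
    return {category: churned[category] / totals[category] for category in totals}


rates = churn_rate_by(customers, "contract")

# Print categories from highest to lowest churn rate.
for category, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{category}: {rate:.0%}")
```

The same helper can be pointed at any other categorical field (for example a payment method or internet service column) to repeat the comparison.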
2. Data Science Techniques used:
SUMMARY
Tests & Models | Significant Variables |
Anova (Chi-Square) test | Tenure, Internet Service, Contract and Total Charges |
Logistic Regression model | Tenure, Contract, Paperless Billing and Total Charges |
Decision Tree model | Contract, Internet Service and Tenure |
Random Forest model | Tenure, Contract and Total Charges |
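The summary does not say how the significance test was run. As an illustration only, a chi-square test of independence between a categorical variable (such as Contract) and churn can be sketched as follows. The counts are hypothetical, and `chi_square_statistic` is a helper written for this sketch, not a function from the project.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows, each row a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = contract type, columns = (churned, stayed).
observed = [
    [40, 60],  # month-to-month
    [10, 90],  # one year
    [5, 95],   # two year
]

stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

With 2 degrees of freedom, the critical value at the 5% significance level is about 5.99, so a statistic well above that would suggest that contract type and churn are not independent in this toy table.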
- In terms of accuracy, the Logistic Regression model (80.7%) is slightly better than the Decision Tree model (79.8%) and almost equal to the Random Forest model (80.68%).
- The precision rate (the percentage of customers predicted to churn who actually churned) for the Random Forest model (68%) is slightly better than that of Logistic Regression (66.8%).
- The Random Forest model is the best-fit model.
- Churn predictors as per the test and models: Contract, Tenure and Total Charges.
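Accuracy and precision, the two metrics compared above, can both be computed from the counts of correct and incorrect predictions. The following is a minimal sketch with hypothetical labels (1 = churned, 0 = stayed). The function name and the data are assumptions for illustration and do not reproduce the project's figures.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels where 1 means churned."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correctly predicted churn
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted churn, customer stayed
        elif predicted == 0 and actual == 0:
            tn += 1  # correctly predicted stay
        else:
            fn += 1  # missed a churned customer

    accuracy = (tp + tn) / len(y_true)
    # Precision: of all customers predicted to churn, the share who did churn.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical actual and predicted labels for ten customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

accuracy, precision = accuracy_and_precision(y_true, y_pred)
print(f"accuracy = {accuracy:.1%}, precision = {precision:.1%}")
```

Two models can have nearly the same accuracy while differing in precision, which is why both metrics are reported when choosing between them.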
Conclusion
FACTORS INFLUENCING CUSTOMER CHURN:
Expected to Churn:
- Customers with month-to-month contracts.
- Customers without internet services and with fibre optic internet services.
- Customers without online backup, device protection, online security and tech support.
- Customers with Paperless Billing and Electronic Check Payment method.
Expected to Not Churn:
- Customers who have been with the company for a longer period.
- Average Total Charges for Not Churned customers is approximately 2553 AED and that of Churned Customers is approximately 1532 AED.
- Customers with DSL Internet Services.
- Customers with multiple lines.