This repository contains the code for a Lead Scoring Prediction application. The application predicts the likelihood of a lead converting based on input features using a logistic regression model. The project includes data preprocessing, model training, and a Streamlit-based web application for making predictions.
- Overview
- Features
- Installation
- Project Structure
- Usage
- Model and Preprocessing Details
- Contributing
- License
The Lead Scoring Prediction App helps businesses estimate how likely a lead is to convert, based on features such as time spent on the website, lead origin, lead source, and notable activities. The model is a logistic regression trained on features selected with Recursive Feature Elimination (RFE). The application is built with Streamlit, so users can enter feature values and get predictions directly from the trained model.
- Data Preprocessing: Handles missing values, encodes categorical variables, scales numerical features, and selects important features using RFE.
- Model Training: Trains a logistic regression model using the processed features.
- Streamlit Web App: Provides a user-friendly interface for making predictions based on user input.
- Logging: Tracks the process of data cleaning, preprocessing, and model training for easy debugging and monitoring.
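The logging feature could be set up with a small helper like the one below. This is a minimal sketch using Python's standard `logging` module; the function name `get_logger`, the format string, and the example message are assumptions for illustration, not the actual contents of `logging_utils.py`.

```python
import logging
import sys


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes timestamped messages to stdout.

    Hypothetical helper: the name and format are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = get_logger("data_cleaning")
logger.info("Dropped %d columns with too many missing values", 3)
```

Each script (cleaning, preprocessing, training) could request its own named logger this way, which makes it easy to see which stage produced a given message.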
To run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/yourusername/lead-scoring-prediction-app.git
  cd lead-scoring-prediction-app
- Install dependencies using Poetry:
  poetry install
- Activate the virtual environment:
  poetry shell
- Download or prepare the dataset:
  - Place the Leads.csv file in the appropriate directory or modify the code to point to your dataset.
- Run the Streamlit app:
  streamlit run streamlit_app/app.py
├── data_preprocessing.py # Preprocessing script
├── data_cleaning.py # Data cleaning script
├── log_reg_model.py # Logistic regression model training script
├── log_reg_rfe_model.py # Logistic regression with RFE model training script
├── logging_utils.py # Logging utilities
├── model_dispatcher.py # Model dispatcher script for easy model selection
├── model_utils.py # Utility functions for model evaluation
├── streamlit_app/
│ └── app.py # Streamlit application script
├── models/ # Directory containing saved models and preprocessing objects
├── README.md # Project documentation
└── requirements.txt # Dependencies list (auto-generated by Poetry)
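A model dispatcher of the kind listed above is often just a mapping from a name to an estimator factory. The sketch below assumes scikit-learn; the `MODELS` registry, the `get_model` function, the `log_reg` key, and the hyperparameters (including `n_features_to_select=15`) are illustrative assumptions, not the contents of `model_dispatcher.py`.

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Hypothetical registry: keys mirror the training script names; hyperparameters are assumed.
MODELS = {
    "log_reg": lambda: LogisticRegression(max_iter=1000),
    "log_reg_rfe": lambda: RFE(
        estimator=LogisticRegression(max_iter=1000),
        n_features_to_select=15,  # assumed value, not taken from the project
    ),
}


def get_model(model_type: str):
    """Build a fresh, unfitted estimator for the given key."""
    try:
        return MODELS[model_type]()
    except KeyError:
        raise ValueError(
            f"Unknown model_type {model_type!r}; expected one of {sorted(MODELS)}"
        ) from None
```

Storing factories rather than instances means every call returns a new, unfitted estimator, so one training run cannot leak state into the next.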
To train the model, run main.py with the model type and training mode:
python main.py --model_type log_reg_rfe --mode train_full
This command cleans the data, preprocesses it, selects the most important features using RFE, and trains a logistic regression model. The model and the preprocessing objects (such as the scaler and encoder) are saved in the models/ directory.
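The command-line flags shown above could be parsed along these lines. This is a sketch using the standard `argparse` module: only `--model_type log_reg_rfe` and `--mode train_full` appear in this README, while the `build_parser` function, the extra `log_reg` choice, and the defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the training entry point."""
    parser = argparse.ArgumentParser(description="Train a lead scoring model.")
    # "log_reg" is an assumed extra choice; only "log_reg_rfe" is documented.
    parser.add_argument(
        "--model_type", choices=["log_reg", "log_reg_rfe"], default="log_reg_rfe"
    )
    # Only "train_full" is documented; other modes may exist in the real script.
    parser.add_argument("--mode", choices=["train_full"], default="train_full")
    return parser


# Parse the same arguments as the command above, passed in as a list.
args = build_parser().parse_args(
    ["--model_type", "log_reg_rfe", "--mode", "train_full"]
)
print(args.model_type, args.mode)
```

Restricting each flag with `choices` makes a mistyped model name fail immediately with a usage message instead of partway through training.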
After training the model, you can run the Streamlit app to make predictions:
streamlit run streamlit_app/app.py
- Open the local URL provided by Streamlit in your browser.
- Input the feature values in the provided fields.
- Click the Predict button to see the prediction result.
The preprocessing steps include:
- Handling missing values by imputing values or dropping columns as necessary.
- Encoding categorical variables using one-hot encoding.
- Scaling numerical features to standardize the data.
- Selecting important features using Recursive Feature Elimination (RFE).
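The four steps above can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions, not the project's actual code: the column names and the tiny synthetic data frame are made up (they are not the Leads.csv schema), and the imputation strategies and `n_features_to_select=3` are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are assumptions for illustration.
df = pd.DataFrame({
    "time_on_site": [120.0, 900.0, np.nan, 30.0, 1500.0, 60.0, 1100.0, 45.0],
    "page_views": [2.0, 8.0, 5.0, 1.0, np.nan, 2.0, 9.0, 1.0],
    "lead_source": ["Google", "Referral", "Google", np.nan,
                    "Referral", "Direct", "Referral", "Direct"],
    "converted": [0, 1, 0, 0, 1, 0, 1, 0],
})
numeric_cols = ["time_on_site", "page_views"]
categorical_cols = ["lead_source"]

# Impute and scale numeric columns; impute and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# RFE keeps the 3 strongest of the 5 encoded features before the final fit.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)),
    ("model", LogisticRegression(max_iter=1000)),
])

X, y = df.drop(columns="converted"), df["converted"]
pipeline.fit(X, y)
probabilities = pipeline.predict_proba(X)[:, 1]  # P(converted = 1) per lead
```

Wrapping every step in one `Pipeline` means the same imputation, encoding, scaling, and feature selection are applied at prediction time as at training time.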
The model is a logistic regression classifier trained on the selected features. Recursive Feature Elimination (RFE) chooses those features by repeatedly fitting the model and discarding the weakest features until only the top contributors to the prediction remain.
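To show how a fitted logistic regression turns feature values into a conversion probability, here is a small sketch of the scoring formula. The coefficients, intercept, and feature names are made up for illustration and do not come from the trained model; `conversion_probability` is a hypothetical helper.

```python
import math


def conversion_probability(
    features: dict[str, float],
    coefficients: dict[str, float],
    intercept: float,
) -> float:
    """Logistic regression score: sigmoid(intercept + sum(coef * value))."""
    z = intercept + sum(
        coefficients[name] * features.get(name, 0.0) for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-z))


# Made-up coefficients for already scaled / one-hot encoded features.
coefficients = {
    "time_on_site_scaled": 1.1,
    "lead_origin_lead_add_form": 2.0,
    "lead_source_direct": -0.4,
}
lead = {
    "time_on_site_scaled": 0.5,
    "lead_origin_lead_add_form": 1.0,
    "lead_source_direct": 0.0,
}
p = conversion_probability(lead, coefficients, intercept=-1.0)
print(f"Conversion probability: {p:.2f}")
```

A positive coefficient raises the predicted probability as its feature increases and a negative one lowers it, which is why the features kept by RFE can be read directly as drivers of conversion.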
This project is licensed under the MIT License - see the LICENSE file for details.