a-djebali/data-science-python

Data Science & Machine Learning with Python

This repository contains a collection of data mining and machine learning algorithms implemented in Python on the Anaconda platform. There is also a full section on machine learning with Apache Spark, which scales these techniques up to big data analyzed on a computing cluster.

Covered techniques

The following techniques, used by data scientists in the tech industry, are covered:

  • Regression analysis
  • K-Means Clustering
  • Principal Component Analysis
  • Train/test and cross-validation
  • Bayesian Methods
  • Decision Trees and Random Forests
  • Multivariate Regression
  • Multi-Level Models
  • Support Vector Machines
  • Reinforcement Learning
  • Collaborative Filtering
  • K-Nearest Neighbor
  • Bias/Variance Tradeoff
  • Ensemble Learning
  • Term Frequency / Inverse Document Frequency
  • Experimental Design and A/B Tests

Projects

To practice these techniques, I've built the following projects:

  • Movie recommendation system using actual user rating data
  • Search engine for Wikipedia data
  • Spam classifier

Getting started with Python

This tutorial is designed for programmers who need to learn the Python programming language from scratch.

Statistics and Probability Refresher

  • Mean, median, and mode, with an introduction to NumPy, SciPy, and Matplotlib
  • Standard deviation, population and sample variance
  • Data distributions
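
As a rough illustration of the first two topics, here is a minimal sketch that computes the mean, median, mode, and both flavors of variance and standard deviation. It is not taken from the repository's notebooks: the `values` list is hypothetical sample data, and the sketch uses Python's standard `statistics` module so that it runs without any extra packages, whereas the refresher itself introduces NumPy and SciPy for the same calculations.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value (average of the two middle values here)
mode = statistics.mode(values)      # most frequent value

# Population variance divides the sum of squared deviations by n.
population_variance = statistics.pvariance(values)
# Sample variance divides by n - 1 instead (Bessel's correction).
sample_variance = statistics.variance(values)

# Standard deviation is the square root of the matching variance.
population_std = statistics.pstdev(values)
sample_std = statistics.stdev(values)

print("mean:", mean, "median:", median, "mode:", mode)
print("population variance:", population_variance, "std:", population_std)
print("sample variance:", sample_variance, "std:", sample_std)
```

The sample variance comes out larger than the population variance because it divides by n - 1 rather than n. With NumPy, the same distinction is usually expressed through the `ddof` argument of `numpy.var` and `numpy.std`.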
