Statistics Fundamentals Repository

Welcome to the Statistics Fundamentals repository! It is a resource for exploring fundamental concepts in statistics, including types of data, probability, relationships between variables, and visualization techniques. It contains Jupyter Notebooks, Python scripts, and accompanying visual outputs that walk through the essential building blocks of data analysis.

Contents

The topics are organized into seven sections:

1. Types of Data: Categorical & Quantitative

In this section, we explore the distinction between different types of data:

  • Categorical Data: Data that places each observation into a distinct category or group (for example, eye color or product type).
  • Quantitative Data: Data that records numerical measurements or counts, for which arithmetic operations such as averaging are meaningful.

Learn how to properly identify and categorize different data types for more effective analysis.
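As a rough illustration of the distinction above, the sketch below separates categorical from quantitative columns in a pandas DataFrame. The data and column names (`blood_type`, `height_cm`, `siblings`) are hypothetical and are not taken from the notebooks in this repository.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "blood_type": ["A", "O", "B", "O", "AB"],           # categorical
    "height_cm": [172.0, 165.5, 180.2, 158.9, 169.4],   # quantitative (continuous)
    "siblings": [1, 0, 2, 3, 1],                        # quantitative (discrete count)
})

# Mark the label column explicitly so pandas treats it as categories.
df["blood_type"] = df["blood_type"].astype("category")

categorical_cols = df.select_dtypes(include="category").columns.tolist()
quantitative_cols = df.select_dtypes(include="number").columns.tolist()

print("Categorical:", categorical_cols)
print("Quantitative:", quantitative_cols)

# Counting is meaningful for categories; averaging is meaningful for quantities.
print(df["blood_type"].value_counts())
print(df["height_cm"].mean())
```

Note that a numeric dtype alone does not make a column quantitative: codes such as ZIP codes are stored as numbers but behave as categories, so the final call is a judgment about what the values mean.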

2. Displaying & Describing Data

Effective data analysis begins with understanding how to visualize and summarize data. In this section, we provide:

  • Visualizations such as histograms, bar charts, and box plots.
  • Summary statistics, including measures of central tendency (mean, median, mode) and measures of spread (range, variance, standard deviation).

These tools help describe the shape, center, and variability of your data.
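The summary statistics listed above can be computed in a few lines. This is a minimal sketch on a made-up sample, using NumPy and the standard-library `statistics` module; the notebooks may compute them differently.

```python
import statistics

import numpy as np

# Made-up sample used only for illustration.
data = np.array([2, 4, 4, 5, 7, 9, 11])

# Measures of central tendency
mean = data.mean()
median = np.median(data)
mode = statistics.mode(data.tolist())

# Measures of spread
data_range = data.max() - data.min()
sample_var = data.var(ddof=1)   # ddof=1 gives the sample (n - 1) variance
sample_std = data.std(ddof=1)   # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={sample_var:.2f}, std={sample_std:.2f}")
```

NumPy defaults to the population variance (`ddof=0`), so passing `ddof=1` matters whenever the data are a sample from a larger population.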

3. Common Distributions & Their Properties

Here, we introduce several commonly encountered statistical distributions, such as:

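  • Normal Distribution: The symmetric bell curve, fully described by its mean and standard deviation, and assumed by many statistical tests.
  • Binomial Distribution: Models the number of successes in a fixed number of independent trials that share the same probability of success.
  • Poisson Distribution: Models the number of events in a fixed interval of time or space when events occur independently at a constant average rate; often used for rare events.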

You’ll learn about their properties and when to use them in real-world scenarios.
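To make these three distributions concrete, the following sketch evaluates one probability from each using `scipy.stats`. The scenarios (coin flips, an average rate of 2 events per interval) are assumptions chosen for illustration.

```python
from scipy import stats

# Normal: probability that a value falls within one standard deviation
# of the mean (standard normal, mean 0 and sd 1).
p_within_1sd = stats.norm.cdf(1) - stats.norm.cdf(-1)

# Binomial: probability of exactly 3 heads in 10 fair coin flips.
p_3_heads = stats.binom.pmf(3, n=10, p=0.5)

# Poisson: probability of observing zero events when the average
# rate is 2 events per interval.
p_zero_events = stats.poisson.pmf(0, mu=2)

print(f"P(-1 < Z < 1)       = {p_within_1sd:.4f}")
print(f"P(3 heads in 10)    = {p_3_heads:.4f}")
print(f"P(0 events | mu=2)  = {p_zero_events:.4f}")
```

The first value is the familiar "about 68%" from the empirical rule; the binomial value equals 120/1024, and the Poisson value equals exp(-2).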

4. Relationships: Scatterplots, Correlation & Simple Regression

This section focuses on methods for identifying relationships between two variables:

  • Scatterplots: Visualize the relationship between two quantitative variables.
  • Correlation: Measure the strength and direction of a linear relationship.
  • Simple Regression: Model and predict the relationship between variables using linear regression techniques.
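The correlation and regression steps above can be sketched with NumPy alone. The paired values below are hypothetical (hours studied versus quiz score), and `np.polyfit` with degree 1 is just one of several ways to fit a least-squares line.

```python
import numpy as np

# Hypothetical paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. hours studied
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])   # e.g. quiz score

# Pearson correlation: strength and direction of the linear relationship.
r = np.corrcoef(x, y)[0, 1]

# Simple linear regression via least squares: y is approximated by slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted line to predict y for a new x value.
predicted_at_6 = slope * 6.0 + intercept

print(f"r = {r:.4f}")
print(f"y = {slope:.2f} * x + {intercept:.2f}")
print(f"prediction at x=6: {predicted_at_6:.2f}")
```

A scatterplot of `x` against `y` (for example with `matplotlib.pyplot.scatter`) is worth drawing before trusting `r`, since the coefficient only captures linear association.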

5. Experimental Design & Causation

Designing an experiment to test hypotheses and establish causality is a critical component of statistics. In this section, we discuss:

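  • Confounding Variables: Variables related to both the explanatory and the response variable, which can distort the apparent effect.
  • Random Assignment: Allocating subjects to groups by chance so that the groups are comparable on average.
  • Control Groups: Groups that do not receive the treatment and provide a baseline for comparison.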

You'll gain insights into the principles of experimental design that help avoid biases and misinterpretations.
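Random assignment is simple to express in code. The sketch below shuffles 20 hypothetical participants into a treatment group and a control group, then compares a made-up covariate (age) across the two groups; the group size and the covariate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical participants with a covariate that could act as a confounder.
participant_ids = np.arange(20)
ages = rng.integers(18, 65, size=20)

# Random assignment: shuffle the IDs, then split them in half.
shuffled = rng.permutation(participant_ids)
treatment_ids = shuffled[:10]
control_ids = shuffled[10:]

# Because chance decides group membership, covariates such as age tend to
# balance out across groups as the sample grows.
print("Treatment mean age:", ages[treatment_ids].mean())
print("Control mean age:  ", ages[control_ids].mean())
```

With only 20 participants the two means can still differ noticeably; random assignment balances groups on average, not in every single draw.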

6. Sampling Techniques

Sampling methods play a vital role in data collection. This section highlights:

  • Uniform Sampling: Every unit has an equal chance of being selected (also known as simple random sampling).
  • Stratified Sampling: The population is divided into strata, and a sample is drawn from each stratum in proportion to its size.

Understanding these techniques helps you obtain representative samples and draw more reliable conclusions from your data.
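Both sampling schemes can be sketched with pandas. The population below is hypothetical (100 units across three regions), and the 20% sampling fraction is an arbitrary choice for the example. `DataFrameGroupBy.sample` requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical population of 100 units in three strata of unequal size.
population = pd.DataFrame({
    "id": range(100),
    "region": ["north"] * 60 + ["south"] * 30 + ["west"] * 10,
})

# Uniform (simple random) sampling: every unit is equally likely to be drawn.
uniform_sample = population.sample(n=20, random_state=0)

# Proportional stratified sampling: draw the same fraction from each stratum,
# so each region keeps its share of the population.
stratified_sample = population.groupby("region").sample(frac=0.2, random_state=0)

print(uniform_sample["region"].value_counts())
print(stratified_sample["region"].value_counts())
```

The stratified sample always contains 12 north, 6 south, and 2 west units, while the uniform sample's regional mix varies from draw to draw and may miss a small stratum entirely.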

7. Probability: Rules, Randomness, & Simulations

Probability is the foundation of inferential statistics. Here we explore:

  • Probability Rules: Including addition, multiplication, and conditional probability.
  • Randomness & Simulations: Simulating real-world phenomena to understand randomness and variability.

You'll find Python simulations that demonstrate these concepts through practical examples.
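As one small example of this kind of simulation, the sketch below rolls a fair die 100,000 times and checks the addition rule and a conditional probability against the simulated frequencies. The events chosen (an even roll, a roll greater than 4) are assumptions for illustration and may differ from the simulations in the notebooks.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulate 100,000 rolls of a fair six-sided die (upper bound is exclusive).
rolls = rng.integers(1, 7, size=100_000)

is_even = rolls % 2 == 0   # event A = {2, 4, 6}, theoretical P(A) = 1/2
is_gt4 = rolls > 4         # event B = {5, 6},    theoretical P(B) = 1/3

p_a = is_even.mean()
p_b = is_gt4.mean()
p_a_and_b = (is_even & is_gt4).mean()   # A and B = {6}, theoretical 1/6
p_a_or_b = (is_even | is_gt4).mean()    # A or B = {2, 4, 5, 6}, theoretical 2/3

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
addition_rule = p_a + p_b - p_a_and_b

# Conditional probability: P(A | B) = P(A and B) / P(B), theoretical 1/2
p_a_given_b = p_a_and_b / p_b

print(f"P(A or B) simulated: {p_a_or_b:.4f}, via addition rule: {addition_rule:.4f}")
print(f"P(A | B) simulated:  {p_a_given_b:.4f}")
```

Here P(A | B) matches P(A), which means the two events are independent, so the multiplication rule simplifies to P(A and B) = P(A) × P(B).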

How to Use This Repository

  • Jupyter Notebooks: Each topic is explained through detailed notebooks with code examples, visualizations, and explanations.
  • Python Files: If you prefer running code outside of Jupyter, the Python scripts contain standalone functions and code blocks.
  • Visualization Outputs: Each notebook is paired with high-quality visualizations to aid in understanding concepts.

Feel free to clone this repository and run the notebooks to gain a deeper understanding of each topic!

Requirements

The code in this repository is built using the following libraries:

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn
  • scipy
  • researchpy
  • statsmodels
  • plotly

To install these dependencies, run:

pip install -r requirements.txt

Contributing

If you’d like to contribute to this project, feel free to submit a pull request or suggest additional content through the Issues page.

License

This repository is open-source under the MIT License. You are free to use, modify, and distribute the content, provided the copyright and license notice is retained.