Python Package Management

Python Package Management: pip, venv, and conda

Why Package Management Matters

Imagine you’re working on two projects:

Project A needs NumPy version 1.21
Project B needs NumPy version 1.24

Without proper package management, you’d have conflicts! That’s why we use:

Package managers (pip, conda) to install and manage libraries
Virtual environments to isolate project dependencies
Requirements files to share and reproduce environments

By the end of this class, you’ll never have to worry about “it works on my machine” problems again.

Understanding Python Packages

What’s a Package?

A package is pre-written code you can use in your projects, instead of writing everything from scratch

Where Do Packages Come From?

# PyPI (Python Package Index) - The official repository
# https://pypi.org - Contains 500,000+ packages
pip install pandas  # Downloads from PyPI

# Conda repositories - Anaconda's package repository
# Contains Python and non-Python packages
conda install pandas  # Downloads from conda-forge or defaults

# Git repositories - Install directly from GitHub
pip install git+https://github.com/user/repo.git

pip: Python’s Default Package Manager

pip (Pip Installs Packages) comes with Python and is the standard way to install Python packages.

Basic pip Commands

# Check if pip is installed and its version
pip --version
pip3 --version  # On some systems, use pip3 for Python 3

# Install packages
pip install numpy                    # Install latest version
pip install pandas==1.5.3           # Install specific version
pip install pandas numpy matplotlib # Install multiple packages

# Upgrade packages
pip install --upgrade pip           # Upgrade pip itself
pip install --upgrade numpy         # Upgrade to latest version

# Uninstall packages
pip uninstall numpy                 # Remove package

# Get information
pip list                            # Show all installed packages

Installing from Requirements Files

Requirements files list all packages your project needs:

# Create a requirements file
pip freeze > requirements.txt       # Save current environment

# Look at the file
cat requirements.txt
# Output:
# numpy==1.24.0
# pandas==1.5.3
# matplotlib==3.6.2

# Install from requirements file
pip install -r requirements.txt     # Install all listed packages

# Create different requirement files for different purposes
pip freeze > requirements-dev.txt   # Development dependencies
pip freeze > requirements-prod.txt  # Production dependencies

Advanced pip Usage

# Install packages for current user only (no sudo needed)
pip install --user pandas

Virtual Environments with venv

Virtual environments are isolated Python environments. Each project gets its own set of packages, preventing conflicts.

Why Use Virtual Environments?

Without virtual environments:

Global Python
├── Project A (needs Django 3.2)
├── Project B (needs Django 4.1)  # Conflict!
└── Project C (needs no Django)   # Unnecessary bloat!

With virtual environments:

Project A/
├── venv/  # Has Django 3.2
Project B/
├── venv/  # Has Django 4.1
Project C/
├── venv/  # Clean, no Django

Creating and Using Virtual Environments

# Make sure you have venv (usually included with Python 3.3+)
python3 -m venv --help

# Create a virtual environment
python3 -m venv myenv              # Create env named 'myenv'
python3 -m venv venv               # Common convention: name it 'venv'
python3 -m venv ~/envs/project1    # Create in specific location

# Activate the virtual environment
# Linux/Mac:
source myenv/bin/activate           # Activate the environment
# Windows:
myenv\Scripts\activate              # Windows Command Prompt
myenv\Scripts\Activate.ps1          # Windows PowerShell

# You'll see your prompt change:
# Before: username@computer:~/project$
# After:  (myenv) username@computer:~/project$

# Verify you're in the virtual environment
which python                        # Should show path to myenv/bin/python
python --version
pip list                           # Shows only packages in this env

# Install packages (only in this environment)
pip install pandas numpy
pip install -r requirements.txt

# Deactivate when done
deactivate                         # Return to global Python

# Delete a virtual environment (when deactivated)
rm -rf myenv/                      # Just delete the folder!

Virtual Environment Best Practices

# 1. Always use a virtual environment for projects
cd my_project
python3 -m venv venv
source venv/bin/activate

# 2. Add venv to .gitignore (don't commit it)
echo "venv/" >> .gitignore
# https://github.com/github/gitignore/blob/main/Python.gitignore


# 3. Create requirements.txt for others
pip freeze > requirements.txt
git add requirements.txt  # DO commit this

# 4. Document activation in README
echo "# Setup
python3 -m venv venv
source venv/bin/activate  # Linux/Mac
pip install -r requirements.txt" > README.md

# 5. Use consistent naming
# Good: venv, .venv, env
# Bad: my_super_special_environment_v2_final

Troubleshooting Virtual Environments

# Forgot if you're in a virtual environment?
echo $VIRTUAL_ENV                  # Shows path if activated
which python                        # Check which Python you're using

# Wrong Python version?
python3.10 -m venv venv            # Specify Python version
python3.11 -m venv venv --clear    # Recreate with different version

# Can't activate?
# Make sure you're using the right command for your shell
source venv/bin/activate           # bash/zsh (Linux/Mac)

# Package installed globally by mistake?
deactivate                        # Exit virtual env
pip uninstall package_name        # Uninstall from global
source venv/bin/activate          # Re-enter virtual env
pip install package_name          # Install in correct place

Conda: The Scientific Python Manager

Conda is a package manager popular in data science. It can manage Python itself and non-Python dependencies (like C libraries).

Installing Conda

You have two options:

Miniconda (Recommended - minimal installation):

# Download from: https://docs.conda.io/en/latest/miniconda.html
# Linux example:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Follow the prompts, restart terminal after installation

Anaconda (Full distribution with many packages pre-installed):

# Download from: https://www.anaconda.com/products/distribution
# Includes Jupyter, Spyder, and 250+ packages
# Takes up ~3GB of space

Basic Conda Commands

# Check installation
conda --version
conda info

# Update conda itself
conda update conda

# List environments
conda env list
conda info --envs

# List packages in current environment
conda list

Creating Conda Environments

# Create new environment
conda create -n myproject          # Empty environment
conda create -n myproject python=3.10  # With specific Python version
conda create -n datascience python=3.10 pandas numpy jupyter  # With packages

# Activate/deactivate environment
conda activate myproject            # Activate
conda deactivate                   # Return to base

# Your prompt changes to show active environment:
# (base) user@computer:~$          # Default conda environment
# (myproject) user@computer:~$     # Your environment

Managing Packages with Conda

# Search for packages
conda search pandas
conda search -c conda-forge pandas  # Search specific channel

# Install packages
conda install numpy                # From default channel
conda install pandas=1.5.3         # Specific version
conda install numpy pandas matplotlib  # Multiple packages

# Update packages
conda update numpy
conda update --all                 # Update everything

# Remove packages
conda remove pandas # Remove only in the environment you're in

Environment Files with Conda

# Export environment to file
conda env export > environment.yml
conda env export --no-builds > environment.yml  # More portable

# Look at the file
cat environment.yml
# Output:
# name: myproject
# channels:
#   - conda-forge
#   - defaults
# dependencies:
#   - python=3.10
#   - numpy=1.24.0
#   - pandas=1.5.3
#   - pip:
#     - some-pip-only-package==1.0.0

# Create environment from file
conda env create -f environment.yml

Best Practice: Use both!

# Start with conda for complex dependencies
conda create -n myproject python=3.10
conda activate myproject
conda install numpy pandas scipy  # Scientific packages
pip install some-special-package  # Only on PyPI
# Always use pip AFTER conda installs

When to Use Which?

Use venv when:

Working on web applications
Simple Python projects
Teaching/learning Python basics
Minimal dependencies
Speed is important

Use conda when:

Data science/machine learning projects
Need specific Python versions easily
Complex scientific computing
Dependencies include C/Fortran libraries
Working with GPU libraries (CUDA)

Real-World Workflows

Data Science Project (conda)

# 1. Create environment with key packages
conda create -n ds_project python=3.10 jupyter pandas numpy matplotlib seaborn scikit-learn

# 2. Activate and add more packages
conda activate ds_project
conda install -c conda-forge plotly
pip install kaggle  # Not in conda

Mixed Workflow (conda + pip)

# 1. Use conda for base environment
conda create -n mixed_project python=3.10
conda activate mixed_project

# 2. Install scientific packages with conda
conda install numpy pandas scipy matplotlib

# 3. Install web/special packages with pip
pip install flask redis celery

# 4. Export both conda and pip packages
conda env export > environment.yml
# This file includes both conda and pip packages!

pip Issues

“Permission denied” when installing

# Don't use sudo! Use --user or virtual environment
pip install --user package_name
# Better: use a virtual environment

“No module named pip”

# Reinstall pip
python3 -m ensurepip
# Or
curl https://bootstrap.pypa.io/get-pip.py | python3

“Package conflicts” or “incompatible versions”

# Create fresh environment
python3 -m venv fresh_env
source fresh_env/bin/activate
pip install -r requirements.txt

venv Issues

“venv not found” or “No module named venv”

# Install python3-venv (Ubuntu/Debian)
sudo apt install python3-venv
# Or use conda instead

conda Issues

Mixing pip and conda causes issues

# Best practice: conda first, then pip
conda install all_conda_packages
pip install only_pip_packages  # At the very end

Best Practices Summary

1. Always Use Virtual Environments

# Never install packages globally
# Bad:  pip install pandas
# Good: source venv/bin/activate && pip install pandas

2. Document Your Dependencies

# For pip projects
pip freeze > requirements.txt

# For conda projects  
conda env export --no-builds > environment.yml

3. Use .gitignore

# Always exclude environments from git
echo "venv/
*.pyc
__pycache__/
.env" > .gitignore

4. One Environment Per Project

# Don't share environments between projects
project1/venv/  # Separate
project2/venv/  # Separate

5. Keep Environments Clean

# Periodically rebuild environments
deactivate
rm -rf venv/
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Quick Reference Card

Daily Commands You’ll Actually Use

# venv workflow
python3 -m venv venv              # Create
source venv/bin/activate          # Enter
pip install package               # Install
pip freeze > requirements.txt     # Save
deactivate                       # Exit

# conda workflow
conda create -n project python=3.10  # Create
conda activate project                # Enter
conda install package                 # Install
conda env export > environment.yml   # Save
conda deactivate                     # Exit

# Check where you are
which python                     # Which Python?
pip list                        # What's installed?
echo $VIRTUAL_ENV              # In venv?
conda info --envs              # Which conda env?

Practice Exercises

Exercise 1: Create a simple Project Environment

# Create a Flask web app environment
mkdir web_project && cd web_project
python3 -m venv venv
source venv/bin/activate
pip install pandas sqlite3
pip freeze > requirements.txt
deactivate

Exercise 2: Data Science Setup with Conda

# Create a data analysis environment
conda create -n data_analysis python=3.10 pandas matplotlib jupyter
conda activate data_analysis
# Launch jupyter notebook
jupyter notebook
# Create a simple analysis
conda deactivate

Exercise 3: Clone and Run a Flask App

Quick Background:

Git clone: Downloads a copy of someone else’s code repository to your computer
Flask: A Python web framework for building websites and web apps
127.0.0.1:5000: Your local development server address (localhost port 5000). It’s for prototyping. None but you can access the website you’re creating.

Your Task: Go to https://github.com/pj8912/todo-app and figure out how to:

Clone the repository to your machine
Set up the Python environment
Install the dependencies
Get the app running
Open it in your browser

What to look for on the GitHub page:

The clone URL
Setup instructions in the README
What files you need to run
Any database setup steps

Success criteria: You should see a working todo app in your browser where you can add and delete tasks.

Hints: Look for files like requirements.txt, app.py, and any setup scripts. The README usually has the steps you need.

Resources

📚 Documentation:

🛠 Tools to Explore:

pipenv - Combines pip and venv
poetry - Modern dependency management
mamba - Fast conda alternative
pip-tools - Better requirements management