Lecture 8

Python Packaging

Announcements

  • Assignment two is due Wednesday, March 2 (i.e. in one week).
  • Grades for assignment 1 are done.

There are open source Python packages for:

  • charting and data visualization
  • web application frameworks
  • manipulating images
  • machine learning, natural language processing, and computer vision
  • calculus, linear algebra, and statistics
  • connecting to databases
  • game programming
  • mapping and GIS
  • geology, astronomy, biological computation
  • robotics
  • ...and more!

What's a package?

  • A Python module (or group of modules) designed to be installed somewhere
  • Can be small (one file) or huge (hundreds of files)
  • May contain non-Python code
  • May rely on non-Python software already being installed
  • May rely on other Python packages

When a package relies on other software, we say that other software is a dependency of the package.

There are multiple ways to make a Python package.
The most common one is called a source distribution (a.k.a. an sdist).

What's in a source distribution?

  • The python code (one or more .py files)
  • The README (if there is one)
  • The setup.py file (more on that in a sec...)
  • Any other files you specify

All of the files above are bundled into a single archive file (usually a tar or zip archive).
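To make the "single archive" idea concrete, here is a minimal sketch that packs a made-up project into a .tar.gz and lists what ended up inside. It only imitates the shape of a source distribution; the project name "toy" and its files are invented for illustration, and a real sdist built by setuptools also contains generated metadata files.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def build_toy_sdist(work_dir: Path) -> Path:
    """Pack a tiny, made-up project into a .tar.gz, roughly like an sdist."""
    # Hypothetical project layout, invented for this illustration.
    project = work_dir / "toy-1.0.0"
    (project / "toy").mkdir(parents=True)
    (project / "toy" / "__init__.py").write_text("GREETING = 'hello'\n")
    (project / "README").write_text("A toy project.\n")
    (project / "setup.py").write_text("# the setup() call would go here\n")

    archive = work_dir / "toy-1.0.0.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the project.
        tar.add(project, arcname=project.name)
    return archive


def list_archive(archive: Path) -> List[str]:
    """Return the sorted names of the regular files stored in the archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


with tempfile.TemporaryDirectory() as tmp:
    names = list_archive(build_toy_sdist(Path(tmp)))
    print("\n".join(names))
```

Everything lives under one top-level folder named after the project and version, which is the same convention you see in a file like cmsc-210-1.0.0.tar.gz.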

  • Python has a widely used packaging library called setuptools (a third-party package, not part of the standard library, though many Python installations include it)
  • Setuptools knows how to build a source distribution from a Python project.
  • Given a source distribution, it also knows how to install it in your environment
  • How does it know how to do this?

The setup.py file


from setuptools import setup

setup(
    name='cmsc-210',
    version='1.0.0',
    url='https://github.com/mazelife/cmsc-210',
    author='James Stevenson',
    author_email='author@gmail.com',
    description='All code written in CMSC-210',
    packages=["cmsc210"],
    install_requires=['dependencyA == 1.11.1',
                      'dependencyB >= 1.5.0'],
)

    

How does installation work?

  • setuptools unzips the archive
  • setuptools then copies all packages defined in setup.py to a place where the Python interpreter will find them.

The Python interpreter can tell you where it looks:


>>> import sys
>>> print('\n'.join(sys.path))

/opt/anaconda3/envs/cmsc-210/lib/python38.zip
/opt/anaconda3/envs/cmsc-210/lib/python3.8
/opt/anaconda3/envs/cmsc-210/lib/python3.8/lib-dynload
/opt/anaconda3/envs/cmsc-210/lib/python3.8/site-packages
    

When using the Python distribution supplied with conda, site-packages is where most things will go.
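You can also ask Python directly where site-packages is and where a given module would be loaded from. This is a small sketch using only the standard library; the helper name where_is and the deliberately nonexistent module name are made up for the example, and the paths printed will differ on your machine.

```python
import importlib.util
import sysconfig

# Where pure-Python third-party packages get installed for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]
print("site-packages:", site_packages)


def where_is(module_name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# "json" ships with Python; the second name is intentionally bogus.
print("json lives at:", where_is("json"))
print("not installed:", where_is("no_such_module_for_this_demo"))
```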


cmsc-210-1.0.0.tar.gz

If you have a source distribution, python can install it for you.
But where do you get a source distribution?

The Python Package Index

  • Maintained by the Python Software Foundation (+ corporate donors)
  • Has over 200k python packages
  • Is free to use
  • Anyone can publish to it
  • Sturgeon's law applies
  • But there are many extremely high-quality packages there too.

Let's take a look at an example.

To install a package from the package index, use a tool called pip in the terminal:


        # Install the latest version of the "seaborn" package:
        pip install seaborn

        # Install a specific version of the "seaborn" package:
        pip install seaborn==0.11.0
    

pip is:

  • The most popular tool for installing Python packages
  • Included with modern versions of Python
  • Wraps setuptools
  • Knows how to get things from the Python Package Index
  • Works with Conda
  • If a project has dependencies, it will install those too.
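Once pip has installed something, Python can report what is there. The sketch below uses the standard library's importlib.metadata (Python 3.8+) to look up an installed version and the dependencies a package declares. The helper names are invented for this example, and the output depends entirely on what is installed in your environment.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def declared_dependencies(package_name):
    """Return the requirement strings a package declares (empty if none)."""
    try:
        return metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return []


# The last name is intentionally bogus to show the "not installed" case.
for name in ["pip", "seaborn", "no-such-package-for-this-demo"]:
    version = installed_version(name)
    print(name, "->", version if version else "not installed")
    for requirement in declared_dependencies(name):
        print("    needs:", requirement)
```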

You can also access pip through PyCharm if you don't want to use the terminal

  1. Preferences
  2. Project > Project Interpreter
  3. Click the "+"
  4. Search for the package you want
  5. Click "Install"

How do I specify dependencies in my project?

The best way depends on the answer to this question:

Is my project, itself, a package that I want to share?

If your project is a package you are planning to share or publish to PyPI:


from setuptools import setup

setup(
    name='cmsc-210',
    version='1.0.0',
    description='All code written in cmsc-210',
    packages=['cmsc210'],
    install_requires=['seaborn == 0.11.0',
                      'scikit-learn == 0.23'],
)
    

If your project is just for you:

Create a requirements.txt in the project root:


seaborn == 0.11.0
scikit-learn == 0.23
    

pip knows how to read these:


        pip install -r requirements.txt
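To see what pip is reading out of that file, here is a deliberately simplified sketch of a requirements parser. It is an assumption-laden toy, not pip's actual logic: it understands only a package name followed by at most one of ==, >= or <= and a version, while the real format supports much more (extras, environment markers, URLs, multiple version clauses).

```python
import re

# Simplified pattern: a name, then optionally one operator and one version.
REQUIREMENT = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*"
    r"(?:(?P<op>==|>=|<=)\s*(?P<version>[0-9][0-9A-Za-z.]*))?\s*$"
)


def parse_requirements(text):
    """Turn requirements.txt-style text into (name, operator, version) tuples."""
    parsed = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = REQUIREMENT.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {line!r}")
        parsed.append((match["name"], match["op"], match["version"]))
    return parsed


example = """
seaborn == 0.11.0
scikit-learn == 0.23
"""
print(parse_requirements(example))
# prints [('seaborn', '==', '0.11.0'), ('scikit-learn', '==', '0.23')]
```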
    

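Setuptools can build the source distribution, and a separate tool called twine uploads it to PyPI (the older "python setup.py register" and "python setup.py upload" commands are deprecated):


        # Build the source distribution into the dist/ folder:
        python setup.py sdist

        # Push the current version to PyPI:
        twine upload dist/*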
    

PyCharm will also offer to install the project's requirements when it sees a requirements.txt or setup.py in the project.

Quick Reference

  • Have a root folder for your project that includes a README
  • There should be one top-level module inside the root folder
    • A single Python file if it's small
    • A directory with __init__.py and all the rest if it's not.
  • If you plan to share the code or need to package it to install somewhere else, include a setup.py
  • If your project has dependencies, list them in setup.py if you have one, or in requirements.txt otherwise.
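The "directory with __init__.py" layout can be tried out without touching a real project. This sketch builds a throwaway package in a temporary folder and imports it; the package name layout_demo_pkg, the shapes module, and the area function are all invented for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical package name, chosen to avoid clashing with real modules.
    package_dir = Path(root) / "layout_demo_pkg"
    package_dir.mkdir()
    # __init__.py marks the directory as a package and re-exports area().
    (package_dir / "__init__.py").write_text("from .shapes import area\n")
    (package_dir / "shapes.py").write_text(
        "def area(width, height):\n    return width * height\n"
    )

    sys.path.insert(0, root)  # make the root folder importable
    importlib.invalidate_caches()
    try:
        demo = importlib.import_module("layout_demo_pkg")
        result = demo.area(3, 4)
    finally:
        sys.path.remove(root)

print(result)  # prints 12
```

Adding the folder to sys.path by hand is only for the demo. Installing the package (or opening the project root in PyCharm) is what normally makes it importable.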

What questions do you have?

Let's look at creating an isolated environment for our assignments

Refresher from Installfest

Anaconda is a free and open-source Python platform that contains:

  1. software to install, run, and update Python packages
  2. software to create, save, load, and switch between project-specific software environments on your local computer
  3. A distribution of Python itself

It's a good idea to create a Conda environment for each project you work on in PyCharm.

  • Each project can have only its own packages installed
  • Each project can have different versions of the same packages installed
  • Each project can even have its own python version installed
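A quick way to confirm which environment a script is actually running in is to ask the interpreter. The sketch below collects a few identifying facts; the function name describe_environment is made up, and it assumes that an activated Conda environment sets the CONDA_DEFAULT_ENV variable (the value is None when no Conda environment is active).

```python
import os
import sys


def describe_environment():
    """Collect a few facts that identify the running Python environment."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # Set by "conda activate"; absent outside an activated Conda environment.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Run it from two different environments and the interpreter path, prefix, and possibly the Python version should all change.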

Let's use assignment #2 as an example...