Setup

Graduate Quantitative Economics and Datascience

Author

Jesse Perla, UBC

We strongly suggest getting GitHub’s Student Developer Pack, which gives you lots of free tools (and even access to the AI assistant GitHub Copilot).

Similarly, we strongly recommend installing VS Code, which gives easy access both to the repositories and to a coding environment for running Jupyter notebooks.

All of the slides in these lectures are available as Jupyter notebooks in a GitHub repository: https://github.com/jlperla/grad_econ_datascience_notebooks

NOTE: Relative to other instructions you may see, these emphasize the newer uv Python package manager rather than the increasingly outdated Anaconda. The small changes are worth the effort, both for faster installation and setup and for better support for reproducibility.

Quick Start

To install a Python and Jupyter environment:

  1. Set up a GitHub ID and apply for GitHub’s Student Developer Pack

  2. Install git and then VS Code

    • NOTE: In VS Code, you can access most features through the command palette with <Cmd+Shift+P> or <Ctrl+Shift+P> (depending on your OS) and then typing in text. When an instruction starts with > (e.g., > Git: Clone), search for that text in the command palette.
  3. Install uv, which is strongly recommended instead of conda (although both can coexist)

    • MacOS or Linux: curl -sSfL https://raw.githubusercontent.com/astral-sh/uv/main/install.sh | sh, from any terminal
    • Windows: iwr https://raw.githubusercontent.com/astral-sh/uv/main/install.ps1 -useb | iex, from a Windows PowerShell terminal (e.g., open the Start menu, type Windows PowerShell, select Windows PowerShell, then select Open)
  4. In VS Code, clone the notebook version of this website

    1. Press <Cmd+Shift+P> (or <Ctrl+Shift+P>) to open the VS Code command palette, then type clone and select > Git: Clone
    2. Paste in the repository URL: https://github.com/jlperla/grad_econ_datascience_notebooks and choose to open a new window with the repository
    3. If prompted, you will likely want to link your GitHub ID with VS Code to make git operations easier.
  5. Start a terminal in VS Code (e.g., <Ctrl+`> or from the menu Terminal > New Terminal)

  6. In the terminal, install Python and the required packages with

    uv sync
  7. Optional, but often helpful: choose the default Python environment for the cloned repository with > Python: Select Interpreter, then choose ./.venv/bin/python on macOS/Linux or .\.venv\Scripts\python.exe on Windows (likely the default).

  8. In the new window, open files such as slides/probability.ipynb as Jupyter Notebooks.

    • If necessary, select the Jupyter kernel by choosing Python Environments... and then the ./.venv/bin/python interpreter
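To double-check that a notebook kernel (or a script) is really running on the project's .venv interpreter rather than a global Python, a quick check like the one below can be pasted into a cell. This is a minimal sketch using only the standard library: in a venv-style environment, such as the .venv created by uv sync, sys.prefix points at the environment directory while sys.base_prefix points at the underlying Python installation. The helper names in_virtual_env and describe_interpreter are hypothetical, and a conda environment is not flagged by this check because conda environments are full installations rather than venvs.

```python
import sys
from pathlib import Path


def in_virtual_env() -> bool:
    """Return True when running inside a venv-style virtual environment.

    In a venv (including the .venv created by `uv sync` or `uv venv`),
    sys.prefix is the environment directory and sys.base_prefix is the
    base Python installation it was created from.
    """
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> dict:
    """Collect the details worth checking when selecting an interpreter or kernel."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtual_env": in_virtual_env(),
        # True if the environment directory is named .venv, as uv sync creates
        "is_dot_venv": Path(sys.prefix).name == ".venv",
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If in_virtual_env or is_dot_venv comes back False, the kernel is probably using a different installation and is worth reselecting.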

The Python package dependencies are enumerated in the pyproject.toml and uv.lock files. If these files change when you update the notebooks, simply run uv sync again to install the versions recorded in the updated lock file.

Setup with a Lockfile

If you download a repository from another location that has pyproject.toml and uv.lock files (e.g., the ECON526 course repo), you can simply run uv sync and then select the interpreter, following the instructions above, to use its .venv virtual environment.

When the project or lock files change, simply rerun uv sync in that directory.

Other Notebook Repositories

You may find other notebook repositories useful, such as

  • https://github.com/QuantEcon/lecture-python-intro.notebooks
  • https://github.com/QuantEcon/lecture-python.notebooks
  • https://github.com/QuantEcon/lecture-datascience.notebooks

In cases where there is a requirements.txt but no pyproject.toml and uv.lock files, you need to create a new Python environment with uv in the cloned repository. In a terminal started in the root of the cloned project in VS Code:

uv venv --python 3.11
uv pip install -r requirements.txt

Then you should be able to follow the instructions above (e.g., selecting the ./.venv/bin/python interpreter and Jupyter kernel).
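After uv pip install -r requirements.txt, one way to sanity-check the environment is to compare the exact pins (lines of the form name==version) against what is actually installed. The sketch below does this with importlib.metadata from the standard library. It handles plain pins, ignoring inline comments, extras, and environment markers, and skips lines without ==, so it is an illustration rather than a full requirements parser; the function name check_pins and the example pins are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(requirement_lines: list[str]) -> dict[str, str]:
    """Compare simple `name==version` pins with the installed packages.

    Returns a mapping from package name to "ok", "missing", or
    "installed <actual version>". Lines without `==` are skipped.
    """
    results: dict[str, str] = {}
    for raw in requirement_lines:
        # Drop inline comments and environment markers (after ';')
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # ignore extras such as pkg[extra]
        try:
            installed = version(name)
        except PackageNotFoundError:
            results[name] = "missing"
            continue
        results[name] = "ok" if installed == pinned else f"installed {installed}"
    return results


if __name__ == "__main__":
    # Hypothetical pins; in practice these lines would come from requirements.txt
    example_pins = ["numpy==1.26.4  # arrays", "", "not-a-real-package-xyz==0.0.1"]
    for name, status in check_pins(example_pins).items():
        print(f"{name}: {status}")
```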

Alternative Conda Instructions

Install Anaconda, which provides a graphical installer and will take some time to install.

  1. If you are using the global installation of conda,

    • You can install the packages for a notebook repository with pip install -r requirements.txt
  2. Alternatively, you can use an isolated virtual environment (similar to the .venv above, but stored in conda's global registry rather than in the project directory). For example, you might create an environment for these lectures in the cloned notebooks with

    conda create -n econ526 python=3.11
    conda activate econ526
    pip install -r requirements.txt
    • Then select this environment in VS Code with > Python: Select Interpreter and choose econ526

NOTE: While we recommend using uv where possible, it can coexist with conda.

More Background

Jupyter Notebooks

  • Jupyter notebooks are interactive documents which contain code, text, results, etc.
  • They make it easy to prototype and explore code and data
    • But they have limitations for production code and reproducibility
    • As your skills grow, learn to write and edit plain scripts (e.g., .py or .jl files) directly
  • Confusingly, Jupyter notebooks do not require you to use the jupyter or jupyterlab interfaces
    • Many find the VS Code support to be a more convenient interface

Python and Package Management

  • Python is a language and an ecosystem of software packages, and not a single implementation
  • To install packages for graphics, data science, etc., you need to use a package manager.
    • Python is notorious for confusing and conflicting package installations, especially for complicated packages in modern machine learning
  • Reproducibility is a key concern, both to avoid wasting time on broken installations and to share code easily with others

Reproducibility Challenges

Because most modern programming languages depend heavily on packages, and package versions may conflict, there are some serious challenges:

  • If you download code from someone else, what package versions are required to run the code in its entirety?
  • If you set up two projects on your own computer, will changes in the installation of one project break the other?
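The question about two projects can be made concrete: if two projects pin the same package at different versions, a single shared installation can hold only one of them, so one project breaks. The sketch below compares two sets of simple name==version pins and reports such clashes; the pins and the function names parse_pins and conflicting_pins are hypothetical examples, and name matching is simplified to lowercasing.

```python
def parse_pins(lines: list[str]) -> dict[str, str]:
    """Map package name -> pinned version for simple `name==version` lines."""
    pins: dict[str, str] = {}
    for line in lines:
        if "==" in line:
            name, ver = line.split("==", 1)
            pins[name.strip().lower()] = ver.strip()  # simplified name normalization
    return pins


def conflicting_pins(project_a: list[str], project_b: list[str]) -> dict[str, tuple[str, str]]:
    """Return packages that both projects pin, but at different versions."""
    a, b = parse_pins(project_a), parse_pins(project_b)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


if __name__ == "__main__":
    # Hypothetical pins for two projects on the same computer
    old_project = ["numpy==1.24.4", "pandas==1.5.3", "matplotlib==3.7.1"]
    new_project = ["numpy==2.1.0", "pandas==2.2.2", "matplotlib==3.7.1"]
    print(conflicting_pins(old_project, new_project))
    # {'numpy': ('1.24.4', '2.1.0'), 'pandas': ('1.5.3', '2.2.2')}
```

With a separate virtual environment per project, each keeps its own numpy and pandas and the clash never arises.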

The solution to these problems is to:

  • NEVER use the same package installation for two different projects (i.e., avoid global installations of packages at all costs, and instead use a virtual environment)
  • ALWAYS distribute a full snapshot of all working package versions for your project (i.e., a lock file in Python or R, or a manifest file in Julia). These files should be considered a key part of the source-code since they ensure you can always reproduce your results.

Some languages such as Julia and Rust have better package management designed into the language itself, but in Python the key is to have a completely separate installation of both Python and its packages for each project, in what is called a virtual environment.
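A lock file is, at its core, a complete record of the exact versions in a working environment. The sketch below builds a rough equivalent of that snapshot for the running interpreter with importlib.metadata, similar in spirit to what pip freeze or uv pip freeze print. Real lock files such as uv.lock record much more (sources, hashes, platform markers), so this illustrates the idea rather than replacing them; the function name environment_snapshot is hypothetical.

```python
from importlib.metadata import distributions


def environment_snapshot() -> list[str]:
    """Return sorted `name==version` lines for every installed distribution.

    This mirrors the idea behind `pip freeze`: a list of exact versions
    for the environment of the currently running interpreter.
    """
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(environment_snapshot()))
```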

Virtual Environments

  • Do you really need to worry about these virtual environments?
    • You could get started without them, as undergraduate students typically do
    • But at the MA/PhD level the standards of reproducibility are higher, so it is worth the small effort to learn
  • In general, all Python installations follow the same basic principle for reproducibility
    • Multiple installations of python and packages for different projects
    • If you start a new project, you create a new installation and add the relevant packages, isolating it from your other projects so you don't accidentally break their code
    • In their various forms, these are called virtual environments in Python
    • Other languages like Julia also set up project-specific environments, but the support is built directly into the language
  • The goal: every set of packages is project specific
    • If you set up your computer so that there is no “default” Python installation, you will have the easiest time avoiding conflicts
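Under the hood, a project-specific virtual environment is just a directory containing its own interpreter and a pyvenv.cfg marker file. The sketch below uses the standard-library venv module to create one in a throwaway directory, following the same standard venv layout that uv's .venv uses; with_pip=False keeps it fast and offline. In practice you would run uv venv or uv sync instead, and the function name create_project_env is hypothetical.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment in project_dir/.venv and return its interpreter."""
    env_dir = project_dir / ".venv"
    # Symlink the interpreter on macOS/Linux (as `python -m venv` does by default);
    # skip installing pip so this runs quickly and without network access.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        interpreter = create_project_env(Path(tmp))
        print("interpreter:", interpreter, "exists:", interpreter.exists())
        # pyvenv.cfg marks the directory as a virtual environment
        print((interpreter.parent.parent / "pyvenv.cfg").read_text())
```

Because each project gets its own directory like this, deleting or rebuilding one environment never touches another.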

Python and Virtual Environments

Variations

  • In practice there are two main variations on installing Python and a package management/virtual environment system
    1. Use the conda package manager, which is compatible with pip and has many preinstalled packages for data science
      • Previously recommended, but rapidly becoming less popular
    2. Use the newer uv package management system, which is lightning fast and tends to be less intrusive than Anaconda
      • Increasingly recommended, though support is still evolving and there is less documentation
  • These can coexist on the same computer, allowing you to use conda where you need it for specific projects
  • While one can manually install Python and use the pip package manager directly, this requires more expertise and is only done in particular cases such as cloud computing

It is clear at this point that conda will be replaced by something better, so it is worth investing in more modern systems, if only for the installation speed.

Why is uv winning?

  • Anaconda is often unbearably slow to use, painful to install, and confusing for many users
    • Furthermore, while Anaconda was previously thought to be freely available under a permissive license, the company behind it has recently started shaking down firms for non-academic usage, which is causing uncertainty
    • The conda virtual environment is non-standard and not compatible with other package management systems
    • Conda virtual environments are in a global registry which gets cluttered, rather than the more standard approach of having a project-specific virtual environment in the directory itself.
  • uv has some direct advantages
    • It is very fast, lightweight, and easy to use
    • It is fast to install and update
    • It is compatible with the standard venv virtual environment system, which reduces vendor lock-in if (when!) something better comes along
    • It tends to isolate environments better than conda for easier reproducibility
    • It uses the vendor-agnostic pyproject.toml file to provide a list of dependencies rather than the conda-specific environment.yml file
  • uv has minimal, and likely temporary, disadvantages
    • While it is easier to use than conda, it is still relatively new, which could cause problems with particular packages or when following instructions
    • There are some packages with complicated binary dependencies (e.g., perhaps some machine learning packages) which are better supported on conda

Will uv become the eventual standard? Perhaps not, but it is far better than conda now, and future package managers are likely to be largely compatible.