Setup
Graduate Quantitative Economics and Datascience
We strongly suggest getting GitHub’s Student Developer Pack, which gives you many free tools (including access to the GitHub Copilot AI assistant).
Similarly, we strongly recommend installing VS Code, which gives you easy access to the repositories and a coding environment for running Jupyter notebooks.
All of the slides in these lectures are available as Jupyter notebooks in a GitHub repository, https://github.com/jlperla/grad_econ_datascience_notebooks
NOTE: Relative to other instructions you may see, these emphasize the newer uv Python installation rather than the increasingly outdated Anaconda. The small changes are worth the effort for faster installation and setup, as well as better support for reproducibility.
Quick Start
To install a Python and Jupyter environment:
Set up a GitHub ID and apply for GitHub’s Student Developer Pack
- NOTE: For further instructions when using VS Code, you can access different features through the command palette with <Cmd+Shift+P> or <Ctrl+Shift+P>, depending on your OS, and then typing in text. When you are given instructions that start with > (e.g., > Git: clone), it indicates you should search for that text.
Install uv, which is strongly recommended instead of conda (although both can coexist)
- macOS or Linux, from any terminal:
curl -sSfL https://raw.githubusercontent.com/astral-sh/uv/main/install.sh | sh
- Windows, from a Windows PowerShell terminal (i.e., open the Start menu, type Windows PowerShell, select Windows PowerShell, then select Open):
iwr https://raw.githubusercontent.com/astral-sh/uv/main/install.ps1 -useb | iex
In VS Code, clone the notebook version of this website
- Press <Cmd+Shift+P> (or <Ctrl+Shift+P>) to enter the VS Code command palette, then type clone to select > Git: clone
- Paste in the repository URL: https://github.com/jlperla/grad_econ_datascience_notebooks and choose to open a new window with the repository
- If prompted, you likely want to associate your GitHub ID with VS Code to make git operations easier.
Start a terminal in VS Code (e.g., <Ctrl+`> or from the menu Terminal/New Terminal)
In the terminal, install the appropriate uv packages and python with uv sync
Sometimes optional, but helpful: choose the default python environment for a cloned repository with > Python: Select Interpreter and choose ./.venv/bin/python on macOS/Linux or .\.venv\Scripts\python.exe on Windows (likely the default).
In the new window, open files such as slides/probability.ipynb as Jupyter Notebooks.
- If necessary, select the Jupyter kernel by choosing Python Environments... and then the ./.venv/bin/python interpreter
The python package dependencies are enumerated in the pyproject.toml and uv.lock files. If these are ever changed when you update the notebooks, you can simply run uv sync to install the latest versions.
Setup with a Lockfile
If you download a repository from another location which has a pyproject.toml and uv.lock file (e.g. in the ECON526 course repo) then you can simply run uv sync and select the interpreter, following instructions above, to use the .venv virtual environment.
When the project or lockfiles change, simply redo the uv sync in that directory.
Other Notebook Repositories
You may find other notebook repositories useful, such as
https://github.com/QuantEcon/lecture-python-intro.notebooks
https://github.com/QuantEcon/lecture-python.notebooks
https://github.com/QuantEcon/lecture-datascience.notebooks
In cases where there is a requirements.txt but no pyproject.toml and uv.lock file, with uv you need to create a new python environment in the cloned repository. In a terminal started in the root of the cloned project in VS Code:
uv venv --python 3.11
uv pip install -r requirements.txt
Then you should be able to follow the instructions as above (e.g., selecting the ./.venv/bin/python interpreter and Jupyter kernels).
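If it is unclear whether a notebook or terminal is actually using the project's environment, a small check like the following can help. It is only a sketch: the function names are made up for illustration, and it assumes the environment lives in a .venv directory at the project root, as in the instructions above.

```python
import sys
from pathlib import Path


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


def uses_project_venv(project_root: Path) -> bool:
    """True when the interpreter's environment is <project_root>/.venv."""
    return Path(sys.prefix).resolve() == (project_root / ".venv").resolve()


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtual_environment())
    print("uses ./.venv:", uses_project_venv(Path.cwd()))
```

Running this in a notebook cell or in the VS Code terminal, from the root of the cloned repository, shows which interpreter was selected.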
Alternative Conda Instructions
Install Anaconda, which downloads a graphical installer and will take some time to install.
If you are using the global installation of conda,
- You can install with pip install -r requirements.txt for a notebook repository.
Alternatively, you can use an isolated virtual environment (similar to the .venv above, but not stored in a particular directory). For example, you might create an environment for these lectures in the cloned notebooks with
conda create -n econ526 python=3.11
conda activate econ526
pip install -r requirements.txt
- And set this environment in VS Code with > Python: Select Interpreter and choose econ526
NOTE: While we recommend using uv where possible, it can coexist with conda.
More Background
Jupyter Notebooks
- Jupyter notebooks are interactive documents which contain code, text, results, etc.
- They have an advantage in easy prototyping and exploration of code/data
- But they have limitations for production code and reproducibility
- Learn how to edit scripts directly (e.g., .py or .jl files) as you practice your skills
- Confusingly, Jupyter notebooks do not require you to use the jupyter or jupyterlab interfaces
  - Many find the VS Code support to be a more convenient interface
Python and Package Management
- Python is a language and an ecosystem of software packages, and not a single implementation
- For installing different packages for graphics, datascience, etc. you need to use a package manager.
- Python is notorious for confusing and conflicting package installations, especially for complicated packages in modern machine learning
- Reproducibility is a key concern, both to avoid wasting time on broken installations and to share code easily with others
Reproducibility Challenges
Given that most modern programming languages are heavily dependent on packages, there are some serious challenges given that package versions may conflict:
- If you download code from someone else, what package versions are required to run the code in its entirety?
- If you set up two projects on your own computer, will changes in the installation of one project break the other?
The solution to these problems is to:
- NEVER use the same package installation for two different projects (i.e., avoid global installations of packages at all costs, and instead use a virtual environment)
- ALWAYS distribute a full snapshot of all working package versions for your project (i.e., a lock file in Python or R, or a manifest file in Julia). These files should be considered a key part of the source code since they ensure you can always reproduce your results.
Some languages such as Julia and Rust have better package management designed into the language itself, but in Python the key is to have a completely separate installation of the packages and Python itself for each project, in what is called a virtual environment.
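To make the idea of a "full snapshot of package versions" concrete, here is a minimal sketch that lists every package installed in the current environment together with its exact version. It is not a replacement for a real lock file such as uv.lock, which also records hashes and covers other platforms. The helper names are hypothetical.

```python
from importlib import metadata


def installed_versions() -> dict[str, str]:
    """Map each distribution installed in this environment to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            versions[name] = dist.version
    return versions


def format_snapshot(versions: dict[str, str]) -> list[str]:
    """Return sorted 'name==version' lines, one per package."""
    ordered = sorted(versions.items(), key=lambda item: item[0].lower())
    return [f"{name}=={version}" for name, version in ordered]


if __name__ == "__main__":
    for line in format_snapshot(installed_versions()):
        print(line)
```

Running the script in two different projects' environments and comparing the output is a quick way to see how their package versions differ.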
Virtual Environments
- Do you really need to worry about these virtual environments?
- You could get started without them as undergraduate students typically do
- But at the MA/PhD level the standards of reproducibility are higher, so it is worth the small effort to learn
- In general, all python installations have the same basic principle for reproducibility
  - Multiple installations of python and packages for different projects
  - If you start a new project, you create a new installation and add relevant packages to isolate it from your other projects, which avoids accidentally breaking the code
  - In their various forms, these are called virtual environments in Python
  - Other languages like Julia also set up project-specific environments, but the support is directly built-in
- The goal: every set of packages is project specific
- If you set up your computer so that there is no “default” python installation, then you will have the easiest time avoiding conflicts
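The sketch below illustrates the idea of one environment per project, using only the standard-library venv module. In practice uv creates the .venv directory for you (e.g., with uv sync), so this is an illustration of the concept rather than the recommended workflow. The function name and the temporary project folders are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps the sketch fast; packages would normally be
    # installed into the environment afterwards by uv or pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two throwaway projects, each with its own isolated environment.
        for name in ("project_a", "project_b"):
            project = Path(tmp) / name
            project.mkdir()
            env = create_project_env(project)
            print(name, "has its own environment:", (env / "pyvenv.cfg").is_file())
```

Each environment sits inside its own project directory, so deleting or rebuilding one cannot affect the other.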
Python and Virtual Environments
Variations
- In practice there are two main variations on installing python and a package management/virtual environment system
  - Use the conda package manager, which is compatible with pip and has many preinstalled packages for data science
    - Previously recommended, but rapidly becoming less popular
  - Use the newer uv package management system, which is lightning fast and tends to be less intrusive than Anaconda
    - Increasingly recommended, but evolving support with less documentation
- These can coexist on the same computer, allowing you to use conda where you need it for specific projects
- While one can manually install python and use the pip package management directly, it requires more expertise and is only done in particular cases such as with cloud computing
It is clear at this point that conda will be replaced by something better, so it is worth investing in more modern systems, if only for the installation speed.
Why is uv winning?
- Anaconda is often unbearably slow to use, painful to install, and confusing for many users
- Furthermore, while previously thought to be freely available under a permissive license, the company behind Anaconda has recently started shaking down firms for non-academic usage which is causing uncertainty
- The conda virtual environment is non-standard and not compatible with other package management systems
- Conda virtual environments are in a global registry which gets cluttered, rather than the more standard approach of having a project-specific virtual environment in the directory itself.
uv has some direct advantages
- It is very fast, lightweight, and easy to use
- It is fast to install and update
- It is compatible with the standard venv virtual environment system, which reduces vendor lock-in if (when!) something better comes along
- It tends to isolate environments better than conda for easier reproducibility
- It uses the vendor-agnostic pyproject.toml file to provide a list of dependencies rather than the conda-specific environment.yml file
uv has minimal and temporary disadvantages
- While it is easier to use than conda, it is still relatively new, which could cause problems with particular packages or when following instructions
- There are some packages with complicated binary dependencies (e.g., perhaps some machine learning packages) which are better supported on conda
Will uv become the eventual standard? Perhaps not, but it is far better than conda now, and future package managers are likely to be largely compatible.