Installing Python packages on Fox

Using pip inside a virtual environment is the easiest way to install Python packages as a user. We recommend virtual environments because they are a straightforward way to isolate different installations from each other. This makes it possible to have multiple versions of the same package installed in your $HOME without conflicting dependencies.

pip is the main package installer for Python and is included with Python. It is easy to use and can be combined with venv to manage independent environments. It is recommended that you have at least one virtual environment for each separate experiment.

Key takeaways:

Always install packages inside a virtual environment
Do not install packages with "pip install --user". They will end up in $HOME/.local and from there they will leak into both containers and environments, thus breaking compatibility for other installations.
Leverage the existing software stack for dependencies using the option "--system-site-packages"
When loading dependencies from the central software stack, always use the same toolchain (more info on this further down the text)
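
The first two takeaways can be checked from inside Python. The following is a minimal sketch and not part of the Fox tooling: the helper names in_virtual_environment and user_site_status are made up for illustration, while sys.prefix, sys.base_prefix and the site module belong to the standard library.

```python
import site
import sys
from typing import Tuple


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the underlying Python installation.
    return sys.prefix != sys.base_prefix


def user_site_status() -> Tuple[bool, str]:
    """Report whether the per-user site directory is enabled, and its path."""
    # site.ENABLE_USER_SITE can be True, False or None; bool() folds None to False.
    return bool(site.ENABLE_USER_SITE), site.getusersitepackages()


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_environment())
    enabled, path = user_site_status()
    print("user site-packages enabled:", enabled)
    print("user site-packages path:", path)
```

If the user site directory is reported as enabled and the path (normally somewhere under $HOME/.local) contains packages, those packages are visible to the interpreter. Environments created with --system-site-packages normally keep the user site directory enabled, which is one more reason to avoid "pip install --user".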

How to create a Python virtual environment and install a Python package inside of it

In this example we use venv, which comes with the Python standard library. Other guides and documentation use virtualenv. venv is a subset of virtualenv and has all the functionality we need.

First load a Python module (use 'module avail Python' to see all available versions):

$ module load Python/3.8.6-GCCcore-10.2.0

Create the virtual environment in your $HOME folder with an appropriate name:

$ python3 -m venv $HOME/pandas-env --system-site-packages

Activate the environment:

$ source $HOME/pandas-env/bin/activate

Install packages with pip. Here we install pandas.

(pandas-env) $ python3 -m pip install pandas
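
Because the environment was created with --system-site-packages, a package can come either from the environment itself or from a loaded module. The sketch below shows one way to see which copy Python would import. The helper name package_origin is hypothetical; importlib.util.find_spec is standard library. It is meant for top-level package names only.

```python
import importlib.util
from typing import Optional


def package_origin(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None."""
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("pandas would be imported from:", package_origin("pandas"))
```

A path under $HOME/pandas-env means the copy installed with pip is used, while a path under the central software stack means the copy from a loaded module is used.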

You are now ready to use the new environment. When you are done and want to get out of the environment, you simply type:

$ deactivate

Remember that you will always have to load the same module(s) before you activate your environment next time.

For more information, have a look at the official pip and venv documentation (the related virtualenv documentation is at https://virtualenv.pypa.io/en/latest/).

Note
When running software from your Python environment in a batch script, it is highly recommended to activate the environment only inside the script (see below) and to keep your login environment clean when submitting the job. Otherwise the two environments can interfere with each other, even if they are the same.

Choosing a Python version

If you need a specific version of Python for your installation, then you can search and see if that version is available on our system. This command will give you a list of all Python modules installed:

$ module avail python

Load the module you need and then create a virtual environment before you start installing packages inside of it.

Searching for dependencies and choosing a toolchain

The dependencies needed to install a Python package are usually listed in the requirements.txt file, which is found in the source files of the package you are interested in. Dependencies can sometimes also be found under the install_requires variable in setup.py (also found in the source files).
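
If the package is already installed somewhere, for example in a scratch environment, its declared dependencies can also be read from the installed metadata. This is only a sketch: the function name declared_dependencies is made up for illustration, and importlib.metadata (standard library since Python 3.8) is used to read the metadata.

```python
from importlib import metadata
from typing import List


def declared_dependencies(package: str) -> List[str]:
    """Return the requirement strings an installed package declares."""
    try:
        requirements = metadata.requires(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return []
    # metadata.requires returns None when nothing is declared.
    return list(requirements or [])


if __name__ == "__main__":
    for requirement in declared_dependencies("pandas"):
        print(requirement)
```

An empty result means either that the package is not installed in the current environment or that it declares no dependencies.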

Since we already have hundreds of Python packages installed (in different versions) on our system, you can utilize those when installing the package you need using this small procedure:

Search to see if any of your dependencies are available using module spider
Make sure the modules (which contain your dependencies) are built with the same toolchain (more on this further down)
Load all the modules you need (if you do not get an error message, they are compatible)
Create your virtual environment and install the Python package you need
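
The first step of this procedure can be prepared from a requirements.txt file. The sketch below pulls the package names out of such a file and prints one module spider command per name. The function names requirement_names and spider_commands are hypothetical, and the parsing is deliberately simple: it skips comments, blank lines and option lines such as "-r other.txt", and it does not handle every requirement syntax pip accepts.

```python
import re
from typing import Iterable, List

# A requirement line starts with the package name; version specifiers,
# extras and markers follow after it.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract package names from the lines of a requirements file."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def spider_commands(names: Iterable[str]) -> List[str]:
    """Build one 'module spider' command per package name."""
    return [f"module spider {name}" for name in names]


if __name__ == "__main__":
    example = ["numpy>=1.20", "pandas==2.2.2  # pinned", "", "scipy"]
    for command in spider_commands(requirement_names(example)):
        print(command)
```

The printed commands can then be run on the login node to see which dependencies are already available as modules.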

If you already know some of the Python packages you want to use, you can search for them directly with the module spider command. Let us say you want to use the Python package numpy:

$ module spider numpy

-----------------------------------------------------------------------------------------------------------------------------------------
  numpy:
-----------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        numpy/1.25.1 (E)
        numpy/1.26.2 (E)
        numpy/1.26.4 (E)

This will give you a list of all versions of numpy installed. To see which module contains the version of numpy you need, run module spider again with the version number:

$ module spider numpy/1.26.4

-----------------------------------------------------------------------------------------------------------------------------------------
  numpy: numpy/1.26.4 (E)
-----------------------------------------------------------------------------------------------------------------------------------------
    This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy.

       SciPy-bundle/2024.05-gfbf-2024a

In order to see what other Python packages you will get access to when loading this SciPy-bundle module, run:

$ module spider SciPy-bundle/2024.05-gfbf-2024a
     Included extensions
      ===================
      beniget-0.4.1, Bottleneck-1.3.8, deap-1.4.1, gast-0.5.4, mpmath-1.3.0,
      numexpr-2.10.0, numpy-1.26.4, pandas-2.2.2, ply-3.11, pythran-0.16.1,
      scipy-1.13.1, tzdata-2024.1, versioneer-0.29

If you then load the module you can check which version of Python it comes with:

[ec-parosen@login-3 ~]$ module load SciPy-bundle/2024.05-gfbf-2024a
[ec-parosen@login-3 ~]$ which python3
/cluster/software/EL9/easybuild/software/Python/3.12.3-GCCcore-13.3.0/bin/python3

We see here that SciPy-bundle/2024.05-gfbf-2024a is built on top of the Python/3.12.3-GCCcore-13.3.0 module. You only need to load the first, since the latter is a dependency and will be loaded automatically.

Note
If you want to combine several modules that contain the Python packages you need, they must all come from the same toolchain, for example foss/2023a or foss/2022b. Note that GCCcore-12.3.0 is a subtoolchain of foss/2023a, so modules with either of these suffixes are compatible. Here is a list of all installed foss toolchains and the GCC versions included in them:

foss/2021a -> 10.3.0
foss/2021b -> 11.2.0
foss/2022a -> 11.3.0
foss/2022b -> 12.2.0
foss/2023a -> 12.3.0
foss/2023b -> 13.2.0
foss/2024a -> 13.3.0
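
The compatibility rule from the note can be written down as a small check. This is a sketch under stated assumptions: the table is copied from the list above, the function names gcc_version and same_toolchain are made up for illustration, and only module names carrying a foss or GCCcore suffix are recognised. Other subtoolchains (for example the gfbf suffix seen earlier) are reported as unknown, so the check cannot confirm them.

```python
import re
from typing import Iterable, Optional

# foss toolchain generation -> GCC version, copied from the list above.
FOSS_TO_GCC = {
    "2021a": "10.3.0",
    "2021b": "11.2.0",
    "2022a": "11.3.0",
    "2022b": "12.2.0",
    "2023a": "12.3.0",
    "2023b": "13.2.0",
    "2024a": "13.3.0",
}

# Matches the toolchain part of a module name such as
# "Python/3.12.3-GCCcore-13.3.0" or "Example/1.0-foss-2023a".
_TOOLCHAIN = re.compile(r"-(foss|GCCcore)-([0-9][0-9A-Za-z.]*)")


def gcc_version(module_name: str) -> Optional[str]:
    """Return the GCC version behind a module name, or None if unknown."""
    match = _TOOLCHAIN.search(module_name)
    if match is None:
        return None
    family, version = match.groups()
    if family == "foss":
        return FOSS_TO_GCC.get(version)
    return version  # a GCCcore suffix carries the GCC version directly


def same_toolchain(module_names: Iterable[str]) -> bool:
    """Return True only if all modules map to one known GCC version."""
    versions = {gcc_version(name) for name in module_names}
    return None not in versions and len(versions) == 1


if __name__ == "__main__":
    # "Example/1.0-foss-2024a" is a made-up module name.
    print(same_toolchain(["Python/3.12.3-GCCcore-13.3.0", "Example/1.0-foss-2024a"]))
```

Loading the modules remains the authoritative test: as described above, modules that load together without an error message are compatible.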

Using the virtual environment in a batch script

In a batch script you activate the virtual environment in the same way as above. You only need to load the Python module first:

# Set up job environment
set -o errexit # exit on any error
set -o nounset # treat unset variables as error

# Load modules
module load Python/3.12.3-GCCcore-13.3.0

# Set the ${PS1} (needed in the source of the virtual environment for some Python versions)
export PS1=\$

# activate the virtual environment
source $HOME/my_new_pythonenv/bin/activate

# execute example script
python pdexample.py
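
A job that silently runs with the wrong interpreter can waste queue time, so the Python script itself may verify that it runs inside the intended environment. The following is a sketch only: the helper name running_in_env is hypothetical, and the directory you pass to it is whatever environment your job script activates.

```python
import os
import sys


def running_in_env(env_dir: str) -> bool:
    """Return True if the current interpreter belongs to the venv at env_dir."""
    # Inside an activated venv, sys.prefix is the environment directory.
    # realpath resolves symbolic links so that both paths are comparable.
    expected = os.path.realpath(os.path.expanduser(env_dir))
    return os.path.realpath(sys.prefix) == expected
```

At the top of a script such as pdexample.py, calling running_in_env("~/my_new_pythonenv") and stopping with sys.exit when it returns False makes the job fail early with a clear message.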

Sharing package configuration

To allow other researchers to replicate your virtual environment setup, it can be a good idea to "freeze" your packages. Freezing records the exact version of every installed package, so that pip installs those same versions instead of silently picking newer ones, and it gives you a simple way to share the exact same package set between researchers.

To freeze the packages into a list to share with others run:

$ python -m pip freeze --local > requirements.txt

The file requirements.txt will now contain the list of packages installed in your virtual environment with their exact versions. When publishing your experiments, it can be a good idea to share this file, which others can install in their own virtual environments like so:

$ python -m pip install -r requirements.txt

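Your virtual environment and the new one installed from the same requirements.txt should now contain the same package versions and thus replicate the experiment setup as closely as possible. Note that --local leaves out packages that come from loaded modules through --system-site-packages, so the other researcher must also load the same modules.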
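
Whether an environment really matches a shared requirements.txt can be checked from Python. This is a minimal sketch, assuming the file contains the "name==version" lines that pip freeze writes. The function names parse_pins and mismatches are made up for illustration, and other line types (for example editable installs or direct URLs) are ignored.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Collect 'name==version' pins from the lines of a requirements file."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue  # not a simple pin; ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (wanted, found)."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed in this environment
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    with open("requirements.txt") as handle:
        for name, (wanted, found) in mismatches(parse_pins(handle)).items():
            print(f"{name}: wanted {wanted}, found {found}")
```

No output means every pinned package is installed at the pinned version.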