- For detailed documentation, please check the following links:
Step 0: Get Fox access
- To get access to Fox, you need to apply for Educloud membership; guides here:
- After you have an account, use any authenticator app on your phone to register for Educloud 2FA.
- Now you can connect to Fox using this bash command in your terminal:

```bash
# NOTE the Educloud-specific username! Educloud usernames commonly start with 'ec-'.
$ ssh <educloud-username>@fox.educloud.no
```
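- Optionally, to avoid retyping the full host name and username, you can add an entry to the `~/.ssh/config` file on your local machine. This is a minimal sketch; the alias `fox` is an arbitrary name of our choosing, not anything Educloud requires:

```
# ~/.ssh/config on your local machine (the alias 'fox' is arbitrary)
Host fox
    HostName fox.educloud.no
    User ec-<your_educloud_username>
```

After this, `ssh fox` is enough to connect.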
Step 1: Create an sbatch file
- `sbatch` is the command to submit a new job to the Slurm job manager. You can read more about jobs and Slurm in the Fox documentation:
- You can copy-paste the script below into a new file named `sjupyter.sbatch`; just change all the variables in `< >` to values you like.

```bash
#!/bin/bash

#SBATCH --job-name=<your_job_name>
#SBATCH --account=ec12                    ## This has to be ec12, which is the code for Fox
#SBATCH --time=6:00:00                    ## Change this to the time you want
#SBATCH --mem-per-cpu=8G
#SBATCH --ntasks=1
#SBATCH --output=<your_output_file_path>  ## example: /fp/homes01/u01/<your_username>/sjupyter.log
#SBATCH --partition=accel                 ## read more about different partitions in the Fox documentation
#SBATCH --gpus=1

## Set up job environment:
set -o errexit  # Exit the script on any error
set -o nounset  # Treat any unset variables as an error
module --quiet purge

## Load conda
module load Miniconda3/4.9.2

## Set ${PS1} (needed when sourcing the conda environment)
export PS1=\$

## Source the conda environment setup.
## The variable ${EBROOTANACONDA3} or ${EBROOTMINICONDA3} comes with the
## module load command, so use the line that matches the module you loaded.
source ${EBROOTMINICONDA3}/etc/profile.d/conda.sh

## Deactivate any spill-over environment from the login node
conda deactivate &>/dev/null

## Create a conda env in the job's /localscratch and install packages.
## Note that the environment will be deleted when the job is done;
## /localscratch only exists while the job is running.
## (Slurm exports the job ID as ${SLURM_JOB_ID}.)
conda init bash
conda clean --all --yes --quiet
conda create -q -y -p /localscratch/$SLURM_JOB_ID/conda/env/base python=3.10
source activate /localscratch/$SLURM_JOB_ID/conda/env/base

## Notice the --download-only flag for conda install and the --no-cache-dir
## flag for pip install: they are very important so that your storage space
## at ~/ won't explode.
conda install -q -y -c conda-forge nb_conda_kernels --download-only
yes | pip install jupyterlab --no-cache-dir
yes | pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
## Add any package you want to install here
yes | pip install matplotlib librosa tqdm lmdb==1.4.0 --no-cache-dir

# Set this to empty, otherwise Jupyter raises a permission error
export XDG_RUNTIME_DIR=""

# Start the Jupyter Lab server
jupyter lab --ip=0.0.0.0 --port=8080
```
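- If you want to check which partitions exist and how busy their nodes are before setting `--partition`, the standard Slurm `sinfo` command can be run on a login node. This is generic Slurm usage, not Fox-specific:

```bash
# List all partitions and their node states (run on a login node)
$ sinfo
# Show only the accel partition
$ sinfo -p accel
```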
Step 2: Submit the job and wait for it to start
- Make sure you are on a login node; your terminal prompt should look something like this:

```
[ec-<username>@login-3 ~]$
```
- Now we can submit a job using our file:

```bash
$ sbatch sjupyter.sbatch
# You'll get something like this:
# Submitted batch job 200658
```
- Now that the job has been submitted, we can check the queue and our job's status by running `squeue`:

```bash
$ squeue
## Output will look like this:
#  JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
# 200453     accel job_fit. ec-xxxxx PD  0:00     1 (Resources)
# 200610    normal norbert3 ec-yyyyy  R 41:01     1 c1-29

# Filter the output by specifying your username:
$ squeue -u ec-<your_username>
## Output will look like this:
#  JOBID PARTITION     NAME               USER ST  TIME NODES NODELIST(REASON)
# 200658     accel job_fit. ec-<your_username> PD  0:00     1 (Resources)
```
- Now we need to wait in the queue until the `ST` column becomes `R`, which means the job is running.
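- If you misconfigured something and want to resubmit, or just want more detail while you wait, the standard Slurm commands `scancel` and `scontrol` are available (the job ID `200658` is the example from above):

```bash
# Cancel a queued or running job by its job ID
$ scancel 200658
# Inspect a job's full details (requested resources, state, assigned node)
$ scontrol show job 200658
```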
Step 3: Connect to the Jupyter server from your local computer
- Once the job starts running, the output will be written to the file we assigned in the sbatch file with `#SBATCH --output=<your_output_file_path>`.
- Open the file and check whether any errors show up. If not, the Jupyter server will start after installing all the packages and print something like this:
```
[I 2023-04-21 05:09:11.659 ServerApp] Serving notebooks from local directory: /fp/homes01/u01/ec-<your_username>
[I 2023-04-21 05:09:11.659 ServerApp] Jupyter Server 2.5.0 is running at:
[I 2023-04-21 05:09:11.659 ServerApp] http://gpu-1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
[I 2023-04-21 05:09:11.659 ServerApp]     http://127.0.0.1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
[I 2023-04-21 05:09:11.659 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2023-04-21 05:09:11.666 ServerApp] No web browser found: Error('could not locate runnable browser').
[C 2023-04-21 05:09:11.666 ServerApp]
    To access the server, open this file in a browser:
        file:///fp/homes01/u01/ec-<your_username>/.local/share/jupyter/runtime/jpserver-280430-open.html
    Or copy and paste one of these URLs:
        http://gpu-1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
        http://127.0.0.1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
```
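- Installing the packages can take several minutes; to follow the log while it grows, you can use `tail -f` (the path shown is the example output path from Step 1):

```bash
# Follow the job's log file as new lines are appended (Ctrl-C to stop watching)
$ tail -f /fp/homes01/u01/<your_username>/sjupyter.log
```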
- Notice the machine `gpu-1`, the port `8080`, and the token.
- Now open another terminal on your local machine and run this:
```bash
# change all the variables with <> to your values
$ ssh -t -t ec-<your_username>@fox.educloud.no -L 6060:localhost:6061 ssh <machine_name> -L 6061:localhost:<jupyter_port>
# here you need your 2FA and password again
# if it succeeds, the terminal will now show your Fox username and the machine name:
# [ec-<your_username>@gpu-1 ~]$
```
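- This command chains two tunnels: local port `6060` forwards to port `6061` on the Fox login node, which in turn forwards to the Jupyter port on the compute node. With the values from the sample log above (machine `gpu-1`, port `8080`), the filled-in command would be:

```bash
$ ssh -t -t ec-<your_username>@fox.educloud.no -L 6060:localhost:6061 ssh gpu-1 -L 6061:localhost:8080
```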
- Now, open a browser and go to `localhost:6060`, and you'll see the Jupyter server!
- The first time, it asks for the token, which is the sequence of characters after `token=` in the output file.
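- For example, with the token from the sample log above (yours will differ), you can paste the full URL instead of entering the token manually:

```
http://localhost:6060/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
```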
Step 4: Mount the felles drive
```bash
# connect to the felles drive so you can copy datasets to the node's /localscratch
mkdir ~/felles
sshfs -p 22 <your_uio_username>@login.uio.no:/net/hypatia/uio/fs01/lh-div-ritmo ~/felles
# now you need the 2FA and password of your UiO account, NOT THE EDUCLOUD FOX ACCOUNT
```
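- Once mounted, copying a dataset into the job's scratch area is an ordinary `cp`. This sketch assumes you run the commands inside the running job (e.g., from a terminal inside Jupyter Lab), since a FUSE mount is only visible on the node where it was created; `<path_to_your_dataset>` is a placeholder:

```bash
# copy a dataset from the mounted felles drive to the job's local scratch
cp -r ~/felles/<path_to_your_dataset> /localscratch/$SLURM_JOB_ID/
```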
- If you want to unmount felles, you can use `fusermount -u`:

```bash
## unmount felles
fusermount -u ~/felles
rmdir ~/felles
```