- For detailed documentation, please check the following links:
Step 0: Get Fox access
- To get access to Fox, you need to apply for Educloud membership; guides here:
- After you have an account, use any authenticator app on your phone to register for Educloud 2FA.
- Now you can connect to Fox using this bash command in your terminal:

```bash
# NOTE the Educloud-specific username! Educloud usernames commonly start with 'ec-'.
$ ssh <educloud-username>@fox.educloud.no
```
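- Optionally, to avoid retyping the full host name and username, you can add an entry to the `~/.ssh/config` file on your local machine. This is a minimal sketch; the alias `fox` is an arbitrary name of our choosing, not anything Educloud requires:

```
# ~/.ssh/config on your local machine (the alias 'fox' is arbitrary)
Host fox
    HostName fox.educloud.no
    User ec-<your_educloud_username>
```

After this, `ssh fox` is enough to connect.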
Step 1: Create an sbatch file
- `sbatch` is the command to submit a new job to the Slurm job manager. You can read more about jobs and Slurm in the Fox documentation:
- You can copy-paste the script below into a new file named `sjupyter.sbatch`; just change all the variables in `< >` to values you like.

```bash
#!/bin/bash

#SBATCH --job-name=<your_job_name>
#SBATCH --account=ec12                    ## This has to be ec12, which is the code for Fox
#SBATCH --time=6:00:00                    ## Change this to the time you want
#SBATCH --mem-per-cpu=8G
#SBATCH --ntasks=1
#SBATCH --output=<your_output_file_path>  ## example: /fp/homes01/u01/<your_username>/sjupyter.log
#SBATCH --partition=accel                 ## read more about different partitions in the Fox documentation
#SBATCH --gpus=1

## Set up job environment:
set -o errexit  # Exit the script on any error
set -o nounset  # Treat any unset variables as an error
module --quiet purge

## Load conda
module load Miniconda3/4.9.2

## Set ${PS1} (needed when sourcing the conda environment)
export PS1=\$

## Source the conda environment setup.
## The variable ${EBROOTANACONDA3} or ${EBROOTMINICONDA3} comes with the
## module load command, so use the line that matches the module you loaded.
source ${EBROOTMINICONDA3}/etc/profile.d/conda.sh

## Deactivate any spill-over environment from the login node
conda deactivate &>/dev/null

## Create a conda env in the job's /localscratch and install packages.
## Note that the environment will be deleted when the job is done;
## /localscratch only exists while the job is running.
## (Slurm exports the job ID as ${SLURM_JOB_ID}.)
conda init bash
conda clean --all --yes --quiet
conda create -q -y -p /localscratch/$SLURM_JOB_ID/conda/env/base python=3.10
source activate /localscratch/$SLURM_JOB_ID/conda/env/base

## Notice the --download-only flag for conda install and the --no-cache-dir
## flag for pip install: they are very important so that your storage space
## at ~/ won't explode.
conda install -q -y -c conda-forge nb_conda_kernels --download-only
yes | pip install jupyterlab --no-cache-dir
yes | pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
## Add any package you want to install here
yes | pip install matplotlib librosa tqdm lmdb==1.4.0 --no-cache-dir

# Set this to empty, otherwise Jupyter raises a permission error
export XDG_RUNTIME_DIR=""

# Start the Jupyter Lab server
jupyter lab --ip=0.0.0.0 --port=8080
```
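- If you want to check which partitions exist and how busy their nodes are before setting `--partition`, the standard Slurm `sinfo` command can be run on a login node. This is generic Slurm usage, not Fox-specific:

```bash
# List all partitions and their node states (run on a login node)
$ sinfo
# Show only the accel partition
$ sinfo -p accel
```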
Step 2: Submit the job and wait for it to start
- Make sure you are on a login node; your terminal prompt should look something like this:

```
[ec-<username>@login-3 ~]$
```
- Now we can submit a job using our file:

```bash
$ sbatch sjupyter.sbatch
# You'll get something like this:
# Submitted batch job 200658
```
- Now that the job has been submitted, we can check the queue and our job's status by running `squeue`:

```bash
$ squeue
## Output will look like this:
#  JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
# 200453     accel job_fit. ec-xxxxx PD  0:00     1 (Resources)
# 200610    normal norbert3 ec-yyyyy  R 41:01     1 c1-29

# Filter the output by specifying your username:
$ squeue -u ec-<your_username>
## Output will look like this:
#  JOBID PARTITION     NAME               USER ST  TIME NODES NODELIST(REASON)
# 200658     accel job_fit. ec-<your_username> PD  0:00     1 (Resources)
```
- Now we need to wait in the queue until the `ST` column becomes `R`, which means the job is running.
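- If you misconfigured something and want to resubmit, or just want more detail while you wait, the standard Slurm commands `scancel` and `scontrol` are available (the job ID `200658` is the example from above):

```bash
# Cancel a queued or running job by its job ID
$ scancel 200658
# Inspect a job's full details (requested resources, state, assigned node)
$ scontrol show job 200658
```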
Step 3: Connect to the Jupyter server from your local computer
- Once the job starts running, the output will be written to the file we assigned in the sbatch file with `#SBATCH --output=<your_output_file_path>`.
- Open the file and check whether any errors show up. If not, the Jupyter server will start after installing all the packages and print something like this:
```
[I 2023-04-21 05:09:11.659 ServerApp] Serving notebooks from local directory: /fp/homes01/u01/ec-<your_username>
[I 2023-04-21 05:09:11.659 ServerApp] Jupyter Server 2.5.0 is running at:
[I 2023-04-21 05:09:11.659 ServerApp] http://gpu-1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
[I 2023-04-21 05:09:11.659 ServerApp]     http://127.0.0.1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
[I 2023-04-21 05:09:11.659 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2023-04-21 05:09:11.666 ServerApp] No web browser found: Error('could not locate runnable browser').
[C 2023-04-21 05:09:11.666 ServerApp]
    To access the server, open this file in a browser:
        file:///fp/homes01/u01/ec-<your_username>/.local/share/jupyter/runtime/jpserver-280430-open.html
    Or copy and paste one of these URLs:
        http://gpu-1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
        http://127.0.0.1:8080/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
```
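- Installing the packages can take several minutes; to follow the log while it grows, you can use `tail -f` (the path shown is the example output path from Step 1):

```bash
# Follow the job's log file as new lines are appended (Ctrl-C to stop watching)
$ tail -f /fp/homes01/u01/<your_username>/sjupyter.log
```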
- Notice the machine `gpu-1`, the port `8080`, and the token.
- Now open another terminal on your local machine and run this:
```bash
# change all the variables with <> to your values
$ ssh -t -t ec-<your_username>@fox.educloud.no -L 6060:localhost:6061 ssh <machine_name> -L 6061:localhost:<jupyter_port>
# here you need your 2FA and password again
# if it succeeds, the terminal will now show your Fox username and the machine name:
# [ec-<your_username>@gpu-1 ~]$
```
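- This command chains two tunnels: local port `6060` forwards to port `6061` on the Fox login node, which in turn forwards to the Jupyter port on the compute node. With the values from the sample log above (machine `gpu-1`, port `8080`), the filled-in command would be:

```bash
$ ssh -t -t ec-<your_username>@fox.educloud.no -L 6060:localhost:6061 ssh gpu-1 -L 6061:localhost:8080
```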
- Now, open a browser and go to `localhost:6060`, and you'll see the Jupyter server!
- The first time, it asks for the token, which is the sequence of characters after `token=` in the output file.
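- For example, with the token from the sample log above (yours will differ), you can paste the full URL instead of entering the token manually:

```
http://localhost:6060/lab?token=40f6d00286cff03ace3b500adf24c4501af984bcaeaaa9de
```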
Step 4: Mount the felles drive
```bash
# connect to the felles drive so you can copy datasets to the node's /localscratch
mkdir ~/felles
sshfs -p 22 <your_uio_username>@login.uio.no:/net/hypatia/uio/fs01/lh-div-ritmo ~/felles
# now you need the 2FA and password of your UiO account, NOT THE EDUCLOUD FOX ACCOUNT
```
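- Once mounted, copying a dataset into the job's scratch area is an ordinary `cp`. This sketch assumes you run the commands inside the running job (e.g., from a terminal inside Jupyter Lab), since a FUSE mount is only visible on the node where it was created; `<path_to_your_dataset>` is a placeholder:

```bash
# copy a dataset from the mounted felles drive to the job's local scratch
cp -r ~/felles/<path_to_your_dataset> /localscratch/$SLURM_JOB_ID/
```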
- If you want to unmount felles, you can use `fusermount -u`:

```bash
## unmount felles
fusermount -u ~/felles
rmdir ~/felles
```