# Getting Started
A one-page intro to UMD Engineering's research computing environment. Read this first; the rest of the guide goes deeper on each topic.
## What this is
A research-computing environment where each participating lab owns its own GPU nodes (typically 1–5 per lab). All labs share one portal and one scheduler, but your jobs only ever land on your lab's nodes — and the only people in your queue are your own labmates. There is no campus-wide pool you're competing against.
Two ways in:
- A web portal at https://hpc.eng.umd.edu with full Linux desktops in your browser and JupyterLab notebooks.
- SSH for terminal-only workflows (batch job submission with `sbatch`, queue checks, log inspection). For VS Code / Cursor, use the VS Code (code-server) OOD app — see below.
A scheduler called Slurm tracks who in your lab is using which CPUs, memory, and GPUs and gives you an isolated allocation when you ask for one. Your lab's data stays on a NAS that mounts into every session — symlinks appear in your home directory at session start. When your session ends, the node goes back into your lab's pool.
## Before your first login
You need:
- A UMD Directory ID (the one you use for Elm, Testudo, etc.) and your AD password. For most people the AD password matches your UMD password — try that first.
- Membership in your lab's Active Directory group. Your PI or lab admin adds you; once they do, group membership syncs to Slurm automatically (within ~5 minutes) and your Linux account is created on first login. If you can't log in, this is almost always the cause — ask your PI before asking IT.
- A modern browser. Current Chrome, Firefox, Safari, or Edge.
- A UMD network connection or the UMD VPN. The portal isn't reachable from the open internet.
## The one rule

Run all compute jobs through Slurm. Do not run training, simulations, or other heavy work as a plain `python …`, `matlab …`, or `bash run.sh` outside an allocation. Slurm tracks your CPU, memory, and GPU use; work run outside it steps on whoever Slurm thinks owns the node, gets killed without warning, and doesn't appear in usage reports.
In practice this means:
- In a desktop or Jupyter session — you're already inside a Slurm allocation; run things normally.
- In a terminal SSH'd to `hpc.eng.umd.edu` — that host is the scheduler / submit host. Use it for `sbatch`, `squeue`, `sacct`, and light editing. Don't run training on it. For an interactive GPU shell, use `srun --pty bash -l` from there; Slurm will land you on a node inside an allocation, as in the sketch below.
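A minimal interactive request might look like this (the partition name and resource sizes are placeholders; match them to your lab and workload):

```bash
# Interactive shell on one of your lab's GPU nodes; adjust the partition,
# GPU count, CPUs, memory, and walltime to your needs
srun --partition=<your-lab> --gres=gpu:1 --cpus-per-task=4 --mem=8G \
     --time=02:00:00 --pty bash -l
```

When the shell exits, the allocation is released back to your lab's pool.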
## Four ways to use the system

### 1. Browser desktop (most users start here)
Go to https://hpc.eng.umd.edu, sign in with UMD SSO. From Interactive Apps, pick Lab Desktop (or Lab Desktop Advanced if you want to override the resource defaults). Choose a profile from the drop-down — profiles pin to specific nodes in your lab (e.g. Lincheng Research Desktop, Inspire Turing Desktop, Inspire Searle Desktop). Optionally enter your AD password if you'll need lab-storage access. Click Launch.
After ~30 seconds you have a full Xfce Linux desktop in your browser with GPU acceleration and your lab's software ready under Applications → Research Software.
Best for: graphical apps (MATLAB, ANSYS, COMSOL, FSLeyes, FreeView), 3D visualisation, anything you'd normally double-click.
Detail: launching-a-desktop.md.
### 2. JupyterLab in the browser
Same portal. From Interactive Apps, pick JupyterLab instead of the desktop. The form asks for a profile (same drop-down as the desktop apps — pin to a specific lab node), resource sizing, and a modules field which defaults to `jupyter-gpu/2026a` — the curated stack with PyTorch, TensorFlow, CuPy, and scientific Python.
Best for: Python notebooks, ML training with GPU, prototyping. If the default jupyter-gpu env is missing a package, you can `pip install --user` into it for the session, or build your own conda env on lab storage and select it as the kernel — a sketch of that setup follows; see python-and-conda.md for the full story.
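A minimal sketch of the conda-on-lab-storage route, assuming a share at `/mnt/lab-research` and an env named `myenv` (both placeholders):

```bash
module load miniconda3
conda create --prefix /mnt/lab-research/envs/myenv python=3.11 -y
conda activate /mnt/lab-research/envs/myenv
pip install ipykernel                 # plus whatever your project needs
# Register the env as a Jupyter kernel. The kernel spec lands in your
# (node-local) home, so re-run this step if you switch profiles.
python -m ipykernel install --user --name myenv --display-name "myenv (lab)"
```

The new kernel then appears in JupyterLab's launcher and kernel picker.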
### 3. VS Code (code-server) in the browser
From Interactive Apps, pick VS Code (code-server). Pick the lab profile (your lab's node), resources, and a workdir. Submit. The session card gives you a Connect to VS Code link — full VS Code in your browser, running on your lab's GPU node, with /opt/sw modules and your lab share already mounted. Use this rather than pointing desktop VS Code Remote-SSH at the cluster (which lands on the Slurm controller and hurts everyone).
### 4. Direct SSH (advanced — batch ops only)
For writing/submitting sbatch scripts, checking `squeue`/`sacct`, and inspecting logs on lab storage. Your AD password works out of the box; Kerberos works if you `kinit <user>@AD.UMD.EDU` first. All compute must go through Slurm (`sbatch` or `srun --pty`), and nothing IDE-shaped should run here — see direct-ssh.md for the full warning.
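A typical login, with `jdoe` standing in for your Directory ID:

```bash
ssh jdoe@hpc.eng.umd.edu     # AD password works out of the box

# Optional: Kerberos ticket first, then SSH
kinit jdoe@AD.UMD.EDU
ssh jdoe@hpc.eng.umd.edu
```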
Detail: direct-ssh.md.
## Your first desktop session
- Open https://hpc.eng.umd.edu, sign in.
- Interactive Apps → Lab Desktop.
- Pick a Desktop profile (e.g. Lincheng Research Desktop or Inspire Turing Desktop) — this chooses which lab node your session lands on. Pick a resolution. Enter your AD password if you want lab-storage access this session (otherwise leave it blank). Click Launch.
- Wait for the card to go from Queued → Starting → Running, then click Launch NoVNC in a new tab.
- Inside the desktop, open a terminal (Applications → Terminal Emulator). Try:
```bash
ls /mnt/lab-*        # your lab's storage
module avail         # software available
module load matlab   # load MATLAB onto your PATH
matlab &             # launches the GUI
```
- Save your work to your lab share, not `~`. Your home directory is local to the node your profile pinned this session to. Pick the same profile next time and you'll see the same home; pick a different one (or run a CLI job on a different node) and you won't. Lab storage is mounted identically everywhere and persists.
- When done, go back to the OOD tab and click Delete on the session card to release the node.
## Your first Jupyter session
- https://hpc.eng.umd.edu → Interactive Apps → JupyterLab.
- Pick a profile (one of your lab's nodes), walltime, and GPU count (the form pre-fills the profile's default — usually 1 GPU). Leave the modules field at `jupyter-gpu/2026a`. Click Launch.
- When it's running, click Connect to Jupyter.
- New notebook → the curated stack is already active; a quick GPU check you can paste into the first cell follows this list.
- Save notebooks to your lab share (look in `~` for the symlinks), not `~` itself.
- End the session from the OOD tab when done.
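A first-cell sanity check, assuming the default `jupyter-gpu/2026a` stack (which ships PyTorch):

```python
# Confirm the curated stack sees the node's GPU
import torch

print(torch.__version__)                          # stack's PyTorch version
print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0))   # your allocated GPU
```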
## Your first Slurm CLI job
For long-running work (overnight training, parameter sweeps), batch jobs are the right pattern. From any terminal — desktop session, OOD shell, or `ssh hpc.eng.umd.edu`:
Save this as `~/lab-research/jobs/run.sh` (substitute your share):

```bash
#!/bin/bash
#SBATCH --job-name=hello-gpu
#SBATCH --partition=<your-lab>   # e.g. lincheng or inspire
#SBATCH --time=00:10:00          # 10 minutes
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --gres=gpu:1
#SBATCH --output=/mnt/lab-research/jobs/logs/%x-%j.out

module load miniconda3
conda activate jupyter-gpu
python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"
```
Then:

```bash
sbatch run.sh                                     # submit
squeue -u $USER                                   # see your jobs
cat /mnt/lab-research/jobs/logs/hello-gpu-*.out   # output
```
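A few standard Slurm commands for keeping an eye on a job (the job ID is printed by `sbatch`; the log path matches the script above):

```bash
tail -f /mnt/lab-research/jobs/logs/hello-gpu-<jobid>.out   # follow output live
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS        # state and peak memory
scancel <jobid>                                             # cancel if needed
```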
Detail and more useful directives: slurm-cli.md.
## Storage in 30 seconds
| Where | Persists? | Use for |
|---|---|---|
| `/mnt/lab-*` | Yes — forever | All real work: code, data, notebooks, outputs |
| `~` (your home) | No — local to the node, not guaranteed to follow you | Dotfiles only. Nothing important. |
| `/scratch` | No — node-local, shared with everyone on the node, no auto-cleanup | Fast temp space (container scratch, big intermediate files); make a `/scratch/$USER/...` dir and clean it up yourself |
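A hygienic `/scratch` pattern, sketched with illustrative paths (many tools honor `TMPDIR`):

```bash
mkdir -p /scratch/$USER/myrun        # your own subdirectory
export TMPDIR=/scratch/$USER/myrun   # point temp-file-heavy tools here
# ... run your work ...
rm -rf /scratch/$USER/myrun          # nothing cleans this up for you
```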
Full version: your-lab-storage.md.
## Loading software with `module`
All research software lives at `/opt/sw` and is loaded on demand:

```bash
module avail            # what's installed
module avail matlab     # filter
module load matlab      # add to PATH
module load abaqus/2026 # specific version
module list             # what's loaded
module unload matlab    # remove
module purge            # unload everything
```
Inside a desktop session you can also launch most apps from the Applications → Research Software menu — those launchers do the `module load` for you.
Full software list: software.md.
## What can go wrong on day one
- "User does not exist" after signing in — You aren't in your lab's AD group yet. Ask your PI or lab admin to add you; access syncs through automatically within a few minutes. (No manual account provisioning needed.)
- Stuck in "Queued" — Every node in your lab's partition is full. Wait, end old sessions, or pick a smaller resource ask.
- Session starts but your lab share looks empty — You either left the AD password blank or entered it wrong. Relaunch with the correct AD password.
- Session disconnects immediately — A genuine launch error (not the same as "lab share is empty"). Check the session output log linked from the session card.
- Anything else — Screenshot the error, note the time, and email Engineering IT at eit-help@umd.edu.
More: faq.md.
## Maintenance window
Routine maintenance is the third Saturday of every month, 08:00 – 12:00 ET. Nodes reboot during this window and any running session is terminated. OOD shows a banner well in advance. Plan long jobs and save work to your lab share before Saturday morning. See faq.md for the banner timing and what to expect.
## Where to go next
- launching-a-desktop.md — form options, resource sizing, native VNC client, reconnecting.
- your-lab-storage.md — the one most important page. Read it before you put something important in `~` and find your next session can't see it.
- software.md — application inventory, license servers, requesting new software.
- python-and-conda.md — building your own PyTorch / JAX environments.
- slurm-cli.md — batch jobs, arrays, checking cluster availability.
- direct-ssh.md — submit-host SSH for batch ops. (For VS Code / Cursor, use the VS Code (code-server) OOD app instead — Remote-SSH at the submit host is harmful.)
- faq.md — symptom-first troubleshooting.
## Help
- Your lab's IT liaison or PI — first stop for "how does my lab do X" and "please add me to the lab AD group."
- Engineering IT (eit-help@umd.edu) — portal, desktop, storage, software requests, anything broken.