Remote development on HPC clusters (e.g. Yale’s) with VSCode/Cursor
TL;DR
When using Remote-SSH or a similar tool, you want to start your VSCode server on a compute node. Yale’s cluster, for example, automatically kills VSCode instances on the login node. You can get around this by setting ProxyCommand in your ssh config to ssh twice (first to the login node, then to the compute node) so the server starts there directly.
See the solution as well as the extra step for VSCode.
Remote Tunnels is also a good workaround, but it involves extra steps and doesn’t work if you’re using Cursor, because Microsoft blocks the extension there.
The Issue
The issue with Remote-SSH (apparently) is that the VSCode server can be quite a demanding process, so when you’re using an HPC cluster you should avoid starting it on the login node. Some places (e.g. Brown) have HPC staff set up dedicated VSCode nodes and the associated configs, but other places (looking at you, Yale) decide that it’s better to just kill all VSCode processes automatically and suggest that people use alternatives.
If you use VSCode, the best way is probably to use Remote Tunnels, which requires starting a code CLI instance on the compute node. In this case, instead of an ssh connection, both your local client and the remote server talk to Microsoft, which establishes a tunnel for you that is authenticated with your Microsoft/GitHub account (a rough sketch of the workflow follows the list below). But this has a few problems:
- It’s just a lot of hassle. The steps are:
  - ssh into the login node
  - start a script
  - watch the output of that script, which gives you a code to verify your account with Microsoft
  - open a browser page on your local computer and paste in that code
- Does not work with Cursor — Microsoft blocked Cursor from using its official extensions, and Cursor’s replacement doesn’t include remote tunnels yet
  - I somehow managed to install the already-blocked Remote Tunnels extension on Cursor on my Mac, but I can’t do it anymore on my Windows machine.
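For reference, here is roughly what that workflow looks like. This is only a sketch: it assumes the standalone code CLI has already been downloaded to ~/bin on the cluster, and the salloc/srun flags are placeholders you’d adapt to your cluster.

  # 1. log in to the login node (Duo prompt happens here)
  ssh <your-netid>@grace.ycrc.yale.edu

  # 2. request a compute node and launch the tunnel on it
  salloc --nodes=1 --time=4:00:00 --job-name=vscode-tunnel \
      srun --pty ~/bin/code tunnel

  # 3. the CLI prints a URL and a device code; open the URL on your local
  #    machine, paste the code, and sign in with your Microsoft/GitHub account
  # 4. connect from your local VSCode via the Remote Tunnels extension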
I spent a lot of time wrestling with this and wanted an easier solution: ideally something that’s as simple as regular Remote-SSH, which only has 1 step: open the window on VSCode.
The Solution
Pre-Requisites
You should be able to ssh into the login node of your cluster. At Yale, this requires you to have set up ssh keypairs and the appropriate ssh config; there is also an MFA step via Duo.
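If you haven’t done that yet, the keypair half is the usual routine below; how you register the public key varies by cluster (Yale’s YCRC, for instance, has you upload it through a web form rather than ssh-copy-id), so treat this as a sketch.

  # generate a keypair (the file name is just a convention)
  ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

  # this is the public key you register with the cluster,
  # via its web portal or ssh-copy-id, depending on the site
  cat ~/.ssh/id_ed25519.pub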
Background
The idea is simple, but automating it takes a little more work. A typical HPC cluster puts a login node in front of the compute nodes: you ssh into the login node, and the compute nodes sit behind it.
Assuming you know which compute node you want to end up on, you’d set up an ssh config that looks like this:
Host grace
  HostName grace.ycrc.yale.edu
  User <your-netid>

Host grace-remote-ssh
  User <your-netid>
  HostName compute-0001
  ProxyJump grace
and open a Remote-SSH window to connect to grace-remote-ssh.
This works because ssh compute-0001 from the login node will take you to the compute node, and we told ssh to go through grace first. The compute nodes are usually only reachable via ssh from the login node, and SLURM usually restricts ssh access to the nodes currently under your allocation. The biggest hurdle to automation is that nodes are only available after you request them, and the node name changes depending on what’s free, so you don’t know which node to put in your config.
UW’s recommendation is to use a script to replace your local config file. But that also seems like a lot of work. The steps would be:
- SSH into the cluster and start a job (with a particular name)
- Run your local script, which SSH’es into the cluster again, finds the node running that job, and copies the node name back into your local config
- Remote SSH into the compute node
- When you’re done, cancel your job request manually
Sure, you can put steps 1 and 2 into one script (a sketch of what step 2 might look like is below), but that’s still 3 steps.
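For illustration, the config-rewriting part could look something like this. It’s a hypothetical sketch: it assumes a job named vscode is already running on the cluster (step 1 of the list), and that your ~/.ssh/config has a grace-remote-ssh entry whose HostName line it can overwrite.

  #!/usr/bin/env bash
  set -euo pipefail

  # Ask SLURM (via the login node) which node the "vscode" job landed on.
  node=$(ssh grace 'squeue --me --name=vscode --states=RUNNING -h -o %N')

  # Rewrite the HostName line inside the grace-remote-ssh block.
  sed -i.bak "/^Host grace-remote-ssh\$/,/^\$/ s/^\( *HostName \).*/\1${node}/" ~/.ssh/config

  echo "grace-remote-ssh now points at ${node}"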
The 1-step Solution
Now, instead of manually allocating and then connecting, you can bundle those two actions into one SSH invocation. VSCode will:
- SSH to the login node
- Invoke salloc to grab a compute node
- nc-pipe that node’s SSH port back over the same connection
- Land you directly on the compute node
Simply add this host entry to your ~/.ssh/config:
# This is your login node, it could be any other thing/name
Host grace
  HostName grace.ycrc.yale.edu
  User <your-netid>

Host ycrc-ondemand
  User <your-netid>
  ProxyCommand ssh grace "bash -lc 'salloc --nodes=1 --partition=devel --time=4:00:00 --job-name=vscode /bin/bash -c \"nc \$SLURM_NODELIST 22\"'"
  ForwardAgent yes
- ssh grace opens the login-node session and prompts you for Duo; once you approve the push, bash -lc 'salloc …' runs in a login shell so salloc (and any module-provided SLURM tools) are available on PATH. You can change the specs of this allocation just like any other salloc command.
- As soon as SLURM grants your job, $SLURM_NODELIST¹ expands to the real compute-node hostname.
- nc $SLURM_NODELIST 22 pipes that node’s port 22 back through the login host, completing the SSH tunnel to the compute node.
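Before pointing VSCode at it, you can sanity-check the entry from a plain terminal. Assuming the config above, you should see the Duo prompt and then the name of a compute node, not the login node:

  # one-off test of the proxied host entry: expect the Duo prompt from the
  # inner `ssh grace`, then the allocated compute node's hostname
  ssh ycrc-ondemand hostname
  # if it hangs, add -v to see which hop is failing:
  # ssh -v ycrc-ondemand hostname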
Once this is in place, your only step is:
ssh ycrc-ondemand
or, in VSCode’s Remote-SSH panel, select ycrc-ondemand, and you’ll land straight on your allocated compute node. No extra scripts, no manual edits, and no VSCode processes on the login node.
The Caveat: MFA
Yale’s cluster requires MFA on every login. It’s done from an interactive terminal like this:
(<your-netid>@grace.ycrc.yale.edu) Duo two-factor login for <your-netid>
Enter a passcode or select one of the following options:
1. Duo Push to XXX-XXX-XXXX
2. Phone call to XXX-XXX-XXX
At which point you need to type 1 and hit Enter. On Cursor this is a non-issue because the default Remote-SSH behavior is to loop this back into an interactive prompt, but in VSCode the default behavior is to stream it to the Output panel. So there’s an extra step:
- Open Settings
- Search for Remote-SSH: Show Login Terminal and set it to true:
  "remote.SSH.showLoginTerminal": true
Once enabled, VSCode will open a new terminal pane when you connect; type 1, press Enter, then approve the push on your device.
What’s also great about this approach is that once you close your client, the remote will also know (since it’s interactive) and will automatically relinquish the job allocation.
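If you want to double-check that, a quick look from the login node (using the vscode job name from the config above) should show the allocation disappear shortly after you close the window:

  # list your remaining allocations; an empty table means the vscode job
  # was released when the connection dropped
  ssh grace 'squeue --me --name=vscode'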
There’s more?
I also attempted to write a much more complicated script that re-allocates a new session when the current job is close to ending. That part is not hard, but the harder part is maintaining the same connection and knowing when the client has disconnected. I think keeping the same connection would require a custom reverse proxy that’s always on the same port, but I couldn’t get this to work. You should tell me if you manage to do this!
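For what it’s worth, the easy half could be sketched roughly like this, run on the login node. The job names, threshold, and salloc flags are all made up, and it deliberately ignores the hard part (handing the live connection over to the new node).

  #!/usr/bin/env bash
  # Watch the running "vscode" job and request a replacement allocation
  # once less than ~10 minutes remain. Sketch only.
  while true; do
      left=$(squeue --me --name=vscode --states=RUNNING -h -o %L)  # e.g. 1:23:45, or MM:SS under an hour
      if [[ "$left" =~ ^0?[0-9]:[0-9]{2}$ ]]; then                 # under ten minutes left
          salloc --nodes=1 --partition=devel --time=4:00:00 --job-name=vscode-next --no-shell
          break
      fi
      sleep 60
  done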
Footnotes
This gives you the node list of the current job from the job allocation itself. E.g. if you requested an interactive job and got node001, it’ll give node001 within that interactive terminal.↩︎