Remote development on HPC clusters (e.g. Yale’s) with VSCode/Cursor
TL;DR
When using Remote-SSH or a similar tool, you want to start your VSCode server on a compute node. Yale’s cluster, for example, automatically kills VSCode instances on the login node. You can get around this by setting ProxyCommand in your ssh config to ssh twice (first to the login node, then to the compute node) so the server starts there directly.
See the solution as well as the extra step for VSCode.
Remote Tunnels is also a good workaround, but it involves extra steps and doesn’t work if you’re using Cursor, because Microsoft blocks the extension there.
The Issue
The issue with Remote-SSH (apparently) is that the VSCode server can be quite a demanding process, so when you’re using an HPC cluster you should avoid starting it on the login node. Some places (e.g. Brown) have HPC staff set up dedicated VSCode nodes and the associated configs, but other places (looking at you, Yale) decide that it’s better to just kill all VSCode processes automatically and suggest that people use alternatives.
If you use VSCode, the best way is probably to use Remote Tunnels, which requires starting a code CLI instance on the compute node. In this case, instead of an ssh connection, both your local client and the remote server talk to Microsoft, which establishes a tunnel for you that is authenticated with your Microsoft/GitHub account (a rough sketch of the workflow follows the list below). But this has a few problems:
- It’s just a lot of hassle. The steps are:
  - ssh into the login node
  - start a script
  - watch the output of that script, which gives you a code to verify your account with Microsoft
  - open a browser page on your local computer and paste in that code
- Does not work with Cursor — Microsoft blocked Cursor from using its official extensions, and Cursor’s replacement doesn’t include remote tunnels yet
  - I somehow managed to install the already-blocked Remote Tunnels extension on Cursor on my Mac, but I can’t do it anymore on my Windows machine.
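For reference, here is roughly what that workflow looks like. This is only a sketch: it assumes the standalone code CLI has already been downloaded to ~/bin on the cluster, and the salloc/srun flags are placeholders you’d adapt to your cluster.

  # 1. log in to the login node (Duo prompt happens here)
  ssh <your-netid>@grace.ycrc.yale.edu

  # 2. request a compute node and launch the tunnel on it
  salloc --nodes=1 --time=4:00:00 --job-name=vscode-tunnel \
      srun --pty ~/bin/code tunnel

  # 3. the CLI prints a URL and a device code; open the URL on your local
  #    machine, paste the code, and sign in with your Microsoft/GitHub account
  # 4. connect from your local VSCode via the Remote Tunnels extension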
I spent a lot of time wrestling with this and wanted an easier solution: ideally something that’s as simple as regular Remote-SSH, which only has 1 step: open the window on VSCode.
The Solution
Pre-Requisites
You should be able to ssh into the login node of your cluster. At Yale, this requires you to have set up ssh keypairs and the appropriate ssh config; there is also an MFA step via Duo.
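If you haven’t done that yet, the keypair half is the usual routine below; how you register the public key varies by cluster (Yale’s YCRC, for instance, has you upload it through a web form rather than ssh-copy-id), so treat this as a sketch.

  # generate a keypair (the file name is just a convention)
  ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

  # this is the public key you register with the cluster,
  # via its web portal or ssh-copy-id, depending on the site
  cat ~/.ssh/id_ed25519.pub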
Background
The idea is simple, but automating it takes a little more work. A typical HPC cluster puts a login node in front of the compute nodes: you ssh into the login node, and the compute nodes sit behind it.
Assuming you know which compute node you want to end up on, you’d set up an ssh config that looks like this:
Host grace
  HostName grace.ycrc.yale.edu
  User <your-netid>

Host grace-remote-ssh
  User <your-netid>
  HostName compute-0001
  ProxyJump grace
and open a Remote-SSH window to connect to grace-remote-ssh.
This works because ssh compute-0001 from the login node will take you to the compute node, and we told ssh to go through grace first. The compute nodes are usually only reachable via ssh from the login node, and SLURM usually restricts ssh access to the nodes currently under your allocation. The biggest hurdle to automation is that nodes are only available after you request them, and the node name changes depending on what’s free, so you don’t know which node to put in your config.
UW’s recommendation is to use a script to replace your local config file. But that also seems like a lot of work. The steps would be:
- SSH into the cluster and start a job (with a particular name)
- Run your local script, which SSH’es into the cluster again, finds the node running that job, and copies the node name back into your local config
- Remote SSH into the compute node
- When you’re done, cancel your job request manually
Sure, you can put steps 1 and 2 into one script (a sketch of what step 2 might look like is below), but that’s still 3 steps.
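For illustration, the config-rewriting part could look something like this. It’s a hypothetical sketch: it assumes a job named vscode is already running on the cluster (step 1 of the list), and that your ~/.ssh/config has a grace-remote-ssh entry whose HostName line it can overwrite.

  #!/usr/bin/env bash
  set -euo pipefail

  # Ask SLURM (via the login node) which node the "vscode" job landed on.
  node=$(ssh grace 'squeue --me --name=vscode --states=RUNNING -h -o %N')

  # Rewrite the HostName line inside the grace-remote-ssh block.
  sed -i.bak "/^Host grace-remote-ssh\$/,/^\$/ s/^\( *HostName \).*/\1${node}/" ~/.ssh/config

  echo "grace-remote-ssh now points at ${node}"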
The 1-step Solution
Now, instead of manually allocating and then connecting, you can bundle those two actions into one SSH invocation. VSCode will:
- SSH to the login node
- Invoke salloc to grab a compute node
- nc-pipe that node’s SSH port back over the same connection
- Land you directly on the compute node
Simply add this host entry to your ~/.ssh/config:
# This is your login node, it could be any other thing/name
Host grace
  HostName grace.ycrc.yale.edu
  User <your-netid>

Host ycrc-ondemand
  User <your-netid>
  ProxyCommand ssh grace "bash -lc 'salloc --nodes=1 --partition=devel --time=4:00:00 --job-name=vscode /bin/bash -c \"nc \$SLURM_NODELIST 22\"'"
  ForwardAgent yes
- ssh grace opens the login-node session and prompts you for Duo; once you approve the push, bash -lc 'salloc …' runs in a login shell so salloc (and any module-provided SLURM tools) are available on PATH. You can change the specs of this allocation just like any other salloc command.
- As soon as SLURM grants your job, $SLURM_NODELIST¹ expands to the real compute-node hostname.
- nc $SLURM_NODELIST 22 pipes that node’s port 22 back through the login host, completing the SSH tunnel to the compute node.
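Before pointing VSCode at it, you can sanity-check the entry from a plain terminal. Assuming the config above, you should see the Duo prompt and then the name of a compute node, not the login node:

  # one-off test of the proxied host entry: expect the Duo prompt from the
  # inner `ssh grace`, then the allocated compute node's hostname
  ssh ycrc-ondemand hostname
  # if it hangs, add -v to see which hop is failing:
  # ssh -v ycrc-ondemand hostname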
Once this is in place, your only step is:
ssh ycrc-ondemand
or, in VSCode’s Remote-SSH panel, select ycrc-ondemand, and you’ll land straight on your allocated compute node. No extra scripts, no manual edits, and no VSCode processes on the login node.
The Caveat: MFA
Yale’s cluster requires MFA on every login. It’s done from an interactive terminal like this:
(<your-netid>@grace.ycrc.yale.edu) Duo two-factor login for <your-netid>
Enter a passcode or select one of the following options:
1. Duo Push to XXX-XXX-XXXX
2. Phone call to XXX-XXX-XXX
At which point you need to type 1 and hit Enter. On Cursor this is a non-issue because the default Remote-SSH behavior is to loop this back into an interactive prompt, but in VSCode the default behavior is to stream it to the Output panel. So there’s an extra step:
- Open Settings
- Search for Remote-SSH: Show Login Terminal and set it to true:
  "remote.SSH.showLoginTerminal": true
Once enabled, VSCode will open a new terminal pane when you connect; type 1, press Enter, then approve the push on your device.
What’s also great about this approach is that once you close your client, the remote will also know (since it’s interactive) and will automatically relinquish the job allocation.
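If you want to double-check that, a quick look from the login node (using the vscode job name from the config above) should show the allocation disappear shortly after you close the window:

  # list your remaining allocations; an empty table means the vscode job
  # was released when the connection dropped
  ssh grace 'squeue --me --name=vscode'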
There’s more?
I also attempted to write a much more complicated script that re-allocates a new session when the current job is close to ending. That part is not hard, but the harder part is maintaining the same connection and knowing when the client has disconnected. I think keeping the same connection would require a custom reverse proxy that’s always on the same port, but I couldn’t get this to work. You should tell me if you manage to do this!
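For what it’s worth, the easy half could be sketched roughly like this, run on the login node. The job names, threshold, and salloc flags are all made up, and it deliberately ignores the hard part (handing the live connection over to the new node).

  #!/usr/bin/env bash
  # Watch the running "vscode" job and request a replacement allocation
  # once less than ~10 minutes remain. Sketch only.
  while true; do
      left=$(squeue --me --name=vscode --states=RUNNING -h -o %L)  # e.g. 1:23:45, or MM:SS under an hour
      if [[ "$left" =~ ^0?[0-9]:[0-9]{2}$ ]]; then                 # under ten minutes left
          salloc --nodes=1 --partition=devel --time=4:00:00 --job-name=vscode-next --no-shell
          break
      fi
      sleep 60
  done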
Footnotes
This gives you the node list of the current job from the job allocation itself. E.g. if you requested an interactive job and got node001, it’ll give node001 within that interactive terminal.↩︎