<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Enyan Zhang</title>
<link>https://enyanz.com/posts.html</link>
<atom:link href="https://enyanz.com/posts.xml" rel="self" type="application/rss+xml"/>
<description>Enyan&#39;s corner of the web</description>
<generator>quarto-1.6.40</generator>
<lastBuildDate>Thu, 02 Oct 2025 04:00:00 GMT</lastBuildDate>
<item>
  <title>San Francisco, August 2025</title>
  <dc:creator>Enyan Zhang</dc:creator>
  <link>https://enyanz.com/posts/sf-2508/</link>
  <description><![CDATA[ 





<!-- For the title picture -->
<style>
    .quarto-title-banner {
        aspect-ratio: 3/2;
    }
</style>
<p>A mini photo dump from my August trip to San Francisco, which was my first time on the west coast since 2014.</p>
<p>All photos here were taken with a Pentax SP loaded with Fujifilm 200. I liked the idea of the contrast: capturing the next wave of automation and human technology with a fully mechanical camera.</p>
<div>

</div>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="000061680012.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680012.webp" class="img-fluid" alt="A red-and-gold pagoda-style gate peeks through dense evergreens in a Japanese garden, late-afternoon light raking the foliage."></a></p>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="000061680008.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680008.webp" class="img-fluid" alt="Crowd gathered on a closed park road in bright sun; a child in neon overalls and a red-haired teen watch a street demo."></a></p>
</div>
</div>
</div>
<hr>
<p>I also had my first Waymo experience. I had intentionally avoided watching videos or reading others’ accounts of Waymo since its launch<sup>1</sup>. My most optimistic projection was that the first-time experience would be so smooth and human-level that it would feel underwhelming, as if I were just sitting in a normal human-driven car. That was exactly what the rides felt like.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;Inspired by the story I once heard that Putnam tries to avoid learning about the distinction between Elm and Beech, so he can keep using <a href="https://philosophy-science-humanities-controversies.com/listview-list.php?concept=Elm%2FBeech+Example">this example</a>. Anyways, that’s quite off topic even for a post like this.</p></div></div><p><a href="000061680022.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680022.webp" class="img-fluid" alt="A white Jaguar I-Pace Waymo test vehicle turns onto a residential San Francisco street in warm, low sun."></a></p>
<div>

</div>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="000061680023.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680023.webp" class="img-fluid" alt="View through a windshield down a steep city block; trolleybus wires crisscross above a “Fillmore” street sign."></a></p>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="000061680024.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680024.webp" class="img-fluid" alt="Close view of a Waymo-equipped white Jaguar curbside; a pedestrian passes behind it on a sunny urban corner."></a></p>
</div>
</div>
</div>
<hr>
<p>The other recurring topic during the trip, especially among people who haven’t been in SF for a while, was how the billboards on the drive into SF are exclusively for AI companies. I liked the diversity, though: they ranged from genuinely good ads, to confusing slogans, to outright antisocial ones.</p>
<p>I also thought I should be recording those on film, a dying medium more than a century old.</p>
<p><a href="000061680026.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680026.webp" class="img-fluid" alt="Blue Okta billboard reading “Build and secure AI agents from day one” rises beside trees against a clear sky."></a></p>
<p><a href="000061680025.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680025.webp" class="img-fluid" alt="Highway scene with a “Prompt it. Then push it.” Figma billboard above terraced houses; cars blur past in the foreground."></a></p>
<p>I must say that the cityscape itself is quite dull and disappointing, especially for the technological center of the world. The people and the atmosphere are quite magical, though.</p>
<hr>
<p><a href="000061680029.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/sf-2508/000061680029.webp" class="img-fluid" alt="Airport gate interior in shadow frames a United jet taxiing on the runway beyond large floor-to-ceiling windows."></a></p>
<p>At SFO.</p>




 ]]></description>
  <category>photography</category>
  <category>life</category>
  <guid>https://enyanz.com/posts/sf-2508/</guid>
  <pubDate>Thu, 02 Oct 2025 04:00:00 GMT</pubDate>
  <media:content url="https://enyanz.com/posts/sf-2508/000061680016.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>The Super-Takumar 1:1.4/50mm</title>
  <dc:creator>Enyan Zhang</dc:creator>
  <link>https://enyanz.com/posts/takumar-50-f1.4/</link>
  <description><![CDATA[ 





<!-- For the title picture -->
<style>
    .quarto-title-banner {
        aspect-ratio: 3/2;
    }
</style>
<p>I was at Fujiya’s film section in Nakano. The camera selection wasn’t very exciting. Someone told me that Kitamura might have what I was looking for. After a few turns and a narrow staircase up, I found myself looking at an entire shelf of old cameras and lenses marked as junk.</p>
<p>A whole row of 50 and 55mm Takumars was lying there. Most were in pretty bad condition. One stood out: the aperture blades had oil stains and there was a lot of yellowing<sup>1</sup>, but apart from that it was so clean I could barely see any dust. It was just 4400 yen. How could I say no?</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;Happens a lot to the radioactive Takumars, and supposedly <a href="https://pentax-manuals.com/repairs/yellow.htm">reversible</a>.</p></div></div><p>With a X-T30 II in hand, I asked if they had any M42 to Fujifilm X adapters. You might try your luck at Fujiya’s junk section, the clerk assisting me said. I ran back to Fujiya. Amazingly, it was the only adapter they had in the shop. I ran to Kitamura again, put the adapter on, and tried the lens. It worked amazingly well: yellowing is nothing but a white balance issue for digital cameras. I boarded the train east with the fastest lens I owned thus far: the Super-Takumar 50mm f1.4.</p>
<p><br></p>
<section id="the-lens" class="level2">
<h2 class="anchored" data-anchor-id="the-lens">The Lens</h2>
<p>The lens is certainly one of the more famous Takumars, and you’ll find a lot of reviews online. It’s amazingly built, yet not too heavy at 230g. With an adapter added (mine is 130g) it becomes heavy for an APS-C lens, but the balance with my X-T30 still doesn’t feel off. With focus assist on a modern mirrorless body, manual focusing was surprisingly easy: by the second day I was ambitious enough to start tracking moving objects.</p>
<p>For most models, you can see a number on the back of the manual/auto switch, which controls the automatic stop-down pin for the aperture. Mine says 38701, and is a 7-element version of the Super-Takumar. It is said to be a simplification of the earlier, much rarer 8-element version. The later multi-coated version supposedly has better flare control, as well as an additional protrusion at the mount that communicates with the camera (like the SP F) for open-aperture metering. You can find more info <a href="https://www.pentaxforums.com/lensreviews/SMC-S-M-C-Super-Takumar-50mm-F1.4.html">here</a>.</p>
</section>
<section id="shooting" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="shooting">Shooting</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="DSCF2760-Enhanced-NR.webp" class="lightbox" data-gallery="pictures" title="Look at how much it blooms! It makes for very unique shots in night scenes"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF2760-Enhanced-NR.webp" class="img-fluid figure-img" alt="Aerial night view of Tokyo’s Shibuya Crossing from above, showing mostly cleared crosswalks with a few groups of people waiting. The scene is dominated by vivid illuminated signage on tall buildings, colorful advertisements, and the glow of city lights reflecting off the streets."></a></p>
<figcaption>Look at how much it blooms! It makes for very unique shots in night scenes</figcaption>
</figure>
</div>
<p>I enjoyed the bokeh and the amount of light it lets in at F1.4. But wide open it also softens a lot: your pictures will be fine if there’s not much contrast. With any light source or bright reflections, however, there’ll be a big glowing, dreamy bloom around any bright area.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="DSCF3064.webp" class="lightbox" data-gallery="pictures" title="Shinjuku at night. This one is shot at F/2"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF3064.webp" class="img-fluid figure-img" alt="A wide nighttime cityscape view of Shinjuku, Tokyo, featuring a dense cluster of brightly lit billboards and tall buildings. A yellow-green train crosses an elevated track in the lower part of the image, while cars and crowds are visible on the streets below, creating a lively, layered urban scene."></a></p>
<figcaption>Shinjuku at night. This one is shot at F/2</figcaption>
</figure>
</div>
<p>Stopping down to F2 makes that blooming a lot more controlled. You also start seeing hexagon-shaped bokehs and six-pointed sunstars — the lens has 6 (non-rounded) aperture blades. I liked it a lot!</p>
<p>All of the following pictures are shot at F2.</p>
<p class="page-columns page-full"><a href="DSCF3252.webp" class="lightbox page-columns page-full" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF3252.webp" class="column-screen-inset img-fluid" alt="A crowded platform at Shinjuku train station at night, with passengers lined up along the edge waiting for an arriving E-235 Yamanote Line train. Overhead signs indicate track numbers, and fluorescent lighting reflects off the wet platform floor."></a></p>
<div class="column-screen-inset" style="display: flex; justify-content: space-between; align-items: flex-start; gap: 1rem;">
<p><a href="DSCF3281.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF3281.webp" class="img-fluid" style="flex: 1; height: 26.5vw; object-fit: contain;" alt="A brightly lit yellow and green automated parking payment machine in Japan, captured at night under a light rain. The machine stands on a wet, reflective asphalt surface near a quiet urban street, with Japanese instructions and payment slots clearly visible."></a></p>
<p><a href="DSCF3266.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF3266.webp" class="img-fluid" style="flex: 1; height: 26.5vw; object-fit: contain;" alt="A detailed nighttime close-up of a white and green automatic platform safety gate at Takadanobaba train station in Tokyo. The gate label indicates 'Yamanote Line, Car 2, Door 1;' in Japanese, and beyond it, colorful city lights and signage appear out of focus, producing a vibrant hexagonal bokeh effect that contrasts with the sharp foreground."></a></p>
<p><a href="DSCF2927.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF2927.webp" class="img-fluid" style="flex: 1; height: 26.5vw; object-fit: contain;" alt="A close-up of a handmade ceramic pitcher with a glossy, multicolored abstract glaze in red, yellow, blue, and white on a black base. The pitcher is displayed on a table at a pottery market, with additional ceramic pieces and softly blurred visitors in the background."></a></p>
</div>
<p>At even smaller apertures, the lens seems to be perfectly capable of resolving 24MP<sup>2</sup>: nail the focus, and you’ll get sharp images down to the last pixel. Having shot quite a bit with the XC15-45mm kit lens, this was quite shocking. It’s also a good street photography lens: at F11 and F16 the depth of field is so large that the <a href="https://en.wikipedia.org/wiki/Hyperfocal_distance">hyperfocal distance</a><sup>3</sup> is just a few meters, so it barely needs any focusing for most scenes. If you choose to open up, though, you still get a lot of light for the night and beautiful bokehs.</p>
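<p>(For a rough sense of the numbers: the standard approximation is H ≈ f²/(N·c) + f. With f = 50mm, N = 16, and an assumed APS-C circle of confusion of c ≈ 0.02mm, H ≈ 2500/(16 × 0.02) + 50mm ≈ 7.9m; focus there, and everything from roughly 4m to infinity is acceptably sharp.)</p>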
<div class="no-row-height column-margin column-container"><div id="fn2"><p><sup>2</sup>&nbsp;Well, my Fujifilm has a 24MP sensor, so I can’t tell you much beyond that</p></div><div id="fn3"><p><sup>3</sup>&nbsp;Something I probably will never learn if I didn’t start playing with manual lenses</p></div></div><p><a href="DSCF2584.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF2584.webp" class="img-fluid" alt="A middle-aged woman sits alone on a blue bench at an empty Japanese train station platform, holding a black bag on her lap. Overhead, a digital sign reads 'Train Approach Information' in English and Japanese, while platform safety doors line the edge, and a quiet urban background with tracks extends into the distance."></a></p>
<p><a href="DSCF2603.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF2603.webp" class="img-fluid" alt="An upward perspective view framed between two elevated structural beams or overpasses, revealing a pale blue sky with scattered white clouds. The geometric lines and muted tones emphasize architectural detail and the interplay of light and shadow."></a></p>
<p>The lens certainly has a lot of the vintage lens “character”: I’m still not sure I love the blooming, which is probably what a lot of people describe as “dreamy”. And there are certainly subtle style differences from modern lenses. But the other great thing about “character” is how you interact with it: after using this lens for a while, I learned its quirks, which then became additional dimensions I could tweak in my photography. In terms of how fun it is to shoot, it certainly tops any “modern” lens.</p>
<div>

</div>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="DSCF3902.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF3902.webp" class="img-fluid" alt="A low-angle view down a long, warmly lit indoor hallway with polished floors and wooden benches lining the walls. Framed artwork and colorful fabric hangings decorate the walls, and a glass door at the far end reveals a softly focused exit sign and window light."></a></p>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="DSCF3885.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF3885.webp" class="img-fluid" alt="A close-up of a smiling Hotei (Laughing Buddha) statue adorned with a wooden bead necklace, positioned in front of an ornate wooden clock with Roman numerals. The scene is softly lit, emphasizing the serene expression and aged texture of the statue."></a></p>
</div>
</div>
</div>
<p><a href="DSCF3880.webp" class="lightbox" data-gallery="pictures"><img src="https://enyanz.com/posts/takumar-50-f1.4/DSCF3880.webp" class="img-fluid" alt="A small indoor Zen sand garden featuring a carefully balanced stack of smooth dark stones in the foreground. The background shows a softly blurred window view with green plants and diffused natural light, creating a tranquil, contemplative atmosphere."></a></p>
<p>And, did I mention this? $30 for a beautifully built 50mm F1.4 lens with so much history. Having used this lens for a bit now, I’d be perfectly happy paying multiples of that amount. Turning the focus and aperture rings feels so nice I’d buy it even just as a fidget toy.</p>
</section>
<section id="afterwords-yellowing" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="afterwords-yellowing">Afterwords: Yellowing</h2>
<p>When I initially got the lens, its yellowing made it 1300K warmer compared to a modern lens. After leaving it in direct sunlight<sup>4</sup> for a few days, and then under a UV light I found in the basement, the yellowing has been reduced a lot — I’d say it’s a few hundred K at most, and the lens is completely usable without adjusting white balance. Kitamura definitely jumped the gun marking this as junk. Their loss, my gain.</p>


<div class="no-row-height column-margin column-container"><div id="fn4"><p><sup>4</sup>&nbsp;A lesson learned: remove plastic lens caps, as they will melt when getting focused on by the lens under direct sunlight.</p></div></div></section>


 ]]></description>
  <category>photography</category>
  <category>life</category>
  <guid>https://enyanz.com/posts/takumar-50-f1.4/</guid>
  <pubDate>Tue, 01 Jul 2025 04:00:00 GMT</pubDate>
  <media:content url="https://enyanz.com/posts/takumar-50-f1.4/takumar-on-xt30.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>Remote development on HPC (Yale’s) clusters with VSCode/Cursor</title>
  <dc:creator>Enyan Zhang</dc:creator>
  <link>https://enyanz.com/posts/ycrc-remote-dev/</link>
  <description><![CDATA[ 




<section id="tldr" class="level2">
<h2 class="anchored" data-anchor-id="tldr">TL;DR</h2>
<p>When using Remote-SSH or a similar tool, you want to start your VSCode server on a compute node. Yale’s cluster, for example, automatically kills VSCode instances on the login node. You can get around this by setting <code>ProxyCommand</code> in your ssh config to ssh twice (first to the login node, then to a compute node) so the server starts there directly.</p>
<p><em>See the solution as well as the extra step for VSCode.</em></p>
<p><a href="https://code.visualstudio.com/docs/remote/tunnels">Remote Tunnels</a> is also a good workaround, but it’s an extra step and doesn’t work if you’re using Cursor because it’s blocked by Microsoft.</p>
</section>
<section id="the-issue" class="level2">
<h2 class="anchored" data-anchor-id="the-issue">The Issue</h2>
<p>The issue with Remote-SSH (apparently) is that the VSCode server can be quite a demanding process, so when you’re using an HPC you should avoid starting it on the login node. Some places (e.g.&nbsp;<a href="https://docs.ccv.brown.edu/oscar/connecting-to-oscar/remote-ide">Brown</a>) have HPC staff set up dedicated VSCode nodes and associated configs, but other places (looking at you, Yale) decide that it’s better to just kill all VSCode processes automatically and suggest that people use <a href="https://docs.ycrc.yale.edu/clusters-at-yale/access/ood-vscode/">alternatives</a>.</p>
<p>If you use VSCode, the best way is probably to use <a href="https://code.visualstudio.com/docs/remote/tunnels">Remote Tunnels</a>, which requires starting a <code>code</code> CLI instance on the compute node. In this case, instead of an ssh connection, both your local client and the remote server talk to Microsoft, which establishes a tunnel for you that is authenticated with your Microsoft/GitHub account. But this has a few problems:</p>
<ul>
<li>It’s just a lot of hassle. The steps (sketched after this list) are:
<ol type="1">
<li>ssh into the login node</li>
<li>start a script</li>
<li>watch the output of that script, which gives you a code to verify your account with Microsoft</li>
<li>open a browser page on your local computer and paste in that code</li>
</ol></li>
<li><strong>Does not</strong> work with Cursor — Microsoft blocked Cursor from using its official extensions, and Cursor’s replacement doesn’t include remote tunnels yet
<ul>
<li>I somehow managed to install the already-blocked Remote Tunnels extension on Cursor on my Mac, but I can’t do it anymore on my Windows machine.</li>
</ul></li>
</ul>
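<p>For reference, steps 2 and 3 look roughly like this on the remote end (a sketch, assuming the standalone <code>code</code> CLI has already been downloaded to the node; flags may vary across versions):</p>
<pre><code># on the compute node: start the tunnel and print a device-login code
./code tunnel --accept-server-license-terms
# paste the printed code at the URL it shows to link your Microsoft/GitHub account</code></pre>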
<p>I spent a lot of time wrestling with this and wanted an easier solution: ideally something that’s as simple as regular Remote-SSH, which only has 1 step: open the window on VSCode.</p>
</section>
<section id="the-solution" class="level2">
<h2 class="anchored" data-anchor-id="the-solution">The Solution</h2>
<section id="pre-requisites" class="level3">
<h3 class="anchored" data-anchor-id="pre-requisites">Pre-Requisites</h3>
<p>You should be able to ssh into the login node of your cluster. At Yale, this requires you to have set up ssh keypairs and the appropriate ssh config; it also has an MFA step via Duo.</p>
</section>
<section id="background" class="level3">
<h3 class="anchored" data-anchor-id="background">Background</h3>
<p>The idea is simple, but automating it takes a little more work. Below is how a typical HPC cluster is structured:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://enyanz.com/posts/ycrc-remote-dev/hpc.svg" class="img-fluid figure-img" style="width:80.0%"></p>
<figcaption>Structure of a typical HPC Cluster</figcaption>
</figure>
</div>
<p>Assuming you know which compute node you want to end up on, you’d set up an ssh config that looks like this:</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Host</span> grace</span>
<span id="cb1-2">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">HostName</span> grace.ycrc.yale.edu</span>
<span id="cb1-3">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">User</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>your-netid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Host</span> grace-remote-ssh</span>
<span id="cb1-6">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">User</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>your-netid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb1-7">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">HostName</span> compute-0001</span>
<span id="cb1-8">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ProxyJump</span> grace</span></code></pre></div>
<p>and open a Remote SSH window to connect to <code>grace-remote-ssh</code>.</p>
<p>This works because <code>ssh compute-0001</code> from the login node will take you to the compute node, and we specified it to go through <code>grace</code> first. The compute nodes are usually only accessible via ssh from the login node. Furthermore, SLURM usually restricts ssh access to the nodes currently under your allocation. The biggest hurdle to automation here is that nodes are only available after you request them, and the node name changes depending on availability. So you <em>don’t</em> know which node to put in your config.</p>
<p>UW’s recommendation is to use a script that replaces your local config file. But that also seems like a lot of work. The steps would be:</p>
<ol type="1">
<li>SSH into the cluster and start a job (with a particular name)</li>
<li>Run your local script, which SSH’es into the cluster again, finds the node by the job name, and copies it back into your local config</li>
<li>Remote SSH into the compute node</li>
<li>When you’re done, cancel your job request manually</li>
</ol>
<p>Sure, you can put steps 1 and 2 into one script, but that’s still 3 steps.</p>
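<p>A hypothetical version of that helper script might look like this (the job name <code>vscode</code> and the <code>compute-</code> placeholder are illustrative, and the <code>sed -i</code> syntax below is GNU sed):</p>
<pre><code>#!/usr/bin/env bash
# Look up the node running the job named "vscode" on the cluster...
node=$(ssh grace "squeue -u \$USER -n vscode -h -o %N")
# ...then point the grace-remote-ssh entry in ~/.ssh/config at it.
sed -i "s/^\( *HostName \)compute-.*/\1${node}/" ~/.ssh/config</code></pre>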
</section>
<section id="the-1-step-solution" class="level3">
<h3 class="anchored" data-anchor-id="the-1-step-solution">The 1-step Solution</h3>
<p>Now, instead of manually allocating and then connecting, you can bundle those two actions into <strong>one SSH invocation</strong>. VSCode will:</p>
<ol type="1">
<li>SSH to the login node</li>
<li>Invoke <code>salloc</code> to grab a compute node</li>
<li><code>nc</code>-pipe that node’s SSH port back over the same connection</li>
<li>Land you directly on the compute node</li>
</ol>
<p>Simply add this host entry to your <code>~/.ssh/config</code>:</p>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This is your login node, it could be any other thing/name</span></span>
<span id="cb2-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Host</span> grace</span>
<span id="cb2-3">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">HostName</span> grace.ycrc.yale.edu</span>
<span id="cb2-4">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">User</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>your-netid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Host</span> ycrc-ondemand</span>
<span id="cb2-7">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">User</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>your-netid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb2-8">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ProxyCommand</span> ssh grace <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bash -lc 'salloc --nodes=1 --partition=devel --time=4:00:00 --job-name=vscode /bin/bash -c </span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\"</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">nc </span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\$</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">SLURM_NODELIST 22</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\"</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'"</span></span>
<span id="cb2-9">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ForwardAgent</span> yes</span></code></pre></div>
<div class="callout callout-style-simple callout-note callout-titled" title="How it works">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
How it works
</div>
</div>
<div class="callout-body-container callout-body">
<ul>
<li><code>ssh grace</code> opens the login-node session and prompts you for Duo; once you approve the push,</li>
<li><code>bash -lc 'salloc …'</code> runs in a <em>login</em> shell so <code>salloc</code> (and any module-provided SLURM tools) are available on <code>PATH</code>. You can change the specs of this allocation just like any other <code>salloc</code> command,</li>
<li>as soon as SLURM grants your job, <code>$SLURM_NODELIST</code><sup>1</sup> expands to the real compute-node hostname,</li>
<li><code>nc $SLURM_NODELIST 22</code> pipes that node’s port 22 back through the login host, completing the SSH tunnel to the compute node.</li>
</ul>
</div>
</div>
<p>Once this is in place, <strong>your only step</strong> is:</p>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ssh</span> ycrc-ondemand</span></code></pre></div>
<p>or, in VS Code’s Remote-SSH panel, select <strong>ycrc-ondemand</strong> — and you’ll land straight on your allocated compute node. No extra scripts, no manual edits, and no VS Code processes on the login node.</p>
</section>
<section id="the-caveat-mfa" class="level3">
<h3 class="anchored" data-anchor-id="the-caveat-mfa">The Caveat: MFA</h3>
<p>Yale’s cluster requires MFA on every login. It’s done from an interactive terminal like this:</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">(</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>your-netid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>@grace.ycrc.yale.edu<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Duo</span> two-factor login for <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>your-netid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb4-2"></span>
<span id="cb4-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Enter</span> a passcode or select one of the following options:</span>
<span id="cb4-4"></span>
<span id="cb4-5"> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">1.</span> Duo Push to XXX-XXX-XXXX</span>
<span id="cb4-6"> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2.</span> Phone call to XXX-XXX-XXX</span></code></pre></div>
<p>At which point you need to enter <code>1↩︎</code>. On Cursor this is a non-issue because the default Remote-SSH behavior is to loop this back into an interactive prompt. But in VSCode the default behavior is to stream it to Outputs. So there’s an extra step:</p>
<ol type="1">
<li><p>Open Settings</p></li>
<li><p>Search for Remote-SSH: Show Login Terminal and set it to true</p>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb5-1"><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">"remote.SSH.showLoginTerminal":</span> <span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">true</span></span></code></pre></div></li>
</ol>
<p>Once enabled, VSCode will open a new terminal pane when you connect; type 1, press Enter, and then approve the push on your device.</p>
<p>What’s also great about this approach is that once you close your client, the remote will also know (since it’s interactive) and will automatically relinquish the job allocation.</p>
</section>
</section>
<section id="theres-more" class="level2">
<h2 class="anchored" data-anchor-id="theres-more">There’s more?</h2>
<p>I also attempted to write a much more complicated script that allocates a new session when the current job is close to ending. That part is not hard; the harder part is maintaining the same connection and knowing when the client has disconnected. I think keeping the same connection would require a custom reverse proxy that’s always on the same port, but I couldn’t get this to work. You should tell me if you manage to do it!</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>This gives you the node list of the <em>current job</em> from the job allocation itself. E.g. if you’re on requested an interactive job and got <code>node001</code>, it’ll give <code>node001</code> within that interactive terminal.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>ai</category>
  <guid>https://enyanz.com/posts/ycrc-remote-dev/</guid>
  <pubDate>Sun, 29 Jun 2025 04:00:00 GMT</pubDate>
  <media:content url="https://enyanz.com/posts/ycrc-remote-dev/thumbnail.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>Cline review</title>
  <dc:creator>Enyan Zhang</dc:creator>
  <dc:creator>Claude 3.5 Sonnet</dc:creator>
  <dc:creator>DeepSeek R1</dc:creator>
  <link>https://enyanz.com/posts/cline-review/</link>
  <description><![CDATA[ 




<div class="callout callout-style-simple callout-warning callout-titled" title="Notice &amp; Disclaimer: AI Generated Content">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Notice &amp; Disclaimer: AI Generated Content
</div>
</div>
<div class="callout-body-container callout-body">
<p>This post was initially generated by a language model, usually by summarizing a human conversation or expanding a human-written summary. The goal is not to populate the internet with yet another piece of uncalled-for, AI-generated slop (in which, unfortunately, people working in AI are complicit). Rather, it is to enable lower friction in sharing and distilling information. I have worked on, and often significantly rewritten, the post to ensure it accurately reflects the underlying human intentions and experiences, but inaccuracies and biases may remain.</p>
</div>
</div>
<section id="update-june-29-2025" class="level2">
<h2 class="anchored" data-anchor-id="update-june-29-2025">Update June 29 2025</h2>
<p>The situation has changed a lot since I initially tried out Cline (and agentic AI coding tools more generally). The first thing, of course, is that success rates have massively improved with new models/tools. There are many more complex tasks I now feel comfortable relying on LMs for, and the easy “gotchas” happen less often.</p>
<p>At the same time, I don’t think the big picture has changed: if you have an important project, you probably don’t want a codebase you don’t fully understand, one you can only scrap when things stop working or LMs stop being able to work on it. Keep the important things in your control!</p>
<p>But I can also confidently say that the Claude Code/Codex/Gemini CLIs offer an easier-to-set-up and more customizable experience. And if you’re starting out, agent mode in VSCode’s Copilot and Cursor is more user friendly, and also has the benefit of not being pay-per-use (Cline’s bill shoots up extremely quickly due to how much context it collects!). Since all of them basically use the same suite of models as backends, ultimately the choice is one of whose UX is better, which changes rather quickly, and whose prompts best suit your use cases. I’m guessing that there’s an intricate balance between telling the model too little, so it makes the same obvious mistakes, and telling it too much, so its operating structure is too rigid, but that’s just a guess from an outsider’s perspective.</p>
</section>
<section id="tldr" class="level2">
<h2 class="anchored" data-anchor-id="tldr">TL;DR</h2>
<p><a href="https://cline.bot/">Cline</a> is a VSCode extension offering a bring-your-own-key alternative to GitHub Copilot, with the ability to execute commands and plan multi-step edits. While promising in concept, its high token usage and several UX limitations make it difficult to recommend over Copilot Edit (as of Feb 2025). It’s also not as “smart” as you think it might be<sup>1</sup>. Cost per coding session can range from $0.5-3 depending on your model choice.</p>
<p>Also see the verdict section.</p>
</section>
<section id="overview" class="level2">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p>Cline operates as a chatbot-style interface within VS Code, capable of code generation, modification, and terminal command execution (which sounds more promising than it actually is). The default plan is completely free, with the only cost being your LM API calls.</p>
</section>
<section id="core-functionality" class="level2">
<h2 class="anchored" data-anchor-id="core-functionality">Core Functionality</h2>
<p>Unlike Copilot’s real-time completions, Cline works in a turn-based manner similar to Copilot Edit, where you request specific changes or additions and the AI responds with complete code snippets or modifications. The two most important features are the internal feedback loop and more generous access: Cline can execute code changes in steps following its own plan, and it can modify files/execute commands on your computer.</p>
</section>
<section id="pros" class="level2">
<h2 class="anchored" data-anchor-id="pros">Pros</h2>
<ol type="1">
<li>Bring-your-own-key! Use any LM and provider you want</li>
<li>Cline has a “plan” mode, in which it gathers information and makes a plan</li>
<li>Can request access to files/execute terminal commands</li>
<li>Offers checkpoint features for reverting changes</li>
</ol>
</section>
<section id="cons" class="level2">
<h2 class="anchored" data-anchor-id="cons">Cons</h2>
<ol type="1">
<li>Cline determines when a task is complete, not you. Once it declares the task complete, it’s done. I find this really weird.</li>
<li>Very token-consuming: the first request is often 10k+ tokens, and hitting the context limit is realistic. Each session can be $0.5–3 depending on your model, so expect to spend more than with Copilot/Cursor if you let it run by itself.</li>
<li>No effective code verification: Cline can, in principle, run commands and check outputs, but it doesn’t do so reliably or use command outputs productively.
<ol type="i">
<li>An example: I start a task telling Cline how to verify success (run the script with tests in it). Cline executes the command, and without checking the outputs, immediately declares the task is complete.</li>
<li>In general it feels much like vanilla AI autocomplete: once Cline generates a plan, it executes it step-by-step, without verifying after each step or re-planning. Think about whether your initial plan for a coding project ever worked out completely!<sup>2</sup></li>
</ol></li>
<li>Cannot revert to checkpoints before AI modifications (as of Feb 2025). This could be a really simple fix, but they don’t yet have it. You’d better have another copy/commit before Cline starts working on your code.</li>
<li>Each session has its own context, so Cline always starts by gathering information. This can be frustrating if your codebase is complicated.</li>
<li>Doesn’t feel as polished as Copilot Edit</li>
</ol>
</section>
<section id="verdict" class="level2">
<h2 class="anchored" data-anchor-id="verdict">Verdict</h2>
<p>While Cline offers flexibility through custom API keys, it doesn’t eliminate what I think is the biggest bottleneck in coding — your thinking speed. It’s not reliable enough for you to only care about the high-level functions/designs<sup>3</sup>, so you still have to be in the loop, understand every line of code, and tell it specifically what to do. If you treat it as a human capable of executing on your high-level goals, you will be thoroughly disappointed. But if you treat it like a multi-turn Copilot Edit, it’s not too bad and can definitely be a productivity tool.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Especially given the model we use is <a href="https://x.com/kimmonismus/status/1889732702795940272">ranked 18th</a> in the world in programming. Yes, I tried O1. Yes, I tried DeepSeek R1. As of Feb 2025, you can’t code hands-off yet.↩︎</p></li>
<li id="fn2"><p>Spoiler alert: these executions don’t often work.↩︎</p></li>
<li id="fn3"><p>My general impression of what works/what doesn’t work in AI coding: describing only the high level input-output behavior equals diaster. Giving pseudocode or a complete description of the implementation works (and saves you a lot of time), but usually you need to debug it yourself.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>ai</category>
  <category>ai-coding</category>
  <category>lm-written</category>
  <guid>https://enyanz.com/posts/cline-review/</guid>
  <pubDate>Fri, 14 Feb 2025 05:00:00 GMT</pubDate>
  <media:content url="https://enyanz.com/posts/cline-review/thumbnail.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>Applying to graduate school</title>
  <dc:creator>Enyan Zhang</dc:creator>
  <link>https://enyanz.com/posts/grad-school/</link>
  <description><![CDATA[ 




<section id="the-meta-story" class="level2">
<h2 class="anchored" data-anchor-id="the-meta-story">The Meta-story</h2>
<blockquote class="blockquote">
<p>Meta<br>
<em>adjective</em>: referring to itself or to the conventions of its genre; self-referential. <sup>1</sup></p>
</blockquote>
<p>I don’t think there was, as one would like to imagine, a “moment of revelation” when I decided that I wanted to apply to grad school. Unlike some other stories you’d hear, it wasn’t that I had always thought I would one day go to grad school either. My opinion is that the real reason we, as humans, decide to do something is often much more complicated (and opaque) than the story we tell others, tell ourselves, and eventually convince ourselves is the case.</p>
<p>In the same vein, there are many people who write posts that give <em>advice</em> on applying to grad school. I owe much thanks to their work — see “Other Links” on the right for some I felt were really helpful. But I also cannot help but feel that advice is often too <em>distilled</em>: the advice-giver reflects on their experiences, thinks about the larger picture, and summarizes them into advice. The downside of this process is that a lot of detail is lost along the way, and inferring the intended situation in which a piece of advice is applicable is quite non-trivial. <sup>2</sup></p>
<p>Given the wealth of online resources these days, especially for applying to graduate schools in CS, I feel that the most helpful thing for me to do is to fill in the void of such lost details — so instead of giving advice from the unqualified position of a junior graduate student, I’ll try to write about my experiences: how I started doing research, what my application season was like, etc. Hopefully the experience can be the medium of implicit advice, from which you get to decide what to take away.</p>
</section>
<section id="computer-science" class="level2">
<h2 class="anchored" data-anchor-id="computer-science">Computer Science?</h2>
<p>I learned some programming — which amounted to <code>def</code>, <code>return</code>, <code>for</code>, <code>if</code>, and <code>else</code> in Python — in high school, but nowhere near seriously. In fact, I started college as a mechanical engineering major, and did it (mostly) for my two years at Rutgers. It wasn’t very fun. Most engineering programs in the US share a common core during the first 2 years, which covers the basics in a broad range of STEM topics. I think it’s because of <a href="https://www.abet.org/accreditation">ABET accreditation</a>. If you’re an engineering major who wants to “build stuff”, you’ll be fairly disappointed during these 2 years.</p>
<p>The final straw (I think, in retrospect) was a mechanical engineering internship I did. I can’t complain much about the company or the projects I was tasked to do — I had an overall fairly positive experience — but there’s also the feeling that traditional mechanical engineering companies are simply not where “things happen”. It did not feel like a career I wanted to have.</p>
<p>Computer science, in contrast, is indeed where “things happen”. I was transferring to Brown the semester after the internship, so it was a good excuse to start something new. I did the necessary placements, and took intro to CS and deep learning<sup>3</sup> (a very weird combination) in the first semester of my junior year.</p>
</section>
<section id="and-language" class="level2">
<h2 class="anchored" data-anchor-id="and-language">… and Language?</h2>
<p>Interestingly, I think what was most important to the start of my current research was the <em>humanities electives</em> required by ABET: during my first year at Rutgers, I took and immensely enjoyed 2 philosophy courses, logic and philosophy of language. Philosophy of language, in particular, opened a new world for me. I was fascinated by the project of introspectively characterizing our linguistic capabilities. Taking the (very well given!) advice of the need to have some STEM-humanities balance, I took a philosophy of language seminar, Sense and Reference<sup>4</sup>, the same semester as my first CS courses. It struck me that artificial systems — GPT-3 had been out for 2 years at that point — did not satisfy most of the assumptions we use when analyzing linguistic creatures, yet they seemed to master language so well.</p>
<p>The same semester, driven by the fear of not having something to do during my junior year summer, I started looking for research opportunities. One such attempt was going to an ask-me-anything session hosted by <a href="https://cs.brown.edu/people/epavlick/">Ellie</a>, during which I unloaded my philosophy of language questions about language models onto her. I enjoyed that a lot, and started dropping in to the lab meetings.</p>
<p>After a while, someone in Ellie’s lab asked for help on a new project. I volunteered<sup>5</sup>, and started working on it. Getting the first toy model to train took 3 months. I got a research assistantship during the summer, and getting the first proof-of-concept to work took another 2 months. I was lucky, though: when the start of my senior year approached, I had a project that was taking shape. The <a href="https://arxiv.org/abs/2310.10899">project</a> was a great reflection of my interests: a union of philosophical questions about artificial and natural intelligence and technical neural network research.</p>
</section>
<section id="applying" class="level2">
<h2 class="anchored" data-anchor-id="applying">Applying</h2>
<p>Much like everyone else’s research projects, there was a lot of head-scratching and a lot of frustration involved in my first project. But the sense of fulfillment of finally getting something done, and more importantly, getting something <em>new</em>, something <em>I cared about</em> done led me to think that maybe research could be a career past graduation. I also realized that I had what’s minimally required to apply to grad school: I liked what I did, so I could apply to do similar things and use my experience to back up the application.</p>
<p>In late September 2023, I decided to apply: I wasn’t confident about getting offers<sup>6</sup>, but like a lot of other things in life, the cost of failing was sufficiently low that it’d be foolish not to try. I asked my advisor what programs/people she would recommend, added authors of papers I had read and admired, and built a list. I only included schools I <em>actually wanted to go to</em> (which is against best practices given by <a href="https://matt.might.net/articles/how-to-apply-and-get-in-to-graduate-school-in-science-mathematics-engineering-or-computer-science/">this article</a>, for example, if what you want is to have <em>a</em> place to go). As a remedial strategy, I decided that I would also apply to full-time jobs. It’s nice that the timelines are somewhat separated — deadlines for PhD programs are usually in December, when companies slow down interviewing/hiring, and the biggest recruiting season is early fall, before grad school applications start. This choice added significant work, and I did not get a full-time offer (mostly due to my disastrous technical interviews), so I can’t comment on how advisable this strategy is. But I think it’s worth considering.</p>
<p>I also chatted with some graduate students about their experiences: it would seem that at Brown, people are in general fairly happy. No horror stories of unbearable pressure or terrible advisors, and many recommend it as a chance to do something you believe matters. That resonated with me a lot: I still think doing a PhD is what maximizes your chances of doing something you care about and think matters.</p>
<p>Despite being a habitual procrastinator (I wrote my Brown transfer essay 3 hours before the deadline!), I set a hard deadline for my SoP: first draft before the first day of November, and miraculously met it. I then went through a few revisions, mostly asking for comments from friends, asked around for recommendation letters<sup>7</sup>, and submitted my applications during finals week.</p>
</section>
<section id="post-application" class="level2">
<h2 class="anchored" data-anchor-id="post-application">Post Application</h2>
<p>There was a lot of anxiety after I turned in my applications: every email notification would make me jump, and I checked GradCafe multiple times a day. I quickly found out that I was suffering from too much information: if a school I applied to had updates on GradCafe, I would start wondering whether that meant I’d get rejected. In reality (and I think we all know this), there are just way too many variables, and one can’t reliably predict what’s going on behind the scenes. I stopped polling GradCafe and social media sites for updates in January, and my anxiety lessened significantly.</p>
<p>I eventually started getting interviews — they can happen at any time, but mine turned out to be early — and enjoyed chatting with the professors who interviewed me. It felt a lot better than job interviews. Instead of being interrogated for technical details, my interviews were more like research chats<sup>8</sup>. If there’s a shared passion/niche in research, the chat usually goes quite well. I also tried selling people my project ideas.</p>
<p>Then in February and March, I went on school visits. I really enjoyed talking to people: grad students, professors, other visitors. And on a range of topics: my research, their research, where the field is going, where my (their) life is going, how my (their) life currently is. All of the trips were paid for, which is a nice cherry on top as well. I also found keeping notes very helpful: these chats are much more effective if you keep a list of questions, as well as whom to ask. For example, you might want to ask for comments on your potential advisor from their students, other students in the department, their collaborators, and themselves. It’s also very useful to keep notes of their answers — maybe do that when you come back to the hotel at the end of the day. There are details that easily get lost if you don’t note them down: potential collaborators, papers to check out, bureaucratic red tape, etc. I personally found the notes really useful when making decisions.</p>
<p>But beyond obtaining information, I think you get a better feeling of what it’s <em>like</em><sup>9</sup> to be there from a school visit. The more you ask and experience, the clearer your picture is going to be — and I think that’s the best way of deciding where to go: imagining yourself at each of those places, which one do you like more?</p>
<p>I talked to a lot of people (maybe too many) while trying to make the decision. In retrospect, the choice was pretty clear to begin with; I was just indecisive and wanted the best of both worlds. I think this is the case for many important decisions we make: if you picture yourself after choosing each of the different options, the choice becomes a lot clearer. I sent response emails in early April, and accepted my offer officially — the official deadline is April 15, and I don’t think you should feel pressured to decide early, but it’s also good to accept as soon as you have decided.</p>
</section>
<section id="afterwords" class="level2">
<h2 class="anchored" data-anchor-id="afterwords">Afterwords</h2>
<p>I am currently writing this blog post in my apartment, half a year after coming to Yale. I am, so far, enjoying my grad school life. I think there was a lot of luck (and privilege) involved in my application process, and we shouldn’t try to summarize too much from past examples. But in light of all the uncertainties, it’s even more important that one explore — had I not taken philosophy of language, I wouldn’t be here now — and attempt — had I decided too early that I wasn’t qualified, for research or for grad school, I also wouldn’t be here. I consider myself an unlikely and atypical applicant, and I suppose there are two sides of this story: part of the hope of writing this post is to give some other atypical applicant like myself a reference data point, but the other part is that not having a reference data point doesn’t mean something shouldn’t be attempted.</p>
<p>You can find my SoP <a href="yale_sop.pdf">here</a>. I’m also happy to chat more.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Google English Dictionary, provided by Oxford Languages↩︎</p></li>
<li id="fn2"><p>There’s the saying that “for any non-trivial piece of advice, the opposite is often also true”. I don’t know what the source is, but I deeply feel that this is the case.↩︎</p></li>
<li id="fn3"><p>Oweing to having double majored statistics at Rutgers, I actually did have the background needed for deep learning.↩︎</p></li>
<li id="fn4"><p>The course gets its name from <a href="https://en.wikipedia.org/wiki/Sense_and_reference">Frege</a>, and it’s also an <a href="https://www.frege.org/phil1860/course_desc.php">amazing course</a>.↩︎</p></li>
<li id="fn5"><p>Without knowing Pytorch (I only learned Tensorflow) or Huggingface↩︎</p></li>
<li id="fn6"><p>One main worry was the fact that I had only started CS a year ago, and never did the classic sequence of requirements. That turned out to be less important that I anticipated.↩︎</p></li>
<li id="fn7"><p>A huge headache and stress factor, especially if your recommender is not very responsive.↩︎</p></li>
<li id="fn8"><p>I’ve seen some people prepare presentation slides about their research projects. I think this can be helpful, but if the default is to “chat about research”, my opinion is that presenting with slides actually kills the atmosphere and may lead to people fixating on technical details↩︎</p></li>
<li id="fn9"><p>How your life will be <em>like</em> is very much a type of qualia: borrowing classic philosophical theories, we can’t know this unless we <em>lived</em> it, and visits give you a good taste.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>research</category>
  <category>life</category>
  <guid>https://enyanz.com/posts/grad-school/</guid>
  <pubDate>Sat, 01 Feb 2025 05:00:00 GMT</pubDate>
  <media:content url="https://enyanz.com/posts/grad-school/cit.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>RNNs, Huggingface Trainer, and PackedSequence’s</title>
  <dc:creator>Enyan Zhang</dc:creator>
  <link>https://enyanz.com/posts/trainer-packed-sequence/</link>
  <description><![CDATA[ 




<section id="tldr" class="level2">
<h2 class="anchored" data-anchor-id="tldr">TL;DR</h2>
<section id="the-issue" class="level3">
<h3 class="anchored" data-anchor-id="the-issue">The Issue</h3>
<p>When training with the Huggingface <code>Trainer</code>, if your data collator (<code>data_collator</code> in <code>Trainer</code>, or <code>collate_fn</code> for a PyTorch <code>DataLoader</code>) outputs a <code>PackedSequence</code> for training a recurrent model (RNN/LSTM/GRU/who knows), there will be an assertion error <code>assert isinstance(data, (list, tuple)) and len(data) == 2</code> triggered by line 254 of <code>torch/nn/utils/rnn.py</code>.</p>
</section>
<section id="the-solution" class="level3">
<h3 class="anchored" data-anchor-id="the-solution">The Solution</h3>
<p>Huggingface’s trainer sends the <code>PackedSequence</code> to the target device (e.g.&nbsp;GPU) incorrectly; you need to override one method. See this section for the code.</p>
</section>
<section id="an-even-better-way" class="level3">
<h3 class="anchored" data-anchor-id="an-even-better-way">An Even Better Way</h3>
<p>See the afterword. This issue is completely avoidable if you define your model class differently.</p>
</section>
</section>
<section id="full-story" class="level2">
<h2 class="anchored" data-anchor-id="full-story">Full Story</h2>
<section id="background" class="level3">
<h3 class="anchored" data-anchor-id="background">Background</h3>
<p>Recently I was training toy RNNs for a project. Writing a <code>train</code> function with a <code>for epoch in range(epochs)</code> in 2024 felt very wrong (and unnecessary), so I thought about making everything work with the <code>Trainer</code> of <a href="https://huggingface.co/docs/transformers/en/index">Huggingface Transformers</a>. There are many good reasons for doing so (and it was a huge quality-of-life improvement!); I’ll list a few I’ve already used (and that worked pretty much out of the box):</p>
<ul>
<li>saving/loading models with a one-liner</li>
<li>adding/changing learning rate schedules</li>
<li>generating with <code>.generate()</code></li>
<li>doing simple hyperparameter sweeps (see <a href="https://huggingface.co/docs/transformers/en/hpo_train">Hyperparameter Search</a>)</li>
</ul>
<p>But things don’t always work, and when they don’t, debugging <code>Trainer</code> is frustrating — it does too many things, and many of those things rely on heuristics. Below is an incomplete list of issues I’ve come across (and still remember debugging):</p>
<ul>
<li>it assumes the training target is a dict entry called <code>label</code> or <code>labels</code>, and will skip <code>evaluate()</code> otherwise — but it won’t skip the eval loop entirely; instead it will only return eval metainfo such as runtime. The solution is to specify <code>label_names</code> in <code>TrainingArguments</code> (see the snippet after this list).</li>
<li>it sends tensors to the model’s device by recursively iterating all inputs until it reaches the basic data elements (which should normally be some <code>Tensor</code>), but the heuristic for stopping this recursion is <code>hasattr(data, "to")</code> (see <a href="https://github.com/huggingface/accelerate/blob/03153658f4165206e3a18e8c1d668ec3d6592ed0/src/accelerate/utils/operations.py#L148">source</a>) — so if you define a class that contains your custom data, it absolutely cannot have a <code>to</code> method that does something else.</li>
</ul>
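<p>For the first gotcha, the fix is a one-liner. Here’s a minimal sketch, assuming (hypothetically) that your dataset stores the training target under a column called <code>target</code>:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">from transformers import TrainingArguments

# assumption: the target lives under "target" rather than "labels", so
# Trainer has to be told explicitly which dict entries are targets
training_args = TrainingArguments(
    output_dir="out",
    label_names=["target"],
)</code></pre></div>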
<p>And unfortunately one such heuristic breaks Pytorch RNNs. Here’s the premise:</p>
<ol type="1">
<li>Transformers deal with variable-length sequences by padding inputs
<ol type="1">
<li>This is usually done by a <code>DataCollator</code>, which gets a list of dicts and returns a dict of collated tensors (the act of “creating a batch” from samples)</li>
<li>Additionally, <code>attention_mask</code> helps the model zero out attention on padding tokens, so effectively the model does not “see” the padded tokens</li>
</ol></li>
<li>RNNs also need to deal with variable-length input sequences
<ol type="1">
<li>It’s best if we also delegate this task to a data-collating function</li>
<li>But RNNs can’t deal with padding! There’s no trivial parallel to something like <code>attention_mask</code>, especially because Pytorch RNNs are called with the entire sequence at once, as opposed to manually “unrolling” the model.</li>
</ol></li>
</ol>
<p>The solution to the above problem is to use a <a href="https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.PackedSequence.html"><code>PackedSequence</code></a>. The underlying idea is quite simple: instead of viewing the input as a batch of sequences, view it as a sequence of batches, where each batch can have a different batch size. The figure below illustrates it quite well<sup>1</sup>:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://enyanz.com/posts/trainer-packed-sequence/packed_sequence.jpg" class="img-fluid figure-img"></p>
<figcaption>A Visual Illustration of <code>PackedSequence</code> from <a href="https://github.com/sgrvinod/">@sgrvinod</a></figcaption>
</figure>
</div>
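<p>To make the “sequence of batches” view concrete, here’s a minimal sketch with a toy batch of two sequences (assuming 0 is the padding id):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import torch
from torch.nn.utils.rnn import pack_padded_sequence

# two padded sequences, of true lengths 3 and 1
padded = torch.tensor([[5, 6, 7],
                       [8, 0, 0]])
packed = pack_padded_sequence(padded, torch.tensor([3, 1]),
                              batch_first=True, enforce_sorted=False)
print(packed.data)         # tensor([5, 8, 6, 7]): time-major, no padding
print(packed.batch_sizes)  # tensor([2, 1, 1]): batch size per time step</code></pre></div>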
<p>So the solution seems simple enough: we just need to define a data collating function that creates a <code>PackedSequence</code> from a list of samples, like the one below, and life’s good, right?</p>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> torch.nn.utils.rnn <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pack_padded_sequence</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> collate_fn(examples):</span>
<span id="cb1-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># first collate, e.g. using torch.stack</span></span>
<span id="cb1-5">  examples <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {k: torch.stack([e[k] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> e <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> examples]) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> examples[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]}</span>
<span id="cb1-6"></span>
<span id="cb1-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># assume you previously tokenized the input with a transformers tokenizer</span></span>
<span id="cb1-8">  input_lengths <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(tokenized_input[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"attention_mask"</span>], dim<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-9">  examples[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"input_ids"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pack_padded_sequence(</span>
<span id="cb1-10">    examples[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"input_ids"</span>], </span>
<span id="cb1-11">    input_lengths, </span>
<span id="cb1-12">    batch_first<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, </span>
<span id="cb1-13">    encforce_sorted<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb1-14">    )</span>
<span id="cb1-15"></span>
<span id="cb1-16">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> examples</span></code></pre></div>
</section>
<section id="why-trainer-cannot-process-packedsequences" class="level3">
<h3 class="anchored" data-anchor-id="why-trainer-cannot-process-packedsequences">Why <code>Trainer</code> cannot process <code>PackedSequence</code>’s</h3>
<p>If only life were so easy — I invite you to re-read the title of this post and realize that we’ve only just gotten to the issue. If you try <code>trainer.train()</code> with a <code>collate_fn</code> like the above, you will get the following cryptic error message:</p>
<div class="bash-output">
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">0%</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span>                                                    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">0/12520</span> [00:00<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">?</span>, <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">?</span>it/s]</span>
<span id="cb2-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Traceback</span> <span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">most</span> recent call last<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">:</span></span>
<span id="cb2-3">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/src/train.py"</span>, line 243, in <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>module<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb2-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">main()</span></span>
<span id="cb2-5">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/src/train.py"</span>, line 173, in main</span>
<span id="cb2-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">trainer.train()</span></span>
<span id="cb2-7">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/transformers/trainer.py"</span>, line 2123, in train</span>
<span id="cb2-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">inner_training_loop</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb2-9">           <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-10">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/transformers/trainer.py"</span>, line 2481, in _inner_training_loop</span>
<span id="cb2-11">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">tr_loss_step</span> = self.training_step<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">model,</span> inputs, num_items_in_batch<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span>
<span id="cb2-12">                   <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-13">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/transformers/trainer.py"</span>, line 3573, in training_step</span>
<span id="cb2-14">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">inputs</span> = self._prepare_inputs<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">inputs</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span>
<span id="cb2-15">             <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-16">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/transformers/trainer.py"</span>, line 3520, in _prepare_inputs</span>
<span id="cb2-17">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">inputs</span> = self._prepare_input<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">inputs</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span>
<span id="cb2-18">             <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-19">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/transformers/trainer.py"</span>, line 3502, in _prepare_input</span>
<span id="cb2-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">data</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">{k:</span> self._prepare_input<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">v</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">,</span> v in data.items<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">}</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span>
<span id="cb2-21">                      <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-22">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/transformers/trainer.py"</span>, line 3502, in <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>dictcomp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span></span>
<span id="cb2-23">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">data</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">{k:</span> self._prepare_input<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">v</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">,</span> v in data.items<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">}</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span>
<span id="cb2-24">                          <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-25">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/transformers/trainer.py"</span>, line 3504, in _prepare_input</span>
<span id="cb2-26">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">data</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">self._prepare_input</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">v</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span>
<span id="cb2-27">           <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-28">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/torch/nn/utils/rnn.py"</span>, line 93, in __new__</span>
<span id="cb2-29">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">*_packed_sequence_init_args</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb2-30">     <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^^^^^^^^^^^^^^</span></span>
<span id="cb2-31">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">File</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/&lt;project-dir&gt;/.venv/lib64/python3.11/site-packages/torch/nn/utils/rnn.py"</span>, line 254, in _packed_sequence_init_args</span>
<span id="cb2-32">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">assert</span> isinstance<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">data,</span> <span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">list,</span> tuple<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">))</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">and</span> len<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">data</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">==</span> 2</span>
<span id="cb2-33">                                               <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">^^^^^^^^^^^^^^</span></span>
<span id="cb2-34"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">AssertionError:</span> </span>
<span id="cb2-35">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">In</span> call to configurable <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'main'</span> <span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>function <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">main</span> at 0x148164687240<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span></code></pre></div>
</div>
<div style="height: 4ex;">

</div>
<p>What happened?? If you look at the call stack at this point, it’s roughly the following:</p>
<ol type="1">
<li><code>Trainer</code> dispatches a batch (list of examples) to our collator</li>
<li>Collator does its job, returning a <code>dict</code> where the value corresponding to <code>input_ids</code> is a <code>PackedSequence</code></li>
<li>The collated batch (now one <code>dict</code>) gets sent to <code>_prepare_inputs</code>, which then sends the batch to <a href="https://github.com/huggingface/transformers/blob/62db3e6ed67a74cc1ed1436acd9973915c0a4475/src/transformers/trainer.py#L3516"><code>_prepare_input</code></a> to map the inputs on the right devices</li>
<li>Since the collated batch can have arbitrary nesting (think a dict of lists of tensors), <code>_prepare_input</code> recursively calls itself until it reaches the bottom level — tensors — and puts them on the right device. See below:</li>
</ol>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _prepare_input(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, data: Union[torch.Tensor, Any]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Union[torch.Tensor, Any]:</span>
<span id="cb3-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Prepares one `data` before feeding it to the model, be it a tensor or a nested list/dictionary of tensors.</span></span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    """</span></span>
<span id="cb3-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(data, Mapping):</span>
<span id="cb3-6">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(data)({k: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._prepare_input(v) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k, v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data.items()})</span>
<span id="cb3-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">elif</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(data, (<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">tuple</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>)):</span>
<span id="cb3-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">type</span>(data)(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._prepare_input(v) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data)</span>
<span id="cb3-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">elif</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(data, torch.Tensor):</span>
<span id="cb3-10">        kwargs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"device"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.args.device}</span>
<span id="cb3-11">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.is_deepspeed_enabled <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> (torch.is_floating_point(data) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> torch.is_complex(data)):</span>
<span id="cb3-12">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># NLP models inputs are int/uint and those get adjusted to the right dtype of the</span></span>
<span id="cb3-13">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># embedding. Other models such as wav2vec2's inputs are already float and thus</span></span>
<span id="cb3-14">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># may need special handling to match the dtypes of the model</span></span>
<span id="cb3-15">            kwargs.update({<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dtype"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.accelerator.state.deepspeed_plugin.hf_ds_config.dtype()})</span>
<span id="cb3-16">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> data.to(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs)</span>
<span id="cb3-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> data</span></code></pre></div>
<p>If you look at the error message, <code>PackedSequence</code>’s constructor here is complaining that it didn’t get enough arguments: there need to be at least two, the padded tensor and the lengths of each example. If you use a debugger, you’ll also find that the <code>data</code> being passed here is only one tensor. Why?</p>
<p>It turns out, <code>PackedSequence</code> inherits from <code>NamedTuple</code>, which in turn is a <code>tuple</code>!</p>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">$</span> python</span>
<span id="cb4-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Python</span> 3.12.7 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">packaged</span> by conda-forge <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">main,</span> Oct  4 2024, 15:57:01<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">[Clang</span> 17.0.6 ] on darwin</span>
<span id="cb4-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">Type</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"help"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"copyright"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"credits"</span> or <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"license"</span> for more information.</span>
<span id="cb4-4"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;&gt;</span> import <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">torch</span></span>
<span id="cb4-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;&gt;</span> from <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">torch.nn.utils.rnn</span> import PackedSequence</span>
<span id="cb4-6"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;&gt;</span> a <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">=</span> PackedSequence<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">torch.tensor</span><span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">[[1,</span> 2], [1, 1]]<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">,</span> torch.tensor<span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">[1,</span> 2]<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">))</span></span>
<span id="cb4-7"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;&gt;</span> isinstance<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">a,</span> tuple<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span></span>
<span id="cb4-8"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">True</span></span></code></pre></div>
<p>So in the second <code>elif</code> of <code>_prepare_input</code>, the Huggingface trainer incorrectly iterates over it, thinking it’s a list of some sort, and then proceeds to attempt to instantiate a new <code>PackedSequence</code> from the pieces. All this fuss because of a slightly wrong heuristic.</p>
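<p>You can watch the heuristic misfire yourself: iterating over the <code>PackedSequence</code> <code>a</code> from the snippet above simply yields its four named fields, which is why the tuple branch of <code>_prepare_input</code> tries to rebuild a <code>PackedSequence</code> from them and trips the assertion:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"># continuing with the PackedSequence `a` constructed above
print([type(field).__name__ for field in a])
# ['Tensor', 'Tensor', 'NoneType', 'NoneType']</code></pre></div>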
</section>
<section id="fixing-the-issue" class="level3">
<h3 class="anchored" data-anchor-id="fixing-the-issue">Fixing the issue</h3>
<p>Fixing the problem once we know what happened is fairly easy: a specific problem calls for a specific solution. Just define a mixin for <code>Trainer</code> classes that overrides the default behavior when the data is a <code>PackedSequence</code>, and subsequently define new <code>Trainer</code>’s that inherit from the mixin.</p>
<p>If you have this exact issue, adding the code block below should be a simple fix (notice that it replaces <code>Trainer</code> and <code>Seq2SeqTrainer</code> by subclassing them).</p>
<div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> transformers <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> (</span>
<span id="cb5-2">    Seq2SeqTrainer <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> HFSeq2SeqTrainer,  </span>
<span id="cb5-3">    Trainer <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> HFTrainer,</span>
<span id="cb5-4">)</span>
<span id="cb5-5"></span>
<span id="cb5-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> PrepareInputMixin:</span>
<span id="cb5-7">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _prepare_input(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>, data: Union[torch.Tensor, Any]) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> Union[torch.Tensor, Any]:</span>
<span id="cb5-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(data, PackedSequence):</span>
<span id="cb5-9">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> PackedSequence(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>._prepare_input(data.data), data.batch_sizes, data.sorted_indices, data.unsorted_indices)</span>
<span id="cb5-10">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb5-11">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>()._prepare_input(data)</span>
<span id="cb5-12"></span>
<span id="cb5-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> Seq2SeqTrainer(PrepareInputMixin, HFSeq2SeqTrainer):</span>
<span id="cb5-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">pass</span></span>
<span id="cb5-15"></span>
<span id="cb5-16"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> Trainer(PrepareInputMixin, HFTrainer):</span>
<span id="cb5-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">pass</span></span></code></pre></div>
<p>The code should now run! (or, at least, you should now see a different bug!)</p>
</section>
</section>
<section id="afterword" class="level2">
<h2 class="anchored" data-anchor-id="afterword">Afterword</h2>
<p>Only after I fixed this bug did I realize that it was totally preventable: an even better way to train RNNs is to do the packing (and unpacking) of tensors within the model’s <code>forward</code> method. This has a few advantages: it’s more compatible with Huggingface’s API (you can, for example, sum <code>attention_mask</code>’s to infer sequence lengths, or add an <code>input_lengths</code> argument), and it also makes embedding and encoder-decoder structures more intuitive. So, something like the following:</p>
<div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> RecurrentEncoder(PreTrainedModel):</span>
<span id="cb6-2">    config_class <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> RecurrentEncoderConfig</span>
<span id="cb6-3"></span>
<span id="cb6-4">    ... other methods ...</span>
<span id="cb6-5"></span>
<span id="cb6-6">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> forward(</span>
<span id="cb6-7">        <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>,</span>
<span id="cb6-8">        input_ids: torch.LongTensor,</span>
<span id="cb6-9">        input_lengths: Optional[torch.LongTensor],</span>
<span id="cb6-10">        return_hidden_states: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">bool</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb6-11">    ) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> BaseModelOutputWithNoAttention:</span>
<span id="cb6-12"></span>
<span id="cb6-13">        embedded <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.embedding(input_ids)</span>
<span id="cb6-14"></span>
<span id="cb6-15">        packed_embedded <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.nn.utils.rnn.pack_padded_sequence(</span>
<span id="cb6-16">            embedded, input_lengths, batch_first<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, enforce_sorted<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb6-17">        )</span>
<span id="cb6-18"></span>
<span id="cb6-19">        packed_output, hidden <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>.recurrent_unit(packed_embedded)</span>
<span id="cb6-20"></span>
<span id="cb6-21">        hidden_states, _ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.nn.utils.rnn.pad_packed_sequence(packed_output, batch_first<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, padding_value<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>)</span>
<span id="cb6-22"></span>
<span id="cb6-23">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> BaseModelOutputWithNoAttention(</span>
<span id="cb6-24">            hidden_states<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>hidden_states,</span>
<span id="cb6-25">        )</span></code></pre></div>
<p>I should probably tidy up and release the recurrent models I wrote at some point.</p>
</section>
<section id="credits" class="level2">
<h2 class="anchored" data-anchor-id="credits">Credits</h2>
<p>Thumbnail image: <a href="https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks">Stanford CS 230</a><br>
<code>PackedSequence</code>’s: This <a href="https://gist.github.com/HarshTrivedi/f4e7293e941b17d19058f6fb90ab0fec">Github demo</a>, and this <a href="https://stackoverflow.com/questions/51030782/why-do-we-pack-the-sequences-in-pytorch">Stackoverflow answer</a></p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>In addition, I liked this <a href="https://stackoverflow.com/questions/51030782/why-do-we-pack-the-sequences-in-pytorch">StackOverflow answer</a> explaining how it works.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>huggingface</category>
  <category>research</category>
  <guid>https://enyanz.com/posts/trainer-packed-sequence/</guid>
  <pubDate>Fri, 31 Jan 2025 05:00:00 GMT</pubDate>
  <media:content url="https://enyanz.com/posts/trainer-packed-sequence/rnn.png" medium="image" type="image/png" height="38" width="144"/>
</item>
</channel>
</rss>
