A Computer Runs Code

a Next.js RCE, a long habit, the day it caught something, and the year after

Reconstructed from Slack screenshots, firewall logs, and the SOC 2 incident report. Names and specific identifiers are anonymized; the conversations, the sequence, and the technical details are real.

There is a thing every working developer does, every day, that we have collectively decided not to think about.

We run code we did not write.

Not in some abstract sense. Literally. The dependency you installed this morning, the colleague's branch you checked out to review, the example app from the vendor you're evaluating, the framework you upgraded last week — all of it executes on your machine, as your user, with your filesystem, your network, your SSH keys, your cloud credentials, your everything. The industry's answer to this is mostly: try not to think about it. Pin your versions. Glance at the deprecation warnings. Hope.

I have been thinking about it for a long time. Not because anything in particular happened, but because of a single observation that I've never quite been able to set down:

A computer runs code. Any code.

That is the entire thing it does. If something can place code on the machine, that code will run. If it can place code that runs as me, the code can do what I can do. The defensive question is not whether code will run — it will — but what running code is permitted to reach.

This piece is about how that orientation has slowly shaped the machine I work on, over years of small adjustments, and about one day a few months ago when one of those adjustments caught something, and about the year that followed.

I run each client project on my dev machine under its own unprivileged user, inside a bwrap sandbox, behind a default-deny outbound firewall that alerts me when something tries to leave that isn't on the allow-list. The machine sits behind a second outbound filter on a separate box, on the assumption that any defense running on the same hardware as the threat can be disabled by the threat. Most developers never build this because they would rather not think about the problem. I built it over years because I couldn't stop thinking about it: a computer runs code, any code, and running code reaches whatever you let it reach. When other developers have called it overkill, I have not argued. Last December a Next.js RCE proved them wrong.
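For readers who have not used bwrap, the sandbox layer can be sketched as a small Python helper that only builds the command line. This is an illustration, not the configuration described above: the function name, the example path, and the merged-/usr layout (where /bin and /lib are symlinks into /usr) are assumptions. The flags themselves are standard bubblewrap options.

```python
def bwrap_argv(project_dir: str, command: list[str]) -> list[str]:
    """Build a bubblewrap command line that exposes a single project directory.

    Assumes a merged-/usr distribution; adjust the symlinks otherwise.
    """
    return [
        "bwrap",
        "--unshare-all",            # fresh namespaces, including the network one
        "--share-net",              # then keep the host network; the firewall filters it
        "--die-with-parent",        # sandbox dies when the launching shell exits
        "--ro-bind", "/usr", "/usr",            # system files, read-only
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",          # private scratch space
        "--bind", project_dir, project_dir,     # the only writable host path
        "--chdir", project_dir,
        *command,
    ]
```

Calling bwrap_argv("/srv/client-a", ["npm", "run", "dev"]) returns a list suitable for subprocess.run. The home directory is never bound, so SSH keys and cloud credentials stored there do not exist inside the sandbox. Launching the result as the project's own user would add the per-user layer on top.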

the morning

A few months ago, a known remote-code-execution vulnerability in Next.js was being actively exploited in the wild. Scanners came around. One of them reached a project I was actively developing for a client. The project was, during a small window that morning, three things at once: running an unpatched version of the framework, with a webhook endpoint exposed, behind a port forward I hadn't yet closed.

The exploit attempt succeeded, in the sense that the framework did the thing it shouldn't. The compromised process tried to call out — specifically, to download a binary from a path that looked like /word/arch, the kind of architecture-tagged URL that suggests commodity malware with payloads ready for whatever it lands on. The local firewall denied the outbound connection and alerted me. Default-deny, with a notification on hits. The log line is unremarkable:

DROP OUT proto=tcp dst=[redacted] dport=80 uid=project-runtime

That's it. That's the whole save, that morning. The second stage never arrived. Whatever the attacker had planned to do next, they could not do, because the only thing they had reached was a process that could not reach the internet on its own behalf.

It would not be the last save from that compromise. In the days that followed, before the wipe, more than ten distinct IPs would show up in the firewall log, each one a different command-and-control endpoint, each one denied by the same rule. My first guess, while it was happening, was that the dropper had landed on my box and was beaconing out periodically — patient, on idle. I was wrong. What I worked out later was that there was no persistent implant; the vulnerability was the persistent thing. The endpoint stayed reachable; the CVE stayed exploitable; scanners kept finding the endpoint and running the exploit; each successful exploit produced one fresh curl-out attempt to whatever payload server that scanner had been configured to use. The picture in my head had been "an implant retrying." The picture that fit the data was "the same wound, picked at by different hands, every day."

The rule didn't care which address the attacker used. It was an allow-list: a small set of destinations the project actually needed, with everything else dropped. That distinction matters more than it looks. A blocklist requires you to know in advance which addresses are malicious; the attacker only needs to rotate to a new one to defeat it. An allow-list requires you to know which addresses are good; the attacker can rotate forever and never find one. The architecture I had wasn't beating a specific IP. It was beating a category. It also wasn't beating a single exploit attempt. It was beating every exploit attempt the same vuln produced, against the same wound, until I closed the wound.
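The difference between the two policies fits in a few lines. The sketch below is a toy model, not a firewall: the function names are made up, and every address comes from the reserved documentation ranges rather than from the incident.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration only (RFC 5737 documentation ranges).
ALLOWED = [ip_network("192.0.2.0/28")]        # destinations the project needs
KNOWN_BAD = [ip_network("198.51.100.7/32")]   # addresses already reported as malicious


def blocklist_permits(dst: str) -> bool:
    """Default-allow: permit unless the address is already known to be bad."""
    addr = ip_address(dst)
    return not any(addr in net for net in KNOWN_BAD)


def allowlist_permits(dst: str) -> bool:
    """Default-deny: permit only destinations approved in advance."""
    addr = ip_address(dst)
    return any(addr in net for net in ALLOWED)
```

A payload server the attacker has just rotated to, say 203.0.113.50, passes blocklist_permits because nobody has reported it yet, and fails allowlist_permits because nobody approved it. The allow-list gives the same answer for every address the attacker will ever use.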

I didn't find that first log line the next morning, after the fact. I got the alert in real time and started looking at it within minutes. The firewall rule was scoped to a specific user (the unprivileged user that runs that project, and only that project), so as soon as I saw which user the blocked connection came from, I knew which project the compromise had to be in. From alert to attribution: a couple of minutes. The user-per-project layer wasn't just isolating things; it was labelling them. When something fires, the user identifier tells you immediately what to look at.
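The labelling step is mechanical enough to sketch. The parser below assumes the key=value layout of the log line quoted earlier; the user-to-project table and the project name in it are hypothetical.

```python
# Hypothetical mapping from per-project users to the projects they run.
PROJECT_BY_USER = {"project-runtime": "client-webhook-service"}


def parse_drop(line: str) -> dict[str, str]:
    """Split a firewall log line into its key=value fields, ignoring bare words."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def attribute(line: str) -> str:
    """Name the project a blocked connection belongs to, based on its uid field."""
    uid = parse_drop(line).get("uid", "")
    return PROJECT_BY_USER.get(uid, f"unknown (uid={uid})")
```

With one user per project the lookup is one-to-one, so there is nothing to investigate between seeing the alert and knowing where to look. With a single shared user, the same log line would name every project on the machine at once.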

the escalation

I work as a contractor on a small team. There is a dev team lead who runs coordination, an architect who owns systems decisions, an infra guy who owns the production environment, and a CEO. We don't have a security person. We are too small, for now.

What I did in the next four minutes, in roughly this order:

I DM'd the dev team lead, in our shared native language. Short and direct. I think we did get infected. My computer is calling that URL via curl. Only from the project's user. My firewall alerted me so it didn't get through. But if you don't have an outgoing firewall, you already have the malware.

I DM'd the architect, in English. I think we did get malware. My project user on my linux box is trying to connect to that URL. I have an outgoing firewall, the alert popped up, I rejected the connection. Probably some dependency is infected, or was at some point.

I DM'd the CEO, in English. Same picture, framed for the executive: I have isolation, I have an outbound firewall, I see traffic, here is the URL, probably one of our dependencies is or was infected.

The CEO's first response was: Have you told anyone else yet? I told him: the dev team lead and the architect. He said okay. A few minutes later he asked me to repeat it in the team channel. I did.

the day

What followed was the first incident the team had ever responded to, by which I mean: an actual security event with a clear inbound vector, a clear blocked outbound, and a payload that turned out, on inspection, to be real malware in the wild.

The technical part of the response was straightforward. I held off on wiping the machine to gather more evidence. I lifted the firewall block briefly so that if the dropper tried again, I could observe the originating process before the rule re-engaged. The dev team lead surfaced threat-intel on the IP — Mirai-style commodity malware, real IOC in active campaigns — meaning whatever had run on my box was not theoretical.

In parallel, I had three private threads running.

With the dev team lead, in our native language, peer register: explaining the architecture he hadn't known I had, walking him through how to set up the equivalent on his machine, correcting his macOS-specific log commands gently — log show isn't a valid command on Linux, I told him, I can check journalctl, but I assure you there's nothing there. He's the dev team lead, not a security person, and his job that day was to assemble the picture for everyone else and eventually write the team's incident summary. My job was to feed him an accurate picture as fast as he could absorb it.

With the architect, who handles systems-level decisions: a different conversation, scoped to what would have to be rotated and in what order. We can't trust the keys now. He pushed back, correctly: rotation has real cost, it takes a lot of effort, it depends on what's actually compromised. He started bounding the scope. He asked what keys I had on the machine. All of them. He asked for my environment variables. I sent them. Then, when the picture was clear enough, I made sure he understood, privately, because it mattered for the kind of rotation we were doing, that the compromise on my machine had stopped at the first stage. The firewall caught the download. The rotation we were about to do was precautionary, not corrective. The distinction changes the urgency and the sequencing. He needed the right model to make the right call.

With the CEO: strategy. The line I sent him half an hour into the morning was the one I most want on the record now. Please don't let the team look into anything like this. We don't know. We won't know. We need to act accordingly. I was telling him, in advance, what was about to happen: the team would do the investigations a team does, the investigations would come back clean, and the team would want to declare clean. The investigations were going to be uninformative. The right response wasn't to find out what happened. The right response was to act as if the worst case had happened and to change the standing posture so that next time we'd have better information. The CEO understood. He thanked me. He told me to keep contributing in the group channel, and later, when I DM'd him saying I thought there was a misunderstanding about networking in the group thread, he wrote back: it's ok to explain if you have thoughts on things, this is a group discussion. He was giving me permission to push back publicly. I pushed back publicly. The conversation moved.

the gap

What I did not say to the team, in any channel, all day, was the thing I most believed.

The team did the investigation a team does. Greps against /var/log. Checks for the IP in shell history. Process tables. Looking for suspicious files in /tmp. Everyone's machine came back clean. The Mac users found nothing. The production servers had no record of the IP in their logs. The conclusion the team reached, and that the dev team lead eventually wrote up, was that only my Linux dev box had been affected and that the rest of the environment was clean.

The investigation was searching for evidence that could not exist.

The compromise on my machine had produced no application-layer signal. Next.js had not thrown an exception. The framework's logs were clean. No stack trace, no error, no anomaly in any request handler. The server kept returning 200s. From the application's point of view, nothing had happened. The OS does not log application-level network activity unless something has specifically been configured to watch it, and the malware does not write to system logs at all. The only artifact from the entire compromise was a line in my outbound firewall log, recording a connection denied — because I had an outbound firewall. The other machines did not. When the team ran grep -R for the IP on their boxes, they were searching for an artifact only an outbound filter could produce, on machines that didn't have one.

The same was true, after a reboot, of my own machine. The firewall rules are session-scoped — they get cleared on logout, fresh on login, a deliberate choice that forces me to re-approve traffic at the start of every session. By the time the team's investigation started its rounds, the original log line was gone. The standard grep on my box would have returned the same empty result it returned on everyone else's. The only reason I knew my machine had been hit was that I had seen the block in real time, before reboot. Half an hour later, no method available to the team could have shown that anything had happened to me either.

I did not bring this up. Pointing out that the team's evidence couldn't distinguish a clean machine from a rebooted compromised one would have changed nothing except how long the afternoon took. The conversation that follows that observation is a conversation about epistemics, in the middle of an incident, with a team that has actual work to get back to. The right move is to let the investigation close on the evidence it has, document what you can be sure of, and put the day's remaining energy into the thing that will reduce next time's uncertainty. The forensic gap stays a gap. You do not pretend it isn't one.

This is the part of incident response that nobody writes about, because there is no satisfying way to write about it. The picture in everyone's head is forensic — find the malware, analyze the kill chain, post the IOCs, declare resolved. On most teams, on most machines, the compromise will not have produced the things that picture expects to find. The only honest signal is at the network layer, and only if someone has, in advance and on every relevant machine, set up something to watch and judge it against an expected allow-list. We had not. I had. The save was real on my box. The rest of the team's clean reports were the most that could be said from the data available, and that is what incident response usually consists of.

what I got wrong

The framework was unpatched at the time the attack arrived. Not because I had been negligent: the CVE was new, and there was no patch yet for the version I was running. This is the actual scenario defense in depth is for: the window between a vulnerability being disclosed and a patch being available, during which the application layer is not where the save can happen. The architecture caught a disclosure-to-patch window, not a piece of procrastination. The lesson is not "patch faster." Between disclosure and patch, the outer layers are the only thing you have.

What I could have done differently is the network exposure. The webhook endpoint was reachable through a direct port forward. A tunnel service — ngrok, Cloudflare Tunnel, Tailscale Funnel — would have given the webhook an ephemeral hostname per session, invisible to mass scanners. The exposure window narrows from "anyone scanning the right IP range" to "whoever knows the specific tunnel hostname." I have since switched to a tunnel. The port forward is gone.

the rest of the day

By mid-afternoon the technical response was closed and the SOC 2 report was filed. With that done, the team had bandwidth to talk about what we wanted the next incident to look like. How to handle a CVE as a posture rather than as a panic. What to do about the constant background hum of internet probing — /wp-admin/, /.env, /.git/config at every IP that responds to anything, which is mostly other people's old PHP installations being prodded by botnets that have not been updated since 2014 and never will be — and whether to treat that hum as noise or as signal. Whether to add outbound IP block rules at the AWS level for production. Whether to opt into --ignore-scripts on npm installs as a standing policy.
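One input to the --ignore-scripts question is how many installed packages declare install-time hooks at all. The sketch below is a hypothetical helper, not something the team ran: it walks a node_modules tree and lists every package whose package.json defines a preinstall, install, or postinstall script.

```python
import json
from pathlib import Path

# npm lifecycle scripts that run automatically during installation.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def packages_with_install_scripts(node_modules: str) -> dict[str, dict[str, str]]:
    """Map 'name@version' to the install hooks that package declares."""
    found = {}
    for manifest in sorted(Path(node_modules).rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skip it
        if not isinstance(data, dict) or not isinstance(data.get("scripts"), dict):
            continue
        hooks = {h: data["scripts"][h] for h in INSTALL_HOOKS if h in data["scripts"]}
        if hooks:
            name = data.get("name", manifest.parent.name)
            found[f"{name}@{data.get('version', '?')}"] = hooks
    return found
```

The walk also picks up package.json files that packages ship as test fixtures, so the output is a list to review by hand, not a verdict. A short list makes a standing --ignore-scripts policy cheap, since only those packages need their scripts run deliberately.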

The CEO made the budget calls — he bought Little Snitch licenses for every developer with a Mac, in the channel, in real time, as soon as the conversation surfaced the need. The architect made the technical calls. The infra guy proposed using the AWS firewall going forward.

The conversation was the kind that happens once and changes some things and not others. That seems to be the rate at which these things move.

the year after

That was December. This is May. Five months later, today, the team is implementing the AWS outbound traffic filtering that we agreed on in principle that afternoon. The implementation conversation is happening in a channel as I write this. I am in it, contributing what I know, but it is not my project. The infra guy is leading. The architect is reviewing. Five months is exactly the right amount of time for an org of this size to move from "we should do this" to "we're doing this": not so fast that it was reactive panic, not so slow that it never happened.

The team has developed a rough CVE-handling posture since then — what gets patched immediately, what waits, what gets a temporary mitigation, when an event is severe enough to warrant the full Dec 8 escalation pattern. Most CVEs do not warrant it. A few do. Telling the difference quickly is most of what handling these well actually consists of.

Key rotations have happened twice this year, both cleanly. Not as crises. As scheduled work, executed in order, by people who now know what rotation means and what it costs and how to sequence it without breaking production.

Last Sunday, two days before I am writing this, the team ran an IOC sweep against a fresh wave of supply-chain compromises in the npm ecosystem that had broken in the news that weekend. We found nothing. We moved on. By Tuesday morning, when the CEO posted the story to the team channel for general awareness, the operational response had already happened. I replied with a joke and a screenshot of npm list --depth=10000 | wc -l from the project I was working in. The channel reacted with thumbs-ups. The CEO acknowledged. We went back to work.
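For anyone who has not run one, an IOC sweep of this kind can be as small as the sketch below. It is not the team's actual tooling. It assumes a package-lock.json in the lockfileVersion 2 or 3 layout, where installed packages sit under a top-level "packages" object keyed by their node_modules path, and it assumes the advisory has been reduced to a set of (name, version) pairs.

```python
import json


def installed_versions(lock_text: str) -> set[tuple[str, str]]:
    """Collect (name, version) for every package recorded in a v2/v3 lockfile."""
    lock = json.loads(lock_text)
    found = set()
    for path, meta in lock.get("packages", {}).items():
        if "node_modules/" not in path:
            continue  # the root project or a workspace entry, not a dependency
        # An explicit "name" is recorded when the folder name differs (aliases).
        name = meta.get("name") or path.rsplit("node_modules/", 1)[1]
        version = meta.get("version")
        if version:
            found.add((name, version))
    return found


def sweep(lock_text: str, compromised: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the installed packages that match a known-compromised release."""
    return sorted(installed_versions(lock_text) & compromised)
```

An empty list from sweep means only that none of the listed releases is pinned in this lockfile. It says nothing about releases the advisory has not caught up with yet, which is the same limit every indicator-based check has.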

That exchange — the CEO posting a supply-chain article unprompted on a Tuesday morning, me responding with a screenshot and a joke, the team reacting in acknowledgment — would have been impossible in December. The CEO did not yet know what an outbound firewall was for. The team did not have a name for the kind of work that gets done on a Sunday before a news cycle reaches Tuesday. Something is different now. I do not know how much of the difference came from the long, chaotic day in December and how much from everything that happened since. Probably both.

What I can say is that the team responds to events now. Not perfectly. Not the way a dedicated security team would. But the response is calm, it is competent, and it happens before the news does.

The setup I have described is not a complete defense. No setup is. The architecture sits where it sits because the trade-offs further out start to cost more than they buy, for the way I actually use this machine. Someone defending different things would draw the line further out. I have done enough experimenting at the edges to know that my line is a choice, not a maximum. It is the version that lets me be a developer and also lets me sleep.

If you take one thing from this: don't expose your dev box through a direct port forward when you need a webhook. Use a tunnel — ngrok, Cloudflare Tunnel, Tailscale Funnel, anything in that family. Ephemeral hostname, closes when the terminal does, doesn't put your IP in the scanners' path.

The port forward is gone.

We live in a dangerous world.