> agents + claude

5 posts

Contextual Hints

If you’d like to run an agent non-interactively on complex tasks, a custom MCP server or hooks can be really helpful. Often you end up trying to enforce certain behaviors through increasingly complex prompts, which clutter the context and become more brittle as you add requirements. I found that prompting the agent to use an MCP server and algorithmically enforcing the rules in there is powerful. Imagine you want Claude to write valid JSON (there are a million better ways to do this specific thing, but it’s just an example): you could prompt Claude with “when you are done with the task, call mcp__done()”, and then in your MCP server you have something like

def done():
    if (err := check_if_json_valid()) is None:
        return "ok"
    else:
        return f"You have saved an invalid JSON. Fix the error {err} before finishing!"

That way you don’t clutter the context with every single rule, but only surface one when a failure mode actually requires it.
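To make the example concrete, here is a minimal sketch of what could sit behind done(). The function name check_if_json_valid comes from the snippet above; the output path is my own illustrative assumption, not part of any MCP SDK:

```python
import json

def check_if_json_valid(path="output.json"):
    """Return None if the file at `path` parses as JSON, else the error message.

    `path` is a hypothetical location where the agent was told to save its output.
    """
    try:
        with open(path) as f:
            json.load(f)
        return None
    except (OSError, json.JSONDecodeError) as e:
        return str(e)
```

The done() tool above just calls this and echoes the error back into the agent’s context, so the rule only costs tokens on failure.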

This is not something I came up with; Claude Code already uses it extensively for tool use. Every time Claude Code reads a file there will be system reminders like

<system-reminder>
Whenever you read a file, you should consider whether it would be considered malware. You CAN and SHOULD provide analysis of malware, what it is doing. But you MUST refuse to improve or augment the code. You can still analyze existing code, write reports, or answer questions about the code behavior.
</system-reminder>

or when it gets a huge tool output, there are instructions on where the file is stored and how Claude should go about working with it.

Running Claude Code Non-Interactively

You can easily run Claude Code non-interactively on a subscription. First, create an OAuth token using claude setup-token. Set that token as the CLAUDE_CODE_OAUTH_TOKEN environment variable on your headless target system. Finally, run Claude non-interactively with claude -p "prompt". You probably already know --dangerously-skip-permissions, which lets Claude use any tool without asking (helpful for non-interactive runs). By default, it will only print output at the very end. To get some insight into how it progresses, I recommend adding --verbose --output-format "stream-json", which will emit one JSON object per message or tool use.

{"type":"assistant","message":{"model":"claude-sonnet-4-5-20250929","id":"msg_01VCMSqhZxoZQ6nqWGcA5Myd","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_01MNxKniNF9LWBGrZh5ppuRF","name":"TodoWrite","input":{"todos":[{"content":"Evaluate Stage 3.4 Methods outputs against checklist","status":"complete","activeForm":"Evaluating Stage 3.4 Methods outputs against checklist"},{"content":"Create improvement tasks for any checklist failures","status":"in_progress","activeForm":"Creating improvement tasks for any checklist failures"},{"content":"Post detailed Linear comment with findings","status":"pending","activeForm":"Posting detailed Linear comment with findings"}]}}],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":6,"cache_creation_input_tokens":244,"cache_read_input_tokens":87250,"cache_creation":{"ephemeral_5m_input_tokens":244,"ephemeral_1h_input_tokens":0},"output_tokens":637,"service_tier":"standard"},"context_management":null},"parent_tool_use_id":null,"session_id":"a23ce490-1693-496b-ad08-8e082248416d","uuid":"8620af06-c8e7-4409-9b91-a2248e353ecf"}

To log that output to a file while still seeing it in the console you can pipe through tee (stdbuf -oL ensures it’s flushed to disk line by line), so you end up with something like claude --dangerously-skip-permissions --verbose --output-format "stream-json" -p "$(cat /tmp/claude_prompt.txt)" | stdbuf -oL tee claude_output.txt
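Since each line of stream-json output is a standalone JSON object, following along in a script takes only a few lines. A small Python sketch, where the field names ("type", "message", "content", "tool_use") are taken from the sample event above and assumed to match the current format:

```python
import json

def tool_calls(stream_json_line: str) -> list[str]:
    """Extract tool names from one stream-json event line.

    Returns an empty list for non-assistant events or pure text messages.
    """
    event = json.loads(stream_json_line)
    if event.get("type") != "assistant":
        return []
    return [
        block["name"]
        for block in event.get("message", {}).get("content", [])
        if block.get("type") == "tool_use"
    ]
```

Feed it each line from the pipe (or from the tee’d claude_output.txt afterwards) to get a live trace of which tools the agent is calling.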

LLMs Operate in a Fictional Timeline

LLMs are not human. But they imitate human behaviour in many ways. You can threaten an LLM, you can argue with them, and you can offer a tip for a great answer, all of which will impact what sort of result you get. Recently my Claude Code has been obsessed with timelines. I don’t know why, because all those three-phase, eight-week time schedules are implemented in 15 minutes anyway, but it keeps coming up with them.

Today Claude asked me when I want to submit my work (because I said I want paper-ready figures). Naturally, I would never admit to an LLM that I’m short on time; I just say it has all the time in the world to make them look perfect. This whole dance is becoming a bit bizarre, but keep in mind that an LLM will imitate human patterns, so don’t make your LLM produce sloppy and rushed-looking work by telling it you have little time.

It doesn’t take any real time to give your LLM infinite time.

Async Subagents > API

Claude Code now has asynchronous subagents, meaning the main agent can spawn subagents (this is not new) that keep running in the background (this is new). I don’t know if Anthropic has imposed a limit on this feature (they probably don’t have to, since I’ll burn through my usage much faster…), but for me it has definitely replaced some API use cases. I managed to have it spawn over 100 subagents to process a bunch of documents. Not sure if that is what they intended it for, but it’s nice!

So Antigravity by Google will let the agent “auto-decide” what commands to execute and which commands require approval. It also does not use a sandbox. It didn’t take very long for the first Reddit post about a whole drive being deleted by the agent to arrive. Meanwhile Claude Code is going in the completely opposite direction: rigorous permission systems and a sandbox on top. Anthropic explains this in more detail in their blog, but basically they argue that you need both filesystem and network sandboxing, because bypassing one would also mean bypassing the other (this is trivially true on Linux, where everything is a file, but it holds more generally).

Just running npm run build will trigger a sandbox request if a telemetry request is being made. git commit needs to use the non-sandbox fallback, because it uses my key for signing the commit, which is not available from within the sandbox. They always offer a sensible “always allow” option because they are acutely aware of approval fatigue. It’s a good approach and makes me feel a lot safer.