If you build AI products and you missed this week, you missed the shift.
Jonathan and I went to the Render Localhost developer conference, and learned so much.

Vercel released “Next.js for agents.”
OpenAI turned Codex into a workflow recorder.
Anthropic gave Claude Code shareable artifacts.
And agents that can self-improve?
I cover all of it today.
ALIGN: the convergence week
The most interesting things I found this week in AI.
The US Government Pulled Anthropic’s Best Models. Four Open-Weight Alternatives Shipped Before Access Was Restored.

On June 12, the US government issued an export-control directive forcing Anthropic to disable Fable 5 and Mythos 5 for every customer worldwide. Within 72 hours, four open-weight coding models filled the vacuum, among them Cohere’s North Minicode, Moonshot’s Kimi 2.7 Code, and Zhipu’s GLM 5.2 (released at 5:21 PM Eastern as a deliberate echo of the order timestamp). These models were already in the pipeline. The ban simply exposed that production-quality alternatives existed.
Robert’s Take: If you’re an engineering leader and your agent stack runs on one provider, this week was your wake-up call. The zero-data-retention enterprise deals Anthropic had inked? Pulled overnight. Multi-provider routing is table stakes. We talked about this vendor risk two weeks ago. The Fable 5 ban made it real.
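A minimal sketch of what multi-provider routing can look like, assuming each provider is wrapped as a plain function that takes a prompt and returns text. The names `route`, `primary`, and `fallback` are placeholders for illustration, not any vendor's SDK.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Assumption: a provider is any function that takes a prompt and returns text.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider errors out."""


def route(prompt: str, providers: Sequence[tuple[str, Provider]]) -> tuple[str, str]:
    """Try each (name, provider) in priority order.

    Returns (provider_name, reply) from the first provider that succeeds.
    """
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary(prompt: str) -> str:
    # Stand-in for a hosted model that has been switched off.
    raise ConnectionError("model disabled")


def fallback(prompt: str) -> str:
    # Stand-in for an open-weight model you host yourself.
    return f"echo: {prompt}"
```

Calling `route("hi", [("primary", primary), ("fallback", fallback)])` skips the failing provider and returns the fallback's reply. Your agent code only ever talks to `route`, so swapping providers is a config change instead of a rewrite.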
Claude Code Sessions Now Build Shareable Web Pages That Update in Real Time
On June 18, Anthropic launched Artifacts in Claude Code (beta, Team and Enterprise). Sessions can now generate interactive, shareable web pages that update as the agent works. PR walkthroughs, dashboards, architecture maps, debugging investigations. One link, auto-versioned, no copy-pasting terminal output into Slack.
Robert’s Take: Oh man. Jonathan and I have been emailing HTML spec files back and forth for weeks, fighting over who has the latest version. This just killed that workflow. I’m so happy. But here’s the thing I noticed when I dug in: artifacts have no backend. No form input. No API calls at view time. Codex Sites does. So will Anthropic close that gap, or does OpenAI own the “localhost-to-shared” problem? Watching closely.
Vercel Drops “Next.js for Agents” and It’s Already Running 100+ Production Agents
Vercel launched Eve on June 17 at Ship London. Open source, TypeScript-native, Apache-2.0. An Eve agent is a directory: agent.ts for the model, instructions.md for the system prompt, tools/ for what it can do, skills/ for what it knows, channels/ for where it lives. Durable execution, sandboxed compute, human-in-the-loop, evals, and OpenTelemetry observability baked in.
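To make the "an agent is a directory" idea concrete, here is a small sketch that checks a folder for the five entries listed above. It does not use Eve itself. Treating every entry as required, and the helper name `missing_entries`, are assumptions made for illustration.

```python
from pathlib import Path

# Entries named in the description above. Whether Eve strictly requires
# each one is an assumption made for this sketch.
EXPECTED = {
    "agent.ts": "file",         # the model
    "instructions.md": "file",  # the system prompt
    "tools": "dir",             # what it can do
    "skills": "dir",            # what it knows
    "channels": "dir",          # where it lives
}


def missing_entries(agent_dir: Path) -> list[str]:
    """Return the expected entries that are absent from agent_dir, sorted by name."""
    missing = []
    for name, kind in EXPECTED.items():
        path = agent_dir / name
        present = path.is_file() if kind == "file" else path.is_dir()
        if not present:
            missing.append(name)
    return sorted(missing)
```

Running `missing_entries(Path("my-agent"))` on a half-built agent tells you which parts of the harness you have not written yet.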
A growing share of Vercel deployments is now triggered by agents, up from 3% a year ago.
Robert’s Take: This is the one. Not because the framework is novel. Because it validates what we’ve been saying on this pod for months: the harness is the moat. Vercel runs a Slack data warehouse agent handling 30,000+ questions per month and a lead-gen SDR agent with 32x ROI at $5K/year. Those numbers come from the harness, not the model.
OpenAI Codex Now Watches You Work and Turns It Into a Reusable Skill
Record & Replay lets you demonstrate a workflow on your Mac and Codex packages the pattern into a permanent agent skill. Unlike traditional RPA that captures exact clicks and coordinates, Codex captures intent. AppShots (press both Command keys) sends the frontmost app window to Codex with all text, even what’s scrolled out of view.
Robert’s Take: Jonathan and I demoed this live on the pod and I had a reusable skill for a complicated content review workflow in under eight minutes. Pretty damn good for a zero-to-one skill creation. The real unlock: I already had a Threads agent running on a schedule. Record & Replay let me add a supervisory layer on top of that existing agent. One agent watching another. That compounds.
BUILD: how four teams independently discovered the same agent architecture
“Eve encapsulates the best practices I’m hearing from our customers.”
We were at the Render Localhost conference in San Francisco.

A PM from Render was talking about Vercel’s Eve framework, which had launched the day before.
And when we opened the Eve repo, we recognized the file structure.

Because we’d already been building something similar.
Because Microsoft’s Agent Framework, which hit GA the same month, uses similar patterns.
Because an awesome-list on GitHub had been cataloguing this convergence in real time.
Developers around the world are independently reaching similar conclusions about agent harness architecture.
What is harness engineering and why does it matter more than the model?

Harness engineering is how you build the environment around an AI model to turn it into a reliable, autonomous agent.
Think about it like a race car.
The engine matters.
But the engine doesn’t win races if there is no car or team around it.
The chassis, the aerodynamics, the pit strategy, the telemetry, the driver’s muscle memory encoded into the setup.
That’s what wins.
Faros AI published a framework that maps to what every team shipping this week converged on.
A production-grade harness has five layers:
- Tool orchestration — what the agent can do
- Verification loops — how you know it did it right
- Context and memory — what the agent knows across sessions
- Guardrails — what the agent must never do
- Observability — how you see what happened and why
Every framework that shipped this week includes all five. The problems forced the same solutions.
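The five layers can be sketched in a few dozen lines. This is a toy, not how Eve or any other framework implements them: the `Harness` class and its field names are invented for illustration, and each layer is reduced to the simplest thing that could work.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    tools: dict[str, Callable[[str], str]]           # tool orchestration
    verify: Callable[[str], bool]                    # verification loop
    blocked: set[str] = field(default_factory=set)   # guardrails
    memory: list[str] = field(default_factory=list)  # context and memory
    log: list[str] = field(default_factory=list)     # observability

    def run(self, tool: str, arg: str, max_attempts: int = 2) -> str:
        """Run one tool call through all five layers."""
        if tool in self.blocked:
            self.log.append(f"blocked {tool}")
            raise PermissionError(f"tool {tool!r} is not allowed")
        for attempt in range(1, max_attempts + 1):
            result = self.tools[tool](arg)
            ok = self.verify(result)
            self.log.append(f"{tool} attempt={attempt} ok={ok}")
            if ok:
                self.memory.append(result)  # only verified results are remembered
                return result
        raise RuntimeError(f"{tool} failed verification after {max_attempts} attempts")
```

Notice that the model never appears in this sketch. Everything here is the environment around it, which is the point.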
The agent harness: where the competitive advantage actually lives now
Models from competing providers look increasingly identical in raw capabilities.
So where did the differentiation go?
Into the skills folder.
The harness.
This isn’t just my opinion. Aparna Dhinakaran, Co-founder & Chief Product Officer of Arize, makes a similar point in her blog post “What is an agent harness?”

Jonathan said something on the pod this week that I keep thinking about. He’d been talking to teams at the Render conference about how they structure their agent harnesses.
The pattern he kept hearing:
“You’re encoding domain expertise into these skills folders. They’re hierarchical. They’re nuanced. And the teams that are really winning, they’re finding ways where the skills can get updated as you go, so they’re like recursive.”
The key word here is recursive.
Getting to closed-loop autonomous agents means running agents in recursive loops toward measurable outcomes, across domains such as GTM and R&D.
That is the future we believe in.
Every business in the future will be building, deploying, and managing closed-loop autonomous agents that self-improve.
And every business that belongs in that future is already working hard to bring it to their organization faster than the competition.
So what does this mean?
Simply put, the skills folder is where institutional knowledge lives.
And the best teams are making those skills self-improving.
Here’s what that looks like in practice.
How can I make a self-improving agent?
I use an agent to help me write content for my Threads account, which is the top of the funnel for our startup, Clarity.
I have two goals for my Threads account: test content hypotheses quickly, and build my email list (to ultimately warm up leads into sales).
Like most founders, I am time-constrained. I don’t want to spend all my time writing threads, but I do want to meet those goals. So I decided to create a self-improving agent, and the initial results are great.
Here’s my Threads profile, before and after:

~7x follower growth, from ~100 to 700+ in 6 weeks, spending 30 minutes per week
When I demoed Codex Record & Replay on the pod, I already had a Threads agent that generates social content on a schedule using a skill I built.

I even paused for the past two weeks and didn’t post at all, to see if my harness would survive a period of no posts.
It did.

Here’s a thread I posted yesterday when I turned my Threads agent back on. For my account size, it found content-market fit immediately: 11k views, 136 likes, 5 reshares, and 3 sends.
On a side note: Google kills everything. Damn you Google.
Google aside, this experiment showed me the harness is proving durable.
Now how can I get it to self-improve?
Right now, the agent reviews its output and the metrics weekly, keeps the best of what works, and iterates on my base evals: the grading rubric and the underlying dataset.
The base evals were created from a few hundred Threads I reviewed myself, using Obsidian.

My flow has been:
- The Threads agent creates threads from the source links and reference material I share as context
- The threads are formatted and written to an Obsidian note, with an untouched copy saved for a diff report
- I review the Obsidian note, edit the threads, and annotate my feedback (open coding)
- I run the diff report and have the agent update my rubric and sharpen my evals based on my edits and open coding
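The diff-report step in the flow above can be sketched with Python's standard `difflib`. This is an illustration under assumptions, not the actual tooling: the function name `diff_report` is made up, and it treats the saved original and the edited note as plain strings.

```python
import difflib


def diff_report(original: str, edited: str) -> list[str]:
    """Return only the lines a human removed ('- ') or added ('+ ').

    Unchanged lines and difflib's '? ' hint lines are dropped, so what is
    left is the raw signal for updating a rubric.
    """
    return [
        line
        for line in difflib.ndiff(original.splitlines(), edited.splitlines())
        if line.startswith(("- ", "+ "))
    ]
```

Feed it the agent's draft and your edited version, then hand the result to the agent along with your annotations. Every removed line is something the rubric should have caught, and every added line is a hint about what "good" looks like.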
One key issue I recently found in the harness was hallucinated career details.
I gave it a factsheet on my career and accomplishments, but it kept making stuff up.
I wanted to eliminate my manual step in reviewing the output for hallucinated career details.
Stuff like “When I was lead product architect at Workday, we had a principle…” where the model invents a specific policy I never held.
So I encoded it as a new eval after labeling some of the data in my review step.
I then had the threads agent regenerate everything.
No more problem.
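One way an eval like this could work, sketched under loud assumptions. The text above does not say how the eval is implemented, so this is a deterministic stand-in: a naive regex pulls out claims of the form "I was <title> at <Employer>" and checks them against a factsheet. `FACTSHEET`, `CLAIM`, and `unsupported_claims` are hypothetical names, and the factsheet contents are placeholders.

```python
from __future__ import annotations

import re

# Hypothetical factsheet: employer -> titles actually held there.
FACTSHEET = {"Clarity": {"co-founder"}}

# Naive pattern for claims like "When I was lead product architect at Workday".
# Real text is messier; a production eval would more likely use a model-based grader.
CLAIM = re.compile(
    r"\bI was (?:an? |the )?(?P<title>[a-z][a-z\- ]*?) at (?P<employer>[A-Z]\w+)"
)


def unsupported_claims(text: str, factsheet: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (title, employer) claims in text that the factsheet does not back up."""
    bad = []
    for match in CLAIM.finditer(text):
        title = match["title"].strip().lower()
        employer = match["employer"]
        if title not in factsheet.get(employer, set()):
            bad.append((title, employer))
    return bad
```

A thread passes when `unsupported_claims` returns an empty list. Anything else goes back to the agent for a rewrite before a human ever sees it.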
I think the agent is almost ready to be fully autonomous. I’d say we’re 90% of the way there.
My next improvement: reconcile my Threads agent harness with the latest harness best practices from Vercel and others, and evolve it from there.
Build In Public
$10M ARR
I have an outlandish goal that we get to $10M ARR bootstrapped next year. We’re working hard. We just had our first offsite, in Truckee. I have always loved this area, and it was such a joy for Jonathan and me to deepen our co-founder bond over some hiking/running/grinding.
We even closed a customer the first night of our offsite!

We’ll have to keep that annual company tradition up. Close the night of the offsite.
We’ve been getting some good grind reps and are almost above the surface with some new case studies.
We’ve finally made meaningful progress on our content strategy. I’ve been working on this problem for over a year, spending hundreds of hours on videos. It’s finally paying off. We have a winning format, and it is actually fun to do. We focus on making our podcast good (long form) and write the script so that AI can streamline post-production and turn it into good derivative content (short form).
I believe that my founder brand will be one of the most important assets to the business long term, so seeing more traction now is very encouraging.
Results follow hard work.
So gotta keep working hard and keep refining the process.
Look at the before and after of our changeup. IT’S WORKING!

What’s great is that in the new process, I’m having more fun in the content creation too.
After pounding my head against the brick wall of content strategy, it feels good to see the light of progress.
Progress is a hell of a drug.
Side Quests
Dog Dad
Kenji continues to grow into the best boy ever. We’ve been spending time together in the mountains and he’s starting to be okay with swimming as well. Not 100% convinced… but he’ll do it!

Climb V10, Run 10 miles
FINALLY I CAN CLIMB AND RUN HARD AGAIN. Let’s go!!!
I mentioned in my last newsletter I’ve been dealing with some nagging injuries. I’m getting past them now and am feeling 90%. I feel like I can push myself again. It feels GREAT.

CULTURE: the eyes that keep appearing

In 1859, Charles Darwin had a problem he couldn’t explain.
The eye.
Not one eye.
Many eyes.
Across species that had never shared a continent, never overlapped in time, never exchanged a single gene.
Octopuses and humans.
Jellyfish and eagles.
Box jellyfish have 24 eyes arranged in four clusters of six, with four different types of eye in each cluster.
Biologists call this convergent evolution.
When different lineages face the same environmental pressure, they arrive at the same solution independently.
Because the physics of the problem constrained the answer.
Wings evolved four separate times.
Echolocation evolved at least twice.
Crab-shaped bodies evolved so many times that biologists coined a word for it: carcinization.
Given enough pressure, everything becomes a crab!
The thing that struck Darwin wasn’t that evolution was creative.
It was that evolution was predictable. When the selection pressure is strong enough, the design space collapses.
There are only so many ways to solve “detect light” or “move through air.”
The solutions converge because the constraints do.
Daniel Dennett spent decades studying this phenomenon.
His conclusion: convergent evolution reveals what he called “good tricks.”
Solutions so effective that any system facing the same problem will rediscover them, given enough time and pressure.
We see the same thing in technology. The tools we make to help us make sense of the world, to find more truth, are innovations that themselves evolve.
I see world-class teams betting that the harness matters more than the engine.
Nobody coordinated.
Yet the industry is converging on standards for agent harnesses.
That’s a good thing for progress.
If this helped you think about your own stack decisions, forward it to a teammate who’s evaluating frameworks right now.
Catch our Weekly AI Clarity for your dose of signal from the noise, to AI better and build better AI.
