I settled on a workflow that wasn't just pestering the agent with wishes.
I had a series of discrete sessions, each started by creating a directory named for a new git branch. I wrote a shell script to semi-automate this (a rough sketch of it follows this list of steps).
In that directory, I wrote a couple hundred words of intention in a spec.md file.
I asked the agent to expand my intentions into a step-by-step plan.md file.
I edited the plan and asked the agent to review it critically and ask questions.
I answered the questions.
I asked the agent to review it again and tell me if the plan looked clear enough to start implementing.
When it said "yes", I told it to start implementing.
The agent started implementing while I watched.
Sometimes I interrupted and told it that it was on the wrong track. But, for long stretches I was just reviewing the code as the agent wrote it.
When it claimed to be done, I asked it to review the current changes against the plan and judge if it was really done.
Sometimes it wasn't and it went back to work.
When it finally petered out, I told it to make sure all the tests passed and linting errors were fixed. It did that.
I checked the tests myself to make sure they made sense, and fixed a few that didn't. Then I told it to run the tests some more.
Finally, when I was okay with the results, I told it to review our entire chat history for this session and summarize the results in a notes.md file.
In particular, I told it to pay special attention to things we did that hadn't been captured in the plan, to call out any unexpected conditions we hit, and to derive some lessons learned.
These notes ended up being actually pretty good?
These three artifacts - spec.md, plan.md, and notes.md - were committed along with the code. That marked the end of the session and the branch.
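
For the curious, the session-setup script mentioned at the top is nothing fancy. Something like this sketch captures the shape of it, though the names and layout here are illustrative rather than the exact script:

```sh
#!/usr/bin/env bash
# Rough sketch of the session-setup helper; paths and names are illustrative.
set -euo pipefail

BRANCH="$1"                     # e.g. ./new-session.sh add-tag-editor
SESSION_DIR="sessions/$BRANCH"  # hypothetical layout: one directory per session/branch

# New branch for the session, plus a matching directory for the artifacts.
git checkout -b "$BRANCH"
mkdir -p "$SESSION_DIR"

# Seed the three artifacts; spec.md gets a couple hundred words of intention by hand.
touch "$SESSION_DIR/spec.md" "$SESSION_DIR/plan.md" "$SESSION_DIR/notes.md"

echo "Session ready - start writing $SESSION_DIR/spec.md"
```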
Now, I won't say that each of the sessions I ran went perfectly. But, I expected it to be an exploration.
I switched models a few times between Claude Sonnet 3.7, GPT-4.1, and SWE-1.
I found Claude to usually work the best. It just sort of got to work and did the needful without provoking many objections from me.
GPT-4.1 seemed to like to make very detailed plans (even after reading the plan.md), ask lots of questions, and then drive off into the ditch and need rescuing.
SWE-1 was about in the middle - but I ended up using it more because there's a promotion running right now that makes it free in Windsurf.
Occasionally, I'd switch models mid-session just to see what happened. I'm not sure how to characterize the differences, but they each had slightly different coding styles.
Claude and SWE-1 did better than GPT-4.1 at picking up from unfinished work in progress, I think?
Still, even with the needful babysitting, between these models I did get stuff implemented, and it looked a lot like what I would have written if I'd had the executive function to work at it as doggedly.
I think I've learned that a focused scope and context window management are essential.
A few times, I think I asked the agent to bite off more than it could chew? Maybe I blew out the context windows? This is something I could get quantified answers about if I paid attention to the metrics.
In those cases, I stopped the presses, backed up, and reworked the spec into a smaller scope.
Sometimes, I found it handy to get plan.md tuned up, then start a fresh chat with only the plan as context. That seemed to work pretty well - again, I think because it freed up some of the context window with more condensed material.
Occasionally, I wandered off into the weeds myself and my session-based approach devolved into chatty iteration. That worked well for making very small tweaks and fussy updates.
I also learned that I'm good at juggling lots of git commits as save states. Whenever things were in a decent enough state, the move was to commit now and clean up later.
I forgot this a few times and lost some progress after driving into a ditch. But that wasn't too much of a hardship, since I could usually just scroll back in the chat and re-attempt the relevant bits of the session for similar results.
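
Concretely, the save-state habit is just something like this (the commit message and cleanup step here are illustrative):

```sh
# Checkpoint whatever the agent just got working; history cleanup comes later.
git add -A
git commit -m "wip: checkpoint before the next agent run"

# Once the session settles down, squash the checkpoints into something presentable.
git rebase -i main
```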
I should clean all these bullets up into a proper blog post, but maybe tomorrow. The tl;dr, I guess, is that I think I'm getting comfortable with this stuff.
It's surprising me with how much it gets done.
I'm getting less surprised with where & how it goes wrong.
The failures seem manageable and the results seem decent.
I had a kind of meta-chat with Claude about the above process, trying to think through some improvements.
One interesting notion was to use some big cloud models for the spec.md to plan.md stage.
But, then, switch to a local model running on my laptop for the actual process of implementing the plan.
Then, switch back to a big model for the notes.md summary.
If this worked, it could save a lot of tokens!
I could also see all the above being bundled up and semi-automated into its own agentic workflow.
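
Sketching it out, the bundled version might look something like the script below. The agent command is entirely hypothetical - a stand-in for whatever tool ends up driving the models - and the model names are placeholders.

```sh
#!/usr/bin/env bash
# Hypothetical sketch of the bundled session workflow with model switching.
# "agent" is a made-up CLI; model names are placeholders, not real identifiers.
set -euo pipefail

SESSION_DIR="$1"

# Big cloud model to expand the spec into a plan and critique it.
agent --model big-cloud-model \
  "Expand $SESSION_DIR/spec.md into a step-by-step $SESSION_DIR/plan.md, then review it critically and ask questions."

# Local model on the laptop for the long grind of implementation.
agent --model local-laptop-model \
  "Implement $SESSION_DIR/plan.md; keep tests and linting green."

# Back to a big model to summarize the session into notes.
agent --model big-cloud-model \
  "Review this session against $SESSION_DIR/plan.md and write lessons learned to $SESSION_DIR/notes.md."
```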