Miscellanea for 2025-06-06

  • Hello world!
  • My brain's been eaten by work for most of this week, so the blogging slowed down a bunch. Hoping to pick it up again soon.
  • We'll see how good it all ends up being, but I cycled through a handful of models and ended up with about 11,000 lines of code.
    • The code had unit tests and it pretty much did what I intended.
    • It wasn't great code - a lot of it was boilerplate - but it's mostly stuff I would have ended up doing myself more tediously while fighting my ADHD.
  • Trying to compose some thoughts somewhat along the lines of Harper Reed's LLM codegen workflow:
    • I settled on a workflow that wasn't just pestering the agent with wishes.
    • I had a series of discrete sessions, each started by creating a directory named for a new git branch. I wrote a shell script to semi-automate this.
    • In that directory, I wrote a couple hundred words of intention in a spec.md file.
    • I asked the agent to expand my intentions into a step-by-step plan.md file.
    • I edited the plan and asked the agent to review it critically and ask questions.
    • I answered the questions.
    • I asked the agent to review it again and tell me if the plan looked clear enough to start implementing.
    • When it said "yes", I told it to start implementing.
    • The agent started implementing while I watched.
    • Sometimes I interrupted and told it that it was on the wrong track. But, for long stretches I was just reviewing the code as it wrote.
    • When it claimed to be done, I asked it to review the current changes against the plan and judge if it was really done.
    • Sometimes it wasn't and it went back to work.
    • When it petered out finally, I told it to make sure all the tests passed and linting errors were fixed. It did that.
    • I made sure the tests made sense, myself, fixed a few that didn't. Then I told it to run the tests some more.
    • Finally, when I was okay with the results, I told it to review our entire chat history for this session and summarize the results in a notes.md file.
    • In particular, I told it to pay special attention to things we did that hadn't been captured in the plan. Try to come up with unexpected conditions and derive some lessons learned.
    • These notes ended up being actually pretty good?
    • These three artifacts - spec.md, plan.md, and notes.md - were committed along with the code. That marked the end of the session and the branch.
  • Now, I won't say that each of the sessions I ran went perfectly. But, I expected it to be an exploration.
    • I switched models a few times between Claude Sonnet 3.7, GPT-4.1, and SWE-1.
    • I found Claude to usually work the best. It just sort of got to work and did the needful without enticing many objections from me.
    • GPT-4.1 seemed to like to make very detailed plans (even after reading the plan.md), ask lots of questions, and then drive off into the ditch and need rescuing.
    • SWE-1 was about in the middle - but I ended using it more because there's a promotion running right now that makes it free in Windsurf.
    • Occasionally, I'd switch models mid-session just to see what happened. I'm not sure how to characterize the differences, but they each had slightly different coding styles.
    • Claude and SWE-1 did better than GPT-4.1 at picking up from unfinished work in progress, I think?
    • Still, even with the needful babysitting, between these models I did get stuff implemented and it looked a lot like what I would have written if I'd had the executive function to work at it as doggedly.
  • I think I've learned that a focused scope and context window management are essential.
    • A few times, I think I asked the agent to bite off more than it could chew? Maybe I blew out the context windows? This is something I could get quantified answers around, if I paid attention to the metrics.
    • In those cases, I stopped the presses, backed up, and reworked the spec into a smaller scope.
    • Sometimes, I found it handy to get to the point of having the plan.md tuned up, then started a fresh chat with only the plan as context to start. That seemed to work pretty well - again, I think freeing up some of the context window with more condensed material.
  • Occasionally, I wandered off into the weeds myself and my session-based approach devolved into chatty iteration. That worked well for making very small tweaks and fussy updates.
    • I also learned that I'm good at juggling lots of git commits as save states. Whenever things were in a decent enough state, time to commit now and clean up later.
    • I forgot this a few times and lost some progress after driving into a ditch. But that wasn't too much of a hardship, since I could usually just scroll back in the chat and re-attempt the relative bits of the session for similar results.
  • I should clean all these bullets up into a proper blog post, but maybe tomorrow. The tl;dr, I guess, is that I think I'm getting comfortable with this stuff.
    • It's surprising me with how much it gets done.
    • I'm getting less surprised with where & how it goes wrong.
    • The failures seem manageable and the results seem decent.
  • I had a kind of meta-chat with Claude about the above process, trying to think through some improvements.
    • One interesting notion was to use some big cloud models for the spec.md to plan.md stage.
    • But, then, switch to a local model running on my laptop for the actual process of implementing the plan.
    • Then, switch back to a big model for the notes.md summary.
    • If this worked, it could save a lot of tokens!
  • I could also see all the above being bundled up and semi-automated into its own agentic workflow.
blog comments powered by Disqus
Baby steps into semi-automatic coding  Previous Miscellanea for 2025-06-05 Next