blog.lmorchard.com

It's all spinning wheels & self-doubt until the first pot of coffee.

  • about me
  • archives
  • feed
  • 2025 July 10

    • Hello world!
    • Trying to decide if this is a "still alive" or a "beware, I live!" kind of day?
    • Either way, I've been down a hole of busy-ness for the past few weeks and been wanting to climb out to emit some reports here.
    • Keep thinking that I may want to intentionally carve out space and time for weeknotes again.
    • Still trying to find a good balance while spinning plates across multiple projects. But, I've gotten a lot done without quite going entirely insane.
    • Feel like I've suddenly become a cyborg over the past few weeks. Been working across multiple instances of Claude Code all day, every day.
    • I've gotten well acquainted with Anthropic's usage limits, having managed to reliably burn through my Max plan allowances every 5 hours or so.
    • I'm considering implementing usage monitoring to better understand my consumption patterns with Claude.
    • Also feels like I've picked a side in the war against Skynet in some folks' estimation. Except, to me, it feels like working with a concussed version of the main computer from the USS Enterprise (NCC-1701-D)—which is still pretty advanced for the 21st century.
    • Both Claude and I could probably use a brief vacation. I guess the kids call that a "micro-retirement" these days?
    # 11:59 pm
    • miscellanea
  • Do AI tools really slow me down by 20%?

    Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR:

    We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.

    I'm seeing plenty of "I told you so" as this makes the rounds. But, having spent the past month deep in AI-assisted coding, I find it directly contradicts my experience. Maybe I've drunk too much Kool-Aid, but I don't think I'm entirely delusional. I want to head-scratch through this, though.

    Small sample, specific scenario

    we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years.

    That's a rather small sample and a very specific scenario, isn't it?

    The researchers themselves acknowledge the limitations:

    We caution readers against overgeneralizing on the basis of our results. The slowdown we observe does not imply that current AI tools do not often improve developer’s productivity—we find evidence that the high developer familiarity with repositories and the size and maturity of the repositories both contribute to the observed slowdown, and these factors do not apply in many software development settings. For example, our results are consistent with small greenfield projects or development in unfamiliar codebases seeing substantial speedup from AI assistance.

    Maybe this double-negative-enriched bit from the "Key Caveats" section basically jibes with my experience? My recent work (admittedly, a sample of 1) has indeed been largely with greenfield projects and relatively unfamiliar codebases.

    It does matter how you use it

    The researchers hint at something important:

    We expect that AI systems that have higher fundamental reliability, lower latency, and/or are better elicited (e.g. via more inference compute/tokens, more skilled prompting/scaffolding, or explicit fine-tuning on repositories) could speed up developers in our setting (i.e. experienced open-source developers on large repositories).

    I think this rhymes with my experience. When I've just charged into a rambling chat & autocomplete session with Cursor, things steer into the ditch early and often.

    But when I've worked with Claude Code through a multi-step process of describing the problem, asking the agent to prompt me with clarifying questions, reviewing the problem and considering a solution, breaking it down into parts, and then asking the agent to methodically execute—that's yielded decently reliable success.

    Waiting, or lack thereof

    The study notes:

    All else equal, faster AI generations would result in developers being slowed down less. Qualitatively, a minority of developers note that they spend significant time waiting on AI to generate code.

    I rarely wait, because I'm juggling multiple projects. When one agent instance is working, I switch to another window. Sometimes it's a separate git worktree of the same codebase. Yes, context switching is tiring, but it also seems to help me overcome ADHD-related activation energy barriers?

    Over the years, there've been days when I just sit there staring at the IDE window, poking my brain with a stick saying "c'mon, do something" and nothing happens for an hour or more. I'm not planning my next move, I'm just dissociating. My executive function doesn't, like, function. Often. My own brain makes me wait long periods of time before it starts generating useful results. 😅

    Maybe it's the cycling novelty that gets me going? I enjoy task switching between prosing and coding. I enjoy finding that the model appears to have "read" everything—evidenced by it echoing my intent back in code or follow-up questions. I enjoy discovering that while I was in another window, new things happened in the background for me to review.

    I've also found that many agents are reliable at handling drudgery. Re-jiggering data structures, applying repeated refactorings, etc. Those tasks can seize me up for tens of minutes at a time with brain-killing waves of tedium. But usually, I can just tell the bot to do it, while I turn to more interesting stuff.

    Summing up

    Although the influence of experimental artifacts cannot be entirely ruled out, the robustness of the slowdown effect across our analyses suggests it is unlikely to primarily be a function of our experimental design.

    This study provides one data point about one specific scenario: experienced developers using specific tools on massive, mature codebases. The researchers themselves caution against overgeneralization, noting that different contexts likely yield different results.

    These tools aren't magic and they're not universally beneficial. But dismissing them based on this narrow study would be premature. The key is understanding when, how, and why to use them—something that's still evolving rapidly as both tools and techniques improve.

    # 4:26 pm
    • ai
    • genai
    • codegen
    • llms
    • cursor
  • Progress on Pebbling Club

    I've been procrastinating getting back to it, but I finally threw some hours into a substantial overhaul of my Pebbling Club web link sharing project—the first real efforts since December! Migrated from SQLite to Postgres, switched to uv for dependency management, and moved deployment from fly.io to my basement machine running Docker Compose.

    I built my own git-push deployment post-receive hook because I'm a masochist—er, I mean I wanted complete control over the deployment process. It's nice watching your own server rebuild containers when you push to main, even if cloud platforms would be more practical.
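
    The basic shape of that kind of hook is easy to sketch. To be clear, this isn't the actual script: the paths and branch name are assumptions, and the real hook may look nothing like this.

        #!/usr/bin/env python3
        # Illustrative post-receive hook: git feeds "old new ref" lines on stdin;
        # on a push to main, check out the new revision and rebuild the containers.
        # (Paths and branch name are assumptions, not the real setup.)
        import subprocess
        import sys

        GIT_DIR = "/srv/git/pebbling-club.git"   # hypothetical bare repo
        DEPLOY_DIR = "/srv/pebbling-club"        # hypothetical deploy working tree

        for line in sys.stdin:
            old, new, ref = line.split()
            if ref != "refs/heads/main":
                continue
            subprocess.run(
                ["git", f"--git-dir={GIT_DIR}", f"--work-tree={DEPLOY_DIR}",
                 "checkout", "-f", new],
                check=True)
            subprocess.run(
                ["docker", "compose", "up", "-d", "--build"],
                cwd=DEPLOY_DIR, check=True)

    Installed as hooks/post-receive in the bare repo and marked executable, something along these lines is enough to make git push double as a deploy button.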

    The development environment became a hybrid: Docker Compose for stable services, Honcho + Procfile for active development. Added Flower for Celery monitoring and experimented with Prometheus and Grafana metrics. (But, then, I reverted django-prometheus because it doesn't work at all like I thought it did.)

    I got several useful features working: RSS feed reading, duplicate URL detection through normalized hashing, ActivityStreams-inspired import/export, and a Netscape Bookmarks HTML export (for fun). Built a link inbox that currently handles RSS feeds, with in-progress work to add Mastodon timeline integration and plans for Bluesky.
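
    The duplicate-detection trick is worth a quick sketch, since it's mostly careful URL normalization plus a hash. This isn't the project's actual code, and the normalization rules here are made up for illustration:

        # Collapse superficially different URLs to one canonical form, then hash it.
        # (Illustrative normalization rules; the real project's rules likely differ.)
        import hashlib
        from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

        TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}

        def normalize_url(url: str) -> str:
            parts = urlsplit(url.strip())
            scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
            netloc = parts.netloc.lower().removeprefix("www.")
            path = parts.path.rstrip("/") or "/"
            query = urlencode(sorted(
                (k, v) for k, v in parse_qsl(parts.query)
                if k not in TRACKING_PARAMS))
            return urlunsplit((scheme, netloc, path, query, ""))  # fragment dropped

        def url_hash(url: str) -> str:
            return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()

        a = url_hash("http://www.example.com/post/?utm_source=rss")
        b = url_hash("https://example.com/post")
        assert a == b  # superficially different links, same hash

    A unique index on that hash column then lets the database enforce the dedup.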

    Along the way, I wanted to see how far I could get with Claude Code and make tweaks to my overall process. If nothing else, the bot helped me get past the barrier of activation energy to get some things done that I've put off for most of a year. The bot wrote a bunch of just-fine code—and where it was wrong, the wrongness motivated me to get it fixed and done myself.

    # 11:54 am
    • pebblingclub
    • codegen
    • claude
  • Building a Breakout clone with Claude

    To make steps toward showing and telling about my Claude Code workflow, I built a browser-based Breakout game with Phaser 3. The repository captures the full development process so far—prompts, commands, and session transcripts.

    Along with the basic game, I added a multi-ball power-up to demonstrate iterative development. The game itself isn't particularly novel, but the documented development process might be useful for others exploring AI-assisted coding workflows.

    At some point, this will turn into a show-and-tell presentation for co-workers and maybe a follow-up to last month's blog post on "Baby steps into semi-automatic coding".

    This took all of a couple hours on a Saturday afternoon on the couch watching TV, but I kind of want to keep going with it. It's rather addictive to just kind of riff on ideas and get them into the game with quick little iterations.

    # 11:11 am
    • codegen
    • ai
    • llms
    • claude
    • gamedev
  • 2025 June 30

    • Hello world!

    TDD, AI agents and coding with Kent Beck - YouTube

    Now, with over five decades of programming experience, Kent is still pushing boundaries—this time with AI coding tools. In this episode of Pragmatic Engineer, I sit down with him to talk about what’s changed, what hasn’t, and why he’s more excited than ever to code.

    This was a really neat interview. I'm a bit behind Kent Beck in years and industry contributions, to say the least. But, it's cool to hear how someone with his level of experience is dealing with the AI era.

    Taste Is the New Intelligence - by stepfanie tyler

    We used to associate intelligence with accumulation. The smartest people were the ones who knew the most. But that model doesn’t hold anymore. AI knows more than anyone. Wikipedia is free. The internet has flattened information access so thoroughly that hoarding knowledge is no longer impressive. What matters now is what you do with it. How you filter it. How you recognize signal in the noise.

    Curation is the new IQ test.

    If I can keep myself motivated to make progress on Pebbling Club, I kinda want to build something that's basically a web curation platform with the ability to share your taste with others.

    amantus-ai/vibetunnel: Turn any browser into your terminal & command your agents on the go.

    Ever wanted to check on your AI agents while you're away? Need to monitor that long-running build from your phone? Want to share a terminal session with a colleague without complex SSH setups? VibeTunnel makes it happen with zero friction.

    This reminds me of the time in college when I managed to telnet from a computer lab back into my Commodore Amiga 1200, dialed into a SLIP internet connection. No firewalls. It would have really sucked if someone else had known my IP address at the time. 😅

    AYTracker demo

    I've spent a lot of hours this week on this project instead of the usual lucky dip. Here's what I've done so far using my AY tracker terminal-based software

    I built an RC2014 kit computer last year. I also built a Why Em-Ulator Sound Module for it and managed to get it playing some neato tunes. I should check out this tracker and see if I can make some neato tunes of my own.

    # 11:59 pm
    • miscellanea
  • 2025 June 18

    • Hello world!
    • Since I'm bouncing between multiple teams' projects, this LLM agent-assisted coding thing reminds me of multi-box mining in EVE Online.
      • I haven't done that in years, but it was a way to make mining more interesting. You could fill in the lulls in gameplay by swapping between ships, treating it more like real-time strategy.
      • Apparently, EVE Online multi-boxing UI has gotten more sophisticated these days? I can only imagine this is the direction coding agent orchestration will head.
      • It's totally spinning plates and it's a more energy-consuming activity than I might have first expected.
      • I'm really leaning on the Command-Backtick button to cycle through IDE windows to shepherd the Claude Code sessions as they crunch through execution plans.
      • There is kind of a hyperfocus flow state available—not in the coding on individual projects, but in swapping between agents, keeping things running with answers to questions, performing rescues from ditches.
      • This seems appealing to my ADHD brain, until or unless I get distracted in a way that lets plates start falling.
      • I am finding that writing or generating gratuitous notes as context for both me and the LLM is really handy. Especially helps me remember what I was trying to accomplish when I last cycled into some particular IDE window.
    # 11:59 pm
    • miscellanea
  • On AI, anger, and the way from here

    Jason Santa Maria, Large Language Muddle:

    As someone who has spent their entire career and most of their life participating and creating online, this sucks. It feels like someone just harvested lumber from a forest I helped grow, and now wants to sell me the furniture they made with it.

    The part that stings most is they didn’t even ask. They just assumed they could take everything like it was theirs. The power imbalance is so great, they’ll probably get away with it. ... I imagine there will be a time when using these tools or not creates a rift, and maybe it will be difficult to sustain a career in our field without using them. Maybe something will change, and I’ll come around to using these services regularly. I don’t think I’ll ever not be angry about it.

    This is involuntary stone soup at scale. I'm also dismayed about how LLMs came to be, yet aware that the bomb still works regardless of my feelings. I'm convinced I need to understand this technology—I don't think I can afford to simply opt out.

    But I'm also staying tuned to skeptical takes, fighting to keep my novelty-seeking brain from falling into cult-like enthusiasm. While I can't dismiss this technology as pure sham, I refuse to swallow inflated claims about what it actually is. I want clear-eyed understanding.

    Jason's anger resonates because it points to a deeper loss:

    And still that anger. It’s not just that they didn’t ask. If these tools have so much promise, could this have been a communal effort rather than a heist? I don’t even know what that would’ve looked like, but I can say I would feel much differently about AI if I could use a model built on communal contributions and opt-ins, made for the advancement of everyone, not just those who can pay the monthly subscription.

    Behind that anger is sadness. How do we nurture curiosity and the desire for self growth?

    I believe there's a path forward that can nurture curiosity and growth.

    I've seen how these models can surface insights and patterns from overwhelming pools of information—hallucinations are always possible, but it's surprising how often they don't happen. I've seen how their "spicy autocomplete" can help me get where I intended to go faster—like talking to a fellow ADHD'er who sees where I'm going and jumps straight there.

    And these models aren't disappearing, even if the companies burning cash do. The models already released openly will power unexpected developments for decades, even if just passed around as warez torrents.

    This feels like the dot-com bubble all over again. When that bubble burst, the web didn't die: people with spare time and leftover experience built the blogosphere, API mashups, and the foundations of Web 2.0.

    I suspect we're heading for a similar pattern. Maybe it's wishful thinking, but I kind of expect we'll see a bust followed by cheap, surplus capacity that—while not the communal effort we deserved—becomes accessible to anyone who wants to experiment and build something better.

    # 11:38 am
    • llms
    • genai
    • ai
  • 2025 June 17

    • Hello world!
    • It continues to be kind of a perfect storm to bring a halt to my recent rapid-fire blogging. 😔
      • I'm pitching in on two teams at work, which really cuts down on time to stop and smell the RSS feeds to find things to write about.
      • Even though I'm doing a lot of LLM-assisted coding lately, I'm doing it for more projects than usual.
      • But also, my homebrew RSS feed reader just broke and I've been too busy to fix it. So, I haven't been, you know, reading feeds much lately.
    • I did just reopen the books on this Pebbling Club side project I've had going off & on since last summer. So maybe I'll resume progress on that too?
      • One of the things I did was to get this thing working on a more mundane stack with Redis and PostgreSQL, deployed via Docker Compose on a server in my basement.
      • And then I wrote this post-receive git hook that enables a git-push deploy process like I'm running a real PaaS next to my water heater.
    • At some point, I need to sit down and actually write out a pitch or something for Pebbling Club. I'd like to make it into something, but the gears of time and motivation keep slipping.
      • It's kind of a mashup of everything I've been interested in building on the web for a very long time.
      • It's also currently a big mess.
      • I did make this sorta mind-map in an Obsidian canvas to try to sketch out the general aspirational concept, though.
    • I need to work out a better way to post diagrams here. I've been meaning to do something with Mermaid and a web component, but... yeah.
    # 11:59 pm
    • miscellanea
  • 2025 June 11

    • Hello world!
    • Why is a celebrity podcast starting a mobile phone provider?
    • Finally watching Front 242's final live show and... dang.
    • Blogging has slowed here, but I'm hoping to pick it back up.
      • It was probably a shiny-new-toy phase for the first few weeks.
      • But also, I just changed projects at work and got suddenly a lot busier. So, my time for idle rumination has vanished for now.
      • I did post a relatively big thing on AI coding last weekend, so that's pretty good though?
    • My brain suddenly demands I play Terraria.
      • This happens every few years. And when it does, I get a sudden hyperfocus rabbit hole thing that lasts a week or so and evaporates abruptly.
      • I've only ever made it past the first few bosses in the game despite playing it since release back in 2011. 💀
      • I think that's my general M.O. with games: I get to a point where I'm like "ooh novelty" and then "ah, okay, I get how it goes from here" and wander off.
      • I very rarely want grinding or more of the same thing, once I see the pattern. Until, I guess, the anti-novelty wears off and it feels novel again? (Thus the repeat visits to Terraria)
      • Sometimes I'm really jealous of folks who can just lock into a Special Interest like Terraria and just milk endless reliable dopamine from the thing.
      • Meanwhile I'm like BORED NOW and have to go hunting again.
    # 11:59 pm
    • miscellanea
  • Moderating my Codegen Enthusiasm

    Most of my work has happened in Windsurf and Claude Code over recent weeks. I can picture a future where I'm essentially an LLM manager—keeping code-generation plates spinning and nudging toddling bots away from falling into ditches.

    Some folks claim they play games while the agent codes, but I'm actively reviewing as it writes. Turns out watching a bot write code for you takes surprising mental effort. 😅

    As I get deeper into this, I'm still processing the skeptical pushback. I know I'm drawn to novelty and clever tricks, so I'm trying to temper my enthusiasm and engage seriously with contrary opinions.

    Some people haven't had success with these tools, but "you're holding it wrong" is a bad response that doesn't address the real objections. I'm having concrete wins personally, but figuring out the precise how and why feels elusive—too many variables and RNG elements to be properly scientific about it.

    My main stake in AI coding is that it's what I'm paid to do right now in this industry. I am also rather fascinated with the stuff. Not exactly an unbiased position, but at least I'm not trying to sell anything other than my time & labor.

    I've seen arguments that this could all be Stockholm syndrome and excuse-making for the machine. Others warn that I shouldn't trust my own judgment on AI because I'm essentially self-dosing with cognitohazards.

    The more antagonistic responses make me sympathize with the guy who says his AI skeptic friends are all nuts—which feels like tit-for-tat, since accusations of mental instability seem to flow both ways.

    Honestly, I can also relate to just being done thinking about the whole thing for now. But, personally, I don't think I can afford to do that.

    # 2:05 pm
    • ai
    • llms
    • claude
    • codegen
    • genai
    • career
    • work
  • 2025 June 07

    • Hello world!
    • Well, this is kinda weird? I just noticed that all the H1s on my blog are the wrong sizes now.
      • Turns out Firefox redefined H1 sizes in the built-in browser styles based on nesting within article, aside, nav, section? I guess this will be a thing in other browsers too?
      • I'll have to fix that. I don't like this.
    • Oh hey: I just discovered that turning off Settings > General > Keyboards > Smart Punctuation on iOS means I can stop typing invalid JSON in Obsidian
    # 11:59 pm
    • miscellanea
  • Baby steps into semi-automatic coding

    So I did a thing. I spent time this week building an actual project using an AI coding agent. I ended up with 11,000 lines of code that actually work. To be clear: it wasn't great code—lots of boilerplate, plenty of "I would have written this eventually anyway" stuff—but it did what I intended it to do. More importantly, it got done without me having to fight my ADHD through every tedious implementation detail. [ ... 1017 words ... ]

    # 11:00 am
    • codegen
    • llm
    • ai
    • agents
    • windsurf
    • claude
    • gpt
  • 2025 June 06

    • Hello world!
    • My brain's been eaten by work for most of this week, so the blogging slowed down a bunch. Hoping to pick it up again soon.
      • I'm almost afraid to mention that I spent a bunch of this week deep down an LLM vibe-coding rabbit hole in Windsurf.
      • Just in time for Anthropic to cut Windsurf off from Claude models - oops.
    • We'll see how good it all ends up being, but I cycled through a handful of models and ended up with about 11,000 lines of code.
      • The code had unit tests and it pretty much did what I intended.
      • It wasn't great code - a lot of it was boilerplate - but it's mostly stuff I would have ended up doing myself more tediously while fighting my ADHD.
    • Trying to compose some thoughts somewhat along the lines of Harper Reed's LLM codegen workflow:
      • I settled on a workflow that wasn't just pestering the agent with wishes.
      • I had a series of discrete sessions, each started by creating a directory named for a new git branch. I wrote a shell script to semi-automate this (a rough sketch of the idea appears at the end of this list).
      • In that directory, I wrote a couple hundred words of intention in a spec.md file.
      • I asked the agent to expand my intentions into a step-by-step plan.md file.
      • I edited the plan and asked the agent to review it critically and ask questions.
      • I answered the questions.
      • I asked the agent to review it again and tell me if the plan looked clear enough to start implementing.
      • When it said "yes", I told it to start implementing.
      • The agent started implementing while I watched.
      • Sometimes I interrupted and told it that it was on the wrong track. But, for long stretches I was just reviewing the code as it wrote.
      • When it claimed to be done, I asked it to review the current changes against the plan and judge if it was really done.
      • Sometimes it wasn't and it went back to work.
      • When it petered out finally, I told it to make sure all the tests passed and linting errors were fixed. It did that.
      • I made sure the tests made sense, myself, fixed a few that didn't. Then I told it to run the tests some more.
      • Finally, when I was okay with the results, I told it to review our entire chat history for this session and summarize the results in a notes.md file.
      • In particular, I told it to pay special attention to things we did that hadn't been captured in the plan, to call out unexpected conditions we ran into, and to derive some lessons learned.
      • These notes ended up being actually pretty good?
      • These three artifacts - spec.md, plan.md, and notes.md - were committed along with the code. That marked the end of the session and the branch.
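      • For concreteness, here's roughly the shape of that session-scaffolding helper, sketched in Python rather than as the original shell script. The branch naming, directory layout, and template are guesses, not the real thing:

            #!/usr/bin/env python3
            # Sketch of the session scaffolding described above: new branch, matching
            # session directory, starter spec.md. (Names and layout are assumptions.)
            import subprocess
            import sys
            from datetime import date
            from pathlib import Path

            SPEC_TEMPLATE = "# Spec: {name}\n\n## Intent\n\n(What should this session accomplish?)\n"

            def main():
                name = sys.argv[1]  # e.g. "feed-import" (hypothetical feature name)
                branch = f"{date.today():%Y%m%d}-{name}"
                subprocess.run(["git", "checkout", "-b", branch], check=True)
                session_dir = Path("sessions") / branch
                session_dir.mkdir(parents=True, exist_ok=True)
                (session_dir / "spec.md").write_text(SPEC_TEMPLATE.format(name=name))
                # plan.md and notes.md show up later in the session, written by the agent.
                print(f"Session ready: {session_dir}")

            if __name__ == "__main__":
                main()
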
    • Now, I won't say that each of the sessions I ran went perfectly. But, I expected it to be an exploration.
      • I switched models a few times between Claude Sonnet 3.7, GPT-4.1, and SWE-1.
      • I found Claude to usually work the best. It just sort of got to work and did the needful without enticing many objections from me.
      • GPT-4.1 seemed to like to make very detailed plans (even after reading the plan.md), ask lots of questions, and then drive off into the ditch and need rescuing.
      • SWE-1 was about in the middle - but I ended up using it more because there's a promotion running right now that makes it free in Windsurf.
      • Occasionally, I'd switch models mid-session just to see what happened. I'm not sure how to characterize the differences, but they each had slightly different coding styles.
      • Claude and SWE-1 did better than GPT-4.1 at picking up from unfinished work in progress, I think?
      • Still, even with the needful babysitting, between these models I did get stuff implemented and it looked a lot like what I would have written if I'd had the executive function to work at it as doggedly.
    • I think I've learned that a focused scope and context window management are essential.
      • A few times, I think I asked the agent to bite off more than it could chew? Maybe I blew out the context windows? This is something I could get quantified answers around, if I paid attention to the metrics.
      • In those cases, I stopped the presses, backed up, and reworked the spec into a smaller scope.
      • Sometimes, I found it handy to get to the point of having the plan.md tuned up, then started a fresh chat with only the plan as context to start. That seemed to work pretty well - again, I think freeing up some of the context window with more condensed material.
    • Occasionally, I wandered off into the weeds myself and my session-based approach devolved into chatty iteration. That worked well for making very small tweaks and fussy updates.
      • I also learned that I'm good at juggling lots of git commits as save states. Whenever things were in a decent enough state, time to commit now and clean up later.
      • I forgot this a few times and lost some progress after driving into a ditch. But that wasn't too much of a hardship, since I could usually just scroll back in the chat and re-attempt the relevant bits of the session for similar results.
    • I should clean all these bullets up into a proper blog post, but maybe tomorrow. The tl;dr, I guess, is that I think I'm getting comfortable with this stuff.
      • It's surprising me with how much it gets done.
      • I'm getting less surprised with where & how it goes wrong.
      • The failures seem manageable and the results seem decent.
    • I had a kind of meta-chat with Claude about the above process, trying to think through some improvements.
      • One interesting notion was to use some big cloud models for the spec.md to plan.md stage.
      • But, then, switch to a local model running on my laptop for the actual process of implementing the plan.
      • Then, switch back to a big model for the notes.md summary.
      • If this worked, it could save a lot of tokens!
    • I could also see all the above being bundled up and semi-automated into its own agentic workflow.
    # 11:59 pm
    • miscellanea
  • 2025 June 04

    • Hello world!
    • The Verge, How to move a smart home
      • We've moved a lot. Mostly, I distrust smart home gadgets and don't have many. But, several of the houses we've owned had lingering smart devices. Many of them ended up useless. Occasionally a Nest thermostat could be coaxed to betray its former owner and work for me. For the most part, it's a mess.
    • Once upon a time in college, I got a dial-up network connection working to my Commodore Amiga 1200 in my dorm room. I sprinted across campus to a computer lab to telnet back into my A1200. It was so neat. And pointless. But neat.
      • This, of course, was before it occurred to me that anyone with my temporary IP address could have also telnetted into my A1200. 🤷‍♂️
    • Had some adventures in vibe coding, last night. Maybe I'll write about it? I keep reading folks saying this stuff doesn't work, but... it does?
    # 11:59 pm
    • miscellanea
  • Adventures in Vibe Coding with Grafana and Claude

    Since re-launching my blog, I wanted to monitor traffic and logs more closely. Nothing groundbreaking, but it had been a while since I'd run Grafana, Prometheus, and Loki on my own hardware.

    Turns out there's this handy all-in-one docker-compose setup that runs on Synology NAS. It fired up with minimal fuss, and soon I had metrics machinery humming in my basement—except the package didn't include Loki. A quick docs consultation got it running alongside the rest.

    My blog is a static site hosted via AWS S3 and CloudFront. Both services dump logs into an S3 bucket, but I'd never bothered reading them before—and didn't want to start now. Instead, I loaded up Claude.ai and described my problem:

    I want to get logs out of CloudFront. I have enabled new-style log delivery that stores gzipped JSON logs in an S3 bucket at s3://lmorchard-logs/blog.lmorchard.com/ with names like E5YXU82LZHZCM.2025-06-04-04.d024d283.gz

    Can you help me write a script for my home Loki server to download only new log files and push them into Loki?

    Claude stepped right up:

    I'll help you create a script to process CloudFront logs and push them to Loki. Let me write a Python script that tracks processed files and handles the gzipped JSON format.

    After some vibey iteration, we landed on a working artifact.

    It's quite verbose and could use some tightening up. But, I really don't care—it does the quick & dirty needful.

    I wrote zero Python. I just henpecked Claude to add features until the script did what I needed. I wasn't even in an IDE, just the Claude.ai interface in a browser. An interesting thing to note is that Claude didn't have access to my AWS resources—I didn't even give it a sample of my logs. But, still, what I told it about JSON, S3, and CloudFront was enough for it to be off to the races.
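
    For a sense of the shape of the thing, here's a heavily compressed sketch of that sort of script. This is not the artifact Claude actually produced; the bucket and prefix come from the prompt above, but the Loki endpoint, the stream label, the state file, and the JSON field names are all assumptions:

        #!/usr/bin/env python3
        # Sketch only: fetch CloudFront log objects from S3 that haven't been seen yet,
        # parse the gzipped JSON lines, and push them to a local Loki instance.
        # (No S3 pagination handling; fine for a quick and dirty illustration.)
        import gzip
        import json
        from pathlib import Path

        import boto3
        import requests

        BUCKET = "lmorchard-logs"
        PREFIX = "blog.lmorchard.com/"
        LOKI_URL = "http://localhost:3100/loki/api/v1/push"  # assumed local Loki
        STATE_FILE = Path("processed-keys.txt")              # remembers finished files

        def push_to_loki(records):
            # Loki wants [nanosecond-timestamp, log-line] pairs per stream.
            # The "timestamp" field name here is a guess at the log schema.
            values = [
                [str(int(float(rec.get("timestamp", 0)) * 1_000_000_000)),
                 json.dumps(rec)]
                for rec in records
            ]
            payload = {"streams": [{"stream": {"job": "cloudfront"}, "values": values}]}
            requests.post(LOKI_URL, json=payload, timeout=30).raise_for_status()

        def main():
            s3 = boto3.client("s3")
            done = set(STATE_FILE.read_text().split()) if STATE_FILE.exists() else set()
            listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
            for obj in listing.get("Contents", []):
                key = obj["Key"]
                if key in done or not key.endswith(".gz"):
                    continue
                body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
                raw = gzip.decompress(body).splitlines()
                records = [json.loads(line) for line in raw if line.strip()]
                push_to_loki(records)
                with STATE_FILE.open("a") as fh:
                    fh.write(key + "\n")

        if __name__ == "__main__":
            main()

    Presumably the real artifact does more (it is, as noted, quite verbose), but list, filter, decompress, parse, push is the general shape of the job.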

    Anyway, after a quick review and a satisfactory dry run, I dropped it into a cronjob to grab new logs every 5 minutes. Then I pestered Claude with Grafana dashboard questions I could have figured out myself. But why read docs when you can just ask? (Which I realize is ironic, since I wrote "Too long? Read anyway." But I think I make an exception for LLMs.)

    Total time from idea to working dashboard: about an hour.

    Not revolutionary, but pretty satisfying for barely having to think about it.

    # 3:26 pm
    • grafana
    • claude
    • vibecoding
    • llms
    • ai
  • 2025 June 02

    • Hello world!
    • Jotted down a couple posts today on AI stuff that aren't particularly revelatory.
      • If anything, they're just me trying to think out loud and clarify.
      • I'm probably going to try writing more stuff like this, if only to be Wrong on the Internet and lure someone in to correct me. 😅
    • Dang it, I don't wanna go to bed, I just discovered strudel.cc
    # 11:59 pm
    • miscellanea
  • Quoting W. David Marx on Gen AI

    W. David Marx, GenAI is Our Polyester:

    Everyone knows what happened next: There was a massive cultural backlash against polyester, which led to the triumphant revaluation of natural fibers such as cotton and linen. The stigma against polyester persists even now. The backlash is often explained as a rejection of its weaknesses as a fiber: polyester's poor aeration makes it feel sticky. ... While polyester took a few decades to lose its appeal, GenAI is already feeling a bit cheesy. We're only a few years into the AI Revolution, and Facebook and X are filled to the brim with “AI slop.” Everyone around the world has near-equal access to these tools, and low-skilled South and Southeast Asian content farmers are the most active creators because their wages are low enough for the platforms' economic incentives to be attractive.

    This, along with remembering that some professors are going back to handwritten essays (and also that handwriting is better for memory and learning), had me wondering if there's going to be a handcrafted backlash in the next few years?

    I write journal entries nearly every day by hand—albeit these days on an e-ink tablet. I think that helps me focus on what I want to dredge out of my head. I keep meaning to get back to that handwriting recognition project I started a few weeks ago, since no product I've tried yet has been able to turn my writing into clean machine-readable text.

    But, then again, maybe producing machine-illegible works by hand will be the next big trend?

    # 4:58 pm
    • genai
    • ai
    • llms
  • My New Rube Goldberg Blogging Machine

    According to the count in my archives, I've published over 50 blog posts in the past few weeks. That's roughly 50 more than I managed in the previous 10 years! These aren't masterpieces—mostly just random thoughts and half-baked ideas. But as I mentioned before, I'd rather throw a bunch of stuff at the wall and see what sticks than spend another decade crafting the perfect post that never gets published. So, here's how I tinkered my way into a writing setup that seems to actually be working. [ ... 873 words ... ]

    # 4:04 pm
    • obsidian
    • writing
    • metablogging
  • The Bomb Still Works: On LLM Denial and Magical Thinking

    I found myself in a frustrating argument with someone convinced that LLMs are pure vaporware—incapable of real work. Their reasoning? Since LLMs were trained on stolen material, the results they produce can't actually exist.

    Not that the results should be considered illegitimate or tainted—but that they're literally impossible. That the training data's questionable origins somehow prevent the technology from functioning at all.

    I couldn't convince them otherwise. But, life isn't fair and both things can be true simultaneously: the origin of something can be problematic and the results can be real.

    This analogy kept coming to mind: If someone steals materials to build a bomb and successfully builds it, they have a functioning bomb. The theft doesn't retroactively prevent the bomb from existing or reduce its explosive capability. Proving the theft might help with future bombs or justify going after the bomb-maker, but it doesn't cause the current bomb to magically self-dismantle.

    This seems obvious to me—embarrassingly so. Yet I keep encountering this form of reasoning about LLMs, and it strikes me as a particular kind of denial.

    There's something almost magical in the thinking: that moral illegitimacy can somehow negate physical reality. That if we disapprove strongly enough of how something was created, we can wish away its actual capabilities.

    The ethical questions around LLM training data are important and deserve serious discussion. But pretending the technology doesn't work because we don't like how it was built isn't engaging with reality—it's a form of wishful thinking that prevents us from dealing effectively with the situation we actually face.

    Whether we like it or not, the bomb has been built. Now we need to figure out what to do about it.

    # 12:20 pm
    • llms
    • ai
    • ml
  • Why Prompt Engineering Isn't Just Good Writing

    Someone told me that prompt engineering isn't real—that it's just techbros rebranding "good writing" and "using words well." I disagree, and here's why:

    Prompt engineering fundamentally differs from writing for human audiences because LLMs aren't people. When done rigorously, prompt engineering relies on automated evaluations and measurable metrics at a scale impossible with human communication. While we do test human-facing content through focus groups and A/B testing, the scale and precision (such as it is) here are entirely different.
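
    To make "automated evaluations" concrete, the core of the loop looks something like the toy sketch below. The model call is a stand-in rather than any real API, and the prompts and test set are invented for illustration:

        # Toy eval harness: score competing prompt templates against a labeled set.
        # call_model() is a stand-in for a real LLM client; everything here is invented.
        PROMPT_VARIANTS = {
            "plain": "Classify this review as positive or negative:\n{text}\nAnswer:",
            "ritual": ("You are a careful sentiment rater. Think step by step, then "
                       "answer with exactly one word, positive or negative.\n"
                       "Review:\n{text}\nAnswer:"),
        }

        TEST_SET = [
            {"text": "Loved it, would buy again.", "label": "positive"},
            {"text": "Broke after two days.", "label": "negative"},
            # ...a real eval set would have hundreds of cases...
        ]

        def call_model(prompt: str) -> str:
            # Fake model so the sketch runs; swap in an actual API client here.
            return "positive" if "loved" in prompt.lower() else "negative"

        def accuracy(template: str) -> float:
            hits = 0
            for case in TEST_SET:
                answer = call_model(template.format(text=case["text"]))
                hits += case["label"] in answer.lower()
            return hits / len(TEST_SET)

        for name, template in PROMPT_VARIANTS.items():
            print(f"{name}: {accuracy(template):.0%}")

    The point isn't these particular prompts; it's that variants get compared by a number rather than by how nicely they read.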

    The "engineering" aspect involves systematic tinkering—sometimes by humans tweaking language, sometimes by LLMs themselves—to activate specific emergent behaviors in models. Some of these techniques come from formal research; others are educated hunches that prove effective through testing.

    Effective prompts often resemble terrible writing. The ritual forms, repetitions, and structural patterns that improve LLM performance would make a professional editor cringe. Yet they produce measurable improvements in evaluation metrics.

    Consider adversarial prompts: they're often stuffed with tokens that are nonsense to humans but exploit specific model quirks. Here, the goal is explicitly to use language in ways that aren't human-legible, making attacks harder to detect during review.

    Good writing skills can help someone pick up prompt engineering faster, but mastering it requires learning to use words and grammar in weird, counterintuitive ways that are frankly sometimes horrifying.

    All in all, prompt engineering may still be somewhat hand-wavy as a discipline, but it's definitely real—and definitely not just rebranded writing advice.

    # 12:12 pm
    • ai
    • llms
    • promptengineering
  • 2025 May 31

    • Hello world!
    • I need to come up with a process here that keeps these miscellanea posts marked as a draft, if I never get past "Hello world!"
      • I start a new file every morning from a template, with the intent that I'll drop by and jot some things here throughout the day. But, this week turned out to be particularly busy. So, I went a few days never getting past "Hello world!" and that's not super interesting to publish.
      • At some point, I want to hook this stuff up to Mastodon and Bluesky accounts. I don't want to just post templated nonsense. (Just intentional nonsense.)
    • Maybe there's something in the air, because a week or two ago I got suddenly compelled to dive down a rabbit hole about the transformer robot watch I had when I was a kid in the 80s.
      • The one I had was confiscated by a teacher and never given back. I'm still salty about that.
      • But, just a couple days ago, I saw this video from Secret Galaxy on the history of the Kronoform watch
      • From there, I found this giant-sized printable version of the Takara Kronoform in desktop clock form - I'm going to have to give that a try.
      • I kind of want to try building some version of the robot watch with some smart guts. I probably won't get around to it, but why do smart watches have to be so boring?
      • Maybe I can split the difference by sticking a smart display in the desktop clock version? Hook it up to Home Assistant and make it do... I don't know what.
    # 11:59 pm
    • miscellanea
  • No-build frontend web development

    Simon Willison on no-build webdev:

    If you've found web development frustrating over the past 5-10 years, here's something that has worked great for me: give yourself permission to avoid any form of frontend build system (so no npm / React / TypeScript / JSX / Babel / Vite / Tailwind etc) and code in HTML and JavaScript like it's 2009.

    This blog has a "backend" build process to produce the static HTML. But, the frontend is pretty much build-free.

    Web development with "vanilla" JavaScript has gotten pretty good in the last decade, thanks to Modules, dynamic import(), Custom Elements, and a pile of other relatively recent APIs.

    The easy path at work these days tends to be Next.js, but I kind of hate it. All my side projects start with touch index.{html,js,css}. I roll on from there with maybe a live-reload HTTP server pointed at the directory (e.g. npx reload src).

    That said, I have started playing with carefully re-introducing some build tooling for a few side projects - but, only for external dependencies. I've tinkered a bit with using esbuild to compose bundles as JS modules importable by the rest of my unbundled modules.

    The nice thing about this is that I can treat those external dependencies as standalone utility modules without infecting the rest of my project with build machinery. I can even just check in a copy of the built asset to keep the project stable and usable, years later.

    # 11:04 am
    • es6
    • js
    • javascript
    • webdev
  • 2025 May 29

    • Hello world!
    • Been doing a bunch of vibe coding lately in Windsurf, "pairing" with Claude. A thing I keep wondering is how to make this process more multiplayer.
      • Like, there's a conversation between Claude and me. But I can't easily share that transcript with another human teammate.
      • That conversation is about as important as the code for making sense of things. More so, if we start to consider the code as an increasingly derivative product of the conversation.
      • So, if my teammate is also working in Windsurf with Claude, they're missing all the context I built up that brought the project to its current state.
      • And this isn't even getting into the notion of "mob coding" where maybe there's 2-3 of us humans with an AI agent riding shotgun.
      • I'm thinking the conversation with the agent is a particular form of documentation that should be preserved - maybe as an artifact paired with each discrete git commit?
      • Of course, the conversation is messy, with lots of iteration. So maybe it would help if there's a summary or a tl;dr ginned up at commit time, too? (That could be the commit message, I guess?)
    • I like the notion of Architecture Decision Records (ADRs) - I wonder if something like that could work for iteration sessions with an AI agent?
      • If we can scope a session to something discrete like a feature and capture the conversation from start to end in one of a rolling series of markdown files, that might be interesting context for both human and AI.
    • I know all the above presupposes that coding with an AI agent is a real and valuable thing. But, after putting a bunch of hours into giving it a try, I've morphed from skeptical disbelief to cautious buy-in.
    # 11:59 pm
    • miscellanea
© 2024 Les Orchard <me@lmorchard.com>