An AI Skeptic Uses AI for a Week
If you’re an AI skeptic, I get it. I was one too, and still am in some ways. I don’t see any benefit to stuffing a chatbot into most of the apps and websites I use. I don’t care about a summary of an email I received or a meeting I attended.
In that regard, there’s a stock-price-driven hype train that I’m ready to get past. However, that’s not all there is to it.
It helped me make BBQ
I have made a point the last week or two to really dive in and give AI a fair shake, and it has opened my eyes. AI doesn’t belong everywhere, but it does have its place, both for daily activities and to assist with software development.
The first, simpler thing I did was use Google’s Gemini to help make a purchasing decision. I was looking for a wireless meat thermometer that I can use when smoking and grilling. There are a lot of options on the market, but most of them only work in conjunction with a smartphone app. I like those apps, but I wanted one that also has its own standalone base so I can check the temps at a glance in the kitchen.
So I told Gemini in a few sentences what I was looking for, and how many probes I wanted. It crunched for a while, then came back with a list of models that met my criteria along with a summary of reviews. This included two I was already aware of from earlier research (the Typhur Sync and Thermomaven P2), plus a third I had not heard of (the Inkbird INT-12-BW).
Gemini showed its whole “thought process”, including links it scanned, reasons why it eliminated other models from the list, and a fairly reasonable synopsis of what differentiated these three. All three models met my criteria exactly.
It was far, far faster than reading dozens of reviews and watching multiple YouTube videos. The synopsis wasn’t perfect — for one model, it listed a probe thickness in mm, while the other two were “thin” and “very thin” — but it was enough of a start for me to dig in and do further research if I wanted. Of course I checked a few reviews myself afterwards to make sure everything checked out with Gemini’s response, and it did. (In the end I opted for the Thermomaven, the median-priced of the three.)
Would this be silly for simpler web searches? Of course. But for something like this, where I know I would be clicking through dozens of links, I think it really helped narrow my search.
It is impressive at software development
Of course the real hot topic isn’t web search, it’s what this means for the world of software development. What I found here blew me away.
I installed Warp and used it as part of all my development work for a week or two. This let me choose from multiple agents, though I just left it on the default, Claude 4 Sonnet.
I have used Copilot in the past. Critics often refer to LLMs as “glorified auto-complete”, and that’s very much what Copilot felt like to me. It was useful for rapidly spitting out large blocks of repetitive code, and it could write small, common utility functions for me.
Claude is different. It can index my entire codebase, make sense of how the application is structured as a whole, and determine where and how to make changes. It shows me a diff of every change it wants to make before making it, which I can accept, reject, or further refine with additional requests.
At first, I gave it simple tasks:
On the Organizations pages, change the text on all ‘Add users’ buttons to ‘Assign users’.
It did so without any trouble, and without me ever telling it what the organization pages are or what React components render on them. What immediately stood out to me was that it found the pertinent files far faster than I would have. Our codebase has nearly 3,000 files of TypeScript, and sometimes just tracking down the right components for a change can take a little work if it’s in a corner of the application I’m not familiar with.
I had another simple change to make:
On the Organization Users page, change the ‘Manage roles’ button icon from a gear to a pencil and update the tooltip to say ‘Manage organization roles’.
It figured out it needed to import PencilAltIcon (matching a pattern we use elsewhere in the application, notably not using the PencilIcon) and replaced the CogIcon in the component. It didn’t delete the old import of CogIcon, but a follow-up request could have done it (IIRC, I did it myself).
Then a bit more of a challenge: tests. We’re in the process of migrating tests off of Cypress, so we now have a hodgepodge of Cypress, Playwright, and Vitest tests in our repository. This component was still tested with Cypress component tests, so I asked Claude to create a Vitest file, expecting it to write an empty scaffold where I could place tests.
Instead, it wrote a half dozen tests for the component:
- render the user list with data
- show the empty state when no users are assigned
- display an error message when loading fails
- show the assign users button when the user has permissions
- disable the assign users button when the user lacks permissions
- show the empty state with a permissions error when the user lacks permissions
They were all good tests, though I told it to rephrase test names to start with “should”. Next I asked it to compare these tests with the existing Cypress tests and determine which Cypress tests were no longer needed. It correctly assessed them, and, after I prompted, deleted the obsolete Cypress tests (leaving one other more complex test in place that I wasn’t ready to convert yet).
Some of these tasks required a little refinement and/or follow-up requests (“Move the mock data to files with names ending .fixture.json”). It’s not a matter of asking for magic and then mentally checking out, but it is still impressive.
At this point, I was determined to push this thing as far as I could and see what happened.
Giving it a more challenging task
That “Assign users” button opens up a multi-step wizard. The second step in that wizard is further sub-divided into two sub-steps, but we’re in the process of consolidating some endpoints and they can be merged.
So, in a new session and a new git branch, here’s the prompt I gave Claude:
The wizard to assign users to an organization has two screens for assigning roles: one for Automation Execution (awx) and another for Automation Decisions (eda). I need to consolidate these two screens into a single screen that uses gatewayAPI urls instead of awxAPI and edaAPI endpoints. Role definitions can be retrieved from /role_definitions/. At the end of the wizard, these assignments need to be POSTed to /organizations/:id/users/associate/
It created the new consolidated wizard step and replaced the two sub-steps with the new one. It figured out how the wizard component worked, how this particular instance of the wizard component pieced together its steps into a configuration array containing multiple other components, and it determined how to work with that without any more specific guidance. This involved dozens of React components and almost as many hooks working in conjunction. I was genuinely floored.
I did notice it was repeating some work we already have elsewhere in the codebase, so I prompted:
there is a generic SelectRolesStep component. Can we use that in this implementation?
It did so.
After each of these prompts, it showed its thinking, usually a screen or so of text. It listed each of the files it read at each stage to see how they work together (sometimes a significant list). Then it presented the proposed code changes.
If something was not quite right, I had the option to ask for refinement immediately, or to accept the change and make a follow-up request afterwards. At the end of the process, it presented a full-page summary of the changes we made, including a five-bullet list of the “Benefits of the consolidation”. That’s more than I need, really, but it gives a sense that it understood what the changes were for at a higher level than my prompt conveyed.
At one point, I noticed a proposed change included a repeated set of five or six lines of code, syntactically incorrect. I thought I would just accept the change and delete the duplication myself, but I didn’t need to. After applying the change, Claude immediately suggested a second edit, fixing the problem.
This particular piece of work took maybe seven or eight prompts, including updates to tests. A few of the test updates needed a little more hand-refinement afterwards, but it took care of the bulk of the work.
The whole process felt very much like pair programming with a junior developer, but one who thinks and makes code changes incredibly quickly. I had to point out a couple places where our codebase already provided utilities so it wasn’t duplicating work. In other cases, it found things that would have taken some digging on my part. If something wasn’t quite right when I tested the UI, I told it — sometimes it found the problem and fixed it; other times it struggled. The place it seemed to have the hardest time was dealing with TypeScript errors (relatable), and for those I ended up fixing the issues myself.
I managed staging and committing all changes myself so I could see everything it was doing to the codebase — though you can make Claude do this as well if you’re confident enough in its abilities.
It’s a tool
I saw this post recently on Mastodon:
I’m witnessing professionals spending hours to engineer prompts for Copilot to do basic things like find a line in a file that has a string in it, or figure out who committed most frequently to a sub-path in a repository, or generate boiler plate code for classes… None of these things require #AI.
grep, git log, and editor snippets have existed for a long time. They are quick. They are EXACT. They are FREE. They do not boil the oceans. They do not displace the workforce. They are more efficient and productive. Learn the tools of your trade. If all you’re doing is using AI, that means AI can and will replace you.
I understand what he’s getting at here, and it’s an absolutely fair point. I don’t want to turn a blind eye to the resources used in making or using AI, its impact on society, and all the ethical questions surrounding it. But I think at the same time he misses a key fact: AI is now one of the tools of the trade.
Just like any tool, the trick is going to be learning when to use it and when not to. It’s certainly going to take some experience to get a feel for what it can and cannot do, and when it is and is not more efficient than other approaches.
At least one early study indicates AI doesn’t provide the productivity boost many think it does. This is important to keep an eye on. At the same time, I don’t think productivity is the only gain; in a couple instances, it found bugs in my code that I had missed.
With this Warp prompt in front of me, there is certainly a temptation to disengage my thinking and make it do all the work. For simple use cases, I can see where that sort of “vibe coding” can work, but at this juncture, I don’t see that being my approach. I value clean, maintainable code too much to blindly accept all changes AI produces.
Just like working with a junior developer, I need to keep my mind engaged to track what it’s doing. If something looks funny, I’ll ask it to do it differently. As I get more familiar with it, I’ll turn to it more when it makes sense to and less because it’s a fun novelty.
My gut says this is really efficient at finding what needs to change and scaffolding in a rough approach, but generally less efficient in the polishing stages of development.
In all, I think I’ve come to find a more balanced approach is needed when it comes to AI. It’s no magic bullet, but neither is it a complete waste of time. In coding, the answer has always been “it depends,” and that still holds true today.
