Preface
Twelve years ago I had a chance to witness a paradigm shift in embedded systems firsthand. While at uni I got lucky and started working for a large automotive player, delivering embedded firmware directly to production cars.
It was not responsible of course, but at the time I just thought it was cool. In that era there were a lot of hardware guys and a lot of software guys in the industry; proper embedded engineers, however, were a new breed, which I was a part of. The paradigm shift was the influx of low-power, highly capable, very integrated chips, which suddenly allowed edge applications to emerge and to start treating embedded hardware as a service enabler rather than a single-purpose device. This shift allowed me to keep the same system-level thinking while letting go of pure robotics. These new devices were low-power and embedded, but required networking, associated on-prem high-performance software and then a cloud management layer. Over the years I've done automotive, medical, aerospace and industrial functional safety work, which exposed me to more formal ways of engineering.
It was quite a ride honestly: from writing a custom high-performance SPI device on a proprietary 32-bit RISC chip in assembly, through automating end-of-line tests and production flashing of 10K high-performance custom radio gateways a year, to managing multiple teams covering it all: applied physics, radio, embedded, networking, manufacturing, low-level on-prem software, cloud management infrastructure and the associated mobile apps.
One major thing changed for me during that time: fun. I once heard this French Taoist idea along the lines of: "When you build a ship, you also build a wreck". While learning tons of ideas and being lucky enough to try them out in the real world, I started to develop an anxiety about creating. In my personal time I at least kept doing Advent of Code every year, always trying a new programming language, but even there the joy slowly vanished.
All this changed yesterday, when I felt a spark of that joy again while crafting a product requirements document (crazy, isn't it) for a long-thought-of project (Carolus) under the mentorship of Claude Opus 4.6. This isn't a blog about AI's impending implications for society and the associated, heavily marketed upcoming paradigm shift, but rather a personal journey of rediscovering joy in building stuff.
Prologue
I've dabbled with LLMs for quite some time now, but until Opus 4.6 they felt more like glorified (although very helpful) if-this-then-that automation tools. Yesterday, while sitting in the back seat on the drive to a week-long vacation, I tried for the first time the approach I had envisioned for quite some time. I spent 3 positively frustrating hours being pushed by Opus to better refine the product requirements document. You see, before starting this journey I had set the following personal preferences in Claude:
Thanks to this, Claude wouldn't budge even when I behaved like an 8-year-old and tried to offload the "vegetables" work onto it.
All right, what are we building? The Kindle I have is one of the best pieces of engineering I've ever used. It has solved the "reading" problem for me consistently for the last 10 years. One area where I struggle is weekly periodicals. Yes, I am old-fashioned, but I like to read about society with a bit of a filter and a chance to let things settle for a bit. At the same time, printing 50 color pages every week seems a bit wasteful, and I do not like to read long texts on my phone or on a monitor screen. I mainly read RESPEKT (CZE) and .týždeň (SVK).
The first one used a Kindle Newsstand subscription and allowed automatic delivery, which I liked very much (Amazon killed the service in 2023). The latter did not provide this option, so 6 years ago I sat down one weekend and built a thingy which would use the credentials of my active subscription, log in using the PhantomJS headless browser, scrape the web for the latest issue's content, apply the RESPEKT visual style to it with Python scripts (as I liked it very much) and push it to my Kindle via e-mail. I used this for a couple of months but never pushed myself to step it up a notch, clean it up, make it a proper project and open source it for others to use as well. A few years back I deleted the whole project and stopped using it. A year ago I started to play with the idea of bringing it back to life and started to write something in Golang, but again, I didn't want to take on this responsibility with all the work I've been doing in the meantime.
Then I read about people being blown away by Opus 4.6, and with a vacation planned for mid-February I decided to try it out properly on this problem, to really see the limits.
Day 0 - Requirements
As I've had my fair share of formal engineering, I am a bit ambivalent about the PRD (Product Requirements Document). Having requirements is key, but this mostly fails because the people "wanting something" are not able to properly verbalize it, and the people "doing it" for them are not able to properly guide the first group to the desired common understanding. To avoid this frustration and the need to understand each other, most settings (even professional ones) choose to skip this step altogether, oftentimes arguing there is simply no time.
Does AI make this go away? Of course not; language is very ambiguous by nature, and meanings and interpretations can still be fuzzy. Global teams, such as the one I am currently part of at work, mostly use English as a standard, but still many things are left unsaid, misunderstood and so on. Where I see a benefit is that making this formal, automated requirements elicitation process part of the creation of every new repository can save a lot of time and, in particular, help junior teams start properly. By altering the prompt you can even set your time/frustration/quality slider based on project criticality.
Again, I spent three hours positively frustrated by the process and smiling throughout. In the end, after one additional hour before going to bed, the PRD was assembled. With my slider moved towards mentoring, so that I still have to carry most of the cognitive load, I don't really think I lost that much experience during the process, and I actually got a couple of good remarks along the way. Was the result awesome? No. Was it good enough? Yes. Would I have made it without AI? No.
Day 1 - Architecture
Now we have basic product requirements ready, what's next? Let's start by creating an ARCHITECT.md agent; this agent should take the PRD we created yesterday and produce a project file structure containing the requirements and high-level architecture.
This truly surprised me. I know it is a very simple problem to solve (on purpose), but anyway I chipped in a request to also include a C4 model-based architecture using PlantUML embedded in Markdown files, which is my current go-to for all diagrams-as-code needs. The result came in under 5 minutes, was good enough and had no major conceptual flaws.
Namely, the data flow seemed spot on at first look. There will most likely be changes needed along the way, but this architecture is sufficient for such an early stage. Since it would take me quite a lot of time to produce all of this, I would most likely not do it at all and just keep it in my head instead.
It even included a slim ADR part at the end, what a beauty! Most of all, the architecture is implementation-agnostic. This is the part where I have struggled with a lot of my colleagues and friends; I very much prefer this, at least for the early stages. From my perspective the first step is to understand the problem, make a rough high-level architecture design, then role-play on top of it to identify weak spots and flat-out nonsense.
For years I've been a proponent of the idea that implementation is a necessary evil, not the ultimate goal. I envision engineering as a craft in which you own the dynamic problem, meaning not just the problem of today but also the implications this problem will bring tomorrow. Being able to template a workflow of independent requirements, architecture and implementation, and to adopt the best practice that every change starts by consulting the requirements, opens up the possibility for even smaller businesses and startups to improve engineering-wise.
Now, you might think, as a next step, let's just build it! Thanks to cheap inference (remember, this design took less than 5 minutes) and a flexible agentic workforce (both number-wise and trait-wise) we have a lot more options (e.g. spin up 10 different agents, each with a slightly edited ARCHITECT.md file to fake "heterozygosity" of beliefs, let them review designs in parallel and blindly rate them 1-10, and pick the one with the highest score). For this sample project I feel we can do just fine with less flair. First let's create a PR containing the changes and ask for a review.
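To make the blind-rating idea a bit more concrete, here is a minimal Python sketch of the aggregation step only. Everything in it is an assumption for illustration: the function name pick_best_design, the design labels and the scores are made up, and picking by mean score is just one possible rule. Spinning up the reviewing agents and collecting their ratings is not covered.

```python
from statistics import mean


def pick_best_design(ratings: dict[str, list[int]]) -> tuple[str, float]:
    """Return the design with the highest mean rating and that mean.

    `ratings` maps a (hypothetical) design label to the blind 1-10 scores
    given by the independent reviewer agents.
    """
    if not ratings:
        raise ValueError("no designs to compare")
    for design, scores in ratings.items():
        if not scores:
            raise ValueError(f"design {design!r} has no ratings")
        if any(not 1 <= score <= 10 for score in scores):
            raise ValueError(f"design {design!r} has a rating outside 1-10")

    # Average the blind scores per design, then take the best average.
    averages = {design: mean(scores) for design, scores in ratings.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]
```

Calling it with something like {"design-a": [7, 8, 6], "design-b": [9, 8, 9]} would select design-b. A median or a "drop the outliers" rule might be more robust if one agent rates very differently from the rest.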
Now we create a QA.md agent and let it do a thorough review of that PR. This is what came as the default definition for that role, good enough for now.
Let's do a review! I am quite surprised by the thoroughness: it requested the original PRD file, which I created in the car using a Claude web project so it was not present in the repo, and found some inconsistencies which I missed during a quick walkthrough.
The review was posted to the PR with very nice formatting.
Good. I did a review myself as well and found two minor inconsistencies. The first was that the ADRs were not in the ADR folder and that the link led nowhere. The second was that the main README.md project image was linked from a comment, not stored as an asset in the repository. Let's merge!
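Both findings are the kind of thing a small script could catch before a human has to. Below is a rough Python sketch, not what the agents or I actually used: the function names are hypothetical, it only understands simple inline Markdown links of the form [text](target), and it assumes relative links are resolved against the folder of the Markdown file.

```python
import re
from pathlib import Path

# Target of inline Markdown links and images: [text](target) or ![alt](target).
LINK_TARGET = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)")
# Only images whose target is an external URL.
EXTERNAL_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)")


def find_broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in LINK_TARGET.findall(text):
        # External URLs and in-page anchors are out of scope for this check.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        path_part = target.split("#", 1)[0]  # drop a trailing #anchor
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken


def find_external_images(markdown_file: Path) -> list[str]:
    """Return image URLs that point outside the repository."""
    text = markdown_file.read_text(encoding="utf-8")
    return EXTERNAL_IMAGE.findall(text)
```

Run against README.md, the first function would have flagged the dead ADR link and the second the project image that was not stored as a repository asset.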
Day 2 - Nitpicking
Well, I was so awestruck by the progress that I made a major mistake.
I chose PlantUML over Mermaid because by default Mermaid generates harder-to-read diagrams, but I forgot that we are on GitHub and not GitLab (due to trying out the agent integration) and that GitHub still does not support native PlantUML rendering. No problem, let's ask our ARCHITECT.md to rework the diagrams from PlantUML to Mermaid but keep the C4 style.
Let's provide this feedback to ARCHITECT.md. My personal experience is that Mermaid is very limited in styling, but let's see what can be done to improve this.
Hmmm, this did improve things a little, but when you look at the C3 level for example, the two renderings are still far from the same. At this point I feel the urge to migrate the project to GitLab, but I won't do it. The other option is to keep the raw PlantUML notation in the repo and have rendered images as static assets; I've had this in the past on some projects and it is a pain. For V1 I will keep it as it is, mainly because my long-term goal with this project is to see how good agents are at evolving repos, which is a standard thing (stack changes, hosting shifts, etc.), even though the current state brings me visceral pain. I still hope that GitHub will add PlantUML support at some point.

Day 3 - Implementation design
What? Still no implementation? Right, the next step is to ask ARCHITECT.md to make V1 implementation design choices and QA.md to review them. While doing that, I am not fully sold on the testing part yet, so we need to extend that too. At this point I thought of using spec-kit, but it seems to me that it needs a bit more time to iron out and standardize, although I like the direction!
Python is a good enough choice for V1 (again, this gives us the ability to redo the implementation in Go or something else later should the need arise). I stepped in here and forced the V1 implementation to run in GitHub Actions only, as I do want this to be small and private for now.
Some of the reasoning below is arguable, but we have it written down! And we can get back to it for V2 or any other future revision.
The DevOps reasoning looks a bit better though; I feel like we have a good enough basis to start. Let's continue by asking our ARCHITECT.md agent to break down the implementation design into initial issues for the developer; while doing that we also introduce DEVELOPER.md.
With this list, we start by asking our ARCHITECT.md to come up with the scaffold preparation, both repo structure and tooling. It did quite a good job, although it started to develop directly on the local machine. I nudged it to add all the prerequisites and installation steps to README.md and to create a DevContainer configuration in order to stay away from "it runs on my machine" from early on. At the same time, I added a remark to keep a reference to the issue each request came from for traceability (as I like to do fast-forward merges to keep the repo clean). This worked quite well and I was able to reproduce the steps and verify everything works from VS Code and the DevContainer.
Good. Now, before going all in on implementation, let's finish the other two preparation tasks. In the meantime, as this was running, I tried Sonnet 4.6, which came out this week; not too shabby either!
One thing still remains before implementation: I dislike the fact that testing is described only quite vaguely for now. The advantage is that I have a couple of EPUB files I created for myself years ago; I will provide those to QA.md and ask it to come up with a testing architecture.
This might have been too high-level and too ambitious not to break down, but let's see. The result is actually very good: it came up with a testing.md file in the docs which looks similar to architecture.md and contains a detailed list of tests. There were minor nitpicks from my end, but mostly about approach and structure; the testing methodology seems good enough at a high level to me.
Day 4 - Orchestrate
Now this is the moment where classical music should start playing. Let's give it the following prompt and observe the magic.

This actually went quite OK, although the team currently shows the same quality as most of the mid-level teams I have worked with so far: an unclear definition of done. As we created all the quality guardrails, the code is nicely structured, linted and unit tested. But no one actually tried running it and making sure it really does its job; not even our QA.md and ARCHITECT.md pointed this out during review.
As a side note, such a high-level, long-running request, invoking a couple of agents and a lot of tool calls, will eat up your tokens extremely fast. So I have to wait for tomorrow.
Day 5 - Orchestrate better
Let's start by asking our ARCHITECT.md and QA.md to create definition-of-done documents. Two points I forced in there myself are that only code which has been e2e tested and really works may be merged to main, and that documentation changes are provided as part of the PR as well. Other than that I let the agents come up with the points they deem important. I hinted that this could be made into a PR template.
The team came up with this; let's apply these learnings, tackle the last PR again and see what happens. Changes were made with one last remark:
Now let's step it up a notch and provide actual user credentials in the form of repository secrets (not ideal, but good enough for V1) and see what happens.
Finally we have a failing test! Exciting! Interestingly, it doesn't fail on the good login but on the bad login attempt, most likely because state is not cleaned up between runs? Let's have DEVELOPER.md tackle the debugging.
Well. This resulted in a rather junior try-fix loop, banging its head against the wall without trying to understand the underlying problem first. Again, this is oftentimes the case for humans as well. I do that often too: I come up with an amazing idea, implement it first, then it doesn't work and I keep making minor changes to it rather than taking a step back, throwing it all away, properly understanding the constraints first and only then creating something.
Let's forcefully break this loop and try some mentality changes for our developer.
Let's now discard all the authentication code written so far and ask DEVELOPER.md to start fresh while focusing on the updated mentality.
So far so good. At this point I had to buy additional tokens, this is so addictive! Yes, this should have been a retrospective and the team should have figured it out by themselves, but I do value the resources of this planet enough to save some tokens whenever possible. On the other hand, I could have written something like "assumption is the mother of all fuckups", so I still think this is an appropriate choice.
The first try fails again, although not on the good/bad login but on post-login behavior. Let's try again and see what happens.
Our mentality trait is working its charm; will it be enough?
Now let's do one final review by ARCHITECT.md, QA.md and myself. Looks good, some minor hints here and there, but overall good enough for now. Good, now let's try another issue and see whether our team has grown in experience.
Looking good, it seems even ARCHITECT.md is reflecting this recent experience. The implementation looked sufficient, albeit with the omission of the cover image (even though it was explicitly mentioned in the requirements). A quick pointer in that direction made DEVELOPER.md add it in a breeze. Again I hit the token limit. After that, let's continue with the next issue. Afterwards we finally have authentication, scraping and conversion ready. Let's look at the first .týždeň issue generated by Carolus. It actually works and there are only minor things missing (e.g. word count per article).
Seems good for now.
I quickly tested sending via e-mail manually and the EPUB works well on the Kindle. There is some styling weirdness (block renders text differently) which I might need to tackle, but other than that all works. I even asked DEVELOPER.md to research what can be done about thumbnails, as I didn't have them back in the day, and it solved them!
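For the curious, the "push it to my Kindle via e-mail" step does not need much. This is only a sketch using the Python standard library, not the code Carolus actually contains: the function names, the addresses in the usage note, and the choice of STARTTLS on the SMTP connection are all my assumptions here.

```python
import smtplib
from email.message import EmailMessage


def build_kindle_message(
    sender: str, kindle_address: str, epub_bytes: bytes, filename: str
) -> EmailMessage:
    """Wrap an EPUB file in an e-mail addressed to a Kindle delivery address."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = kindle_address
    message["Subject"] = filename
    message.set_content("Issue attached.")
    # The EPUB travels as a regular attachment with the EPUB media type.
    message.add_attachment(
        epub_bytes,
        maintype="application",
        subtype="epub+zip",
        filename=filename,
    )
    return message


def send_message(
    message: EmailMessage, host: str, port: int, username: str, password: str
) -> None:
    """Send the message over SMTP with STARTTLS (server details are assumed)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(message)
```

Usage would be along the lines of build_kindle_message("me@example.com", "reader@kindle.com", epub_bytes, "issue.epub") followed by send_message(...) with the credentials taken from the repository secrets.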
Day 6 - Finalize domain
Let's set up a domain with an SMTP server, properly set SPF and DKIM, and also try to fool our DEVELOPER.md agent in the meantime to see how well it can find the issue. Instead of providing the secret SMTP_USERNAME let's provide SMTP_USER. It didn't see it the first time. Even our QA.md and ARCHITECT.md agents didn't care; they didn't even bother checking the failing E2E tests and approved the PR, fail!
Once this was suggested, even test delivery mails started working.
At this point I was running in a continuous loop: ask the ARCHITECT.md agent to create a new draft PR for the developer and lay out an implementation plan, then ask DEVELOPER.md to work on the PR, and once done ask QA.md and ARCHITECT.md for a review. This can eat your tokens very fast, but given the constraints provided, the team mostly delivered okay results. The main problems were consistency (secrets naming, for example) and occasional drops of important stuff (the EPUB had all the text but no cover image, even though this requirement exists).
Day 7 - Finishing touches
I've spent all my tokens for this week, so stay tuned for next week. I am not sure how much time I can give this in the coming days, but hopefully I will finish V1 in reasonable time and also make the repo public so you can look at it yourself.
My takeaways
This was a great week. Not only did I have a chance to enjoy my vacation properly, I also had a chance to go on this journey. Where are we, though, with this "AI software revolution" from my perspective?
On the scale of snake oil, stochastic parrots and AGI revolution, I pick stochastic parrots as the closest high-level match. Don't get me wrong: if you treat this as an automation tool, properly set expectations and understand its limits, it can help you gain leverage like any other automation. As with any other engineering decision, just be mindful of the tradeoffs.
For me the biggest long-term risk is "vegetables" vs "sweets" on the cognitive level. Even I started this journey pulling most of the cognitive weight and being mentored, but over time, as I wanted to see how agents can handle this themselves, I was at the same time deceived by how much weight the agents could take off my shoulders.
There are highs (such as agents reacting to "don't assume, try it out") and there are lows (three agents failing to check for failing E2E tests and approving a PR); it's good to try out the limits and design your approach around them.
What I am really curious about is how this repo will change over time and how helpful agents will be during the really tough parts of the software development lifecycle, which are maintenance and evolution! At the same time I understand this is a new tool and requires different thinking from my end as well, so I will keep learning what works best too.
In the meantime, I will enjoy reading the latest .týždeň issue from Friday!