
What I Built in 35 Days with Claude Code

One person, a Raspberry Pi, and AI collaboration. 36 containers, 1,300 commits, 1,593 tests. The architecture, the methodology, and the honest accounting.

infrastructure ai claude-code architecture

On February 23, 2026, I started building. By March 31, I had a production infrastructure system running 36 Docker containers on a Raspberry Pi 5, backed by a dedicated GPU node, with a semantic memory server, voice pipeline, multi-agent orchestration framework, and a 19-page React dashboard with SSO. Solo. No team. No company. Just me, an $80 single-board computer, and Claude Code.

This is the honest accounting of what that looks like: what was built, how it was built, what actually works, and where the methodology breaks down.

The starting point

I’m self-taught. No CS degree, no bootcamp, no FAANG internship. Before chris-os, I had surface-level infrastructure knowledge from years of self-directed system administration, enough to be dangerous but not enough to be confident. I knew Docker. I could write Python. I had strong opinions about data ownership. That was about it.

The catalyst was personal: I wanted a single system that unified my fragmented digital life (messages across 6 platforms, health records in silos, calendar events with no automation, no unified view of anything) and I wanted to own all of it. Not in the cloud. Not on someone else’s platform. Mine.

The deeper motivation was that I needed to build something substantial enough to prove (to myself and to potential employers) that I could do real engineering work. Disability income ends on my 37th birthday. The clock is not metaphorical.

What got built

The system is called chris-os. It is a monorepo containing every piece of infrastructure for a personal digital operating system.

Compute layer. A Raspberry Pi 5 (8GB) runs 36 Docker containers in production across 5 isolated networks. A separate AMD Radeon 7900 XTX GPU node (24GB VRAM) handles inference workloads. CPU limits on every container to prevent resource starvation.

Data layer. PostgreSQL 16 with 5 database roles enforcing least-privilege access. 227 sequential migrations tracked via a schema_migrations table. Three-tier data classification (Sacred, Sensitive, Convenience) enforced at the schema and role level, not just in application code. 3.7 million health records, 446,000 messages across 6 platforms, contacts, calendar, location, finances. All in one database with proper access control.

Automation layer. 76 n8n workflows handling email classification, calendar sync, health monitoring, voice pipeline routing, system state collection, and more. Gmail push notifications via Google Pub/Sub. Automated watch renewal. Every workflow that touches the database uses parameterized queries.
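The parameterized-query rule is the whole defense against injection in workflows that touch the database. A minimal sketch of the pattern, using SQLite in place of PostgreSQL and an invented table name:

```python
import sqlite3

# Illustrative only: the real workflows run inside n8n against PostgreSQL.
# The table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, platform TEXT, body TEXT)")

def insert_message(conn, platform, body):
    # Placeholders keep user-supplied text out of the SQL string itself,
    # which is what blocks injection; never interpolate values directly.
    conn.execute(
        "INSERT INTO messages (platform, body) VALUES (?, ?)",
        (platform, body),
    )
    conn.commit()

insert_message(conn, "signal", "hello'); DROP TABLE messages; --")
count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # the hostile string is stored as data, not executed
```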

AI orchestration. A custom multi-agent framework called the Director Model: one coordinating session dispatches parallel worker agents, each with a declared scope, a tier (builder, observer, or director), and a mandatory result contract. A 5-perspective “Council” for major decisions with weighted scoring. Context budget management that prevents agents from consuming resources without delivering value.

Voice pipeline. End-to-end: custom wake word detection on a hardware satellite device, GPU-accelerated Whisper transcription, n8n-routed inference (Ollama or Claude), custom TTS with GLaDOS-style voice synthesis, and Sonos playback. Wake word to spoken response in under 2 seconds. All inference runs locally.

Semantic memory. A pgvector-backed memory server with three-tier caching (JavaScript Map, Redis, PostgreSQL), knowledge layers (episodic with 7-day decay vs. permanent knowledge), entity relations, and embedding search via a locally-hosted Qwen3 model. This is how the system remembers context across sessions and agents.
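The read path through the three tiers is a standard promote-on-hit, backfill-on-miss cascade. A sketch with plain dicts standing in for the JavaScript Map, Redis, and PostgreSQL layers (key names and TTL handling are simplified assumptions):

```python
# Each tier is progressively slower but more durable than the one above it.
l1 = {}                                          # in-process map, per session
l2 = {"note:1": "redis-hit"}                     # stands in for Redis
l3 = {"note:1": "pg-hit", "note:2": "pg-only"}   # stands in for PostgreSQL

def get(key):
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]          # promote to the fastest tier
        return l2[key]
    if key in l3:
        value = l3[key]
        l2[key] = value            # backfill both faster tiers
        l1[key] = value
        return value
    return None

print(get("note:2"))       # misses L1 and L2, served from the PostgreSQL tier
print("note:2" in l1)      # subsequent reads now hit the in-process map
```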

Dashboard. A React SPA with 19 pages and 27 widgets, SSE for live data streaming, write-back capabilities, and PWA support including push notifications. A Fastify API backend with 1,038 tests. Authelia SSO with OIDC clients for Home Assistant and a native iOS app. Live at ataraxis.cloud (auth-gated).

Edge and auth. Caddy as reverse proxy with Authelia SSO. Cloudflare Workers for OAuth 2.1 on the MCP proxy layer. Cloudflare Pages for static sites. R2 for encrypted offsite backups. Custom MCP proxy infrastructure that exposes database, n8n, memory, and Home Assistant to AI tools via the Model Context Protocol with dual authentication paths.

Home automation. Home Assistant running on the Pi with 65+ dashboard cards, presence detection via iCloud3, Sonos integration, Ecobee thermostat, Dyson air purifier, smart lighting. Location pipeline feeds events through n8n into the database.

The numbers

Some of these are vanity metrics. Some of them are not.

  • 1,300+ commits in roughly 35 days of active development
  • 1,593 automated tests across API (1,038), web (336), Python (219)
  • 227 database migrations, all tracked and reversible
  • 76 n8n workflows (70 active, 6 inactive)
  • 36 containers in production, 5 isolated Docker networks
  • 3.7 million health records ingested
  • 446,000 messages across 6 platforms
  • 27,500x query optimization on one pipeline (55 seconds to 2 milliseconds)
  • 234 system state properties with automated drift detection

The commit velocity is the one that requires context. 1,300 commits in 35 days is roughly 37 per day. That sounds unreasonable, and it would be for a solo human developer. It is not unreasonable when you are coordinating multiple AI agents that each produce focused, scoped work with conventional commit messages. The methodology is doing the heavy lifting, not superhuman typing speed.

How the AI collaboration actually works

This is the part that matters most, because “I used AI to help me code” is a spectrum that ranges from “I asked ChatGPT to write a function” to something substantially more involved. What I built sits at the far end of that spectrum.

The Director Model. I run a coordinating session (the “Director”) that can dispatch parallel worker agents into separate git worktrees. Each agent gets a prompt with specific instructions, a declared scope (which files and directories it can touch), and a required result contract format. The Director doesn’t write code. It coordinates, makes architectural decisions, and verifies the output.

Agent tiers. Worker agents are classified as Builders (can edit, write, run commands within their scope), Observers (read-only, for research and verification), or Directors (full access, rare). A hook system enforces scope boundaries. This prevents agents from making changes outside their mandate.
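The scope check itself can be very small. A hypothetical sketch of the hook's core decision, matching a touched path against an agent's declared globs (pattern syntax and names are assumptions, not the system's actual hook API):

```python
from fnmatch import fnmatch

def in_scope(path, allowed_globs):
    """Return True if a file path falls inside an agent's declared scope.

    Note: fnmatch's `*` matches across path separators, so one pattern
    per directory subtree is enough for this sketch.
    """
    return any(fnmatch(path, pattern) for pattern in allowed_globs)

builder_scope = ["services/api/*"]

print(in_scope("services/api/routes/health.py", builder_scope))  # True
print(in_scope("infra/docker-compose.yml", builder_scope))       # False
```

A hook that runs this check before every write is what turns "declared scope" from a convention into an enforced boundary.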

Result contracts. Every dispatched agent must produce a structured result block: status (DONE, PARTIAL, BLOCKED, FAILED), files changed, commits made, tests run, deviations from plan, and follow-up work identified. The Director reviews these before moving on. This is not optional. Agents that do not produce contracts get flagged.
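The contract's value is that it is machine-checkable. A minimal sketch of the shape and the Director-side review, with field names inferred from the description above rather than taken from the system's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    DONE = "DONE"
    PARTIAL = "PARTIAL"
    BLOCKED = "BLOCKED"
    FAILED = "FAILED"

@dataclass
class ResultContract:
    status: Status
    files_changed: list
    commits: list
    tests_run: bool
    deviations: list = field(default_factory=list)
    follow_ups: list = field(default_factory=list)

def review(contract):
    """Director-side gate: flag anything that is not clean, tested DONE work."""
    if contract.status is not Status.DONE:
        return "flagged: non-DONE status"
    if not contract.tests_run:
        return "flagged: tests not run"
    return "accepted"

result = ResultContract(Status.DONE, ["api/routes.py"], ["abc123"], tests_run=True)
print(review(result))  # accepted
```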

Context budget management. AI sessions have finite context windows. The system tracks usage with colored zones (green, yellow, orange, red) and adjusts behavior at each threshold. In yellow, no new agents get dispatched. In orange, the session begins wrapping up and storing breadcrumbs for the next session. In red, it is an emergency handoff. This prevents the most expensive failure mode in AI collaboration: an agent that consumes context without producing useful output.
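The zone logic reduces to a threshold function over context consumed. A sketch of the idea; the cutoff percentages here are assumptions, since the article does not state the exact numbers:

```python
def context_zone(used_fraction):
    """Map context-window usage (0.0 to 1.0) to a behavior zone."""
    if used_fraction < 0.50:
        return "green"    # normal operation
    if used_fraction < 0.70:
        return "yellow"   # stop dispatching new agents
    if used_fraction < 0.90:
        return "orange"   # wrap up, store breadcrumbs for the next session
    return "red"          # emergency handoff

print(context_zone(0.42))  # green
print(context_zone(0.85))  # orange
```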

The Council. For major architectural decisions, a 5-perspective deliberation process runs: Pragmatist, Visionary, Devil’s Advocate, User Experience, and Technical Debt. Each perspective evaluates independently, assigns a weighted score, and the results are synthesized into a recommendation. This sounds like overkill. It has prevented several bad decisions that I would have made on momentum alone.
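The synthesis step is a weighted sum over the five perspectives. A sketch of that arithmetic, where the perspective names match the article but the weights, scores, and decision threshold are invented for illustration:

```python
# (weight, score) per perspective; weights sum to 1.0, scores are 0-10.
perspectives = {
    "Pragmatist": (0.25, 7),
    "Visionary": (0.15, 9),
    "Devil's Advocate": (0.20, 4),
    "User Experience": (0.20, 8),
    "Technical Debt": (0.20, 5),
}

def synthesize(scores, threshold=6.0):
    """Combine independent weighted scores into a single recommendation."""
    total = sum(weight * score for weight, score in scores.values())
    return round(total, 2), ("proceed" if total >= threshold else "reconsider")

print(synthesize(perspectives))  # (6.5, 'proceed')
```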

What this is not. It is not autonomous AI. I make every significant decision. The AI does not have unsupervised access to production systems. It does not deploy code without my review. The value is in the force multiplication: I think about architecture and priorities while agents handle the implementation within well-defined boundaries.

What actually works well

The migration system. 227 migrations, every one numbered, tracked, and idempotent. A custom runner checks the schema_migrations table and skips anything already applied. This is not novel (it is how Flyway works), but implementing it from scratch means I understand every detail of how my schema evolves. Zero data loss incidents.
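The skip-if-applied loop is the whole trick. A sketch of the runner, using SQLite in place of PostgreSQL 16; the `schema_migrations` table name matches the article, but the migrations themselves are invented:

```python
import sqlite3

# Ordered (name, SQL) pairs; in the real system these are numbered files.
migrations = [
    ("0001_create_notes", "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)"),
    ("0002_add_tag", "ALTER TABLE notes ADD COLUMN tag TEXT"),
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    for name, sql in migrations:
        if name in applied:
            continue               # already applied: skip, so reruns are safe
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # second run is a no-op: idempotent by construction
print(conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0])  # 2
```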

The privacy tier model. Three tiers with real enforcement. Sacred data (therapy sessions, clinical records) lives in an isolated schema that only the application role can access. The MCP proxy role, the readonly role, and the remote read-write role all have zero visibility into that schema. This is not policy. It is database-level access control. The same pattern maps directly to enterprise data classification requirements.
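Conceptually, the enforcement is a grant matrix: each role sees only the schemas it has been granted. A Python model of that matrix for illustration; the role and schema names are assumptions, and the real enforcement lives in PostgreSQL `GRANT`/`REVOKE` statements, not application code:

```python
# Which schemas each database role can read. The sacred schema appears
# only in the application role's grant set.
grants = {
    "app_role": {"sacred", "sensitive", "convenience"},
    "mcp_proxy_role": {"sensitive", "convenience"},
    "readonly_role": {"convenience"},
    "remote_rw_role": {"sensitive", "convenience"},
}

def can_read(role, schema):
    return schema in grants.get(role, set())

print(can_read("app_role", "sacred"))        # True
print(can_read("mcp_proxy_role", "sacred"))  # False: zero visibility
```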

The automation layer. The n8n workflows are genuinely useful. Email classification using Haiku saves me time every day. Calendar sync works. Health monitoring catches anomalies. The system state collector tracks 234 properties and flags drift automatically. These are not demos. They are operational.

The test suite. 1,593 tests that run in CI on every push. The API test suite alone covers 1,038 scenarios. When I break something (and I do), the tests catch it before it reaches production. This was not always the case; the decision to invest in tests early was one of the best architectural choices in the project.

What does not work well (yet)

Honesty requires this section.

Voice pipeline reliability. It works end-to-end, but the wake word detection has false positive issues, and the TTS voice quality is inconsistent. The custom GLaDOS voice is fun but not production-grade for daily use. I am experimenting with fine-tuned models (Chatterbox Turbo) but have not landed on a solution.

The Pi as production hardware. 36 containers on 8GB of RAM with CPU limits works, but it is tight. Memory pressure is real. The system benefits from the constraint (it forces efficient resource usage), but a commercial deployment would want more headroom. The architecture is designed to be portable to any Docker host, and a cloud migration path exists.

Monitoring and alerting. The System State Registry tracks drift, but the alerting is minimal. There is no PagerDuty, no structured on-call, no SLA. For personal infrastructure this is acceptable. For anything else, it would need real observability: metrics, structured logging, centralized alerting.

Documentation debt. I have design documents, post-mortems, and ADRs for major decisions. I also have large sections of the codebase where the documentation is the code itself. This is fine for a solo developer who wrote all of it. It would be a problem for anyone else trying to understand the system.

What this proves about capability

The question I get asked (and ask myself) is: does building a personal infrastructure project translate to professional capability?

The patterns are transferable. The migration system works the same way whether the database has one user or a million. The privacy tier model maps to enterprise data classification. The CI pipeline, the test suite, the conventional commit history, the Docker networking, the reverse proxy configuration, the OAuth implementation: these are the same skills that infrastructure teams use daily.

What the project demonstrates specifically:

Full vertical ownership. I can design a database schema, write the migrations, build the API, implement the frontend, configure the reverse proxy, set up SSO, deploy to edge infrastructure, and write the automation pipelines. Not “familiar with” each layer. Built and operated each layer.

AI integration that goes beyond prompting. The Director Model is a real orchestration system with scope enforcement, result validation, and resource management. Understanding AI tooling well enough to build infrastructure on top of it is a different skill from knowing how to use ChatGPT.

Production mindset. Tests, migrations, access control, backup strategy, CPU limits, encrypted offsite storage. These are not afterthoughts. They were architectural priorities from the beginning, because the system is designed to survive without active maintenance.

The methodology, distilled

If there is one takeaway from this project, it is that human-AI collaboration at this level requires methodology, not just tools. The tools are available to anyone. What produces results is the framework around them:

  1. Scope everything. Never let an agent operate without boundaries. Unbounded agents produce unbounded mess.
  2. Demand contracts. Every unit of work produces a structured result. No exceptions. This is how you maintain quality at velocity.
  3. Track context. AI sessions are finite resources. Manage them like you would manage any other limited resource: with budgets, thresholds, and escalation procedures.
  4. Test first, not test later. The test suite is the cheapest insurance against velocity-induced regression. Write tests before you trust AI-generated code.
  5. Document decisions, not just code. You will not remember why you made a choice three weeks ago. The AI will not remember either. Write it down.

None of this is revolutionary. It is basic engineering discipline applied to a new category of tooling. The hard part is not knowing the principles. It is maintaining them at 37 commits per day.

What comes next

chris-os is not finished. The voice pipeline needs reliability work. The Cornerstone Discovery Engine (the product thesis: personal data as self-understanding) is architecturally possible but not yet built. The monitoring story needs real observability. And I need to turn this body of work into a career.

If you are building similar systems, or if you need someone who can, I would like to hear from you. The work is the resume. The architecture is the portfolio. Get in touch.