10 weeks from notebook to production EKS. A multi-agent AI crisis simulator built with LangGraph and Terraform for $0.02 per game. No hype, just engineering.

From Notebook to Production EKS in 10 Weeks

Ransom Rampage went from a Jupyter notebook to production EKS in 10 weeks. In the game, three AI agents (a CISO, an SRE, and an autonomous hacker) clash over a fintech startup you generate with a single prompt. A game session costs about $0.02 in LLM tokens, which shows that multi-agent AI doesn’t have to be an expensive paperweight. Play it live at ransomrampage.com.

Week 1: Signals Over Slides

I needed a portfolio piece for Scalefine that proved end-to-end delivery. Static dashboards are often just expensive paperweights, so I built a game to force a more complex architecture. Cybersecurity is the perfect theme—DORA regulations and ransomware are top-of-mind for every CTO. By combining multi-agent AI, a deterministic engine, and hardened EKS infrastructure, I created a project that sends three high-value signals at once.

Week 2: The Knowledge Layer — RAG Without the Overhead

The foundation consists of five FAISS corpora totaling 70 indexed chunks, backed by 7 corpus files covering distinct domains. I used metadata filtering (agent=ciso|sre|hacker|fintech|techno) to ensure roles stay distinct — the Hacker shouldn’t be quoting SRE best practices.

Corpus	Agent	Content
threat_intel	CISO	MITRE ATT&CK techniques + defense playbooks
sre_patterns	SRE	Infrastructure resilience + optimization patterns
hacking_kb	Hacker	Offensive techniques and attack vectors
fintech_archetypes	Entity Gen	Fintech company templates + revenue models
technos	Entity Gen	Fictional technology profiles (generated via Grok)

I chose FAISS over Pinecone because it costs $0 and simplifies local dev. For embeddings, I replaced BGE-M3 with BGE-small: 17x smaller (568M → 33M parameters), with the same retrieval quality on this domain-specific content. No extra subscription on the bill.
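
The role filter described above can be sketched in a few lines. This is a minimal illustration, not the project's code: a plain Python list and a hand-written cosine function stand in for FAISS, the three-dimensional vectors are toy embeddings, and the `Chunk` class and `search` function are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    agent: str            # metadata tag: ciso | sre | hacker | fintech | techno
    vector: list[float]   # toy embedding; a real one would come from the embedding model


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks: list[Chunk], query_vec: list[float], agent: str, k: int = 3) -> list[Chunk]:
    """Return the top-k chunks for one agent role, ignoring every other role."""
    allowed = [c for c in chunks if c.agent == agent]
    return sorted(allowed, key=lambda c: cosine(c.vector, query_vec), reverse=True)[:k]


# Hypothetical corpus entries, one per role.
CHUNKS = [
    Chunk("Segment the network and rotate credentials", "ciso", [0.9, 0.1, 0.0]),
    Chunk("Fail over to the standby database", "sre", [0.1, 0.9, 0.0]),
    Chunk("Phish a privileged operator", "hacker", [0.8, 0.2, 0.1]),
]

hits = search(CHUNKS, query_vec=[1.0, 0.0, 0.0], agent="hacker")
print([c.text for c in hits])
```

Even though the CISO chunk is the closest vector to the query, the hacker only ever sees chunks tagged for its own role, which is the point of filtering before ranking.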

Weeks 3–4: Three Agents, One State, Zero Group Chat

I used LangGraph to build three agents that perform independent state analysis. Instead of a messy group chat, each agent takes the enriched game state, searches its own knowledge base, and returns a structured recommendation. Two agents advise you (CISO + SRE). One attacks you (the Hacker). Here’s how they differ:

	CISO	SRE	Hacker (Byte)
Goal	Protect revenue, stop attacker	System stability, cost efficiency	Compromise, encrypt, exfiltrate
Actions	S1-S6 (Scan, Zero Trust, Honeypot, SOC…)	E1-E6 (Optimize, Restore, Failover, Observability…)	B1-B7 (Compromise, Encrypt, Exfiltrate, DDoS…)
RAG filter	agent=ciso	agent=sre	agent=hacker
Personality	Risk-averse, crisis-first priority	Cost-conscious, rotates targets	Defined by threat profile (10 archetypes)
Mutation direction	Defensive (defense↑, compromised→false)	Restorative (throughput↑, offline→false)	Offensive (compromised→true, locked→true)
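
To make the "mutation direction" row concrete, here is a hedged sketch of how a structured recommendation could be applied to a node. The field names, the `Node` attributes, and the action IDs used as examples are assumptions for illustration; a plain dataclass stands in for whatever schema the project really uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    defense: int = 5
    throughput: int = 10
    compromised: bool = False
    locked: bool = False
    offline: bool = False


@dataclass
class Recommendation:
    agent: str                                      # "ciso", "sre" or "hacker"
    action_id: str                                  # opaque label such as "S1" or "B1"
    target: str                                     # name of the node to mutate
    mutations: dict = field(default_factory=dict)   # attribute name -> new value


def apply(rec: Recommendation, nodes: dict[str, Node]) -> Node:
    """Apply a recommendation's mutations, rejecting attributes the node does not have."""
    node = nodes[rec.target]
    for attr, value in rec.mutations.items():
        if not hasattr(node, attr):
            raise ValueError(f"unknown node attribute: {attr}")
        setattr(node, attr, value)
    return node


nodes = {"payments-api": Node("payments-api")}

# Offensive mutation: the hacker flips the node to compromised.
apply(Recommendation("hacker", "B1", "payments-api", {"compromised": True}), nodes)
after_attack = nodes["payments-api"].compromised

# Defensive mutation: the CISO raises defense and clears the compromise.
apply(Recommendation("ciso", "S1", "payments-api", {"defense": 7, "compromised": False}), nodes)
print(after_attack, nodes["payments-api"])
```

Keeping the mutation as data rather than free text is what lets a deterministic engine, not the model, decide what actually changes.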

Each agent follows the same LangGraph pipeline:

gateway_cache_node → [cache hit?]
    ├─ YES → return cached recommendation (skip LLM)
    └─ NO  → agent_node → generate_recommendation → update_cache_node

Cost control was non-negotiable, so I implemented semantic caching with a 0.9999 cosine threshold. The gateway normalizes the game state into a deterministic cache key, then searches FAISS for a semantically identical prior response. When enabled, the cache is designed to reach hit rates above 60% after the first few turns, since identical game situations don’t pay for a second LLM call. The final output is a structured Pydantic AgentRecommendation, mapping qualitative AI reasoning into concrete node mutations with action IDs, costs, and revenue impact.
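
A minimal sketch of the cache gate, under stated assumptions: character-trigram counts stand in for a real embedding model, a Python list stands in for the FAISS index, and the `VOLATILE` field names are hypothetical. Only the 0.9999 threshold comes from the description above.

```python
import json
import math
from collections import Counter

THRESHOLD = 0.9999                       # cosine threshold quoted in the post
VOLATILE = {"turn_id", "flavor_text"}    # hypothetical fields that should not affect the key


def purify(state: dict, agent: str) -> str:
    """Drop volatile fields and serialise the rest in a stable key order."""
    relevant = {k: v for k, v in state.items() if k not in VOLATILE}
    return json.dumps({"agent": agent, "state": relevant}, sort_keys=True)


def embed(text: str) -> Counter:
    """Toy embedding: character trigram counts."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def lookup(self, state: dict, agent: str):
        query = embed(purify(state, agent))
        best = max(self.entries, key=lambda e: cosine(e[0], query), default=None)
        if best is not None and cosine(best[0], query) >= THRESHOLD:
            return best[1]    # cache hit: skip the LLM call
        return None           # cache miss: caller runs the agent, then calls store()

    def store(self, state: dict, agent: str, recommendation: str) -> None:
        self.entries.append((embed(purify(state, agent)), recommendation))


cache = SemanticCache()
cache.store({"turn_id": 3, "cash": 100, "defense": 5}, "ciso", "S2")
same_situation = cache.lookup({"turn_id": 9, "cash": 100, "defense": 5}, "ciso")
changed_defense = cache.lookup({"turn_id": 9, "cash": 100, "defense": 7}, "ciso")
print(same_situation, changed_defense)
```

With a threshold this close to 1.0, the cache behaves almost like an exact-match lookup on the normalized state, so the normalization step decides how often it hits.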

Week 5: The Engine — Deterministic Where It Counts

Game outcomes are fully deterministic — no dice rolls on anything that affects win/lose conditions. If your defense is under 6 and Byte attacks, you are breached. Randomness is limited to cosmetic elements: fog-of-war placement and flavor text selection.

One design choice worth highlighting: the hacker’s action is queued this turn but resolved next turn. This was a deliberate fairness decision — the player always sees their own action take effect before the hacker’s response lands. No instant surprises.

The full resolution order per turn:

tick → player action → resolve queued hacker → revenue recalc → regulator → queue next hacker → win/lose check
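
The queue-then-resolve ordering can be sketched as follows. This is a reduced illustration that keeps only three of the steps (player action, resolving the queued attack, queueing the next one) and omits the tick, revenue, regulator, and win/lose steps. The `Game` fields and the idea that a player action adds to a single defense score are assumptions; the threshold of 6 is the one stated in the text.

```python
from dataclasses import dataclass, field
from typing import Optional

BREACH_THRESHOLD = 6   # "defense under 6 and Byte attacks" means a breach


@dataclass
class Game:
    defense: int
    breached: bool = False
    queued_attack: Optional[str] = None
    log: list = field(default_factory=list)


def play_turn(game: Game, defense_gain: int, next_attack: Optional[str]) -> None:
    # 1. The player's action lands first.
    game.defense += defense_gain
    game.log.append(f"player: defense={game.defense}")
    # 2. The attack queued on the previous turn resolves now, with no dice roll.
    if game.queued_attack is not None:
        if game.defense < BREACH_THRESHOLD:
            game.breached = True
        game.log.append(f"hacker {game.queued_attack}: breached={game.breached}")
    # 3. The hacker's new action waits until the next turn.
    game.queued_attack = next_attack


# The player reacts in time: defense reaches 6 before the queued attack resolves.
defended = Game(defense=4)
play_turn(defended, defense_gain=0, next_attack="B1")
play_turn(defended, defense_gain=2, next_attack=None)

# The player reacts too little: defense is 5 when the same attack resolves.
exposed = Game(defense=4)
play_turn(exposed, defense_gain=0, next_attack="B1")
play_turn(exposed, defense_gain=1, next_attack=None)

print(defended.breached, exposed.breached)
```

Because the attack is only resolved one turn after it is queued, the player always gets exactly one action to respond, and the same inputs always produce the same outcome.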

With 21 possible actions and a revenue model tied to infrastructure throughput (base_revenue × min(throughput) / 10), the math is transparent. You survive 6 to 20 turns depending on your startup’s economic tier — managing cash, compliance, and reputation under constant pressure. The takeaway: use the AI to analyze context and suggest strategies, but let deterministic Python handle the actual business consequences.
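
As a worked example of the revenue formula, the sketch below assumes throughput is an integer score per node (the divisor of 10 suggests a 0 to 10 scale, but that is an inference) and that the minimum is taken over the nodes a revenue stream depends on. The function name and sample numbers are illustrative.

```python
def turn_revenue(base_revenue: float, throughputs: list[int]) -> float:
    """Revenue is capped by the weakest node: base_revenue * min(throughput) / 10."""
    if not throughputs:
        raise ValueError("at least one node is required")
    return base_revenue * min(throughputs) / 10


# One degraded node (throughput 4) drags the whole stream down.
degraded = turn_revenue(1000, [10, 8, 4])

# Restoring that node to 9 makes the next-weakest node (8) the bottleneck.
restored = turn_revenue(1000, [10, 8, 9])

print(degraded, restored)
```

The `min` makes the bottleneck explicit: improving any node other than the weakest one does not change revenue at all.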

Week 6: First Playtest — Kevin From IT

I gave the game to one external tester. Nine pieces of feedback came back within 30 minutes. Three were critical.

The hacker felt unfair — entering the system too easily even with good defensive play. The evict action (C5) was broken. And scan results weren’t updating visually on the infrastructure graph. The Jira-style triage board that I’d spent time building? Nobody used it. The incident response log was what the player actually read.

23 fixes shipped in a single commit. Player actions now resolve before the hacker’s; previously the order was reversed, which felt punishing. The breach penalty went from an instant fine to a 3-turn grace period. The triage board was replaced by a Bloomberg-style scrolling news ticker. And 20 named threat personas were added, including Kevin from IT, who “thinks he’s a hacker because he watched Mr. Robot twice.”

That same week, I built the entity generation pipeline — a 4-node LangGraph that turns a text prompt into a complete company:

venture_architect → sre_infra → assembler → value_chain_enricher

Node	What it does	LLM?
Venture Architect	Generates company profile, sector, adversary persona. RAG on fintech corpus	Yes (GPT-4o-mini)
SRE Infra	Creates 4-12 node infrastructure graph, 7 node types, scored characteristics	Yes (GPT-4o-mini)
Assembler	Wires revenue flows, injects vulnerabilities, applies fog-of-war, validates constraints	No — pure Python
Value Chain Enricher	Adds business names, revenue exposure, risk categories to each node	No — pure Python

Describe “fractional real estate for people priced out” and the AI generates TradeBridge — a P2P company with 7 infrastructure nodes, 4 revenue streams, 3 hidden vulnerabilities, and Kevin already scanning your perimeter. The pipeline runs once at game start (~20 seconds, 2-3 LLM calls). After that, the 3 game agents take over every turn.
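
The four-stage flow can be sketched as plain function composition. This is not LangGraph code: each function is a stub with hypothetical output, the two LLM-backed stages return fixed data, and only the stage order and the 4-12 node constraint come from the table above.

```python
State = dict


def venture_architect(state: State) -> State:
    # Stand-in for the first LLM call: company profile and sector.
    return {**state, "company": {"name": "ExampleCo", "sector": "fintech"}}


def sre_infra(state: State) -> State:
    # Stand-in for the second LLM call: the infrastructure nodes.
    nodes = [{"id": f"node-{i}", "type": "service"} for i in range(4)]
    return {**state, "nodes": nodes}


def assembler(state: State) -> State:
    # Pure Python: enforce structural constraints before the game starts.
    if not 4 <= len(state["nodes"]) <= 12:
        raise ValueError("infrastructure graph must have 4-12 nodes")
    return {**state, "validated": True}


def value_chain_enricher(state: State) -> State:
    # Pure Python: attach a business-facing name to every node.
    company = state["company"]["name"]
    enriched = [{**n, "business_name": f"{company} {n['id']}"} for n in state["nodes"]]
    return {**state, "nodes": enriched}


PIPELINE = [venture_architect, sre_infra, assembler, value_chain_enricher]


def generate_entity(prompt: str) -> State:
    state: State = {"prompt": prompt}
    for step in PIPELINE:
        state = step(state)
    return state


entity = generate_entity("fractional real estate for people priced out")
print(entity["company"]["name"], len(entity["nodes"]), entity["validated"])
```

Putting validation in a non-LLM stage means a malformed graph fails fast at game start instead of surfacing mid-session.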

Weeks 7–9: From docker-compose to EKS

Most portfolio projects end at docker-compose up. This one didn’t.

Six Terraform modules manage the full AWS footprint: VPC with public and private subnets, EKS cluster, ECR image registries, Cognito user pool for Google SSO, SSM for secrets, and a data module (ElastiCache Redis — later commented out for cost). I implemented GitOps via ArgoCD: GitHub Actions builds the image, pushes to ECR, updates the image tag in the ArgoCD application manifest, and ArgoCD auto-syncs to the cluster. No manual kubectl apply in the deploy pipeline.

The CI pipeline runs ruff lints, pytest (engine tests only — no LLM calls needed), Docker build, ECR push, and Trivy security scan. Total: ~11 minutes. ArgoCD picks up the change and syncs in about a minute.
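
The "update the image tag" step is the hinge of the GitOps flow, so here is a rough sketch of what it amounts to. The manifest snippet, registry name, and `set_image_tag` helper are hypothetical; a real pipeline might use a dedicated tool rather than a regular expression.

```python
import re


def set_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Rewrite the tag on every `image: <repository>:<tag>` line in a manifest."""
    pattern = re.compile(rf"(image:\s*{re.escape(repository)}):[\w.\-]+")
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"no image line found for {repository}")
    return updated


# Hypothetical manifest fragment committed to the Git repository that ArgoCD watches.
MANIFEST = """containers:
  - name: api
    image: registry.example/ransom-rampage:abc123
"""

bumped = set_image_tag(MANIFEST, "registry.example/ransom-rampage", "def456")
print(bumped)
```

The CI job only commits this text change; the cluster converges on it when ArgoCD syncs, which is why no kubectl call appears in the deploy path.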

By offloading Google SSO to the ALB with Cognito, I kept the application code 100% auth-free. Unauthenticated requests to /play and /api/* are blocked by the ALB before they reach the pods. Zero lines of auth middleware.

Stack: 17 technologies. CI: ~11 min pipeline. CD: ~1 min ArgoCD sync.

The $0.02 Per Game

A full game session with 3 agents queried every turn costs $0.02 in LLM tokens. The semantic cache is designed to catch redundant queries before they hit OpenAI: identical game states for the same agent role get served from FAISS, not from the API.

On the infra side, the initial deployment ran at ~$585/month. The cause: EKS extended support pricing activated silently, at $0.60/h instead of $0.10/h for the control plane, a 6x multiplier with no Terraform warning and no AWS alert. After pinning the Kubernetes version to 1.31 (standard support), removing ElastiCache, and consolidating to a single t3.medium node, the projected cost drops to ~$238/month including tax. Choosing Loki over ELK cut the logging stack’s memory footprint from ~2GB to 256MB of RAM: same logs in Grafana, different weight class.
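
To see why the control-plane rate matters, the hourly prices quoted above can be turned into monthly figures. The 730 hours per month is an assumption (a common averaging convention), and the calculation covers the control plane only; the monthly totals in this section also include nodes, tax, and other line items.

```python
HOURS_PER_MONTH = 730   # assumption: average number of hours in a month


def control_plane_monthly(rate_per_hour: float) -> float:
    """Monthly EKS control-plane cost for a given hourly rate, rounded to cents."""
    return round(rate_per_hour * HOURS_PER_MONTH, 2)


extended = control_plane_monthly(0.60)   # extended support rate
standard = control_plane_monthly(0.10)   # standard support rate
difference = round(extended - standard, 2)

print(extended, standard, difference)
```

Under that assumption the control plane alone goes from about $438 to about $73 a month, which is why pinning the Kubernetes version was the first fix.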

When the cluster isn’t needed: terraform destroy. Monthly cost drops to $0.50 (Route53 DNS only). Spin back up in 15 minutes with make setup && terraform apply && make deploy.

	As deployed	Fixed
LLM per game	$0.02	$0.02
AWS always-on	~$585/mo (extended support pricing)	~$238/mo (incl. tax)
AWS idle	$0.50/mo	$0.50/mo

Lessons in What Not to Build

Senior engineering is often about knowing what not to build. I originally planned ElastiCache Redis and ELK — the math didn’t hold up for a single-replica portfolio demo. BGE-M3 was 17x larger than what domain-specific retrieval actually needed. Two EKS nodes were provisioned when total memory usage was 3.8GB — well within a single t3.medium’s capacity.

Each cut was a conscious architecture decision, not corner-cutting. ElastiCache stays in Terraform (commented out, one line to re-enable for multi-replica production). The ELK-to-Loki swap preserves the same observability surface — logs appear in Grafana next to Prometheus metrics. The embeddings model change had zero impact on retrieval quality for curated, domain-specific corpora.

Next time, I’d add structured JSON logging and OpenTelemetry traces from day one. Production-grade doesn’t mean over-provisioned. It means every resource earns its cost.

What This Means For Your Project

This is how I deliver: prototype in a notebook, validate with a real user, ship to production with CI/CD and observability. Ransom Rampage went from an idea to 17 technologies on EKS in 10 weeks, solo.

If you recognize these challenges in your own stack (multi-agent AI, streaming pipelines, production Kubernetes), let’s talk. Available for engagements in Q2-Q3 2026.

Try It

→ Play: ransomrampage.com
→ Code: github.com/acourreg/ransom-rampage
→ Architecture + README: GitHub
→ Book a call: calendly.com/scalefine

Aurélien Courreges-Clercq — Freelance Data & AI Platform Architect. Streaming, GenAI Automations, backend modernization. scalefine.ai

Tags: #LangGraph #Kubernetes #EKS #Terraform #FastAPI #FAISS #ArgoCD #Prometheus #Grafana #MultiAgentAI #GenAI #RAG
