Architecture Case Study
Aegis AI: Clinical Decision Support
Same clinical AI agent, two completely different architectures. I built both — edge-optimized Next.js and stateful Ruby on Rails — and ran them against the same bedside scenarios to find out what actually matters.
The Engineering Story
Aegis started as a production Next.js clinical AI agent — a streaming, edge-optimized platform designed for real-time bedside decision support. It featured a multi-model fallback “waterfall” (Gemini 2.5 Flash → Flash Lite → Gemma 3), RAG via Supabase pgvector, and live FHIR R4 data ingestion.
That raised a question I couldn't let go of: “What happens when speed-to-screen isn't the priority — when you need long-term data integrity instead?”
I re-architected the entire system as a Ruby on Rails application — not a surface-level port, but a genuine rethink of how clinical data should flow through the stack. The Rails version replaces edge functions with server-side service objects, swaps Supabase for a persistent PostgreSQL schema optimized for relational clinical mapping, and introduces background workers (ActiveJob/Sidekiq) for asynchronous FHIR synchronization and a persistent audit trail.
Now I have two fully deployed apps running the same clinical scenarios — a direct comparison of how these architectures behave under identical conditions. From first commit to deployed Rails app, the re-architecture took a single weekend.
Architecture Comparison
Edge-Latency vs. Stateful Persistence
Implementation A — Original
Aegis AI — Next.js 14
Edge-Latency Optimized
- Framework: Next.js 14 (App Router)
- Deployment: Vercel Edge Functions
- Vector Store: Supabase pgvector + ANN search
- State Pattern: Micro-hook composition (useClinicalWorkbench)
- AI Streaming: Real-time SSE with sub-2s TTFT (time to first token)
- Rate Limiting: Upstash Redis (sliding-window) + in-memory token bucket fallback
- Validation: Zod schema on all API boundaries
Best for: Real-time bedside analysis and mobile-first clinical interfaces.
Implementation B — Re-Architected
Aegis on Rails — Ruby on Rails 7
Stateful Persistence
- Framework: Ruby on Rails 7 (API-First)
- Deployment: Fly.io (Docker)
- Database: PostgreSQL (relational clinical mapping)
- State Pattern: Server-side service objects
- Background Jobs: ActiveJob / Sidekiq for FHIR sync
- FHIR Sync: Asynchronous data ingestion & flattening into relational tables
- Audit Trail: Persistent insight-to-source mapping with foreign key references
Best for: Comprehensive patient history tracking and retrospective clinical auditing.
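To make the "server-side service objects" pattern concrete, here is a minimal sketch of a callable PORO with an explicit result struct. The class and field names are invented for illustration and are not taken from the Aegis codebase:

```ruby
# Hypothetical service object: a plain Ruby callable with an explicit
# result, the pattern the Rails build uses in place of edge functions.
class SyncPatientRecord
  Result = Struct.new(:ok, :synced_at, :error, keyword_init: true)

  def self.call(patient_id)
    new(patient_id).call
  end

  def initialize(patient_id)
    @patient_id = patient_id
  end

  def call
    raise ArgumentError, "patient_id required" if @patient_id.to_s.empty?
    # The real service would fetch FHIR resources and upsert relational
    # rows inside a transaction; this sketch only records the outcome.
    Result.new(ok: true, synced_at: Time.now.utc, error: nil)
  rescue ArgumentError => e
    Result.new(ok: false, synced_at: nil, error: e.message)
  end
end
```

In a Rails app, a controller or an ActiveJob worker would invoke `SyncPatientRecord.call(id)` and branch on `result.ok`, keeping the controller itself free of clinical-mapping logic.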
Performance Benchmark
Measured across identical clinical scenarios
- 5 clinical scenarios
- 20 total runs
- 100% success rate
- 3 model tiers tested
| Clinical Scenario | Next.js Latency | Rails Latency | Rails Citations (avg) | Winner |
|---|---|---|---|---|
| Sepsis — Elevated Lactate | 9.5s | 8.2s | 14 | Rails |
| ARDS — Severe Hypoxemia | 9.5s | 11.7s | 14 | Next.js |
| AKI — Post-Sepsis | 1.3s | 1.8s | 15.5 | Next.js |
| Cardiogenic Shock — Post-MI | 1.2s | 1.3s | 9.5 | Next.js |
| Post-Op Delirium | 2.6s | 3.2s | 23.5 | Next.js |
Inference Cost (10 runs per platform)
- Next.js total: $0.000070
- Rails total: $0.000087
Both platforms operate at sub-cent cost per clinical analysis. Rails runs ~24% higher due to richer citation generation.
Model Waterfall Coverage
- Gemini 2.5 Flash — Primary reasoning (both)
- Flash Lite — Speed specialist (both)
- Gemma 3 (4B) — Privacy fallback (Next.js only)
Next.js exercised all 3 tiers during testing. Rails triggered 2 of 3 — the Gemma fallback was not needed.
Evidence Attribution: A Key Differentiator
The Rails implementation generated 9.5 to 23.5 evidence citations per scenario, mapping each AI claim to specific FHIR resource IDs in the patient record. The Next.js version reported 0 inline citations in this benchmark run — its XAI attribution operates through a separate source-tracing panel rather than inline tags. This reflects a fundamental architectural divergence: Rails prioritizes persistent auditability, while Next.js prioritizes streaming speed with deferred attribution.
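A minimal sketch of what inline attribution looks like as data, assuming each AI claim carries the FHIR resource IDs it was derived from; the structure is illustrative, not the actual Aegis schema:

```ruby
# Each AI claim keeps the FHIR resource IDs it was derived from, so an
# auditor can walk from any sentence back to the patient record.
Citation = Struct.new(:claim, :fhir_resource_ids, keyword_init: true)

citations = [
  Citation.new(claim: "Lactate trending upward over 6h",
               fhir_resource_ids: ["Observation/obs-101", "Observation/obs-187"]),
  Citation.new(claim: "Creatinine doubled from baseline",
               fhir_resource_ids: ["Observation/obs-190"])
]

# Average citations per claim, the metric reported in the benchmark table.
avg = citations.sum { |c| c.fhir_resource_ids.size }.to_f / citations.size
# => 1.5 in this toy example
```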
Field Observation: Perceived UX vs. Raw Metrics
The benchmark data tells one story — Next.js wins 4 of 5 scenarios on raw latency. But hands-on testing revealed something more nuanced. The Rails application's initial page load felt perceptibly faster (the VM serves a fully server-rendered page in one trip), and it won the Sepsis scenario outright at 8.2s vs 9.5s. Meanwhile, the Next.js streaming SSE architecture pushes tokens to the screen as they generate, creating a “typewriter” effect that gives the clinician immediate perceived feedback even before the full response completes.
The takeaway: raw latency metrics don't capture the full clinical UX picture. Streaming vs. buffered delivery, citation density, and geographic proximity to edge nodes vs. VM regions all shape how “fast” a system feels to the clinician at the bedside.
Shared Technical Pillars
Clinical Reasoning Engine
ICU/ER nursing logic encoded into a multi-tier model fallback chain (Gemini 2.5 Flash → Flash Lite → Gemma 3). If one model hits a rate limit or goes down, the next picks up — no manual intervention, no downtime.
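The fallback chain can be sketched as an ordered list of callable tiers, each tried in turn until one succeeds. The lambdas below are stand-ins for real model clients, not actual Gemini SDK calls:

```ruby
# Hypothetical model-fallback waterfall: each tier is a callable that
# raises on rate limit or outage, and the next tier picks up.
class ModelWaterfall
  def initialize(tiers)
    @tiers = tiers  # ordered array of { name:, client: } hashes
  end

  def infer(prompt)
    errors = []
    @tiers.each do |tier|
      begin
        return { model: tier[:name], text: tier[:client].call(prompt) }
      rescue StandardError => e
        errors << "#{tier[:name]}: #{e.message}"
      end
    end
    raise "all tiers failed: #{errors.join('; ')}"
  end
end

waterfall = ModelWaterfall.new([
  { name: "gemini-2.5-flash",      client: ->(_) { raise "429 rate limited" } },
  { name: "gemini-2.5-flash-lite", client: ->(p) { "ok: #{p}" } },
  { name: "gemma-3-4b",            client: ->(p) { "ok: #{p}" } }
])
result = waterfall.infer("sepsis workup")
# The primary tier raises, so the speed specialist answers.
```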
FHIR Normalization
Custom mapping layers that flatten deeply nested FHIR R4 JSON into clean, token-efficient schemas that LLMs can actually reason over. Connects directly to HAPI R4 servers for Patient, Observation, and Condition resources.
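To make the flattening idea concrete, here is a hedged sketch that digs out the few fields an LLM actually needs from a nested FHIR R4 Observation. The field paths follow the FHIR R4 spec, but the class itself is invented for illustration:

```ruby
# Flatten a nested FHIR R4 Observation into a compact, token-efficient row.
class FlattenObservation
  Row = Struct.new(:fhir_id, :code, :value, :unit, keyword_init: true)

  def self.call(resource)
    Row.new(
      fhir_id: resource["id"],
      code:    resource.dig("code", "coding", 0, "code"),
      value:   resource.dig("valueQuantity", "value"),
      unit:    resource.dig("valueQuantity", "unit")
    )
  end
end

obs = {
  "id" => "obs-123",
  "code" => { "coding" => [{ "system" => "http://loinc.org", "code" => "2524-7" }] },
  "valueQuantity" => { "value" => 4.1, "unit" => "mmol/L" }
}
row = FlattenObservation.call(obs)
# row.code is the LOINC code for serum lactate; row.value and row.unit
# are what the prompt builder actually interpolates.
```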
Explainable AI (XAI)
Every AI claim maps back to a specific resource in the patient record. If the AI says something, you can trace exactly where it got that — non-negotiable in clinical software.
System Resilience
Layered rate limiting, schema validation on every API boundary, and structured JSON error logging. When something breaks, the logs tell you exactly where and why.
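As one concrete layer, the in-memory token-bucket fallback mentioned in the Next.js stack can be sketched in a few lines; capacity and refill numbers here are illustrative, not production values:

```ruby
# Minimal in-memory token bucket: the last-resort rate-limit layer when
# the shared Redis limiter is unreachable.
class TokenBucket
  def initialize(capacity:, refill_per_sec:, clock: -> { Time.now.to_f })
    @capacity = capacity
    @refill   = refill_per_sec
    @tokens   = capacity.to_f
    @clock    = clock
    @last     = @clock.call
  end

  def allow?
    now = @clock.call
    # Refill proportionally to elapsed time, capped at capacity.
    @tokens = [@capacity, @tokens + (now - @last) * @refill].min
    @last = now
    return false if @tokens < 1
    @tokens -= 1
    true
  end
end

bucket = TokenBucket.new(capacity: 2, refill_per_sec: 0)
bucket.allow?  # first request passes
bucket.allow?  # second request passes
bucket.allow?  # bucket drained, request rejected
```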
Recruiter & Interview Kit
Designed for rapid evaluation
- One-Click Demos: Pre-loaded clinical scenarios (Sepsis, CHF, Delirium) across both platforms.
- Guided Walkthroughs: Step-by-step overlays to demonstrate how AI maps raw FHIR data into clinical insights.
- LLMOps Dashboard: Real-time visibility into inference costs ($0.0000078 avg), TTFT, and semantic consistency metrics.
Explore the Full Repository
The Aegis monorepo contains both implementations as Git submodules, with the complete benchmarking strategy and deployment analysis.