Architecture Case Study
Aegis AI: Clinical Decision Support
Same clinical AI agent, two completely different architectures. I built both — edge-optimized Next.js and stateful Ruby on Rails — and ran them against the same bedside scenarios to find out what actually matters.
The Engineering Story
Aegis started as a production Next.js clinical AI agent — a streaming, edge-optimized platform designed for real-time bedside decision support. It featured a multi-model fallback “waterfall” (Gemini 2.5 Flash → Flash Lite → Gemma 3), RAG via Supabase pgvector, and live FHIR R4 data ingestion.
That raised a question I couldn't let go of: “What happens when speed-to-screen isn't the priority — when you need long-term data integrity instead?”
I re-architected the entire system as a Ruby on Rails application — not a surface-level port, but a genuine rethink of how clinical data should flow through the stack. The Rails version replaces edge functions with server-side service objects, swaps Supabase for a persistent PostgreSQL schema optimized for relational clinical mapping, and introduces background workers (ActiveJob/Sidekiq) for asynchronous FHIR synchronization and a persistent audit trail.
Now I have two fully deployed apps running the same clinical scenarios — a direct comparison of how these architectures behave under identical conditions. From first commit to deployed Rails app, the re-architecture took a single weekend.
Architecture Comparison
Edge-Latency vs. Stateful Persistence
Implementation A — Original
Aegis AI — Next.js 14
Edge-Latency Optimized
- Framework: Next.js 14 (App Router)
- Deployment: Vercel Edge Functions
- Vector Store: Supabase pgvector + ANN search
- State Pattern: Micro-hook composition (useClinicalWorkbench)
- AI Streaming: Real-time SSE with sub-2s TTFT (time to first token)
- Rate Limiting: Upstash Redis (sliding-window) + in-memory token bucket fallback
- Validation: Zod schema on all API boundaries
Best for: Real-time bedside analysis and mobile-first clinical interfaces.
Implementation B — Re-Architected
Aegis on Rails — Ruby on Rails 7
Stateful Persistence
- Framework: Ruby on Rails 7 (API-First)
- Deployment: Fly.io (Docker)
- Database: PostgreSQL (relational clinical mapping)
- State Pattern: Server-side service objects
- Background Jobs: ActiveJob / Sidekiq for FHIR sync
- FHIR Sync: Asynchronous data ingestion & flattening into relational tables
- Audit Trail: Persistent insight-to-source mapping with foreign key references
Best for: Comprehensive patient history tracking and retrospective clinical auditing.
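To make the "server-side service objects" pattern concrete, here is a minimal sketch of a callable PORO with an explicit result struct. The class and field names are invented for illustration and are not taken from the Aegis codebase:

```ruby
# Hypothetical service object: a plain Ruby callable with an explicit
# result, the pattern the Rails build uses in place of edge functions.
class SyncPatientRecord
  Result = Struct.new(:ok, :synced_at, :error, keyword_init: true)

  def self.call(patient_id)
    new(patient_id).call
  end

  def initialize(patient_id)
    @patient_id = patient_id
  end

  def call
    raise ArgumentError, "patient_id required" if @patient_id.to_s.empty?
    # The real service would fetch FHIR resources and upsert relational
    # rows inside a transaction; this sketch only records the outcome.
    Result.new(ok: true, synced_at: Time.now.utc, error: nil)
  rescue ArgumentError => e
    Result.new(ok: false, synced_at: nil, error: e.message)
  end
end
```

In a Rails app, a controller or an ActiveJob worker would invoke `SyncPatientRecord.call(id)` and branch on `result.ok`, keeping the controller itself free of clinical-mapping logic.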
Performance Benchmark
Measured across identical clinical scenarios
- 5 clinical scenarios
- 20 total runs
- 100% success rate
- 3 model tiers tested
| Clinical Scenario | Next.js Latency | Rails Latency | Rails Citations (avg) | Winner |
|---|---|---|---|---|
| Sepsis — Elevated Lactate | 9.5s | 8.2s | 14 | Rails |
| ARDS — Severe Hypoxemia | 9.5s | 11.7s | 14 | Next.js |
| AKI — Post-Sepsis | 1.3s | 1.8s | 15.5 | Next.js |
| Cardiogenic Shock — Post-MI | 1.2s | 1.3s | 9.5 | Next.js |
| Post-Op Delirium | 2.6s | 3.2s | 23.5 | Next.js |
Inference Cost (10 runs per platform)
- Next.js total: $0.000070
- Rails total: $0.000087
Both platforms operate at sub-cent cost per clinical analysis. Rails runs ~24% higher due to richer citation generation.
Model Waterfall Coverage
- Gemini 2.5 Flash — Primary reasoning (both)
- Flash Lite — Speed specialist (both)
- Gemma 3 (4B) — Privacy fallback (Next.js only)
Next.js exercised all 3 tiers during testing. Rails triggered 2 of 3 — the Gemma fallback was not needed.
Evidence Attribution: A Key Differentiator
The Rails implementation generated 9.5 to 23.5 evidence citations per scenario, mapping each AI claim to specific FHIR resource IDs in the patient record. The Next.js version reported 0 inline citations in this benchmark run — its XAI attribution operates through a separate source-tracing panel rather than inline tags. This reflects a fundamental architectural divergence: Rails prioritizes persistent auditability, while Next.js prioritizes streaming speed with deferred attribution.
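A minimal sketch of what inline attribution looks like as data, assuming each AI claim carries the FHIR resource IDs it was derived from; the structure is illustrative, not the actual Aegis schema:

```ruby
# Each AI claim keeps the FHIR resource IDs it was derived from, so an
# auditor can walk from any sentence back to the patient record.
Citation = Struct.new(:claim, :fhir_resource_ids, keyword_init: true)

citations = [
  Citation.new(claim: "Lactate trending upward over 6h",
               fhir_resource_ids: ["Observation/obs-101", "Observation/obs-187"]),
  Citation.new(claim: "Creatinine doubled from baseline",
               fhir_resource_ids: ["Observation/obs-190"])
]

# Average citations per claim, the metric reported in the benchmark table.
avg = citations.sum { |c| c.fhir_resource_ids.size }.to_f / citations.size
# => 1.5 in this toy example
```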
Field Observation: Perceived UX vs. Raw Metrics
The benchmark data tells one story — Next.js wins 4 of 5 scenarios on raw latency. But hands-on testing revealed something more nuanced. The Rails application's initial page load felt perceptibly faster (the VM serves a fully server-rendered page in one trip), and it won the Sepsis scenario outright at 8.2s vs 9.5s. Meanwhile, the Next.js streaming SSE architecture pushes tokens to the screen as they generate, creating a “typewriter” effect that gives the clinician immediate perceived feedback even before the full response completes.
The takeaway: raw latency metrics don't capture the full clinical UX picture. Streaming vs. buffered delivery, citation density, and geographic proximity to edge nodes vs. VM regions all shape how “fast” a system feels to the clinician at the bedside.
Shared Technical Pillars
Clinical Reasoning Engine
ICU/ER nursing logic encoded into a multi-tier model fallback chain (Gemini 2.5 Flash → Flash Lite → Gemma 3). If one model hits a rate limit or goes down, the next picks up — no manual intervention, no downtime.
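The fallback chain can be sketched as an ordered list of callable tiers, each tried in turn until one succeeds. The lambdas below are stand-ins for real model clients, not actual Gemini SDK calls:

```ruby
# Hypothetical model-fallback waterfall: each tier is a callable that
# raises on rate limit or outage, and the next tier picks up.
class ModelWaterfall
  def initialize(tiers)
    @tiers = tiers  # ordered array of { name:, client: } hashes
  end

  def infer(prompt)
    errors = []
    @tiers.each do |tier|
      begin
        return { model: tier[:name], text: tier[:client].call(prompt) }
      rescue StandardError => e
        errors << "#{tier[:name]}: #{e.message}"
      end
    end
    raise "all tiers failed: #{errors.join('; ')}"
  end
end

waterfall = ModelWaterfall.new([
  { name: "gemini-2.5-flash",      client: ->(_) { raise "429 rate limited" } },
  { name: "gemini-2.5-flash-lite", client: ->(p) { "ok: #{p}" } },
  { name: "gemma-3-4b",            client: ->(p) { "ok: #{p}" } }
])
result = waterfall.infer("sepsis workup")
# The primary tier raises, so the speed specialist answers.
```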
FHIR Normalization
Custom mapping layers that flatten deeply nested FHIR R4 JSON into clean, token-efficient schemas that LLMs can actually reason over. Connects directly to HAPI R4 servers for Patient, Observation, and Condition resources.
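To make the flattening idea concrete, here is a hedged sketch that digs out the few fields an LLM actually needs from a nested FHIR R4 Observation. The field paths follow the FHIR R4 spec, but the class itself is invented for illustration:

```ruby
# Flatten a nested FHIR R4 Observation into a compact, token-efficient row.
class FlattenObservation
  Row = Struct.new(:fhir_id, :code, :value, :unit, keyword_init: true)

  def self.call(resource)
    Row.new(
      fhir_id: resource["id"],
      code:    resource.dig("code", "coding", 0, "code"),
      value:   resource.dig("valueQuantity", "value"),
      unit:    resource.dig("valueQuantity", "unit")
    )
  end
end

obs = {
  "id" => "obs-123",
  "code" => { "coding" => [{ "system" => "http://loinc.org", "code" => "2524-7" }] },
  "valueQuantity" => { "value" => 4.1, "unit" => "mmol/L" }
}
row = FlattenObservation.call(obs)
# row.code is the LOINC code for serum lactate; row.value and row.unit
# are what the prompt builder actually interpolates.
```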
Explainable AI (XAI)
Every AI claim maps back to a specific resource in the patient record. If the AI says something, you can trace exactly where it got that — non-negotiable in clinical software.
System Resilience
Layered rate limiting, schema validation on every API boundary, and structured JSON error logging. When something breaks, the logs tell you exactly where and why.
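As one concrete layer, the in-memory token-bucket fallback mentioned in the Next.js stack can be sketched in a few lines; capacity and refill numbers here are illustrative, not production values:

```ruby
# Minimal in-memory token bucket: the last-resort rate-limit layer when
# the shared Redis limiter is unreachable.
class TokenBucket
  def initialize(capacity:, refill_per_sec:, clock: -> { Time.now.to_f })
    @capacity = capacity
    @refill   = refill_per_sec
    @tokens   = capacity.to_f
    @clock    = clock
    @last     = @clock.call
  end

  def allow?
    now = @clock.call
    # Refill proportionally to elapsed time, capped at capacity.
    @tokens = [@capacity, @tokens + (now - @last) * @refill].min
    @last = now
    return false if @tokens < 1
    @tokens -= 1
    true
  end
end

bucket = TokenBucket.new(capacity: 2, refill_per_sec: 0)
bucket.allow?  # first request passes
bucket.allow?  # second request passes
bucket.allow?  # bucket drained, request rejected
```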
Recruiter & Interview Kit
Designed for rapid evaluation
- One-Click Demos: Pre-loaded clinical scenarios (Sepsis, CHF, Delirium) across both platforms.
- Guided Walkthroughs: Step-by-step overlays to demonstrate how AI maps raw FHIR data into clinical insights.
- LLMOps Dashboard: Real-time visibility into inference costs ($0.0000078 avg), TTFT, and semantic consistency metrics.
Explore the Full Repository
The Aegis monorepo contains both implementations as Git submodules, with the complete benchmarking strategy and deployment analysis.