Architecture Case Study

Aegis AI: Clinical Decision Support

Same clinical AI agent, two completely different architectures. I built both — edge-optimized Next.js and stateful Ruby on Rails — and ran them against the same bedside scenarios to find out what actually matters.

View Monorepo on GitHub

The Engineering Story

Aegis started as a production Next.js clinical AI agent — a streaming, edge-optimized platform designed for real-time bedside decision support. It featured a multi-model fallback “waterfall” (Gemini 2.5 Flash → Flash Lite → Gemma 3), RAG via Supabase pgvector, and live FHIR R4 data ingestion.

That raised a question I couldn't let go of: “What happens when speed-to-screen isn't the priority — when you need long-term data integrity instead?”

I re-architected the entire system as a Ruby on Rails application — not a surface-level port, but a genuine rethink of how clinical data should flow through the stack. The Rails version replaces edge functions with server-side service objects, swaps Supabase for a persistent PostgreSQL schema optimized for relational clinical mapping, and introduces background workers (ActiveJob/Sidekiq) for asynchronous FHIR synchronization and a persistent audit trail.
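The service-object pattern at the heart of the Rails version can be sketched in a few lines. This is a minimal, framework-free illustration, assuming a `FhirSyncService` shape; the class and method names are hypothetical stand-ins, not the repo's actual API:

```ruby
# Hypothetical sketch of a server-side service object in the Rails style:
# one responsibility, a class-level .call entry point, and a Result value
# object instead of raising into the controller.
class FhirSyncService
  Result = Struct.new(:ok, :records, :error, keyword_init: true)

  def self.call(patient_id:)
    new(patient_id).call
  end

  def initialize(patient_id)
    @patient_id = patient_id
  end

  def call
    Result.new(ok: true, records: fetch_and_flatten, error: nil)
  rescue StandardError => e
    Result.new(ok: false, records: [], error: e.message)
  end

  private

  # Stand-in for the real FHIR fetch-and-flatten step: returns one
  # relational-style row per resource instead of nested JSON.
  def fetch_and_flatten
    [{ patient_id: @patient_id, resource_type: "Observation", code: "lactate" }]
  end
end
```

In the deployed app a service like this would be invoked from an ActiveJob worker rather than inline, which is what makes the FHIR sync asynchronous.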

Now I have two fully deployed apps running the same clinical scenarios — a direct comparison of how these architectures behave under identical conditions. From first commit to deployed Rails app, the re-architecture took a single weekend.

Architecture Comparison

Edge-Latency vs. Stateful Persistence

Implementation A — Original

Aegis AI — Next.js 14

Edge-Latency Optimized

  • Framework: Next.js 14 (App Router)
  • Deployment: Vercel Edge Functions
  • Vector Store: Supabase pgvector + ANN search
  • State Pattern: Micro-hook composition (useClinicalWorkbench)
  • AI Streaming: Real-time SSE with sub-2s TTFT
  • Rate Limiting: Upstash Redis (sliding-window) + in-memory token bucket fallback
  • Validation: Zod schema on all API boundaries

Best for: Real-time bedside analysis and mobile-first clinical interfaces.
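The in-memory token-bucket fallback in the rate-limiting layer is a standard pattern; here is a language-agnostic sketch in Ruby (an assumption for illustration; the production fallback lives in the Next.js codebase and pairs with Upstash Redis sliding windows):

```ruby
# Illustrative token bucket: requests spend tokens, tokens refill over time.
# Used as a local fallback when the distributed (Redis) limiter is unreachable.
class TokenBucket
  def initialize(capacity:, refill_per_sec:)
    @capacity = capacity
    @tokens = capacity.to_f
    @refill_per_sec = refill_per_sec
    @last = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Returns true and consumes a token if the request is allowed.
  def allow?
    refill
    return false if @tokens < 1
    @tokens -= 1
    true
  end

  private

  def refill
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @tokens = [@capacity, @tokens + (now - @last) * @refill_per_sec].min
    @last = now
  end
end
```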

Next.js · TypeScript · Supabase · Vercel · Zod · Tailwind

Implementation B — Re-Architected

Aegis on Rails — Ruby on Rails 7

Stateful Persistence

  • Framework: Ruby on Rails 7 (API-First)
  • Deployment: Fly.io (Docker)
  • Database: PostgreSQL (relational clinical mapping)
  • State Pattern: Server-side service objects
  • Background Jobs: ActiveJob / Sidekiq for FHIR sync
  • FHIR Sync: Asynchronous data ingestion & flattening into relational tables
  • Audit Trail: Persistent insight-to-source mapping with foreign key references

Best for: Comprehensive patient history tracking and retrospective clinical auditing.

Rails 7 · Ruby · PostgreSQL · Sidekiq · Fly.io · Docker

Performance Benchmark

Measured across identical clinical scenarios

  • 5 Clinical Scenarios
  • 20 Total Runs
  • 100% Success Rate
  • 3 Model Tiers Tested

| Clinical Scenario | Next.js Latency | Rails Latency | Rails Citations | Winner |
| --- | --- | --- | --- | --- |
| Sepsis — Elevated Lactate | 9.5s | 8.2s | 14 avg | Rails |
| ARDS — Severe Hypoxemia | 9.5s | 11.7s | 14 avg | Next.js |
| AKI — Post-Sepsis | 1.3s | 1.8s | 15.5 avg | Next.js |
| Cardiogenic Shock — Post-MI | 1.2s | 1.3s | 9.5 avg | Next.js |
| Post-Op Delirium | 2.6s | 3.2s | 23.5 avg | Next.js |

Inference Cost (10 runs)

  • Next.js Total: $0.000070
  • Rails Total: $0.000087

Both platforms operate at sub-cent cost per clinical analysis. Rails runs ~24% higher due to richer citation generation.

Model Waterfall Coverage

  • Gemini 2.5 Flash — Primary reasoning (both)
  • Flash Lite — Speed specialist (both)
  • Gemma 3 (4B) — Privacy fallback (Next.js only)

Next.js exercised all 3 tiers during testing. Rails triggered 2 of 3 — the Gemma fallback was not needed.

Evidence Attribution: A Key Differentiator

The Rails implementation generated 9.5 to 23.5 evidence citations per scenario, mapping each AI claim to specific FHIR resource IDs in the patient record. The Next.js version reported 0 inline citations in this benchmark run — its XAI attribution operates through a separate source-tracing panel rather than inline tags. This reflects a fundamental architectural divergence: Rails prioritizes persistent auditability, while Next.js prioritizes streaming speed with deferred attribution.
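The persistent insight-to-source mapping can be sketched in plain Ruby. This is an in-memory stand-in for what the Rails app stores as PostgreSQL rows with foreign keys; the `Citation` and `AuditTrail` names are hypothetical:

```ruby
# In-memory sketch of the audit trail: every AI insight is stored alongside
# the FHIR resource IDs it cites, so each claim is traceable to its source.
Citation = Struct.new(:insight_id, :fhir_resource_id, keyword_init: true)

class AuditTrail
  def initialize
    @rows = []
  end

  def record(insight_id:, fhir_resource_ids:)
    fhir_resource_ids.each do |rid|
      @rows << Citation.new(insight_id: insight_id, fhir_resource_id: rid)
    end
  end

  # Trace an insight back to its sources — the retrospective-audit query.
  def sources_for(insight_id)
    @rows.select { |r| r.insight_id == insight_id }.map(&:fhir_resource_id)
  end
end
```

The relational version of this is what makes retrospective auditing cheap: "which observations informed this recommendation" becomes a single indexed join.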

Field Observation: Perceived UX vs. Raw Metrics

The benchmark data tells one story — Next.js wins 4 of 5 scenarios on raw latency. But hands-on testing revealed something more nuanced. The Rails application's initial page load felt perceptibly faster (the VM serves a fully server-rendered page in one trip), and it won the Sepsis scenario outright at 8.2s vs 9.5s. Meanwhile, the Next.js streaming SSE architecture pushes tokens to the screen as they generate, creating a “typewriter” effect that gives the clinician immediate perceived feedback even before the full response completes.

The takeaway: raw latency metrics don't capture the full clinical UX picture. Streaming vs. buffered delivery, citation density, and geographic proximity to edge nodes vs. VM regions all shape how “fast” a system feels to the clinician at the bedside.

Shared Technical Pillars

Clinical Reasoning Engine

ICU/ER nursing logic encoded into a multi-tier model fallback chain (Gemini 2.5 Flash → Flash Lite → Gemma 3). If one model hits a rate limit or goes down, the next picks up — no manual intervention, no downtime.
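The fallback chain itself reduces to a short loop. A minimal sketch, assuming a `ModelUnavailable` error class and a caller-supplied block standing in for the real API client:

```ruby
# Tier order mirrors the waterfall described above.
MODEL_WATERFALL = ["gemini-2.5-flash", "gemini-2.5-flash-lite", "gemma-3-4b"].freeze

class ModelUnavailable < StandardError; end

# Try each tier in order; a rate limit or outage (ModelUnavailable) falls
# through to the next model with no manual intervention.
def run_with_fallback(prompt, models: MODEL_WATERFALL)
  models.each do |model|
    return { model: model, text: yield(model, prompt) }
  rescue ModelUnavailable
    next
  end
  raise ModelUnavailable, "all tiers exhausted"
end
```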

FHIR Normalization

Custom mapping layers that flatten deeply nested FHIR R4 JSON into clean, token-efficient schemas that LLMs can actually reason over. Connects directly to HAPI R4 servers for Patient, Observation, and Condition resources.
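As an illustration of the flattening step: a FHIR R4 `Observation` arrives deeply nested, and the mapping layer keeps only the handful of fields an LLM prompt needs. The field selection below is an assumption based on the standard Observation shape, not the repo's exact schema:

```ruby
require "json"

# Reduce a nested FHIR R4 Observation to a flat, token-efficient row.
def flatten_observation(resource)
  {
    id: resource["id"],
    code: resource.dig("code", "coding", 0, "code"),
    display: resource.dig("code", "coding", 0, "display"),
    value: resource.dig("valueQuantity", "value"),
    unit: resource.dig("valueQuantity", "unit"),
    effective: resource["effectiveDateTime"]
  }
end

obs = JSON.parse(<<~JSON)
  {
    "resourceType": "Observation",
    "id": "lactate-1",
    "code": { "coding": [{ "code": "2524-7", "display": "Lactate" }] },
    "valueQuantity": { "value": 4.1, "unit": "mmol/L" },
    "effectiveDateTime": "2024-05-01T10:00:00Z"
  }
JSON
row = flatten_observation(obs)
```

The same idea extends to Patient and Condition resources: one flattener per resource type, each emitting a predictable row shape.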

Explainable AI (XAI)

Every AI claim maps back to a specific resource in the patient record. If the AI says something, you can trace exactly where it got that — non-negotiable in clinical software.

System Resilience

Layered rate limiting, schema validation on every API boundary, and structured JSON error logging. When something breaks, the logs tell you exactly where and why.
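The structured-logging idea can be shown in a few lines. A minimal sketch, assuming a helper named `error_log_line` (hypothetical, for illustration): one machine-parseable JSON line per failure, carrying where (event, context) and why (class, message).

```ruby
require "json"

# Emit a single structured JSON log line for a caught error.
def error_log_line(event, error, **context)
  JSON.generate(
    level: "error",
    event: event,
    error_class: error.class.name,
    message: error.message,
    **context
  )
end

begin
  raise ArgumentError, "missing patient_id"
rescue ArgumentError => e
  line = error_log_line("fhir.sync", e, scenario: "sepsis")
end
```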

Recruiter & Interview Kit

Designed for rapid evaluation

  • One-Click Demos: Pre-loaded clinical scenarios (Sepsis, CHF, Delirium) across both platforms.
  • Guided Walkthroughs: Step-by-step overlays to demonstrate how AI maps raw FHIR data into clinical insights.
  • LLMOps Dashboard: Real-time visibility into inference costs ($0.0000078 avg), TTFT, and semantic consistency metrics.

Explore the Full Repository

The Aegis monorepo contains both implementations as Git submodules, with the complete benchmarking strategy and deployment analysis.