Technical Portfolio — Rajdeep Gupta
AI, Engineering & Product | January – April 2026
# Executive Summary
Over the past 4-5 months, I've been building across 11 active repositories spanning real estate SaaS, medical education AI, autonomous trading, a personal AI operating system, a native macOS app, and a Chrome extension. The common thread: using AI not as a feature, but as core infrastructure — from RAG pipelines and multi-agent orchestration to ML signal generation and local model experimentation.
This document covers every project's goals, architecture, AI integration, current state, and future direction. The AI/ML sections go deep — chunking strategies, embedding models, retrieval mechanisms, agentic state machines, memory management, context retrieval, and ML training pipelines.
Key numbers:
- 11 repositories, 7 in production
- 32 Convex tables powering the AI agent system
- 4-tier memory architecture with 5-layer retrieval (Hermes-inspired)
- 98 ML features per coin in the trading pipeline with walk-forward validation
- 14-type semantic chunking in the RAG pipeline with conditional reranking
- XGBoost → ONNX → TypeScript inference (6e-8 max probability diff)
- 6 AI model providers actively used (Anthropic, OpenAI, Google, local MLX, ONNX Runtime, DeepSeek)
- Hetzner VPS running 7+ autonomous agents 24/7
- Smart home integration controlling 15+ IoT devices via a command queue
# Project Portfolio at a Glance
| # | Project | What It Does | Tech Stack | AI Usage | Status |
|---|---|---|---|---|---|
| 1 | 11sqft Platform | Real estate backend + admin | Next.js 15, Neon PostgreSQL | Cron analytics | Production |
| 2 | Broker OS | SaaS for real estate brokers | Next.js 15, Convex, Cloudflare R2 | 36-file Conversation OS, voice parsing | Active Dev |
| 3 | Broker Mobile | Mobile app for brokers | React Native 0.81, Expo 54 | — | Beta |
| 4 | Properties App | Public property discovery | Next.js 16, Mapbox, TanStack Query | — | Production |
| 5 | 11sqft AI | Scrapers + LLM services | Next.js 16, Python Flask, AI SDK v6 | LLM reasoning, web scraping | Active Dev |
| 6 | Academy | Educational content site | Next.js 16, MDX, next-intl | LLM content generation | Production |
| 7 | Entellect | ENT medical education + RAG | Next.js 16, Neon pgvector, Claude | Full RAG pipeline, content generation | Production |
| 8 | Raven | Autonomous crypto trading | Bun, TypeScript, CCXT, ONNX | XGBoost ML + multi-LLM veto | Active Dev |
| 9 | Jarvis | Personal AI second brain | Convex, Claude Code, Telegram | Multi-agent orchestration, memory system | Active Dev |
| 10 | Ticker | macOS menu bar calendar | Swift 5.9, SwiftUI | — | Production |
| 11 | LeadMapsHub | Google Maps lead scraper | Chrome Extension, Manifest V3 | — | Development |
Production URLs: 11sqft.com | broker.11sqft.com | jarvis.rajdeepgupta.in | polymarket.rajdeepgupta.in | gettickerapp.com
# Part 1: The 11sqft Real Estate Ecosystem
The Problem
Indian real estate brokerage is fundamentally broken. Brokers operate through WhatsApp groups, maintain listings in Excel sheets, and have zero digital presence. There's no "Shopify for brokers."
11sqft is a suite of 6 interconnected products solving this.
1.1 Platform (11sqft.com)
Backend API + admin dashboard. 3-layer architecture:
API Layer (public/v1 + admin/v1) — validation, auth, CORS, rate limiting
Service Layer (48+ classes) — business logic, cache invalidation
Repository Layer (30+ models) — SQL via postgres.js tagged templates
PostgreSQL (Neon serverless)
Stack: Next.js 15.5, React 19, TypeScript 5.5, MUI 6.2 + Tailwind, BullMQ + Redis, NextAuth 4.24, Mapbox GL, Sentry, Vercel with crons.
Tables: people, properties, property_groups, leads, favorites, builders, amenities, landmarks, addresses, feedback, region-profiles, media, cache.
1.2 Broker OS (broker.11sqft.com) — The Flagship
"Shopify for Real Estate Brokers" — the product I'm betting everything on.
The AI Story — Conversation OS:
A 36-file multi-turn dialog engine with:
- 27 intents with slot-filling state machine
- Voice-first property entry — brokers speak into WhatsApp/Telegram, system parses and structures
- Client relationship extraction from conversation logs
- Weekly engagement digest — AI-generated broker activity summaries
WhatsApp/Telegram Message → AiSensy / Bot API
→ Intent Classification (27 intents)
→ Slot Filling State Machine
→ Action Execution (property creation, lead capture, CRM)
→ Response Generation → Channel Delivery
Stack: Next.js 15, Convex 1.17, Firebase Phone Auth, Cloudflare R2 + Images, OpenAI API, next-intl (3 languages), Vitest (2,529 tests) + Playwright (156 E2E tests).
Convex Schema (modular): brokerTables, propertyTables, networkingTables, conversationTables, brokerMemoryTables, referralTables, analyticsTables, complianceTables, adminTables, salesTables.
1.3-1.6 Other 11sqft Projects
- Broker Mobile: React Native 0.81 + Expo 54. Camera/photo upload, deep linking. iOS + Android ready.
- Properties App: Next.js 16.1, Mapbox GL, Framer Motion, TanStack Query v5. Consumer property search.
- 11sqft AI: Next.js 16.1 + Python Flask. 99acres scraper (BeautifulSoup), Vercel AI SDK v6.
- Academy: Next.js 16.1 + MDX, next-intl v4.7, recharts, service worker. Custom LLM generation scripts.
Architecture: How They Connect
- Broker OS uses Convex (real-time for messaging/notifications)
- Platform uses PostgreSQL/Neon (relational for complex property search)
- Separate databases by design — different access patterns
- Shared domain (11sqft.com) with Cloudflare DNS
# Part 2: Entellect — AI-Powered Medical Education (DEEP DIVE)
The Problem
ENT postgraduate students study from 9+ textbooks (Dhingra, Scott-Brown, Cummings). No intelligent tool does cross-textbook retrieval, exam/clinical mode adaptation, spaced repetition, or generates practice MCQs from textbook content.
RAG Pipeline Architecture
User Query
→ Rule-Based Query Classifier (ZERO LLM cost)
├── Mode: Clinical vs Exam (keyword signals)
├── Complexity: Reasoning vs Simple ("why", "explain" triggers)
└── Topics: 40+ ENT keywords extracted
→ OpenAI Embedding (text-embedding-3-small, direct API)
→ Neon pgvector Cosine Similarity (top 20 chunks)
→ Topic Boosting (1.2x multiplier for matching topics)
→ Source Tiering (Tier 1: Indian textbooks for exam, Tier 2: international for clinical)
→ Deduplication (near-duplicate removal)
→ Conditional Reranking (GPT-4o-mini, ONLY for "reasoning" queries)
→ Generation (Claude Sonnet 4.6 via Vercel AI Gateway + OIDC)
→ Structured citations at end
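The zero-cost classifier at the top of this pipeline can be sketched with plain keyword matching; the keyword sets below are tiny illustrative stand-ins, not the production lexicon:

```python
# Rule-based query classification: zero LLM cost, pure keyword signals.
# Keyword lists are illustrative stand-ins for the real signal sets.
CLINICAL_HINTS = {"patient", "presents", "management", "treat"}
REASONING_HINTS = {"why", "explain", "how", "compare"}

def classify(query: str) -> dict:
    words = set(query.lower().split())
    return {
        "mode": "clinical" if words & CLINICAL_HINTS else "exam",
        "complexity": "reasoning" if words & REASONING_HINTS else "simple",
    }

c = classify("Why does cholesteatoma erode bone?")
assert c["complexity"] == "reasoning"  # only these queries pay for reranking
assert c["mode"] == "exam"
```

Only queries flagged "reasoning" pay for the GPT-4o-mini reranking stage; everything else skips it.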
Semantic Chunking Strategy
- Max chunk size: ~1,200 tokens with 200-token overlap
- Section-aware parsing: PDF → pdf-parse → heading-based segmentation → token estimation
- 14 content type classifications (detected per chunk via 30-50 keyword indicators each):
Definition | Anatomy | Etiology | Pathology | Clinical Features | Investigations | Differential Diagnosis | Treatment | Complications | Classification | Prognosis | Epidemiology | Surgical Procedures | Pharmacology
No LLM needed for classification — pure keyword matching. This saves significant cost on high-volume indexing.
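The keyword-count approach amounts to scoring each chunk against each type's indicator list and taking the max. A minimal sketch, with three of the 14 types and tiny stand-in indicator lists:

```python
# Per-chunk content-type detection by keyword-indicator counts (no LLM call).
# Indicator lists here are small illustrative stand-ins for the 30-50 real ones.
INDICATORS = {
    "Treatment": ["therapy", "surgery", "antibiotic", "management"],
    "Anatomy": ["nerve", "artery", "muscle", "cartilage"],
    "Clinical Features": ["symptom", "sign", "presents", "pain"],
}

def content_type(chunk: str) -> str:
    text = chunk.lower()
    scores = {t: sum(text.count(k) for k in kws) for t, kws in INDICATORS.items()}
    return max(scores, key=scores.get)  # highest indicator count wins

assert content_type("The facial nerve runs near the stapedius muscle.") == "Anatomy"
assert content_type("First-line therapy is antibiotic coverage before surgery.") == "Treatment"
```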
PDF Extraction Experiments (3 methods compared)
| Method | Text Coverage | Accuracy | Cost | Speed | Verdict |
|---|---|---|---|---|---|
| Gemini Vision batch-35 | 0% (total failure) | 0% | $0.078 | 493s | FAILED |
| Gemini Vision batch-50 | 37% (cascading failure after page 56) | 25% chapter | $0.016 | 133s | FAILED |
| pdftotext (native) | 131% (overcoverage) | 0% structural | $0 | 0.8s (300x faster) | PASSED |
Key finding: Gemini Vision batch extraction hit a degradation wall after page 56 — API context window limits or cumulative token exhaustion. Native pdftotext extracts raw text reliably but loses structural metadata. Decision: Hybrid approach — pdftotext for text, selective LLM refinement for metadata.
Content Generation Pipeline
All generators use Claude Sonnet 4.6 with 20 chunks retrieved per topic:
| Content Type | Per Topic | Total (30 topics) | Key Details |
|---|---|---|---|
| MCQs | 5 | 150 | 30% easy / 40% medium / 30% hard, 6 trap types |
| Flashcards | 5 | 150 | 4 types: definition, concept, clinical, mnemonic |
| Notes | 1 | 30 | 9 required sections (Definition → Mnemonics) |
| Viva | 3 | 90 | question + model_answer + examiner_notes + common_mistakes |
| Total | 14 | 420 | per generation run |
MCQ Trap Engine (Phase B): 6 engineered trap types — conceptual_confusion, similar_options, outdated_concept, overthinking_trap, negative_framing, partial_knowledge. Each MCQ has structured explanation_v2 (JSONB): correct_reasoning + why_not_others for each option.
Database Schema Evolution
Phase A (Source Classification): Added source_type, weight (0.0-3.0), domains to documents. Dhingra = exam_standard, weight=1.0. Enables tier-based retrieval weighting.
Phase B (Content Upgrade, 8 new tables): media (images/audiograms/CTs), pyqs (previous year questions with year/exam/session), topic_weights (pyq_count, trend, yield_tier), cases + case_steps (branching clinical casebook with OSCE scoring), drug_interactions, session_logs, user_annotations.
PYQ Intelligence: 30 topics x 5 concepts = 150 base PYQ questions. Topic weights track pyq_count, pyq_last_5_years, trend (rising/stable/declining), yield_tier (high/medium/low). High-yield topics get more generated content.
Dual Database Architecture
- `CONTENT_DATABASE_URL` (Neon, us-east-1) → MCQs, flashcards, notes, viva, generation_log
- `DATABASE_URL` (Neon, us-east-1) → embeddings, chunks (pgvector), documents
Cost tracking: Every RAG generation logged with tokens_used and cost_usd (Sonnet: $3/M input, $15/M output).
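At those rates, the logged cost_usd for a call is simple token arithmetic:

```python
# cost_usd for one generation call at Sonnet pricing ($3/M input, $15/M output).
def sonnet_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 3.0 + output_tokens / 1e6 * 15.0

# e.g. a 20-chunk retrieval context (~25k tokens in) yielding ~2k tokens out:
assert round(sonnet_cost_usd(25_000, 2_000), 3) == 0.105
```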
Current State
Shipped: MCQ practice, mock exams (50/100/200 Q), flashcards (SM-2 spaced repetition), viva practice, notes, mistake bank, progress dashboard, RAG Q&A, topic mastery heatmap. 25 API routes.
In Progress: Phase B migration, clinical casebook framework, drug system.
# Part 3: Raven — ML-Powered Autonomous Trading (DEEP DIVE)
The Problem & Evolution
| Version | Approach | Result | Learning |
|---|---|---|---|
| v1 | Rigid AND gates (RSI<30 AND BB<lower AND ADX>25) | 9,607 cycles, 0 trades | Rules too tight |
| v2 | BB mean reversion + intelligence layer | 2 signals/month, $3,335 PnL but Sharpe -0.84 | Signal-starved |
| v3 | ML signals + multi-TF alignment + LLM veto | 4.9 signals/day at P>0.6 | Hybrid approach |
V3 philosophy: "Indicators detect. ML predicts. LLMs veto. Risk protects."
ML Training Pipeline
Historical OHLCV (24 months, 1h candles)
→ Feature Engineering (98 features per coin)
→ Walk-Forward Validation (6mo train / 1mo test, 16 folds, 20-candle purge buffer)
→ XGBoost Training (Python, scikit-learn)
- n_estimators=200, max_depth=4, learning_rate=0.05
- subsample=0.8, colsample_bytree=0.8, min_child_weight=5
- Multi-class softmax (LONG/SHORT/HOLD)
- compute_sample_weight('balanced') for ~70% HOLD class imbalance
- Early stopping (20 rounds)
→ Platt Scaling Calibration (11,680 out-of-fold samples)
→ ONNX Export (skl2onnx)
→ TypeScript Inference (onnxruntime-node, Bun runtime)
- Feature parity verified: 57/57 features match Python↔TS at 1e-11 precision
- ONNX model accuracy: max probability difference 6e-8 across all 3 models
Models: BTC (419KB), ETH (413KB), SOL (421KB) — all ONNX format.
Feature Engineering (98 Total Per Coin)
19 indicators x 5 timeframes (15m, 1h, 4h, 1d, 1w) = 95 per-timeframe features:
| Category | Features |
|---|---|
| Momentum | rsi_14, stoch_k, stoch_d, adx_14 |
| Trend | ema_12_26_diff, ema_50_200_diff, price_vs_ema20, macd_histogram, macd_signal_diff |
| Volatility | bb_position, bb_z_score, bb_bandwidth, atr_pct |
| Volume | vol_ratio (vs SMA20), obv_slope |
| Price Action | price_position (quantile), ret_1bar, ret_5bar, ret_20bar |
3 cross-timeframe features: trend_alignment, momentum_divergence, vol_regime_expanding
Top importances (XGBoost): 4h_vol_ratio, vol_regime_expanding, 4h_ret_1bar
Walk-Forward Validation (Lopez de Prado methodology)
- Train windows: 6 months (~4,380 1h candles)
- Test windows: 1 month (~730 candles)
- Purge buffer: 20 candles between train/test (prevents lookahead bias)
- Min folds: 5 before deployment gate
- Total folds: 16+ per asset
- Calibration: Platt scaling on out-of-fold predictions (11,680 total samples)
Current results: 53.9% accuracy on BTC (target >55%), train-test gap 0.180 (target <0.15, improved from 0.235 via regularization).
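A purged walk-forward splitter under those parameters can be sketched in a few lines; this is an illustrative implementation of the scheme, not the actual Raven code:

```python
# Purged walk-forward splitter: 6-month train / 1-month test windows with a
# purge buffer between them to block lookahead leakage (Lopez de Prado style).
def walk_forward_folds(n_candles, train=4380, test=730, purge=20):
    """Yield (train_range, test_range) index pairs, rolling forward one test window."""
    folds = []
    start = 0
    while start + train + purge + test <= n_candles:
        folds.append((
            (start, start + train),                                 # train slice
            (start + train + purge, start + train + purge + test),  # test slice
        ))
        start += test  # advance one month
    return folds

folds = walk_forward_folds(17_520)  # ~24 months of 1h candles
assert len(folds) >= 16
for (tr0, tr1), (te0, te1) in folds:
    assert te0 - tr1 >= 20  # every test window sits past the purge buffer
```

Out-of-fold predictions from these folds are what feed the Platt-scaling calibration step.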
Multi-LLM Veto Layer (Frank Morales Pattern)
ML Signal (P>0.6 threshold)
→ Multi-Timeframe Alignment Check
- 1h (40%) + 4h (30%) + 15m (20%) + 1d (10%)
- Score ≥0.6 → full position, 0.4-0.6 → half, <0.4 → skip
→ Multi-LLM Ensemble Veto
- Claude Haiku 4.5 (screening) + DeepSeek V3 (ensemble diversity)
- Each returns single float: -1.0 (bearish) to +1.0 (bullish)
- Consensus gate: average scores, either fails → block trade
- Cost: <$5/month total API spend
→ Kelly-Criterion Position Sizing (Platt-calibrated probabilities)
→ Execution (Bybit v5 REST via CCXT)
Context Layer ("World Brain")
8 data sources aggregated via PageRank-weighted synthesis:
- Funding rates (perpetual contract cost)
- Open interest (whale positioning)
- News sentiment (polarity scoring)
- Macro calendar (Fed events)
- Fear & Greed Index
- Polymarket prediction odds
- Long-short ratio (leverage positioning)
- CoinMarketCap spot context
Polymarket Forecaster (Separate Module)
- Two-stage debiased AI estimation — Haiku pre-filter sees NO market price before forming initial estimate (anti-anchoring)
- Kelly sizing: 0.05x fractional (very conservative)
- 3-gate filter: confidence >0.7, edge >10%, articulable hypothesis
- Paper trading verified, running on VPS via pm2
- Dashboard: polymarket.rajdeepgupta.in/status
V2 Backtest Results (Before v3 Fixes)
| Metric | Value | Target |
|---|---|---|
| Total PnL | +$3,335 | Positive |
| Win Rate | 20.16% | >33% (with 2:1 R:R) |
| Max Drawdown | -$2,194 | <$1,500 |
| Total Trades | 191 | — |
| Sharpe Ratio | -0.84 | >1.0 |
Next: CNN-LSTM model (Conv1D 3 layers 64→32 + LSTM(64) + Dense(32) → softmax(3)), ensemble with LightGBM/TabNet.
# Part 4: Jarvis — Personal AI Second Brain (DEEP DIVE)
The Vision
Jarvis is NOT an AI assistant. It's a personal AI operating system — a system that manages engineering projects (6 repos), personal productivity, finances, learning, research, smart home, and life admin. The goal: one interface for everything, powered by specialized AI agents that coordinate, learn, and converge toward my actual decision-making patterns.
What Makes This More Than "Claude with Tools"
- It learns. Every divergence between system suggestion and actual action is a gradient signal. Over 100+ decisions, the system converges toward my real patterns.
- It has memory. 4-tier memory hierarchy with 5-layer retrieval. Knowledge decays, gets resurfaced, gets rated.
- It has a body. OpenClaw daemon runs 24/7 on VPS with 20+ messaging platforms, 100+ skills.
- It coordinates agents. 12+ specialized agents with different models, budgets, and authorities.
- It controls my environment. Smart home integration — lights, fans, AC, cameras via command queue.
- It builds itself. Engineering pipeline dispatches agents that write code, review it, and create PRs autonomously.
Evolution: v1 → v2
v1 (Jan-Feb): Simple orchestration. Polling-based dispatch (Plane every 5 min). Burned 7+ Claude sessions/day with nothing to dispatch. Single agent model.
The Pivot (March): Killed polling. Event-driven dispatch. Zero idle cost. Built full state machine.
v2 (March-April): Full multi-agent system. 32 Convex tables. VPS with autonomous agents. Telegram approval flow. Knowledge engine. Personal productivity layer.
"Make Jarvis Usable" Pivot (April): After 50+ sessions building infrastructure, realized all personal productivity agents were stopped on VPS. 3 sprints to activate what existed before building more.
The 7-Step Core Loop (Intelligence Convergence Engine)
Step 0: CAPTURE → Nexus sensory layer (Chrome ext, CLI, Telegram, Donna auto-scan)
All inputs → knowledge_inbox → /digest → knowledge_items with embeddings
Step 1: DISCOVER → Surface relevant knowledge based on current context
Step 2: CONTEXTUALIZE → Assemble multi-source context for the task
Step 3: EXECUTE → Route to appropriate agent with assembled context
Step 4: EVALUATE → Quality gates, review, human approval
Step 5: LEARN → Feedback signals, decision logging, agent learnings
Step 6: MONITOR → Health checks, cost tracking, drift detection
→ Back to Step 0 (continuous loop)
Memory Architecture (4-Tier, Hermes-Inspired)
┌─────────────────────────────────────────────────────────┐
│ SENSORY MEMORY — Raw captures │
│ knowledge_inbox table, recent context, unprocessed inputs │
│ Retention: hours. Everything enters here first. │
├─────────────────────────────────────────────────────────┤
│ WORKING MEMORY — Current session context │
│ Active conversation, recent decisions, task state │
│ 5-source context builder (multiplier effect) │
├─────────────────────────────────────────────────────────┤
│ EPISODIC MEMORY — Timeline of events │
│ episodic_events table, session_summaries, DecisionLog │
│ "What happened when?" — temporal retrieval │
├─────────────────────────────────────────────────────────┤
│ LONG-TERM MEMORY — Patterns, learnings, knowledge │
│ knowledge_items with embeddings (1536-dim vectors) │
│ FTS5 full-text index (2,321 sections across 7 repos) │
│ Confidence decay: items lose relevance unless accessed │
│ Fields: accessCount, lastAccessed, userRating, decayedScore │
└─────────────────────────────────────────────────────────┘
5-Layer Retrieval Architecture
| Layer | Method | What It Finds | Speed |
|---|---|---|---|
| 1. Keyword/FTS5 | Full-text search | Exact terms, file names, function names | Fast |
| 2. Vector Similarity | Embedding cosine similarity (1536-dim) | Semantically related content | Medium |
| 3. Temporal Recency | Timestamp-based scoring | Recent decisions, fresh context | Fast |
| 4. Episodic Links | Decision history traversal | Past similar situations and outcomes | Medium |
| 5. Topic Clustering | Theme coherence scoring | Related knowledge across domains | Slow |
Context is assembled by combining results from all 5 layers, weighted by task type. Engineering tasks weight keyword/vector higher. Personal tasks weight episodic/temporal higher.
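The weighted combination can be sketched directly; the weight values below are illustrative, not the tuned production weights:

```python
# Combine the five retrieval layers with task-type weights.
# Weight values are illustrative stand-ins for the tuned per-task weighting.
WEIGHTS = {
    "engineering": {"keyword": 0.3, "vector": 0.3, "temporal": 0.15,
                    "episodic": 0.15, "topic": 0.1},
    "personal":    {"keyword": 0.1, "vector": 0.2, "temporal": 0.3,
                    "episodic": 0.3, "topic": 0.1},
}

def combined_score(layer_scores: dict, task_type: str) -> float:
    w = WEIGHTS[task_type]
    return sum(w[layer] * layer_scores.get(layer, 0.0) for layer in w)

scores = {"keyword": 0.9, "vector": 0.8, "temporal": 0.2,
          "episodic": 0.1, "topic": 0.5}
# the same hit ranks higher for an engineering task than a personal one
assert combined_score(scores, "engineering") > combined_score(scores, "personal")
```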
Nexus Knowledge Pipeline
CAPTURE (multiple entry points):
Chrome Extension → URL + highlights + context
CLI /capture → quick note, idea, URL
Telegram forward → messages, links, files
Donna auto-scan → email signals, calendar, repo changes
↓
knowledge_inbox (Convex table — raw, unprocessed)
↓
/digest processor:
→ Summarize content
→ Extract tags and topics
→ Generate embeddings (1536-dim vectors)
→ Score relevance and quality
→ Create knowledge_connections (graph relationships)
↓
knowledge_items (processed, searchable, decayable)
↓
RESURFACING:
Donna queries Nexus at 7:30 AM and 10 PM IST
Surfaces items based on: current context + decay score + topic relevance
Items accessed get accessCount++ and freshness boost
Items ignored decay further
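The decay-plus-boost scoring behind resurfacing can be sketched as exponential decay on time since last access, lifted by access count and user rating. The half-life and boost factors here are illustrative assumptions, not the actual decayedScore formula:

```python
# Confidence decay with access boost: items lose relevance over time unless
# accessed or rated up. Half-life and boost constants are illustrative.
def decayed_score(base: float, days_since_access: float, access_count: int,
                  user_rating: float = 1.0, half_life_days: float = 30.0) -> float:
    decay = 0.5 ** (days_since_access / half_life_days)
    boost = 1.0 + 0.1 * min(access_count, 10)  # capped freshness boost
    return base * decay * boost * user_rating

fresh = decayed_score(1.0, days_since_access=1, access_count=5)
stale = decayed_score(1.0, days_since_access=90, access_count=0)
assert fresh > stale   # recently-accessed items resurface first
assert stale < 0.2     # ~3 half-lives -> mostly decayed
```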
The Gradient Descent Intelligence Model
Core insight: The system improves through user corrections, not architectural perfection.
decision-model.md (initial hypothesis — best guess of my decision patterns)
↓
System proposes action/suggestion
↓
I accept / reject / modify
↓
DecisionLog records: (context, suggestion, actual_action, outcome, delta)
↓
Every divergence between suggestion and actual action = gradient signal
↓
Over 100+ decisions, the model converges toward my ACTUAL patterns
↓
Trust calibration: proactive suggestions ignored >70% initially
→ Feedback loop re-weights categories
→ Auto-demotion of consistently-ignored suggestion categories after 2-3 weeks
Convergence timeline:
| Stage | Timeline | System Behavior |
|---|---|---|
| Smart assistant | Week 1-2 | Follows rules, applies heuristics |
| Pattern learner | Week 3-6 | Feedback loops active, starts adapting |
| Aligned partner | Month 2-3 | 100+ DecisionLog entries, suggestions become genuinely useful |
OpenClaw Integration (Body vs Brain)
| | OpenClaw (Body) | Jarvis (Brain) |
|---|---|---|
| Role | Always-on daemon, personal automation | Engineering pipeline, task dispatch |
| Runtime | 30-min heartbeat on VPS | On-demand (CLI or Telegram trigger) |
| Model | Claude Haiku 4.5 ($5/day cap) | Claude Opus 4.6 (orchestration) |
| Platforms | 20+ messaging platforms, Telegram primary | Claude Code CLI, Plane |
| Skills | 100+ (calendar, gmail, plane-tasks) | Engineering agents, code review |
| Memory | SQLite journals (local) | Convex (32 tables, shared) |
| Bridge | Nexus (shared knowledge base, embeddings, FTS5) | Nexus (same, shared) |
Handoff flow: OpenClaw detects engineering request → writes trigger file → handoff-watcher picks up → launches Claude Code session → Friday agent executes → Groot reviews → Telegram approval → PR created.
Multi-Agent Architecture (12+ Agents)
Core Team (5 VPS agents via pm2):
| Agent | Role | Model | When |
|---|---|---|---|
| Jarvis | Orchestrator | Opus 4.6 | On-demand |
| Friday | Lead Engineer | Sonnet 4.6 | Trigger-based |
| Groot | Code Reviewer | Sonnet 4.6 | After every code change |
| Donna | Executive Intelligence | Haiku 4.5 | Scheduled (7:30 AM, 10 PM) |
| Bran | System Observer | Haiku 4.5 | Always-on (health monitor) |
Specialist Agents: Rocket (research), Vision (architecture), Engineering Agent (autonomous pipeline), Reviewer, Planner, Frontend/Backend/Mobile Dev, Security Auditor, Design Engineer, Test Writer.
Engineering Agent Pipeline (Autonomous)
/work JARVI-42
→ Jarvis reads Plane task → moves to In Progress
→ Dispatches engineering-agent (Sonnet, 60 max tool calls)
→ Agent reads knowledge base + 72 learnings entries
→ Researches codebase → writes plan to temp file
→ Creates feature branch → implements changes
→ Runs quality gates (tsc, lint, build)
→ Self-reviews → commits → pushes branch
→ Returns AWAITING_APPROVAL
→ Jarvis dispatches Groot (reviewer)
→ If APPROVED: sends Telegram message with Approve/Reject buttons
→ I tap Approve on phone
→ Jarvis creates draft PR → updates Plane → cleans up
13-State Task Machine
idle → queued → running → [intermediate states] → completed
↓
blocked → awaiting_approval → approved → completed
↓
rate_limited → (checkpoint saved) → resumed
↓
failed → (retry logic, 5 failure types) → running
Safety: Optimistic concurrency (version field), full audit trail (task_events), dry-run mode, kill switch, local presence flag (pauses VPS when I work locally), mid-task checkpoints, cost caps.
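The transition rules plus the optimistic-concurrency version check can be sketched together; the transition table below is a simplified subset of the 13-state machine:

```python
# Allowed transitions (simplified subset) plus an optimistic-concurrency
# guard on a version field: a stale write is rejected, not silently applied.
TRANSITIONS = {
    "idle": {"queued"},
    "queued": {"running"},
    "running": {"completed", "blocked", "rate_limited", "failed", "awaiting_approval"},
    "blocked": {"awaiting_approval"},
    "awaiting_approval": {"approved"},
    "approved": {"completed"},
    "rate_limited": {"running"},   # resumed from checkpoint
    "failed": {"running"},         # retry path
}

def transition(task: dict, to: str, expected_version: int) -> dict:
    if task["version"] != expected_version:
        raise RuntimeError("stale write: task was modified concurrently")
    if to not in TRANSITIONS.get(task["state"], set()):
        raise ValueError(f"illegal transition {task['state']} -> {to}")
    return {"state": to, "version": task["version"] + 1}

t = {"state": "idle", "version": 0}
t = transition(t, "queued", 0)
t = transition(t, "running", 1)
assert t == {"state": "running", "version": 2}
```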
Convex Data Model (32 Tables)
Knowledge (6): knowledge_inbox, knowledge_items, topics, knowledge_connections, doc_insights, episodic_events
Agents (10): agent_state, agent_messages, agent_messages_dryrun, agent_decisions, agent_metrics, agent_learnings, agent_checkpoints, rate_limit_state, task_runtime_state, task_events
Memory & Scheduling (8): donna_config, donna_engagement, session_summaries, daily_budget, usage_cache, telegram_jobs, telegram_approvals, execution_log
Smart Home (2): smart_home_commands, smart_home_state
Plus: local_presence, openai_usage, and more
Framework Synthesis (3-Framework Hybrid)
| Framework | What We Adopted | What We Didn't |
|---|---|---|
| Hermes Agent | Confidence decay, 5-layer memory abstraction, learning patterns | Full architecture (too coupled) |
| gstack | Artifact chaining (research → plan → code — each output becomes next input) | CLI workflow (not agent-native) |
| Paperclip AI | Heartbeat daemon (30 min), budget tracking, org chart of agents | Monolithic orchestrator |
Continuous Learning System
Agent Learnings (72+ entries): Engineering agent reads agent-learnings.md at start of every task. Updated whenever: reviewer blocks code, I reject approval, quality gates fail. Format: What went wrong → Root cause → Rule. Tags enable context-aware injection.
Memory Write Authority: Only Reviewer/Jarvis writes to persistent memory. All other agents propose learnings via structured output. Jarvis reviews and decides what persists. Prevents multi-agent garbage — discovered this after conflicting, low-quality entries accumulated.
Cost Management
Two Claude accounts tracked (personal Max + team office):
| 5h Utilization | Level | Behavior |
|---|---|---|
| < 50% | Full | 3-5 parallel agents |
| 50-70% | Moderate | Max 2-3 parallel |
| 70-85% | Conservative | Single agent only |
| 85-95% | Emergency | Complete current task only |
| > 95% | Paused | Write handoff, defer all |
Adaptive poller: 1h idle, 5m when agents active, exponential backoff on 429s.
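The adaptive interval reduces to a base rate picked by activity, multiplied by capped exponential backoff on consecutive 429s. A sketch with illustrative constants (jitter added to avoid synchronized retries):

```python
import random

# Adaptive poll interval: 5m when agents are active, 1h idle, exponential
# backoff (capped, jittered) after consecutive 429s. Constants illustrative.
def poll_interval_s(agents_active: bool, consecutive_429s: int) -> float:
    base = 300 if agents_active else 3600
    backoff = min(2 ** consecutive_429s, 16)  # cap the multiplier at 16x
    return base * backoff + random.uniform(0, 5)

assert 300 <= poll_interval_s(True, 0) <= 305
assert poll_interval_s(True, 3) >= 2400   # 300 * 2^3
```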
Jarvis Dashboard (jarvis.rajdeepgupta.in)
Stack: Next.js 16.1, React 19.2, Tailwind CSS 4, shadcn/ui, Convex 1.33
Pages: /personal, /memory, /projects, /agents/[name] (7 agents), /agents/conversations, /docs, /observability, /focus, /nexus, /sessions/[date], /events, /smart-home
Data sources: Plane API (ISR 60s), Convex (real-time), markdown files copied during prebuild (sessions, subscriptions, goals, learning-log, agent-learnings, behavioral-rules).
Smart Home Integration
Command queue architecture:
Dashboard/Jarvis/OpenClaw → queueCommand(source, command, payload) → Convex
↓ Home Hub (Python, pm2) polls every 2-3s via pollCommands()
↓ Executes via device APIs (tinytuya for Tuya, Tapo SDK, LG ThinQ, EZVIZ)
↓ reportResult(id, status, result, error) → Convex
↓ Dashboard shows real-time status (stale detection at >2 min)
15+ devices: 5 Tapo lights, 4 Tuya multi-gang switches, 2 Atomberg fans, 1 LG AC, 2 EZVIZ cameras.
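The queue round-trip above can be sketched with an in-memory stand-in; the function names mirror the queue API described in the flow, but the bodies are illustrative (real execution goes through tinytuya, the Tapo SDK, etc.):

```python
import time

# Command-queue hub loop, simulated in-memory: poll queued commands, execute
# against a device driver, report results back. Implementation is a sketch.
queue = [{"id": 1, "command": "set_light", "payload": {"level": 40},
          "status": "queued"}]

def poll_commands():
    return [c for c in queue if c["status"] == "queued"]

def execute(cmd):
    # stand-in for a device-API call (tinytuya / Tapo SDK / ThinQ)
    return {"ok": True, "applied": cmd["payload"]}

def report_result(cmd_id, status, result):
    for c in queue:
        if c["id"] == cmd_id:
            c.update(status=status, result=result, reported_at=time.time())

for cmd in poll_commands():
    report_result(cmd["id"], "done", execute(cmd))

assert queue[0]["status"] == "done"
assert queue[0]["result"] == {"ok": True, "applied": {"level": 40}}
```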
Smart Scenes:
| Scene | What It Does |
|---|---|
| coding | Blue strips 40%, room light off, fan speed 3 |
| goodnight | All lights off, fans speed 2 |
| movie | Lights off except bed back 20% purple, fan speed 2 |
| wake_up | Warm lights 60%, fan off |
The "Second Brain Intelligence" Plan (Latest, April 2026)
ExecutionUnit abstraction (Tier 0 — must exist before any intelligence): All system operations unified under one tracking model with: input/output artifacts, tools_allowed, budget, state, checkpoint, memory_authority, goal_ancestry, feedback_signals.
5-Phase Timeline:
- Phase 1F (1 week): ExecutionUnit, feedback signals, memory write authority
- Phase 1G (2 weeks): Confidence decay, goal ancestry, failure replay, Nexus feedback loop
- Phase 2A (2 weeks): Compound learning — OpenClaw skill self-improvement, Donna learning loop
- Phase 2B (2 weeks): Rajdeep OS — decision model, role engine, delegation, proactive loop
- Phase 2C (1 week): E2E + ship → then 4-week calibration period
# Part 5: Local AI Experiments — MLX on Apple Silicon
What We Tried (M1 Max 32GB)
| Experiment | Model | Result |
|---|---|---|
| OCR | Qwen2-VL-2B-4bit (mlx-vlm) | Good for printed text |
| Text generation | Qwen3-4B-4bit (mlx-lm) | Good for summarization |
| Multi-model debate | Qwen3-4B (3 agents debating NVDA) | Worked! 1 round ~18s |
| Local embeddings | nomic-modernbert-embed-base-4bit | 44,000 tok/s throughput |
| Image generation | Flux.1-schnell (mflux) | Abandoned — 15GB download |
MLX vs Alternatives
| | MLX | Ollama (MLX backend) | llama.cpp | Cloud API |
|---|---|---|---|---|
| Speed on Apple Silicon | Fastest (native) | Same (uses MLX now) | 20-87% slower | Network-bound |
| VLM support | Full | Limited | Partial | Full |
| Fine-tuning | Yes (LoRA) | No | No | No |
| Cost | Zero | Zero | Zero | Per-token |
Verdict: Local models are for experimentation and privacy-sensitive tasks, not a production replacement at our scale. 4B models are the sweet spot for an M1 Max — interactive speed while leaving room for other apps. 7B saturates resources.
Potential Use Cases
| Use Case | Model | Benefit |
|---|---|---|
| Entellect PDF OCR | DOTS-OCR / DeepSeek-OCR | Zero cost, medical data stays local |
| Local embeddings | nomic-modernbert | 44k tok/s, zero API cost |
| Broker OS doc extraction | DeepSeek-OCR + Qwen3-8B | Privacy for client documents |
| LoRA fine-tuning | Qwen3-8B on medical Q&A | Domain-specific quality boost |
# Part 6: Side Projects
Ticker — macOS Menu Bar Calendar
What: Native macOS app showing live meeting countdown in menu bar ("Standup in 23m" → "Standup NOW"). One-click Zoom/Meet join.
Stack: Swift 5.9, SwiftUI (native macOS, no Electron).
Business model: Free tier + Pro ($4.99 one-time). Competes with Fantastical, Dato, Meeter at the lowest price point.
Status: Production (v0.3.0), DMG available. Website: gettickerapp.com.
LeadMapsHub — Google Maps Lead Scraper
What: Chrome extension that extracts business data (names, phones, addresses, ratings, websites) from Google Maps with auto-scroll, enrichment, and multi-export (CSV, Excel, JSON).
Stack: Vanilla JS, Shadow DOM, Chrome Manifest V3. No server needed — runs entirely in browser.
Status: Development. Competes with Outscraper ($2/1000 leads), Scrap.io ($49/mo) at a lower price.
# Part 7: AI Model Usage Across All Projects
Model Selection Matrix
| Model | Provider | Used In | Purpose | Why This Model |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | Jarvis orchestrator | Complex reasoning, task routing | Highest quality for critical decisions |
| Claude Sonnet 4.6 | Anthropic | Jarvis agents, Entellect RAG, Raven analysis | Code gen, generation, trading | Best quality-cost ratio |
| Claude Haiku 4.5 | Anthropic | OpenClaw, Donna, Raven screening | Briefings, classification, veto | Low cost for high-volume tasks |
| GPT-4o-mini | OpenAI | Entellect reranking | Relevance scoring (conditional) | Good at reranking, cheaper than Claude |
| text-embedding-3-small | OpenAI | Entellect embeddings | Chunk vectors (1536-dim) | Proven quality for medical text |
| Gemini 2.0 Flash | Google | Entellect PDF extraction | Vision-based document parsing | Best vision for complex layouts |
| DeepSeek V3 | DeepSeek | Raven LLM ensemble | Trading veto (diversity) | Reduces single-model bias |
| XGBoost → ONNX | Local | Raven signal generation | Trading signal prediction | Best for tabular data, fast CPU training |
| Qwen3-4B (MLX) | Local | Experiments | Text gen, multi-model debate | Best 4B for Apple Silicon |
| nomic-modernbert (MLX) | Local | Experiments | Local embeddings | 44k tok/s, Matryoshka dims |
Model Selection Philosophy
- Cheapest model that works. Haiku for classification ($0.25/M), Sonnet for generation ($3/M), Opus only for orchestration ($15/M) — 90%+ savings.
- Right provider for each task. OpenAI for embeddings, Gemini for vision, Claude for reasoning, DeepSeek for ensemble diversity.
- Local for privacy. Medical data (Entellect) and client docs (Broker OS) benefit from on-device processing.
- Train your own when the task is specific. XGBoost for trading signals — on tabular data, a specialized model outperforms general LLMs.
- Ensemble for reliability. Raven uses Claude + DeepSeek together — reduces single-model bias.
# Part 8: Infrastructure & Services
Complete Service Map
| Category | Service | Purpose | Used By |
|---|---|---|---|
| Hosting | Vercel | All web apps (8 projects) | All |
| VPS | Hetzner CPX22 (Helsinki) | Agents, trading bot, smart home | Jarvis, Raven |
| DNS/CDN | Cloudflare | DNS, DDoS, R2 storage, Workers | Broker OS |
| Database | Convex | Real-time (agents, messaging) | Jarvis, Broker OS |
| Database | Neon PostgreSQL | Relational + pgvector | Platform, Entellect |
| Database | Supabase | Legacy platform data | Platform |
| Database | SQLite | FTS5 index (2,321 sections) | Jarvis |
| AI | Anthropic (Claude) | Primary LLM | All AI projects |
| AI | OpenAI | Embeddings + reranking | Entellect |
| AI | Google (Gemini) | Vision/PDF extraction | Entellect, 11sqft AI |
| AI | DeepSeek | Trading ensemble diversity | Raven |
| AI | Local MLX | Privacy processing | Experiments |
| AI | ONNX Runtime | ML model inference | Raven |
| Auth | Firebase | Phone OTP | Broker OS, Properties, Mobile |
| Messaging | AiSensy | WhatsApp Business API | Broker OS |
| Messaging | Telegram Bot API | Notifications, approvals | Jarvis |
| Maps | Mapbox GL + Google Maps | Property maps, geocoding | Platform, Properties |
| Project Mgmt | Plane (self-hosted API) | Task tracking (8 projects) | All |
| Monitoring | Sentry | Error tracking | Broker OS, Platform |
| Analytics | GA4 + Mixpanel | User analytics | Broker OS |
| Smart Home | Tapo, Tuya, Atomberg, LG, EZVIZ | IoT device control | Jarvis |
| Testing | Vitest + Playwright | Unit + E2E | Broker OS |
Why Cloudflare + Vercel Together?
- Vercel: application hosting, serverless functions, CI/CD
- Cloudflare: DNS management, DDoS protection, R2 object storage (cheaper than S3), image CDN optimization, Workers for edge logic (short-link redirects)
- Complementary: Vercel handles compute; Cloudflare handles CDN, storage, and DNS
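The Workers edge-logic point can be illustrated with a minimal short-link redirect handler. This is a sketch, not the actual Broker OS Worker: the KV binding name (`LINKS`) and the slug scheme are assumptions.

```typescript
// Sketch of a Cloudflare Worker resolving short links at the edge.
// Assumed: a KV namespace bound as LINKS mapping slug -> target URL.
interface Env {
  LINKS: { get(key: string): Promise<string | null> };
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const slug = new URL(request.url).pathname.slice(1); // "/abc123" -> "abc123"
    const target = await env.LINKS.get(slug);
    if (!target) return new Response("Not found", { status: 404 });
    // Redirect at the edge; the origin (Vercel) is never touched.
    return Response.redirect(target, 302);
  },
};
```

Because the lookup and redirect both run at Cloudflare's edge, short links resolve without a round trip to the Vercel-hosted app, which is the "Vercel handles compute, Cloudflare handles edge" split in practice.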
#Part 9: Key Lessons & Pivots
- Infrastructure spiral is real. 50+ sessions went into building Jarvis infrastructure while all personal agents sat idle. The fix, "Make Jarvis Usable": activate before building more.
- Polling is expensive, events are cheap. Jarvis v1 burned 7+ sessions/day polling with nothing to dispatch. The fix: event-driven dispatch with zero idle cost.
- Rule-based classification saves real money. Entellect's query classifier uses zero LLM tokens, just keyword matching. Not every AI task needs an LLM.
- PDF extraction is harder than expected. Gemini Vision batch extraction fails after page 56. Native pdftotext at 300x the speed, with hybrid LLM refinement, won.
- The cheapest model that works is the best model. Haiku at $0.25/M vs Opus at $15/M: 90%+ savings without quality loss on simple tasks.
- Multi-agent garbage is real. Multiple agents writing to shared memory created conflicts. The fix: only Reviewer/Jarvis writes; other agents propose.
- Walk-forward validation matters. Without purge buffers and proper train/test splits, XGBoost accuracy was artificially inflated. Lopez de Prado's methodology fixed this.
- Always research before building. Weeks went into PineScript indicators before the approach turned out to be fundamentally flawed. Deep research (arXiv, GitHub) saves weeks.
- Gradient descent as system design. Don't optimize the architecture; optimize the feedback loop. Every feature must answer: "What signal does this create?"
- Signal generation was the real bottleneck. Raven v1-v2 had good intelligence layers but produced 0-2 signals/month. ML-generated signals (4.9/day) solved this.
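The walk-forward lesson can be sketched as a split generator that leaves a purge buffer between each train and test window, in the spirit of the Lopez de Prado methodology mentioned above. The window sizes and the `walkForwardSplits` helper are illustrative, not Raven's actual pipeline parameters.

```typescript
// Walk-forward splits with a purge buffer. Bars whose labels could
// overlap the test window are dropped from training to avoid leakage.
interface Split {
  train: [number, number]; // inclusive start, exclusive end (bar indices)
  test: [number, number];
}

function walkForwardSplits(
  nBars: number,
  trainSize: number,
  testSize: number,
  purge: number, // bars discarded between train end and test start
): Split[] {
  const splits: Split[] = [];
  let start = 0;
  while (start + trainSize + purge + testSize <= nBars) {
    splits.push({
      train: [start, start + trainSize],
      test: [start + trainSize + purge, start + trainSize + purge + testSize],
    });
    start += testSize; // roll the whole window forward by one test block
  }
  return splits;
}

// Illustrative run: 1000 bars, 500-bar train, 100-bar test, 20-bar purge.
const splits = walkForwardSplits(1000, 500, 100, 20);
```

Every test window starts `purge` bars after its train window ends, so a model never trains on bars whose forward-looking labels reach into the period it is evaluated on.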
#Part 10: What's Next
| Project | Next Milestone |
|---|---|
| Broker OS | THE BET — distribution > features. WhatsApp-first onboarding. |
| Entellect | Phase B completion, clinical casebook, drug system, LoRA fine-tuning experiments |
| Raven | CNN-LSTM model (Conv1D + LSTM → softmax), LightGBM/TabNet ensemble, testnet live trading |
| Jarvis | ExecutionUnit table, DecisionLog corpus to 100+, trust calibration at >50% acceptance |
| Ticker | App Store launch, Pro tier activation |
Long-term vision:
- Jarvis as Rajdeep OS: morning briefing → task triage → engineering dispatch → email → learning → evening wrap-up, all autonomous
- Entellect as a platform: expand beyond ENT to other medical specialties (same RAG, different knowledge)
- Raven live trading: graduate from testnet after calibration
- 11sqft as THE broker SaaS: "Shopify for Indian real estate brokers"
#Part 11: Links & References
Production URLs
| Project | URL |
|---|---|
| 11sqft Platform | 11sqft.com |
| Broker OS | broker.11sqft.com |
| Properties | 11sqft.com/properties |
| Jarvis Dashboard | jarvis.rajdeepgupta.in |
| Raven Polymarket | polymarket.rajdeepgupta.in |
| Ticker | gettickerapp.com |
GitHub Repositories
| Project | Repository |
|---|---|
| Platform | github.com/sethraj14/11sqft |
| Broker OS | github.com/sethraj14/11sqft-broker-os |
| Broker Mobile | github.com/sethraj14/11sqft-broker-app |
| Properties | github.com/sethraj14/11sqft-properties |
| 11sqft AI | github.com/sethraj14/11sqft-ai |
| Academy | github.com/sethraj14/11sqft-academy |
| Raven | github.com/sethraj14/raven |
| Ticker | github.com/sethraj14/ticker |
Document generated April 4, 2026. Covers work from January to April 2026 across 11 active repositories.