
Technical Portfolio — Rajdeep Gupta

AI, Engineering & Product | January – April 2026

#Executive Summary

Over the past 4-5 months, I've been building across 11 active repositories spanning real estate SaaS, medical education AI, autonomous trading, a personal AI operating system, a native macOS app, and a Chrome extension. The common thread: using AI not as a feature, but as core infrastructure — from RAG pipelines and multi-agent orchestration to ML signal generation and local model experimentation.

This document covers every project's goals, architecture, AI integration, current state, and future direction. The AI/ML sections go deep — chunking strategies, embedding models, retrieval mechanisms, agentic state machines, memory management, context retrieval, and ML training pipelines.

Key numbers:

  • 11 repositories, 7 in production
  • 32 Convex tables powering the AI agent system
  • 4-tier memory architecture with 5-layer retrieval (Hermes-inspired)
  • 98 ML features per coin in the trading pipeline with walk-forward validation
  • 14-type semantic chunking in the RAG pipeline with conditional reranking
  • XGBoost → ONNX → TypeScript inference (6e-8 max probability diff)
  • 6 AI model providers actively used (Anthropic, OpenAI, Google, local MLX, ONNX Runtime, DeepSeek)
  • Hetzner VPS running 7+ autonomous agents 24/7
  • Smart home integration controlling 15+ IoT devices via command queue

#Project Portfolio at a Glance

| # | Project | What It Does | Tech Stack | AI Usage | Status |
|---|---------|--------------|------------|----------|--------|
| 1 | 11sqft Platform | Real estate backend + admin | Next.js 15, Neon PostgreSQL | Cron analytics | Production |
| 2 | Broker OS | SaaS for real estate brokers | Next.js 15, Convex, Cloudflare R2 | 36-file Conversation OS, voice parsing | Active Dev |
| 3 | Broker Mobile | Mobile app for brokers | React Native 0.81, Expo 54 | | Beta |
| 4 | Properties App | Public property discovery | Next.js 16, Mapbox, TanStack Query | | Production |
| 5 | 11sqft AI | Scrapers + LLM services | Next.js 16, Python Flask, AI SDK v6 | LLM reasoning, web scraping | Active Dev |
| 6 | Academy | Educational content site | Next.js 16, MDX, next-intl | LLM content generation | Production |
| 7 | Entellect | ENT medical education + RAG | Next.js 16, Neon pgvector, Claude | Full RAG pipeline, content generation | Production |
| 8 | Raven | Autonomous crypto trading | Bun, TypeScript, CCXT, ONNX | XGBoost ML + multi-LLM veto | Active Dev |
| 9 | Jarvis | Personal AI second brain | Convex, Claude Code, Telegram | Multi-agent orchestration, memory system | Active Dev |
| 10 | Ticker | macOS menu bar calendar | Swift 5.9, SwiftUI | | Production |
| 11 | LeadMapsHub | Google Maps lead scraper | Chrome Extension, Manifest V3 | | Development |

Production URLs: 11sqft.com | broker.11sqft.com | jarvis.rajdeepgupta.in | polymarket.rajdeepgupta.in | gettickerapp.com

#Part 1: The 11sqft Real Estate Ecosystem

The Problem

Indian real estate brokerage is fundamentally broken. Brokers operate through WhatsApp groups, maintain listings in Excel sheets, and have zero digital presence. There's no "Shopify for brokers."

11sqft is a suite of 6 interconnected products solving this.

1.1 Platform (11sqft.com)

Backend API + admin dashboard. 3-layer architecture:

API Layer (public/v1 + admin/v1) — validation, auth, CORS, rate limiting
Service Layer (48+ classes) — business logic, cache invalidation
Repository Layer (30+ models) — SQL via postgres.js tagged templates
PostgreSQL (Neon serverless)
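
A minimal sketch of a repository-layer method using postgres.js tagged templates; the table and column names here are illustrative, not the actual 11sqft schema:

```ts
// Minimal sketch of a repository method using postgres.js tagged templates.
// Table and column names are illustrative, not the actual 11sqft schema.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!); // Neon connection string

interface Property {
  id: number;
  title: string;
  city: string;
  price: number;
}

// Interpolated values are parameterized automatically, so repository methods
// get SQL-injection safety without a query builder.
export async function findPropertiesByCity(city: string, limit = 20): Promise<Property[]> {
  return sql<Property[]>`
    select id, title, city, price
    from properties
    where city = ${city}
    order by price asc
    limit ${limit}
  `;
}
```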

Stack: Next.js 15.5, React 19, TypeScript 5.5, MUI 6.2 + Tailwind, BullMQ + Redis, NextAuth 4.24, Mapbox GL, Sentry, Vercel with crons.

Tables: people, properties, property_groups, leads, favorites, builders, amenities, landmarks, addresses, feedback, region-profiles, media, cache.

1.2 Broker OS (broker.11sqft.com) — The Flagship

"Shopify for Real Estate Brokers" — the product I'm betting everything on.

The AI Story — Conversation OS:

A 36-file multi-turn dialog engine with:

  • 27 intents with slot-filling state machine
  • Voice-first property entry — brokers speak into WhatsApp/Telegram, system parses and structures
  • Client relationship extraction from conversation logs
  • Weekly engagement digest — AI-generated broker activity summaries
Architecture
WhatsApp/Telegram Message → AiSensy / Bot API
  → Intent Classification (27 intents)
  → Slot Filling State Machine
  → Action Execution (property creation, lead capture, CRM)
  → Response Generation → Channel Delivery
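
A minimal sketch of one slot-filling turn, assuming hypothetical intent and slot names; the real Conversation OS covers 27 intents across 36 files:

```ts
// One slot-filling turn: merge whatever the message answered, then ask for the
// next missing slot. Intent and slot names are hypothetical.
type Intent = "create_property" | "capture_lead";

interface SlotSpec { name: string; prompt: string }
interface DialogState { intent: Intent; slots: Record<string, string | undefined> }

const REQUIRED_SLOTS: Record<Intent, SlotSpec[]> = {
  create_property: [
    { name: "location", prompt: "Which locality is the property in?" },
    { name: "price", prompt: "What is the asking price?" },
  ],
  capture_lead: [{ name: "phone", prompt: "What is the client's phone number?" }],
};

function step(
  state: DialogState,
  extracted: Record<string, string>, // slots parsed from the latest message
): { state: DialogState; reply: string } {
  const slots = { ...state.slots, ...extracted };
  const missing = REQUIRED_SLOTS[state.intent].find((s) => !slots[s.name]);
  const reply = missing ? missing.prompt : "Done, executing action."; // all filled: hand off to action execution
  return { state: { ...state, slots }, reply };
}
```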

Stack: Next.js 15, Convex 1.17, Firebase Phone Auth, Cloudflare R2 + Images, OpenAI API, next-intl (3 languages), Vitest (2,529 tests) + Playwright (156 E2E tests).

Convex Schema (modular): brokerTables, propertyTables, networkingTables, conversationTables, brokerMemoryTables, referralTables, analyticsTables, complianceTables, adminTables, salesTables.

1.3-1.6 Other 11sqft Projects

  • Broker Mobile: React Native 0.81 + Expo 54. Camera/photo upload, deep linking. iOS + Android ready.
  • Properties App: Next.js 16.1, Mapbox GL, Framer Motion, TanStack Query v5. Consumer property search.
  • 11sqft AI: Next.js 16.1 + Python Flask. 99acres scraper (BeautifulSoup), Vercel AI SDK v6.
  • Academy: Next.js 16.1 + MDX, next-intl v4.7, recharts, service worker. Custom LLM generation scripts.

Architecture: How They Connect

  • Broker OS uses Convex (real-time for messaging/notifications)
  • Platform uses PostgreSQL/Neon (relational for complex property search)
  • Separate databases by design — different access patterns
  • Shared domain (11sqft.com) with Cloudflare DNS

#Part 2: Entellect — AI-Powered Medical Education (DEEP DIVE)

The Problem

ENT postgraduate students study from 9+ textbooks (Dhingra, Scott-Brown, Cummings). No intelligent tool offers cross-textbook retrieval, exam/clinical mode adaptation, spaced repetition, or MCQ generation from textbook content.

RAG Pipeline Architecture

Architecture
User Query
  → Rule-Based Query Classifier (ZERO LLM cost)
    ├── Mode: Clinical vs Exam (keyword signals)
    ├── Complexity: Reasoning vs Simple ("why", "explain" triggers)
    └── Topics: 40+ ENT keywords extracted
  → OpenAI Embedding (text-embedding-3-small, direct API)
  → Neon pgvector Cosine Similarity (top 20 chunks)
  → Topic Boosting (1.2x multiplier for matching topics)
  → Source Tiering (Tier 1: Indian textbooks for exam, Tier 2: international for clinical)
  → Deduplication (near-duplicate removal)
  → Conditional Reranking (GPT-4o-mini, ONLY for "reasoning" queries)
  → Generation (Claude Sonnet 4.6 via Vercel AI Gateway + OIDC)
  → Structured citations at end
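
A sketch of the post-retrieval scoring step, under stated assumptions: chunks arrive from pgvector ranked by cosine similarity, the topic boost is the 1.2x multiplier above, and the tier weight is an illustrative value:

```ts
// Post-retrieval rescoring: topic boost (1.2x, per the pipeline above) plus an
// illustrative tier weight. Chunks arrive already ranked by cosine similarity.
interface Chunk {
  id: string;
  similarity: number;   // cosine similarity from pgvector
  topics: string[];
  tier: 1 | 2;          // 1 = Indian textbooks, 2 = international
}

function rescore(chunks: Chunk[], queryTopics: string[], mode: "exam" | "clinical"): Chunk[] {
  const preferredTier = mode === "exam" ? 1 : 2;
  return chunks
    .map((c) => {
      let score = c.similarity;
      if (c.topics.some((t) => queryTopics.includes(t))) score *= 1.2; // topic boost
      if (c.tier === preferredTier) score *= 1.1;                      // illustrative tier weight
      return { ...c, similarity: score };
    })
    .sort((a, b) => b.similarity - a.similarity);
}
```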

Semantic Chunking Strategy

  • Max chunk size: ~1,200 tokens with 200-token overlap
  • Section-aware parsing: PDF → pdf-parse → heading-based segmentation → token estimation
  • 14 content type classifications (detected per chunk via 30-50 keyword indicators each):

Definition | Anatomy | Etiology | Pathology | Clinical Features | Investigations | Differential Diagnosis | Treatment | Complications | Classification | Prognosis | Epidemiology | Surgical Procedures | Pharmacology

No LLM needed for classification — pure keyword matching. This saves significant cost on high-volume indexing.
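
A minimal sketch of that classifier; the production version covers all 14 types with 30-50 keyword indicators each, and the keywords below are illustrative samples:

```ts
// Keyword-vote classifier: the content type with the most indicator hits wins.
const TYPE_KEYWORDS: Record<string, string[]> = {
  Treatment: ["management", "therapy", "first-line", "surgical excision"],
  "Clinical Features": ["presents with", "symptoms", "on examination"],
  Investigations: ["audiometry", "ct scan", "biopsy"],
};

function classifyChunk(text: string): string {
  const lower = text.toLowerCase();
  let best = "Definition"; // illustrative fallback type
  let bestHits = 0;
  for (const [type, keywords] of Object.entries(TYPE_KEYWORDS)) {
    const hits = keywords.filter((k) => lower.includes(k)).length;
    if (hits > bestHits) { best = type; bestHits = hits; }
  }
  return best;
}
```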

PDF Extraction Experiments (3 methods compared)

| Method | Text Coverage | Accuracy | Cost | Speed | Verdict |
|---|---|---|---|---|---|
| Gemini Vision batch-3 | 50% (total failure) | 0% | $0.078 | 493s | FAILED |
| Gemini Vision batch-50 | 37% (cascading failure after page 56) | 25% chapter | $0.016 | 133s | FAILED |
| pdftotext (native) | 131% (overcoverage) | 0% structural | $0 | 0.8s (300x faster) | PASSED |

Key finding: Gemini Vision batch extraction hit a degradation wall after page 56 — likely API context window limits or cumulative token exhaustion. Native pdftotext extracts raw text reliably but loses structural metadata. Decision: a hybrid approach — pdftotext for text, selective LLM refinement for metadata.

Content Generation Pipeline

All generators use Claude Sonnet 4.6 with 20 chunks retrieved per topic:

| Content Type | Per Topic | Total (30 topics) | Key Details |
|---|---|---|---|
| MCQs | 5 | 150 | 30% easy / 40% medium / 30% hard, 6 trap types |
| Flashcards | 5 | 150 | 4 types: definition, concept, clinical, mnemonic |
| Notes | 1 | 30 | 9 required sections (Definition → Mnemonics) |
| Viva | 3 | 90 | question + model_answer + examiner_notes + common_mistakes |
| Total | 14 | 420 | items per run |

MCQ Trap Engine (Phase B): 6 engineered trap types — conceptual_confusion, similar_options, outdated_concept, overthinking_trap, negative_framing, partial_knowledge. Each MCQ has structured explanation_v2 (JSONB): correct_reasoning + why_not_others for each option.
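
A sketch of the explanation_v2 shape as a TypeScript type, assuming a field layout consistent with the description above:

```ts
// Assumed shape of the explanation_v2 JSONB payload, per the description above.
type TrapType =
  | "conceptual_confusion" | "similar_options" | "outdated_concept"
  | "overthinking_trap" | "negative_framing" | "partial_knowledge";

interface ExplanationV2 {
  correct_reasoning: string;
  why_not_others: Record<"A" | "B" | "C" | "D", string>; // per-option rebuttal
  trap_type: TrapType;
}
```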

Database Schema Evolution

Phase A (Source Classification): Added source_type, weight (0.0-3.0), domains to documents. Dhingra = exam_standard, weight=1.0. Enables tier-based retrieval weighting.

Phase B (Content Upgrade, 8 new tables): media (images/audiograms/CTs), pyqs (previous year questions with year/exam/session), topic_weights (pyq_count, trend, yield_tier), cases + case_steps (branching clinical casebook with OSCE scoring), drug_interactions, session_logs, user_annotations.

PYQ Intelligence: 30 topics x 5 concepts = 150 base PYQ questions. Topic weights track pyq_count, pyq_last_5_years, trend (rising/stable/declining), yield_tier (high/medium/low). High-yield topics get more generated content.

Dual Database Architecture

  • CONTENT_DATABASE_URL (Neon, us-east-1) → MCQs, flashcards, notes, viva, generation_log
  • DATABASE_URL (Neon, us-east-1) → Embeddings, chunks (pgvector), documents

Cost tracking: Every RAG generation logged with tokens_used and cost_usd (Sonnet: $3/M input, $15/M output).

Current State

Shipped: MCQ practice, mock exams (50/100/200 Q), flashcards (SM-2 spaced repetition), viva practice, notes, mistake bank, progress dashboard, RAG Q&A, topic mastery heatmap. 25 API routes.

In Progress: Phase B migration, clinical casebook framework, drug system.

#Part 3: Raven — ML-Powered Autonomous Trading (DEEP DIVE)

The Problem & Evolution

| Version | Approach | Result | Learning |
|---|---|---|---|
| v1 | Rigid AND gates (RSI<30 AND BB<lower AND ADX>25) | 9,607 cycles, 0 trades | Rules too tight |
| v2 | BB mean reversion + intelligence layer | 2 signals/month, $3,335 PnL but Sharpe -0.84 | Signal-starved |
| v3 | ML signals + multi-TF alignment + LLM veto | 4.9 signals/day at P>0.6 | Hybrid approach |

V3 philosophy: "Indicators detect. ML predicts. LLMs veto. Risk protects."

ML Training Pipeline

Architecture
Historical OHLCV (24 months, 1h candles)
  → Feature Engineering (98 features per coin)
  → Walk-Forward Validation (6mo train / 1mo test, 16 folds, 20-candle purge buffer)
  → XGBoost Training (Python, scikit-learn)
    - n_estimators=200, max_depth=4, learning_rate=0.05
    - subsample=0.8, colsample_bytree=0.8, min_child_weight=5
    - Multi-class softmax (LONG/SHORT/HOLD)
    - compute_sample_weight('balanced') for ~70% HOLD class imbalance
    - Early stopping (20 rounds)
  → Platt Scaling Calibration (11,680 out-of-fold samples)
  → ONNX Export (skl2onnx)
  → TypeScript Inference (onnxruntime-node, Bun runtime)
    - Feature parity verified: 57/57 features match Python↔TS at 1e-11 precision
    - ONNX model accuracy: max probability difference 6e-8 across all 3 models

Models: BTC (419KB), ETH (413KB), SOL (421KB) — all ONNX format.
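
A minimal sketch of the TypeScript inference step with onnxruntime-node. Assumptions: a 98-float feature vector, a 3-class output, and an skl2onnx export with zipmap disabled so that "probabilities" is a plain float tensor; the model path is illustrative:

```ts
import * as ort from "onnxruntime-node";

const LABELS = ["LONG", "SHORT", "HOLD"] as const;
const sessionPromise = ort.InferenceSession.create("./models/btc.onnx"); // load once

export async function predict(features: Float32Array) {
  const session = await sessionPromise;
  const input = new ort.Tensor("float32", features, [1, features.length]);
  const results = await session.run({ [session.inputNames[0]]: input });
  // Assumes the probability output is named "probabilities" (skl2onnx default
  // when zipmap is disabled).
  const probs = results["probabilities"].data as Float32Array;
  let best = 0;
  for (let i = 1; i < probs.length; i++) if (probs[i] > probs[best]) best = i;
  return { label: LABELS[best], probability: probs[best] };
}
```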

Feature Engineering (98 Total Per Coin)

19 indicators x 5 timeframes (15m, 1h, 4h, 1d, 1w) = 95 per-timeframe features:

| Category | Features |
|---|---|
| Momentum | rsi_14, stoch_k, stoch_d, adx_14 |
| Trend | ema_12_26_diff, ema_50_200_diff, price_vs_ema20, macd_histogram, macd_signal_diff |
| Volatility | bb_position, bb_z_score, bb_bandwidth, atr_pct |
| Volume | vol_ratio (vs SMA20), obv_slope |
| Price Action | price_position (quantile), ret_1bar, ret_5bar, ret_20bar |

3 cross-timeframe features: trend_alignment, momentum_divergence, vol_regime_expanding

Top importances (XGBoost): 4h_vol_ratio, vol_regime_expanding, 4h_ret_1bar

Walk-Forward Validation (Lopez de Prado methodology)

  • Train windows: 6 months (~4,380 1h candles)
  • Test windows: 1 month (~730 candles)
  • Purge buffer: 20 candles between train/test (prevents lookahead bias)
  • Min folds: 5 before deployment gate
  • Total folds: 16+ per asset
  • Calibration: Platt scaling on out-of-fold predictions (11,680 total samples)
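
A sketch of the fold indexing under these parameters; the helper below is illustrative, not the production splitter:

```ts
// Walk-forward fold indexing with a purge gap between train and test windows,
// using the candle counts above.
interface Fold { trainStart: number; trainEnd: number; testStart: number; testEnd: number }

function walkForwardFolds(total: number, train = 4380, test = 730, purge = 20): Fold[] {
  const folds: Fold[] = [];
  // Slide forward one test period at a time; the purge gap prevents label
  // horizons near the boundary from leaking future information into training.
  for (let start = 0; start + train + purge + test <= total; start += test) {
    folds.push({
      trainStart: start,
      trainEnd: start + train, // exclusive
      testStart: start + train + purge,
      testEnd: start + train + purge + test,
    });
  }
  return folds;
}

// 24 months of 1h candles is about 17,520 rows, giving 17 folds here,
// consistent with the 16+ folds per asset reported above.
console.log(walkForwardFolds(17520).length);
```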

Current results: 53.9% accuracy on BTC (target >55%), train-test gap 0.180 (target <0.15, improved from 0.235 via regularization).

Multi-LLM Veto Layer (Frank Morales Pattern)

Architecture
ML Signal (P>0.6 threshold)
  → Multi-Timeframe Alignment Check
    - 1h (40%) + 4h (30%) + 15m (20%) + 1d (10%)
    - Score ≥0.6 → full position, 0.4-0.6 → half, <0.4 → skip
  → Multi-LLM Ensemble Veto
    - Claude Haiku 4.5 (screening) + DeepSeek V3 (ensemble diversity)
    - Each returns single float: -1.0 (bearish) to +1.0 (bullish)
    - Consensus gate: average scores, either fails → block trade
    - Cost: <$5/month total API spend
  → Kelly-Criterion Position Sizing (Platt-calibrated probabilities)
  → Execution (Bybit v5 REST via CCXT)
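
A sketch of the consensus gate, assuming each LLM returns a single float in [-1, +1] as described above; the agreement threshold is illustrative:

```ts
// Consensus veto gate: either model disagreeing with the trade direction,
// or a weak average score, blocks the trade.
type Direction = "LONG" | "SHORT";

function consensusVeto(direction: Direction, scores: number[], minAgreement = 0.2): boolean {
  // Flip sign for shorts so positive always means "agrees with the trade".
  const aligned = scores.map((s) => (direction === "LONG" ? s : -s));
  const avg = aligned.reduce((a, b) => a + b, 0) / aligned.length;
  return aligned.every((s) => s >= 0) && avg >= minAgreement;
}

console.log(consensusVeto("LONG", [0.6, 0.3]));   // true: both agree, avg 0.45
console.log(consensusVeto("SHORT", [0.6, -0.3])); // false: first model disagrees
```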

Context Layer ("World Brain")

8 data sources aggregated via PageRank-weighted synthesis:

  • Funding rates (perpetual contract cost) | Open interest (whale positioning)
  • News sentiment (polarity scoring) | Macro calendar (Fed events)
  • Fear & Greed Index | Polymarket prediction odds
  • Long-short ratio (leverage positioning) | CoinMarketCap spot context

Polymarket Forecaster (Separate Module)

  • Two-stage debiased AI estimation — Haiku pre-filter sees NO market price before forming initial estimate (anti-anchoring)
  • Kelly sizing: 0.05x fractional (very conservative)
  • 3-gate filter: confidence >0.7, edge >10%, articulable hypothesis
  • Paper trading verified, running on VPS via pm2
  • Dashboard: polymarket.rajdeepgupta.in/status
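
A sketch of the 0.05x fractional Kelly stake for a binary market, where `p` is the debiased probability estimate and `price` is the market's implied probability; the helper name is hypothetical:

```ts
// Fractional Kelly stake for a binary contract bought at `price`, paying 1
// if the market resolves yes.
function fractionalKelly(p: number, price: number, fraction = 0.05): number {
  const edge = p - price;
  if (edge <= 0) return 0; // no positive edge, no bet
  const fullKelly = edge / (1 - price); // f* = (p - price) / (1 - price)
  return fraction * fullKelly;          // stake as a fraction of bankroll
}

// A 70% estimate against a 55-cent market stakes roughly 1.7% of bankroll.
console.log(fractionalKelly(0.7, 0.55).toFixed(4)); // "0.0167"
```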

V2 Backtest Results (Before v3 Fixes)

| Metric | Value | Target |
|---|---|---|
| Total PnL | +$3,335 | Positive |
| Win Rate | 20.16% | >33% (with 2:1 R:R) |
| Max Drawdown | -$2,194 | <$1,500 |
| Total Trades | 191 | |
| Sharpe Ratio | -0.84 | >1.0 |
Next: CNN-LSTM model (Conv1D 3 layers 64→32 + LSTM(64) + Dense(32) → softmax(3)), ensemble with LightGBM/TabNet.

#Part 4: Jarvis — Personal AI Second Brain (DEEP DIVE)

The Vision

Jarvis is NOT an AI assistant. It's a personal AI operating system that manages engineering projects (6 repos), personal productivity, finances, learning, research, smart home, and life admin. The goal: one interface for everything, powered by specialized AI agents that coordinate, learn, and converge toward my actual decision-making patterns.

What Makes This More Than "Claude with Tools"

  1. It learns. Every divergence between system suggestion and actual action is a gradient signal. Over 100+ decisions, the system converges toward my real patterns.
  2. It has memory. 4-tier memory hierarchy with 5-layer retrieval. Knowledge decays, gets resurfaced, gets rated.
  3. It has a body. OpenClaw daemon runs 24/7 on VPS with 20+ messaging platforms, 100+ skills.
  4. It coordinates agents. 12+ specialized agents with different models, budgets, and authorities.
  5. It controls my environment. Smart home integration — lights, fans, AC, cameras via command queue.
  6. It builds itself. Engineering pipeline dispatches agents that write code, review it, and create PRs autonomously.

Evolution: v1 → v2

v1 (Jan-Feb): Simple orchestration. Polling-based dispatch (Plane every 5 min). Burned 7+ Claude sessions/day with nothing to dispatch. Single agent model.

The Pivot (March): Killed polling. Event-driven dispatch. Zero idle cost. Built full state machine.

v2 (March-April): Full multi-agent system. 32 Convex tables. VPS with autonomous agents. Telegram approval flow. Knowledge engine. Personal productivity layer.

"Make Jarvis Usable" Pivot (April): After 50+ sessions building infrastructure, realized all personal productivity agents were stopped on VPS. 3 sprints to activate what existed before building more.

The 7-Step Core Loop (Intelligence Convergence Engine)

Architecture
Step 0: CAPTURE → Nexus sensory layer (Chrome ext, CLI, Telegram, Donna auto-scan)
  All inputs → knowledge_inbox → /digest → knowledge_items with embeddings
Step 1: DISCOVER → Surface relevant knowledge based on current context
Step 2: CONTEXTUALIZE → Assemble multi-source context for the task
Step 3: EXECUTE → Route to appropriate agent with assembled context
Step 4: EVALUATE → Quality gates, review, human approval
Step 5: LEARN → Feedback signals, decision logging, agent learnings
Step 6: MONITOR → Health checks, cost tracking, drift detection
  → Back to Step 0 (continuous loop)

Memory Architecture (4-Tier, Hermes-Inspired)

Architecture
┌─────────────────────────────────────────────────────────┐
│ SENSORY MEMORY — Raw captures                            │
│ knowledge_inbox table, recent context, unprocessed inputs │
│ Retention: hours. Everything enters here first.          │
├─────────────────────────────────────────────────────────┤
│ WORKING MEMORY — Current session context                 │
│ Active conversation, recent decisions, task state         │
│ 5-source context builder (multiplier effect)             │
├─────────────────────────────────────────────────────────┤
│ EPISODIC MEMORY — Timeline of events                     │
│ episodic_events table, session_summaries, DecisionLog    │
│ "What happened when?" — temporal retrieval               │
├─────────────────────────────────────────────────────────┤
│ LONG-TERM MEMORY — Patterns, learnings, knowledge        │
│ knowledge_items with embeddings (1536-dim vectors)        │
│ FTS5 full-text index (2,321 sections across 7 repos)     │
│ Confidence decay: items lose relevance unless accessed    │
│ Fields: accessCount, lastAccessed, userRating, decayedScore │
└─────────────────────────────────────────────────────────┘
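
A sketch of how a decayed score could combine the fields named above (accessCount, lastAccessed, userRating); the half-life and weights are illustrative, not the production constants:

```ts
interface KnowledgeItem {
  accessCount: number;
  lastAccessed: number; // epoch ms
  userRating: number;   // 0-5
}

function decayedScore(item: KnowledgeItem, now = Date.now(), halfLifeDays = 30): number {
  const ageDays = (now - item.lastAccessed) / 86_400_000;
  const recency = Math.pow(0.5, ageDays / halfLifeDays); // exponential decay by recency
  const reinforcement = Math.log1p(item.accessCount);    // diminishing returns on access
  return recency * (1 + reinforcement) * (0.5 + item.userRating / 10);
}
```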

5-Layer Retrieval Architecture

| Layer | Method | What It Finds | Speed |
|---|---|---|---|
| 1. Keyword/FTS5 | Full-text search | Exact terms, file names, function names | Fast |
| 2. Vector Similarity | Embedding cosine similarity (1536-dim) | Semantically related content | Medium |
| 3. Temporal Recency | Timestamp-based scoring | Recent decisions, fresh context | Fast |
| 4. Episodic Links | Decision history traversal | Past similar situations and outcomes | Medium |
| 5. Topic Clustering | Theme coherence scoring | Related knowledge across domains | Slow |

Context is assembled by combining results from all 5 layers, weighted by task type. Engineering tasks weight keyword/vector higher. Personal tasks weight episodic/temporal higher.
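
A sketch of that weighted blend, assuming per-layer scores normalized to [0, 1]; the two weight profiles are illustrative:

```ts
type Layer = "keyword" | "vector" | "temporal" | "episodic" | "topical";

const WEIGHTS: Record<"engineering" | "personal", Record<Layer, number>> = {
  engineering: { keyword: 0.35, vector: 0.30, temporal: 0.15, episodic: 0.10, topical: 0.10 },
  personal:    { keyword: 0.10, vector: 0.20, temporal: 0.30, episodic: 0.30, topical: 0.10 },
};

// Combine per-layer scores into one ranking score, weighted by task type.
function blend(scores: Partial<Record<Layer, number>>, task: keyof typeof WEIGHTS): number {
  const w = WEIGHTS[task];
  return (Object.keys(w) as Layer[]).reduce((sum, l) => sum + w[l] * (scores[l] ?? 0), 0);
}
```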

Nexus Knowledge Pipeline

Architecture
CAPTURE (multiple entry points):
  Chrome Extension → URL + highlights + context
  CLI /capture → quick note, idea, URL
  Telegram forward → messages, links, files
  Donna auto-scan → email signals, calendar, repo changes
    ↓
knowledge_inbox (Convex table — raw, unprocessed)
    ↓
/digest processor:
  → Summarize content
  → Extract tags and topics
  → Generate embeddings (1536-dim vectors)
  → Score relevance and quality
  → Create knowledge_connections (graph relationships)
    ↓
knowledge_items (processed, searchable, decayable)
    ↓
RESURFACING:
  Donna queries Nexus at 7:30 AM and 10 PM IST
  Surfaces items based on: current context + decay score + topic relevance
  Items accessed get accessCount++ and freshness boost
  Items ignored decay further

The Gradient Descent Intelligence Model

Core insight: The system improves through user corrections, not architectural perfection.

Architecture
decision-model.md (initial hypothesis — best guess of my decision patterns)
    ↓
System proposes action/suggestion
    ↓
I accept / reject / modify
    ↓
DecisionLog records: (context, suggestion, actual_action, outcome, delta)
    ↓
Every divergence between suggestion and actual action = gradient signal
    ↓
Over 100+ decisions, the model converges toward my ACTUAL patterns
    ↓
Trust calibration: proactive suggestions ignored >70% initially
  → Feedback loop re-weights categories
  → Auto-demotion of consistently-ignored suggestion categories after 2-3 weeks
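
A sketch of a DecisionLog record and the category re-weighting it feeds, using field names from the flow above; the acceptance-rate math is illustrative:

```ts
interface DecisionLogEntry {
  context: string;
  suggestion: string;
  actualAction: "accept" | "reject" | "modify";
  outcome?: string;
  category: string; // e.g. "task-triage"
}

// Categories ignored most of the time drift toward auto-demotion.
function categoryWeight(history: DecisionLogEntry[], category: string): number {
  const inCat = history.filter((e) => e.category === category);
  if (inCat.length === 0) return 1; // no signal yet, keep default weight
  const engaged = inCat.filter((e) => e.actualAction !== "reject").length;
  return engaged / inCat.length; // below ~0.3: candidate for demotion
}
```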

Convergence timeline:

| Stage | Timeline | System Behavior |
|---|---|---|
| Smart assistant | Week 1-2 | Follows rules, applies heuristics |
| Pattern learner | Week 3-6 | Feedback loops active, starts adapting |
| Aligned partner | Month 2-3 | 100+ DecisionLog entries, suggestions become genuinely useful |

OpenClaw Integration (Body vs Brain)

| | OpenClaw (Body) | Jarvis (Brain) |
|---|---|---|
| Role | Always-on daemon, personal automation | Engineering pipeline, task dispatch |
| Runtime | 30-min heartbeat on VPS | On-demand (CLI or Telegram trigger) |
| Model | Claude Haiku 4.5 ($5/day cap) | Claude Opus 4.6 (orchestration) |
| Platforms | 20+ messaging platforms, Telegram primary | Claude Code CLI, Plane |
| Skills | 100+ (calendar, gmail, plane-tasks) | Engineering agents, code review |
| Memory | SQLite journals (local) | Convex (32 tables, shared) |
| Bridge | Nexus (shared knowledge base, embeddings, FTS5) | Nexus (shared) |

Handoff flow: OpenClaw detects engineering request → writes trigger file → handoff-watcher picks up → launches Claude Code session → Friday agent executes → Groot reviews → Telegram approval → PR created.

Multi-Agent Architecture (12+ Agents)

Core Team (5 VPS agents via pm2):

| Agent | Role | Model | When |
|---|---|---|---|
| Jarvis | Orchestrator | Opus 4.6 | On-demand |
| Friday | Lead Engineer | Sonnet 4.6 | Trigger-based |
| Groot | Code Reviewer | Sonnet 4.6 | After every code change |
| Donna | Executive Intelligence | Haiku 4.5 | Scheduled (7:30 AM, 10 PM) |
| Bran | System Observer | Haiku 4.5 | Always-on (health monitor) |

Specialist Agents: Rocket (research), Vision (architecture), Engineering Agent (autonomous pipeline), Reviewer, Planner, Frontend/Backend/Mobile Dev, Security Auditor, Design Engineer, Test Writer.

Engineering Agent Pipeline (Autonomous)

Architecture
/work JARVI-42
  → Jarvis reads Plane task → moves to In Progress
  → Dispatches engineering-agent (Sonnet, 60 max tool calls)
  → Agent reads knowledge base + 72 learnings entries
  → Researches codebase → writes plan to temp file
  → Creates feature branch → implements changes
  → Runs quality gates (tsc, lint, build)
  → Self-reviews → commits → pushes branch
  → Returns AWAITING_APPROVAL
  → Jarvis dispatches Groot (reviewer)
  → If APPROVED: sends Telegram message with Approve/Reject buttons
  → I tap Approve on phone
  → Jarvis creates draft PR → updates Plane → cleans up

13-State Task Machine

Architecture
idle → queued → running → [intermediate states] → completed
                 ↓
             blocked → awaiting_approval → approved → completed
                 ↓
             rate_limited → (checkpoint saved) → resumed
                 ↓
             failed → (retry logic, 5 failure types) → running

Safety: Optimistic concurrency (version field), full audit trail (task_events), dry-run mode, kill switch, local presence flag (pauses VPS when I work locally), mid-task checkpoints, cost caps.
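
A sketch of the optimistic-concurrency guard on a state transition, assuming a version field incremented on every write; the store interface is hypothetical (in production this logic lives inside Convex mutations):

```ts
interface TaskRow { id: string; state: string; version: number }

interface TaskStore {
  get(id: string): Promise<TaskRow>;
  // Writes only if expectedVersion still matches; resolves false otherwise.
  compareAndSwap(id: string, expectedVersion: number, next: Partial<TaskRow>): Promise<boolean>;
}

async function transition(store: TaskStore, id: string, from: string, to: string): Promise<boolean> {
  const row = await store.get(id);
  if (row.state !== from) return false; // another writer already moved the task
  return store.compareAndSwap(id, row.version, { state: to, version: row.version + 1 });
}
```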

Convex Data Model (32 Tables)

Knowledge (6): knowledge_inbox, knowledge_items, topics, knowledge_connections, doc_insights, episodic_events

Agents (10): agent_state, agent_messages, agent_messages_dryrun, agent_decisions, agent_metrics, agent_learnings, agent_checkpoints, rate_limit_state, task_runtime_state, task_events

Memory & Scheduling (8): donna_config, donna_engagement, session_summaries, daily_budget, usage_cache, telegram_jobs, telegram_approvals, execution_log

Smart Home (2): smart_home_commands, smart_home_state

Plus: local_presence, openai_usage, and more

Framework Synthesis (3-Framework Hybrid)

| Framework | What We Adopted | What We Didn't |
|---|---|---|
| Hermes Agent | Confidence decay, 5-layer memory abstraction, learning patterns | Full architecture (too coupled) |
| gstack | Artifact chaining (research → plan → code, each output becomes next input) | CLI workflow (not agent-native) |
| Paperclip AI | Heartbeat daemon (30 min), budget tracking, org chart of agents | Monolithic orchestrator |

Continuous Learning System

Agent Learnings (72+ entries): Engineering agent reads agent-learnings.md at start of every task. Updated whenever: reviewer blocks code, I reject approval, quality gates fail. Format: What went wrong → Root cause → Rule. Tags enable context-aware injection.

Memory Write Authority: Only Reviewer/Jarvis writes to persistent memory. All other agents propose learnings via structured output. Jarvis reviews and decides what persists. Prevents multi-agent garbage — discovered this after conflicting, low-quality entries accumulated.

Cost Management

Two Claude accounts tracked (personal Max + team office):

| 5h Utilization | Level | Behavior |
|---|---|---|
| < 50% | Full | 3-5 parallel agents |
| 50-70% | Moderate | Max 2-3 parallel |
| 70-85% | Conservative | Single agent only |
| 85-95% | Emergency | Complete current task only |
| > 95% | Paused | Write handoff, defer all |

Adaptive poller: 1h idle, 5m when agents active, exponential backoff on 429s.

Jarvis Dashboard (jarvis.rajdeepgupta.in)

Stack: Next.js 16.1, React 19.2, Tailwind CSS 4, shadcn/ui, Convex 1.33

Pages: /personal, /memory, /projects, /agents/[name] (7 agents), /agents/conversations, /docs, /observability, /focus, /nexus, /sessions/[date], /events, /smart-home

Data sources: Plane API (ISR 60s), Convex (real-time), markdown files copied during prebuild (sessions, subscriptions, goals, learning-log, agent-learnings, behavioral-rules).

Smart Home Integration

Command queue architecture:

Architecture
Dashboard/Jarvis/OpenClaw → queueCommand(source, command, payload) → Convex
  ↓ Home Hub (Python, pm2) polls every 2-3s via pollCommands()
  ↓ Executes via device APIs (tinytuya for Tuya, Tapo SDK, LG ThinQ, EZVIZ)
  ↓ reportResult(id, status, result, error) → Convex
  ↓ Dashboard shows real-time status (stale detection at >2 min)
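
A sketch of the queue contract, reusing the function names from the diagram (queueCommand, pollCommands, reportResult); the shapes and in-memory store are assumptions for illustration, standing in for the Convex tables:

```ts
interface SmartHomeCommand {
  id: string;
  source: "dashboard" | "jarvis" | "openclaw";
  command: string;                  // e.g. "set_light"
  payload: Record<string, unknown>; // e.g. { device: "bed_strip", brightness: 40 }
  status: "queued" | "running" | "done" | "error";
}

const queue: SmartHomeCommand[] = [];

function queueCommand(source: SmartHomeCommand["source"], command: string, payload: Record<string, unknown>): string {
  const id = crypto.randomUUID(); // Node 19+/Bun global
  queue.push({ id, source, command, payload, status: "queued" });
  return id;
}

// The hub polls every 2-3s, claims queued commands, executes via device APIs.
function pollCommands(): SmartHomeCommand[] {
  const pending = queue.filter((c) => c.status === "queued");
  for (const c of pending) c.status = "running";
  return pending;
}

function reportResult(id: string, status: "done" | "error"): void {
  const cmd = queue.find((c) => c.id === id);
  if (cmd) cmd.status = status;
}
```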

15+ devices: 5 Tapo lights, 4 Tuya multi-gang switches, 2 Atomberg fans, 1 LG AC, 2 EZVIZ cameras.

Smart Scenes:

| Scene | What It Does |
|---|---|
| coding | Blue strips 40%, room light off, fan speed 3 |
| goodnight | All lights off, fans speed 2 |
| movie | Lights off except bed back 20% purple, fan speed 2 |
| wake_up | Warm lights 60%, fan off |

The "Second Brain Intelligence" Plan (Latest, April 2026)

ExecutionUnit abstraction (Tier 0 — must exist before any intelligence): All system operations unified under one tracking model with: input/output artifacts, tools_allowed, budget, state, checkpoint, memory_authority, goal_ancestry, feedback_signals.

5-Phase Timeline:

  • Phase 1F (1 week): ExecutionUnit, feedback signals, memory write authority
  • Phase 1G (2 weeks): Confidence decay, goal ancestry, failure replay, Nexus feedback loop
  • Phase 2A (2 weeks): Compound learning — OpenClaw skill self-improvement, Donna learning loop
  • Phase 2B (2 weeks): Rajdeep OS — decision model, role engine, delegation, proactive loop
  • Phase 2C (1 week): E2E + ship → then 4-week calibration period

#Part 5: Local AI Experiments — MLX on Apple Silicon

What We Tried (M1 Max 32GB)

| Experiment | Model | Result |
|---|---|---|
| OCR | Qwen2-VL-2B-4bit (mlx-vlm) | Good for printed text |
| Text generation | Qwen3-4B-4bit (mlx-lm) | Good for summarization |
| Multi-model debate | Qwen3-4B (3 agents debating NVDA) | Worked! 1 round ~18s |
| Local embeddings | nomic-modernbert-embed-base-4bit | 44,000 tok/s throughput |
| Image generation | Flux.1-schnell (mflux) | Abandoned — 15GB download |

MLX vs Alternatives

| | MLX | Ollama (MLX backend) | llama.cpp | Cloud API |
|---|---|---|---|---|
| Speed on Apple Silicon | Fastest (native) | Same (uses MLX now) | 20-87% slower | Network-bound |
| VLM support | Full | Limited | Partial | Full |
| Fine-tuning | Yes (LoRA) | No | No | No |
| Cost | Zero | Zero | Zero | Per-token |

Verdict: Local models = playground and privacy-sensitive tasks, not production replacement at our scale. 4B models are the sweet spot for M1 Max — interactive speed, leaves room for other apps. 7B saturates resources.

Potential Use Cases

| Use Case | Model | Benefit |
|---|---|---|
| Entellect PDF OCR | DOTS-OCR / DeepSeek-OCR | Zero cost, medical data stays local |
| Local embeddings | nomic-modernbert | 44k tok/s, zero API cost |
| Broker OS doc extraction | DeepSeek-OCR + Qwen3-8B | Privacy for client documents |
| LoRA fine-tuning | Qwen3-8B on medical Q&A | Domain-specific quality boost |

#Part 6: Side Projects

Ticker — macOS Menu Bar Calendar

What: Native macOS app showing live meeting countdown in menu bar ("Standup in 23m" → "Standup NOW"). One-click Zoom/Meet join.

Stack: Swift 5.9, SwiftUI (native macOS, no Electron).

Business model: Free tier + Pro ($4.99 one-time). Competes with Fantastical, Dato, Meeter at the lowest price point.

Status: Production (v0.3.0), DMG available. Website: gettickerapp.com.

LeadMapsHub — Google Maps Lead Scraper

What: Chrome extension that extracts business data (names, phones, addresses, ratings, websites) from Google Maps with auto-scroll, enrichment, and multi-export (CSV, Excel, JSON).

Stack: Vanilla JS, Shadow DOM, Chrome Manifest V3. No server needed — runs entirely in browser.

Status: Development. Competes with Outscraper ($2/1000 leads), Scrap.io ($49/mo) at a lower price.

#Part 7: AI Model Usage Across All Projects

Model Selection Matrix

| Model | Provider | Used In | Purpose | Why This Model |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | Jarvis orchestrator | Complex reasoning, task routing | Highest quality for critical decisions |
| Claude Sonnet 4.6 | Anthropic | Jarvis agents, Entellect RAG, Raven analysis | Code gen, generation, trading | Best quality-cost ratio |
| Claude Haiku 4.5 | Anthropic | OpenClaw, Donna, Raven screening | Briefings, classification, veto | Low cost for high-volume tasks |
| GPT-4o-mini | OpenAI | Entellect reranking | Relevance scoring (conditional) | Good at reranking, cheaper than Claude |
| text-embedding-3-small | OpenAI | Entellect embeddings | Chunk vectors (1536-dim) | Proven quality for medical text |
| Gemini 2.0 Flash | Google | Entellect PDF extraction | Vision-based document parsing | Best vision for complex layouts |
| DeepSeek V3 | DeepSeek | Raven LLM ensemble | Trading veto (diversity) | Reduces single-model bias |
| XGBoost → ONNX | Local | Raven signal generation | Trading signal prediction | Best for tabular data, fast CPU training |
| Qwen3-4B (MLX) | Local | Experiments | Text gen, multi-model debate | Best 4B for Apple Silicon |
| nomic-modernbert (MLX) | Local | Experiments | Local embeddings | 44k tok/s, Matryoshka dims |

Model Selection Philosophy

  1. Cheapest model that works. Haiku for classification ($0.25/M), Sonnet for generation ($3/M), Opus only for orchestration ($15/M) — 90%+ savings.
  2. Right provider for each task. OpenAI for embeddings, Gemini for vision, Claude for reasoning, DeepSeek for ensemble diversity.
  3. Local for privacy. Medical data (Entellect) and client docs (Broker OS) benefit from on-device processing.
  4. Train your own when task is specific. XGBoost for trading signals — tabular data, specialized model outperforms general LLMs.
  5. Ensemble for reliability. Raven uses Claude + DeepSeek together — reduces single-model bias.

#Part 8: Infrastructure & Services

Complete Service Map

| Category | Service | Purpose | Used By |
|---|---|---|---|
| Hosting | Vercel | All web apps (8 projects) | All |
| VPS | Hetzner CPX22 (Helsinki) | Agents, trading bot, smart home | Jarvis, Raven |
| DNS/CDN | Cloudflare | DNS, DDoS, R2 storage, Workers | Broker OS |
| Database | Convex | Real-time (agents, messaging) | Jarvis, Broker OS |
| Database | Neon PostgreSQL | Relational + pgvector | Platform, Entellect |
| Database | Supabase | Legacy platform data | Platform |
| Database | SQLite | FTS5 index (2,321 sections) | Jarvis |
| AI | Anthropic (Claude) | Primary LLM | All AI projects |
| AI | OpenAI | Embeddings + reranking | Entellect |
| AI | Google (Gemini) | Vision/PDF extraction | Entellect, 11sqft AI |
| AI | DeepSeek | Trading ensemble diversity | Raven |
| AI | Local MLX | Privacy processing | Experiments |
| AI | ONNX Runtime | ML model inference | Raven |
| Auth | Firebase | Phone OTP | Broker OS, Properties, Mobile |
| Messaging | AiSensy | WhatsApp Business API | Broker OS |
| Messaging | Telegram Bot API | Notifications, approvals | Jarvis |
| Maps | Mapbox GL + Google Maps | Property maps, geocoding | Platform, Properties |
| Project Mgmt | Plane (self-hosted API) | Task tracking (8 projects) | All |
| Monitoring | Sentry | Error tracking | Broker OS, Platform |
| Analytics | GA4 + Mixpanel | User analytics | Broker OS |
| Smart Home | Tapo, Tuya, Atomberg, LG, EZVIZ | IoT device control | Jarvis |
| Testing | Vitest + Playwright | Unit + E2E | Broker OS |

Why Cloudflare + Vercel Together?

  • Vercel: Application hosting, serverless functions, CI/CD
  • Cloudflare: DNS management, DDoS protection, R2 object storage (cheaper than S3), Image CDN optimization, Workers for edge logic (short-link redirects)
  • Complementary: Vercel handles compute, Cloudflare handles CDN/storage/DNS

#Part 9: Key Lessons & Pivots

  1. Infrastructure spiral is real. 50+ sessions building Jarvis infra, all personal agents stopped. Fix: "Make Jarvis Usable" — activate before building more.

  2. Polling is expensive, events are cheap. Jarvis v1 burned 7+ sessions/day polling with nothing to dispatch. Fix: event-driven dispatch (zero idle cost).

  3. Rule-based classification saves real money. Entellect's query classifier uses zero LLM tokens — just keyword matching. Not every AI task needs an LLM.

  4. PDF extraction is harder than expected. Gemini Vision batch extraction fails after page 56. Native pdftotext at 300x speed with hybrid LLM refinement won.

  5. The cheapest model that works is the best model. Haiku at $0.25/M vs Opus at $15/M — 90%+ savings without quality loss on simple tasks.

  6. Multi-agent garbage is real. Multiple agents writing to shared memory created conflicts. Fix: only Reviewer/Jarvis writes — others propose.

  7. Walk-forward validation matters. Without purge buffers and proper train/test splits, XGBoost accuracy was artificially inflated. Lopez de Prado methodology fixed this.

  8. Always research before building. Weeks spent on PineScript indicators before realizing the approach was fundamentally flawed. Deep research (ArXiv, GitHub) saves weeks.

  9. Gradient descent as system design. Don't optimize architecture — optimize the feedback loop. Every feature must answer: "What signal does this create?"

  10. Signal generation was the real bottleneck. Raven v1-v2 had good intelligence layers but produced 0-2 signals/month. ML-generated signals (4.9/day) solved this.

#Part 10: What's Next

| Project | Next Milestone |
|---|---|
| Broker OS | THE BET — distribution > features. WhatsApp-first onboarding. |
| Entellect | Phase B completion, clinical casebook, drug system, LoRA fine-tuning experiments |
| Raven | CNN-LSTM model (Conv1D + LSTM → softmax), LightGBM/TabNet ensemble, testnet live trading |
| Jarvis | ExecutionUnit table, DecisionLog corpus to 100+, trust calibration at >50% acceptance |
| Ticker | App Store launch, Pro tier activation |

Long-term vision:

  • Jarvis as Rajdeep OS — morning briefing → task triage → engineering dispatch → email → learning → evening wrap-up, all autonomous
  • Entellect as platform — expand beyond ENT to other medical specialties (same RAG, different knowledge)
  • Raven live trading — graduate from testnet after calibration
  • 11sqft as THE broker SaaS — "Shopify for Indian real estate brokers"


Document generated April 4, 2026. Covers work from January to April 2026 across 11 active repositories.