Overview
Research map: three pillars, active projects, and deployment-first hypotheses.
Lab results mean nothing if they don't survive deployment.
Anveshak
Failed coding agent runs contain real diagnostic work that gets thrown away. We build recovery agents that read those traces and finish the job at a fraction of the cost.
DANDI
Enterprise AI agents act on contradictory data across systems and never notice. We build agents that traverse Slack, CRM, and contracts to find and reconcile those conflicts.
SUTRAM
Security logs contain evidence of attacks that no pre-built rule anticipated. We investigate whether an AI system can review yesterday's logs and surface attack chains on its own.
You can't build what you can't measure.
Hindsight
Document AI models extract the right values but assign them to the wrong fields, and no benchmark measures this. We build an evaluation that tests concept-binding across paystubs, invoices, contracts, and tax forms.
Kshamta
Vibe coding tools produce apps that work in demos but fail on mobile, security, and accessibility. We build a benchmark that scores the running app in the browser, not the generated code.
SMRITI
AI memory systems are benchmarked on recall, not on whether memory actually changes behavior for each user. We build an evaluation that tests whether systems adapt tone, timing, and restraint across sessions.
The frontier isn't only about scale.
BodhiLekhan
Handwriting recognition treats every page as if it came from a stranger, even for writers it has seen before. We investigate persistent writer adaptation that learns a writer's hand once and recognizes at full speed from then on.
Dhaatu
Multilingual AI requires expensive retraining to reason in each new language. We investigate whether reasoning and language grounding are separable, so one core model serves many languages by swapping only embeddings.
Ekdant
Training an AI tutor by retraining the entire model is expensive and risks degrading the very reasoning it is meant to teach. We explore architectures that separate teaching strategy from subject knowledge, so pedagogy trains cheaply and ports across models.