Multimodal AI Intelligence Platform (MMAP) is a production, full-stack platform for retrieval-augmented chat over text, PDFs, images, audio, and video. You upload anything readable; it extracts the content, embeds and indexes it, builds a live knowledge graph from your uploads, and lets you chat over everything with grounded, cited answers — optionally augmented with fresh web search. It is deployed and live at projectmmap.com, running entirely on free-tier infrastructure.
• Multimodal ingest — PDFs, images, audio, video, text, and markdown
in one shared vector space (100 MB upload cap)
• Vision + OCR — RapidOCR plus a vision-language model give every
image a searchable, summarized representation
• Audio transcription — Groq Whisper turns recordings into citable,
retrievable text
• Video understanding — adaptive frame sampling (cv2) plus audio
extraction (ffmpeg) feed a single fused Nemotron VL call that cross-references what is said against
what is shown
• Knowledge graph — entities and relationships extracted per document,
visualized client-side, and used to ground answers
• Cited chat — streaming answers with chunk-level citations and a
per-answer subgraph of the entities the model used
• Per-question RAG & Web toggles — answer from your documents,
the model's own knowledge, fresh Tavily web results with clickable [W#] citations, or
any combination
• Strict and regular modes — strict mode fact-checks every answer
against documents and cited sources, withholding anything below the grounding threshold
• Persistent chat history — auto-titled, summarized, searchable
threads with citations intact and multi-turn follow-up context
• Answer verification — each response is checked against retrieved
context, and unsupported claims are flagged
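The per-question toggles and strict mode listed above can be pictured with a small sketch. It is not MMAP's actual implementation: the names `plan_sources`, `Claim`, and `filter_strict`, the 0-to-1 support score, and the 0.7 threshold are all hypothetical, and the sketch assumes that turning both toggles off means answering from the model's own knowledge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    support: float  # hypothetical grounding score in [0, 1]


def plan_sources(use_rag: bool, use_web: bool) -> list[str]:
    """Map the two per-question toggles to the sources an answer may draw on."""
    sources = []
    if use_rag:
        sources.append("documents")
    if use_web:
        sources.append("web")
    # Assumption: with both toggles off, only the model's own knowledge is used.
    return sources or ["model"]


def filter_strict(
    claims: list[Claim], threshold: float = 0.7
) -> tuple[list[Claim], list[Claim]]:
    """Split claims into (kept, withheld) around a grounding threshold."""
    kept = [c for c in claims if c.support >= threshold]
    withheld = [c for c in claims if c.support < threshold]
    return kept, withheld
```

In this reading, strict mode would show only the `kept` claims, while regular mode could show everything and merely mark the `withheld` ones as unsupported.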
A Next.js browser app talks to a FastAPI service that handles auth, uploads, chat, graph, and RAG endpoints. The API stays light: all heavy lifting (OCR, ASR, embedding, vision, video frame sampling, summarization, and graph extraction) is offloaded to an arq worker so request latency stays bounded. State is split across purpose-built stores: Postgres for users and document metadata, Qdrant for vector search, Neo4j for the entity graph, Redis for the job queue and rate limiting, and S3-compatible object storage (Cloudflare R2 in prod, MinIO in dev) for raw uploads. Models are served via OpenRouter (Nemotron Nano 2 VL, DeepSeek) and Groq (Whisper, Llama 3.3 70B); Tavily provides web search.
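To make the API/worker split concrete, here is a minimal sketch of how an upload might be turned into a list of worker stages based on its media kind. The extension map, stage names, and the `plan_job` helper are illustrative assumptions rather than MMAP's real job schema. In the real system the resulting plan would be handed to the arq queue instead of being returned.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to media kind.
KIND_BY_EXTENSION = {
    ".txt": "text", ".md": "text", ".pdf": "pdf",
    ".png": "image", ".jpg": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video",
}

# Hypothetical kind-specific stages, loosely following the feature list.
STAGES_BY_KIND = {
    "text": [],
    "pdf": ["extract_text"],
    "image": ["ocr", "vision"],
    "audio": ["asr"],
    "video": ["frame_sampling", "audio_extraction", "vision"],
}

# Stages assumed to run for every upload once text is available.
COMMON_STAGES = ["summarize", "embed", "graph_extraction"]


def plan_job(filename: str) -> dict:
    """Return the media kind and ordered worker stages for one upload."""
    ext = PurePosixPath(filename).suffix.lower()
    kind = KIND_BY_EXTENSION.get(ext)
    if kind is None:
        raise ValueError(f"unsupported file type: {ext or filename!r}")
    return {"kind": kind, "stages": STAGES_BY_KIND[kind] + COMMON_STAGES}
```

Keeping the plan as plain data means the request handler only validates and enqueues it, which is one way to keep request latency bounded as described above.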
Backend: Python 3.12, FastAPI,
SQLAlchemy 2 (async), Alembic, Pydantic 2, asyncpg, arq,
sentence-transformers, qdrant-client, neo4j, LangGraph, OpenTelemetry,
prometheus-client
Frontend: Next.js 16 (App Router),
React 19, TypeScript, Tailwind CSS v4, shadcn/ui, Zustand, TanStack Query,
react-force-graph-2d
Data plane: Postgres, Qdrant,
Neo4j, Redis, S3-compatible storage (R2 / MinIO)
Models: OpenRouter (Nemotron Nano 2 VL, DeepSeek), Groq (Whisper Large v3 Turbo,
Llama 3.3 70B), Tavily
Testing: pytest, vitest + Testing Library, Playwright,
with a nightly workflow that spins up the full docker compose stack
MMAP is built as a deployable, observable, multi-service system. CI runs unit, type, and lint suites on every push; a nightly workflow brings up the full docker compose stack and runs the backend integration and Playwright e2e suites against it. Production runs across Vercel, Hugging Face Spaces, Neon, Qdrant Cloud, Neo4j AuraDB, Upstash Redis, and Cloudflare R2. Security spans bcrypt-hashed credentials, email verification, JWT access tokens, per-IP rate limiting, strict frontend response headers, and per-user isolation enforced at every storage layer (Postgres row scope, Qdrant payload filters, Neo4j scoping, namespaced object keys).
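Per-user isolation at the storage layer can be illustrated with two small helpers: one builds a namespaced object key, the other builds a Qdrant payload filter in the JSON shape Qdrant's filtering API accepts. The key layout (`users/<user_id>/<document_id>/<filename>`), the `user_id` payload field, and both function names are assumptions made for this sketch, not details confirmed by the project.

```python
from pathlib import PurePosixPath


def object_key(user_id: str, document_id: str, filename: str) -> str:
    """Build an object-storage key namespaced under the owning user."""
    for part in (user_id, document_id):
        if not part or "/" in part:
            raise ValueError("ids must be non-empty and contain no '/'")
    # Keep only the final path component so a crafted filename
    # cannot climb out of the user's prefix.
    safe_name = PurePosixPath(filename).name
    if safe_name in ("", ".", ".."):
        raise ValueError("filename has no usable final component")
    return f"users/{user_id}/{document_id}/{safe_name}"


def qdrant_user_filter(user_id: str) -> dict:
    """Payload filter restricting a vector search to one user's points."""
    return {"must": [{"key": "user_id", "match": {"value": user_id}}]}
```

The same idea carries over to the other stores: every Postgres query and Neo4j match would take the authenticated user's id as a required parameter rather than trusting anything supplied in the request body.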