Khurshid Answers
AI · Chatbot Platform
Duration
10 weeks
Team
4 engineers
Year
2024
Industry
AI
Measurable Impact
Project Overview
Khurshid Answers is an AI-powered knowledge platform built for scholars, researchers, and students who need instant, cited answers from large collections of Islamic texts, books, PDFs, and documents. Instead of manually searching through hundreds of books, users simply ask questions in natural language and receive accurate, sourced answers in seconds.
The Challenge
The client had an extensive library of scanned PDFs, digitized books, and scholarly articles — totalling thousands of documents. Traditional keyword search was ineffective: it returned too many irrelevant results and couldn't understand the semantic intent behind questions. Users needed contextual answers with source citations, not just search results. The system also needed to handle Arabic and Urdu alongside English.
Our Solution
We built a Retrieval-Augmented Generation (RAG) pipeline using OpenAI's embedding model to vectorize all documents and store them in Pinecone. When a user asks a question, the system retrieves the most semantically relevant document chunks, feeds them as context to GPT-4, and generates a precise answer with direct source citations. We wrapped this in a clean Next.js web app with multi-language support.
The Transformation
- Hours of manual searching through physical books
- Keyword search returning irrelevant results
- No source citation for any answers
- English-only tools for Arabic and Urdu texts
- Scholars sharing notes over WhatsApp groups
- No searchable digital document library
- Precise, cited answers in under 2 seconds
- Semantic search understands intent, not just words
- Every answer cites exact book, chapter, and page
- Full Arabic, Urdu, and English language support
- Academic-grade Q&A platform for 100K+ queries
- All documents indexed and searchable instantly
Key Features
RAG-Powered Q&A
Semantic search across thousands of documents. The system understands the meaning of your question, not just keywords — delivering contextually accurate answers.
Source Citations
Every answer includes direct citations — the exact book, chapter, and page number where the information was found — so users can verify and explore further.
Multi-Language Support
Full support for English, Arabic, and Urdu queries and document ingestion — enabling scholars to work across language boundaries seamlessly.
Document Ingestion Pipeline
Automated pipeline to process, chunk, embed, and index new PDFs and documents — admins can upload new sources and they're searchable within minutes.
Conversation History
Users can maintain multi-turn conversations, ask follow-up questions, and revisit past searches — with full session history stored per user.
Access Control & API
Role-based access for public users, scholars, and administrators. Public API for third-party integrations with rate limiting and key management.
How We Built It
RAG Architecture Design
Weeks 1–2Evaluated vector database options, selected OpenAI embeddings + Pinecone, and designed the chunking and retrieval strategy for multi-language documents.
Document Ingestion Pipeline
Weeks 3–4Built automated ingestion: OCR for scanned PDFs, intelligent text chunking, embedding generation, and Pinecone indexing — supporting Arabic, Urdu, and English.
AI Q&A Engine
Weeks 5–7Connected retrieval to GPT-4, engineered prompts for citation-accurate responses, and built the conversation memory layer for multi-turn scholarly queries.
Web Interface
Weeks 8–9Built the Next.js chat interface with real-time streaming responses, source citation viewer, session history, and admin document management dashboard.
Optimization & Launch
Week 10Fine-tuned chunking parameters, ran accuracy benchmarks against a curated QA test set, and deployed on AWS Lambda for auto-scaling under load.
Tech Stack
Key Results
- Built and deployed a full RAG pipeline across thousands of scholarly documents
- Achieved sub-2-second response time with 95% answer accuracy verified by domain experts
- Processed over 100,000 user queries in the first six months post-launch
- Enabled multi-language support across English, Arabic, and Urdu
- Reduced research time for scholars from hours to seconds per query
Impact Metrics
Visual Walkthrough
Project Screens
Clean chat interface — ask any question, get cited answers instantly
Clean chat interface — ask any question, get cited answers instantly
Document ingestion dashboard — upload and index new sources in minutes
Document ingestion dashboard — upload and index new sources in minutes
Answers displayed with direct source citations and confidence scores
Answers displayed with direct source citations and confidence scores
“What would take a scholar hours of manual research now happens in seconds. The accuracy and the citation system are exactly what the scholarly community needed.”
Khurshid
Founder · Khurshid Answers
Ready to build something like this?
Let's talk about your project. We'll put together a free strategy plan tailored to your goals.
