AI / Knowledge Management

Khurshid Answers

AI · Chatbot Platform

Duration

10 weeks

Team

4 engineers

Year

2024

Industry

Measurable Impact

<2s

Answer Time

95%

Answer Accuracy

100K+

Queries Processed

Languages Supported

Answer Accuracy (Expert-Verified)95%

Query Response Time<2 seconds

Research Time Reduction Per Query~99%

Total Queries Processed100K+

Language Coverage3 Languages

Project Overview

Khurshid Answers is an AI-powered knowledge platform built for scholars, researchers, and students who need instant, cited answers from large collections of Islamic texts, books, PDFs, and documents. Instead of manually searching through hundreds of books, users simply ask questions in natural language and receive accurate, sourced answers in seconds.

The Challenge

The client had an extensive library of scanned PDFs, digitized books, and scholarly articles — totalling thousands of documents. Traditional keyword search was ineffective: it returned too many irrelevant results and couldn't understand the semantic intent behind questions. Users needed contextual answers with source citations, not just search results. The system also needed to handle Arabic and Urdu alongside English.

Our Solution

We built a Retrieval-Augmented Generation (RAG) pipeline using OpenAI's embedding model to vectorize all documents and store them in Pinecone. When a user asks a question, the system retrieves the most semantically relevant document chunks, feeds them as context to GPT-4, and generates a precise answer with direct source citations. We wrapped this in a clean Next.js web app with multi-language support.

The Transformation

Before

Hours of manual searching through physical books
Keyword search returning irrelevant results
No source citation for any answers
English-only tools for Arabic and Urdu texts
Scholars sharing notes over WhatsApp groups
No searchable digital document library

After Pixelpk

Precise, cited answers in under 2 seconds
Semantic search understands intent, not just words
Every answer cites exact book, chapter, and page
Full Arabic, Urdu, and English language support
Academic-grade Q&A platform for 100K+ queries
All documents indexed and searchable instantly

Key Features

RAG-Powered Q&A

Semantic search across thousands of documents. The system understands the meaning of your question, not just keywords — delivering contextually accurate answers.

Source Citations

Every answer includes direct citations — the exact book, chapter, and page number where the information was found — so users can verify and explore further.

Multi-Language Support

Full support for English, Arabic, and Urdu queries and document ingestion — enabling scholars to work across language boundaries seamlessly.

Document Ingestion Pipeline

Automated pipeline to process, chunk, embed, and index new PDFs and documents — admins can upload new sources and they're searchable within minutes.

Conversation History

Users can maintain multi-turn conversations, ask follow-up questions, and revisit past searches — with full session history stored per user.

Access Control & API

Role-based access for public users, scholars, and administrators. Public API for third-party integrations with rate limiting and key management.

How We Built It

RAG Architecture Design

Weeks 1–2

Evaluated vector database options, selected OpenAI embeddings + Pinecone, and designed the chunking and retrieval strategy for multi-language documents.

✓ Vector DB evaluation✓ Chunking strategy doc✓ RAG pipeline architecture

Document Ingestion Pipeline

Weeks 3–4

Built automated ingestion: OCR for scanned PDFs, intelligent text chunking, embedding generation, and Pinecone indexing — supporting Arabic, Urdu, and English.

✓ OCR + text extraction✓ Multi-language embedding pipeline✓ Pinecone index build

AI Q&A Engine

Weeks 5–7

Connected retrieval to GPT-4, engineered prompts for citation-accurate responses, and built the conversation memory layer for multi-turn scholarly queries.

✓ GPT-4 integration✓ Citation extraction logic✓ Conversation memory store

Web Interface

Weeks 8–9

Built the Next.js chat interface with real-time streaming responses, source citation viewer, session history, and admin document management dashboard.

✓ Chat UI (streaming)✓ Citation viewer✓ Admin ingestion dashboard

Optimization & Launch

Week 10

Fine-tuned chunking parameters, ran accuracy benchmarks against a curated QA test set, and deployed on AWS Lambda for auto-scaling under load.

✓ Accuracy benchmarks (500+ QA pairs)✓ AWS Lambda deployment✓ Production launch

Tech Stack

Next.jsPythonFastAPIOpenAI GPT-4OpenAI EmbeddingsPineconeLangChainPostgreSQLAWS Lambda

Key Results

Built and deployed a full RAG pipeline across thousands of scholarly documents
Achieved sub-2-second response time with 95% answer accuracy verified by domain experts
Processed over 100,000 user queries in the first six months post-launch
Enabled multi-language support across English, Arabic, and Urdu
Reduced research time for scholars from hours to seconds per query

Impact Metrics

<2s

Answer Time

95%

Answer Accuracy

100K+

Queries Processed

Languages Supported

Visual Walkthrough

Project Screens

3 screens

Clean chat interface — ask any question, get cited answers instantly

Expand

Clean chat interface — ask any question, get cited answers instantly

Document ingestion dashboard — upload and index new sources in minutes

Answers displayed with direct source citations and confidence scores

“What would take a scholar hours of manual research now happens in seconds. The accuracy and the citation system are exactly what the scholarly community needed.”

Khurshid

Founder · Khurshid Answers

Ready to build something like this?

Let's talk about your project. We'll put together a free strategy plan tailored to your goals.

View More Work