Khurshid Answers

AI · Chatbot Platform

Duration

10 weeks

Team

4 engineers

Year

2024

Industry

AI

Measurable Impact

  • Answer Accuracy (Expert-Verified): 95%
  • Query Response Time: <2 seconds
  • Research Time Reduction Per Query: ~99%
  • Total Queries Processed: 100K+
  • Language Coverage: 3 languages

Project Overview

Khurshid Answers is an AI-powered knowledge platform built for scholars, researchers, and students who need instant, cited answers from large collections of Islamic texts, books, PDFs, and documents. Instead of manually searching through hundreds of books, users simply ask questions in natural language and receive accurate, sourced answers in seconds.

The Challenge

The client had an extensive library of scanned PDFs, digitized books, and scholarly articles — totalling thousands of documents. Traditional keyword search was ineffective: it returned too many irrelevant results and couldn't understand the semantic intent behind questions. Users needed contextual answers with source citations, not just search results. The system also needed to handle Arabic and Urdu alongside English.

Our Solution

We built a Retrieval-Augmented Generation (RAG) pipeline using OpenAI's embedding model to vectorize all documents and store them in Pinecone. When a user asks a question, the system retrieves the most semantically relevant document chunks, feeds them as context to GPT-4, and generates a precise answer with direct source citations. We wrapped this in a clean Next.js web app with multi-language support.
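The query path described above (embed the question, retrieve matching chunks, generate a cited answer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: the `Chunk` fields and the prompt wording are placeholders modeled on the citation format this case study describes.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """One retrieved document fragment with its citation metadata."""
    text: str
    book: str
    chapter: str
    page: int

def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble retrieved chunks into a citation-aware prompt for the LLM.

    In production, `chunks` would come from embedding the question
    (e.g. via OpenAI's embeddings API) and querying Pinecone for the
    top-k nearest vectors; the finished prompt is then sent to GPT-4.
    """
    context = "\n\n".join(
        f"[{c.book}, {c.chapter}, p. {c.page}]\n{c.text}" for c in chunks
    )
    return (
        "Answer the question using ONLY the sources below, and cite "
        "each claim as [Book, Chapter, p. N].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Grounding the model in retrieved chunks rather than its own parametric memory is what keeps every answer citable and verifiable.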

The Transformation

Before
  • Hours of manual searching through physical books
  • Keyword search returning irrelevant results
  • No source citation for any answers
  • English-only tools for Arabic and Urdu texts
  • Scholars sharing notes over WhatsApp groups
  • No searchable digital document library
After Pixelpk
  • Precise, cited answers in under 2 seconds
  • Semantic search understands intent, not just words
  • Every answer cites exact book, chapter, and page
  • Full Arabic, Urdu, and English language support
  • Academic-grade Q&A platform for 100K+ queries
  • All documents indexed and searchable instantly

Key Features

RAG-Powered Q&A

Semantic search across thousands of documents. The system understands the meaning of your question, not just keywords — delivering contextually accurate answers.

Source Citations

Every answer includes direct citations — the exact book, chapter, and page number where the information was found — so users can verify and explore further.

Multi-Language Support

Full support for English, Arabic, and Urdu queries and document ingestion — enabling scholars to work across language boundaries seamlessly.

Document Ingestion Pipeline

Automated pipeline to process, chunk, embed, and index new PDFs and documents — admins can upload new sources and they're searchable within minutes.

Conversation History

Users can maintain multi-turn conversations, ask follow-up questions, and revisit past searches — with full session history stored per user.

Access Control & API

Role-based access for public users, scholars, and administrators. Public API for third-party integrations with rate limiting and key management.

How We Built It

1

RAG Architecture Design

Weeks 1–2

Evaluated vector database options, selected OpenAI embeddings + Pinecone, and designed the chunking and retrieval strategy for multi-language documents.

Vector DB evaluation · Chunking strategy doc · RAG pipeline architecture
2

Document Ingestion Pipeline

Weeks 3–4

Built automated ingestion: OCR for scanned PDFs, intelligent text chunking, embedding generation, and Pinecone indexing — supporting Arabic, Urdu, and English.

OCR + text extraction · Multi-language embedding pipeline · Pinecone index build
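The chunking step above can be illustrated with a simple overlapping word-window splitter. The window and overlap sizes here are placeholder assumptions; the production parameters were tuned during the week-10 benchmarks.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split a document into overlapping word-window chunks for embedding.

    Overlap preserves context across chunk boundaries, so a passage
    split mid-argument can still be retrieved from either side.
    """
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already covers the tail of the document
    return chunks
```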
3

AI Q&A Engine

Weeks 5–7

Connected retrieval to GPT-4, engineered prompts for citation-accurate responses, and built the conversation memory layer for multi-turn scholarly queries.

GPT-4 integration · Citation extraction logic · Conversation memory store
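The citation extraction logic might look like the sketch below, assuming the engineered prompts make GPT-4 emit citations in a bracketed `[Book, Chapter, p. N]` form; the exact format used in production is not shown in this case study.

```python
import re

# Matches citations shaped like [Book A, Chapter 2, p. 14] (assumed format).
CITATION_RE = re.compile(r"\[([^,\]]+),\s*([^,\]]+),\s*p\.\s*(\d+)\]")

def extract_citations(answer: str) -> list[dict]:
    """Pull structured [Book, Chapter, p. N] citations out of a generated answer."""
    return [
        {"book": book.strip(), "chapter": chapter.strip(), "page": int(page)}
        for book, chapter, page in CITATION_RE.findall(answer)
    ]
```

Parsing citations into structured fields is what lets the UI link each claim back to the exact book, chapter, and page for verification.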
4

Web Interface

Weeks 8–9

Built the Next.js chat interface with real-time streaming responses, source citation viewer, session history, and admin document management dashboard.

Chat UI (streaming) · Citation viewer · Admin ingestion dashboard
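Real-time streaming of this kind typically rides on server-sent events. A minimal sketch of the framing follows; the token source and the FastAPI route around it are assumptions, not the production implementation.

```python
from typing import Iterable, Iterator

def sse_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap model tokens as Server-Sent Events frames for the chat UI.

    A FastAPI route would return this generator via StreamingResponse
    with media_type="text/event-stream"; the Next.js client renders
    each frame as it arrives instead of waiting for the full answer.
    """
    for tok in tokens:
        yield f"data: {tok}\n\n"
    yield "data: [DONE]\n\n"  # sentinel so the client knows the answer is complete
```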
5

Optimization & Launch

Week 10

Fine-tuned chunking parameters, ran accuracy benchmarks against a curated QA test set, and deployed on AWS Lambda for auto-scaling under load.

Accuracy benchmarks (500+ QA pairs) · AWS Lambda deployment · Production launch

Tech Stack

Next.js · Python · FastAPI · OpenAI GPT-4 · OpenAI Embeddings · Pinecone · LangChain · PostgreSQL · AWS Lambda

Key Results

  • Built and deployed a full RAG pipeline across thousands of scholarly documents
  • Achieved sub-2-second response time with 95% answer accuracy verified by domain experts
  • Processed over 100,000 user queries in the first six months post-launch
  • Enabled multi-language support across English, Arabic, and Urdu
  • Reduced research time for scholars from hours to seconds per query

Visual Walkthrough

Project Screens

AI chatbot interface

Clean chat interface — ask any question, get cited answers instantly

Document processing pipeline

Document ingestion dashboard — upload and index new sources in minutes

Search results with citations

Answers displayed with direct source citations and confidence scores

What would take a scholar hours of manual research now happens in seconds. The accuracy and the citation system are exactly what the scholarly community needed.

Khurshid

Founder · Khurshid Answers

Ready to build something like this?

Let's talk about your project. We'll put together a free strategy plan tailored to your goals.
