RAG PDF Indexing Experiments (testRAG / testRAG_PDFFolder)

Compared single-PDF and folder ingest pipelines on extraction quirks and recall@k over held-out questions, with identical embedding settings in both.

Year 2026

Stack

What was my role?

I was the sole implementer for this coursework module: I scoped the requirements, wrote the code and experiments, and produced the write-up with metrics and limitations.

Situation

This was a course lab with short deadlines, public or synthetic data, and rubrics that reward reproducible notebooks and honest limitations.

Task

Produce a small, credible artifact (a clean repo or notebook) with baselines, evaluation, and a crisp story of what would change in production.

Action

Implemented both ingest pipelines (single-PDF and folder) end to end, compared them on extraction quirks and recall@k over held-out questions with identical embedding settings, logged experiments, compared alternatives, and documented dependencies plus failure cases.
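As an illustration of the recall@k metric named above, here is a minimal sketch in Python. The write-up does not say which recall@k definition or data layout was used, so this assumes the per-question form (fraction of a question's relevant chunks found in the top k results), averaged over questions. The function names, chunk IDs, and sample data are hypothetical and are not taken from testRAG or testRAG_PDFFolder.

```python
from statistics import mean


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunk IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each question needs at least one relevant chunk")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs, one per question."""
    return mean(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs)


# Hypothetical held-out questions: ranked retrieval output and gold chunk IDs.
runs = [
    (["c3", "c1", "c7"], {"c1"}),        # gold chunk ranked second
    (["c2", "c9", "c4"], {"c4", "c5"}),  # one of two gold chunks ranked third
]

for k in (1, 2, 3):
    print(f"recall@{k} = {mean_recall_at_k(runs, k):.2f}")
```

Running the same function over the ranked output of each pipeline, with the question set and embedding settings held fixed, would give directly comparable numbers for the single-PDF and folder ingests.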

Result

Submitted a runnable deliverable with metrics, repeatable setup commands, and a trade-off section suitable for extending to real systems.