RAG Evaluation with RAGAS (ragasPAR)
Faithfulness and context-precision sweeps across chunking choices, logged per query, with failure buckets for low-score generations.
What was my role?
I was the sole implementer for this coursework module: scoped requirements, wrote the code and experiments for “RAG Evaluation with RAGAS (ragasPAR),” and produced the write-up with metrics and limitations.
Situation
Course lab for “RAG Evaluation with RAGAS (ragasPAR)”: short deadlines, public or synthetic data, and rubrics that reward reproducible notebooks and honest limitations.
Task
Produce a small, credible artifact (clean repo or notebook) with baselines, evaluation, and a crisp account of what would change in production.
Action
Implemented the pipeline end to end: faithfulness and context-precision sweeps across chunking choices, per-query score logging, and failure buckets for low-score generations. Logged each experiment, compared alternatives, and documented dependencies and failure cases.
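The chunking sweep can be sketched as a small grid over chunk size and overlap. This is a minimal illustration, not the coursework code: the chunker, the grid values, and the document text are all hypothetical, and a real run would index each variant and score it with RAGAS metrics such as faithfulness and context precision.

```python
from itertools import product

def chunk_text(text, size, overlap):
    """Split text into fixed-size character chunks with the given overlap.

    Assumes overlap < size; step is the stride between chunk starts.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Hypothetical sweep grid; each (size, overlap) variant would be indexed
# and evaluated per query in the real pipeline.
SIZES = [256, 512]
OVERLAPS = [0, 64]

doc = "lorem ipsum " * 200  # placeholder corpus document

for size, overlap in product(SIZES, OVERLAPS):
    chunks = chunk_text(doc, size, overlap)
    print(f"size={size} overlap={overlap} -> {len(chunks)} chunks")
```

Sweeping both knobs together matters because overlap changes how often an answer-bearing span is split across chunk boundaries, which shows up directly in context-precision scores.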
Result
Submitted a runnable deliverable with metrics, repeatable setup commands, and a trade-off section suitable for extending to real systems.
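The failure buckets mentioned above can be sketched as grouping low-scoring queries by their weakest metric. This is a hedged illustration with hypothetical metric names and rows (per-query scores such as RAGAS would emit), not the submitted code.

```python
def bucket_failures(rows, threshold=0.5):
    """Group queries whose weakest metric falls below threshold.

    rows: list of dicts with a "question" key plus metric-name -> score pairs.
    Returns {metric_name: [questions]} for triage of low-score generations.
    """
    buckets = {}
    for row in rows:
        scores = {k: v for k, v in row.items() if k != "question"}
        worst = min(scores, key=scores.get)  # metric with the lowest score
        if scores[worst] < threshold:
            buckets.setdefault(worst, []).append(row["question"])
    return buckets

# Hypothetical per-query scores, shaped like a RAGAS result table.
rows = [
    {"question": "q1", "faithfulness": 0.9, "context_precision": 0.3},
    {"question": "q2", "faithfulness": 0.2, "context_precision": 0.8},
    {"question": "q3", "faithfulness": 0.95, "context_precision": 0.9},
]
print(bucket_failures(rows))
# -> {'context_precision': ['q1'], 'faithfulness': ['q2']}
```

Bucketing by the weakest metric separates retrieval problems (low context precision) from generation problems (low faithfulness), which is what makes the trade-off section actionable.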