← Tutorial home

Activity B: Structured Lookup & Grounded Closed-Corpus

A hands-on Jupyter notebook that demonstrates what goes wrong when an LLM is asked to answer information-seeking questions without the right information channel — and what goes right when it is given one. Two levels of the Complexity Ladder are covered in depth.

Open notebook (Colab) · T4 GPU recommended · also runs on laptop CPU

How to run this notebook

Option A — Google Colab (easier)

  1. Click the button above to open the notebook in Google Drive.
  2. In the top-left menu choose File → Open in Colab.
    Use a personal Google account rather than a work or university account — institutional accounts sometimes block third-party Colab access.
  3. Make it editable: File → Save a copy in Drive.
    The shared notebook is view-only; saving a copy to your own Drive gives you a personal, fully editable version. All subsequent changes and outputs are saved there.
  4. Enable a free GPU: Runtime → Change runtime type → T4 GPU → Save.
    A GPU is not required (the default model runs fine on CPU) but speeds up the RAG embedding step significantly.
  5. Run cells top-to-bottom with Runtime → Run all, or step through them one by one with Shift+Enter.

Option B — Local Jupyter (laptop)

  1. Download the notebook from the link above (File → Download → Download .ipynb).
  2. Install dependencies: pip install transformers torch datasets rank_bm25 sentence-transformers requests
  3. Open with jupyter notebook activity_b.ipynb or in VS Code.
  4. The default model (Qwen2-0.5B-Instruct) runs on CPU; no GPU or API key needed.

What you will learn

  • Why an LLM alone cannot reliably answer live or structured-value queries, and how a single API call fixes it.
  • How to build a minimal RAG pipeline (retrieve → prompt → generate) over a closed document corpus.
  • How BM25 and dense (semantic) retrieval differ, when each wins, and how the choice of retriever changes the LLM's answer.
  • What evaluation looks like at each level: trivial ground-truth comparison (Level 2) vs retrieval-hit + faithfulness checks (Level 3).
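The retrieve → prompt → generate loop at the heart of the notebook can be sketched in a few lines. This is a toy illustration, not the notebook's code: the retriever here is plain word-overlap scoring standing in for BM25 or dense embeddings, and the final LLM call is left as a stub.

```python
# Minimal retrieve -> prompt -> generate skeleton over a toy closed corpus.
# Word-overlap scoring stands in for the notebook's BM25/dense retrievers;
# the assembled prompt is what you would hand to the LLM.
corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Photosynthesis converts sunlight, water, and CO2 into glucose.",
    "The Great Wall of China is over 13,000 miles long.",
]

def retrieve(query, docs, k=1):
    """Rank docs by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Ground the model: instruct it to answer only from the retrieved context."""
    context = "\n".join(passages)
    return (f"Answer using ONLY the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

query = "When was the Eiffel Tower completed?"
prompt = build_prompt(query, retrieve(query, corpus))
# `prompt` now carries the Eiffel Tower passage; pass it to any chat model.
```

Swapping in a stronger retriever changes only the `retrieve` step, which is exactly the side-by-side comparison Part 2 performs.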

Complexity Ladder levels covered

  • Level 2 (Structured Lookup): Single clean value from a deterministic API (weather, rates, time). Freshness handled by the provider; minimal LLM synthesis.
  • Level 3 (Grounded Closed-Corpus): Answer extracted from a fixed document set (RAG). Quality depends on chunking → retrieval → prompting. Evaluation requires retrieval-hit and faithfulness checks.
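Level 2 in miniature: fetch one clean value from a deterministic API and inject it into the prompt. The URL below follows Open-Meteo's free, key-less forecast endpoint; the sample payload is shaped like its `current_weather` response block (as documented at the time of writing), so the formatting logic runs offline.

```python
# Structured lookup sketch: build the Open-Meteo request URL, then turn
# the API's current_weather block into a one-line prompt context.
from urllib.parse import urlencode

def openmeteo_url(lat, lon):
    """Open-Meteo forecast endpoint; free, no API key required."""
    params = urlencode({"latitude": lat, "longitude": lon,
                        "current_weather": "true"})
    return f"https://api.open-meteo.com/v1/forecast?{params}"

def weather_context(payload):
    """Format the current_weather block for injection into an LLM prompt."""
    cw = payload["current_weather"]
    return (f"Current weather: {cw['temperature']} °C, "
            f"wind {cw['windspeed']} km/h, as of {cw['time']}.")

# Offline sample mirroring the shape of a real Open-Meteo response:
sample = {"current_weather": {"temperature": 12.3, "windspeed": 7.8,
                              "time": "2025-01-15T10:00"}}
context = weather_context(sample)
# In the notebook, urllib.request.urlopen(openmeteo_url(47.61, -122.33))
# supplies a live payload in place of `sample`.
```

Because the provider handles freshness, the LLM's only job is to phrase the injected value, which is why evaluation at this level reduces to trivial ground-truth comparison.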

Notebook outline

  • Setup: install dependencies, load Qwen2-0.5B-Instruct (CPU-friendly default; swap to Phi-3.5-mini on T4 for richer outputs).
  • Part 1 — Structured Lookup: LLM-only weather query (fails) → Open-Meteo API call (free, no key) → LLM with injected API context (succeeds).
  • Part 2 — Grounded Closed-Corpus: load SQuAD Wikipedia passages as a closed corpus → LLM-only (hallucination) → RAG with BM25 → RAG with dense embeddings (all-MiniLM-L6-v2) → side-by-side retriever comparison across 3 questions.
  • Part 3 — Recap: summary table comparing both levels across information source, freshness, synthesis, and evaluation dimensions.
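To preview why Part 2's retriever comparison matters: the notebook uses rank_bm25's `BM25Okapi`, but the Okapi BM25 formula it implements fits in a few lines of stdlib Python. This pedagogical version shows why exact term overlap drives lexical retrieval, and therefore why paraphrased queries tend to favor dense retrieval.

```python
# What BM25 computes, in stdlib Python (illustrative, not the notebook's
# rank_bm25 dependency). Okapi BM25: per query term, an IDF weight times
# a saturating, length-normalized term-frequency factor.
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc in `docs` against the tokenized `query`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for d in docs:
        s = 0.0
        for t in query:
            f = d.count(t)
            if f == 0:
                continue  # no exact match, no credit: BM25 is purely lexical
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [t.lower().split() for t in [
    "the eiffel tower was completed in 1889",
    "photosynthesis converts sunlight into glucose",
]]
scores = bm25_scores("eiffel tower height".split(), docs)
# The second document shares no query terms, so it scores exactly zero,
# however semantically related a query might be.
```

Dense retrievers like all-MiniLM-L6-v2 instead compare embedding vectors, so a paraphrase with zero lexical overlap can still score highly; the notebook's side-by-side comparison makes this trade-off concrete.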

Requirements

No API keys or signups needed. Everything used is free and public: Qwen2-0.5B-Instruct and all-MiniLM-L6-v2 (Hugging Face), the SQuAD dataset, and the Open-Meteo weather API.

Citation

If you use this tutorial in your work or teaching, please cite:

@inproceedings{dammu2026information,
  title={Information Seeking in the Age of Agentic AI: A Half-Day Tutorial},
  author={Dammu, Preetam Prabhu Srikar and Roosta, Tanya},
  booktitle={Proceedings of the 2026 Conference on Human Information Interaction and Retrieval},
  pages={429--430},
  year={2026}
}

View on ACM DL  ·  Contact: Preetam Dammu <preetams@uw.edu>, PhD Candidate, University of Washington