Activity B: Structured Lookup & Grounded Closed-Corpus
A hands-on Jupyter notebook that demonstrates what goes wrong when an LLM is asked to answer information-seeking questions without the right information channel — and what goes right when it is given one. Two levels of the Complexity Ladder are covered in depth.
How to run this notebook
Option A — Google Colab (easier)
- Click the button above to open the notebook in Google Drive.
- In the top-left menu choose File → Open in Colab. Use a personal Google account rather than a work or university account — institutional accounts sometimes block third-party Colab access.
- Make it editable: File → Save a copy in Drive. The shared notebook is view-only; saving a copy to your own Drive gives you a personal, fully editable version. All subsequent changes and outputs are saved there.
- Enable a free GPU: Runtime → Change runtime type → T4 GPU → Save. A GPU is not required (the default model runs fine on CPU) but speeds up the RAG embedding step significantly.
- Run cells top-to-bottom with Runtime → Run all, or step through them one by one with Shift+Enter.
Option B — Local Jupyter (laptop)
- Download the notebook from the link above (File → Download → Download .ipynb).
- Install dependencies: `pip install transformers torch datasets rank_bm25 sentence-transformers requests`
- Open with `jupyter notebook activity_b.ipynb` or in VS Code.
- The default model (`Qwen2-0.5B-Instruct`) runs on CPU; no GPU or API key needed.
What you will learn
- Why an LLM alone cannot reliably answer live or structured-value queries, and how a single API call fixes it.
- How to build a minimal RAG pipeline (retrieve → prompt → generate) over a closed document corpus.
- How BM25 and dense (semantic) retrieval differ, when each wins, and how the choice of retriever changes the LLM's answer.
- What evaluation looks like at each level: trivial ground-truth comparison (Level 2) vs retrieval-hit + faithfulness checks (Level 3).
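The retrieve → prompt → generate loop can be sketched in a few lines. This is an illustrative stand-in, not the notebook's code: the toy corpus and word-overlap retriever below are placeholders for the SQuAD passages and BM25/dense retrievers used in Part 2, and generation is left as a stub.

```python
def retrieve(query, corpus, k=2):
    """Rank passages by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    """Ground the model: instruct it to answer only from retrieved context."""
    context = "\n\n".join(passages)
    return (f"Answer the question using ONLY the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Great Wall of China is over 13,000 miles long.",
]
question = "When was the Eiffel Tower completed?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # this string is what gets passed to the LLM's generate() call
```

Swapping the retriever (BM25 vs dense embeddings) changes only the `retrieve` step; the prompt template and generation stay the same, which is what makes the side-by-side comparison in Part 2 clean.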
Complexity Ladder levels covered
| Level | Name | Key property |
|---|---|---|
| 2 | Structured Lookup | Single clean value from a deterministic API (weather, rates, time). Freshness handled by the provider; minimal LLM synthesis. |
| 3 | Grounded Closed-Corpus | Answer extracted from a fixed document set (RAG). Quality depends on chunking → retrieval → prompting. Evaluation requires retrieval-hit and faithfulness checks. |
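To make the Level 3 retrieval step concrete, here is a small pure-Python version of the BM25 scoring formula (the same ranking function the `rank_bm25` library implements). This is a sketch for intuition, with default parameters `k1=1.5`, `b=0.75`; the notebook uses the library directly.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this doc
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "ran", "fast"]]
print(bm25_scores(["cat"], docs))  # first doc scores higher: it contains "cat"
```

BM25 rewards exact term matches, so it wins on keyword-style questions; dense retrieval wins when the question paraphrases the passage, which is exactly the contrast Part 2 explores.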
Notebook outline
- Setup: install dependencies, load `Qwen2-0.5B-Instruct` (CPU-friendly default; swap to Phi-3.5-mini on T4 for richer outputs).
- Part 1 — Structured Lookup: LLM-only weather query (fails) → Open-Meteo API call (free, no key) → LLM with injected API context (succeeds).
- Part 2 — Grounded Closed-Corpus: load SQuAD Wikipedia passages as a closed corpus → LLM-only (hallucination) → RAG with BM25 → RAG with dense embeddings (all-MiniLM-L6-v2) → side-by-side retriever comparison across 3 questions.
- Part 3 — Recap: summary table comparing both levels across information source, freshness, synthesis, and evaluation dimensions.
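The Part 1 pattern — call a deterministic API, then inject the returned value into the prompt — can be sketched as below. The URL parameters follow Open-Meteo's public forecast endpoint; the sample payload is a hand-written example of the response shape, so verify against the live response in the notebook.

```python
def weather_url(lat, lon):
    """Build an Open-Meteo current-weather request URL."""
    return (f"https://api.open-meteo.com/v1/forecast"
            f"?latitude={lat}&longitude={lon}&current_weather=true")

def weather_context(payload):
    """Turn the API's JSON payload into a sentence the LLM can cite."""
    cw = payload["current_weather"]
    return (f"Current weather: {cw['temperature']}°C, "
            f"wind {cw['windspeed']} km/h (as of {cw['time']}).")

# Live call (uncomment in the notebook):
# import json, urllib.request
# payload = json.load(urllib.request.urlopen(weather_url(47.61, -122.33)))

# Hand-written sample payload, for illustration only:
sample = {"current_weather": {"temperature": 12.3, "windspeed": 8.1,
                              "time": "2025-01-01T12:00"}}
context = weather_context(sample)
print(context)  # prepend this to the user's question before generation
```

The LLM's only job at Level 2 is light synthesis of a value the provider already keeps fresh, which is why evaluation reduces to a trivial ground-truth comparison.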
Requirements
No API keys or signups needed. Everything used is free and public:
- Weather API: open-meteo.com — open-source, no key required.
- Corpus: rajpurkar/squad on HuggingFace — public dataset, no login.
- Default model: Qwen/Qwen2-0.5B-Instruct — 0.5 B params, ~1 GB, runs on CPU.
- GPU upgrade (optional): microsoft/Phi-3.5-mini-instruct — swap in the first code cell when running on Colab T4.
Citation
If you use this tutorial in your work or teaching, please cite:
@inproceedings{dammu2026information,
title={Information Seeking in the Age of Agentic AI: A Half-Day Tutorial},
author={Dammu, Preetam Prabhu Srikar and Roosta, Tanya},
booktitle={Proceedings of the 2026 Conference on Human Information Interaction and Retrieval},
pages={429--430},
year={2026}
}
View on ACM DL · Contact: Preetam Dammu <preetams@uw.edu>, PhD Candidate, University of Washington