Activity B: Structured Lookup & Grounded Closed-Corpus
A hands-on Jupyter notebook that demonstrates what goes wrong when an LLM is asked to answer information-seeking questions without the right information channel — and what goes right when it is given one. Two levels of the Complexity Ladder are covered in depth.
How to run this notebook
Option A — Google Colab (easier)
- Click the button above to open the notebook in Google Drive.
- In the top-left menu choose File → Open in Colab. Use a personal Google account rather than a work or university account — institutional accounts sometimes block third-party Colab access.
- Make it editable: File → Save a copy in Drive. The shared notebook is view-only; saving a copy to your own Drive gives you a personal, fully editable version. All subsequent changes and outputs are saved there.
- Enable a free GPU: Runtime → Change runtime type → T4 GPU → Save. A GPU is not required (the default model runs fine on CPU) but speeds up the RAG embedding step significantly.
- Run cells top-to-bottom with Runtime → Run all, or step through them one by one with Shift+Enter.
Option B — Local Jupyter (laptop)
- Download the notebook from the link above (File → Download → Download .ipynb).
- Install dependencies: `pip install transformers torch datasets rank_bm25 sentence-transformers requests`
- Open with `jupyter notebook activity_b.ipynb` or in VS Code.
- The default model (`Qwen2-0.5B-Instruct`) runs on CPU; no GPU or API key needed.
What you will learn
- Why an LLM alone cannot reliably answer live or structured-value queries, and how a single API call fixes it.
- How to build a minimal RAG pipeline (retrieve → prompt → generate) over a closed document corpus.
- How BM25 and dense (semantic) retrieval differ, when each wins, and how the choice of retriever changes the LLM's answer.
- What evaluation looks like at each level: trivial ground-truth comparison (Level 2) vs retrieval-hit + faithfulness checks (Level 3).
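The retrieve → prompt → generate loop can be sketched in a few lines. This is an illustrative stand-in, not the notebook's code: the toy corpus and word-overlap retriever below are placeholders for the SQuAD passages and BM25/dense retrievers used in Part 2, and generation is left as a stub.

```python
def retrieve(query, corpus, k=2):
    """Rank passages by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    """Ground the model: instruct it to answer only from retrieved context."""
    context = "\n\n".join(passages)
    return (f"Answer the question using ONLY the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Great Wall of China is over 13,000 miles long.",
]
question = "When was the Eiffel Tower completed?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # this string is what gets passed to the LLM's generate() call
```

Swapping the retriever (BM25 vs dense embeddings) changes only the `retrieve` step; the prompt template and generation stay the same, which is what makes the side-by-side comparison in Part 2 clean.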
Complexity Ladder levels covered
| Level | Name | Key property |
|---|---|---|
| 2 | Structured Lookup | Single clean value from a deterministic API (weather, rates, time). Freshness handled by the provider; minimal LLM synthesis. |
| 3 | Grounded Closed-Corpus | Answer extracted from a fixed document set (RAG). Quality depends on chunking → retrieval → prompting. Evaluation requires retrieval-hit and faithfulness checks. |
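To make the Level 3 retrieval step concrete, here is a small pure-Python version of the BM25 scoring formula (the same ranking function the `rank_bm25` library implements). This is a sketch for intuition, with default parameters `k1=1.5`, `b=0.75`; the notebook uses the library directly.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency in this doc
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "ran", "fast"]]
print(bm25_scores(["cat"], docs))  # first doc scores higher: it contains "cat"
```

BM25 rewards exact term matches, so it wins on keyword-style questions; dense retrieval wins when the question paraphrases the passage, which is exactly the contrast Part 2 explores.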
Notebook outline
- Setup: install dependencies, load `Qwen2-0.5B-Instruct` (CPU-friendly default; swap to Phi-3.5-mini on T4 for richer outputs).
- Part 1 — Structured Lookup: LLM-only weather query (fails) → Open-Meteo API call (free, no key) → LLM with injected API context (succeeds).
- Part 2 — Grounded Closed-Corpus: load SQuAD Wikipedia passages as a closed corpus → LLM-only (hallucination) → RAG with BM25 → RAG with dense embeddings (all-MiniLM-L6-v2) → side-by-side retriever comparison across 3 questions.
- Part 3 — Recap: summary table comparing both levels across information source, freshness, synthesis, and evaluation dimensions.
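The Part 1 pattern — call a deterministic API, then inject the returned value into the prompt — can be sketched as below. The URL parameters follow Open-Meteo's public forecast endpoint; the sample payload is a hand-written example of the response shape, so verify against the live response in the notebook.

```python
def weather_url(lat, lon):
    """Build an Open-Meteo current-weather request URL."""
    return (f"https://api.open-meteo.com/v1/forecast"
            f"?latitude={lat}&longitude={lon}&current_weather=true")

def weather_context(payload):
    """Turn the API's JSON payload into a sentence the LLM can cite."""
    cw = payload["current_weather"]
    return (f"Current weather: {cw['temperature']}°C, "
            f"wind {cw['windspeed']} km/h (as of {cw['time']}).")

# Live call (uncomment in the notebook):
# import json, urllib.request
# payload = json.load(urllib.request.urlopen(weather_url(47.61, -122.33)))

# Hand-written sample payload, for illustration only:
sample = {"current_weather": {"temperature": 12.3, "windspeed": 8.1,
                              "time": "2025-01-01T12:00"}}
context = weather_context(sample)
print(context)  # prepend this to the user's question before generation
```

The LLM's only job at Level 2 is light synthesis of a value the provider already keeps fresh, which is why evaluation reduces to a trivial ground-truth comparison.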
Requirements
No API keys or signups needed. Everything used is free and public:
- Weather API: open-meteo.com — open-source, no key required.
- Corpus: rajpurkar/squad on HuggingFace — public dataset, no login.
- Default model: Qwen/Qwen2-0.5B-Instruct — 0.5 B params, ~1 GB, runs on CPU.
- GPU upgrade (optional): microsoft/Phi-3.5-mini-instruct — swap in the first code cell when running on Colab T4.
Citation
If you use this tutorial in your work or teaching, please cite:
@inproceedings{dammu2026information,
title={Information Seeking in the Age of Agentic AI: A Half-Day Tutorial},
author={Dammu, Preetam Prabhu Srikar and Roosta, Tanya},
booktitle={Proceedings of the 2026 Conference on Human Information Interaction and Retrieval},
pages={429--430},
year={2026}
}
View on ACM DL · Contact: Preetam Dammu <preetams@uw.edu>, PhD Candidate, University of Washington