
Activity C: Grounded Live-Corpus

A hands-on Jupyter notebook that demonstrates why some questions can only be answered by searching the live open web, and how to do that reliably with ddgs, a free, privacy-focused DuckDuckGo search library that requires no API key or account.

Open in Google Colab · Runs on a Colab T4 GPU or a laptop CPU

How to run this notebook

Option A — Google Colab (easier)

  1. Click the Open in Colab button above.
  2. If the notebook opens as a Drive preview instead of in Colab, choose Open with → Google Colaboratory from the menu at the top.
    Use a personal Google account rather than a work or university account — institutional accounts sometimes block third-party Colab access.
  3. Make it editable: File → Save a copy in Drive.
    The shared notebook is view-only; saving a copy to your own Drive gives you a personal, fully editable version.
  4. Enable a free GPU: Runtime → Change runtime type → T4 GPU → Save.
    A GPU is not required (the default model runs on CPU) but speeds up generation.
  5. Run cells top-to-bottom with Runtime → Run all, or step through them with Shift+Enter.

Option B — Local Jupyter (laptop)

  1. Download the notebook from the Drive link (File → Download → Download .ipynb).
  2. Install dependencies: pip install transformers torch ddgs
  3. Open with jupyter notebook activity_c.ipynb or in VS Code.
  4. The default model (Qwen2-0.5B-Instruct) and DuckDuckGo search both work without GPU or API keys.

What you will learn

  • Why LLMs alone, structured APIs, and fixed-corpus RAG all fail for questions that require today's information.
  • How to use the ddgs library as a free, keyless live search tool — no signup, no rate-limit headaches.
  • How to distil a natural-language question into a short search phrase using the LLM before querying.
  • How to build a grounded live-search prompt that instructs the LLM to cite source URLs.
  • How to handle heterogeneous source quality and conflicting snippets from the open web.
  • What evaluation looks like at Level 4: freshness, source credibility, and answer faithfulness — all three required.
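The distil-then-ground flow described above can be sketched as two small prompt builders. The function names and prompt wording here are illustrative, not taken from the notebook; the result dicts assume the title/href/body keys that ddgs returns.

```python
def build_distill_prompt(question: str) -> str:
    """Ask the LLM to compress a natural-language question into a short search phrase."""
    return (
        "Rewrite the following question as a short web-search phrase "
        "(3-6 keywords, no punctuation):\n"
        f"Question: {question}\nSearch phrase:"
    )


def build_grounded_prompt(question: str, results: list[dict]) -> str:
    """Inject live search snippets as numbered context and require cited URLs."""
    context = "\n".join(
        f"[{i + 1}] {r['title']} ({r['href']}): {r['body']}"
        for i, r in enumerate(results)
    )
    return (
        "Answer using ONLY the sources below, and cite the URL of every "
        "source you rely on.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The two prompts are sent to the same model in sequence: the first produces the search phrase, the second produces the cited answer once results come back.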

Complexity Ladder level covered

Level: 4
Name: Grounded Live-Corpus
Key property: open-web retrieval for freshness. One-shot search → multi-source synthesis → cited answer. Evaluation requires a freshness check, a source-credibility check, and an answer-faithfulness check.
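The three Level-4 checks can each be a small heuristic. This is a minimal sketch with thresholds and a toy domain allowlist invented here for illustration; the notebook's actual checks may differ.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse


def freshness_ok(result_dates: list[datetime], max_age_days: int = 7) -> bool:
    """Freshness check: at least one retrieved source is recent enough."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return any(d >= cutoff for d in result_dates)


TRUSTED_SUFFIXES = (".gov", ".edu")  # toy allowlist, not from the tutorial


def credibility_score(urls: list[str]) -> float:
    """Source-credibility check: fraction of results from trusted domains."""
    hosts = [urlparse(u).hostname or "" for u in urls]
    return sum(h.endswith(TRUSTED_SUFFIXES) for h in hosts) / max(len(hosts), 1)


def faithful(answer: str, urls: list[str]) -> bool:
    """Faithfulness proxy: the answer cites at least one retrieved URL."""
    return any(u in answer for u in urls)
```

All three must pass for a Level-4 answer to count; a current but uncited answer, or a cited but stale one, fails.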

Notebook outline

  • Setup: install 3 packages (transformers, torch, ddgs); load Qwen2-0.5B-Instruct (CPU-friendly default; swap to Phi-3.5-mini on T4).
  • Part 1 — LLM alone fails: ask about a recent event (Super Bowl 2026) with no context → stale/hedged answer; explain training cutoff.
  • Part 2 — DuckDuckGo live search: LLM distils question to a short phrase → web_search() fetches live results → inject as context → LLM gives current cited answer → repeat for 3 diverse live questions.
  • Part 3 — Comparison: same live question fails on a hardcoded static corpus; decision table for when to use Level 2 vs. 3 vs. 4.
  • Part 4 — Source quality: inspect domain diversity across results; handle a deliberately ambiguous query where sources disagree.
  • Part 5 — Recap: full Complexity Ladder table comparing Levels 2–5; Level 4 → 5 transition explained.
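The retrieval helper from Part 2 and the domain-diversity inspection from Part 4 can be sketched as below. The ddgs call follows the library's documented `text()` interface; the import is deferred inside the function so the module still loads (and `domain_diversity` still works) if ddgs is not installed.

```python
from collections import Counter
from urllib.parse import urlparse


def web_search(query: str, max_results: int = 5) -> list[dict]:
    """Fetch live results from DuckDuckGo via the keyless ddgs library."""
    from ddgs import DDGS  # deferred import: needs `pip install ddgs`
    with DDGS() as search:
        return list(search.text(query, max_results=max_results))


def domain_diversity(results: list[dict]) -> Counter:
    """Count how many snippets come from each host (used in Part 4)."""
    return Counter(urlparse(r["href"]).hostname for r in results)
```

A skewed counter (most results from one host) is a cue to re-query or to weight sources before synthesis.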

Requirements

No API keys or signups needed:

  • Live search: ddgs — free Python library, no account, no key. Install with pip install ddgs.
  • Heavier experiments (optional): for higher query volume or a private unthrottled instance, self-host SearXNG locally with docker run -d -p 8080:8080 searxng/searxng.
  • Default model: Qwen/Qwen2-0.5B-Instruct — 0.5B parameters, ~1 GB download, runs on CPU.
  • GPU upgrade (optional): microsoft/Phi-3.5-mini-instruct — swap in the first code cell when running on Colab T4.
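The first-cell model swap can be automated with a small probe. The model ids are the two listed above; the fallback behaviour (use the CPU model when torch is absent or no CUDA device is visible) is an assumption of this sketch, not notebook code.

```python
CPU_MODEL = "Qwen/Qwen2-0.5B-Instruct"         # ~1 GB, runs on a laptop CPU
GPU_MODEL = "microsoft/Phi-3.5-mini-instruct"  # swap in on a Colab T4


def pick_model() -> str:
    """Choose the heavier model only when a CUDA GPU is available."""
    try:
        import torch
        if torch.cuda.is_available():
            return GPU_MODEL
    except ImportError:
        pass  # torch not installed: stay on the small CPU model
    return CPU_MODEL
```

The returned id can be passed straight to `transformers.pipeline("text-generation", model=pick_model())`.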

Citation

If you use this tutorial in your work or teaching, please cite:

@inproceedings{dammu2026information,
  title={Information Seeking in the Age of Agentic AI: A Half-Day Tutorial},
  author={Dammu, Preetam Prabhu Srikar and Roosta, Tanya},
  booktitle={Proceedings of the 2026 Conference on Human Information Interaction and Retrieval},
  pages={429--430},
  year={2026}
}

View on ACM DL  ·  Contact: Preetam Dammu <preetams@uw.edu>, PhD Candidate, University of Washington