Concurrent Batch Calls (bounded fan-out)
When you need answers for a thousand prompts, calling the model one at a time and waiting for each reply is painfully slow. Real apps fire many requests concurrently, but not without limit: providers cap how many in-flight calls they allow, so you process the work in batches of a fixed size. Two facts make concurrency tricky, and both show up here.
- Results come back out of order. A short prompt may finish before a long one you sent first. Your code must reassemble the answers by their original input position, not by whoever finished first.
- Throughput is bounded by the cap. With n prompts and a cap of c, you run ceil(n / c) batches, the number every capacity plan is built around.
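Both facts can be checked with a few lines of plain Python. This is a minimal sketch, not part of the lesson's grader: the helper name batch_count and the sample finished list are made up for illustration.

```python
import math

def batch_count(n, c):
    """Batches needed for n prompts when at most c run at once (hypothetical helper)."""
    return math.ceil(n / c)

# Completions arrive as (original_index, answer) pairs in FINISH order,
# which here differs from the order the prompts were sent in.
finished = [(2, "c"), (0, "a"), (1, "b")]

results = [None] * len(finished)
for i, answer in finished:
    results[i] = answer  # slot by input position, not by arrival order

print(batch_count(1000, 4))  # 250
print(batch_count(10, 4))    # 3 (two full batches plus one partial)
print(results)               # ['a', 'b', 'c']
```

Note that the last batch may be partial: 10 prompts with a cap of 4 still need 3 batches.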
A production version would use asyncio and a real network client. That is non-deterministic and hard to test, so here you model the same shape with plain functions and a scheduling loop: no event loop, no network, no real model. For reference, here is the real shape, the same logic with a live event loop:
```python
import asyncio

async def run_batch(prompts, worker, cap=4):
    sem = asyncio.Semaphore(cap)          # the concurrency cap

    async def one(i):
        async with sem:                   # at most cap in flight
            return i, await worker(prompts[i])

    pairs = await asyncio.gather(*(one(i) for i in range(len(prompts))))
    results = [None] * len(prompts)
    for i, r in pairs:                    # reassemble by ORIGINAL index
        results[i] = r["result"]
    return results
```

A Semaphore enforces the cap, gather runs the calls together, and you still write each answer back into its original slot. That needs a running event loop and a real async client, which this in-browser grader does not have, so below you build the same logic with a deterministic scheduler you can actually test.

A worker(prompt) stands in for the model call and returns {"result": ..., "cost": ...}, where cost represents how long that item would take. You deliberately schedule each batch in a different cost order to prove your result assembly does not secretly depend on it.
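To make the worker contract concrete, here is a small sketch of one possible mock. The name mock_worker, the upper-casing, and the use of prompt length as cost are all assumptions chosen for illustration; the grader's own worker may behave differently.

```python
def mock_worker(prompt):
    """Hypothetical stand-in for a model call: deterministic result and cost."""
    return {"result": prompt.upper(), "cost": len(prompt)}

prompts = ["a long prompt here", "hi", "medium one"]
window = range(len(prompts))

# Call the worker once per item, keyed by the item's ORIGINAL index.
outputs = {i: mock_worker(prompts[i]) for i in window}

# Simulated completion order: the cheapest item "finishes" first.
finish_order = sorted(window, key=lambda i: outputs[i]["cost"])

# Writing by index makes the final list independent of finish_order.
results = [None] * len(prompts)
for i in finish_order:
    results[i] = outputs[i]["result"]

print(finish_order)  # [1, 2, 0]
print(results)       # ['A LONG PROMPT HERE', 'HI', 'MEDIUM ONE']
```

The short prompt finishes first, yet its answer still lands in slot 1 because the write is keyed by index.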
```python
for start in range(0, len(prompts), cap):
    window = range(start, min(start + cap, len(prompts)))
    # work the window in ANY order, but write results[i] = worker(prompts[i])["result"]
    ...
```

Build run_batch(prompts, worker, cap=4). Map worker over every prompt in batches of size cap, and return {"results": [...], "batches": n_batches} where results[i] is the worker result for prompts[i] (input order preserved) and batches is ceil(len(prompts) / cap). An empty prompt list runs zero batches.
Write run_batch(prompts, worker, cap=4) that maps a (mocked) worker over every prompt in batches of cap and returns a dict with the keys "results" and "batches". results must be in INPUT order even though each item carries a different cost that you may use to order the work within a batch; batches is ceil(len(prompts) / cap). Use plain functions and a scheduling loop: no asyncio, no network. A second distinct input set must give different results; an empty list runs zero batches.
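One way the pieces could fit together is sketched below. It is one possible shape under stated assumptions, not the grader's reference solution: working each window in ascending cost order is just one of the orders the task allows, the cap check is an added guard the task does not ask for, and demo_worker is a made-up mock.

```python
def run_batch(prompts, worker, cap=4):
    """Map worker over prompts in windows of size cap, preserving input order."""
    if cap < 1:
        # Assumption: the task does not say what to do with a non-positive cap.
        raise ValueError("cap must be at least 1")

    results = [None] * len(prompts)
    batches = 0
    for start in range(0, len(prompts), cap):
        window = range(start, min(start + cap, len(prompts)))
        # One worker call per item, keyed by original index.
        outputs = {i: worker(prompts[i]) for i in window}
        # Work the window cheapest-first; any other order gives the same list.
        for i in sorted(window, key=lambda i: outputs[i]["cost"]):
            results[i] = outputs[i]["result"]
        batches += 1
    return {"results": results, "batches": batches}


def demo_worker(prompt):
    """Hypothetical mock: the result echoes the prompt, the cost is its length."""
    return {"result": f"answer:{prompt}", "cost": len(prompt)}


out = run_batch(["ccc", "a", "bb", "dddd", "e"], demo_worker, cap=2)
print(out["batches"])  # 3
print(out["results"])  # ['answer:ccc', 'answer:a', 'answer:bb', 'answer:dddd', 'answer:e']
print(run_batch([], demo_worker))  # {'results': [], 'batches': 0}
```

Counting loop iterations yields the same number as ceil(len(prompts) / cap), because range steps by cap and the final window is allowed to be partial.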