MODULE 1 - CHAPTER 5 ⏱️ 35 minutes 📖 2,400 words

Async Programming for AI APIs

Build high-performance AI applications with asyncio


Modern AI applications are often I/O-bound, meaning they spend most of their time waiting for network requests to complete. This is especially true when your application needs to call multiple LLM APIs, databases, or other web services to fulfill a single user request.

Asynchronous programming allows your application to handle many of these I/O-bound tasks concurrently, dramatically improving performance and responsiveness. This lesson introduces the fundamentals of asyncio, Python's native library for async programming, and aiohttp, the go-to library for async HTTP requests.

1. The "Why": Sync vs. Async

Imagine making five API calls that each take 1 second to complete.

  • Synchronous (Sync) Code: Makes each call one after another, waiting for each to finish before starting the next. Total time = 5 seconds (the sum of all calls).

```
Call 1 (1s) -> Call 2 (1s) -> Call 3 (1s) -> Call 4 (1s) -> Call 5 (1s) = 5s
```

  • Asynchronous (Async) Code: Starts all calls at roughly the same time and waits for them to complete concurrently. Total time = ~1 second (the time of the longest single call).

```
Call 1 (start) Call 2 (start) Call 3 (start) Call 4 (start) Call 5 (start) ... (all wait for 1s) ... (all finish) = ~1s
```

For AI agents that might need to call a search API, a calculator tool, and an LLM in parallel, this difference is massive.
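You can see this difference with a minimal sketch that uses `asyncio.sleep` to stand in for a 1-second API call (the function names here are illustrative, not from any library):

```python
import asyncio
import time

async def fake_api_call(i: int) -> int:
    # Stand-in for a network request that takes ~1 second.
    await asyncio.sleep(1)
    return i

async def sequential() -> None:
    # Awaiting each call before starting the next: total ~5s.
    for i in range(5):
        await fake_api_call(i)

async def concurrent() -> None:
    # Starting all five at once with gather: total ~1s.
    await asyncio.gather(*(fake_api_call(i) for i in range(5)))

start = time.time()
asyncio.run(sequential())
seq = time.time() - start

start = time.time()
asyncio.run(concurrent())
con = time.time() - start

print(f"Sequential: {seq:.1f}s, Concurrent: {con:.1f}s")
```

On a typical machine this prints roughly `Sequential: 5.0s, Concurrent: 1.0s`, matching the timelines above.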

2. Core Concepts: async, await, and asyncio
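The three building blocks fit together like this: `async def` defines a coroutine, `await` pauses it until a result is ready, and `asyncio.run` starts the event loop that drives everything. A minimal sketch (the `greet` function is just an illustration):

```python
import asyncio

# async def defines a coroutine function. Calling it does NOT run the
# body; it returns a coroutine object that the event loop can schedule.
async def greet(name: str) -> str:
    # await suspends this coroutine until the awaited operation
    # finishes, letting the event loop run other tasks in the meantime.
    await asyncio.sleep(0.1)
    return f"Hello, {name}!"

# asyncio.run() creates the event loop, runs the coroutine to
# completion, and returns its result.
result = asyncio.run(greet("world"))
print(result)  # Hello, world!
```

Note that `await` is only legal inside an `async def` function; at the top level of a script you hand your entry coroutine to `asyncio.run`.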

3. aiohttp: The Async Alternative to requests

You can't use the standard requests library in async code because it's blocking. The most popular alternative is aiohttp. It's designed from the ground up to work with asyncio.

The key is the aiohttp.ClientSession, which allows you to manage a pool of connections for making efficient, concurrent requests.
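In its simplest form, a session wraps one or more requests; both the response and its `.json()` body are awaited. A minimal sketch, using the same httpbin.org test API as the full example below:

```python
import asyncio

import aiohttp

async def fetch_json(url: str) -> dict:
    # One ClientSession per application (or per batch of requests):
    # it maintains a connection pool so repeated requests reuse sockets.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # .json() reads the body asynchronously, so it must be awaited.
            return await response.json()

data = asyncio.run(fetch_json("https://httpbin.org/get"))
print(data["url"])
```

Creating a new session per request works but throws away the connection pool; the pattern in the next section shares a single session across all calls.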

4. The Pattern: Concurrent API Calls with asyncio.gather

The most common pattern for making multiple concurrent API calls is:

1. Create a single aiohttp.ClientSession.
2. Define an async function that takes the session and a URL and performs a single API call.
3. Create a list of "tasks" by calling your async function for each URL you want to fetch.
4. Use asyncio.gather(*tasks) to run all the tasks concurrently.
5. Use asyncio.run(main_coroutine) to start the entire process.

Example: Calling Multiple LLM Endpoints Concurrently

Let's write a full, runnable example. We'll hit a public test API, `httpbin.org`, to simulate calling different services.

```python
# example-04-async-api-calls.py
import asyncio
import time

import aiohttp

# We'll use a test API that can simulate delays.
URLS = [
    "https://httpbin.org/delay/2",     # Simulates a 2-second response
    "https://httpbin.org/delay/1",     # Simulates a 1-second response
    "https://httpbin.org/get",         # A fast response
    "https://httpbin.org/status/404",  # An error response
]

async def fetch_url(session: aiohttp.ClientSession, url: str) -> dict:
    """A coroutine to fetch a single URL."""
    print(f"Starting fetch for {url}...")
    try:
        timeout = aiohttp.ClientTimeout(total=10)
        async with session.get(url, timeout=timeout) as response:
            # This raises ClientResponseError for 4xx/5xx status codes.
            response.raise_for_status()
            print(f"Finished fetch for {url} with status {response.status}")
            # .json() is also a coroutine, so it must be awaited.
            return await response.json()
    except aiohttp.ClientResponseError as e:
        print(f"Error fetching {url}: {e.status} {e.message}")
        return {"error": True, "status": e.status, "url": url}
    except aiohttp.ClientError as e:
        # Connection-level errors (DNS failure, refused connection, etc.)
        # don't carry an HTTP status.
        print(f"Error fetching {url}: {e}")
        return {"error": True, "status": "ClientError", "url": url}
    except asyncio.TimeoutError:
        print(f"Timeout error fetching {url}")
        return {"error": True, "status": "Timeout", "url": url}

async def main():
    """The main coroutine to orchestrate all the concurrent calls."""
    start_time = time.time()
    # Create a single session to be reused for all requests.
    async with aiohttp.ClientSession() as session:
        # Create a list of coroutines to run. These don't start yet.
        tasks = [fetch_url(session, url) for url in URLS]
        # asyncio.gather runs all the tasks concurrently.
        # return_exceptions=True ensures that one failed task doesn't stop the others.
        results = await asyncio.gather(*tasks, return_exceptions=True)

    end_time = time.time()
    print("\n--- All tasks complete ---")
    for result in results:
        if isinstance(result, dict) and result.get("error"):
            print(f"Failed Task: URL {result.get('url')}, Status: {result.get('status')}")
        elif isinstance(result, dict):
            # Process successful results.
            print(f"Successful Task: URL {result.get('url')}")
        else:
            # An exception that escaped fetch_url, captured by gather.
            print(f"Unexpected exception: {result!r}")

    print(f"\nTotal execution time: {end_time - start_time:.2f} seconds")
    print("(Note: This is close to the longest delay, not the sum of all delays!)")

if __name__ == "__main__":
    # This is the entry point that starts the asyncio event loop.
    asyncio.run(main())
```

When you run this script, you'll see that the total time is just over 2 seconds, proving that the calls ran concurrently rather than one after another. This is the power of async programming and is a fundamental technique for building high-performance AI applications.