MODULE 1 - CHAPTER 3 ⏱️ 30 minutes 📖 2,000 words

Working with JSON & Data Structures

Learn to confidently handle JSON data from LLM APIs

Lesson 3: Working with JSON & Data Structures

In AI development, you are constantly dealing with data, and much of that data comes in the form of JSON (JavaScript Object Notation). It's the lingua franca of web APIs. Whether you're getting a response from an LLM, querying a vector database, or fetching data from a web service, you'll be working with JSON.

This lesson covers the essentials of handling JSON in Python, from parsing and validation to transforming it into more useful data structures.

1. What is JSON? A Quick Refresher

JSON represents data in key-value pairs (like Python dictionaries) and ordered lists (like Python lists).
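To make the mapping concrete, here is a small sketch showing how each JSON value type arrives in Python after decoding. The sample document and its field names are made up for illustration.

```python
import json

# A made-up document that exercises every JSON value type.
sample = '{"name": "demo", "count": 3, "ratio": 0.5, "ok": true, "tags": ["a", "b"], "extra": null}'

parsed = json.loads(sample)

# Record the Python type that each JSON value was decoded into.
python_types = {key: type(value).__name__ for key, value in parsed.items()}
print(python_types)
# {'name': 'str', 'count': 'int', 'ratio': 'float', 'ok': 'bool', 'tags': 'list', 'extra': 'NoneType'}
```

Objects become dict, arrays become list, strings become str, numbers become int or float, true/false become bool, and null becomes None.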

2. Parsing API Responses with the json Module

Python's built-in json library is your primary tool for encoding and decoding JSON data.
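Encoding goes in the other direction: json.dumps() turns a Python object into a JSON string, which is typically what you send as a request body. The sketch below assumes a hypothetical request payload; the field names are illustrative and not tied to any specific API.

```python
import json

# Hypothetical request payload; the field names are illustrative only.
payload = {
    "model": "llama3-8b-8192",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": False,
    "stop": None,
}

# Encode: Python dict -> JSON string (False becomes false, None becomes null).
encoded = json.dumps(payload)
print(encoded)

# Pretty-print with indentation, handy for logs and debugging.
print(json.dumps(payload, indent=2))

# Decode: JSON string -> Python dict. The round trip preserves the data.
decoded = json.loads(encoded)
print(decoded == payload)
```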

Example: Parsing a Raw API Response

Let's say you get the following JSON string from an API call:

import json

api_response_string = """
{
  "id": "chatcmpl-12345",
  "model": "llama3-8b-8192",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ],
  "usage": {"total_tokens": 42}
}
"""

# Parse the JSON string into a Python dictionary
data = json.loads(api_response_string)

# Now you can access the data using standard dictionary methods
message_content = data['choices'][0]['message']['content']
print(f"AI Response: {message_content}")

total_tokens = data.get('usage', {}).get('total_tokens', 0)
print(f"Tokens Used: {total_tokens}")

3. The Importance of Error Handling

APIs can fail. Networks can be unreliable. JSON can be malformed. Your code must be resilient to these issues.

a. Handling json.JSONDecodeError

If the string you're trying to parse is not valid JSON, json.loads() will raise a JSONDecodeError.

import json

malformed_json = '{"key": "value", "another_key": "oops no closing brace"'

try:
    data = json.loads(malformed_json)
except json.JSONDecodeError as e:
    print(f"Error decoding JSON: {e}")
    # Handle the error gracefully, maybe log it or return a default value
    data = None
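One way to reuse this handling is to wrap it in a small helper. The sketch below is only an illustration: safe_parse is a hypothetical name, and returning None on failure is an assumed convention. It also reports where decoding failed, using the msg, lineno, and colno attributes that JSONDecodeError provides.

```python
import json
from typing import Any, Optional


def safe_parse(raw: str) -> Optional[Any]:
    """Return the decoded JSON value, or None if raw is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # msg, lineno and colno describe what went wrong and where.
        print(f"Invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}")
        return None


good = safe_parse('{"key": "value"}')
bad = safe_parse('{"key": "value", "another_key": "oops no closing brace"')

print(good)  # {'key': 'value'}
print(bad)   # None
```

Note that with this convention a valid JSON null also decodes to None, so a caller that must distinguish "invalid input" from "null" would need a different signal, such as re-raising the exception.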

b. Handling Missing Keys with .get()

If you access a dictionary key that doesn't exist using square brackets (data['missing_key']), Python raises a KeyError, and an unhandled KeyError crashes your program. The .get() method is a safer way to read data that might not be present.

.get('key', default_value) will return the value for 'key' if it exists, otherwise it will return default_value. If you don't provide a default, it returns None.

data = {"name": "Gemini", "version": "1.5"}

# Safe access
model_version = data.get("version", "1.0")  # Returns "1.5"
author = data.get("author")  # Returns None
print(f"Version: {model_version}, Author: {author}")

# Unsafe access - this would raise a KeyError
# author_unsafe = data["author"]
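Chained calls such as data.get('usage', {}).get('total_tokens', 0) cover a missing key, but they still raise an AttributeError when the key is present with a null value, because .get() then returns None instead of the default. The sketch below shows one possible helper for walking nested dictionaries and lists safely; dig is a hypothetical name, not part of the standard library.

```python
from typing import Any


def dig(data: Any, *path: Any, default: Any = None) -> Any:
    """Follow keys and list indexes in path; return default if any step fails."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a non-container such as None.
            return default
    return current


# "usage" is present but null, which would break a chained .get() call.
response = {"choices": [{"message": {"content": "Paris"}}], "usage": None}

content = dig(response, "choices", 0, "message", "content", default="")
tokens = dig(response, "usage", "total_tokens", default=0)
missing = dig(response, "choices", 5, "message", default="n/a")

print(content, tokens, missing)  # Paris 0 n/a
```

If the final value exists and is itself None, this helper returns None and not the default, so the caller still decides how to treat an explicit null.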

4. Data Transformation with Pydantic

While dictionaries are flexible, they can be clumsy to work with. You don't get autocomplete, and you have to constantly check for missing keys. This is where Pydantic shines.

As we saw in the previous lesson, Pydantic parses raw dictionaries or JSON into structured, validated Python objects. It is one of the most reliable ways to work with API data in modern Python.

Example: Transforming a Dictionary into a Pydantic Object

Let's combine the previous examples to show the full, robust workflow.

import json
from pydantic import BaseModel, ValidationError
from typing import List

# 1. Define your Pydantic models
class Message(BaseModel):
    role: str
    content: str

class Choice(BaseModel):
    index: int
    message: Message

class APIResponse(BaseModel):
    id: str
    model: str
    choices: List[Choice]

# 2. Get your raw data (as a string)
api_response_string = """
{
  "id": "chatcmpl-12345",
  "model": "llama3-8b-8192",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ]
}
"""

# 3. Parse, validate, and transform in one step
try:
    # First, load the string into a Python dict
    api_data = json.loads(api_response_string)

    # Then, validate the dictionary with Pydantic
    structured_data = APIResponse.model_validate(api_data)

    # Now, work with a clean, predictable object
    print("Validation successful!")
    content = structured_data.choices[0].message.content
    print(f"AI says: '{content}'")

    # You get autocomplete and type checking on this object!
    # structured_data.choices[0].message.con... (autocomplete suggests 'content')

except json.JSONDecodeError as e:
    print(f"Error: The API returned malformed JSON. Details: {e}")
except ValidationError as e:
    print(f"Error: The API response did not match the expected data structure. Details: {e}")

This pattern of Parse -> Validate -> Transform is fundamental to building reliable AI applications that interact with external data sources.
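To see the three stages in isolation, here is a minimal sketch that uses only the standard library, with hand-written checks standing in for Pydantic's validation. Reply and parse_reply are hypothetical names chosen for this illustration, and the checks cover only the two fields shown. It is meant to show the shape of the pattern, not to replace the Pydantic version above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    """Hypothetical target structure holding only the fields we care about."""
    model: str
    content: str


def parse_reply(raw: str) -> Optional[Reply]:
    # Parse: turn the raw string into Python objects.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None

    # Validate: confirm the expected fields exist and have the expected types.
    try:
        model = data["model"]
        content = data["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None
    if not isinstance(model, str) or not isinstance(content, str):
        return None

    # Transform: hand back a small, predictable object.
    return Reply(model=model, content=content)


ok = parse_reply('{"model": "llama3-8b-8192", "choices": [{"message": {"content": "Paris"}}]}')
broken = parse_reply('{"model": "llama3-8b-8192"')
wrong_shape = parse_reply('{"model": "llama3-8b-8192", "choices": []}')

print(ok)           # Reply(model='llama3-8b-8192', content='Paris')
print(broken)       # None
print(wrong_shape)  # None
```

Every field you add means another manual check here, which is the repetitive work that declaring a Pydantic model takes off your hands.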