Lesson 3: Working with JSON & Data Structures
In AI development, you are constantly dealing with data, and much of that data comes in the form of JSON (JavaScript Object Notation). It's the lingua franca of web APIs. Whether you're getting a response from an LLM, querying a vector database, or fetching data from a web service, you'll be working with JSON.
This lesson covers the essentials of handling JSON in Python, from parsing and validation to transforming it into more useful data structures.
1. What is JSON? A Quick Refresher
JSON represents data in key-value pairs (like Python dictionaries) and ordered lists (like Python lists).
- Objects ({}): Correspond to Python dict.
- Arrays ([]): Correspond to Python list.
- Strings (""): Correspond to Python str.
- Numbers (123.4): Correspond to Python int or float.
- Booleans (true, false): Correspond to Python True, False.
- Null (null): Corresponds to Python None.
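The mapping above can be checked directly with the standard json module. This is a minimal sketch; the sample document and the variable names are made up for illustration.

```python
import json

# A small JSON document (illustrative sample) that uses every JSON type once.
sample = (
    '{"obj": {"a": 1}, "arr": [1, 2], "text": "hi", '
    '"whole": 7, "frac": 123.4, "flag": true, "nothing": null}'
)

parsed = json.loads(sample)

# Record the name of the Python type each JSON value was decoded into.
type_names = {key: type(value).__name__ for key, value in parsed.items()}
print(type_names)
```

Note that a JSON number without a fractional part decodes to int, while one with a fractional part decodes to float.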
2. Parsing API Responses with the json Module
Python's built-in json library is your primary tool for encoding and decoding JSON data.
json.loads(json_string): Loads a JSON string into a Python object (typically a dict or list).
json.dumps(python_object): Dumps a Python object into a JSON formatted string.
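To make the two directions concrete, here is a small round-trip sketch. The payload dictionary is a hypothetical request body invented for this example, not something a particular API requires.

```python
import json

# A hypothetical request payload, written as a plain Python dict.
payload = {"model": "llama3-8b-8192", "stream": False, "stop": None, "temperature": 0.7}

# dumps: Python object -> JSON text (False becomes false, None becomes null).
as_json = json.dumps(payload)
print(as_json)

# indent=2 produces a human-readable version, which is handy for logging.
print(json.dumps(payload, indent=2))

# loads: JSON text -> Python object; the round trip gives back an equal dict.
restored = json.loads(as_json)
print(restored == payload)
```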
Example: Parsing a Raw API Response
Let's say you get the following JSON string from an API call:
import json

api_response_string = """
{
  "id": "chatcmpl-12345",
  "model": "llama3-8b-8192",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ],
  "usage": {"total_tokens": 42}
}
"""

# Parse the JSON string into a Python dictionary
data = json.loads(api_response_string)

# Now you can access the data using standard dictionary methods
message_content = data['choices'][0]['message']['content']
print(f"AI Response: {message_content}")
total_tokens = data.get('usage', {}).get('total_tokens', 0)
print(f"Tokens Used: {total_tokens}")
3. The Importance of Error Handling
APIs can fail. Networks can be unreliable. JSON can be malformed. Your code must be resilient to these issues.
a. Handling json.JSONDecodeError
If the string you're trying to parse is not valid JSON, json.loads() will raise a JSONDecodeError.
import json
malformed_json = '{"key": "value", "another_key": "oops no closing brace"'
try:
    data = json.loads(malformed_json)
except json.JSONDecodeError as e:
    print(f"Error decoding JSON: {e}")
    # Handle the error gracefully, maybe log it or return a default value
    data = None
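One way to reuse this pattern is to wrap it in a small helper. The function name parse_json_or_default is hypothetical, and this is only a sketch: it handles malformed JSON text, but not non-string input such as None, which makes json.loads raise a TypeError instead.

```python
import json


def parse_json_or_default(raw, default=None):
    """Return the parsed JSON value, or `default` if `raw` is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # e.msg, e.lineno and e.colno describe what went wrong and where.
        print(f"Invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}")
        return default


good = parse_json_or_default('{"key": "value"}')
bad = parse_json_or_default('{"key": "value"', default={})
print(good)
print(bad)
```

Returning a caller-supplied default keeps the calling code free of try/except blocks, at the cost of hiding the failure unless it is logged.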
b. Handling Missing Keys with .get()
If you try to access a dictionary key that doesn't exist using square brackets (data['missing_key']), your program will crash with a KeyError. Using the .get() method is a much safer way to access data that might not be present.
.get('key', default_value) will return the value for 'key' if it exists, otherwise it will return default_value. If you don't provide a default, it returns None.
data = {"name": "Gemini", "version": "1.5"}
# Safe access
model_version = data.get("version", "1.0")  # Returns "1.5"
author = data.get("author")  # Returns None
print(f"Version: {model_version}, Author: {author}")
# Unsafe access - this would raise a KeyError
# author_unsafe = data["author"]
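Two limits of .get() are worth seeing in code: the default is used only when the key is absent (not when the key is present with a null value), and .get() offers no protection for list indexes. The dictionary below is a hypothetical parsed response invented for this sketch.

```python
# A hypothetical parsed response in which "usage" is present but null.
data = {"name": "Gemini", "usage": None, "choices": []}

# The default applies only when the key is absent, not when its value is None.
usage = data.get("usage", {})
print(usage)  # None

# `or {}` covers both the missing-key case and the null-value case.
total_tokens = (data.get("usage") or {}).get("total_tokens", 0)
print(total_tokens)  # 0

# .get() does not help with list indexes, so check for an empty list first.
choices = data.get("choices") or []
first_choice = choices[0] if choices else None
print(first_choice)  # None
```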
4. Data Transformation with Pydantic
While dictionaries are flexible, they can be clumsy to work with. You don't get autocomplete, and you have to constantly check for missing keys. This is where Pydantic shines.
As we saw in the previous lesson, Pydantic parses raw dictionaries or JSON into structured Python objects. Validating API data against an explicit model in this way is one of the most effective practices for working with API data in modern Python.
Example: Transforming a Dictionary into a Pydantic Object
Let's combine the previous examples to show the full, robust workflow.
import json
from pydantic import BaseModel, ValidationError
from typing import List

# 1. Define your Pydantic models
class Message(BaseModel):
    role: str
    content: str

class Choice(BaseModel):
    index: int
    message: Message

class APIResponse(BaseModel):
    id: str
    model: str
    choices: List[Choice]

# 2. Get your raw data (as a string)
api_response_string = """
{
  "id": "chatcmpl-12345",
  "model": "llama3-8b-8192",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ]
}
"""

# 3. Parse, validate, and transform in one step
try:
    # First, load the string into a Python dict
    api_data = json.loads(api_response_string)
    # Then, validate the dictionary with Pydantic
    structured_data = APIResponse.model_validate(api_data)
    # Now, work with a clean, predictable object
    print("Validation successful!")
    content = structured_data.choices[0].message.content
    print(f"AI says: '{content}'")
    # You get autocomplete and type checking on this object!
    # structured_data.choices[0].message.con... (autocomplete suggests 'content')
except json.JSONDecodeError as e:
    print(f"Error: The API returned malformed JSON. Details: {e}")
except ValidationError as e:
    print(f"Error: The API response did not match the expected data structure. Details: {e}")
This pattern of Parse -> Validate -> Transform is fundamental to building reliable AI applications that interact with external data sources.
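As a variation on the workflow above, Pydantic v2 can parse and validate a raw JSON string in a single call with model_validate_json, which raises ValidationError for malformed JSON as well as for data of the wrong shape. The sketch below assumes Pydantic v2 is installed. The Usage model, the optional usage field, and the parse_response helper are illustrative additions, not part of the lesson's original models.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Message(BaseModel):
    role: str
    content: str


class Choice(BaseModel):
    index: int
    message: Message


class Usage(BaseModel):
    total_tokens: int = 0


class APIResponse(BaseModel):
    id: str
    model: str
    choices: List[Choice]
    usage: Optional[Usage] = None  # tolerated if the API omits it


def parse_response(raw: str) -> Optional[APIResponse]:
    """Return a validated APIResponse, or None if `raw` is malformed or has the wrong shape."""
    try:
        # Parses the JSON text and validates it against the model in one call.
        return APIResponse.model_validate_json(raw)
    except ValidationError as e:
        print(f"Rejected response: {e.error_count()} validation error(s)")
        return None


ok = parse_response(
    '{"id": "chatcmpl-12345", "model": "llama3-8b-8192", '
    '"choices": [{"index": 0, "message": {"role": "assistant", "content": "Paris."}}]}'
)
missing = parse_response('{"id": "chatcmpl-12345"}')  # required fields absent
broken = parse_response('{"id": ')  # not valid JSON

if ok is not None:
    print(ok.choices[0].message.content)
```

Giving optional fields an explicit default, as with usage here, lets the model accept responses that omit them while still rejecting responses that lack the required fields.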