Lab Overview
In this hands-on lab, you'll build a complete multimodal AI application that leverages the unique strengths of three frontier models: GPT-4, Claude, and Gemini. You'll implement text analysis, vision processing, and video understanding while applying production best practices from Chapter 5.
By the end of this lab, you'll have a working application that can analyze text documents, extract data from images, process video content, and serve results through a REST API, all while handling rate limits, errors, and cost tracking.
Prerequisites
Setup & Authentication
Objective: Install required SDKs, configure API keys securely, and verify connectivity to all three providers.
Step 1: Install Dependencies
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install required packages
pip install openai anthropic google-generativeai python-dotenv requests pillow
Step 2: Configure Environment Variables
Create a .env file in your project directory, and add .env to your .gitignore so your keys are never committed:
# .env file - NEVER commit this to version control!
OPENAI_API_KEY=sk-proj-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=AIza-your-key-here
Step 3: Test Connectivity
"""
Exercise 1: Test API connectivity
This script verifies that all three APIs are accessible
"""
import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
import google.generativeai as genai
# Load environment variables
load_dotenv()
def test_openai():
"""Test OpenAI API connection"""
try:
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Say 'OpenAI connected!'"}],
max_tokens=10
)
print(f"β
OpenAI: {response.choices[0].message.content}")
return True
except Exception as e:
print(f"β OpenAI failed: {str(e)}")
return False
def test_anthropic():
"""Test Anthropic API connection"""
try:
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=10,
messages=[{"role": "user", "content": "Say 'Anthropic connected!'"}]
)
print(f"β
Anthropic: {message.content[0].text}")
return True
except Exception as e:
print(f"β Anthropic failed: {str(e)}")
return False
def test_google():
"""Test Google AI API connection"""
try:
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content("Say 'Google connected!'")
print(f"β
Google: {response.text}")
return True
except Exception as e:
print(f"β Google failed: {str(e)}")
return False
if __name__ == "__main__":
print("Testing API connections...\n")
results = {
"OpenAI": test_openai(),
"Anthropic": test_anthropic(),
"Google": test_google()
}
print("\n" + "="*50)
if all(results.values()):
print("π All APIs connected successfully!")
else:
print("β οΈ Some APIs failed. Check your keys.")
for api, success in results.items():
if not success:
print(f" - Fix {api} configuration")
Expected output:

Testing API connections...

✅ OpenAI: OpenAI connected!
✅ Anthropic: Anthropic connected!
✅ Google: Google connected!

==================================================
All APIs connected successfully!
Text Analysis with All Three Models
Objective: Compare the text processing capabilities of GPT-4, Claude, and Gemini on different types of tasks.
Task 2.1: Complex Reasoning with GPT-4
"""
Exercise 2.1: Use GPT-4 for complex multi-step reasoning
GPT-4 excels at breaking down complex problems into steps
"""
from openai import OpenAI
import os
def analyze_business_strategy_with_gpt4():
"""
Analyze a complex business scenario using GPT-4's reasoning
"""
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
scenario = """
A SaaS company has 50,000 users paying $10/month. They want to increase
revenue by 50% in 12 months. They have three options:
1. Increase price to $15/month (expect 20% churn)
2. Launch premium tier at $30/month (expect 15% conversion)
3. Expand to enterprise market (need 6 months development, $500K investment)
Analyze each option's ROI and recommend the best strategy.
"""
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a strategic business analyst. Provide detailed financial analysis with calculations."},
{"role": "user", "content": scenario}
],
temperature=0.2, # Lower temperature for analytical tasks
max_tokens=1000
)
analysis = response.choices[0].message.content
# Track token usage
tokens_used = response.usage.total_tokens
cost = (response.usage.prompt_tokens * 0.00003 +
response.usage.completion_tokens * 0.00006) # GPT-4 pricing
print("="*60)
print("GPT-4 BUSINESS STRATEGY ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Tokens: {tokens_used} | Cost: ${cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
analyze_business_strategy_with_gpt4()
Task 2.2: Long Document Analysis with Claude
"""
Exercise 2.2: Analyze a long document with Claude
Claude excels at handling long context windows (200K tokens)
"""
import anthropic
import os
def analyze_long_document_with_claude(document_path: str):
"""
Analyze a long document (e.g., research paper, legal contract)
Claude can handle up to 200K tokens in one request
"""
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Read the document
with open(document_path, 'r', encoding='utf-8') as f:
document_text = f.read()
# For this example, we'll use a sample long text
# In practice, load your actual document
if not os.path.exists(document_path):
document_text = """[Sample 10,000-word business plan would go here]
Executive Summary: This document outlines a comprehensive strategy...
(Imagine this is a 100-page document with detailed analysis)
""" * 50 # Simulate longer content
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=2000,
messages=[{
"role": "user",
"content": f"""Analyze this document and provide:
1. Executive summary (3 paragraphs)
2. Key findings (5 bullet points)
3. Risk assessment
4. Recommendations
Document:
{document_text[:50000]} # Send up to 50K tokens
"""
}]
)
analysis = message.content[0].text
# Track usage
input_tokens = message.usage.input_tokens
output_tokens = message.usage.output_tokens
cost = (input_tokens * 0.000003 + output_tokens * 0.000015) # Claude pricing
print("="*60)
print("CLAUDE DOCUMENT ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Input: {input_tokens} tokens | Output: {output_tokens} tokens")
print(f"Cost: ${cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
# Create a sample document if needed
sample_doc = "sample_business_plan.txt"
if not os.path.exists(sample_doc):
with open(sample_doc, 'w') as f:
f.write("Business Plan for AI Education Platform\n\n" + "Content here...\n" * 1000)
analyze_long_document_with_claude(sample_doc)
Task 2.3: Massive Context with Gemini
"""
Exercise 2.3: Process massive codebase with Gemini
Gemini 1.5 Pro can handle up to 2M tokens (entire repositories!)
"""
import google.generativeai as genai
import os
def analyze_codebase_with_gemini(codebase_files: list):
"""
Analyze an entire codebase using Gemini's 2M token context
This is useful for code review, architecture analysis, security audits
"""
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
# Combine multiple code files
combined_code = ""
for filepath in codebase_files:
if os.path.exists(filepath):
with open(filepath, 'r', encoding='utf-8') as f:
combined_code += f"\n\n{'='*60}\n"
combined_code += f"FILE: {filepath}\n"
combined_code += f"{'='*60}\n\n"
combined_code += f.read()
# Simulate a large codebase if no files provided
if not combined_code:
combined_code = """
# main.py
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
def root():
return {"status": "ok"}
# Add more files...
""" * 100 # Simulate many files
prompt = f"""Analyze this codebase and provide:
1. Architecture overview
2. Potential security vulnerabilities
3. Performance bottlenecks
4. Code quality assessment
5. Refactoring recommendations
Codebase:
{combined_code[:100000]} # Send up to 100K tokens for demo
"""
response = model.generate_content(prompt)
analysis = response.text
# Gemini doesn't provide detailed token counts in the response object
# Estimate based on content length
estimated_tokens = len(combined_code.split()) * 1.3 # Rough estimate
estimated_cost = estimated_tokens * 0.00000125 # Gemini Pro pricing
print("="*60)
print("GEMINI CODEBASE ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Estimated tokens: {estimated_tokens:.0f}")
print(f"Estimated cost: ${estimated_cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
# Example: Analyze Python files in current directory
code_files = ["lab-code.py"] # Add your actual files
analyze_codebase_with_gemini(code_files)
Key takeaways from these exercises:
- GPT-4: Complex reasoning, step-by-step analysis, creative tasks
- Claude: Long documents, instruction following, safe outputs
- Gemini: Massive context, multimodal, code understanding
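To put these strengths into practice programmatically, here is a minimal routing sketch. It is illustrative only: the task categories, the mapping, and the choose_model helper are assumptions, not part of the lab code.

# model_router.py - hypothetical helper that picks a provider by task type
TASK_TO_MODEL = {
    "reasoning": "gpt-4",       # multi-step analysis and strategy (Task 2.1)
    "long_document": "claude",  # 200K-token context (Task 2.2)
    "codebase": "gemini",       # up-to-2M-token context (Task 2.3)
}

def choose_model(task_type: str) -> str:
    """Return a model name for a task type, defaulting to GPT-4."""
    return TASK_TO_MODEL.get(task_type, "gpt-4")

print(choose_model("long_document"))  # -> claude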
Vision Analysis with Multimodal Models
Objective: Extract structured data from images using vision capabilities of all three models.
Task 3.1: Image Analysis with GPT-4 Vision
"""
Exercise 3.1: Analyze images with GPT-4 Vision
Use GPT-4V to extract data from charts, diagrams, and screenshots
"""
from openai import OpenAI
import base64
import os
def encode_image(image_path: str) -> str:
"""Convert image to base64 for API transmission"""
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
def analyze_chart_with_gpt4v(image_path: str):
"""
Analyze a chart/graph and extract structured data
Example: Upload a sales chart, get CSV data back
"""
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# Encode the image
base64_image = encode_image(image_path)
response = client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": """Analyze this chart and extract:
1. Chart type and title
2. Axis labels and units
3. All data points in CSV format
4. Key insights and trends
5. Any anomalies or notable patterns
Format the data extraction as:
CSV_DATA:
[csv format here]
INSIGHTS:
[bullet points]
"""
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
}
}
]
}
],
max_tokens=1000
)
analysis = response.choices[0].message.content
cost = response.usage.total_tokens * 0.00003 # Approximate vision pricing
print("="*60)
print("GPT-4 VISION - CHART ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Cost: ${cost:.4f}")
print("="*60)
return analysis
def analyze_receipt_with_gpt4v(image_path: str):
"""
Extract structured data from a receipt
Returns: dict with items, prices, total, tax, etc.
"""
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
base64_image = encode_image(image_path)
response = client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": """Extract all information from this receipt in JSON format:
{
"merchant": "",
"date": "",
"items": [{"name": "", "quantity": 0, "price": 0}],
"subtotal": 0,
"tax": 0,
"total": 0,
"payment_method": ""
}
"""
},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
}
]
}
],
max_tokens=500
)
print("="*60)
print("GPT-4 VISION - RECEIPT EXTRACTION")
print("="*60)
print(response.choices[0].message.content)
print("="*60)
return response.choices[0].message.content
if __name__ == "__main__":
# Example usage - you'll need to provide your own images
# chart_image = "sales_chart.png"
# receipt_image = "receipt.jpg"
print("β οΈ Place your test images in the same directory:")
print(" - sales_chart.png (any chart/graph)")
print(" - receipt.jpg (any receipt)")
print("\nThen uncomment the function calls below:")
print("# analyze_chart_with_gpt4v('sales_chart.png')")
print("# analyze_receipt_with_gpt4v('receipt.jpg')")
Task 3.2: Vision Analysis with Claude
"""
Exercise 3.2: Vision analysis with Claude
Claude excels at detailed image description and safety analysis
"""
import anthropic
import base64
import os
def analyze_diagram_with_claude(image_path: str):
"""
Analyze technical diagrams, flowcharts, or architecture diagrams
Claude provides detailed, structured descriptions
"""
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Read and encode image
with open(image_path, "rb") as image_file:
image_data = base64.standard_b64encode(image_file.read()).decode("utf-8")
# Determine media type
ext = image_path.lower().split('.')[-1]
media_type = f"image/{ext}" if ext in ['png', 'jpeg', 'jpg', 'gif', 'webp'] else "image/jpeg"
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1500,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": media_type,
"data": image_data,
},
},
{
"type": "text",
"text": """Analyze this diagram and provide:
1. Overall purpose and type of diagram
2. Main components and their relationships
3. Data flow or process flow (if applicable)
4. Technical accuracy assessment
5. Suggestions for improvement
Be extremely detailed and technical.
"""
}
],
}
],
)
analysis = message.content[0].text
cost = (message.usage.input_tokens * 0.000003 +
message.usage.output_tokens * 0.000015)
print("="*60)
print("CLAUDE - DIAGRAM ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Cost: ${cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
print("β οΈ Place your test diagram (architecture, flowchart, etc.):")
print(" - diagram.png")
print("\nThen run:")
print("# analyze_diagram_with_claude('diagram.png')")
Task 3.3: Compare Vision Capabilities
"""
Exercise 3.3: Compare vision capabilities across all models
Send the same image to GPT-4V, Claude, and Gemini
"""
import os
import base64
from openai import OpenAI
import anthropic
import google.generativeai as genai
from PIL import Image
def compare_vision_models(image_path: str, prompt: str):
"""
Send the same image and prompt to all three models
Compare their responses
"""
results = {}
# 1. GPT-4 Vision
print("Analyzing with GPT-4 Vision...")
try:
openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
with open(image_path, "rb") as img:
base64_image = base64.b64encode(img.read()).decode('utf-8')
response = openai_client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
]
}],
max_tokens=500
)
results["GPT-4 Vision"] = response.choices[0].message.content
except Exception as e:
results["GPT-4 Vision"] = f"Error: {str(e)}"
# 2. Claude Vision
print("Analyzing with Claude Vision...")
try:
claude_client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
with open(image_path, "rb") as img:
image_data = base64.standard_b64encode(img.read()).decode("utf-8")
message = claude_client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{
"role": "user",
"content": [
{"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image_data}},
{"type": "text", "text": prompt}
]
}]
)
results["Claude Vision"] = message.content[0].text
except Exception as e:
results["Claude Vision"] = f"Error: {str(e)}"
# 3. Gemini Vision
print("Analyzing with Gemini Vision...")
try:
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
img = Image.open(image_path)
response = model.generate_content([prompt, img])
results["Gemini Vision"] = response.text
except Exception as e:
results["Gemini Vision"] = f"Error: {str(e)}"
# Print comparison
print("\n" + "="*70)
print("VISION MODEL COMPARISON")
print("="*70)
print(f"Image: {image_path}")
print(f"Prompt: {prompt}\n")
for model_name, response in results.items():
print(f"\n{'-'*70}")
print(f"{model_name}:")
print(f"{'-'*70}")
print(response)
print("\n" + "="*70)
return results
if __name__ == "__main__":
# Example: Compare how each model describes the same image
test_image = "test_image.jpg" # Replace with your image
test_prompt = "Describe this image in detail. What are the main elements?"
if os.path.exists(test_image):
compare_vision_models(test_image, test_prompt)
else:
print(f"β οΈ Please provide a test image: {test_image}")
Video Analysis with Gemini
Objective: Process video content using Gemini's native video understanding capabilities.
Task 4.1: Upload and Process Video
"""
Exercise 4.1: Video analysis with Gemini
Upload a video file and extract insights
"""
import google.generativeai as genai
import os
import time
def upload_video_to_gemini(video_path: str):
"""
Upload video file to Gemini for processing
Gemini can handle videos up to 2 hours long
"""
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
print(f"Uploading video: {video_path}")
print("This may take a few minutes for large files...")
# Upload the video file
video_file = genai.upload_file(path=video_path)
print(f"β
Upload complete: {video_file.uri}")
# Wait for processing
print("Processing video...")
while video_file.state.name == "PROCESSING":
time.sleep(2)
video_file = genai.get_file(video_file.name)
if video_file.state.name == "FAILED":
raise ValueError("Video processing failed")
print("β
Video ready for analysis")
return video_file
def analyze_video_content(video_file, analysis_type="summary"):
"""
Analyze video content based on analysis type
Types: summary, transcript, key_moments, objects, actions
"""
model = genai.GenerativeModel('gemini-1.5-pro')
prompts = {
"summary": "Provide a comprehensive summary of this video including main topics, key points, and overall narrative.",
"transcript": "Generate a detailed transcript of all spoken content in this video. Include timestamps.",
"key_moments": "Identify and describe the 5 most important moments in this video with timestamps.",
"objects": "List all objects visible in this video and when they appear.",
"actions": "Describe all actions and activities happening in this video chronologically."
}
prompt = prompts.get(analysis_type, prompts["summary"])
print(f"\nAnalyzing video: {analysis_type}")
response = model.generate_content([video_file, prompt])
print("="*60)
print(f"VIDEO ANALYSIS: {analysis_type.upper()}")
print("="*60)
print(response.text)
print("="*60)
return response.text
def extract_video_qa(video_file, questions: list):
"""
Answer specific questions about video content
This is useful for extracting specific information
"""
model = genai.GenerativeModel('gemini-1.5-pro')
results = {}
for question in questions:
print(f"\nQuestion: {question}")
response = model.generate_content([video_file, question])
results[question] = response.text
print(f"Answer: {response.text}\n")
return results
if __name__ == "__main__":
# Example usage
video_path = "sample_video.mp4" # Replace with your video
if not os.path.exists(video_path):
print("β οΈ Please provide a test video file:")
print(" - sample_video.mp4 (max 2GB, formats: MP4, MOV, AVI, etc.)")
print("\nExample test videos you can use:")
print(" - Product demo")
print(" - Tutorial/lecture")
print(" - Meeting recording")
print(" - Security camera footage")
else:
# Upload video
video_file = upload_video_to_gemini(video_path)
# Run different analyses
analyze_video_content(video_file, "summary")
analyze_video_content(video_file, "key_moments")
# Ask specific questions
questions = [
"What products are shown in this video?",
"What are the main technical concepts explained?",
"Are there any safety concerns visible?"
]
extract_video_qa(video_file, questions)
Task 4.2: Video Search and Indexing
"""
Exercise 4.2: Build a video search system
Index video content and enable semantic search
"""
import google.generativeai as genai
import os
import json
def build_video_index(video_file, segment_duration=30):
"""
Break video into segments and create searchable index
Each segment gets a timestamp and description
"""
model = genai.GenerativeModel('gemini-1.5-pro')
# Get video duration (you'd normally use ffmpeg for this)
# For demo, we'll assume it and ask Gemini
duration_response = model.generate_content([
video_file,
"What is the total duration of this video in seconds? Reply with just the number."
])
try:
total_duration = int(duration_response.text.strip())
except:
total_duration = 300 # Default 5 minutes
# Create segments
segments = []
num_segments = (total_duration // segment_duration) + 1
print(f"Creating {num_segments} segments ({segment_duration}s each)...")
for i in range(num_segments):
start_time = i * segment_duration
end_time = min((i + 1) * segment_duration, total_duration)
# Ask Gemini to describe this time segment
prompt = f"Describe what happens in this video between {start_time}s and {end_time}s. Be specific and detailed."
response = model.generate_content([video_file, prompt])
segment = {
"segment_id": i,
"start_time": start_time,
"end_time": end_time,
"description": response.text
}
segments.append(segment)
print(f"β
Segment {i+1}/{num_segments} indexed")
# Save index
index_file = "video_index.json"
with open(index_file, 'w') as f:
json.dump(segments, f, indent=2)
print(f"\nβ
Video index saved to {index_file}")
return segments
def search_video_index(query: str, index_file="video_index.json"):
"""
Search video index using semantic matching
Returns relevant segments with timestamps
"""
# Load index
with open(index_file, 'r') as f:
segments = json.load(f)
# Use Gemini to find relevant segments
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
index_text = "\n\n".join([
f"Segment {s['segment_id']} ({s['start_time']}s - {s['end_time']}s):\n{s['description']}"
for s in segments
])
prompt = f"""Given this video index:
{index_text}
Find segments relevant to this query: "{query}"
Return segment IDs and timestamps in this format:
Segment X (start_time - end_time): Why it's relevant
"""
response = model.generate_content(prompt)
print("="*60)
print(f"VIDEO SEARCH: {query}")
print("="*60)
print(response.text)
print("="*60)
return response.text
if __name__ == "__main__":
video_path = "sample_video.mp4"
if os.path.exists(video_path):
# Build index
video_file = upload_video_to_gemini(video_path)
segments = build_video_index(video_file, segment_duration=30)
# Search examples
search_video_index("product features")
search_video_index("technical specifications")
search_video_index("pricing information")
else:
print("β οΈ Provide video file: sample_video.mp4")
Production-Ready API Service
Objective: Build a production-ready REST API with rate limiting, error handling, and cost tracking.
Task 5.1: Build FastAPI Service
"""
Exercise 5: Production-ready multimodal API service
Combines all models with proper error handling and monitoring
"""
from fastapi import FastAPI, HTTPException, UploadFile, File
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional, Literal
import os
from datetime import datetime
import asyncio
# Import all our previous functions
from openai import OpenAI
import anthropic
import google.generativeai as genai
app = FastAPI(title="Multimodal AI API", version="1.0.0")
# CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Initialize clients
openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
claude_client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
# Request models
class TextAnalysisRequest(BaseModel):
text: str
model: Literal["gpt-4", "claude", "gemini"]
task: Literal["summarize", "analyze", "extract"]
class ComparisonRequest(BaseModel):
prompt: str
# Cost tracking
usage_log = []
def log_usage(model: str, tokens: int, cost: float):
"""Track API usage and costs"""
usage_log.append({
"timestamp": datetime.now().isoformat(),
"model": model,
"tokens": tokens,
"cost": cost
})
# Endpoints
@app.get("/")
def root():
return {
"service": "Multimodal AI API",
"status": "operational",
"endpoints": [
"/analyze-text",
"/analyze-image",
"/compare-models",
"/usage-stats"
]
}
@app.post("/analyze-text")
async def analyze_text(request: TextAnalysisRequest):
"""
Analyze text using specified model
"""
try:
if request.model == "gpt-4":
response = openai_client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"{request.task.capitalize()} this text:\n\n{request.text}"
}],
max_tokens=500
)
result = response.choices[0].message.content
tokens = response.usage.total_tokens
cost = tokens * 0.00003
log_usage("gpt-4", tokens, cost)
elif request.model == "claude":
response = claude_client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{
"role": "user",
"content": f"{request.task.capitalize()} this text:\n\n{request.text}"
}]
)
result = response.content[0].text
tokens = response.usage.input_tokens + response.usage.output_tokens
cost = (response.usage.input_tokens * 0.000003 +
response.usage.output_tokens * 0.000015)
log_usage("claude", tokens, cost)
elif request.model == "gemini":
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content(
f"{request.task.capitalize()} this text:\n\n{request.text}"
)
result = response.text
tokens = len(request.text.split()) * 1.3 # Estimate
cost = tokens * 0.00000125
log_usage("gemini", tokens, cost)
return {
"model": request.model,
"result": result,
"tokens_used": tokens,
"cost": f"${cost:.6f}"
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.post("/compare-models")
async def compare_models(request: ComparisonRequest):
"""
Send same prompt to all models and compare responses
"""
results = {}
# GPT-4
try:
response = openai_client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": request.prompt}],
max_tokens=300
)
results["gpt-4"] = response.choices[0].message.content
except Exception as e:
results["gpt-4"] = f"Error: {str(e)}"
# Claude
try:
response = claude_client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=300,
messages=[{"role": "user", "content": request.prompt}]
)
results["claude"] = response.content[0].text
except Exception as e:
results["claude"] = f"Error: {str(e)}"
# Gemini
try:
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content(request.prompt)
results["gemini"] = response.text
except Exception as e:
results["gemini"] = f"Error: {str(e)}"
return {"prompt": request.prompt, "responses": results}
@app.get("/usage-stats")
def get_usage_stats():
"""
Get usage statistics and cost tracking
"""
if not usage_log:
return {"message": "No usage data yet"}
total_cost = sum(entry["cost"] for entry in usage_log)
total_tokens = sum(entry["tokens"] for entry in usage_log)
model_breakdown = {}
for entry in usage_log:
model = entry["model"]
if model not in model_breakdown:
model_breakdown[model] = {"calls": 0, "tokens": 0, "cost": 0}
model_breakdown[model]["calls"] += 1
model_breakdown[model]["tokens"] += entry["tokens"]
model_breakdown[model]["cost"] += entry["cost"]
return {
"total_api_calls": len(usage_log),
"total_tokens": total_tokens,
"total_cost": f"${total_cost:.4f}",
"breakdown_by_model": model_breakdown,
"recent_calls": usage_log[-10:] # Last 10 calls
}
if __name__ == "__main__":
import uvicorn
print("="*60)
print("π Starting Multimodal AI API Service")
print("="*60)
print("API Documentation: http://localhost:8000/docs")
print("="*60)
uvicorn.run(app, host="0.0.0.0", port=8000)
Task 5.2: Test the API
"""
Test the production API
"""
import requests
import json
BASE_URL = "http://localhost:8000"
def test_text_analysis():
"""Test text analysis endpoint"""
response = requests.post(
f"{BASE_URL}/analyze-text",
json={
"text": "Artificial intelligence is transforming industries worldwide.",
"model": "gpt-4",
"task": "analyze"
}
)
print("Text Analysis Test:")
print(json.dumps(response.json(), indent=2))
def test_model_comparison():
"""Test model comparison"""
response = requests.post(
f"{BASE_URL}/compare-models",
json={"prompt": "Explain quantum computing in one sentence."}
)
print("\nModel Comparison Test:")
print(json.dumps(response.json(), indent=2))
def test_usage_stats():
"""Test usage statistics"""
response = requests.get(f"{BASE_URL}/usage-stats")
print("\nUsage Statistics:")
print(json.dumps(response.json(), indent=2))
if __name__ == "__main__":
print("Testing API endpoints...\n")
test_text_analysis()
test_model_comparison()
test_usage_stats()
Troubleshooting Common Issues
Issue: Authentication errors or invalid API keys
Solutions:
- Verify the .env file exists and contains correct keys
- Check keys have no extra spaces or line breaks (see the sketch below)
- Ensure keys have proper permissions (GPT-4 access, etc.)
- Try regenerating keys from provider dashboards
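A quick way to catch missing or whitespace-padded keys, using only packages already installed in Step 1:

import os
from dotenv import load_dotenv

load_dotenv()
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    value = os.environ.get(key)
    if value is None:
        print(f"{key}: MISSING")
    elif value != value.strip():
        print(f"{key}: set, but has leading/trailing whitespace")
    else:
        print(f"{key}: OK ({len(value)} chars)")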
Issue: Rate limit errors (HTTP 429)
Solutions:
- Add time.sleep(1) between API calls
- Implement exponential backoff retry logic (see the sketch below)
- Check your tier limits in the provider dashboard
- Consider upgrading to a higher tier plan
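One way to implement the backoff advice is a small retry wrapper. This is a generic sketch: the broad Exception catch is a placeholder you should narrow to your SDK's rate-limit error type.

import time
import random

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as e:  # placeholder: catch your SDK's rate-limit error here
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Retry {attempt + 1}/{max_retries} in {delay:.1f}s: {e}")
            time.sleep(delay)

# Usage: result = with_backoff(lambda: client.chat.completions.create(...))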
Issue: Image or video upload failures
Solutions:
- Compress images to under 20MB using PIL/Pillow (see the sketch below)
- Convert videos to MP4 format if needed
- Check the file isn't corrupted (try opening it locally)
- Ensure proper base64 encoding for images
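For the image-size advice, here is a minimal Pillow sketch that re-encodes a file at decreasing JPEG quality until it fits a size budget. The 20MB budget and quality steps are assumptions; check your provider's current limits.

from PIL import Image
import os

def compress_image(path: str, out_path: str = "compressed.jpg", max_bytes: int = 20_000_000):
    """Re-encode as JPEG at decreasing quality until under max_bytes."""
    img = Image.open(path).convert("RGB")
    for quality in (95, 85, 75, 60, 45):
        img.save(out_path, "JPEG", quality=quality)
        if os.path.getsize(out_path) <= max_bytes:
            return out_path
    raise ValueError("Still too large; resize the image dimensions first")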
Issue: Context length exceeded errors
Solutions:
- Truncate input text to fit model limits
- Use Claude or Gemini for longer contexts
- Implement text chunking and summarization (see the sketch below)
- Consider using RAG (Retrieval Augmented Generation)
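The chunk-and-summarize pattern can be sketched as a simple map-reduce over overlapping character chunks. Sizes here are illustrative, and summarize stands in for any single-prompt model call from this lab.

def chunk_text(text: str, chunk_size: int = 12000, overlap: int = 500):
    """Split text into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def summarize_long_text(text: str, summarize) -> str:
    """Map: summarize each chunk. Reduce: summarize the combined summaries."""
    partial = [summarize(f"Summarize this excerpt:\n\n{c}") for c in chunk_text(text)]
    return summarize("Combine these partial summaries into one:\n\n" + "\n\n".join(partial))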
Bonus Challenges
Challenge 1: Add Rate Limiting
Implement a rate limiter using Redis or in-memory cache to prevent hitting API limits.
Hints: Use the slowapi or fastapi-limiter package
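A starting point with slowapi, following its documented pattern (the 5/minute rate string is an assumption; tune it to your provider quotas):

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/limited")
@limiter.limit("5/minute")  # assumed limit
async def limited(request: Request):
    return {"ok": True}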
Challenge 2: Streaming Responses
Modify the API to stream responses token-by-token using Server-Sent Events (SSE).
Hints: Use OpenAI's stream parameter and FastAPI's StreamingResponse
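A sketch of the SSE approach for the GPT-4 path (chunk fields follow the Chat Completions streaming format; the other providers have analogous streaming APIs):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
import os

app = FastAPI()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

@app.get("/stream")
def stream(prompt: str):
    def event_stream():
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        for chunk in completion:
            if chunk.choices and chunk.choices[0].delta.content:
                yield f"data: {chunk.choices[0].delta.content}\n\n"  # SSE frame
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")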
Challenge 3: Build a Dashboard
Create a web dashboard (React/Vue/vanilla JS) to visualize usage statistics and costs in real-time.
Hints: Use Chart.js for graphs, fetch /usage-stats endpoint every 5 seconds
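If you want to sanity-check the data feed before writing any JavaScript, a tiny Python poller against the Task 5.1 service works (assumes the API is running locally):

import time
import requests

while True:
    stats = requests.get("http://localhost:8000/usage-stats").json()
    print(stats.get("total_cost", "n/a"), "|", stats.get("total_api_calls", 0), "calls")
    time.sleep(5)  # matches the 5-second refresh suggested above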
Challenge 4: Multi-Step Workflow
Build a workflow that uses multiple models sequentially: GPT-4 for planning β Claude for execution β Gemini for verification.
Hints: Create a /workflow endpoint that chains calls and passes outputs
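A minimal chaining sketch reusing the clients initialized in Task 5.1 (the function name and prompts are illustrative, not a prescribed design):

def run_workflow(task: str) -> dict:
    """GPT-4 plans, Claude executes, Gemini verifies (illustrative chain)."""
    plan = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Write a step-by-step plan for: {task}"}],
        max_tokens=300,
    ).choices[0].message.content

    draft = claude_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=600,
        messages=[{"role": "user", "content": f"Execute this plan:\n{plan}"}],
    ).content[0].text

    review = genai.GenerativeModel('gemini-1.5-pro').generate_content(
        f"Verify this result against the plan.\n\nPlan:\n{plan}\n\nResult:\n{draft}"
    ).text

    return {"plan": plan, "draft": draft, "review": review}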
Challenge 5: Deploy to Production
Deploy your API service to AWS/GCP/Azure with Docker, HTTPS, and monitoring.
Hints: Use Docker Compose, Let's Encrypt for SSL, DataDog for monitoring
Lab Complete!
Congratulations on building a production-ready multimodal AI application!
- ✅ Integrated OpenAI, Anthropic, and Google APIs
- ✅ Implemented text, vision, and video analysis
- ✅ Built a production REST API with FastAPI
- ✅ Added cost tracking and usage monitoring
- ✅ Compared capabilities of all three frontier models
Next Steps: Complete the Module 1 Quiz to test your knowledge!