Lab Overview
In this hands-on lab, you'll build a complete multimodal AI application that leverages the unique strengths of three frontier models: GPT-4, Claude, and Gemini. You'll implement text analysis, vision processing, and video understanding while applying production best practices from Chapter 5.
By the end of this lab, you'll have a working application that can analyze text documents, extract data from images, process video content, and serve results through a REST API, all while handling rate limits, errors, and cost tracking.
Prerequisites
Setup & Authentication
Objective: Install required SDKs, configure API keys securely, and verify connectivity to all three providers.
Step 1: Install Dependencies
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install required packages
pip install openai anthropic google-generativeai python-dotenv requests pillow
Step 2: Configure Environment Variables
Create a .env file in your project directory, and add .env to your .gitignore so your keys are never committed:
# .env file - NEVER commit this to version control!
OPENAI_API_KEY=sk-proj-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=AIza-your-key-here
Step 3: Test Connectivity
"""
Exercise 1: Test API connectivity
This script verifies that all three APIs are accessible
"""
import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
import google.generativeai as genai
# Load environment variables
load_dotenv()
def test_openai():
"""Test OpenAI API connection"""
try:
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Say 'OpenAI connected!'"}],
max_tokens=10
)
print(f"β
OpenAI: {response.choices[0].message.content}")
return True
except Exception as e:
print(f"β OpenAI failed: {str(e)}")
return False
def test_anthropic():
"""Test Anthropic API connection"""
try:
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=10,
messages=[{"role": "user", "content": "Say 'Anthropic connected!'"}]
)
print(f"β
Anthropic: {message.content[0].text}")
return True
except Exception as e:
print(f"β Anthropic failed: {str(e)}")
return False
def test_google():
"""Test Google AI API connection"""
try:
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content("Say 'Google connected!'")
print(f"β
Google: {response.text}")
return True
except Exception as e:
print(f"β Google failed: {str(e)}")
return False
if __name__ == "__main__":
print("Testing API connections...\n")
results = {
"OpenAI": test_openai(),
"Anthropic": test_anthropic(),
"Google": test_google()
}
print("\n" + "="*50)
if all(results.values()):
print("π All APIs connected successfully!")
else:
print("β οΈ Some APIs failed. Check your keys.")
for api, success in results.items():
if not success:
print(f" - Fix {api} configuration")
Expected output:

Testing API connections...

✅ OpenAI: OpenAI connected!
✅ Anthropic: Anthropic connected!
✅ Google: Google connected!

==================================================
All APIs connected successfully!
Text Analysis with All Three Models
Objective: Compare the text processing capabilities of GPT-4, Claude, and Gemini on different types of tasks.
Task 2.1: Complex Reasoning with GPT-4
"""
Exercise 2.1: Use GPT-4 for complex multi-step reasoning
GPT-4 excels at breaking down complex problems into steps
"""
from openai import OpenAI
import os
def analyze_business_strategy_with_gpt4():
"""
Analyze a complex business scenario using GPT-4's reasoning
"""
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
scenario = """
A SaaS company has 50,000 users paying $10/month. They want to increase
revenue by 50% in 12 months. They have three options:
1. Increase price to $15/month (expect 20% churn)
2. Launch premium tier at $30/month (expect 15% conversion)
3. Expand to enterprise market (need 6 months development, $500K investment)
Analyze each option's ROI and recommend the best strategy.
"""
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a strategic business analyst. Provide detailed financial analysis with calculations."},
{"role": "user", "content": scenario}
],
temperature=0.2, # Lower temperature for analytical tasks
max_tokens=1000
)
analysis = response.choices[0].message.content
# Track token usage
tokens_used = response.usage.total_tokens
cost = (response.usage.prompt_tokens * 0.00003 +
response.usage.completion_tokens * 0.00006) # GPT-4 pricing
print("="*60)
print("GPT-4 BUSINESS STRATEGY ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Tokens: {tokens_used} | Cost: ${cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
analyze_business_strategy_with_gpt4()
Task 2.2: Long Document Analysis with Claude
"""
Exercise 2.2: Analyze a long document with Claude
Claude excels at handling long context windows (200K tokens)
"""
import anthropic
import os
def analyze_long_document_with_claude(document_path: str):
"""
Analyze a long document (e.g., research paper, legal contract)
Claude can handle up to 200K tokens in one request
"""
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Read the document
with open(document_path, 'r', encoding='utf-8') as f:
document_text = f.read()
# For this example, we'll use a sample long text
# In practice, load your actual document
if not os.path.exists(document_path):
document_text = """[Sample 10,000-word business plan would go here]
Executive Summary: This document outlines a comprehensive strategy...
(Imagine this is a 100-page document with detailed analysis)
""" * 50 # Simulate longer content
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=2000,
messages=[{
"role": "user",
"content": f"""Analyze this document and provide:
1. Executive summary (3 paragraphs)
2. Key findings (5 bullet points)
3. Risk assessment
4. Recommendations
Document:
{document_text[:50000]} # Send up to 50K tokens
"""
}]
)
analysis = message.content[0].text
# Track usage
input_tokens = message.usage.input_tokens
output_tokens = message.usage.output_tokens
cost = (input_tokens * 0.000003 + output_tokens * 0.000015) # Claude pricing
print("="*60)
print("CLAUDE DOCUMENT ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Input: {input_tokens} tokens | Output: {output_tokens} tokens")
print(f"Cost: ${cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
# Create a sample document if needed
sample_doc = "sample_business_plan.txt"
if not os.path.exists(sample_doc):
with open(sample_doc, 'w') as f:
f.write("Business Plan for AI Education Platform\n\n" + "Content here...\n" * 1000)
analyze_long_document_with_claude(sample_doc)
Task 2.3: Massive Context with Gemini
"""
Exercise 2.3: Process massive codebase with Gemini
Gemini 1.5 Pro can handle up to 2M tokens (entire repositories!)
"""
import google.generativeai as genai
import os
def analyze_codebase_with_gemini(codebase_files: list):
"""
Analyze an entire codebase using Gemini's 2M token context
This is useful for code review, architecture analysis, security audits
"""
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
# Combine multiple code files
combined_code = ""
for filepath in codebase_files:
if os.path.exists(filepath):
with open(filepath, 'r', encoding='utf-8') as f:
combined_code += f"\n\n{'='*60}\n"
combined_code += f"FILE: {filepath}\n"
combined_code += f"{'='*60}\n\n"
combined_code += f.read()
# Simulate a large codebase if no files provided
if not combined_code:
combined_code = """
# main.py
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
def root():
return {"status": "ok"}
# Add more files...
""" * 100 # Simulate many files
prompt = f"""Analyze this codebase and provide:
1. Architecture overview
2. Potential security vulnerabilities
3. Performance bottlenecks
4. Code quality assessment
5. Refactoring recommendations
Codebase:
{combined_code[:100000]} # Send up to 100K tokens for demo
"""
response = model.generate_content(prompt)
analysis = response.text
# Gemini doesn't provide detailed token counts in the response object
# Estimate based on content length
estimated_tokens = len(combined_code.split()) * 1.3 # Rough estimate
estimated_cost = estimated_tokens * 0.00000125 # Gemini Pro pricing
print("="*60)
print("GEMINI CODEBASE ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Estimated tokens: {estimated_tokens:.0f}")
print(f"Estimated cost: ${estimated_cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
# Example: Analyze Python files in current directory
code_files = ["lab-code.py"] # Add your actual files
analyze_codebase_with_gemini(code_files)
Key takeaways from these exercises:
- GPT-4: Complex reasoning, step-by-step analysis, creative tasks
- Claude: Long documents, instruction following, safe outputs
- Gemini: Massive context, multimodal, code understanding
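To put these strengths into practice programmatically, here is a minimal routing sketch. It is illustrative only: the task categories, the mapping, and the choose_model helper are assumptions, not part of the lab code.

# model_router.py - hypothetical helper that picks a provider by task type
TASK_TO_MODEL = {
    "reasoning": "gpt-4",       # multi-step analysis and strategy (Task 2.1)
    "long_document": "claude",  # 200K-token context (Task 2.2)
    "codebase": "gemini",       # up-to-2M-token context (Task 2.3)
}

def choose_model(task_type: str) -> str:
    """Return a model name for a task type, defaulting to GPT-4."""
    return TASK_TO_MODEL.get(task_type, "gpt-4")

print(choose_model("long_document"))  # -> claude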
Vision Analysis with Multimodal Models
Objective: Extract structured data from images using vision capabilities of all three models.
Task 3.1: Image Analysis with GPT-4 Vision
"""
Exercise 3.1: Analyze images with GPT-4 Vision
Use GPT-4V to extract data from charts, diagrams, and screenshots
"""
from openai import OpenAI
import base64
import os
def encode_image(image_path: str) -> str:
"""Convert image to base64 for API transmission"""
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
def analyze_chart_with_gpt4v(image_path: str):
"""
Analyze a chart/graph and extract structured data
Example: Upload a sales chart, get CSV data back
"""
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# Encode the image
base64_image = encode_image(image_path)
response = client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": """Analyze this chart and extract:
1. Chart type and title
2. Axis labels and units
3. All data points in CSV format
4. Key insights and trends
5. Any anomalies or notable patterns
Format the data extraction as:
CSV_DATA:
[csv format here]
INSIGHTS:
[bullet points]
"""
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
}
}
]
}
],
max_tokens=1000
)
analysis = response.choices[0].message.content
cost = response.usage.total_tokens * 0.00003 # Approximate vision pricing
print("="*60)
print("GPT-4 VISION - CHART ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Cost: ${cost:.4f}")
print("="*60)
return analysis
def analyze_receipt_with_gpt4v(image_path: str):
"""
Extract structured data from a receipt
Returns: dict with items, prices, total, tax, etc.
"""
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
base64_image = encode_image(image_path)
response = client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": """Extract all information from this receipt in JSON format:
{
"merchant": "",
"date": "",
"items": [{"name": "", "quantity": 0, "price": 0}],
"subtotal": 0,
"tax": 0,
"total": 0,
"payment_method": ""
}
"""
},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
}
]
}
],
max_tokens=500
)
print("="*60)
print("GPT-4 VISION - RECEIPT EXTRACTION")
print("="*60)
print(response.choices[0].message.content)
print("="*60)
return response.choices[0].message.content
if __name__ == "__main__":
# Example usage - you'll need to provide your own images
# chart_image = "sales_chart.png"
# receipt_image = "receipt.jpg"
print("β οΈ Place your test images in the same directory:")
print(" - sales_chart.png (any chart/graph)")
print(" - receipt.jpg (any receipt)")
print("\nThen uncomment the function calls below:")
print("# analyze_chart_with_gpt4v('sales_chart.png')")
print("# analyze_receipt_with_gpt4v('receipt.jpg')")
Task 3.2: Vision Analysis with Claude
"""
Exercise 3.2: Vision analysis with Claude
Claude excels at detailed image description and safety analysis
"""
import anthropic
import base64
import os
def analyze_diagram_with_claude(image_path: str):
"""
Analyze technical diagrams, flowcharts, or architecture diagrams
Claude provides detailed, structured descriptions
"""
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Read and encode image
with open(image_path, "rb") as image_file:
image_data = base64.standard_b64encode(image_file.read()).decode("utf-8")
# Determine media type
ext = image_path.lower().split('.')[-1]
media_type = f"image/{ext}" if ext in ['png', 'jpeg', 'jpg', 'gif', 'webp'] else "image/jpeg"
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1500,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": media_type,
"data": image_data,
},
},
{
"type": "text",
"text": """Analyze this diagram and provide:
1. Overall purpose and type of diagram
2. Main components and their relationships
3. Data flow or process flow (if applicable)
4. Technical accuracy assessment
5. Suggestions for improvement
Be extremely detailed and technical.
"""
}
],
}
],
)
analysis = message.content[0].text
cost = (message.usage.input_tokens * 0.000003 +
message.usage.output_tokens * 0.000015)
print("="*60)
print("CLAUDE - DIAGRAM ANALYSIS")
print("="*60)
print(analysis)
print("\n" + "-"*60)
print(f"Cost: ${cost:.4f}")
print("="*60)
return analysis
if __name__ == "__main__":
print("β οΈ Place your test diagram (architecture, flowchart, etc.):")
print(" - diagram.png")
print("\nThen run:")
print("# analyze_diagram_with_claude('diagram.png')")
Task 3.3: Compare Vision Capabilities
"""
Exercise 3.3: Compare vision capabilities across all models
Send the same image to GPT-4V, Claude, and Gemini
"""
import os
import base64
from openai import OpenAI
import anthropic
import google.generativeai as genai
from PIL import Image
def compare_vision_models(image_path: str, prompt: str):
"""
Send the same image and prompt to all three models
Compare their responses
"""
results = {}
# 1. GPT-4 Vision
print("Analyzing with GPT-4 Vision...")
try:
openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
with open(image_path, "rb") as img:
base64_image = base64.b64encode(img.read()).decode('utf-8')
response = openai_client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
]
}],
max_tokens=500
)
results["GPT-4 Vision"] = response.choices[0].message.content
except Exception as e:
results["GPT-4 Vision"] = f"Error: {str(e)}"
# 2. Claude Vision
print("Analyzing with Claude Vision...")
try:
claude_client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
with open(image_path, "rb") as img:
image_data = base64.standard_b64encode(img.read()).decode("utf-8")
message = claude_client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{
"role": "user",
"content": [
{"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image_data}},
{"type": "text", "text": prompt}
]
}]
)
results["Claude Vision"] = message.content[0].text
except Exception as e:
results["Claude Vision"] = f"Error: {str(e)}"
# 3. Gemini Vision
print("Analyzing with Gemini Vision...")
try:
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
img = Image.open(image_path)
response = model.generate_content([prompt, img])
results["Gemini Vision"] = response.text
except Exception as e:
results["Gemini Vision"] = f"Error: {str(e)}"
# Print comparison
print("\n" + "="*70)
print("VISION MODEL COMPARISON")
print("="*70)
print(f"Image: {image_path}")
print(f"Prompt: {prompt}\n")
for model_name, response in results.items():
print(f"\n{'-'*70}")
print(f"{model_name}:")
print(f"{'-'*70}")
print(response)
print("\n" + "="*70)
return results
if __name__ == "__main__":
# Example: Compare how each model describes the same image
test_image = "test_image.jpg" # Replace with your image
test_prompt = "Describe this image in detail. What are the main elements?"
if os.path.exists(test_image):
compare_vision_models(test_image, test_prompt)
else:
print(f"β οΈ Please provide a test image: {test_image}")
Video Analysis with Gemini
Objective: Process video content using Gemini's native video understanding capabilities.
Task 4.1: Upload and Process Video
"""
Exercise 4.1: Video analysis with Gemini
Upload a video file and extract insights
"""
import google.generativeai as genai
import os
import time
def upload_video_to_gemini(video_path: str):
"""
Upload video file to Gemini for processing
Gemini can handle videos up to 2 hours long
"""
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
print(f"Uploading video: {video_path}")
print("This may take a few minutes for large files...")
# Upload the video file
video_file = genai.upload_file(path=video_path)
print(f"β
Upload complete: {video_file.uri}")
# Wait for processing
print("Processing video...")
while video_file.state.name == "PROCESSING":
time.sleep(2)
video_file = genai.get_file(video_file.name)
if video_file.state.name == "FAILED":
raise ValueError("Video processing failed")
print("β
Video ready for analysis")
return video_file
def analyze_video_content(video_file, analysis_type="summary"):
"""
Analyze video content based on analysis type
Types: summary, transcript, key_moments, objects, actions
"""
model = genai.GenerativeModel('gemini-1.5-pro')
prompts = {
"summary": "Provide a comprehensive summary of this video including main topics, key points, and overall narrative.",
"transcript": "Generate a detailed transcript of all spoken content in this video. Include timestamps.",
"key_moments": "Identify and describe the 5 most important moments in this video with timestamps.",
"objects": "List all objects visible in this video and when they appear.",
"actions": "Describe all actions and activities happening in this video chronologically."
}
prompt = prompts.get(analysis_type, prompts["summary"])
print(f"\nAnalyzing video: {analysis_type}")
response = model.generate_content([video_file, prompt])
print("="*60)
print(f"VIDEO ANALYSIS: {analysis_type.upper()}")
print("="*60)
print(response.text)
print("="*60)
return response.text
def extract_video_qa(video_file, questions: list):
"""
Answer specific questions about video content
This is useful for extracting specific information
"""
model = genai.GenerativeModel('gemini-1.5-pro')
results = {}
for question in questions:
print(f"\nQuestion: {question}")
response = model.generate_content([video_file, question])
results[question] = response.text
print(f"Answer: {response.text}\n")
return results
if __name__ == "__main__":
# Example usage
video_path = "sample_video.mp4" # Replace with your video
if not os.path.exists(video_path):
print("β οΈ Please provide a test video file:")
print(" - sample_video.mp4 (max 2GB, formats: MP4, MOV, AVI, etc.)")
print("\nExample test videos you can use:")
print(" - Product demo")
print(" - Tutorial/lecture")
print(" - Meeting recording")
print(" - Security camera footage")
else:
# Upload video
video_file = upload_video_to_gemini(video_path)
# Run different analyses
analyze_video_content(video_file, "summary")
analyze_video_content(video_file, "key_moments")
# Ask specific questions
questions = [
"What products are shown in this video?",
"What are the main technical concepts explained?",
"Are there any safety concerns visible?"
]
extract_video_qa(video_file, questions)
Task 4.2: Video Search and Indexing
"""
Exercise 4.2: Build a video search system
Index video content and enable semantic search
"""
import google.generativeai as genai
import os
import json
def build_video_index(video_file, segment_duration=30):
"""
Break video into segments and create searchable index
Each segment gets a timestamp and description
"""
model = genai.GenerativeModel('gemini-1.5-pro')
# Get video duration (you'd normally use ffmpeg for this)
# For demo, we'll assume it and ask Gemini
duration_response = model.generate_content([
video_file,
"What is the total duration of this video in seconds? Reply with just the number."
])
try:
total_duration = int(duration_response.text.strip())
except:
total_duration = 300 # Default 5 minutes
# Create segments
segments = []
num_segments = (total_duration // segment_duration) + 1
print(f"Creating {num_segments} segments ({segment_duration}s each)...")
for i in range(num_segments):
start_time = i * segment_duration
end_time = min((i + 1) * segment_duration, total_duration)
# Ask Gemini to describe this time segment
prompt = f"Describe what happens in this video between {start_time}s and {end_time}s. Be specific and detailed."
response = model.generate_content([video_file, prompt])
segment = {
"segment_id": i,
"start_time": start_time,
"end_time": end_time,
"description": response.text
}
segments.append(segment)
print(f"β
Segment {i+1}/{num_segments} indexed")
# Save index
index_file = "video_index.json"
with open(index_file, 'w') as f:
json.dump(segments, f, indent=2)
print(f"\nβ
Video index saved to {index_file}")
return segments
def search_video_index(query: str, index_file="video_index.json"):
"""
Search video index using semantic matching
Returns relevant segments with timestamps
"""
# Load index
with open(index_file, 'r') as f:
segments = json.load(f)
# Use Gemini to find relevant segments
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-1.5-pro')
index_text = "\n\n".join([
f"Segment {s['segment_id']} ({s['start_time']}s - {s['end_time']}s):\n{s['description']}"
for s in segments
])
prompt = f"""Given this video index:
{index_text}
Find segments relevant to this query: "{query}"
Return segment IDs and timestamps in this format:
Segment X (start_time - end_time): Why it's relevant
"""
response = model.generate_content(prompt)
print("="*60)
print(f"VIDEO SEARCH: {query}")
print("="*60)
print(response.text)
print("="*60)
return response.text
if __name__ == "__main__":
video_path = "sample_video.mp4"
if os.path.exists(video_path):
# Build index
video_file = upload_video_to_gemini(video_path)
segments = build_video_index(video_file, segment_duration=30)
# Search examples
search_video_index("product features")
search_video_index("technical specifications")
search_video_index("pricing information")
else:
print("β οΈ Provide video file: sample_video.mp4")
Production-Ready API Service
Objective: Build a production-ready REST API with rate limiting, error handling, and cost tracking.
Task 5.1: Build FastAPI Service
"""
Exercise 5: Production-ready multimodal API service
Combines all models with proper error handling and monitoring
"""
from fastapi import FastAPI, HTTPException, UploadFile, File
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional, Literal
import os
from datetime import datetime
import asyncio
# Import all our previous functions
from openai import OpenAI
import anthropic
import google.generativeai as genai
app = FastAPI(title="Multimodal AI API", version="1.0.0")
# CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Initialize clients
openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
claude_client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
# Request models
class TextAnalysisRequest(BaseModel):
text: str
model: Literal["gpt-4", "claude", "gemini"]
task: Literal["summarize", "analyze", "extract"]
class ComparisonRequest(BaseModel):
prompt: str
# Cost tracking
usage_log = []
def log_usage(model: str, tokens: int, cost: float):
"""Track API usage and costs"""
usage_log.append({
"timestamp": datetime.now().isoformat(),
"model": model,
"tokens": tokens,
"cost": cost
})
# Endpoints
@app.get("/")
def root():
return {
"service": "Multimodal AI API",
"status": "operational",
"endpoints": [
"/analyze-text",
"/analyze-image",
"/compare-models",
"/usage-stats"
]
}
@app.post("/analyze-text")
async def analyze_text(request: TextAnalysisRequest):
"""
Analyze text using specified model
"""
try:
if request.model == "gpt-4":
response = openai_client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"{request.task.capitalize()} this text:\n\n{request.text}"
}],
max_tokens=500
)
result = response.choices[0].message.content
tokens = response.usage.total_tokens
cost = tokens * 0.00003
log_usage("gpt-4", tokens, cost)
elif request.model == "claude":
response = claude_client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{
"role": "user",
"content": f"{request.task.capitalize()} this text:\n\n{request.text}"
}]
)
result = response.content[0].text
tokens = response.usage.input_tokens + response.usage.output_tokens
cost = (response.usage.input_tokens * 0.000003 +
response.usage.output_tokens * 0.000015)
log_usage("claude", tokens, cost)
elif request.model == "gemini":
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content(
f"{request.task.capitalize()} this text:\n\n{request.text}"
)
result = response.text
tokens = len(request.text.split()) * 1.3 # Estimate
cost = tokens * 0.00000125
log_usage("gemini", tokens, cost)
return {
"model": request.model,
"result": result,
"tokens_used": tokens,
"cost": f"${cost:.6f}"
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.post("/compare-models")
async def compare_models(request: ComparisonRequest):
"""
Send same prompt to all models and compare responses
"""
results = {}
# GPT-4
try:
response = openai_client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": request.prompt}],
max_tokens=300
)
results["gpt-4"] = response.choices[0].message.content
except Exception as e:
results["gpt-4"] = f"Error: {str(e)}"
# Claude
try:
response = claude_client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=300,
messages=[{"role": "user", "content": request.prompt}]
)
results["claude"] = response.content[0].text
except Exception as e:
results["claude"] = f"Error: {str(e)}"
# Gemini
try:
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content(request.prompt)
results["gemini"] = response.text
except Exception as e:
results["gemini"] = f"Error: {str(e)}"
return {"prompt": request.prompt, "responses": results}
@app.get("/usage-stats")
def get_usage_stats():
"""
Get usage statistics and cost tracking
"""
if not usage_log:
return {"message": "No usage data yet"}
total_cost = sum(entry["cost"] for entry in usage_log)
total_tokens = sum(entry["tokens"] for entry in usage_log)
model_breakdown = {}
for entry in usage_log:
model = entry["model"]
if model not in model_breakdown:
model_breakdown[model] = {"calls": 0, "tokens": 0, "cost": 0}
model_breakdown[model]["calls"] += 1
model_breakdown[model]["tokens"] += entry["tokens"]
model_breakdown[model]["cost"] += entry["cost"]
return {
"total_api_calls": len(usage_log),
"total_tokens": total_tokens,
"total_cost": f"${total_cost:.4f}",
"breakdown_by_model": model_breakdown,
"recent_calls": usage_log[-10:] # Last 10 calls
}
if __name__ == "__main__":
import uvicorn
print("="*60)
print("π Starting Multimodal AI API Service")
print("="*60)
print("API Documentation: http://localhost:8000/docs")
print("="*60)
uvicorn.run(app, host="0.0.0.0", port=8000)
Task 5.2: Test the API
"""
Test the production API
"""
import requests
import json
BASE_URL = "http://localhost:8000"
def test_text_analysis():
"""Test text analysis endpoint"""
response = requests.post(
f"{BASE_URL}/analyze-text",
json={
"text": "Artificial intelligence is transforming industries worldwide.",
"model": "gpt-4",
"task": "analyze"
}
)
print("Text Analysis Test:")
print(json.dumps(response.json(), indent=2))
def test_model_comparison():
"""Test model comparison"""
response = requests.post(
f"{BASE_URL}/compare-models",
json={"prompt": "Explain quantum computing in one sentence."}
)
print("\nModel Comparison Test:")
print(json.dumps(response.json(), indent=2))
def test_usage_stats():
"""Test usage statistics"""
response = requests.get(f"{BASE_URL}/usage-stats")
print("\nUsage Statistics:")
print(json.dumps(response.json(), indent=2))
if __name__ == "__main__":
print("Testing API endpoints...\n")
test_text_analysis()
test_model_comparison()
test_usage_stats()
Troubleshooting Common Issues
Issue: Authentication errors or invalid API keys
Solutions:
- Verify the .env file exists and contains correct keys
- Check keys have no extra spaces or line breaks (see the sketch below)
- Ensure keys have proper permissions (GPT-4 access, etc.)
- Try regenerating keys from provider dashboards
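A quick way to catch missing or whitespace-padded keys, using only packages already installed in Step 1:

import os
from dotenv import load_dotenv

load_dotenv()
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    value = os.environ.get(key)
    if value is None:
        print(f"{key}: MISSING")
    elif value != value.strip():
        print(f"{key}: set, but has leading/trailing whitespace")
    else:
        print(f"{key}: OK ({len(value)} chars)")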
Issue: Rate limit errors (HTTP 429)
Solutions:
- Add time.sleep(1) between API calls
- Implement exponential backoff retry logic (see the sketch below)
- Check your tier limits in the provider dashboard
- Consider upgrading to a higher tier plan
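One way to implement the backoff advice is a small retry wrapper. This is a generic sketch: the broad Exception catch is a placeholder you should narrow to your SDK's rate-limit error type.

import time
import random

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as e:  # placeholder: catch your SDK's rate-limit error here
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Retry {attempt + 1}/{max_retries} in {delay:.1f}s: {e}")
            time.sleep(delay)

# Usage: result = with_backoff(lambda: client.chat.completions.create(...))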
Issue: Image or video upload failures
Solutions:
- Compress images to under 20MB using PIL/Pillow (see the sketch below)
- Convert videos to MP4 format if needed
- Check the file isn't corrupted (try opening it locally)
- Ensure proper base64 encoding for images
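For the image-size advice, here is a minimal Pillow sketch that re-encodes a file at decreasing JPEG quality until it fits a size budget. The 20MB budget and quality steps are assumptions; check your provider's current limits.

from PIL import Image
import os

def compress_image(path: str, out_path: str = "compressed.jpg", max_bytes: int = 20_000_000):
    """Re-encode as JPEG at decreasing quality until under max_bytes."""
    img = Image.open(path).convert("RGB")
    for quality in (95, 85, 75, 60, 45):
        img.save(out_path, "JPEG", quality=quality)
        if os.path.getsize(out_path) <= max_bytes:
            return out_path
    raise ValueError("Still too large; resize the image dimensions first")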
Issue: Context length exceeded errors
Solutions:
- Truncate input text to fit model limits
- Use Claude or Gemini for longer contexts
- Implement text chunking and summarization (see the sketch below)
- Consider using RAG (Retrieval Augmented Generation)
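The chunk-and-summarize pattern can be sketched as a simple map-reduce over overlapping character chunks. Sizes here are illustrative, and summarize stands in for any single-prompt model call from this lab.

def chunk_text(text: str, chunk_size: int = 12000, overlap: int = 500):
    """Split text into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def summarize_long_text(text: str, summarize) -> str:
    """Map: summarize each chunk. Reduce: summarize the combined summaries."""
    partial = [summarize(f"Summarize this excerpt:\n\n{c}") for c in chunk_text(text)]
    return summarize("Combine these partial summaries into one:\n\n" + "\n\n".join(partial))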
Bonus Challenges
Challenge 1: Add Rate Limiting
Implement a rate limiter using Redis or in-memory cache to prevent hitting API limits.
Hints: Use the slowapi or fastapi-limiter package
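A starting point with slowapi, following its documented pattern (the 5/minute rate string is an assumption; tune it to your provider quotas):

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/limited")
@limiter.limit("5/minute")  # assumed limit
async def limited(request: Request):
    return {"ok": True}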
Challenge 2: Streaming Responses
Modify the API to stream responses token-by-token using Server-Sent Events (SSE).
Hints: Use OpenAI's stream parameter and FastAPI's StreamingResponse
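A sketch of the SSE approach for the GPT-4 path (chunk fields follow the Chat Completions streaming format; the other providers have analogous streaming APIs):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
import os

app = FastAPI()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

@app.get("/stream")
def stream(prompt: str):
    def event_stream():
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        for chunk in completion:
            if chunk.choices and chunk.choices[0].delta.content:
                yield f"data: {chunk.choices[0].delta.content}\n\n"  # SSE frame
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")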
Challenge 3: Build a Dashboard
Create a web dashboard (React/Vue/vanilla JS) to visualize usage statistics and costs in real-time.
Hints: Use Chart.js for graphs, fetch /usage-stats endpoint every 5 seconds
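If you want to sanity-check the data feed before writing any JavaScript, a tiny Python poller against the Task 5.1 service works (assumes the API is running locally):

import time
import requests

while True:
    stats = requests.get("http://localhost:8000/usage-stats").json()
    print(stats.get("total_cost", "n/a"), "|", stats.get("total_api_calls", 0), "calls")
    time.sleep(5)  # matches the 5-second refresh suggested above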
Challenge 4: Multi-Step Workflow
Build a workflow that uses multiple models sequentially: GPT-4 for planning β Claude for execution β Gemini for verification.
Hints: Create a /workflow endpoint that chains calls and passes outputs
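A minimal chaining sketch reusing the clients initialized in Task 5.1 (the function name and prompts are illustrative, not a prescribed design):

def run_workflow(task: str) -> dict:
    """GPT-4 plans, Claude executes, Gemini verifies (illustrative chain)."""
    plan = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Write a step-by-step plan for: {task}"}],
        max_tokens=300,
    ).choices[0].message.content

    draft = claude_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=600,
        messages=[{"role": "user", "content": f"Execute this plan:\n{plan}"}],
    ).content[0].text

    review = genai.GenerativeModel('gemini-1.5-pro').generate_content(
        f"Verify this result against the plan.\n\nPlan:\n{plan}\n\nResult:\n{draft}"
    ).text

    return {"plan": plan, "draft": draft, "review": review}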
Challenge 5: Deploy to Production
Deploy your API service to AWS/GCP/Azure with Docker, HTTPS, and monitoring.
Hints: Use Docker Compose, Let's Encrypt for SSL, DataDog for monitoring
Lab Complete!
Congratulations on building a production-ready multimodal AI application!
- ✅ Integrated OpenAI, Anthropic, and Google APIs
- ✅ Implemented text, vision, and video analysis
- ✅ Built a production REST API with FastAPI
- ✅ Added cost tracking and usage monitoring
- ✅ Compared capabilities of all three frontier models
Next Steps: Complete the Module 1 Quiz to test your knowledge!