Building AI Agents with Serverless, Strands, and MCP
This tutorial provides a comprehensive, expert-level guide to building production-ready AI agents using AWS serverless technologies, the Strands Agents SDK, and the Model Context Protocol (MCP).
AI agents represent a shift from passive LLMs to autonomous systems capable of planning, reasoning, and acting on complex tasks.
By leveraging serverless compute for scalability, Strands for streamlined agent development, and MCP for modular tool integration, you can create robust, enterprise-grade agents with minimal overhead.
We’ll cover key concepts, architecture, prerequisites, step-by-step implementation (including code examples), advanced topics like multi-agent systems and multimodal capabilities, best practices, and deployment strategies. This draws from AWS best practices and hands-on examples to ensure your agents are scalable, secure, and efficient.
Key Concepts
AI Agents and the Agentic Loop
AI agents are goal-oriented systems that use LLMs to decompose tasks into steps, execute actions (e.g., API calls), and iterate based on results. The core mechanism is the agentic loop:
- Observe: Gather context from user input and memory.
- Plan/Reason: Use LLM to decide next steps.
- Act: Invoke tools or external services.
- Reflect: Feed results back into the loop until resolution.
This enables dynamic behavior, such as a travel agent booking flights while adhering to policies.
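The loop above can be sketched in a few lines of plain Python. The `plan` function and `TOOLS` table below are illustrative stand-ins for a real LLM call and real integrations:

```python
# Toy sketch of the agentic loop; `plan` stands in for an LLM call and
# `TOOLS` for real integrations -- both are illustrative stand-ins.

TOOLS = {
    "search_flights": lambda dest: f"Found 3 flights to {dest}",
}

def plan(context: list) -> dict:
    # A real agent would ask the LLM which action to take next.
    if not any("Found" in c for c in context):
        return {"action": "search_flights", "arg": "NYC"}
    return {"action": "finish", "answer": context[-1]}

def run_agent(goal: str, max_cycles: int = 5) -> str:
    context = [goal]                      # Observe: seed context with the goal
    for _ in range(max_cycles):
        step = plan(context)              # Plan/Reason: decide the next step
        if step["action"] == "finish":
            return step["answer"]         # Resolution reached
        result = TOOLS[step["action"]](step["arg"])  # Act: invoke a tool
        context.append(result)            # Reflect: feed the result back
    return "Gave up after max cycles"

print(run_agent("Book a flight to NYC"))
```

The real loop differs mainly in that the planner is an LLM and the tools have side effects, but the control flow is the same.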
Serverless Computing on AWS
Serverless architectures (e.g., AWS Lambda, API Gateway) allow agents to run without managing infrastructure. Benefits include auto-scaling, pay-per-use pricing, and seamless integration with AWS services like Amazon Bedrock for LLMs. Agents become stateless functions, with state externalized to services like Amazon S3 for session persistence.
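The externalized-state idea can be shown with a toy session store. Here an in-memory dict stands in for S3, and the `sessions/<id>` key layout is just an example:

```python
import json

class InMemorySessionStore:
    """Toy stand-in for S3-backed session persistence: the function stays
    stateless while conversation state survives between invocations."""

    def __init__(self):
        self._objects = {}  # bucket-like mapping of key -> JSON blob

    def load(self, session_id: str) -> list:
        blob = self._objects.get(f"sessions/{session_id}")
        return json.loads(blob) if blob else []

    def save(self, session_id: str, history: list) -> None:
        self._objects[f"sessions/{session_id}"] = json.dumps(history)

store = InMemorySessionStore()
history = store.load("user_123")          # first call: empty history
history.append({"role": "user", "content": "Book a flight"})
store.save("user_123", history)           # state persists between invocations
```

Swapping the dict for `boto3` S3 calls gives the same load/append/save round trip against durable storage.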
Strands Agents SDK
Strands is a lightweight, code-first Python SDK for building agents. It handles boilerplate like memory management, tool registration, and observability, allowing agents to run on Lambda, ECS, or locally. Key features:
- Modular design for composability.
- Built-in support for 40+ tools (e.g., image generation, web browsing).
- Integration with multiple LLMs (e.g., Bedrock, Anthropic).
Model Context Protocol (MCP)
MCP is an open standard for decoupling agents from tools via a client-server model. Agents act as MCP clients connecting to MCP servers that expose tools, prompts, and resources. This promotes reuse, versioning, and governance—e.g., a quiz service MCP server can be shared across agents without embedding logic.
Architecture Overview
A typical serverless AI agent architecture includes:
| Component | Role | AWS Service Examples |
|---|---|---|
| Compute | Run agent logic | AWS Lambda, Amazon ECS |
| LLM Access | Power reasoning | Amazon Bedrock |
| State Management | Persist sessions/memory | Amazon S3 (sessions), S3 Vectors (semantic memory), DynamoDB |
| API Layer | Handle requests/auth | Amazon API Gateway, Cognito |
| Tool Integration | Execute actions | In-process tools (Strands) or remote (MCP servers on Lambda) |
| Observability | Monitor performance | CloudWatch, OpenTelemetry |
This setup ensures scalability: Agents scale horizontally, MCP servers handle tool logic independently, and serverless compute manages bursts in demand.
Prerequisites
- AWS account with access to Amazon Bedrock, Lambda, S3, API Gateway, and Cognito.
- Python 3.9+ and pip.
- Install dependencies:

```shell
pip install strands-agents strands-agents-tools boto3 fastapi uvicorn mcp
```

- AWS CLI configured (`aws configure`).
- Familiarity with LLMs and Python.
For multimodal agents, enable Amazon Nova models in Bedrock.
Step-by-Step Guide to Building an AI Agent
Step 1: Set Up Your Environment
Create an S3 bucket for session storage:
```shell
aws s3 mb s3://your-agent-sessions --region us-east-1
```
Set up IAM roles for Lambda with permissions for Bedrock, S3, and other services.
Step 2: Build a Basic Agent with Strands
Start with a simple travel assistant agent.
```python
from strands import Agent
from strands.models import BedrockModel
from strands.session import S3SessionManager

# Configure the session manager for serverless persistence
session_manager = S3SessionManager(
    session_id="user_session_123",
    bucket="your-agent-sessions",
    prefix="sessions/",
)

# Define the agent
agent = Agent(
    system_prompt="You are a travel assistant that books business trips per company policy.",
    model=BedrockModel(model_id="anthropic.claude-3-5-sonnet-20241022-v2:0"),  # Bedrock LLM
    session_manager=session_manager,
)

# Invoke the agent
response = agent("Book a flight to NYC next Monday and find a hotel under $200/night.")
print(response)
```
This agent uses the agentic loop to reason and respond, persisting state in S3 for follow-up queries.
Step 3: Add In-Process Tools
Extend the agent with custom tools using the `@tool` decorator.

```python
from strands import Agent, tool
from strands.models import BedrockModel

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Simulate an API call (replace with a real integration)
    return f"Sunny in {city} with 75°F."

@tool
def book_flight(destination: str, date: str) -> str:
    """Book a flight to a destination on a given date."""
    # Integrate with an airline API here
    return f"Flight to {destination} on {date} booked."

# Recreate the agent with the tools registered (same model and session
# configuration as in Step 2)
agent = Agent(
    system_prompt="You are a travel assistant that books business trips per company policy.",
    model=BedrockModel(model_id="anthropic.claude-3-5-sonnet-20241022-v2:0"),
    session_manager=session_manager,
    tools=[get_weather, book_flight],
)
```
The agent now autonomously calls tools during the loop.
Step 4: Integrate MCP for Remote Tools
Build an MCP server for reusable tools (e.g., a policy checker).
MCP server (`policy_mcp_server.py`):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP(name="Company Policy Service", host="0.0.0.0", port=8080)

POLICIES = {"hotel_max": 200, "flight_class": "economy"}

@mcp.tool()
def check_policy(category: str, value: float) -> dict:
    """Check a proposed spend against company travel policy."""
    if category == "hotel" and value > POLICIES["hotel_max"]:
        return {"approved": False, "reason": "Exceeds budget"}
    return {"approved": True}

if __name__ == "__main__":
    mcp.run(transport="streamable-http")
```
Run it with `python policy_mcp_server.py`.
Connect the agent to the MCP server:

```python
from strands import Agent
from strands.tools.mcp import MCPClient
from mcp.client.streamable_http import streamablehttp_client

mcp_client = MCPClient(lambda: streamablehttp_client("http://localhost:8080/mcp"))

# MCP tools must be listed and invoked inside the client's context manager
with mcp_client:
    tools = mcp_client.list_tools_sync()
    agent = Agent(tools=tools)  # or combine with the in-process tools from Step 3
    agent("Is a $250/night hotel within policy?")
```
Deploy the MCP server as a Lambda function for serverless operation.
Step 5: Add Multimodal Capabilities
Use pre-built tools for images/videos.
```python
from strands import Agent
from strands_tools import generate_image, image_reader

# Same model and session configuration as before, now with multimodal tools
agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-3-5-sonnet-20241022-v2:0"),
    session_manager=session_manager,
    tools=[generate_image, image_reader],
)

response = agent("Generate an image of NYC skyline and describe it.")
```
For videos, integrate Amazon Nova via Bedrock.
Step 6: Implement Persistent Memory
Use S3 Vectors for semantic memory across sessions.
```python
from strands.memory import S3VectorMemory

memory = S3VectorMemory(
    bucket="your-memory-bucket",
    index_name="agent-memory",
    embedding_model=BedrockModel(model_id="amazon.titan-embed-text-v2:0"),
)

# Same model and session configuration as before, now with semantic memory
agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-3-5-sonnet-20241022-v2:0"),
    session_manager=session_manager,
    memory=memory,
)

# Store and recall facts across sessions
memory.add("User prefers window seats.")
recalled = memory.search("Flight preferences")
```
This enables personalization, like recalling past trips.
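Conceptually, semantic memory stores embeddings and retrieves the closest match. The toy sketch below uses a bag-of-words "embedding" and cosine similarity so it runs without any AWS dependencies; a real deployment would use Bedrock embeddings and a vector store:

```python
import math

def embed(text: str) -> dict:
    """Toy bag-of-words 'embedding': word -> count."""
    vec = {}
    for word in text.lower().replace(".", "").split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorMemory:
    def __init__(self):
        self.items = []  # list of (text, embedding) pairs

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str) -> str:
        q = embed(query)
        return max(self.items, key=lambda item: cosine(q, item[1]))[0]

memory = ToyVectorMemory()
memory.add("User prefers window seats.")
memory.add("User is allergic to peanuts.")
print(memory.search("any window seats available"))
```

Real embedding models capture meaning rather than exact word overlap, but the store/search shape is the same.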
Step 7: Deploy to Serverless Production
Use AWS Lambda for the agent and MCP server.
- Package code: Zip your Python files and dependencies.
- Create Lambda functions via AWS Console or Terraform.
- Front with API Gateway for auth (Cognito JWT).
- For a managed runtime, use Bedrock AgentCore:

```shell
agentcore configure -e your_agent.py
agentcore launch
```
Monitor with CloudWatch: track metrics such as loop cycle time and token usage.
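The packaging step above ultimately needs a Lambda entry point that parses the request and invokes the agent. A minimal sketch follows; `build_agent` is a hypothetical factory standing in for the construction code from Steps 2 through 4, stubbed here so the handler is self-contained:

```python
import json

def build_agent():
    # Hypothetical factory: in a real deployment this would construct the
    # Strands Agent (model, session manager, tools). Stubbed for illustration.
    return lambda prompt: f"[agent reply to: {prompt}]"

agent = build_agent()  # created at cold start, reused across warm invocations

def lambda_handler(event: dict, context=None) -> dict:
    """Minimal API Gateway -> Lambda entry point for the agent."""
    prompt = json.loads(event["body"])["prompt"]
    reply = agent(prompt)
    return {"statusCode": 200, "body": json.dumps({"reply": str(reply)})}
```

Constructing the agent at module scope (outside the handler) is the usual Lambda idiom: cold starts pay the setup cost once, and warm invocations reuse it.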
Advanced Topics
Multi-Agent Systems
Build collaborative networks:
- Swarm Pattern: Parallel agents for tasks (e.g., one for flights, one for hotels).
- Graph-Based: Define dependencies (e.g., book hotel after flight).
```python
from strands import AgentGraph

# flight_agent and hotel_agent are Agent instances built as in Step 2
graph = AgentGraph()
graph.add_agent(flight_agent)
graph.add_agent(hotel_agent, depends_on=[flight_agent])

response = graph.execute("Plan a trip to NYC.")
```
This pattern enables complex, multi-step workflows with explicit dependencies between agents.
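The dependency-ordering idea behind the graph pattern can be sketched without any SDK. Agents are plain callables here, the graph is assumed acyclic, and the booking strings are illustrative:

```python
# Toy graph-based orchestration: an agent runs once all of its
# dependencies have produced results.

def flight_agent(results: dict) -> str:
    return "Flight booked for Monday"

def hotel_agent(results: dict) -> str:
    return f"Hotel booked after: {results['flight']}"

GRAPH = {
    "flight": (flight_agent, []),          # name -> (agent, dependencies)
    "hotel": (hotel_agent, ["flight"]),
}

def execute(graph: dict) -> dict:
    results = {}
    pending = dict(graph)
    while pending:                          # assumes the graph has no cycles
        for name, (agent, deps) in list(pending.items()):
            if all(d in results for d in deps):   # dependencies satisfied
                results[name] = agent(results)
                del pending[name]
    return results

print(execute(GRAPH))
```

A production orchestrator adds parallelism, retries, and cycle detection, but the scheduling rule is the same: run an agent only when its upstream results exist.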
Security and Observability
- Auth: Use Cognito for user-aware agents.
- Sanitize inputs to prevent injections.
- Trace with OpenTelemetry: Strands auto-exports metrics (e.g., token counts, durations).
- Alerts: Set CloudWatch alarms for high costs.
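For the input-sanitization point, a naive filter is sketched below. The patterns are illustrative only; a production system would layer managed guardrails (e.g., Bedrock Guardrails) on top rather than rely on regexes:

```python
import re

# Illustrative deny-list of common prompt-injection phrasings
SUSPICIOUS = [
    r"ignore (all )?previous instructions",
    r"reveal your system prompt",
]

def sanitize(user_input: str) -> str:
    """Reject inputs matching known injection patterns; trim the rest."""
    for pattern in SUSPICIOUS:
        if re.search(pattern, user_input, re.IGNORECASE):
            raise ValueError("Input rejected: possible prompt injection")
    return user_input.strip()
```

Calling `sanitize` before the agent invocation gives a cheap first line of defense; defense in depth still requires model-side guardrails and least-privilege tool permissions.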
Best Practices
- Development: Prototype locally, test with Jupyter notebooks, then deploy.
- Performance: Monitor token usage; use sliding window for memory.
- Scalability: Externalize all state; use MCP for tool modularity.
- Cost Management: Track via AWS Cost Explorer; optimize LLM calls.
- Error Handling: Implement retries and fallbacks in tools.
- Testing: Use RAGAS for evaluation; simulate multi-session scenarios.
Avoid tight coupling—design agents as composable units.
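The retry-and-fallback practice for tools can be captured in a small decorator; the attempt count, backoff values, and fallback strings below are illustrative:

```python
import functools
import time

def with_retries(max_attempts: int = 3, fallback: str = "Tool unavailable"):
    """Retry a tool a few times, then return a fallback the LLM can reason about."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        return fallback            # degrade gracefully, don't crash the loop
                    time.sleep(0.1 * (2 ** attempt))  # exponential backoff
        return wrapper
    return decorator

@with_retries(max_attempts=3, fallback="Weather service unavailable")
def flaky_weather(city: str) -> str:
    raise TimeoutError("upstream timed out")       # simulated persistent failure

print(flaky_weather("NYC"))
```

Returning a descriptive fallback string, rather than raising, lets the agentic loop observe the failure and plan around it (e.g., by asking the user or trying another tool).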
Conclusion
By combining serverless compute, Strands Agents SDK, and MCP, you can build scalable AI agents that handle real-world tasks autonomously. Start with basics, iterate to multimodal and multi-agent systems, and deploy confidently to production. For full examples, explore the referenced repositories and AWS documentation. This approach minimizes code while maximizing flexibility, making it ideal for enterprise AI development.



