Create Expert Content: Deploying a Multi-Agent System with Terraform and Cloud Run

Shir Meir Lador

Head of AI, Product DevRel

In support of our mission to accelerate the developer journey on Google Cloud, we built Dev Signal: a multi-agent system designed to transform raw community signals into reliable technical guidance by automating the path from discovery to expert creation. 

In the first three parts of this series, we laid the essential groundwork by establishing its core capabilities and local verification process:

  • In part 1, we standardized the agent's capabilities through the Model Context Protocol (MCP), connecting it to Reddit for trend discovery and Google Cloud Docs for technical grounding.

  • In part 2, we built a multi-agent architecture and integrated the Vertex AI memory bank to allow the system to learn and persist user preferences across different conversations.

  • In part 3, we verified the full end-to-end lifecycle locally using a dedicated test runner to ensure that research, content creation, and cloud-based memory retrieval were perfectly synchronized.

If you’d like to dive straight into the code, you can clone the repository here.

Deployment to Cloud Run and the Path to Production

To help you transition from this local prototype to a production service, this final part focuses on building the production backbone of your agent using the foundational deployment patterns provided by the Agent Starter Pack. We will implement the essential structural components required for monitoring, data integrity, and long-term state management in the cloud. You will learn to implement the application server and helper utilities needed for a production-ready deployment before provisioning secure, reproducible infrastructure with Terraform.

While the Dockerfile packages your agent's code and its specialized dependencies, such as Node.js for the Reddit MCP tool, Terraform is used to build the platform it lives on. Terraform automates the creation of your Artifact Registry, least-privilege service accounts, and Secret Manager integrations to ensure your API keys remain protected.

By the end of this part, you will have a standardized application framework deployed on Google Cloud Run and a roadmap for graduating your prototype through continuous evaluation, CI/CD and advanced observability.

Production Utilities and Server: Building the System's Body

In this section, you implement the structural components required for monitoring and long-term state management in the cloud.

  • The Application Server: Initializing the FastAPI server and establishing a vital connection to the Vertex AI memory bank.

  • Implementing Telemetry: Enabling 'Agent Traces' for visibility into internal reasoning.

The Application Server 

The fast_api_app.py file serves as the vital entry point for your agent, transforming the core logic into a production FastAPI server that acts as the "body" of your system. When deploying to Cloud Run, this server is essential because it provides the necessary web interface to listen for incoming HTTP requests and dispatch them to the agent for processing. Beyond basic serving, its most critical role is establishing a connection to the Vertex AI memory bank by defining a MEMORY_URI, which allows the ADK framework to persist and retrieve user preferences across different production sessions. Additionally, the application server initializes production-grade telemetry for real-time monitoring.

Go back to the dev_signal_agent folder.

Paste the following code in dev_signal_agent/fast_api_app.py: 
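As an illustrative sketch, the entry point might look like the following. It assumes the ADK's `get_fast_api_app` helper and a placeholder Agent Engine ID for the memory bank URI; parameter names follow recent ADK releases and may differ in yours:

```python
import os

from google.adk.cli.fast_api import get_fast_api_app

# The agent packages live alongside this file.
AGENT_DIR = os.path.dirname(os.path.abspath(__file__))

# Placeholder: substitute the Agent Engine ID of your Vertex AI memory bank.
# This URI lets the ADK persist and retrieve preferences across sessions.
MEMORY_URI = "agentengine://YOUR_AGENT_ENGINE_ID"
SESSION_URI = "agentengine://YOUR_AGENT_ENGINE_ID"

app = get_fast_api_app(
    agents_dir=AGENT_DIR,
    session_service_uri=SESSION_URI,
    memory_service_uri=MEMORY_URI,
    allow_origins=["*"],   # tighten for production
    web=True,              # serve the ADK dev UI alongside the API
    otel_to_cloud=True,    # export Agent Traces to the Google Cloud Console
)

if __name__ == "__main__":
    import uvicorn

    # Cloud Run injects the PORT environment variable at runtime.
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```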

Implementing Telemetry

In a production environment, visibility into your agent's reasoning is critical. We leverage the built-in observability features of the Google ADK by setting the otel_to_cloud=True flag in our application server. This single parameter handles the majority of the instrumentation automatically, exporting "Agent Traces" directly to the Google Cloud Console. These traces provide a "visual waterfall" of the agent's operation, including individual agent thought processes, LLM invocations, and MCP tool calls.

Monitoring vs. Targeted Evaluation

It is essential to understand that production tracing is subject to sampling to balance performance and cost. Because only a subset of requests is traced, not every individual user interaction will be visible.

  • System Traces (Monitoring): Used to analyze behavior "at large," such as identifying latency bottlenecks or system timeouts.

  • Reasoning Traces (Evaluation): High-quality evaluation mandates targeted trace capture. This means calling the agent specifically for a test case where you know you will evaluate that particular request in full detail.

Viewing the Trace

To see your traces, navigate to the Trace Explorer in the Google Cloud Console and filter for your service (e.g., dev-signal). Clicking a specific Trace ID opens a Gantt chart that allows you to distinguish between cognitive reasoning failures (wrong decisions) and physical system issues (timeouts).

For advanced configurations, refer to the ADK observability and Cloud Trace documentation.

Infrastructure as Code: Provisioning Secure Cloud Resources

We utilize the infrastructure-as-code patterns provided by the Agent Starter Pack's security-first design. The starter pack builds the professional platform required to automate the creation of least-privilege service accounts and robust secret management in seconds.

Using Terraform ensures that your entire Google Cloud environment - from IAM roles to Secret Manager versions - is defined in reproducible, secure code. We break our infrastructure into the following logical blocks:

  • Resources & Variables: Define the specific project, region, and sensitive API secrets used by the agent.

  • Core Infrastructure: Enable essential APIs and provision a private Artifact Registry to host your agent's container images.

  • Identity & Access Management (IAM): Configure specialized Service Accounts that strictly follow the Principle of Least Privilege to ensure your system remains secure.

  • Secret Management: Securely ingest API credentials into Google Secret Manager for protected runtime access.

  • Cloud Run Configuration: Define the container environment, resource limits, and automated secret injection for the final deployment.

To begin provisioning, return to the root folder of your project (dev-signal) and create the necessary deployment directories:
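Assuming the project layout used throughout this series, a single command creates the Terraform directory tree:

```shell
mkdir -p deployment/terraform
```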

Terraform Resources and Variables

The variables.tf file defines the configurable parameters for your deployment, allowing you to customize the infrastructure without altering the underlying logic. It includes variables for the project_id, the deployment region (defaulting to us-central1), and the service_name for your Cloud Run instance. Furthermore, it defines a secrets map used to securely ingest sensitive API credentials—such as Reddit and Developer Knowledge keys—into Google Secret Manager for runtime access. This modular approach ensures your production environment remains reproducible, secure, and adaptable across different projects.

Paste the following code into deployment/terraform/variables.tf:
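A sketch of what variables.tf can look like. The variable names follow the description above; the defaults and the shape of the secrets map are placeholders to adapt to your project:

```hcl
variable "project_id" {
  description = "The Google Cloud project ID to deploy into."
  type        = string
}

variable "region" {
  description = "Deployment region for Cloud Run and Artifact Registry."
  type        = string
  default     = "us-central1"
}

variable "service_name" {
  description = "Name of the Cloud Run service."
  type        = string
  default     = "dev-signal"
}

# Map of secret IDs to values, ingested into Secret Manager at apply time.
variable "secrets" {
  description = "API credentials to store in Secret Manager."
  type        = map(string)
  sensitive   = true
  default     = {}
}
```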

Core Infrastructure Logic 

We define our infrastructure in logical blocks. Here is what each part does:

1. Enable APIs: Ensures the project has the necessary services active (Cloud Run, Vertex AI, etc.). We set disable_on_destroy = false so these APIs stay enabled even if the Terraform stack is destroyed, avoiding accidental service disruption.

Paste the following code into deployment/terraform/main.tf:
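A possible shape for this first block, assuming the var.project_id variable from variables.tf; the exact API list is illustrative and depends on your setup:

```hcl
# 1. Enable required APIs. disable_on_destroy = false keeps the services
# active even if this Terraform stack is destroyed.
resource "google_project_service" "apis" {
  for_each = toset([
    "run.googleapis.com",
    "aiplatform.googleapis.com",
    "artifactregistry.googleapis.com",
    "secretmanager.googleapis.com",
    "cloudbuild.googleapis.com",
  ])

  project            = var.project_id
  service            = each.key
  disable_on_destroy = false
}
```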

2. Artifact Registry: Creates a private Docker registry to store our agent's container images.
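Continuing main.tf, a minimal registry definition might look like this (the repository ID reuses the service name, which is a choice rather than a requirement):

```hcl
# 2. Private Docker registry for the agent's container images.
resource "google_artifact_registry_repository" "repo" {
  project       = var.project_id
  location      = var.region
  repository_id = var.service_name
  format        = "DOCKER"

  depends_on = [google_project_service.apis]
}
```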

3. Service Account & IAM: This is a critical security step. In accordance with the Principle of Least Privilege, we avoid using the default compute service account and instead provision a dedicated user-managed service account (dev-signal-sa). By designating this as the Cloud Run service identity, we can grant it only the minimum necessary permissions: roles/aiplatform.user, roles/logging.logWriter, and roles/storage.objectAdmin. This granular access control gives the agent exactly the permissions required to interact with Vertex AI and Cloud Storage without over-granting access to other sensitive cloud resources, significantly reducing the impact of a compromised account. Learn more about best practices for using service accounts securely.
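A sketch of the corresponding main.tf block, using the account ID and roles named above:

```hcl
# 3. Dedicated least-privilege service account for the Cloud Run service.
resource "google_service_account" "dev_signal_sa" {
  project      = var.project_id
  account_id   = "dev-signal-sa"
  display_name = "Dev Signal Cloud Run service account"
}

# Grant only the roles the agent actually needs.
resource "google_project_iam_member" "sa_roles" {
  for_each = toset([
    "roles/aiplatform.user",
    "roles/logging.logWriter",
    "roles/storage.objectAdmin",
  ])

  project = var.project_id
  role    = each.key
  member  = "serviceAccount:${google_service_account.dev_signal_sa.email}"
}
```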

4. Secret Management: This handles your API keys securely. It creates secrets in Google Secret Manager and gives the agent's Service Account permission to access them at runtime.
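One way to express this in main.tf, iterating over the secrets map from variables.tf and granting read access to the dev-signal-sa account described in step 3:

```hcl
# 4. Create one Secret Manager secret per entry in var.secrets and
# allow the agent's service account to read them at runtime.
resource "google_secret_manager_secret" "secrets" {
  for_each  = var.secrets
  project   = var.project_id
  secret_id = each.key

  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "versions" {
  for_each    = var.secrets
  secret      = google_secret_manager_secret.secrets[each.key].id
  secret_data = each.value
}

resource "google_secret_manager_secret_iam_member" "access" {
  for_each  = var.secrets
  secret_id = google_secret_manager_secret.secrets[each.key].id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:dev-signal-sa@${var.project_id}.iam.gserviceaccount.com"
}
```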

5. Cloud Run Configuration: Defines the container environment, resource limits, service identity, and secret access for the final deployment.

Security Best Practice: To satisfy production security standards, our main.tf grants the Service Account the secretmanager.secretAccessor role. Our Python application then uses the Secret Manager SDK to pull these credentials directly into local memory at runtime, ensuring they never touch the container's environment configuration.
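An illustrative Cloud Run block to close out main.tf. The image path assumes the Artifact Registry repository from step 2, and the CPU and memory limits are placeholders:

```hcl
# 5. The Cloud Run service itself, running as the dedicated service account.
resource "google_cloud_run_v2_service" "dev_signal" {
  name     = var.service_name
  project  = var.project_id
  location = var.region

  template {
    service_account = "dev-signal-sa@${var.project_id}.iam.gserviceaccount.com"

    containers {
      # Pushed by the Cloud Build pipeline in the next section.
      image = "${var.region}-docker.pkg.dev/${var.project_id}/${var.service_name}/${var.service_name}:latest"

      resources {
        limits = {
          cpu    = "2"
          memory = "2Gi"
        }
      }

      env {
        name  = "GOOGLE_CLOUD_PROJECT"
        value = var.project_id
      }
    }
  }
}
```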

Provision the Infrastructure

Before we can deploy our code, we need to provision the Google Cloud infrastructure we just defined.

Initialize Terraform: This downloads the necessary provider plugins. Run this in the deployment/terraform folder:
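```shell
cd deployment/terraform
terraform init
```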

Create a Variables File

Paste this code into deployment/terraform/terraform.tfvars and update it with your project details and secrets.
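A template to start from; every value below is a placeholder, and the secret key names should match whatever credential names your agent reads:

```hcl
project_id   = "your-project-id"
region       = "us-central1"
service_name = "dev-signal"

# Illustrative keys; replace with the Reddit and Developer Knowledge
# credentials your agent expects.
secrets = {
  "reddit-client-id"     = "YOUR_REDDIT_CLIENT_ID"
  "reddit-client-secret" = "YOUR_REDDIT_CLIENT_SECRET"
}
```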

Plan configuration: This allows you to review the changes before they are applied. Run this in the deployment/terraform folder:
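```shell
terraform plan
```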

Apply Configuration: Once you have reviewed the plan and confirmed it does what you want, run:
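```shell
terraform apply
```

Terraform shows the plan once more and asks for confirmation before creating any resources.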

Deployment: Containerization and the Cloud Build Pipeline

In this final stage of the build process, we package our agent's "body" and "brain" into a portable, production-ready container. This ensures that every component - from our Python logic to the Node.js environment required for the Reddit MCP tool - is bundled together with its exact dependencies.

We utilize a Dockerfile to define this environment and a Makefile to orchestrate the deployment pipeline. When you trigger the deployment, Google Cloud Build takes your local source code, builds the container image according to the Dockerfile, and stores it in the private Artifact Registry created earlier by Terraform. Finally, the pipeline automatically updates your Cloud Run service to serve traffic using this fresh image, completing the journey from local code to a live, secure cloud workload.

Paste this code in dev-signal/Dockerfile:
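A sketch of such a Dockerfile. The base image, the requirements.txt dependency file, and the launch command are assumptions to adapt to your repository layout; the Node.js install is there because the Reddit MCP server runs on Node:

```dockerfile
FROM python:3.12-slim

# The Reddit MCP tool is launched via Node.js, so install it alongside Python.
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Python dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Cloud Run sends traffic to the port named in $PORT (8080 by default).
EXPOSE 8080
CMD ["sh", "-c", "uvicorn dev_signal_agent.fast_api_app:app --host 0.0.0.0 --port ${PORT:-8080}"]
```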

The Makefile automates the build and deployment steps.

Paste this code in dev-signal/Makefile:
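One possible shape for it; the variable defaults and target names are placeholders, and the image path assumes the Artifact Registry repository provisioned by Terraform:

```makefile
# Illustrative values; adjust to match your terraform.tfvars.
PROJECT_ID   ?= your-project-id
REGION       ?= us-central1
SERVICE_NAME ?= dev-signal
IMAGE        := $(REGION)-docker.pkg.dev/$(PROJECT_ID)/$(SERVICE_NAME)/$(SERVICE_NAME):latest

.PHONY: build deploy

# Build the container with Cloud Build and push it to Artifact Registry.
build:
	gcloud builds submit --project $(PROJECT_ID) --tag $(IMAGE) .

# Point the Cloud Run service at the freshly built image.
deploy: build
	gcloud run deploy $(SERVICE_NAME) \
		--project $(PROJECT_ID) \
		--region $(REGION) \
		--image $(IMAGE)
```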

Deploy Application

Now that our infrastructure is ready, we can build and deploy the application code.

Run the following command from the root of your project:
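Assuming your Makefile defines a deploy target that submits the Cloud Build job and updates the Cloud Run service (the target name here is a placeholder):

```shell
make deploy
```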

What happens when you run this?

  1. Build: Google Cloud Build takes your local code and the Dockerfile, builds a container image, and stores it in the Artifact Registry.

  2. Deploy: It updates the Cloud Run service defined in Terraform to use this new image.

When the deployment completes, you should get a message like this:

Service [dev-signal] revision [dev-signal...] has been deployed and is serving 100 percent of traffic.

Service URL: https://dev-signal-....us-central1.run.app

Verification: Accessing and Testing Your Deployed Agent

Since production services are private by default, this section covers how to grant permissions and access the agent securely.

  • Managing IAM Permissions: Granting the necessary run.invoker role to authorized users.

  • Secure Access via Cloud Run Proxy: Using the gcloud proxy to interact with your live service.

Granting User Permissions

Before you can invoke the service, you must grant your Google account the roles/run.invoker role for this specific service. Run the following command:
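Substituting your own account, region, and service name, the grant looks like this:

```shell
gcloud run services add-iam-policy-binding dev-signal \
  --region=us-central1 \
  --member="user:YOUR_EMAIL@example.com" \
  --role="roles/run.invoker"
```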

Launch the Proxy

Now, access your private service securely via the proxy:
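Assuming the service name and region used earlier, the proxy forwards an authenticated local port to the private service:

```shell
gcloud run services proxy dev-signal --region=us-central1 --port=8080
```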

Visit http://localhost:8080 to chat with your deployed agent! See a possible test scenario in part 3 of the series.

Summary

Congratulations! You have successfully built Dev Signal.

What we covered:

  1. Tooling (MCP): You connected your agent to Reddit, Google Cloud Docs, and a Local Image Generator using the Model Context Protocol.

  2. Architecture: You implemented a Root Orchestrator managing specialized agents (Scanner, Expert, Drafter).

  3. Memory: You integrated Vertex AI memory bank to give your agent long-term persistence across sessions.

  4. Production: You deployed the entire stack to Google Cloud Run using Terraform for secure, reproducible infrastructure.

You now have a solid foundation for building sophisticated, stateful AI applications on Google Cloud.
