Deploy OpenAI Apps to GCP Vertex AI

Let's assume you have an application that uses an OpenAI client library and you want to deploy it to the cloud using GCP Vertex AI.

This tutorial shows you how Defang makes it easy.

info

You must configure model access in your GCP account for each Vertex AI model you intend to use.

Suppose you start with a compose.yaml file with one app service, like this:

services:
  app:
    build:
      context: .
    ports:
      - 3000:3000
    environment:
      OPENAI_API_KEY:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/"]

Add an LLM Service to Your Compose File

You can use Vertex AI without changing your app code by introducing a new defangio/openai-access-gateway service; we'll call it llm. This service acts as a proxy between your application and Vertex AI, transparently converting your OpenAI requests into Vertex AI requests and Vertex AI responses back into OpenAI responses. This lets you keep using your existing OpenAI client SDK.

+  llm:
+    image: defangio/openai-access-gateway
+    x-defang-llm: true
+    ports:
+      - target: 80
+        published: 80
+        mode: host
+    environment:
+      - OPENAI_API_KEY
+      - GCP_PROJECT_ID
+      - REGION

Notes:

  • The container image is based on aws-samples/bedrock-access-gateway, with enhancements.
  • x-defang-llm: true signals to Defang that this service should be configured to use the target platform's AI services (Vertex AI on GCP).
  • New environment variables (you can set both with the Defang CLI, as shown below):
    • REGION is the GCP region where the service runs (e.g. us-central1)
    • GCP_PROJECT_ID is the ID of the GCP project to deploy to (e.g. my-project-456789)
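
Since both variables are listed without values in the compose file, Defang reads them from its config store. A minimal sketch of setting them with the Defang CLI (the values are examples; check defang config set --help for the exact syntax):

defang config set GCP_PROJECT_ID=my-project-456789
defang config set REGION=us-central1
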
tip

OpenAI Key

You no longer need your original OpenAI API key.
We recommend generating a random secret that your app will use to authenticate with the gateway:

defang config set OPENAI_API_KEY --random

Redirect Application Traffic

Modify your app service to send API calls to the openai-access-gateway:

 services:
   app:
     ports:
       - 3000:3000
     environment:
       OPENAI_API_KEY:
+      OPENAI_BASE_URL: "http://llm/api/v1"
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost:3000/"]

Now all OpenAI traffic will be routed through your gateway service and on to GCP Vertex AI.
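
To verify the wiring, a quick smoke test from inside the app container could call the gateway directly over the standard OpenAI chat-completions path (the model name here is an example):

curl http://llm/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-2.5-pro-preview-03-25", "messages": [{"role": "user", "content": "ping"}]}'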


Selecting a Model

You should configure your application to specify the model you want to use.

 services:
   app:
     ports:
       - 3000:3000
     environment:
       OPENAI_API_KEY:
       OPENAI_BASE_URL: "http://llm/api/v1"
+      MODEL: "google/gemini-2.5-pro-preview-03-25" # for Vertex AI
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost:3000/"]

Choose the correct MODEL depending on which cloud provider you are using.

Ensure you have the necessary permissions to access the model you intend to use. For this tutorial, that means checking your GCP Vertex AI model access (or your AWS Bedrock model access if you deploy to AWS instead).

info

Choosing the Right Model

Alternatively, Defang supports model mapping through the openai-access-gateway. It takes a model name in the Docker naming convention (e.g. ai/llama3.3) and maps it to the closest matching model on the target platform. If no such match can be found, it can fall back to a known existing model (e.g. ai/mistral). Mapping and fallback are controlled by the environment variables USE_MODEL_MAPPING (defaults to true) and FALLBACK_MODEL (no default), respectively, as shown in the sketch below.
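
For example, an llm service configured with an explicit fallback might look like this (the values are illustrative):

  llm:
    image: defangio/openai-access-gateway
    x-defang-llm: true
    environment:
      - OPENAI_API_KEY
      - GCP_PROJECT_ID
      - REGION
      - USE_MODEL_MAPPING=true    # default; maps Docker model names to the closest Vertex AI model
      - FALLBACK_MODEL=ai/mistral # used when no close match can be found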

Complete Example Compose File

services:
  app:
    build:
      context: .
    ports:
      - 3000:3000
    environment:
      OPENAI_API_KEY:
      OPENAI_BASE_URL: "http://llm/api/v1"
      MODEL: "google/gemini-2.5-pro-preview-03-25"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/"]

  llm:
    image: defangio/openai-access-gateway
    x-defang-llm: true
    ports:
      - target: 80
        published: 80
        mode: host
    environment:
      - OPENAI_API_KEY
      - GCP_PROJECT_ID
      - REGION

Environment Variable Matrix

Variable         GCP Vertex AI
GCP_PROJECT_ID   Required
REGION           Required
MODEL            Vertex AI model ID or Docker model name, e.g. publishers/meta/models/llama-3.3-70b-instruct-maas or ai/llama3.3

You now have a single app that can:

  • Talk to GCP Vertex AI
  • Use the same OpenAI-compatible client code
  • Easily switch between models or cloud providers by changing a few environment variables
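
With the compose file complete, deployment is a single step. A sketch assuming the Defang CLI is installed and authenticated, and that your GCP credentials are configured (see the Defang docs for provider setup):

defang compose up --provider=gcp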