GPT-5.4 API Guide 2026: Access OpenAI From Any Country
What This Guide Covers: GPT-5.4 (model ID: gpt-5.4-2026-03-05) became available on March 5, 2026. This guide provides a complete developer reference: every supported API endpoint, exact pricing figures, rate limits across all tiers, the new tools available exclusively with GPT-5.4, and a practical walkthrough for developers who need to access OpenAI's API from countries where it is restricted or throttled. If you've been locked out of OpenAI due to geographic restrictions, this guide has the solution.
GPT-5.4 Model IDs and Versions
OpenAI provides both an alias and a versioned snapshot ID. Use the versioned ID when you need consistent, locked behavior for production systems:
Alias (Latest)
model="gpt-5.4"
Always points to the latest GPT-5.4 snapshot. OpenAI may update this silently. Use for development and experimentation where you want the latest improvements automatically.
Versioned Snapshot (Stable)
model="gpt-5.4-2026-03-05"
Locked to the March 5, 2026 release. Behavior is guaranteed to remain identical. Use for production systems where reproducibility and regression testing matter.
GPT-5.4 Pro Variant
A GPT-5.4 Pro variant is available for high-performance workloads. Model ID: gpt-5.4-pro-2026-03-05. Pro offers higher rate limits, priority processing, and optimizations for sustained enterprise throughput. Pricing is higher — check OpenAI's pricing page for current Pro rates.
Complete Pricing Reference
GPT-5.4 pricing is competitive for its capability tier. Understanding the token economics helps you estimate costs before building:
| Token Type | Standard Price | Batch API Price | Notes |
|---|---|---|---|
| Input tokens | $2.50/1M | $1.25/1M | All modalities: text and image |
| Cached input tokens | $0.25/1M | $0.125/1M | 90% discount — reuse expensive system prompts |
| Output tokens | $15.00/1M | $7.50/1M | Main cost driver for long completions |
| Long context surcharge | 2× input, 1.5× output | Same multiplier | Applied when input > 272K tokens in a session |
| Regional processing | +10% all prices | +10% | Data residency endpoints (EU, Japan, etc.) |
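To get a feel for these numbers, the table can be folded into a small cost estimator. This is a sketch using the standard (non-batch) prices above; the helper name is ours, and it deliberately ignores the long-context surcharge and regional uplift:

```python
# Rough per-request cost estimator using the standard GPT-5.4 prices above
# (ignores the long-context surcharge and regional processing uplift).
PRICE_PER_M = {"input": 2.50, "cached_input": 0.25, "output": 15.00}

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated USD cost for one request."""
    uncached = input_tokens - cached_tokens
    cost = (
        uncached / 1_000_000 * PRICE_PER_M["input"]
        + cached_tokens / 1_000_000 * PRICE_PER_M["cached_input"]
        + output_tokens / 1_000_000 * PRICE_PER_M["output"]
    )
    return round(cost, 6)

# 100K input tokens (half cached) plus 10K output tokens:
print(estimate_cost(100_000, 10_000, cached_tokens=50_000))  # prints 0.2875
```

Note how caching dominates the savings on large prompts: the same request with no cached tokens would cost $0.40 instead of about $0.29.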
All Supported API Endpoints
GPT-5.4 supports a comprehensive set of endpoints. Notably, it is the first GPT model to support computer use via the Responses API:
| Endpoint | Path | GPT-5.4 | Key Use Case |
|---|---|---|---|
| Responses API | v1/responses | ✅ Primary | Agentic workflows, computer use, multi-turn with tools |
| Chat Completions | v1/chat/completions | ✅ | Standard Q&A, code generation, text tasks |
| Realtime | v1/realtime | ✅ | Voice agents, real-time audio transcription |
| Assistants | v1/assistants | ✅ | Persistent threads with tools and file access |
| Batch | v1/batch | ✅ (50% off) | Async bulk processing — best cost efficiency |
| Fine-tuning | v1/fine-tuning | ❌ Not supported | Use GPT-5 mini for fine-tuning use cases |
| Images | v1/images/generations | ✅ | Image generation via DALL-E integration |
| Videos | v1/videos | ✅ New | Video generation (Sora integration) |
New Tools Exclusive to GPT-5.4
GPT-5.4 introduces several new tools available through the Responses API that were not available in previous models:
computer_use
The flagship new tool. Enables the model to receive screenshots and output click/type/scroll/key actions. Tool call: {"type": "computer_use"}
mcp (Model Context Protocol)
Native MCP support lets GPT-5.4 connect directly to MCP servers — databases, APIs, file systems — without custom integration code.
hosted_shell
Execute shell commands in a sandboxed environment managed by OpenAI. The model can run code, install packages, and process data without you managing a separate execution environment.
skills
Install pre-built capability packages that extend the model's abilities for specific domains (accounting workflows, legal document analysis, medical coding, etc.).
# Using Multiple Tools in a Single GPT-5.4 Response
response = client.responses.create(
    model="gpt-5.4-2026-03-05",
    tools=[
        {"type": "computer_use"},
        {"type": "web_search"},
        {"type": "code_interpreter"},
        {"type": "file_search", "vector_store_ids": ["vs_abc123"]},
    ],
    reasoning={"effort": "high"},
    input=[{"role": "user", "content": "Research competitors, write a comparison, and save to a spreadsheet"}],
)
Rate Limits by Usage Tier
Rate limits automatically increase as you spend more. Here are the limits for the long context variant (which most GPT-5.4 use cases require):
| Tier | RPM | TPM | Batch Queue | Unlock Requirement |
|---|---|---|---|---|
| Free | Not supported | N/A | N/A | GPT-5.4 requires paid API |
| Tier 1 | 500 | 500K | 1.5M | $5 spend + 7 days |
| Tier 2 | 5,000 | 1M | 3M | $50 spend + 7 days |
| Tier 3 | 5,000 | 2M | 100M | $100 spend + 7 days |
| Tier 4 | 10,000 | 4M | 200M | $250 spend + 14 days |
| Tier 5 | 15,000 | 40M | 15B | $1,000 spend + 30 days |
Getting Started: Full Code Examples
1. Basic Chat Completion (Python)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GPT-5.4's computer use capability in simple terms."},
    ],
    reasoning={"effort": "medium"},  # none, low, medium, high, xhigh
)
print(response.choices[0].message.content)
2. Streaming Response (for Real-Time UI)
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Write a 1000-word essay on AI agents."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. the final usage chunk) carry no delta content
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
3. Batch Processing (50% Cost Reduction)
# Submit a previously uploaded JSONL file of requests as a batch
batch = client.batches.create(
    input_file_id="file-abc123",  # JSONL with one request per line
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
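The snippet above assumes the JSONL file is already uploaded. A sketch of the full round trip follows: building the JSONL, uploading it, and polling for results. The request shape matches OpenAI's batch format; the helper names and file paths are our illustration.

```python
import json
import time

def build_batch_jsonl(prompts: list[str], path: str = "batch_input.jsonl") -> str:
    """Write one /v1/chat/completions request per prompt in batch JSONL format."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"req-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-5.4-2026-03-05",
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")
    return path

def run_batch(prompts: list[str]) -> None:
    # Imported here so the JSONL helper stays dependency-free
    from openai import OpenAI

    client = OpenAI()
    uploaded = client.files.create(file=open(build_batch_jsonl(prompts), "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    # Batches can take up to 24h, so poll gently
    while batch.status not in ("completed", "failed", "expired", "cancelled"):
        time.sleep(60)
        batch = client.batches.retrieve(batch.id)
    if batch.status == "completed":
        print(client.files.content(batch.output_file_id).text)
```

Results come back as a JSONL file keyed by `custom_id`, so make those IDs meaningful for your pipeline.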
Accessing GPT-5.4 API from Restricted Countries
OpenAI's API is currently unavailable or severely throttled in a number of countries and territories.
Even in countries where OpenAI is technically available, some ISPs throttle HTTPS traffic to api.openai.com, causing timeout errors at the API level. This manifests as random ReadTimeoutError or ConnectionResetError exceptions in your code, especially during long-running streaming or computer use sessions.
API Connection Stability Test
import time

import openai

client = openai.OpenAI()
for i in range(5):
    start = time.time()
    client.models.list()
    print(f"Ping {i+1}: {(time.time()-start)*1000:.0f}ms")
Run this test before and after connecting to VPN07. Target: under 200ms average, zero failures over 5 pings. If you see timeouts or >500ms without VPN, your ISP is throttling OpenAI API traffic.
Solution: VPN Proxy for API Calls
Route your API traffic through a VPN server in a supported country. The openai Python SDK respects standard HTTP proxy environment variables:
# Set in your shell environment
export HTTPS_PROXY="http://127.0.0.1:7890" # VPN07 local proxy port
export HTTP_PROXY="http://127.0.0.1:7890"
# Or pass a proxied HTTP client to the SDK directly
import httpx
import openai

client = openai.OpenAI(http_client=httpx.Client(proxy="http://127.0.0.1:7890"))
Best Practices for Production GPT-5.4 Integration
Use Prompt Caching for Repeated System Prompts
If your system prompt is large and consistent across requests (common in agentic setups), cached input tokens cost 90% less ($0.25 vs $2.50 per 1M). Structure your prompts so the large static context comes first — OpenAI automatically caches the longest matching prefix.
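To benefit from the cache, keep the byte-for-byte identical portion at the front of every request and append the variable content last. A minimal sketch of the pattern; the helper and the placeholder system prompt are our illustration:

```python
# Keep the large, static context first so OpenAI's prefix cache can match it.
STATIC_SYSTEM_PROMPT = (
    "You are a contract-review assistant. "  # imagine several thousand tokens here
    "Follow the firm's style guide and cite clause numbers in every answer."
)

def build_messages(user_query: str) -> list[dict]:
    """Static context first, variable content last, so the cached prefix matches."""
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

# After a call, the usage object reports how much of the prompt was cached:
# response = client.chat.completions.create(model="gpt-5.4", messages=build_messages(q))
# print(response.usage.prompt_tokens_details.cached_tokens)
```

Even small edits to the static prefix invalidate the cached portion, so version your system prompts deliberately.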
Use Batch API for Non-Urgent High-Volume Work
The Batch API processes requests asynchronously within 24 hours at 50% discount. For tasks like bulk document analysis, content generation for a product catalog, or data extraction from thousands of files, batch mode cuts your costs in half with no quality difference.
Implement Exponential Backoff for Rate Limit Errors
GPT-5.4 will occasionally return 429 rate limit errors at lower tiers during high-traffic periods. Always implement retry logic with exponential backoff: wait 1s, then 2s, then 4s, then 8s before successive retry attempts. The openai Python SDK also ships with built-in retries, configurable via the max_retries client option.
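The 1s/2s/4s/8s schedule above can be sketched as a small retry helper. The function names are ours; in real code you would pass openai.RateLimitError (and similar transient errors) as the retryable exceptions rather than a generic tuple:

```python
import time

def backoff_delays(retries: int, base: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]

def with_backoff(call, retries: int = 4, base: float = 1.0, retryable=(Exception,)):
    """Run call(); on a retryable error, sleep and retry with doubling waits."""
    for delay in backoff_delays(retries, base):
        try:
            return call()
        except retryable:
            time.sleep(delay)
    return call()  # final attempt: let any remaining error propagate
```

Usage with the SDK would look like `with_backoff(lambda: client.chat.completions.create(...), retryable=(openai.RateLimitError,))`; for simple cases the SDK's own max_retries option is enough.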
Pin Model Version for Production
Always use gpt-5.4-2026-03-05 (not just gpt-5.4) in production. When OpenAI releases GPT-5.5 or updates the alias, your system prompt, few-shot examples, and expected output format may need recalibration. Pinning prevents surprise behavioral changes after alias updates.
Pre-Launch API Integration Checklist
Before shipping a GPT-5.4 integration to production, run through this checklist to ensure reliability and cost predictability:
API key stored in environment variable
Never hardcode API keys in source code. Use env vars or secrets managers.
Max tokens limit set per request
Use max_completion_tokens to cap output length and prevent runaway generation costs.
Retry logic with exponential backoff
Handle 429 (rate limit) and 503 (server busy) errors gracefully with increasing wait times.
Spend limit configured in OpenAI dashboard
Set both a monthly soft limit (alert) and hard limit (stop) to prevent unexpected billing surprises.
Model ID pinned to versioned snapshot
Use gpt-5.4-2026-03-05 not gpt-5.4 for production stability.
Token usage logged per request
Track input/output/cached token counts for cost analysis and per-user billing if applicable.
Network connectivity tested from deployment region
Verify that your production server can reach api.openai.com without throttling. Use VPN07 if deploying in Asia.
Content moderation outputs handled
GPT-5.4 may refuse certain requests. Your application must gracefully handle finish_reason: "content_filter" responses.
Frequently Asked Questions
Q: Is there a free tier for GPT-5.4?
No. GPT-5.4 is not available on the free tier. You need a paid OpenAI API account and at least $5 in prior spend to unlock Tier 1 access. GPT-4o mini and GPT-5 mini do have free tier access for basic experimentation, but GPT-5.4 requires billing setup. Once you're on a paid plan, the minimum spend to start using GPT-5.4 is low: even a $10 prepaid credit covers hundreds of typical completions.
Q: Can I use GPT-5.4 API if I'm not in a supported country?
OpenAI's terms of service require usage from supported countries. Many developers work around geographic restrictions for legitimate development purposes by routing API calls through VPN servers located in the US, UK, or EU. VPN07 provides 1000Mbps connections to 70+ countries, including optimal routing to US and EU OpenAI data centers for developers in Asia and other regions.
Q: What is the knowledge cutoff for GPT-5.4?
GPT-5.4's training data has a knowledge cutoff of August 31, 2025. For current information beyond that date, use the built-in web_search tool in the Responses API, which retrieves live search results. This is particularly important for news, recent AI model releases, market data, and any rapidly evolving topic areas.
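A live-information request via the Responses API might look like the following sketch. The tool name matches the one used elsewhere in this guide, and the query is illustrative:

```python
# Responses API request with the web_search tool enabled for post-cutoff facts.
request = {
    "model": "gpt-5.4-2026-03-05",
    "tools": [{"type": "web_search"}],
    "input": "Summarize this week's AI model releases.",
}

# To execute:
# from openai import OpenAI
# response = OpenAI().responses.create(**request)
# print(response.output_text)
```

Reserve the tool for questions that genuinely need current data; for anything inside the training window it only adds latency and cost.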
Reasoning Effort: A New GPT-5.4 Feature
GPT-5.4 introduces a reasoning.effort parameter that controls how much internal computation the model spends before answering. This is a major tool for balancing cost and quality:
| Effort Level | Speed | Quality | Token Cost | Best For |
|---|---|---|---|---|
| none | Fastest | Standard | 1× | Simple Q&A, summaries, translations |
| low | Fast | Good | 1.2× | Coding, classification, extraction tasks |
| medium | Moderate | High | 1.5× | Analysis, document review, planning |
| high | Slow | Very High | 2× | Math, complex reasoning, code debugging |
| xhigh | Slowest | Maximum | 3–4× | Research synthesis, advanced agentic tasks |
# Setting Reasoning Effort by Task Type
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": prompt}],
    reasoning={"effort": "high"},  # use for complex problems
)
Cost Optimization with Reasoning Effort
Default (none) reasoning is sufficient for 70% of typical tasks. Using reasoning={"effort":"none"} on simple tasks and reserving high or xhigh for genuinely complex reasoning tasks can reduce your total API spend by 40–60% without sacrificing quality where it matters.
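One way to encode the "reserve high effort for genuinely hard tasks" rule is a small dispatch table. The task categories and helper are our illustration, mirroring the effort table above:

```python
# Map task categories to reasoning effort, per the effort table above.
EFFORT_BY_TASK = {
    "summary": "none",
    "translation": "none",
    "classification": "low",
    "coding": "low",
    "analysis": "medium",
    "math": "high",
    "debugging": "high",
    "research": "xhigh",
}

def choose_effort(task_type: str) -> str:
    """Default to 'none', which is sufficient for most simple tasks."""
    return EFFORT_BY_TASK.get(task_type, "none")

# response = client.chat.completions.create(
#     model="gpt-5.4",
#     messages=[{"role": "user", "content": prompt}],
#     reasoning={"effort": choose_effort("coding")},
# )
```

Routing effort this way, rather than hardcoding one level everywhere, is where the 40–60% savings claimed above actually come from.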
Complete JavaScript / Node.js Example
For developers building web applications and backend services with Node.js, here's the complete integration pattern:
// Node.js / TypeScript integration for GPT-5.4
import OpenAI from "openai";
import { HttpsProxyAgent } from "https-proxy-agent"; // npm install https-proxy-agent

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function callGPT54(userMessage: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-5.4-2026-03-05",
    messages: [{ role: "user", content: userMessage }],
    reasoning: { effort: "medium" },
    max_completion_tokens: 4096,
  });
  return response.choices[0].message.content ?? "";
}

// With automatic retry via a VPN-routed HTTP client
const proxyClient = new OpenAI({
  httpAgent: new HttpsProxyAgent("http://127.0.0.1:7890"), // VPN07 proxy
  maxRetries: 3,
});
VPN07 — Unlock GPT-5.4 API Globally
1000Mbps · 70+ Countries · Trusted Since 2015
Calling the GPT-5.4 API from a restricted region? API timeouts ruining your agent workflows? VPN07 provides 1000Mbps bandwidth through 70+ countries, with servers optimized for low-latency connections to OpenAI's US and EU endpoints. Our system has been running continuously for over 10 years, providing the reliability your production API integrations require. At $1.50/month with a 30-day money-back guarantee, VPN07 is the most cost-effective way to ensure uninterrupted GPT-5.4 access anywhere in the world.
Related Articles
GPT-5.4 vs DeepSeek R1 vs Qwen 3.5: Best AI 2026
Full benchmark comparison: GPT-5.4 vs the best free open-source models. Benchmarks, cost analysis, and which model wins for your use case.
Read More →
GPT-5.4 1M Context 2026: Complete Workflow Guide
How to actually use GPT-5.4's 1 million token context window. Documents, codebases, legal contracts — complete practical guide with examples.
Read More →