Qwen3.5-Plus API Tutorial: Build AI Agents with OpenAI SDK
Quick Summary: Qwen3.5-Plus is Alibaba's recommended production API model as of February 2026, accessible via Alibaba Cloud ModelStudio with a fully OpenAI-compatible endpoint. This tutorial covers the complete workflow: getting an API key, making your first call, using tool calling for AI agents, processing images and video, streaming responses, and building a practical autonomous research agent — all using the standard OpenAI Python SDK.
What Is Qwen3.5-Plus and Why Build With It?
Qwen3.5-Plus is Alibaba Cloud's production-grade API offering from the Qwen3.5 model family. Released February 16, 2026 alongside the open-weight model releases, Qwen3.5-Plus is specifically optimized for API deployment with fast inference, consistent output quality, and robust tool calling capabilities for agent applications.
Unlike running Qwen3.5 models locally, calling Qwen3.5-Plus through the API means no hardware investment, no model management, and consistent performance regardless of your local machine's specs. And unlike OpenAI's GPT-5 API, Qwen3.5-Plus is significantly more affordable — starting at $0.10 per million input tokens and $0.30 per million output tokens via the DashScope/ModelStudio platform.
Qwen3.5-Plus Capabilities
- Text generation, reasoning, and summarization
- Function/tool calling for AI agent workflows
- Image and video understanding (multimodal)
- GUI interaction and web automation
- Code generation and execution planning
- 201-language multilingual support
API Pricing vs OpenAI GPT-5
Qwen3.5-Plus is approximately 25x cheaper than GPT-5 for the same task.
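The headline ratio follows directly from the input-token rates quoted in this article ($0.10/M for Qwen3.5-Plus vs. roughly $2.50/M for GPT-5). A quick back-of-the-envelope check for a 10M-input-token workload:

```python
# Input-side cost for 10 million tokens, using the per-million rates in this article.
qwen_input_cost = 10 * 0.10   # $1.00 at $0.10 per million tokens
gpt5_input_cost = 10 * 2.50   # $25.00 at ~$2.50 per million tokens

ratio = gpt5_input_cost / qwen_input_cost
print(f"GPT-5 costs {ratio:.0f}x more on input tokens")  # prints "GPT-5 costs 25x more on input tokens"
```

Output tokens are priced separately ($0.30/M for Qwen3.5-Plus), so the effective ratio for your workload depends on your input/output mix.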
Step 1: Get Your Alibaba Cloud API Key
Qwen3.5-Plus is accessed through Alibaba Cloud's ModelStudio (DashScope) platform. Getting set up takes about 5 minutes:
Register on Alibaba Cloud
Visit dashscope.aliyuncs.com or modelstudio.aliyun.com. Register with your email or phone number. You can also sign up at qwen.ai and authenticate via Qwen OAuth — this immediately gives you access to Qwen3.5-Plus without requiring separate Alibaba Cloud account setup.
Navigate to API Keys Section
In the DashScope console: click your avatar → API Key Management. Click Create API Key and give it a name (e.g., "qwen-agent-dev"). Copy the generated key immediately — it won't be shown again in full after closing the dialog.
Set Environment Variable
# Linux / macOS
export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxx"
# Windows PowerShell
$env:DASHSCOPE_API_KEY = "sk-xxxxxxxxxxxxxxxx"
Accessing Alibaba Cloud from Outside China
Alibaba Cloud's API endpoints are globally accessible from most countries. However, users in some regions may experience connectivity issues or need a reliable international connection for consistent API response times. VPN07's 1000Mbps network with servers across 70+ countries ensures stable, low-latency API calls whether you're using DashScope from the US, Europe, Southeast Asia, or anywhere else.
Step 2: First API Call with OpenAI SDK
The Qwen3.5-Plus API is 100% compatible with the OpenAI Python SDK. Just change two parameters — the base URL and API key — and your existing OpenAI code immediately works with Qwen3.5-Plus.
# Install: pip install openai
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant with expertise in AI and technology."
        },
        {
            "role": "user",
            "content": "Summarize the key improvements in Qwen3.5 compared to previous Qwen versions."
        }
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
One-Line Migration from OpenAI
If you have existing OpenAI code, the migration requires only two changes:
# Before (OpenAI):
client = OpenAI(api_key="sk-openai...")
model="gpt-4o"
# After (Qwen3.5-Plus, 25x cheaper):
client = OpenAI(api_key="sk-dashscope...", base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
model="qwen3.5-plus"
Step 3: Tool Calling — The Foundation of AI Agents
Tool calling (function calling) is what transforms Qwen3.5-Plus from a chatbot into an AI agent. By defining functions that the model can invoke, you enable Qwen3.5-Plus to take real-world actions: search the web, query databases, send emails, call external APIs, and more.
# Tool calling example: weather lookup agent
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. Tokyo, New York"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "What's the weather in Tokyo and should I bring an umbrella?"}
]

# First call: model decides which tool to use
response = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Check if model wants to call a tool
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)

    # Execute your actual implementation of get_current_weather
    weather_result = get_current_weather(**function_args)

    # Add the assistant message and the tool result to the conversation
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(weather_result)
    })

    # Final response with tool results
    final_response = client.chat.completions.create(
        model="qwen3.5-plus",
        messages=messages
    )
    print(final_response.choices[0].message.content)
Parallel Tool Calls
Qwen3.5-Plus supports parallel tool calling — in a single response, it can request multiple tool executions simultaneously. For agent workflows that need to fetch data from multiple sources, this dramatically reduces latency. Example: a research agent can simultaneously search Wikipedia, query a news API, and retrieve a stock price in one model call instead of three sequential ones.
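A minimal sketch of the handling loop, extending the single-tool pattern above: when `message.tool_calls` contains several entries, append one `tool` message per call, each echoing the matching `tool_call_id`. The `TOOL_IMPLS` dispatch table and its stub weather implementation are placeholders for your own functions.

```python
import json

# Hypothetical dispatch table mapping tool names to local implementations.
TOOL_IMPLS = {
    "get_current_weather": lambda args: {"temp_c": 18, "condition": "rain"},
}

def handle_tool_calls(message, messages):
    """Append the assistant message, then one 'tool' message per requested call."""
    messages.append(message)
    for tool_call in message.tool_calls:
        impl = TOOL_IMPLS[tool_call.function.name]
        args = json.loads(tool_call.function.arguments)
        result = impl(args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,  # must match the originating call
            "content": json.dumps(result),
        })
    return messages
```

After this, send the updated `messages` back in a second `chat.completions.create` call, exactly as in the single-tool example.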
Step 4: Vision API — Process Images and Video
Qwen3.5-Plus is natively multimodal. You can send images, video frames, and audio alongside text in the same API call. This makes it powerful for visual analysis tasks, document processing, UI testing, and content moderation.
# Vision: analyze an image from URL
response = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/chart.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Analyze this chart and identify the key trends. What business insights can you extract?"
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
Image Analysis
Charts, screenshots, photos, diagrams — Qwen3.5-Plus extracts structured information from common image formats
Document OCR
Scored 90.8 on OmniDocBench — industry-leading document understanding for PDFs, invoices, and contracts
Video Understanding
Process video frames for content moderation, video summarization, and automated quality checks
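One portable way to get video understanding through the OpenAI-compatible endpoint is to sample frames and send them as an ordered list of images (DashScope also documents richer video input types for some models; verify the exact payload shape against the current docs). A small helper, with placeholder frame URLs:

```python
def build_video_messages(frame_urls, prompt):
    """Build an OpenAI-style multimodal message from ordered video frames."""
    content = [{"type": "image_url", "image_url": {"url": u}} for u in frame_urls]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]

# Usage (client configured as in Step 2; URLs are placeholders):
# frames = [f"https://example.com/frames/frame_{i:03d}.jpg" for i in range(1, 4)]
# response = client.chat.completions.create(
#     model="qwen3.5-plus",
#     messages=build_video_messages(frames, "These frames are in order. Summarize what happens."),
# )
```

Telling the model the frames are ordered samples from one video helps it reason about motion and sequence rather than treating them as unrelated images.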
Step 5: Streaming Responses for Real-Time UX
For production applications, streaming is essential — users shouldn't wait for the entire response before seeing output. Qwen3.5-Plus supports streaming via the same OpenAI SDK pattern:
# Streaming response
stream = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[
        {"role": "user", "content": "Write a detailed analysis of MoE architecture in LLMs"}
    ],
    stream=True  # Enable streaming
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # New line after completion
Building a Complete AI Research Agent
Let's put everything together into a practical autonomous research agent that can: search the web, analyze documents, and generate comprehensive reports. This demonstrates the full power of Qwen3.5-Plus for production AI agent use cases.
# Complete research agent with Qwen3.5-Plus
from openai import OpenAI
import json, os

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# Define research tools
research_tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the internet for recent information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "num_results": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "analyze_document",
            "description": "Extract key information from a document URL",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "Document or webpage URL"},
                    "focus": {"type": "string", "description": "What to focus on"}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_report",
            "description": "Write and save a final research report",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "content": {"type": "string"},
                    "filename": {"type": "string"}
                },
                "required": ["title", "content"]
            }
        }
    }
]
def run_agent(task: str) -> str:
    """Run the research agent with a given task."""
    messages = [
        {
            "role": "system",
            "content": """You are an autonomous research agent. Use the available tools
to thoroughly research topics and write comprehensive reports.
Always search multiple sources before drawing conclusions."""
        },
        {"role": "user", "content": task}
    ]

    # Agent loop: continue until task is complete
    max_iterations = 10
    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="qwen3.5-plus",
            messages=messages,
            tools=research_tools,
            tool_choice="auto"
        )
        message = response.choices[0].message

        # Task complete — no more tool calls needed
        if response.choices[0].finish_reason == "stop":
            return message.content

        # Process tool calls
        if message.tool_calls:
            messages.append(message)
            for tool_call in message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)

                # Execute the matching tool (these helpers are your own implementations)
                if func_name == "web_search":
                    result = perform_web_search(func_args["query"])
                elif func_name == "analyze_document":
                    result = analyze_document_url(func_args["url"])
                elif func_name == "write_report":
                    result = save_report(func_args)
                else:
                    result = {"error": f"Unknown tool: {func_name}"}

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })

    return "Agent reached maximum iterations"
# Run the agent
result = run_agent(
    "Research the current state of open-source AI models in 2026, "
    "focusing on Qwen3.5's position in the market, and write a 500-word report."
)
print(result)
Agent Frameworks That Work with Qwen3.5-Plus
Since Qwen3.5-Plus is OpenAI-compatible, it works with all major agent frameworks out of the box:
Advanced Configuration and Best Practices
Temperature and Sampling Settings
For agent tasks requiring precision (code generation, data extraction), use temperature=0.1. For creative tasks (writing, brainstorming), use temperature=0.8-1.0. The top_p parameter additionally controls output diversity. Qwen3.5-Plus also supports enable_thinking=True, which exposes the model's chain-of-thought reasoning before the final answer.
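As a sketch of these recommendations, a small helper can pick sampling settings by task type. Note that enable_thinking is not a standard OpenAI parameter, so with the OpenAI SDK it would typically be passed through extra_body (flag name taken from the guidance above; verify against the current DashScope docs):

```python
def sampling_params(task_type: str) -> dict:
    """Heuristic sampling presets following the guidance above."""
    if task_type in ("code", "extraction"):
        return {"temperature": 0.1, "top_p": 0.9}   # precision
    return {"temperature": 0.9, "top_p": 0.95}      # creativity

# Usage (client configured as in Step 2):
# response = client.chat.completions.create(
#     model="qwen3.5-plus",
#     messages=[{"role": "user", "content": "..."}],
#     extra_body={"enable_thinking": True},  # DashScope pass-through, not a standard OpenAI arg
#     **sampling_params("code"),
# )
```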
Error Handling for Production
from openai import RateLimitError, APITimeoutError
import time

def call_with_retry(client, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            time.sleep(2 ** attempt)  # Exponential backoff
        except APITimeoutError:
            time.sleep(1)
    raise Exception("Max retries exceeded")
Context Management for Long Agent Sessions
Qwen3.5-Plus supports a 256K-token context, but long agent sessions accumulate message history that inflates cost. Implement a sliding window that keeps the system prompt, the last N user/assistant exchanges, and the tool results attached to those exchanges. Summarize older parts of the conversation to preserve context without unbounded token growth.
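A minimal sketch of that sliding window: trim by position, with a rough allowance of three messages per exchange to leave room for tool results (the window size is an assumption you should tune).

```python
def trim_history(messages, keep_exchanges=4):
    """Keep the system prompt plus roughly the last N user/assistant exchanges.

    Positional trimming: ~3 messages per exchange budgets space for tool results.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    tail = rest[-(keep_exchanges * 3):]
    return system + tail
```

In production, also check that the cut never separates an assistant message carrying tool_calls from its matching tool replies, since OpenAI-compatible endpoints typically reject orphaned tool messages.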
Qwen3.5-Plus vs Other LLM APIs: Practical Comparison
Before committing to Qwen3.5-Plus for production agent development, here's how it stacks up against the main alternatives for API-based AI agent work:
| Feature | Qwen3.5-Plus | GPT-5 API | Claude Opus 4.5 |
|---|---|---|---|
| Input Cost | $0.10/M tokens | ~$2.50/M tokens | ~$3.00/M tokens |
| Context Window | 256K tokens | 128K tokens | 200K tokens |
| Tool Calling | ✓ Parallel | ✓ Parallel | ✓ Parallel |
| Vision | ✓ Images + Video | ✓ Images | ✓ Images |
| OpenAI Compatible | ✓ Full | ✓ Native | Partial |
| Chinese Language | ★★★★★ Native | ★★★★☆ | ★★★☆☆ |
When to Choose Qwen3.5-Plus Over GPT-5
- High-volume applications: At 25x lower cost, Qwen3.5-Plus makes previously uneconomical AI features viable
- Chinese/multilingual content: Qwen3.5's native multilingual training outperforms GPT-5 on Chinese tasks
- Document processing: 256K context window handles longer documents without chunking
- Migration from OpenAI: Drop-in replacement with two-line code change
Common API Issues and Fixes
Problem: 401 Authentication Error
Fix: Verify your API key starts with "sk-" and was copied completely. Check that you're using the DashScope API key (not an Alibaba Cloud Access Key ID). Confirm the environment variable is set: echo $DASHSCOPE_API_KEY. Note: DashScope API keys and standard Alibaba Cloud keys are different — generate one specifically in the ModelStudio API Key Management section.
Problem: Connection timeout or slow responses
Fix: Alibaba Cloud API endpoints may be slow from certain geographic locations. Enable VPN07 on your development machine or server to route API calls through a faster network path. VPN07's 1000Mbps bandwidth and 70+ server locations ensure low-latency connections to Alibaba Cloud's API servers regardless of your location. For production deployments, consider deploying your agent server in a region with good Alibaba Cloud connectivity.
Problem: Tool calls not working as expected
Fix: Ensure your function descriptions are clear and unambiguous — Qwen3.5-Plus decides which tool to call based on description quality. Add examples in the description if the usage isn't obvious. Also verify your JSON schema for parameters is valid. Use tool_choice="required" when you need the model to always call a tool, or tool_choice={"type": "function", "function": {"name": "specific_tool"}} to force a specific tool call.
VPN07 — Stable Connectivity for AI Development
1000Mbps · 70+ Countries · Trusted Since 2015
Building AI agents with Qwen3.5-Plus requires reliable, low-latency connections to Alibaba Cloud's API endpoints. VPN07's 1000Mbps global network ensures your API calls complete without timeouts, your Hugging Face model downloads finish in minutes, and your development workflow stays uninterrupted. Developers in 70+ countries trust VPN07 for consistent access to international AI services. Get started with a 30-day money-back guarantee.
Related Articles
Qwen3.5-397B Benchmark: Open Source AI Beats GPT-5 in 2026
Full benchmark analysis of Qwen3.5-397B-A17B. AIME 91.3, LiveCodeBench 83.6 — the open source model that changed everything in 2026.
Read More →
Qwen3.5 Ollama Setup: Run 0.8B to 35B Free on PC & Mac
Prefer local over API? Install Qwen3.5 via Ollama on your machine. Complete setup guide for all platforms with performance tips.
Read More →