
Qwen3.5-Plus API Tutorial: Build AI Agents with OpenAI SDK

March 3, 2026 · 20 min read

Quick Summary: Qwen3.5-Plus is Alibaba's recommended production API model as of February 2026, accessible via Alibaba Cloud ModelStudio with a fully OpenAI-compatible endpoint. This tutorial covers the complete workflow: getting an API key, making your first call, using tool calling for AI agents, processing images and video, streaming responses, and building a practical autonomous research agent — all using the standard OpenAI Python SDK.

What Is Qwen3.5-Plus and Why Build With It?

Qwen3.5-Plus is Alibaba Cloud's production-grade API offering from the Qwen3.5 model family. Released February 16, 2026 alongside the open-weight model releases, Qwen3.5-Plus is specifically optimized for API deployment with fast inference, consistent output quality, and robust tool calling capabilities for agent applications.

Unlike running Qwen3.5 models locally, accessing Qwen3.5-Plus through the API means no hardware investment, no model management, and consistent performance regardless of your local machine's specs. Compared with OpenAI's GPT-5 API, Qwen3.5-Plus is also significantly more affordable, starting at $0.10 per million input tokens and $0.30 per million output tokens via the DashScope/ModelStudio platform.

  • $0.10 per 1M input tokens
  • 256K context window
  • OpenAI-compatible API
  • Multimodal

Qwen3.5-Plus Capabilities

  • Text generation, reasoning, and summarization
  • Function/tool calling for AI agent workflows
  • Image and video understanding (multimodal)
  • GUI interaction and web automation
  • Code generation and execution planning
  • 201-language multilingual support

API Pricing vs OpenAI GPT-5

Qwen3.5-Plus Input $0.10/M tokens
GPT-5 Input (est.) $2.50/M tokens

Qwen3.5-Plus is approximately 25x cheaper than GPT-5 for the same task.
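To make the comparison concrete, here is a back-of-the-envelope calculator using the rates quoted above (estimate_cost is our own helper; GPT-5's output price isn't quoted in this article, so the second call reuses the $0.30 output rate as a placeholder):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_m: float = 0.10, out_price_per_m: float = 0.30) -> float:
    """Return the USD cost of one call at the given per-million-token rates."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# A typical agent step: 4,000 input tokens, 1,000 output tokens.
print(f"${estimate_cost(4_000, 1_000):.6f} per call")  # → $0.000700 per call

# Same traffic at GPT-5's estimated $2.50/M input rate (output rate assumed, not quoted above):
print(f"${estimate_cost(4_000, 1_000, 2.50, 0.30):.6f} per call")
```

At a million calls per month, that difference is the gap between a $700 bill and a $10,300 one on input-heavy workloads.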

Step 1: Get Your Alibaba Cloud API Key

Qwen3.5-Plus is accessed through Alibaba Cloud's ModelStudio (DashScope) platform. Getting set up takes about 5 minutes:

1. Register on Alibaba Cloud

Visit dashscope.aliyuncs.com or modelstudio.aliyun.com. Register with your email or phone number. You can also sign up at qwen.ai and authenticate via Qwen OAuth — this immediately gives you access to Qwen3.5-Plus without requiring separate Alibaba Cloud account setup.

2. Navigate to the API Keys Section

In the DashScope console: click your avatar → API Key Management. Click Create API Key and give it a name (e.g., "qwen-agent-dev"). Copy the generated key immediately — it won't be shown again in full after closing the dialog.

3. Set the Environment Variable

# Linux / macOS
export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxx"

# Windows PowerShell
$env:DASHSCOPE_API_KEY = "sk-xxxxxxxxxxxxxxxx"
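Once the variable is exported, a quick guard in your script catches a missing or mis-copied key before any API call is made (a sketch; check_dashscope_key is our own helper name, not part of any SDK):

```python
import os

def check_dashscope_key(env=os.environ) -> str:
    """Return the DashScope key, raising a clear error if it is missing or malformed."""
    key = env.get("DASHSCOPE_API_KEY", "")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set - export it before running.")
    if not key.startswith("sk-"):
        # DashScope API keys start with "sk-"; an Alibaba Cloud Access Key ID will not.
        raise RuntimeError("Key does not start with 'sk-' - did you copy a cloud Access Key instead?")
    return key
```

Failing fast here is much friendlier than debugging a 401 response several calls deep into an agent run.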

Accessing Alibaba Cloud from Outside China

Alibaba Cloud's API endpoints are globally accessible from most countries. However, users in some regions may experience connectivity issues or need a reliable international connection for consistent API response times. VPN07's 1000Mbps network with servers across 70+ countries ensures stable, low-latency API calls whether you're using DashScope from the US, Europe, Southeast Asia, or anywhere else.

Step 2: First API Call with OpenAI SDK

The Qwen3.5-Plus API is 100% compatible with the OpenAI Python SDK. Just change two parameters — the base URL and API key — and your existing OpenAI code immediately works with Qwen3.5-Plus.

# Install: pip install openai

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant with expertise in AI and technology."
        },
        {
            "role": "user",
            "content": "Summarize the key improvements in Qwen3.5 compared to previous Qwen versions."
        }
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

One-Line Migration from OpenAI

If you have existing OpenAI code, the migration requires only two changes:

# Before (OpenAI):
client = OpenAI(api_key="sk-openai...")
model="gpt-4o"

# After (Qwen3.5-Plus, 25x cheaper):
client = OpenAI(api_key="sk-dashscope...", base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
model="qwen3.5-plus"

Step 3: Tool Calling — The Foundation of AI Agents

Tool calling (function calling) is what transforms Qwen3.5-Plus from a chatbot into an AI agent. By defining functions that the model can invoke, you enable Qwen3.5-Plus to take real-world actions: search the web, query databases, send emails, call external APIs, and more.

# Tool calling example: weather lookup agent

import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. Tokyo, New York"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "What's the weather in Tokyo and should I bring an umbrella?"}
]

# First call: model decides which tool to use
response = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Check if model wants to call a tool
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)
    
    # Execute the actual function
    weather_result = get_weather(function_args["city"])  # your actual function
    
    # Add tool result to conversation
    messages.append(response.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(weather_result)
    })
    
    # Final response with tool results
    final_response = client.chat.completions.create(
        model="qwen3.5-plus",
        messages=messages
    )
    print(final_response.choices[0].message.content)

Parallel Tool Calls

Qwen3.5-Plus supports parallel tool calling — in a single response, it can request multiple tool executions simultaneously. For agent workflows that need to fetch data from multiple sources, this dramatically reduces latency. Example: a research agent can simultaneously search Wikipedia, query a news API, and retrieve a stock price in one model call instead of three sequential ones.
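A minimal dispatch loop for handling every call in one response might look like this (search_wikipedia, fetch_news, and TOOL_REGISTRY are hypothetical stand-ins for your own tools):

```python
import json

# Hypothetical local implementations - substitute your own.
def search_wikipedia(query):
    return {"summary": f"Wikipedia results for {query}"}

def fetch_news(topic):
    return {"headlines": [f"News about {topic}"]}

TOOL_REGISTRY = {
    "search_wikipedia": lambda args: search_wikipedia(args["query"]),
    "fetch_news": lambda args: fetch_news(args["topic"]),
}

def run_tool_calls(tool_calls):
    """Execute every tool call from one response; return the tool messages to append."""
    tool_messages = []
    for tc in tool_calls:
        handler = TOOL_REGISTRY.get(tc.function.name)
        result = handler(json.loads(tc.function.arguments)) if handler else {"error": "unknown tool"}
        tool_messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": json.dumps(result),
        })
    return tool_messages
```

Append the assistant message first, then every tool message that run_tool_calls returns, before making the follow-up call.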

Step 4: Vision API — Process Images and Video

Qwen3.5-Plus is natively multimodal. You can send images, video frames, and audio alongside text in the same API call. This makes it powerful for visual analysis tasks, document processing, UI testing, and content moderation.

# Vision: analyze an image from URL

response = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/chart.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Analyze this chart and identify the key trends. What business insights can you extract?"
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
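For local files that aren't hosted anywhere, the OpenAI-compatible image_url content type also accepts base64 data URLs (image_data_url is our own helper name):

```python
import base64

def image_data_url(path: str, mime: str = "image/png") -> str:
    """Encode a local image file as a data URL usable in an image_url content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Usage inside a message (sketch):
# {"type": "image_url", "image_url": {"url": image_data_url("chart.png")}}
```

Keep an eye on payload size with this approach: a base64 image is roughly a third larger than the file itself.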

Image Analysis

Charts, screenshots, photos, diagrams — Qwen3.5-Plus extracts structured information from common image formats

Document OCR

Scored 90.8 on OmniDocBench — industry-leading document understanding for PDFs, invoices, and contracts

Video Understanding

Process video frames for content moderation, video summarization, and automated quality checks
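One simple pattern for video, assuming you extract frames yourself, is to send them as multiple image_url parts in a single message (video_frames_message is our own helper; the compatible endpoint may also accept dedicated video content types, so check the DashScope docs for your use case):

```python
def video_frames_message(frame_urls, question):
    """Build one user message carrying several extracted video frames plus a question."""
    content = [{"type": "image_url", "image_url": {"url": u}} for u in frame_urls]
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}

# Usage (sketch):
# messages = [video_frames_message(["https://example.com/f1.jpg",
#                                   "https://example.com/f2.jpg"],
#                                  "Summarize what happens in this clip.")]
```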

Step 5: Streaming Responses for Real-Time UX

For production applications, streaming is essential — users shouldn't wait for the entire response before seeing output. Qwen3.5-Plus supports streaming via the same OpenAI SDK pattern:

# Streaming response

stream = client.chat.completions.create(
    model="qwen3.5-plus",
    messages=[
        {"role": "user", "content": "Write a detailed analysis of MoE architecture in LLMs"}
    ],
    stream=True  # Enable streaming
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line after completion
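When you also need the full text afterwards, for logging or for appending to the conversation history, a small wrapper can print chunks as they arrive and return the assembled string (a sketch; collect_stream is our own helper):

```python
def collect_stream(stream) -> str:
    """Print streamed chunks as they arrive and return the assembled full response."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # New line after completion
    return "".join(parts)

# Usage (sketch):
# full_text = collect_stream(stream)
# messages.append({"role": "assistant", "content": full_text})
```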

Building a Complete AI Research Agent

Let's put everything together into a practical autonomous research agent that can: search the web, analyze documents, and generate comprehensive reports. This demonstrates the full power of Qwen3.5-Plus for production AI agent use cases.

# Complete research agent with Qwen3.5-Plus

from openai import OpenAI
import json, os

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# Define research tools
research_tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the internet for recent information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "num_results": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "analyze_document",
            "description": "Extract key information from a document URL",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "Document or webpage URL"},
                    "focus": {"type": "string", "description": "What to focus on"}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_report",
            "description": "Write and save a final research report",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "content": {"type": "string"},
                    "filename": {"type": "string"}
                },
                "required": ["title", "content"]
            }
        }
    }
]

def run_agent(task: str) -> str:
    """Run the research agent with a given task."""
    messages = [
        {
            "role": "system",
            "content": """You are an autonomous research agent. Use the available tools 
            to thoroughly research topics and write comprehensive reports.
            Always search multiple sources before drawing conclusions."""
        },
        {"role": "user", "content": task}
    ]
    
    # Agent loop: continue until task is complete
    max_iterations = 10
    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="qwen3.5-plus",
            messages=messages,
            tools=research_tools,
            tool_choice="auto"
        )
        
        message = response.choices[0].message
        
        # Task complete — no more tool calls needed
        if response.choices[0].finish_reason == "stop":
            return message.content
        
        # Process tool calls
        if message.tool_calls:
            messages.append(message)
            
            for tool_call in message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)
                
                # Execute the appropriate tool (perform_web_search, analyze_document_url,
                # and save_report are your own implementations)
                if func_name == "web_search":
                    result = perform_web_search(func_args["query"])
                elif func_name == "analyze_document":
                    result = analyze_document_url(func_args["url"])
                elif func_name == "write_report":
                    result = save_report(func_args)
                else:
                    result = {"error": f"Unknown tool: {func_name}"}
                
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })
    
    return "Agent reached maximum iterations"

# Run the agent
result = run_agent(
    "Research the current state of open-source AI models in 2026, "
    "focusing on Qwen3.5's position in the market, and write a 500-word report."
)
print(result)

Agent Frameworks That Work with Qwen3.5-Plus

Since Qwen3.5-Plus is OpenAI-compatible, it works with all major agent frameworks out of the box:

LangChain / LangGraph AutoGen (Microsoft) CrewAI Phidata Haystack OpenAI Agents SDK

Advanced Configuration and Best Practices

Temperature and Sampling Settings

For agent tasks requiring precision (code generation, data extraction), use temperature=0.1. For creative tasks (writing, brainstorming), use temperature=0.8-1.0. The top_p parameter can additionally control output diversity. Qwen3.5-Plus also supports enable_thinking=True for chain-of-thought reasoning that shows the model's reasoning process.
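Since enable_thinking is not a standard OpenAI parameter, with the OpenAI SDK it has to travel in extra_body; that pass-through is an assumption based on DashScope's compatible-mode behavior, and PRECISION, CREATIVE, and build_request below are our own names:

```python
# Sampling presets for the two regimes described above.
PRECISION = {"temperature": 0.1, "top_p": 0.9}   # code generation, data extraction
CREATIVE = {"temperature": 0.9, "top_p": 0.95}   # writing, brainstorming

def build_request(messages, preset=PRECISION, thinking=False):
    """Assemble chat.completions.create kwargs for a given sampling preset."""
    kwargs = {"model": "qwen3.5-plus", "messages": messages, **preset}
    if thinking:
        # Non-standard field: forwarded to the endpoint via the SDK's extra_body.
        kwargs["extra_body"] = {"enable_thinking": True}
    return kwargs

# Usage (sketch):
# response = client.chat.completions.create(**build_request(messages, CREATIVE, thinking=True))
```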

Error Handling for Production

from openai import RateLimitError, APITimeoutError
import time

def call_with_retry(client, max_retries=3, **kwargs):
    """Call the chat API, retrying on rate limits and timeouts.
    Note: **kwargs must come last in the signature, so max_retries precedes it."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            time.sleep(2 ** attempt)  # Exponential backoff
        except APITimeoutError:
            time.sleep(1)
    raise RuntimeError("Max retries exceeded")

Context Management for Long Agent Sessions

Qwen3.5-Plus supports a 256K token context, but long agent sessions accumulate message history that increases cost. Implement a sliding window that keeps the system prompt, last N user/assistant exchanges, and all tool call results. Summarize older parts of the conversation to maintain context without exponentially increasing token usage.
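One way to sketch that sliding window (trim_history is our own helper; the widening step keeps tool results attached to the assistant message that requested them, since an orphaned "tool" message is invalid):

```python
def trim_history(messages, keep_last: int = 8):
    """Sliding window: keep the system prompt plus the most recent messages.
    The window is widened so it never opens on a 'tool' message, which must
    always follow the assistant message whose tool_calls produced it."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    window = rest[-keep_last:]
    while window and window[0].get("role") == "tool" and len(window) < len(rest):
        window = rest[-(len(window) + 1):]
    return system + window

# Usage (sketch): call before each iteration of the agent loop.
# messages = trim_history(messages, keep_last=12)
```

Summarizing the dropped portion into a single assistant note, rather than discarding it outright, preserves long-range context at a fraction of the token cost.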

Qwen3.5-Plus vs Other LLM APIs: Practical Comparison

Before committing to Qwen3.5-Plus for production agent development, here's how it stacks up against the main alternatives for API-based AI agent work:

Feature           | Qwen3.5-Plus     | GPT-5 API       | Claude Opus 4.5
Input Cost        | $0.10/M tokens   | ~$2.50/M tokens | ~$3.00/M tokens
Context Window    | 256K tokens      | 128K tokens     | 200K tokens
Tool Calling      | ✓ Parallel       | ✓ Parallel      | ✓ Parallel
Vision            | ✓ Images + Video | ✓ Images        | ✓ Images
OpenAI Compatible | ✓ Full           | ✓ Native        | Partial
Chinese Language  | ★★★★★ Native     | ★★★★☆           | ★★★☆☆

When to Choose Qwen3.5-Plus Over GPT-5

  • High-volume applications: At 25x lower cost, Qwen3.5-Plus makes previously uneconomical AI features viable
  • Chinese/multilingual content: Qwen3.5's native multilingual training outperforms GPT-5 on Chinese tasks
  • Document processing: 256K context window handles longer documents without chunking
  • Migration from OpenAI: Drop-in replacement with two-line code change

Common API Issues and Fixes

Problem: 401 Authentication Error

Fix: Verify your API key starts with "sk-" and was copied completely. Check that you're using the DashScope API key (not an Alibaba Cloud Access Key ID). Confirm the environment variable is set: echo $DASHSCOPE_API_KEY. Note: DashScope API keys and standard Alibaba Cloud keys are different — generate one specifically in the ModelStudio API Key Management section.

Problem: Connection timeout or slow responses

Fix: Alibaba Cloud API endpoints may be slow from certain geographic locations. Enable VPN07 on your development machine or server to route API calls through a faster network path. VPN07's 1000Mbps bandwidth and 70+ server locations ensure low-latency connections to Alibaba Cloud's API servers regardless of your location. For production deployments, consider deploying your agent server in a region with good Alibaba Cloud connectivity.

Problem: Tool calls not working as expected

Fix: Ensure your function descriptions are clear and unambiguous — Qwen3.5-Plus decides which tool to call based on description quality. Add examples in the description if the usage isn't obvious. Also verify your JSON schema for parameters is valid. Use tool_choice="required" when you need the model to always call a tool, or tool_choice={"type": "function", "function": {"name": "specific_tool"}} to force a specific tool call.
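A tiny sketch of the three tool_choice modes (with_tool_choice and the "web_search" name are illustrative; any tool you force must also appear in the tools list you send):

```python
# Force the model to call one specific named tool.
forced = {"type": "function", "function": {"name": "web_search"}}

def with_tool_choice(base_kwargs, choice="auto"):
    """Return request kwargs with tool_choice set to 'auto', 'required', or a specific tool dict."""
    return {**base_kwargs, "tool_choice": choice}

# Usage (sketch):
# client.chat.completions.create(**with_tool_choice({"model": "qwen3.5-plus",
#                                                    "messages": messages,
#                                                    "tools": tools}, forced))
```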

VPN07 — Stable Connectivity for AI Development

1000Mbps · 70+ Countries · Trusted Since 2015

Building AI agents with Qwen3.5-Plus requires reliable, low-latency connections to Alibaba Cloud's API endpoints. VPN07's 1000Mbps global network ensures your API calls complete without timeouts, your Hugging Face model downloads finish in minutes, and your development workflow stays uninterrupted. Developers in 70+ countries trust VPN07 for consistent access to international AI services. Get started with a 30-day money-back guarantee.

  • $1.5 per month
  • 1000Mbps bandwidth
  • 70+ countries
  • 30-day money-back guarantee
