In the previous two articles we introduced Qwen3.5-27B's features and how to deploy it. This article explains how to use Qwen3.5-27B in practice, covering API calls, multimodal input, tool calling, and agent development.
## Environment Setup

Before you begin, make sure you have:

- Successfully deployed Qwen3.5-27B (see the second article)
- The service running (default port 8000)
- The OpenAI Python SDK installed
### Install the OpenAI SDK

```bash
pip install openai
```
### Configure environment variables

```bash
# Set the API base URL
export OPENAI_BASE_URL="http://localhost:8000/v1"
# Set the API key (any value works for a local deployment)
export OPENAI_API_KEY="EMPTY"
```
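Before making any calls, it is worth confirming the server is reachable. A minimal sanity check (our addition, assuming the OpenAI-compatible server from the deployment article is running on port 8000):

```python
from openai import OpenAI

# Reads OPENAI_BASE_URL / OPENAI_API_KEY from the environment.
client = OpenAI()

# Lists the models the server exposes; should include "Qwen/Qwen3.5-27B".
for model in client.models.list():
    print(model.id)
```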
## Basic API Calls

### 1. Simple text chat

```python
from openai import OpenAI

# Initialize the client (configured via environment variables)
client = OpenAI()

# Create a conversation
messages = [
    {"role": "user", "content": "Hello, please introduce yourself"}
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=2048,
    temperature=1.0,
    top_p=0.95,
    extra_body={
        "top_k": 20,
        "presence_penalty": 1.5,
    }
)

print(response.choices[0].message.content)
```
### 2. Streaming output

Streaming lets you receive the model's output in real time as it is generated:

```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Write a poem about spring"}
]

stream = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=2048,
    temperature=1.0,
    stream=True  # Enable streaming
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
### 3. Multi-turn conversation

```python
from openai import OpenAI

client = OpenAI()

# Maintain the conversation history
messages = []

def chat(user_input):
    # Append the user message
    messages.append({"role": "user", "content": user_input})
    # Call the API
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=2048,
        temperature=1.0,
        top_p=0.95,
        extra_body={"top_k": 20, "presence_penalty": 1.5}
    )
    # Extract the assistant reply
    assistant_message = response.choices[0].message.content
    # Append the assistant message to the history
    messages.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Example usage
print(chat("Hi, I want to learn Python"))
print(chat("Where should I start?"))
print(chat("Can you recommend some learning resources?"))
```
## Sampling Parameters in Detail

Qwen3.5-27B supports a range of sampling parameters, and different combinations suit different scenarios.

### Recommended configurations

#### 1. Thinking mode - general tasks

```python
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=32768,
    temperature=1.0,  # Higher randomness
    top_p=0.95,       # Nucleus sampling
    extra_body={
        "top_k": 20,
        "min_p": 0.0,
        "presence_penalty": 1.5,  # Reduce repetition
        "repetition_penalty": 1.0
    }
)
```

Suitable for:

- Creative writing
- Brainstorming
- Open-ended Q&A
- Content generation
#### 2. Thinking mode - precise coding tasks

```python
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=32768,
    temperature=0.6,  # Lower randomness
    top_p=0.95,
    extra_body={
        "top_k": 20,
        "min_p": 0.0,
        "presence_penalty": 0.0,  # Don't penalize repetition (code may need repeated structures)
        "repetition_penalty": 1.0
    }
)
```

Suitable for:

- Code generation
- Web development
- Algorithm implementation
- Technical writing
#### 3. Instruct mode - general tasks

```python
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,  # Moderate randomness
    top_p=0.8,
    extra_body={
        "top_k": 20,
        "min_p": 0.0,
        "presence_penalty": 1.5,
        "repetition_penalty": 1.0,
        "chat_template_kwargs": {"enable_thinking": False}  # Disable thinking mode
    }
)
```

Suitable for:

- Quick Q&A
- Information extraction
- Text classification
- Simple tasks
#### 4. Instruct mode - reasoning tasks

```python
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=32768,
    temperature=1.0,
    top_p=1.0,
    extra_body={
        "top_k": 40,
        "min_p": 0.0,
        "presence_penalty": 2.0,
        "repetition_penalty": 1.0,
        "chat_template_kwargs": {"enable_thinking": False}
    }
)
```

Suitable for:

- Math problems
- Logical reasoning
- Complex analysis
### Parameter reference

| Parameter | Range | Description |
|---|---|---|
| temperature | 0.0-2.0 | Controls randomness; higher is more random |
| top_p | 0.0-1.0 | Nucleus sampling: keep tokens whose cumulative probability reaches p |
| top_k | 1-100 | Consider only the k highest-probability tokens |
| min_p | 0.0-1.0 | Minimum probability threshold |
| presence_penalty | 0.0-2.0 | Penalizes tokens that have already appeared, reducing repetition |
| repetition_penalty | 0.0-2.0 | Repetition penalty factor |
| max_tokens | 1-81920 | Maximum number of generated tokens |
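To avoid copying these parameter blocks between scripts, the four recommended configurations above can be wrapped in a small helper. This is a convenience sketch of ours; the preset values simply mirror the examples earlier in this section:

```python
# Hypothetical helper: maps a scenario name to the kwargs used above.
SAMPLING_PRESETS = {
    "thinking_general": dict(
        temperature=1.0, top_p=0.95,
        extra_body={"top_k": 20, "min_p": 0.0,
                    "presence_penalty": 1.5, "repetition_penalty": 1.0},
    ),
    "thinking_coding": dict(
        temperature=0.6, top_p=0.95,
        extra_body={"top_k": 20, "min_p": 0.0,
                    "presence_penalty": 0.0, "repetition_penalty": 1.0},
    ),
    "instruct_general": dict(
        temperature=0.7, top_p=0.8,
        extra_body={"top_k": 20, "min_p": 0.0,
                    "presence_penalty": 1.5, "repetition_penalty": 1.0,
                    "chat_template_kwargs": {"enable_thinking": False}},
    ),
    "instruct_reasoning": dict(
        temperature=1.0, top_p=1.0,
        extra_body={"top_k": 40, "min_p": 0.0,
                    "presence_penalty": 2.0, "repetition_penalty": 1.0,
                    "chat_template_kwargs": {"enable_thinking": False}},
    ),
}

def create_with_preset(client, messages, preset, max_tokens=32768):
    """Call the chat API with one of the recommended sampling presets."""
    return client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=max_tokens,
        **SAMPLING_PRESETS[preset],
    )
```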
## Multimodal Input

Qwen3.5-27B natively supports text, image, and video input.

### 1. Image input

```python
from openai import OpenAI

client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.jpg"
                    # Or use a local file:
                    # "url": "data:image/jpeg;base64,{base64_string}"
                }
            },
            {
                "type": "text",
                "text": "What is in this image? Please describe it in detail."
            }
        ]
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=2048,
    temperature=1.0,
    top_p=0.95,
    extra_body={"top_k": 20, "presence_penalty": 1.5}
)

print(response.choices[0].message.content)
```
#### Using a local image

```python
import base64

def encode_image(image_path):
    """Encode a local image as base64."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Read a local image
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                }
            },
            {
                "type": "text",
                "text": "Analyze the content of this image"
            }
        ]
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=2048
)
```
### 2. Video input

```python
from openai import OpenAI

client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://example.com/video.mp4"
                }
            },
            {
                "type": "text",
                "text": "What is this video about?"
            }
        ]
    }
]

# Configure video sampling parameters (vLLM only)
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=2048,
    temperature=1.0,
    top_p=0.95,
    extra_body={
        "top_k": 20,
        "presence_penalty": 1.5,
        "mm_processor_kwargs": {
            "fps": 2,  # Frames sampled per second
            "do_sample_frames": True
        }
    }
)

print(response.choices[0].message.content)
```
### 3. Multiple image inputs

```python
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image1.jpg"}
            },
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image2.jpg"}
            },
            {
                "type": "text",
                "text": "Compare the similarities and differences between these two images"
            }
        ]
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=2048
)
```
## Tool Calling (Function Calling)

Qwen3.5-27B has strong tool-calling capabilities, letting the model invoke external functions and APIs.

### 1. Define the tools

```python
from openai import OpenAI

client = OpenAI()

# Define the tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. Beijing, Shanghai"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search keywords"
                    }
                },
                "required": ["query"]
            }
        }
    }
]
```
### 2. Implement the tool functions

```python
import json

def get_weather(city, unit="celsius"):
    """Mock weather lookup."""
    # In a real application this would call an actual weather API
    weather_data = {
        "Beijing": {"temp": 15, "condition": "sunny"},
        "Shanghai": {"temp": 20, "condition": "cloudy"},
        "Shenzhen": {"temp": 25, "condition": "light rain"}
    }
    if city in weather_data:
        data = weather_data[city]
        temp = data["temp"]
        if unit == "fahrenheit":
            temp = temp * 9 / 5 + 32
        return json.dumps({
            "city": city,
            "temperature": temp,
            "unit": unit,
            "condition": data["condition"]
        }, ensure_ascii=False)
    else:
        return json.dumps({"error": "City not found"}, ensure_ascii=False)

def search_web(query):
    """Mock web search."""
    # In a real application this would call an actual search API
    return json.dumps({
        "query": query,
        "results": [
            f"Search result 1 for {query}",
            f"Search result 2 for {query}",
            f"Search result 3 for {query}"
        ]
    }, ensure_ascii=False)

# Map tool names to functions
available_functions = {
    "get_weather": get_weather,
    "search_web": search_web
}
```
### 3. Use tool calls

```python
def chat_with_tools(user_message):
    messages = [{"role": "user", "content": user_message}]
    # First call: let the model decide whether to use a tool
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # Let the model decide
        max_tokens=2048
    )
    response_message = response.choices[0].message
    messages.append(response_message)

    # Check for tool calls
    if response_message.tool_calls:
        # Execute each tool call
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            print(f"Calling tool: {function_name}")
            print(f"Arguments: {function_args}")
            # Execute the function
            function_response = available_functions[function_name](**function_args)
            # Append the tool response to the message history
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": function_name,
                "content": function_response
            })
        # Second call: let the model produce the final reply from the tool results
        final_response = client.chat.completions.create(
            model="Qwen/Qwen3.5-27B",
            messages=messages,
            max_tokens=2048
        )
        return final_response.choices[0].message.content
    else:
        # No tool call; return the reply directly
        return response_message.content

# Example usage
print(chat_with_tools("What's the weather like in Beijing today?"))
print(chat_with_tools("Search for the latest news about Qwen3.5"))
```
### 4. Parallel tool calls

Qwen3.5-27B can call multiple tools at once:

```python
def chat_with_parallel_tools(user_message):
    messages = [{"role": "user", "content": user_message}]
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        max_tokens=2048,
        extra_body={
            "parallel_tool_calls": True  # Enable parallel tool calls
        }
    )
    response_message = response.choices[0].message
    messages.append(response_message)

    if response_message.tool_calls:
        # Execute all tool calls (this loop runs them one after another;
        # see the concurrent sketch after this example)
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            function_response = available_functions[function_name](**function_args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": function_name,
                "content": function_response
            })
        final_response = client.chat.completions.create(
            model="Qwen/Qwen3.5-27B",
            messages=messages,
            max_tokens=2048
        )
        return final_response.choices[0].message.content
    else:
        return response_message.content

# Example usage
print(chat_with_parallel_tools("Tell me the weather in Beijing and Shanghai, and search for tourist attractions in both cities"))
```
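The loop above executes tool calls one after another even when the model requests several at once. If the tools do real I/O, you could dispatch them concurrently. A sketch using a thread pool, reusing the `available_functions` map and `json` import from earlier (the helper name is ours):

```python
from concurrent.futures import ThreadPoolExecutor

def execute_tool_calls_concurrently(tool_calls):
    """Run independent tool calls in parallel threads; return tool messages in order."""
    def run_one(tool_call):
        args = json.loads(tool_call.function.arguments)
        result = available_functions[tool_call.function.name](**args)
        return {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_one, tool_calls))
```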
## Agent Development

### 1. Using the Qwen-Agent framework

Qwen-Agent is the officially recommended agent framework; it provides complete tool integration and state management.

#### Install Qwen-Agent

```bash
pip install qwen-agent
```

#### Basic agent example

```python
from qwen_agent.agents import Assistant

# Configure the LLM
llm_cfg = {
    'model': 'Qwen/Qwen3.5-27B',
    'model_type': 'qwenvl_oai',
    'model_server': 'http://localhost:8000/v1',
    'api_key': 'EMPTY',
    'generate_cfg': {
        'use_raw_api': True,
        'extra_body': {
            'chat_template_kwargs': {'enable_thinking': True}
        },
    },
}

# Define the tools
tools = [
    {
        'name': 'calculator',
        'description': 'Perform mathematical calculations',
        'parameters': {
            'type': 'object',
            'properties': {
                'expression': {
                    'type': 'string',
                    'description': 'A math expression, e.g. 2+3*4'
                }
            },
            'required': ['expression']
        }
    }
]

# Create the agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Run the agent
messages = [{'role': 'user', 'content': 'Compute (123 + 456) * 789'}]
for responses in bot.run(messages=messages):
    pass  # bot.run streams intermediate states; keep only the final batch
print(responses)
```
#### Integrating MCP servers

Qwen-Agent supports Model Context Protocol (MCP) servers:

```python
from qwen_agent.agents import Assistant

llm_cfg = {
    'model': 'Qwen/Qwen3.5-27B',
    'model_type': 'qwenvl_oai',
    'model_server': 'http://localhost:8000/v1',
    'api_key': 'EMPTY',
    'generate_cfg': {
        'use_raw_api': True,
        'extra_body': {
            'chat_template_kwargs': {'enable_thinking': True}
        },
    },
}

# Configure the MCP servers
tools = [
    {
        'mcpServers': {
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/directory"]
            }
        }
    }
]

# Create the agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Example usage
messages = [{'role': 'user', 'content': 'List all files in the directory'}]
for responses in bot.run(messages=messages):
    pass
print(responses)
```
### 2. A custom agent loop

For more complex scenarios you can implement your own agent loop:

```python
from openai import OpenAI
import json

class CustomAgent:
    def __init__(self, model="Qwen/Qwen3.5-27B"):
        self.client = OpenAI()
        self.model = model
        self.tools = []
        self.tool_functions = {}
        self.messages = []
        self.max_iterations = 10

    def add_tool(self, tool_definition, tool_function):
        """Register a tool."""
        self.tools.append(tool_definition)
        self.tool_functions[tool_definition["function"]["name"]] = tool_function

    def run(self, user_message):
        """Run the agent loop."""
        self.messages = [{"role": "user", "content": user_message}]
        for iteration in range(self.max_iterations):
            print(f"\n--- Iteration {iteration + 1} ---")
            # Call the model
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                tools=self.tools if self.tools else None,
                tool_choice="auto" if self.tools else None,
                max_tokens=2048,
                temperature=1.0,
                top_p=0.95,
                extra_body={"top_k": 20, "presence_penalty": 1.5}
            )
            response_message = response.choices[0].message
            self.messages.append(response_message)

            # Check for tool calls
            if response_message.tool_calls:
                print(f"Model requested {len(response_message.tool_calls)} tool call(s)")
                # Execute the tool calls
                for tool_call in response_message.tool_calls:
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)
                    print(f"  Calling: {function_name}({function_args})")
                    # Execute the function
                    try:
                        function_response = self.tool_functions[function_name](**function_args)
                        print(f"  Result: {function_response}")
                    except Exception as e:
                        function_response = json.dumps({"error": str(e)})
                        print(f"  Error: {e}")
                    # Append the tool response
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": function_name,
                        "content": function_response
                    })
            else:
                # No tool call: return the final answer
                print("Model produced the final answer")
                return response_message.content
        return "Reached the maximum number of iterations"

# Example usage
agent = CustomAgent()

# Register tools
agent.add_tool(
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current time",
            "parameters": {"type": "object", "properties": {}}
        }
    },
    lambda: json.dumps({"time": "2026-03-03 12:00:00"})
)
agent.add_tool(
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform mathematical calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "A math expression"}
                },
                "required": ["expression"]
            }
        }
    },
    lambda expression: str(eval(expression))  # Demo only: eval is unsafe with untrusted input
)

# Run the agent
result = agent.run("What time is it? And calculate 123 * 456 for me")
print(f"\nFinal result: {result}")
```
### 3. The ReAct agent pattern

ReAct (Reasoning and Acting) is a popular agent pattern:

```python
class ReActAgent:
    def __init__(self, model="Qwen/Qwen3.5-27B"):
        self.client = OpenAI()
        self.model = model
        self.tools = []
        self.tool_functions = {}

    def add_tool(self, name, description, function):
        """Register a tool."""
        self.tools.append({
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        "input": {"type": "string", "description": "Tool input"}
                    },
                    "required": ["input"]
                }
            }
        })
        self.tool_functions[name] = function

    def run(self, task, max_steps=5):
        """Run the ReAct agent."""
        messages = [
            {
                "role": "system",
                "content": """You are a ReAct agent. For each task you should:
1. Thought: think about what to do next
2. Action: pick a tool and provide its input
3. Observation: observe the tool's output
4. Repeat the steps above until the task is done
5. Answer: give the final answer

Available tools:
""" + "\n".join([f"- {t['function']['name']}: {t['function']['description']}" for t in self.tools])
            },
            {"role": "user", "content": f"Task: {task}"}
        ]

        for step in range(max_steps):
            print(f"\n=== Step {step + 1} ===")
            # Get the model's response
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=self.tools,
                tool_choice="auto",
                max_tokens=2048
            )
            response_message = response.choices[0].message
            # Print the reasoning
            if response_message.content:
                print(f"Thought: {response_message.content}")
            messages.append(response_message)

            # Execute tool calls
            if response_message.tool_calls:
                for tool_call in response_message.tool_calls:
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)
                    print(f"Action: {function_name}({function_args['input']})")
                    # Execute the tool
                    observation = self.tool_functions[function_name](function_args['input'])
                    print(f"Observation: {observation}")
                    # Append the observation
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": function_name,
                        "content": observation
                    })
            else:
                # No tool call: return the final answer
                print(f"Answer: {response_message.content}")
                return response_message.content
        return "Reached the maximum number of steps"

# Example usage
agent = ReActAgent()

# Register tools
agent.add_tool(
    "search",
    "Search the knowledge base for information",
    lambda query: f"Search results for '{query}': here is some related information..."
)
agent.add_tool(
    "calculate",
    "Perform mathematical calculations",
    lambda expr: str(eval(expr))  # Demo only: eval is unsafe with untrusted input
)

# Run a task
result = agent.run("Search for information about Qwen3.5, then determine whether its parameter count exceeds 20B")
```
## Best Practices

### 1. Prompt engineering

#### Structured output

Ask for JSON to get structured output:

```python
import json

messages = [
    {
        "role": "system",
        "content": """You are a data-extraction assistant. Return results in JSON format.
Output format:
{
    "name": "person's name",
    "age": age (number),
    "occupation": "occupation",
    "location": "location"
}"""
    },
    {
        "role": "user",
        "content": "Zhang San is a 35-year-old software engineer living in Beijing."
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=512,
    temperature=0.3,  # Lower temperature for more deterministic output
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False}
    }
)

# Parse the JSON (may raise if the model wraps the JSON in extra text)
result = json.loads(response.choices[0].message.content)
print(result)
```
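Prompting alone does not guarantee syntactically valid JSON. If your serving backend supports it (vLLM's OpenAI-compatible server, for example, implements JSON mode), you can additionally constrain the output with the standard `response_format` field; treat backend support as an assumption to verify:

```python
# Hedged variant: request JSON mode on top of the prompt above.
response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=512,
    temperature=0.3,
    response_format={"type": "json_object"},  # requires backend JSON-mode support
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
```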
#### Few-shot learning

```python
messages = [
    {"role": "system", "content": "You are a sentiment-analysis assistant."},
    {"role": "user", "content": "This product is amazing!"},
    {"role": "assistant", "content": "Sentiment: positive"},
    {"role": "user", "content": "Poor quality, would not recommend."},
    {"role": "assistant", "content": "Sentiment: negative"},
    {"role": "user", "content": "It's okay, nothing special."},
    {"role": "assistant", "content": "Sentiment: neutral"},
    {"role": "user", "content": "This shopping experience was delightful!"}
]

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=128,
    temperature=0.3
)

print(response.choices[0].message.content)
```
### 2. Long-context handling

#### Document Q&A

```python
def document_qa(document, question):
    """Q&A over a long document."""
    messages = [
        {
            "role": "system",
            "content": "You are a document-analysis assistant. Answer questions based on the provided document."
        },
        {
            "role": "user",
            "content": f"""Document:
{document}

Question: {question}

Answer based on the document content. If the document does not contain the relevant information, say so explicitly."""
        }
    ]
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=2048,
        temperature=0.7,
        extra_body={
            "chat_template_kwargs": {"enable_thinking": True}
        }
    )
    return response.choices[0].message.content

# Example usage
long_document = """
[Place a long document here: technical docs, a paper, a report, etc.]
"""
answer = document_qa(long_document, "What are the document's main conclusions?")
print(answer)
```
#### Codebase analysis

```python
def analyze_codebase(code_files, query):
    """Analyze a codebase."""
    # Concatenate all code files
    combined_code = "\n\n".join([
        f"# File: {filename}\n{content}"
        for filename, content in code_files.items()
    ])
    messages = [
        {
            "role": "system",
            "content": "You are a code-analysis expert. Analyze the provided code and answer the question."
        },
        {
            "role": "user",
            "content": f"""Codebase:
{combined_code}

Question: {query}"""
        }
    ]
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=4096,
        temperature=0.6,
        extra_body={
            "chat_template_kwargs": {"enable_thinking": True}
        }
    )
    return response.choices[0].message.content
```
### 3. Batch-processing optimization

For large numbers of requests, batching improves throughput:

```python
import asyncio
from openai import AsyncOpenAI

async def process_batch(prompts):
    """Asynchronous batch processing."""
    client = AsyncOpenAI()

    async def process_single(prompt):
        response = await client.chat.completions.create(
            model="Qwen/Qwen3.5-27B",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512
        )
        return response.choices[0].message.content

    # Process all requests concurrently
    tasks = [process_single(prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)
    return results

# Example usage
prompts = [
    "Summarize this text: ...",
    "Translate into English: ...",
    "Extract the keywords: ..."
]
results = asyncio.run(process_batch(prompts))
for i, result in enumerate(results):
    print(f"Result {i+1}: {result}")
```
### 4. Error handling and retries

```python
import time
from openai import OpenAI, OpenAIError

def chat_with_retry(messages, max_retries=3):
    """Chat with a retry mechanism."""
    client = OpenAI()
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="Qwen/Qwen3.5-27B",
                messages=messages,
                max_tokens=2048,
                timeout=60  # Set a timeout
            )
            return response.choices[0].message.content
        except OpenAIError as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                # Exponential backoff
                wait_time = 2 ** attempt
                print(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise
    return None
```
## Practical Application Cases

### Case 1: Intelligent customer-service system

```python
class CustomerServiceAgent:
    def __init__(self):
        self.client = OpenAI()
        self.conversation_history = {}

    def get_customer_info(self, customer_id):
        """Fetch customer information (mock)."""
        return {
            "id": customer_id,
            "name": "Zhang San",
            "vip_level": "Gold",
            "order_history": ["Order 1", "Order 2"]
        }

    def search_knowledge_base(self, query):
        """Search the knowledge base (mock)."""
        return f"Help-center content about '{query}'..."

    def handle_customer_query(self, customer_id, query):
        """Handle a customer inquiry."""
        # Get or create the conversation history
        if customer_id not in self.conversation_history:
            customer_info = self.get_customer_info(customer_id)
            self.conversation_history[customer_id] = [
                {
                    "role": "system",
                    "content": f"""You are a professional customer-service assistant.
Customer information:
- Name: {customer_info['name']}
- VIP level: {customer_info['vip_level']}
- Order history: {', '.join(customer_info['order_history'])}
Provide professional, friendly service."""
                }
            ]

        # Append the user query
        self.conversation_history[customer_id].append({
            "role": "user",
            "content": query
        })

        # Define the tools
        tools = [
            {
                "type": "function",
                "function": {
                    "name": "search_knowledge_base",
                    "description": "Search the product knowledge base",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string", "description": "Search keywords"}
                        },
                        "required": ["query"]
                    }
                }
            }
        ]

        # Call the model
        response = self.client.chat.completions.create(
            model="Qwen/Qwen3.5-27B",
            messages=self.conversation_history[customer_id],
            tools=tools,
            tool_choice="auto",
            max_tokens=2048,
            temperature=0.7
        )
        response_message = response.choices[0].message
        self.conversation_history[customer_id].append(response_message)

        # Handle tool calls
        if response_message.tool_calls:
            for tool_call in response_message.tool_calls:
                if tool_call.function.name == "search_knowledge_base":
                    args = json.loads(tool_call.function.arguments)
                    result = self.search_knowledge_base(args["query"])
                    self.conversation_history[customer_id].append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": "search_knowledge_base",
                        "content": result
                    })
            # Call the model again to produce the final reply
            final_response = self.client.chat.completions.create(
                model="Qwen/Qwen3.5-27B",
                messages=self.conversation_history[customer_id],
                max_tokens=2048,
                temperature=0.7
            )
            final_message = final_response.choices[0].message
            self.conversation_history[customer_id].append(final_message)
            return final_message.content
        return response_message.content

# Example usage
agent = CustomerServiceAgent()
print(agent.handle_customer_query("C001", "I'd like to know about the return policy"))
print(agent.handle_customer_query("C001", "How long does a return take?"))
```
### Case 2: Code-review assistant

````python
def code_review(code, language="python"):
    """Review a piece of code."""
    messages = [
        {
            "role": "system",
            "content": f"""You are a senior {language} code-review expert.
Review the code for:
1. Code quality and readability
2. Potential bugs and errors
3. Performance-optimization suggestions
4. Security issues
5. Best-practice recommendations
Provide concrete improvement suggestions with example code."""
        },
        {
            "role": "user",
            "content": f"""Please review the following {language} code:
```{language}
{code}
```"""
        }
    ]
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=4096,
        temperature=0.6,
        extra_body={
            "chat_template_kwargs": {"enable_thinking": True}
        }
    )
    return response.choices[0].message.content

# Example usage
code_to_review = """
def calculate_total(items):
    total = 0
    for item in items:
        total = total + item['price'] * item['quantity']
    return total
"""
review = code_review(code_to_review)
print(review)
````
### Case 3: Multimodal content analysis

```python
def analyze_product_image(image_path, product_name):
    """Analyze a product image."""
    # Read and encode the image
    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')

    messages = [
        {
            "role": "system",
            "content": """You are a product-image analysis expert.
Analyze the image and provide:
1. A description of the product's appearance
2. A quality assessment
3. Potential issues
4. Improvement suggestions"""
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                },
                {
                    "type": "text",
                    "text": f"Please analyze this product image of a {product_name}"
                }
            ]
        }
    ]
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=2048,
        temperature=0.7
    )
    return response.choices[0].message.content

# Example usage
# analysis = analyze_product_image("product.jpg", "smartwatch")
# print(analysis)
```
### Case 4: Documentation-generation assistant

````python
class DocumentGenerator:
    def __init__(self):
        self.client = OpenAI()

    def generate_api_documentation(self, code):
        """Generate API documentation."""
        messages = [
            {
                "role": "system",
                "content": """You are a technical-documentation expert.
Generate detailed API documentation for the provided code, including:
1. Function/class descriptions
2. Parameter descriptions
3. Return-value descriptions
4. Usage examples
5. Caveats
Output in Markdown format."""
            },
            {
                "role": "user",
                "content": f"Please generate API documentation for the following code:\n\n```python\n{code}\n```"
            }
        ]
        response = self.client.chat.completions.create(
            model="Qwen/Qwen3.5-27B",
            messages=messages,
            max_tokens=4096,
            temperature=0.6,
            extra_body={
                "chat_template_kwargs": {"enable_thinking": False}
            }
        )
        return response.choices[0].message.content

    def generate_user_manual(self, product_info):
        """Generate a user manual."""
        messages = [
            {
                "role": "system",
                "content": """You are a technical-writing expert.
Generate a clear, accessible user manual, including:
1. Product introduction
2. Quick-start guide
3. Detailed feature descriptions
4. FAQ
5. Troubleshooting
Use friendly, professional language suitable for general users."""
            },
            {
                "role": "user",
                "content": f"Please generate a user manual for the following product:\n\n{product_info}"
            }
        ]
        response = self.client.chat.completions.create(
            model="Qwen/Qwen3.5-27B",
            messages=messages,
            max_tokens=8192,
            temperature=0.7
        )
        return response.choices[0].message.content

# Example usage
generator = DocumentGenerator()
code = """
def process_data(data, options=None):
    '''Process data.'''
    if options is None:
        options = {}
    result = []
    for item in data:
        processed = transform(item, options)
        result.append(processed)
    return result
"""
doc = generator.generate_api_documentation(code)
print(doc)
````
## Performance Monitoring and Debugging

### 1. Logging requests and responses

```python
import logging
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('qwen_api.log'),
        logging.StreamHandler()
    ]
)

def chat_with_logging(messages):
    """Chat with request/response logging."""
    start_time = datetime.now()
    try:
        logging.info(f"Sending request: {len(messages)} message(s)")
        response = client.chat.completions.create(
            model="Qwen/Qwen3.5-27B",
            messages=messages,
            max_tokens=2048
        )
        end_time = datetime.now()
        duration = (end_time - start_time).total_seconds()
        logging.info(f"Request succeeded in {duration:.2f}s")
        logging.info(f"Completion tokens: {response.usage.completion_tokens}")
        logging.info(f"Total tokens: {response.usage.total_tokens}")
        return response.choices[0].message.content
    except Exception as e:
        logging.error(f"Request failed: {str(e)}")
        raise
```
### 2. Performance profiling

```python
import time
from collections import defaultdict

class PerformanceMonitor:
    def __init__(self):
        self.metrics = defaultdict(list)

    def measure(self, func_name):
        """Decorator: measure a function's execution time."""
        def decorator(func):
            def wrapper(*args, **kwargs):
                start = time.time()
                result = func(*args, **kwargs)
                duration = time.time() - start
                self.metrics[func_name].append(duration)
                return result
            return wrapper
        return decorator

    def get_stats(self):
        """Summarize the collected metrics."""
        stats = {}
        for func_name, durations in self.metrics.items():
            stats[func_name] = {
                "count": len(durations),
                "total": sum(durations),
                "avg": sum(durations) / len(durations),
                "min": min(durations),
                "max": max(durations)
            }
        return stats

# Example usage
monitor = PerformanceMonitor()

@monitor.measure("chat_completion")
def chat(message):
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=[{"role": "user", "content": message}],
        max_tokens=512
    )
    return response.choices[0].message.content

# Run several requests
for i in range(10):
    chat(f"Test message {i}")

# Inspect the statistics
print(monitor.get_stats())
```
## FAQ

### Q1: How can I reduce response latency?

Answer:

- Use streaming output to get the first token sooner (see the sketch after this list)
- Lower the max_tokens limit
- Lower the temperature (more deterministic output)
- Use multi-token prediction (MTP)
- Consider a quantized model
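To see where latency goes, it helps to measure time-to-first-token (TTFT) directly. A small sketch built on the streaming API shown earlier; the timing logic and function name are ours:

```python
import time
from openai import OpenAI

def measure_ttft(prompt):
    """Measure time-to-first-token and total generation time via streaming."""
    client = OpenAI()
    start = time.time()
    first_token_at = None
    stream = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content and first_token_at is None:
            first_token_at = time.time()
    total = time.time() - start
    print(f"TTFT: {first_token_at - start:.2f}s, total: {total:.2f}s")
```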
### Q2: How do I handle very long contexts?

Answer:

```python
def truncate_messages(messages, max_messages=20):
    """Truncate the message history to fit the context limit."""
    # Keep system messages plus the most recent messages
    system_messages = [m for m in messages if m["role"] == "system"]
    other_messages = [m for m in messages if m["role"] != "system"]
    # Simple truncation strategy: keep the most recent N messages
    # (a token-based budget would be more precise)
    recent_messages = other_messages[-max_messages:]
    return system_messages + recent_messages
```
### Q3: How can I improve output quality?

Answer:

- Use clear, specific prompts
- Provide few-shot examples
- Enable thinking mode (enable_thinking: True)
- Tune temperature and top_p
- Use a system message to set the role and rules
### Q4: How do I handle multilingual scenarios?

Answer:

```python
def multilingual_chat(message, target_language="Chinese"):
    """Multilingual chat."""
    messages = [
        {
            "role": "system",
            "content": f"Please answer all questions in {target_language}."
        },
        {
            "role": "user",
            "content": message
        }
    ]
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=2048
    )
    return response.choices[0].message.content
```
### Q5: How do I implement conversation memory?

Answer:

```python
import json
from pathlib import Path

class ConversationMemory:
    def __init__(self, storage_path="conversations"):
        self.storage_path = Path(storage_path)
        self.storage_path.mkdir(exist_ok=True)

    def save_conversation(self, conversation_id, messages):
        """Save a conversation."""
        file_path = self.storage_path / f"{conversation_id}.json"
        with open(file_path, "w", encoding="utf-8") as f:
            json.dump(messages, f, ensure_ascii=False, indent=2)

    def load_conversation(self, conversation_id):
        """Load a conversation."""
        file_path = self.storage_path / f"{conversation_id}.json"
        if file_path.exists():
            with open(file_path, "r", encoding="utf-8") as f:
                return json.load(f)
        return []

    def delete_conversation(self, conversation_id):
        """Delete a conversation."""
        file_path = self.storage_path / f"{conversation_id}.json"
        if file_path.exists():
            file_path.unlink()

# Example usage
memory = ConversationMemory()

# Save a conversation
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello! How can I help you?"}
]
memory.save_conversation("user_123", messages)

# Load the conversation
loaded_messages = memory.load_conversation("user_123")
```
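To wire this into a chat loop, load the history before calling the model and save it afterwards. A minimal sketch reusing the `memory` and `client` objects from the examples above (the `chat_with_memory` helper is ours):

```python
def chat_with_memory(conversation_id, user_input):
    """Chat while persisting history through ConversationMemory."""
    messages = memory.load_conversation(conversation_id)
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-27B",
        messages=messages,
        max_tokens=2048,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    memory.save_conversation(conversation_id, messages)
    return reply
```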
## Summary

This article covered how to use Qwen3.5-27B in detail, including:

- Basic API calls: text chat, streaming output, multi-turn conversation
- Sampling-parameter configuration: recommended settings for different scenarios
- Multimodal input: image and video processing
- Tool calling: function calling and parallel tool calls
- Agent development: the Qwen-Agent framework, custom agents, the ReAct pattern
- Best practices: prompt engineering, long-context handling, batch optimization
- Application cases: customer service, code review, documentation generation
- Performance monitoring: logging and profiling

With these techniques and practices you can make full use of Qwen3.5-27B's capabilities to build a wide range of intelligent applications.
## References

### Series articles

- Part 1: Introducing Qwen3.5-27B
- Part 2: Deployment on Windows with AMD GPUs
- Part 3: Usage tutorial and practice (this article)

Content adapted from the official Qwen3.5 documentation and best practices, subject to content-license restrictions.