流式推理1

(“stream”=true,使用sse格式返回):


data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"\t"},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"\t"},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","usage":{"prompt_tokens":54,"completion_tokens":17,"total_tokens":71},"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}

data: [DONE]

流式推理2

(“stream”=true,配置项“fullTextEnabled”=true,使用sse格式返回):

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello!"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today?"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","full_text":"Hello! How can I assist you today?","usage":{"prompt_tokens":31,"completion_tokens":10,"total_tokens":41},"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today?"},"finish_reason":"length"}]}

data: [DONE]

输出说明

表1

文本推理结果说明

参数名

类型

说明

id

string

请求ID。

object

string

返回结果类型目前都返回”chat.completion”。

created

integer

推理请求时间戳,精确到秒。

model

string

使用的推理模型。

choices

list

推理结果列表。

-

index

integer

choices消息index,当前只能为0。

message

object

推理消息。

-

role

string

角色,目前都返回”assistant”。

content

string

推理文本结果。

tool_calls

list

模型工具调用输出。

-

function

dict

函数调用说明。

-

arguments

string

调用函数的参数,JSON格式的字符串。

name

string

调用的函数名。

id

string

模型调用工具的ID。

type

string

工具的类型,目前仅支持function。

finish_reason

string

结束原因。

  • stop:

    • 请求被CANCEL或STOP,用户不感知,丢弃响应。
    • 请求执行中出错,响应输出为空,err_msg非空。
    • 请求输入校验异常,响应输出为空,err_msg非空。
    • 请求遇eos结束符正常结束。
  • length:

    • 请求因达到最大序列长度而结束,响应为最后一轮迭代输出。
    • 请求因达到最大输出长度(包括请求和模型粒度)而结束,响应为最后一轮迭代输出。
  • tool_calls:表示模型调用了工具。

usage

object

推理结果统计数据。

-

prompt_tokens

int

用户输入的prompt文本对应的token长度。

completion_tokens

int

推理结果token数量。PD场景下统计P和D推理结果的总token数量。当一个请求的推理长度上限取maxIterTimes的值时,D节点响应中completion_tokens数量为maxIterTimes+1,即增加了P推理结果的首token数量。

total_tokens

int

请求和推理的总token数。

prefill_time

float

推理首token时延。

decode_time_arr

list

推理Decode时延数组。

表2

流式推理结果说明

参数名

类型

说明

data

object

一次推理返回的结果。

-

id

string

请求ID。

object

string

目前都返回”chat.completion.chunk”。

created

integer

推理请求时间戳,精确到秒。

model

string

使用的推理模型。

full_text

string

全量文本结果,配置项“fullTextEnabled”=true时才有此返回值。

usage

object

推理结果统计数据。

-

prompt_tokens

int

用户输入的prompt文本对应的token长度。

completion_tokens

int

推理结果token数量。PD场景下统计P和D推理结果的总token数量。当一个请求的推理长度上限取maxIterTimes的值时,D节点响应中completion_tokens数量为maxIterTimes+1,即增加了P推理结果的首token数量。

total_tokens

int

请求和推理的总token数。

choices

list

流式推理结果。

-

index

integer

choices消息index,当前只能为0。

delta

object

推理返回结果,最后一个响应为空。

-

role

string

角色,目前都返回”assistant”。

content

string

推理文本结果。

finish_reason

string

结束原因,只在最后一次推理结果返回。

  • stop:

    • 请求被CANCEL或STOP,用户不感知,丢弃响应。
    • 请求执行中出错,响应输出为空,err_msg非空。
    • 请求输入校验异常,响应输出为空,err_msg非空。
    • 请求遇eos结束符正常结束。
  • length:

    • 请求因达到最大序列长度而结束,响应为最后一轮迭代输出。
    • 请求因达到最大输出长度(包括请求和模型粒度)而结束,响应为最后一轮迭代输出。