流式推理 - Mint Starter Kit

流式推理1

（“stream”=true，使用sse格式返回）：

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"\t"},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"\t"},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","usage":{"prompt_tokens":54,"completion_tokens":17,"total_tokens":71},"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}

data: [DONE]

流式推理2

（“stream”=true，配置项“fullTextEnabled”=true，使用sse格式返回）：

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello!"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today?"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","full_text":"Hello! How can I assist you today?","usage":{"prompt_tokens":31,"completion_tokens":10,"total_tokens":41},"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today?"},"finish_reason":"length"}]}

data: [DONE]

输出说明

表1

文本推理结果说明

参数名					类型	说明
id					string	请求ID。
object					string	返回结果类型目前都返回”chat.completion”。
created					integer	推理请求时间戳，精确到秒。
model					string	使用的推理模型。
choices					list	推理结果列表。
-	index				integer	choices消息index，当前只能为0。
	message				object	推理消息。
	-	role			string	角色，目前都返回”assistant”。
		content			string	推理文本结果。
		tool_calls			list	模型工具调用输出。
		-	function		dict	函数调用说明。
			-	arguments	string	调用函数的参数，JSON格式的字符串。
			-	name	string	调用的函数名。
			id		string	模型调用工具的ID。
			type		string	工具的类型，目前仅支持function。
	finish_reason				string	结束原因。 stop：请求被CANCEL或STOP，用户不感知，丢弃响应。请求执行中出错，响应输出为空，err_msg非空。请求输入校验异常，响应输出为空，err_msg非空。请求遇eos结束符正常结束。 length：请求因达到最大序列长度而结束，响应为最后一轮迭代输出。请求因达到最大输出长度（包括请求和模型粒度）而结束，响应为最后一轮迭代输出。 tool_calls：表示模型调用了工具。
usage					object	推理结果统计数据。
-	prompt_tokens				int	用户输入的prompt文本对应的token长度。
	completion_tokens				int	推理结果token数量。PD场景下统计P和D推理结果的总token数量。当一个请求的推理长度上限取maxIterTimes的值时，D节点响应中completion_tokens数量为maxIterTimes+1，即增加了P推理结果的首token数量。
	total_tokens				int	请求和推理的总token数。
prefill_time					float	推理首token时延。
decode_time_arr					list	推理Decode时延数组。

表2

流式推理结果说明

参数名				类型	说明
data				object	一次推理返回的结果。
-	id			string	请求ID。
	object			string	目前都返回”chat.completion.chunk”。
	created			integer	推理请求时间戳，精确到秒。
	model			string	使用的推理模型。
	full_text			string	全量文本结果，配置项“fullTextEnabled”=true时才有此返回值。
	usage			object	推理结果统计数据。
	-	prompt_tokens		int	用户输入的prompt文本对应的token长度。
		completion_tokens		int	推理结果token数量。PD场景下统计P和D推理结果的总token数量。当一个请求的推理长度上限取maxIterTimes的值时，D节点响应中completion_tokens数量为maxIterTimes+1，即增加了P推理结果的首token数量。
		total_tokens		int	请求和推理的总token数。
	choices			list	流式推理结果。
	-	index		integer	choices消息index，当前只能为0。
		delta		object	推理返回结果，最后一个响应为空。
		-	role	string	角色，目前都返回”assistant”。
		-	content	string	推理文本结果。
		finish_reason		string	结束原因，只在最后一次推理结果返回。 stop：请求被CANCEL或STOP，用户不感知，丢弃响应。请求执行中出错，响应输出为空，err_msg非空。请求输入校验异常，响应输出为空，err_msg非空。请求遇eos结束符正常结束。 length：请求因达到最大序列长度而结束，响应为最后一轮迭代输出。请求因达到最大输出长度（包括请求和模型粒度）而结束，响应为最后一轮迭代输出。

API 文档

​流式推理1

​流式推理2

​输出说明

​表1

​表2

流式推理1

流式推理2

输出说明

表1

表2