Single-turn conversation:

Single-modal (text-only) model:

{
    "model": "DeepSeek-R1",
    "messages": [{
        "role": "user",
        "content": "You are a helpful assistant."
    }],
    "stream": false,
    "presence_penalty": 1.03,
    "frequency_penalty": 1.0,
    "repetition_penalty": 1.0,
    "temperature": 0.5,
    "top_p": 0.95,
    "top_k": 0,
    "seed": null,
    "stop": ["stop1", "stop2"],
    "stop_token_ids": [2, 13],
    "include_stop_str_in_output": false,
    "skip_special_tokens": true,
    "ignore_eos": false,
    "max_tokens": 20
}
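
For reference, the following is a minimal Python sketch that sends the request above using the requests library. The endpoint URL and port are placeholders and must be replaced with the address of your actual deployment.

import requests

# Placeholder endpoint; substitute your deployment's address and port.
URL = "http://127.0.0.1:1025/v1/chat/completions"

payload = {
    "model": "DeepSeek-R1",
    "messages": [{"role": "user", "content": "You are a helpful assistant."}],
    "stream": False,
    "temperature": 0.5,
    "top_p": 0.95,
    "max_tokens": 20,
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])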

Multimodal model:

Note: Replace the value of the “image_url” parameter with the actual location of your image.

{
    "model": "DeepSeek-R1",
    "messages": [{
        "role": "user",
        "content": [
           {"type": "text", "text": "My name is Olivier and I"},
           {"type": "image_url", "image_url": "/xxxx/test.png"}
        ]
    }],
    "stream": false,
    "presence_penalty": 1.03,
    "frequency_penalty": 1.0,
    "repetition_penalty": 1.0,
    "temperature": 0.5,
    "top_p": 0.95,
    "top_k": 0,
    "seed": null,
    "stop": ["stop1", "stop2"],
    "stop_token_ids": [2, 13],
    "include_stop_str_in_output": false,
    "skip_special_tokens": true,
    "ignore_eos": false,
    "max_tokens": 20
}
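
For a multimodal request, the message “content” becomes a list that mixes text and image parts. Below is a minimal sketch of assembling it in Python; whether the server accepts a local file path (as in the example above), an HTTP URL, or a base64 data URI depends on your deployment.

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "My name is Olivier and I"},
        # Replace with the actual location of your image.
        {"type": "image_url", "image_url": "/xxxx/test.png"},
    ],
}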

Multi-turn conversation:

Request Example 1:

{
    "model": "DeepSeek-R1",
    "messages": [{
        "role": "system",
        "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."
        },
        {
        "role": "user",
        "content": "Hi, can you tell me the delivery date for my order? my order id is 12345."
        }
    ],
    "stream": false,
    "presence_penalty": 1.03,
    "frequency_penalty": 1.0,
    "repetition_penalty": 1.0,
    "temperature": 0.5,
    "top_p": 0.95,
    "top_k": 0,
    "seed": null,
    "stop": ["stop1", "stop2"],
    "stop_token_ids": [2],
    "ignore_eos": false,
    "max_tokens": 1024,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_delivery_date",
                "strict": true,
                "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "order_id": {
                            "type": "string",
                            "description": "The customer's order ID."
                        }
                    },
                    "required": [
                        "order_id"
                    ],
                    "additionalProperties": false
                }
            }
        }
    ],
   "tool_choice": "auto"
}
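
The following sketch issues this request from Python, assuming URL is the placeholder endpoint from the earlier sketch and TOOLS holds the “tools” array shown above. With “tool_choice” set to “auto”, the model itself decides whether to call the tool.

import requests

payload = {
    "model": "DeepSeek-R1",
    "messages": [
        {"role": "system",
         "content": "You are a helpful customer support assistant. "
                    "Use the supplied tools to assist the user."},
        {"role": "user",
         "content": "Hi, can you tell me the delivery date for my order? "
                    "my order id is 12345."},
    ],
    "max_tokens": 1024,
    "tools": TOOLS,          # the "tools" array from the request above
    "tool_choice": "auto",   # let the model decide whether to call the tool
}
resp = requests.post(URL, json=payload, timeout=60)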

Request Example 2:

{
    "model": "DeepSeek-R1",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."
        },
        {
            "role": "user",
            "content": "Hi, can you tell me the delivery date for my order? my order id is 12345."
        },
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "function": {
                        "arguments": "{\"order_id\": \"12345\"}",
                        "name": "get_delivery_date"
                    },
                    "id": "tool_call_8p2Nk",
                    "type": "function"
                }
            ]
        },
        {
            "role": "tool",
            "content": "the delivery date is 2024.09.10.",
            "tool_call_id": "tool_call_8p2Nk"
        }
    ],
    "stream": false,
    "repetition_penalty": 1.1,
    "temperature": 0.9,
    "top_p": 1,
    "max_tokens": 1024,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_delivery_date",
                "strict": true,
                "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "order_id": {
                            "type": "string",
                            "description": "The customer's order ID."
                        }
                    },
                    "required": [
                        "order_id"
                    ],
                    "additionalProperties": false
                }
            }
        }
    ],
    "tool_choice": "auto"
}

Response Examples:

Text Inference (“stream”=false):

Single-turn conversation:

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "DeepSeek-R1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nHello there, how may I assist you today?"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    },
    "prefill_time": 200,
    "decode_time_arr": [56, 28, 28]
}

Multi-turn conversation:

Response Example 1:

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "DeepSeek-R1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "function": {
                            "arguments": "{\"order_id\": \"12345\"}",
                            "name": "get_delivery_date"
                        },
                        "id": "call_JwmTNF3O",
                        "type": "function"
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ],
    "usage": {
        "prompt_tokens": 226,
        "completion_tokens": 122,
        "total_tokens": 348
    },
    "prefill_time": 200,
    "decode_time_arr": [56, 28, 28]
}
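
A sketch of handling this response in Python, assuming resp, payload, and URL carry over from the Request Example 1 sketch: parse the tool call, execute the corresponding local function (get_delivery_date below is a stand-in for a real lookup), append both the assistant tool call and the tool result to the message history, and re-send, which yields Request Example 2.

import json
import requests

def get_delivery_date(order_id: str) -> str:
    return "the delivery date is 2024.09.10."  # stand-in for a real lookup

choice = resp.json()["choices"][0]
if choice["finish_reason"] == "tool_calls":
    call = choice["message"]["tool_calls"][0]
    args = json.loads(call["function"]["arguments"])
    result = get_delivery_date(**args)
    payload["messages"].append(
        {"role": "assistant", "tool_calls": choice["message"]["tool_calls"]})
    payload["messages"].append(
        {"role": "tool", "content": result, "tool_call_id": call["id"]})
    resp = requests.post(URL, json=payload, timeout=60)  # Request Example 2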

Response Example 2:

{
    "id": "endpoint_common_25",
    "object": "chat.completion",
    "created": 1728959154,
    "model": "DeepSeek-R1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n Your order with ID 12345 is scheduled for delivery on September 10th, 2024.",
                "tool_calls": null
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 265,
        "completion_tokens": 30,
        "total_tokens": 295
    },
    "prefill_time": 200,
    "decode_time_arr": [56, 28, 28]
}

Streaming Inference:

Streaming Inference 1

(“stream”=true; results are returned in SSE format):

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"\t"},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"\t"},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"endpoint_common_8","object":"chat.completion.chunk","created":1729614610,"model":"DeepSeek-R1","usage":{"prompt_tokens":54,"completion_tokens":17,"total_tokens":71},"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"stop"}]}

data: [DONE]
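
A sketch of consuming this SSE stream in Python, again with a placeholder endpoint: each “data:” line carries one JSON chunk, and the literal “[DONE]” marks the end of the stream.

import json
import requests

URL = "http://127.0.0.1:1025/v1/chat/completions"  # placeholder endpoint
payload = {
    "model": "DeepSeek-R1",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    "max_tokens": 64,
}

with requests.post(URL, json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue                        # skip blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":                # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)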

Streaming Inference 2

(“stream”=true, with the configuration item “fullTextEnabled”=true; results are returned in SSE format):

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello!"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today?"},"finish_reason":null}]}

data: {"id":"endpoint_common_11","object":"chat.completion.chunk","created":1730184192,"model":"DeepSeek-R1","full_text":"Hello! How can I assist you today?","usage":{"prompt_tokens":31,"completion_tokens":10,"total_tokens":41},"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello! How can I assist you today?"},"finish_reason":"length"}]}

data: [DONE]
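
Note that with “fullTextEnabled”=true, each chunk’s “content” already carries the accumulated text, so a client should replace its buffer rather than concatenate deltas. A short sketch, assuming chunks is the sequence of parsed “data:” payloads from the loop above:

text = ""
for chunk in chunks:
    # Each delta holds the full text so far; overwrite instead of appending.
    text = chunk["choices"][0]["delta"].get("content", text)
print(text)  # the final chunk also exposes the same string as "full_text"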

Output Explanation

Table 1: Explanation of text inference results

Each entry is given as parameter name (type): description; nested entries are sub-fields of the parameter above them.

id (string): Request ID.

object (string): Type of the returned result; currently always “chat.completion”.

created (integer): Timestamp of the inference request, accurate to the second.

model (string): The inference model used.

choices (list): List of inference results.
  - index (integer): Index of the choices entry; currently always 0.
  - message (object): Inference message.
    - role (string): Role; currently always “assistant”.
    - content (string): Inference text result.
    - tool_calls (list): The model’s tool invocation output.
      - function (dict): Description of the function call.
        - arguments (string): Arguments of the function call, as a JSON-formatted string.
        - name (string): Name of the called function.
      - id (string): ID of the model’s tool invocation.
      - type (string): Tool type; currently only “function” is supported.
  - finish_reason (string): Reason for completion.
    • stop:
      - The request was CANCELLED or STOPPED without the user being aware; the response is discarded.
      - An error occurred during execution; the response output is empty and err_msg is non-empty.
      - Input validation failed; the response output is empty and err_msg is non-empty.
      - The request ended normally upon encountering the EOS (end-of-sequence) token.
    • length:
      - The request ended because it reached the maximum sequence length; the response is the output of the last iteration.
      - The request ended because it reached the maximum output length (at request or model granularity); the response is the output of the last iteration.
    • tool_calls: The model called a tool.

usage (object): Inference result statistics.
  - prompt_tokens (int): Token length of the user’s input prompt text.
  - completion_tokens (int): Number of tokens in the inference result. In the PD (prefill/decode-separated) scenario, this counts the total tokens of both the P and D inference results. When a request’s inference length limit is set to maxIterTimes, the D node’s completion_tokens count is maxIterTimes + 1, i.e., it includes the first token from the P inference result.
  - total_tokens (int): Total token count for the request and inference.

prefill_time (float): Latency of the first inference token.

decode_time_arr (list): Array of per-token latencies during the decoding phase.

Table 2: Explanation of streaming inference results

Each entry is given as parameter name (type): description; nested entries are sub-fields of the parameter above them.

data (object): The result of a single inference step.
  - id (string): Request ID.
  - object (string): Currently always “chat.completion.chunk”.
  - created (integer): Timestamp of the inference request, accurate to the second.
  - model (string): The inference model used.
  - full_text (string): Full text result; returned only when the configuration item “fullTextEnabled” is set to true.
  - usage (object): Inference result statistics.
    - prompt_tokens (int): Token length of the user’s input prompt text.
    - completion_tokens (int): Number of tokens in the inference result. In the PD (prefill/decode-separated) scenario, this counts the total tokens of both the P and D inference results. When a request’s inference length limit is set to maxIterTimes, the D node’s completion_tokens count is maxIterTimes + 1, i.e., it includes the first token from the P inference result.
    - total_tokens (int): Total token count for the request and inference.
  - choices (list): Streaming inference results.
    - index (integer): Index of the choices entry; currently always 0.
    - delta (object): The incremental inference result; empty in the last response.
      - role (string): Role; currently always “assistant”.
      - content (string): Inference text result.
    - finish_reason (string): Reason for completion; returned only in the final chunk.
      • stop:
        - The request was CANCELLED or STOPPED without the user being aware; the response is discarded.
        - An error occurred during execution; the response is empty and err_msg is non-empty.
        - Input validation failed; the response is empty and err_msg is non-empty.
        - The request ended normally upon encountering the EOS (end-of-sequence) token.
      • length:
        - The request ended because it reached the maximum sequence length; the response is the output of the last iteration.
        - The request ended because it reached the maximum output length (at request or model granularity); the response is the output of the last iteration.