Request Parameters
Parameter | Required | Description | Value Requirements
---|---|---|---
model | Required | Model name. | String type. Must match the value of the model name configured for the service.
messages | Required | Inference request message structure. | List type; the number of characters in the message content is subject to a service-side limit.
- role | Required | Inference request message role. | String type. Possible roles include "system", "user", "assistant", and "tool".
- content | Required | Inference request content. | String type for single-modal (text-only) models; list type for multimodal models.
-- type | Optional | Type of inference request content. | String type; one of "text", "image_url", "video_url", or "audio_url". The total number of image_url, video_url, and audio_url parts in a single request must be <= 20.
-- text | Optional | The inference request content is text. | Cannot be empty; supports both Chinese and English.
-- image_url | Optional | The inference request content is an image. | Supports images from server-local paths and image URLs over both HTTP and HTTPS. Supported image types are jpg, png, jpeg, and base64-encoded jpg. Maximum image size: 20 MB.
-- video_url | Optional | The inference request content is a video. | Supports videos from server-local paths and video URLs over both HTTP and HTTPS. Supported video types are MP4, AVI, and WMV. Maximum video size: 512 MB.
-- audio_url | Optional | The inference request content is audio. | Supports audio from server-local paths and audio URLs over both HTTP and HTTPS. Supported audio types are MP3, WAV, and FLAC. Maximum audio size: 20 MB.
- tool_calls | Optional | Tool calls generated by the model. | List[dict] type. When the role is "assistant", this field represents the model's calls to tools.
-- function | Required | The tool invoked by the model. | dict type.
-- id | Required | The ID of a specific tool invocation by the model. | String type.
-- type | Required | The type of tool being invoked. | String type; only "function" is supported.
- tool_call_id | Required when the role is "tool"; otherwise optional. | Associates the message with the ID of the model's tool invocation. | String type.
stream | Optional | Specifies whether the result is returned as a single complete text response or streamed incrementally. | Bool type, default value false.
presence_penalty | Optional | The presence penalty penalizes new tokens based on whether they have appeared in the text so far. Positive values penalize tokens that have already been used, increasing the likelihood that the model introduces new topics. | Float type, value range [-2.0, 2.0], default value 0.0.
frequency_penalty | Optional | The frequency penalty penalizes new tokens based on their existing frequency in the text. Positive values penalize frequently used tokens, reducing the likelihood of verbatim repetition. | Float type, value range [-2.0, 2.0], default value 0.0.
repetition_penalty | Optional | A technique for reducing the probability of repeated segments in text generation. It penalizes previously generated text, making the model more likely to choose new, non-repetitive content. | Float type, value range (0.0, 2.0], default value 1.0.
temperature | Optional | Controls the randomness of the generated output; higher values produce more diverse outputs. | Float type, value range [0.0, 2.0], default value 1.0. A value >= 0.001 is recommended, as values below 0.001 may lead to poor text quality.
top_p | Optional | Controls the range of candidate tokens considered during generation: tokens are selected in descending probability order until their cumulative probability exceeds the threshold, which also controls the diversity of the result. | Float type, value range (0.0, 1.0], default value 1.0.
top_k | Optional | Restricts generation to the k candidate tokens with the highest probabilities. | Int32 type, value range [0, 2147483647]. When the field is not set, the default is determined by the backend model; refer to the model documentation for details. If the value is >= vocabSize, vocabSize is used instead. vocabSize is read from the config.json file under the modelWeightPath directory; it is recommended to add the vocab_size or padded_vocab_size parameter to config.json to avoid inference failures.
seed | Optional | Specifies the random seed for the inference process. The same seed value makes inference results reproducible; different seed values increase result randomness. | UInt64 type, value range [0, 18446744073709551615]. If not provided, the system generates a random seed. A WARNING may appear when the seed approaches the maximum value; it does not affect usage and can be avoided by reducing the seed value.
stop | Optional | Text at which to stop inference. By default, the output does not include the stop strings. | List[string] or string type, default value null. Not supported in PD-separated scenarios.
stop_token_ids | Optional | List of token IDs at which to stop inference. By default, the output does not include the listed token IDs. | List[int32] type; elements exceeding the int32 range are ignored. Default value null.
include_stop_str_in_output | Optional | Whether to include the stop string in the generated inference text. | Bool type, default value false. Ignored when neither stop nor stop_token_ids is provided. Not supported in PD-separated scenarios.
skip_special_tokens | Optional | Whether to skip special tokens in the generated inference text. | Bool type, default value true.
ignore_eos | Optional | Whether to ignore the eos_token end-of-sequence symbol during text generation. | Bool type, default value false.
max_tokens | Optional | The maximum number of tokens to generate. The actual number of generated tokens is also limited by the maxIterTimes parameter in the configuration file, so the generated token count is <= min(maxIterTimes, max_tokens). | Int type, range (0, 2147483647], default value maxIterTimes.
tools | Optional | A list of tools the model may use. | List[dict] type.
- type | Required | The tool type. | String type; only "function" is supported.
- function | Required | Function description. | dict type.
-- name | Required | Function name. | String type.
-- strict | Optional | Whether generated tool calls must strictly follow the schema format. | Bool type, default value false.
-- description | Optional | Describes the function's purpose and usage. | String type.
-- parameters | Optional | The parameters accepted by the function. | JSON Schema format.
--- type | Required | The type of the function's parameters object. | String type; only "object" is supported.
--- properties | Required | Properties of the function parameters. Each key is a user-defined parameter name; each value is a dict describing that parameter, containing type and description fields. | dict type.
--- required | Required | The list of the function's required parameters. | List[string] type.
--- additionalProperties | Optional | Whether additional, unlisted parameters are allowed. | Bool type, default value false.
tool_choice | Optional | Controls whether the model calls a tool. | String or dict type, may be null; default value "auto". A specific tool can be designated, forcing the model to call that tool.
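For multimodal models, the messages/content structure above is easiest to see in a concrete request. The following Python sketch sends one text part and one image part. It is a minimal illustration only: the endpoint URL, port, model name, and the exact nesting of the image_url part follow the common OpenAI-compatible shape and are assumptions, not guarantees from this table.

```python
import requests

# Hypothetical endpoint; adjust host, port, and path to your deployment.
URL = "http://127.0.0.1:1025/v1/chat/completions"

payload = {
    "model": "my-multimodal-model",  # must match the served model name
    "messages": [
        {
            "role": "user",
            # For multimodal models, content is a list of typed parts
            # (at most 20 image_url/video_url/audio_url parts per request).
            "content": [
                {"type": "text", "text": "What is shown in this picture?"},
                # image_url may be an HTTP(S) URL or a server-local path;
                # jpg/png/jpeg (or base64-encoded jpg), at most 20 MB.
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.jpg"},
                },
            ],
        }
    ],
    "stream": False,  # single complete response rather than a stream
}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```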
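The sampling and stopping parameters combine freely in the same request body. The sketch below (hypothetical model name, same assumed endpoint as above) stays within the documented value ranges:

```python
# Hypothetical model name; POST this payload exactly as in the previous sketch.
payload = {
    "model": "my-text-model",
    "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
    "temperature": 0.7,          # [0.0, 2.0]; values >= 0.001 recommended
    "top_p": 0.9,                # (0.0, 1.0]
    "top_k": 50,                 # clamped to vocabSize if set higher
    "presence_penalty": 0.5,     # [-2.0, 2.0]
    "frequency_penalty": 0.3,    # [-2.0, 2.0]
    "repetition_penalty": 1.1,   # (0.0, 2.0]
    "seed": 42,                  # same seed -> reproducible results
    "stop": ["\n\n"],            # not supported in PD-separated scenarios
    "include_stop_str_in_output": False,
    "max_tokens": 128,           # effective cap: min(maxIterTimes, max_tokens)
}
```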
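tools, tool_choice, tool_calls, and tool_call_id form a round trip: the request declares the available tools, the assistant's reply may carry tool_calls, and a follow-up message returns the tool's result linked by tool_call_id. A sketch under the same assumptions, with a hypothetical get_weather function and a made-up call ID:

```python
# Hypothetical function and call ID, for illustration only.
tools = [
    {
        "type": "function",  # only "function" is supported
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "strict": False,  # tool calls need not strictly match the schema
            "parameters": {   # JSON Schema describing the arguments
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name."}
                },
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    }
]

request_payload = {
    "model": "my-text-model",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    # "auto" (default) lets the model decide; passing a dict that names a
    # specific tool forces the model to call that tool.
    "tool_choice": "auto",
}

# If the assistant's reply contains tool_calls, execute the tool and send the
# result back in a follow-up message, linked via tool_call_id:
tool_result_message = {
    "role": "tool",
    "tool_call_id": "call_abc123",  # the id from the assistant's tool_calls entry
    "content": "18°C, light rain",
}
```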