Parameter

Required

Description

Value Requirements

model

Required

Model Name

Must match the value of modelName in the MindIE Server configuration file.

messages

Required

Inference request message structure.

List type, with the number of characters in the messages content being between 0KB and 4MB. Supports both Chinese and English. The number of tokens after tokenization should be less than or equal to the minimum value among maxInputTokenLen, maxSeqLen - 1, max_position_embeddings, and 1MB. The max_position_embeddings is obtained from the weight file config.json, and other related parameters are taken from the configuration file.

-

role

Required

Inference request message role.

String type. Possible roles are:

  • system: System role
  • user: User role
  • assistant: Assistant role
  • tool: Tool role

content

Required

Inference request content. For single-modality text models it is a string; for multimodal models it is a list.

  • string:

    • If the role is ‘assistant’ and tool_calls is not empty, content can be omitted.
    • In all other cases, content must be provided.
  • list: Follows the example format of the multimodal model’s ‘inputs’ parameter.
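As a sketch of the request structure above, a minimal text-only payload might look like the following ("my-model" is a placeholder and must match modelName in the MindIE Server configuration file):

```python
import json

# Minimal chat request payload; "my-model" is a placeholder that must
# match modelName in the MindIE Server configuration file.
payload = {
    "model": "my-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
print(json.dumps(payload, indent=2))
```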

-

type

Optional

Type of inference request content.

  • text: Text
  • image_url: Image
  • video_url: Video
  • audio_url: Audio

The total number of image_url, video_url, and audio_url in a single request should be <= 20.

text

Optional

The inference request content is text.

Cannot be empty, supports both Chinese and English.

image_url

Optional

The inference request content is an image.

Supports images from server local paths; supported image types are jpg, png, and jpeg, as well as base64-encoded jpg images. Supports image URLs over both HTTP and HTTPS. The maximum supported image size is 20MB.

video_url

Optional

The inference request content is a video.

Supports videos from server local paths; supported video types are MP4, AVI, and WMV. Supports video URLs over both HTTP and HTTPS. The maximum supported video size is 512MB.

audio_url

Optional

The inference request content is audio.

Supports audio from server local paths; supported audio types are MP3, WAV, and FLAC. Supports audio URLs over both HTTP and HTTPS. The maximum supported audio size is 20MB.
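A multimodal message sketch tying the content-part types above together. The exact field shape for image_url/video_url/audio_url should follow the multimodal model's 'inputs' example; the flat string value and local path used here are assumptions for illustration:

```python
# Multimodal user message: content is a list of typed parts.
# The field shape and local path below are illustrative assumptions.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this picture."},
        {"type": "image_url", "image_url": "/data/images/example.jpg"},
    ],
}

# Documented limit: at most 20 media parts per request.
media_types = ("image_url", "video_url", "audio_url")
media_count = sum(1 for part in message["content"] if part["type"] in media_types)
assert media_count <= 20
```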

tool_calls

Optional

Tool calls generated by the model.

Type is List[dict]. When the role is assistant, it represents the model’s call to the tool.

-

function

Required

Represents the tool invoked by the model.

Type is dict.

  • arguments, required, a JSON-formatted string representing the parameters for the function call.
  • name, required, a string representing the name of the function being called.

id

Required

Represents the ID of a specific tool invocation by the model.

String.

type

Required

The type of tool being invoked.

String, only supports “function”.

tool_call_id

Required when the role is “tool”, otherwise optional.

Associates with the ID of the model’s tool invocation.

String.
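A sketch of a tool-call round trip using the fields above; the function name, arguments, and id are illustrative:

```python
import json

# Assistant turn: the model requests a tool invocation.
assistant_msg = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_0",                 # illustrative id
            "type": "function",             # only "function" is supported
            "function": {
                "name": "get_weather",      # hypothetical function
                "arguments": json.dumps({"city": "Beijing"}),  # JSON-formatted string
            },
        }
    ],
}

# Tool turn: the result is sent back, linked via tool_call_id.
tool_msg = {
    "role": "tool",
    "tool_call_id": "call_0",
    "content": json.dumps({"temperature_c": 21}),
}
```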

stream

Optional

Specifies whether inference results are returned as a single complete text response or as a stream.

Boolean type, default value is false.

  • true: Stream-based inference.
  • false: Text-based inference.

presence_penalty

Optional

The presence penalty ranges from -2.0 to 2.0 and affects how the model penalizes new tokens based on whether they have appeared in the text so far. Positive values will penalize words already used, increasing the likelihood of the model introducing new topics.

Float type, value range [-2.0, 2.0], default value 0.0.

frequency_penalty

Optional

The frequency penalty ranges from -2.0 to 2.0 and influences how the model penalizes new tokens based on their existing frequency in the text. Positive values penalize frequently used tokens, reducing the likelihood of repeating the same line verbatim.

Float type, value range [-2.0, 2.0], default value 0.0.

repetition_penalty

Optional

The repetition penalty is a technique used to reduce the probability of repeating segments in text generation. It penalizes previously generated text, making the model more likely to choose new, non-repetitive content.

Float type, value range (0.0, 2.0], default value 1.0.
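A common way the three penalties above are applied to next-token logits can be sketched as follows; this is an illustrative implementation, not the backend's actual one:

```python
from collections import Counter

def apply_penalties(logits, generated_ids,
                    presence_penalty=0.0, frequency_penalty=0.0,
                    repetition_penalty=1.0):
    """Sketch of presence/frequency/repetition penalties on a logit list.

    presence_penalty subtracts a flat amount from every token already seen;
    frequency_penalty subtracts proportionally to how often it was seen;
    repetition_penalty divides positive logits (and multiplies negative
    ones) for seen tokens, as in the common CTRL-style formulation.
    """
    counts = Counter(generated_ids)
    out = list(logits)
    for tok, n in counts.items():
        out[tok] -= presence_penalty + frequency_penalty * n
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty
    return out
```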

temperature

Optional

Controls the randomness of the generated output, with higher values producing more diverse outputs.

Float type, value range [0.0, 2.0], default value 1.0.

The higher the value, the greater the randomness of the result. It is recommended to use a value greater than or equal to 0.001, as values below 0.001 may lead to poor text quality.

top_p

Optional

Controls the range of tokens considered by the model during generation: candidate tokens are accumulated by probability until the cumulative probability exceeds the given threshold, and only those candidates are sampled. This also controls the diversity of the generated result.

Float type, value range (0.0, 1.0], default value 1.0.

top_k

Optional

Controls the range of words considered by the model during generation by selecting only from the top k candidate words with the highest probabilities.

Int32 type, value range [0, 2147483647], default value determined by the backend model when the field is not set. For more details, refer to documentation.

If the value is greater than or equal to vocabSize, the default value will be vocabSize.

vocabSize is read from the config.json file under the modelWeightPath directory. It is recommended that users add the vocab_size or padded_vocab_size parameters to config.json to avoid inference failures.
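How top_k and top_p interact can be sketched as follows: top_k first clamps the candidate set (capped at the vocabulary size, mirroring the vocabSize rule above), then top_p keeps the smallest prefix of candidates whose cumulative probability reaches the threshold. This is an illustrative implementation, not the backend's:

```python
import math

def candidate_tokens(logits, top_k=0, top_p=1.0):
    """Return indices of tokens kept after top-k then top-p filtering.

    top_k == 0 is treated as 'no top-k restriction' in this sketch;
    the real backend default depends on the model when the field is unset.
    """
    # Softmax over the logits (shifted by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:min(top_k, len(probs))]  # clamp to vocab size
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    keep, cum = [], 0.0
    for i in ranked:
        keep.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return keep
```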

seed

Optional

Used to specify the random seed for the inference process. The same seed value ensures reproducibility of inference results, while different seed values increase the randomness of the results.

UInt64 type, value range [0, 18446744073709551615]. If not provided, the system generates a random seed value.

A WARNING may appear when the seed approaches the maximum value, but it will not affect usage. To remove the WARNING, you can reduce the seed value.

stop

Optional

Text to stop the inference. The output result does not include the stop words by default.

List[string] type or string type, default value null.

  • For List[string], the number of elements should not exceed 1024, with each element having a length of 1-1024. The total length of the list elements should not exceed 32768 (256*128). An empty list is equivalent to null.
  • For string type, the length range is 1-1024 characters.

PD-separated scenarios do not support this parameter.
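The documented constraints on stop can be checked with a small validator sketch:

```python
def validate_stop(stop):
    """Check the documented constraints on the `stop` parameter."""
    if stop is None:
        return True
    if isinstance(stop, str):
        # String form: 1-1024 characters.
        return 1 <= len(stop) <= 1024
    if isinstance(stop, list):
        if len(stop) == 0:
            return True            # empty list is equivalent to null
        if len(stop) > 1024:       # at most 1024 elements
            return False
        if not all(isinstance(s, str) and 1 <= len(s) <= 1024 for s in stop):
            return False
        return sum(len(s) for s in stop) <= 32768  # total length cap
    return False
```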

stop_token_ids

Optional

List of token IDs to stop the inference. The output result does not include the token IDs in the stop inference list by default.

List[int32] type, elements exceeding int32 will be ignored, default value is null.

include_stop_str_in_output

Optional

Determines whether to include the stop string in the generated inference text.

Bool type, default value is false.

  • true: Includes stop string.
  • false: Does not include stop string.

If stop or stop_token_ids is not provided, this field will be ignored.

PD-separated scenarios do not support this parameter.

skip_special_tokens

Optional

Specifies whether to skip special tokens in the generated inference text.

Bool type, default value is true.

  • true: Skip special tokens.
  • false: Retain special tokens.

ignore_eos

Optional

Specifies whether to ignore the eos_token end symbol during inference text generation.

Bool type, default value is false.

  • true: Ignore eos_token end symbol.
  • false: Do not ignore eos_token end symbol.

max_tokens

Optional

Specifies the maximum number of tokens allowed in the generated inference. The actual number of tokens generated is also affected by the maxIterTimes parameter in the configuration file, and the generated token count is less than or equal to min(maxIterTimes, max_tokens).

Int type, range (0, 2147483647], default value is maxIterTimes.

tools

Optional

A list of tools that may be used.

List[dict] type.

-

type

Required

Indicates the tool type.

Only supports the string “function”.

function

Required

Function description.

dict type.

-

name

Required

Function name.

String.

strict

Optional

Indicates whether the generated tool calls strictly follow the schema format.

bool type, default is false.

description

Optional

Describes the function’s functionality and usage.

String.

parameters

Optional

Indicates the parameters accepted by the function.

JSON schema format.

-

type

Required

Indicates the type of the function parameter’s attribute.

String, only supports “object”.

properties

Required

Properties of the function parameters. Each key represents a parameter name, which can be defined by the user. The value is of type dict, representing the parameter description, containing type and description fields.

dict type.

required

Required

Indicates the list of required parameters for the function.

List[string] type.

additionalProperties

Optional

Indicates whether additional unmentioned parameters are allowed.

bool type, default value is false.

  • true: Allows additional unmentioned parameters.
  • false: Does not allow additional unmentioned parameters.
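Putting the fields above together, a tools list might look like this sketch ("get_weather" and its parameter are illustrative):

```python
tools = [
    {
        "type": "function",                # only "function" is supported
        "function": {
            "name": "get_weather",         # hypothetical function
            "description": "Look up the current weather for a city.",
            "parameters": {                # JSON-schema format
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name."},
                },
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    }
]
```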

tool_choice

Optional

Controls whether the model calls a tool.

string type or dict type, can be null, default value is “auto”.

  • “none”: Indicates that the model will not call any tools and will generate a message instead.
  • “auto”: Indicates that the model can either generate a message or call one or more tools.
  • “required”: Indicates that the model must call one or more tools.

Alternatively, you can specify a particular tool, forcing the model to call that tool.
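The dict form for forcing a specific tool follows the common OpenAI-compatible convention; the shape below is a sketch under that assumption, with "get_weather" as an illustrative name:

```python
# String forms of tool_choice:
tool_choice_auto = "auto"          # model decides whether to call tools
tool_choice_none = "none"          # model never calls tools
tool_choice_required = "required"  # model must call at least one tool

# Dict form: force a call to one named tool ("get_weather" is illustrative;
# the exact shape is an assumption based on the OpenAI-compatible convention).
tool_choice_forced = {"type": "function", "function": {"name": "get_weather"}}
```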