Version: v0.13

# inference

## Index

- [Inference](#inference)

## Schemas

### Inference

Inference is a module schema that describes an inference workload: the model to serve, the framework that runs it, and the prompting and sampling parameters passed to the model.

#### Attributes

| name | type | description | default value |
| --- | --- | --- | --- |
| framework (required) | "Ollama" \| "KubeRay" | The framework or environment in which the model operates. | |
| model (required) | str | The model name to be used for inference. | |
| num_ctx | int | The size of the context window used to generate the next token. | 2048 |
| num_predict | int | Maximum number of tokens to predict when generating text. | 128 |
| system | str | The system message, which will be set in the template. | "" |
| temperature | float | A parameter that determines whether the model's output is more random and creative or more predictable. | 0.8 |
| template | str | The full prompt template, which will be sent to the model. | "" |
| top_k | int | A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. | 40 |
| top_p | float | A higher value (e.g. 0.9) will give more diverse answers, while a lower value (e.g. 0.5) will be more conservative. | 0.9 |
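
Only `framework` and `model` are required; every other attribute falls back to the default listed above. A minimal sketch, reusing the accessory key from the example below (which may differ in your project):

```
import inference.v1.infer

accessories: {
    # Minimal configuration: num_ctx, num_predict, system, template,
    # temperature, top_k and top_p all keep their defaults.
    "inference@v0.1.0": infer.Inference {
        model: "llama3"
        framework: "Ollama"
    }
}
```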

#### Examples

```
import inference.v1.infer

accessories: {
    "inference@v0.1.0": infer.Inference {
        # The model to serve and the framework that runs it.
        model: "llama3"
        framework: "Ollama"

        # Persona and prompt template sent to the model.
        system: "You are Mario from super mario bros, acting as an assistant."
        template: "{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ if .Prompt }}<|im_start|>user {{ .Prompt }}<|im_end|> {{ end }}<|im_start|>assistant"

        # Sampling parameters (these match the defaults).
        top_k: 40
        top_p: 0.9
        temperature: 0.8

        # Token limits (these match the defaults).
        num_predict: 128
        num_ctx: 2048
    }
}
```
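
The template above uses Go-template-style placeholders: `{{ .System }}` and `{{ .Prompt }}` are filled in with the system message and the user prompt at request time. Because `framework` is a union of `"Ollama"` and `"KubeRay"`, the same schema can target a KubeRay backend by changing that single attribute. A hedged sketch, assuming the same module version:

```
import inference.v1.infer

accessories: {
    "inference@v0.1.0": infer.Inference {
        model: "llama3"
        # Switch the serving framework; all other attributes keep their defaults.
        framework: "KubeRay"
    }
}
```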