# inference

## Index

- [Inference](#inference)

## Schemas

### Inference
Inference is a module schema that defines an inference service, including the model to serve, the serving framework, and the generation parameters listed below.
#### Attributes
| name | type | description | default value |
| --- | --- | --- | --- |
| **framework** *required* | "Ollama" \| "KubeRay" | The framework or environment in which the model operates. | |
| **model** *required* | str | The model name to be used for inference. | |
| **num_ctx** | int | The size of the context window used to generate the next token. | 2048 |
| **num_predict** | int | The maximum number of tokens to predict when generating text. | 128 |
| **system** | str | The system message, which will be set in the template. | "" |
| **temperature** | float | Controls whether the model's output is more random and creative or more predictable. | 0.8 |
| **template** | str | The full prompt template, which will be sent to the model. | "" |
| **top_k** | int | A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. | 40 |
| **top_p** | float | A higher value (e.g. 0.9) gives more diverse answers, while a lower value (e.g. 0.5) is more conservative. | 0.9 |
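
Only `framework` and `model` must be set explicitly; every other attribute falls back to the default shown in the table. As a minimal sketch (assuming the same `inference.v1.infer` import path used in the example below; `minimalInference` is just an illustrative name), a bare declaration looks like this:

```
import inference.v1.infer

# Minimal configuration: only the two required attributes are set.
# num_ctx, num_predict, system, temperature, template, top_k, and
# top_p all fall back to the defaults listed in the table above.
minimalInference = infer.Inference {
    framework: "Ollama"
    model: "llama3"
}
```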
#### Examples
```
import inference.v1.infer

accessories: {
    "inference@v0.1.0": infer.Inference {
        model: "llama3"
        framework: "Ollama"
        system: "You are Mario from super mario bros, acting as an assistant."
        template: "{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ if .Prompt }}<|im_start|>user {{ .Prompt }}<|im_end|> {{ end }}<|im_start|>assistant"
        top_k: 40
        top_p: 0.9
        temperature: 0.8
        num_predict: 128
        num_ctx: 2048
    }
}
```
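
In practice, the `accessories` field shown above sits inside a Kusion `AppConfiguration`. The sketch below shows that surrounding context, assuming the conventional `kam.v1.app_configuration` import; the application name `myapp` is a placeholder, and the workload block is omitted for brevity:

```
import kam.v1.app_configuration as ac
import inference.v1.infer

# "myapp" is a placeholder application name. A real configuration
# would also declare a workload that consumes the inference service.
myapp: ac.AppConfiguration {
    accessories: {
        "inference@v0.1.0": infer.Inference {
            model: "llama3"
            framework: "Ollama"
        }
    }
}
```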