Zero-shot batch inference for tabular classification (TabLLM)
Prompt templates in Ludwig use Python-style placeholder notation, where each placeholder corresponds to a column in the input dataset.
```yaml
prompt:
  template: "The {color} {animal} jumped over the {size} {object}"
```
When a prompt template like the one above is provided, the prompt with all placeholders filled in is used as the text input feature value for the LLM.
Dataset:
| color | animal | size | object |
| ----- | ------ | ---- | ------ |
| brown | fox | big | dog |
| white | cat | huge | rock |
Prompts:
"The brown fox jumped over the big dog"
"The white cat jumped over the huge rock"
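As a minimal sketch (not Ludwig's internal code), the same per-row substitution can be reproduced with Python's `str.format`, which fills each named placeholder from a row's column values. The `rows` list simply restates the two example rows from the table above.

```python
# Minimal sketch: fill a Python-style template once per dataset row.
# Illustrates the idea only; this is not Ludwig's implementation.
template = "The {color} {animal} jumped over the {size} {object}"

# The two example rows from the table above, one dict per row.
rows = [
    {"color": "brown", "animal": "fox", "size": "big", "object": "dog"},
    {"color": "white", "animal": "cat", "size": "huge", "object": "rock"},
]

# str.format looks up each {placeholder} by name among the keyword arguments.
prompts = [template.format(**row) for row in rows]

for prompt in prompts:
    print(prompt)
# The brown fox jumped over the big dog
# The white cat jumped over the huge rock
```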
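Tabular datasets can be fed directly to the LLM by referencing column names inside curly braces; Ludwig handles the formatting of row values automatically.
For example, the following configuration can be used to perform zero-shot binary classification on a tabular dataset with these columns:
- `Recency -- months since last donation`
- `Frequency -- total number of donations`
- `Monetary -- total blood donated in c.c.`
- `Time -- months since first donation`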
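Because these column names contain spaces, `--` and `.`, a plain `str.format` call is not a safe way to mimic the substitution: `str.format` reads a `.` inside a field name as attribute access, so a placeholder ending in `c.c.` is looked up under a truncated key. The sketch below instead replaces each `{...}` with a regular expression. The helper `fill_template` and the row values are hypothetical, for illustration only, and this does not claim to be how Ludwig formats rows.

```python
import re

# Matches one {placeholder}; the captured name may contain spaces, "--" and ".".
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def fill_template(template: str, row: dict) -> str:
    """Hypothetical helper: substitute each {column name} with that row's value."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


# Single-line form of the tabular template used in the configuration.
TEMPLATE = (
    "The Recency -- months since last donation is {Recency -- months since last donation}. "
    "The Frequency -- total number of donations is {Frequency -- total number of donations}. "
    "The Monetary -- total blood donated in c.c. is {Monetary -- total blood donated in c.c.}. "
    "The Time -- months since first donation is {Time -- months since first donation}."
)

# Hypothetical row values, for illustration only.
row = {
    "Recency -- months since last donation": 2,
    "Frequency -- total number of donations": 50,
    "Monetary -- total blood donated in c.c.": 12500,
    "Time -- months since first donation": 98,
}

# Note: TEMPLATE.format(**row) would fail on the "c.c." placeholder,
# because str.format treats "." as attribute access.
print(fill_template(TEMPLATE, row))
```

Keying the row dict by the full column name keeps the template readable as plain English while still resolving each placeholder unambiguously.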
Config
```yaml
model_type: llm
base_model: facebook/opt-350m
generation:
  temperature: 0.1
  top_p: 0.75
  top_k: 40
  num_beams: 4
  max_new_tokens: 64
prompt:
  template: >-
    The Recency -- months since last donation is {Recency -- months since
    last donation}. The Frequency -- total number of donations is {Frequency
    -- total number of donations}. The Monetary -- total blood donated in
    c.c. is {Monetary -- total blood donated in c.c.}. The Time -- months
    since first donation is {Time -- months since first donation}.
input_features:
  -
    name: review
    type: text
output_features:
  -
    name: label
    type: category
    preprocessing:
      fallback_label: "neutral"
    decoder:
      type: category_extractor
      match:
        "negative":
          type: contains
          value: "negative"
        "neutral":
          type: contains
          value: "neutral"
        "positive":
          type: contains
          value: "positive"
```
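To make the `category_extractor` decoder's `match` rules more concrete, here is a hypothetical re-implementation of `contains`-style matching. It rests on assumptions not taken from Ludwig's source: rules are checked in config order, matching is a case-sensitive substring test, the first matching label wins, and `fallback_label` is returned when nothing matches. It also shows why each rule's `value` should match its label: a `"negative"` rule with `value: "positive"` would capture positive outputs.

```python
# Hypothetical sketch of "contains" matching; the semantics are assumptions.
MATCH_RULES = {
    "negative": {"type": "contains", "value": "negative"},
    "neutral": {"type": "contains", "value": "neutral"},
    "positive": {"type": "contains", "value": "positive"},
}
FALLBACK_LABEL = "neutral"


def extract_category(generated_text: str, rules: dict = MATCH_RULES,
                     fallback: str = FALLBACK_LABEL) -> str:
    """Return the first label whose 'contains' value occurs in the text."""
    for label, rule in rules.items():
        if rule["type"] == "contains" and rule["value"] in generated_text:
            return label
    return fallback  # no rule matched


print(extract_category("The answer is positive."))  # positive
print(extract_category("I am not sure."))           # neutral (fallback)

# A mismatched rule checked first steals outputs meant for another label.
BUGGY_RULES = dict(MATCH_RULES, negative={"type": "contains", "value": "positive"})
print(extract_category("The answer is positive.", rules=BUGGY_RULES))  # negative
```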