
Zero-Shot Batch Inference for Tabular Classification (TabLLM)

Prompt templates in Ludwig use Python-style placeholder notation, where each placeholder corresponds to a column in the input dataset.

prompt:
 template: "The {color} {animal} jumped over the {size} {object}"

When a prompt template like the one above is provided, the prompt with all placeholders filled in is used as the text input feature value for the LLM.

Dataset:
| color | animal | size | object |
| ----- | ------ | ---- | ------ |
| brown | fox    | big  | dog    |
| white | cat    | huge | rock   |

Prompts:
"The brown fox jumped over the big dog"
"The white cat jumped over the huge rock"

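By referencing column names in curly braces, a tabular dataset can be fed directly to an LLM. Ludwig automatically handles formatting the row values.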
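To make the row-to-prompt step concrete, here is a minimal Python sketch of how a template could be filled for a table whose column names contain spaces and dots. It is not Ludwig's implementation: `render_prompt` is a hypothetical helper, and the row values are illustrative rather than taken from a real dataset. A regular expression is used instead of `str.format` because Python's format syntax treats a dot inside a field name as attribute access, which would break on a column name ending in `c.c.`.

```python
import re

# Matches a placeholder such as {Some column name}; nested braces are not supported.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each {column name} placeholder with the row's value for that column."""

    def substitute(match: re.Match) -> str:
        # Collapse whitespace so a placeholder wrapped across lines still matches.
        column = " ".join(match.group(1).split())
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


# Shortened template using two long column names (illustrative).
template = (
    "The Recency -- months since last donation is "
    "{Recency -- months since last donation}. "
    "The Monetary -- total blood donated in c.c. is "
    "{Monetary -- total blood donated in c.c.}."
)

# Hypothetical row values.
row = {
    "Recency -- months since last donation": 2,
    "Monetary -- total blood donated in c.c.": 12500,
}

prompt = render_prompt(template, row)
print(prompt)
```

Each row of the table yields one prompt string, so a batch of rows becomes a batch of text inputs.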

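For example, the following config can be used to perform zero-shot binary classification on a tabular dataset with these column names:

  • Recency -- months since last donation
  • Frequency -- total number of donations
  • Monetary -- total blood donated in c.c.
  • Time -- months since first donation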

Config

model_type: llm
base_model: facebook/opt-350m
generation:
    temperature: 0.1
    top_p: 0.75
    top_k: 40
    num_beams: 4
    max_new_tokens: 64
prompt:
    template: >-
        The Recency -- months since last donation is {Recency -- months since
        last donation}. The Frequency -- total number of donations is {Frequency
        -- total number of donations}. The Monetary -- total blood donated in
        c.c. is {Monetary -- total blood donated in c.c.}. The Time -- months
        since first donation is {Time -- months since first donation}.
input_features:
-
    name: review
    type: text
output_features:
-
    name: label
    type: category
    preprocessing:
        fallback_label: "neutral"
    decoder:
        type: category_extractor
        match:
            "negative":
                type: contains
                value: "negative"
            "neutral":
                type: contains
                value: "neutral"
            "positive":
                type: contains
                value: "positive"
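The `category_extractor` decoder maps free-form generated text to one of the listed labels. The sketch below illustrates that idea in plain Python under stated assumptions: rules are checked in the order they are listed, `contains` is a case-sensitive substring test, and `fallback_label` is returned when nothing matches. `extract_category` is a hypothetical helper, not a Ludwig function, and Ludwig's actual matching behavior may differ.

```python
def extract_category(response: str, match: dict, fallback_label: str) -> str:
    """Return the first label whose 'contains' rule matches the response text."""
    for label, rule in match.items():
        # Only the 'contains' rule type is sketched here (assumed substring test).
        if rule["type"] == "contains" and rule["value"] in response:
            return label
    # No rule matched: fall back to the configured default label.
    return fallback_label


# One distinct substring per label.
match = {
    "negative": {"type": "contains", "value": "negative"},
    "neutral": {"type": "contains", "value": "neutral"},
    "positive": {"type": "contains", "value": "positive"},
}

# Hypothetical model responses.
print(extract_category("I would say positive.", match, "neutral"))
print(extract_category("No clear answer.", match, "neutral"))
```

Under these assumptions, two labels sharing the same `value` would make the later one unreachable, so each label needs its own substring.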