Configuration
Configuration Structure
Ludwig models are configured by a single config with the following parameters:
model_type: ecd
input_features: []
output_features: []
combiner: {}
preprocessing: {}
defaults: {}
trainer: {}
hyperopt: {}
backend: {}
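The config specifies input features, output features, preprocessing, model architecture, training loop, hyperparameter search, and backend infrastructure: everything needed to build, train, and evaluate a model.
- model_type: the model variant used for training. Defaults to ECD, a neural-network-based architecture. Also supported are LLM (a large language model for text generation) and GBM (a gradient boosting machine, a tree-based model).
- input_features: the columns of the training dataset that will be used as model inputs, their data types, how they should be preprocessed, and how they should be encoded.
- output_features: the targets we want the model to learn to predict. The data type of the output feature defines the task (for example, number is a regression task, category is a multi-class classification task, and so on).
- combiner: the backbone model architecture that takes all encoded input features and transforms them into a single embedding vector. The combiner effectively combines individual feature-level models into one model that can accept any number of inputs. GBM models do not use the combiner.
- preprocessing: global preprocessing options, including how to split the dataset and how to sample the data.
- defaults: default feature configuration. Useful when you have many input features of the same type and want to apply the same preprocessing, encoders, etc. to all of them. A feature-level configuration, if provided, overrides the settings given here.
- trainer: hyperparameters that control the training process, including batch size, learning rate, number of epochs, and so on.
- hyperopt: hyperparameter optimization options. Any parameter from the previous sections can be treated as a hyperparameter and explored in combination with other config parameters.
- backend: infrastructure and runtime options, including which libraries and distributed strategies will be used during training, how many cluster resources each training worker uses, the total number of workers, whether to use GPUs, and so on.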
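The precedence between defaults and feature-level settings described above can be illustrated with a small sketch. This is not Ludwig's own merging code: `deep_merge` and `apply_type_defaults` are hypothetical helpers written only to show the rule that a feature-level value wins over the per-type default, and the sample values are illustrative.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in; `override` wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_type_defaults(feature: dict, defaults: dict) -> dict:
    """Hypothetical helper: fill a feature config from the defaults for its type."""
    return deep_merge(defaults.get(feature["type"], {}), feature)


# Illustrative defaults section shared by every feature of type `number`.
defaults = {
    "number": {
        "preprocessing": {
            "missing_value_strategy": "fill_with_mean",
            "normalization": "zscore",
        }
    }
}

age = {"name": "Age", "type": "number"}
fare = {
    "name": "Fare",
    "type": "number",
    "preprocessing": {"normalization": "minmax"},  # feature-level override
}

resolved_age = apply_type_defaults(age, defaults)
resolved_fare = apply_type_defaults(fare, defaults)

print(resolved_age["preprocessing"]["normalization"])   # zscore
print(resolved_fare["preprocessing"]["normalization"])  # minmax
```

In this sketch Age inherits both default preprocessing settings, while Fare keeps the default missing-value strategy and replaces only the normalization.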
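Ludwig's config combines ease of use with flexibility: sensible defaults provide the convenience, and detailed control over model parameters provides the flexibility. Only input_features and output_features are required; all other fields fall back to reasonable defaults and can be set or modified manually as needed.
The config can be expressed as a Python dictionary (the --config_str argument of Ludwig's CLI) or as a YAML file (the --config argument).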
input_features:
  -
    name: Pclass
    type: category
  -
    name: Sex
    type: category
  -
    name: Age
    type: number
    preprocessing:
      missing_value_strategy: fill_with_mean
  -
    name: SibSp
    type: number
  -
    name: Parch
    type: number
  -
    name: Fare
    type: number
    preprocessing:
      missing_value_strategy: fill_with_mean
  -
    name: Embarked
    type: category
output_features:
  -
    name: Survived
    type: binary
{
  "input_features": [
    {
      "name": "Pclass",
      "type": "category"
    },
    {
      "name": "Sex",
      "type": "category"
    },
    {
      "name": "Age",
      "type": "number",
      "preprocessing": {
        "missing_value_strategy": "fill_with_mean"
      }
    },
    {
      "name": "SibSp",
      "type": "number"
    },
    {
      "name": "Parch",
      "type": "number"
    },
    {
      "name": "Fare",
      "type": "number",
      "preprocessing": {
        "missing_value_strategy": "fill_with_mean"
      }
    },
    {
      "name": "Embarked",
      "type": "category"
    }
  ],
  "output_features": [
    {
      "name": "Survived",
      "type": "binary"
    }
  ]
}
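As a rough sketch of the dictionary form, the snippet below builds a shortened version of the config in Python, checks that the two required sections are present, and serializes it to a JSON string. The helper `missing_sections` is hypothetical, and passing the resulting string to --config_str is an assumption about how the CLI consumes a dictionary; check the CLI help for the exact format it expects.

```python
import json

# A shortened version of the Titanic config above, written as a Python dictionary.
config = {
    "input_features": [
        {"name": "Pclass", "type": "category"},
        {
            "name": "Age",
            "type": "number",
            "preprocessing": {"missing_value_strategy": "fill_with_mean"},
        },
    ],
    "output_features": [{"name": "Survived", "type": "binary"}],
}

# Only these two sections are required; everything else falls back to defaults.
REQUIRED_SECTIONS = ("input_features", "output_features")


def missing_sections(cfg: dict) -> list:
    """Hypothetical check: list required sections that are absent or empty."""
    return [section for section in REQUIRED_SECTIONS if not cfg.get(section)]


problems = missing_sections(config)
if problems:
    raise ValueError(f"config is missing required sections: {problems}")

# Serialize to a single-line string (assumed suitable for --config_str).
config_str = json.dumps(config)
print(config_str)
```

The same dictionary can also be handed directly to Ludwig's Python API instead of going through the CLI.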
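Rendered Defaults
Ludwig has many parameter options, but aside from the names and types of the input and output features, all of them are optional. When a parameter is left unspecified, Ludwig assigns it a reasonable default value. Ludwig defines "reasonable" to mean that the value is unlikely to produce bad results and that training will finish in a reasonable amount of time on commodity hardware. In other words, Ludwig's defaults are intended to be good baseline configs on top of which more advanced options can be added.
Here is an example of a minimal config, generated with the following command: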
ludwig init_config --dataset ludwig://sst2 --target label --output sst2.yaml
input_features:
  - name: sentence
    type: text
output_features:
  - name: label
    type: binary
Here is the fully rendered config, generated with the following command:
ludwig render_config --config sst2.yaml --output sst2_rendered.yaml
input_features:
  - active: true
    name: sentence
    type: text
    column: sentence
    proc_column: sentence_jcnVJf
    tied: null
    preprocessing:
      pretrained_model_name_or_path: null
      tokenizer: space_punct
      vocab_file: null
      sequence_length: null
      max_sequence_length: 256
      most_common: 20000
      padding_symbol: <PAD>
      unknown_symbol: <UNK>
      padding: right
      lowercase: false
      missing_value_strategy: fill_with_const
      fill_value: <UNK>
      computed_fill_value: <UNK>
      ngram_size: 2
      cache_encoder_embeddings: false
      compute_idf: false
      prompt:
        template: null
        task: null
        retrieval:
          type: null
          index_name: null
          model_name: null
          k: 0
    encoder:
      type: parallel_cnn
      skip: false
      dropout: 0.0
      activation: relu
      max_sequence_length: null
      representation: dense
      vocab: null
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      should_embed: true
      embedding_size: 256
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
      reduce_output: sum
      num_conv_layers: null
      conv_layers: null
      num_filters: 256
      filter_size: 3
      pool_function: max
      pool_size: null
      output_size: 256
      norm: null
      norm_params: null
      num_fc_layers: null
      fc_layers: null
output_features:
  - active: true
    name: label
    type: binary
    column: label
    proc_column: label_2Xl8CP
    reduce_input: sum
    default_validation_metric: roc_auc
    dependencies: []
    reduce_dependencies: sum
    input_size: null
    num_classes: null
    decoder:
      type: regressor
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: null
      use_bias: true
      weights_initializer: xavier_uniform
      bias_initializer: zeros
    loss:
      type: binary_weighted_cross_entropy
      weight: 1.0
      positive_class_weight: null
      robust_lambda: 0
      confidence_penalty: 0
    calibration: false
    preprocessing:
      missing_value_strategy: drop_row
      fallback_true_label: null
      fill_value: null
      computed_fill_value: null
    threshold: 0.5
model_type: ecd
trainer:
  validation_field: label
  validation_metric: roc_auc
  early_stop: 5
  skip_all_evaluation: false
  enable_profiling: false
  profiler:
    wait: 1
    warmup: 1
    active: 3
    repeat: 5
    skip_first: 0
  learning_rate: 0.001
  learning_rate_scheduler:
    decay: null
    decay_rate: 0.96
    decay_steps: 10000
    staircase: false
    reduce_on_plateau: 0
    reduce_on_plateau_patience: 10
    reduce_on_plateau_rate: 0.1
    warmup_evaluations: 0
    warmup_fraction: 0.0
    reduce_eval_metric: loss
    reduce_eval_split: training
    t_0: null
    t_mult: 1
    eta_min: 0
  epochs: 100
  checkpoints_per_epoch: 0
  train_steps: null
  eval_steps: null
  steps_per_checkpoint: 0
  effective_batch_size: auto
  batch_size: auto
  max_batch_size: 1099511627776
  gradient_accumulation_steps: auto
  eval_batch_size: null
  evaluate_training_set: false
  optimizer:
    type: adam
    betas:
      - 0.9
      - 0.999
    eps: 1.0e-08
    weight_decay: 0.0
    amsgrad: false
  regularization_type: l2
  regularization_lambda: 0.0
  should_shuffle: true
  increase_batch_size_on_plateau: 0
  increase_batch_size_on_plateau_patience: 5
  increase_batch_size_on_plateau_rate: 2.0
  increase_batch_size_eval_metric: loss
  increase_batch_size_eval_split: training
  gradient_clipping:
    clipglobalnorm: 0.5
    clipnorm: null
    clipvalue: null
  learning_rate_scaling: linear
  bucketing_field: null
  use_mixed_precision: false
  compile: false
  enable_gradient_checkpointing: false
preprocessing:
  sample_ratio: 1.0
  sample_size: null
  oversample_minority: null
  undersample_majority: null
  split:
    type: random
    probabilities:
      - 0.7
      - 0.1
      - 0.2
  global_max_sequence_length: null
defaults:
  audio:
    preprocessing:
      audio_file_length_limit_in_s: 7.5
      missing_value_strategy: bfill
      fill_value: null
      computed_fill_value: null
      in_memory: true
      padding_value: 0.0
      norm: null
      type: fbank
      window_length_in_s: 0.04
      window_shift_in_s: 0.02
      num_fft_points: null
      window_type: hamming
      num_filter_bands: 80
    encoder:
      type: parallel_cnn
      skip: false
      dropout: 0.0
      activation: relu
      max_sequence_length: null
      representation: dense
      vocab: null
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      should_embed: true
      embedding_size: 256
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
      reduce_output: sum
      num_conv_layers: null
      conv_layers: null
      num_filters: 256
      filter_size: 3
      pool_function: max
      pool_size: null
      output_size: 256
      norm: null
      norm_params: null
      num_fc_layers: null
      fc_layers: null
  bag:
    preprocessing:
      tokenizer: space
      missing_value_strategy: fill_with_const
      fill_value: <UNK>
      computed_fill_value: <UNK>
      lowercase: false
      most_common: 10000
    encoder:
      type: embed
      skip: false
      dropout: 0.0
      activation: relu
      vocab: null
      representation: dense
      embedding_size: 50
      force_embedding_size: false
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      output_size: 10
      norm: null
      norm_params: null
      num_fc_layers: 0
      fc_layers: null
  binary:
    decoder:
      type: regressor
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: null
      use_bias: true
      weights_initializer: xavier_uniform
      bias_initializer: zeros
    loss:
      type: binary_weighted_cross_entropy
      weight: 1.0
      positive_class_weight: null
      robust_lambda: 0
      confidence_penalty: 0
    preprocessing:
      missing_value_strategy: fill_with_false
      fallback_true_label: null
      fill_value: null
      computed_fill_value: null
    encoder:
      type: passthrough
      skip: false
  category:
    decoder:
      type: classifier
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: null
      num_classes: null
      use_bias: true
      weights_initializer: xavier_uniform
      bias_initializer: zeros
    loss:
      type: softmax_cross_entropy
      weight: 1.0
      class_weights: null
      robust_lambda: 0
      confidence_penalty: 0
      class_similarities: null
      class_similarities_temperature: 0
    preprocessing:
      missing_value_strategy: fill_with_const
      fill_value: <UNK>
      computed_fill_value: <UNK>
      lowercase: false
      most_common: 10000
      cache_encoder_embeddings: false
    encoder:
      type: dense
      skip: false
      dropout: 0.0
      vocab: null
      embedding_initializer: null
      embedding_size: 50
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
  date:
    preprocessing:
      missing_value_strategy: fill_with_const
      fill_value: ''
      computed_fill_value: ''
      datetime_format: null
    encoder:
      type: embed
      skip: false
      dropout: 0.0
      activation: relu
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      embedding_size: 10
      embeddings_on_cpu: false
      output_size: 10
      norm: null
      norm_params: null
      num_fc_layers: 0
      fc_layers: null
  h3:
    preprocessing:
      missing_value_strategy: fill_with_const
      fill_value: 576495936675512319
      computed_fill_value: 576495936675512319
    encoder:
      type: embed
      skip: false
      dropout: 0.0
      activation: relu
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      embedding_size: 10
      embeddings_on_cpu: false
      reduce_output: sum
      output_size: 10
      norm: null
      norm_params: null
      num_fc_layers: 0
      fc_layers: null
  image:
    decoder:
      type: unet
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: 1024
      height: null
      width: null
      num_channels: null
      conv_norm: batch
      num_classes: null
    loss:
      type: softmax_cross_entropy
      weight: 1.0
      class_weights: null
      robust_lambda: 0
      confidence_penalty: 0
      class_similarities: null
      class_similarities_temperature: 0
    preprocessing:
      missing_value_strategy: bfill
      fill_value: null
      computed_fill_value: null
      height: null
      width: null
      num_channels: null
      resize_method: interpolate
      infer_image_num_channels: true
      infer_image_dimensions: true
      infer_image_max_height: 256
      infer_image_max_width: 256
      infer_image_sample_size: 100
      standardize_image: null
      in_memory: true
      num_processes: 1
      requires_equal_dimensions: false
      num_classes: null
      infer_image_num_classes: false
    encoder:
      type: stacked_cnn
      skip: false
      conv_dropout: 0.0
      conv_activation: relu
      height: null
      width: null
      num_channels: null
      out_channels: 32
      kernel_size: 3
      stride: 1
      padding_mode: zeros
      padding: valid
      dilation: 1
      groups: 1
      pool_function: max
      pool_kernel_size: 2
      pool_stride: null
      pool_padding: 0
      pool_dilation: 1
      output_size: 128
      conv_use_bias: true
      conv_norm: null
      conv_norm_params: null
      num_conv_layers: null
      conv_layers: null
      fc_dropout: 0.0
      fc_activation: relu
      fc_use_bias: true
      fc_bias_initializer: zeros
      fc_weights_initializer: xavier_uniform
      fc_norm: null
      fc_norm_params: null
      num_fc_layers: 1
      fc_layers: null
    augmentation: []
  number:
    decoder:
      type: regressor
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: null
      use_bias: true
      weights_initializer: xavier_uniform
      bias_initializer: zeros
    loss:
      type: mean_squared_error
      weight: 1.0
    preprocessing:
      missing_value_strategy: fill_with_const
      fill_value: 0.0
      computed_fill_value: 0.0
      normalization: zscore
      outlier_strategy: null
      outlier_threshold: 3.0
      computed_outlier_fill_value: 0.0
    encoder:
      type: passthrough
      skip: false
  sequence:
    decoder:
      type: generator
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      vocab_size: null
      max_sequence_length: null
      cell_type: gru
      input_size: 256
      reduce_input: sum
      num_layers: 1
    loss:
      type: sequence_softmax_cross_entropy
      weight: 1.0
      class_weights: null
      robust_lambda: 0
      confidence_penalty: 0
      class_similarities: null
      class_similarities_temperature: 0
      unique: false
    preprocessing:
      tokenizer: space
      vocab_file: null
      sequence_length: null
      max_sequence_length: 256
      most_common: 20000
      padding_symbol: <PAD>
      unknown_symbol: <UNK>
      padding: right
      lowercase: false
      missing_value_strategy: fill_with_const
      fill_value: <UNK>
      computed_fill_value: <UNK>
      ngram_size: 2
      cache_encoder_embeddings: false
    encoder:
      type: embed
      skip: false
      dropout: 0.0
      max_sequence_length: null
      representation: dense
      vocab: null
      weights_initializer: uniform
      reduce_output: sum
      embedding_size: 256
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
  set:
    decoder:
      type: classifier
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: null
      num_classes: null
      use_bias: true
      weights_initializer: xavier_uniform
      bias_initializer: zeros
    loss:
      type: sigmoid_cross_entropy
      weight: 1.0
      class_weights: null
    preprocessing:
      tokenizer: space
      missing_value_strategy: fill_with_const
      fill_value: <UNK>
      computed_fill_value: <UNK>
      lowercase: false
      most_common: 10000
    encoder:
      type: embed
      skip: false
      dropout: 0.0
      activation: relu
      representation: dense
      vocab: null
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      embedding_size: 50
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
      output_size: 10
      norm: null
      norm_params: null
      num_fc_layers: 0
      fc_layers: null
  text:
    decoder:
      type: generator
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      vocab_size: null
      max_sequence_length: null
      cell_type: gru
      input_size: 256
      reduce_input: sum
      num_layers: 1
    loss:
      type: sequence_softmax_cross_entropy
      weight: 1.0
      class_weights: null
      robust_lambda: 0
      confidence_penalty: 0
      class_similarities: null
      class_similarities_temperature: 0
      unique: false
    preprocessing:
      pretrained_model_name_or_path: null
      tokenizer: space_punct
      vocab_file: null
      sequence_length: null
      max_sequence_length: 256
      most_common: 20000
      padding_symbol: <PAD>
      unknown_symbol: <UNK>
      padding: right
      lowercase: false
      missing_value_strategy: fill_with_const
      fill_value: <UNK>
      computed_fill_value: <UNK>
      ngram_size: 2
      cache_encoder_embeddings: false
      compute_idf: false
      prompt:
        template: null
        task: null
        retrieval:
          type: null
          index_name: null
          model_name: null
          k: 0
    encoder:
      type: parallel_cnn
      skip: false
      dropout: 0.0
      activation: relu
      max_sequence_length: null
      representation: dense
      vocab: null
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      should_embed: true
      embedding_size: 256
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
      reduce_output: sum
      num_conv_layers: null
      conv_layers: null
      num_filters: 256
      filter_size: 3
      pool_function: max
      pool_size: null
      output_size: 256
      norm: null
      norm_params: null
      num_fc_layers: null
      fc_layers: null
  timeseries:
    decoder:
      type: projector
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: null
      output_size: null
      use_bias: true
      weights_initializer: xavier_uniform
      bias_initializer: zeros
      activation: null
      multiplier: 1.0
      clip: null
    loss:
      type: huber
      weight: 1.0
      delta: 1.0
    preprocessing:
      tokenizer: space
      timeseries_length_limit: 256
      padding_value: 0.0
      padding: right
      missing_value_strategy: fill_with_const
      fill_value: ''
      computed_fill_value: ''
      window_size: 0
    encoder:
      type: parallel_cnn
      skip: false
      dropout: 0.0
      activation: relu
      max_sequence_length: null
      representation: dense
      vocab: null
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      should_embed: true
      embedding_size: 256
      embeddings_on_cpu: false
      embeddings_trainable: true
      pretrained_embeddings: null
      reduce_output: sum
      num_conv_layers: null
      conv_layers: null
      num_filters: 256
      filter_size: 3
      pool_function: max
      pool_size: null
      output_size: 256
      norm: null
      norm_params: null
      num_fc_layers: null
      fc_layers: null
  vector:
    decoder:
      type: projector
      fc_layers: null
      num_fc_layers: 0
      fc_output_size: 256
      fc_use_bias: true
      fc_weights_initializer: xavier_uniform
      fc_bias_initializer: zeros
      fc_norm: null
      fc_norm_params: null
      fc_activation: relu
      fc_dropout: 0.0
      input_size: null
      output_size: null
      use_bias: true
      weights_initializer: xavier_uniform
      bias_initializer: zeros
      activation: null
      multiplier: 1.0
      clip: null
    loss:
      type: mean_squared_error
      weight: 1.0
    preprocessing:
      vector_size: null
      missing_value_strategy: fill_with_const
      fill_value: ''
      computed_fill_value: ''
    encoder:
      type: dense
      skip: false
      dropout: 0.0
      activation: relu
      input_size: null
      output_size: 256
      use_bias: true
      bias_initializer: zeros
      weights_initializer: xavier_uniform
      norm: null
      norm_params: null
      num_layers: 1
      fc_layers: null
hyperopt: null
backend: null
ludwig_version: 0.10.3
combiner:
  type: concat
  dropout: 0.0
  activation: relu
  flatten_inputs: false
  residual: false
  use_bias: true
  bias_initializer: zeros
  weights_initializer: xavier_uniform
  num_fc_layers: 0
  output_size: 256
  norm: null
  norm_params: null
  fc_layers: null
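To see which values in a rendered config were filled in by defaults rather than written by the user, one option is to flatten both configs into dotted paths and compare them. The sketch below is not part of Ludwig: `flatten` is a hypothetical helper, and the `rendered` dictionary is a tiny hand-copied excerpt of the listing above, standing in for the full file loaded from disk.

```python
def flatten(node, prefix=""):
    """Map a nested config to {dotted.path: leaf_value}; list items use their index."""
    if isinstance(node, dict) and node:
        items = node.items()
    elif isinstance(node, list) and node:
        items = enumerate(node)
    else:
        # Scalars, and empty containers such as `dependencies: []`, are leaves.
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# The minimal config produced by init_config.
minimal = {
    "input_features": [{"name": "sentence", "type": "text"}],
    "output_features": [{"name": "label", "type": "binary"}],
}

# A small excerpt of the rendered config (the real one has hundreds of keys).
rendered = {
    "input_features": [
        {"name": "sentence", "type": "text", "encoder": {"type": "parallel_cnn"}}
    ],
    "output_features": [{"name": "label", "type": "binary", "threshold": 0.5}],
    "trainer": {"epochs": 100, "learning_rate": 0.001},
}

# Paths present only in the rendered config were supplied by defaults.
filled_in = sorted(set(flatten(rendered)) - set(flatten(minimal)))
for path in filled_in:
    print(path, "=", flatten(rendered)[path])
```

With the full rendered file, the same comparison gives a quick inventory of every default that could be overridden.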