Configuration

Configuration Structure

A Ludwig model is configured through a single config with the following parameters:

model_type: ecd
input_features: []
output_features: []
combiner: {}
preprocessing: {}
defaults: {}
trainer: {}
hyperopt: {}
backend: {}

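The config specifies the input features, output features, preprocessing, model architecture, training loop, hyperparameter search, and backend infrastructure: everything needed to build, train, and evaluate a model.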

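  • model_type: the model variant used for training. Defaults to ECD, a neural-network-based architecture. LLM (a large language model for text generation) and GBM (a gradient boosting machine, a tree-based model) are also supported.
  • input_features: which columns of the training dataset are used as model inputs, what their data types are, how they should be preprocessed, and how they should be encoded.
  • output_features: the targets we want the model to learn to predict. The data type of an output feature defines the task (for example, number is a regression task, category is a multi-class classification task, and so on).
  • combiner: the backbone model architecture that takes all encoded input features and transforms them into a single embedding vector. The combiner effectively combines the individual feature-level models into one model that can accept any number of inputs. GBM models do not use a combiner.
  • preprocessing: global preprocessing options, including how to split the dataset and how to sample the data.
  • defaults: default feature configuration. Useful when you have many input features of the same type and want to apply the same preprocessing, encoder, and so on to all of them. A feature-level configuration, if provided, overrides the settings given here.
  • trainer: hyperparameters that control the training process, including batch size, learning rate, number of training epochs, and so on.
  • hyperopt: hyperparameter optimization options. Any parameter from the previous sections can be treated as a hyperparameter and explored in combination with other config parameters.
  • backend: infrastructure and runtime options, including which libraries and distributed strategies are used during training, how many cluster resources each training worker uses, how many workers there are in total, whether to use GPUs, and so on.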
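
The precedence described for defaults (type-level defaults apply first, then feature-level settings override them) can be illustrated with a small standalone sketch. This is not Ludwig's implementation: the helpers deep_merge and resolve_feature are hypothetical names introduced only for illustration, and the config values are taken from options that appear elsewhere on this page.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which `override` wins; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_feature(feature, defaults):
    """Hypothetical helper: start from the defaults for the feature's type,
    then let the feature-level settings override them."""
    type_defaults = defaults.get(feature["type"], {})
    return deep_merge(type_defaults, feature)


config = {
    "input_features": [
        # No feature-level preprocessing: inherits everything from defaults.
        {"name": "Age", "type": "number"},
        # Feature-level setting overrides one key from defaults.
        {
            "name": "Fare",
            "type": "number",
            "preprocessing": {"missing_value_strategy": "fill_with_const"},
        },
    ],
    "defaults": {
        "number": {
            "preprocessing": {
                "missing_value_strategy": "fill_with_mean",
                "normalization": "zscore",
            }
        }
    },
}

resolved = [resolve_feature(f, config["defaults"]) for f in config["input_features"]]
for feature in resolved:
    print(feature["name"], feature["preprocessing"])
```

In this sketch Age inherits both preprocessing settings from defaults, while Fare keeps the inherited normalization but uses its own missing_value_strategy.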

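Ludwig's configuration combines ease of use with flexibility: sensible defaults provide convenience, and detailed control over model parameters provides flexibility. Only input_features and output_features are required; all other fields use sensible defaults, but can be set or modified manually as needed.

The config can be expressed as a Python dictionary (the --config_str argument of Ludwig's CLI) or as a YAML file (the --config argument).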

input_features:
    -
        name: Pclass
        type: category
    -
        name: Sex
        type: category
    -
        name: Age
        type: number
        preprocessing:
            missing_value_strategy: fill_with_mean
    -
        name: SibSp
        type: number
    -
        name: Parch
        type: number
    -
        name: Fare
        type: number
        preprocessing:
            missing_value_strategy: fill_with_mean
    -
        name: Embarked
        type: category

output_features:
    -
        name: Survived
        type: binary

{
    "input_features": [
        {
            "name": "Pclass",
            "type": "category"
        },
        {
            "name": "Sex",
            "type": "category"
        },
        {
            "name": "Age",
            "type": "number",
            "preprocessing": {
                "missing_value_strategy": "fill_with_mean"
            }
        },
        {
            "name": "SibSp",
            "type": "number"
        },
        {
            "name": "Parch",
            "type": "number"
        },
        {
            "name": "Fare",
            "type": "number",
            "preprocessing": {
                "missing_value_strategy": "fill_with_mean"
            }
        },
        {
            "name": "Embarked",
            "type": "category"
        }
    ],
    "output_features": [
        {
            "name": "Survived",
            "type": "binary"
        }
    ]
}
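
As a rough sketch of the dictionary form, the snippet below builds a shortened version of the config above as a Python dict and serializes it into a string that could be passed on the command line. It assumes that --config_str accepts JSON-style text (JSON is, in most cases, also valid YAML), and titanic.csv is a placeholder dataset path that is not part of the original example.

```python
import json
import shlex

# Shortened version of the config above, written as a Python dictionary.
config = {
    "input_features": [
        {"name": "Pclass", "type": "category"},
        {
            "name": "Age",
            "type": "number",
            "preprocessing": {"missing_value_strategy": "fill_with_mean"},
        },
    ],
    "output_features": [{"name": "Survived", "type": "binary"}],
}

# Serialize the dictionary to a single-line string.
config_str = json.dumps(config)

# Quote the string so the shell passes it as one argument.
# "titanic.csv" is a placeholder path (assumption).
command = "ludwig train --dataset titanic.csv --config_str " + shlex.quote(config_str)
print(command)
```

Quoting with shlex.quote keeps the double quotes inside the JSON text from being interpreted by the shell.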

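Rendered Defaults

Ludwig has many parameter options, but apart from the names and types of the input and output features, all parameters are optional. When a parameter is not specified, Ludwig assigns it a sensible default value. Ludwig defines "sensible" as unlikely to produce bad results and able to train in a reasonable amount of time on commodity hardware. In other words, Ludwig's defaults are meant to be good baseline configs on top of which more advanced options can be added.

Here is an example of a minimal config, generated with the following command: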

ludwig init_config --dataset ludwig://sst2 --target label --output sst2.yaml

input_features:
- name: sentence
  type: text
output_features:
- name: label
  type: binary

And here is the fully rendered config, generated with the following command:

ludwig render_config --config sst2.yaml --output sst2_rendered.yaml

input_features:
-   active: true
    name: sentence
    type: text
    column: sentence
    proc_column: sentence_jcnVJf
    tied: null
    preprocessing:
        pretrained_model_name_or_path: null
        tokenizer: space_punct
        vocab_file: null
        sequence_length: null
        max_sequence_length: 256
        most_common: 20000
        padding_symbol: <PAD>
        unknown_symbol: <UNK>
        padding: right
        lowercase: false
        missing_value_strategy: fill_with_const
        fill_value: <UNK>
        computed_fill_value: <UNK>
        ngram_size: 2
        cache_encoder_embeddings: false
        compute_idf: false
        prompt:
            template: null
            task: null
            retrieval:
                type: null
                index_name: null
                model_name: null
                k: 0
    encoder:
        type: parallel_cnn
        skip: false
        dropout: 0.0
        activation: relu
        max_sequence_length: null
        representation: dense
        vocab: null
        use_bias: true
        bias_initializer: zeros
        weights_initializer: xavier_uniform
        should_embed: true
        embedding_size: 256
        embeddings_on_cpu: false
        embeddings_trainable: true
        pretrained_embeddings: null
        reduce_output: sum
        num_conv_layers: null
        conv_layers: null
        num_filters: 256
        filter_size: 3
        pool_function: max
        pool_size: null
        output_size: 256
        norm: null
        norm_params: null
        num_fc_layers: null
        fc_layers: null
output_features:
-   active: true
    name: label
    type: binary
    column: label
    proc_column: label_2Xl8CP
    reduce_input: sum
    default_validation_metric: roc_auc
    dependencies: []
    reduce_dependencies: sum
    input_size: null
    num_classes: null
    decoder:
        type: regressor
        fc_layers: null
        num_fc_layers: 0
        fc_output_size: 256
        fc_use_bias: true
        fc_weights_initializer: xavier_uniform
        fc_bias_initializer: zeros
        fc_norm: null
        fc_norm_params: null
        fc_activation: relu
        fc_dropout: 0.0
        input_size: null
        use_bias: true
        weights_initializer: xavier_uniform
        bias_initializer: zeros
    loss:
        type: binary_weighted_cross_entropy
        weight: 1.0
        positive_class_weight: null
        robust_lambda: 0
        confidence_penalty: 0
    calibration: false
    preprocessing:
        missing_value_strategy: drop_row
        fallback_true_label: null
        fill_value: null
        computed_fill_value: null
    threshold: 0.5
model_type: ecd
trainer:
    validation_field: label
    validation_metric: roc_auc
    early_stop: 5
    skip_all_evaluation: false
    enable_profiling: false
    profiler:
        wait: 1
        warmup: 1
        active: 3
        repeat: 5
        skip_first: 0
    learning_rate: 0.001
    learning_rate_scheduler:
        decay: null
        decay_rate: 0.96
        decay_steps: 10000
        staircase: false
        reduce_on_plateau: 0
        reduce_on_plateau_patience: 10
        reduce_on_plateau_rate: 0.1
        warmup_evaluations: 0
        warmup_fraction: 0.0
        reduce_eval_metric: loss
        reduce_eval_split: training
        t_0: null
        t_mult: 1
        eta_min: 0
    epochs: 100
    checkpoints_per_epoch: 0
    train_steps: null
    eval_steps: null
    steps_per_checkpoint: 0
    effective_batch_size: auto
    batch_size: auto
    max_batch_size: 1099511627776
    gradient_accumulation_steps: auto
    eval_batch_size: null
    evaluate_training_set: false
    optimizer:
        type: adam
        betas:
        - 0.9
        - 0.999
        eps: 1.0e-08
        weight_decay: 0.0
        amsgrad: false
    regularization_type: l2
    regularization_lambda: 0.0
    should_shuffle: true
    increase_batch_size_on_plateau: 0
    increase_batch_size_on_plateau_patience: 5
    increase_batch_size_on_plateau_rate: 2.0
    increase_batch_size_eval_metric: loss
    increase_batch_size_eval_split: training
    gradient_clipping:
        clipglobalnorm: 0.5
        clipnorm: null
        clipvalue: null
    learning_rate_scaling: linear
    bucketing_field: null
    use_mixed_precision: false
    compile: false
    enable_gradient_checkpointing: false
preprocessing:
    sample_ratio: 1.0
    sample_size: null
    oversample_minority: null
    undersample_majority: null
    split:
        type: random
        probabilities:
        - 0.7
        - 0.1
        - 0.2
    global_max_sequence_length: null
defaults:
    audio:
        preprocessing:
            audio_file_length_limit_in_s: 7.5
            missing_value_strategy: bfill
            fill_value: null
            computed_fill_value: null
            in_memory: true
            padding_value: 0.0
            norm: null
            type: fbank
            window_length_in_s: 0.04
            window_shift_in_s: 0.02
            num_fft_points: null
            window_type: hamming
            num_filter_bands: 80
        encoder:
            type: parallel_cnn
            skip: false
            dropout: 0.0
            activation: relu
            max_sequence_length: null
            representation: dense
            vocab: null
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            should_embed: true
            embedding_size: 256
            embeddings_on_cpu: false
            embeddings_trainable: true
            pretrained_embeddings: null
            reduce_output: sum
            num_conv_layers: null
            conv_layers: null
            num_filters: 256
            filter_size: 3
            pool_function: max
            pool_size: null
            output_size: 256
            norm: null
            norm_params: null
            num_fc_layers: null
            fc_layers: null
    bag:
        preprocessing:
            tokenizer: space
            missing_value_strategy: fill_with_const
            fill_value: <UNK>
            computed_fill_value: <UNK>
            lowercase: false
            most_common: 10000
        encoder:
            type: embed
            skip: false
            dropout: 0.0
            activation: relu
            vocab: null
            representation: dense
            embedding_size: 50
            force_embedding_size: false
            embeddings_on_cpu: false
            embeddings_trainable: true
            pretrained_embeddings: null
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            output_size: 10
            norm: null
            norm_params: null
            num_fc_layers: 0
            fc_layers: null
    binary:
        decoder:
            type: regressor
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            input_size: null
            use_bias: true
            weights_initializer: xavier_uniform
            bias_initializer: zeros
        loss:
            type: binary_weighted_cross_entropy
            weight: 1.0
            positive_class_weight: null
            robust_lambda: 0
            confidence_penalty: 0
        preprocessing:
            missing_value_strategy: fill_with_false
            fallback_true_label: null
            fill_value: null
            computed_fill_value: null
        encoder:
            type: passthrough
            skip: false
    category:
        decoder:
            type: classifier
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            input_size: null
            num_classes: null
            use_bias: true
            weights_initializer: xavier_uniform
            bias_initializer: zeros
        loss:
            type: softmax_cross_entropy
            weight: 1.0
            class_weights: null
            robust_lambda: 0
            confidence_penalty: 0
            class_similarities: null
            class_similarities_temperature: 0
        preprocessing:
            missing_value_strategy: fill_with_const
            fill_value: <UNK>
            computed_fill_value: <UNK>
            lowercase: false
            most_common: 10000
            cache_encoder_embeddings: false
        encoder:
            type: dense
            skip: false
            dropout: 0.0
            vocab: null
            embedding_initializer: null
            embedding_size: 50
            embeddings_on_cpu: false
            embeddings_trainable: true
            pretrained_embeddings: null
    date:
        preprocessing:
            missing_value_strategy: fill_with_const
            fill_value: ''
            computed_fill_value: ''
            datetime_format: null
        encoder:
            type: embed
            skip: false
            dropout: 0.0
            activation: relu
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            embedding_size: 10
            embeddings_on_cpu: false
            output_size: 10
            norm: null
            norm_params: null
            num_fc_layers: 0
            fc_layers: null
    h3:
        preprocessing:
            missing_value_strategy: fill_with_const
            fill_value: 576495936675512319
            computed_fill_value: 576495936675512319
        encoder:
            type: embed
            skip: false
            dropout: 0.0
            activation: relu
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            embedding_size: 10
            embeddings_on_cpu: false
            reduce_output: sum
            output_size: 10
            norm: null
            norm_params: null
            num_fc_layers: 0
            fc_layers: null
    image:
        decoder:
            type: unet
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            input_size: 1024
            height: null
            width: null
            num_channels: null
            conv_norm: batch
            num_classes: null
        loss:
            type: softmax_cross_entropy
            weight: 1.0
            class_weights: null
            robust_lambda: 0
            confidence_penalty: 0
            class_similarities: null
            class_similarities_temperature: 0
        preprocessing:
            missing_value_strategy: bfill
            fill_value: null
            computed_fill_value: null
            height: null
            width: null
            num_channels: null
            resize_method: interpolate
            infer_image_num_channels: true
            infer_image_dimensions: true
            infer_image_max_height: 256
            infer_image_max_width: 256
            infer_image_sample_size: 100
            standardize_image: null
            in_memory: true
            num_processes: 1
            requires_equal_dimensions: false
            num_classes: null
            infer_image_num_classes: false
        encoder:
            type: stacked_cnn
            skip: false
            conv_dropout: 0.0
            conv_activation: relu
            height: null
            width: null
            num_channels: null
            out_channels: 32
            kernel_size: 3
            stride: 1
            padding_mode: zeros
            padding: valid
            dilation: 1
            groups: 1
            pool_function: max
            pool_kernel_size: 2
            pool_stride: null
            pool_padding: 0
            pool_dilation: 1
            output_size: 128
            conv_use_bias: true
            conv_norm: null
            conv_norm_params: null
            num_conv_layers: null
            conv_layers: null
            fc_dropout: 0.0
            fc_activation: relu
            fc_use_bias: true
            fc_bias_initializer: zeros
            fc_weights_initializer: xavier_uniform
            fc_norm: null
            fc_norm_params: null
            num_fc_layers: 1
            fc_layers: null
        augmentation: []
    number:
        decoder:
            type: regressor
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            input_size: null
            use_bias: true
            weights_initializer: xavier_uniform
            bias_initializer: zeros
        loss:
            type: mean_squared_error
            weight: 1.0
        preprocessing:
            missing_value_strategy: fill_with_const
            fill_value: 0.0
            computed_fill_value: 0.0
            normalization: zscore
            outlier_strategy: null
            outlier_threshold: 3.0
            computed_outlier_fill_value: 0.0
        encoder:
            type: passthrough
            skip: false
    sequence:
        decoder:
            type: generator
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            vocab_size: null
            max_sequence_length: null
            cell_type: gru
            input_size: 256
            reduce_input: sum
            num_layers: 1
        loss:
            type: sequence_softmax_cross_entropy
            weight: 1.0
            class_weights: null
            robust_lambda: 0
            confidence_penalty: 0
            class_similarities: null
            class_similarities_temperature: 0
            unique: false
        preprocessing:
            tokenizer: space
            vocab_file: null
            sequence_length: null
            max_sequence_length: 256
            most_common: 20000
            padding_symbol: <PAD>
            unknown_symbol: <UNK>
            padding: right
            lowercase: false
            missing_value_strategy: fill_with_const
            fill_value: <UNK>
            computed_fill_value: <UNK>
            ngram_size: 2
            cache_encoder_embeddings: false
        encoder:
            type: embed
            skip: false
            dropout: 0.0
            max_sequence_length: null
            representation: dense
            vocab: null
            weights_initializer: uniform
            reduce_output: sum
            embedding_size: 256
            embeddings_on_cpu: false
            embeddings_trainable: true
            pretrained_embeddings: null
    set:
        decoder:
            type: classifier
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            input_size: null
            num_classes: null
            use_bias: true
            weights_initializer: xavier_uniform
            bias_initializer: zeros
        loss:
            type: sigmoid_cross_entropy
            weight: 1.0
            class_weights: null
        preprocessing:
            tokenizer: space
            missing_value_strategy: fill_with_const
            fill_value: <UNK>
            computed_fill_value: <UNK>
            lowercase: false
            most_common: 10000
        encoder:
            type: embed
            skip: false
            dropout: 0.0
            activation: relu
            representation: dense
            vocab: null
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            embedding_size: 50
            embeddings_on_cpu: false
            embeddings_trainable: true
            pretrained_embeddings: null
            output_size: 10
            norm: null
            norm_params: null
            num_fc_layers: 0
            fc_layers: null
    text:
        decoder:
            type: generator
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            vocab_size: null
            max_sequence_length: null
            cell_type: gru
            input_size: 256
            reduce_input: sum
            num_layers: 1
        loss:
            type: sequence_softmax_cross_entropy
            weight: 1.0
            class_weights: null
            robust_lambda: 0
            confidence_penalty: 0
            class_similarities: null
            class_similarities_temperature: 0
            unique: false
        preprocessing:
            pretrained_model_name_or_path: null
            tokenizer: space_punct
            vocab_file: null
            sequence_length: null
            max_sequence_length: 256
            most_common: 20000
            padding_symbol: <PAD>
            unknown_symbol: <UNK>
            padding: right
            lowercase: false
            missing_value_strategy: fill_with_const
            fill_value: <UNK>
            computed_fill_value: <UNK>
            ngram_size: 2
            cache_encoder_embeddings: false
            compute_idf: false
            prompt:
                template: null
                task: null
                retrieval:
                    type: null
                    index_name: null
                    model_name: null
                    k: 0
        encoder:
            type: parallel_cnn
            skip: false
            dropout: 0.0
            activation: relu
            max_sequence_length: null
            representation: dense
            vocab: null
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            should_embed: true
            embedding_size: 256
            embeddings_on_cpu: false
            embeddings_trainable: true
            pretrained_embeddings: null
            reduce_output: sum
            num_conv_layers: null
            conv_layers: null
            num_filters: 256
            filter_size: 3
            pool_function: max
            pool_size: null
            output_size: 256
            norm: null
            norm_params: null
            num_fc_layers: null
            fc_layers: null
    timeseries:
        decoder:
            type: projector
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            input_size: null
            output_size: null
            use_bias: true
            weights_initializer: xavier_uniform
            bias_initializer: zeros
            activation: null
            multiplier: 1.0
            clip: null
        loss:
            type: huber
            weight: 1.0
            delta: 1.0
        preprocessing:
            tokenizer: space
            timeseries_length_limit: 256
            padding_value: 0.0
            padding: right
            missing_value_strategy: fill_with_const
            fill_value: ''
            computed_fill_value: ''
            window_size: 0
        encoder:
            type: parallel_cnn
            skip: false
            dropout: 0.0
            activation: relu
            max_sequence_length: null
            representation: dense
            vocab: null
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            should_embed: true
            embedding_size: 256
            embeddings_on_cpu: false
            embeddings_trainable: true
            pretrained_embeddings: null
            reduce_output: sum
            num_conv_layers: null
            conv_layers: null
            num_filters: 256
            filter_size: 3
            pool_function: max
            pool_size: null
            output_size: 256
            norm: null
            norm_params: null
            num_fc_layers: null
            fc_layers: null
    vector:
        decoder:
            type: projector
            fc_layers: null
            num_fc_layers: 0
            fc_output_size: 256
            fc_use_bias: true
            fc_weights_initializer: xavier_uniform
            fc_bias_initializer: zeros
            fc_norm: null
            fc_norm_params: null
            fc_activation: relu
            fc_dropout: 0.0
            input_size: null
            output_size: null
            use_bias: true
            weights_initializer: xavier_uniform
            bias_initializer: zeros
            activation: null
            multiplier: 1.0
            clip: null
        loss:
            type: mean_squared_error
            weight: 1.0
        preprocessing:
            vector_size: null
            missing_value_strategy: fill_with_const
            fill_value: ''
            computed_fill_value: ''
        encoder:
            type: dense
            skip: false
            dropout: 0.0
            activation: relu
            input_size: null
            output_size: 256
            use_bias: true
            bias_initializer: zeros
            weights_initializer: xavier_uniform
            norm: null
            norm_params: null
            num_layers: 1
            fc_layers: null
hyperopt: null
backend: null
ludwig_version: 0.10.3
combiner:
    type: concat
    dropout: 0.0
    activation: relu
    flatten_inputs: false
    residual: false
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    num_fc_layers: 0
    output_size: 256
    norm: null
    norm_params: null
    fc_layers: null
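
A rendered config is long, so it can help to see which of your own settings actually deviate from the defaults. The sketch below is one way to do that in plain Python, under stated assumptions: rendered_defaults is a small hand-copied excerpt of the rendered config above, user_config is a hypothetical set of overrides, and flatten is an illustrative helper rather than part of Ludwig.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {"section.key": value} pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Excerpt of the rendered defaults shown above (values copied from the listing).
rendered_defaults = {
    "trainer": {
        "epochs": 100,
        "learning_rate": 0.001,
        "early_stop": 5,
        "optimizer": {"type": "adam"},
    },
    "combiner": {"type": "concat", "output_size": 256},
}

# Hypothetical user settings (assumption, for illustration only).
user_config = {
    "trainer": {"epochs": 20, "learning_rate": 0.001},
    "combiner": {"output_size": 128},
}

flat_defaults = flatten(rendered_defaults)
flat_user = flatten(user_config)

# Settings whose value differs from the rendered default: path -> (default, user value).
overrides = {
    path: (flat_defaults.get(path), value)
    for path, value in flat_user.items()
    if flat_defaults.get(path) != value
}

# Settings the user did not mention at all, which therefore come from the defaults.
inherited = sorted(set(flat_defaults) - set(flat_user))

print("overridden:", overrides)
print("inherited:", inherited)
```

Here trainer.learning_rate is set by the user but matches the default, so it is reported neither as an override nor as inherited.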