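⇅ Image Features

Input image features are transformed into a float-valued tensor of size N x C x H x W (where N is the size of the dataset, C is the number of channels, and H x W is the height and width of the image, which can be specified by the user). These tensors are added to the HDF5 file under a key that reflects the name of the column in the dataset.

The column name is added to the JSON file, with an associated dictionary containing preprocessing information about the resizing.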

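Supported Image Formats

The number of channels in an image is determined by the image format. The following table lists the supported image formats and their number of channels.

Format	Number of channels
Grayscale	1
Grayscale with Alpha	2
RGB	3
RGB with Alpha	4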
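
As a rough illustration of the table above, the following sketch maps image formats to channel counts and derives the N x C x H x W shape of the resulting tensor. Keying the table by Pillow mode strings (L, LA, RGB, RGBA) is an assumption made for illustration, and the helper name is hypothetical; Ludwig's own loading code may work differently.

```python
# Channel counts for the four supported formats.
# Keying by Pillow mode strings is an assumption made for illustration only.
CHANNELS_BY_MODE = {
    "L": 1,     # grayscale
    "LA": 2,    # grayscale with alpha
    "RGB": 3,   # RGB
    "RGBA": 4,  # RGB with alpha
}


def tensor_shape(num_images, mode, height, width):
    """Return the (N, C, H, W) shape for a dataset of same-sized images."""
    if mode not in CHANNELS_BY_MODE:
        raise ValueError(f"unsupported image mode: {mode!r}")
    return (num_images, CHANNELS_BY_MODE[mode], height, width)


print(tensor_shape(1000, "RGB", 224, 224))  # (1000, 3, 224, 224)
```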

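Preprocessing

During preprocessing, raw image files are transformed into numpy arrays and saved in the HDF5 format.

Note

Images passed to an image encoder are expected to have the same size. If the images have different sizes, by default they are resized to the dimensions of the first image in the dataset. Alternatively, a resize_method together with a target width and height can be specified in the feature preprocessing parameters, in which case all images are resized to the specified target size.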

preprocessing:
    missing_value_strategy: bfill
    fill_value: null
    height: null
    width: null
    num_channels: null
    num_processes: 1
    num_classes: null
    resize_method: interpolate
    infer_image_num_channels: true
    infer_image_dimensions: true
    infer_image_max_height: 256
    infer_image_max_width: 256
    infer_image_sample_size: 100
    standardize_image: null
    in_memory: true
    requires_equal_dimensions: false
    infer_image_num_classes: false

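Parameters

missing_value_strategy (default: bfill): What strategy to follow when there is a missing value in an image column. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row.
fill_value (default: null): The value to replace missing values with when missing_value_strategy is fill_with_const.
height (default: null): The image height in pixels. If this parameter is set, images will be resized to the specified height using the resize_method parameter. If null, images will be resized to the size of the first image in the dataset.
width (default: null): The image width in pixels. If this parameter is set, images will be resized to the specified width using the resize_method parameter. If null, images will be resized to the size of the first image in the dataset.
num_channels (default: null): Number of channels in the images. If specified, images will be read in the mode specified by the number of channels. If not specified, the number of channels will be inferred from the image format of the first valid image in the dataset.
num_processes (default: 1): Specifies the number of processes to run for preprocessing images.
num_classes (default: null): Number of channel classes in the images. If specified, this value will be validated against the inferred number of classes. Use 2 to convert grayscale images to binary images.
resize_method (default: interpolate): The method to use for resizing images. Options: crop_or_pad, interpolate.
infer_image_num_channels (default: true): If true, the number of channels is inferred from a sample of the first images in the dataset. Options: true, false.
infer_image_dimensions (default: true): If true, the height and width of the images are inferred from a sample of the first images in the dataset. Images that do not conform to these dimensions are resized according to resize_method. If set to false, the height and width of the images are specified by the user. Options: true, false.
infer_image_max_height (default: 256): If infer_image_dimensions is set, this is used as the maximum height of the images in the dataset.
infer_image_max_width (default: 256): If infer_image_dimensions is set, this is used as the maximum width of the images in the dataset.
infer_image_sample_size (default: 100): The sample size used for inferring the dimensions of images in infer_image_dimensions.
standardize_image (default: null): Standardize the image by per-channel mean centering and standard deviation scaling. Options: imagenet1k, null.
in_memory (default: true): Defines whether the image dataset will reside in memory during training or will be dynamically fetched from disk (useful for large datasets). In the latter case, a batch of input images is fetched from disk at each training iteration. Options: true, false.
requires_equal_dimensions (default: false): If true, then width and height must be equal. Options: true, false.
infer_image_num_classes (default: false): If true, the number of channel classes is inferred from a sample of the first images in the dataset. Each unique channel value is mapped to a class, and preprocessing creates a masked image based on the channel classes. Options: true, false.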
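
The per-channel standardization described for standardize_image can be sketched in plain Python. This is only an illustration, not Ludwig's implementation: it assumes pixel values already scaled to [0, 1] and the commonly used ImageNet channel means and standard deviations, and the function name is hypothetical.

```python
# Commonly used ImageNet statistics (RGB order); assumed here for illustration.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def standardize(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Standardize a C x H x W image given as nested lists of floats in [0, 1]."""
    if len(image) != len(mean) or len(image) != len(std):
        raise ValueError("expected one mean/std entry per channel")
    return [
        [[(pixel - m) / s for pixel in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# A 3 x 1 x 2 image whose pixels equal the channel means maps to all zeros.
image = [[[0.485, 0.485]], [[0.456, 0.456]], [[0.406, 0.406]]]
print(standardize(image))
```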

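Preprocessing parameters can also be defined once and applied to all image input features using the Type-Global Preprocessing section.

Input Features

The encoder parameters specified at the feature level are:

tied (default null): name of another input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.
augmentation (default False): specifies the image data augmentation operations used to generate synthetic training data. More details can be found in the Image Augmentation section below.

Example image feature entry in the input features list: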

name: image_column_name
type: image
tied: null
encoder: 
    type: stacked_cnn

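The available encoder parameters are:

type (default stacked_cnn): the possible values are stacked_cnn, resnet, mlp_mixer, vit, and the TorchVision pretrained image classification models.

Encoder type and encoder parameters can also be defined once and applied to all image input features using the Type-Global Encoder section.

Encoders

Convolutional Stack Encoder (`stacked_cnn`)

Stack of 2D convolutional layers with optional normalization, dropout, and down-sampling pooling layers, followed by an optional stack of fully connected layers.

The Convolutional Stack Encoder takes the following optional parameters: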

encoder:
    type: stacked_cnn
    conv_dropout: 0.0
    output_size: 128
    num_conv_layers: null
    out_channels: 32
    conv_norm: null
    fc_norm: null
    fc_norm_params: null
    conv_activation: relu
    kernel_size: 3
    stride: 1
    padding_mode: zeros
    padding: valid
    dilation: 1
    groups: 1
    pool_function: max
    pool_kernel_size: 2
    pool_stride: null
    pool_padding: 0
    pool_dilation: 1
    conv_norm_params: null
    conv_layers: null
    fc_dropout: 0.0
    fc_activation: relu
    fc_use_bias: true
    fc_bias_initializer: zeros
    fc_weights_initializer: xavier_uniform
    num_fc_layers: 1
    fc_layers: null
    num_channels: null
    conv_use_bias: true

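Parameters

conv_dropout (default: 0.0): Dropout rate.
output_size (default: 128): If output_size is not already specified in fc_layers, this is the default output_size that will be used for each layer. It indicates the size of the output of a fully connected layer.
num_conv_layers (default: null): Number of convolutional layers to use in the encoder.
out_channels (default: 32): Indicates the number of filters, and by consequence the number of output channels of the 2D convolution. If out_channels is not already specified in conv_layers, this is the default out_channels that will be used for each layer.
conv_norm (default: null): If a norm is not already specified in conv_layers, this is the default norm that will be used for each layer. It indicates the normalization applied to the activations. Options: batch, layer, null.
fc_norm (default: null): If a norm is not already specified in fc_layers, this is the default norm that will be used for each layer. It indicates the normalization of the output. Options: batch, layer, null.
fc_norm_params (default: null): Parameters used if norm is either batch or layer. For information on parameters used with batch, see the Torch documentation on batch normalization; for layer, see the Torch documentation on layer normalization.
conv_activation (default: relu): If an activation is not already specified in conv_layers, this is the default activation that will be used for each layer. It indicates the activation function applied to the output. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
kernel_size (default: 3): An integer or pair of integers specifying the kernel size. A single integer specifies a square kernel, while a pair of integers specifies the height and width of the kernel in that order (h, w). If a kernel_size is not specified in conv_layers, this kernel_size will be used for each layer.
stride (default: 1): An integer or pair of integers specifying the stride of the convolution along the height and width. If a stride is not already specified in conv_layers, this is the default stride of the 2D convolutional kernel that will be used for each layer.
padding_mode (default: zeros): If padding_mode is not already specified in conv_layers, this is the default padding_mode of the 2D convolutional kernel that will be used for each layer. Options: zeros, reflect, replicate, circular.
padding (default: valid): An integer, a pair of integers (h, w), or one of ['valid', 'same'] specifying the padding used for the convolution kernels.
dilation (default: 1): An integer or pair of integers specifying the dilation rate to use for dilated convolution. If dilation is not already specified in conv_layers, this is the default dilation of the 2D convolutional kernel that will be used for each layer.
groups (default: 1): Controls the connectivity between convolution inputs and outputs. When groups = 1, each output channel depends on every input channel. When groups > 1, input and output channels are divided into separate groups, and each output channel depends only on the inputs in its respective input channel group. in_channels and out_channels must both be divisible by groups.
pool_function (default: max): Pooling function to use. Options: max, average, avg, mean.
pool_kernel_size (default: 2): An integer or pair of integers specifying the pooling size. If pool_kernel_size is not specified in conv_layers, this is the default value that will be used for each layer.
pool_stride (default: null): An integer or pair of integers specifying the pooling stride, which is the factor by which the pooling layer downsamples the feature map. Defaults to pool_kernel_size.
pool_padding (default: 0): An integer or pair of integers specifying the pooling padding (h, w).
pool_dilation (default: 1): An integer or pair of integers specifying the pooling dilation rate (h, w).
conv_norm_params (default: null): Parameters used if conv_norm is either batch or layer.
conv_layers (default: null): List of convolutional layers to use in the encoder.
fc_dropout (default: 0.0): Dropout rate.
fc_activation (default: relu): If an activation is not already specified in fc_layers, this is the default activation that will be used for each layer. It indicates the activation function applied to the output. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
fc_use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
fc_bias_initializer (default: zeros): Initializer for the bias vector. Options: constant, dirac, eye, identity, kaiming_normal, kaiming_uniform, normal, ones, orthogonal, sparse, uniform, xavier_normal, xavier_uniform, zeros.
fc_weights_initializer (default: xavier_uniform): Initializer for the weights matrix. Options: constant, dirac, eye, identity, kaiming_normal, kaiming_uniform, normal, ones, orthogonal, sparse, uniform, xavier_normal, xavier_uniform, zeros.
num_fc_layers (default: 1): The number of stacked fully connected layers.
fc_layers (default: null): A list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead.
num_channels (default: null): Number of channels to use in the encoder.
conv_use_bias (default: true): If bias is not already specified in conv_layers, specifies whether the 2D convolutional kernel should have a bias term. Options: true, false.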
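
To see how kernel_size, stride, padding, dilation and the pooling parameters interact, the sketch below computes the spatial size after one convolution followed by one pooling layer. It assumes the standard output-size formula documented for PyTorch's Conv2d and MaxPool2d, and that padding: valid means zero padding; the helper names are hypothetical and this is not Ludwig's code.

```python
def out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (PyTorch Conv2d/MaxPool2d formula)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_pool_size(size, kernel_size=3, stride=1, padding=0, dilation=1,
                   pool_kernel_size=2, pool_stride=None, pool_padding=0,
                   pool_dilation=1):
    """Size after one convolution followed by one pooling layer."""
    if pool_stride is None:
        # Mirrors the documented default: pool_stride falls back to pool_kernel_size.
        pool_stride = pool_kernel_size
    size = out_size(size, kernel_size, stride, padding, dilation)
    return out_size(size, pool_kernel_size, pool_stride, pool_padding, pool_dilation)


# Two conv + pool layers with the default parameters on a 28 x 28 input.
size = 28
for _ in range(2):
    size = conv_pool_size(size)
print(size)  # 5  (28 -> 26 -> 13 -> 11 -> 5)
```

With a 3 x 3 kernel and no padding each convolution trims two pixels, and each 2 x 2 pooling halves the result (rounding down), which is why small inputs shrink quickly as layers are stacked.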

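MLP-Mixer Encoder

Encodes images using MLP-Mixer, as described in MLP-Mixer: An all-MLP Architecture for Vision. MLP-Mixer divides the image into equal-sized patches, applies fully connected layers to each patch to compute per-patch representations (tokens), and combines the representations with fully connected mixer layers.

The MLP-Mixer Encoder takes the following optional parameters: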

encoder:
    type: mlp_mixer
    dropout: 0.0
    num_layers: 8
    patch_size: 16
    num_channels: null
    embed_size: 512
    token_size: 2048
    channel_dim: 256
    avg_pool: true

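Parameters

dropout (default: 0.0): Dropout rate.
num_layers (default: 8): The depth of the network (the number of Mixer blocks).
patch_size (default: 16): The image patch size. Each patch is patch_size² pixels. Must evenly divide the image width and height.
num_channels (default: null): Number of channels to use in the encoder.
embed_size (default: 512): The patch embedding size, which is the output size of the mixer if avg_pool is true.
token_size (default: 2048): The per-patch embedding size.
channel_dim (default: 256): Number of channels in the hidden layer.
avg_pool (default: true): If true, pools output over the patch dimension and outputs a vector of shape (embed_size). If false, the output tensor has shape (n_patches, embed_size), where n_patches is img_height x img_width / patch_size². Options: true, false.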
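
The relationship between patch_size, avg_pool and the output shape can be checked with a few lines of arithmetic. This is a sketch of the formula stated above, with a hypothetical helper name; it does not call Ludwig.

```python
def mixer_output_shape(img_height, img_width, patch_size=16, embed_size=512,
                       avg_pool=True):
    """Output shape of the encoder for a single image, per the parameter notes."""
    if img_height % patch_size or img_width % patch_size:
        raise ValueError("patch_size must evenly divide the image height and width")
    n_patches = (img_height * img_width) // (patch_size ** 2)
    return (embed_size,) if avg_pool else (n_patches, embed_size)


print(mixer_output_shape(224, 224, avg_pool=False))  # (196, 512)
print(mixer_output_shape(224, 224, avg_pool=True))   # (512,)
```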

TorchVision Pretrained Model Encoders

Twenty TorchVision pretrained image classification models are available as Ludwig image encoders. The available models are:

AlexNet
ConvNeXt
DenseNet
EfficientNet
EfficientNetV2
GoogLeNet
Inception V3
MaxVit
MNASNet
MobileNet V2
MobileNet V3
RegNet
ResNet
ResNeXt
ShuffleNet V2
SqueezeNet
SwinTransformer
VGG
VisionTransformer
Wide ResNet

See the TorchVision documentation for more details.

Ludwig encoder parameters for TorchVision pretrained models

AlexNet

encoder:
    type: alexnet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: base

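Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: base): Pretrained model variant to use. Options: base.

ConvNeXt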

encoder:
    type: convnext
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: base

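Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: base): Pretrained model variant to use. Options: tiny, small, base, large.

DenseNet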

encoder:
    type: densenet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: 121

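Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: 121): Pretrained model variant to use. Options: 121, 161, 169, 201.

EfficientNet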

encoder:
    type: efficientnet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: b0

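Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: b0): Pretrained model variant to use. Options: b0, b1, b2, b3, b4, b5, b6, b7, v2_s, v2_m, v2_l.

GoogLeNet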

encoder:
    type: googlenet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: base

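Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: base): Pretrained model variant to use. Options: base.

Inception V3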

encoder:
    type: inceptionv3
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: base

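Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: base): Pretrained model variant to use. Options: base.

MaxVit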

encoder:
    type: maxvit
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: t

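Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: t): Pretrained model variant to use. Options: t.

MNASNet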

encoder:
    type: mnasnet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: '0_5'

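Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: 0_5): Pretrained model variant to use. Options: 0_5, 0_75, 1_0, 1_3.

MobileNet V2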

encoder:
    type: mobilenetv2
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: base

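Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: base): Pretrained model variant to use. Options: base.

MobileNet V3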

encoder:
    type: mobilenetv3
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: small

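Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: small): Pretrained model variant to use. Options: small, large.

RegNet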

encoder:
    type: regnet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: x_1_6gf

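Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: x_1_6gf): Pretrained model variant to use. Options: x_1_6gf, x_16gf, x_32gf, x_3_2gf, x_400mf, x_800mf, x_8gf, y_128gf, y_16gf, y_1_6gf, y_32gf, y_3_2gf, y_400mf, y_800mf, y_8gf.

ResNet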

encoder:
    type: resnet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: 50

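Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: 50): Pretrained model variant to use. Options: 18, 34, 50, 101, 152.

ResNeXt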

encoder:
    type: resnext
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: 50_32x4d

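Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: 50_32x4d): Pretrained model variant to use. Options: 50_32x4d, 101_32x8d, 101_64x4d.

ShuffleNet V2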

encoder:
    type: shufflenet_v2
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: x0_5

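Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: x0_5): Pretrained model variant to use. Options: x0_5, x1_0, x1_5, x2_0.

SqueezeNet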

encoder:
    type: squeezenet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: '1_0'

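Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: 1_0): Pretrained model variant to use. Options: 1_0, 1_1.

SwinTransformer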

encoder:
    type: swin_transformer
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: t

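Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: t): Pretrained model variant to use. Options: t, s, b.

VGG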

encoder:
    type: vgg
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: 11

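Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: 11): Pretrained model variant to use.

VisionTransformer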

encoder:
    type: vit
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: b_16

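Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: b_16): Pretrained model variant to use. Options: b_16, b_32, l_16, l_32, h_14.

Wide ResNet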

encoder:
    type: wide_resnet
    use_pretrained: true
    trainable: true
    model_cache_dir: null
    model_variant: '50_2'

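Parameters

use_pretrained (default: true): Download model weights from the pretrained model. Options: true, false.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
model_cache_dir (default: null): Directory path to cache pretrained model weights.
model_variant (default: 50_2): Pretrained model variant to use. Options: 50_2, 101_2.

Note:

At this time Ludwig supports only the DEFAULT pretrained weights, which are the best available weights for a specific model. More details on DEFAULT weights can be found in this blog post.
Some TorchVision pretrained models consume a large amount of memory. The following model_variant values require more than 12GB of memory: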
efficientnet_torch: b7
regnet_torch: y_128gf
vit_torch: h_14
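
A configuration can be sanity-checked against the variant options listed above before training starts. The sketch below is a hypothetical helper, not part of Ludwig: it copies a subset of the option lists from this page and flags the variants named in the memory note, assuming that efficientnet_torch and vit_torch in that note refer to the efficientnet and vit encoder types.

```python
# Subset of the model_variant options listed on this page (not exhaustive).
MODEL_VARIANTS = {
    "alexnet": ["base"],
    "convnext": ["tiny", "small", "base", "large"],
    "densenet": [121, 161, 169, 201],
    "efficientnet": ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7",
                     "v2_s", "v2_m", "v2_l"],
    "resnet": [18, 34, 50, 101, 152],
    "vit": ["b_16", "b_32", "l_16", "l_32", "h_14"],
}

# Variants the note above lists as needing more than 12GB of memory
# (mapping the *_torch names to encoder types is an assumption).
HIGH_MEMORY = {("efficientnet", "b7"), ("vit", "h_14")}


def check_encoder(encoder):
    """Validate an encoder dict; return True if the variant is memory hungry."""
    enc_type = encoder["type"]
    if enc_type not in MODEL_VARIANTS:
        raise ValueError(f"unknown encoder type in this sketch: {enc_type!r}")
    variant = encoder.get("model_variant")
    if variant is not None and variant not in MODEL_VARIANTS[enc_type]:
        raise ValueError(f"{variant!r} is not a listed variant of {enc_type}")
    return (enc_type, variant) in HIGH_MEMORY


print(check_encoder({"type": "resnet", "model_variant": 18}))   # False
print(check_encoder({"type": "vit", "model_variant": "h_14"}))  # True
```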

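U-Net Encoder

The U-Net encoder is based on U-Net: Convolutional Networks for Biomedical Image Segmentation. The encoder implements the contracting downsampling path of the U-Net stack.

The U-Net encoder takes the following optional parameters: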

encoder:
    type: unet
    conv_norm: batch

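Parameters

conv_norm (default: batch): The default norm that will be used for each double convolution layer. Options: batch, null.

Deprecated Encoders (planned to remove in v0.8)

Legacy ResNet Encoder

DEPRECATED: This encoder is deprecated and will be removed in a future release. Please use the equivalent TorchVision ResNet encoder instead.

Implements ResNet V2 as described in Identity Mappings in Deep Residual Networks.

The ResNet encoder takes the following optional parameters: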

encoder:
    type: _resnet_legacy
    dropout: 0.0
    output_size: 128
    activation: relu
    norm: null
    first_pool_kernel_size: null
    first_pool_stride: null
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    norm_params: null
    num_fc_layers: 1
    fc_layers: null
    resnet_size: 50
    num_channels: null
    out_channels: 32
    kernel_size: 3
    conv_stride: 1
    batch_norm_momentum: 0.9
    batch_norm_epsilon: 0.001

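Parameters

dropout (default: 0.0): Dropout rate.
output_size (default: 128): If output_size is not already specified in fc_layers, this is the default output_size that will be used for each layer. It indicates the size of the output of a fully connected layer.
activation (default: relu): If an activation is not already specified in fc_layers, this is the default activation that will be used for each layer. It indicates the activation function applied to the output. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
norm (default: null): If a norm is not already specified in fc_layers, this is the default norm that will be used for each layer. It indicates the normalization of the output. Options: batch, layer, null.
first_pool_kernel_size (default: null): Pool size to be used for the first pooling layer. If null, the first pooling layer is skipped.
first_pool_stride (default: null): Stride for the first pooling layer. If null, defaults to first_pool_kernel_size.
use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
bias_initializer (default: zeros): Initializer for the bias vector. Options: constant, dirac, eye, identity, kaiming_normal, kaiming_uniform, normal, ones, orthogonal, sparse, uniform, xavier_normal, xavier_uniform, zeros.
weights_initializer (default: xavier_uniform): Initializer for the weights matrix. Options: constant, dirac, eye, identity, kaiming_normal, kaiming_uniform, normal, ones, orthogonal, sparse, uniform, xavier_normal, xavier_uniform, zeros.
norm_params (default: null): Parameters used if norm is either batch or layer. For information on parameters used with batch, see the Torch documentation on batch normalization; for layer, see the Torch documentation on layer normalization.
num_fc_layers (default: 1): The number of stacked fully connected layers.
fc_layers (default: null): A list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default one specified as a parameter of the encoder will be used instead.
resnet_size (default: 50): The size of the ResNet model to use.
num_channels (default: null): Number of channels to use in the encoder.
out_channels (default: 32): Indicates the number of filters, and by consequence the number of output channels of the 2D convolution. If out_channels is not already specified in conv_layers, this is the default out_channels that will be used for each layer.
kernel_size (default: 3): An integer or pair of integers specifying the kernel size. A single integer specifies a square kernel, while a pair of integers specifies the height and width of the kernel in that order (h, w). If a kernel_size is not specified in conv_layers, this kernel_size will be used for each layer.
conv_stride (default: 1): An integer or pair of integers specifying the stride of the initial convolutional layer.
batch_norm_momentum (default: 0.9): Momentum of the batch norm running statistics.
batch_norm_epsilon (default: 0.001): Epsilon of the batch norm.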
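
The fallback rule described for fc_layers (per-layer values override the encoder-level defaults, and num_fc_layers applies only when fc_layers is not given) can be sketched as a dictionary merge. The helper below is hypothetical and only illustrates the rule; how Ludwig resolves these values internally is not shown on this page.

```python
def resolve_fc_layers(fc_layers=None, num_fc_layers=1, **defaults):
    """Fill each layer dict with encoder-level defaults for any missing key."""
    if fc_layers is None:
        # No explicit list: build num_fc_layers layers made purely of defaults.
        fc_layers = [{} for _ in range(num_fc_layers)]
    # Later keys win, so per-layer settings override the defaults.
    return [{**defaults, **layer} for layer in fc_layers]


layers = resolve_fc_layers(
    fc_layers=[{"output_size": 256}, {"dropout": 0.2}],
    output_size=128,
    activation="relu",
    dropout=0.0,
)
for layer in layers:
    print(layer)
```

The first layer keeps its own output_size of 256 and inherits activation and dropout; the second inherits output_size 128 and activation but uses its own dropout of 0.2.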

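Legacy Vision Transformer Encoder

DEPRECATED: This encoder is deprecated and will be removed in a future release. Please use the equivalent TorchVision VisionTransformer encoder instead.

Encodes images using a Vision Transformer as described in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Vision Transformer divides the image into equal-sized patches, uses a linear transformation to encode each flattened patch, then applies a deep transformer architecture to the sequence of encoded patches.

The Vision Transformer encoder takes the following optional parameters: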

encoder:
    type: _vit_legacy
    hidden_dropout_prob: 0.1
    attention_probs_dropout_prob: 0.1
    trainable: true
    use_pretrained: true
    pretrained_model: google/vit-base-patch16-224
    hidden_size: 768
    hidden_act: gelu
    patch_size: 16
    initializer_range: 0.02
    num_hidden_layers: 12
    num_attention_heads: 12
    intermediate_size: 3072
    layer_norm_eps: 1.0e-12
    gradient_checkpointing: false

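Parameters

hidden_dropout_prob (default: 0.1): The dropout rate for all fully connected layers in the embeddings, encoder, and pooling.
attention_probs_dropout_prob (default: 0.1): The dropout rate for the attention probabilities.
trainable (default: true): Whether the encoder is trainable. Options: true, false.
use_pretrained (default: true): Use pretrained model weights from Hugging Face. Options: true, false.
pretrained_model (default: google/vit-base-patch16-224): The name of the pretrained model to use.
hidden_size (default: 768): Dimensionality of the encoder layers and the pooling layer.
hidden_act (default: gelu): Hidden layer activation. Options: relu, gelu, selu, gelu_new.
patch_size (default: 16): The image patch size. Each patch is patch_size² pixels. Must evenly divide the image width and height.
initializer_range (default: 0.02): The standard deviation of the truncated_normal_initializer used for initializing all weight matrices.
num_hidden_layers (default: 12): Number of hidden layers in the Transformer encoder.
num_attention_heads (default: 12): Number of attention heads in each attention layer.
intermediate_size (default: 3072): Dimensionality of the intermediate (i.e., feed-forward) layer in the Transformer encoder.
layer_norm_eps (default: 1e-12): The epsilon used by the layer normalization layers.
gradient_checkpointing (default: false): Options: true, false.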

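Image Augmentation

Image augmentation is a technique used to increase the diversity of a training dataset by applying random transformations to the images. The goal is to train a model that is robust to the variations in the training data.

Image augmentation is specified by the augmentation section in the image feature configuration and can be specified in one of the following ways:

Boolean: False (default). No augmentation is applied to the images.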

augmentation: False

Boolean: True. The following augmentation methods are applied to the images: random_horizontal_flip and random_rotate.

augmentation: True

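List of augmentation methods. One or more of the following augmentation methods are applied to the images in the order specified by the user: random_horizontal_flip, random_vertical_flip, random_rotate, random_blur, random_brightness, and random_contrast. The following is an illustrative example.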

augmentation:
    - type: random_horizontal_flip
    - type: random_vertical_flip
    - type: random_rotate
      degree: 10
    - type: random_blur
      kernel_size: 3
    - type: random_brightness
      min: 0.5
      max: 2.0
    - type: random_contrast
      min: 0.5
      max: 2.0

Augmentation is applied only to batches of the training set split. The validation and test set splits are not augmented.
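
The two rules above (operations run in the listed order, and only on training data) can be sketched as a small pipeline over an image stored as nested lists. This is an illustration rather than Ludwig's implementation; the function names and the flip probability p = 0.5 are assumptions.

```python
import random


def horizontal_flip(image):
    """Mirror an H x W image (nested lists) left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror an H x W image (nested lists) top to bottom."""
    return image[::-1]


def augment(image, ops, is_training, rng=random, p=0.5):
    """Apply each op in order with probability p, but only during training."""
    if not is_training:
        return image  # validation and test images pass through unchanged
    for op in ops:
        if rng.random() < p:
            image = op(image)
    return image


image = [[1, 2], [3, 4]]
# p=1.0 forces both flips so the result is deterministic.
print(augment(image, [horizontal_flip, vertical_flip], is_training=True, p=1.0))
# [[4, 3], [2, 1]]
print(augment(image, [horizontal_flip, vertical_flip], is_training=False, p=1.0))
# [[1, 2], [3, 4]]
```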

The following illustrates how augmentation affects an image:

Original Image

Horizontal Flip: The image is randomly flipped horizontally.

type: random_horizontal_flip

Horizontal Flip

Vertical Flip: The image is randomly flipped vertically.

type: random_vertical_flip

Vertical Flip

Rotate: The image is randomly rotated by an angle in the range [-degree, +degree]. degree must be a positive integer.

type: random_rotate
degree: 15

Parameters

degree (default: 15): Range of angles for random rotation, i.e., [-degree, +degree].

The following shows the effect of rotating an image:

Rotate Image

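Blur: The image is randomly blurred using a Gaussian filter with the kernel size specified by the user. kernel_size must be a positive, odd integer.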

type: random_blur
kernel_size: 3

Parameters

kernel_size (default: 3): Kernel size for random blur.

The following shows the effect of blurring an image with various kernel sizes:

Blur Image

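Adjust Brightness: The image brightness is adjusted by a factor randomly selected in the range [min, max]. Both min and max must be floats greater than 0, with min less than max.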

type: random_brightness
min: 0.5
max: 2.0

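Parameters

min (default: 0.5): Minimum factor for random brightness.
max (default: 2.0): Maximum factor for random brightness.

The following shows the effect of brightness adjustment with various factors: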

Adjust Brightness
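
As a sketch of what a brightness factor does, the snippet below draws a factor uniformly from [min, max] and scales each pixel, clamping to the valid range. It assumes float pixel values in [0, 1] and the multiply-and-clamp behavior that torchvision's adjust_brightness applies to float tensors; the function and parameter names are hypothetical, and the uniform draw is an assumption about how the factor is selected.

```python
import random


def random_brightness(image, min_factor=0.5, max_factor=2.0, rng=random):
    """Scale an H x W image of floats in [0, 1] by a random brightness factor."""
    factor = rng.uniform(min_factor, max_factor)
    adjusted = [
        [min(1.0, max(0.0, pixel * factor)) for pixel in row]
        for row in image
    ]
    return adjusted, factor


rng = random.Random(0)  # seeded so repeated runs pick the same factor
adjusted, factor = random_brightness([[0.2, 0.9]], rng=rng)
print(factor, adjusted)
```

Factors below 1 darken the image, factors above 1 brighten it, and bright pixels saturate at 1.0 once the factor is large enough.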

Adjust Contrast: The image contrast is adjusted by a factor randomly selected in the range [min, max]. Both min and max must be floats greater than 0, with min less than max.

type: random_contrast
min: 0.5
max: 2.0

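Parameters

min (default: 0.5): Minimum factor for random contrast.
max (default: 2.0): Maximum factor for random contrast.

The following shows the effect of contrast adjustment with various factors: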

Adjust Contrast
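
The constraints stated for each augmentation method (positive integer degree, positive odd kernel_size, 0 < min < max) can be checked before training. The validator below is a hypothetical helper written for illustration; it is not Ludwig's own schema validation.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_augmentation(ops):
    """Raise ValueError if any augmentation entry violates the documented limits."""
    for op in ops:
        kind = op["type"]
        if kind in ("random_horizontal_flip", "random_vertical_flip"):
            continue  # the flips take no parameters
        if kind == "random_rotate":
            degree = op.get("degree", 15)
            if not (_is_int(degree) and degree > 0):
                raise ValueError("degree must be a positive integer")
        elif kind == "random_blur":
            size = op.get("kernel_size", 3)
            if not (_is_int(size) and size > 0 and size % 2 == 1):
                raise ValueError("kernel_size must be a positive, odd integer")
        elif kind in ("random_brightness", "random_contrast"):
            low, high = op.get("min", 0.5), op.get("max", 2.0)
            if not (0 < low < high):
                raise ValueError("min and max must satisfy 0 < min < max")
        else:
            raise ValueError(f"unknown augmentation type: {kind!r}")


validate_augmentation([
    {"type": "random_horizontal_flip"},
    {"type": "random_rotate", "degree": 10},
    {"type": "random_blur", "kernel_size": 3},
    {"type": "random_contrast", "min": 0.5, "max": 2.0},
])
print("augmentation config is valid")
```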

Illustrative examples of image feature configurations with augmentation:

name: image_column_name
type: image
encoder: 
    type: resnet
    model_variant: 18
    use_pretrained: true
    model_cache_dir: null
    trainable: true
augmentation: false

name: image_column_name
type: image
encoder: 
    type: stacked_cnn
augmentation: true

name: image_column_name
type: image
encoder: 
    type: alexnet
augmentation:
    - type: random_horizontal_flip
    - type: random_rotate
      degree: 10
    - type: random_blur
      kernel_size: 3
    - type: random_brightness
      min: 0.5
      max: 2.0
    - type: random_contrast
      min: 0.5
      max: 2.0
    - type: random_vertical_flip

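Output Features

Image features can be used as output features when semantic segmentation needs to be performed. There is only one decoder available for image features: unet.

Example image output feature using default parameters: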

name: image_column_name
type: image
reduce_input: sum
dependencies: []
reduce_dependencies: sum
loss:
    type: softmax_cross_entropy
decoder:
    type: unet

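Parameters

reduce_input (default sum): defines how to reduce an input that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
dependencies (default []): the output features this one is dependent on. For a detailed explanation refer to Output Feature Dependencies.
reduce_dependencies (default sum): defines how to reduce the output of a dependent feature that is not a vector, but a matrix or a higher order tensor, on the first dimension (second if you count the batch dimension). Available values are: sum, mean or avg, max, concat (concatenates along the first dimension), last (returns the last vector of the first dimension).
loss (default {type: softmax_cross_entropy}): a dictionary containing a loss type. softmax_cross_entropy is the only supported loss type for image output features. See Loss for details.
decoder (default: {"type": "unet"}): Decoder for the desired task. Options: unet. See Decoder for details.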
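
The reduction modes named for reduce_input and reduce_dependencies can be illustrated on a single example (batch dimension omitted) stored as a list of vectors. This is a plain-Python sketch of the described semantics with a hypothetical function name, not the code Ludwig runs.

```python
def reduce_sequence(vectors, mode="sum"):
    """Reduce a (sequence x hidden) list of vectors along the first dimension."""
    columns = list(zip(*vectors))  # one tuple per hidden unit
    if mode == "sum":
        return [sum(col) for col in columns]
    if mode in ("mean", "avg"):
        return [sum(col) / len(vectors) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "concat":
        return [value for vector in vectors for value in vector]
    if mode == "last":
        return list(vectors[-1])
    raise ValueError(f"unknown reduce mode: {mode!r}")


vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
for mode in ("sum", "mean", "max", "concat", "last"):
    print(mode, reduce_sequence(vectors, mode))
```

Every mode except concat yields a vector of the original hidden size; concat yields a vector whose length is the sequence length times the hidden size.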

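Decoders

U-Net Decoder

The U-Net decoder is based on U-Net: Convolutional Networks for Biomedical Image Segmentation. The decoder implements the expansive upsampling path of the U-Net stack. Semantic segmentation supports one input feature and one output feature. The num_fc_layers in the decoder and combiner sections must be set to 0, as U-Net does not have any fully connected layers.

The U-Net decoder takes the following optional parameters: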

decoder:
    type: unet
    num_fc_layers: 0
    fc_output_size: 256
    fc_norm: null
    fc_dropout: 0.0
    fc_activation: relu
    conv_norm: batch
    fc_layers: null
    fc_use_bias: true
    fc_weights_initializer: xavier_uniform
    fc_bias_initializer: zeros
    fc_norm_params: null

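Parameters

num_fc_layers (default: 0): Number of fully connected layers if fc_layers is not specified. Increasing the number of layers adds capacity to the model, enabling it to learn more complex feature interactions.
fc_output_size (default: 256): Output size of the fully connected stack.
fc_norm (default: null): Default normalization applied at the beginning of fully connected layers. Options: batch, layer, ghost, null.
fc_dropout (default: 0.0): Default dropout rate applied to fully connected layers. Increasing dropout is a common form of regularization to combat overfitting. The dropout rate is the probability of an element being zeroed (0.0 means no dropout).
fc_activation (default: relu): Default activation function applied to the output of the fully connected layers. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
conv_norm (default: batch): The default norm that will be used for each double convolution layer. Options: batch, null.
fc_layers (default: null): A list of dictionaries containing the parameters of all the fully connected layers. The length of the list determines the number of stacked fully connected layers and the content of each dictionary determines the parameters for a specific layer. The available parameters for each layer are: activation, dropout, norm, norm_params, output_size, use_bias, bias_initializer and weights_initializer. If any of those values is missing from the dictionary, the default provided as a standalone parameter will be used instead.
fc_use_bias (default: true): Whether the layers in the fc_stack use a bias vector. Options: true, false.
fc_weights_initializer (default: xavier_uniform): The weights initializer to use for the layers in the fc_stack.
fc_bias_initializer (default: zeros): The bias initializer to use for the layers in the fc_stack.
fc_norm_params (default: null): Default parameters passed to the norm module.

Decoder type and decoder parameters can also be defined once and applied to all image output features using the Type-Global Decoder section.

Loss

Softmax Cross Entropy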

loss:
    type: softmax_cross_entropy
    class_weights: null
    weight: 1.0
    robust_lambda: 0
    confidence_penalty: 0
    class_similarities: null
    class_similarities_temperature: 0

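Parameters

class_weights (default: null): Weights to apply to each class in the loss. If not specified, all classes are weighted equally. The value can be a vector of weights, one for each class, that is multiplied by the loss of the datapoints that have that class as ground truth. It is an alternative to oversampling in case of an unbalanced class distribution. The ordering of the vector follows the category to integer ID mapping in the JSON metadata file (the <UNK> class needs to be included too). Alternatively, the value can be a dictionary with class strings as keys and weights as values, like {class_a: 0.5, class_b: 0.7, ...}.
weight (default: 1.0): Weight of the loss.
robust_lambda (default: 0): Replaces the loss with (1 - robust_lambda) * loss + robust_lambda / c, where c is the number of classes. Useful in case of noisy labels.
confidence_penalty (default: 0): Penalizes overconfident predictions (low entropy) by adding an additional term to the loss, a * (max_entropy - entropy) / max_entropy, where a is the value of this parameter. Useful in case of noisy labels.
class_similarities (default: null): If not null, it is a c x c matrix in the form of a list of lists that contains the mutual similarity of classes. It is used if class_similarities_temperature is greater than 0. The ordering of the vector follows the category to integer ID mapping in the JSON metadata file (the <UNK> class needs to be included too).
class_similarities_temperature (default: 0): The temperature parameter of the softmax that is performed on each row of class_similarities. The output of that softmax is used to determine the supervision vector to provide instead of the one-hot vector that would be provided otherwise for each datapoint. The intuition behind it is that errors between similar classes are more tolerable than errors between really different classes.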
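
The two formulas given for robust_lambda and confidence_penalty can be evaluated directly. The sketch below implements them as written above, taking max_entropy to be the entropy of the uniform distribution over c classes, ln(c); the function names are hypothetical and this is not Ludwig's loss code.

```python
import math


def robust_loss(loss, robust_lambda, num_classes):
    """(1 - robust_lambda) * loss + robust_lambda / c"""
    return (1 - robust_lambda) * loss + robust_lambda / num_classes


def confidence_penalty(probabilities, a):
    """a * (max_entropy - entropy) / max_entropy for one predicted distribution."""
    max_entropy = math.log(len(probabilities))  # entropy of the uniform distribution
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return a * (max_entropy - entropy) / max_entropy


print(robust_loss(2.0, robust_lambda=0.1, num_classes=4))
print(confidence_penalty([0.25, 0.25, 0.25, 0.25], a=0.5))  # uniform: no penalty
print(confidence_penalty([1.0, 0.0, 0.0, 0.0], a=0.5))      # one-hot: full penalty
```

A uniform prediction has maximum entropy, so its penalty is (numerically close to) zero, while a fully confident one-hot prediction receives the whole penalty a.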

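Loss and loss-related parameters can also be defined once and applied to all image output features using the Type-Global Loss section.

Metrics

The measures that are calculated every epoch and are available for image features are accuracy and loss. You can set either of them as validation_metric in the training section of the configuration if you set validation_field to the name of an image output feature.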