H3 Features

H3 is an indexing system for representing geospatial data. For more details, see https://eng.uber.com/h3.

Preprocessing

Ludwig automatically parses the 64-bit integer H3 encoding.

preprocessing:
    missing_value_strategy: fill_with_const
    fill_value: 576495936675512319

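Parameters

missing_value_strategy (default: fill_with_const): The strategy to follow when a value is missing in an h3 column. Options: fill_with_const, fill_with_mode, bfill, ffill, drop_row. See Missing Value Strategy for details.
fill_value (default: 576495936675512319): The value that replaces missing values when missing_value_strategy is fill_with_const.

Preprocessing parameters can also be defined once in the type-global preprocessing section and applied to all H3 input features.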
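As a rough illustration of the type-global idea, the sketch below builds a config as a Python dictionary. It assumes the type-global section lives under a `defaults` key indexed by feature type, which should be checked against the Ludwig version in use. The column names and the `resolved_preprocessing` helper are hypothetical; the helper only mimics the intent (feature-level settings override type-level ones) and is not Ludwig's own merge logic.

```python
# Hypothetical column names; only the structure matters here.
config = {
    "input_features": [
        {"name": "pickup_cell", "type": "h3"},
        {
            "name": "dropoff_cell",
            "type": "h3",
            # Feature-level override of one preprocessing parameter.
            "preprocessing": {"missing_value_strategy": "drop_row"},
        },
    ],
    "output_features": [{"name": "fare", "type": "number"}],
    # Assumed location of the type-global section.
    "defaults": {
        "h3": {
            "preprocessing": {
                "missing_value_strategy": "fill_with_const",
                "fill_value": 576495936675512319,
            }
        }
    },
}


def resolved_preprocessing(feature, config):
    """Illustrative merge: type-level defaults first, then feature-level values."""
    type_defaults = config.get("defaults", {}).get(feature["type"], {})
    merged = dict(type_defaults.get("preprocessing", {}))
    merged.update(feature.get("preprocessing", {}))
    return merged


pickup = resolved_preprocessing(config["input_features"][0], config)
dropoff = resolved_preprocessing(config["input_features"][1], config)
```

In this sketch `pickup` inherits both type-level values, while `dropoff` keeps the shared `fill_value` but uses its own `missing_value_strategy`.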

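Input Features

Input H3 features are transformed into an integer tensor of size N x 19, where N is the size of the dataset. The 19 dimensions consist of 4 H3 header components (mode, edge, resolution, base cell) and 15 cell coordinate values.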
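To make the 19 values concrete, here is a minimal sketch that unpacks a 64-bit H3 integer with bit shifts. It assumes the standard H3 bit layout (4 mode bits, 3 edge bits, 4 resolution bits, 7 base cell bits, then 15 digits of 3 bits each). The function name `decode_h3` is hypothetical, and Ludwig's own parser may differ in details such as how digits beyond the cell's resolution are padded.

```python
FILL_VALUE = 576495936675512319  # the default fill_value from the preprocessing section


def decode_h3(h3_int):
    """Split a 64-bit H3 index into 4 header values followed by 15 cell digits."""
    mode = (h3_int >> 59) & 0xF
    edge = (h3_int >> 56) & 0x7
    resolution = (h3_int >> 52) & 0xF
    base_cell = (h3_int >> 45) & 0x7F
    # Digit r (for resolution r = 1..15) sits 3 * (15 - r) bits from the right.
    digits = [(h3_int >> (3 * (15 - r))) & 0x7 for r in range(1, 16)]
    return [mode, edge, resolution, base_cell] + digits


components = decode_h3(FILL_VALUE)
```

Under this layout the default fill value decodes to mode 1, edge 0, resolution 0, base cell 0, with all 15 raw digits set to 7, the value the H3 format uses for digits beyond a cell's resolution.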

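The encoder parameters specified at the feature level are:

tied (default: null): Name of another input feature to tie the encoder weights with. It must be a feature of the same type and with the same encoder parameters.

Example H3 feature entry in the input features list: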

name: h3_feature_name
type: h3
tied: null
encoder: 
    type: embed

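The available encoder parameters are:

type (default: embed): Possible values are embed, weighted_sum, and rnn.

Encoder type and encoder parameters can also be defined once in the type-global encoder section and applied to all H3 input features.

Encoders

Embed Encoder

This encoder encodes each component of the H3 representation (mode, edge, resolution, base cell, and children cells) with embeddings. Children cells with value 0 are masked out. After embedding, all embeddings are summed and optionally passed through a stack of fully connected layers.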

encoder:
    type: embed
    dropout: 0.0
    embedding_size: 10
    output_size: 10
    activation: relu
    norm: null
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    embeddings_on_cpu: false
    reduce_output: sum
    norm_params: null
    num_fc_layers: 0
    fc_layers: null

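Parameters

dropout (default: 0.0): Dropout probability for the embedding.
embedding_size (default: 10): The maximum embedding size adopted.
output_size (default: 10): If an output_size is not already specified in fc_layers, this is the default output_size used for each layer. It indicates the size of the output of a fully connected layer.
activation (default: relu): Default activation function used for each layer. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
norm (default: null): Default normalization used for each layer. Options: batch, layer, null. See Normalization for details.
use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
weights_initializer (default: xavier_uniform): Initializer for the weights matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
embeddings_on_cpu (default: false): Whether to force the embedding matrix to be placed in regular memory and resolved by the CPU. Options: true, false.
reduce_output (default: sum): How to reduce the output tensor along the sequence length dimension (s) if the rank of the tensor is greater than 2. Options: last, sum, mean, avg, max, concat, attention, none, None, null.
norm_params (default: null): Parameters used if norm is either batch or layer.
num_fc_layers (default: 0): Number of stacked fully connected layers.
fc_layers (default: null): List of dictionaries containing the parameters of each fully connected layer.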
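The embed-and-sum step described for this encoder can be sketched in plain Python. This is an illustration only, not Ludwig's implementation: the random lookup tables stand in for learned embeddings, the table sizes are assumed from the bit width of each component, and sharing one table across all 15 cell positions is also an assumption. The names `make_table`, `add`, and `embed_sum` are hypothetical.

```python
import random

EMBEDDING_SIZE = 10
rng = random.Random(0)


def make_table(num_rows, dim=EMBEDDING_SIZE):
    """Random lookup table standing in for a learned embedding matrix."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


# One table per header component (mode, edge, resolution, base cell),
# sized by bit width (assumption), plus one shared table for cell values.
header_tables = [make_table(16), make_table(8), make_table(16), make_table(128)]
cell_table = make_table(8)


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def embed_sum(components):
    """Sum the embeddings of a 19-value H3 row, skipping cells equal to 0."""
    header, cells = components[:4], components[4:]
    total = [0.0] * EMBEDDING_SIZE
    for value, table in zip(header, header_tables):
        total = add(total, table[value])
    for cell in cells:
        if cell != 0:  # children cells with value 0 are masked out
            total = add(total, cell_table[cell])
    return total


# Mode 1, edge 0, resolution 2, base cell 20, cells 3 and 5, then padding zeros.
row = [1, 0, 2, 20, 3, 5] + [0] * 13
encoded = embed_sum(row)
```

The result is a single vector of length `embedding_size` per row, which is what the optional fully connected layers would then consume.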

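Weighted Sum Embed Encoder

This encoder encodes each component of the H3 representation (mode, edge, resolution, base cell, and children cells) with embeddings. Children cells with value 0 are masked out. After embedding, all embeddings are combined in a weighted sum (with learned weights) and optionally passed through a stack of fully connected layers.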

encoder:
    type: weighted_sum
    dropout: 0.0
    embedding_size: 10
    output_size: 10
    activation: relu
    norm: null
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    embeddings_on_cpu: false
    should_softmax: false
    norm_params: null
    num_fc_layers: 0
    fc_layers: null

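Parameters

dropout (default: 0.0): Dropout probability for the embedding.
embedding_size (default: 10): The maximum embedding size adopted.
output_size (default: 10): If an output_size is not already specified in fc_layers, this is the default output_size used for each layer. It indicates the size of the output of a fully connected layer.
activation (default: relu): Default activation function used for each layer. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
norm (default: null): Default normalization used for each layer. Options: batch, layer, null. See Normalization for details.
use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
weights_initializer (default: xavier_uniform): Initializer for the weights matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
embeddings_on_cpu (default: false): Whether to force the embedding matrix to be placed in regular memory and resolved by the CPU. Options: true, false.
should_softmax (default: false): Determines whether the weights of the weighted sum are passed through a softmax layer before being used. Options: true, false.
norm_params (default: null): Parameters used if norm is either batch or layer.
num_fc_layers (default: 0): Number of stacked fully connected layers.
fc_layers (default: null): List of dictionaries containing the parameters of each fully connected layer.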
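The effect of should_softmax can be seen in a small sketch of the weighted-sum step. It uses three toy embeddings of size 2 instead of the 19 component embeddings, and fixed weights instead of learned ones. The function names are hypothetical and this is not Ludwig's implementation.

```python
import math


def softmax(weights):
    """Numerically stable softmax over a list of floats."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(embeddings, weights, should_softmax=False):
    """Combine one embedding per component into a single vector."""
    if should_softmax:
        # Normalize the weights so they are positive and sum to 1.
        weights = softmax(weights)
    dim = len(embeddings[0])
    return [sum(w * emb[i] for w, emb in zip(weights, embeddings)) for i in range(dim)]


embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [1.0, 1.0, 1.0]

raw = weighted_sum(embeddings, weights)
normalized = weighted_sum(embeddings, weights, should_softmax=True)
```

With equal weights, the raw variant reduces to a plain sum, while the softmax variant reduces to (approximately) the mean, because the normalized weights each become one third.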

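RNN Encoder

This encoder encodes each component of the H3 representation (mode, edge, resolution, base cell, and children cells) with embeddings. Children cells with value 0 are masked out. After embedding, all embeddings are passed through an RNN encoder.

The intuition behind this is that, starting from the base cell, the sequence of children cells can be seen as a sequence encoding a path in the tree of all H3 hexagons.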
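The path intuition can be illustrated with a short sketch: under the standard H3 bit layout (an assumption here), a cell's path is its base cell followed by one digit per resolution level, so a parent's path is a prefix of its child's path. The helpers `pack_cell` and `h3_path` are hypothetical, and the packed integers are built only to exercise the bit layout; they are not guaranteed to be valid H3 cells.

```python
def pack_cell(base_cell, digits):
    """Pack a base cell and per-resolution digits into a 64-bit H3-style integer."""
    value = (1 << 59) | (len(digits) << 52) | (base_cell << 45)  # mode 1 = cell
    for r in range(1, 16):
        # Digits beyond the cell's resolution are set to 7, as in the H3 format.
        digit = digits[r - 1] if r <= len(digits) else 7
        value |= digit << (3 * (15 - r))
    return value


def h3_path(h3_int):
    """Return [base_cell, digit_1, ..., digit_resolution] for an H3 integer."""
    resolution = (h3_int >> 52) & 0xF
    base_cell = (h3_int >> 45) & 0x7F
    digits = [(h3_int >> (3 * (15 - r))) & 0x7 for r in range(1, resolution + 1)]
    return [base_cell] + digits


parent = pack_cell(20, [3, 5])
child = pack_cell(20, [3, 5, 1])

parent_path = h3_path(parent)
child_path = h3_path(child)
```

A recurrent encoder reading `child_path` step by step sees the same first steps as for `parent_path`, which is why a sequence model is a natural fit for this representation.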

encoder:
    type: rnn
    dropout: 0.0
    cell_type: rnn
    num_layers: 1
    embedding_size: 10
    recurrent_dropout: 0.0
    hidden_size: 10
    bias_initializer: zeros
    activation: tanh
    recurrent_activation: sigmoid
    unit_forget_bias: true
    weights_initializer: xavier_uniform
    recurrent_initializer: orthogonal
    reduce_output: last
    embeddings_on_cpu: false
    use_bias: true
    bidirectional: false

Parameters

dropout (default: 0.0): Dropout rate.
cell_type (default: rnn): The type of recurrent cell to use. For a reference on the differences between the cells, see PyTorch's documentation. We suggest using the block variants on CPU and the cudnn variants on GPU because of their increased speed. Options: rnn, lstm, lstm_block, ln, lstm_cudnn, gru, gru_block, gru_cudnn.
num_layers (default: 1): Number of stacked recurrent layers.
embedding_size (default: 10): The maximum embedding size adopted.
recurrent_dropout (default: 0.0): Dropout rate for the recurrent state.
hidden_size (default: 10): Size of the hidden representation inside the recurrent layers. It is usually the same as embedding_size.
bias_initializer (default: zeros): Initializer for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
activation (default: tanh): The activation function to use. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
recurrent_activation (default: sigmoid): The activation function to use in the recurrent step. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
unit_forget_bias (default: true): If true, add 1 to the bias of the forget gate at initialization. Options: true, false.
weights_initializer (default: xavier_uniform): Initializer for the weights matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
recurrent_initializer (default: orthogonal): Initializer for the recurrent matrix weights. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
reduce_output (default: last): How to reduce the output tensor along the sequence length dimension (s) if the rank of the tensor is greater than 2. Options: last, sum, mean, avg, max, concat, attention, none, None, null.
embeddings_on_cpu (default: false): Whether to force the embedding matrix to be placed in regular memory and resolved by the CPU. Options: true, false.
use_bias (default: true): Whether to use a bias vector. Options: true, false.
bidirectional (default: false): If true, two recurrent networks encode the sequence in the forward and backward directions, and their outputs are concatenated. Options: true, false.

Output Features

H3 features are not currently supported as output features. Consider using the TEXT type instead.