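Hyperparameter Optimization
This is a complete example of Ludwig's hyperparameter optimization capability.
Interactive notebooks accompany this example and follow the same steps.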
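Download the Adult Census Income dataset
Adult Census Income is an extract of 1994 Census data used to predict whether a person's annual income exceeds $50K. The dataset contains over 49K records and 14 attributes, some with missing values.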
ludwig datasets download adult_census_income
This command creates the dataset file adult_census_income.csv in the current directory.
The dataset contains the following columns:
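| Column | Description |
|---|---|
| age | numeric variable, age of the person |
| workclass | categorical variable, type of employment |
| fnlwgt | numeric variable, no definition |
| education | categorical variable, education level |
| education-num | numeric variable, no definition |
| marital-status | categorical variable, marital status |
| occupation | categorical variable, occupation |
| relationship | categorical variable, family relationship |
| race | categorical variable, race |
| sex | categorical variable, gender |
| capital-gain | numeric variable, no definition |
| capital-loss | numeric variable, no definition |
| hours-per-week | numeric variable, hours worked per week |
| native-country | categorical variable, country of origin |
| income | binary variable, " <=50K" or " >50K" |
| split | numeric variable, indicates the data split: training (0), test (2) |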
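The two income labels in the table carry a leading space (" <=50K" and " >50K"), which is easy to miss when inspecting the raw CSV. The following is a minimal sketch of how one might normalize them by hand; the helper name `income_to_bool` is hypothetical and is not part of Ludwig.

```python
def income_to_bool(label: str) -> bool:
    """Map a raw income label to True (>50K) or False (<=50K)."""
    # Remove the leading space found in the raw labels.
    cleaned = label.strip()
    if cleaned == ">50K":
        return True
    if cleaned == "<=50K":
        return False
    raise ValueError(f"unexpected income label: {label!r}")
```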
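Set up the hyperparameter optimization run
Hyperparameter optimization is defined in the hyperopt section of the Ludwig configuration specification.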
preprocessing:
    ...
input_features:
    ...
combiner:
    ...
output_features:
    ...
trainer:
    ...
defaults:
    ...
# hyperopt specification
hyperopt:
    # specify parameters for the Ray Tune executor to run the hyperparameter optimization
    executor:
        ...
    # specify Ray Tune search algorithm to use
    search_alg:
        ...
    # hyperparameter search space for the optimization
    parameters:
        ...
    # minimize or maximize the metric score
    goal: ...
    # metric score to optimize
    metric: ...
    # name of the output feature
    output_feature: ...
# define model configuration
config = {
    'combiner': ...,
    'input_features': ...,
    'output_features': ...,
    'preprocessing': ...,
    'trainer': ...,
    'defaults': ...,
    # hyperopt specification
    'hyperopt': {
        # specify parameters for the Ray Tune executor to run the hyperparameter optimization
        'executor': {'type': 'ray', ... },
        # specify Ray Tune search algorithm to use
        'search_alg': {... },
        # hyperparameter search space for the optimization
        'parameters': {...},
        # minimize or maximize the metric score
        'goal': ...,
        # metric score to optimize
        'metric': ...,
        # name of the output feature
        'output_feature': ...,
    }
}
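Before launching a run, it can help to check which entries of the hyperopt skeleton a configuration dictionary actually fills in. The sketch below is an illustration only: the helper `missing_hyperopt_keys` is hypothetical, and Ludwig may supply defaults for some of these keys, so a missing key is not necessarily an error.

```python
# Keys that appear in the hyperopt skeleton shown above.
EXPECTED_HYPEROPT_KEYS = {
    "executor", "search_alg", "parameters", "goal", "metric", "output_feature",
}


def missing_hyperopt_keys(config: dict) -> list:
    """Return the skeleton keys absent from config['hyperopt'], sorted."""
    hyperopt_section = config.get("hyperopt", {})
    return sorted(EXPECTED_HYPEROPT_KEYS - set(hyperopt_section))


example = {
    "hyperopt": {"goal": "maximize", "metric": "roc_auc", "output_feature": "income"}
}
print(missing_hyperopt_keys(example))  # ['executor', 'parameters', 'search_alg']
```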
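Hyperparameter search space specification
For this example, we want to determine how the learning_rate of Ludwig's Trainer and the num_fc_layers of the income output feature affect the model's roc_auc metric. To do this, we use two different hyperparameter optimization approaches: random search and grid search.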
Random search
hyperopt:
    executor:
        num_samples: 16
    goal: maximize
    metric: roc_auc
    output_feature: income
    parameters:
        income.decoder.num_fc_layers:
            space: randint
            lower: 2
            upper: 9
        trainer.learning_rate:
            space: loguniform
            lower: 0.001
            upper: 0.1
    search_alg:
        type: variant_generator
        random_state: 1919
'hyperopt': {
    'executor': {'num_samples': 16},
    'goal': 'maximize',
    'metric': 'roc_auc',
    'output_feature': 'income',
    'parameters': {
        'income.decoder.num_fc_layers': {
            'space': 'randint',
            'lower': 2,
            'upper': 9
        },
        'trainer.learning_rate': {
            'space': 'loguniform',
            'lower': 0.001,
            'upper': 0.1
        }
    },
    'search_alg': {'type': 'variant_generator', 'random_state': 1919}
},
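To make the two search spaces concrete, the sketch below draws 16 trial configurations with Python's standard library. It is an illustration of what randint and loguniform mean, not what Ray Tune executes: the function `sample_trials` is hypothetical, the exclusive upper bound for randint is an assumption based on Ray Tune's convention, and the seed will not reproduce the values Ray Tune draws with random_state 1919.

```python
import math
import random


def sample_trials(num_samples: int = 16, seed: int = 1919) -> list:
    """Draw trial configurations from the random-search space above."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        trials.append({
            # Assumption: the upper bound (9) is exclusive, so values are 2..8.
            "income.decoder.num_fc_layers": rng.randrange(2, 9),
            # Log-uniform: sample uniformly in log space, then exponentiate.
            "trainer.learning_rate": math.exp(
                rng.uniform(math.log(0.001), math.log(0.1))
            ),
        })
    return trials
```

Sampling in log space spreads the learning rates evenly across orders of magnitude, so values between 0.001 and 0.01 are as likely as values between 0.01 and 0.1.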
Grid search
hyperopt:
    executor:
        num_samples: 1
    goal: maximize
    metric: roc_auc
    output_feature: income
    parameters:
        income.decoder.num_fc_layers:
            space: grid_search
            values: [2, 4, 6, 8]
        trainer.learning_rate:
            space: grid_search
            values: [0.001, 0.003, 0.007, 0.01]
    search_alg:
        type: variant_generator
        random_state: 1919
'hyperopt': {
    'executor': {'num_samples': 1},
    'goal': 'maximize',
    'metric': 'roc_auc',
    'output_feature': 'income',
    'parameters': {
        'income.decoder.num_fc_layers': {'space': 'grid_search', 'values': [2, 4, 6, 8]},
        'trainer.learning_rate': {'space': 'grid_search', 'values': [0.001, 0.003, 0.007, 0.01]}
    },
    'search_alg': {'type': 'variant_generator', 'random_state': 1919}
},
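The grid-search space is the Cartesian product of the two value lists. The short sketch below enumerates it to show how many trials the configuration implies; it assumes, following Ray Tune's convention, that num_samples: 1 evaluates each grid point once.

```python
from itertools import product

num_fc_layers_values = [2, 4, 6, 8]
learning_rate_values = [0.001, 0.003, 0.007, 0.01]

# One trial configuration per combination of the two hyperparameters.
grid = [
    {"income.decoder.num_fc_layers": n, "trainer.learning_rate": lr}
    for n, lr in product(num_fc_layers_values, learning_rate_values)
]
print(len(grid))  # 16
```

With 4 x 4 = 16 grid points, the grid search runs the same number of trials as the random search with num_samples: 16, which makes the two approaches comparable in cost.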
Run hyperparameter optimization
The following example command and function call run Ludwig's hyperparameter optimization.
ludwig hyperopt --dataset adult_census_income.csv \
--config config.yaml \
--output_directory results \
--hyperopt_log_verbosity 1
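import pandas as pd
from ludwig.hyperopt.run import hyperopt

# load the CSV file created by the download step
adult_census_df = pd.read_csv("adult_census_income.csv")
hyperopt_results = hyperopt(
    config,
    dataset=adult_census_df,
    output_directory="results",
    hyperopt_log_verbosity=1
)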
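The goal and metric settings determine which trial counts as the best one. The sketch below shows that selection rule in isolation. It assumes each trial is a plain dictionary with a `metric_score` entry; that record layout and the helper `best_trial` are hypothetical and do not describe the object returned by `hyperopt()`.

```python
def best_trial(trials: list, goal: str = "maximize") -> dict:
    """Return the trial with the best metric_score for the given goal."""
    if goal not in ("maximize", "minimize"):
        raise ValueError(f"goal must be 'maximize' or 'minimize', got {goal!r}")
    # Pick the highest score when maximizing, the lowest when minimizing.
    pick = max if goal == "maximize" else min
    return pick(trials, key=lambda trial: trial["metric_score"])
```

For a metric such as roc_auc, higher is better, so the configurations above set goal to maximize; a loss-type metric would use minimize instead.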
Visualize hyperparameter optimization results
From the command line, use the hyperopt_report and hyperopt_hiplot visualizations of the ludwig visualize command:
# generate visualizations on hyperparameter effects on the metric
ludwig visualize --visualization hyperopt_report \
--hyperopt_stats_path results/hyperopt_statistics.json \
--output_directory visualizations \
--file_format png
# generate hyperopt hiplot parallel coordinate visualization
ludwig visualize --visualization hyperopt_hiplot \
--hyperopt_stats_path results/hyperopt_statistics.json \
--output_directory visualizations
From Python, use the visualize.hyperopt_report() and visualize.hyperopt_hiplot() functions:
from ludwig.visualize import hyperopt_report, hyperopt_hiplot
hyperopt_report("./results/hyperopt_statistics.json")
hyperopt_hiplot("./results/hyperopt_statistics.json", output_directory="visualizations")