Codebase Structure

├── docker                 - Ludwig Docker images
├── examples               - Configs demonstrating Ludwig on various tasks
├── ludwig                 - Ludwig library source code
│   ├── automl             - Configurations, defaults, and utilities for AutoML
│   ├── backend            - Execution backends (local, horovod, ray)
│   ├── benchmarking       - Performance benchmarks for training and hyperopt
│   ├── combiners          - Combiners used in ECD models
│   ├── contribs           - 3rd-party integrations (MLFlow, WandB, Comet)
│   ├── data               - Data loading, pre/postprocessing, sampling
│   ├── datasets           - Ludwig Dataset Zoo: API to download pre-configured datasets.
│   ├── decoders           - Output feature decoders
│   ├── encoders           - Input feature encoders
│   ├── explain            - Utilities for explaining model predictions
│   ├── features           - Implementations of feature types
│   ├── hyperopt           - Hyperparameter optimization
│   ├── models             - Implementations of ECD, trainer, predictor.
│   ├── modules            - Torch modules including layers, metrics, and losses
│   ├── profiling          - Dataset profiles
│   ├── schema             - The complete schema of the ludwig config.yaml
│   ├── trainers           - Training logic, including the training loop
│   ├── utils              - Various internal utilities used by ludwig python modules
│   ├── api.py             - Entry point for python API. Declares LudwigModel.
│   ├── api_annotations.py - Provides @PublicAPI, @DevelopAPI annotation decorators
│   └── cli.py             - ludwig command-line tool
└── tests
    ├── integration_tests  - End-to-end tests of Ludwig workflows
    └── ludwig             - Unit tests. Subdirectories match ludwig/ structure

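The codebase is organized in a modular, datatype/feature-centric way. Adding a feature for a new data type requires only minimal edits to existing code:

1. Add a module implementing the new feature.
2. Import it in the appropriate registry file, e.g. ludwig/features/feature_registries.py.
3. Add the new module to the intended registries, e.g. input_type_registry.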
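
The registration steps can be pictured with a small, self-contained sketch. It is not Ludwig's actual code: the two feature classes, the contents of the dictionary, and the build_input_feature helper are hypothetical, and only the name input_type_registry comes from the text above.

```python
from typing import Dict


class NumberInputFeature:
    """Hypothetical stand-in for an existing feature implementation."""

    def __init__(self, name: str) -> None:
        self.name = name


class TemperatureInputFeature:
    """Step 1: a module implementing the new feature (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name


# Step 2: the registry file imports the feature classes. Here they live in
# the same file, so no import statement is needed.
# Assumption: the registry is a plain dict keyed by the feature type string.
input_type_registry: Dict[str, type] = {"number": NumberInputFeature}

# Step 3: add the new feature to the intended registry.
input_type_registry["temperature"] = TemperatureInputFeature


def build_input_feature(feature_config: Dict[str, str]):
    """Look up the class by its type string and instantiate it."""
    feature_cls = input_type_registry[feature_config["type"]]
    return feature_cls(feature_config["name"])


feature = build_input_feature({"name": "outdoor_temp", "type": "temperature"})
print(type(feature).__name__, feature.name)
```

With a lookup table like this, the code that builds a model never names a concrete feature class, which is why supporting a new data type touches so little existing code.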

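All datatype-specific logic lives in the corresponding feature module, and all feature modules are located under ludwig/features/.

Features

Feature classes provide the raw data preprocessing logic specific to each data type in datatype mixin classes, e.g. BinaryFeatureMixin, NumberFeatureMixin, CategoryFeatureMixin. A feature mixin contains the data preprocessing functions that obtain feature metadata (get_feature_meta, a one-off pass that collects dataset-wide values such as the minimum, maximum, mean, or vocabulary) and that transform raw data into tensors using the previously computed metadata (add_feature_data, which usually operates on dataset rows).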
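
The two-phase preprocessing can be sketched as follows for a category-like column. This is a simplified illustration rather than Ludwig's implementation: the class name CategoryFeatureSketch, the "<UNK>" token, the metadata keys, and the method signatures are assumptions, and plain Python lists stand in for tensors.

```python
from typing import Dict, List


class CategoryFeatureSketch:
    """Illustrative stand-in for a datatype mixin; not Ludwig's actual class."""

    @staticmethod
    def get_feature_meta(column: List[str]) -> Dict[str, object]:
        # One-off pass over the whole column: collect the vocabulary.
        vocab = ["<UNK>"] + sorted(set(column))
        return {
            "idx2str": vocab,
            "str2idx": {token: index for index, token in enumerate(vocab)},
        }

    @staticmethod
    def add_feature_data(column: List[str], metadata: Dict[str, object]) -> List[int]:
        # Row-by-row transformation using the metadata computed earlier.
        str2idx = metadata["str2idx"]
        return [str2idx.get(value, str2idx["<UNK>"]) for value in column]


train_column = ["cat", "dog", "cat"]
metadata = CategoryFeatureSketch.get_feature_meta(train_column)

encoded_train = CategoryFeatureSketch.add_feature_data(train_column, metadata)
# A value that was not seen when the metadata was computed maps to "<UNK>".
encoded_new = CategoryFeatureSketch.add_feature_data(["bird", "dog"], metadata)

print(encoded_train)  # [1, 2, 1]
print(encoded_new)    # [0, 2]
```

Computing the metadata once and reusing it keeps the encoding of training data and of later prediction data consistent.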

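Output features also contain datatype-specific logic for postprocessing data, transforming model predictions back into data space, and computing output metrics such as loss or accuracy.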
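
As a rough illustration of what mapping predictions back into data space can mean for a categorical output, the sketch below turns per-class probabilities into labels and computes accuracy. The function names, the idx2str vocabulary, and the sample numbers are all made up for the example and do not come from Ludwig.

```python
from typing import List, Sequence


def postprocess_predictions(
    probabilities: Sequence[Sequence[float]], idx2str: List[str]
) -> List[str]:
    """Map each row of class probabilities to its most likely label."""
    labels = []
    for row in probabilities:
        best_index = max(range(len(row)), key=lambda i: row[i])
        labels.append(idx2str[best_index])
    return labels


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of positions where the prediction equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty sequence")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


idx2str = ["<UNK>", "cat", "dog"]          # hypothetical vocabulary
probabilities = [[0.1, 0.7, 0.2], [0.2, 0.3, 0.5]]

predicted_labels = postprocess_predictions(probabilities, idx2str)
print(predicted_labels)                              # ['cat', 'dog']
print(accuracy(predicted_labels, ["cat", "cat"]))    # 0.5
```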

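Model Architectures

Encoders and decoders are modularized as well (in ludwig/encoders/ and ludwig/decoders/ respectively) so that they can be used by multiple features. For example, sequence encoders are shared by text, sequence, and timeseries features.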
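
One way to picture an encoder being shared by several feature types is a registry keyed by feature type and then by encoder name. The sketch below is an assumption made for illustration: the register_encoder decorator, the encoder_registry layout, and the MeanPoolEncoder class are hypothetical and may not match how Ludwig registers its encoders.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical layout: feature type -> encoder name -> encoder class.
encoder_registry: Dict[str, Dict[str, type]] = defaultdict(dict)


def register_encoder(name: str, feature_types: List[str]) -> Callable[[type], type]:
    """Class decorator that registers one encoder for several feature types."""

    def wrap(cls: type) -> type:
        for feature_type in feature_types:
            encoder_registry[feature_type][name] = cls
        return cls

    return wrap


@register_encoder("mean_pool", ["text", "sequence", "timeseries"])
class MeanPoolEncoder:
    """Toy sequence encoder: reduces a sequence of numbers to its mean."""

    def encode(self, sequence: Sequence[float]) -> float:
        return sum(sequence) / len(sequence)


def get_encoder_cls(feature_type: str, name: str) -> type:
    return encoder_registry[feature_type][name]


# The same class serves all three feature types.
text_encoder = get_encoder_cls("text", "mean_pool")()
timeseries_encoder = get_encoder_cls("timeseries", "mean_pool")()
print(text_encoder.encode([1.0, 2.0, 3.0]))
```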

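Various reusable model architecture components are also split into dedicated modules (e.g. convolutional modules, fully connected layers, attention) and are available in ludwig/modules/.

Training and Inference

The training logic resides in ludwig/trainers/trainer.py, which initializes a training session, feeds the data, and executes the training loop. The prediction logic, including batch prediction and evaluation, resides in ludwig/models/predictor.py.

Ludwig CLI

The command-line interface is managed by the ludwig/cli.py script, which imports the other scripts in the top-level ludwig/ directory that implement the various sub-commands (experiment, evaluate, export, visualize, etc.).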
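
The dispatch pattern described here can be sketched with argparse. This is only an illustration of a command-line entry point delegating to per-command functions: the experiment and evaluate functions below are placeholders defined in the same file, whereas the real sub-commands live in separate scripts whose interfaces are not shown in this document.

```python
import argparse
from typing import Callable, Dict, List


def experiment(args: List[str]) -> str:
    """Placeholder for a sub-command script's entry point."""
    return f"experiment called with {args}"


def evaluate(args: List[str]) -> str:
    """Placeholder for a sub-command script's entry point."""
    return f"evaluate called with {args}"


# Hypothetical table mapping a sub-command name to the function that runs it.
SUBCOMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "experiment": experiment,
    "evaluate": evaluate,
}


def main(argv: List[str]) -> str:
    parser = argparse.ArgumentParser(prog="cli_sketch")
    parser.add_argument("command", choices=sorted(SUBCOMMANDS))
    # Parse only the first token; the remaining arguments are handed to the
    # sub-command untouched so that it can define its own options.
    parsed = parser.parse_args(argv[:1])
    return SUBCOMMANDS[parsed.command](argv[1:])


result = main(["evaluate", "--dataset", "data.csv"])
print(result)
```

Keeping each sub-command in its own function (or script) means the top-level parser only needs to know the command names, not their options.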

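The programmatic interface, which is also used by the CLI commands, is available in ludwig/api.py.

Testing

All test code is located in the tests/ directory. The tests/integration_tests/ subdirectory contains test cases that aim to provide end-to-end coverage of all workflows Ludwig offers.

The tests/ludwig/ directory contains unit tests, organized in a subdirectory tree that parallels the ludwig/ source tree. For more details on testing, see Style Guidelines and Tests.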
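
The parallel layout makes it possible to derive where a unit test would live from the path of a source file. The helper below is a hypothetical convenience, not part of Ludwig, and the test_ filename prefix is an assumption borrowed from the common pytest naming convention.

```python
from pathlib import PurePosixPath


def unit_test_path(source_path: str) -> str:
    """Map a path under ludwig/ to the mirrored location under tests/ludwig/."""
    source = PurePosixPath(source_path)
    if not source.parts or source.parts[0] != "ludwig":
        raise ValueError(f"expected a path under ludwig/, got {source_path!r}")
    # Keep the directory structure and prefix the file name with "test_"
    # (assumed naming convention).
    mirrored = PurePosixPath("tests", *source.parts[:-1], f"test_{source.name}")
    return str(mirrored)


print(unit_test_path("ludwig/utils/data_utils.py"))
# tests/ludwig/utils/test_data_utils.py
```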

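Misc

The hyperparameter optimization logic is implemented in the scripts in the ludwig/hyperopt/ package.

The ludwig/utils/ package contains various internal utilities used by Ludwig Python modules.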

Finally, the ludwig/contribs/ package contains user-contributed code that integrates with external libraries.