Adapter-based encoder fine-tuning for text classification with DeepSpeed

Fine-tuning large language models

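These examples show how to fine-tune a large language model by using DeepSpeed's model parallelism, which lets Ludwig scale to very large models with billions of parameters.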

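The task is to fine-tune a multi-billion-parameter LLM to classify the sentiment of IMDB movie reviews. We take a pretrained LLM, attach a classification head to it, and fine-tune the weights to improve the LLM's performance on this task. Ludwig does all of this from configuration alone, with no machine learning code to write.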

Prerequisites

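Ludwig installed with the ludwig[distributed] dependencies
A CUDA-enabled build of PyTorch installed
Access to a machine or a cluster of machines with multiple GPUs
Kaggle credentials set up (for example, $HOME/.kaggle/kaggle.json), because the IMDB dataset used in these examples comes from Kaggle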
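The checklist above can be verified before launching a long training run. The following is a minimal sketch, not part of Ludwig: the function name `missing_prerequisites` and the default package list are assumptions made for illustration, and it only checks that packages can be located and that Kaggle credentials are present (either as `~/.kaggle/kaggle.json` or as the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables). It does not verify GPU access.

```python
import importlib.util
import os
from pathlib import Path


def missing_prerequisites(home, env, packages=("ludwig", "torch", "ray", "deepspeed")):
    """Return a list of prerequisites that appear to be missing.

    `packages` is an assumed list of top-level package names; adjust it to
    match your installation.
    """
    missing = []

    # find_spec only locates the package; it does not import it.
    for package in packages:
        if importlib.util.find_spec(package) is None:
            missing.append(f"python package: {package}")

    # Kaggle accepts either the JSON credentials file or both env variables.
    credentials_file = Path(home) / ".kaggle" / "kaggle.json"
    has_env_credentials = "KAGGLE_USERNAME" in env and "KAGGLE_KEY" in env
    if not credentials_file.is_file() and not has_env_credentials:
        missing.append(f"kaggle credentials: {credentials_file}")

    return missing


if __name__ == "__main__":
    for item in missing_prerequisites(Path.home(), dict(os.environ)):
        print("missing:", item)
```

An empty result means only that the packages and credentials were found; whether PyTorch was built with CUDA support still has to be checked separately.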

Source files

train_imdb_ray.py
imdb_deepspeed_zero3.yaml
imdb_deepspeed_zero3_ray.yaml
run_train_dsz3.sh
run_train_dsz3_ray.sh

# train_imdb_ray.py
import logging
import os

import yaml

from ludwig.api import LudwigModel

config = yaml.safe_load(
    """
input_features:
  - name: review
    type: text
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: bigscience/bloom-3b
      trainable: true
      adapter:
        type: lora

output_features:
  - name: sentiment
    type: category

trainer:
  batch_size: 4
  epochs: 3

backend:
  type: ray
  trainer:
    use_gpu: true
    strategy:
      type: deepspeed
      zero_optimization:
        stage: 3
        offload_optimizer:
          device: cpu
          pin_memory: true
"""
)

# Define the Ludwig model object that drives model training
model = LudwigModel(config=config, logging_level=logging.INFO)

# Initiate model training
(
    train_stats,  # dictionary containing training statistics
    preprocessed_data,  # tuple of Ludwig Dataset objects with the pre-processed data
    output_directory,  # location of the training results stored on disk
) = model.train(
    dataset="ludwig://imdb",
    experiment_name="imdb_sentiment",
    model_name="bloom3b",
)

# list contents of output directory
print("contents of output directory:", output_directory)
for item in os.listdir(output_directory):
    print("\t", item)

# imdb_deepspeed_zero3.yaml
input_features:
  - name: review
    type: text
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: bigscience/bloom-3b
      trainable: true
      adapter: lora

output_features:
  - name: sentiment
    type: category

trainer:
  batch_size: 4
  epochs: 3
  gradient_accumulation_steps: 8

backend:
  type: deepspeed
  zero_optimization:
    stage: 3
    offload_optimizer:
      device: cpu
      pin_memory: true

# imdb_deepspeed_zero3_ray.yaml
input_features:
  - name: review
    type: text
    encoder:
      type: auto_transformer
      pretrained_model_name_or_path: bigscience/bloom-3b
      trainable: true
      adapter: lora

output_features:
  - name: sentiment
    type: category

trainer:
  batch_size: 4
  epochs: 3
  gradient_accumulation_steps: 8

backend:
  type: ray
  trainer:
    use_gpu: true
    strategy:
      type: deepspeed
      zero_optimization:
        stage: 3
        offload_optimizer:
          device: cpu
          pin_memory: true

#!/usr/bin/env bash
# run_train_dsz3.sh

# Fail fast if an error occurs
set -e

# Get the directory of this script, which contains the config file
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Train on 4 GPUs (the path is quoted so directories containing spaces work)
deepspeed --no_python --no_local_rank --num_gpus 4 ludwig train --config "${SCRIPT_DIR}/imdb_deepspeed_zero3.yaml" --dataset ludwig://imdb

#!/usr/bin/env bash
# run_train_dsz3_ray.sh

# Fail fast if an error occurs
set -e

# Get the directory of this script, which contains the config file
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Train (the path is quoted so directories containing spaces work)
ludwig train --config "${SCRIPT_DIR}/imdb_deepspeed_zero3_ray.yaml" --dataset ludwig://imdb

Running DeepSpeed on Ray

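This is the recommended way to use DeepSpeed. It supports automatic batch size tuning and distributed data processing. Ray adds some overhead on small datasets (<100MB), but in most cases performance should be comparable to native DeepSpeed.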

From the head node of your Ray cluster:

./run_train_dsz3_ray.sh
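The Ray configuration used by this script differs from the native DeepSpeed configuration only in its `backend` section: the DeepSpeed settings move under `backend.trainer.strategy`. The sketch below illustrates that relationship on plain dictionaries. The helper `to_ray_backend` is hypothetical, written for this illustration, and is not a Ludwig function.

```python
import copy


def to_ray_backend(config, use_gpu=True):
    """Return a copy of a native-DeepSpeed config rewritten for the Ray backend.

    Hypothetical helper: it mirrors the difference between
    imdb_deepspeed_zero3.yaml and imdb_deepspeed_zero3_ray.yaml.
    """
    backend = config.get("backend", {})
    if backend.get("type") != "deepspeed":
        raise ValueError("expected a config whose backend type is 'deepspeed'")

    ray_config = copy.deepcopy(config)
    # The whole DeepSpeed section (type and zero_optimization) becomes the
    # strategy of the Ray trainer; everything outside `backend` is unchanged.
    ray_config["backend"] = {
        "type": "ray",
        "trainer": {"use_gpu": use_gpu, "strategy": copy.deepcopy(backend)},
    }
    return ray_config


native = {
    "trainer": {"batch_size": 4, "epochs": 3, "gradient_accumulation_steps": 8},
    "backend": {
        "type": "deepspeed",
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
    },
}
ray_config = to_ray_backend(native)
print(ray_config["backend"]["trainer"]["strategy"]["type"])
```

Because the input is deep-copied, the native configuration stays usable for a later native run.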

Python API

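If you want to run Ludwig programmatically (from a notebook or as part of a larger workflow), you can use the Ray cluster launcher to run the following Python script from your local machine.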

ray submit cluster.yaml train_imdb_ray.py

If you are running directly on the Ray head node, you can omit the ray submit part and run the file like any ordinary Python script:

python train_imdb_ray.py

Running DeepSpeed natively

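This mode suits datasets small enough to fit in the memory of a single machine, because it does not use distributed data processing (which requires the Ray backend).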

The following example assumes you have 4 GPUs available, but it can easily be modified to match your preferred setup.

From a terminal on your machine:

./run_train_dsz3.sh
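To adapt the 4-GPU assumption to another machine, only the `--num_gpus` value in the launcher command needs to change. The sketch below builds the same command as run_train_dsz3.sh for an arbitrary GPU count. The helper names are hypothetical. The batch-size arithmetic assumes that each GPU worker processes its own `batch_size` samples per step, so that the effective batch size is `batch_size * gradient_accumulation_steps * num_workers`; treat that as an assumption and confirm it against the trainer documentation for your Ludwig version.

```python
import shlex


def deepspeed_command(config_path, num_gpus, dataset="ludwig://imdb"):
    """Build the launcher command from run_train_dsz3.sh for a given GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "deepspeed", "--no_python", "--no_local_rank",
        "--num_gpus", str(num_gpus),
        "ludwig", "train",
        "--config", config_path,
        "--dataset", dataset,
    ]


def effective_batch_size(batch_size, gradient_accumulation_steps, num_workers):
    """Samples per optimizer step, under the assumption stated above."""
    return batch_size * gradient_accumulation_steps * num_workers


command = deepspeed_command("imdb_deepspeed_zero3.yaml", num_gpus=2)
print(shlex.join(command))
print(effective_batch_size(batch_size=4, gradient_accumulation_steps=8, num_workers=2))
```

The command is returned as an argument list so it can be passed to `subprocess.run` without shell quoting issues; `shlex.join` is used here only to display it.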