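Dataset Preparation
Ludwig can train on any table-like dataset, meaning that every feature has its own column and every example has its own row.
In this example, we will use this Rotten Tomatoes dataset, a CSV file with a variety of feature types and a binary target.
Download the data locally from here.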
Let's look at the first 5 rows to see how the data is arranged, either from the command line or with pandas:
head -n 5 rotten_tomatoes.csv
import pandas as pd
df = pd.read_csv('rotten_tomatoes.csv')
df.head()
Your results should look something like this:
movie_title | content_rating | genres | runtime | top_critic | review_content | recommended |
---|---|---|---|---|---|---|
Deliver Us from Evil | R | Action & Adventure, Horror | 117.0 | TRUE | Director Scott Derrickson and his co-writer, Paul Harris Boardman, deliver a routine procedural with unremarkable frights. | 0 |
Barbara | PG-13 | Art House & International, Drama | 105.0 | FALSE | Somehow, in this stirring narrative, Barbara manages to keep hold of her principles, and her humanity and courage, and battles to save a dissident teenage girl whose life the Communists are trying to destroy. | 1 |
Horrible Bosses | R | Comedy | 98.0 | FALSE | These bosses cannot justify either murder or lasting comic memories, fatally compromising a farce that could have been great but ends up merely mediocre. | 0 |
Money Monster | R | Drama | 98.0 | FALSE | A satire about television that feels like it was made by the kind of people who claim they don't even watch TV. | 0 |
Battle Royale | NR | Action & Adventure, Art House & International, Drama, Mystery & Suspense | 114.0 | FALSE | Battle Royale is The Hunger Games not diluted for young audiences. | 1 |
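Before training, it can help to confirm that the file really has the tabular layout described above: one column per feature and a binary target. The following is a minimal sketch in pandas, not part of Ludwig itself. The helper name `check_tabular` and the `EXPECTED_COLUMNS` list are illustrative assumptions, and three rows from the preview are inlined so the snippet runs without the downloaded file.

```python
import io

import pandas as pd

# Three rows copied from the preview table, inlined so the sketch runs
# without rotten_tomatoes.csv. Fields that contain commas are quoted.
SAMPLE = '''movie_title,content_rating,genres,runtime,top_critic,review_content,recommended
Horrible Bosses,R,Comedy,98.0,FALSE,"These bosses cannot justify either murder or lasting comic memories, fatally compromising a farce that could have been great but ends up merely mediocre.",0
Money Monster,R,Drama,98.0,FALSE,"A satire about television that feels like it was made by the kind of people who claim they don't even watch TV.",0
Battle Royale,NR,"Action & Adventure, Art House & International, Drama, Mystery & Suspense",114.0,FALSE,"Battle Royale is The Hunger Games not diluted for young audiences.",1
'''

# Column names taken from the preview header (an assumption about the full file).
EXPECTED_COLUMNS = [
    "movie_title", "content_rating", "genres", "runtime",
    "top_critic", "review_content", "recommended",
]


def check_tabular(df, target="recommended"):
    """Hypothetical helper: verify expected columns and a 0/1 target."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    values = set(df[target].dropna().unique())
    if not values <= {0, 1}:
        raise ValueError(f"target is not binary: {sorted(values)}")
    # Return the class balance of the target column.
    return df[target].value_counts().to_dict()


df = pd.read_csv(io.StringIO(SAMPLE))
counts = check_tabular(df)
print(df.dtypes)  # one inferred type per feature column
print(counts)     # number of rows per target value
```

To run the same check on the real dataset, replace `io.StringIO(SAMPLE)` with the path `'rotten_tomatoes.csv'`.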