🎙️ VyvoTTS Dataset Tokenizer

Process audio datasets for VyvoTTS training by tokenizing both audio and text.

Instructions:

1. Enter your HuggingFace token (required for downloading and uploading datasets).
2. Provide the original dataset path from HuggingFace Hub.
3. Specify the output dataset path where the processed data will be uploaded.
4. Select the model type (Qwen3 or LFM2).
5. Specify the name of the text field in your dataset.
6. Click "Process Dataset" to start.

Note: This process requires a GPU and may take several minutes depending on dataset size.
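
The form inputs described in the instructions can be sanity-checked before any GPU time is spent. The snippet below is a minimal sketch, not the app's actual code: `validate_inputs`, `SUPPORTED_MODEL_TYPES`, and the `namespace/name` pattern are hypothetical names and assumptions made for illustration.

```python
import re

# Model types offered by the form.
SUPPORTED_MODEL_TYPES = ("qwen3", "lfm2")

# Assumption: dataset paths follow the "namespace/name" shape used in the
# example values. This pattern is illustrative, not the Hub's official rule.
_REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


def validate_inputs(hf_token, original_dataset, output_dataset, model_type, text_field):
    """Return a list of human-readable problems; an empty list means the inputs look usable."""
    errors = []
    if not hf_token.strip():
        errors.append("HuggingFace token is required.")
    for label, repo in (("Original dataset", original_dataset),
                        ("Output dataset", output_dataset)):
        if not _REPO_ID.match(repo.strip()):
            errors.append(f"{label} must look like 'namespace/name'.")
    if model_type not in SUPPORTED_MODEL_TYPES:
        errors.append(f"Model type must be one of {SUPPORTED_MODEL_TYPES}.")
    if not text_field.strip():
        errors.append("Text field name is required.")
    return errors
```

This only checks the shape of the inputs. Whether the token is valid or the text field exists in the dataset can only be confirmed once the dataset is actually loaded.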

HuggingFace Token

Your HuggingFace token for authentication

Original Dataset

HuggingFace dataset path to process

Output Dataset

Output dataset path on HuggingFace Hub

Model Type

Select the model type for tokenization

Options: qwen3, lfm2

Text Field Name

Name of the text field in your dataset (e.g., 'text', 'text_scribe')

📝 Example Values:

For Qwen3:

Original Dataset: MrDragonFox/Elise
Output Dataset: username/elise-qwen3-processed
Model Type: qwen3
Text Field: text

For LFM2:

Original Dataset: MrDragonFox/Elise
Output Dataset: username/elise-lfm2-processed
Model Type: lfm2
Text Field: text
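
Both examples name the output dataset after the source dataset and the model type. A small helper can produce names in the same style. This is a sketch only: `suggest_output_dataset` is a hypothetical function, and the `<name>-<model_type>-processed` pattern is inferred from the two examples above rather than required by the tool.

```python
def suggest_output_dataset(original_dataset: str, model_type: str, username: str) -> str:
    """Build an output path in the style of the example values.

    Assumption: the naming pattern is a convention taken from the examples;
    any valid dataset path you can write to should work as the output.
    """
    # Keep only the dataset name, dropping the source namespace.
    name = original_dataset.rsplit("/", 1)[-1].lower()
    return f"{username}/{name}-{model_type}-processed"


for model_type in ("qwen3", "lfm2"):
    print(suggest_output_dataset("MrDragonFox/Elise", model_type, "username"))
```

Replace `username` with your own HuggingFace account or organization name.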

⚠️ Requirements:

GPU with CUDA support
HuggingFace account with write access
Valid HuggingFace token
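
The requirements above can be partly checked from Python before launching a run. The following is a rough sketch under two assumptions: PyTorch is the library used to detect CUDA, and the token is available in the `HF_TOKEN` environment variable (the form itself takes the token as a field). `cuda_available` and `missing_requirements` are hypothetical helper names.

```python
import os


def cuda_available() -> bool:
    """Report whether PyTorch can see a CUDA device; False if PyTorch is not installed."""
    try:
        import torch  # assumption: PyTorch is the GPU backend in use
    except ImportError:
        return False
    return torch.cuda.is_available()


def missing_requirements(env=os.environ, has_cuda=None):
    """Return a list of unmet requirements; an empty list means the basic checks pass."""
    if has_cuda is None:
        has_cuda = cuda_available()
    problems = []
    if not has_cuda:
        problems.append("No CUDA-capable GPU detected.")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set.")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("Missing:", problem)
```

Write access is not covered here, since it can only be confirmed against the Hub itself.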