🎙️ VyvoTTS Dataset Tokenizer

Process audio datasets for VyvoTTS training by tokenizing both audio and text.

Instructions:

  1. Enter your HuggingFace token (required for downloading and uploading datasets)
  2. Provide the original dataset path from HuggingFace Hub
  3. Specify the output dataset path where processed data will be uploaded
  4. Select the model type (Qwen3 or LFM2)
  5. Specify the text field name in your dataset
  6. Click "Process Dataset" to start
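
The five inputs above can be checked before any processing starts. The following is a minimal sketch, not the tool's actual code: `TokenizerJob`, `validate_job`, and the dataset path pattern are hypothetical names and assumptions introduced for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List

# Model types listed in step 4 of the instructions.
SUPPORTED_MODEL_TYPES = ("qwen3", "lfm2")

# Assumption: dataset paths look like "namespace/name", as in the example
# values. This is a simplified pattern, not the Hub's full naming rules.
_REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


@dataclass(frozen=True)
class TokenizerJob:
    """Hypothetical container for the form fields described above."""
    hf_token: str
    original_dataset: str
    output_dataset: str
    model_type: str
    text_field: str = "text"


def validate_job(job: TokenizerJob) -> List[str]:
    """Return a list of human-readable problems; empty means the inputs look usable."""
    problems = []
    if not job.hf_token.strip():
        problems.append("HuggingFace token is required")
    for label, repo in (("original", job.original_dataset),
                        ("output", job.output_dataset)):
        if not _REPO_ID.match(repo):
            problems.append(f"{label} dataset must look like 'namespace/name': {repo!r}")
    if job.model_type not in SUPPORTED_MODEL_TYPES:
        problems.append(f"model type must be one of {SUPPORTED_MODEL_TYPES}")
    if not job.text_field.strip():
        problems.append("text field name is required")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every bad field in a single pass.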

Note: This process requires a GPU and may take several minutes depending on dataset size.

📝 Example Values:

For Qwen3:

  • Original Dataset: MrDragonFox/Elise
  • Output Dataset: username/elise-qwen3-processed
  • Model Type: qwen3
  • Text Field: text

For LFM2:

  • Original Dataset: MrDragonFox/Elise
  • Output Dataset: username/elise-lfm2-processed
  • Model Type: lfm2
  • Text Field: text

⚠️ Requirements:

  • GPU with CUDA support
  • HuggingFace account with write access to the output dataset path
  • Valid HuggingFace token
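
A quick local pre-flight check for these requirements might look like the sketch below. It assumes PyTorch is the framework used for GPU work and that the token may be supplied through the `HF_TOKEN` environment variable; `cuda_available` and `preflight` are hypothetical helper names. It does not verify that the token is valid or has write access, since that requires contacting the Hub.

```python
import importlib.util
import os
from typing import List, Optional


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def preflight(token: Optional[str] = None) -> List[str]:
    """Collect unmet requirements; an empty list means the local checks passed."""
    # Fall back to the HF_TOKEN environment variable if no token is passed in.
    token = token or os.environ.get("HF_TOKEN", "")
    problems = []
    if not cuda_available():
        problems.append("no CUDA-capable GPU detected")
    if not token:
        problems.append("no HuggingFace token provided")
    return problems
```

Calling `preflight()` before starting a run surfaces a missing GPU or token up front instead of partway through processing.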