
Introduction to Voice Generation Time (VGT)

Voice generation time (VGT) is the sum of the lengths of the speech generated for every sub-block. You can think of it as your Studio credits.

VGT is consumed whenever you render a newly created sub-block, or re-render an existing sub-block after modifying its text, using any of the three generate (play button) options available in the Studio (block, sub-block, or project level).

However, changing the voice actor, style, pitch, speed, pause, emphasis, pronunciation, punctuation, or volume of the generated speech for the same text does not consume any voice generation time.

The voice generation time consumed to convert text to speech equals the length of the speech generated for that text. When text is added or modified in a sub-block, speech is generated again for the entire sub-block.

 

Example:
Suppose you have 10 minutes of VGT in your account and enter a 500-word script into the Studio. If the generated audio is 4 minutes 34 seconds long, then 4 minutes 34 seconds are deducted from your total VGT. If you then go back, edit (add or delete) any text in that script, and render again, only the audio duration of the edited sub-block is deducted.
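
The deduction rule in the example above can be sketched in a few lines of Python. This is only an illustration of the accounting described in this article, not the Studio's actual implementation: the VgtAccount class, the sub-block IDs, and the individual sub-block durations (90, 100, and 84 seconds, which add up to 4 minutes 34 seconds) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VgtAccount:
    """Hypothetical model of a VGT balance, tracked in seconds."""
    balance_seconds: int
    # Text last rendered for each sub-block, keyed by a made-up sub-block ID.
    rendered_text: dict = field(default_factory=dict)

    def render(self, sub_block_id: str, text: str, audio_seconds: int) -> int:
        """Render a sub-block and return the VGT (in seconds) deducted."""
        if self.rendered_text.get(sub_block_id) == text:
            # Same text: a change of voice, style, pitch, speed, and so on
            # does not consume VGT.
            return 0
        # New or edited text: the whole sub-block is generated again.
        self.rendered_text[sub_block_id] = text
        self.balance_seconds -= audio_seconds
        return audio_seconds


def format_mmss(seconds: int) -> str:
    """Format a number of seconds as m:ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# 10 minutes of VGT; a script split into three assumed sub-blocks.
account = VgtAccount(balance_seconds=10 * 60)
first_render = {
    "sb1": ("Intro text", 90),
    "sb2": ("Body text", 100),
    "sb3": ("Outro text", 84),
}
for sub_block_id, (text, audio_seconds) in first_render.items():
    account.render(sub_block_id, text, audio_seconds)
balance_after_first_render = account.balance_seconds

# Editing the text of one sub-block charges only that sub-block's new audio.
charged_for_edit = account.render("sb2", "Body text, revised", 95)

# Re-rendering unchanged text (for example, with another voice) is free.
charged_for_voice_change = account.render("sb1", "Intro text", 88)

print(format_mmss(balance_after_first_render))
print(format_mmss(account.balance_seconds))
```

In this sketch the first render deducts the full 4 minutes 34 seconds, the text edit deducts only the 95 seconds of the regenerated sub-block, and the re-render with unchanged text deducts nothing.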

 

Estimating Voice Generation Requirement

A 1,000-word English script consumes roughly 6 minutes of VGT, not counting any later text changes. (This is an approximate value and varies with the script, the voice selected, and the speed of the voiceover.) Based on this, you can estimate how much voice generation time your project might need and choose the right plan.
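
As a rough aid, the rule of thumb above can be turned into a small estimator. This is a minimal sketch: the function name estimate_vgt_minutes and the revision_fraction parameter (an allowance for re-rendering edited text) are assumptions made for illustration, and the 6-minutes-per-1,000-words rate is only the approximate figure quoted above.

```python
# Approximate rate quoted in this article for English scripts.
MINUTES_PER_1000_WORDS = 6.0


def estimate_vgt_minutes(word_count: int, revision_fraction: float = 0.0) -> float:
    """Estimate VGT in minutes for a script.

    revision_fraction is a hypothetical allowance for text edits, expressed
    as a share of the script you expect to re-render (0.2 means 20%).
    """
    if word_count < 0 or revision_fraction < 0:
        raise ValueError("word_count and revision_fraction must be non-negative")
    base_minutes = word_count / 1000 * MINUTES_PER_1000_WORDS
    return base_minutes * (1 + revision_fraction)


# A 2,500-word script, first with no edits, then allowing for 20% re-rendering.
no_edits = estimate_vgt_minutes(2500)
with_edits = estimate_vgt_minutes(2500, revision_fraction=0.2)
print(f"{no_edits:.1f} min without edits, {with_edits:.1f} min with edits")
```

Treat the result as a planning figure only, since actual consumption depends on the script, the voice selected, and the speed of the voiceover.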

 

You can track your voice generation time using the tracker in the top panel of your project.

 




This interactive tutorial will show you how your VGT gets consumed upon voice generation: