Enhanced text-to-speech - Internal
Art. no. 221423641
What is Prenly’s enhanced text-to-speech feature?
Prenly’s Enhanced Text-to-Speech (TTS) converts written articles into audio using natural, human-like voices from third-party suppliers.
What is it used for? / What are its functions?
Standard TTS is always available, but the voices come from the browser or device and can’t be controlled by us or the customer. They vary by platform and often sound robotic with limited intonation.
Enhanced TTS replaces these built-in voices with voices from integrated providers, offering more natural pitch, rhythm, and flow. This is especially valuable for languages other than English, where standard voices are often poorly supported.
This feature improves accessibility, supports multitasking by allowing users to listen instead of read, and offers an alternative way to consume content.
Technical details
Some providers count both the article text and any additional code tags (e.g., for language or voice settings) toward character usage, which affects pricing. Language quality varies between providers, so the choice of supplier is important, especially for non-English content.
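The effect of tags on character usage can be illustrated with a minimal sketch. The function name, the tag strings, and the voice name are hypothetical; each provider defines its own counting rules, so this only shows the principle.

```python
def billable_characters(text: str, prefix: str = "", suffix: str = "",
                        count_tags: bool = True) -> int:
    """Return the character count a provider might bill for.

    prefix/suffix stand for tags wrapped around the article text
    (for example language or voice settings). Hypothetical helper.
    """
    if count_tags:
        return len(prefix) + len(text) + len(suffix)
    return len(text)


# Illustrative values only; real tags depend on the provider.
article = "Hello world."
prefix = '<speak><voice name="example-voice">'
suffix = "</voice></speak>"

text_only = billable_characters(article, prefix, suffix, count_tags=False)
with_tags = billable_characters(article, prefix, suffix, count_tags=True)
print(f"text only: {text_only}, with tags: {with_tags}")
```

For short articles the tags can make up a large share of the total, which is why the counting method matters when comparing suppliers.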
Only issues uploaded after the service is activated will include the new audio.
How is enhanced text-to-speech implemented? - What actions are needed from us?
For all suppliers
- Customer Success tells Customer Service which titles the service should be activated for
- Customer Service sets up the integration → More detailed information will follow
How is enhanced text-to-speech implemented? - What actions are needed from the customer?
Microsoft Azure
- Create an account with Azure
- Go to azure.com and click Start free
- Sign up and set up billing
- Create a speech resource
- In the Azure Portal, create a resource; select Speech and click Create
- Fill in subscription, resource group, region, and name, then click Create
- Open the resource, choose Keys and Endpoint, and copy the key and region
Send credentials:
- Voice: choose from here https://speech.microsoft.com/portal/voicegallery
- Speech key
- Speech region
- Voice gender
- Voice language
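As a rough illustration of how the five Azure values fit together, the sketch below builds (but does not send) a request to the Azure Speech REST endpoint. The function name and the sample values are hypothetical, and this is not necessarily how the Prenly integration itself calls Azure.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_azure_tts_request(speech_key: str, speech_region: str,
                            voice_name: str, voice_language: str,
                            voice_gender: str, text: str) -> urllib.request.Request:
    """Build a text-to-speech request from the customer's Azure values."""
    # The speech region is part of the host name.
    url = f"https://{speech_region}.tts.speech.microsoft.com/cognitiveservices/v1"
    # Voice, language, and gender all go into the SSML body.
    ssml = (
        f"<speak version='1.0' xml:lang={quoteattr(voice_language)}>"
        f"<voice xml:lang={quoteattr(voice_language)} "
        f"xml:gender={quoteattr(voice_gender)} name={quoteattr(voice_name)}>"
        f"{escape(text)}"
        "</voice></speak>"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": speech_key,  # the speech key
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
        "User-Agent": "tts-credential-check",  # arbitrary placeholder
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"),
                                  headers=headers, method="POST")


# Placeholder values; nothing is sent over the network here.
request = build_azure_tts_request("YOUR_SPEECH_KEY", "westeurope",
                                  "sv-SE-SofieNeural", "sv-SE", "Female",
                                  "Hej & välkommen")
print(request.full_url)
```

If any of the five values is missing, the request cannot be constructed, which is why all of them need to be sent.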
Google Cloud
- Go to cloud.google.com and click Get started for free
- Log in with your Google account
- Set up billing
- In the Google Cloud Console, click the project dropdown at the top → New Project
- In the left menu, go to APIs & Services → Library
- Search for Text-to-Speech API and click Enable.
Send credentials:
- Voice: Choose from here https://cloud.google.com/text-to-speech
- Service account JSON key file (create a service account for the project and generate a JSON key file for it)
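Before forwarding a JSON key file, it can help to confirm that it really is a service account key. The following is a minimal sketch; the function name is hypothetical, and the field list covers only the fields such key files commonly contain, not a full validation.

```python
import json

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id",
                   "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key file content."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not data.get(name)]
    # Other credential types (e.g. user credentials) use a different "type".
    if data.get("type") and data["type"] != "service_account":
        problems.append(f'type is "{data["type"]}", expected "service_account"')
    return problems
```

An empty list means the file has the expected shape; it does not prove that the key is active or that the Text-to-Speech API is enabled for the project.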
Narakeet
- Create a Narakeet account
- Go to Account → API Access (or Developer section, depending on the UI version).
- The key is usually labelled simply “API Key” and is a long string of letters/numbers.
Send credentials:
- Voice: Choose from here https://www.narakeet.com/app/text-to-audio/?projectId=1461fef6-7953-4343-abed-4ed393d70cf8
- Speech/API key
Amazon Polly
- Create an AWS account at https://aws.amazon.com → This is a full AWS account, not Polly-only.
- AWS also requires a valid credit card and phone number for verification, even if the customer stays within the free tier.
- Go to IAM → Security credentials.
- Click Create access key
- Select Application running outside AWS and click Next
- Click Create access key
Send credentials:
- Voice: Choose from here https://docs.aws.amazon.com/polly/latest/dg/listen-to-voices.html
- Secret access key
- Access key ID
- Voice region
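Because the access key ID and the secret access key are easy to mix up, a quick format check can catch swapped values before setup. The sketch below is heuristic only: the patterns reflect the formats these values commonly have and are assumptions, not an official AWS validation, and the function name is hypothetical.

```python
import re

# Heuristic patterns (assumptions, not an AWS specification):
ACCESS_KEY_ID = re.compile(r"[A-Z0-9]{16,128}")      # e.g. starts with AKIA
SECRET_ACCESS_KEY = re.compile(r"[A-Za-z0-9/+=]{40}")  # typically 40 chars
REGION = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")        # e.g. eu-north-1


def check_polly_credentials(access_key_id: str, secret_access_key: str,
                            region: str) -> list[str]:
    """Return warnings for values that do not look as expected."""
    warnings = []
    if not ACCESS_KEY_ID.fullmatch(access_key_id):
        warnings.append("access key ID has an unexpected format")
    if not SECRET_ACCESS_KEY.fullmatch(secret_access_key):
        warnings.append("secret access key has an unexpected format")
    if not REGION.fullmatch(region):
        warnings.append("region should be a code such as eu-north-1")
    return warnings


# The documentation example keys published by AWS, used as placeholders.
print(check_polly_credentials("AKIAIOSFODNN7EXAMPLE",
                              "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                              "eu-north-1"))
```

A warning is a prompt to double-check with the customer, not proof that a value is wrong.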
Beyond Words
- Create a Beyond Words account
- Create a project → this generates a project ID
- In the dashboard, go to Settings → API Keys.
- Create a new API key
Send credentials:
- Voice: Choose from here https://beyondwords.io/voices/
- Speech/API key
- Project ID
ElevenLabs
- Create an ElevenLabs account
- Go to Profile → API keys and create a new API key
Send credentials:
- Voice ID: Choose from here https://elevenlabs.io/
- Speech/API key
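To show how the voice ID and API key are used together, the sketch below builds (but does not send) a request to the ElevenLabs text-to-speech endpoint. The function name and sample values are hypothetical, and this is not necessarily how the Prenly integration calls ElevenLabs.

```python
import json
import urllib.parse
import urllib.request


def build_elevenlabs_request(api_key: str, voice_id: str,
                             text: str) -> urllib.request.Request:
    """Build a text-to-speech request from the customer's two values."""
    # The voice ID is part of the URL path.
    url = ("https://api.elevenlabs.io/v1/text-to-speech/"
           + urllib.parse.quote(voice_id, safe=""))
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {
        "xi-api-key": api_key,  # the API key is sent as a header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


# Placeholder values; nothing is sent over the network here.
request = build_elevenlabs_request("YOUR_API_KEY", "YOUR_VOICE_ID", "Hello")
print(request.full_url)
```

Note that the value needed is the voice ID, not the display name of the voice.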
Once the setup is complete, customers upload their publications to Webarch as usual. The audio is then generated automatically, activated, and imported into Prenly.
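The credential lists for each supplier can also be kept as a simple checklist, so that an incomplete submission from a customer is spotted before setup begins. This is a minimal sketch; the field names mirror the lists in this article, while the function and dictionary names are hypothetical.

```python
# Required values per supplier, as listed in this article.
REQUIRED_CREDENTIALS = {
    "Microsoft Azure": ["voice", "speech key", "speech region",
                        "voice gender", "voice language"],
    "Google Cloud": ["voice", "service account json key file"],
    "Narakeet": ["voice", "speech/api key"],
    "Amazon Polly": ["voice", "secret access key", "access key id",
                     "voice region"],
    "Beyond Words": ["voice", "speech/api key", "project id"],
    "ElevenLabs": ["voice id", "speech/api key"],
}


def missing_credentials(provider: str, submitted: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank."""
    if provider not in REQUIRED_CREDENTIALS:
        raise ValueError(f"unknown provider: {provider}")
    return [field for field in REQUIRED_CREDENTIALS[provider]
            if not submitted.get(field, "").strip()]


# Example: a Polly submission that lacks two values.
print(missing_credentials("Amazon Polly",
                          {"voice": "example-voice",
                           "access key id": "example-id"}))
```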