Enhanced text-to-speech - Internal

Art. no. 221423641

What is Prenly’s enhanced text-to-speech feature?

Prenly’s Enhanced Text-to-Speech (TTS) converts written articles into audio using natural, human-like voices from third-party suppliers.

What is it used for? / What are its functions?

Standard TTS is always available, but the voices come from the browser or device and can’t be controlled by us or the customer. They vary by platform and often sound robotic with limited intonation.

Enhanced TTS replaces these inbuilt voices with those from integrated providers, offering more natural pitch, rhythm, and flow. This is especially valuable for languages other than English, where standard voices are often poorly supported.

This feature improves accessibility, supports multitasking by allowing users to listen instead of read, and offers an alternative way to consume content.

Technical details

Some providers calculate character usage based on both the article text and any markup tags added to it (e.g., for language or voice settings), which raises the billed character count and therefore the cost. Language quality varies between providers, so the choice of supplier matters, especially for non-English content.
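To illustrate how tags can affect the billed character count, here is a minimal sketch that compares the length of an article's text with the length of the same text wrapped in SSML-style markup. The tag names, the sample sentence, and the function name billable_characters are assumptions for illustration only; actual billing rules differ per provider and should be checked in the provider's pricing documentation.

```python
import re

# Matches anything that looks like an XML/SSML tag, e.g. <speak> or </lang>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def billable_characters(marked_up: str, tags_are_billed: bool) -> int:
    """Count characters with or without the markup tags included."""
    if tags_are_billed:
        return len(marked_up)
    return len(TAG_PATTERN.sub("", marked_up))


article = "Hej och välkommen."
# Hypothetical markup: real tag names and attributes depend on the provider.
marked_up = f'<speak><lang xml:lang="sv-SE">{article}</lang></speak>'

text_only = billable_characters(marked_up, tags_are_billed=False)
with_tags = billable_characters(marked_up, tags_are_billed=True)
overhead = with_tags - text_only

print(f"text only: {text_only}, with tags: {with_tags}, overhead: {overhead}")
```

For short articles the tag overhead can be a large share of the total, so it is worth confirming how a provider counts characters before estimating costs.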

Only issues uploaded after the service is activated will include the new audio.

How is enhanced text-to-speech implemented? - What actions are needed from us?

For all suppliers

Customer Success informs Customer Service which titles the service should be activated for
Customer Service sets up the integration → More detailed information will follow

How is enhanced text-to-speech implemented? - What actions are needed from the customer?

Microsoft Azure

Create an account with Azure

Go to azure.com and click start free
Sign up and set up billing

Create a speech resource

In the Azure Portal, click Create a resource, select Speech, and click Create
Fill in the subscription, resource group, region, and name, then click Create
Open the resource, choose Keys and Endpoint, and copy the key and region

Send credentials:

Voice: Choose from here https://speech.microsoft.com/portal/voicegallery
Speech key
Speech region
Voice gender
Voice language
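Before passing the Azure credentials on, it can help to confirm that the speech key and region belong together. The sketch below only builds a request against the Speech service's voice list endpoint. The function name build_voice_list_request and the sample values are assumptions for illustration, and this is not necessarily how the integration itself verifies credentials.

```python
from urllib.request import Request


def build_voice_list_request(speech_key: str, speech_region: str) -> Request:
    """Build (but do not send) a GET request for the region's voice list."""
    url = (
        f"https://{speech_region}.tts.speech.microsoft.com"
        "/cognitiveservices/voices/list"
    )
    # The Speech service reads the resource key from this header.
    headers = {"Ocp-Apim-Subscription-Key": speech_key}
    return Request(url, headers=headers, method="GET")


# Placeholder values; substitute the customer's real key and region.
request = build_voice_list_request("dummy-key", "westeurope")
print(request.full_url)
```

Sending the request with urllib.request.urlopen(request) should return a JSON list of voices when the key and region match. An HTTP 401 response usually points to a wrong key or a key from a different region.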

Google Cloud

Go to cloud.google.com and click get started for free
Log in with your Google account
Set up billing
In the Google Cloud Console, click the project dropdown at the top → New Project
In the left menu, go to APIs & Services → Library
Search for Text-to-Speech API and click Enable.

Send credentials:

Voice: Choose from here https://cloud.google.com/text-to-speech
Service account JSON key file (create a service account and generate a JSON key file for it)
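A quick sanity check on the JSON key file can catch the common mistake of sending the wrong kind of file. The sketch below checks for fields that Google service account key files normally contain. The function name check_service_account_key and the choice of which fields to check are assumptions for illustration; the check does not prove that the key is valid or that the Text-to-Speech API is enabled for the project.

```python
import json

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks plausible."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)
    ]
    # Other credential files (e.g. user credentials) use a different type value.
    if data.get("type") and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


# Placeholder content; a real key file contains a full private key.
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----...",
    "client_email": "tts@example-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))
```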

Narakeet

Create a Narakeet account

Send credentials:

Voice: Choose from here https://www.narakeet.com/app/text-to-audio/?projectId=1461fef6-7953-4343-abed-4ed393d70cf8
Speech/API key

To find the key, go to Account → API Access (or the Developer section, depending on the UI version).
It is usually labelled simply “API Key” and is a long string of letters and numbers.

Amazon Polly

Create an AWS account at https://aws.amazon.com. This is a full AWS account, not a Polly-only one.
AWS requires a valid credit card and phone number for verification, even if the customer stays within the free tier.
Go to IAM → Security credentials.
Click Create access key
Select Application running outside AWS and click Next
Click Create access key again to confirm

Send credentials:

Voice: Choose from here https://docs.aws.amazon.com/polly/latest/dg/listen-to-voices.html
Secret access key
Access key ID
Voice region

Beyond Words

Create a Beyond Words account
Create a project; this generates a project ID
In the dashboard, go to Settings → API Keys.
Create a new API key

Send credentials:

Voice: Choose from here https://beyondwords.io/voices/
Speech/API key
Project ID

ElevenLabs

Create an ElevenLabs account
Go to Profile → API keys and create a new API key

Send credentials:

Voice ID: Choose from here https://elevenlabs.io/
Speech/API key
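The credentials listed for the six providers above can be summarised as a checklist, which makes it easy to spot what is missing from a customer's submission. This is a minimal sketch: the field keys (such as speech_key or api_key) and the function name missing_credentials are hypothetical names chosen for illustration, not names used by the integration.

```python
# Required items per provider, taken from the "Send credentials" lists above.
# The key names are hypothetical; only the items themselves come from the lists.
REQUIRED_CREDENTIALS = {
    "Microsoft Azure": ["voice", "speech_key", "speech_region",
                        "voice_gender", "voice_language"],
    "Google Cloud": ["voice", "service_account_json"],
    # Assumes the Narakeet link is used to choose a voice.
    "Narakeet": ["voice", "api_key"],
    "Amazon Polly": ["voice", "secret_access_key", "access_key_id",
                     "voice_region"],
    "Beyond Words": ["voice", "api_key", "project_id"],
    "ElevenLabs": ["voice_id", "api_key"],
}


def missing_credentials(provider: str, submitted: dict) -> list[str]:
    """Return the required items that are absent or empty in a submission."""
    required = REQUIRED_CREDENTIALS[provider]
    return [name for name in required if not submitted.get(name)]


# Example: a Polly submission that is missing the voice region.
submission = {
    "voice": "Joanna",
    "secret_access_key": "placeholder-secret",
    "access_key_id": "placeholder-id",
}
print(missing_credentials("Amazon Polly", submission))
```

An unknown provider name raises a KeyError, which is deliberate here: a misspelled provider should fail loudly instead of passing the check.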

Once the setup is complete, customers upload their publications to Webarch as usual. The audio is then generated automatically, activated, and imported into Prenly.
