Amazon Polly - Developer Guide

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What Is Amazon Polly?
  Benefits
  Are you a first-time user?
How it works
Getting started
  Setting up Amazon Polly
    Sign up for an AWS account
    Create a user with administrative access
  Using Amazon Polly on the console
    Step 1.1: Synthesize speech quick start on the console
    Step 1.2: Synthesize speech with plaintext input on the console
  Using Amazon Polly on the AWS CLI
    Step 2.1: Set up the AWS CLI
    Step 2.2: Getting started exercise using the AWS CLI
  Python examples
    Set up Python and test an example (SDK)
Voices in Amazon Polly
  Listening to voices
  Available voices
    Brand voices
  Voice speed
    Changing your voice speed
  Bilingual voices
    Accented bilingual voices
    Fully bilingual voices
  Newscaster voices
Languages in Amazon Polly
  Phoneme and Viseme Tables for Supported Languages
    Arabic (arb)
    Arabic (Gulf) (ar-AE)
    Catalan (ca-ES)
    Chinese (Cantonese) (yue-CN)
    Chinese (Mandarin) (cmn-CN)
    Danish (da-DK)
    Dutch (Belgian) (nl-BE)
    Dutch (nl-NL)
    English (US) (en-US)
    English (Australian) (en-AU)
    English (British) (en-GB)
    English (Indian) (en-IN)
    English (Ireland) (en-IE)
    English (New Zealand) (en-NZ)
    English (South African) (en-ZA)
    English (Welsh) (en-GB-WLS)
    Finnish (fi-FI)
    French (fr-FR)
    French (Belgian) (fr-BE)
    French (Canadian) (fr-CA)
    German (de-DE)
    German (Austrian) (de-AT)
    Hindi (hi-IN)
    Icelandic (is-IS)
    Italian (it-IT)
    Japanese (ja-JP)
    Korean (ko-KR)
    Norwegian (nb-NO)
    Polish (pl-PL)
    Portuguese (pt-PT)
    Portuguese (Brazilian) (pt-BR)
    Romanian (ro-RO)
    Russian (ru-RU)
    Spanish (es-ES)
    Spanish (Mexican) (es-MX)
    Spanish (US) (es-US)
    Swedish (sv-SE)
    Turkish (tr-TR)
    Welsh (cy-GB)
Voice engines
  Generative engine
    Available generative voices
    Feature and region compatibility
    Using the Generative engine on the console
  Long-form engine
    Available long-form voices
    Feature and region compatibility
    Using the Long-form engine on the console
  Neural engine
    Available neural voices
    Feature and region compatibility
    Using the Neural engine on the console
  Standard engine
    Available Standard voices
    Feature and region compatibility
    Using the Standard engine on the console
Speech marks
  Speech mark types
    Visemes and Amazon Polly
  Using speech marks
    Requesting speech marks
    Speech mark output
    Speech mark examples
  Requesting speech marks on the console
Using SSML
  Reserved characters
  Using SSML on the console
  Using SSML on the AWS CLI
    Using SSML with the Synthesize-Speech command
    Synthesizing an SSML-enhanced document
    Using SSML for common Amazon Polly tasks
  Supported SSML tags
    Identifying SSML-enhanced text
    Adding a pause
    Emphasizing words
    Specifying another language for specific words
    Placing a custom tag in your text
    Adding a pause between paragraphs
    Using phonetic pronunciation
    Controlling volume, speaking rate, and pitch
    Setting a maximum duration for synthesized speech
    Adding a pause between sentences
    Controlling how special types of words are spoken
    Pronouncing acronyms and abbreviations
    Improving pronunciation by specifying parts of speech
    Adding the sound of breathing
    Newscaster speaking style
    Adding dynamic range compression
    Speaking softly
    Controlling timbre
    Whispering
Managing lexicons
  Applying multiple lexicons
  Managing lexicons on the console
    Uploading lexicons on the console
    Applying lexicons on the console (Synthesize Speech)
    Filtering the lexicon list on the console
    Downloading lexicons on the console
    Deleting a lexicon on the console
  Managing lexicons on the AWS CLI
    PutLexicon
    GetLexicon
    ListLexicons
    DeleteLexicon
Creating long audio files
  Setting up the IAM policy for asynchronous synthesis
  Creating long audio files on the console
  Creating long audio files on the AWS CLI
Code and application examples
  Sample code
    Java samples
    Python samples
  Example applications
    Python example
    Java example
    iOS example
    Android example
Quotas
  Supported regions
  Quotas and throttle rates
    Concurrent requests
    Best practices to mitigate throttling
  Pronunciation lexicons
  SynthesizeSpeech API operations
  SpeechSynthesisTask API operations
  Speech Synthesis Markup Language (SSML)
Security
  Data Protection
    Encryption at Rest
    Encryption in Transit
    Internetwork Traffic Privacy
  Identity and Access Management
    Audience
    Authenticating with identities
    Managing access using policies
    How Amazon Polly works with IAM
    Identity-based policy examples
    Amazon Polly API Permissions Reference
    Troubleshooting
  Logging and Monitoring
  Compliance Validation
  Resilience
  Infrastructure Security
  Security Best Practices
  Using Interface VPC Endpoints
    Availability
    Creating a VPC endpoint for Amazon Polly
    Testing the connection between your VPC and Amazon Polly
    Controlling access to your Amazon Polly endpoint
    Support for VPC context keys
Logging Amazon Polly API calls with AWS CloudTrail
  Amazon Polly information in CloudTrail
  Example: Amazon Polly Log File Entries
CloudWatch integration
  Getting CloudWatch Metrics (Console)
  Getting CloudWatch metrics on the AWS CLI
  Amazon Polly Metrics
  Dimensions for Amazon Polly Metrics
API Reference
  Actions
    DeleteLexicon
    DescribeVoices
    GetLexicon
    GetSpeechSynthesisTask
    ListLexicons
    ListSpeechSynthesisTasks
    PutLexicon
    StartSpeechSynthesisTask
    SynthesizeSpeech
  Data Types
    Lexicon
    LexiconAttributes
    LexiconDescription
    SynthesisTask
    Voice
Document History
AWS Glossary

What Is Amazon Polly?

Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility. Amazon Polly supports multiple languages and includes a variety of lifelike voices. With Amazon Polly, you can build speech-enabled applications that work in multiple locations and use the ideal voice for your customers. You pay only for the text you synthesize, and you can cache and replay the speech that Amazon Polly generates at no additional cost.
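Caching generated speech can be sketched roughly as follows. This is a minimal illustration, not an official pattern: the helper names (`cache_key`, `polly_synthesize`, `cached_speech`) and the `polly_cache` directory are hypothetical, and the only Amazon Polly call used is the `synthesize_speech` operation of the AWS SDK for Python (Boto3), which requires configured AWS credentials and a Region.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice_id: str, engine: str, output_format: str) -> str:
    """Build a stable file name from every input that affects the audio."""
    payload = "\x1f".join([engine, voice_id, output_format, text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest() + "." + output_format


def polly_synthesize(text: str, voice_id: str, engine: str, output_format: str) -> bytes:
    """Call Amazon Polly through Boto3 (needs AWS credentials and a Region)."""
    import boto3  # imported lazily so the cache logic can be used without the SDK

    response = boto3.client("polly").synthesize_speech(
        Text=text, VoiceId=voice_id, Engine=engine, OutputFormat=output_format
    )
    return response["AudioStream"].read()


def cached_speech(
    text: str,
    voice_id: str = "Joanna",
    engine: str = "neural",
    output_format: str = "mp3",
    cache_dir: Path = Path("polly_cache"),  # hypothetical cache location
    synthesize: Callable[[str, str, str, str], bytes] = polly_synthesize,
) -> bytes:
    """Return audio for the text, synthesizing only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / cache_key(text, voice_id, engine, output_format)
    if path.exists():
        return path.read_bytes()  # replay: no new synthesis request
    audio = synthesize(text, voice_id, engine, output_format)
    path.write_bytes(audio)
    return audio
```

Calling `cached_speech("Hello from Amazon Polly")` twice should send one synthesis request and serve the second call from disk. The `synthesize` parameter exists only so that the cache logic can be exercised with a stand-in function.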

Amazon Polly offers several voice options, including generative, long-form, neural, and standard text-to-speech (TTS) voices. These voices use machine learning technology to improve speech quality and to sound as natural and human-like as possible. Neural TTS technology also supports a Newscaster speaking style, tailored to news narration use cases.
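As a rough sketch of how an engine and the Newscaster style might be selected in a request: the helper names `build_request` and `synthesize` are hypothetical, the engine strings are the values the `Engine` parameter of `SynthesizeSpeech` accepts to the best of our knowledge, and the `<amazon:domain name="news">` SSML tag is the one described in the Newscaster speaking style section of this guide. Check the API reference before relying on any of these values.

```python
from xml.sax.saxutils import escape

# Assumed Engine values, matching the four engines named above.
ENGINES = ("standard", "neural", "long-form", "generative")


def build_request(text, voice_id="Joanna", engine="neural",
                  newscaster=False, output_format="mp3"):
    """Build keyword arguments for a SynthesizeSpeech call."""
    if engine not in ENGINES:
        raise ValueError("unknown engine: " + repr(engine))
    if newscaster:
        if engine != "neural":
            raise ValueError("the Newscaster style is a neural TTS feature")
        # Escape reserved XML characters, then wrap the text in the news domain tag.
        body = ('<speak><amazon:domain name="news">'
                + escape(text)
                + "</amazon:domain></speak>")
        text_type = "ssml"
    else:
        body, text_type = text, "text"
    return {
        "Text": body,
        "TextType": text_type,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
    }


def synthesize(request):
    """Send the request with Boto3 (needs AWS credentials and a Region)."""
    import boto3  # imported lazily so build_request works without the SDK

    response = boto3.client("polly").synthesize_speech(**request)
    return response["AudioStream"].read()
```

For example, `synthesize(build_request("Markets closed higher today.", voice_id="Matthew", newscaster=True))` would request a news-style reading. Whether a given voice supports a given engine or style depends on the voice tables later in this guide.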

Common use cases for Amazon Polly include mobile applications such as newsreaders, games, eLearning platforms, accessibility applications for visually impaired people, and the rapidly growing Internet of Things (IoT) segment.

Amazon Polly is certified for use with regulated workloads under HIPAA (the Health Insurance Portability and Accountability Act of 1996) and the Payment Card Industry Data Security Standard (PCI DSS).

Beneﬁts

Some of the beneﬁts of using Amazon Polly include:

• High quality – Amazon Polly oﬀers highly-performant generative, long-form, neural, and high-

quality text-to-speech (TTS) voices. These technologies synthesize natural speech with high

pronunciation accuracy (including abbreviations, acronym expansions, date/time interpretations,

and homograph disambiguation).

• Low latency – Amazon Polly achieves fast responses, which makes it a viable option for low-

latency use cases such as dialogue systems.

• Support for a large portfolio of languages and voices – Amazon Polly supports dozens of

voices and languages, oﬀering male and female voice options for most languages. This number

will continue to increase as we bring more neural voices online. US English voices Matthew and

Joanna can also use the Neural Newscaster speaking style, similar to what you might hear from a

professional news anchor.

• Cost-eﬀective – Amazon Polly's pay-per-use model means that there are no setup costs. Start

small and scale up as your application grows.

• Cloud-based solution – On-device TTS solutions require signiﬁcant computing resources,

notably CPU power, RAM, and disk space. These can result in higher development costs and

higher power consumption on devices such as tablets, smartphones, and so on. In contrast,

TTS conversion done in the AWS Cloud dramatically reduces local resource requirements. This

enables support of all the available languages and voices with outstanding quality. Moreover,

speech improvements are instantly available to all end users and don't require additional

updates for devices.

Note

To hear example Amazon Polly voices in your browser, see the Amazon Polly product

overview.

Are you a ﬁrst-time user?

If you're a ﬁrst-time user of Amazon Polly, we recommend that you read the following sections in

the listed order:

1. How Amazon Polly works – This section introduces various Amazon Polly inputs and options

that you can work with in order to create a simple experience.

2. Getting started with Amazon Polly – In this section, you set up your account and test Amazon

Polly speech synthesis.

3. Example applications – This section provides additional examples that you can use to explore

Amazon Polly.

How Amazon Polly works

Amazon Polly converts input text into lifelike speech. To use an Amazon Polly voice, choose a

voice engine, call a speech synthesis method, provide the text that you want to synthesize, then

specify an audio output format. Amazon Polly then synthesizes the provided text into a high-

quality speech audio stream.

• Input text – Provide the text that you want to synthesize, and Amazon Polly returns an audio

stream. You can provide the input as plaintext or in Speech Synthesis Markup Language (SSML)

format. With SSML you can control various aspects of speech, such as pronunciation, volume,

pitch, and speech rate. For more information, see Generating speech from SSML documents.

• Available voices – Amazon Polly provides a portfolio of languages and a variety of voices,

including a bilingual voice (for both English and Hindi). For most languages you can choose from

several voices, both male and female. When launching a speech synthesis task, you specify the

voice ID, and then Amazon Polly uses this voice to convert the text to speech. Amazon Polly is

not a translation service—the synthesized speech is in the same language as the text. Numbers

represented as digits (for example, 53, not ﬁfty-three) are synthesized in the language of the

voice and not the text. For more information, see Voices in Amazon Polly.

• Output format – Amazon Polly can deliver the synthesized speech in multiple formats. You can

select the audio format that suits your needs. For example, you might request the speech in

the MP3 or Ogg Vorbis format for consumption by web and mobile applications. Or, you might

request the PCM output format for consumption by AWS IoT devices and telephony solutions.
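The three choices above (input text, voice, and output format) map onto request parameters. The following is a minimal sketch, assuming the AWS SDK for Python (Boto3) parameter names Text, TextType, VoiceId, OutputFormat, and Engine for SynthesizeSpeech. The helper name build_synthesis_request is hypothetical, and the sketch only assembles the request without sending it.

```python
def build_synthesis_request(text, voice_id, output_format="mp3",
                            engine=None, ssml=False):
    """Assemble keyword arguments for a SynthesizeSpeech call.

    build_synthesis_request is an illustrative helper, not part of any SDK.
    """
    request = {
        "Text": text,
        # "ssml" asks the service to interpret SSML tags; "text" is plaintext.
        "TextType": "ssml" if ssml else "text",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }
    # Engine is optional, so include it only when the caller chose one.
    if engine is not None:
        request["Engine"] = engine
    return request


request = build_synthesis_request("Hello world!", "Joanna", engine="neural")
print(request)
# With Boto3 installed and credentials configured, the request could be sent
# with: boto3.client("polly").synthesize_speech(**request)
```

Keeping request assembly separate from the network call makes it easy to inspect the parameters before any text is synthesized.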

Note

To hear example Amazon Polly voices in your browser, see the Amazon Polly product

overview.

Are you a ﬁrst-time user?

If you're new to Amazon Polly, we recommend that you read the following topics in order:

• Getting started with Amazon Polly

• Example applications

• Quotas in Amazon Polly

Getting started with Amazon Polly

Amazon Polly provides several API operations that you can easily integrate with your existing

applications. For a list of supported operations, see Actions. You can use either of the following

options:

• AWS SDKs – When using the SDKs, your requests to Amazon Polly are automatically signed and

authenticated using the credentials you provide. This is the recommended choice for building

your applications.

• AWS CLI – You can use the AWS CLI to access Amazon Polly features without writing any code.

The following sections describe how to get started using Amazon Polly.

Topics

• Setting up Amazon Polly

• Using Amazon Polly on the console

• Using Amazon Polly on the AWS CLI

• Python examples

Setting up Amazon Polly

Before you use Amazon Polly for the ﬁrst time, you must sign up for AWS. When you sign up for

Amazon Web Services (AWS), your AWS account is automatically signed up for all services in AWS,

including Amazon Polly. You're charged only for the services and resources that you use. If you're a

new AWS customer, you can get started with Amazon Polly with no charge. For more information,

see AWS Free Usage Tier.

If you already have an AWS account, you can move on to either of the following activities:

• Using Amazon Polly on the console

• Using Amazon Polly on the AWS CLI

If you do not have an AWS account, complete the following steps to create one.

To sign up for an AWS account

1. Open https://portal.aws.amazon.com/billing/signup.

2. Follow the online instructions.

Part of the sign-up procedure involves receiving a phone call and entering a veriﬁcation code

on the phone keypad.

When you sign up for an AWS account, an AWS account root user is created. The root user

has access to all AWS services and resources in the account. As a security best practice, assign

administrative access to a user, and use only the root user to perform tasks that require root

user access.

AWS sends you a conﬁrmation email after the sign-up process is complete. At any time, you can

view your current account activity and manage your account by going to https://aws.amazon.com/

and choosing My Account.

Create a user with administrative access

After you sign up for an AWS account, secure your AWS account root user, enable AWS IAM Identity

Center, and create an administrative user so that you don't use the root user for everyday tasks.

Secure your AWS account root user

1. Sign in to the AWS Management Console as the account owner by choosing Root user and

entering your AWS account email address. On the next page, enter your password.

For help signing in by using root user, see Signing in as the root user in the AWS Sign-In User

Guide.

2. Turn on multi-factor authentication (MFA) for your root user.

For instructions, see Enable a virtual MFA device for your AWS account root user (console) in

the IAM User Guide.

Create a user with administrative access

1. Enable IAM Identity Center.

For instructions, see Enabling AWS IAM Identity Center in the AWS IAM Identity Center User

Guide.

2. In IAM Identity Center, grant administrative access to a user.

For a tutorial about using the IAM Identity Center directory as your identity source, see

Conﬁgure user access with the default IAM Identity Center directory in the AWS IAM Identity

Center User Guide.

• To sign in with your IAM Identity Center user, use the sign-in URL that was sent to your email

address when you created the IAM Identity Center user.

For help signing in using an IAM Identity Center user, see Signing in to the AWS access portal in

the AWS Sign-In User Guide.

Assign access to additional users

1. In IAM Identity Center, create a permission set that follows the best practice of applying least-

privilege permissions.

For instructions, see Create a permission set in the AWS IAM Identity Center User Guide.

2. Assign users to a group, and then assign single sign-on access to the group.

For instructions, see Add groups in the AWS IAM Identity Center User Guide.

For more information about IAM, see the following:

• AWS Identity and Access Management (IAM)

• Getting started

• IAM User Guide

Note

Note your AWS account ID. You will need it in the next steps.

Using Amazon Polly on the console

From the Amazon Polly console, you can quickly start testing and using Amazon Polly's speech

synthesizing features. The Amazon Polly console supports synthesizing speech from either

plaintext or SSML input.

Topics

• Step 1.1: Synthesize speech quick start on the console

• Step 1.2: Synthesize speech with plaintext input on the console

Step 1.1: Synthesize speech quick start on the console

From the console, you can quickly test Amazon Polly speech synthesis for speech quality.

To listen to an Amazon Polly voice on the console

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Choose the Text-to-Speech tab. The text ﬁeld will load with example text so you can quickly

try out Amazon Polly.

3. Turn oﬀ SSML.

4. Under Engine, choose Generative, Long Form, Neural, or Standard.

5. Choose a language and AWS Region, then choose a voice. (If you select Neural for Engine, only

the languages and voices that support NTTS are available. All Standard and Long Form voices

are disabled.)

6. Choose Listen.

For more in-depth testing, see the following topics:

• Step 1.2: Synthesize speech with plaintext input on the console

• Using SSML on the console

• Applying lexicons on the console (Synthesize Speech)

Step 1.2: Synthesize speech with plaintext input on the console

The following procedure synthesizes speech using plaintext input. (Note how "W3C" and the date

"10/3" (October 3) are synthesized.)

To synthesize speech using plaintext input on the console

1. After logging on to the Amazon Polly console, choose Try Amazon Polly. Then choose the

Text-to-Speech tab.

2. Turn oﬀ SSML.

3. Type or paste this text into the input box.

He was caught up in the game.

In the middle of the 10/3/2014 W3C meeting

he shouted, "Score!" quite loudly.

4. For Engine, choose Generative, Long Form, Neural, or Standard.

5. Choose a language and AWS Region, then choose a voice. (If you choose Neural for Engine,

only the languages and voices that support NTTS are available. All Standard and Long Form

voices are disabled.)

6. To listen to the speech immediately, choose Listen.

7. To save the speech to a ﬁle, do one of the following:

a. Choose Download.

b. To change to a diﬀerent ﬁle format, expand Additional settings, turn on Speech ﬁle

format settings, choose the ﬁle format that you want, and then choose Download.

For more in-depth examples, see the following topics:

• Applying lexicons on the console (Synthesize Speech)

• Using SSML on the console

Using Amazon Polly on the AWS CLI

You can perform almost all of the same operations on the Amazon Polly console and the AWS CLI.

However, you can't listen to synthesized speech on the AWS CLI. To work with audio on the AWS
CLI, save the synthesized speech to a file. Then open the file in an audio application of your choice.

Topics

• Step 2.1: Set up the AWS CLI

• Step 2.2: Getting started exercise using the AWS CLI

Step 2.1: Set up the AWS CLI

Follow these steps to download and conﬁgure the AWS CLI to work with Amazon Polly.

Important

You don't need the AWS CLI to perform the steps in this exercise. However, some of the

exercises in this guide use the AWS CLI. You can skip this step and go to Step 2.2: Getting

started exercise using the AWS CLI, and then set up the AWS CLI later when you need it.

Set up the AWS CLI

To set up the AWS Command Line Interface

1. Download and conﬁgure the AWS CLI. For instructions, see the following topics in the AWS

Command Line Interface User Guide:

• Getting Set Up with the AWS Command Line Interface

• Conﬁguring the AWS Command Line Interface

2. Add a named profile for the administrator user in the AWS CLI config file. You can use
this profile when running the AWS CLI commands. For more information about named profiles,
see Named Profiles in the AWS Command Line Interface User Guide.

[profile adminuser]

aws_access_key_id = adminuser access key ID

aws_secret_access_key = adminuser secret access key

region = aws-region

For a list of available AWS Regions and those supported by Amazon Polly, see Regions and

Endpoints in the Amazon Web Services General Reference.

Note

If you're using a Region supported by Amazon Polly that you speciﬁed when you

conﬁgured the AWS CLI, omit the following line from the AWS CLI code examples.

--region aws-region

3. Verify the setup by typing the following help command at the command prompt.

aws help

A list of valid AWS commands should appear in the AWS CLI window.
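The named profile from step 2 can also be checked programmatically. The following is a minimal sketch that parses sample config text with the Python standard library. The key values and the Region are placeholders, and profile_region is a hypothetical helper rather than an AWS tool.

```python
import configparser

# Sample contents of an AWS CLI config file. All values are placeholders.
SAMPLE_CONFIG = """
[profile adminuser]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = EXAMPLESECRET
region = us-west-2
"""


def profile_region(config_text, profile_name):
    """Return the Region set for a named profile, or None if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = "profile " + profile_name
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


print(profile_region(SAMPLE_CONFIG, "adminuser"))
```

If the profile already names a Region that Amazon Polly supports, the --region option can be left off the AWS CLI examples, as the preceding note describes.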

Activate Amazon Polly from the AWS CLI

If you've previously downloaded and conﬁgured the AWS CLI, Amazon Polly may be unavailable

unless you reconﬁgure the AWS CLI. The following procedure checks to see if this is necessary.

To activate Amazon Polly from the AWS CLI

1. Verify the availability of Amazon Polly by typing the following help command at the AWS CLI

command prompt.

aws polly help

If you see a description of Amazon Polly and a list of valid commands appears in the AWS CLI

window, you can use Amazon Polly from the AWS CLI immediately. In this case, you can skip

the rest of this procedure. If this is not displayed, continue with Step 2.

2. Activate Amazon Polly using one of the two following options:

a. Uninstall and reinstall the AWS CLI.

For instructions, see Installing the AWS Command Line Interface in the AWS Command

Line Interface User Guide.

b. Download the ﬁle service-2.json.

At the command prompt, run the following command.

aws configure add-model --service-model file://service-2.json --service-name polly

3. Reverify the availability of Amazon Polly.

aws polly help

The description of Amazon Polly should be visible.

Set up a voice engine from the AWS CLI

From the AWS CLI, the engine parameter is optional and accepts one of four values: generative,
long-form, neural, or standard. For example, the following command runs
start-speech-synthesis-task with the neural engine in the US West (Oregon) Region (us-west-2):

aws polly start-speech-synthesis-task \
    --engine neural \
    --region us-west-2 \
    --endpoint-url "https://polly.us-west-2.amazonaws.com/" \
    --output-format mp3 \
    --output-s3-bucket-name your-bucket-name \
    --output-s3-key-prefix optional/prefix/path/file \
    --voice-id Joanna \
    --text file://text_file.txt

The output will resemble the following:

{
    "SynthesisTask": {
        "CreationTime": [..],
        "Engine": "neural",
        "OutputFormat": "mp3",
        "OutputUri": "https://s3.us-west-2.amazonaws.com/your-bucket-name/optional/prefix/path/file.<task_id>.mp3",
        "TextType": "text",
        "RequestCharacters": [..],
        "TaskStatus": "scheduled",
        "TaskId": [task_id],
        "VoiceId": "Joanna"
    }
}
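The same task can be described from Python. The following is a minimal sketch, assuming the Boto3 parameter names Engine, OutputFormat, OutputS3BucketName, OutputS3KeyPrefix, Text, and VoiceId for StartSpeechSynthesisTask. The helper build_task_request is hypothetical, and the bucket name is the same placeholder used in the command above.

```python
ALLOWED_ENGINES = {"generative", "long-form", "neural", "standard"}


def build_task_request(text, voice_id, bucket, key_prefix=None,
                       engine="neural", output_format="mp3"):
    """Assemble keyword arguments for StartSpeechSynthesisTask (illustrative)."""
    if engine not in ALLOWED_ENGINES:
        raise ValueError("unsupported engine: " + engine)
    request = {
        "Engine": engine,
        "OutputFormat": output_format,
        "OutputS3BucketName": bucket,
        "Text": text,
        "VoiceId": voice_id,
    }
    # The key prefix is optional, so add it only when one was given.
    if key_prefix:
        request["OutputS3KeyPrefix"] = key_prefix
    return request


task_request = build_task_request(
    "Hello world!", "Joanna", "your-bucket-name",
    key_prefix="optional/prefix/path/file")
print(task_request)
# With Boto3 and credentials available, the task could be started with:
# boto3.client("polly", region_name="us-west-2").start_speech_synthesis_task(**task_request)
```

Validating the engine name locally catches a typo before the request reaches the service.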

Step 2.2: Getting started exercise using the AWS CLI

If you've already set up the AWS CLI, you can test the speech synthesis oﬀered by Amazon Polly. In

this exercise, you call the SynthesizeSpeech operation by passing input text. You can save the

resulting audio as a ﬁle and verify its content.

Run the synthesize-speech AWS CLI command to synthesize sample text to an audio ﬁle

(hello.mp3).

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use

full quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \
    --output-format mp3 \
    --voice-id Joanna \
    --text 'Hello, my name is Joanna. I learned about the W3C on 10/3 of last year.' \
    hello.mp3

In the call to synthesize-speech, you provide sample text to be synthesized by a voice

of your choice. You must provide a voice ID (explained in the following step) and an output

format. The command saves the resulting audio to the hello.mp3 ﬁle. In addition to the MP3

ﬁle, the operation sends the following output to the console.

{

"ContentType": "audio/mpeg",

"RequestCharacters": "71"

}

Play the resulting hello.mp3 ﬁle to verify the synthesized speech.
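As a quick check on the RequestCharacters value shown above, the sample text can be measured locally. This sketch assumes that, for plaintext input, each character of the submitted text is counted once.

```python
# The sample text passed to synthesize-speech in the command above.
text = "Hello, my name is Joanna. I learned about the W3C on 10/3 of last year."

# For plaintext input, the character count is simply the string length.
request_characters = len(text)
print(request_characters)  # 71
```

Because you pay for the text you synthesize, counting characters before sending a request is a simple way to estimate usage.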

Get the list of available voices by using the DescribeVoices operation. Run the following

describe-voices AWS CLI command.

aws polly describe-voices

In response, Amazon Polly returns the list of all available voices. For each voice, the response
provides the following metadata: voice ID, name, language code, language name, gender, and the
engines that the voice supports. The following is a sample response.

{
    "Voices": [
        {
            "Gender": "Female",
            "Name": "Salli",
            "LanguageName": "US English",
            "Id": "Salli",
            "LanguageCode": "en-US",
            "SupportedEngines": [
                "neural",
                "standard",
                "generative"
            ]
        },
        {
            "Gender": "Female",
            "Name": "Danielle",
            "LanguageName": "US English",
            "Id": "Danielle",
            "LanguageCode": "en-US",
            "SupportedEngines": [
                "long-form"
            ]
        }
    ]
}
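A response of this shape is straightforward to filter in code. The following is a minimal sketch that picks out the voices supporting a given engine. The dictionary mirrors the sample response above, and voices_for_engine is a hypothetical helper.

```python
# A trimmed copy of the sample DescribeVoices response shown above.
sample_response = {
    "Voices": [
        {"Id": "Salli", "LanguageCode": "en-US", "Gender": "Female",
         "SupportedEngines": ["neural", "standard", "generative"]},
        {"Id": "Danielle", "LanguageCode": "en-US", "Gender": "Female",
         "SupportedEngines": ["long-form"]},
    ]
}


def voices_for_engine(response, engine):
    """Return the IDs of voices whose SupportedEngines list includes engine."""
    return [voice["Id"] for voice in response["Voices"]
            if engine in voice.get("SupportedEngines", [])]


print(voices_for_engine(sample_response, "neural"))
print(voices_for_engine(sample_response, "long-form"))
```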

Optionally, you can specify the language code to ﬁnd the available voices for a speciﬁc

language. Amazon Polly supports dozens of voices. The following example lists all the voices

for Brazilian Portuguese.

aws polly describe-voices \

--language-code pt-BR
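The same lookup can be sketched in Python. This example assumes the Boto3 describe_voices call accepts LanguageCode and NextToken and returns Voices plus an optional NextToken. FakePollyClient is a stand-in invented here so that the sketch runs without credentials, and its page contents are made up for illustration.

```python
def list_voice_ids(client, language_code):
    """Collect voice IDs for one language, following NextToken pagination."""
    voice_ids = []
    kwargs = {"LanguageCode": language_code}
    while True:
        page = client.describe_voices(**kwargs)
        voice_ids.extend(voice["Id"] for voice in page.get("Voices", []))
        token = page.get("NextToken")
        if not token:
            return voice_ids
        kwargs["NextToken"] = token


class FakePollyClient:
    """Stand-in for a Polly client; returns two made-up pages of results."""

    def describe_voices(self, **kwargs):
        if "NextToken" not in kwargs:
            return {"Voices": [{"Id": "Camila"}, {"Id": "Vitoria"}],
                    "NextToken": "page-2"}
        return {"Voices": [{"Id": "Ricardo"}, {"Id": "Thiago"}]}


print(list_voice_ids(FakePollyClient(), "pt-BR"))
# With Boto3 installed, a real client from boto3.client("polly") could be
# passed in place of FakePollyClient().
```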

For a list of language codes, see Languages in Amazon Polly. These language codes are
W3C language identification tags that combine the ISO 639 code for the language name with the
ISO 3166 country code. Examples include en-US (US English), en-GB (British English), and
es-ES (Spanish). You can also use the help option in the AWS CLI to get the list of language codes:

aws polly describe-voices help
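To illustrate the structure of these tags, the following minimal sketch splits a code at its first hyphen into a language part and an optional region part. The helper split_language_code is hypothetical and does not validate the parts against the ISO lists.

```python
def split_language_code(code):
    """Split a tag such as "en-US" into (language, region).

    The region is None when the tag has no hyphen, as with "arb".
    """
    language, _, region = code.partition("-")
    return language, region or None


print(split_language_code("en-US"))
print(split_language_code("pt-BR"))
print(split_language_code("arb"))
```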

Python examples

This guide provides a few Python code examples that use AWS SDK for Python (Boto3) to make API

calls to Amazon Polly. We recommend that you set up Python and test the example code provided

in the following section. For additional examples, see Example applications.

Set up Python and test an example (SDK)

To test the Python example code, you need the AWS SDK for Python (Boto3). For instructions, see
AWS SDK for Python (Boto3).

To test the example Python code

The following Python code example performs the following actions:

• Invokes the AWS SDK for Python (Boto3) to send a SynthesizeSpeech request to Amazon Polly
(by providing some text as input).

• Accesses the resulting audio stream in the response and saves the audio to a file (speech.mp3)
on your local disk.

• Plays the audio file with the default audio player for your local system.

Save the code to a file (example.py) and run it.

"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3",
                                       VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important because the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")

        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)

else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # The following works on macOS and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])

For additional examples including an example application, see Example applications.
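The example above reads the whole audio stream into memory before writing it. For longer audio, a chunked copy may be preferable. The following is a minimal sketch, assuming only that the stream object offers a read(size) method, as file-like objects do. The helper save_stream and the in-memory stand-in stream are illustrative.

```python
import io
import os
import tempfile


def save_stream(stream, path, chunk_size=4096):
    """Copy a file-like audio stream to path in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as audio_file:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            audio_file.write(chunk)
            written += len(chunk)
    return written


# An in-memory stand-in for response["AudioStream"], so the sketch runs offline.
fake_audio = io.BytesIO(b"\x00" * 10000)
target = os.path.join(tempfile.gettempdir(), "speech_chunked.mp3")
bytes_written = save_stream(fake_audio, target)
print(bytes_written)
```

In the getting started example, the call could replace file.write(stream.read()) inside the with closing(...) block.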

Voices in Amazon Polly

Amazon Polly provides dozens of lifelike voices and support for a variety of languages. Each voice

is created using native language speakers, so there are variations from voice to voice, even within

the same language. You can also use the AWS Management Console to test each voice with text of

your choice. For most languages, there will be at least one male and one female voice, and often

more than one of each. A few languages only have a single voice.

Note

To hear example Amazon Polly voices in your browser, see the Amazon Polly product

overview.

Topics

• Listening to voices

• Available voices

• Voice speed

• Bilingual voices

• Newscaster voices

Listening to voices

Once you have set up Amazon Polly, you can test voices using custom text on the console.

To listen to Amazon Polly voices on the console

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Choose the Text-to-Speech tab.

3. For Engine, choose Generative, Long Form, Neural, or Standard.

4. Select a language and a Region. Then choose a voice.

5. Enter text for the voice to speak or use the default phrase, and then choose Listen.

Note

The inventory of voices and the number of languages included is continually being updated

to include additional choices. To suggest a new language or voice, provide feedback on

this page. Unfortunately, we are not able to comment on plans for speciﬁc new languages

before they are released.

Available voices

Amazon Polly provides a variety of lifelike voices in multiple languages for synthesizing speech

from text. The following table shows all the voices that Amazon Polly oﬀers.

No.  Language and variant     Code        Name (gender)               Generative  Long Form  Neural  Standard
1    Arabic                   arb         Zeina (Female)              No          No         No      Yes
2    Arabic (Gulf)            ar-AE       Hala* (Female)
                                          Zayd* (Male)
3    Dutch (Belgian)          nl-BE       Lisa (Female)               No          No         Yes     No
4    Catalan                  ca-ES       Arlet (Female)              No          No         Yes     No
5    Czech                    cs-CZ       Jitka (Female)              No          No         Yes     No
6    Chinese (Cantonese)      yue-CN      Hiujin (Female)             No          No         Yes     No
7    Chinese (Mandarin)       cmn-CN      Zhiyu (Female)              No          No         Yes     Yes
8    Danish                   da-DK       Naja (Female)
                                          Mads (Male)
                                          Sofie (Female)
9    Dutch                    nl-NL       Laura (Female)
                                          Lotte (Female)
                                          Ruben (Male)
10   English (Australian)     en-AU       Nicole (Female)
                                          Olivia (Female)
                                          Russell (Male)
11   English (British)        en-GB       Amy** (Female)
                                          Emma (Female)
                                          Brian (Male)
                                          Arthur (Male)
12   English (Indian)         en-IN       Aditi* (Female)
                                          Raveena (Female)
                                          Kajal* (Female)
13   English (Ireland)        en-IE       Niamh (Female)              No          No         Yes     No
14   English (New Zealand)    en-NZ       Aria (Female)               No          No         Yes     No
15   English (South African)  en-ZA       Ayanda (Female)             No          No         Yes     No
16   English (US)             en-US       Danielle (Female)
                                          Gregory (Male)
                                          Ivy (Female, child)
                                          Joanna** (Female)
                                          Kendra (Female)
                                          Kimberly (Female)
                                          Salli (Female)
                                          Joey (Male)
                                          Justin (Male, child)
                                          Kevin (Male, child)
                                          Matthew** (Male)
                                          Ruth (Female)
                                          Stephen (Male)
17   English (Welsh)          en-GB-WLS   Geraint (Male)              No          No         No      Yes
18   Finnish                  fi-FI       Suvi (Female)               No          No         Yes     No
19   French                   fr-FR       Céline/Celine (Female)
                                          Léa (Female)
                                          Mathieu (Male)
                                          Rémi (Male)
20   French (Belgian)         fr-BE       Isabelle (Female)           No          No         Yes     No
21   French (Canadian)        fr-CA       Chantal (Female)
                                          Gabrielle (Female)
                                          Liam (Male)
22   German                   de-DE       Marlene (Female)
                                          Vicki (Female)
                                          Hans (Male)
                                          Daniel (Male)
23   German (Austrian)        de-AT       Hannah (Female)             No          No         Yes     No
24   German (Swiss)           de-CH       Sabrina (Female)            No          No         Yes     No
25   Hindi                    hi-IN       Aditi* (Female)
                                          Kajal* (Female)
26   Icelandic                is-IS       Dóra/Dora (Female)
                                          Karl (Male)
27   Italian                  it-IT       Carla (Female)
                                          Bianca (Female)
                                          Giorgio (Male)
                                          Adriano (Male)
28   Japanese                 ja-JP       Mizuki (Female)
                                          Takumi (Male)
                                          Kazuha (Female)
                                          Tomoko (Female)
29   Korean                   ko-KR       Seoyeon (Female)            No          No         Yes     Yes
30   Norwegian                nb-NO       Liv (Female)
                                          Ida (Female)
31   Polish                   pl-PL       Ewa (Female)
                                          Maja (Female)
                                          Jacek (Male)
                                          Jan (Male)
                                          Ola (Female)
32   Portuguese (Brazilian)   pt-BR       Camila (Female)
                                          Vitória/Vitoria (Female)
                                          Ricardo (Male)
                                          Thiago (Male)
33   Portuguese (European)    pt-PT       Inês/Ines (Female)
                                          Cristiano (Male)
34   Romanian                 ro-RO       Carmen (Female)             No          No         No      Yes
35   Russian                  ru-RU       Tatyana (Female)
                                          Maxim (Male)
36   Spanish (European)       es-ES       Conchita (Female)
                                          Lucia (Female)
                                          Enrique (Male)
                                          Sergio (Male)
37   Spanish (Mexican)        es-MX       Mia (Female)
                                          Andrés (Male)
38   Spanish (US)             es-US       Lupe** (Female)
                                          Penélope/Penelope (Female)
                                          Miguel (Male)
                                          Pedro (Male)
39   Swedish                  sv-SE       Astrid (Female)
                                          Elin (Female)
40   Turkish                  tr-TR       Filiz (Female)
                                          Burcu (Female)
41   Welsh                    cy-GB       Gwyneth (Female)            No          No         No      Yes

Rows that list more than one voice do not show per-voice engine support here. For those voices, see
the voice engine topics listed after this table.

* This voice is bilingual. For more information, see Bilingual voices.

** These voices can be used with Newscaster speaking styles when used with the Neural format. For

more information, see Newscaster voices.

Each Amazon Polly voice engine has unique features. Learn more about features and Region

availability for the voice engines oﬀered by Amazon Polly:

• Generative voices

• Long-form voices

• Neural voices

• Standard voices

Brand voices

In addition to the available voices listed in the previous table, you can use Amazon Polly to build a

custom voice for your brand persona. With a brand voice, you can oﬀer unique and exclusive voices

to your customers. To learn more about Amazon Polly brand voices, see Brand Voice.

Voice speed

Because of the natural variation between voices, each available voice speaks at a slightly different
speed. For instance, with US English voices, Ivy and Joanna are slightly faster than Matthew, and
considerably faster than Joey. Because voices vary this much, there is no standard
speed (words per minute) available for Amazon Polly voices. However, you can find how long it
takes for your voice to say the selected text using Speech Marks.

To time the length of a spoken text passage

1. Open the AWS CLI.

2. Run the following command, replacing the bracketed placeholders as needed.

aws polly synthesize-speech \
    --language-code [optional language code, if needed] \
    --output-format json \
    --voice-id [name of desired voice] \
    --text '[desired text]' \
    --speech-mark-types='["viseme"]' \
    LengthOfText.txt

3. Open LengthOfText.txt.

If the text were "Mary had a little lamb," the last few lines returned by Amazon Polly would be:

{"time":882,"type":"viseme","value":"t"}

{"time":964,"type":"viseme","value":"a"}

{"time":1082,"type":"viseme","value":"p"}

The last viseme, essentially the sound for the final letters in "lamb," starts 1082 milliseconds after
the beginning of the speech. This is not exactly the length of the audio, but it's close enough to
serve as a basis for comparison between voices.
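Reading the last timestamp can be automated. The following is a minimal sketch that parses speech mark output, one JSON object per line, and returns the latest time value. The three sample lines are the ones shown above, and last_mark_time is a hypothetical helper.

```python
import json

# The last few speech marks from the example above, one JSON object per line.
speech_marks = """\
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
"""


def last_mark_time(marks_text):
    """Return the largest "time" value in milliseconds, or None if empty."""
    times = [json.loads(line)["time"]
             for line in marks_text.splitlines() if line.strip()]
    return max(times) if times else None


print(last_mark_time(speech_marks))  # 1082
```

Running the same text through several voices and comparing the returned values gives a rough ranking of their speaking speeds.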

Changing your voice speed

For some applications, you might want a voice to speak more slowly or more quickly. Amazon Polly
lets you change the speed of a voice using SSML tags. For example, if your organization is building
an application that reads books to immigrant audiences who speak English with limited fluency, you
can slow down the rate of speech using the SSML <prosody> tag.

You can use a percentage:

<speak>

In some cases, it might help your audience to <prosody rate="85%">slow

the speaking rate slightly to aid in comprehension.</prosody>

</speak>

Or a preset speed:

<speak>

In some cases, it might help your audience to <prosody rate="slow">slow

the speaking rate slightly to aid in comprehension.</prosody>

</speak>

Two speed options are available to you when using SSML with Amazon Polly:

• Preset speeds: x-slow, slow, medium, fast, and x-fast. The speed of each option is
approximate and depends on the voice you choose. The medium option is the normal
speed of the voice.

• n% of speech rate: any percentage of the speech rate between 20% and 200%. You can request
exactly the rate you want, but the actual speed of the voice is approximate and depends on the
voice you've chosen. 100% is the normal speed of the voice.

Note

Test your selected voice at various speeds. The speed of each option is approximate and

depends on the voice you choose.

For more information on using the prosody tag, see Controlling volume, speaking rate, and pitch.
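The two rate options can be wrapped in a small helper that checks the value before building the SSML. This is a minimal sketch: prosody_rate_ssml is a hypothetical name, and the 20 to 200 range and the preset names are the ones given in this section.

```python
from xml.sax.saxutils import escape

PRESET_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def prosody_rate_ssml(text, rate):
    """Wrap text in <prosody rate="..."> using a preset name or a percentage."""
    if isinstance(rate, int):
        if not 20 <= rate <= 200:
            raise ValueError("percentage must be between 20 and 200")
        rate_value = "{}%".format(rate)
    elif rate in PRESET_RATES:
        rate_value = rate
    else:
        raise ValueError("unknown rate: {!r}".format(rate))
    # Escape &, <, and > so the input text cannot break the SSML markup.
    return '<speak><prosody rate="{}">{}</prosody></speak>'.format(
        rate_value, escape(text))


print(prosody_rate_ssml("Slow the speaking rate slightly.", 85))
print(prosody_rate_ssml("Slow the speaking rate slightly.", "slow"))
```

The resulting string would be sent as SSML input, for example with the text type set to ssml in a synthesis request.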

Bilingual voices

Amazon Polly has two ways of producing bilingual voices:

• Accented bilingual voices

• Fully bilingual voices

Accented bilingual voices

Accented bilingual voices can be created using any Amazon Polly voice, but only when using SSML

tags.

Normally, all words in the input text are spoken in the default language of the voice
you're using.

For example, if you're using the voice of Joanna (who speaks US English), Amazon Polly speaks the

following in the Joanna voice without a French accent:

<speak>

Why didn't she just say, 'Je ne parle pas français?'

</speak>

In this case, the words Je ne parle pas français are spoken as they would be if they were English.

However, if you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the

Joanna voice in American-accented French:

<speak>

Why didn't she just say, <lang xml:lang="fr-FR">'Je ne parle pas français?'</lang>.

</speak>

Because Joanna is not a native French voice, pronunciation is based on her native language, US English. For instance, although perfect French pronunciation features a uvular trill /R/ in the word français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.

If you use the voice of Giorgio, who speaks Italian, with the following text, Amazon Polly speaks the

sentence in Giorgio's voice with an Italian pronunciation:

<speak>

Mi piace Bruce Springsteen.

</speak>
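As a rough sketch of the <lang> usage shown earlier in this section, the following helper wraps only the foreign-language fragment in a <lang> tag. The function name with_lang and its parameters are hypothetical and exist only for this illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def with_lang(before: str, foreign: str, lang_code: str, after: str = "") -> str:
    """Build SSML in which only `foreign` is tagged with xml:lang (hypothetical helper)."""
    return (
        "<speak>"
        f"{escape(before)}"
        f"<lang xml:lang={quoteattr(lang_code)}>{escape(foreign)}</lang>"
        f"{escape(after)}"
        "</speak>"
    )


print(with_lang("Why didn't she just say, ", "'Je ne parle pas français?'", "fr-FR", "."))
```

Text outside the tag is still spoken in the default language of the selected voice.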

Fully bilingual voices

A fully bilingual voice like Aditi or Kajal (Indian English and Hindi) can speak two languages fluently. This gives you the ability to use words and phrases from both languages in a single text using the same voice.

Currently, Aditi, Kajal, Hala, and Zayd are the only fully bilingual voices available.

Using a Bilingual Voice (example: Aditi)

Aditi speaks both Indian English (en-IN) and Hindi (hi-IN) fluently. You can synthesize speech in both English and Hindi, and the voice can switch between the two languages even within the same sentence.

Hindi can be used in two different forms:

• Devanagari: "उसने कहाँ, खेल तोह अब शुरू होगा"

• Romanagari (using the Latin alphabet): "Usne kahan, khel toh ab shuru hoga"

Additionally, it's possible to mix English and Hindi of either or both forms within a single sentence:

• Devanagari + English: "This is the song कभी अदिति"

• Romanagari + English: "This is the song from the movie Jaane Tu Ya Jaane Na."

• Devanagari + Romanagari + English: "This is the song कभी अदिति from the movie Jaane Tu Ya Jaane Na."

Because Aditi is a bilingual voice, text in all of these cases is read correctly, because Amazon Polly can differentiate between the languages and scripts.

Amazon Polly also supports numbers, dates, times, and currency expansion in both English (Arabic

numerals) and Hindi (Devanagari numerals). By default, Arabic numerals are read in Indian English.

To make Amazon Polly read them in Hindi, you must use the hi-IN language code parameter.
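The language code parameter mentioned above could be supplied roughly as follows. This sketch only assembles the request parameters (the names match the SynthesizeSpeech API parameters Text, VoiceId, LanguageCode, and OutputFormat); the helper name hindi_request and the sample sentence are assumptions for illustration.

```python
def hindi_request(text: str) -> dict:
    """Keyword arguments for a SynthesizeSpeech call that reads numerals in Hindi.

    Hypothetical helper: it builds the parameters but does not call AWS.
    """
    return {
        "Text": text,
        "VoiceId": "Aditi",
        # Without this parameter, Arabic numerals are read in Indian English.
        "LanguageCode": "hi-IN",
        "OutputFormat": "mp3",
    }


# Made-up sample text mixing Romanagari and an Arabic numeral.
params = hindi_request("Mere paas 25 kitabein hain")
print(params)
# With boto3 installed and credentials configured, these could be passed as:
#   boto3.client("polly").synthesize_speech(**params)
```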

Newscaster voices

People use different speaking styles, depending on context. Casual conversation, for example, sounds very different from a TV or radio newscast. Because of the way standard voices are made, they can't produce different speaking styles. However, neural voices can. They can be trained for a specific speaking style, with the variations and emphasis on certain parts of speech inherent in that style.

In addition to the default neural voices, Amazon Polly provides a newscaster speaking style that

uses the neural system to generate speech in the style of a TV or radio newscaster. The Newscaster

style is available with the Matthew and Joanna voices in US English (en-US), the Lupe voice in US

Spanish (es-US), and the Amy voice in British English (en-GB).

To use the Newscaster style, first choose the neural engine and then use the syntax described in the following steps in your input text.

Note

• To use any neural speaking style, you must use one of the AWS Regions that support

neural voices. This option is not available in all Regions. For more information, see

Feature and region compatibility.

To apply the Newscaster style (console)

1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. Make sure that you are using an AWS Region where neural voices are supported.

3. On the Text-to-Speech page, for Engine, choose Neural.

4. Choose the language and voice you want to use. Only Matthew and Joanna for US English (en-US), Lupe for US Spanish (es-US), and Amy for British English (en-GB) support the Newscaster style.

5. Turn on SSML.

6. Add input text to your text-to-speech request using the Newscaster style SSML syntax.

<amazon:domain name="news">text</amazon:domain>

For example, you might use the newscaster tag as follows:

<speak>

<amazon:domain name="news">

From the Tuesday, April 16th, 1912 edition of The Guardian newspaper:

The maiden voyage of the White Star liner Titanic, the largest ship ever launched

ended in disaster.

The Titanic started her trip from Southampton for New York on Wednesday. Late on

Sunday night she struck an iceberg off the Grand Banks of Newfoundland. By

wireless telegraphy she sent out signals of distress, and several liners were

near enough to catch and respond to the call.

</amazon:domain>

</speak>

7. Choose Listen.

To apply the Newscaster style (CLI)

1. In your API request, include the engine parameter with the neural value:

--engine neural

2. Add input text to your API request using the Newscaster style SSML syntax.

<amazon:domain name="news">text</amazon:domain>

For example, you might use the newscaster tag as follows:

<speak>

<amazon:domain name="news">

From the Tuesday, April 16th, 1912 edition of The Guardian newspaper:

The maiden voyage of the White Star liner Titanic, the largest ship ever launched

ended in disaster.

The Titanic started her trip from Southampton for New York on Wednesday. Late on

Sunday night she struck an iceberg off the Grand Banks of Newfoundland. By

wireless telegraphy she sent out signals of distress, and several liners were

near enough to catch and respond to the call.

</amazon:domain>

</speak>

For more information about SSML, see Supported SSML tags.
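The CLI steps above can be put together roughly as follows. This is a sketch under stated assumptions: the helper names, the default output file name news.mp3, and the choice to build the command as an argument list are illustrative only. The flags used (--engine, --voice-id, --text-type, --text, --output-format) belong to the aws polly synthesize-speech command.

```python
import shlex
from xml.sax.saxutils import escape

# Voices that this section lists as supporting the Newscaster style.
NEWSCASTER_VOICES = {"Matthew", "Joanna", "Lupe", "Amy"}


def newscaster_ssml(text: str) -> str:
    """Wrap text in the Newscaster style SSML syntax."""
    return f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'


def newscaster_command(text: str, voice: str, outfile: str = "news.mp3") -> list:
    """Build (but do not run) an AWS CLI command line; hypothetical helper."""
    if voice not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice} is not listed as a Newscaster voice")
    return [
        "aws", "polly", "synthesize-speech",
        "--engine", "neural",          # step 1: neural engine
        "--voice-id", voice,
        "--text-type", "ssml",
        "--text", newscaster_ssml(text),  # step 2: Newscaster SSML
        "--output-format", "mp3",
        outfile,
    ]


print(shlex.join(newscaster_command("The maiden voyage ended in disaster.", "Joanna")))
```

Building the command as a list keeps the SSML as a single argument, so its quotes and angle brackets do not need manual shell escaping when passed to a process launcher.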

Languages in Amazon Polly

The following languages are supported by Amazon Polly and can be used to synthesize speech.

Each language has a unique language code. These language codes are W3C language identification tags (ISO 639-3 for the language name and ISO 3166 for the country code).

Select a language from the following table for details on the phonemes and visemes that Amazon

Polly provides.

 Language Language code

1 Arabic arb

2 Arabic (Gulf) ar-AE

3 Catalan ca-ES

4 Chinese (Cantonese) yue-CN

5 Chinese (Mandarin) cmn-CN

6 Danish da-DK

7 Dutch (Belgian) nl-BE

8 Dutch nl-NL

9 English (Australian) en-AU

10 English (British) en-GB

11 English (Indian) en-IN

12 English (New Zealand) en-NZ

13 English (South African) en-ZA

14 English (US) en-US

15 English (Welsh) en-GB-WLS

16 Finnish fi-FI

17 French fr-FR

18 French (Belgian) fr-BE

19 French (Canadian) fr-CA

20 Hindi hi-IN

21 German de-DE

22 German (Austrian) de-AT

23 Icelandic is-IS

24 Italian it-IT

25 Japanese ja-JP

26 Korean ko-KR

27 Norwegian nb-NO

28 Polish pl-PL

29 Portuguese (Brazilian) pt-BR

30 Portuguese (European) pt-PT

31 Romanian ro-RO

32 Russian ru-RU

33 Spanish (European) es-ES

34 Spanish (Mexican) es-MX

35 Spanish (US) es-US

36 Swedish sv-SE

37 Turkish tr-TR

38 Welsh cy-GB

For more information, see Phoneme and Viseme Tables for Supported Languages.

Phoneme and Viseme Tables for Supported Languages

The following tables list the phonemes for the languages supported by Amazon Polly, along with

examples and the corresponding visemes.

Topics

• Arabic (arb)

• Arabic (Gulf) (ar-AE)

• Catalan (ca-ES)

• Chinese (Cantonese) (yue-CN)

• Chinese (Mandarin) (cmn-CN)

• Danish (da-DK)

• Dutch (Belgian) (nl-BE)

• Dutch (nl-NL)

• English (US) (en-US)

• English (Australian) (en-AU)

• English (British) (en-GB)

• English (Indian) (en-IN)

• English (Ireland) (en-IE)

• English (New Zealand) (en-NZ)

• English (South African) (en-ZA)

• English (Welsh) (en-GB-WLS)

• Finnish (fi-FI)

• French (fr-FR)

• French (Belgian) (fr-BE)

• French (Canadian) (fr-CA)

• German (de-DE)

• German (Austrian) (de-AT)

• Hindi (hi-IN)

• Icelandic (is-IS)

• Italian (it-IT)

• Japanese (ja-JP)

• Korean (ko-KR)

• Norwegian (nb-NO)

• Polish (pl-PL)

• Portuguese (pt-PT)

• Portuguese (Brazilian) (pt-BR)

• Romanian (ro-RO)

• Russian (ru-RU)

• Spanish (es-ES)

• Spanish (Mexican) (es-MX)

• Spanish (US) (es-US)

• Swedish (sv-SE)

• Turkish (tr-TR)

• Welsh (cy-GB)

Arabic (arb)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Arabic voice of Zeina that is supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

? glottal stop

انَأ

?\ voiced pharyngeal

fricative

رَمُع

b b voiced bilabial

plosive

دَلَب

d d voiced alveolar

plosive

يراد

dˤ

d_?\ emphatic voiced

alveolar plosive

ءوَض

d͡ʒ

dZ voiced postalveo

lar aﬀricate

ليمَج

ð D voiced dental

fricative

َكِلذ

ðˤ

D_?\ emphatic voiced

dental fricative

مالَظ

f f voiceless labiodent

al fricative

لصَف

g voiced velar

plosive

ارتلجنإ

G voiced velar

fricative

برَغ

h h voiceless glottal

fricative

اذه

j j palatal approximant

يشمَي

k k voiceless velar

plosive

بلَك

l l alveolar lateral

approximant

ىقال

lˠ

l_G emphatic alveolar

lateral approximant

هللادبع

m m bilabial nasal

اذام

n n alveolar nasal

رون

p p voiceless bilabial

plosive

سبَح

q q voiceless uvular

plosive

بيرَق

r r alveolar trill

لمَر

s s voiceless alveolar

fricative

لاؤُس

sˤ

s_?\ emphatic voiceless

alveolar fricative

بِحاص

S voiceless postalveo

lar fricative

ركُش

t t voiceless alveolar

plosive

رمَت

tˤ

t_?\ emphatic voiceless

alveolar plosive

بِلاط

θ T voiceless dental

fricative

ثالَث

v v voiced labiodental

fricative

نيماتيف

w w labio-velar

approximant

دَلَو

x x voiceless velar

fricative

فْوَخ

ħ X\ voiceless

pharyngeal

fricative

َلْوَح

z z voiced alveolar

fricative

روهُز

Vowels

a a open front

unrounded vowel

درَب

aː

a: long open front

unrounded vowel

راد

ɑˤ

A_?\ emphatic open

back unrounded

vowel

لبَط

ɑˤː

A_?\: emphatic long

open back

unrounded vowel

مِلاظ

u u close back

rounded vowel

برُش

u: u: long close back

rounded vowel

روس

uˤ

u_?\ emphatic close

back rounded

vowel

ّدُب

uˤː

u_?\: emphatic long

close back

rounded vowel

لوط

i i close front

unrounded vowel

تنِب

iː

i: long close front

unrounded vowel

نيزَح

iˤ

i_?\ emphatic close

front unrounded

vowel

ّدِض

iˤː

i_?\: emphatic long

close front

unrounded vowel

يضام

e e close-mid front

unrounded vowel

تكرام

eː

e: long close-mid

front unrounded

vowel

ليدوم

O open-mid back

rounded vowel

يجولونكت

ɔː

O: long open-mid

back rounded

vowel

نويزفيلت

Arabic (Gulf) (ar-AE)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Arabic voice of Hala that is supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Pronunciation Viseme

Consonants

b b voiced bilabial

plosive

دلب

/ " b a . l a d / b

d d voiced alveolar

plosive

در

/ " r a d d / d

dˤ

d_?\ pharyngea

lised voiced

alveolar

plosive

ءوض

/ " d_?\ a w ? / D

f f voiceless

labiodental

fricative

نرف

/ " f I . r I n / f

g g voiced velar

plosive

لاق

/ " g a: l / k

j j voiced palatal

approximant

يشمي

/ " j I m . S i: / i

k k voiceless velar

plosive

لماك

/ " k a: . m i l / k

l l voiced alveolar

lateral

approximant

ليل

/ " l e: l / t

lˤ

l_G pharyngea

lised voiced

alveolar lateral

approximant

هللادبع

/ ?\ a b . " d

A_?\ l_G . l_G

A_?\ /

m m bilabial nasal

stop

ةئم

/ " m I j . j a / p

n n alveolar nasal

stop

رون

/ " n u: r / t

p p voiceless

bilabial plosive

اربوأ

/ " ? O . p e . r

a: /

q q voiceless

uvular plosive

رصق

/ " q A_?\ s_?\

r /

r r alveolar trill

لمر

/ " r a . m I l / r

s s voiceless

alveolar

fricative

مسمس

/ " s I m . s I

m /

sˤ

s_?\ pharyngea

lised voiceless

alveolar

fricative

بحاص

/ " s_?\ A_?: . X

\ I b /

t t voiceless

alveolar

plosive

رمت

/ "t a . m a r / t

tˤ

t_?\ pharyngea

lised voiceless

alveolar

plosive

بلاط

/ " t_?\ A_?: . l I

b /

v v voiced

labiodental

fricative

نيماتيف

/ v i: . t A . " m

i: n /

w w voiced

labiovelar

approximant

دياو

/ " w a: . j I d / u

x x voiceless velar

fricative

فورخ

/ x a . " r u: f / k

z z voiced alveolar

fricative

روهز

/ " z h u: r / s

ð D voiced

interdental

fricative

كلذ

/ " D a: . l I k / D

ðˤ

D_?\ pharyngea

lised voiced

interdental

fricative

مالظ

/ D_?\ A_?\ . " l

a: m /

ħ X\ voiceless

pharyngeal

fricative

نيجلا

/ ? a l . " X\ i:

n /

ŋ N velar nasal

stop

غنوك

غنوه

/ h O N . " k O

N g /

G voiced velar

fricative

ةبيرغ

/ G I . " r i: . b

a /

S voiceless

postalveolar

fricative

سمش

/ " S a m s / S

Z voiced

postalveolar

fricative

تيكاج

/ Z a . " k e: t / S

? glottal stop

ةسسؤم

/ m u . " ? a s .

s a . s a /

?\ voiced

pharyngeal

fricative

ماع

/ " ?\ a: m m / k

dZ voiced

postalveolar

aﬀricate

ةعماج

/ " dZ a: m . ?\

a /

θ T voiceless

interdental

fricative

ةثالث

/ T a . " l a: . T

a /

h voiced glottal

fricative

لاله

/ " h l a: l / k

Vowels

æ a mid-open

front

unrounded

short vowel

رفس

/ " s a . f a r / a

ɑˤ

A_?\ pharyngea

lised open

back

unrounded

short vowel

بلص

/ " s_?\ A_?\ l

b /

æː

a: mid-open

front

unrounded

long vowel

باب

/ " b a: b / a

ɑˤː

A_?\: pharyngea

lised open

back

unrounded

long vowel

جضان

/ " n A_?: . D_?

\ i_?\ dZ /

a A open central

unrounded

short vowel

wiﬁ / " w A j . f A j / a

i i tense

close front

unrounded

short vowel

(MSA)

قاحسإ

/ ? i s . " X\ A_?

\: q /

I lax close front

unrounded

short vowel

تنب

/ " b I n t / i

iˤ

i_?\ pharyngea

lised close

front

unrounded

short vowel

لفط

/ " t_?\ i_?\ f I

l /

iː iː

close front

unrounded

long vowel

ليبس

/ s a . " b i: l / i

iˤː

i_?: pharyngea

lised close

front

unrounded

long vowel

بيطر

/ r A_?\ . " t_?\

i_?: b /

u u tense close

back rounded

short vowel

(MSA)

عرتخم

/ " m u x . t a .

r i ?\ /

U lax close back

rounded short

vowel

موسر

/ r U . " s u: m / u

uˤ

u_?\ pharyngea

lised close

back rounded

short vowel

روفصع

/ ?\ u_?\ s_?\ .

" f u: r /

u: u: close back

rounded long

vowel

توت

/ " t u: t / u

uˤː

u_?\: pharyngea

lised close

back rounded

long vowel

روص

/ " s_?\ u_?\:

r /

e e mid front

unrounded

short vowel

تِنْرَتْنِإ

/ " s e n t / e

e: e: mid front

unrounded

long vowel

شيإ

/ " ? e: S / e

O open-mid back

rounded short

vowel

رالود

/ d O . " l A r / O

ɔː

O: open-mid back

rounded long

vowel

نول

/ " l O: n / O

Catalan (ca-ES)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Catalan voice of Arlet that is supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

p p voiceless bilabial

plosive

ploure p

t t voiceless alveolar

plosive

Tarragona t

k k voiceless velar

plosive

com k

b b voiced bilabial

plosive

bata p

d d voiced alveolar

plosive

endoll t

g g voiced velar

plosive

gros k

m m voiced bilabial

nasal

manera p

n n voiced alveolar

nasal

donar t

J voiced palatal

nasal

any J

ŋ N voiced velar nasal pingüí k

5 voiced velarized

alveolar lateral

approximant (dark l)

albercoc l

L voiced palatal

lateral approximant

llop J

r r voiced alveolar trill parra r

4 voiced alveolar tap para t

f f voiceless labiodent

al fricative

èmfasi f

s s voiceless alveolar

fricative

sac s

z z voiced alveolar

fricative

calzes s

S voiceless postalveo

lar fricative

guix S

Z voiced postalveo

lar fricative

col·legi S

t͡ʃ

tS voiceless postalveo

lar aﬀricate

cotxe S

d͡ʒ

dZ voiced postalveo

lar aﬀricate

platja S

β B voiced bilabial

approximant

obert B

ð D voiced dental

approximant

bedoll T

j j voiced palatal

approximant

noia i

G voiced velar

approximant

pega k

v v voiced labiodental

fricative

afgà f

w w voiced labiovelar

approximant

aigua u

x x voiceless velar

fricative

Jiménez k

j\ voiced palatal

fricative

yeso J

l l voiced alveolar

lateral approximant

alondra t

θ T voiceless dental

fricative

González T

Vowels

a a open back vowel casa a

e e close-mid front

unrounded vowel

llenya e

E open-mid front

unrounded vowel

xec E

i i closed front

unrounded vowel

visca i

o o close-mid back

rounded vowel

gos o

O open-mid back

rounded vowel

joc O

u u closed back

rounded vowel

un u

@ mid-central vowel casa @

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Chinese (Cantonese) (yue-CN)

The following table lists the Jyutping and International Phonetic Alphabet (IPA) phonemes for

the Cantonese voice that is supported by Amazon Polly. Jyutping is a romanization system of

Cantonese which is commonly used in academia and among Cantonese speakers. IPA and X-SAMPA

are not commonly used but are available for English support. The IPA and X-SAMPA symbols in the

table are for reference only and should not be used for Chinese transcription. Jyutping examples

and the corresponding visemes are also shown.

To make Amazon Polly use phonetic pronunciation with Jyutping, use the phoneme alphabet="x-amazon-jyutping" tag.

The following examples show this with each standard.

Jyutping:

<speak>

## <phoneme alphabet="x-amazon-jyutping" ph="sing2">#</phoneme>#

## <phoneme alphabet="x-amazon-jyutping" ph="seng2">#</phoneme>#

</speak>

IPA:

<speak>

## <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>#

## <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>#

</speak>

X-SAMPA:

<speak>

## <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>#

## <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>#

</speak>

Note

Amazon Polly accepts Cantonese input encoded in UTF-8 only.
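The phoneme examples above all follow the same pattern, which can be sketched as a small builder. The helper name phoneme and the set of accepted alphabets are assumptions limited to the three alphabets shown in this section.

```python
from xml.sax.saxutils import escape, quoteattr

# Alphabets used in the examples of this section.
ALPHABETS = {"ipa", "x-sampa", "x-amazon-jyutping"}


def phoneme(word: str, ph: str, alphabet: str) -> str:
    """Build a <phoneme> element for `word` (hypothetical helper)."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"unexpected alphabet: {alphabet!r}")
    # quoteattr() switches to single quotes when the value contains a double
    # quote, which X-SAMPA uses as its primary stress mark.
    return f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(word)}</phoneme>"


print("<speak>" + phoneme("pecan", 'pI"kA:n', "x-sampa") + "</speak>")
```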

Phoneme/Viseme Table

Jyutping IPA X-

SAMPA

Description Jyutping

Example

Viseme

Consonants

b p p voiceless bilabial plosive

巴,

baa1 p

tsʰ

ts_h aspirated voiceless

alveolar aﬀricate

叉,

caa1 s

d t t voiceless alveolar

plosive

打,

daa2 t

f f f voiceless labiodental

fricative

花,

faa1 f

g k k voiceless velar plosive

家,

gaa1 k

kʷ

k_w labialized voiceless velar

plosive

瓜,

gwaa1 u

h h h voiceless glottal

fricative

哈,

haa1 k

kʰ

k_h aspirated voiceless velar

plosive

卡,

kaa1 k

kʷʰ

k_wh labialized aspirated

voiceless velar plosive

誇,

kwaa1 u

l l l alveolar lateral

approximant

啦,

laa1 t

m m m bilabial nasal

媽,

maa1 p

m m m= syllabic bilabial nasal

唔,

m4 p

ng ŋ N velar nasal

牙,

ngaa4 k

ng ŋ N= syllabic velar nasal

吳,

ng4 k

n n n alveolar nasal

拿,

naa4 t

pʰ

p_h aspirated voiceless

bilabial plosive

趴,

paa1 p

s s s voiceless alveolar

fricative

沙,

saa1 s

tʰ

t_h aspirated voiceless

alveolar plosive

他,

taa1 t

w w w labio-velar approximant

娃,

waa1 u

y j j palatal approximant

也,

jaa5 i

z ts ts voiceless alveolar

aﬀricate

渣,

zaa1 s

Vowels

6 near-open central vowel

吉,

gat1 a

A open back unrounded

vowel

家,

gaa1 a

aai

ɑi

Ai diphthong

街,

gaai1 a

aau

ɑu

Au diphthong

交,

gaau1 a

ɐi

6i diphthong

雞,

gai1 a

ɐu

6u diphthong

溝,

kau1 a

E open-mid front

unrounded vowel

爹,

de1 E

ei ei ei diphthong

基,

gei1 e

8 close-mid central

rounded vowel

春,

ceon1 o

eoi

ɵy

8y diphthong

居,

geoi1 o

ɛu

Eu diphthong

掉

掉垃圾,

deu6

i i i close front unrounded

vowel

斯,

si1 i

i I l near-close near-front

unrounded vowel

激,

gik1 i

iu iu iu diphthong

驕,

giu1 i

O open-mid back rounded

vowel

哥,

go1 O

oe œ 9 open-mid front rounded

vowel

鋸,

goe3 O

ɔi

Oi diphthong

該,

goi1 O

ou ou ou diphthong

高,

gou1 o

u u u close back rounded

vowel

姑,

gu1 u

U near-close near-back

rounded vowel

谷,

guk5 u

ui ui ui diphthong

攰,

gui6 u

yu y y close front rounded

vowel

於,

jyu1 u

Tone marks and Additional Symbols

1  high level

詩,

si1 

2  medium rising

史,

si2 

3  medium level

試,

si3 

4  very low level

時,

si4 

5  low rising

市,

si5 

6  low level

是,

si6 

- . . syllable boundary

語音

jyu5-

jam1
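To illustrate how the tone digits and the syllable boundary in the table combine, the following sketch splits a Jyutping string such as jyu5-jam1 into syllables and tone numbers. The function name split_jyutping and the strict lowercase pattern are assumptions for this example, not a description of how Amazon Polly parses input.

```python
import re

# Tone descriptions from the table above.
TONES = {
    1: "high level",
    2: "medium rising",
    3: "medium level",
    4: "very low level",
    5: "low rising",
    6: "low level",
}

# One syllable: lowercase letters followed by a single tone digit 1-6.
_SYLLABLE = re.compile(r"([a-z]+)([1-6])")


def split_jyutping(ph: str) -> list:
    """Split 'jyu5-jam1' into [('jyu', 5), ('jam', 1)] (hypothetical helper)."""
    result = []
    for syllable in ph.split("-"):  # "-" marks a syllable boundary
        match = _SYLLABLE.fullmatch(syllable)
        if match is None:
            raise ValueError(f"not a Jyutping syllable: {syllable!r}")
        result.append((match.group(1), int(match.group(2))))
    return result


for letters, tone in split_jyutping("jyu5-jam1"):
    print(letters, tone, TONES[tone])
```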

Chinese (Mandarin) (cmn-CN)

The following table lists the Pinyin and International Phonetic Alphabet (IPA) phonemes for the

Mandarin Chinese voice that is supported by Amazon Polly. Pinyin is the international standard for

Standard Chinese romanization. IPA and X-SAMPA are not commonly used but are available for

English support. The IPA and X-SAMPA symbols in the table are for reference only and should not

be used for Chinese transcription. Pinyin examples and the corresponding visemes are also shown.

To make Amazon Polly use phonetic pronunciation with Pinyin, use the phoneme alphabet="x-amazon-pinyin" tag.

The following examples show this with each standard.

Pinyin:

<speak>

## <phoneme alphabet="x-amazon-pinyin" ph="bo2">#</phoneme>#

## <phoneme alphabet="x-amazon-pinyin" ph="bao2">#</phoneme>#

</speak>

IPA:

<speak>

## <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>#

## <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>#

</speak>

X-SAMPA:

<speak>

## <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>#

## <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>#

</speak>

Note

Amazon Polly accepts Mandarin Chinese input encoded in UTF-8 only. The GB 18030

encoding standard is not currently supported by Amazon Polly.
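If source text is stored in GB 18030, one possible approach is to re-encode it as UTF-8 before sending it to Amazon Polly. The sketch below uses Python's built-in gb18030 codec; the helper name to_utf8 is an assumption for this example.

```python
def to_utf8(raw: bytes, source_encoding: str = "gb18030") -> bytes:
    """Re-encode bytes from a legacy encoding to UTF-8 (hypothetical helper)."""
    # Decode to a Python string first, then encode as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


legacy_bytes = "语音".encode("gb18030")  # simulate text read from a GB 18030 file
utf8_bytes = to_utf8(legacy_bytes)
print(utf8_bytes.decode("utf-8"))
```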

Phoneme/Viseme Table

Pinyin IPA X-

SAMPA

Description Pinyin

Example

Viseme

Consonants

f f f voiceless labiodental

fricative

发,

fa1 f

h h h voiceless glottal

fricative

和,

he2 k

g k k voiceless velar plosive

古,

gu3 k

kʰ

k_h aspirated voiceless velar

plosive

苦,

ku3 k

l l l alveolar lateral

approximant

拉,

la1 t

m m m bilabial nasal

骂,

ma4 p

n n n alveolar nasal

那,

na4 t

ng ŋ N velar nasal

正,

zheng4 k

b p p voiceless bilabial plosive

爸,

ba4 p

pʰ

p_h aspirated voiceless

bilabial plosive

怕,

pa4 p

s s s voiceless alveolar

fricative

四,

si4 s

s\ voiceless alveolo-palatal

fricative

西,

xi1 J

s` voiceless retroﬂex

fricative

是,

shi4 S

d t t voiceless alveolar

plosive

打,

da3 t

tʰ

t_h aspirated voiceless

alveolar plosive

他,

ta1 t

ʈ͡ʂ

t`s` voiceless retroﬂex

aﬀricate

之,

zhi1 S

ʈ͡ʂʰ

t`s`_h aspirated voiceless

retroﬂex aﬀricate

吃,

chi1 S

t͡s

ts voiceless alveolar

aﬀricate

字,

zi4 s

t͡ɕ

ts\ voiceless alveolo-palatal

aﬀricate

鸡,

ji1 J

t͡ɕʰ

ts\_h aspirated voiceless

alveolo-palatal aﬀricate

七,

qi1 J

t͡sʰ

ts_h aspirated voiceless

alveolar aﬀricate

次,

ci4 s

w w w labio-velar approximant

我,

wo3 u

z` voiced retroﬂex fricative

日,

ri4 S

"er" and "r" colored syllables

@` r-coloured mid central

vowel

二,

er4 @

-r r-colored syllable

馅儿,

xianr4 @

Vowels

7 close-mid back

unrounded vowel

恶,

e4 e

@ mid central vowel

恩,

en1 @

a a a open front unrounded

vowel

安,

an1 a

aɪ

aI diphthong

爱,

ai4 a

aʊ

aU diphthong

奥,

ao4 a

eɪ

e diphthong

诶,

ei4 e

E open-mid front

unrounded vowel

姐,

jie3 E

i i i close front unrounded

vowel

鸡,

ji1 i

oʊ

oU diphthong

欧,

ou1 o

O open-mid back rounded

vowel

哦,

o4 o

u u u close back rounded

vowel

主,

zhu3 u

yu y y close front rounded

vowel

于,

yu2 u

Tone marks and Additional Symbols

1  high level tone

淤,

yu1 

2  rising tone

鱼,

yu2 

3  low (falling-rising) tone

语,

yu3 

4  falling tone

育,

yu4 

0  neutral tone

的,

de0 

- . . syllable boundary

语音

yu3-yin1

Danish (da-DK)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Danish voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

bat p

d d voiced alveolar

plosive

da t

ð D voiced dental

fricative

mad, thriller T

f f voiceless labiodent

al fricative

fat f

g g voiced velar

plosive

gat k

h h voiceless glottal

fricative

hat k

j j palatal approximant

jo i

k k voiceless velar

plosive

kat k

l l alveolar lateral

approximant

ladt t

m m bilabial nasal mat p

n n alveolar nasal nay t

ŋ N velar nasal lang k

p p voiceless bilabial

plosive

pande p

r r alveolar trill thriller, story r

R voiced uvular

fricative

rat k

s s voiceless alveolar

fricative

sat s

t t voiceless alveolar

plosive

tal t

v v voiced labiodental

fricative

vat f

w w labial-velar

approximant

hav, weekend u

Vowels

ø 2 close-mid front

rounded vowel

øst o

ø: 2: long close-mid

front rounded

vowel

øse o

6 near-open central

vowel

mor a

œ 9 open-mid front

rounded vowel

skøn, grønt O

œ: 9: long open-mid

front rounded

vowel

høne, gøre O

@ mid central vowel ane @

æː

{: long near-open

front unrounded

vowel

male a

a a open front

unrounded vowel

man a

æ { near-open front

unrounded vowel

adresse a

A open back

unrounded vowel

lak, tak a

ɑ:

A: long open back

unrounded vowel

rase a

e e close-mid front

unrounded vowel

midt e

e: e: long close-mid

front unrounded

vowel

mele e

E open-mid front

unrounded vowel

mæt E

ɛ:

E: long open-mid

front unrounded

vowel

mæle E

i i close front

unrounded vowel

mit i

i: i: long close front

unrounded vowel

mile i

o o close-mid back

rounded vowel

foto o

o: o: long close-mid

back rounded

vowel

mole o

O open-mid back

rounded vowel

mund O

ɔ:

O: long open-mid

back rounded

vowel

måle O

ɒː

Q: long open back

rounded vowel

morse O

u u close back

rounded vowel

lusk u

u: u: long close back

rounded vowel

mule u

V open-mid back

unrounded vowel

kører E

y y close front

rounded vowel

yt u

y: y: long close front

rounded vowel

hyle u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Dutch (Belgian) (nl-BE)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Belgian Dutch (Flemish) voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

bak p

d d voiced alveolar

plosive

dak t

d͡ʒ

dZ voiced postalveo

lar aﬀricate

manager S

f f voiceless labiodent

al fricative

fel f

g g voiced velar

plosive

goal k

G voiced velar

fricative

hoed k

h\ voiced glottal

fricative

hand k

j j palatal approximant

ja i

k k voiceless velar

plosive

kap k

l l alveolar lateral

approximant

land t

m m bilabial nasal met p

n n alveolar nasal net t

ŋ N velar nasal bang k

p p voiceless bilabial

plosive

pak p

r r alveolar trill rand r

s s voiceless alveolar

fricative

sein s

S voiceless postalveo

lar fricative

show S

t t voiceless alveolar

plosive

tak t

v v voiced labiodental

fricative

vel f

v\ labiodental

approximant

wit f

x x voiceless velar

fricative

toch k

z z voiced alveolar

fricative

ziin s

Z voiced postalveo

lar fricative

bagage S

Vowels

øː

2: long close-mid

front rounded

vowel

neus o

œy 9y diphthong buit O

@ mid central vowel de @

a: a: long open front

unrounded vowel

baad a

ɑ:

A open back

unrounded vowel

bad a

e: e: long close-mid

front unrounded

vowel

beet e

ɜː

3: long open-mid

central unrounded

vowel

barrière E

E open-mid front

unrounded vowel

bed E

ɛi

Ei diphthong beet E

i i close front

unrounded vowel

vier i

I near-close near-

front unrounded

vowel

pit i

o: o: long close-mid

back rounded

vowel

boot o

O open-mid back

rounded vowel

pot O

u u close back

rounded vowel

hoed u

ʌu

Vu diphthong fout E

yː

y: long close front

rounded vowel

fuut u

Y near-close near-

front rounded

vowel

hut u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Dutch (nl-NL)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Dutch voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bak p

d d voiced alveolar plosive dak t

d͡ʒ dZ voiced postalveolar affricate manager S

f f voiceless labiodental fricative fel f

g g voiced velar plosive goal k

ɣ G voiced velar fricative hoed k

ɦ h\ voiced glottal fricative hand k

j j palatal approximant ja i

k k voiceless velar plosive kap k

l l alveolar lateral approximant land t

m m bilabial nasal met p

n n alveolar nasal net t

ŋ N velar nasal bang k

p p voiceless bilabial plosive pak p

r r alveolar trill rand r

s s voiceless alveolar fricative sein s

ʃ S voiceless postalveolar fricative show S

t t voiceless alveolar plosive tak t

v v voiced labiodental fricative vel f

ʋ v\ labiodental approximant wit f

x x voiceless velar fricative toch k

z z voiced alveolar fricative ziin s

ʒ Z voiced postalveolar fricative bagage S

Vowels

øː 2: long close-mid front rounded vowel neus o

œy 9y diphthong buit O

ə @ mid central vowel de @

a: a: long open front unrounded vowel baad a

ɑ: A open back unrounded vowel bad a

e: e: long close-mid front unrounded vowel beet e

ɜː 3: long open-mid central unrounded vowel barrière E

ɛ E open-mid front unrounded vowel bed E

ɛi Ei diphthong beet E

i i close front unrounded vowel vier i

ɪ I near-close near-front unrounded vowel pit i

o: o: long close-mid back rounded vowel boot o

ɔ O open-mid back rounded vowel pot O

u u close back rounded vowel hoed u

ʌu Vu diphthong fout E

yː y: long close front rounded vowel fuut u

ʏ Y near-close near-front rounded vowel hut u

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma
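
The viseme column of the Dutch table can be read as a lookup from X-SAMPA symbol to mouth shape. The following is a minimal sketch under stated assumptions: `DUTCH_VISEMES` holds only a hand-copied subset of the rows, `visemes_for` is a hypothetical helper name, and the phoneme sequence given for "bak" is an illustrative guess rather than output from Amazon Polly.

```python
# Subset of the Dutch (nl-NL) table: X-SAMPA symbol -> viseme.
# Only a few rows are copied here; extend the dict as needed.
DUTCH_VISEMES = {
    "b": "p", "d": "t", "k": "k", "p": "p", "t": "t",
    "A": "a", "a:": "a", "E": "E", "I": "i", "O": "O",
}


def visemes_for(phonemes):
    """Map a list of X-SAMPA symbols to visemes.

    Raises KeyError if a symbol is not in the subset above.
    """
    return [DUTCH_VISEMES[p] for p in phonemes]


# Assumed phoneme sequence for the example word "bak".
print(visemes_for(["b", "A", "k"]))
```

Several phonemes share one viseme (for example `b` and `p` both map to `p`), so the mapping cannot be reversed.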

English (US) (en-US)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the American English voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay l

m m bilabial nasal mouse p

n n alveolar nasal nap t

ŋ N velar nasal thing k

p p voiceless bilabial plosive speak p

ɹ r\ alveolar approximant red r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive trap t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

z z voiced alveolar fricative zero s

ʒ Z voiced postalveolar fricative vision S

Vowels

ə @ mid-central vowel arena @

ɚ @` mid-central r-colored vowel reader @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑ A long open-back unrounded vowel father a

eɪ eI diphthong face e

ɝ 3` open mid-central unrounded r-colored vowel nurse E

ɛ E open mid-front unrounded vowel dress E

i i long close front unrounded vowel fleece i

ɪ I near-close near-front unrounded vowel kit i

oʊ oU diphthong goat o

ɔ O long open mid-back rounded vowel thought O

ɔɪ OI diphthong choice O

u u long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʌ V open-mid-back unrounded vowel strut E

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma
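
The X-SAMPA primary stress symbol `"` is also the character that delimits XML attribute values, so it needs escaping when a transcription is embedded in markup. The following is a minimal sketch, assuming the transcription is supplied through the SSML `<phoneme>` tag with `alphabet="x-sampa"`; the helper name `phoneme_tag` and the transcription used for "pecan" are illustrative assumptions, not taken from this guide.

```python
from html import escape


def phoneme_tag(word: str, ph: str, alphabet: str = "x-sampa") -> str:
    """Wrap `word` in an SSML <phoneme> tag that carries the transcription `ph`."""
    if alphabet not in ("ipa", "x-sampa"):
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    # escape(..., quote=True) turns the X-SAMPA stress mark " into &quot;
    # so the ph attribute stays well-formed XML.
    safe_ph = escape(ph, quote=True)
    return f'<phoneme alphabet="{alphabet}" ph="{safe_ph}">{escape(word)}</phoneme>'


# Hypothetical transcription built from symbols in the en-US table.
ssml = "<speak>" + phoneme_tag("pecan", 'pI"kAn') + "</speak>"
print(ssml)
```

With `alphabet="ipa"` the same helper accepts the IPA column instead, where the stress mark is not a quote character and no escaping is triggered.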

English (Australian) (en-AU)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Australian English voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay t

l̩ l= syllabic alveolar lateral approximant battle t

m m bilabial nasal mouse p

m̩ m= syllabic bilabial nasal anthem p

n n alveolar nasal nap t

n̩ n= syllabic alveolar nasal button t

ŋ N velar nasal thing k

p p voiceless bilabial plosive pin p

ɹ r\ alveolar approximant red r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive task t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

z z voiced alveolar fricative zero s

ʒ Z voiced postalveolar fricative vision S

Vowels

ə @ mid central vowel arena @

əʊ @U diphthong goat @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑː A: long open-back unrounded vowel father a

eɪ eI diphthong face e

ɜː 3: long open mid-central unrounded vowel nurse E

ɛ E open mid-front unrounded vowel dress E

ɛə E@ diphthong square E

i: i: long close front unrounded vowel fleece i

ɪ I near-close near-front unrounded vowel kit i

ɪə I@ diphthong near i

ɔː O: long open-mid back rounded vowel thought O

ɔɪ OI diphthong choice O

ɒ Q open back rounded vowel lot O

u: u: long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʊə U@ diphthong cure u

ʌ V open-mid-back unrounded vowel strut E

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

English (British) (en-GB)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the British English voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay t

l̩ l= syllabic alveolar lateral approximant battle t

m m bilabial nasal mouse p

m̩ m= syllabic bilabial nasal anthem p

n n alveolar nasal nap t

n̩ n= syllabic alveolar nasal button t

ŋ N velar nasal thing k

p p voiceless bilabial plosive pin p

ɹ r\ alveolar approximant red r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive task t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

z z voiced alveolar fricative zero s

ʒ Z voiced postalveolar fricative vision S

Vowels

ə @ mid central vowel arena @

əʊ @U diphthong goat @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑː A: long open-back unrounded vowel father a

eɪ eI diphthong face e

ɜː 3: long open mid-central unrounded vowel nurse E

ɛ E open mid-front unrounded vowel dress E

ɛə E@ diphthong square E

i: i: long close front unrounded vowel fleece i

ɪ I near-close near-front unrounded vowel kit i

ɪə I@ diphthong near i

ɔː O: long open-mid back rounded vowel thought O

ɔɪ OI diphthong choice O

ɒ Q open back rounded vowel lot O

u: u: long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʊə U@ diphthong cure u

ʌ V open-mid-back unrounded vowel strut E

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma
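
The three additional symbols combine within a single transcription: `.` separates syllables, and `"` or `%` placed at the start of a syllable marks primary or secondary stress. The following is a minimal sketch of reading those marks back out of an X-SAMPA string. The function name `syllables` and the transcription used for "Alabama" are illustrative assumptions, not taken from this guide.

```python
def syllables(xsampa: str):
    """Split an X-SAMPA string on '.' and report the stress of each syllable."""
    result = []
    for syl in xsampa.split("."):
        if syl.startswith('"'):
            result.append((syl[1:], "primary"))
        elif syl.startswith("%"):
            result.append((syl[1:], "secondary"))
        else:
            result.append((syl, "unstressed"))
    return result


# Hypothetical transcription of "Alabama" (A.la.ba.ma) using symbols from the table.
for syllable, stress in syllables('%{.l@."b{.m@'):
    print(syllable, stress)
```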

English (Indian) (en-IN)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Indian English voice supported by Amazon Polly.

For additional phonemes used in conjunction with Indian English, see Hindi (hi-IN).

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay t

l̩ l= syllabic alveolar lateral approximant battle t

m m bilabial nasal mouse p

m̩ m= syllabic bilabial nasal anthem p

n n alveolar nasal nap t

n̩ n= syllabic alveolar nasal nap t

ŋ N velar nasal thing k

p p voiceless bilabial plosive pin p

ɹ r\ alveolar approximant red r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive task t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

z z voiced alveolar fricative zero s

ʒ Z voiced postalveolar fricative vision S

Vowels

ə @ mid central vowel arena @

əʊ @U diphthong goat @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑː A: long open-back unrounded vowel father a

eɪ eI diphthong face e

ɜː 3: long open mid-central unrounded vowel nurse E

ɛ E open mid-front unrounded vowel dress E

ɛə E@ diphthong square E

i: i: long close front unrounded vowel fleece i

ɪ I near-close near-front unrounded vowel kit i

ɪə I@ diphthong near i

ɔː O: long open-mid back rounded vowel thought O

ɔɪ OI diphthong choice O

ɒ Q open back rounded vowel lot O

u: u: long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʊə U@ diphthong cure u

ʌ V open-mid-back unrounded vowel strut E

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

English (Ireland) (en-IE)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Irish English voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay t

m m bilabial nasal mouse p

n n alveolar nasal nap t

ŋ N velar nasal thing k

p p voiceless bilabial plosive speak p

ɹ r\ alveolar approximant red r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive trap t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

z z voiced alveolar fricative zero s

ʒ Z voiced postalveolar fricative vision S

Vowels

ə @ mid-central vowel arena @

ɚ @` mid-central r-colored vowel reader @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑ A long open-back unrounded vowel father a

eɪ eI diphthong face e

ɝ 3` open mid-central unrounded r-colored vowel nurse E

ɛ E open mid-front unrounded vowel dress E

i i long close front unrounded vowel fleece i

ɪ I near-close near-front unrounded vowel kit i

oʊ oU diphthong goat o

ɔ O long open mid-back rounded vowel thought O

ɔɪ OI diphthong choice O

u u long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʌ V open-mid-back unrounded vowel strut E

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

English (New Zealand) (en-NZ)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the New Zealand English voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay t

l̩ l= syllabic alveolar lateral approximant battle t

m m bilabial nasal mouse p

m̩ m= syllabic bilabial nasal anthem p

n n alveolar nasal nap t

n̩ n= syllabic alveolar nasal button t

ŋ N velar nasal thing k

p p voiceless bilabial plosive pin p

ɹ r\ alveolar approximant red r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive task t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

z z voiced alveolar fricative zero s

ʒ Z voiced postalveolar fricative vision S

Vowels

ə @ mid central vowel arena @

əʊ @U diphthong goat @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑː A: long open-back unrounded vowel father a

eɪ eI diphthong face e

ɜː 3: long open mid-central unrounded vowel nurse E

ɛ E open mid-front unrounded vowel dress E

ɛə E@ diphthong square E

i: i: long close front unrounded vowel fleece i

ɪ I near-close near-front unrounded vowel kit i

ɪə I@ diphthong near i

ɔː O: long open-mid back rounded vowel thought O

ɔɪ OI diphthong choice O

ɒ Q open back rounded vowel lot O

u: u: long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʊə U@ diphthong cure u

ʌ V open-mid-back unrounded vowel strut E

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

The Aria voice speaks New Zealand English and offers limited support for Maori. It can pronounce

the following Maori words and phrases. The Maori phrases are case-sensitive.

English Maori

Hello/cheers Kia ora

Welcome (to) Nau mai (ki)

Hello (one person)/thank you Tēnā koe

Hello (three or more people)/thank you Tēnā koutou

Good morning Ata mārie

Good morning Mōrena

Thank you Ngā mihi

Take care Ngā manaakitanga

See you Ka kite

See you later Mā te wā

Have a good day Kia pai tō rā

Merry Christmas Meri Kirihimete

Maori Māori

Maori language te reo Māori

Maori language week Te wiki o te reo Māori

New Zealand Aotearoa

Maori New Year Mātariki

Town in New Zealand / Waitangi Day is the national day of New Zealand Waitangi

One tahi

Two rua

Three toru

Four whā

Five rima

Six ono

Seven whitu

Eight waru

Nine iwa

Ten tekau

Twenty rua tekau

Thirty Toru tekau
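
Because the Maori phrases are case-sensitive, an application may want to check a phrase against the list before relying on it. The following is a minimal sketch: `SUPPORTED_MAORI` holds only a hand-copied subset of the table above, and `is_supported` is a hypothetical helper name. How the voice renders a phrase that is not on the list is not stated here, so the sketch only reports whether the exact spelling appears in the table.

```python
# Subset of the phrases listed above, spelled exactly as in the table
# (including macrons and capitalization).
SUPPORTED_MAORI = {
    "Kia ora",
    "Tēnā koe",
    "Mōrena",
    "Ka kite",
    "Aotearoa",
    "rua tekau",
    "Toru tekau",
}


def is_supported(phrase: str) -> bool:
    """Return True only for an exact, case-sensitive match."""
    return phrase in SUPPORTED_MAORI


print(is_supported("Kia ora"))  # exact spelling from the table
print(is_supported("kia ora"))  # different capitalization
```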

English (South African) (en-ZA)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the South African English voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay t

l̩ l= syllabic alveolar lateral approximant battle t

ɬ K voiceless lateral fricative umhlanga t

m m bilabial nasal mouse p

m̩ m= syllabic bilabial nasal anthem p

n n alveolar nasal nap t

n̩ n= syllabic alveolar nasal button t

ŋ N velar nasal thing k

p p voiceless bilabial plosive pin p

ɹ r\ alveolar approximant red r

r r alveolar trill pareis r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive task t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

x x voiceless velar fricative gauteng k

z z voiced alveolar fricative zero s

! !\ post-alveolar click gqeberha k

| |\ dental click ncube t

|| ||\ lateral click xhosa t

Vowels

ə @ mid central vowel arena @

əi @i diphthong nelspruit i

əʊ @U diphthong goat @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑː A: long open-back unrounded vowel father a

eɪ eI diphthong face e

ɜː 3: long open mid-central unrounded vowel nurse E

ɛ E open mid-front unrounded vowel dress E

ɛə E@ diphthong square E

i: i: long close front unrounded vowel fleece i

iə I@ diphthong du preez i

ɪ I near-close near-front unrounded vowel kit i

ɪə I@ diphthong near i

ɔː O: long open-mid back rounded vowel thought O

ɔɪ OI diphthong choice O

ɒ Q open back rounded vowel lot O

u: u: long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʊə U@ diphthong cure u

ʌ V open-mid-back unrounded vowel strut E

y y close front rounded vowel van vuuren u

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

English (Welsh) (en-GB-WLS)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Welsh English voice supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive bed p

d d voiced alveolar plosive dig t

d͡ʒ dZ voiced postalveolar affricate jump S

ð D voiced dental fricative then T

f f voiceless labiodental fricative five f

g g voiced velar plosive game k

h h voiceless glottal fricative house k

j j palatal approximant yes i

k k voiceless velar plosive cat k

l l alveolar lateral approximant lay t

l̩ l= syllabic alveolar lateral approximant battle t

m m bilabial nasal mouse p

m̩ m= syllabic bilabial nasal anthem p

n n alveolar nasal nap t

n̩ n= syllabic alveolar nasal nap t

ŋ N velar nasal thing k

p p voiceless bilabial plosive pin p

ɹ r\ alveolar approximant red r

s s voiceless alveolar fricative seem s

ʃ S voiceless postalveolar fricative ship S

t t voiceless alveolar plosive task t

t͡ʃ tS voiceless postalveolar affricate chart S

θ T voiceless dental fricative thin T

v v voiced labiodental fricative vest f

w w labial-velar approximant west u

z z voiced alveolar fricative zero s

ʒ Z voiced postalveolar fricative vision S

Vowels

ə @ mid central vowel arena @

əʊ @U diphthong goat @

æ { near open-front unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑː A: long open-back unrounded vowel father a

eɪ eI diphthong face e

ɜː 3: long open mid-central unrounded vowel nurse E

ɛ E open mid-front unrounded vowel dress E

ɛə E@ diphthong square E

i: i: long close front unrounded vowel fleece i

ɪ I near-close near-front unrounded vowel kit i

ɪə I@ diphthong near i

ɔː O: long open-mid back rounded vowel thought O

ɔɪ OI diphthong choice O

ɒ Q open back rounded vowel lot O

u: u: long close-back rounded vowel goose u

ʊ U near-close near-back rounded vowel foot u

ʊə U@ diphthong cure u

ʌ V open-mid-back unrounded vowel strut E

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

Finnish (fi-FI)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Finnish voice that is supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Finnish consonants

p p voiceless bilabial plosive [p]ankki p

t t voiceless alveolar plosive [t]alo t

k k voiceless velar plosive [k]aali k

d d voiced alveolar plosive [d]ata t

s s voiceless alveolar fricative [s]ali s

h h voiceless glottal fricative [h]attu k

ʋ v\ voiced labiodental approximant [v]aiva

j j palatal approximant [j]oki i

l l alveolar lateral approximant [l]oma t

r r voiced alveolar trill [r]iita r

m m bilabial nasal [m]ato p

n n alveolar nasal [n]enäa t

ŋ N velar nasal he[n]ki k

Consonants found in loanwords

b b voiced bilabial plosive [b]ussi p

f f voiceless labiodental fricative [f]irma v

w w labial-velar approximant [w]iki u

z z voiced alveolar fricative [z]ulu s

g g voiced velar plosive [g]aala k

ʃ S voiceless postalveolar fricative [sh]akki S

ʒ Z voiced postalveolar fricative [g]enre S

θ T voiceless dental fricative ear[th] T

ð D voiced dental fricative ei[th]er T

Short vowels

i i close front unrounded vowel k[i]lo i

ɛ E open mid-front unrounded vowel k[e]sä E

æ { near open-front unrounded vowel k[ä]ly A

y y close front rounded vowel k[y]lä u

ø 2 close mid-front rounded vowel p[ö]ly O

u u close back rounded vowel k[u]lo u

ɔ O open mid-back rounded vowel k[o]lo O

ɑ A open back unrounded vowel k[a]la A

Long vowels

iː i: long close front unrounded vowel s[ii]li i

ɛː E: long open mid-front unrounded vowel [ee]tu E

æː {: long near open-front unrounded vowel t[ää]llä A

y: y: long close front rounded vowel t[yy]li u

øː 2: long close mid-front rounded vowel t[öö]lö O

u: u: long close back rounded vowel t[uu]li u

ɔː O: long open mid-back rounded vowel r[oo]li O

ɑː A: long open back unrounded vowel k[aa]su A

Diphthongs

ɛi Ei diphthong l[ei]pä E

æi {i diphthong [äi]ti A

ui ui diphthong k[ui]n u

ɑi Ai diphthong k[ai]kki A

ɔi Oi diphthong p[oi]ka O

øi 2i diphthong s[öi]n O

yi yi diphthong l[yi]jy u

ɑu Au diphthong s[au]na A

ɔu Ou diphthong k[ou]lu O

ɛu Eu diphthong r[eu]na E

iu iu diphthong v[iu]lu i

æy {y diphthong t[äy]nnä A

øy 2y diphthong k[öy]hä O

ɛy Ey diphthong pes[ey]tyä E

iy iy diphthong käär[iy]tyä i

iɛ iE diphthong t[ie] i

yø y2 diphthong [yö] u

uɔ uO diphthong t[uo] u

Vowels found in English loanwords

ɪ I near-close near-front unrounded vowel b[i]t i

ʊ U near-close near-back rounded vowel b[oo]k u

ə @ mid-central vowel [a]bout @

ʌ V open-mid-back unrounded vowel c[u]t E
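
Several Finnish X-SAMPA symbols are more than one character long (`A:`, `Au`), so converting a transcription to IPA needs longest-match tokenization rather than a character-by-character replacement. The following is a minimal sketch under stated assumptions: `XSAMPA_TO_IPA` holds only a hand-copied subset of the table, `to_ipa` is a hypothetical helper name, and the input strings are illustrative transcriptions of the example words "sauna" and "kaasu".

```python
# Subset of the Finnish table: X-SAMPA symbol -> IPA symbol.
XSAMPA_TO_IPA = {
    "s": "s", "n": "n", "k": "k", "l": "l", "t": "t",
    "u": "u", "i": "i",
    "A": "ɑ", "A:": "ɑː", "Au": "ɑu",
}


def to_ipa(xsampa: str) -> str:
    """Convert an X-SAMPA string to IPA, trying the longest symbols first."""
    symbols = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out, i = [], 0
    while i < len(xsampa):
        for sym in symbols:
            if xsampa.startswith(sym, i):
                out.append(XSAMPA_TO_IPA[sym])
                i += len(sym)
                break
        else:
            # No symbol in the subset matches at this position.
            raise ValueError(f"unknown symbol at position {i}: {xsampa[i:]!r}")
    return "".join(out)


print(to_ipa("sAunA"))
print(to_ipa("kA:su"))
```

Trying longer symbols first keeps `Au` from being read as `A` followed by `u`, which matters because the table assigns the diphthong its own row.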

French (fr-FR)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the French voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial plosive boire p

d d voiced alveolar plosive madame t

f f voiceless labiodental fricative femme f

g g voiced velar plosive grand k

ɥ H labial-palatal approximant bruit u

j j palatal approximant meilleur i

k k voiceless velar plosive quatre k

l l alveolar lateral approximant malade t

m m bilabial nasal maison p

n n alveolar nasal astronome t

ɲ J palatal nasal baigner J

ŋ N velar nasal parking k

p p voiceless bilabial plosive pomme p

ʁ R voiced uvular fricative amoureux k

s s voiceless alveolar fricative santé s

ʃ S voiceless postalveolar fricative chat S

t t voiceless alveolar plosive téléphone t

v v voiced labiodental fricative vrai f

w w labial-velar approximant soir u

z z voiced alveolar fricative raison s

ʒ Z voiced postalveolar fricative aubergine S

Vowels

ø 2 close-mid front rounded vowel deux o

œ 9 open-mid front rounded vowel neuf O

œ̃ 9~ nasal open-mid front rounded vowel brun O

ə @ mid central vowel je @

a a open front unrounded vowel table a

ɑ̃ A~ nasal open back unrounded vowel camembert a

e e close-mid front unrounded vowel marché e

ɛ E open-mid front unrounded vowel neige E

ɛ̃ E~ nasal open-mid front unrounded vowel sapin E

i i close front unrounded vowel mille i

o o close-mid back rounded vowel hôpital o

ɔ O open-mid back rounded vowel homme O

ɔ̃ O~ nasal open-mid back rounded vowel bon O

u u close back rounded vowel sous u

y y close front rounded vowel dur u

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

French (Belgian) (fr-BE)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Belgian French voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

boire p

d d voiced alveolar

plosive

madame t

French (Belgian) (fr-BE) 114

Amazon Polly Developer Guide

IPA X-SAMPA Description Example Viseme

f f voiceless labiodent

al fricative

femme f

g g voiced velar

plosive

grand k

H labial-palatal

approximant

bruit u

j j palatal approxima

meilleur i

k k voiceless velar

plosive

quatre k

l l alveolar lateral

approximant

malade t

m m bilabial nasal maison p

n n alveolar nasal astronome t

J palatal nasal baigner J

ŋ N velar nasal parking k

p p voiceless bilabial

plosive

pomme p

R voiced uvular

fricative

amoureux k

s s voiceless alveolar

fricative

santé s

S voiceless postalveo

lar fricative

chat S

French (Belgian) (fr-BE) 115

Amazon Polly Developer Guide

IPA X-SAMPA Description Example Viseme

t t voiceless alveolar

plosive

téléphone t

v v voiced labiodental

fricative

vrai f

w w labial-velar

approximant

soir u

z z voiced alveolar

fricative

raison s

Z voiced postalveo

lar fricative

aubergine S

Vowels

ø 2 close-mid front

rounded vowel

deux o

œ 9 open-mid front

rounded vowel

neuf O

œ̃ 9~ nasal open-mid

front rounded

vowel

brun O

@ mid central vowel je @

a a open front

unrounded vowel

table a

ɑ̃

A~ nasal open back

unrounded vowel

camembert a

e e close-mid front

unrounded vowel

marché e

French (Belgian) (fr-BE) 116

Amazon Polly Developer Guide

IPA X-SAMPA Description Example Viseme

E open-mid front

unrounded vowel

neige E

ɛ̃

E~ nasal open-mid

front unrounded

vowel

sapin E

i i close front

unrounded vowel

mille i

o o close-mid back

rounded vowel

hôpital o

O open-mid back

rounded vowel

homme O

ɔ̃

O~ nasal open-mid

back rounded

vowel

bon O

u u close back

rounded vowel

sous u

y y close front

rounded vowel

dur u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma
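As a rough illustration of how the table can be read, the sketch below stores a subset of the X-SAMPA-to-viseme pairs listed above in a Python dictionary and looks up the viseme for each phoneme in a sequence. The names `XSAMPA_TO_VISEME` and `visemes_for` are hypothetical, and the transcription of "boire" as b, w, a, R is an assumption made for the example, not output from Amazon Polly.

```python
# Subset of the Belgian French table above: X-SAMPA symbol -> viseme.
XSAMPA_TO_VISEME = {
    "b": "p", "d": "t", "f": "f", "g": "k", "k": "k", "l": "t",
    "m": "p", "n": "t", "p": "p", "R": "k", "s": "s", "S": "S",
    "t": "t", "v": "f", "w": "u", "z": "s", "Z": "S",
    "a": "a", "e": "e", "E": "E", "i": "i", "o": "o", "O": "O",
    "u": "u", "y": "u", "@": "@",
}


def visemes_for(phonemes):
    """Return the viseme listed in the table for each X-SAMPA symbol, in order."""
    return [XSAMPA_TO_VISEME[p] for p in phonemes]


# Assumed transcription of "boire", used only for illustration.
print(visemes_for(["b", "w", "a", "R"]))
```

Several phonemes share one viseme (for example, b, m, and p all map to p), so the lookup is many-to-one and cannot be reversed.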

French (Canadian) (fr-CA)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the French Canadian voice supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

boire p

d d voiced alveolar

plosive

madame t

f f voiceless labiodental fricative femme f

g g voiced velar

plosive

grand k

H labial-palatal

approximant

bruit u

j j palatal approximant meilleur i

k k voiceless velar

plosive

quatre k

l l alveolar lateral

approximant

malade t

m m bilabial nasal maison p

n n alveolar nasal astronome t

J palatal nasal baigner J

ŋ N velar nasal parking k

p p voiceless bilabial

plosive

pomme p

R voiced uvular

fricative

amoureux k

s s voiceless alveolar

fricative

santé s

S voiceless postalveolar fricative chat S

t t voiceless alveolar

plosive

téléphone t

v v voiced labiodental

fricative

vrai f

w w labial-velar

approximant

soir u

z z voiced alveolar

fricative

raison s

Z voiced postalveolar fricative aubergine S

Vowels

ø 2 close-mid front

rounded vowel

deux o

œ 9 open-mid front

rounded vowel

neuf O

œ̃ 9~ nasal open-mid

front rounded

vowel

brun O

@ mid central vowel je @

a a open front

unrounded vowel

table a

ɑ̃

A~ nasal open back

unrounded vowel

camembert a

e e close-mid front

unrounded vowel

marché e

E open-mid front

unrounded vowel

neige E

ɛ̃

E~ nasal open-mid

front unrounded

vowel

sapin E

i i close front

unrounded vowel

mille i

o o close-mid back

rounded vowel

hôpital o

O open-mid back

rounded vowel

homme O

ɔ̃

O~ nasal open-mid

back rounded

vowel

bon O

u u close back

rounded vowel

sous u

y y close front

rounded vowel

dur u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

German (de-DE)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the German voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

? glottal stop

b b voiced bilabial

plosive

Bier p

d d voiced alveolar

plosive

Dach t

ç C voiceless palatal

fricative

ich k

d͡ʒ dZ voiced postalveolar affricate Dschungel S

f f Voiceless labiodental fricative Vogel f

g g Voiced velar

plosive

Gabel k

h h Voiceless glottal

fricative

Haus k

j j Palatal approximant jemand i

k k Voiceless velar

plosive

Kleid k

l l Alveolar lateral

approximant

Loch t

m m Bilabial nasal Milch p

n n Alveolar nasal Natur t

ŋ N Velar nasal klingen k

p p Voiceless bilabial

plosive

Park p

p͡f pf Voiceless labiodental affricate Apfel

R Uvular trill Regen

s s voiceless alveolar

fricative

Messer s

S Voiceless

postalveolar

fricative

Fischer S

t t Voiceless alveolar

plosive

Topf T

t͡s ts Voiceless alveolar affricate Zahl

t͡ʃ

tS Voiceless

postalveolar

aﬀricate

deutsch S

v v Voiced labiodental

fricative

Wasser f

x x Voiceless velar

fricative

kochen k

z z Voiced alveolar

fricative

See s

Z Voiced postalveolar fricative Orange S

Vowels

øː

2: long close-mid

front rounded

vowel

böse o

6 near-open central

vowel

besser a

ɐ̯

6_^ non-syllabic near-

open central vowel

Klar a

œ 9 open-mid front

rounded vowel

können O

@ mid central vowel Rede @

a a open front

unrounded vowel

Salz a

a: a: long open front

unrounded vowel

Sahne a

aɪ

aI diphthong nein a

aʊ

aU diphthong Augen a

ɑ̃

A~ nasal open back

unrounded vowel

Restaurant a

e: e: long close-mid

front unrounded

vowel

Rede e

E open-mid front

unrounded vowel

Keller E

ɛ̃

E~ nasal open-mid

front unrounded

vowel

Terrain E

i: i: long close front

unrounded vowel

Lied i

I near-close near-

front unrounded

vowel

bitte i

o: o: long close-mid

back rounded

vowel

Kohl o

O open-mid back

rounded vowel

Koﬀer O

ɔ̃

O~ nasal open-mid

back rounded

vowel

Annonce O

ɔʏ

OY diphthong neu O

u: u: long close back

rounded vowel

Bruder u

U near-close near-

back rounded

vowel

Wunder u

y: y: long close front

rounded vowel

kühl u

Y near-close near-

front rounded

vowel

Küche u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma
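The symbols in the table, together with the stress and syllable-boundary markers under Additional Symbols, can be combined into the `ph` attribute of an SSML phoneme tag with `alphabet="x-sampa"`. The following Python sketch shows one way to assemble such a tag. The helper name `xsampa_phoneme` is hypothetical, and the transcription of "Bruder" as the syllables `bRu:` and `d6` is an assumption built from symbols in the table, not a pronunciation taken from Amazon Polly.

```python
from xml.sax.saxutils import escape, quoteattr


def xsampa_phoneme(text, syllables, stressed=0):
    """Wrap text in an SSML phoneme tag that carries an X-SAMPA pronunciation.

    syllables: X-SAMPA strings, one per syllable.
    stressed:  index of the syllable that receives primary stress.
    """
    # The primary stress marker (") precedes the stressed syllable.
    marked = ['"' + s if i == stressed else s for i, s in enumerate(syllables)]
    # Syllables are joined with the syllable boundary marker (.).
    ph = ".".join(marked)
    # quoteattr picks a quoting style that keeps the attribute well-formed
    # even though the value itself contains a double quote.
    return '<phoneme alphabet="x-sampa" ph=%s>%s</phoneme>' % (
        quoteattr(ph),
        escape(text),
    )


# Assumed transcription, for illustration only.
ssml = "<speak>" + xsampa_phoneme("Bruder", ["bRu:", "d6"]) + "</speak>"
print(ssml)
```

Because the primary stress marker is a double quote, the attribute value has to be quoted with single quotes or escaped as `&quot;`. The sketch delegates that choice to `quoteattr`.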

German (Austrian) (de-AT)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Austrian German voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

? glottal stop

b b voiced bilabial

plosive

Bier p

d d voiced alveolar

plosive

Dach t

ç C voiceless palatal

fricative

ich k

d͡ʒ dZ voiced postalveolar affricate Dschungel S

f f Voiceless labiodental fricative Vogel f

g g Voiced velar

plosive

Gabel k

h h Voiceless glottal

fricative

Haus k

j j Palatal approximant jemand i

k k Voiceless velar

plosive

Kleid k

l l Alveolar lateral

approximant

Loch t

m m Bilabial nasal Milch p

n n Alveolar nasal Natur t

ŋ N Velar nasal klingen k

p p Voiceless bilabial

plosive

Park p

p͡f pf Voiceless labiodental affricate Apfel

R Uvular trill Regen

s s voiceless alveolar

fricative

Messer s

S Voiceless

postalveolar

fricative

Fischer S

t t Voiceless alveolar

plosive

Topf T

t͡s ts Voiceless alveolar affricate Zahl

t͡ʃ

tS Voiceless

postalveolar

aﬀricate

deutsch S

v v Voiced labiodental

fricative

Wasser f

x x Voiceless velar

fricative

kochen k

z z Voiced alveolar

fricative

See s

Z Voiced postalveolar fricative Orange S

Vowels

øː

2: long close-mid

front rounded

vowel

böse o

6 near-open central

vowel

besser a

ɐ̯

6_^ non-syllabic near-

open central vowel

Klar a

œ 9 open-mid front

rounded vowel

können O

@ mid central vowel Rede @

a a open front

unrounded vowel

Salz a

a: a: long open front

unrounded vowel

Sahne a

aɪ

aI diphthong nein a

aʊ

aU diphthong Augen a

ɑ̃

A~ nasal open back

unrounded vowel

Restaurant a

e: e: long close-mid

front unrounded

vowel

Rede e

E open-mid front

unrounded vowel

Keller E

ɛ̃

E~ nasal open-mid

front unrounded

vowel

Terrain E

i: i: long close front

unrounded vowel

Lied i

I near-close near-

front unrounded

vowel

bitte i

o: o: long close-mid

back rounded

vowel

Kohl o

O open-mid back

rounded vowel

Koﬀer O

ɔ̃

O~ nasal open-mid

back rounded

vowel

Annonce O

ɔʏ

OY diphthong neu O

u: u: long close back

rounded vowel

Bruder u

U near-close near-

back rounded

vowel

Wunder u

y: y: long close front

rounded vowel

kühl u

Y near-close near-

front rounded

vowel

Küche u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Hindi (hi-IN)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the phoneme's sound type for

the Hindi voices that are supported by Amazon Polly.

For additional phonemes used in conjunction with Hindi, see English (Indian) (en-IN).

Phoneme/Viseme Table

IPA X-SAMPA Description Example

Consonants

pʰ

p_h voiceless aspirated

bilabial plosive

फूल

(phool)

bʱ

b_h voiced aspirated bilabial

plosive

भारी

(bhaari)

t̪

t_d voiceless dental plosive

तापमान

(taapmaan)

t̪ʰ

t_d_h voiceless aspirated

dental plosive

थोड़ा

(thoda)

d̪

d_d voiced dental plosive

दिल्ली

(dilli)

d̪ʱ

d_d_h voiced aspirated dental

plosive

धोबी

(dhobi)

t` voiceless retroﬂex plosive

कटोरा

(katora)

ʈʰ

t`_h voiceless aspirated

retroﬂex plosive

ठंड

(thand)

d` voiced retroﬂex plosive

डर

(darr)

ɖʱ

d`_h voiced aspirated retroﬂex

plosive

ढाल

(dhal)

tʃʰ

tS_h voiceless aspirated

palatal aﬀricate

छाल

(chaal)

dʒʱ

dZ_h voiced aspirated palatal

aﬀricate

झाल

(jhaal)

kʰ

k_h voiceless aspirated velar

plosive

खान

(khan)

ɡʱ

g_h voiced aspirated velar

plosive

घान

(ghaan)

n` retroﬂex nasal

क्षण

(kshan)

4 alveolar ﬂap

राम

(ram)

r` plain retroﬂex ﬂap

बड़ा

(bada)

ɽʱ

r`_h voiced aspirated retroﬂex

ﬂap

बढ़ी

(barhi)

v\ bilabial approximant

वसूल

(wasool)

Vowels

@_o mid central vowel

अच्छा

(achhaa)

ə̃

@~ nasalised mid central

vowel

हँसना

(hansnaa)

a A_o open front unrounded

vowel

आग

(aag)

ã A~ nasalised open front unrounded vowel घड़ियाँ (ghariyaan)

I_o near-close near-front

unrounded vowel

इक्कीस

(ikkees)

ɪ̃ I~ nasalised near-close near-front unrounded vowel सिंचाई (sinchai)

i i_o close front unrounded

vowel

बिल्ली

(billee)

ĩ i~ nasalised close front unrounded vowel नहीं (nahin)

U_o near-close near-back rounded vowel उल्लू (ullu)

ʊ̃ U~ nasalised near-close near-back rounded vowel मुँह (munh)

u u_o close back rounded

vowel

फूल

(phool)

ũ u~ nasalised close back rounded vowel ऊँट (oont)

O_o open-mid back rounded

vowel

कौन

(kaun)

ɔ̃ O~ nasalised open-mid back rounded vowel भौं (bhaun)

o o close-mid back rounded

vowel

सोना

(sona)

õ o~ nasalised close-mid back rounded vowel क्यों (kyon)

E_o open-mid front

unrounded vowel

पैसा

(paisa)

ɛ̃ E~ nasalised open-mid front unrounded vowel मैं (main)

e e close-mid front

unrounded vowel

एक

(ek)

ẽ e~ nasalised close-mid front unrounded vowel किताबें (kitabein)

Icelandic (is-IS)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Icelandic voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

grasbakkanum 0

c c voiceless palatal

plosive

pakkin k

cʰ

c_h aspirated voiceless

palatal plosive

anarkistai k

ç C voiceless palatal

fricative

héðan k

d d voiced alveolar

plosive

bóndi t

ð D voiced dental

fricative

borð T

f f voiceless labiodental fricative duft f

g g voiced velar

plosive

holgóma k

G voiced velar

fricative

hugur k

h h voiceless glottal

fricative

heili k

j j palatal approximant jökull i

kʰ

k_h aspirated voiceless

velar plosive

ósköpunum k

l l alveolar lateral

approximant

gólf t

l̥ l_0 voiceless alveolar lateral approximant fólk t

m m bilabial nasal september p

m̥

m_0 voiceless bilabial

nasal

kompa p

n n alveolar nasal númer t

n̥

n_0 voiceless alveolar

nasal

pöntun t

J palatal nasal pælingar J

ŋ N velar nasal söngvarann k

ŋ̊ N_0 voiceless velar

nasal

frænka k

pʰ

p_h aspirated voiceless

bilabial plosive

afplánun p

r r alveolar trill afskrifta r

r̥

r_0 voiceless alveolar

trill

andvörpum r

s s voiceless alveolar

fricative

baðhús s

tʰ

t_h aspirated voiceless

alveolar plosive

tanki t

θ T voiceless dental

fricative

þeldökki T

v v voiced labiodental

fricative

silfur f

w w labial-velar

approximant

x x voiceless velar

fricative

samfélags k

Vowels

œ 9 open-mid front

rounded vowel

þröskuldinum O

œː

9: long open-mid

front rounded

vowel

tvö O

a a open front

unrounded vowel

nefna a

a: a: long open front

unrounded vowel

fara a

au au diphthong átta a

au: au: diphthong átján a

E open-mid front

unrounded vowel

kennari E

ɛ:

E: long open-mid

front unrounded

vowel

dreka E

i i close front

unrounded vowel

Gúlíver i

i: i: long close front

unrounded vowel

þrír i

I near-close near-

front unrounded

vowel

samspil i

ɪ: I: long near-close near-front unrounded vowel stig i

O open-mid back

rounded vowel

regndropar O

ɔ:

O: long open-mid

back rounded

vowel

ullarbolur O

ɔu

Ou diphthong tólf O

ɔu:

Ou: diphthong fjórir O

u u close back

rounded vowel

stúlkan u

u: u: long close back

rounded vowel

frú u

Y near-close near-

front rounded

vowel

tíu u

ʏ: Y: long near-close near-front rounded vowel gruninn u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Italian (it-IT)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Italian voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

bacca p

d d voiced alveolar

plosive

dama t

d͡z

dz voiced alveolar

aﬀricate

zero s

d͡ʒ dZ voiced postalveolar affricate giro S

f f voiceless labiodental fricative famiglia f

g g voiced velar

plosive

gatto k

h h voiceless glottal

fricative

horror k

j j palatal approximant dieci i

k k voiceless velar

plosive

campo k

l l alveolar lateral

approximant

lido t

L palatal lateral

approximant

aglio J

m m bilabial nasal mille p

n n alveolar nasal nove t

J palatal nasal lasagne J

p p voiceless bilabial

plosive

pizza p

r r alveolar trill risata r

s s voiceless alveolar

fricative

sei s

S voiceless postalveolar fricative scienza S

t t voiceless alveolar

plosive

tavola t

t͡s

ts voiceless alveolar

aﬀricate

forza s

t͡ʃ tS voiceless postalveolar affricate cielo S

v v voiced labiodental

fricative

venti f

w w labial-velar

approximant

quattro u

z z voiced alveolar

fricative

bisogno s

Z voiced postalveolar fricative bijou S

Vowels

a a open front

unrounded vowel

arco a

e e close-mid front

unrounded vowel

tre e

E open-mid front

unrounded vowel

ettaro E

i i close front

unrounded vowel

impero i

o o close-mid back

rounded vowel

cento o

O open-mid back

rounded vowel

otto O

u u close back

rounded vowel

uno u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Japanese (ja-JP)

Amazon Polly supports the Pronunciation Kana and Yomigana alphabets for Japanese. To have Amazon Polly use a phonetic pronunciation written in one of these alphabets, set the attribute alphabet="x-amazon-phonetic standard used" on the phoneme tag, where the phonetic standard is one of the following:

• x-amazon-pron-kana – indicates that Pronunciation Kana is used. Pronunciation Kana are special Katakana characters used for phonetic transcription and can encode pitch accent.

• x-amazon-yomigana – indicates that Yomigana is used. Yomigana can be written in conventional Katakana, Hiragana, or Latin letters interpreted as Hepburn romanization.

The following examples show how these are used:

Pronunciation Kana

<speak>

###<phoneme alphabet="x-amazon-pron-kana" ph="###'#">##</phoneme>###

</speak>

Yomigana

<speak>

###<phoneme alphabet="x-amazon-yomigana" ph="####">##</phoneme>###

###<phoneme alphabet="x-amazon-yomigana" ph="Hirokazu">##</phoneme>###

</speak>
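A phoneme tag of this kind can also be generated programmatically. The Python sketch below builds one for either Japanese alphabet and rejects any other alphabet value. The helper name `japanese_phoneme` is hypothetical, and the sample word 日本 with the Hepburn reading "nippon" is taken from the table that follows, chosen only as an illustration.

```python
from xml.sax.saxutils import escape, quoteattr

# The two Japanese-specific alphabet values described above.
JAPANESE_ALPHABETS = ("x-amazon-pron-kana", "x-amazon-yomigana")


def japanese_phoneme(text, reading, alphabet="x-amazon-yomigana"):
    """Wrap text in an SSML phoneme tag that carries a Japanese reading."""
    if alphabet not in JAPANESE_ALPHABETS:
        raise ValueError("unsupported alphabet: %r" % alphabet)
    return "<phoneme alphabet=%s ph=%s>%s</phoneme>" % (
        quoteattr(alphabet),
        quoteattr(reading),
        escape(text),
    )


# Yomigana given in Latin letters (Hepburn romanization).
ssml = "<speak>" + japanese_phoneme("日本", "nippon") + "</speak>"
print(ssml)
```

The resulting string can then be passed as SSML input to a speech synthesis request. Escaping the text and attribute values keeps the markup well-formed if a reading contains characters such as an apostrophe or an ampersand.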

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Japanese voice supported by Amazon Polly.

IPA X-SAMPA Description Example Viseme

Consonants

4 alveolar ﬂap

練習,

renshuu t

? glottal stop

あつっ,

atsu'

b b voiced bilabial

plosive

舞踊,

buyou p

β B voiced bilabial

fricative

ヴィンテージ,

vinteeji

c c voiceless palatal

plosive

ききょう,

kikyou k

ç C voiceless palatal

fricative

人,

hito k

d d voiced alveolar

plosive

濁点,

dakuten t

d͡ʑ dz\ voiced alveolo-palatal affricate 純, jun J

g voiced velar

plosive

ご飯,

gohan k

h h voiceless glottal

fricative

本,

hon k

j j palatal approximant 屋根, yane i

J\ voiced palatal

plosive

行儀,

gyougi J

k k voiceless velar

plosive

漢字,

kanji k

l\ alveolar lateral

ﬂap

釣り,

tsuri r

ɺj

l\j alveolar lateral

ﬂap, palatal

approximant

流行,

ryuukou r

m m bilabial nasal

飯,

meshi p

n n alveolar nasal

猫,

neko t

J palatal nasal

日本,

nippon J

N\ uvular nasal

缶,

kan k

p p voiceless bilabial

plosive

パン,

pan p

p\ voiceless bilabial

fricative

福,

huku f

s s voiceless alveolar

fricative

層,

sou s

s\ voiceless alveolo-palatal fricative 書簡, shokan J

t t voiceless alveolar

plosive

手紙,

tegami t

t͡s

ts voiceless alveolar

aﬀricate

釣り,

tsuri s

t͡ɕ ts\ voiceless alveolo-palatal affricate 吉, kichi J

w w labial-velar

approximant

電話,

denwa u

z z voiced alveolar

fricative

座敷,

zashiki s

Vowels

äː

a:_" long open central

unrounded vowel

羽蟻,

haari a

ä a_" open central

unrounded vowel

仮名,

kana a

eː

e:_o long mid front

unrounded vowel

学生,

gakusei @

e e_o mid front

unrounded vowel

歴,

reki @

i i close front

unrounded vowel

気,

ki i

iː

i: long close front

unrounded vowel

詩歌,

shiika i

M close back

unrounded vowel

運,

un i

ɯː

M: long close back

unrounded vowel

宗教,

shuukyou i

oː

o:_o long mid back

rounded vowel

購読,

koodoku o

o o_o mid back rounded

vowel

読者,

dokusha o

Korean (ko-KR)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the

Korean voice supported by Amazon Polly.

IPA X-SAMPA Description Example Viseme

Consonants

k k voiceless velar

plosive

강,

[g]ang k

k͈ k_t strong voiceless velar plosive 깨, [kk]e k

n n alveolar nasal

남,

[n]am t

t t voiceless alveolar

plosive

도,

[d]o t

t͈ t_t strong voiceless alveolar plosive 때, [tt]e t

4 alveolar ﬂap

사랑,

sa[r]ang t

l l alveolar lateral

approximant

돌,

do[l] t

m m bilabial nasal

무,

[m]u p

p p voiceless bilabial

plosive

봄,

[b]om p

p͈ p_t strong voiceless bilabial plosive 뻘, [pp]eol p

s s voiceless alveolar

fricative

새,

[s]e s

s͈ s_t strong voiceless alveolar fricative 씨, [ss]i s

ŋ N velar nasal

방,

ba[ng] k

t͡ɕ ts\ voiceless alveolo-palatal affricate 조, [j]o J

t͈͡ɕ ts\_t strong voiceless alveolo-palatal affricate 찌, [jj]i J

t͡ɕʰ

ts\_h aspirated voiceless

alveolo-palatal

aﬀricate

차,

[ch]a J

kʰ

k_h aspirated voiceless

velar plosive

코,

[k]o k

tʰ

t_h aspirated voiceless

alveolar plosive

통,

[t]ong t

pʰ

p_h aspirated voiceless

bilabial plosive

패,

[p]e p

h h voiceless glottal

fricative

힘,

[h]im k

j j palatal approximant 양, [y]ang i

w w labial-velar

approximant

왕,

[w]ang u

M\ velar approximant 의, [wj]i i

Vowels

a a open front

unrounded vowel

밥,

b[a]b a

V open-mid back

unrounded vowel

정,

j[eo]ng E

E open-mid front

unrounded vowel

배,

b[e] E

o o close-mid back

rounded vowel

노,

n[o] o

u u close back

rounded vowel

둘,

d[u]l u

M close back

unrounded vowel

은,

[eu]n i

i i close front

unrounded vowel

김,

k[i]m i

Norwegian (nb-NO)

The following chart lists the full set of International Phonetic Alphabet (IPA) phonemes and the

Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols as well as the

corresponding visemes as supported by Amazon Polly for Norwegian language voices.

IPA X-SAMPA Description Example Viseme

Consonants

4 alveolar ﬂap prøv t

b b voiced bilabial

plosive

labb p

ç C voiceless palatal

fricative

kino k

d d voiced alveolar

plosive

ladd t

d` voiced retroﬂex

plosive

verdi t

f f voiceless labiodental fricative fot f

ɡ g voiced velar plosive tagg k

h h voiceless glottal

fricative

ha k

j j palatal approximant gi i

k k voiceless velar

plosive

takk k

l l alveolar lateral

approximant

fall, ball t

l` retroﬂex lateral

approximant

ærlig t

m m bilabial nasal lam p

n n alveolar nasal vann t

n` retroﬂex nasal garn t

ŋ N velar nasal sang k

p p voiceless bilabial

plosive

hopp p

s s voiceless alveolar

fricative

lass s

s` voiceless retroﬂex

fricative

års S

S voiceless postalveolar fricative skyt S

t t voiceless alveolar

plosive

lat t

t` voiceless retroﬂex

plosive

hardt t

v\ labiodental

approximant

vin f

w w labial-velar

approximant

will x

Vowels

øː

2: long close-mid

front rounded

vowel

søt o

œ 9 open-mid front

rounded vowel

søtt O

@ mid central vowel ape @

æː

{: long near-open

front unrounded

vowel

vær a

} close central

rounded vowel

lund u

ʉː

}: long close central

rounded vowel

lun u

æ { near-open front

unrounded vowel

vært a

A open back

unrounded vowel

hatt a

ɑː

A: long open back

unrounded vowel

hat a

e: e: long close-mid

front unrounded

vowel

sen e

E open-mid front

unrounded vowel

send E

i: i: long close front

unrounded vowel

vin i

I near-close near-

front unrounded

vowel

vind i

oː o: long close-mid back rounded vowel våt o

O open-mid back

rounded vowel

vått O

u: u: long close back

rounded vowel

bok u

U near-close near-

back rounded

vowel

bukk u

y: y: long close front

rounded vowel

lyn u

Y near-close near-

front rounded

vowel

lynne u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Polish (pl-PL)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Polish voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

bobas, belka p

d d voiced alveolar

plosive

dar, do t

d͡z

dz voiced alveolar

aﬀricate

dzwon, widzowie s

d͡ʑ dz\ voiced alveolo-palatal affricate dźwięk J

d͡ʐ

dz` voiced retroﬂex

aﬀricate

dżem, dżungla S

f f voiceless labiodental fricative furtka, film f

g g voiced velar

plosive

gazeta, waga k

h h voiceless glottal

fricative

chleb, handel k

j j palatal approximant jak, maja i

k k voiceless velar

plosive

kura, marek k

l l alveolar lateral

approximant

lipa, alicja t

m m bilabial nasal matka, molo p

n n alveolar nasal norka t

J palatal nasal koń, toruń J

p p voiceless bilabial

plosive

pora, stop p

r r alveolar trill rok, park r

s s voiceless alveolar

fricative

sum, pas s

s\ voiceless alveolo-palatal fricative śruba, śnieg J

s` voiceless retroﬂex

fricative

szum, masz S

t t voiceless alveolar

plosive

tok, stół t

t͡s

ts voiceless alveolar

aﬀricate

car, co s

t͡ɕ ts\ voiceless alveolo-palatal affricate ćma, mieć J

t͡ʂ

ts` voiceless retroﬂex

aﬀricate

czas, raczej S

v v voiced labiodental

fricative

worek, mewa f

w w labial-velar

approximant

łaska, mało u

z z voiced alveolar

fricative

zero s

z\ voiced alveolo-palatal fricative źrebię, bieliźnie J

z` voiced retroﬂex

fricative

żar, żona S

Vowels

a a open front

unrounded vowel

ja a

E open-mid front

unrounded vowel

echo E

ɛ̃

E~ nasal open-mid

front unrounded

vowel

węże E

i i close front

unrounded vowel

ile i

O open-mid back

rounded vowel

oczy O

ɔ̃

O~ nasal open-mid

back rounded

vowel

wąż O

u u close back

rounded vowel

uczta u

1 close central

unrounded vowel

byk i

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Portuguese (pt-PT)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Portuguese voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

4 alveolar ﬂap pira t

b b voiced bilabial plosive bato p

d d voiced alveolar

plosive

dato t

f f voiceless labiodental fricative facto f

g g voiced velar

plosive

gato k

j j palatal approximant paraguay i

k k voiceless velar

plosive

cacto k

l l alveolar lateral

approximant

galo t

L palatal lateral

approximant

galho J

m m bilabial nasal mato p

n n alveolar nasal nato t

J palatal nasal pinha J

p p voiceless bilabial

plosive

pato p

R\ uvular trill barroso k

s s voiceless alveolar

fricative

saca s

S voiceless postalveolar fricative chato S

t t voiceless alveolar

plosive

tacto t

v v voiced labiodental

fricative

vaca f

w w labial-velar

approximant

mau u

z z voiced alveolar

fricative

zaca s

Z voiced postalveolar fricative jacto S

Vowels

a a open front

unrounded vowel

parto a

a ̃ a~ nasal open front

unrounded vowel

pega a

e e close-mid front

unrounded vowel

pega e

e ̃ e~ nasal close-mid

front unrounded

vowel

movem e

E open-mid front

unrounded vowel

café E

i i close front

unrounded vowel

lingueta i

ı̃ i~ nasal close front

unrounded vowel

cinto i

o o close-mid back

rounded vowel

poder o

o ̃ o~ nasal close-mid

back rounded

vowel

compra o

O open-mid back

rounded vowel

cotó O

u u close back

rounded vowel

fui u

u ̃ u~ nasal close back

rounded vowel

sunto u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Portuguese (Brazilian) (pt-BR)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Brazilian Portuguese voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

4 alveolar ﬂap pira t

b b voiced bilabial

plosive

bato p

d d voiced alveolar

plosive

dato t

d͡ʒ dZ voiced postalveolar affricate idade S

f f voiceless labiodental fricative facto f

g g voiced velar

plosive

gato k

j j palatal approximant paraguay i

k k voiceless velar

plosive

cacto k

l l alveolar lateral

approximant

galo t

L palatal lateral

approximant

galho J

m m bilabial nasal mato p

n n alveolar nasal nato t

J palatal nasal pinha J

p p voiceless bilabial

plosive

pato p

s s voiceless alveolar

fricative

saca s

S voiceless postalveolar fricative chato S

t t voiceless alveolar

plosive

tacto t

t͡ʃ tS voiceless postalveolar affricate noite S

v v voiced labiodental

fricative

vaca f

w w labial-velar

approximant

mau u

χ X voiceless uvular

fricative

carro k

z z voiced alveolar

fricative

zaca s

Z voiced postalveolar fricative jacto S

Vowels

a a open front

unrounded vowel

parto a

a ̃ a~ nasal open front

unrounded vowel

pensamos a

e e close-mid front

unrounded vowel

pega e

e ̃ e~ nasal close-mid

front unrounded

vowel

movem e

E open-mid front

unrounded vowel

café E

i i close front

unrounded vowel

lingueta i

ı̃ i~ nasal close front

unrounded vowel

cinto i

o o close-mid back

rounded vowel

poder o

o ̃ o~ nasal close-mid

back rounded

vowel

compra o

O open-mid back

rounded vowel

cotó O

u u close back

rounded vowel

fui u

u ̃ u~ nasal close back

rounded vowel

sunto u

Additional Symbols

" primary stress Alabama

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Romanian (ro-RO)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Romanian voice supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

b b voiced bilabial

plosive

bubă p

d d voiced alveolar

plosive

după t

d͡ʒ dZ voiced postalveolar affricate george S

f f voiceless labiodental fricative afacere f

g g voiced velar

plosive

agriș k

h h voiceless glottal

fricative

harpă k

j j palatal approximant baie i

k k voiceless velar

plosive

coș k

l l alveolar lateral

approximant

lampa t

m m bilabial nasal mama p

n n alveolar nasal nor t

p p voiceless bilabial

plosive

pilă p

r r alveolar trill rampă r

s s voiceless alveolar

fricative

soare s

S voiceless postalveolar fricative mașină S

t t voiceless alveolar

plosive

tata t

t͡s

ts voiceless alveolar

aﬀricate

țară s

t͡ʃ tS voiceless postalveolar affricate ceai S

v v voiced labiodental

fricative

viață f

w w labial-velar

approximant

beau u

Romanian (ro-RO) 162

Amazon Polly Developer Guide

IPA X-SAMPA Description Example Viseme

z z voiced alveolar

fricative

mozol s

Z voiced postalveo

lar fricative

joacă S

Vowels

@ mid central vowel babă @

a a open front

unrounded vowel

casa a

e e close-mid front

unrounded vowel

elan e

e̯

e_^ non-syllabic

close-mid front

unrounded vowel

beau e

i i close front

unrounded vowel

mie i

o o close-mid back

rounded vo

oră o

oa o_^a diphthong oare o

u u close back

rounded vowel

unde u

1 close central

unrounded vowel

România i

Additional Symbols

" primary stress Alabama

Romanian (ro-RO) 163

Amazon Polly Developer Guide

IPA X-SAMPA Description Example Viseme

% secondary stress Alabama

. . syllable boundary A.la.ba.ma

Russian (ru-RU)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Russian voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA | X-SAMPA | Description | Example | Viseme

Consonants
b | b | voiced bilabial plosive | борт | p
bʲ | b' | palatalized voiced bilabial plosive | бюро | p
d | d | voiced alveolar plosive | дом | t
dʲ | d' | palatalized voiced alveolar plosive | дядя | t
f | f | voiceless labiodental fricative | флаг | f
fʲ | f' | palatalized voiceless labiodental fricative | февраль | f
g | g | voiced velar plosive | нога | k
ɡʲ | g' | palatalized voiced velar plosive | герой | k
j | j | palatal approximant | дизайн, ящик | i
k | k | voiceless velar plosive | кот | k
kʲ | k' | palatalized voiceless velar plosive | кино | k
l | l | alveolar lateral approximant | лампа | t
lʲ | l' | palatalized alveolar lateral approximant | лес | t
m | m | bilabial nasal | мама | p
mʲ | m' | palatalized bilabial nasal | мяч | p
n | n | alveolar nasal | нос | t
nʲ | n' | palatalized alveolar nasal | няня | t
p | p | voiceless bilabial plosive | папа | p
pʲ | p' | palatalized voiceless bilabial plosive | перо | p
r | r | alveolar trill | роза | r
rʲ | r' | palatalized alveolar trill | рюмка | r
s | s | voiceless alveolar fricative | сыр | s
sʲ | s' | palatalized voiceless alveolar fricative | сердце, русь | s
ɕ: | s\: | long voiceless alveolo-palatal fricative | щека | J
ʂ | s` | voiceless retroflex fricative | шум | S
t | t | voiceless alveolar plosive | точка | t
tʲ | t' | palatalized voiceless alveolar plosive | тётя | t
t͡s | ts | voiceless alveolar affricate | царь | s
t͡ɕ | ts\ | voiceless alveolo-palatal affricate | час | J
v | v | voiced labiodental fricative | вор | f
vʲ | v' | palatalized voiced labiodental fricative | верфь | f
x | x | voiceless velar fricative | хор | k
xʲ | x' | palatalized voiceless velar fricative | химия | k
z | z | voiced alveolar fricative | зуб | s
zʲ | z' | palatalized voiced alveolar fricative | зима | s
ʑ: | z\: | long voiced alveolo-palatal fricative | уезжать | J
ʐ | z` | voiced retroflex fricative | жена | S

Vowels
ə | @ | mid central vowel | канарейка | @
a | a | open front unrounded vowel | два, яблоко | a
e | e | close-mid front unrounded vowel | печь | e
ɛ | E | open-mid front unrounded vowel | это | E
i | i | close front unrounded vowel | один, четыре | i
o | o | close-mid back rounded vowel | кот | o
u | u | close back rounded vowel | муж, вьюга | u
ɨ | 1 | close central unrounded vowel | мышь | i

Spanish (es-ES)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Spanish voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA | X-SAMPA | Description | Example | Viseme

Consonants
ɾ | 4 | alveolar flap | pero, bravo, amor, eterno |
b | b | voiced bilabial plosive | bestia | p
β | B | voiced bilabial fricative | bebé | B
d | d | voiced alveolar plosive | cuando | t
ð | D | voiced dental fricative | arder | T
f | f | voiceless labiodental fricative | fase, café | f
g | g | voiced velar plosive | gato, lengua, guerra |
ɣ | G | voiced velar fricative | trigo, Argos | k
j | j | palatal approximant | hacia, tierra, radio, viuda |
ʝ | j\ | voiced palatal fricative | enhielar, sayo, inyectado, desyerba |
k | k | voiceless velar plosive | caña, laca, quisimos |
l | l | alveolar lateral approximant | lino, calor, principal |
ʎ | L | palatal lateral approximant | llave, pollo | J
m | m | bilabial nasal | madre, comer, anfibio |
n | n | alveolar nasal | nido, anillo, sin | t
ɲ | J | palatal nasal | cabaña, ñoquis | J
ŋ | N | velar nasal | cinco, venga | k
p | p | voiceless bilabial plosive | pozo, topo | p
r | r | alveolar trill | perro, enrachado | r
s | s | voiceless alveolar fricative | saco, casa, puertas | s
t | t | voiceless alveolar plosive | tamiz, átomo | t
t͡ʃ | tS | voiceless postalveolar affricate | chubasco | S
θ | T | voiceless dental fricative | cereza, zorro, lacero, paz |
w | w | labial-velar approximant | fuego, fuimos, cuota, cuadro |
x | x | voiceless velar fricative | jamón, general, suje, reloj |
z | z | voiced alveolar fricative | rasgo, mismo | s

Vowels
a | a | open front unrounded vowel | tanque | a
e | e | close-mid front unrounded vowel | peso | e
i | i | close front unrounded vowel | cinco | i
o | o | close-mid back rounded vowel | bosque | o
u | u | close back rounded vowel | publicar | u

Additional Symbols
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |
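
The X-SAMPA symbols in a table like this one can be supplied to Amazon Polly through the SSML phoneme tag. The following is a minimal sketch rather than an official sample. The helper name phoneme_ssml is hypothetical, and the transcription "pe.4o for "pero" is an illustrative guess assembled from the p, e, 4, and o rows plus the stress and syllable-boundary symbols, so verify any real transcription by listening to the result.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_ssml(word, ph, alphabet="x-sampa"):
    """Wrap a word in an SSML <phoneme> tag with the given pronunciation."""
    if alphabet not in ("ipa", "x-sampa"):
        raise ValueError("alphabet must be 'ipa' or 'x-sampa'")
    # quoteattr picks a quoting style that keeps characters such as the
    # X-SAMPA primary stress mark (") valid inside the attribute value.
    return (
        f"<speak><phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(word)}</phoneme></speak>"
    )


# Illustrative transcription only, built from the table rows.
ssml = phoneme_ssml("pero", '"pe.4o')

# Parameters for a SynthesizeSpeech call; TextType must be "ssml" for tagged input.
request = {
    "VoiceId": "Lucia",  # an es-ES voice named elsewhere in this guide
    "TextType": "ssml",
    "Text": ssml,
    "OutputFormat": "mp3",
}
```

Passing request to a Polly client's synthesize_speech call requires AWS credentials and a configured Region.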

Spanish (Mexican) (es-MX)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Mexican Spanish voice that is supported by Amazon Polly.

Phoneme/Viseme Table

IPA | X-SAMPA | Description | Example | Viseme

Consonants
ɾ | 4 | alveolar flap | pero, bravo, amor, eterno |
b | b | voiced bilabial plosive | bestia | p
β | B | voiced bilabial fricative | bebé | B
d | d | voiced alveolar plosive | cuando | t
ð | D | voiced dental fricative | arder | T
f | f | voiceless labiodental fricative | fase, café | f
g | g | voiced velar plosive | gato, lengua, guerra |
ɣ | G | voiced velar fricative | trigo, Argos | k
j | j | palatal approximant | hacia, tierra, radio, viuda |
ʝ | j\ | voiced palatal fricative | enhielar, sayo, inyectado, desyerba |
k | k | voiceless velar plosive | caña, laca, quisimos |
l | l | lateral alveolar approximant | lino, calor, principal |
m | m | bilabial nasal | madre, comer, anfibio |
n | n | alveolar nasal | nido, anillo, sin | t
ɲ | J | palatal nasal | cabaña, ñoquis | J
ŋ | N | velar nasal | angosto, increíble | k
p | p | voiceless bilabial plosive | pozo, topo | p
r | r | alveolar trill | perro, enrachado | r
s | s | voiceless alveolar fricative | saco, casa, puertas | s
ʃ | S | voiceless postalveolar fricative | show, flash | S
t | t | voiceless alveolar plosive | tamiz, átomo | t
t͡ʃ | tS | voiceless postalveolar affricate | chubasco | S
w | w | labial-velar approximant | fuego, fuimos, cuota, cuadro |
x | x | voiceless velar fricative | jamón, general, peaje, reloj |
z | z | voiced alveolar fricative | rasgo, mismo | s

Vowels
a | a | central open unrounded vowel | tanque | a
e | e | close-mid front unrounded vowel | peso | e
i | i | close front unrounded vowel | cinco | i
o | o | close-mid back rounded vowel | bosque | o
u | u | close back rounded vowel | publicar | u

Additional Symbols
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |

Spanish (US) (es-US)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the US Spanish voices that are supported by Amazon Polly.

Phoneme/Viseme Table

IPA | X-SAMPA | Description | Example | Viseme

Consonants
ɾ | 4 | alveolar flap | pero, bravo, amor, eterno |
b | b | voiced bilabial plosive | bestia | p
β | B | voiced bilabial fricative | bebé | B
d | d | voiced alveolar plosive | cuando | t
ð | D | voiced dental fricative | arder | T
f | f | voiceless labiodental fricative | fase, café | f
g | g | voiced velar plosive | gato, lengua, guerra |
ɣ | G | voiced velar fricative | trigo, Argos | k
j | j | palatal approximant | hacia, tierra, radio, viuda |
ʝ | j\ | voiced palatal fricative | enhielar, sayo, inyectado, desyerba |
k | k | voiceless velar plosive | caña, laca, quisimos |
l | l | lateral alveolar approximant | lino, calor, principal |
m | m | bilabial nasal | madre, comer, anfibio |
n | n | alveolar nasal | nido, anillo, sin | t
ɲ | J | palatal nasal | cabaña, ñoquis | J
ŋ | N | velar nasal | angosto, increíble | k
p | p | voiceless bilabial plosive | pozo, topo | p
r | r | alveolar trill | perro, enrachado | r
s | s | voiceless alveolar fricative | saco, casa, puertas | s
ʃ | S | voiceless postalveolar fricative | show, flash | S
t | t | voiceless alveolar plosive | tamiz, átomo | t
t͡ʃ | tS | voiceless postalveolar affricate | chubasco | S
w | w | labial-velar approximant | fuego, fuimos, cuota, cuadro |
x | x | voiceless velar fricative | jamón, general, peaje, reloj |
z | z | voiced alveolar fricative | rasgo, mismo | s

Vowels
a | a | central open unrounded vowel | tanque | a
e | e | close-mid front unrounded vowel | peso | e
i | i | close front unrounded vowel | cinco | i
o | o | close-mid back rounded vowel | bosque | o
u | u | close back rounded vowel | publicar | u

Additional Symbols
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |

Swedish (sv-SE)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Swedish voice supported by Amazon Polly.

Phoneme/Viseme Table

IPA | X-SAMPA | Description | Example | Viseme

Consonants
b | b | voiced bilabial plosive | bil | p
d | d | voiced alveolar plosive | dal | t
ɖ | d` | voiced retroflex plosive | bord | t
f | f | voiceless labiodental fricative | fil | f
g | g | voiced velar plosive | gås | k
h | h | voiceless glottal fricative | hal | k
j | j | palatal approximant | jag | i
k | k | voiceless velar plosive | kal | k
l | l | alveolar lateral approximant | lös | t
ɭ | l` | retroflex lateral approximant | härlig | t
m | m | bilabial nasal | mil | p
n | n | alveolar nasal | nålar | t
ɳ | n` | retroflex nasal | barn | t
ŋ | N | velar nasal | ring | k
p | p | voiceless bilabial plosive | pil | p
r | r | alveolar trill | ris | r
s | s | voiceless alveolar fricative | sil | s
ɕ | s\ | voiceless alveolo-palatal fricative | tjock | J
ʂ | s` | voiceless retroflex fricative | fors, schlager | S
t | t | voiceless alveolar plosive | tal | t
ʈ | t` | voiceless retroflex plosive | hjort | t
v | v | voiced labiodental fricative | vår | f
w | w | labial-velar approximant | aula, airways | u
ɧ | x\ | voiceless palatal-velar fricative | sjuk | k

Vowels
ø | 2 | close-mid front rounded vowel | föll, förr | o
øː | 2: | long close-mid front rounded vowel | föl, nöt, för | o
ɵ | 8 | close-mid central rounded vowel | buss, full | o
ə | @ | mid central vowel | pojken | @
ʉː | }: | long close central rounded vowel | hus, ful | u
a | a | open front unrounded vowel | hall, matt | a
æ | { | near-open front unrounded vowel | herr | a
ɑː | A: | long open back unrounded vowel | hal, mat | a
e: | e: | long close-mid front unrounded vowel | vet, hel | e
ɛ | E | open-mid front unrounded vowel | vett, rätt, hetta, häll |
ɛː | E: | long open-mid front unrounded vowel | säl, häl, här | E:
i: | i: | long close front unrounded vowel | vit, sil | i:
ɪ | I | near-close near-front unrounded vowel | vitt, sill | i
o: | o: | long close-mid back rounded vowel | hål, mål | o
ɔ | O | open-mid back rounded vowel | håll, moll | O
u: | u: | long close back rounded vowel | sol, bot | u
ʊ | U | near-close near-back rounded vowel | bott | u
y | y | close front rounded vowel | bytt | u
y: | y: | long close front rounded vowel | syl, syl | u

Additional Symbols
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |

Turkish (tr-TR)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Turkish voice supported by Amazon Polly.

Phoneme/Viseme Table

IPA | X-SAMPA | Description | Example | Viseme

Consonants
ɾ | 4 | alveolar flap | durum | t
ɾ̝̊ | 4_0_r | voiceless fricated alveolar flap | bir | t
ɾ̝ | 4_r | fricated alveolar flap | raf | t
b | b | voiced bilabial plosive | raf | p
c | c | voiceless palatal plosive | kedi | k
d | d | voiced alveolar plosive | dede | t
d͡ʒ | dZ | voiced postalveolar affricate | cam | S
f | f | voiceless labiodental fricative | fare | f
g | g | voiced velar plosive | galibi | k
h | h | voiceless glottal fricative | hasta | k
j | j | palatal approximant | yat | i
ɟ | J\ | voiced palatal plosive | genç | J
k | k | voiceless velar plosive | akıl | k
l | l | alveolar lateral approximant | lale | t
ɫ | 5 | velarized alveolar lateral approximant | labirent | t
m | m | bilabial nasal | maaş | p
n | n | alveolar nasal | anı | t
p | p | voiceless bilabial plosive | ip | p
s | s | voiceless alveolar fricative | ses | s
ʃ | S | voiceless postalveolar fricative | aşı | S
t | t | voiceless alveolar plosive | ütü | t
t͡ʃ | tS | voiceless postalveolar affricate | çaba | S
v | v | voiced labiodental fricative | ekvator, kahveci, akvaryum, isveçli, teşviki, cetvel |
z | z | voiced alveolar fricative | ver | s
ʒ | Z | voiced postalveolar fricative | azık | S

Vowels
ø | 2 | close-mid front rounded vowel | göl | 0
œ | 9 | open-mid front rounded vowel | banliyö | O
a | a | open front unrounded vowel | kal | a
a: | a: | long open front unrounded vowel | davacı | a
æ | { | near-open front unrounded vowel | özlem, güvenlik, gürel, somersault |
e | e | close-mid front unrounded vowel | keçi | e
ɛ | E | open-mid front unrounded vowel | dede | E
i | i | close front unrounded vowel | bir | i
i: | i: | long close front unrounded vowel | izah | i
ɪ | I | near-close near-front unrounded vowel | keçi | i
ɯ | M | close back unrounded vowel | kıl | i
o | o | close-mid back rounded vowel | kol | o
o: | o: | long close-mid back rounded vowel | dolar | o
u | u | close back rounded vowel | durum | u
u: | u: | long close back rounded vowel | ruhum | u
ʊ | U | near-close near-back rounded vowel | dolu | u
y | y | close front rounded vowel | güvenlik | u
ʏ | Y | near-close near-front rounded vowel | aşı | u

Additional Symbols
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |

Welsh (cy-GB)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech

Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for

the Welsh voice supported by Amazon Polly.

Phoneme/Viseme Table

IPA | X-SAMPA | Description | Example | Viseme

Consonants
b | b | voiced bilabial plosive | baban | p
d | d | voiced alveolar plosive | deg | t
d͡ʒ | dZ | voiced postalveolar affricate | garej | S
ð | D | voiced dental fricative | deuddeg | T
f | f | voiceless labiodental fricative | ffacs | f
g | g | voiced velar plosive | gadael | k
h | h | voiceless glottal fricative | haearn | k
j | j | palatal approximant | astudio | i
k | k | voiceless velar plosive | cant | k
l | l | alveolar lateral approximant | lan | t
ɬ | K | voiceless alveolar lateral fricative | llan | t
m | m | bilabial nasal | mae | p
m̥ | m_0 | voiceless bilabial nasal | ymhen | p
n | n | alveolar nasal | naw | t
n̥ | n_0 | voiceless alveolar nasal | anhawster | t
ŋ | N | velar nasal | argyfwng | k
ŋ̊ | N_0 | voiceless velar nasal | anghenion | k
p | p | voiceless bilabial plosive | pump | p
r | r | alveolar trill | rhoi | r
r̥ | r_0 | voiceless alveolar trill | garw | r
s | s | voiceless alveolar fricative | saith | s
ʃ | S | voiceless postalveolar fricative | siawns | S
t | t | voiceless alveolar plosive | tegan | t
t͡ʃ | tS | voiceless postalveolar affricate | cytsain | S
θ | T | voiceless dental fricative | aberth | T
v | v | voiced labiodental fricative | prawf | f
w | w | labial-velar approximant | rhagweld | u
χ | X | voiceless uvular fricative | chwech | k
z | z | voiced alveolar fricative | aids | s
ʒ | Z | voiced postalveolar fricative | rouge | S

Vowels
ə | @ | mid central vowel | ychwanega | @
a | a | open front unrounded vowel | acen | a
ai | ai | diphthong | dau | a
au | au | diphthong | awdur | a
ɑː | A: | long open back unrounded vowel | mab | a
ɑːɨ | A:1 | diphthong | aelod | a
e: | e: | long close-mid front unrounded vowel | peth | e
ɛ | E | open-mid front unrounded vowel | pedwar | E
ɛi | Ei | diphthong | beic | E
i: | i: | long close front unrounded vowel | tri | i
ɪ | I | near-close near-front unrounded vowel | miliwn | i
ɨu | 1u | diphthong | unigryw | i
o: | o: | long close-mid back rounded vowel | oddi | o
ɔ | O | open-mid back rounded vowel | oddieithr | O
ɔi | Oi | diphthong | troi | O
ɔu | Ou | diphthong | rownd | O
u: | u: | long close back rounded vowel | cwch | u
ʊ | U | near-close near-back rounded vowel | acwstig | u
ʊi | Ui | diphthong | wyth | u

Additional Symbols
ˈ | " | primary stress | Alabama |
ˌ | % | secondary stress | Alabama |
. | . | syllable boundary | A.la.ba.ma |

Amazon Polly voice engines

Amazon Polly has four voice engines that convert input text into lifelike speech: Generative, Long-form, Neural, and Standard. To use an Amazon Polly voice, select an engine and a speech synthesis API operation. Then provide the input text for the engine to synthesize, and select an audio output format. Given these inputs, Amazon Polly synthesizes the text into a high-quality speech audio stream.
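
The inputs described above (engine, API operation, text, and output format) map onto the parameters of a SynthesizeSpeech request. The following is a minimal sketch using the AWS SDK for Python (boto3), not an official sample. The helper names build_synthesis_request and synthesize_to_file and the output file name are hypothetical, and the voice Joanna is used only because it appears in the neural voice list in this guide.

```python
def build_synthesis_request(text, voice_id, engine="neural", output_format="mp3"):
    """Collect the inputs named in the text into SynthesizeSpeech parameters."""
    allowed_engines = {"generative", "long-form", "neural", "standard"}
    if engine not in allowed_engines:
        raise ValueError(f"unknown engine: {engine!r}")
    return {
        "Engine": engine,
        "VoiceId": voice_id,
        "Text": text,
        "OutputFormat": output_format,
    }


def synthesize_to_file(path, **request):
    """Call SynthesizeSpeech and write the returned audio stream to a file."""
    import boto3  # imported here so the request builder works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


request = build_synthesis_request("Hello from Amazon Polly.", "Joanna")
# synthesize_to_file("hello.mp3", **request)  # needs AWS credentials and a Region
```

Keeping the request builder separate from the network call makes it possible to check the parameters without credentials.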

The following sections include details about the voice engines offered by Amazon Polly.

Topics

• Generative voices

• Long-form voices

• Neural voices

• Standard voices

Generative voices

Amazon Polly's generative text-to-speech (TTS) engine offers the most human-like, emotionally engaged, and adaptive conversational voices available through the Amazon Polly console.

The Generative engine is the largest Amazon Polly TTS model to date. It deploys a billion-parameter transformer that converts raw text into speech codes, followed by a convolution-based decoder that converts these speech codes into waveforms in an incremental, streamable manner. This approach exhibits the widely reported emergent abilities of large language models (LLMs) when trained on increasing volumes of publicly available and proprietary data comprising a variety of voices, languages, and styles.

The Generative engine creates synthetic speech that is emotionally engaged, assertive, and highly colloquial, in a way that is remarkably similar to a human voice. You can use these voices for a knowledgeable customer assistant, a virtual trainer, or an advertiser with near-human synthetic speech.

Note

The state-of-the-art technology underlying these voices falls within the paradigm of generative AI for language and voice modelling. A side effect of the technology is that any updates to the training data and the model could result in slight variations to the way the voices sound, even when their overall quality improves with model updates. This could affect use cases in which different parts of the content are synthesized over a long time period, for example, a season of podcasts.

Available generative voices

Amazon Polly currently offers two female voices and one male voice, all English, in a generative variant. These generative voices are also available in a conversational NTTS variant.

No. | Language | Language code | Name/ID | Gender
1 | English (UK) | en-GB | Amy | Female
2 | English (US) | en-US | Matthew | Male
 | | | Ruth | Female

Note

The cost of generative voices is specified on the Amazon Polly pricing information page.

Feature and region compatibility

Amazon Polly generative voices are available in the following regions:

• US East (N. Virginia): us-east-1

• Europe (Frankfurt): eu-central-1

• Generative voices are not available in other Regions.

The following features are supported for generative voices:

• Real-time and asynchronous speech synthesis operations.

• Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information about NTTS-supported SSML tags, see Supported SSML tags.

• As with standard voices, you can choose from various sampling rates to optimize the bandwidth and audio quality for your application. Valid sampling rates for standard and neural voices are 8 kHz, 16 kHz, 22 kHz, or 24 kHz. The default for standard voices is 22 kHz. The default for generative voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio stream formats.

The following features are not supported for generative voices:

• Newscaster speaking style.

• Speech marks. Support for generating speech marks is currently not available.
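
The sampling rates and defaults listed above can be captured in a small lookup. This is a sketch under stated assumptions rather than an official sample: the helper name resolve_sample_rate is hypothetical, and the string values assume that the API expresses SampleRate in Hz (so that "22 kHz" corresponds to "22050"), as described in the SynthesizeSpeech API reference. PCM output is understood to accept a narrower set of rates, so check the API reference before combining the two.

```python
# Rates from the text, written as the Hz strings the API is assumed to expect.
VALID_SAMPLE_RATES = ("8000", "16000", "22050", "24000")

# Defaults stated in this guide for each engine.
DEFAULT_SAMPLE_RATE = {
    "standard": "22050",
    "neural": "24000",
    "generative": "24000",
}


def resolve_sample_rate(engine, sample_rate=None):
    """Return the requested rate if valid, or the engine's documented default."""
    if sample_rate is None:
        return DEFAULT_SAMPLE_RATE[engine]
    sample_rate = str(sample_rate)
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate


generative_default = resolve_sample_rate("generative")
telephony_rate = resolve_sample_rate("standard", 8000)
```

The returned string can be passed as the SampleRate parameter of a synthesis request.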

Note

In the unlikely event of model hallucination, an emergency stop mechanism is in place. Because the Generative engine renders speech token by token, this built-in mechanism can stop the model from rendering any further speech. This safety feature is based on data analysis of where the model has the potential to hallucinate, which is usually at the end of a sentence.

In some cases, the model might predict that it is about to hallucinate and stop partway through a generation step, rendering only part of a word. This could potentially produce inappropriate results.

Using the Generative engine on the console

You can access Amazon Polly generative voices through the Amazon Polly console or the AWS CLI. From the console, select the Generative engine, then select a generative voice from the list to hear the synthesized speech in that voice. You can also use generative voices with the SynthesizeSpeech and StartSpeechSynthesisTask API operations, specifying the engine and the voice name in the API request. For quick-start code examples in Python, see Python examples.

To use the generative engine on the console

1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. From the Amazon Polly console, choose the Generative engine.

3. Choose the desired voice from the voice dropdown menu.

4. Generate TTS audio with text of your choice.

Note

Generative voices can also be used with the SynthesizeSpeech and StartSpeechSynthesisTask API operations. For these operations, you specify the engine and the voice name in the API request. You can find more quick-start code samples here.
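
For the asynchronous operation, the engine and voice are specified in the same way, and the audio is written to an Amazon S3 bucket. The following is a minimal sketch using boto3, not an official sample. The helper names and the bucket name are hypothetical placeholders, and the voice (Ruth) and Region (us-east-1) are taken from the generative voice and Region lists in this guide.

```python
def build_task_request(text, bucket, voice_id="Ruth", engine="generative",
                       key_prefix=None):
    """Build StartSpeechSynthesisTask parameters for an asynchronous job."""
    request = {
        "Engine": engine,
        "VoiceId": voice_id,
        "Text": text,
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,  # the bucket must already exist
    }
    if key_prefix:
        request["OutputS3KeyPrefix"] = key_prefix
    return request


def start_task(request, region="us-east-1"):
    """Submit the task and return its ID and initial status."""
    import boto3  # imported here so the builder can be used without the SDK

    polly = boto3.client("polly", region_name=region)
    task = polly.start_speech_synthesis_task(**request)["SynthesisTask"]
    return task["TaskId"], task["TaskStatus"]


# "amzn-s3-demo-bucket" is a placeholder, not a real bucket.
task_request = build_task_request(
    "A longer passage to synthesize.", "amzn-s3-demo-bucket", key_prefix="audio/"
)
# task_id, status = start_task(task_request)  # needs credentials and S3 access
```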

Long-form voices

Amazon Polly has a Long-form engine that produces human-like, highly expressive, and

emotionally adept voices. Long-form voices are designed to captivate listeners’ attention for longer

content, such as news articles, training materials, or marketing videos.

Amazon Polly Long-form voices are developed with cutting-edge deep learning TTS technology. The model learns to replicate phonemes, prosody, intonation, and other phonetic and acoustic aspects of human language, resulting in highly natural speech output.

The Long-form engine uses text embeddings to interpret the meaning of a text. Using text embeddings, the Long-form engine can generate the correct emphasis, pauses, and tone of a natural voice. The result is a voice that combines the complete range of emotional elements present in human communication. This includes mimicking surprise or differentiating dialogue from narration. Together, this creates a premium speech product that sounds like a live human being.

Note

The state-of-the-art technology underlying these voices falls within the paradigm of generative AI for language and voice modelling. A side effect of the technology is that any updates to the training data and the model could result in slight variations to the way the voices sound, even when their overall quality improves with model updates. This could affect use cases in which different parts of the content are synthesized over a long time period, for example, a season of podcasts.

Available long-form voices

Amazon Polly currently offers two female en-US long-form voices and one male en-US long-form voice. These long-form voices are also available in a conversational NTTS variant.

No. | Language | Language code | Name/ID | Gender
1 | English (US) | en-US | Danielle | Female
 | | | Gregory | Male
 | | | Ruth | Female

Feature and region compatibility

Amazon Polly long-form voices are available in the following regions:

• US East (N. Virginia): us-east-1

• Long-form voices are not available in other Regions.
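
Because the Generative and Long-form engines are limited to the Regions listed in this guide, it can help to check the engine and Region before sending a request. The following is a small sketch, not an official sample. The names ENGINE_REGIONS and engine_available are hypothetical, and the Region sets are copied from the lists in this guide, so they might be out of date.

```python
# Regions copied from the lists in this guide; treat them as a snapshot.
ENGINE_REGIONS = {
    "generative": {"us-east-1", "eu-central-1"},
    "long-form": {"us-east-1"},
}


def engine_available(engine, region):
    """Return True if this guide lists the engine as available in the Region."""
    if engine not in ENGINE_REGIONS:
        raise KeyError(f"no Region list recorded for engine {engine!r}")
    return region in ENGINE_REGIONS[engine]


long_form_in_frankfurt = engine_available("long-form", "eu-central-1")
generative_in_frankfurt = engine_available("generative", "eu-central-1")
```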

The Amazon Polly Long-form engine supports the following features:

• Real-time and asynchronous speech synthesis operations.

• All speech marks.

• Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information about NTTS-supported SSML tags, see Supported SSML tags.

• As with standard voices, you can choose from various sampling rates to optimize the bandwidth and audio quality for your application. Valid sampling rates for standard, long-form, and neural voices are 8 kHz, 16 kHz, 22 kHz, or 24 kHz. The default for standard voices is 22 kHz. The default for long-form and neural voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio stream formats.
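
Speech marks are requested by setting the output format to JSON and naming the mark types. The response is a stream of JSON objects, one per line. The following sketch shows one way to build such a request and parse the result. It is not an official sample: the helper names are hypothetical, and the sample payload is hand-written to illustrate the shape of the output rather than captured from the service.

```python
import json


def build_speech_mark_request(text, voice_id="Danielle",
                              mark_types=("sentence", "word")):
    """Build SynthesizeSpeech parameters that return speech marks, not audio."""
    return {
        "Engine": "long-form",
        "VoiceId": voice_id,
        "Text": text,
        "OutputFormat": "json",  # speech marks are returned as JSON lines
        "SpeechMarkTypes": list(mark_types),
    }


def parse_speech_marks(payload):
    """Parse line-delimited JSON speech marks into a list of dictionaries."""
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Hand-written illustration of the line-delimited format.
sample = (
    '{"time": 0, "type": "word", "start": 0, "end": 5, "value": "Hello"}\n'
    '{"time": 310, "type": "word", "start": 6, "end": 11, "value": "world"}\n'
)
marks = parse_speech_marks(sample)
mark_request = build_speech_mark_request("Hello world")
```

In a real call, the payload would come from reading the AudioStream field of the SynthesizeSpeech response.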

Note

The cost of long-form voices is specified on the Amazon Polly pricing information page.

Using the Long-form engine on the console

You can access Amazon Polly long-form voices through the Amazon Polly console or AWS CLI.

To use the Long-form engine on the console

1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. From the Amazon Polly console, choose the Long Form engine.

3. Choose the desired voice from the voice dropdown menu.

4. Generate TTS audio with text of your choice.

Note

Long-form voices can also be used with the SynthesizeSpeech and StartSpeechSynthesisTask API operations. For these operations, you specify the engine and the voice name in the API request. You can find more quick-start code samples here.

Neural voices

Amazon Polly has a Neural text-to-speech (NTTS) engine that can produce even higher quality voices than its standard voices. Standard TTS voices use concatenative synthesis: the standard engine concatenates phonemes of recorded speech, producing very natural-sounding synthesized speech. However, the inevitable variations in speech and the techniques used to segment the waveforms limit the quality of that speech. The Amazon Polly NTTS engine doesn't use standard concatenative synthesis to produce speech. It has two parts:

• A neural network that converts a sequence of phonemes (the most basic units of language) into a sequence of spectrograms. (Spectrograms are snapshots of the energy levels in different frequency bands.)

• A vocoder that converts spectrograms into a nearly continuous audio signal.

The first component of the neural TTS system is a sequence-to-sequence model. This model doesn't create its results solely from the corresponding input but also considers how the elements of the input sequence work together. The model chooses the spectrograms that it outputs so that their frequency bands emphasize acoustic features that the human brain uses when processing speech.

The output of this model then passes to a neural vocoder, which converts the spectrograms into speech waveforms. When trained on the large datasets used to build general-purpose concatenative-synthesis systems, this sequence-to-sequence approach yields higher-quality, more natural-sounding voices.
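
To make the idea of a spectrogram concrete, the following toy sketch computes the energy in each frequency band for successive frames of a signal. It only illustrates the concept and is not how the Amazon Polly NTTS engine produces or consumes spectrograms. The function name frame_energies and the test tone are assumptions made for the example.

```python
import cmath
import math


def frame_energies(samples, frame_size):
    """Split a signal into frames and return the energy per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Discrete Fourier transform coefficient for frequency bin k.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            bins.append(abs(coeff) ** 2)
        frames.append(bins)
    return frames


# A pure tone that completes 4 cycles in every 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
spectrogram = frame_energies(tone, 32)

# Index of the frequency bin that holds the most energy in the first frame.
loudest_bin = max(range(len(spectrogram[0])), key=spectrogram[0].__getitem__)
```

Each inner list is one snapshot. For this tone, nearly all of the energy falls in a single bin, whereas speech spreads energy across many bins that change from frame to frame.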

Available neural voices

Neural voices are available in 35 languages and language variants. The following table lists the

voices.

No. | Language and language variants | Language code | Name/ID | Gender
1 | Arabic (Gulf) | ar-AE | Hala | Female
 | | | Zayd | Male
2 | Belgian Dutch (Flemish) | nl-BE | Lisa | Female
3 | Catalan | ca-ES | Arlet | Female
4 | Czech | cs-CZ | Jitka | Female
5 | Chinese (Cantonese) | yue-CN | Hiujin | Female
6 | Chinese (Mandarin) | cmn-CN | Zhiyu | Female
7 | Danish | da-DK | Sofie | Female
8 | Dutch | nl-NL | Laura | Female
9 | English (Australian) | en-AU | Olivia | Female
10 | English (British) | en-GB | Amy* | Female
 | | | Emma | Female
 | | | Brian | Male
 | | | Arthur | Male
11 | English (Indian) | en-IN | Kajal | Female
12 | English (Irish) | en-IE | Niamh | Female
13 | English (New Zealand) | en-NZ | Aria | Female
14 | English (South African) | en-ZA | Ayanda | Female
15 | English (US) | en-US | Danielle | Female
 | | | Gregory | Male
 | | | Ivy | Female (child)
 | | | Joanna* | Female
 | | | Kendra | Female
 | | | Kimberly | Female
 | | | Salli | Female
 | | | Joey | Male
 | | | Justin | Male (child)
 | | | Kevin | Male (child)
 | | | Matthew* | Male
 | | | Ruth | Female
 | | | Stephen | Male
16 | Finnish | fi-FI | Suvi | Female
17 | French (Belgian) | fr-BE | Isabelle | Female
18 | French (Canadian) | fr-CA | Gabrielle | Female
 | | | Liam | Male
19 | French | fr-FR | Léa | Female
 | | | Rémi | Male
20 | German | de-DE | Vicki | Female
 | | | Daniel | Male
21 | German (Austrian) | de-AT | Hannah | Female
22 | German (Swiss) | de-CH | Sabrina | Female
23 | Hindi | hi-IN | Kajal | Female
24 | Italian | it-IT | Bianca | Female
 | | | Adriano | Male
25 | Japanese | ja-JP | Takumi | Male
 | | | Kazuha | Female
 | | | Tomoko | Female
26 | Korean | ko-KR | Seoyeon | Female
27 | Norwegian | nb-NO | Ida | Female
28 | Polish | pl-PL | Ola | Female
29 | Portuguese (Brazilian) | pt-BR | Camila | Female
 | | | Vitória/Vitoria | Female
 | | | Thiago | Male
30 | Portuguese (European) | pt-PT | Inês/Ines | Female
31 | Spanish (European) | es-ES | Lucia | Female
 | | | Sergio | Male
32 | Spanish (Mexican) | es-MX | Mia | Female
 | | | Andrés | Male
33 | Spanish (US) | es-US | Lupe* | Female
 | | | Pedro | Male
34 | Swedish | sv-SE | Elin | Female
35 | Turkish | tr-TR | Burcu | Female

*The Amy, Joanna, Lupe, and Matthew voices can be used with the Newscaster speaking style. For

more information, see Newscaster voices.
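
Because the set of voices changes over time, an application can query the current list instead of hard-coding it. The following sketch uses the DescribeVoices operation through boto3 and follows pagination tokens. It is not an official sample, and the helper name neural_voice_ids is hypothetical.

```python
def neural_voice_ids(polly_client, language_code):
    """Return the sorted IDs of neural voices for one language code."""
    voices = []
    token = None
    while True:
        kwargs = {"Engine": "neural", "LanguageCode": language_code}
        if token:
            kwargs["NextToken"] = token
        page = polly_client.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        token = page.get("NextToken")
        if not token:
            break
    return sorted(voice["Id"] for voice in voices)


# Example usage (needs AWS credentials and a Region that offers neural voices):
# import boto3
# print(neural_voice_ids(boto3.client("polly"), "en-US"))
```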

Topics

• Feature and region compatibility

• Using the Neural engine on the console

Feature and region compatibility

Neural voices aren't available in all AWS Regions, nor do they support all Amazon Polly features.

Neural voices are supported in the following regions:

• US East (N. Virginia): us-east-1

• US West (Oregon): us-west-2

• Africa (Cape Town): af-south-1

• Asia Pacific (Tokyo): ap-northeast-1

• Asia Pacific (Seoul): ap-northeast-2

• Asia Pacific (Osaka): ap-northeast-3

• Asia Pacific (Mumbai): ap-south-1

• Asia Pacific (Singapore): ap-southeast-1

• Asia Pacific (Sydney): ap-southeast-2

• Canada (Central): ca-central-1

• Europe (Frankfurt): eu-central-1

• Europe (Ireland): eu-west-1

• Europe (London): eu-west-2

• Europe (Paris): eu-west-3

• AWS GovCloud (US-West): us-gov-west-1

Endpoints and protocols for these Regions are identical to those used for standard voices. For more

information, see Amazon Polly endpoints and quotas.

The following features are supported for neural voices:

• Real-time and asynchronous speech synthesis operations.

• Newscaster speaking style. For more information about the speaking styles, see Newscaster

voices.

• All speech marks.

• Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information

about NTTS-supported SSML tags, see Supported Tags.

As with standard voices, you can choose from various sampling rates to optimize the bandwidth

and audio quality for your application. Valid sampling rates for standard and neural voices are 8

kHz, 16 kHz, 22 kHz, or 24 kHz. The default for standard voices is 22 kHz. The default for neural

voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio stream formats.
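The default sampling rates described above can be captured in a small helper. The following Python sketch is illustrative only: `build_synthesis_params` is a hypothetical function, and it assumes that "22 kHz" corresponds to the API value "22050" and that the resulting dictionary would be passed to the SynthesizeSpeech operation (for example, through the AWS SDK for Python). It covers only the mp3 and ogg_vorbis formats, because PCM output has its own sampling-rate limits.

```python
# Sampling rates from the paragraph above, written as the strings the API expects.
# Assumption: "22 kHz" means 22050 Hz.
VALID_SAMPLE_RATES = {"8000", "16000", "22050", "24000"}
DEFAULT_SAMPLE_RATE = {"standard": "22050", "neural": "24000"}
COVERED_FORMATS = {"mp3", "ogg_vorbis"}  # PCM is deliberately left out of this sketch


def build_synthesis_params(text, voice_id, engine="neural",
                           output_format="mp3", sample_rate=None):
    """Return keyword arguments for a SynthesizeSpeech request (hypothetical helper)."""
    if engine not in DEFAULT_SAMPLE_RATE:
        raise ValueError(f"unknown engine: {engine!r}")
    if output_format not in COVERED_FORMATS:
        raise ValueError(f"format not covered by this sketch: {output_format!r}")
    # Fall back to the engine's default rate when none is given.
    rate = sample_rate or DEFAULT_SAMPLE_RATE[engine]
    if rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {rate!r}")
    return {
        "Engine": engine,
        "OutputFormat": output_format,
        "SampleRate": rate,
        "Text": text,
        "VoiceId": voice_id,
    }


params = build_synthesis_params("Hello world", "Joanna")
print(params["Engine"], params["SampleRate"])
# With the AWS SDK for Python installed and credentials configured, these
# parameters could be passed as boto3.client("polly").synthesize_speech(**params).
```

Validating on the client side is a design choice made for this sketch; the service performs its own validation of the request.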

Using the Neural engine on the console

You can access Amazon Polly Neural voices through the Amazon Polly console or AWS CLI.

To use the neural engine on the console

1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.


2. From the console, choose the Neural engine.

3. Choose the desired voice from the voice dropdown menu.

4. Generate TTS audio with text of your choice.

Standard voices

Amazon Polly has a standard engine that uses concatenative synthesis. The standard engine concatenates phonemes of recorded speech, producing very natural-sounding synthesized speech.

Available Standard voices

Amazon Polly currently oﬀers 40 female and 20 male standard voices in 29 language and language

variants.

 Language Language code Name/ID Gender

1 Arabic arb Zeina (Female)
2 Chinese (Mandarin) cmn-CN Zhiyu (Female)
3 Danish da-DK Naja (Female), Mads (Male)
4 Dutch nl-NL Lotte (Female), Ruben (Male)
5 English (Australian) en-AU Nicole (Female), Russell (Male)
6 English (British) en-GB Amy (Female), Emma (Female), Brian (Male)
7 English (Indian) en-IN Aditi (Female), Raveena (Female)

8 English (US) en-US Ivy (Female), Joanna (Female), Kendra (Female), Kimberly (Female), Salli (Female), Joey (Male), Kevin (Male)
9 English (Welsh) en-GB-WLS Geraint (Male)
10 French fr-FR Céline/Celine (Female), Léa (Female), Mathieu (Male)
11 French (Canadian) fr-CA Chantal (Female)
12 German de-DE Marlene (Female), Vicki (Female), Hans (Male)
13 Hindi hi-IN Aditi (Female)
14 Icelandic is-IS Dóra/Dora (Female), Karl (Male)


15 Italian it-IT Carla (Female), Bianca (Female), Giorgio (Male)
16 Japanese ja-JP Mizuki (Female), Takumi (Male)
17 Korean ko-KR Seoyeon (Female)
18 Norwegian nb-NO Liv (Female)
19 Polish pl-PL Ewa (Female), Maja (Female), Jacek (Male), Jan (Male)
20 Portuguese (Brazilian) pt-BR Camila (Female), Vitória/Vitoria (Female), Ricardo (Male)
21 Portuguese (European) pt-PT Inês/Ines (Female), Cristiano (Male)
22 Romanian ro-RO Carmen (Female)
23 Russian ru-RU Tatyana (Female), Maxim (Male)


24 Spanish (European) es-ES Conchita (Female), Lucia (Female), Enrique (Male)
25 Spanish (Mexican) es-MX Mia (Female)
26 Spanish (US) es-US Lupe (Female), Penélope/Penelope (Female), Miguel (Male)
27 Swedish sv-SE Astrid (Female)
28 Turkish tr-TR Filiz (Female)
29 Welsh cy-GB Gwyneth (Female)

Feature and region compatibility

Amazon Polly standard voices are available in the following Amazon Polly regions:

• US East (N. Virginia): us-east-1

• US East (Ohio): us-east-2

• US West (N. California): us-west-1

• US West (Oregon): us-west-2

• Africa (Cape Town): af-south-1

• Asia Paciﬁc (Hong Kong): ap-east-1

• Asia Paciﬁc (Tokyo): ap-northeast-1

• Asia Paciﬁc (Seoul): ap-northeast-2

• Asia Paciﬁc (Osaka): ap-northeast-3


• Asia Paciﬁc (Mumbai): ap-south-1

• Asia Paciﬁc (Singapore): ap-southeast-1

• Asia Paciﬁc (Sydney): ap-southeast-2

• China (Ningxia): cn-northwest-1

• Canada (Central): ca-central-1

• Europe (Frankfurt): eu-central-1

• Europe (Ireland): eu-west-1

• Europe (London): eu-west-2

• Europe (Paris): eu-west-3

• Europe (Stockholm): eu-north-1

• Middle East (Bahrain): me-south-1

• South America (São Paulo): sa-east-1

• AWS GovCloud (US-West): us-gov-west-1

Endpoints and protocols for these Regions are identical to those used for Neural voices. For more

information, see Amazon Polly endpoints and quotas.

The Amazon Polly standard engine supports the following features:

• Real-time and asynchronous speech synthesis operations.

• All speech marks.

• Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information, see Supported SSML tags.

• Various sampling rates, so that you can optimize the bandwidth and audio quality for your application. The default sampling rate for standard voices is 22 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio stream formats.

Note

The cost of standard voices is specified on the Amazon Polly pricing information page.


Using the Standard engine on the console

You can access Amazon Polly standard voices through the Amazon Polly console or AWS CLI.

To use a standard voice on the console

1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. From the Amazon Polly console, choose the Standard engine.

3. Choose the desired voice from the voice dropdown menu.

4. Generate TTS audio with text of your choice.

Note

Standard voices can also be used with the SynthesizeSpeech and StartSpeechSynthesisTask API operations. In the API request, you specify the engine and the name of the voice. You can find more quick-start code samples.


Speech marks

Speech marks are metadata that describe the speech that you synthesize, such as where a sentence

or word starts and ends in the audio stream. When you request speech marks for your text,

Amazon Polly returns this metadata instead of synthesized speech. By using speech marks in

conjunction with the synthesized speech audio stream, you can provide your applications with an

enhanced visual experience.

For example, combining the metadata with the audio stream from your text can enable you to

synchronize speech with facial animation (lip-syncing) or to highlight written words as they're

spoken.

Speech marks are available when using either the neural or the standard text-to-speech engine.

Topics

• Speech mark types

• Using speech marks

• Requesting speech marks on the console

Speech mark types

You request speech marks using the SpeechMarkTypes option for either the SynthesizeSpeech or

StartSpeechSynthesisTask commands. You specify the metadata elements that you want to return

from your input text. You can request as many as four types of metadata but you must specify at

least one per request. No audio output is generated with the request.

In the AWS CLI, for example:

--speech-mark-types='["sentence", "word", "viseme", "ssml"]'

Amazon Polly generates speech marks using the following elements:

• sentence – Indicates a sentence element in the input text.

• word – Indicates a word element in the text.

• viseme – Describes the face and mouth movements corresponding to each phoneme being

spoken. For more information, see Visemes and Amazon Polly.


• ssml – Describes a <mark> element from the SSML input text. For more information, see

Generating speech from SSML documents.

Visemes and Amazon Polly

A viseme represents the position of the face and mouth when saying a word. It is the visual

equivalent of a phoneme, which is the basic acoustic unit from which a word is formed. Visemes are

the basic visual building blocks of speech.

Each language has a set of visemes that correspond to its specific phonemes. In a language,

each phoneme has a corresponding viseme that represents the shape that the mouth makes when

forming the sound. However, not all visemes can be mapped to a particular phoneme because

numerous phonemes appear the same when spoken, even though they sound diﬀerent. For

example, in English, the words "pet" and "bet" are acoustically diﬀerent. However, when observed

visually (without sound), they look exactly the same.

The following chart shows a partial list of International Phonetic Alphabet (IPA) phonemes and

Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols as well as their

corresponding visemes for US English voices.

For the complete table and tables for all available languages, see Phoneme and Viseme Tables for

Supported Languages.

IPA X-SAMPA Description Example Viseme

Consonants

b b Voiced bilabial plosive bed p
d d Voiced alveolar plosive dig t
d͡ʒ dZ Voiced postalveolar affricate jump S
ð D Voiced dental fricative then T
f f Voiceless labiodental fricative five f
g g Voiced velar plosive game k
h h Voiceless glottal fricative house k
... ... ... ... ...
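To make the many-to-one relationship between phonemes and visemes concrete, the following Python sketch groups the rows of the partial table above by viseme. The dictionary holds only the seven rows shown here, and `phonemes_by_viseme` is a hypothetical helper written for illustration.

```python
from collections import defaultdict

# X-SAMPA phoneme -> viseme, copied from the partial US English table above.
PHONEME_TO_VISEME = {
    "b": "p",
    "d": "t",
    "dZ": "S",
    "D": "T",
    "f": "f",
    "g": "k",
    "h": "k",
}


def phonemes_by_viseme(mapping):
    """Invert the mapping so that each viseme lists the phonemes that share it."""
    groups = defaultdict(list)
    for phoneme, viseme in mapping.items():
        groups[viseme].append(phoneme)
    return dict(groups)


groups = phonemes_by_viseme(PHONEME_TO_VISEME)
print(groups["k"])  # the phonemes g and h share the viseme k
```

Because several phonemes can share one viseme, the inverse lookup returns a list rather than a single phoneme.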

Using speech marks

Requesting speech marks

To request speech marks for input text, use the synthesize-speech command. Besides the input

text, the following elements are required to return this metadata:

• output-format

Amazon Polly supports only the JSON format when returning speech marks.

--output-format json

If you use an unsupported output format, Amazon Polly throws an exception.

• voice-id

To ensure that the metadata matches the associated audio stream, specify the same voice that is used to generate the synthesized speech audio stream. The available voices don't have identical speech rates. If you use a voice other than the one used to generate the speech, the metadata will not match the audio stream.

--voice-id Joanna

• speech-mark-types

Specify the type or types of speech marks you want. You can request any or all of the speech mark types, but must specify at least one type.

--speech-mark-types='["sentence", "word", "viseme", "ssml"]'

• text-type

Plain text is the default input text for Amazon Polly, so you must use text-type ssml if you want to return SSML speech marks.

• outfile

Specify the output file to which the metadata is written.

MaryLamb.txt



The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full

quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \

--output-format json \

--voice-id Voice ID \

--text 'Input text' \

--speech-mark-types='["sentence", "word", "viseme"]' \

outfile
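The requirements listed above (JSON output, a voice, and at least one speech mark type) can also be checked before a request is sent. The following Python sketch is a hedged illustration: `build_speech_mark_params` is a hypothetical helper, the parameter names follow the SynthesizeSpeech request, and rejecting ssml marks for plain-text input is a client-side choice made for this example.

```python
ALLOWED_MARK_TYPES = ("sentence", "word", "viseme", "ssml")


def build_speech_mark_params(text, voice_id, mark_types, text_type="text"):
    """Return keyword arguments for a speech mark request (hypothetical helper)."""
    if not mark_types:
        raise ValueError("specify at least one speech mark type")
    unknown = [m for m in mark_types if m not in ALLOWED_MARK_TYPES]
    if unknown:
        raise ValueError(f"unknown speech mark types: {unknown}")
    # SSML speech marks describe <mark> tags, so they need SSML input.
    if "ssml" in mark_types and text_type != "ssml":
        raise ValueError("ssml speech marks require text_type='ssml'")
    return {
        "OutputFormat": "json",  # the only format supported for speech marks
        "SpeechMarkTypes": list(mark_types),
        "Text": text,
        "TextType": text_type,
        "VoiceId": voice_id,
    }


params = build_speech_mark_params(
    "Mary had a little lamb.", "Joanna", ["sentence", "word", "viseme"]
)
print(params["OutputFormat"], params["SpeechMarkTypes"])
```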

Speech mark output

Amazon Polly returns speech mark objects in a line-delimited JSON stream. A speech mark object

contains the following ﬁelds:

• time – the timestamp in milliseconds from the beginning of the corresponding audio stream

• type – the type of speech mark (sentence, word, viseme, or ssml)

• start – the oﬀset in bytes (not characters) of the start of the object in the input text (not

including viseme marks)


• end – the oﬀset in bytes (not characters) of the object's end in the input text (not including

viseme marks)

• value – this varies depending on the type of speech mark

• ssml: the name of the <mark> SSML tag

• viseme: the viseme name

• word or sentence: a substring of the input text, as delimited by the start and end ﬁelds

For example, Amazon Polly generates the following word speech mark object from the text "Mary

had a little lamb":

{"time":373,"type":"word","start":5,"end":8,"value":"had"}

The described word ("had") begins 373 milliseconds after the audio stream begins, and starts at

byte 5 and ends at byte 8 of the input text.

Note

This metadata is for the Joanna voice-id. If you use another voice with the same input text,

the metadata might diﬀer.
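Because start and end are byte offsets rather than character offsets, it can help to slice the encoded text instead of the string itself. The following Python sketch parses the speech mark shown above and recovers the word from the input text. It assumes that the offsets count UTF-8 bytes; the helper names and the accented sample sentence are illustrative and are not Amazon Polly output.

```python
import json


def slice_by_bytes(text, start, end):
    """Return the substring covered by the byte offsets [start, end)."""
    return text.encode("utf-8")[start:end].decode("utf-8")


def byte_span_to_char_span(text, start, end):
    """Convert byte offsets into character offsets (for example, to highlight text)."""
    data = text.encode("utf-8")
    return len(data[:start].decode("utf-8")), len(data[:end].decode("utf-8"))


text = "Mary had a little lamb"
mark = json.loads('{"time":373,"type":"word","start":5,"end":8,"value":"had"}')
print(slice_by_bytes(text, mark["start"], mark["end"]))

# Illustrative only: "é" occupies two bytes in UTF-8, so byte and character
# offsets diverge after it.
accented = "Léa had a little lamb"
print(byte_span_to_char_span(accented, 5, 8))
```

For ASCII-only input the two kinds of offset coincide, which is why the difference is easy to overlook.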



Speech mark examples

The following examples of speech mark requests show how to make common requests and the

output that they generate.

Example 1: Speech Marks Without SSML

The following example shows you what requested metadata looks like on your screen for the

simple sentence: "Mary had a little lamb." For simplicity, we don't include SSML speech marks in

this example.

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full

quotation marks (") around the input text with single quotes (') for interior tags.


aws polly synthesize-speech \

--output-format json \

--voice-id Joanna \

--text 'Mary had a little lamb.' \

--speech-mark-types='["viseme", "word", "sentence"]' \

MaryLamb.txt

When you make this request, Amazon Polly returns the following in the .txt ﬁle:

{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}

{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}

{"time":6,"type":"viseme","value":"p"}

{"time":73,"type":"viseme","value":"E"}

{"time":180,"type":"viseme","value":"r"}

{"time":292,"type":"viseme","value":"i"}

{"time":373,"type":"word","start":5,"end":8,"value":"had"}

{"time":373,"type":"viseme","value":"k"}

{"time":460,"type":"viseme","value":"a"}

{"time":521,"type":"viseme","value":"t"}

{"time":604,"type":"word","start":9,"end":10,"value":"a"}

{"time":604,"type":"viseme","value":"@"}

{"time":643,"type":"word","start":11,"end":17,"value":"little"}

{"time":643,"type":"viseme","value":"t"}

{"time":739,"type":"viseme","value":"i"}

{"time":769,"type":"viseme","value":"t"}

{"time":799,"type":"viseme","value":"t"}

{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}

{"time":882,"type":"viseme","value":"t"}

{"time":964,"type":"viseme","value":"a"}

{"time":1082,"type":"viseme","value":"p"}

In this output, each part of the text is broken out in terms of speech marks:

• The sentence "Mary had a little lamb."

• Each word in the text: "Mary", "had", "a", "little", and "lamb."

• The viseme for each sound in the corresponding audio stream: "p", "E", "r", "i", and so on. For

more information on visemes see Visemes and Amazon Polly.
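One common use of this output is to highlight the word that is currently being spoken. The following Python sketch, a minimal illustration rather than a complete player, loads a few of the speech marks from the output above and looks up the word that is active at a given playback time. The `word_at` helper is hypothetical.

```python
import bisect
import json

# Speech marks copied from the Example 1 output (abridged).
OUTPUT = """\
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
{"time":643,"type":"word","start":11,"end":17,"value":"little"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}
"""

# Each line of the stream is a separate JSON object.
marks = [json.loads(line) for line in OUTPUT.splitlines() if line.strip()]
words = [m for m in marks if m["type"] == "word"]
word_times = [m["time"] for m in words]  # already in ascending order


def word_at(ms):
    """Return the word mark being spoken at `ms`, or None before the first word."""
    i = bisect.bisect_right(word_times, ms) - 1
    return words[i] if i >= 0 else None


current = word_at(400)
print(current["value"], current["start"], current["end"])
```

An application would call a function like this with the audio player's current position and use the start and end offsets to highlight the matching text.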


Example 2: Speech marks with SSML

The process of generating speech marks from SSML-enhanced text is similar to the process when

SSML is not present. Use the synthesize-speech command, and specify the SSML-enhanced

text and the type of speech marks that you want, as shown in the following example. To make the

example easier to read, we don't include viseme speech marks, but these could be included as well.

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full

quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \

--output-format json \

--voice-id Joanna \

--text-type ssml \

--text '<speak><prosody volume="+20dB">Mary had <break time="300ms"/>a little <mark name="animal"/>lamb</prosody></speak>' \

--speech-mark-types='["sentence", "word", "ssml"]' \

output.txt

When you make this request, Amazon Polly returns the following in the .txt ﬁle:

{"time":0,"type":"sentence","start":31,"end":95,"value":"Mary had <break time=\"300ms\"\/>a little <mark name=\"animal\"\/>lamb"}

{"time":6,"type":"word","start":31,"end":35,"value":"Mary"}

{"time":325,"type":"word","start":36,"end":39,"value":"had"}

{"time":897,"type":"word","start":40,"end":61,"value":"<break time=\"300ms\"\/>"}

{"time":1291,"type":"word","start":61,"end":62,"value":"a"}

{"time":1373,"type":"word","start":63,"end":69,"value":"little"}

{"time":1635,"type":"ssml","start":70,"end":91,"value":"animal"}

{"time":1635,"type":"word","start":91,"end":95,"value":"lamb"}
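With SSML input, the start and end offsets count from the beginning of the SSML string, tags included. The following Python sketch checks this against a few of the speech marks above and collects the times of the <mark> tags by name. It assumes that the offsets count UTF-8 bytes, and the variable names are illustrative.

```python
import json

SSML = ('<speak><prosody volume="+20dB">Mary had <break time="300ms"/>'
        'a little <mark name="animal"/>lamb</prosody></speak>')

# Speech marks copied from the Example 2 output; raw strings keep the JSON escapes.
LINES = [
    r'{"time":6,"type":"word","start":31,"end":35,"value":"Mary"}',
    r'{"time":897,"type":"word","start":40,"end":61,"value":"<break time=\"300ms\"\/>"}',
    r'{"time":1635,"type":"ssml","start":70,"end":91,"value":"animal"}',
    r'{"time":1635,"type":"word","start":91,"end":95,"value":"lamb"}',
]

marks = [json.loads(line) for line in LINES]
data = SSML.encode("utf-8")  # assumption: offsets count UTF-8 bytes

# Record the part of the SSML source that each mark points at.
for m in marks:
    m["source"] = data[m["start"]:m["end"]].decode("utf-8")

# For ssml marks, the value is the mark name and the span covers the whole tag.
mark_times = {m["value"]: m["time"] for m in marks if m["type"] == "ssml"}

for m in marks:
    print(m["type"], m["source"])
print(mark_times)
```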

Requesting speech marks on the console

You can use the console to request speech marks from Amazon Polly. You can then view the

metadata or save it to a ﬁle.

To generate speech marks (console)

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.


2. Choose the Text-to-Speech tab.

3. Turn on SSML to use SSML.

4. Type or paste your text into the input box.

5. For Language, choose the language for your text.

6. For Voice, choose the voice you want to use for the text.

7. To change text pronunciation, expand Additional settings, turn on Customize pronunciation,

and for Apply lexicon, choose the desired lexicon.

8. To verify that the speech is in its ﬁnal form, choose Listen.

9. Turn on Speech ﬁle format settings.

Note

Downloading MP3, OGG, or PCM formats will not generate speech marks.

10. For File Format, choose Speech marks.

11. For Speech mark types, choose the types of speech marks to generate. The option to choose

SSML metadata is only available when SSML is on. For more information on using SSML with

Amazon Polly see Generating speech from SSML documents.

12. Choose Download.




Generating speech from SSML documents

You can use Amazon Polly to generate speech from either plain text or from documents marked up

with Speech Synthesis Markup Language (SSML). Using SSML-enhanced text gives you additional

control over how Amazon Polly generates speech from the text you provide.

For example, you can include a long pause within your text, or change the speech rate or pitch.

Other options include:

• emphasizing speciﬁc words or phrases

• using phonetic pronunciation

• including breathing sounds

• whispering

• using the Newscaster speaking style.

For complete details on the SSML tags supported by Amazon Polly and how to use them, see Supported SSML tags.

When using SSML, there are several reserved characters that require special treatment because SSML uses them as part of its own syntax. To use these characters in your text, you escape them with a specific entity. For more information, see Reserved characters in SSML.

Amazon Polly provides these types of control with a subset of the SSML markup tags that are

deﬁned by Speech Synthesis Markup Language (SSML) Version 1.1, W3C Recommendation.

You can use SSML within the Amazon Polly console or by using the AWS CLI. The following topics

show you how you can use SSML to generate speech and control the output so that it precisely ﬁts

your needs.

Topics

• Reserved characters in SSML

• Using SSML on the console

• Using SSML on the AWS CLI

• Supported SSML tags


Reserved characters in SSML

There are five predefined characters that can't normally be used within an SSML statement. These characters are reserved by the language specification:

Name Character Escape code
quotation mark (double quotation mark) " &quot;
ampersand & &amp;
apostrophe or single quotation mark ' &apos;
less-than sign < &lt;
greater-than sign > &gt;

Because SSML uses these characters as part of its code, to use these symbols in SSML, you must

escape the character when you use it. You use the escape code instead of the actual character so it

displays properly while still creating a valid SSML document. For example, the following sentence

We're using the lawyer at Peabody & Chambers, attorneys-at-law.

would be rendered in SSML as

<speak>

We&apos;re using the lawyer at Peabody &amp; Chambers, attorneys-at-law.

</speak>

In this case, the special characters for the apostrophe and ampersand are escaped so the SSML

document remains valid.
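When the input text comes from a program rather than from a person typing SSML, the escaping can be automated. The following Python sketch uses the standard library's `xml.sax.saxutils.escape` function, which escapes &, <, and > by default, and passes extra entities for the two quotation characters. The `to_ssml` helper is hypothetical, and escaping every quotation mark is a conservative choice rather than a requirement.

```python
from xml.sax.saxutils import escape

# escape() handles &, <, and > itself; the quotation characters are added here.
EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_ssml(text):
    """Escape reserved characters in plain text and wrap it in <speak> tags."""
    return "<speak>" + escape(text, EXTRA_ENTITIES) + "</speak>"


print(to_ssml("We're using the lawyer at Peabody & Chambers, attorneys-at-law."))
```

Only the plain text is escaped. Any SSML tags that you want Amazon Polly to interpret must be added after escaping, or they would be escaped as well.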

For the &, <, and > symbols, escape codes are always necessary when you use SSML. Additionally, when you use the apostrophe/single quotation mark (') as an apostrophe, you must also use the escape code.

However, when you use the double quotation mark ("), or the apostrophe/single quotation mark (')

as a quotation mark, then whether or not you use the escape code is dependent on context.

Double quotation marks

• Must be escaped when in an attribute value delimited by double quotes. For example, in the following AWS CLI code

--text "Pete &quot;Maverick&quot; Mitchell"

• Do not need to be escaped when in textual context. For example, in the following

He said, "Turn right at the corner."

• Do not need to be escaped when in an attribute value delimited by single quotes. For example, in the following AWS CLI code

--text 'Pete "Maverick" Mitchell'

Single quotation marks

• Must be escaped when used as an apostrophe. For example, in the following

We&apos;ve got to leave quickly.

• Do not need to be escaped when in textual context. For example, in the following

"And then I said, 'Don&apos;t quote me.'"

• Do not need to be escaped when in an attribute value delimited by double quotes. For example, in the following AWS CLI code

--text "Pete 'Maverick' Mitchell"

Using SSML on the console

With SSML tags, you can customize and control aspects of speech such as pronunciation, volume,

and speech rate. In the AWS Management Console, the SSML-enhanced text that you want to

convert to audio is entered on the SSML tab of the Text-to-Speech page. Although text entered

in plain text relies on default settings for the language and voice you've chosen, text enhanced

with SSML tells Amazon Polly not only what you want to say, but how you want to say it. Except

for the added SSML tags, Amazon Polly synthesizes SSML-enhanced text in the same way as it

synthesizes plain text. See Step 1.2: Synthesize speech with plaintext input on the console for more

information.

When using SSML, you enclose the entire text in a <speak> tag to let Amazon Polly know that

you're using SSML. For example:

<speak>Hi! My name is Joanna. I will read any text you type here.</speak>

You then use speciﬁc SSML tags on the text inside the <speak> tags to customize the way you

want the text to sound. You can add a pause, change the pace of the speech, lower or raise the

volume of the voice, or add many other customizations so that the text sounds right for you. For a

full list of the SSML tags that you can use, see Supported SSML tags.

In the following example, you use an SSML tag to tell Amazon Polly to substitute "World Wide Web

Consortium" for "W3C" when it speaks a short paragraph. You also use tags to introduce a pause

and whisper a word. Compare the results of this exercise with that of Applying lexicons on the console (Synthesize Speech).

For more information on SSML, with examples, see Supported SSML tags.

To synthesize speech from SSML-enhanced text (console)

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. If it isn't already displayed, choose the Text-to-Speech tab.

3. Turn on SSML.

4. Type or paste the following text in the text box:

<speak>

He was caught up in the game.<break time="1s"/> In the middle of the

10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting,

he shouted, "Nice job!" quite loudly. When his boss stared at him, he

repeated

<amazon:effect name="whispered">"Nice job,"</amazon:effect> in a

whisper.

</speak>

The SSML tags tell Amazon Polly how to render the text:

• <break time="1s"/> tells Amazon Polly to pause 1 second between the first two sentences.

• <sub alias="World Wide Web Consortium">W3C</sub> tells Amazon Polly to substitute World Wide Web Consortium for the acronym W3C.

• <amazon:effect name="whispered">Nice job</amazon:effect> tells Amazon Polly to whisper the second instance of "Nice job."

Note

When you use the AWS CLI, you enclose the input text in quotation marks to

diﬀerentiate it from the surrounding code. The Amazon Polly console doesn't show

you code, so you don't enclose input text in quotation marks when you use it.

5. For Language, choose English, US, then choose a voice.

6. To listen to the speech, choose Listen.

7. To save the speech ﬁle, choose Download. If you want to save it in a diﬀerent format, expand

Additional settings, turn on Speech ﬁle format settings and choose the format that you

want, then choose Download.


Using SSML on the AWS CLI

You can use the AWS CLI to synthesize SSML input text. The following examples show how to

perform common tasks using the AWS CLI.

Topics

• Using SSML with the Synthesize-Speech command

• Synthesizing an SSML-enhanced document

• Using SSML for common Amazon Polly tasks

Using SSML with the Synthesize-Speech command

This example shows how to use the synthesize-speech command with an SSML string. When

you use the synthesize-speech command, you typically provide the following:

• The input text (required)

• Opening and closing tags (required)

• The output format

• A voice

In this example, you specify a simple text string in quotation marks along with the required

opening and closing <speak></speak> tags.

Important

Although you don't use quotation marks around input text in the Amazon Polly console, you must use them when you use the AWS CLI. It's also important that you differentiate between the quotation marks around input text and the quotation marks required for individual tags.

For example, you can use standard quotation marks (") to enclose the input text, and single quotation marks (') for interior tags, or vice versa. Either option works for Unix, Linux, and macOS. However, with Windows you must enclose the input text in standard quotation marks and use single quotation marks for the tags.

For all operating systems, you can use standard quotation marks (") to enclose the input text, and single quotation marks (') for interior tags. For example:

--text "<speak>Hello <break time='300ms'/> World</speak>"


For Unix, Linux, and macOS, you can also use the reverse, with single quotation marks (')

enclosing the input text and standard quotation marks (") for interior tags:

--text '<speak>Hello <break time="300ms"/> World</speak>'

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full

quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \

--text-type ssml \

--text '<speak>Hello world</speak>' \

--output-format mp3 \

--voice-id Joanna \

speech.mp3

To hear the synthesized speech, play the resulting speech.mp3 ﬁle using any audio player.
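If you start the AWS CLI from a script, you can avoid most quoting problems by passing the arguments as a list, so that no shell parses the SSML. The following Python sketch builds the argument list for the command above and uses the standard library's `shlex` module only to display how the command would be quoted for a POSIX shell (Unix, Linux, or macOS). It doesn't run the command, and the variable names are illustrative.

```python
import shlex

ssml = '<speak>Hello <break time="300ms"/> World</speak>'

# One list element per argument; the SSML needs no extra quoting here.
argv = [
    "aws", "polly", "synthesize-speech",
    "--text-type", "ssml",
    "--text", ssml,
    "--output-format", "mp3",
    "--voice-id", "Joanna",
    "speech.mp3",
]

# For display only: how a POSIX shell would need the command to be quoted.
print(shlex.join(argv))

# To run the command without a shell, the list could be passed to
# subprocess.run(argv, check=True), assuming that the AWS CLI is installed
# and configured.
```

The quoting that `shlex` produces follows POSIX shell rules. It doesn't apply to the Windows command prompt.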

Synthesizing an SSML-enhanced document

For longer input text, you may find it easier to save your SSML content to a file and simply specify the file name in the synthesize-speech command. For example, you could save the following to a file called example.xml:

<?xml version="1.0"?>

<speak version="1.1"

xmlns="http://www.w3.org/2001/10/synthesis"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"

xml:lang="en-US">Hello World</speak>

The xml:lang attribute speciﬁes en-US (US English) as the language of the input text. For

information about how the language of the input text and the language of the chosen voice aﬀect

the SynthesizeSpeech operation, see Improving the pronunciation of foreign words.

To run an SSML-enhanced file

1. Save the SSML to a file (for example, example.xml).

2. Run the following synthesize-speech command from the path where the XML file is stored, and specify the SSML file as input by substituting file://example.xml for the input text. Because this command points to a file instead of containing the actual input text, you don't use quotation marks.

Note

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows,

replace the backslash (\) Unix continuation character at the end of each line with a

caret (^).

aws polly synthesize-speech \

--text-type ssml \

--text file://example.xml \

--output-format mp3 \

--voice-id Joanna \

speech.mp3

To hear the synthesized speech, play the resulting speech.mp3 ﬁle using any audio player.
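Before you pass an SSML file to the synthesize-speech command, you can check locally that it is well-formed XML. The following Python sketch parses the example.xml content with the standard library's `xml.etree.ElementTree` module and reads back the language attribute. It checks well-formedness only. It doesn't verify that Amazon Polly supports the tags that the document uses.

```python
import xml.etree.ElementTree as ET

# The content of example.xml from this section, with the schema location on one line.
EXAMPLE_XML = """<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">Hello World</speak>"""

# ElementTree expands prefixes into {namespace}name form.
SSML_NS = "{http://www.w3.org/2001/10/synthesis}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

root = ET.fromstring(EXAMPLE_XML)  # raises ParseError if the XML is malformed
print(root.tag == SSML_NS + "speak")
print(root.get(XML_NS + "lang"), root.text)
```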

Using SSML for common Amazon Polly tasks

The following examples show how to use SSML tags to complete common Amazon Polly tasks. For

more SSML tags, see Supported SSML tags.

To test the following examples, use the following synthesize-speech command with the

appropriate SSML-enhanced text:

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full

quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \

--text-type ssml \

--text '<speak>Hello <break time="300ms"/> World</speak>' \


--output-format mp3 \

--voice-id Joanna \

speech.mp3

Adding a pause

To add a pause between words, use the <break> element. The following SSML input for the synthesize-speech command uses the <break> element to add a 300-millisecond delay between the words "Hello" and "World."

<speak>

Hello <break time="300ms"/> World.

</speak>

Controlling volume, pitch, and speed

To control pitch, speaking rate, and speech volume, use the <prosody> element.

•

The following synthesize-speech command uses the <prosody> element to control volume:

<speak>

<prosody volume="+20dB">Hello world</prosody>

</speak>

•

The following synthesize-speech command uses the <prosody> element to control pitch:

<speak>

<prosody pitch="x-high">Hello world.</prosody>

</speak>

•

The following synthesize-speech command uses the <prosody> element to specify the

speech rate (speaking speed):

<speak>

<prosody rate="x-fast">Hello world.</prosody>

</speak>

•

You can specify multiple attributes in a <prosody> element, as shown in the following example:

<speak>

<prosody volume="x-loud" pitch="x-high" rate="x-fast">Hello world.</prosody>

</speak>
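When you generate SSML in code, a small helper can keep the <prosody> attributes and the escaping consistent. The following Python sketch is a minimal illustration: `prosody` and `speak` are hypothetical helpers, and only the three attributes discussed in this section are accepted.

```python
from xml.sax.saxutils import escape, quoteattr

PROSODY_ATTRIBUTES = ("volume", "pitch", "rate")


def prosody(text, **attrs):
    """Wrap escaped text in a <prosody> element with the given attributes."""
    unknown = sorted(set(attrs) - set(PROSODY_ATTRIBUTES))
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {unknown}")
    # Emit the attributes in a fixed order so that the output is predictable.
    rendered = "".join(
        f" {name}={quoteattr(attrs[name])}"
        for name in PROSODY_ATTRIBUTES
        if name in attrs
    )
    return f"<prosody{rendered}>{escape(text, {chr(39): '&apos;'})}</prosody>"


def speak(body):
    """Wrap an SSML fragment in the required <speak> tags."""
    return f"<speak>{body}</speak>"


ssml = speak(prosody("Hello world.", volume="x-loud", pitch="x-high", rate="x-fast"))
print(ssml)
```

The resulting string can be used as the --text value with --text-type ssml.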

Whispering

To whisper words, use the <amazon:effect name="whispered"> element. In the following

example, the <amazon:effect name="whispered"> element tells Amazon Polly to whisper

"little lamb":

<speak>

Mary has a <amazon:effect name="whispered">little lamb.</amazon:effect>

</speak>

To enhance this eﬀect, use the <prosody> element to slightly slow down the whispered speech.

Emphasizing words

To stress a word or phrase, use the <emphasis> element.

<speak>

<emphasis level="strong">Hello</emphasis> world how are you?

</speak>

Specifying how to say certain words

To provide information about the type of text to be spoken, use the <say-as> element.

For instance, in the following SSML, <say-as> indicates that the text 4/6 should be interpreted as
a date. The attributes interpret-as="date" format="dm" indicate that it should be spoken as
a date with the format day/month.

You can also use the <say-as> element to tell Amazon Polly to say numbers as fractions, telephone

numbers, measurement units, and more.

<speak>

Today is <say-as interpret-as="date" format="dm">4/6</say-as>

</speak>

The resulting speech is "Today is June 4th." The <say-as> tag describes how the text should be

interpreted by providing additional context with the interpret-as attribute.

To verify the accuracy of the synthesized speech, play the resulting speech.mp3 ﬁle.

For more information on this element, see Controlling how special types of words are spoken .

Improving the pronunciation of foreign words

Amazon Polly assumes that the input text is in the same language as the language spoken by
the voice you choose. To improve the pronunciation of foreign words within the input text, specify
the target language with the xml:lang attribute in the synthesize-speech call. This tells
Amazon Polly to apply different pronunciation rules for the foreign words that you tag.

The following examples show how to use diﬀerent combinations of languages in the input text,

and how to specify voices and the pronunciation of foreign words. For a complete list of available

languages, see Languages in Amazon Polly.

In the following example, the voice (Joanna) is a US English voice. By default, Amazon Polly

assumes that the input text is in the same language as the voice (in this case, US English). When

you use the xml:lang tag, Amazon Polly interprets the text as Spanish and the text is spoken as

the selected voice would pronounce Spanish words, according to the pronunciation rules of the

foreign language. Without this tag, the text is spoken using the pronunciation rules of the selected

voice.

<speak>

That restaurant is terrific. <lang xml:lang="es-ES">Mucho gusto.</lang>

</speak>

Because the language of the input text is English, Amazon Polly maps the Spanish phonemes
to the closest English phonemes. As a result, Joanna speaks the text as a native US speaker who
pronounces the words correctly in Spanish, but with a US English accent.

Note

Some languages are more similar than others, and so some language combinations work

better than others.

Supported SSML tags

Amazon Polly supports the following SSML tags:

| Action | SSML tag | Availability with neural voices | Availability with long-form voices | Availability with generative voices |
|---|---|---|---|---|
| Adding a pause | <break> | Full availability | Full availability | Full availability |
| Emphasizing words | <emphasis> | Not available | Not available | Not available |
| Specifying another language for specific words | <lang> | Full availability | Full availability | Full availability |
| Placing a custom tag in your text | <mark> | Full availability | Full availability | Full availability |
| Adding a pause between paragraphs | <p> | Full availability | Full availability | Full availability |
| Using phonetic pronunciation | <phoneme> | Full availability | Full availability | Not available |
| Controlling volume, speaking rate, and pitch | <prosody> | Partial availability | Partial availability | Not available |
| Setting a maximum duration for synthesized speech | <prosody amazon:max-duration> | Not available | Not available | Not available |
| Adding a pause between sentences | <s> | Full availability | Full availability | Full availability |
| Controlling how special types of words are spoken | <say-as> | Partial availability | Partial availability | Partial availability |
| Identifying SSML-enhanced text | <speak> | Full availability | Full availability | Full availability |
| Pronouncing acronyms and abbreviations | <sub> | Full availability | Full availability | Full availability |
| Improving pronunciation by specifying parts of speech | <w> | Full availability | Full availability | Full availability |
| Adding the sound of breathing | <amazon:auto-breaths> | Not available | Not available | Not available |
| Newscaster speaking style | <amazon:domain name="news"> | Select neural voices only | Not available | Not available |
| Adding dynamic range compression | <amazon:effect name="drc"> | Full availability | Full availability | Not available |
| Speaking softly | <amazon:effect phonation="soft"> | Not available | Not available | Not available |
| Controlling timbre | <amazon:effect vocal-tract-length> | Not available | Not available | Not available |
| Whispering | <amazon:effect name="whispered"> | Not available | Not available | Not available |

Note

If you use unsupported SSML tags in standard, neural, or long-form format, you will get an

error.

Identifying SSML-enhanced text

<speak>

This tag is supported by generative, long-form, neural, and standard TTS formats.

The <speak> tag is the root element of all Amazon Polly SSML text. All SSML-enhanced text must

be enclosed within a pair of <speak> tags.

<speak>Mary had a little lamb.</speak>

Adding a pause

<break>

This tag is supported by generative, long-form, neural, and standard TTS formats.

To add a pause to your text, use the <break> tag. You can set a pause based on strength
(equivalent to the pause after a comma, a sentence, or a paragraph), or you can set it to a specific
length of time in seconds or milliseconds. If you don't specify an attribute, Amazon Polly uses the
default, <break strength="medium"/>, which adds a pause as long as the pause after a comma.

strength attribute values:

•

none: No pause. Use none to remove a normally occurring pause, such as after a period.

•

x-weak: Has the same strength as none, no pause.

•

weak: Sets a pause of the same duration as the pause after a comma.

•

medium: Has the same strength as weak.

•

strong: Sets a pause of the same duration as the pause after a sentence.

•

x-strong: Sets a pause of the same duration as the pause after a paragraph.

time attribute values:

•

[number]s: The duration of the pause, in seconds. The maximum duration is 10s.

•

[number]ms: The duration of the pause, in milliseconds. The maximum duration is 10000ms.

For example:

<speak>

Mary had a little lamb <break time="3s"/>Whose fleece was white as snow.

</speak>

If you don't use an attribute with the break tag, the result varies depending on the surrounding text:

•

If there is no other punctuation next to the break tag, it creates a <break

strength="medium"/> (comma-length pause).

•

If the tag is next to a comma, it upgrades the tag to a <break strength="strong"/>

(sentence-length pause).

•

If the tag is next to a period, it upgrades the tag to <break strength="x-strong"/>

(paragraph-length pause).
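The attribute rules above can be sketched as a small builder. This is only an illustration: `break_tag` is a hypothetical helper, it accepts whole-number durations only, and it treats strength and time as mutually exclusive for simplicity. The allowed strength keywords and the 10-second ceiling are taken from the lists above.

```python
import re

STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_PAUSE_MS = 10_000  # the guide gives 10s / 10000ms as the maximum


def break_tag(strength=None, time=None):
    """Build a <break> tag from a strength keyword or a time such as '300ms' or '3s'."""
    if strength is not None and time is not None:
        raise ValueError("this sketch accepts strength or time, not both")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        return f'<break strength="{strength}"/>'
    if time is not None:
        match = re.fullmatch(r"(\d+)(ms|s)", time)
        if match is None:
            raise ValueError(f"time must look like '300ms' or '3s': {time!r}")
        millis = int(match.group(1)) * (1 if match.group(2) == "ms" else 1000)
        if millis > MAX_PAUSE_MS:
            raise ValueError("pause exceeds the 10-second maximum")
        return f'<break time="{time}"/>'
    # No attribute: the resulting pause depends on the neighboring punctuation.
    return "<break/>"


print(break_tag(time="3s"))
print(break_tag(strength="strong"))
```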

Emphasizing words

This tag is supported only by the standard TTS format.

To emphasize words, use the <emphasis> tag. Emphasizing words changes the speaking rate and

volume. More emphasis makes Amazon Polly speak the text louder and slower. Less emphasis

makes it speak quieter and faster. To specify the degree of emphasis, use the level attribute.

level attribute values:

• strong: Increases the volume and slows the speaking rate so that the speech is louder and slower.

• moderate: Increases the volume and slows the speaking rate, but less than strong. moderate is the default.

• reduced: Decreases the volume and speeds up the speaking rate. Speech is softer and faster.

Note

The normal speaking rate and volume for a voice fall between the moderate and
reduced levels.

For example:

<speak>

I already told you I <emphasis level="strong">really like</emphasis> that person.

</speak>

Specifying another language for speciﬁc words

<lang>

This tag is supported by generative, long-form, neural, and standard TTS formats.

Specify another language for a speciﬁc word, phrase, or sentence with the <lang> tag. Foreign

language words and phrases are generally spoken better when they are enclosed within a pair of

<lang> tags. To specify the language, use the xml:lang attribute. For a complete list of available

languages, see Languages in Amazon Polly.

Unless you apply the <lang> tag, all of the words in the input text are spoken in the language of

the voice speciﬁed in the voice-id. If you apply the <lang> tag, the words are spoken in that

language.

For example, if the voice-id is Joanna (who speaks US English), Amazon Polly speaks the

following in the Joanna voice without a French accent:

<speak>

Je ne parle pas français.

</speak>

If you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the Joanna

voice in American-accented French:

<speak>

<lang xml:lang="fr-FR">Je ne parle pas français.</lang>

</speak>

Because Joanna is not a native French voice, pronunciation is based on her native language, US
English. For example, although perfect French pronunciation features a uvular trill /R/ in the word
français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.

If you use the voice-id of Giorgio, who speaks Italian, with the following text, Amazon Polly

speaks the sentence in Giorgio's voice with an Italian pronunciation:

<speak>

Mi piace Bruce Springsteen.

</speak>

If you use the same voice with the following <lang> tag, Amazon Polly pronounces Bruce

Springsteen in Italian-accented English:

<speak>

Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang>

</speak>

This tag can also be used as a substitute for the optional DefaultLangCode option when

synthesizing speech. However, doing so requires that you format your text using SSML.

Placing a custom tag in your text

<mark>

This tag is supported by generative, long-form, neural, and standard TTS formats.

To put a custom tag within the text, use the <mark> tag. Amazon Polly takes no action on the tag,

but returns the location of the tag in the SSML metadata. This tag can be anything you want to call

out, as long as it maintains the following format:

<mark name="tag_name"/>

For example, suppose that the tag name is "animal" and the input text is:

<speak>

Mary had a little <mark name="animal"/>lamb.

</speak>

Amazon Polly might return the following SSML metadata:

{"time":767,"type":"ssml","start":25,"end":46,"value":"animal"}
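To show how metadata like the line above might be consumed, here is a minimal sketch that reads speech-mark output and collects the SSML marks. It assumes the metadata arrives as one JSON object per line and that `time` is an offset in milliseconds into the audio; `parse_ssml_marks` is a hypothetical helper name, not part of any SDK.

```python
import json


def parse_ssml_marks(speech_marks_text):
    """Return (name, time_ms) pairs for each SSML mark in line-delimited JSON."""
    marks = []
    for line in speech_marks_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Other mark types (for example word or sentence marks) are skipped.
        if record.get("type") == "ssml":
            marks.append((record["value"], record["time"]))
    return marks


sample = '{"time":767,"type":"ssml","start":25,"end":46,"value":"animal"}'
marks = parse_ssml_marks(sample)
print(marks)  # [('animal', 767)]
```

A caller could use the returned offsets to trigger an action, such as showing a picture, at the moment the tagged word is spoken.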

Adding a pause between paragraphs

<p>

This tag is supported by generative, long-form, neural, and standard TTS formats.

To add a pause between paragraphs in your text, use the <p> tag. Using this tag provides a longer

pause than native speakers usually place at commas or the end of a sentence. Use the <p> tag to

enclose the paragraph:

<speak>

<p>This is the first paragraph. There should be a pause after this text is

spoken.</p>

<p>This is the second paragraph.</p>

</speak>

This is equivalent to specifying a pause using <break strength="x-strong"/>.

Using phonetic pronunciation

This tag is supported by long-form, neural, and standard TTS formats.

To make Amazon Polly use phonetic pronunciation for speciﬁc text, use the <phoneme> tag.

Two attributes are required with the <phoneme> tag. They indicate the phonetic alphabet Amazon

Polly uses and the phonetic symbols of the corrected pronunciation:

• alphabet: Specifies the phonetic alphabet.

  • ipa: Indicates that the International Phonetic Alphabet (IPA) will be used.

  • x-sampa: Indicates that the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) will be used.

• ph: Specifies the phonetic symbols for pronunciation. For more information, see Phoneme and
Viseme Tables for Supported Languages.

With the <phoneme> tag, Amazon Polly uses the pronunciation speciﬁed by the ph attribute

instead of the standard pronunciation associated by default with the language used by the selected

voice.

For instance, the word "pecan" can be pronounced two ways. In the following example, “pecan” is

assigned a diﬀerent pronunciation in each line. Amazon Polly pronounces pecan as speciﬁed in the

ph attributes, instead of using the default pronunciation.

International Phonetic Alphabet (IPA)

<speak>

You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.

I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.

</speak>

Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA)

<speak>

You say, <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>.

I say, <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>.

</speak>
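When <phoneme> tags are generated programmatically, the quoting of the ph value needs some care, because X-SAMPA uses the double-quote character as a stress marker. The sketch below, with a hypothetical `phoneme` helper, relies on Python's standard `xml.sax.saxutils.quoteattr`, which switches to single quotes when the value contains a double quote.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text, ph, alphabet="ipa"):
    """Wrap text in a <phoneme> tag, quoting the attribute values safely."""
    # quoteattr picks single or double quotes as needed, which matters for
    # X-SAMPA strings that contain a double-quote stress marker.
    return (f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
            f"{escape(text)}</phoneme>")


first = phoneme("pecan", 'pI"kA:n', alphabet="x-sampa")
second = phoneme("pecan", '"pi.k{n', alphabet="x-sampa")
print(f"<speak>You say, {first}. I say, {second}.</speak>")
```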

Mandarin Chinese uses Pinyin for phonetic pronunciation.

Pinyin

<speak>

## <phoneme alphabet="x-amazon-pinyin" ph="bo2">#</phoneme>#

## <phoneme alphabet="x-amazon-pinyin" ph="bao2">#</phoneme>#

</speak>

Japanese uses Yomigana and Pronunciation Kana.

Yomigana

<speak>

###<phoneme alphabet="x-amazon-yomigana" ph="####">##</phoneme>###

###<phoneme alphabet="x-amazon-yomigana" ph="Hirokazu">##</phoneme>###

</speak>

Pronunciation Kana

<speak>

###<phoneme alphabet="x-amazon-pron-kana" ph="##'##">##</phoneme>###

</speak>

Controlling volume, speaking rate, and pitch

Prosody tag attributes are fully supported by the standard TTS voices. Neural and long-form voices

support the volume and rate attributes, but don't support the pitch attribute.

To control the volume, rate, or pitch of your selected voice, use the prosody tag.

Volume, speech rate, and pitch are dependent on the speciﬁc voice selected. In addition to

diﬀerences between voices for diﬀerent languages, there are diﬀerences between individual voices

speaking the same language. Because of this, while attributes are similar across all languages,

there are clear variations from language to language and no absolute value is available.

The prosody tag has three attributes, each of which has several available values. Each attribute
uses the same syntax:

<prosody attribute="value">[text]</prosody>

•

volume

•

default: Resets volume to the default level for the current voice.

•

silent, x-soft, soft, medium, loud, x-loud: Sets the volume to a predeﬁned value for the

current voice.

•

+ndB, -ndB: Changes volume relative to the current level. A value of +0dB means no change,

+6dB means approximately twice the current volume, and -6dB means approximately half the

current volume.

For example, you could set the volume for a passage as follows:

<speak>

Sometimes it can be useful to <prosody volume="loud">increase the volume

for a specific speech.</prosody>

</speak>

Or you could set it this way:

<speak>

And sometimes a lower volume <prosody volume="-6dB">is a more effective way of

interacting with your audience.</prosody>

</speak>

•

rate

•

x-slow, slow, medium, fast, x-fast: Sets the speaking rate to a predefined value for the selected voice.

•

n%: A non-negative percentage change in the speaking rate. For example, a value of 100%

means no change in speaking rate, a value of 200% means a speaking rate twice the default

rate, and a value of 50% means a speaking rate of half the default rate. This value has a range

of 20-200%.

For example, you could set the speech rate for a passage as follows:

<speak>

For dramatic purposes, you might wish to <prosody rate="slow">slow down the

speaking rate of your text.</prosody>

</speak>

Or you could set it this way:

<speak>

Although in some cases, it might help your audience to <prosody rate="85%">slow

the speaking rate slightly to aid in comprehension.</prosody>

</speak>

•

pitch

•

default: Resets pitch to the default level for the current voice.

•

x-low, low, medium, high, x-high: Sets the pitch to a predeﬁned value for the current voice.

•

+n% or -n%: Adjusts pitch by a relative percentage. For example, a value of +0% means no

baseline pitch change, +5% gives a little higher baseline pitch, and -5% results in a little lower

baseline pitch.

For example, you could set the pitch for a passage as follows:

<speak>

Do you like synthesized speech <prosody pitch="high">with a pitch that is higher

than normal?</prosody>

</speak>

Or you could set it this way:

<speak>

Or do you prefer your speech <prosody pitch="-10%">with a somewhat lower pitch?

</prosody>

</speak>

The <prosody> tag must contain at least one attribute, but can include more within the same tag.

<speak>

Each morning when I wake up, <prosody volume="loud" rate="x-slow">I speak

quite slowly and deliberately until I have my coffee.</prosody>

</speak>

It can also be combined with nested tags, as follows:

<speak>

<prosody rate="85%">Sometimes combining attributes <prosody pitch="-10%">can

change the impression your audience has of a voice</prosody> as well.</prosody>

</speak>
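The attribute rules in this section can be collected into a small builder. This is a sketch under stated assumptions: `prosody` is a hypothetical helper, it validates only the rate attribute (keywords, or a percentage within the 20-200% range given above), and it escapes its text, so it does not handle nested tags.

```python
from xml.sax.saxutils import escape

RATE_KEYWORDS = {"x-slow", "slow", "medium", "fast", "x-fast"}


def prosody(text, volume=None, rate=None, pitch=None):
    """Wrap text in a <prosody> tag with whichever attributes are given."""
    if isinstance(rate, str) and rate.endswith("%"):
        percent = float(rate[:-1])
        if not 20 <= percent <= 200:
            raise ValueError("rate percentage must be within 20-200%")
    elif rate is not None and rate not in RATE_KEYWORDS:
        raise ValueError(f"unknown rate: {rate!r}")
    attrs = {"volume": volume, "rate": rate, "pitch": pitch}
    given = [f'{name}="{value}"' for name, value in attrs.items() if value is not None]
    if not given:
        # The tag must contain at least one attribute.
        raise ValueError("<prosody> needs at least one attribute")
    return f"<prosody {' '.join(given)}>{escape(text)}</prosody>"


print(prosody("I speak quite slowly.", volume="loud", rate="x-slow"))
```

Because neural and long-form voices don't support the pitch attribute, a caller targeting those voices would leave pitch unset.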

Setting a maximum duration for synthesized speech

This tag is currently supported only by the standard TTS format.

To control how long you want a speech to take when it is synthesized, use the <prosody> tag with

the amazon:max-duration attribute.

The duration of synthesized speech varies slightly, depending on the voice you select. This can

make it diﬃcult to match synthesized speech with visuals or other activities that require precise

timing. This issue is magniﬁed for translation applications because the time it takes to say

particular phrases can vary widely with diﬀerent languages.

The <prosody amazon:max-duration> tag matches synthesized speech to the amount of time

you want it to take (the duration).

This tag uses the following syntax:

<prosody amazon:max-duration="[duration]">[text]</prosody>

With the <prosody amazon:max-duration> tag, you can specify duration in either seconds or

milliseconds:

•

ns: the maximum duration in seconds

•

nms: the maximum duration in milliseconds

For example, the following spoken text has a maximum duration of 2 seconds:

<speak>
  <prosody amazon:max-duration="2s">
    Human speech is a powerful way to communicate.
  </prosody>
</speak>

When text is placed within the tag, it doesn't exceed the specified duration. If the chosen voice or
language would normally take longer than that duration, Amazon Polly speeds up the speech so that
it fits into the specified duration.

If the speciﬁed duration is longer than it takes to read the text at a normal rate, Amazon Polly

reads the speech normally. It doesn't slow down the speech or add silence, so the resulting audio is

shorter than requested.

Note

Amazon Polly increases the speed no more than 5 times the normal rate. If text is spoken

faster than this, it usually doesn't make sense. If a speech cannot ﬁt within your speciﬁed

duration even when sped up to the maximum, the audio will be sped up but will last

longer than the speciﬁed duration.
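The three cases described above (the text already fits, it is sped up to fit, or it hits the speed limit) can be worked through with a little arithmetic. The function below is a simplified model of the stated rules, not Amazon Polly's actual algorithm; `fitted_duration` and its parameters are hypothetical names.

```python
MAX_SPEEDUP = 5.0  # the note above gives 5 times the normal rate as the cap


def fitted_duration(natural_seconds, max_seconds):
    """Return (speedup_factor, resulting_seconds) under the rules described above."""
    if natural_seconds <= max_seconds:
        # Speech is never slowed down or padded with silence.
        return 1.0, natural_seconds
    speedup = min(natural_seconds / max_seconds, MAX_SPEEDUP)
    return speedup, natural_seconds / speedup


print(fitted_duration(1.5, 2.0))   # (1.0, 1.5): already fits, left unchanged
print(fitted_duration(6.0, 2.0))   # (3.0, 2.0): sped up 3x to fit
print(fitted_duration(20.0, 2.0))  # (5.0, 4.0): capped at 5x, still too long
```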

You can include a single sentence or multiple sentences within a <prosody amazon:max-

duration> tag, and you can use multiple <prosody amazon:max-duration> tags within your

text.

For example:

<speak>
  <prosody amazon:max-duration="2400ms">
    Human speech is a powerful way to communicate.
  </prosody>
  <prosody amazon:max-duration="5100ms">
    Even a simple ‘Hello’ can convey a lot of information depending on the pitch,
    intonation, and tempo.
  </prosody>
  <prosody amazon:max-duration="8900ms">
    We naturally understand this information, which is why speech is ideal for
    creating applications where
    a screen isn’t practical or possible, or simply isn’t convenient.
  </prosody>
</speak>

Using the <prosody amazon:max-duration> tag can increase latency when Amazon Polly
returns synthesized speech. The degree of latency depends on the passage and its length. We
recommend using relatively short text passages.

Limitations

There are limitations both in how you use the <prosody amazon:max-duration> tag and in how it
works with other SSML tags:

•

The text inside a <prosody amazon:max-duration> tag can't be longer than 1500 characters.

•

You can't nest <prosody amazon:max-duration> tags. If you put one <prosody

amazon:max-duration> tag inside another, Amazon Polly ignores the inner tag.

For example, in the following, the <prosody amazon:max-duration="5s"> tag is ignored:

<speak>
  <prosody amazon:max-duration="16s">
    Human speech is a powerful way to communicate.
    <prosody amazon:max-duration="5s">
      Even a simple ‘Hello’ can convey a lot of information depending on the
      pitch, intonation, and tempo.
    </prosody>
    We naturally understand this information, which is why speech is ideal for
    creating applications where a screen isn’t practical or possible, or simply isn’t
    convenient.
  </prosody>
</speak>

•

You can't use the <prosody> tags with the rate attribute within a <prosody amazon:max-

duration> tag. This is because both aﬀect the speed at which text is spoken.

In the following example, Amazon Polly ignores the <prosody rate="2"> tag:

<speak>
  <prosody amazon:max-duration="7500ms">
    Human speech is a powerful way to communicate.
    <prosody rate="2">
      Even a simple ‘Hello’ can convey a lot of information depending on the
      pitch, intonation, and tempo.
    </prosody>
  </prosody>
</speak>

Pauses and max-duration

When using the max-duration tag, you can still insert pauses within your text. However, Amazon
Polly includes the length of the pause when calculating the maximum duration for speech.
Additionally, Amazon Polly preserves the short pauses that occur where commas and periods are
placed within a passage and includes them in the maximum duration.

For example, in the following block, the 600 millisecond break and the breaks caused by the

commas and periods occur within the 8-second speech:

<speak>
  <prosody amazon:max-duration="8s">
    Human speech is a powerful way to communicate.
    <break time="600ms"/>
    Even a simple ‘Hello’ can convey a lot of information depending on the pitch,
    intonation, and tempo.
  </prosody>
</speak>

Adding a pause between sentences

<s>

This tag is supported by generative, long-form, neural, and standard TTS formats.

To add a pause between lines or sentences in your text, use the <s> tag. Using this tag has the

same eﬀect as:

• Ending a sentence with a period (.)

•

Specifying a pause with <break strength="strong"/>

Unlike the <break> tag, the <s> tag encloses the sentence. This is useful for synthesizing speech
that is organized in lines, rather than sentences, such as poetry.

In the following example, the <s> tag creates a short pause after both the ﬁrst and second

sentences. The ﬁnal sentence has no <s> tag, but it is also followed by a short pause because it

ends with a period.

<speak>

<s>Mary had a little lamb</s>

<s>Whose fleece was white as snow</s>

And everywhere that Mary went, the lamb was sure to go.

</speak>
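For text that is organized in lines, such as the poem above, the <s> tags can be added mechanically. The following is a minimal sketch with a hypothetical helper, `lines_to_ssml`, that wraps each non-empty line in its own <s> element.

```python
from xml.sax.saxutils import escape


def lines_to_ssml(lines):
    """Wrap each non-empty line in <s> tags inside a single <speak> element."""
    sentences = [f"<s>{escape(line.strip())}</s>" for line in lines if line.strip()]
    return "<speak>" + "".join(sentences) + "</speak>"


poem = ["Mary had a little lamb", "Whose fleece was white as snow"]
ssml = lines_to_ssml(poem)
print(ssml)
```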

Controlling how special types of words are spoken

<say-as>

Except for the characters option, the <say-as> tag is supported by generative, long-form,

neural, and standard TTS formats. Note that if Amazon Polly is using a neural voice and encounters

the <say-as> tag with the characters option at runtime, the aﬀected sentence will be

synthesized using the related standard voice. However, the aﬀected sentence will still be billed as if

it uses a neural voice.

Use the <say-as> tag with the interpret-as attribute to tell Amazon Polly how to say certain

characters, words, and numbers. This enables you to provide additional context to eliminate any

ambiguity on how Amazon Polly should render the text.

The <say-as> tag uses the interpret-as attribute, which accepts the values listed below. Each
uses the same syntax:

<say-as interpret-as="value">[text to be interpreted]</say-as>

The following values are available with interpret-as:

•

characters or spell-out: Spells out each letter of the text, as in a-b-c.

Note

This option is not currently supported for neural voices. If you're using a neural voice and

this SSML code is encountered by Amazon Polly at run-time, the aﬀected sentence will

be synthesized using the related standard voice. Please note, however, that this sentence

will still be billed as if it uses a neural voice.

•

cardinal or number: Interprets the numerical text as a cardinal number, as in 1,234.

•

ordinal: Interprets the numerical text as an ordinal number, as in 1,234th.

•

digits: Spells out each digit individually, as in 1-2-3-4.

•

fraction: Interprets the numerical text as a fraction. This works for both common fractions

such as 3/20, and mixed fractions, such as 2 ½. See below for more information.

•

unit: Interprets a numerical text as a measurement. The value should be either a number or

a fraction followed by a unit with no space in between as in 1/2inch, or by just a unit, as in

1meter.

•

date: Interprets the text as a date. The format of the date must be speciﬁed with the format

attribute. See below for more information.

•

time: Interprets the numerical text as duration, in minutes and seconds, as in 1'21".

•

address: Interprets the text as part of a street address.

•

expletive: "Beeps out" the content included within the tag.

•

telephone: Interprets the numerical text as a 7-digit or 10-digit telephone number,
as in 2025551212. You can also use this value to handle telephone extensions, as in
2025551212x345. See below for more information.

Note

Currently the telephone option is not available for all languages. However, it is

available for voices speaking English language variants (en-AU, en-GB, en-IN, en-US,

and en-GB-WLS), Spanish language variants (es-ES, es-MX, and es-US), French language

variants (fr-FR and fr-CA), and Portuguese variants (pt-BR and pt-PT), as well as German

(de-DE), Italian (it-IT), Japanese (ja-JP), and Russian (ru-RU). It should also be noted that

in some cases, languages such as Arabic (arb) automatically handle the number set as a

telephone number and so don't actually implement the telephone SSML tag.

Fractions

Amazon Polly interprets values within the say-as tag that have the interpret-as="fraction"

attribute as common fractions. The following is the syntax for fractions:

• Fraction

Syntax: cardinal number/cardinal number, such as 2/9.

For example: <say-as interpret-as="fraction">2/9</say-as> is pronounced "two

ninths."

• Non-negative Mixed Number

Syntax: cardinal number+cardinal number/cardinal number, such as 3+1/2.

For example, <say-as interpret-as="fraction">3+1/2</say-as> is pronounced "three

and a half."

Note

There must be a + between the "3" and the "1/2". Amazon Polly doesn't support a mixed

number without the +, such as "3 1/2".
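Because the + separator is easy to forget, it can help to generate fraction markup from numbers rather than typing it by hand. The sketch below uses a hypothetical helper, `say_as_fraction`, that follows the two syntaxes described above.

```python
def say_as_fraction(numerator, denominator, whole=None):
    """Build a <say-as interpret-as="fraction"> tag for a common or mixed fraction."""
    if denominator <= 0 or numerator < 0 or (whole is not None and whole < 0):
        raise ValueError("expected non-negative numbers and a positive denominator")
    value = f"{numerator}/{denominator}"
    if whole is not None:
        value = f"{whole}+{value}"  # the + separator is required for mixed numbers
    return f'<say-as interpret-as="fraction">{value}</say-as>'


print(say_as_fraction(2, 9))
print(say_as_fraction(1, 2, whole=3))
```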

Dates

When interpret-as is set to date, you also need to indicate the format of the date.

This uses the following syntax:

<say-as interpret-as="date" format="format">[date]</say-as>

For example:

<speak>

I was born on <say-as interpret-as="date" format="mdy">12-31-1900</say-as>.

</speak>

The following values can be used with the format attribute:

•

mdy: Month-day-year.

•

dmy: Day-month-year.

•

ymd: Year-month-day.

•

md: Month-day.

•

dm: Day-month.

•

ym: Year-month.

•

my: Month-year.

•

d: Day.

•

m: Month.

•

y: Year.

•

yyyymmdd: Year-month-day. If you use this format, you can make Amazon Polly skip parts of the

date using question marks.

For example, Amazon Polly renders the following as "September 22nd":

<say-as interpret-as="date">????0922</say-as>

Format is not needed.
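The question-mark form can be produced from whatever date parts are known. This is an illustrative sketch only; `say_as_partial_date` is a hypothetical helper that pads each known part and substitutes question marks for the rest.

```python
def say_as_partial_date(year=None, month=None, day=None):
    """Build a yyyymmdd date for <say-as>, using question marks for missing parts."""
    yyyy = f"{year:04d}" if year is not None else "????"
    mm = f"{month:02d}" if month is not None else "??"
    dd = f"{day:02d}" if day is not None else "??"
    return f'<say-as interpret-as="date">{yyyy}{mm}{dd}</say-as>'


print(say_as_partial_date(month=9, day=22))
```

Calling the helper with only a month and day reproduces the "September 22nd" example shown above.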

Telephone

Amazon Polly attempts to interpret the text you provide correctly based on the text’s formatting

even without the <say-as> tag. For example, if your text includes "202-555-1212," Amazon Polly

interprets it as a 10-digit telephone number and says each digit individually, with a brief pause

for each dash. In this case, you don't need to use <say-as interpret-as="telephone">.

However, if you provide the text “2025551212” and want Amazon Polly to say it as a phone

number, you would specify <say-as interpret-as="telephone">.

The logic for interpreting each element is language-speciﬁc. For example, US and UK English diﬀer

in how phone numbers are pronounced (in UK English, sequences of the same digit are grouped

together, as in "double ﬁve" or "triple four"). To see the diﬀerence, test the following example with

a US voice and with a UK voice:

<speak>

Richard's number is <say-as interpret-as="telephone">2122241555</say-as>

</speak>

Pronouncing acronyms and abbreviations

<sub>

This tag is supported by generative, long-form, neural, and standard TTS formats.

Use the <sub> tag with the alias attribute to substitute a diﬀerent word (or pronunciation) for

selected text such as an acronym or abbreviation.

This uses the syntax:

<sub alias="new word">abbreviation</sub>

In the following example, the name "Mercury" is substituted for the element's chemical symbol to

make the audio content clearer.

<speak>

My favorite chemical element is <sub alias="Mercury">Hg</sub>, because it looks so

shiny.

</speak>
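When the same abbreviations recur throughout a text, the substitutions can be applied from a lookup table. The following sketch assumes a plain-text input and a dictionary of abbreviation-to-alias pairs; `apply_aliases` is a hypothetical helper, and it matches whole words only.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text, aliases):
    """Wrap each whole-word occurrence of an abbreviation in a <sub> tag."""
    if not aliases:
        return escape(text)
    # Longest keys first so that overlapping abbreviations match greedily.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        word = match.group(1)
        parts.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


body = apply_aliases("My favorite chemical element is Hg.", {"Hg": "Mercury"})
print(f"<speak>{body}</speak>")
```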

Improving pronunciation by specifying parts of speech

<w>

This tag is supported by generative, long-form, neural, and standard TTS formats.

You can use the <w> tag to customize the pronunciation of words by specifying the word’s part of

speech or alternate meaning. This is done using the role attribute.

This tag uses the following syntax:

<w role="value">[word]</w>

The following values can be used for the role attribute:

To specify the part of speech:

•

amazon:VB: interprets the word as a verb (present simple).

•

amazon:VBD: interprets the word as past tense verb.

•

amazon:DT: interprets the word as a determiner.

•

amazon:IN: interprets the word as a preposition.

•

amazon:JJ: interprets the word as an adjective.

•

amazon:NN: interprets the word as a noun.

For example, depending on its part of speech, the US English pronunciation of the word "read"

varies based on the tag:

<speak>

The word <say-as interpret-as="characters">read</say-as> may be interpreted
as either the present simple form <w role="amazon:VB">read</w>, or the past
tense form <w role="amazon:VBD">read</w>.

</speak>

To specify a speciﬁc meaning:

•

amazon:DEFAULT: uses the default sense of the word.

•

amazon:SENSE_1: uses the non-default sense of the word when present. For example, the noun

"bass" is pronounced diﬀerently depending on its meaning. The default meaning is the lowest

part of the musical range. The alternate meaning is a species of freshwater ﬁsh, also called "bass"

but pronounced diﬀerently. Using <w role="amazon:SENSE_1">bass</w> renders the non-

default pronunciation (freshwater ﬁsh) for the audio text.

This diﬀerence in pronunciation and meaning can be heard if you synthesize the following:

<speak>

Depending on your meaning, the word <say-as interpret-as="characters">bass</say-as>
may be interpreted as either a musical element: bass, or as its alternative meaning,
a freshwater fish <w role="amazon:SENSE_1">bass</w>.

</speak>

Note

Some languages may have a diﬀerent selection of supported parts of speech.

Adding the sound of breathing

<amazon:breath> and <amazon:auto-breaths>

These tags are supported only by the standard TTS format.

Natural-sounding speech includes both correctly spoken words and breathing sounds. By

adding breathing sounds to synthesized speech, you can make it sound more natural. The

<amazon:breath> and <amazon:auto-breaths> tags provide breaths. You have the following

options:

• Manual mode: you set the location, length, and volume of a breath sound within the text

• Automated mode: Amazon Polly automatically inserts breathing sounds into the speech output

• Mixed mode: both you and Amazon Polly add breathing sounds

Manual Mode

In manual mode, you place the <amazon:breath/> tag in the input text where you want to locate

a breath. You can customize the length and volume of breaths with the duration and volume

attributes, respectively:

•

duration: Controls the length of the breath. Valid values are: default, x-short, short,

medium, long, x-long. The default value is medium.

•

volume: Controls how loud the breath sounds. Valid values are: default, x-soft, soft,

medium, loud, x-loud. The default value is medium.

Note

The exact length and volume that each attribute value produces depend on the specific Amazon

Polly voice used.

To set a breath sound using the defaults, use <amazon:breath/> without attributes.

For example, to set the duration of a breath to medium and its volume to x-loud, you would

set the attributes as follows:

<speak>

Sometimes you want to insert only <amazon:breath duration="medium" volume="x-loud"/>a single breath.

</speak>

To use the defaults, you would just use the tag:

<speak>

Sometimes you need <amazon:breath/>to insert one or more average breaths

<amazon:breath/> so that the

text sounds correct.

</speak>

You can add individual breathing sounds within a passage, as follows:

<speak>

<amazon:breath duration="long" volume="x-loud"/> <prosody rate="120%"> <prosody

volume="loud">

Wow! <amazon:breath duration="long" volume="loud"/> </prosody> That was quite

fast. <amazon:breath

duration="medium" volume="x-loud"/> I almost beat my personal best time on this

track. </prosody>

</speak>
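If you generate manual breaths programmatically, it can help to validate the attribute values first. The following is a minimal sketch in Python, assuming a hypothetical helper named breath. The allowed values are copied from the lists in this section.

```python
# Allowed attribute values, as listed in this section.
DURATIONS = ("default", "x-short", "short", "medium", "long", "x-long")
VOLUMES = ("default", "x-soft", "soft", "medium", "loud", "x-loud")


def breath(duration=None, volume=None):
    """Hypothetical helper: build an <amazon:breath/> tag.

    Attributes that are left as None are omitted, so the defaults apply.
    """
    attrs = ""
    if duration is not None:
        if duration not in DURATIONS:
            raise ValueError(f"invalid duration: {duration!r}")
        attrs += f' duration="{duration}"'
    if volume is not None:
        if volume not in VOLUMES:
            raise ValueError(f"invalid volume: {volume!r}")
        attrs += f' volume="{volume}"'
    return f"<amazon:breath{attrs}/>"


ssml = "<speak>Wow! " + breath("long", "loud") + " That was quite fast.</speak>"
print(ssml)
```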

Automated Mode

In automated mode, you use the <amazon:auto-breaths> tag to tell Amazon Polly to

automatically create breathing noises at appropriate intervals. You can set the frequency of the

intervals, their volume, and their duration. Place the <amazon:auto-breaths> tag at the

beginning of the text that you want to apply automated breathing to and then close the tag at the

end.

Note

Unlike the manual mode tag, <amazon:breath/>, the <amazon:auto-breaths> tag

requires a closing tag (</amazon:auto-breaths>).

You can use the following optional attributes with the <amazon:auto-breaths> tag:

•

volume: Controls how loud the breathing sounds. Valid values are: default, x-soft, soft,

medium, loud, x-loud. The default value is medium.

•

frequency: Controls how often breathing sounds occur in the text. Valid values are: default,

x-low, low, medium, high, x-high. The default value is medium.

•

duration: Controls the length of the breath. Valid values are: default, x-short, short,

medium, long, x-long. The default value is medium.

By default, the frequency of breathing sounds depends on the input text. However, breathing

sounds often occur after commas and periods.

The following examples show how to use the <amazon:auto-breaths> tag. To decide which

options to use for your content, copy the applicable examples to the Amazon Polly console and

listen to the diﬀerences.

• Using automated mode without optional parameters.

<speak>

<amazon:auto-breaths>Amazon Polly is a service that turns text into lifelike speech,

allowing you to create applications that talk and build entirely new categories of

speech-enabled products. Amazon Polly is a text-to-speech service that uses advanced

deep learning technologies to synthesize speech that sounds like a human voice. With

dozens of lifelike voices across a variety of languages, you can select the ideal voice

and build speech-enabled applications that work in many different countries.</amazon:auto-breaths>

</speak>

•

Using automated mode with volume control. The unspeciﬁed parameters (duration and

frequency) are set to the default values (medium).

<speak>

<amazon:auto-breaths volume="x-soft">Amazon Polly is a service that turns text into lifelike

speech, allowing you to create applications that talk and build entirely new categories of

speech-enabled products. Amazon Polly is a text-to-speech service, that uses advanced deep

learning technologies to synthesize speech that sounds like a human voice. With dozens of

lifelike voices across a variety of languages, you can select the ideal voice and build

speech-enabled applications that work in many different countries.</amazon:auto-breaths>

</speak>

•

Using automated mode with frequency control. The unspeciﬁed parameters (duration and

volume) are set to the default values (medium).

<speak>

<amazon:auto-breaths frequency="x-low">Amazon Polly is a service that turns text into lifelike

speech, allowing you to create applications that talk and build entirely new categories of

speech-enabled products. Amazon Polly is a text-to-speech service, that uses advanced deep

learning technologies to synthesize speech that sounds like a human voice. With dozens of

lifelike voices across a variety of languages, you can select the ideal voice and build

speech-enabled applications that work in many different countries.</amazon:auto-breaths>

</speak>

•

Using automated mode with multiple parameters. For the unspecified duration parameter,

Amazon Polly uses the default value (medium).

<speak>

<amazon:auto-breaths volume="x-loud" frequency="x-low">Amazon Polly is a service that turns

text into lifelike speech, allowing you to create applications that talk and build entirely new

categories of speech-enabled products. Amazon Polly is a text-to-speech service, that uses

advanced deep learning technologies to synthesize speech that sounds like a human voice. With

dozens of lifelike voices across a variety of languages, you can select the ideal voice and build

speech-enabled applications that work in many different countries.</amazon:auto-breaths>

</speak>
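The four variants above differ only in which optional attributes are set. The following is a minimal sketch in Python, assuming a hypothetical helper named auto_breaths, that wraps text in the tag and rejects attribute names or values that are not listed in this section.

```python
from xml.sax.saxutils import escape

# Allowed values for each optional attribute, as listed in this section.
ALLOWED = {
    "volume": ("default", "x-soft", "soft", "medium", "loud", "x-loud"),
    "frequency": ("default", "x-low", "low", "medium", "high", "x-high"),
    "duration": ("default", "x-short", "short", "medium", "long", "x-long"),
}


def auto_breaths(text, **attributes):
    """Hypothetical helper: wrap text in <amazon:auto-breaths> inside <speak>."""
    parts = []
    for name, value in attributes.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown attribute: {name!r}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid {name}: {value!r}")
        parts.append(f' {name}="{value}"')
    opening = "<amazon:auto-breaths" + "".join(parts) + ">"
    # The closing tag is required, unlike the self-closing <amazon:breath/>.
    return "<speak>" + opening + escape(text) + "</amazon:auto-breaths></speak>"


print(auto_breaths("Amazon Polly turns text into lifelike speech.",
                   volume="x-loud", frequency="x-low"))
```

Attributes that you leave out are not written to the tag, so Amazon Polly applies the default value (medium) for them.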

Newscaster speaking style

<amazon:domain name="news">

The newscaster style is available only for the Matthew and Joanna voices in American English

(en-US), the Lupe voice in US Spanish (es-US), and the Amy voice in British English (en-GB). It is

supported only when using the neural format.

To use the newscaster style, you use SSML tags and the following syntax:

<amazon:domain name="news">text</amazon:domain>

For example, you might use the newscaster style with the Amy voice as follows:

<speak>

<amazon:domain name="news">

From the Tuesday, April 16th, 1912 edition of The Guardian newspaper:

The maiden voyage of the White Star liner Titanic, the largest ship ever launched, has

ended in disaster.

The Titanic started her trip from Southampton for New York on Wednesday. Late on Sunday

night she struck

an iceberg off the Grand Banks of Newfoundland. By wireless telegraphy she sent out

signals of distress,

and several liners were near enough to catch and respond to the call.

</amazon:domain>

</speak>
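Because the newscaster style works with only a few voices, you might check the voice before you build the request. The following is a minimal sketch in Python, assuming a hypothetical helper named newscaster_ssml. The voice table is copied from this section and is not retrieved from the service, so it can become outdated.

```python
from xml.sax.saxutils import escape

# Voices that this section lists as supporting the newscaster style.
NEWSCASTER_VOICES = {
    "Matthew": "en-US",
    "Joanna": "en-US",
    "Lupe": "es-US",
    "Amy": "en-GB",
}


def newscaster_ssml(text, voice_id):
    """Hypothetical helper: wrap text in the news domain for a supported voice."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id!r} is not listed as a newscaster voice")
    return '<speak><amazon:domain name="news">' + escape(text) + "</amazon:domain></speak>"


print(newscaster_ssml("The maiden voyage has ended in disaster.", "Amy"))
```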

Adding dynamic range compression

<amazon:effect name="drc">

This tag is supported by long-form, neural, and standard TTS formats.

Depending on the text, language, and voice used in an audio ﬁle, the sounds range from soft to

loud. Environmental sounds, such as the sound of a moving vehicle, can often mask the softer

sounds, which makes the audio track diﬃcult to hear clearly. To enhance the volume of certain

sounds in your audio ﬁle, use the dynamic range compression (drc) tag.

The drc tag sets a midrange "loudness" threshold for your audio, and increases the volume (the

gain) of the sounds around that threshold. It applies the greatest gain increase closest to the

threshold, and the gain increase is lessened farther away from the threshold.

This makes the middle-range sounds easier to hear in a noisy environment, which makes the entire

audio ﬁle clearer.

The drc tag is a Boolean parameter (it's either present or it isn't). It uses the syntax:

<amazon:effect name="drc"> and is closed with </amazon:effect>.

You can use the drc tag with any voice or language supported by Amazon Polly. You can apply it to

an entire section of the recording, or for only a few words. For example:

<speak>

Some audio is difficult to hear in a moving vehicle, but <amazon:effect

name="drc"> this audio

is less difficult to hear in a moving vehicle.</amazon:effect>

</speak>

Note

When you use "drc" in the amazon:effect syntax, it is case-sensitive.

Using drc with the prosody volume Tag

As the following graphic shows, the prosody volume tag evenly increases the volume of an entire

audio ﬁle from the original level (dotted line) to an adjusted level (solid line). To further increase

the volume of certain parts of the ﬁle, use the drc tag with the prosody volume tag. Combining

tags doesn't aﬀect the settings of the prosody volume tag.

When you use the drc and prosody volume tags together, Amazon Polly applies the drc tag

ﬁrst, increasing the middle-range sounds (those near the threshold). It then applies the prosody

volume tag and further increases the volume of the entire audio track evenly.

To use the tags together, nest one inside the other. For example:

<speak>

<prosody volume="loud">This text needs to be understandable and loud.

<amazon:effect name="drc">

This text also needs to be more understandable in a moving car.</amazon:effect></prosody>

</speak>

In this text, the prosody volume tag increases the volume of the entire passage to "loud." The

drc tag enhances the volume of the middle-range values in the second sentence.

Note

When using the drc and prosody volume tags together, use standard XML practices for

nesting tags.
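One way to catch nesting mistakes before you send SSML to Amazon Polly is to run it through a local XML parser. The following is a minimal sketch in Python that uses the standard library parser. Because the amazon: prefix is not declared in the SSML itself, the sketch binds it to a placeholder namespace (urn:example:amazon, an assumption used only for this local check). It tests XML well-formedness only and does not confirm that Amazon Polly accepts the tags.

```python
import xml.etree.ElementTree as ET


def is_well_formed(ssml):
    """Return True if the SSML parses as XML with properly nested tags."""
    # Bind the amazon: prefix to a placeholder namespace so the parser accepts it.
    patched = ssml.replace("<speak", '<speak xmlns:amazon="urn:example:amazon"', 1)
    try:
        ET.fromstring(patched)
        return True
    except ET.ParseError:
        return False


nested = ('<speak><prosody volume="loud">Loud text. '
          '<amazon:effect name="drc">Loud and compressed.</amazon:effect>'
          '</prosody></speak>')
overlapping = ('<speak><prosody volume="loud">Loud text. '
               '<amazon:effect name="drc">Loud and compressed.</prosody>'
               '</amazon:effect></speak>')

print(is_well_formed(nested))       # True: drc is closed before prosody
print(is_well_formed(overlapping))  # False: the closing tags are in the wrong order
```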

Speaking softly

<amazon:effect phonation="soft">

This tag is currently supported only by the standard TTS format.

To specify that input text should be spoken in a softer-than-normal voice, use the <amazon:effect

phonation="soft"> tag.

This uses the syntax:

<amazon:effect phonation="soft">text</amazon:effect>

For example, you might use this tag with the Matthew voice as follows:

<speak>

This is Matthew speaking in my normal voice. <amazon:effect phonation="soft">This

is Matthew speaking in my softer voice.</amazon:effect>

</speak>

Controlling timbre

<amazon:effect vocal-tract-length>

This tag is currently supported only by the standard TTS format.

Timbre is the tonal quality of a voice that helps you tell the diﬀerence between voices, even when

they have the same pitch and loudness. One of the most important physiological features that

contributes to speech timbre is the length of the vocal tract. The vocal tract is a cavity of air that

spans from the top of the vocal folds up to the edge of the lips.

To control the timbre of output speech in Amazon Polly, use the vocal-tract-length tag. This

tag has the eﬀect of changing the length of the speaker’s vocal tract, which sounds like a change

in the speaker’s size. When you increase the vocal-tract-length, the speaker sounds physically

bigger. When you decrease it, the speaker sounds smaller. You can use this tag with any of the

voices in the Amazon Polly Text-to-Speech portfolio.

To change timbre, use the following values:

•

+n% or -n%: Adjusts the vocal tract length by a relative percentage change in the current voice.

For example, +4% or -2%. Valid values range from +100% to -50%. Values outside this range are

clipped. For example, +111% sounds like +100% and -60% sounds like -50%.

•

n%: Changes the vocal tract length to an absolute percentage of the tract length of the current

voice. For example, 110% or 75%. An absolute value of 110% is equivalent to a relative value of

+10%. An absolute value of 100% is the same as the default value for the current voice.
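The relationship between relative values, absolute values, and clipping can be worked through in a few lines. The following is a minimal sketch in Python, assuming a hypothetical function named effective_change. It also assumes that an absolute value is clipped the same way as its relative equivalent, which this section does not state explicitly.

```python
import re


def effective_change(value):
    """Return the relative change, in percent, that a value is expected to produce.

    "+n%" and "-n%" are relative changes. "n%" is an absolute percentage of the
    current voice's tract length, so 110% corresponds to +10.
    """
    match = re.fullmatch(r"([+-]?)(\d+)%", value.strip())
    if match is None:
        raise ValueError(f"not a percentage: {value!r}")
    sign, number = match.group(1), int(match.group(2))
    if sign == "+":
        relative = number
    elif sign == "-":
        relative = -number
    else:
        relative = number - 100  # absolute value, converted to a relative change
    # Values outside the range -50% to +100% are clipped.
    return max(-50, min(100, relative))


for sample in ("+4%", "+111%", "-60%", "110%", "75%", "100%"):
    print(sample, "->", effective_change(sample))
```

For the samples above, +111% is clipped to 100, -60% is clipped to -50, 110% maps to 10, 75% maps to -25, and 100% maps to 0 (no change).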

The following example shows how to change the vocal tract length to change timbre:

<speak>

This is my original voice, without any modifications. <amazon:effect vocal-tract-length="+15%">

Now, imagine that I am much bigger. </amazon:effect> <amazon:effect vocal-tract-length="-15%">

Or, perhaps you prefer my voice when I'm very small. </amazon:effect> You can also control the

timbre of my voice by making minor adjustments. <amazon:effect vocal-tract-length="+10%">

For example, by making me sound just a little bigger. </amazon:effect><amazon:effect

vocal-tract-length="-10%"> Or, making me sound only somewhat smaller. </amazon:effect>

</speak>

Combining Multiple Tags

You can combine the vocal-tract-length tag with any other SSML tag that is supported by

Amazon Polly. Because timbre (vocal tract length) and pitch are closely connected, you might get

the best results by using both the vocal-tract-length and the <prosody pitch> tags. To

produce the most realistic voice, we recommend that you use diﬀerent percentages of change for

the two tags. Experiment with various combinations to get the results you want.

The following example shows how to combine tags.

<speak>

The pitch and timbre of a person's voice are connected in human speech.

<amazon:effect vocal-tract-length="-15%"> If you are going to reduce the vocal

tract length,

</amazon:effect><amazon:effect vocal-tract-length="-15%"> <prosody pitch="+20%">

you

might consider increasing the pitch, too. </prosody></amazon:effect>

<amazon:effect vocal-tract-length="+15%"> If you choose to lengthen the vocal

tract,

</amazon:effect> <amazon:effect vocal-tract-length="+15%"> <prosody pitch="-10%">

you might also want to lower the pitch. </prosody></amazon:effect>

</speak>

Whispering

<amazon:effect name="whispered">

This tag is currently supported only by the standard TTS format.

This tag indicates that the input text should be spoken in a whispered voice rather than as normal

speech. This can be used with any of the voices in the Amazon Polly Text-to-Speech portfolio.

This uses the following syntax:

<amazon:effect name="whispered">text</amazon:effect>

For example:

<speak>

<amazon:effect name="whispered">If you make any noise, </amazon:effect>

she said, <amazon:effect name="whispered">they will hear us.</amazon:effect>

</speak>

In this case, the synthesized speech spoken by the character is whispered, but the phrase "she said"

is spoken in the normal synthesized speech of the selected Amazon Polly voice.

You can enhance the "whispered" eﬀect by slowing down the prosody rate by up to 10%,

depending on the eﬀect you want.

For example:

<speak>

When any voice is made to whisper, <amazon:effect name="whispered">

<prosody rate="-10%">the sound is slower and quieter than normal speech

</prosody></amazon:effect>

</speak>

When generating speech marks for a whispered voice, the audio stream must also include the

whispered voice to ensure that the speech marks match the audio stream.

Managing lexicons

Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides

API operations that you can use to store lexicons in an AWS Region. Those lexicons are then

specific to that particular Region. You can use one or more of the lexicons from that Region when

synthesizing the text by using the SynthesizeSpeech operation. This applies the speciﬁed

lexicon to the input text before the synthesis begins. For more information, see SynthesizeSpeech.

Note

These lexicons must conform with the Pronunciation Lexicon Speciﬁcation (PLS) W3C

recommendation. For more information, see Pronunciation Lexicon Speciﬁcation (PLS)

Version 1.0 on the W3C website.

The following are examples of ways to use lexicons with speech synthesis engines:

• Common words are sometimes stylized with numbers taking the place of letters, as with "g3t

sm4rt" (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS)

engine reads the text literally, pronouncing the word exactly as it is spelled. This is where

you can leverage lexicons to customize the synthesized speech by using Amazon Polly. In this

example, you can specify an alias (get smart) for the word "g3t sm4rt" in the lexicon.

• Your text might include an acronym, such as W3C. You can use a lexicon to deﬁne an alias for the

word W3C so that it is read in the full, expanded form (World Wide Web Consortium).

Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the

selected language. For example, you can specify the pronunciation using a phonetic alphabet. For

more information, see Pronunciation Lexicon Speciﬁcation (PLS) Version 1.0 on the W3C website.

Topics

• Applying multiple lexicons

• Managing lexicons on the Amazon Polly console

• Managing lexicons on the AWS CLI

Applying multiple lexicons

You can apply up to ﬁve lexicons to your text. If the same grapheme appears in more than one

lexicon that you apply to your text, the order in which they are applied can make a diﬀerence in the

resulting speech. For example, consider the text "Hello, my name is Bob." and two lexemes

in different lexicons that both use the grapheme Bob:

LexA

<lexeme>

<grapheme>Bob</grapheme>

<alias>Robert</alias>

</lexeme>

LexB

<lexeme>

<grapheme>Bob</grapheme>

<alias>Bobby</alias>

</lexeme>

If the lexicons are listed in the order LexA and then LexB, the synthesized speech is "Hello, my

name is Robert." If they are listed in the order LexB and then LexA, the synthesized speech is "Hello,

my name is Bobby."

Example – Applying LexA Before LexB

aws polly synthesize-speech \

--lexicon-names LexA LexB \

--output-format mp3 \

--text 'Hello, my name is Bob' \

--voice-id Justin \

bobAB.mp3

Speech output: "Hello, my name is Robert."

Example – Applying LexB before LexA

aws polly synthesize-speech \

--lexicon-names LexB LexA \

--output-format mp3 \

--text 'Hello, my name is Bob' \

--voice-id Justin \

bobBA.mp3

Speech output: "Hello, my name is Bobby."
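The effect of lexicon order can be modeled with a short lookup. The following is a minimal sketch in Python that treats each lexicon as a plain dictionary from grapheme to alias. It is a toy model of the first-match behavior described above, not the algorithm that Amazon Polly uses internally, and the function name resolve_alias is hypothetical.

```python
def resolve_alias(grapheme, lexicons):
    """Return the alias from the first lexicon, in order, that defines the grapheme."""
    for lexicon in lexicons:
        if grapheme in lexicon:
            return lexicon[grapheme]
    return grapheme  # no lexicon defines it, so the word is left unchanged


lex_a = {"Bob": "Robert"}
lex_b = {"Bob": "Bobby"}

print(resolve_alias("Bob", [lex_a, lex_b]))  # Robert
print(resolve_alias("Bob", [lex_b, lex_a]))  # Bobby
```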

For information about applying lexicons using the Amazon Polly console, see Applying lexicons on

the console (Synthesize Speech).

Managing lexicons on the Amazon Polly console

You can use the Amazon Polly console to upload, download, apply, ﬁlter, and delete lexicons. The

following procedures demonstrate each of these processes.

Uploading lexicons on the console

To use a pronunciation lexicon, you must ﬁrst upload it. There are two locations on the console

from which you can upload a lexicon, the Text-to-Speech tab and the Lexicons tab.

The following processes describe how to add lexicons that you can use to customize how words and

phrases uncommon to the chosen language are pronounced.

To add a lexicon from the Lexicons tab

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Choose the Lexicons tab.

3. Choose Upload lexicon.

4. Provide a name for the lexicon and then use Choose a lexicon ﬁle to ﬁnd the lexicon to

upload. You can only upload PLS ﬁles with .pls or .xml extensions.

5. Choose Upload lexicon. If a lexicon by the same name (whether a .pls or .xml ﬁle) already

exists, uploading the lexicon overwrites the existing lexicon.

To add a lexicon from the Text-to-Speech tab

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Choose the Text-to-Speech tab.

3. Expand Additional settings, turn on Customize pronunciation, and then choose Upload

lexicon.

4. Provide a name for the lexicon and then use Choose a lexicon ﬁle to ﬁnd the lexicon to

upload. You can only use PLS ﬁles with .pls or .xml extensions.

5. Choose Upload lexicon. If a lexicon with the same name (whether a .pls or .xml ﬁle) already

exists, uploading the lexicon overwrites the existing lexicon.

Applying lexicons on the console (Synthesize Speech)

The following procedure demonstrates how to apply a lexicon to your input text by applying the

W3C.pls lexicon to substitute "World Wide Web Consortium" for "W3C". If you apply multiple

lexicons to your text, they are applied in a top-down order with the first match taking precedence

over later matches. A lexicon is applied to the text only if the language speciﬁed in the lexicon is

the same as the language chosen.

You can apply a lexicon to plain text or SSML input.

Example – Applying the W3C.pls Lexicon

To create the lexicon you'll need for this exercise, see Using the PutLexicon Operation. Use a plain

text editor to create the W3C.pls lexicon shown at the top of the topic. Remember where you save

this ﬁle.

To apply the W3C.pls lexicon to your input

In this example we introduce a lexicon to substitute "World Wide Web Consortium" for "W3C".

Compare the results of this exercise with that of Using SSML on the console for both US English

and another language.

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Do one of the following:

• Turn oﬀ SSML and then type or paste this text into the text input box.

He was caught up in the game.

In the middle of the 10/3/2014 W3C meeting

he shouted, "Score!" quite loudly.

• Turn on SSML and then type or paste this text into the text input box.

<speak>He wasn't paying attention.<break time="1s"/>

In the middle of the 10/3/2014 W3C meeting

he shouted, "Score!" quite loudly.</speak>

3. From the Language list, choose English, US, then choose the voice you want to use for this

text.

4. Expand Additional settings and turn on Customize pronunciation.

5. From the list of lexicons, choose W3C (English, US).

If the W3C (English, US) lexicon is not listed, choose Upload lexicon and upload it, then

choose it from the list. To create this lexicon, see Using the PutLexicon Operation.

6. To listen to the speech immediately, choose Listen.

7. To save the speech to a file, do one of the following:

a. Choose Download.

b. To change to a diﬀerent ﬁle format, turn on Speech ﬁle format settings, choose the ﬁle

format you want, and then choose Download.

Repeat the previous steps, but choose a diﬀerent language and notice the diﬀerence in the output.

Filtering the lexicon list on the console

The following procedure describes how to ﬁlter the lexicons list so that only lexicons of a chosen

language are displayed.

To ﬁlter the lexicons listed by language

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Choose the Lexicons tab.

3. Choose Any language.

4. From the list of languages, choose the language you want to ﬁlter on.

The list displays only the lexicons for the chosen language.

Downloading lexicons on the console

The following process describes how to download one or more lexicons. You can add, remove, or

modify lexicon entries in the ﬁle and then upload it again to keep your lexicon up-to-date.

To download one or more lexicons

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Choose the Lexicons tab.

3. Choose the lexicon or lexicons you want to download.

a. To download a single lexicon, choose its name from the list.

b. To download multiple lexicons as a single compressed archive ﬁle, select the check box

next to each entry in the list that you want to download.

4. Choose Download.

5. Open the folder where you want to download the lexicon.

6. Choose Save.

Deleting a lexicon on the console

To delete a lexicon

The following process describes how to delete a lexicon. After deleting the lexicon, you must add it

back before you can use it again. You can delete one or more lexicons at the same time by selecting

the check boxes next to individual lexicons.

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://

console.aws.amazon.com/polly/.

2. Choose the Lexicons tab.

3. Choose one or more lexicons that you want to delete from the list.

4. Choose Delete.

5. Enter conﬁrmation text and then choose Delete to remove the lexicon from the Region or

Cancel to keep it.

Managing lexicons on the AWS CLI

The following topics cover the AWS CLI commands needed to manage your pronunciation lexicons.

Topics

• Using the PutLexicon Operation

• Using the GetLexicon operation

• Using the ListLexicons operation

• Using the DeleteLexicon operation

Using the PutLexicon Operation

With Amazon Polly, you can use PutLexicon to store pronunciation lexicons in a speciﬁc AWS

Region for your account. Then, you can specify one or more of these stored lexicons in your

SynthesizeSpeech request that you want to apply before the service starts synthesizing the text.

For more information, see Managing lexicons.

This section provides example lexicons and step-by-step instructions for storing and testing them.

Note

These lexicons must conform to the Pronunciation Lexicon Speciﬁcation (PLS) W3C

recommendation. For more information, see Pronunciation Lexicon Speciﬁcation (PLS)

Version 1.0 on the W3C website.

Example 1: Lexicon with one lexeme

Consider the following W3C PLS-compliant lexicon.

<?xml version="1.0" encoding="UTF-8"?>

<lexicon version="1.0"

xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon

http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"

alphabet="ipa"

xml:lang="en-US">

<lexeme>

<grapheme>W3C</grapheme>

<alias>World Wide Web Consortium</alias>

</lexeme>

</lexicon>

Note the following:

•

The two attributes speciﬁed in the <lexicon> element:

•

The xml:lang attribute speciﬁes the language code, en-US, to which the lexicon applies.

Amazon Polly can use this example lexicon if the voice you specify in the SynthesizeSpeech

call has the same language code (en-US).

Note

You can use the DescribeVoices operation to ﬁnd the language code associated

with a voice.



•

The alphabet attribute specifies IPA, which means that the International Phonetic

Alphabet is used for pronunciations. IPA is one of the alphabets for writing

pronunciations. Amazon Polly also supports the Extended Speech Assessment Methods

Phonetic Alphabet (X-SAMPA).



•

The <lexeme> element describes the mapping between <grapheme> (that is, a textual

representation of the word) and <alias>.
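To inspect these attributes and mappings before you upload a lexicon, you can parse the file locally. The following is a minimal sketch in Python that uses the standard library XML parser. The function name summarize_lexicon and the shortened sample (which omits the schema location attributes) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified names, as the standard library parser reports them.
PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def summarize_lexicon(pls_text):
    """Return the language, alphabet, and grapheme mappings of a PLS document."""
    root = ET.fromstring(pls_text.encode("utf-8"))
    entries = {}
    for lexeme in root.findall(PLS + "lexeme"):
        # A lexeme maps its graphemes to either an alias or a phoneme.
        alias = lexeme.find(PLS + "alias")
        phoneme = lexeme.find(PLS + "phoneme")
        target = alias if alias is not None else phoneme
        for grapheme in lexeme.findall(PLS + "grapheme"):
            entries[grapheme.text] = target.text if target is not None else None
    return {
        "language": root.get(XML_LANG),
        "alphabet": root.get("alphabet"),
        "entries": entries,
    }


print(summarize_lexicon(SAMPLE))
```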

To test this lexicon, do the following:

Save the lexicon as example.pls.

Run the put-lexicon AWS CLI command to store the lexicon (with the name w3c), in the us-

east-2 region.

aws polly put-lexicon \

--name w3c \

--content file://example.pls

Run the synthesize-speech command to synthesize sample text to an audio stream

(speech.mp3), and specify the optional lexicon-names parameter.

aws polly synthesize-speech \

--text 'W3C is a Consortium' \

--voice-id Joanna \

--output-format mp3 \

--lexicon-names="w3c" \

speech.mp3

Play the resulting speech.mp3, and notice that the word W3C in the text is replaced by World

Wide Web Consortium.
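The same two calls can be made from Python. The following is a minimal sketch that assumes the polly argument is a boto3 Polly client (for example, one created with boto3.client("polly") in an environment with credentials and a default Region configured). The function name synthesize_with_lexicon is hypothetical.

```python
def synthesize_with_lexicon(polly, lexicon_name, pls_content, text, voice_id="Joanna"):
    """Store a lexicon, synthesize text with it, and return the MP3 bytes.

    polly is assumed to be a boto3 Polly client, or any object that offers
    put_lexicon and synthesize_speech with the same keyword arguments.
    """
    # Equivalent to: aws polly put-lexicon --name ... --content ...
    polly.put_lexicon(Name=lexicon_name, Content=pls_content)

    # Equivalent to: aws polly synthesize-speech ... --lexicon-names ...
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="mp3",
        LexiconNames=[lexicon_name],
    )
    return response["AudioStream"].read()
```

To mirror the CLI steps, read example.pls into a string, pass it as pls_content together with the name w3c, and write the returned bytes to speech.mp3.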

The preceding example lexicon uses an alias. The IPA alphabet mentioned in the lexicon is not used.

The following lexicon speciﬁes a phonetic pronunciation using the <phoneme> element with the

IPA alphabet.

<?xml version="1.0" encoding="UTF-8"?>

<lexicon version="1.0"

xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon

http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"

alphabet="ipa"

xml:lang="en-US">

<lexeme>

<grapheme>pecan</grapheme>

</lexeme>

</lexicon>

Follow the same steps to test this lexicon. Make sure you specify input text that contains the word

"pecan" (for example, "Pecan pie is delicious").

Example 2: Lexicon with multiple lexemes

In this example, the lexeme that you specify in the lexicon applies exclusively to the input text for

the synthesis. Consider the following lexicon:

<?xml version="1.0" encoding="UTF-8"?>

<lexicon version="1.0"

xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon

http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"

alphabet="ipa" xml:lang="en-US">

<lexeme>

<grapheme>W3C</grapheme>

<alias>World Wide Web Consortium</alias>

</lexeme>

<lexeme>

<grapheme>W3C</grapheme>

<alias>WWW Consortium</alias>

</lexeme>

<lexeme>

<grapheme>Consortium</grapheme>

<alias>Community</alias>

</lexeme>

</lexicon>

The lexicon speciﬁes three lexemes, two of which deﬁne an alias for the grapheme W3C as follows:

•

The ﬁrst <lexeme> element deﬁnes an alias (World Wide Web Consortium).

•

The second <lexeme> deﬁnes an alternative alias (WWW Consortium).

Amazon Polly uses the ﬁrst replacement for any given grapheme in a lexicon.

The third <lexeme> deﬁnes a replacement (Community) for the word Consortium.

First, let's test this lexicon. Suppose you want to synthesize the following sample text to an audio

ﬁle (speech.mp3), and you specify the lexicon in a call to SynthesizeSpeech.

The W3C is a Consortium

SynthesizeSpeech ﬁrst applies the lexicon as follows:

• As per the ﬁrst lexeme, the word W3C is revised as World Wide Web Consortium. The revised text

appears as follows:

The World Wide Web Consortium is a Consortium

• The alias deﬁned in the third lexeme applies only to the word Consortium that was part of the

original text, resulting in the following text:

The World Wide Web Consortium is a Community.
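The two rules at work here (the first lexeme for a grapheme wins, and aliases are applied only to words in the original text) can be modeled with a single-pass substitution. The following is a minimal sketch in Python. It is a toy model of the behavior described above, not the algorithm that Amazon Polly uses, and the function name apply_lexemes is hypothetical.

```python
import re


def apply_lexemes(text, lexemes):
    """Apply (grapheme, alias) pairs to text in one pass over the original words."""
    first = {}
    for grapheme, alias in lexemes:
        # Keep only the first alias that is defined for each grapheme.
        first.setdefault(grapheme, alias)
    if not first:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(g) for g in first) + r")\b")
    # re.sub scans the original text once, so inserted aliases are not rescanned.
    return pattern.sub(lambda match: first[match.group(1)], text)


lexemes = [
    ("W3C", "World Wide Web Consortium"),
    ("W3C", "WWW Consortium"),
    ("Consortium", "Community"),
]
print(apply_lexemes("The W3C is a Consortium", lexemes))
```

The word Consortium that the first alias introduces is left alone, because the substitution never revisits text that it has already replaced.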

You can test this using the AWS CLI as follows:

Save the lexicon as example.pls.

Run the put-lexicon command to store the lexicon with name w3c in the us-east-2 region.

aws polly put-lexicon \

--name w3c \

--content file://example.pls

Run the list-lexicons command to verify that the w3c lexicon is in the list of lexicons

returned.

aws polly list-lexicons

Run the synthesize-speech command to synthesize sample text to an audio ﬁle

(speech.mp3), and specify the optional lexicon-names parameter.

aws polly synthesize-speech \

--text 'W3C is a Consortium' \

--voice-id Joanna \

--output-format mp3 \

--lexicon-names="w3c" \

speech.mp3

Play the resulting speech.mp3 ﬁle to verify that the synthesized speech reﬂects the text

changes.

Example 3: Specifying multiple lexicons

In a call to SynthesizeSpeech, you can specify multiple lexicons. In this case, the ﬁrst lexicon

specified (in order from left to right) takes precedence over any lexicons that follow it.

Consider the following two lexicons. Note that each lexicon describes diﬀerent aliases for the same

grapheme W3C.

• Lexicon 1: w3c.pls

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>

• Lexicon 2: w3cAlternate.pls

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>WWW Consortium</alias>
  </lexeme>
</lexicon>

Suppose you store these lexicons as w3c and w3cAlternate respectively. If you specify lexicons in

order (w3c followed by w3cAlternate) in a SynthesizeSpeech call, the alias for W3C deﬁned in

the ﬁrst lexicon has precedence over the second. To test the lexicons, do the following:

Save the lexicons locally in ﬁles called w3c.pls and w3cAlternate.pls.

Upload these lexicons using the put-lexicon AWS CLI command.

• Upload the w3c.pls lexicon and store it as w3c.

aws polly put-lexicon \

--name w3c \

--content file://w3c.pls

• Upload the w3cAlternate.pls lexicon and store it as w3cAlternate.

aws polly put-lexicon \

--name w3cAlternate \

--content file://w3cAlternate.pls

Run the synthesize-speech command to synthesize sample text to an audio file

(speech.mp3), and specify both lexicons using the lexicon-names parameter.

aws polly synthesize-speech \

--text 'PLS is a W3C recommendation' \

--voice-id Joanna \

--output-format mp3 \

--lexicon-names '["w3c","w3cAlternate"]' \

speech.mp3

Test the resulting speech.mp3. It should read as follows:

PLS is a World Wide Web Consortium recommendation
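The precedence rule can be illustrated with a small Python model in which each lexicon is a plain dictionary from grapheme to alias. This is a simplified sketch of the rule stated above, not the service's implementation; `merge_lexicons` is a hypothetical helper, and the word-by-word substitution assumes single-word graphemes.

```python
def merge_lexicons(lexicons):
    """Combine lexicons so that earlier ones take precedence over later ones."""
    merged = {}
    for lexicon in lexicons:  # left to right, as listed in the request
        for grapheme, alias in lexicon.items():
            # Only add a grapheme if no earlier lexicon already defined it.
            merged.setdefault(grapheme, alias)
    return merged


w3c = {"W3C": "World Wide Web Consortium"}
w3c_alternate = {"W3C": "WWW Consortium"}

merged = merge_lexicons([w3c, w3c_alternate])
text = "PLS is a W3C recommendation"
spoken = " ".join(merged.get(word, word) for word in text.split())
print(spoken)
```

Reversing the list passed to `merge_lexicons` makes the alias from w3cAlternate win instead.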

Additional code samples for the PutLexicon API

• Java Sample: PutLexicon

• Python (Boto3) Sample: PutLexicon

Using the GetLexicon operation

Amazon Polly provides the GetLexicon API operation to retrieve the content of a pronunciation

lexicon you stored in your account in a speciﬁc region.

The following get-lexicon AWS CLI command retrieves the content of the example lexicon.

aws polly get-lexicon \

--name example

If you don't already have a lexicon stored in your account, you can use the PutLexicon operation

to store one. For more information, see Using the PutLexicon Operation.

The following is a sample response. In addition to the lexicon content, the response returns the

metadata, such as the language code to which the lexicon applies, number of lexemes deﬁned in

the lexicon, the Amazon Resource Name (ARN) of the resource, and the size of the lexicon in bytes.

The LastModified value is a Unix timestamp.

{
    "Lexicon": {
        "Content": "lexicon content in plain text PLS format",
        "Name": "example"
    },
    "LexiconAttributes": {
        "LanguageCode": "en-US",
        "LastModified": 1474222543.989,
        "Alphabet": "ipa",
        "LexemesCount": 1,
        "LexiconArn": "arn:aws:polly:us-east-2:account-id:lexicon/example",
        "Size": 495
    }
}
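As a minimal sketch of working with this response, the following Python code parses the JSON printed by the CLI and converts the LastModified Unix timestamp into a readable UTC date. The `summarize_lexicon` helper and the keys of the dictionary it returns are assumptions made for this example; the sample values are the ones shown above.

```python
import json
from datetime import datetime, timezone

SAMPLE_RESPONSE = """
{
    "Lexicon": {
        "Content": "lexicon content in plain text PLS format",
        "Name": "example"
    },
    "LexiconAttributes": {
        "LanguageCode": "en-US",
        "LastModified": 1474222543.989,
        "Alphabet": "ipa",
        "LexemesCount": 1,
        "LexiconArn": "arn:aws:polly:us-east-2:account-id:lexicon/example",
        "Size": 495
    }
}
"""


def summarize_lexicon(response_json):
    """Extract the lexicon name and selected metadata from a GetLexicon response."""
    response = json.loads(response_json)
    attributes = response["LexiconAttributes"]
    return {
        "name": response["Lexicon"]["Name"],
        "language": attributes["LanguageCode"],
        "lexemes": attributes["LexemesCount"],
        "size_bytes": attributes["Size"],
        # LastModified is seconds since the Unix epoch.
        "last_modified": datetime.fromtimestamp(
            attributes["LastModified"], tz=timezone.utc
        ),
    }


summary = summarize_lexicon(SAMPLE_RESPONSE)
print(summary["name"], summary["last_modified"].date())
```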

Additional code samples for the GetLexicon API

• Java Sample: GetLexicon

• Python (Boto3) Sample: GetLexicon

Using the ListLexicons operation

Amazon Polly provides the ListLexicons API operation that you can use to get the list of

pronunciation lexicons in your account in a speciﬁc AWS Region. The following AWS CLI call lists

the lexicons in your account in the us-east-2 region.

aws polly list-lexicons

The following is an example response, showing two lexicons named w3c and tomato. For each

lexicon, the response returns metadata such as the language code to which the lexicon applies, the

number of lexemes deﬁned in the lexicon, the size in bytes, and so on. The language code describes

a language and locale to which the lexemes deﬁned in the lexicon apply.

{
    "Lexicons": [
        {
            "Attributes": {
                "LanguageCode": "en-US",
                "LastModified": 1474222543.989,
                "Alphabet": "ipa",
                "LexemesCount": 1,
                "LexiconArn": "arn:aws:polly:aws-region:account-id:lexicon/w3c",
                "Size": 495
            },
            "Name": "w3c"
        },
        {
            "Attributes": {
                "LanguageCode": "en-US",
                "LastModified": 1473099290.858,
                "Alphabet": "ipa",
                "LexemesCount": 1,
                "LexiconArn": "arn:aws:polly:aws-region:account-id:lexicon/tomato",
                "Size": 645
            },
            "Name": "tomato"
        }
    ]
}
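When an account holds many lexicons, the response may be returned in pages linked by a NextToken value. The following Python sketch walks all pages using the Boto3 `list_lexicons` method. The `iter_lexicons` generator is a hypothetical helper, and the client is passed in so that the code does not assume a particular credential setup.

```python
def iter_lexicons(polly):
    """Yield (name, attributes) for every lexicon, following NextToken pages.

    `polly` is expected to behave like a Boto3 Polly client.
    """
    kwargs = {}
    while True:
        page = polly.list_lexicons(**kwargs)
        for item in page.get("Lexicons", []):
            yield item["Name"], item["Attributes"]

        # A missing NextToken means this was the last page.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

For example, with `client = boto3.Session().client("polly")`, the loop `for name, attributes in iter_lexicons(client): print(name, attributes["Size"])` prints each lexicon name with its size in bytes.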

Additional code samples for the ListLexicons API

• Java Sample: ListLexicons

• Python (Boto3) Sample: ListLexicon

Using the DeleteLexicon operation

Amazon Polly provides the DeleteLexicon API operation to delete a pronunciation lexicon from a

specific AWS Region in your account. The following AWS CLI command deletes the specified lexicon.

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full

quotation marks (") around the input text with single quotes (') for interior tags.

aws polly delete-lexicon \

--name example

Additional code samples for the DeleteLexicon API

• Java Sample: DeleteLexicon

• Python (Boto3) Sample: DeleteLexicon

Creating long audio ﬁles

To create TTS ﬁles for large passages of text, use Amazon Polly's asynchronous synthesis

functionality. This uses the three SpeechSynthesisTask APIs:

• StartSpeechSynthesisTask: starts a new synthesis task.

• GetSpeechSynthesisTask: returns details about a previously submitted synthesis task.

• ListSpeechSynthesisTasks: lists all submitted synthesis tasks.

The SynthesizeSpeech operation produces audio in near-real time, with relatively little latency

in most cases. To achieve this, the operation can synthesize at most 3,000 characters per request.

Amazon Polly's Asynchronous Synthesis feature overcomes the challenge of processing a larger

text document by changing the way the document is both synthesized and returned. When a

synthesis request is made by submitting input text using the StartSpeechSynthesisTask operation,

Amazon Polly queues the request, and then asynchronously processes it in the background

as soon as the system resources are available. Amazon Polly then uploads the resulting speech

or speech marks stream directly to your (required) Amazon Simple Storage Service (Amazon S3)

bucket, and notiﬁes you about the completed ﬁle's availability through your (optional) SNS topic.

In this way, all of the functionality except near-real time processing is available for texts of up to

100,000 billable characters (or 200,000 total characters) in length.

To synthesize a document using this method, you must have a writable Amazon S3 bucket to which the audio file can be saved. You can be notified when the synthesized audio is ready by providing an optional SNS topic identifier. When the synthesis task is complete, Amazon Polly publishes a message on that topic. This message may also contain useful error information in cases where the synthesis task didn't succeed. To receive these notifications, make sure that the user creating the synthesis task can also publish to the SNS topic. See the Amazon SNS documentation for more information on how to create and subscribe to an SNS topic.
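The limits described above suggest a simple way to pick an operation for a given input. The following Python sketch is an assumption-laden illustration: it treats every character as billable, which is only a rough approximation for plain text, and `choose_operation` and the two constants are hypothetical names rather than part of any SDK.

```python
SYNC_CHARACTER_LIMIT = 3000         # SynthesizeSpeech limit stated above
ASYNC_BILLABLE_LIMIT = 100000       # asynchronous limit in billable characters


def choose_operation(text):
    """Return the name of the operation that can handle `text`.

    Assumes plain text in which every character counts as billable.
    """
    length = len(text)
    if length <= SYNC_CHARACTER_LIMIT:
        return "SynthesizeSpeech"
    if length <= ASYNC_BILLABLE_LIMIT:
        return "StartSpeechSynthesisTask"
    raise ValueError(
        "Text has {0} characters; split it into several tasks.".format(length)
    )


print(choose_operation("A short sentence."))
print(choose_operation("x" * 50000))
```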

Encryption

You can store the output file in an encrypted form in your S3 bucket if desired. To do this, enable Amazon S3 bucket encryption, which uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256).

Topics

• Setting up the IAM policy for asynchronous synthesis

• Creating long audio ﬁles on the console

• Creating long audio ﬁles on the AWS CLI

Setting up the IAM policy for asynchronous synthesis

In order to use the asynchronous synthesis functionality, you will need an IAM policy that allows

the following:

• use of the Amazon Polly asynchronous synthesis operations

• writing to the output S3 bucket

• publishing to the status SNS topic [optional]

The following policy grants only the permissions required for asynchronous synthesis

and can be attached to an IAM user.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "polly:StartSpeechSynthesisTask",
                "polly:GetSpeechSynthesisTask",
                "polly:ListSpeechSynthesisTasks"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::bucket-name/*"
        },
        {
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:region:account:topic"
        }
    ]
}
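Because the SNS statement is optional, it can be convenient to generate the policy document rather than edit it by hand. The following Python sketch builds the same policy for a given bucket and leaves out the SNS statement when no topic is supplied. The `build_async_synthesis_policy` function is a hypothetical helper written for this example.

```python
import json


def build_async_synthesis_policy(bucket_name, sns_topic_arn=None):
    """Return the IAM policy document for asynchronous synthesis as a dict."""
    statements = [
        {
            "Effect": "Allow",
            "Action": [
                "polly:StartSpeechSynthesisTask",
                "polly:GetSpeechSynthesisTask",
                "polly:ListSpeechSynthesisTasks",
            ],
            "Resource": "*",
        },
        {
            # Allow writing the synthesized output to the bucket.
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::{0}/*".format(bucket_name),
        },
    ]
    if sns_topic_arn:
        # Only needed when status notifications are requested.
        statements.append(
            {"Effect": "Allow", "Action": "sns:Publish", "Resource": sns_topic_arn}
        )
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_async_synthesis_policy("bucket-name")
print(json.dumps(policy, indent=4))
```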

Creating long audio ﬁles on the console

You can use the Amazon Polly console to create long speeches using asynchronous synthesis with

the same functionality as you can use with the AWS CLI. This is done using the Text-to-Speech tab

much like any other synthesis.

The other asynchronous synthesis functionality is also available via the console. The S3 synthesis

tasks tab reﬂects the ListSpeechSynthesisTasks functionality, displaying all tasks saved to

the S3 bucket and enabling you to ﬁlter them if you want. Clicking on a speciﬁc single task shows

its details, reﬂecting GetSpeechSynthesisTask functionality.

To synthesize a large text using the Amazon Polly console

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. Choose the Text-to-Speech tab. Select Long Form as the engine if appropriate.

3. With SSML on or oﬀ, type or paste your text into the input box.

4. Choose the language, region, and voice for your text.

5. Choose Save to S3.

Note

Both the Download and Listen options are greyed out if the text length is above the

3,000 character limit for the real-time SynthesizeSpeech operation.

6. The console opens a form so that you can choose where to store the output ﬁle.

a. Fill in the name of the destination Amazon S3 bucket.

b. Optionally, ﬁll in the preﬁx key of the output.

Note

The output S3 bucket must be writable.

c. If you want to be notiﬁed when the synthesis task is complete, provide an optional SNS

topic identiﬁer.

Note

The SNS topic must be open for publication by the current console user to use this

option. For more information, see Amazon Simple Notification Service (SNS).

d. Choose Save to S3.

To retrieve information on your speech synthesis tasks

1. In the console, choose the S3 Synthesis Tasks tab.

2. The tasks are displayed in date order. To filter the tasks by status, choose All statuses and

then choose the status to use.

3. To view the details of a speciﬁc task, choose the linked Task ID.

Creating long audio ﬁles on the AWS CLI

Amazon Polly asynchronous synthesis functionality uses three SpeechSynthesisTask APIs to

work with large amounts of text:

• StartSpeechSynthesisTask: starts a new synthesis task.

• GetSpeechSynthesisTask: returns details about a previously submitted synthesis task.

• ListSpeechSynthesisTasks: lists all submitted synthesis tasks.

Synthesizing large amounts of text (StartSpeechSynthesisTask)

When you want to create an audio ﬁle larger than one that you can create with the real-time

SynthesizeSpeech, use the StartSpeechSynthesisTask operation. In addition to the

arguments needed for the SynthesizeSpeech operation, StartSpeechSynthesisTask also

requires the name of an Amazon S3 bucket. Two other optional arguments are also available: a key

preﬁx for the output ﬁle and the ARN for an SNS Topic if you want to receive status notiﬁcation

about the task.

• OutputS3BucketName: The name of the Amazon S3 bucket where the synthesis should be uploaded. This bucket should be in the same region as the Amazon Polly service. Additionally, the IAM user being used to make the call should have access to the bucket. [Required]

• OutputS3KeyPrefix: Key prefix for the output file. Use this parameter if you want to save the output speech file in a custom directory-like key in your bucket. [Optional]

• SnsTopicArn: The SNS topic ARN to use if you want to receive notification about status of the task. This SNS topic should be in the same region as the Amazon Polly service. Additionally, the IAM user being used to make the call should have access to the topic. [Optional]

For example, you can run the start-speech-synthesis-task AWS CLI command in the

US East (Ohio) region as follows:

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace

the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full

quotation marks (") around the input text with single quotes (') for interior tags.

aws polly start-speech-synthesis-task \

--region us-east-2 \

--endpoint-url "https://polly.us-east-2.amazonaws.com/" \

--output-format mp3 \

--output-s3-bucket-name your-bucket-name \

--output-s3-key-prefix optional/prefix/path/file \

--voice-id Joanna \

--text file://text_file.txt

This will result in a response that looks similar to this:

"SynthesisTask":
{
    "OutputFormat": "mp3",
    "OutputUri": "https://s3.us-east-2.amazonaws.com/your-bucket-name/optional/prefix/path/file.<task_id>.mp3",
    "TextType": "text",
    "CreationTime": [..],
    "RequestCharacters": [..],
    "TaskStatus": "scheduled",
    "TaskId": [task_id],
    "VoiceId": "Joanna"
}

The start-speech-synthesis-task operation returns several new ﬁelds:

• OutputUri: the location of your output speech file.

• TaskId: a unique identifier for the speech synthesis task generated by Amazon Polly.

• CreationTime: a timestamp for when the task was initially submitted.

• RequestCharacters: the number of billable characters in the task.

• TaskStatus: provides information on the status of the submitted task.

When your task is submitted, the initial status will show scheduled. When Amazon Polly

starts processing the task, the status will change to inProgress and later, to completed

or failed. If the task fails, an error message will be returned when calling either the

GetSpeechSynthesisTask or ListSpeechSynthesisTasks operation.

When the task is completed, the speech ﬁle is available at the location speciﬁed in OutputUri.
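The task lifecycle described above (scheduled, inProgress, then completed or failed) can be followed from code by polling. The following Python sketch uses the Boto3 `start_speech_synthesis_task` and `get_speech_synthesis_task` methods. The `synthesize_long_text` function, its default intervals, and the injectable `sleep` argument are assumptions made for this example, and the client is passed in rather than created here.

```python
import time


def synthesize_long_text(polly, text, bucket, key_prefix=None, sns_topic_arn=None,
                         poll_seconds=5, timeout_seconds=300, sleep=time.sleep):
    """Start an asynchronous synthesis task and wait until it finishes.

    Returns the OutputUri of the finished task. `polly` is expected to
    behave like a Boto3 Polly client.
    """
    params = {
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,
        "Text": text,
        "VoiceId": "Joanna",
    }
    if key_prefix:
        params["OutputS3KeyPrefix"] = key_prefix
    if sns_topic_arn:
        params["SnsTopicArn"] = sns_topic_arn

    task = polly.start_speech_synthesis_task(**params)["SynthesisTask"]

    waited = 0
    while task["TaskStatus"] not in ("completed", "failed"):
        if waited >= timeout_seconds:
            raise TimeoutError("Task {0} did not finish in time".format(task["TaskId"]))
        sleep(poll_seconds)
        waited += poll_seconds
        task = polly.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]

    if task["TaskStatus"] == "failed":
        # TaskStatusReason carries the error message for failed tasks.
        raise RuntimeError(task.get("TaskStatusReason", "Synthesis task failed"))
    return task["OutputUri"]
```

Polling is the simplest option for scripts. If you supplied an SNS topic, subscribing to that topic avoids repeated calls to GetSpeechSynthesisTask.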

Retrieving information on your speech synthesis task

You can get information on a task, such as errors, status, and so on, using the

GetSpeechSynthesisTask operation. To do this, you will need the task ID returned by the

StartSpeechSynthesisTask operation.

For example, you can run the get-speech-synthesis-task AWS CLI command as follows:

aws polly get-speech-synthesis-task \
    --region us-east-2 \
    --endpoint-url "https://polly.us-east-2.amazonaws.com/" \
    --task-id task-identifier

You can also list all speech synthesis tasks that you've run in the current region using the

ListSpeechSynthesisTasks operation.

For example, you can run the list-speech-synthesis-tasks AWS CLI command as follows:

aws polly list-speech-synthesis-tasks \
    --region us-east-2 \
    --endpoint-url "https://polly.us-east-2.amazonaws.com/"

Code and application examples

This section provides code samples and example applications that you can use to explore Amazon

Polly.

Topics

• Sample code

• Example applications

The Sample Code topic contains snippets of code organized by programming language and

separated into examples for diﬀerent Amazon Polly functionality. The Example Application topic

contains applications organized by programming language that can be used independently to

explore Amazon Polly.

Before you start using these examples, we recommend that you ﬁrst read How Amazon Polly works

and follow the steps described in Getting started with Amazon Polly.

Sample code

This topic contains code samples for various functionality which can be used to explore Amazon

Polly.

Sample Code by Programming Language

• Java samples

• Python samples

Java samples

The following code samples show how to use Java-based applications to accomplish various

tasks with Amazon Polly. These samples are not full examples, but can be included in larger Java

applications that use the AWS SDK for Java.

Code Snippets

• DeleteLexicon

• DescribeVoices

• GetLexicon

• ListLexicons

• PutLexicon

• StartSpeechSynthesisTask

• Speech Marks

• SynthesizeSpeech

DeleteLexicon

The following Java code sample shows how to use Java-based applications to delete a specific lexicon stored in an AWS Region. A lexicon that has been deleted is not available for speech synthesis, nor can it be retrieved using either the GetLexicon or ListLexicons APIs.

For more information on this operation, see the reference for the DeleteLexicon API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.DeleteLexiconRequest;

public class DeleteLexiconSample {
    private String LEXICON_NAME = "SampleLexicon";

    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void deleteLexicon() {
        // Identify the lexicon to delete by name
        DeleteLexiconRequest deleteLexiconRequest =
                new DeleteLexiconRequest().withName(LEXICON_NAME);

        try {
            client.deleteLexicon(deleteLexiconRequest);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}

DescribeVoices

The following Java code sample shows how to use Java-based applications to produce a list of the

voices that are available for use when requesting speech synthesis. You can optionally specify

a language code to ﬁlter the available voices. For example, if you specify en-US, the operation

returns a list of all available US English voices.

For more information on this operation, see the reference for the DescribeVoices API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.DescribeVoicesRequest;
import com.amazonaws.services.polly.model.DescribeVoicesResult;

public class DescribeVoicesSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void describeVoices() {
        DescribeVoicesRequest allVoicesRequest = new DescribeVoicesRequest();
        DescribeVoicesRequest enUsVoicesRequest =
                new DescribeVoicesRequest().withLanguageCode("en-US");

        try {
            String nextToken;

            // List all voices, following the pagination token until it is null
            do {
                DescribeVoicesResult allVoicesResult =
                        client.describeVoices(allVoicesRequest);
                nextToken = allVoicesResult.getNextToken();
                allVoicesRequest.setNextToken(nextToken);

                System.out.println("All voices: " + allVoicesResult.getVoices());
            } while (nextToken != null);

            // List only the US English voices
            do {
                DescribeVoicesResult enUsVoicesResult =
                        client.describeVoices(enUsVoicesRequest);
                nextToken = enUsVoicesResult.getNextToken();
                enUsVoicesRequest.setNextToken(nextToken);

                System.out.println("en-US voices: " + enUsVoicesResult.getVoices());
            } while (nextToken != null);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}

GetLexicon

The following Java code sample shows how to use Java-based applications to produce the content

of a specific pronunciation lexicon stored in an AWS Region.

For more information on this operation, see the reference for the GetLexicon API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.GetLexiconRequest;
import com.amazonaws.services.polly.model.GetLexiconResult;

public class GetLexiconSample {
    private String LEXICON_NAME = "SampleLexicon";

    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void getLexicon() {
        // Identify the lexicon to retrieve by name
        GetLexiconRequest getLexiconRequest =
                new GetLexiconRequest().withName(LEXICON_NAME);

        try {
            GetLexiconResult getLexiconResult = client.getLexicon(getLexiconRequest);
            System.out.println("Lexicon: " + getLexiconResult.getLexicon());
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}

ListLexicons

The following Java code sample shows how to use Java-based applications to produce a list of

pronunciation lexicons stored in an AWS Region.

For more information on this operation, see the reference for the ListLexicons API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.LexiconAttributes;
import com.amazonaws.services.polly.model.LexiconDescription;
import com.amazonaws.services.polly.model.ListLexiconsRequest;
import com.amazonaws.services.polly.model.ListLexiconsResult;

public class ListLexiconsSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void listLexicons() {
        ListLexiconsRequest listLexiconsRequest = new ListLexiconsRequest();

        try {
            String nextToken;

            // Follow the pagination token until it is null
            do {
                ListLexiconsResult listLexiconsResult =
                        client.listLexicons(listLexiconsRequest);
                nextToken = listLexiconsResult.getNextToken();
                listLexiconsRequest.setNextToken(nextToken);

                for (LexiconDescription lexiconDescription :
                        listLexiconsResult.getLexicons()) {
                    LexiconAttributes attributes = lexiconDescription.getAttributes();
                    System.out.println("Name: " + lexiconDescription.getName()
                            + ", Alphabet: " + attributes.getAlphabet()
                            + ", LanguageCode: " + attributes.getLanguageCode()
                            + ", LastModified: " + attributes.getLastModified()
                            + ", LexemesCount: " + attributes.getLexemesCount()
                            + ", LexiconArn: " + attributes.getLexiconArn()
                            + ", Size: " + attributes.getSize());
                }
            } while (nextToken != null);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}

PutLexicon

The following Java code sample shows how to use Java-based applications to store a pronunciation

lexicon in an AWS Region.

For more information on this operation, see the reference for the PutLexicon API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.PutLexiconRequest;

public class PutLexiconSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    // A minimal PLS lexicon that replaces "test1" with "test2"
    private String LEXICON_CONTENT = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<lexicon version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2005/01/pronunciation-lexicon\" " +
            "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
            "xsi:schemaLocation=\"http://www.w3.org/2005/01/pronunciation-lexicon " +
            "http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd\" " +
            "alphabet=\"ipa\" xml:lang=\"en-US\">" +
            "<lexeme><grapheme>test1</grapheme><alias>test2</alias></lexeme>" +
            "</lexicon>";

    private String LEXICON_NAME = "SampleLexicon";

    public void putLexicon() {
        PutLexiconRequest putLexiconRequest = new PutLexiconRequest()
                .withContent(LEXICON_CONTENT)
                .withName(LEXICON_NAME);

        try {
            client.putLexicon(putLexiconRequest);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}

StartSpeechSynthesisTask

The following Java code sample shows how to use Java-based applications to synthesize a long

speech (up to 100,000 billed characters) and store it directly in an Amazon S3 bucket.

For more information, see the reference for StartSpeechSynthesisTask API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.*;
import org.awaitility.Duration;

import java.util.concurrent.TimeUnit;

import static org.awaitility.Awaitility.await;

public class StartSpeechSynthesisTaskSample {

    private static final int SYNTHESIS_TASK_TIMEOUT_SECONDS = 300;
    private static final AmazonPolly AMAZON_POLLY_CLIENT =
            AmazonPollyClientBuilder.defaultClient();
    private static final String PLAIN_TEXT = "This is a sample text to be synthesized.";
    private static final String OUTPUT_FORMAT_MP3 = OutputFormat.Mp3.toString();
    private static final String OUTPUT_BUCKET = "synth-books-buckets";
    private static final String SNS_TOPIC_ARN =
            "arn:aws:sns:eu-west-2:123456789012:synthesize-finish-topic";
    private static final Duration SYNTHESIS_TASK_POLL_INTERVAL = Duration.FIVE_SECONDS;
    private static final Duration SYNTHESIS_TASK_POLL_DELAY = Duration.TEN_SECONDS;

    public static void main(String... args) {
        StartSpeechSynthesisTaskRequest request = new StartSpeechSynthesisTaskRequest()
                .withOutputFormat(OUTPUT_FORMAT_MP3)
                .withText(PLAIN_TEXT)
                .withTextType(TextType.Text)
                .withVoiceId(VoiceId.Amy)
                .withOutputS3BucketName(OUTPUT_BUCKET)
                .withSnsTopicArn(SNS_TOPIC_ARN)
                .withEngine("neural");

        StartSpeechSynthesisTaskResult result =
                AMAZON_POLLY_CLIENT.startSpeechSynthesisTask(request);
        String taskId = result.getSynthesisTask().getTaskId();

        // Poll the task status until it is completed or the timeout expires
        await().with()
                .pollInterval(SYNTHESIS_TASK_POLL_INTERVAL)
                .pollDelay(SYNTHESIS_TASK_POLL_DELAY)
                .atMost(SYNTHESIS_TASK_TIMEOUT_SECONDS, TimeUnit.SECONDS)
                .until(
                        () -> getSynthesisTaskStatus(taskId)
                                .equals(TaskStatus.Completed.toString())
                );
    }

    private static SynthesisTask getSynthesisTask(String taskId) {
        GetSpeechSynthesisTaskRequest getSpeechSynthesisTaskRequest =
                new GetSpeechSynthesisTaskRequest().withTaskId(taskId);
        GetSpeechSynthesisTaskResult result =
                AMAZON_POLLY_CLIENT.getSpeechSynthesisTask(getSpeechSynthesisTaskRequest);
        return result.getSynthesisTask();
    }

    private static String getSynthesisTaskStatus(String taskId) {
        return getSynthesisTask(taskId).getTaskStatus();
    }
}

Speech Marks

The following code sample shows how to use Java-based applications to synthesize speech marks

for input text. This functionality uses the SynthesizeSpeech API.

For more information on this functionality, see Speech marks.

For more information on the API, see the reference for SynthesizeSpeech API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SpeechMarkType;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.VoiceId;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;

public class SynthesizeSpeechMarksSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void synthesizeSpeechMarks() {
        String outputFileName = "/tmp/speechMarks.json";

        // Request JSON output containing viseme and word speech marks
        SynthesizeSpeechRequest synthesizeSpeechRequest = new SynthesizeSpeechRequest()
                .withOutputFormat(OutputFormat.Json)
                .withSpeechMarkTypes(SpeechMarkType.Viseme, SpeechMarkType.Word)
                .withVoiceId(VoiceId.Joanna)
                .withText("This is a sample text to be synthesized.");

        try (FileOutputStream outputStream = new FileOutputStream(new File(outputFileName))) {
            SynthesizeSpeechResult synthesizeSpeechResult =
                    client.synthesizeSpeech(synthesizeSpeechRequest);
            byte[] buffer = new byte[2 * 1024];
            int readBytes;

            // Copy the speech marks stream to the output file
            try (InputStream in = synthesizeSpeechResult.getAudioStream()) {
                while ((readBytes = in.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, readBytes);
                }
            }
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}

SynthesizeSpeech

The following Java code sample shows how to use Java-based applications to synthesize speech

with shorter texts for near-real time processing.

For more information, see the reference for SynthesizeSpeech API.

package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.VoiceId;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;

public class SynthesizeSpeechSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void synthesizeSpeech() {
        String outputFileName = "/tmp/speech.mp3";

        SynthesizeSpeechRequest synthesizeSpeechRequest = new SynthesizeSpeechRequest()
                .withOutputFormat(OutputFormat.Mp3)
                .withVoiceId(VoiceId.Joanna)
                .withText("This is a sample text to be synthesized.")
                .withEngine("neural");

        try (FileOutputStream outputStream = new FileOutputStream(new File(outputFileName))) {
            SynthesizeSpeechResult synthesizeSpeechResult =
                    client.synthesizeSpeech(synthesizeSpeechRequest);
            byte[] buffer = new byte[2 * 1024];
            int readBytes;

            // Copy the audio stream to the output file
            try (InputStream in = synthesizeSpeechResult.getAudioStream()) {
                while ((readBytes = in.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, readBytes);
                }
            }
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}

Python samples

The following code samples show how to use Python (boto3)-based applications to accomplish

various tasks with Amazon Polly. These samples are not intended to be full examples, but can be

included in larger Python applications that use the AWS SDK for Python (Boto).

Code Snippets

• DeleteLexicon

• GetLexicon

• ListLexicon

• PutLexicon

• StartSpeechSynthesisTask

• SynthesizeSpeech

DeleteLexicon

The following Python code example uses the AWS SDK for Python (Boto) to delete a lexicon in the

region speciﬁed in your local AWS conﬁguration. The example deletes only the speciﬁed lexicon. It

asks you to conﬁrm that you want to proceed before actually deleting the lexicon.

The following code example uses default credentials stored in the AWS SDK conﬁguration ﬁle. For

information about creating the conﬁguration ﬁle, see Step 2.1: Set up the AWS CLI.

For more information on this operation, see the reference for the DeleteLexicon API.

from argparse import ArgumentParser
from sys import version_info

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Define and parse the command line arguments
cli = ArgumentParser(description="DeleteLexicon example")
cli.add_argument("name", type=str, metavar="LEXICON_NAME")
arguments = cli.parse_args()

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

# Request confirmation
prompt = input if version_info >= (3, 0) else raw_input
proceed = prompt((u"This will delete the \"{0}\" lexicon,"
                  " do you want to proceed? [y,n]: ").format(arguments.name))

if proceed in ("y", "Y"):
    print(u"Deleting {0}...".format(arguments.name))

    try:
        # Request deletion of a lexicon by name
        response = polly.delete_lexicon(Name=arguments.name)
    except (BotoCoreError, ClientError) as error:
        # The service returned an error, exit gracefully
        cli.error(error)

    print("Done.")
else:
    print("Cancelled.")

GetLexicon

The following Python code uses the AWS SDK for Python (Boto) to retrieve a lexicon stored in an
AWS Region. The example accepts a lexicon name as a command line parameter, fetches only that
lexicon, and prints the path in the temporary directory where it has been saved locally.

The following code example uses default credentials stored in the AWS SDK conﬁguration ﬁle. For

information about creating the conﬁguration ﬁle, see Step 2.1: Set up the AWS CLI.

For more information on this operation, see the reference for the GetLexicon API.

from argparse import ArgumentParser
from os import path
from tempfile import gettempdir

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Define and parse the command line arguments
cli = ArgumentParser(description="GetLexicon example")
cli.add_argument("name", type=str, metavar="LEXICON_NAME")
arguments = cli.parse_args()

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

print(u"Fetching {0}...".format(arguments.name))

try:
    # Fetch lexicon by name
    response = polly.get_lexicon(Name=arguments.name)
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    cli.error(error)

# Get the lexicon data from the response
lexicon = response.get("Lexicon", {})

# Access the lexicon's content
if "Content" in lexicon:
    output = path.join(gettempdir(), u"%s.pls" % arguments.name)
    print(u"Saving to %s..." % output)

    try:
        # Save the lexicon contents to a local file
        with open(output, "w") as pls_file:
            pls_file.write(lexicon["Content"])
    except IOError as error:
        # Could not write to file, exit gracefully
        cli.error(error)
else:
    # The response didn't contain lexicon data, exit gracefully
    cli.error("Could not fetch lexicons contents")

print("Done.")


ListLexicon

The following Python code example uses the AWS SDK for Python (Boto) to list the lexicons in your

account in the region speciﬁed in your local AWS conﬁguration. For information about creating the

conﬁguration ﬁle, see Step 2.1: Set up the AWS CLI.

For more information on this operation, see the reference for the ListLexicons API.

import sys

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request the list of available lexicons
    response = polly.list_lexicons()
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Get the list of lexicons in the response
lexicons = response.get("Lexicons", [])
print("{0} lexicon(s) found".format(len(lexicons)))

# Output a formatted list of lexicons with some of the attributes
for lexicon in lexicons:
    print((u" - {Name} ({Attributes[LanguageCode]}), "
           "{Attributes[LexemesCount]} lexeme(s)").format(**lexicon))

PutLexicon

The following code sample shows how to use a Python (boto3)-based application to store a
pronunciation lexicon in an AWS Region.

For more information on this operation, see the reference for the PutLexicon API.

Note the following:

• You need to update the code by providing a local lexicon file name and a stored lexicon name.

• The example assumes you have lexicon files created in a subdirectory called pls. You need to
update the path as appropriate.

The following code example uses default credentials stored in the AWS SDK conﬁguration ﬁle. For

information about creating the conﬁguration ﬁle, see Step 2.1: Set up the AWS CLI.


from argparse import ArgumentParser

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Define and parse the command line arguments
cli = ArgumentParser(description="PutLexicon example")
cli.add_argument("path", type=str, metavar="FILE_PATH")
cli.add_argument("-n", "--name", type=str, required=True,
                 metavar="LEXICON_NAME", dest="name")
arguments = cli.parse_args()

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

# Open the PLS lexicon file for reading
try:
    with open(arguments.path, "r") as lexicon_file:
        # Read the pls file contents
        lexicon_data = lexicon_file.read()

        # Store the PLS lexicon on the service.
        # If a lexicon with that name already exists,
        # its contents will be updated
        response = polly.put_lexicon(Name=arguments.name,
                                     Content=lexicon_data)
except (IOError, BotoCoreError, ClientError) as error:
    # Could not open/read the file or the service returned an error,
    # exit gracefully
    cli.error(error)

print(u"The \"{0}\" lexicon is now available for use.".format(arguments.name))
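Because the stored lexicon name is supplied on the command line, it can be useful to check it locally before calling the service. The following is a minimal sketch that is not part of the original sample. It assumes the name constraint described for the PutLexicon API (an alphanumeric, case-sensitive string of up to 20 characters); the helper name and the pattern are illustrative assumptions, so confirm the rule against the API reference.

```python
import re

# Assumed constraint on lexicon names: 1 to 20 ASCII letters or digits.
# Confirm against the PutLexicon API reference before relying on it.
LEXICON_NAME_PATTERN = re.compile(r"[0-9A-Za-z]{1,20}")


def is_valid_lexicon_name(name):
    """Return True if name looks like an acceptable lexicon name."""
    return LEXICON_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("w3c", "my-lexicon", "x" * 21):
        print(candidate, is_valid_lexicon_name(candidate))
```

A check like this could run right after the command line arguments are parsed, calling cli.error when the name is rejected.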

StartSpeechSynthesisTask

The following Python code example uses the AWS SDK for Python (Boto) to start an asynchronous
speech synthesis task and then retrieve the status of that task. The task saves the synthesized
audio to the Amazon S3 bucket named in the request. For information about creating the
configuration file, see Step 2.1: Set up the AWS CLI.

For more information, see the reference for the StartSpeechSynthesisTask API.

import boto3
import time

polly_client = boto3.Session(
    aws_access_key_id='',
    aws_secret_access_key='',
    region_name='eu-west-2').client('polly')

response = polly_client.start_speech_synthesis_task(
    VoiceId='Joanna',
    OutputS3BucketName='synth-books-buckets',
    OutputS3KeyPrefix='key',
    OutputFormat='mp3',
    Text='This is a sample text to be synthesized.',
    Engine='neural')

taskId = response['SynthesisTask']['TaskId']

print("Task id is {} ".format(taskId))

task_status = polly_client.get_speech_synthesis_task(TaskId=taskId)

print(task_status)
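A synthesis task runs asynchronously, so a single status check made right after submission will usually report that the task is still scheduled or in progress. The following is a minimal sketch, not part of the original sample, of how the status could be polled until the task finishes. The helper name wait_for_task, the polling interval, and the timeout are assumptions chosen for illustration; the sketch relies only on the get_speech_synthesis_task call shown above and the TaskStatus field of its response.

```python
import time


def wait_for_task(client, task_id, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll a synthesis task until it completes or fails (hypothetical helper).

    client is any object exposing get_speech_synthesis_task(TaskId=...),
    such as the polly_client created in the sample above.
    """
    waited = 0.0
    while True:
        task = client.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        status = task["TaskStatus"]
        # "completed" and "failed" are the terminal states; anything else
        # (for example "scheduled" or "inProgress") means keep waiting.
        if status in ("completed", "failed"):
            return task
        if waited >= timeout:
            raise TimeoutError(
                "Task {0} still {1} after {2}s".format(task_id, status, timeout))
        sleep(interval)
        waited += interval
```

With the client from the sample, wait_for_task(polly_client, taskId) returns the final task description. When the status is completed, the OutputUri field of that description gives the location of the audio in Amazon S3.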

SynthesizeSpeech

The following Python code example uses the AWS SDK for Python (Boto) to synthesize speech
from shorter texts for near real-time processing. For more information, see the reference for the
SynthesizeSpeech operation.

This example uses a short string of plain text. You can use SSML text for more control over the

output. For more information, see Generating speech from SSML documents.


import boto3

# Supply your own credentials in place of the empty strings
polly_client = boto3.Session(
    aws_access_key_id='',
    aws_secret_access_key='',
    region_name='us-west-2').client('polly')

response = polly_client.synthesize_speech(VoiceId='Joanna',
                                          OutputFormat='mp3',
                                          Text='This is a sample text to be synthesized.',
                                          Engine='neural')

# Write the audio stream to a local file; the with statement closes the
# file even if the write fails
with open('speech.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())
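As noted in the introduction to this sample, SSML gives more control over the output than plain text. The sketch below is an illustration rather than part of the original sample: it builds the keyword arguments for synthesize_speech with TextType set to ssml, wrapping escaped plain text in a speak element followed by a break pause. The helper name build_ssml_request and the pause length are assumptions.

```python
from xml.sax.saxutils import escape


def build_ssml_request(text, voice_id="Joanna", pause_ms=500):
    """Return synthesize_speech keyword arguments for an SSML request.

    Hypothetical helper: escapes the text so characters such as & and <
    do not break the SSML document, then appends a short pause.
    """
    ssml = '<speak>{0}<break time="{1}ms"/></speak>'.format(escape(text), pause_ms)
    return {
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "TextType": "ssml",
        "Text": ssml,
    }


if __name__ == "__main__":
    # The printed dictionary is what would be passed as
    # polly_client.synthesize_speech(**request)
    print(build_ssml_request("Tom & Jerry"))
```

Escaping the input matters because an unescaped ampersand or angle bracket makes the SSML document invalid.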

Example applications

This section contains additional examples, in the form of example applications which can be used

to explore Amazon Polly.

Example Applications by Programming Language

• Python example (HTML5 Client and Python Server)

• Java example

• iOS example

• Android example

Python example (HTML5 Client and Python Server)

This example application consists of the following:

• An HTTP 1.1 server using the HTTP chunked transfer coding (see Chunked Transfer Coding)

• A simple HTML5 user interface that interacts with the HTTP 1.1 server (shown below):




The goal of this example is to show how to use Amazon Polly to stream speech from a browser-

based HTML5 application. Consuming the audio stream produced by Amazon Polly as the text gets

synthesized is the recommended approach for use cases where responsiveness is an important

factor (for example, dialog systems, screen readers, etc.).
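To make the chunked transfer coding concrete, here is a small sketch of how a response body is framed: each chunk consists of its length in hexadecimal, a CRLF, the data, and another CRLF, and a zero-length chunk ends the body. The function names are assumptions made for this illustration; the server code later in this section performs the same framing inline.

```python
from io import BytesIO


def encode_chunk(data):
    """Frame one piece of data as an HTTP/1.1 chunk."""
    return b"%X\r\n%s\r\n" % (len(data), data)


def iter_chunks(stream, chunk_size=1024):
    """Yield framed chunks from a binary stream, ending with the terminator."""
    while True:
        data = stream.read(chunk_size)
        # An empty read produces the zero-length chunk that ends the body
        yield encode_chunk(data)
        if not data:
            break


if __name__ == "__main__":
    for chunk in iter_chunks(BytesIO(b"hello world"), chunk_size=5):
        print(chunk)
```

Because every chunk carries its own length, the server can start sending audio before it knows the total size of the response, which is what lets playback begin early.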

To run this example application you need the following:

• Web browser compliant with the HTML5 and ECMAScript 5 standards (for example, Chrome 23.0
or higher, Firefox 21.0 or higher, or Internet Explorer 9.0 or higher)

• Python version greater than 3.0

To test the application

1. Save the server code as server.py. For the code, see Python example: Python Server Code
(server.py).

2. Save the HTML5 client code as index.html. For the code, see Python example: HTML5 User
Interface (index.html).

3. Run the following command from the path where you saved server.py to start the application

(on some systems you might need to use python3 instead of python when running the

command).

$ python server.py

After the application starts, a URL appears on the terminal.

4. Open the URL shown in the terminal in a web browser.

You can pass the address and port for the application server to use as a parameter to

server.py. For more information, run python server.py -h.

5. To listen to speech, choose a voice from the list, type some text, and then choose Read. The

speech starts playing as soon as Amazon Polly transfers the ﬁrst usable chunk of audio data.

6. To stop the Python server when you're ﬁnished testing the application, press Ctrl+C in the

terminal where the server is running.
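Once the server is running, the speech endpoint can also be exercised without the browser interface. The sketch below builds the same /read URL that index.html assigns to the audio player, using the voiceId, text, and outputFormat query parameters that server.py reads. The helper name is an assumption, and the default host and port match the defaults in server.py.

```python
from urllib.parse import urlencode


def build_read_url(voice_id, text, output_format="mp3",
                   host="localhost", port=8000):
    """Build the URL of the example server's /read route (hypothetical helper)."""
    query = urlencode({
        "voiceId": voice_id,
        "text": text,
        "outputFormat": output_format,
    })
    return "http://{0}:{1}/read?{2}".format(host, port, query)


if __name__ == "__main__":
    print(build_read_url("Joanna", "Hello world"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client while the server is running, returns the synthesized audio stream.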

Note

The server creates a Boto3 client using the AWS SDK for Python (Boto). The client uses the

credentials stored in the AWS conﬁg ﬁle on your computer to sign and authenticate the

requests to Amazon Polly. For more information on how to create the AWS conﬁg ﬁle and

store credentials, see Conﬁguring the AWS Command Line Interface in the AWS Command

Line Interface User Guide.

Python example: HTML5 User Interface (index.html)

This section provides the code for the HTML5 client described in Python example (HTML5 Client

and Python Server).

<html>

<head>

<title>Text-to-Speech Example Application</title>

<script>
/*
* This sample code requires a web browser with support for both the
* HTML5 and ECMAScript 5 standards; the following is a non-comprehensive
* list of compliant browsers and their minimum version:
* - Chrome 23.0+
* - Firefox 21.0+
* - Internet Explorer 9.0+
* - Edge 12.0+
* - Opera 15.0+
* - Safari 6.1+
* - Android (stock web browser) 4.4+
* - Chrome for Android 51.0+
* - Firefox for Android 48.0+
* - Opera Mobile 37.0+
* - iOS (Safari Mobile and Chrome) 3.2+
* - Internet Explorer Mobile 10.0+
* - Blackberry Browser 10.0+
*/

// Mapping of the OutputFormat parameter of the SynthesizeSpeech API

// and the audio format strings understood by the browser

var AUDIO_FORMATS = {

'ogg_vorbis': 'audio/ogg',

'mp3': 'audio/mpeg',

'pcm': 'audio/wave; codecs=1'

};

/**
* Handles fetching JSON over HTTP
*/

function fetchJSON(method, url, onSuccess, onError) {

var request = new XMLHttpRequest();

request.open(method, url, true);

request.onload = function () {

// If loading is complete

if (request.readyState === 4) {

// if the request was successful

if (request.status === 200) {

var data;

// Parse the JSON in the response

try {
data = JSON.parse(request.responseText);
} catch (error) {
onError(request.status, error.toString());
// Do not report success after a parse failure
return;
}
onSuccess(data);
} else {
onError(request.status, request.responseText);
}
}
};

request.send();

}

/**
* Returns a list of audio formats supported by the browser
*/

function getSupportedAudioFormats(player) {

return Object.keys(AUDIO_FORMATS)

.filter(function (format) {

var supported = player.canPlayType(AUDIO_FORMATS[format]);

return supported === 'probably' || supported === 'maybe';

});

}

// Initialize the application when the DOM is loaded and ready to be

// manipulated

document.addEventListener("DOMContentLoaded", function () {

var input = document.getElementById('input'),

voiceMenu = document.getElementById('voice'),

text = document.getElementById('text'),

player = document.getElementById('player'),

submit = document.getElementById('submit'),

supportedFormats = getSupportedAudioFormats(player);

// Display a message and don't allow submitting the form if the

// browser doesn't support any of the available audio formats

if (supportedFormats.length === 0) {

submit.disabled = true;

alert('The web browser in use does not support any of the' +

' available audio formats. Please try with a different' +

' one.');

}

// Play the audio stream when the form is submitted successfully

input.addEventListener('submit', function (event) {

// Validate the fields in the form, display a message if


// unexpected values are encountered

if (voiceMenu.selectedIndex <= 0 || text.value.length === 0) {

alert('Please fill in all the fields.');

} else {

var selectedVoice = voiceMenu

.options[voiceMenu.selectedIndex]

.value;

// Point the player to the streaming server

player.src = '/read?voiceId=' +

encodeURIComponent(selectedVoice) +

'&text=' + encodeURIComponent(text.value) +

'&outputFormat=' + supportedFormats[0];

player.play();

}

// Stop the form from submitting,

// Submitting the form is allowed only if the browser doesn't

// support Javascript to ensure functionality in such a case

event.preventDefault();

});

// Load the list of available voices and display them in a menu

fetchJSON('GET', '/voices',

// If the request succeeds

function (voices) {

var container = document.createDocumentFragment();

// Build the list of options for the menu

voices.forEach(function (voice) {

var option = document.createElement('option');

option.value = voice['Id'];

option.innerHTML = voice['Name'] + ' (' +

voice['Gender'] + ', ' +

voice['LanguageName'] + ')';

container.appendChild(option);

});

// Add the options to the menu and enable the form field

voiceMenu.appendChild(container);

voiceMenu.disabled = false;
},
// If the request fails
function (status, response) {
// Display a message in case loading data from the server
// fails
alert(status + ' - ' + response);
});
});
</script>

<style>

#input {

min-width: 100px;

max-width: 600px;

margin: 0 auto;

padding: 50px;

}

#input div {

margin-bottom: 20px;

}

#text {

width: 100%;

height: 200px;

display: block;

}

#submit {

width: 100%;

}
</style>
</head>
<body>
    <form id="input" method="GET" action="/read">
        <div>
            <label for="voice">Select a voice:</label>
            <select id="voice" name="voiceId" disabled>
                <option value="">Choose a voice...</option>
            </select>
        </div>
        <div>
            <label for="text">Text to read:</label>
            <textarea id="text" maxlength="1000" minlength="1" name="text"
                    placeholder="Type some text here..."></textarea>
        </div>
        <input type="submit" value="Read" id="submit" />
    </form>
    <audio id="player"></audio>
</body>
</html>

Python example: Python Server Code (server.py)

This section provides the code for the Python server described in Python example (HTML5 Client

and Python Server).

"""

Example Python 2.7+/3.3+ Application

This application consists of an HTTP 1.1 server using the HTTP chunked transfer
coding (https://tools.ietf.org/html/rfc2616#section-3.6.1) and a minimal HTML5
user interface that interacts with it.

The goal of this example is to start streaming the speech to the client (the
HTML5 web UI) as soon as the first consumable chunk of speech is returned in
order to start playing the audio as soon as possible.
For use cases where low latency and responsiveness are strong requirements,
this is the recommended approach.

The service documentation contains examples for non-streaming use cases where
waiting for the speech synthesis to complete and fetching the whole audio stream
at once are an option.

To test the application, run 'python server.py' and then open the URL
displayed in the terminal in a web browser (see index.html for a list of
supported browsers). The address and port for the server can be passed as
parameters to server.py. For more information, run: 'python server.py -h'
"""
from argparse import ArgumentParser
from collections import namedtuple
from contextlib import closing
from io import BytesIO
from json import dumps as json_encode
import os
import sys

if sys.version_info >= (3, 0):
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from socketserver import ThreadingMixIn
    from urllib.parse import parse_qs
else:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
    from SocketServer import ThreadingMixIn
    from urlparse import parse_qs

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

ResponseStatus = namedtuple("HTTPStatus",
                            ["code", "message"])

ResponseData = namedtuple("ResponseData",
                          ["status", "content_type", "data_stream"])

# Mapping the output format used in the client to the content type for the
# response
AUDIO_FORMATS = {"ogg_vorbis": "audio/ogg",
                 "mp3": "audio/mpeg",
                 "pcm": "audio/wave; codecs=1"}
CHUNK_SIZE = 1024
HTTP_STATUS = {"OK": ResponseStatus(code=200, message="OK"),
               "BAD_REQUEST": ResponseStatus(code=400, message="Bad request"),
               "NOT_FOUND": ResponseStatus(code=404, message="Not found"),
               "INTERNAL_SERVER_ERROR": ResponseStatus(
                   code=500, message="Internal server error")}
PROTOCOL = "http"
ROUTE_INDEX = "/index.html"
ROUTE_VOICES = "/voices"
ROUTE_READ = "/read"

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

class HTTPStatusError(Exception):
    """Exception wrapping a value from http.server.HTTPStatus"""

    def __init__(self, status, description=None):
        """
        Constructs an error instance from a tuple of
        (code, message, description), see http.server.HTTPStatus
        """
        super(HTTPStatusError, self).__init__()
        self.code = status.code
        self.message = status.message
        self.explain = description


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """An HTTP Server that handles each request in a new thread"""
    daemon_threads = True


class ChunkedHTTPRequestHandler(BaseHTTPRequestHandler):
    """HTTP 1.1 Chunked encoding request handler"""
    # Use HTTP 1.1 as 1.0 doesn't support chunked encoding
    protocol_version = "HTTP/1.1"

    def query_get(self, queryData, key, default=""):
        """Helper for getting values from a pre-parsed query string"""
        return queryData.get(key, [default])[0]

    def do_GET(self):
        """Handles GET requests"""

        # Extract values from the query string
        path, _, query_string = self.path.partition('?')
        query = parse_qs(query_string)

        response = None

        print(u"[START]: Received GET for %s with query: %s" % (path, query))

        try:
            # Handle the possible request paths
            if path == ROUTE_INDEX:
                response = self.route_index(path, query)
            elif path == ROUTE_VOICES:
                response = self.route_voices(path, query)
            elif path == ROUTE_READ:
                response = self.route_read(path, query)
            else:
                response = self.route_not_found(path, query)

            self.send_headers(response.status, response.content_type)
            self.stream_data(response.data_stream)

        except HTTPStatusError as err:
            # Respond with an error and log debug
            # information
            if sys.version_info >= (3, 0):
                self.send_error(err.code, err.message, err.explain)
            else:
                self.send_error(err.code, err.message)

            self.log_error(u"%s %s %s - [%d] %s", self.client_address[0],
                           self.command, self.path, err.code, err.explain)

        print("[END]")

    def route_not_found(self, path, query):
        """Handles routing for unexpected paths"""
        raise HTTPStatusError(HTTP_STATUS["NOT_FOUND"], "Page not found")

    def route_index(self, path, query):
        """Handles routing for the application's entry point"""
        try:
            return ResponseData(status=HTTP_STATUS["OK"], content_type="text/html",
                                # Open a binary stream for reading the index
                                # HTML file
                                data_stream=open(os.path.join(sys.path[0],
                                                              path[1:]), "rb"))
        except IOError as err:
            # Couldn't open the stream
            raise HTTPStatusError(HTTP_STATUS["INTERNAL_SERVER_ERROR"],
                                  str(err))

    def route_voices(self, path, query):
        """Handles routing for listing available voices"""
        params = {}
        voices = []

        while True:
            try:
                # Request list of available voices, if a continuation token
                # was returned by the previous call then use it to continue
                # listing
                response = polly.describe_voices(**params)
            except (BotoCoreError, ClientError) as err:
                # The service returned an error
                raise HTTPStatusError(HTTP_STATUS["INTERNAL_SERVER_ERROR"],
                                      str(err))

            # Collect all the voices
            voices.extend(response.get("Voices", []))

            # If a continuation token was returned continue, stop iterating
            # otherwise
            if "NextToken" in response:
                params = {"NextToken": response["NextToken"]}
            else:
                break

        json_data = json_encode(voices)
        bytes_data = bytes(json_data, "utf-8") if sys.version_info >= (3, 0) \
            else bytes(json_data)

        return ResponseData(status=HTTP_STATUS["OK"],
                            content_type="application/json",
                            # Create a binary stream for the JSON data
                            data_stream=BytesIO(bytes_data))

    def route_read(self, path, query):
        """Handles routing for reading text (speech synthesis)"""
        # Get the parameters from the query string
        text = self.query_get(query, "text")
        voiceId = self.query_get(query, "voiceId")
        outputFormat = self.query_get(query, "outputFormat")

        # Validate the parameters, set error flag in case of unexpected
        # values
        if len(text) == 0 or len(voiceId) == 0 or \
                outputFormat not in AUDIO_FORMATS:
            raise HTTPStatusError(HTTP_STATUS["BAD_REQUEST"],
                                  "Wrong parameters")
        else:
            try:
                # Request speech synthesis
                response = polly.synthesize_speech(Text=text,
                                                   VoiceId=voiceId,
                                                   OutputFormat=outputFormat,
                                                   Engine="neural")
            except (BotoCoreError, ClientError) as err:
                # The service returned an error
                raise HTTPStatusError(HTTP_STATUS["INTERNAL_SERVER_ERROR"],
                                      str(err))

            return ResponseData(status=HTTP_STATUS["OK"],
                                content_type=AUDIO_FORMATS[outputFormat],
                                # Access the audio stream in the response
                                data_stream=response.get("AudioStream"))

    def send_headers(self, status, content_type):
        """Send out the group of headers for a successful request"""
        # Send HTTP headers
        self.send_response(status.code, status.message)
        self.send_header('Content-type', content_type)
        self.send_header('Transfer-Encoding', 'chunked')
        self.send_header('Connection', 'close')
        self.end_headers()

    def stream_data(self, stream):
        """Consumes a stream in chunks to produce the response's output"""
        print("Streaming started...")

        if stream:
            # Note: Closing the stream is important as the service throttles on
            # the number of parallel connections. Here we are using
            # contextlib.closing to ensure the close method of the stream object
            # will be called automatically at the end of the with statement's
            # scope.
            with closing(stream) as managed_stream:
                # Push out the stream's content in chunks
                while True:
                    data = managed_stream.read(CHUNK_SIZE)
                    self.wfile.write(b"%X\r\n%s\r\n" % (len(data), data))

                    # If there's no more data to read, stop streaming
                    if not data:
                        break

                # Ensure any buffered output has been transmitted and close the
                # stream
                self.wfile.flush()

            print("Streaming completed.")
        else:
            # The stream passed in is empty
            self.wfile.write(b"0\r\n\r\n")
            print("Nothing to stream.")

# Define and parse the command line arguments
cli = ArgumentParser(description='Example Python Application')
cli.add_argument(
    "-p", "--port", type=int, metavar="PORT", dest="port", default=8000)
cli.add_argument(
    "--host", type=str, metavar="HOST", dest="host", default="localhost")
arguments = cli.parse_args()

# If the module is invoked directly, initialize the application
if __name__ == '__main__':
    # Create and configure the HTTP server instance
    server = ThreadedHTTPServer((arguments.host, arguments.port),
                                ChunkedHTTPRequestHandler)
    print("Starting server, use <Ctrl-C> to stop...")
    print(u"Open {0}://{1}:{2}{3} in a web browser.".format(PROTOCOL,
                                                            arguments.host,
                                                            arguments.port,
                                                            ROUTE_INDEX))

    try:
        # Listen for requests indefinitely
        server.serve_forever()
    except KeyboardInterrupt:
        # A request to terminate has been received, stop the server
        print("\nShutting down...")
        server.socket.close()


Java example

This example shows how to use Amazon Polly to stream speech from a Java-based application. The

example uses the AWS SDK for Java to read the speciﬁed text using a voice selected from a list.

The code shown covers major tasks, but does only minimal error checking. If Amazon Polly

encounters an error, the application terminates.

To run this example application, you need the following:

• Java 8 Java Development Kit (JDK)

• AWS SDK for Java

• Apache Maven

To test the application

1. Ensure that the JAVA_HOME environment variable is set for the JDK.

For example, if you installed JDK 1.8.0_121 on Windows at C:\Program Files\Java\jdk1.8.0_121,
you would type the following at the command prompt:

set JAVA_HOME="C:\Program Files\Java\jdk1.8.0_121"

If you installed JDK 1.8.0_121 on Linux at /usr/lib/jvm/java8-openjdk-amd64, you

would type the following at the command prompt:

export JAVA_HOME=/usr/lib/jvm/java8-openjdk-amd64

2. Set the Maven environment variables to run Maven from the command line.

For example, if you installed Maven 3.3.9 on Windows at C:\Program Files\apache-maven-3.3.9,
you would type the following:

set M2_HOME="C:\Program Files\apache-maven-3.3.9"

set M2=%M2_HOME%\bin

set PATH=%M2%;%PATH%

If you installed Maven 3.3.9 on Linux at /home/ec2-user/opt/apache-maven-3.3.9, you

would type the following:


export M2_HOME=/home/ec2-user/opt/apache-maven-3.3.9

export M2=$M2_HOME/bin

export PATH=$M2:$PATH

3. Create a new directory called polly-java-demo.

4. In the polly-java-demo directory, create a new file called pom.xml, and paste the following
code into it:

<project xmlns="http://maven.apache.org/POM/4.0.0"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/

maven-4.0.0.xsd">

<groupId>com.amazonaws.polly</groupId>

<version>0.0.1-SNAPSHOT</version>

<groupId>com.amazonaws</groupId>

<artifactId>aws-java-sdk-polly</artifactId>

</dependency>

<groupId>com.googlecode.soundlibs</groupId>

<artifactId>jlayer</artifactId>

</dependency>

</dependencies>

<build>

<groupId>org.codehaus.mojo</groupId>

<artifactId>exec-maven-plugin</artifactId>

<goals>


</goals>

</execution>

</executions>

<mainClass>com.amazonaws.demos.polly.PollyDemo</mainClass>

</configuration>

</plugin>

</plugins>

</build>

</project>

5. Create a new directory called polly at src/main/java/com/amazonaws/demos.

6. In the polly directory, create a new Java source file called PollyDemo.java, and paste in the
following code:

package com.amazonaws.demos.polly;

import java.io.IOException;

import java.io.InputStream;

import com.amazonaws.ClientConfiguration;

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;

import com.amazonaws.regions.Region;

import com.amazonaws.regions.Regions;

import com.amazonaws.services.polly.AmazonPollyClient;

import com.amazonaws.services.polly.model.DescribeVoicesRequest;

import com.amazonaws.services.polly.model.DescribeVoicesResult;

import com.amazonaws.services.polly.model.OutputFormat;

import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;

import com.amazonaws.services.polly.model.SynthesizeSpeechResult;

import com.amazonaws.services.polly.model.Voice;

import javazoom.jl.player.advanced.AdvancedPlayer;

import javazoom.jl.player.advanced.PlaybackEvent;

import javazoom.jl.player.advanced.PlaybackListener;

public class PollyDemo {

private final AmazonPollyClient polly;

private final Voice voice;

private static final String SAMPLE = "Congratulations. You have successfully built "
+ "this working demo of Amazon Polly in Java. Have fun building voice enabled apps "
+ "with Amazon Polly (that's me!), and always look at the AWS website for tips and "
+ "tricks on using Amazon Polly and other great services from AWS";

public PollyDemo(Region region) {

// create an Amazon Polly client in a specific region

polly = new AmazonPollyClient(new DefaultAWSCredentialsProviderChain(),

new ClientConfiguration());

polly.setRegion(region);

// Create describe voices request.

DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();

// Synchronously ask Amazon Polly to describe available TTS voices.

DescribeVoicesResult describeVoicesResult =

polly.describeVoices(describeVoicesRequest);

voice = describeVoicesResult.getVoices().get(0);

}

public InputStream synthesize(String text, OutputFormat format) throws IOException

{

SynthesizeSpeechRequest synthReq =

new SynthesizeSpeechRequest().withText(text).withVoiceId(voice.getId())

.withOutputFormat(format).withEngine("neural");

SynthesizeSpeechResult synthRes = polly.synthesizeSpeech(synthReq);

return synthRes.getAudioStream();

}

public static void main(String args[]) throws Exception {

//create the test class

PollyDemo helloWorld = new PollyDemo(Region.getRegion(Regions.US_EAST_1));

//get the audio stream

InputStream speechStream = helloWorld.synthesize(SAMPLE, OutputFormat.Mp3);

//create an MP3 player

AdvancedPlayer player = new AdvancedPlayer(speechStream,

javazoom.jl.player.FactoryRegistry.systemRegistry().createAudioDevice());

player.setPlayBackListener(new PlaybackListener() {

@Override

public void playbackStarted(PlaybackEvent evt) {

System.out.println("Playback started");

System.out.println(SAMPLE);


}

@Override

public void playbackFinished(PlaybackEvent evt) {

System.out.println("Playback finished");

}

});

// play it!

player.play();

}

}

7. Return to the polly-java-demo directory to clean, compile, and execute the demo:

mvn clean compile exec:java


iOS example

The following example uses the iOS SDK for Amazon Polly to read the speciﬁed text using a voice

selected from a list of voices.

The code shown here covers the major tasks but does not handle errors. For the complete code, see

AWS Mobile SDK for iOS Amazon Polly demo.

Initialize

// Region of Amazon Polly.

let AwsRegion = AWSRegionType.usEast1

// Cognito pool ID. Pool needs to be unauthenticated pool with

// Amazon Polly permissions.

let CognitoIdentityPoolId = "YourCognitoIdentityPoolId"

// Initialize the Amazon Cognito credentials provider.

let credentialProvider = AWSCognitoCredentialsProvider(regionType: AwsRegion,

identityPoolId: CognitoIdentityPoolId)

// Create an audio player

var audioPlayer = AVPlayer()

Get List of Available Voices

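// Create a service configuration from the Region and credentials provider,
// and use it as the default
let configuration = AWSServiceConfiguration(region: AwsRegion,
credentialsProvider: credentialProvider)

AWSServiceManager.default().defaultServiceConfiguration = configuration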

// Get all the voices (no parameters specified in input) from Amazon Polly

// This creates an async task.

let task = AWSPolly.default().describeVoices(AWSPollyDescribeVoicesInput())

// When the request is done, asynchronously do the following block

// (we ignore all the errors, but in a real-world scenario they need

// to be handled)

task.continue(successBlock: { (awsTask: AWSTask) -> Any? in

// awsTask.result is an instance of AWSPollyDescribeVoicesOutput in

// case of the "describeVoices" method

let voices = (awsTask.result! as AWSPollyDescribeVoicesOutput).voices

return nil

})

Synthesize Speech

// First, Amazon Polly requires an input, which we need to prepare.

// Again, we ignore the errors, however this should be handled in

// real applications. Here we are using the URL Builder Request,

// since in order to make the synthesis quicker we will pass the

// presigned URL to the system audio player.

let input = AWSPollySynthesizeSpeechURLBuilderRequest()

// Text to synthesize

input.text = "Sample text"

// We expect the output in MP3 format

input.outputFormat = AWSPollyOutputFormat.mp3

// Choose the voice ID

input.voiceId = AWSPollyVoiceId.joanna

// Create a task that returns a presigned URL for the given synthesis input

let builder = AWSPollySynthesizeSpeechURLBuilder.default().getPreSignedURL(input)

// Request the URL for synthesis result

builder.continueOnSuccessWith(block: { (awsTask: AWSTask<NSURL>) -> Any? in

// The result of getPresignedURL task is NSURL.

// Again, we ignore the errors in the example.

let url = awsTask.result!

// Try playing the data using the system AVPlayer

self.audioPlayer.replaceCurrentItem(with: AVPlayerItem(url: url as URL))

self.audioPlayer.play()

return nil

})

Android example

The following example uses the Android SDK for Amazon Polly to read the speciﬁed text using a

voice selected from a list of voices.

The code shown here covers the major tasks but does not handle errors. For the complete code, see

the AWS Mobile SDK for Android Amazon Polly demo.

Initialize

// Cognito pool ID. Pool needs to be unauthenticated pool with

// Amazon Polly permissions.

String COGNITO_POOL_ID = "YourCognitoIdentityPoolId";

// Region of Amazon Polly.

Regions MY_REGION = Regions.US_EAST_1;



// Initialize the Amazon Cognito credentials provider.

CognitoCachingCredentialsProvider credentialsProvider = new

CognitoCachingCredentialsProvider(

 getApplicationContext(),

 COGNITO_POOL_ID,

 MY_REGION

);

// Create a client that supports generation of presigned URLs.

AmazonPollyPresigningClient client = new

AmazonPollyPresigningClient(credentialsProvider);

Get List of Available Voices

// Create describe voices request.

DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();

// Synchronously ask Amazon Polly to describe available TTS voices.

DescribeVoicesResult describeVoicesResult =

client.describeVoices(describeVoicesRequest);

List<Voice> voices = describeVoicesResult.getVoices();

Get URL for Audio Stream

// Create speech synthesis request.

SynthesizeSpeechPresignRequest synthesizeSpeechPresignRequest =

 new SynthesizeSpeechPresignRequest()

 // Set the text to synthesize.

 .withText("Hello world!")

 // Select voice for synthesis.

 .withVoiceId(voices.get(0).getId()) // ID of the first voice returned, for example "Joanna"

 // Set format to MP3.

 .withOutputFormat(OutputFormat.Mp3);

// Get the presigned URL for synthesized speech audio stream.

URL presignedSynthesizeSpeechUrl =

 client.getPresignedSynthesizeSpeechUrl(synthesizeSpeechPresignRequest);

Play Synthesized Speech

// Use MediaPlayer: https://developer.android.com/guide/topics/media/mediaplayer.html

// Create a media player to play the synthesized audio stream.

MediaPlayer mediaPlayer = new MediaPlayer();

mediaPlayer.setAudioStreamType(AudioManager.STREAM_MUSIC);

try {

 // Set media player's data source to previously obtained URL.

 mediaPlayer.setDataSource(presignedSynthesizeSpeechUrl.toString());

} catch (IOException e) {

 Log.e(TAG, "Unable to set data source for the media player! " + e.getMessage());

}

// Prepare the MediaPlayer asynchronously (since the data source is a network stream).

mediaPlayer.prepareAsync();

// Set the callback to start the MediaPlayer when it's prepared.

mediaPlayer.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {

 @Override

 public void onPrepared(MediaPlayer mp) {

 mp.start();

 }

});

// Set the callback to release the MediaPlayer after playback is completed.

mediaPlayer.setOnCompletionListener(new MediaPlayer.OnCompletionListener() {

 @Override

 public void onCompletion(MediaPlayer mp) {

 mp.release();

 }

});

Quotas in Amazon Polly

Amazon Polly applies quotas to customer traffic by rejecting excessive requests. The default

quota for SynthesizeSpeech requests with standard voices is 80 transactions per second

(tps), in a single Region, for a single AWS account. If your quota has not been increased and you

generate 100 SynthesizeSpeech requests per second using a standard voice, 80 requests per second

succeed and 20 requests per second are throttled by Amazon Polly. The throttled

requests return a response with HTTP status 400 and a response header indicating

ThrottlingException. Amazon Polly also throttles traffic to all operations based on the request

rate.

Speech synthesis limit examples

• Synthesize the ﬁrst 24 letters of the English alphabet one letter at a time. If the synthesis

of each letter took less than 50 milliseconds, with an operation limit of eight tps, synthesizing

24 letters would take at least three seconds. During that time, you could synthesize up to eight

letters per second. Any further requests would be throttled. As the requests last a short time,

they would be synthesized serially without overlap.

• Synthesize 16 paragraphs of text. If each paragraph was synthesized and fully received on the

client side in two seconds or less, with an operation limit of eight concurrent requests, it would

take at least four seconds to synthesize all 16 paragraphs. In the first second, you could start up

to eight requests. While those requests were in progress, any attempt to start a new synthesis

would be throttled due to the concurrency limit. You could synthesize the remaining eight

paragraphs after the first two seconds, once the first batch of requests had finished.
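
The arithmetic behind both examples can be sketched in a few lines of Python. The function names are illustrative and are not part of any AWS SDK, and the model is deliberately simplified: it ignores burst allowances and network latency, so the results are lower bounds only.

```python
import math


def min_seconds_rate_limited(requests, tps_limit):
    """Lower bound in seconds when at most tps_limit short requests
    can be started in any one second."""
    return math.ceil(requests / tps_limit)


def min_seconds_concurrency_limited(requests, concurrency_limit, seconds_per_request):
    """Lower bound in seconds when requests run in batches of at most
    concurrency_limit, and each batch takes seconds_per_request."""
    batches = math.ceil(requests / concurrency_limit)
    return batches * seconds_per_request


# 24 letters at eight tps
letters_seconds = min_seconds_rate_limited(24, 8)

# 16 paragraphs, eight at a time, two seconds each
paragraphs_seconds = min_seconds_concurrency_limited(16, 8, 2)

print(letters_seconds, paragraphs_seconds)
```

Under these assumptions the first call gives three seconds and the second gives four seconds, matching the figures in the examples.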

Keep the following limits in mind when using Amazon Polly.

Topics

• Supported regions

• Quotas and throttle rates

• Pronunciation lexicons

• SynthesizeSpeech API operations

• SpeechSynthesisTask API operations

• Speech Synthesis Markup Language (SSML)

Supported regions

For a list of AWS Regions where Amazon Polly is available, see Amazon Polly Endpoints and Quotas

in the Amazon Web Services General Reference.

• For Regions that support generative voices, see Generative voices.

• For Regions that support long-form voices, see Long-form voices.

• For Regions that support neural voices, see the section called “Feature and region compatibility”

for neural TTS.

Quotas and throttle rates

The following table deﬁnes throttle rates per Amazon Polly operation. You can use the AWS

Management Console to request quota increases for the adjustable quotas when needed.

Operation                   Limit

Lexicon

DeleteLexicon,              Any 2 transactions per second (tps) from these operations
PutLexicon,                 combined.
GetLexicon,                 Maximum allowed burst of 4 tps.
ListLexicons

Speech

DescribeVoices              80 tps with a burst limit of 100 tps

SynthesizeSpeech            Generative voice: 8 tps
                            Long-form voice: 8 tps with a burst limit of 10 tps
                            Neural voice: 8 tps with a burst limit of 10 tps
                            Standard voice: 80 tps with a burst limit of 100 tps

StartSpeechSynthesisTask    Generative voice: 1 tps
                            Long-form voice: 1 tps
                            Neural voice: 1 tps
                            Standard voice: 10 tps with a burst limit of 12 tps

GetSpeechSynthesisTask and  Maximum allowed 10 tps combined
ListSpeechSynthesisTasks

Concurrent requests

In addition to transaction rates, Amazon Polly limits the number of concurrent requests:

• Generative voice: up to 26 concurrent requests.

• Long-form voice: up to 26 concurrent requests.

• Neural voice: 8 tps with a burst limit of 10 tps, for up to 18 concurrent requests.

• Standard voice: 80 tps, for up to 80 concurrent requests.
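
One way to stay under a concurrency limit is to cap in-flight requests on the client side. The following is a minimal sketch using a semaphore from the Python standard library. The function fake_synthesize is a hypothetical stand-in for a real SynthesizeSpeech call, and the cap of 18 is the neural voice figure quoted above; neither is taken from an AWS SDK.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed cap: the neural voice concurrency figure from the text.
MAX_CONCURRENT = 18

_slots = threading.BoundedSemaphore(MAX_CONCURRENT)


def synthesize_with_cap(synthesize, text):
    """Run synthesize(text) while holding one of MAX_CONCURRENT slots."""
    with _slots:
        return synthesize(text)


def fake_synthesize(text):
    # Hypothetical placeholder; a real application would call the service here.
    return f"audio for {text!r}"


texts = [f"paragraph {i}" for i in range(100)]

# Even with 50 worker threads, at most MAX_CONCURRENT calls run at once.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda t: synthesize_with_cap(fake_synthesize, t), texts))

print(len(results))
```

A client-side cap does not replace retry handling, because other clients in the same account and Region share the same quota.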

Best practices to mitigate throttling

• Retry throttles with backoﬀ and jitter so you can spread the load over a short period of time,

and handle unexpected peaks in usage without compromising availability. AWS Code Sample

Catalog is already conﬁgured to do this by default in many programming languages. Visit feature

retry behavior to see the details.

• Use Amazon Polly metrics. Amazon Polly automatically publishes metrics to CloudWatch, which

you can use to analyze your current usage and forecast usage growth.
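
For code that does not rely on built-in SDK retries, backoff with jitter can be sketched as follows. This is an illustration only: ThrottledError is a hypothetical exception standing in for whatever throttling error your client raises, and the delay values are arbitrary defaults rather than recommended settings.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the service."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call operation(); on ThrottledError, wait a random time and retry.

    Uses "full jitter": each wait is drawn uniformly from
    [0, min(max_delay, base_delay * 2 ** attempt)].
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

The random wait spreads retries from many clients over time, so they do not all hit the service again at the same instant. The sleep parameter exists only so that the delay can be replaced in tests.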

Note

Before requesting a quota increase (where applicable), calculate your tps needs following

the guidelines on this page. Amazon Polly secures only the required computational

resources according to customer demand in order to keep your costs low.

Pronunciation lexicons

• You can store up to 100 lexicons per account.

• Lexicon names can be an alphanumeric string up to 20 characters long.

• Each lexicon can be up to 40,000 characters in size. (Note that the size of the lexicon aﬀects the

latency of the SynthesizeSpeech operation.)

• You can specify up to 100 characters for each <phoneme> or <alias> replacement in a lexicon.

For information about using lexicons, see Managing lexicons.
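
The name and size limits above can be checked locally before a lexicon is uploaded. The sketch below is a hypothetical helper, not part of any AWS SDK. It assumes that "alphanumeric" means ASCII letters and digits and that size is measured in characters of the lexicon content.

```python
import re

MAX_LEXICON_CHARACTERS = 40_000

# Assumption: ASCII letters and digits only, 1 to 20 characters.
_NAME_PATTERN = re.compile(r"[0-9A-Za-z]{1,20}")


def check_lexicon(name, content):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("name must be 1-20 alphanumeric characters")
    if len(content) > MAX_LEXICON_CHARACTERS:
        problems.append("lexicon exceeds 40,000 characters")
    return problems
```

For example, check_lexicon("my-lexicon", "...") reports a name problem because of the hyphen, while a short alphanumeric name with small content returns an empty list.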

SynthesizeSpeech API operations

When estimating the usage of SynthesizeSpeech, keep in mind that the audio produced by

Amazon Polly, especially for interactive applications, usually takes at least several seconds to be

played. This reduces the rate of requests to SynthesizeSpeech, even for a large number of

concurrent consumers. Additionally, Amazon Polly throttles SynthesizeSpeech requests by the

number of concurrent requests that it synthesizes. There is no separate setting for concurrent

requests. The concurrent requests limit always has the same value as the number of tps allowed

and scales with it.

Short story example application. You can use Amazon Polly to build an application that plays a

series of short stories. With this kind of app, the ﬁrst story would start playing, and then the next,

and so on, until a user quit the application. Each story would take around 0.5 seconds to synthesize

and 10 seconds to play. In this scenario, you could expect one call to SynthesizeSpeech for

every 10 seconds that the customer spent using the application. This would translate to one

call per second for every 10 customers who were concurrently using the application. If you had

1000 customers concurrently using the application, you could expect an average call rate to

SynthesizeSpeech of only 100 transactions per second.
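
The estimate in this example is a single division, sketched here in Python. The function name is illustrative, and the model assumes that every user triggers one call per playback interval with calls spread evenly over time.

```python
def expected_call_rate(concurrent_users, seconds_between_calls):
    """Average SynthesizeSpeech calls per second across all users."""
    return concurrent_users / seconds_between_calls


# 1000 concurrent users, one call every 10 seconds each
rate = expected_call_rate(1000, 10)
print(rate)
```

With these inputs the result is 100 calls per second. That is above the default standard voice quota of 80 tps quoted earlier, so an application of this size would need a quota increase or some client-side smoothing.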

Note the following limits related to using the SynthesizeSpeech API operation:

• The size of the input text can be up to 3000 billed characters (6000 total characters). SSML tags

are not counted as billed characters.

• You can specify up to ﬁve lexicons to apply to the input text.

• The output audio stream (synthesis) is limited to 10 minutes. After this is reached, any remaining

speech is cut oﬀ.
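
Longer plain text can be split into several requests that each stay under the input size limit. The following is a rough sketch of one way to do that. The helper split_text is hypothetical, it handles plain text only (not SSML), and it conservatively treats every character as a billed character.

```python
import re

MAX_BILLED_CHARACTERS = 3000  # SynthesizeSpeech limit quoted above


def split_text(text, limit=MAX_BILLED_CHARACTERS):
    """Split plain text into chunks of at most `limit` characters,
    breaking after sentence-ending punctuation where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio played back in order. For very long input, the asynchronous task operations described later in this section may be a better fit.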

For more information, see SynthesizeSpeech.

Note

Some limitations of the SynthesizeSpeech API operation can be bypassed using the

StartSpeechSynthesisTask API operation. For more information, see Creating long

audio ﬁles.

SpeechSynthesisTask API operations

Note the following limits related to using the StartSpeechSynthesisTask,

GetSpeechSynthesisTask, and ListSpeechSynthesisTasks API operations:

• The size of the input text can be up to 100,000 billed characters (200,000 total characters). SSML

tags are not counted as billed characters.

• You can specify up to ﬁve lexicons to apply to the input text.

Speech Synthesis Markup Language (SSML)

Note the following limits related to using SSML:

• The <audio>, <lexicon>, <lookup>, and <voice> tags are not supported.

• <break> elements can specify a maximum duration of 10 seconds each.

• The <prosody> tag doesn't support values for the rate attribute lower than -80%.

For more information, see Generating speech from SSML documents.

Security in Amazon Polly

Cloud security at AWS is the highest priority. As an AWS customer, you beneﬁt from a data center

and network architecture that is built to meet the requirements of the most security-sensitive

organizations.

Security is a shared responsibility between AWS and you. The shared responsibility model describes

this as security of the cloud and security in the cloud:

• Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS

services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-

party auditors regularly test and verify the eﬀectiveness of our security as part of the AWS

Compliance Programs. To learn about the compliance programs that apply to Amazon Polly, see

AWS Services in Scope by Compliance Program.

• Security in the cloud – Your responsibility is determined by the AWS service that you use.

You're also responsible for other factors including the sensitivity of your data, your company’s

requirements, and applicable laws and regulations.

This documentation helps you understand how to apply the shared responsibility model when

using Amazon Polly. The following topics show you how to conﬁgure Amazon Polly to meet your

security and compliance objectives. You also learn how to use other AWS services that help you to

monitor and secure your Amazon Polly resources.

Topics

• Data Protection in Amazon Polly

• Identity and Access Management in Amazon Polly

• Logging and Monitoring in Amazon Polly

• Compliance Validation for Amazon Polly

• Resilience in Amazon Polly

• Infrastructure Security in Amazon Polly

• Security Best Practices for Amazon Polly

• Using Amazon Polly with interface VPC endpoints

Data Protection in Amazon Polly

Amazon Polly conforms to the AWS shared responsibility model, which includes regulations and

guidelines for data protection. AWS is responsible for protecting the global infrastructure that runs

all the AWS services. AWS maintains control over data hosted on this infrastructure, including the

security conﬁguration controls for handling customer content and personal data. AWS customers

and APN partners, acting either as data controllers or data processors, are responsible for any

personal data that they put in the AWS Cloud.

For data protection purposes, we recommend that you protect AWS account credentials and set up

individual users with AWS Identity and Access Management (IAM), so that each user is given only

the permissions necessary to fulﬁll their job duties. We also recommend that you secure your data

in the following ways:

• Use multi-factor authentication (MFA) with each account.

• Use SSL/TLS to communicate with AWS resources.

• Set up API and user activity logging with AWS CloudTrail.

• Use AWS encryption solutions, along with all default security controls within AWS services.

We strongly recommend that you never put sensitive identifying information, such as your

customers' account numbers, into free-form ﬁelds such as a Name ﬁeld. This includes when you

work with Amazon Polly or other AWS services using the console, API, AWS CLI, or AWS SDKs.

Any data that you enter into Amazon Polly or other services might get picked up for inclusion

in diagnostic logs. When you provide a URL to an external server, don't include credentials

information in the URL to validate your request to that server.

For more information about data protection, see the AWS Shared Responsibility Model and GDPR

blog post on the AWS Security Blog.

Encryption at Rest

Output of your Amazon Polly voice synthesis can be saved on your own system. You can also call

Amazon Polly, and then encrypt the ﬁle with any encryption key of your choice and store it in

Amazon Simple Storage Service (Amazon S3) or another secure storage. The Amazon Polly

SynthesizeSpeech operation is stateless and is not associated with a customer identity. You

can't retrieve the synthesized output from Amazon Polly later.

Encryption in Transit

All text submissions are protected by Secure Sockets Layer (SSL) while in transit. Amazon Polly

does not retain the content of text submissions.

Internetwork Traﬃc Privacy

Access to Amazon Polly is via the AWS console, CLI, or SDKs. Communications utilize Transport

Layer Security (TLS) session encryption for conﬁdentiality and digital signatures for authentication

and integrity.

Identity and Access Management in Amazon Polly

AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely

control access to AWS resources. IAM administrators control who can be authenticated (signed in)

and authorized (have permissions) to use Amazon Polly resources. IAM is an AWS service that you

can use with no additional charge.

Topics

• Audience

• Authenticating with identities

• Managing access using policies

• How Amazon Polly works with IAM

• Identity-based policy examples for Amazon Polly

• Amazon Polly API Permissions: Actions, Permissions, and Resources Reference

• Troubleshooting Amazon Polly identity and access

Audience

How you use AWS Identity and Access Management (IAM) diﬀers, depending on the work that you

do in Amazon Polly.

Service user – If you use the Amazon Polly service to do your job, then your administrator provides

you with the credentials and permissions that you need. As you use more Amazon Polly features to

do your work, you might need additional permissions. Understanding how access is managed can

help you request the right permissions from your administrator. If you cannot access a feature in

Amazon Polly, see Troubleshooting Amazon Polly identity and access.

Service administrator – If you're in charge of Amazon Polly resources at your company, you

probably have full access to Amazon Polly. It's your job to determine which Amazon Polly features

and resources your service users should access. You must then submit requests to your IAM

administrator to change the permissions of your service users. Review the information on this page

to understand the basic concepts of IAM. To learn more about how your company can use IAM with

Amazon Polly, see How Amazon Polly works with IAM.

IAM administrator – If you're an IAM administrator, you might want to learn details about how you

can write policies to manage access to Amazon Polly. To view example Amazon Polly identity-based

policies that you can use in IAM, see Identity-based policy examples for Amazon Polly.

Authenticating with identities

Authentication is how you sign in to AWS using your identity credentials. You must be

authenticated (signed in to AWS) as the AWS account root user, as an IAM user, or by assuming an

IAM role.

You can sign in to AWS as a federated identity by using credentials provided through an identity

source. AWS IAM Identity Center (IAM Identity Center) users, your company's single sign-on

authentication, and your Google or Facebook credentials are examples of federated identities.

When you sign in as a federated identity, your administrator previously set up identity federation

using IAM roles. When you access AWS by using federation, you are indirectly assuming a role.

Depending on the type of user you are, you can sign in to the AWS Management Console or the

AWS access portal. For more information about signing in to AWS, see How to sign in to your AWS

account in the AWS Sign-In User Guide.

If you access AWS programmatically, AWS provides a software development kit (SDK) and a

command line interface (CLI) to cryptographically sign your requests by using your credentials. If

you don't use AWS tools, you must sign requests yourself. For more information about using the

recommended method to sign requests yourself, see Signing AWS API requests in the IAM User

Guide.

Regardless of the authentication method that you use, you might be required to provide additional

security information. For example, AWS recommends that you use multi-factor authentication

(MFA) to increase the security of your account. To learn more, see Multi-factor authentication in the

AWS IAM Identity Center User Guide and Using multi-factor authentication (MFA) in AWS in the IAM

User Guide.

AWS account root user

When you create an AWS account, you begin with one sign-in identity that has complete access to

all AWS services and resources in the account. This identity is called the AWS account root user and

is accessed by signing in with the email address and password that you used to create the account.

We strongly recommend that you don't use the root user for your everyday tasks. Safeguard your

root user credentials and use them to perform the tasks that only the root user can perform. For

the complete list of tasks that require you to sign in as the root user, see Tasks that require root

user credentials in the IAM User Guide.

Federated identity

As a best practice, require human users, including users that require administrator access, to use

federation with an identity provider to access AWS services by using temporary credentials.

A federated identity is a user from your enterprise user directory, a web identity provider, the AWS

Directory Service, the Identity Center directory, or any user that accesses AWS services by using

credentials provided through an identity source. When federated identities access AWS accounts,

they assume roles, and the roles provide temporary credentials.

For centralized access management, we recommend that you use AWS IAM Identity Center. You can

create users and groups in IAM Identity Center, or you can connect and synchronize to a set of users

and groups in your own identity source for use across all your AWS accounts and applications. For

information about IAM Identity Center, see What is IAM Identity Center? in the AWS IAM Identity

Center User Guide.

IAM users and groups

An IAM user is an identity within your AWS account that has speciﬁc permissions for a single person

or application. Where possible, we recommend relying on temporary credentials instead of creating

IAM users who have long-term credentials such as passwords and access keys. However, if you have

speciﬁc use cases that require long-term credentials with IAM users, we recommend that you rotate

access keys. For more information, see Rotate access keys regularly for use cases that require long-

term credentials in the IAM User Guide.

An IAM group is an identity that speciﬁes a collection of IAM users. You can't sign in as a group. You

can use groups to specify permissions for multiple users at a time. Groups make permissions easier

to manage for large sets of users. For example, you could have a group named IAMAdmins and give

that group permissions to administer IAM resources.

Users are diﬀerent from roles. A user is uniquely associated with one person or application, but

a role is intended to be assumable by anyone who needs it. Users have permanent long-term

credentials, but roles provide temporary credentials. To learn more, see When to create an IAM user

(instead of a role) in the IAM User Guide.

IAM roles

An IAM role is an identity within your AWS account that has speciﬁc permissions. It is similar to an

IAM user, but is not associated with a speciﬁc person. You can temporarily assume an IAM role in

the AWS Management Console by switching roles. You can assume a role by calling an AWS CLI or

AWS API operation or by using a custom URL. For more information about methods for using roles,

see Using IAM roles in the IAM User Guide.

IAM roles with temporary credentials are useful in the following situations:

• Federated user access – To assign permissions to a federated identity, you create a role

and deﬁne permissions for the role. When a federated identity authenticates, the identity

is associated with the role and is granted the permissions that are deﬁned by the role. For

information about roles for federation, see Creating a role for a third-party Identity Provider

in the IAM User Guide. If you use IAM Identity Center, you conﬁgure a permission set. To control

what your identities can access after they authenticate, IAM Identity Center correlates the

permission set to a role in IAM. For information about permissions sets, see Permission sets in

the AWS IAM Identity Center User Guide.

• Temporary IAM user permissions – An IAM user or role can assume an IAM role to temporarily

take on diﬀerent permissions for a speciﬁc task.

• Cross-account access – You can use an IAM role to allow someone (a trusted principal) in a

diﬀerent account to access resources in your account. Roles are the primary way to grant cross-

account access. However, with some AWS services, you can attach a policy directly to a resource

(instead of using a role as a proxy). To learn the diﬀerence between roles and resource-based

policies for cross-account access, see Cross account resource access in IAM in the IAM User Guide.

• Cross-service access – Some AWS services use features in other AWS services. For example, when

you make a call in a service, it's common for that service to run applications in Amazon EC2 or

store objects in Amazon S3. A service might do this using the calling principal's permissions,

using a service role, or using a service-linked role.

• Forward access sessions (FAS) – When you use an IAM user or role to perform actions in

AWS, you are considered a principal. When you use some services, you might perform an

action that then initiates another action in a diﬀerent service. FAS uses the permissions of the

principal calling an AWS service, combined with the requesting AWS service to make requests

to downstream services. FAS requests are only made when a service receives a request that

requires interactions with other AWS services or resources to complete. In this case, you must

have permissions to perform both actions. For policy details when making FAS requests, see

Forward access sessions.

• Service role – A service role is an IAM role that a service assumes to perform actions on your

behalf. An IAM administrator can create, modify, and delete a service role from within IAM. For

more information, see Creating a role to delegate permissions to an AWS service in the IAM

User Guide.

• Service-linked role – A service-linked role is a type of service role that is linked to an AWS

service. The service can assume the role to perform an action on your behalf. Service-linked

roles appear in your AWS account and are owned by the service. An IAM administrator can

view, but not edit the permissions for service-linked roles.

• Applications running on Amazon EC2 – You can use an IAM role to manage temporary

credentials for applications that are running on an EC2 instance and making AWS CLI or AWS API

requests. This is preferable to storing access keys within the EC2 instance. To assign an AWS role

to an EC2 instance and make it available to all of its applications, you create an instance proﬁle

that is attached to the instance. An instance proﬁle contains the role and enables programs that

are running on the EC2 instance to get temporary credentials. For more information, see Using

an IAM role to grant permissions to applications running on Amazon EC2 instances in the IAM

User Guide.

To learn whether to use IAM roles or IAM users, see When to create an IAM role (instead of a user)

in the IAM User Guide.

Managing access using policies

You control access in AWS by creating policies and attaching them to AWS identities or resources.

A policy is an object in AWS that, when associated with an identity or resource, deﬁnes their

permissions. AWS evaluates these policies when a principal (user, root user, or role session) makes

a request. Permissions in the policies determine whether the request is allowed or denied. Most

policies are stored in AWS as JSON documents. For more information about the structure and

contents of JSON policy documents, see Overview of JSON policies in the IAM User Guide.

Administrators can use AWS JSON policies to specify who has access to what. That is, which

principal can perform actions on what resources, and under what conditions.

By default, users and roles have no permissions. To grant users permission to perform actions on

the resources that they need, an IAM administrator can create IAM policies. The administrator can

then add the IAM policies to roles, and users can assume the roles.

IAM policies deﬁne permissions for an action regardless of the method that you use to perform the

operation. For example, suppose that you have a policy that allows the iam:GetRole action. A

user with that policy can get role information from the AWS Management Console, the AWS CLI, or

the AWS API.

Identity-based policies

Identity-based policies are JSON permissions policy documents that you can attach to an identity,

such as an IAM user, group of users, or role. These policies control what actions users and roles can

perform, on which resources, and under what conditions. To learn how to create an identity-based

policy, see Creating IAM policies in the IAM User Guide.

Identity-based policies can be further categorized as inline policies or managed policies. Inline

policies are embedded directly into a single user, group, or role. Managed policies are standalone

policies that you can attach to multiple users, groups, and roles in your AWS account. Managed

policies include AWS managed policies and customer managed policies. To learn how to choose

between a managed policy or an inline policy, see Choosing between managed policies and inline

policies in the IAM User Guide.
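
As an illustration of what an identity-based policy looks like, the following sketch builds a small JSON policy document in Python. The statement ID and the choice of actions are assumptions made for this example; review the actions and resources against your own requirements before using anything like it.

```python
import json

# Illustrative identity-based policy: allow speech synthesis and voice listing.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowSpeechSynthesis",  # hypothetical statement ID
            "Effect": "Allow",
            "Action": ["polly:SynthesizeSpeech", "polly:DescribeVoices"],
            "Resource": "*",
        }
    ],
}

# Serialize to the JSON text that would be stored as the policy document.
policy_document = json.dumps(policy, indent=2)
print(policy_document)
```

The resulting document could be saved as a customer managed policy and attached to a role, or embedded as an inline policy on a single identity.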

Resource-based policies

Resource-based policies are JSON policy documents that you attach to a resource. Examples of

resource-based policies are IAM role trust policies and Amazon S3 bucket policies. In services that

support resource-based policies, service administrators can use them to control access to a speciﬁc

resource. For the resource where the policy is attached, the policy deﬁnes what actions a speciﬁed

principal can perform on that resource and under what conditions. You must specify a principal

in a resource-based policy. Principals can include accounts, users, roles, federated users, or AWS

services.

Resource-based policies are inline policies that are located in that service. You can't use AWS

managed policies from IAM in a resource-based policy.

Access control lists (ACLs)

Access control lists (ACLs) control which principals (account members, users, or roles) have

permissions to access a resource. ACLs are similar to resource-based policies, although they do not

use the JSON policy document format.

Amazon S3, AWS WAF, and Amazon VPC are examples of services that support ACLs. To learn more

about ACLs, see Access control list (ACL) overview in the Amazon Simple Storage Service Developer

Guide.

Other policy types

AWS supports additional, less-common policy types. These policy types can set the maximum

permissions granted to you by the more common policy types.

• Permissions boundaries – A permissions boundary is an advanced feature in which you set

the maximum permissions that an identity-based policy can grant to an IAM entity (IAM user

or role). You can set a permissions boundary for an entity. The resulting permissions are the

intersection of an entity's identity-based policies and its permissions boundaries. Resource-based

policies that specify the user or role in the Principal ﬁeld are not limited by the permissions

boundary. An explicit deny in any of these policies overrides the allow. For more information

about permissions boundaries, see Permissions boundaries for IAM entities in the IAM User Guide.

• Service control policies (SCPs) – SCPs are JSON policies that specify the maximum permissions

for an organization or organizational unit (OU) in AWS Organizations. AWS Organizations is a

service for grouping and centrally managing multiple AWS accounts that your business owns. If

you enable all features in an organization, then you can apply service control policies (SCPs) to

any or all of your accounts. The SCP limits permissions for entities in member accounts, including

each AWS account root user. For more information about Organizations and SCPs, see Service

control policies in the AWS Organizations User Guide.

• Session policies – Session policies are advanced policies that you pass as a parameter when you

programmatically create a temporary session for a role or federated user. The resulting session's

permissions are the intersection of the user or role's identity-based policies and the session

policies. Permissions can also come from a resource-based policy. An explicit deny in any of these

policies overrides the allow. For more information, see Session policies in the IAM User Guide.
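The way these policy types cap one another can be pictured with plain set operations. The following Python sketch is a simplified model and not the actual IAM evaluation engine: it treats each policy as a set of allowed action names, ignores resource-based policies and conditions, and the function name `effective_permissions` is hypothetical.

```python
def effective_permissions(identity, boundary=None, scp=None, session=None,
                          denied=frozenset()):
    """Simplified model: intersect every policy layer that is present,
    then remove anything that is explicitly denied."""
    allowed = set(identity)
    for layer in (boundary, scp, session):
        if layer is not None:          # a missing layer imposes no limit
            allowed &= set(layer)
    return allowed - set(denied)       # an explicit deny overrides any allow


identity = {"polly:SynthesizeSpeech", "polly:PutLexicon", "polly:DeleteLexicon"}
boundary = {"polly:SynthesizeSpeech", "polly:PutLexicon"}
session = {"polly:SynthesizeSpeech", "polly:DeleteLexicon"}

print(sorted(effective_permissions(identity, boundary=boundary, session=session)))
```

Only actions that appear in every layer survive, which is why a permissions boundary, SCP, or session policy can narrow permissions but never add to them.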

Multiple policy types

When multiple types of policies apply to a request, the resulting permissions are more complicated

to understand. To learn how AWS determines whether to allow a request when multiple policy

types are involved, see Policy evaluation logic in the IAM User Guide.

How Amazon Polly works with IAM

Before you use IAM to manage access to Amazon Polly, learn what IAM features are available to use

with Amazon Polly.

IAM features you can use with Amazon Polly

IAM feature Amazon Polly support

Identity-based policies Yes

Resource-based policies No

Policy actions Yes

Policy resources Yes

Policy condition keys (service-speciﬁc) No

ACLs No

ABAC (tags in policies) No

Temporary credentials Yes

Forward access sessions (FAS) for Amazon Polly Yes

Service roles No

Service-linked roles No

To get a high-level view of how Amazon Polly and other AWS services work with most IAM

features, see AWS services that work with IAM in the IAM User Guide.

Identity-based policies for Amazon Polly

Supports identity-based policies: Yes

Identity-based policies are JSON permissions policy documents that you can attach to an identity,

such as an IAM user, group of users, or role. These policies control what actions users and roles can

perform, on which resources, and under what conditions. To learn how to create an identity-based

policy, see Creating IAM policies in the IAM User Guide.

With IAM identity-based policies, you can specify allowed or denied actions and resources as well

as the conditions under which actions are allowed or denied. You can't specify the principal in an

identity-based policy because it applies to the user or role to which it is attached. To learn about all

of the elements that you can use in a JSON policy, see IAM JSON policy elements reference in the

IAM User Guide.

Identity-based policy examples for Amazon Polly

To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for

Amazon Polly.

Resource-based policies within Amazon Polly

Supports resource-based policies: No

Resource-based policies are JSON policy documents that you attach to a resource. Examples of

resource-based policies are IAM role trust policies and Amazon S3 bucket policies. In services that

support resource-based policies, service administrators can use them to control access to a speciﬁc

resource. For the resource where the policy is attached, the policy deﬁnes what actions a speciﬁed

principal can perform on that resource and under what conditions. You must specify a principal

in a resource-based policy. Principals can include accounts, users, roles, federated users, or AWS

services.

To enable cross-account access, you can specify an entire account or IAM entities in another

account as the principal in a resource-based policy. Adding a cross-account principal to a resource-

based policy is only half of establishing the trust relationship. When the principal and the resource

are in diﬀerent AWS accounts, an IAM administrator in the trusted account must also grant

the principal entity (user or role) permission to access the resource. They grant permission by

attaching an identity-based policy to the entity. However, if a resource-based policy grants access

to a principal in the same account, no additional identity-based policy is required. For more

information, see Cross account resource access in IAM in the IAM User Guide.

Policy actions for Amazon Polly

Supports policy actions: Yes

Administrators can use AWS JSON policies to specify who has access to what. That is, which

principal can perform actions on what resources, and under what conditions.

The Action element of a JSON policy describes the actions that you can use to allow or deny

access in a policy. Policy actions usually have the same name as the associated AWS API operation.

There are some exceptions, such as permission-only actions that don't have a matching API

operation. There are also some operations that require multiple actions in a policy. These

additional actions are called dependent actions.

Include actions in a policy to grant permissions to perform the associated operation.

To see a list of Amazon Polly actions, see Actions deﬁned by Amazon Polly in the Service

Authorization Reference.

Policy actions in Amazon Polly use the following preﬁx before the action:

polly

To specify multiple actions in a single statement, separate them with commas.

"Action": [
    "polly:action1",
    "polly:action2"
]
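If you generate policies programmatically, the prefix rule is easy to apply in code. The following Python sketch is illustrative only: the helper name `polly_actions` is hypothetical, and the statement it builds uses a wildcard resource purely for brevity.

```python
import json


def polly_actions(*operations):
    """Prefix each API operation name with the polly service prefix."""
    return [f"polly:{name}" for name in operations]


statement = {
    "Effect": "Allow",
    "Action": polly_actions("GetLexicon", "ListLexicons"),
    "Resource": "*",
}

# Serialize the statement as it would appear inside a policy document.
print(json.dumps(statement, indent=4))
```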

To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for

Amazon Polly.

Policy resources for Amazon Polly

Supports policy resources: Yes

Administrators can use AWS JSON policies to specify who has access to what. That is, which

principal can perform actions on what resources, and under what conditions.

The Resource JSON policy element speciﬁes the object or objects to which the action applies.

Statements must include either a Resource or a NotResource element. As a best practice,

specify a resource using its Amazon Resource Name (ARN). You can do this for actions that support

a speciﬁc resource type, known as resource-level permissions.

For actions that don't support resource-level permissions, such as listing operations, use a wildcard

(*) to indicate that the statement applies to all resources.

"Resource": "*"

To see a list of Amazon Polly resource types and their ARNs, see Resources deﬁned by Amazon

Polly in the Service Authorization Reference. To learn with which actions you can specify the ARN of

each resource, see Actions deﬁned by Amazon Polly.

To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for

Amazon Polly.

Policy condition keys for Amazon Polly

Supports service-speciﬁc policy condition keys: No

Administrators can use AWS JSON policies to specify who has access to what. That is, which

principal can perform actions on what resources, and under what conditions.

The Condition element (or Condition block) lets you specify conditions in which a statement

is in eﬀect. The Condition element is optional. You can create conditional expressions that use

condition operators, such as equals or less than, to match the condition in the policy with values in

the request.

If you specify multiple Condition elements in a statement, or multiple keys in a single

Condition element, AWS evaluates them using a logical AND operation. If you specify multiple

values for a single condition key, AWS evaluates the condition using a logical OR operation. All of

the conditions must be met before the statement's permissions are granted.
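The AND/OR rule can be sketched in a few lines of Python. This is a deliberately narrow model and not how IAM is implemented: it handles only the `StringEquals` operator, the function name `condition_matches` is hypothetical, and the request context is a plain dictionary invented for the example.

```python
def condition_matches(condition_block, request_context):
    """Evaluate a StringEquals-only Condition block.

    Different keys are combined with AND; multiple values listed for a
    single key are combined with OR.
    """
    for operator, keys in condition_block.items():
        if operator != "StringEquals":
            raise ValueError(f"operator not modelled in this sketch: {operator}")
        for key, expected in keys.items():
            values = expected if isinstance(expected, list) else [expected]
            if request_context.get(key) not in values:  # OR across values
                return False                            # AND across keys
    return True


condition = {
    "StringEquals": {
        "aws:RequestedRegion": ["us-east-2", "us-west-2"],
        "aws:PrincipalTag/team": "speech",
    }
}

print(condition_matches(
    condition,
    {"aws:RequestedRegion": "us-west-2", "aws:PrincipalTag/team": "speech"},
))
```

A request from either listed Region passes the first key, but the statement applies only if the second key matches as well.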

You can also use placeholder variables when you specify conditions. For example, you can grant

an IAM user permission to access a resource only if it is tagged with their IAM user name. For more

information, see IAM policy elements: variables and tags in the IAM User Guide.

AWS supports global condition keys and service-speciﬁc condition keys. To see all AWS global

condition keys, see AWS global condition context keys in the IAM User Guide.

To see a list of Amazon Polly condition keys, see Condition keys for Amazon Polly in the Service

Authorization Reference. To learn with which actions and resources you can use a condition key, see

Actions deﬁned by Amazon Polly.

To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for

Amazon Polly.

ACLs in Amazon Polly

Supports ACLs: No

Access control lists (ACLs) control which principals (account members, users, or roles) have

permissions to access a resource. ACLs are similar to resource-based policies, although they do not

use the JSON policy document format.

ABAC with Amazon Polly

Supports ABAC (tags in policies): No

Attribute-based access control (ABAC) is an authorization strategy that deﬁnes permissions based

on attributes. In AWS, these attributes are called tags. You can attach tags to IAM entities (users or

roles) and to many AWS resources. Tagging entities and resources is the ﬁrst step of ABAC. Then

you design ABAC policies to allow operations when the principal's tag matches the tag on the

resource that they are trying to access.

ABAC is helpful in environments that are growing rapidly and helps with situations where policy

management becomes cumbersome.

To control access based on tags, you provide tag information in the condition element of a policy

using the aws:ResourceTag/key-name, aws:RequestTag/key-name, or aws:TagKeys

condition keys.

If a service supports all three condition keys for every resource type, then the value is Yes for the

service. If a service supports all three condition keys for only some resource types, then the value is

Partial.

For more information about ABAC, see What is ABAC? in the IAM User Guide. To view a tutorial with

steps for setting up ABAC, see Use attribute-based access control (ABAC) in the IAM User Guide.

Using temporary credentials with Amazon Polly

Supports temporary credentials: Yes

Some AWS services don't work when you sign in using temporary credentials. For additional

information, including which AWS services work with temporary credentials, see AWS services that

work with IAM in the IAM User Guide.

You are using temporary credentials if you sign in to the AWS Management Console using

any method except a user name and password. For example, when you access AWS using your

company's single sign-on (SSO) link, that process automatically creates temporary credentials. You

also automatically create temporary credentials when you sign in to the console as a user and then

switch roles. For more information about switching roles, see Switching to a role (console) in the

IAM User Guide.

You can manually create temporary credentials using the AWS CLI or AWS API. You can then use

those temporary credentials to access AWS. AWS recommends that you dynamically generate

temporary credentials instead of using long-term access keys. For more information, see

Temporary security credentials in IAM.

Cross-service forward access sessions (FAS) for Amazon Polly

Supports forward access sessions (FAS): Yes

When you use an IAM user or role to perform actions in AWS, you are considered a principal.

When you use some services, you might perform an action that then initiates another action in a

diﬀerent service. FAS uses the permissions of the principal calling an AWS service, combined with

the requesting AWS service to make requests to downstream services. FAS requests are only made

when a service receives a request that requires interactions with other AWS services or resources to

complete. In this case, you must have permissions to perform both actions. For policy details when

making FAS requests, see Forward access sessions.

Service roles for Amazon Polly

Supports service roles: No

A service role is an IAM role that a service assumes to perform actions on your behalf. An IAM

administrator can create, modify, and delete a service role from within IAM. For more information,

see Creating a role to delegate permissions to an AWS service in the IAM User Guide.

Warning

Changing the permissions for a service role might break Amazon Polly functionality. Edit

service roles only when Amazon Polly provides guidance to do so.

Service-linked roles for Amazon Polly

Supports service-linked roles: No

A service-linked role is a type of service role that is linked to an AWS service. The service can

assume the role to perform an action on your behalf. Service-linked roles appear in your AWS

account and are owned by the service. An IAM administrator can view, but not edit the permissions

for service-linked roles.

For details about creating or managing service-linked roles, see AWS services that work with IAM.

Find a service in the table that includes a Yes in the Service-linked role column. Choose the Yes

link to view the service-linked role documentation for that service.

Amazon Polly IAM roles

You can attach an identity-based permissions policy to an IAM role to grant cross-account

permissions. For example, the administrator in account A can create a role to grant cross-account

permissions to another AWS account (for example, account B) or an AWS service as follows:

1. Account A administrator creates an IAM role and attaches a permissions policy to the role that

grants permissions on resources in account A.

2. Account A administrator attaches a trust policy to the role identifying account B as the principal

who can assume the role.

3. Account B administrator can then delegate permissions to assume the role to any users in

account B. Doing this allows users in account B to create or access resources in account A. The

principal in the trust policy can also be an AWS service principal if you want to grant an AWS

service permissions to assume the role.
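The two policy documents involved in steps 2 and 3 can be sketched as follows. This Python example only builds the JSON. The account IDs are fictitious, and the role name `PollyLexiconAccess` and both function names are hypothetical.

```python
import json


def trust_policy_for_account(trusted_account_id):
    """Step 2: trust policy that account A attaches to the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
        }],
    }


def assume_role_policy(role_arn):
    """Step 3: identity-based policy that account B attaches to its users."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": role_arn,
        }],
    }


account_a, account_b = "111122223333", "444455556666"   # fictitious IDs
role_arn = f"arn:aws:iam::{account_a}:role/PollyLexiconAccess"

print(json.dumps(trust_policy_for_account(account_b), indent=4))
print(json.dumps(assume_role_policy(role_arn), indent=4))
```

The trust policy names who may assume the role, while the policy in account B decides which of its users are actually allowed to do so.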

For more information about using IAM to delegate permissions, see Access Management in the IAM

User Guide.

The following is an example policy that grants permissions to put and get lexicons as well as to list

those lexicons currently available.

Amazon Polly supports identity-based policies for actions at the resource level. In some
cases, the resource can be limited by an ARN. This is true for the SynthesizeSpeech,
StartSpeechSynthesisTask, PutLexicon, GetLexicon, and DeleteLexicon operations.
In these cases, the Resource value is indicated by the ARN. For example, specifying
arn:aws:polly:us-east-2:account-id:lexicon/* as the Resource value grants permissions on all
owned lexicons within the us-east-2 Region.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPutGetListActions",
            "Effect": "Allow",
            "Action": [
                "polly:PutLexicon",
                "polly:GetLexicon",
                "polly:ListLexicons"
            ],
            "Resource": "arn:aws:polly:us-east-2:account-id:lexicon/*"
        }
    ]
}

However, not all operations use ARNs. This is the case with the DescribeVoices, ListLexicons,
GetSpeechSynthesisTask, and ListSpeechSynthesisTasks operations.
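One way to respect this split when generating a policy is to place ARN-capable operations and the remaining operations in separate statements. The following Python sketch assumes the operation list given above. The helper names `lexicon_arn` and `build_statements` are hypothetical, and the account ID is fictitious.

```python
import json

# Operations that the text above lists as accepting a lexicon ARN.
ARN_SCOPED = {
    "SynthesizeSpeech", "StartSpeechSynthesisTask",
    "PutLexicon", "GetLexicon", "DeleteLexicon",
}


def lexicon_arn(region, account_id, name="*"):
    """Build a lexicon ARN; the default name matches every lexicon."""
    return f"arn:aws:polly:{region}:{account_id}:lexicon/{name}"


def build_statements(operations, region, account_id):
    """Scope ARN-capable operations to lexicons; use "*" for the rest."""
    scoped = sorted(op for op in operations if op in ARN_SCOPED)
    unscoped = sorted(op for op in operations if op not in ARN_SCOPED)
    statements = []
    if scoped:
        statements.append({
            "Effect": "Allow",
            "Action": [f"polly:{op}" for op in scoped],
            "Resource": lexicon_arn(region, account_id),
        })
    if unscoped:
        statements.append({
            "Effect": "Allow",
            "Action": [f"polly:{op}" for op in unscoped],
            "Resource": "*",
        })
    return statements


policy = {
    "Version": "2012-10-17",
    "Statement": build_statements(
        ["PutLexicon", "GetLexicon", "ListLexicons"], "us-east-2", "123456789012"
    ),
}
print(json.dumps(policy, indent=4))
```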

For more information about users, groups, roles, and permissions, see Identities (Users, Groups, and

Roles) in the IAM User Guide.

Identity-based policy examples for Amazon Polly

By default, users and roles don't have permission to create or modify Amazon Polly resources. They

also can't perform tasks by using the AWS Management Console, AWS Command Line Interface

(AWS CLI), or AWS API. To grant users permission to perform actions on the resources that they

need, an IAM administrator can create IAM policies. The administrator can then add the IAM

policies to roles, and users can assume the roles.

To learn how to create an IAM identity-based policy by using these example JSON policy

documents, see Creating IAM policies in the IAM User Guide.

For details about actions and resource types deﬁned by Amazon Polly, including the format of the

ARNs for each of the resource types, see Actions, resources, and condition keys for Amazon Polly in

the Service Authorization Reference.

Topics

• Policy best practices

• Using the Amazon Polly console

• Allow users to view their own permissions

• AWS managed (predeﬁned) policies for Amazon Polly

• Customer-managed policy examples

Policy best practices

Identity-based policies determine whether someone can create, access, or delete Amazon Polly

resources in your account. These actions can incur costs for your AWS account. When you create or

edit identity-based policies, follow these guidelines and recommendations:

• Get started with AWS managed policies and move toward least-privilege permissions – To

get started granting permissions to your users and workloads, use the AWS managed policies

that grant permissions for many common use cases. They are available in your AWS account. We

recommend that you reduce permissions further by deﬁning AWS customer managed policies

that are speciﬁc to your use cases. For more information, see AWS managed policies or AWS

managed policies for job functions in the IAM User Guide.

• Apply least-privilege permissions – When you set permissions with IAM policies, grant only the

permissions required to perform a task. You do this by deﬁning the actions that can be taken on

speciﬁc resources under speciﬁc conditions, also known as least-privilege permissions. For more

information about using IAM to apply permissions, see Policies and permissions in IAM in the

IAM User Guide.

• Use conditions in IAM policies to further restrict access – You can add a condition to your

policies to limit access to actions and resources. For example, you can write a policy condition to

specify that all requests must be sent using SSL. You can also use conditions to grant access to

service actions if they are used through a speciﬁc AWS service, such as AWS CloudFormation. For

more information, see IAM JSON policy elements: Condition in the IAM User Guide.

• Use IAM Access Analyzer to validate your IAM policies to ensure secure and functional

permissions – IAM Access Analyzer validates new and existing policies so that the policies

adhere to the IAM policy language (JSON) and IAM best practices. IAM Access Analyzer provides

more than 100 policy checks and actionable recommendations to help you author secure and

functional policies. For more information, see IAM Access Analyzer policy validation in the IAM

User Guide.

• Require multi-factor authentication (MFA) – If you have a scenario that requires IAM users

or a root user in your AWS account, turn on MFA for additional security. To require MFA when

API operations are called, add MFA conditions to your policies. For more information, see

Conﬁguring MFA-protected API access in the IAM User Guide.

For more information about best practices in IAM, see Security best practices in IAM in the IAM User

Guide.

Using the Amazon Polly console

To access the Amazon Polly console, you must have a minimum set of permissions. These

permissions must allow you to list and view details about the Amazon Polly resources in your AWS

account. If you create an identity-based policy that is more restrictive than the minimum required

permissions, the console won't function as intended for entities (users or roles) with that policy.

You don't need to allow minimum console permissions for users that are making calls only to the

AWS CLI or the AWS API. Instead, allow access to only the actions that match the API operation

that they're trying to perform.

To ensure that users and roles can still use the Amazon Polly console, also attach the

AmazonPollyFullAccess or AmazonPollyReadOnlyAccess AWS managed policy to the entities. For more information, see

Adding permissions to a user in the IAM User Guide.

To use the Amazon Polly console, grant permissions to all the Amazon Polly APIs. No
additional permissions are needed. For full console functionality, you can use the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ConsoleAllowAllPollyActions",
            "Effect": "Allow",
            "Action": [
                "polly:*"
            ],
            "Resource": "*"
        }
    ]
}

Allow users to view their own permissions

This example shows how you might create a policy that allows IAM users to view the inline and

managed policies that are attached to their user identity. This policy includes permissions to

complete this action on the console or programmatically using the AWS CLI or AWS API.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ViewOwnUserInfo",
            "Effect": "Allow",
            "Action": [
                "iam:GetUserPolicy",
                "iam:ListGroupsForUser",
                "iam:ListAttachedUserPolicies",
                "iam:ListUserPolicies",
                "iam:GetUser"
            ],
            "Resource": ["arn:aws:iam::*:user/${aws:username}"]
        },
        {
            "Sid": "NavigateInConsole",
            "Effect": "Allow",
            "Action": [
                "iam:GetGroupPolicy",
                "iam:GetPolicyVersion",
                "iam:GetPolicy",
                "iam:ListAttachedGroupPolicies",
                "iam:ListGroupPolicies",
                "iam:ListPolicyVersions",
                "iam:ListPolicies",
                "iam:ListUsers"
            ],
            "Resource": "*"
        }
    ]
}

AWS managed (predeﬁned) policies for Amazon Polly

AWS addresses many common use cases by providing standalone IAM policies that are created

and administered by AWS. These AWS managed policies grant necessary permissions for common

use cases so that you can avoid having to investigate what permissions are needed. For more

information, see AWS Managed Policies in the IAM User Guide.

The following AWS managed policies, which you can attach to users in your account, are speciﬁc to

Amazon Polly:

• AmazonPollyReadOnlyAccess – Grants read-only access to resources. Allows listing lexicons,
fetching lexicons, listing available voices, and synthesizing speech (including applying lexicons to
the synthesized speech).

• AmazonPollyFullAccess – Grants full access to resources and all the supported operations.

Note

You can review these permissions policies by signing in to the IAM console and searching

for speciﬁc policies there.

You can also create your own custom IAM policies to allow permissions for Amazon Polly actions

and resources. You can attach these custom policies to the IAM users or groups that require those

permissions.

Customer-managed policy examples

In this section, you can ﬁnd example user policies that grant permissions for various Amazon Polly

actions. These policies work when you're using AWS SDKs or the AWS CLI. When you're using the

console, grant permissions to all the Amazon Polly APIs.

Note

All examples use the us-east-2 Region and contain ﬁctitious account IDs.

Examples

• Example 1: Allow All Amazon Polly Actions

• Example 2: Allow all Amazon Polly actions except DeleteLexicon

• Example 3: Allow DeleteLexicon

• Example 4: Allow Delete Lexicon in a speciﬁed Region

• Example 5: Allow DeleteLexicon for speciﬁed Lexicon

Example 1: Allow All Amazon Polly Actions

After you sign up (see Setting up Amazon Polly), create an administrator user to manage your

account, including creating users and managing their permissions.

You might create a user who has permissions for all Amazon Polly actions. Think of this user as

a service-speciﬁc administrator for working with Amazon Polly. You can attach the following

permissions policy to this user.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAllPollyActions",
            "Effect": "Allow",
            "Action": [
                "polly:*"
            ],
            "Resource": "*"
        }
    ]
}

Example 2: Allow all Amazon Polly actions except DeleteLexicon

The following permissions policy grants the user permissions to perform all actions except

DeleteLexicon, with the permissions for delete explicitly denied in all Regions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAllActionsDenyDelete",
            "Effect": "Allow",
            "Action": [
                "polly:DescribeVoices",
                "polly:GetLexicon",
                "polly:PutLexicon",
                "polly:SynthesizeSpeech",
                "polly:ListLexicons"
            ],
            "Resource": "*"
        },
        {
            "Sid": "DenyDeleteLexicon",
            "Effect": "Deny",
            "Action": [
                "polly:DeleteLexicon"
            ],
            "Resource": "*"
        }
    ]
}
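The effect of combining Allow and Deny statements can be checked with a small evaluator. The following Python sketch is a simplified model under stated assumptions: it looks only at the Action element (not Resource or Condition), it uses `fnmatch` wildcards as a stand-in for IAM wildcard matching, and the function name `is_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(policy, action):
    """Return True only if an Allow statement matches the action and
    no Deny statement matches it."""
    allowed = False
    for statement in policy["Statement"]:
        patterns = statement["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        # Compare in lower case so the match is case-insensitive.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            if statement["Effect"] == "Deny":
                return False           # an explicit deny always wins
            allowed = True
    return allowed                     # no matching Allow means implicit deny


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["polly:*"], "Resource": "*"},
        {"Effect": "Deny", "Action": ["polly:DeleteLexicon"], "Resource": "*"},
    ],
}

for name in ("polly:SynthesizeSpeech", "polly:DeleteLexicon"):
    print(name, is_allowed(policy, name))
```

Even though the first statement allows every Amazon Polly action, the Deny statement still blocks DeleteLexicon.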

Example 3: Allow DeleteLexicon

The following permissions policy grants the user permissions to delete any lexicon that you own

regardless of the project or Region in which it is located.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDeleteLexicon",
            "Effect": "Allow",
            "Action": [
                "polly:DeleteLexicon"
            ],
            "Resource": "*"
        }
    ]
}

Example 4: Allow Delete Lexicon in a speciﬁed Region

The following permissions policy grants the user permissions to delete any lexicon in any project

that you own that is located in a single Region (in this case, us-east-2).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDeleteSpecifiedRegion",
            "Effect": "Allow",
            "Action": [
                "polly:DeleteLexicon"
            ],
            "Resource": "arn:aws:polly:us-east-2:123456789012:lexicon/*"
        }
    ]
}

Example 5: Allow DeleteLexicon for speciﬁed Lexicon

The following permissions policy grants the user permissions to delete a speciﬁc lexicon that you

own (in this case, myLexicon) in a speciﬁc Region (in this case, us-east-2).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDeleteForSpecifiedLexicon",
            "Effect": "Allow",
            "Action": [
                "polly:DeleteLexicon"
            ],
            "Resource": "arn:aws:polly:us-east-2:123456789012:lexicon/myLexicon"
        }
    ]
}
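Examples 3 through 5 differ only in how narrowly the Resource value is written. The following Python sketch compares the three patterns against sample lexicon ARNs. It uses `fnmatch` wildcards as an approximation of IAM resource matching, and the function name `resource_matches` and the second lexicon ARN are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def resource_matches(pattern, arn):
    """Approximate check of whether a Resource pattern covers an ARN."""
    return fnmatchcase(arn, pattern)


any_resource = "*"                                                       # Example 3
region_lexicons = "arn:aws:polly:us-east-2:123456789012:lexicon/*"       # Example 4
one_lexicon = "arn:aws:polly:us-east-2:123456789012:lexicon/myLexicon"   # Example 5

same_region = "arn:aws:polly:us-east-2:123456789012:lexicon/myLexicon"
other_region = "arn:aws:polly:eu-west-1:123456789012:lexicon/myLexicon"

for pattern in (any_resource, region_lexicons, one_lexicon):
    print(pattern,
          resource_matches(pattern, same_region),
          resource_matches(pattern, other_region))
```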

Amazon Polly API Permissions: Actions, Permissions, and Resources Reference

When you're setting up a permissions policy that you can attach to an IAM identity (identity-based

policies), you can use the following list as a reference. The list includes each Amazon Polly API

operation, the corresponding actions for which you can grant permissions to perform the action,

and the AWS resource for which you can grant the permissions. You specify the actions in the

policy's Action ﬁeld, and you specify the resource value in the policy's Resource ﬁeld.

You can use AWS-wide condition keys in your Amazon Polly policies to express conditions. For a

complete list of AWS-wide keys, see available keys in the IAM User Guide.

Note

To specify an action, use the polly preﬁx followed by the API operation name (for

example, polly:GetLexicon).

Amazon Polly supports identity-based policies for actions at the resource level, so for
operations that act on a lexicon the Resource value can be an ARN. For example,
arn:aws:polly:us-east-2:account-id:lexicon/* as the Resource value specifies permissions on
all owned lexicons within the us-east-2 Region.

Because some Amazon Polly actions don't support resource-level permissions, policies that
include those actions specify a wildcard character (*) as the Resource value. If you need to limit
lexicon permissions to a specific Region, replace the wildcard character with the appropriate ARN:
arn:aws:polly:region:account-id:lexicon/*.

Amazon Polly API and Required Permissions for Actions

API Operation: DeleteLexicon

Required Permissions (API Action): polly:DeleteLexicon

Resources: arn:aws:polly:region:account-id:lexicon/LexiconName

API Operation: DescribeVoices

Required Permissions (API Action): polly:DescribeVoices

Resources: *

API Operation: GetLexicon

Required Permissions (API Action): polly:GetLexicon

Resources: arn:aws:polly:region:account-id:lexicon/LexiconName

API Operation: ListLexicons

Required Permissions (API Action): polly:ListLexicons

Resources: *

API Operation: PutLexicon

Required Permissions (API Action): polly:PutLexicon

Resources: arn:aws:polly:region:account-id:lexicon/LexiconName

API Operation: SynthesizeSpeech

Required Permissions (API Action): polly:SynthesizeSpeech

Resources: *

Troubleshooting Amazon Polly identity and access

Use the following information to help you diagnose and ﬁx common issues that you might

encounter when working with Amazon Polly and IAM.

Topics

• I am not authorized to perform an action in Amazon Polly

• I am not authorized to perform iam:PassRole

• I want to allow people outside of my AWS account to access my Amazon Polly resources

I am not authorized to perform an action in Amazon Polly

If you receive an error that you're not authorized to perform an action, your policies must be

updated to allow you to perform the action.

The following example error occurs when the mateojackson IAM user tries to use the console

to view details about a ﬁctional my-example-widget resource but doesn't have the ﬁctional

polly:GetWidget permissions.

User: arn:aws:iam::123456789012:user/mateojackson is not authorized to perform:

polly:GetWidget on resource: my-example-widget

In this case, the policy for the mateojackson user must be updated to allow access to the my-

example-widget resource by using the polly:GetWidget action.
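When you collect such errors from logs, it can help to extract the principal, action, and resource automatically. The following Python sketch parses messages shaped like the example above. The regular expression assumes that exact wording, real error text can vary, and the function name `parse_access_denied` is hypothetical.

```python
import re

# Pattern based only on the example message format shown above.
ACCESS_DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+)(?: on resource: (?P<resource>\S+))?"
)


def parse_access_denied(message):
    """Return the principal, action, and resource named in the message,
    or None if the message doesn't have the expected shape."""
    match = ACCESS_DENIED.search(message)
    return match.groupdict() if match else None


message = (
    "User: arn:aws:iam::123456789012:user/mateojackson is not authorized "
    "to perform: polly:GetWidget on resource: my-example-widget"
)
print(parse_access_denied(message))
```

The extracted action is the value that must appear in the Action element of the updated policy.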

If you need help, contact your AWS administrator. Your administrator is the person who provided

you with your sign-in credentials.

I am not authorized to perform iam:PassRole

If you receive an error that you're not authorized to perform the iam:PassRole action, your

policies must be updated to allow you to pass a role to Amazon Polly.

Some AWS services allow you to pass an existing role to that service instead of creating a new

service role or service-linked role. To do this, you must have permissions to pass the role to the

service.

The following example error occurs when an IAM user named marymajor tries to use the console

to perform an action in Amazon Polly. However, the action requires the service to have permissions

that are granted by a service role. Mary does not have permissions to pass the role to the service.

User: arn:aws:iam::123456789012:user/marymajor is not authorized to perform:

iam:PassRole

In this case, Mary's policies must be updated to allow her to perform the iam:PassRole action.

If you need help, contact your AWS administrator. Your administrator is the person who provided

you with your sign-in credentials.

I want to allow people outside of my AWS account to access my Amazon Polly resources

You can create a role that users in other accounts or people outside of your organization can use to

access your resources. You can specify who is trusted to assume the role. For services that support

resource-based policies or access control lists (ACLs), you can use those policies to grant people

access to your resources.

To learn more, consult the following:

• To learn whether Amazon Polly supports these features, see How Amazon Polly works with IAM.

• To learn how to provide access to your resources across AWS accounts that you own, see

Providing access to an IAM user in another AWS account that you own in the IAM User Guide.

• To learn how to provide access to your resources to third-party AWS accounts, see Providing

access to AWS accounts owned by third parties in the IAM User Guide.

• To learn how to provide access through identity federation, see Providing access to externally

authenticated users (identity federation) in the IAM User Guide.

• To learn the diﬀerence between using roles and resource-based policies for cross-account access,

see Cross account resource access in IAM in the IAM User Guide.

Logging and Monitoring in Amazon Polly

Monitoring is an important part of maintaining the reliability, availability, and performance of your

Amazon Polly applications. To monitor Amazon Polly API calls, you can use AWS CloudTrail. To

monitor the status of your jobs, use Amazon CloudWatch Logs.

• Amazon CloudWatch Alarms – Using CloudWatch alarms, you watch a single metric over a

time period that you specify. If the metric exceeds a given threshold, a notiﬁcation is sent to

an Amazon Simple Notiﬁcation Service topic or AWS Auto Scaling policy. CloudWatch alarms

don't invoke actions when a metric is in a particular state. Rather, the state must have changed

and been maintained for a speciﬁed number of periods. For more information, see Integrating

CloudWatch with Amazon Polly.

• CloudTrail logs – CloudTrail provides a record of actions taken by a user, role, or an AWS

service in Amazon Polly. Using the information collected by CloudTrail, you can determine the

request that was made to Amazon Polly. You can also determine the IP address from which the

request was made, who made the request, when it was made, and additional details. For more

information, see Logging Amazon Polly API calls with AWS CloudTrail.

Compliance Validation for Amazon Polly

Third-party auditors assess the security and compliance of Amazon Polly as part of multiple AWS

compliance programs. These include SOC, PCI, FedRAMP, HIPAA, and others.

For a list of AWS services in scope of speciﬁc compliance programs, see AWS Services in Scope by

Compliance Program. For general information, see AWS Compliance Programs.

You can download third-party audit reports using AWS Artifact. For more information, see

Downloading Reports in AWS Artifact.

Your compliance responsibility when using Amazon Polly is determined by the sensitivity of your

data, your company's compliance objectives, and applicable laws and regulations. AWS provides the

following resources to help with compliance:

• Security and Compliance Quick Start Guides – These deployment guides discuss architectural

considerations and provide steps for deploying security- and compliance-focused baseline

environments on AWS.

• Architecting for HIPAA Security and Compliance Whitepaper – This whitepaper describes how

companies can use AWS to create HIPAA-compliant applications.

• AWS Compliance Resources – This collection of workbooks and guides might apply to your

industry and location.

• Evaluating Resources with Rules in the AWS Conﬁg Developer Guide – The AWS Conﬁg service

assesses how well your resource conﬁgurations comply with internal practices, industry

guidelines, and regulations.

• AWS Security Hub – This AWS service provides a comprehensive view of your security state within

AWS that helps you check your compliance with security industry standards and best practices.

Resilience in Amazon Polly

The AWS global infrastructure is built around AWS Regions and Availability Zones. AWS Regions

provide multiple physically separated and isolated Availability Zones, which are connected with

low-latency, high-throughput, and highly redundant networking. With Availability Zones, you

can design and operate applications and databases that automatically fail over between zones

without interruption. Availability Zones are more highly available, fault tolerant, and scalable than

traditional single or multiple data center infrastructures.

For more information about AWS Regions and Availability Zones, see AWS Global Infrastructure.

Infrastructure Security in Amazon Polly

As a managed service, Amazon Polly is protected by the AWS global network security procedures

that are described in the Amazon Web Services: Overview of Security Processes whitepaper.

You use AWS published API calls to access Amazon Polly through the network. Clients must

support Transport Layer Security (TLS) 1.0 or later. We recommend TLS 1.2 or later. Clients must

also support cipher suites with perfect forward secrecy (PFS) such as Ephemeral Diﬃe-Hellman

(DHE) or Elliptic Curve Ephemeral Diﬃe-Hellman (ECDHE). Most modern systems such as Java 7

and later support these modes.

Additionally, requests must be signed by using an access key ID and a secret access key that is

associated with an IAM principal. Or you can use the AWS Security Token Service (AWS STS) to

generate temporary security credentials to sign requests.

Security Best Practices for Amazon Polly

Your trust, privacy, and the security of your content are our highest priorities. We implement

responsible and sophisticated technical and physical controls designed to prevent unauthorized

access to, or disclosure of, your content and ensure that our use complies with our commitments to

you. For more information, see AWS Data Privacy FAQ.

Amazon Polly does not retain the content of text submissions.

For a broad view of AWS security, including compliance, penetration testing, bulletins, and

resources, visit the AWS Cloud Security website.

Using Amazon Polly with interface VPC endpoints

If you use Amazon Virtual Private Cloud (Amazon VPC) to host your AWS resources, you can

establish a private connection between your VPC and Amazon Polly. You can use this connection to

synthesize speech with Amazon Polly without traversing the public internet.

Amazon VPC is an AWS service that you can use to launch AWS resources in a virtual network that

you define. With a VPC, you have control over your network settings, such as the IP address range,

subnets, route tables, and network gateways. To connect your VPC to Amazon Polly, you deﬁne an

interface VPC endpoint for Amazon Polly. This type of endpoint enables you to connect your VPC

to AWS services. The endpoint provides reliable, scalable connectivity to Amazon Polly without

requiring an internet gateway, network address translation (NAT) instance, or VPN connection. For

more information, see What is Amazon VPC in the Amazon VPC User Guide.

Interface VPC endpoints are powered by AWS PrivateLink, an AWS technology that enables private

communication between AWS services using an elastic network interface with private IP addresses.

For more information, see New - AWS PrivateLink for AWS services.

The following steps are for users of Amazon VPC. For more information, see Getting Started in the

Amazon VPC User Guide.

Availability

VPC endpoints are supported in all the Regions where Amazon Polly is supported. For more

information about AWS Regions and Availability Zones, see AWS Global Infrastructure.

Creating a VPC endpoint for Amazon Polly

To start using Amazon Polly with your VPC, create an interface VPC endpoint for Amazon Polly.

The service to choose is com.amazonaws.Region.polly. You don't need to change any settings for

Amazon Polly. For more information, see Creating an Interface Endpoint in the Amazon VPC User

Guide.

Testing the connection between your VPC and Amazon Polly

After you create the endpoint, you can test the connection.

To test the connection between your VPC and your Amazon Polly endpoint

1. Connect to an Amazon EC2 instance that resides in your VPC. For information about connecting,

see Connect to your Linux instance or Connecting to your Windows instance in the Amazon EC2

documentation.

2. From the instance, use aws polly describe-voices from the AWS CLI to list available

Amazon Polly voices.

If the response to the command includes the list of available Amazon Polly voices, the command

has succeeded, and your VPC endpoint is working.
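The same check can be scripted. The sketch below assumes a client object that exposes describe_voices, such as the one returned by boto3.client("polly"). The helper names are illustrative, and main() needs the boto3 package and credentials on the instance.

```python
def endpoint_is_working(polly_client) -> bool:
    """Return True if DescribeVoices answers with at least one voice."""
    response = polly_client.describe_voices()
    return len(response.get("Voices", [])) > 0


def main() -> None:
    # Requires the boto3 package and AWS credentials on the EC2 instance.
    import boto3

    client = boto3.client("polly")
    print("Amazon Polly reachable:", endpoint_is_working(client))
```

Call main() from the EC2 instance in your VPC. A successful call shows only that Amazon Polly is reachable from the instance; whether traffic uses the endpoint depends on your VPC routing and DNS settings.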

Controlling access to your Amazon Polly endpoint

A VPC endpoint policy is an IAM resource policy that you attach to an endpoint when you create or

modify the endpoint. If you don't attach a policy when you create an endpoint, we attach a default

policy for you that allows full access to the service. An endpoint policy doesn't override or replace

IAM user policies or service-speciﬁc policies. It's a separate policy for controlling access from the

endpoint to the speciﬁed service.

Endpoint policies must be written in JSON format.

For more information, see Controlling Access to Services with VPC Endpoints in the Amazon VPC

User Guide.

The following is an example of an endpoint policy for Amazon Polly. This policy enables users

connecting to Amazon Polly through the VPC to describe voices and synthesize speech with

Amazon Polly, and prevents them from performing other Amazon Polly actions.

{
    "Statement": [
        {
            "Sid": "SynthesisAndDescribeVoicesOnly",
            "Principal": "*",
            "Action": [
                "polly:DescribeVoices",
                "polly:SynthesizeSpeech"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

To modify the VPC endpoint policy for Amazon Polly

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc.

2. In the navigation pane, choose Endpoints.

3. If you have not already created the endpoint for Amazon Polly, choose Create endpoint. Then

select com.amazonaws.Region.polly and choose Create endpoint.

4. Select the com.amazonaws.Region.polly endpoint, and choose the Policy tab in the lower half

of the screen.

5. Choose Edit Policy and make the changes to the policy.

Support for VPC context keys

Amazon Polly supports the aws:SourceVpc and aws:SourceVpce context keys that can limit

access to speciﬁc VPCs or speciﬁc VPC endpoints. These keys work only when the user is using VPC

endpoints. For more information, see Keys Available for Some Services in the IAM User Guide.
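One common way to use these keys is a Deny statement that applies whenever a request does not arrive through a given endpoint. The following is a minimal sketch, not a recommended policy: the endpoint ID is hypothetical, and denying polly:* is an assumption made for illustration.

```python
import json


def build_vpce_only_policy(vpce_id: str) -> dict:
    """Deny all Amazon Polly actions unless the request arrives through vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPollyOutsideEndpoint",
                "Effect": "Deny",
                "Action": "polly:*",
                "Resource": "*",
                # aws:SourceVpce is present only on requests made through a VPC endpoint.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


policy = build_vpce_only_policy("vpce-1a2b3c4d")  # hypothetical endpoint ID
print(json.dumps(policy, indent=2))
```

To restrict by VPC instead of by endpoint, the same structure applies with aws:SourceVpc and a VPC ID.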

Logging Amazon Polly API calls with AWS CloudTrail

Amazon Polly is integrated with AWS CloudTrail, a service that provides a record of actions taken

by a user, role, or an AWS service in Amazon Polly. CloudTrail captures all API calls for Amazon

Polly as events. The calls captured include calls from the Amazon Polly console and code calls

to the Amazon Polly API operations. If you create a trail, you can enable continuous delivery

of CloudTrail events to an Amazon S3 bucket, including events for Amazon Polly. If you don't

conﬁgure a trail, you can still view the most recent events in the CloudTrail console in Event

history. Using the information collected by CloudTrail, you can determine the request that was

made to Amazon Polly, the IP address from which the request was made, who made the request,

when it was made, and additional details.

To learn more about CloudTrail, including how to conﬁgure and enable it, see the AWS CloudTrail

User Guide.

Amazon Polly information in CloudTrail

CloudTrail is enabled on your AWS account when you create the account. When supported event

activity occurs in Amazon Polly, that activity is recorded in a CloudTrail event along with other AWS

service events in Event history. You can view, search, and download recent events in your AWS

account. For more information, see Viewing Events with CloudTrail Event History.

For an ongoing record of events in your AWS account, including events for Amazon Polly, create

a trail. A trail enables CloudTrail to deliver log ﬁles to an Amazon S3 bucket. By default, when

you create a trail in the console, the trail applies to all AWS Regions. The trail logs events from all

Regions in the AWS partition and delivers the log ﬁles to the Amazon S3 bucket that you specify.

Additionally, you can conﬁgure other AWS services to further analyze and act upon the event data

collected in CloudTrail logs. For more information, see the following:

• Overview for Creating a Trail

• CloudTrail Supported Services and Integrations

• Conﬁguring Amazon SNS Notiﬁcations for CloudTrail

• Receiving CloudTrail Log Files from Multiple Regions and Receiving CloudTrail Log Files from

Multiple Accounts

Amazon Polly supports logging the following actions as events in CloudTrail log ﬁles:

• DeleteLexicon

• DescribeVoices

• GetLexicon

• GetSpeechSynthesisTask

• ListLexicons

• ListSpeechSynthesisTasks

• PutLexicon

• StartSpeechSynthesisTask

• SynthesizeSpeech

Every event or log entry contains information about who generated the request. The identity

information helps you determine the following:

• Whether the request was made with root user or AWS Identity and Access Management (IAM)

user credentials.

• Whether the request was made with temporary security credentials for a role or federated user.

• Whether the request was made by another AWS service.

For more information, see the CloudTrail userIdentity Element.

Example: Amazon Polly Log File Entries

A trail is a conﬁguration that enables delivery of events as log ﬁles to an Amazon S3 bucket that

you specify. CloudTrail log ﬁles contain one or more log entries. An event represents a single

request from any source and includes information about the requested action, the date and time of

the action, request parameters, and so on. CloudTrail log ﬁles aren't an ordered stack trace of the

public API calls, so they don't appear in any speciﬁc order.

The following example shows a CloudTrail log entry that demonstrates the SynthesizeSpeech action.

{
    "Records": [
        {
            "awsRegion": "us-east-2",
            "eventID": "19bd70f7-5e60-4cdc-9825-936c552278ae",
            "eventName": "SynthesizeSpeech",
            "eventSource": "polly.amazonaws.com",
            "eventTime": "2016-11-02T03:49:39Z",
            "eventType": "AwsApiCall",
            "eventVersion": "1.05",
            "recipientAccountId": "123456789012",
            "requestID": "414288c2-a1af-11e6-b17f-d7cfc06cb461",
            "requestParameters": {
                "lexiconNames": [
                    "SampleLexicon"
                ],
                "engine": "neural",
                "outputFormat": "mp3",
                "sampleRate": "22050",
                "text": "**********",
                "textType": "text",
                "voiceId": "Kendra"
            },
            "responseElements": null,
            "sourceIPAddress": "1.2.3.4",
            "userAgent": "Amazon CLI/Polly 1.10 API 2016-06-10",
            "userIdentity": {
                "accessKeyId": "EXAMPLE_KEY_ID",
                "accountId": "123456789012",
                "arn": "arn:aws:iam::123456789012:user/Alice",
                "principalId": "EX_PRINCIPAL_ID",
                "type": "IAMUser",
                "userName": "Alice"
            }
        }
    ]
}
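A log file with this shape can be summarized with a few lines of code. The sketch below assumes the log file has already been downloaded from Amazon S3 and decompressed into a string. The function name and the trimmed sample records are illustrative only.

```python
import json
from collections import Counter


def summarize_polly_events(log_text: str) -> Counter:
    """Count Amazon Polly events in a CloudTrail log file by (eventName, caller)."""
    counts = Counter()
    for record in json.loads(log_text).get("Records", []):
        if record.get("eventSource") != "polly.amazonaws.com":
            continue  # skip events from other services
        identity = record.get("userIdentity", {})
        # userName is absent for some identity types, so fall back to the ARN.
        who = identity.get("userName") or identity.get("arn", "unknown")
        counts[(record["eventName"], who)] += 1
    return counts


# Trimmed, made-up records that keep only the fields the function reads.
sample_log = json.dumps({
    "Records": [
        {
            "eventSource": "polly.amazonaws.com",
            "eventName": "SynthesizeSpeech",
            "userIdentity": {"type": "IAMUser", "userName": "Alice"},
        },
        {
            "eventSource": "s3.amazonaws.com",
            "eventName": "GetObject",
            "userIdentity": {"type": "IAMUser", "userName": "Alice"},
        },
    ]
})
print(summarize_polly_events(sample_log))
```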

Integrating CloudWatch with Amazon Polly

When you interact with Amazon Polly, it sends the following metrics and dimensions to

CloudWatch every minute. You can use the following procedures to view the metrics for Amazon

Polly.

You can monitor Amazon Polly using CloudWatch, which collects and processes raw data from

Amazon Polly into readable, near real-time metrics. These statistics are recorded for a period of

two weeks, so that you can access historical information and gain a better perspective on

how your web application or service is performing. By default, Amazon Polly metric data is sent to

CloudWatch in 1-minute intervals. For more information, see What Is Amazon CloudWatch in the

Amazon CloudWatch User Guide.

Getting CloudWatch Metrics (Console)

1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

2. In the navigation pane, choose Metrics.

3. In the CloudWatch Metrics by Category pane, under the metrics category for Amazon Polly,

select a metrics category, and then in the upper pane, scroll down to view the full list of

metrics.

Getting CloudWatch metrics on the AWS CLI

The following command displays the available metrics for Amazon Polly.

aws cloudwatch list-metrics --namespace "AWS/Polly"

The preceding command returns a list of Amazon Polly metrics similar to the following. The

MetricName element identiﬁes what the metric is.

{
    "Metrics": [
        {
            "Namespace": "AWS/Polly",
            "Dimensions": [
                {
                    "Name": "Operation",
                    "Value": "SynthesizeSpeech"
                }
            ],
            "MetricName": "ResponseLatency"
        },
        {
            "Namespace": "AWS/Polly",
            "Dimensions": [
                {
                    "Name": "Operation",
                    "Value": "SynthesizeSpeech"
                }
            ],
            "MetricName": "RequestCharacters"
        }
    ]
}

For more information, see GetMetricStatistics in the Amazon CloudWatch API Reference.
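To retrieve data points for one of the listed metrics, a GetMetricStatistics request might be assembled as in the following sketch. It assumes a CloudWatch client such as boto3.client("cloudwatch"). The helper names, the one-hour window, and the chosen statistics are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def polly_metric_request(metric_name: str, operation: str, minutes: int = 60) -> dict:
    """Build keyword arguments for a CloudWatch GetMetricStatistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Polly",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "Operation", "Value": operation}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # seconds; matches the one-minute reporting interval
        "Statistics": ["Average", "Maximum"],
    }


def fetch_datapoints(cloudwatch_client, metric_name: str, operation: str) -> list:
    """Return the metric's data points sorted by timestamp (CloudWatch returns them unordered)."""
    kwargs = polly_metric_request(metric_name, operation)
    response = cloudwatch_client.get_metric_statistics(**kwargs)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

For example, fetch_datapoints(boto3.client("cloudwatch"), "ResponseLatency", "SynthesizeSpeech") would request the last hour of latency data for SynthesizeSpeech.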

Amazon Polly Metrics

Amazon Polly produces the following metrics for each request. These metrics are aggregated and sent to CloudWatch in one-minute intervals.

Metric Description

RequestCharacters

The number of characters in the request. This is

billable characters only and does not include SSML

tags.

Valid Dimensions: Operation

Valid Statistics: Minimum, Maximum, Average,

SampleCount, Sum

Unit: Count

ResponseLatency

The latency between when the request was made

and the start of the streaming response.

Valid Dimensions: Operation

Valid Statistics: Minimum, Maximum, Average,

SampleCount

Unit: milliseconds

2XXCount

HTTP 200 level code returned upon a successful

response.

Valid Dimensions: Operation

Valid Statistics: Average, SampleCount, Sum

Unit: Count

4XXCount

HTTP 400 level error code returned upon an error.

For each successful response, a zero (0) is emitted.

Valid Dimensions: Operation

Valid Statistics: Average, SampleCount, Sum

Unit: Count

5XXCount

HTTP 500 level error code returned upon an error.

For each successful response, a zero (0) is emitted.

Valid Dimensions: Operation

Valid Statistics: Average, SampleCount, Sum

Unit: Count
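Because 4XXCount and 5XXCount emit a zero for each successful response, their SampleCount statistic counts all requests. Assuming a one is emitted for each error response (the table implies this but does not state it), Sum counts the errors, and the ratio of the two is an error rate. The sketch below works on data points shaped like CloudWatch GetMetricStatistics output. The sample values are made up.

```python
def error_rate(datapoints: list) -> float:
    """Return errors / requests across data points for 4XXCount or 5XXCount."""
    errors = sum(point["Sum"] for point in datapoints)
    requests = sum(point["SampleCount"] for point in datapoints)
    return errors / requests if requests else 0.0


# Made-up data points: 4 errors out of 50 requests in total.
sample = [
    {"Sum": 1.0, "SampleCount": 20.0},
    {"Sum": 3.0, "SampleCount": 30.0},
]
print(error_rate(sample))  # prints 0.08
```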

Dimensions for Amazon Polly Metrics

Amazon Polly metrics use the AWS/Polly namespace and provide metrics for the following

dimension:

Dimension Description

Operation

Metrics are grouped by the API method they refer to. Possible values are SynthesizeSpeech, PutLexicon, DescribeVoices, etc.

Amazon Polly API Reference

This section contains the Amazon Polly API reference.

Note

Authenticated API calls must be signed using the Signature Version 4 Signing Process.

For more information, see Signing AWS API Requests in the Amazon Web Services General

Reference.

Topics

• Actions

• Data Types

Actions

The following actions are supported:

• DeleteLexicon

• DescribeVoices

• GetLexicon

• GetSpeechSynthesisTask

• ListLexicons

• ListSpeechSynthesisTasks

• PutLexicon

• StartSpeechSynthesisTask

• SynthesizeSpeech

DeleteLexicon

Deletes the speciﬁed pronunciation lexicon stored in an AWS Region. A lexicon which has been

deleted is not available for speech synthesis, nor is it possible to retrieve it using either the

GetLexicon or ListLexicons APIs.

For more information, see Managing Lexicons.

Request Syntax

DELETE /v1/lexicons/LexiconName HTTP/1.1

URI Request Parameters

The request uses the following URI parameters.

LexiconName

The name of the lexicon to delete. Must be an existing lexicon in the region.

Pattern: [0-9A-Za-z]{1,20}

Required: Yes
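A client can check a lexicon name against this pattern before sending a request. The following sketch is a local convenience check only. The function names are hypothetical, and the service remains the authority on which names are valid.

```python
import re

# Pattern copied from the LexiconName parameter description.
LEXICON_NAME = re.compile(r"[0-9A-Za-z]{1,20}")


def is_valid_lexicon_name(name: str) -> bool:
    """Return True if the whole name consists of 1 to 20 ASCII letters or digits."""
    return LEXICON_NAME.fullmatch(name) is not None


def lexicon_path(name: str) -> str:
    """Build the request path used by the lexicon operations, rejecting bad names early."""
    if not is_valid_lexicon_name(name):
        raise ValueError(f"invalid lexicon name: {name!r}")
    return f"/v1/lexicons/{name}"


print(lexicon_path("SampleLexicon"))       # prints /v1/lexicons/SampleLexicon
print(is_valid_lexicon_name("my-lexicon"))  # prints False (hyphen not allowed)
```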

Request Body

The request does not have a request body.

Response Syntax

HTTP/1.1 200

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

LexiconNotFoundException

Amazon Polly can't find the specified lexicon. This could be caused by a lexicon that is missing, a misspelled name, or specifying a lexicon that is in a different Region.

Verify that the lexicon exists, is in the Region (see ListLexicons), and that its name is spelled correctly. Then try again.

HTTP Status Code: 404

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

DescribeVoices

Returns the list of voices that are available for use when requesting speech synthesis. Each voice

speaks a speciﬁed language, is either male or female, and is identiﬁed by an ID, which is the ASCII

version of the voice name.

When synthesizing speech (SynthesizeSpeech), you provide the voice ID for the voice you want from the list of voices returned by DescribeVoices.

For example, you might want your news reader application to read news in a specific language while giving the user the option to choose the voice. Using the DescribeVoices operation, you can provide the user with a list of available voices to select from.

You can optionally specify a language code to ﬁlter the available voices. For example, if you specify

en-US, the operation returns a list of all available US English voices.

This operation requires permissions to perform the polly:DescribeVoices action.

Request Syntax

GET /v1/voices?Engine=Engine&IncludeAdditionalLanguageCodes=IncludeAdditionalLanguageCodes&LanguageCode=LanguageCode&NextToken=NextToken HTTP/1.1

URI Request Parameters

The request uses the following URI parameters.

Engine

Speciﬁes the engine (standard, neural, long-form or generative) used by Amazon Polly

when processing input text for speech synthesis.

Valid Values: standard | neural | long-form | generative

IncludeAdditionalLanguageCodes

Boolean value indicating whether to return any bilingual voices that use the specified language as an additional language. For instance, if you request all languages that use US English (en-US), and there is an Italian voice that speaks both Italian (it-IT) and US English, that voice will be included if you specify yes but not if you specify no.

LanguageCode

The language identiﬁcation tag (ISO 639 code for the language name-ISO 3166 country code)

for ﬁltering the list of voices returned. If you don't specify this optional parameter, all available

voices are returned.

ar-AE | fi-FI | en-IE | nl-BE | fr-BE

NextToken

An opaque pagination token returned from the previous DescribeVoices operation. If

present, this indicates where to continue the listing.

Length Constraints: Minimum length of 0. Maximum length of 4096.

Request Body

The request does not have a request body.

Response Syntax

HTTP/1.1 200

Content-type: application/json

{

"NextToken": "string",

"Voices": [

{

"AdditionalLanguageCodes": [ "string" ],

"Gender": "string",

"Id": "string",

"LanguageCode": "string",

"LanguageName": "string",

"Name": "string",

"SupportedEngines": [ "string" ]

}

]

}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

NextToken

The pagination token to use in the next request to continue the listing of voices. NextToken is

returned only if the response is truncated.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 4096.

Voices

A list of voices with their properties.

Type: Array of Voice objects
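Because NextToken is returned only when the response is truncated, a caller can loop until it is absent. The following sketch assumes a client that exposes describe_voices with the parameters listed above, such as boto3.client("polly"). The function name is illustrative.

```python
def list_all_voices(polly_client, language_code=None, engine=None) -> list:
    """Collect every Voice object by following NextToken until it is absent."""
    kwargs = {}
    if language_code:
        kwargs["LanguageCode"] = language_code
    if engine:
        kwargs["Engine"] = engine

    voices = []
    while True:
        response = polly_client.describe_voices(**kwargs)
        voices.extend(response.get("Voices", []))
        token = response.get("NextToken")
        if not token:
            return voices
        # Pass the token back unchanged to continue the listing.
        kwargs["NextToken"] = token
```

For example, list_all_voices(client, language_code="en-US") returns all US English voices across pages.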

Errors

InvalidNextTokenException

The NextToken is invalid. Verify that it's spelled correctly, and then try again.

HTTP Status Code: 400

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

GetLexicon

Returns the content of the speciﬁed pronunciation lexicon stored in an AWS Region. For more

information, see Managing Lexicons.

Request Syntax

GET /v1/lexicons/LexiconName HTTP/1.1

URI Request Parameters

The request uses the following URI parameters.

LexiconName

Name of the lexicon.

Pattern: [0-9A-Za-z]{1,20}

Required: Yes

Request Body

The request does not have a request body.

Response Syntax

HTTP/1.1 200

Content-type: application/json

{
    "Lexicon": {
        "Content": "string",
        "Name": "string"
    },
    "LexiconAttributes": {
        "Alphabet": "string",
        "LanguageCode": "string",
        "LastModified": number,
        "LexemesCount": number,
        "LexiconArn": "string",
        "Size": number
    }
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

Lexicon

Lexicon object that provides name and the string content of the lexicon.

Type: Lexicon object

LexiconAttributes

Metadata of the lexicon, including phonetic alphabet used, language code, lexicon ARN,

number of lexemes deﬁned in the lexicon, and size of lexicon in bytes.

Type: LexiconAttributes object

Errors

LexiconNotFoundException

Amazon Polly can't find the specified lexicon. This could be caused by a lexicon that is missing, a misspelled name, or specifying a lexicon that is in a different Region.

Verify that the lexicon exists, is in the Region (see ListLexicons), and that its name is spelled correctly. Then try again.

HTTP Status Code: 404

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

GetSpeechSynthesisTask

Retrieves a speciﬁc SpeechSynthesisTask object based on its TaskID. This object contains

information about the given speech synthesis task, including the status of the task, and a link to

the S3 bucket containing the output of the task.

Request Syntax

GET /v1/synthesisTasks/TaskId HTTP/1.1

URI Request Parameters

The request uses the following URI parameters.

TaskId

The Amazon Polly generated identiﬁer for a speech synthesis task.

Pattern: ^[a-zA-Z0-9_-]{1,100}$

Required: Yes

Request Body

The request does not have a request body.

Response Syntax

HTTP/1.1 200

Content-type: application/json

{
    "SynthesisTask": {
        "CreationTime": number,
        "Engine": "string",
        "LanguageCode": "string",
        "LexiconNames": [ "string" ],
        "OutputFormat": "string",
        "OutputUri": "string",
        "RequestCharacters": number,
        "SampleRate": "string",
        "SnsTopicArn": "string",
        "SpeechMarkTypes": [ "string" ],
        "TaskId": "string",
        "TaskStatus": "string",
        "TaskStatusReason": "string",
        "TextType": "string",
        "VoiceId": "string"
    }
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

SynthesisTask

SynthesisTask object that provides information from the requested task, including output

format, creation time, task status, and so on.

Type: SynthesisTask object
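A typical use of this operation is polling a task until it finishes. The following sketch assumes a client that exposes get_speech_synthesis_task, such as boto3.client("polly"), and treats completed and failed as the final statuses. The function name, polling interval, and retry limit are illustrative choices.

```python
import time

# Statuses after which the task no longer changes (assumed to be final).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(polly_client, task_id: str, poll_seconds: float = 5.0,
                  max_polls: int = 60, sleep=time.sleep) -> dict:
    """Poll GetSpeechSynthesisTask until the task completes or fails."""
    for _ in range(max_polls):
        response = polly_client.get_speech_synthesis_task(TaskId=task_id)
        task = response["SynthesisTask"]
        if task["TaskStatus"] in TERMINAL_STATUSES:
            return task
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

On return, check TaskStatus: a completed task carries OutputUri, and a failed task should explain the failure in TaskStatusReason. The sleep parameter exists so that tests can skip the real delay.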

Errors

InvalidTaskIdException

The provided Task ID is not valid. Please provide a valid Task ID and try again.

HTTP Status Code: 400

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

SynthesisTaskNotFoundException

The Speech Synthesis task with requested Task ID cannot be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

ListLexicons

Returns a list of pronunciation lexicons stored in an AWS Region. For more information, see

Managing Lexicons.

Request Syntax

GET /v1/lexicons?NextToken=NextToken HTTP/1.1

URI Request Parameters

The request uses the following URI parameters.

NextToken

An opaque pagination token returned from the previous ListLexicons operation. If present,

indicates where to continue the list of lexicons.

Length Constraints: Minimum length of 0. Maximum length of 4096.

Request Body

The request does not have a request body.

Response Syntax

HTTP/1.1 200

Content-type: application/json

{
    "Lexicons": [
        {
            "Attributes": {
                "Alphabet": "string",
                "LanguageCode": "string",
                "LastModified": number,
                "LexemesCount": number,
                "LexiconArn": "string",
                "Size": number
            },
            "Name": "string"
        }
    ],
    "NextToken": "string"
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

Lexicons

A list of lexicon names and attributes.

Type: Array of LexiconDescription objects

NextToken

The pagination token to use in the next request to continue the listing of lexicons. NextToken

is returned only if the response is truncated.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 4096.

Errors

InvalidNextTokenException

The NextToken is invalid. Verify that it's spelled correctly, and then try again.

HTTP Status Code: 400

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

ListSpeechSynthesisTasks

Returns a list of SpeechSynthesisTask objects ordered by their creation date. This operation can

ﬁlter the tasks by their status, for example, allowing users to list only tasks that are completed.

Request Syntax

GET /v1/synthesisTasks?MaxResults=MaxResults&NextToken=NextToken&Status=Status HTTP/1.1

URI Request Parameters

The request uses the following URI parameters.

MaxResults

Maximum number of speech synthesis tasks returned in a List operation.

Valid Range: Minimum value of 1. Maximum value of 100.

NextToken

The pagination token to use in the next request to continue the listing of speech synthesis

tasks.

Length Constraints: Minimum length of 0. Maximum length of 4096.

Status

Status of the speech synthesis tasks returned in a List operation.

Valid Values: scheduled | inProgress | completed | failed

Request Body

The request does not have a request body.
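The three URI parameters and their documented ranges can be checked on the client side before a request is sent. The following is a minimal sketch under that assumption; `build_list_tasks_params` is a hypothetical helper, and the service remains the authority on what it accepts.

```python
from typing import Dict, Optional, Union

VALID_STATUSES = {"scheduled", "inProgress", "completed", "failed"}


def build_list_tasks_params(
    max_results: Optional[int] = None,
    status: Optional[str] = None,
    next_token: Optional[str] = None,
) -> Dict[str, Union[int, str]]:
    """Return only the parameters that were supplied, after range checks."""
    params: Dict[str, Union[int, str]] = {}
    if max_results is not None:
        if not 1 <= max_results <= 100:
            raise ValueError("MaxResults must be between 1 and 100")
        params["MaxResults"] = max_results
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"Status must be one of {sorted(VALID_STATUSES)}")
        params["Status"] = status
    if next_token is not None:
        if len(next_token) > 4096:
            raise ValueError("NextToken must be at most 4096 characters")
        params["NextToken"] = next_token
    return params
```

With a boto3 Polly client, the result could be passed as keyword arguments, for example `client.list_speech_synthesis_tasks(**build_list_tasks_params(max_results=10, status="completed"))`.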

Response Syntax

HTTP/1.1 200

Content-type: application/json

{

"NextToken": "string",

"SynthesisTasks": [

{

"CreationTime": number,

"Engine": "string",

"LanguageCode": "string",

"LexiconNames": [ "string" ],

"OutputFormat": "string",

"OutputUri": "string",

"RequestCharacters": number,

"SampleRate": "string",

"SnsTopicArn": "string",

"SpeechMarkTypes": [ "string" ],

"TaskId": "string",

"TaskStatus": "string",

"TaskStatusReason": "string",

"TextType": "string",

"VoiceId": "string"

}

]

}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

NextToken

An opaque pagination token returned from the previous List operation in this request. If

present, this indicates where to continue the listing.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 4096.

SynthesisTasks

List of SynthesisTask objects that provides information from the speciﬁed task in the list

request, including output format, creation time, task status, and so on.

Type: Array of SynthesisTask objects

Errors

InvalidNextTokenException

The NextToken is invalid. Verify that it's spelled correctly, and then try again.

HTTP Status Code: 400

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

PutLexicon

Stores a pronunciation lexicon in an AWS Region. If a lexicon with the same name already exists

in the Region, it is overwritten by the new lexicon. Lexicon operations are eventually consistent,

so it might take some time before the lexicon is available to the SynthesizeSpeech

operation.

For more information, see Managing Lexicons.

Request Syntax

PUT /v1/lexicons/LexiconName HTTP/1.1

Content-type: application/json

{

"Content": "string"

}

URI Request Parameters

The request uses the following URI parameters.

LexiconName

Name of the lexicon. The name must follow the regular expression format [0-9A-Za-z]{1,20}. That

is, the name is a case-sensitive alphanumeric string up to 20 characters long.

Pattern: [0-9A-Za-z]{1,20}

Required: Yes
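The name pattern above can be checked locally before calling PutLexicon. This is a minimal sketch; `is_valid_lexicon_name` is a hypothetical helper, and it uses a full match so that the whole name, not just a prefix, must fit the pattern.

```python
import re

# Pattern copied from the LexiconName parameter description.
LEXICON_NAME_RE = re.compile(r"[0-9A-Za-z]{1,20}")


def is_valid_lexicon_name(name: str) -> bool:
    """True when the whole name is 1 to 20 ASCII letters or digits."""
    return LEXICON_NAME_RE.fullmatch(name) is not None
```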

Request Body

The request accepts the following data in JSON format.

Content

Content of the PLS lexicon as string data.

Type: String

Required: Yes
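As an illustration of what the Content string might look like, the sketch below holds a small PLS document and runs two local checks that mirror errors listed for this operation: the root element must be a PLS lexicon, and the alphabet must be ipa or x-sampa. The lexeme shown and the helper name `check_lexicon` are illustrative assumptions, and passing these checks does not guarantee that the service accepts the lexicon.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
SUPPORTED_ALPHABETS = {"ipa", "x-sampa"}

# Example PLS document; the grapheme/alias pair is illustrative only.
EXAMPLE_LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
"""


def check_lexicon(content: str) -> int:
    """Parse a PLS string, check its alphabet, and return the lexeme count."""
    root = ET.fromstring(content.encode("utf-8"))
    if root.tag != f"{{{PLS_NS}}}lexicon":
        raise ValueError("root element must be a PLS <lexicon>")
    alphabet = root.get("alphabet")
    if alphabet not in SUPPORTED_ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    return len(root.findall(f"{{{PLS_NS}}}lexeme"))
```

With a boto3 Polly client, the checked string could then be stored with `client.put_lexicon(Name="w3c", Content=EXAMPLE_LEXICON)`.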

Response Syntax

HTTP/1.1 200

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

InvalidLexiconException

Amazon Polly can't ﬁnd the speciﬁed lexicon. Verify that the lexicon's name is spelled correctly,

and then try again.

HTTP Status Code: 400

LexiconSizeExceededException

The maximum size of the speciﬁed lexicon would be exceeded by this operation.

HTTP Status Code: 400

MaxLexemeLengthExceededException

The maximum size of the lexeme would be exceeded by this operation.

HTTP Status Code: 400

MaxLexiconsNumberExceededException

The maximum number of lexicons would be exceeded by this operation.

HTTP Status Code: 400

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

UnsupportedPlsAlphabetException

The alphabet speciﬁed by the lexicon is not a supported alphabet. Valid values are x-sampa

and ipa.

HTTP Status Code: 400

UnsupportedPlsLanguageException

The language speciﬁed in the lexicon is unsupported. For a list of supported languages, see

Lexicon Attributes.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

StartSpeechSynthesisTask

Allows the creation of an asynchronous synthesis task, by starting a new SpeechSynthesisTask.

This operation requires all the standard information needed for speech synthesis, plus the name

of an Amazon S3 bucket for the service to store the output of the synthesis task and two optional

parameters (OutputS3KeyPrefix and SnsTopicArn). Once the synthesis task is created, this

operation will return a SpeechSynthesisTask object, which will include an identiﬁer of this task

as well as the current status. The SpeechSynthesisTask object is available for 72 hours after

starting the asynchronous synthesis task.
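The start-then-check flow described above can be sketched as follows. This is a hedged illustration, not a prescribed client: it assumes `client` is a boto3 Polly client, uses the separate GetSpeechSynthesisTask operation to re-read the task, and the helper name, polling interval, and poll limit are arbitrary choices made for the example.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def start_and_wait(
    client: Any,
    request: Dict[str, Any],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a synthesis task, then poll until it is completed or failed."""
    task = client.start_speech_synthesis_task(**request)["SynthesisTask"]
    for _ in range(max_polls):
        if task["TaskStatus"] in TERMINAL_STATUSES:
            return task
        sleep(poll_seconds)
        task = client.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]
    # The final poll may have moved the task into a terminal state.
    if task["TaskStatus"] in TERMINAL_STATUSES:
        return task
    raise TimeoutError(f"task {task['TaskId']} is still {task['TaskStatus']}")
```

Polling is only one option; supplying SnsTopicArn lets the service send a status notification instead.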

Request Syntax

POST /v1/synthesisTasks HTTP/1.1

Content-type: application/json

{

"Engine": "string",

"LanguageCode": "string",

"LexiconNames": [ "string" ],

"OutputFormat": "string",

"OutputS3BucketName": "string",

"OutputS3KeyPrefix": "string",

"SampleRate": "string",

"SnsTopicArn": "string",

"SpeechMarkTypes": [ "string" ],

"Text": "string",

"TextType": "string",

"VoiceId": "string"

}

URI Request Parameters

The request does not use any URI parameters.

Request Body

The request accepts the following data in JSON format.

Engine

Speciﬁes the engine (standard, neural, long-form or generative) for Amazon Polly to

use when processing input text for speech synthesis. Using a voice that is not supported for the

engine selected will result in an error.

Type: String

Valid Values: standard | neural | long-form | generative

Required: No

LanguageCode

Optional language code for the Speech Synthesis request. This is only necessary if using a

bilingual voice, such as Aditi, which can be used for either Indian English (en-IN) or Hindi (hi-IN).

If a bilingual voice is used and no language code is speciﬁed, Amazon Polly uses the default

language of the bilingual voice. The default language for any voice is the one returned by the

DescribeVoices operation for the LanguageCode parameter. For example, if no language code

is speciﬁed, Aditi will use Indian English rather than Hindi.

Type: String

Valid Values include: ar-AE | fi-FI | en-IE | nl-BE | fr-BE

Required: No

LexiconNames

List of one or more pronunciation lexicon names you want the service to apply during synthesis.

Lexicons are applied only if the language of the lexicon is the same as the language of the voice.

Type: Array of strings

Array Members: Maximum number of 5 items.

Pattern: [0-9A-Za-z]{1,20}

Required: No

OutputFormat

The format in which the returned output will be encoded. For audio stream, this will be mp3,

ogg_vorbis, or pcm. For speech marks, this will be json.

Type: String

Valid Values: json | mp3 | ogg_vorbis | pcm

Required: Yes

OutputS3BucketName

Amazon S3 bucket name to which the output ﬁle will be saved.

Type: String

Pattern: ^[a-z0-9][\.\-a-z0-9]{1,61}[a-z0-9]$

Required: Yes

OutputS3KeyPrefix

The Amazon S3 key prefix for the output speech file.

Type: String

Pattern: ^[0-9a-zA-Z\/\!\-_\.\*\':;\$@=+\,\?&]{0,800}$

Required: No
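The two patterns given for OutputS3BucketName and OutputS3KeyPrefix can be applied locally before starting a task. This is a minimal sketch; the helper names are hypothetical, and matching these patterns does not guarantee that the bucket exists or that it meets every Amazon S3 naming rule.

```python
import re

# Patterns copied from the parameter descriptions.
BUCKET_RE = re.compile(r"^[a-z0-9][\.\-a-z0-9]{1,61}[a-z0-9]$")
KEY_PREFIX_RE = re.compile(r"^[0-9a-zA-Z\/\!\-_\.\*\':;\$@=+\,\?&]{0,800}$")


def is_valid_bucket_name(name: str) -> bool:
    """True when the name fits the OutputS3BucketName pattern (3 to 63 characters)."""
    return BUCKET_RE.fullmatch(name) is not None


def is_valid_key_prefix(prefix: str) -> bool:
    """True when the prefix fits the OutputS3KeyPrefix pattern (0 to 800 characters)."""
    return KEY_PREFIX_RE.fullmatch(prefix) is not None
```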

SampleRate

The audio frequency speciﬁed in Hz.

The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The

default value for standard voices is "22050". The default value for neural voices is "24000". The

default value for long-form voices is "24000". The default value for generative voices is "24000".

Valid values for pcm are "8000" and "16000". The default value is "16000".

Type: String

Required: No
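The SampleRate rules above amount to a small lookup table. The sketch below encodes them as stated; the function names are hypothetical, and the json output format is deliberately left out because the description gives no sample rate values for it.

```python
# Valid values per audio format, as listed in the SampleRate description.
VALID_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}

# Defaults for mp3 and ogg_vorbis depend on the engine.
COMPRESSED_DEFAULTS = {
    "standard": "22050",
    "neural": "24000",
    "long-form": "24000",
    "generative": "24000",
}


def default_sample_rate(output_format: str, engine: str = "standard") -> str:
    """Return the documented default SampleRate for a format and engine."""
    if output_format == "pcm":
        return "16000"
    if output_format in ("mp3", "ogg_vorbis"):
        return COMPRESSED_DEFAULTS[engine]
    raise ValueError(f"no documented sample rate for format {output_format!r}")


def is_valid_sample_rate(output_format: str, sample_rate: str) -> bool:
    """True when the rate (a string, as in the API) is listed for the format."""
    return sample_rate in VALID_RATES.get(output_format, set())
```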

SnsTopicArn

ARN for the SNS topic optionally used for providing status notiﬁcation for a speech synthesis

task.

Type: String

Pattern: ^arn:aws(-(cn|iso(-b)?|us-gov))?:sns:[a-z0-9_-]{1,50}:\d{12}:[a-zA-Z0-9_-]{1,251}([a-zA-Z0-9_-]{0,5}|\.fifo)$

Required: No
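The SnsTopicArn pattern can likewise be checked before the request is sent. This is a minimal sketch with a hypothetical helper name; the ARNs used to exercise it are made-up examples, and a matching ARN may still refer to a topic that does not exist.

```python
import re

# Pattern copied from the SnsTopicArn description; re.ASCII keeps \d to 0-9.
SNS_TOPIC_ARN_RE = re.compile(
    r"^arn:aws(-(cn|iso(-b)?|us-gov))?:sns:[a-z0-9_-]{1,50}:\d{12}:"
    r"[a-zA-Z0-9_-]{1,251}([a-zA-Z0-9_-]{0,5}|\.fifo)$",
    re.ASCII,
)


def is_valid_sns_topic_arn(arn: str) -> bool:
    """True when the ARN fits the documented SnsTopicArn pattern."""
    return SNS_TOPIC_ARN_RE.fullmatch(arn) is not None
```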

SpeechMarkTypes

The type of speech marks returned for the input text.

Type: Array of strings

Array Members: Maximum number of 4 items.

Valid Values: sentence | ssml | viseme | word

Required: No

Text

The input text to synthesize. If you specify ssml as the TextType, follow the SSML format for the

input text.

Type: String

Required: Yes

TextType

Speciﬁes whether the input text is plain text or SSML. The default value is plain text.

Type: String

Valid Values: ssml | text

Required: No

VoiceId

Voice ID to use for the synthesis.

Type: String

Required: Yes

Response Syntax

HTTP/1.1 200

Content-type: application/json

{

"SynthesisTask": {

"CreationTime": number,

"Engine": "string",

"LanguageCode": "string",

"LexiconNames": [ "string" ],

"OutputFormat": "string",

"OutputUri": "string",

"RequestCharacters": number,

"SampleRate": "string",

"SnsTopicArn": "string",

"SpeechMarkTypes": [ "string" ],

"TaskId": "string",

"TaskStatus": "string",

"TaskStatusReason": "string",

"TextType": "string",

"VoiceId": "string"

}

}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

SynthesisTask

SynthesisTask object that provides information and attributes about a newly submitted speech

synthesis task.

Type: SynthesisTask object

Errors

EngineNotSupportedException

This engine is not compatible with the voice that you have designated. Choose a new voice that

is compatible with the engine or change the engine and restart the operation.

HTTP Status Code: 400

InvalidS3BucketException

The provided Amazon S3 bucket name is invalid. Please check your input with S3 bucket naming

requirements and try again.

HTTP Status Code: 400

InvalidS3KeyException

The provided Amazon S3 key preﬁx is invalid. Please provide a valid S3 object key name.

HTTP Status Code: 400

InvalidSampleRateException

The speciﬁed sample rate is not valid.

HTTP Status Code: 400

InvalidSnsTopicArnException

The provided SNS topic ARN is invalid. Please provide a valid SNS topic ARN and try again.

HTTP Status Code: 400

InvalidSsmlException

The SSML you provided is invalid. Verify the SSML syntax, spelling of tags and values, and then

try again.

HTTP Status Code: 400

LanguageNotSupportedException

The language speciﬁed is not currently supported by Amazon Polly in this capacity.

HTTP Status Code: 400

LexiconNotFoundException

Amazon Polly can't find the specified lexicon. This could be caused by a lexicon that is missing,

a misspelled lexicon name, or a lexicon that is in a different Region.

Verify that the lexicon exists, is in the Region (see ListLexicons), and that its name is

spelled correctly. Then try again.

HTTP Status Code: 404

MarksNotSupportedForFormatException

Speech marks are not supported for the OutputFormat selected. Speech marks are only

available for content in json format.

HTTP Status Code: 400

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

SsmlMarksNotSupportedForTextTypeException

SSML speech marks are not supported for plain text-type input.

HTTP Status Code: 400

TextLengthExceededException

The value of the "Text" parameter is longer than the accepted limits. For the

SynthesizeSpeech API, the limit for input text is a maximum of 6000 characters total, of

which no more than 3000 can be billed characters. For the StartSpeechSynthesisTask API,

the maximum is 200,000 characters, of which no more than 100,000 can be billed characters.

SSML tags are not counted as billed characters.

HTTP Status Code: 400
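Several of the errors above follow from simple relationships between request fields, so a client can catch them before calling the service. The sketch below covers three of them; `preflight` is a hypothetical helper, it checks only the total character limit (not the billed-character limit, which excludes SSML tags), and Python's `len` may not count characters exactly as the service does.

```python
from typing import Any, Dict, List

SYNC_TOTAL_LIMIT = 6000       # SynthesizeSpeech
ASYNC_TOTAL_LIMIT = 200_000   # StartSpeechSynthesisTask


def preflight(request: Dict[str, Any], asynchronous: bool = True) -> List[str]:
    """Return the names of exceptions the request would likely trigger."""
    problems: List[str] = []
    marks = request.get("SpeechMarkTypes") or []
    # Speech marks are only available for json output.
    if marks and request.get("OutputFormat") != "json":
        problems.append("MarksNotSupportedForFormatException")
    # ssml marks need SSML input; TextType defaults to plain text.
    if "ssml" in marks and request.get("TextType", "text") != "ssml":
        problems.append("SsmlMarksNotSupportedForTextTypeException")
    limit = ASYNC_TOTAL_LIMIT if asynchronous else SYNC_TOTAL_LIMIT
    if len(request.get("Text", "")) > limit:
        problems.append("TextLengthExceededException")
    return problems
```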

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

SynthesizeSpeech

Synthesizes UTF-8 input, plain text or SSML, to a stream of bytes. SSML input must be valid,

well-formed SSML. Some alphabets might not be available with all the voices (for example, Cyrillic

might not be read at all by English voices) unless phoneme mapping is used. For more information,

see How it Works.

Request Syntax

POST /v1/speech HTTP/1.1

Content-type: application/json

{

"Engine": "string",

"LanguageCode": "string",

"LexiconNames": [ "string" ],

"OutputFormat": "string",

"SampleRate": "string",

"SpeechMarkTypes": [ "string" ],

"Text": "string",

"TextType": "string",

"VoiceId": "string"

}

URI Request Parameters

The request does not use any URI parameters.

Request Body

The request accepts the following data in JSON format.

Engine

Speciﬁes the engine (standard, neural, long-form, or generative) for Amazon Polly to

use when processing input text for speech synthesis. Provide an engine that is supported by the

voice you select. If you don't provide an engine, the standard engine is selected by default. If a

chosen voice isn't supported by the standard engine, this will result in an error. For information

on Amazon Polly voices and which voices are available for each engine, see Available Voices.

Type: String

Valid Values: standard | neural | long-form | generative

Required: No

LanguageCode

Optional language code for the Synthesize Speech request. This is only necessary if using a

bilingual voice, such as Aditi, which can be used for either Indian English (en-IN) or Hindi (hi-IN).

If a bilingual voice is used and no language code is speciﬁed, Amazon Polly uses the default

language of the bilingual voice. The default language for any voice is the one returned by the

DescribeVoices operation for the LanguageCode parameter. For example, if no language code

is speciﬁed, Aditi will use Indian English rather than Hindi.

Type: String

Valid Values include: ar-AE | fi-FI | en-IE | nl-BE | fr-BE

Required: No

LexiconNames

List of one or more pronunciation lexicon names you want the service to apply during synthesis.

Lexicons are applied only if the language of the lexicon is the same as the language of the voice.

For information about storing lexicons, see PutLexicon.

Type: Array of strings

Array Members: Maximum number of 5 items.

Pattern: [0-9A-Za-z]{1,20}

Required: No

OutputFormat

The format in which the returned output will be encoded. For audio stream, this will be mp3,

ogg_vorbis, or pcm. For speech marks, this will be json.

When pcm is used, the content returned is audio/pcm in a signed 16-bit, 1 channel (mono),

little-endian format.

Type: String

Valid Values: json | mp3 | ogg_vorbis | pcm

Required: Yes
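Because pcm output is raw signed 16-bit, mono, little-endian audio with no header, most players need it wrapped in a container first. The sketch below uses Python's standard `wave` module to add a WAV header; the function name is hypothetical, and `sample_rate` must match the SampleRate used in the request.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono little-endian PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)             # 1 channel (mono)
        wav.setsampwidth(2)             # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)   # for example 8000 or 16000
        wav.writeframes(pcm)
    return buffer.getvalue()
```

No byte swapping is needed because WAV files also store 16-bit samples as signed little-endian values.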

SampleRate

The audio frequency speciﬁed in Hz.

The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The

default value for standard voices is "22050". The default value for neural voices is "24000". The

default value for long-form voices is "24000". The default value for generative voices is "24000".

Valid values for pcm are "8000" and "16000". The default value is "16000".

Type: String

Required: No

SpeechMarkTypes

The type of speech marks returned for the input text.

Type: Array of strings

Array Members: Maximum number of 4 items.

Valid Values: sentence | ssml | viseme | word

Required: No

Text

Input text to synthesize. If you specify ssml as the TextType, follow the SSML format for the

input text.

Type: String

Required: Yes

TextType

Speciﬁes whether the input text is plain text or SSML. The default value is plain text. For more

information, see Using SSML.

Type: String

Valid Values: ssml | text

Required: No

VoiceId

Voice ID to use for the synthesis. You can get a list of available voice IDs by calling the

DescribeVoices operation.

Type: String

Required: Yes

Response Syntax

HTTP/1.1 200

Content-Type: ContentType

x-amzn-RequestCharacters: RequestCharacters

AudioStream

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The response returns the following HTTP headers.

ContentType

Specifies the type of audio stream. This should reflect the OutputFormat parameter in your

request.

• If you request mp3 as the OutputFormat, the ContentType returned is audio/mpeg.

• If you request ogg_vorbis as the OutputFormat, the ContentType returned is audio/ogg.

• If you request pcm as the OutputFormat, the ContentType returned is audio/pcm in a

signed 16-bit, 1 channel (mono), little-endian format.

• If you request json as the OutputFormat, the ContentType returned is application/x-json-stream.

RequestCharacters

Number of characters synthesized.

The response returns the following as the HTTP body.

AudioStream

Stream containing the synthesized speech.
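The response elements above can be read as in the following sketch. It assumes `client` is a boto3 Polly client, whose response exposes the two headers and the body as the dictionary keys ContentType, RequestCharacters, and AudioStream; the `synthesize` helper and the `CONTENT_TYPES` table are illustrative names, not part of the API.

```python
from typing import Any, Tuple

# OutputFormat to ContentType, as listed under Response Elements.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "ogg_vorbis": "audio/ogg",
    "pcm": "audio/pcm",
    "json": "application/x-json-stream",
}


def synthesize(
    client: Any, text: str, voice_id: str, output_format: str = "mp3"
) -> Tuple[bytes, str, int]:
    """Call SynthesizeSpeech and return (body bytes, ContentType, RequestCharacters)."""
    response = client.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat=output_format
    )
    # AudioStream is a stream object; read() drains it into bytes.
    audio = response["AudioStream"].read()
    return audio, response.get("ContentType", ""), int(response.get("RequestCharacters", 0))
```

A caller might compare the returned content type with `CONTENT_TYPES[output_format]` before writing the bytes to a file.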

Errors

EngineNotSupportedException

This engine is not compatible with the voice that you have designated. Choose a new voice that

is compatible with the engine or change the engine and restart the operation.

HTTP Status Code: 400

InvalidSampleRateException

The speciﬁed sample rate is not valid.

HTTP Status Code: 400

InvalidSsmlException

The SSML you provided is invalid. Verify the SSML syntax, spelling of tags and values, and then

try again.

HTTP Status Code: 400

LanguageNotSupportedException

The language speciﬁed is not currently supported by Amazon Polly in this capacity.

HTTP Status Code: 400

LexiconNotFoundException

Amazon Polly can't find the specified lexicon. This could be caused by a lexicon that is missing,

a misspelled lexicon name, or a lexicon that is in a different Region.

Verify that the lexicon exists, is in the Region (see ListLexicons), and that its name is

spelled correctly. Then try again.

HTTP Status Code: 404

MarksNotSupportedForFormatException

Speech marks are not supported for the OutputFormat selected. Speech marks are only

available for content in json format.

HTTP Status Code: 400

ServiceFailureException

An unknown condition has caused a service failure.

HTTP Status Code: 500

SsmlMarksNotSupportedForTextTypeException

SSML speech marks are not supported for plain text-type input.

HTTP Status Code: 400

TextLengthExceededException

The value of the "Text" parameter is longer than the accepted limits. For the

SynthesizeSpeech API, the limit for input text is a maximum of 6000 characters total, of

which no more than 3000 can be billed characters. For the StartSpeechSynthesisTask API,

the maximum is 200,000 characters, of which no more than 100,000 can be billed characters.

SSML tags are not counted as billed characters.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS Command Line Interface

• AWS SDK for .NET

• AWS SDK for C++

• AWS SDK for Go v2

• AWS SDK for Java V2

• AWS SDK for JavaScript V3

• AWS SDK for PHP V3

• AWS SDK for Python

• AWS SDK for Ruby V3

Data Types

The following data types are supported:

• Lexicon

• LexiconAttributes

• LexiconDescription

• SynthesisTask

• Voice

Lexicon

Provides lexicon name and lexicon content in string format. For more information, see

Pronunciation Lexicon Speciﬁcation (PLS) Version 1.0.

Contents

Content

Lexicon content in string format. The content of a lexicon must be in PLS format.

Type: String

Required: No

Name

Name of the lexicon.

Type: String

Pattern: [0-9A-Za-z]{1,20}

Required: No

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS SDK for C++

• AWS SDK for Java V2

• AWS SDK for Ruby V3

LexiconAttributes

Contains metadata describing the lexicon such as the number of lexemes, language code, and so

on. For more information, see Managing Lexicons.

Contents

Alphabet

Phonetic alphabet used in the lexicon. Valid values are ipa and x-sampa.

Type: String

Required: No

LanguageCode

Language code that the lexicon applies to. A lexicon with a language code such as "en" would be

applied to all English languages (en-GB, en-US, en-AUS, en-WLS, and so on).

Type: String

Valid Values include: ar-AE | fi-FI | en-IE | nl-BE | fr-BE

Required: No

LastModified

Date lexicon was last modified (a timestamp value).

Type: Timestamp

Required: No

LexemesCount

Number of lexemes in the lexicon.

Type: Integer

Required: No

LexiconArn

Amazon Resource Name (ARN) of the lexicon.

Type: String

Required: No

Size

Total size of the lexicon, in characters.

Type: Integer

Required: No

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS SDK for C++

• AWS SDK for Java V2

• AWS SDK for Ruby V3

LexiconDescription

Describes the content of the lexicon.

Contents

Attributes

Provides lexicon metadata.

Type: LexiconAttributes object

Required: No

Name

Name of the lexicon.

Type: String

Pattern: [0-9A-Za-z]{1,20}

Required: No

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS SDK for C++

• AWS SDK for Java V2

• AWS SDK for Ruby V3

SynthesisTask

SynthesisTask object that provides information about a speech synthesis task.

Contents

CreationTime

Timestamp for the time the synthesis task was started.

Type: Timestamp

Required: No

Engine

Speciﬁes the engine (standard, neural, long-form or generative) for Amazon Polly to

use when processing input text for speech synthesis. Using a voice that is not supported for the

engine selected will result in an error.

Type: String

Valid Values: standard | neural | long-form | generative

Required: No

LanguageCode

Optional language code for a synthesis task. This is only necessary if using a bilingual voice,

such as Aditi, which can be used for either Indian English (en-IN) or Hindi (hi-IN).

If a bilingual voice is used and no language code is speciﬁed, Amazon Polly uses the default

language of the bilingual voice. The default language for any voice is the one returned by the

DescribeVoices operation for the LanguageCode parameter. For example, if no language code

is speciﬁed, Aditi will use Indian English rather than Hindi.

Type: String

Valid Values include: ar-AE | fi-FI | en-IE | nl-BE | fr-BE

Required: No

LexiconNames

List of one or more pronunciation lexicon names you want the service to apply during synthesis.

Lexicons are applied only if the language of the lexicon is the same as the language of the voice.

Type: Array of strings

Array Members: Maximum number of 5 items.

Pattern: [0-9A-Za-z]{1,20}

Required: No

OutputFormat

The format in which the returned output will be encoded. For audio stream, this will be mp3,

ogg_vorbis, or pcm. For speech marks, this will be json.

Type: String

Valid Values: json | mp3 | ogg_vorbis | pcm

Required: No

OutputUri

Pathway for the output speech ﬁle.

Type: String

Required: No

RequestCharacters

Number of billable characters synthesized.

Type: Integer

Required: No

SampleRate

The audio frequency speciﬁed in Hz.

The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The

default value for standard voices is "22050". The default value for neural voices is "24000". The

default value for long-form voices is "24000". The default value for generative voices is "24000".

Valid values for pcm are "8000" and "16000". The default value is "16000".

Type: String

Required: No

SnsTopicArn

ARN for the SNS topic optionally used for providing status notiﬁcation for a speech synthesis

task.

Type: String

Pattern: ^arn:aws(-(cn|iso(-b)?|us-gov))?:sns:[a-z0-9_-]{1,50}:\d{12}:[a-zA-Z0-9_-]{1,251}([a-zA-Z0-9_-]{0,5}|\.fifo)$

Required: No

SpeechMarkTypes

The type of speech marks returned for the input text.

Type: Array of strings

Array Members: Maximum number of 4 items.

Valid Values: sentence | ssml | viseme | word

Required: No

TaskId

The Amazon Polly generated identiﬁer for a speech synthesis task.

Type: String

Pattern: ^[a-zA-Z0-9_-]{1,100}$

Required: No

TaskStatus

Current status of the individual speech synthesis task.

Type: String

Valid Values: scheduled | inProgress | completed | failed

Required: No

TaskStatusReason

Reason for the current status of a speciﬁc speech synthesis task, including errors if the task has

failed.

Type: String

Required: No

TextType

Speciﬁes whether the input text is plain text or SSML. The default value is plain text.

Type: String

Valid Values: ssml | text

Required: No

VoiceId

Voice ID to use for the synthesis.

Type: String

Required: No

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS SDK for C++

• AWS SDK for Java V2

• AWS SDK for Ruby V3

Voice

Description of the voice.

Contents

AdditionalLanguageCodes

Additional codes for languages available for the speciﬁed voice in addition to its default

language.

For example, the default language for Aditi is Indian English (en-IN) because it was ﬁrst used

for that language. Since Aditi is bilingual and ﬂuent in both Indian English and Hindi, this

parameter would show the code hi-IN.

Type: Array of strings

Valid Values include: ar-AE | fi-FI | en-IE | nl-BE | fr-BE

Required: No

Gender

Gender of the voice.

Type: String

Valid Values: Female | Male

Required: No

Id

Amazon Polly assigned voice ID. This is the ID that you specify when calling the

SynthesizeSpeech operation.

Type: String

Required: No

LanguageCode

Language code of the voice.

Type: String

Valid Values include: ar-AE | fi-FI | en-IE | nl-BE | fr-BE

Required: No

LanguageName

Human readable name of the language in English.

Type: String

Required: No

Name

Name of the voice (for example, Salli, Kendra, etc.). This provides a human readable voice name

that you might display in your application.

Type: String

Required: No

SupportedEngines

Speciﬁes which engines (standard, neural, long-form or generative) are supported by a

given voice.

Type: Array of strings

Valid Values: standard | neural | long-form | generative

Required: No
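The Voice fields above can be combined to pick voices for a given engine and language. The sketch below works on a list of Voice dictionaries such as the one a boto3 Polly client returns under the Voices key of `describe_voices()`; the helper name is hypothetical, and it treats AdditionalLanguageCodes as further languages the voice can speak, as in the Aditi example.

```python
from typing import Any, Dict, Iterable, List, Optional


def voices_for(
    voices: Iterable[Dict[str, Any]],
    engine: str,
    language_code: Optional[str] = None,
) -> List[str]:
    """Return the Id of each voice that supports the engine and, if given, the language."""
    selected: List[str] = []
    for voice in voices:
        if engine not in voice.get("SupportedEngines", []):
            continue
        # A bilingual voice lists its extra languages separately.
        languages = [voice.get("LanguageCode")]
        languages += list(voice.get("AdditionalLanguageCodes", []))
        if language_code is not None and language_code not in languages:
            continue
        selected.append(voice["Id"])
    return selected
```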

See Also

For more information about using this API in one of the language-speciﬁc AWS SDKs, see the

following:

• AWS SDK for C++

• AWS SDK for Java V2

• AWS SDK for Ruby V3

Document History for Amazon Polly

The following table describes important changes in each release of the Amazon Polly Developer

Guide. For notiﬁcation about updates to this documentation, you can subscribe to an RSS feed.

• Latest documentation update: August 27, 2024

Change Description Date

New voices added for NTTS Amazon Polly now provides

two new NTTS voices: Jitka

and Sabrina. See Neural

voices for a list of NTTS

voices.

August 27, 2024

New generative voice engine

added

Amazon Polly now oﬀers

a generative voice engine

designed for longer content,

with three English voices in

a generative variant: Amy,

Matthew, and Ruth. See

Generative voices for more

information.

March 28, 2024

New voice added for NTTS Amazon Polly now provides

the NTTS Turkish voice Burcu.

See Neural voices for a list of

NTTS voices.

February 14, 2024

New long-form voice engine

added

Amazon Polly now oﬀers

a long-form voice engine

designed for longer content,

with three en-US voices:

Danielle, Gregory, and Ruth.

See Long-form voices for

more information.

November 16, 2023

412

Amazon Polly Developer Guide

New voices added for NTTS Amazon Polly now provides

two new NTTS US English

voices: Danielle and Gregory.

See Neural voices for a list of

NTTS voices.

October 5, 2023

Amazon Polly for Windows The Amazon Polly Windows

Speech Application

Programming Interface (SAPI)

plugin will no longer be

supported.

September 26, 2023

Updated quota guidance for

Amazon Polly

Updated Amazon Polly quotas

guide. Added examples and

clariﬁcation of terms. Refer to

Quotas in Amazon Polly for

the updates.

August 17, 2023

New voice added for NTTS Amazon Polly now provides

the Gulf Arabic NTTS voice

Zayd. See Neural voices for a

list of NTTS voices.

August 16, 2023

New voice added for NTTS Amazon Polly now provides

the Belgian French NTTS voice

Isabelle. See Neural voices for

a list of NTTS voices.

August 1, 2023

New voice added for NTTS Amazon Polly now provides

the Belgian Dutch (Flemish)

NTTS voice Lisa. See Neural

voices for a list of NTTS

voices.

June 7, 2023

413

Amazon Polly Developer Guide

New voices added for NTTS Amazon Polly now provides

two new NTTS voices: Irish

English (Niamh), and Danish

(Soﬁe). See Neural voices for a

list of NTTS voices.

May 30, 2023

Updated the IAM guidance for

Amazon Polly

Updated guide to align

with the IAM best practices

. For more information, see

Security best practices in IAM.

April 19, 2023

WordPress update The Amazon Polly WordPress

plugin will no longer be

supported.

April 6, 2023

New Region added Amazon Polly is now available

in the Asia Paciﬁc (Osaka)

AWS Region. This Region

supports neural TTS (NTTS).

For more information, see

Feature and Region Compatibi

lity for a list of regions that

support NTTS.

April 5, 2023

New voices added for NTTS Amazon Polly now provides

two new Japanese NTTS

voices: Kazuha and Tomoko.

See Neural voices for a list of

NTTS voices.

February 7, 2023

New voices added for NTTS Amazon Polly now provides

two new US English NTTS

voices: Stephen and Ruth.

See Neural voices for a list of

NTTS voices.

January 31, 2023

414

Amazon Polly Developer Guide

New voices added for NTTS Amazon Polly now provides

new NTTS voices for: Brazilian

Portuguese (Thiago), Castilian

Spanish (Sergio), French

(Rémi), Italian (Adriano), and

Mexican Spanish (Andrés).

See Neural voices for a list of

NTTS voices.

January 24, 2023

New voices added for NTTS Amazon Polly now provides

NTTS voices for Arabic (Hala)

and Polish (Ola). See Neural

voices for a list of NTTS

voices.

November 17, 2022

Release AWS PrivateLink

support

Amazon Polly now provides

AWS PrivateLink support. See

Using Amazon Polly with VPC

endpoints to learn more.

November 9, 2022

New voices and languages

added for NTTS

Amazon Polly now provides

NTTS voices for Finnish (Suvi),

Norwegian (Ida), and Swedish

(Elin). See Neural voices for a

list of NTTS voices.

November 8, 2022

New voice added for NTTS Amazon Polly now provides

the Dutch NTTS voice Laura.

See Neural voices for a list of

NTTS voices.

November 2, 2022

New Region added Amazon Polly is now available

in the Europe (Paris) AWS

Region. This Region supports

neural TTS (NTTS). For more

information, see Feature and

Region Compatibility for a list

of regions that support NTTS.

September 22, 2022

New voice and language

added for NTTS

Amazon Polly now provides

the Cantonese NTTS voice

Hiujin. See Neural voices for a

list of NTTS voices.

September 20, 2022

New Region added Amazon Polly is now available

in the Asia Pacific (Mumbai)

AWS Region. This Region

supports neural TTS (NTTS).

For more information, see

Feature and Region Compatibility

for a list of regions that

support NTTS.

September 1, 2022

New voice added for NTTS Amazon Polly now provides

the Mandarin voice Zhiyu as

an NTTS voice. See Neural

voices for a list of NTTS

voices.

August 23, 2022

New voice added for NTTS Amazon Polly now provides

the Hindi NTTS voice Kajal.

See Neural voices for a list of

NTTS voices.

July 27, 2022

New voices added for NTTS Amazon Polly now provides

NTTS voices for US Spanish

(Pedro), German (Daniel),

Canadian French (Liam), and

UK English (Arthur). See

Neural voices for a list of

NTTS voices.

June 28, 2022

New voice added for NTTS Amazon Polly now provides

the Portuguese (Brazilian)

voice Vitória as an NTTS voice.

See Neural voices for a list of

NTTS voices.

April 27, 2022

New voice added for NTTS Amazon Polly now provides

the Portuguese (European)

voice Inês as an NTTS voice.

See Neural voices for a list of

NTTS voices.

April 26, 2022

New voice and language

added for NTTS

Amazon Polly now provides

the German (Austrian)

language and the NTTS voice

Hannah. See Neural voices for

a list of NTTS voices.

April 19, 2022

New voices and language

added for NTTS

Amazon Polly now provides

the Spanish (Mexican) voice

Mia as an NTTS voice. A new

language, Catalan, was added

along with the NTTS voice

Arlet. See Neural voices for a

list of NTTS voices.

March 22, 2022

New voice added for NTTS Amazon Polly now provides

the Japanese voice Takumi

as an NTTS voice. See Neural

voices for a list of NTTS

voices.

December 6, 2021

New voice added for NTTS Amazon Polly now provides

the French voice Léa as an

NTTS voice. See Neural voices

for a list of NTTS voices.

November 18, 2021

New voices added for NTTS Amazon Polly now provides

the Italian voice Bianca and

the European Spanish voice

Lucia as NTTS voices. See

Neural voices for a list of

NTTS voices.

November 8, 2021

New voice added for NTTS Amazon Polly now provides

a new South African English

voice, Ayanda. The voice is

available as an NTTS voice

only. See Neural voices for a

list of NTTS voices.

September 1, 2021

New Region added Amazon Polly is now available

in the Africa (Cape Town) AWS

Region. This Region supports

neural TTS (NTTS). For more

information, see Feature and

Region Compatibility for a list

of regions that support NTTS.

September 1, 2021

New language and voice

added

Amazon Polly now supports

New Zealand English (en-NZ).

A new NTTS voice, Aria,

speaks New Zealand English

and a selection of Maori

words.

August 24, 2021

New feature Amazon Polly makes the

conversational speaking

style the default version

for the neural Matthew and

Joanna voices. We removed

references to the conversational

speaking style.

June 28, 2021

New voice added for NTTS Amazon Polly now provides

the German voice Vicki as an

NTTS voice.

June 15, 2021

New voice added A new female voice, Gabrielle,

has been added to the French

(Canadian) (fr-CA) locale. The

voice is high quality and only

available as an NTTS voice.

Like all neural voices, it is only

available in certain regions.

For a list of regions, see

Feature and region compatibility.

June 1, 2021

New voice added for NTTS Amazon Polly now provides

the Korean voice Seoyeon as

an NTTS voice.

May 11, 2021

New Region added for NTTS Amazon Polly now supports

neural TTS (NTTS) in the

Canada (Central) AWS Region.

For more information, see

Feature and Region Compatibility

for NTTS.

March 17, 2021

New voice available for

newscaster style

In addition to the Matthew,

Joanna, and Lupe voices for

the Newscaster speaking

style, Amazon Polly now

provides an additional option

for this speaking style. Using

the neural engine, you can

use the Amy voice in British

English for the Newscaster

style. For more information,

see NTTS Speaking Styles.

November 10, 2020

New Regions added for NTTS In addition to the existing

Regions for NTTS (us-east-1,

us-west-2, eu-west-1, and

ap-southeast-2), neural voices

are now supported in four

additional Regions:

ap-northeast-1 (Tokyo),

ap-southeast-1 (Singapore),

eu-central-1 (Frankfurt), and

eu-west-2 (London). For more

information, see Feature and Region

Compatibility for NTTS.

September 3, 2020

New voice added In addition to child voices

Ivy and Justin, a new male

child voice, Kevin, has been

added to American English

(en-US). This new voice is

very high quality and is only

available as an NTTS voice.

Like all neural voices, it is only

supported in four Regions:

us-east-1 (N. Virginia), us-west-2

(Oregon), eu-west-1 (Ireland),

and ap-southeast-2 (Sydney).

For more information, see

NTTS Voices.

June 16, 2020

New voice available for

newscaster style

In addition to the Matthew

and Joanna voices for the

Newscaster speaking style,

Amazon Polly now provides

an additional option for this

speaking style. Using the

neural engine, you can use

the Lupe voice in Spanish

(American) for the Newscaster

style. For more information,

see NTTS Speaking Styles.

April 16, 2020

New feature In addition to the Newscaster

speaking style, Amazon

Polly now provides a second

NTTS speaking style to help

you synthesize even better

text-to-speech passages.

The Conversational style

uses the neural system to

generate speech in a more

friendly and expressive

conversational style that can

be used in many use cases.

For more information, see

NTTS Speaking Styles.

November 25, 2019

New voices added Two new voices added: Camila

(female, Portuguese-Brazil)

and Lupe (female, Spanish-US).

October 23, 2019

New feature added Addition of Amazon Polly for

Windows plugin to incorporate

the full range of Amazon

Polly voices into Windows

SAPI-compliant applications.

September 26, 2019

Major new feature In addition to the standard

text-to-speech (TTS) voices

supported by Amazon Polly

since its launch, Amazon Polly

now provides an improved

Neural TTS (NTTS) system

that can provide even higher

quality voices, thereby

providing you with the most

natural and human-like

text-to-speech voices possible. For

more information, see Neural

Text-to-Speech.

July 30, 2019

New voices added New voices added: Lucia

(female, Spanish), and Bianca

(female, Italian).

August 2, 2018

New language added New language added:

Mexican Spanish (es-MX). This

language uses the female

voice of Mia.

August 2, 2018

New language added New language added: Hindi

(hi-IN). This language uses the

female voice of Aditi, which is

also used for Indian English,

making Aditi Amazon Polly's

first bilingual voice.

August 2, 2018

New feature added Addition of Speech synthesis

of long text passages (up to

100,000 billed characters).

July 17, 2018

New SSML feature added Addition of Maximum

Duration for Synthesized

Speech.

July 17, 2018

New voice added New voice added: Léa (female,

French).

June 5, 2018

Region expansion Expansion of Amazon Polly

service to all commercial

regions.

June 4, 2018

New language added New language added: Korean

(ko-KR).

June 4, 2018

Expanded feature The Amazon Polly WordPress

Plugin feature, including

addition of Amazon Translate

capabilities.

June 4, 2018

New voices added Two new voices added: Aditi

(female, Indian English) and

Seoyeon (female, Korean).

November 15, 2017

New feature Addition of new Speech

Marks feature, as well as an

expansion of SSML capabilities.

April 19, 2017

New guide This is the first release of

the Amazon Polly Developer

Guide.

November 30, 2016

AWS Glossary

For the latest AWS terminology, see the AWS glossary in the AWS Glossary Reference.
