Developer Guide
Amazon Polly
Copyright © 2024 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon Polly Developer Guide
Amazon's trademarks and trade dress may not be used in connection with any product or service
that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any
manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are
the property of their respective owners, who may or may not be affiliated with, connected to, or
sponsored by Amazon.
Table of Contents
What Is Amazon Polly?
  Benefits
  Are you a first-time user?
How it works
Getting started
  Setting up Amazon Polly
    Sign up for an AWS account
    Create a user with administrative access
  Using Amazon Polly on the console
    Step 1.1: Synthesize speech quick start on the console
    Step 1.2: Synthesize speech with plaintext input on the console
  Using Amazon Polly on the AWS CLI
    Step 2.1: Set up the AWS CLI
    Step 2.2: Getting started exercise using the AWS CLI
  Python examples
    Set up Python and test an example (SDK)
Voices in Amazon Polly
  Listening to voices
  Available voices
    Brand voices
  Voice speed
    Changing your voice speed
  Bilingual voices
    Accented bilingual voices
    Fully bilingual voices
  Newscaster voices
Languages in Amazon Polly
  Phoneme and Viseme Tables for Supported Languages
    Arabic (arb)
    Arabic (Gulf) (ar-AE)
    Catalan (ca-ES)
    Chinese (Cantonese) (yue-CN)
    Chinese (Mandarin) (cmn-CN)
    Danish (da-DK)
    Dutch (Belgian) (nl-BE)
    Dutch (nl-NL)
    English (US) (en-US)
    English (Australian) (en-AU)
    English (British) (en-GB)
    English (Indian) (en-IN)
    English (Ireland) (en-IE)
    English (New Zealand) (en-NZ)
    English (South African) (en-ZA)
    English (Welsh) (en-GB-WLS)
    Finnish (fi-FI)
    French (fr-FR)
    French (Belgian) (fr-BE)
    French (Canadian) (fr-CA)
    German (de-DE)
    German (Austrian) (de-AT)
    Hindi (hi-IN)
    Icelandic (is-IS)
    Italian (it-IT)
    Japanese (ja-JP)
    Korean (ko-KR)
    Norwegian (nb-NO)
    Polish (pl-PL)
    Portuguese (pt-PT)
    Portuguese (Brazilian) (pt-BR)
    Romanian (ro-RO)
    Russian (ru-RU)
    Spanish (es-ES)
    Spanish (Mexican) (es-MX)
    Spanish (US) (es-US)
    Swedish (sv-SE)
    Turkish (tr-TR)
    Welsh (cy-GB)
Voice engines
  Generative engine
    Available generative voices
    Feature and region compatibility
    Using the Generative engine on the console
  Long-form engine
    Available long-form voices
    Feature and region compatibility
    Using the Long-form engine on the console
  Neural engine
    Available neural voices
    Feature and region compatibility
    Using the Neural engine on the console
  Standard engine
    Available Standard voices
    Feature and region compatibility
    Using the Standard engine on the console
Speech marks
  Speech mark types
    Visemes and Amazon Polly
  Using speech marks
    Requesting speech marks
    Speech mark output
    Speech mark examples
  Requesting speech marks on the console
Using SSML
  Reserved characters
  Using SSML on the console
  Using SSML on the AWS CLI
    Using SSML with the Synthesize-Speech command
    Synthesizing an SSML-enhanced document
    Using SSML for common Amazon Polly tasks
  Supported SSML tags
    Identifying SSML-enhanced text
    Adding a pause
    Emphasizing words
    Specifying another language for specific words
    Placing a custom tag in your text
    Adding a pause between paragraphs
    Using phonetic pronunciation
    Controlling volume, speaking rate, and pitch
    Setting a maximum duration for synthesized speech
    Adding a pause between sentences
    Controlling how special types of words are spoken
    Pronouncing acronyms and abbreviations
    Improving pronunciation by specifying parts of speech
    Adding the sound of breathing
    Newscaster speaking style
    Adding dynamic range compression
    Speaking softly
    Controlling timbre
    Whispering
Managing lexicons
  Applying multiple lexicons
  Managing lexicons on the console
    Uploading lexicons on the console
    Applying lexicons on the console (Synthesize Speech)
    Filtering the lexicon list on the console
    Downloading lexicons on the console
    Deleting a lexicon on the console
  Managing lexicons on the AWS CLI
    PutLexicon
    GetLexicon
    ListLexicons
    DeleteLexicon
Creating long audio files
  Setting up the IAM policy for asynchronous synthesis
  Creating long audio files on the console
  Creating long audio files on the AWS CLI
Code and application examples
  Sample code
    Java samples
    Python samples
  Example applications
    Python example
    Java example
    iOS example
    Android example
Quotas
  Supported regions
  Quotas and throttle rates
    Concurrent requests
    Best practices to mitigate throttling
  Pronunciation lexicons
  SynthesizeSpeech API operations
  SpeechSynthesisTask API operations
  Speech Synthesis Markup Language (SSML)
Security
  Data Protection
    Encryption at Rest
    Encryption in Transit
    Internetwork Traffic Privacy
  Identity and Access Management
    Audience
    Authenticating with identities
    Managing access using policies
    How Amazon Polly works with IAM
    Identity-based policy examples
    Amazon Polly API Permissions Reference
    Troubleshooting
  Logging and Monitoring
  Compliance Validation
  Resilience
  Infrastructure Security
  Security Best Practices
  Using Interface VPC Endpoints
    Availability
    Creating a VPC endpoint for Amazon Polly
    Testing the connection between your VPC and Amazon Polly
    Controlling access to your Amazon Polly endpoint
    Support for VPC context keys
Logging Amazon Polly API calls with AWS CloudTrail
  Amazon Polly information in CloudTrail
  Example: Amazon Polly Log File Entries
CloudWatch integration
  Getting CloudWatch Metrics (Console)
  Getting CloudWatch metrics on the AWS CLI
  Amazon Polly Metrics
  Dimensions for Amazon Polly Metrics
API Reference
  Actions
    DeleteLexicon
    DescribeVoices
    GetLexicon
    GetSpeechSynthesisTask
    ListLexicons
    ListSpeechSynthesisTasks
    PutLexicon
    StartSpeechSynthesisTask
    SynthesizeSpeech
  Data Types
    Lexicon
    LexiconAttributes
    LexiconDescription
    SynthesisTask
    Voice
Document History
AWS Glossary
What Is Amazon Polly?
Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to
develop applications that increase engagement and accessibility. Amazon Polly supports multiple
languages and includes a variety of lifelike voices, so you can build speech-enabled applications
that work in multiple locations and use the ideal voice for your customers. You pay only for the
text you synthesize, and you can cache and replay Amazon Polly's generated speech at no
additional cost.
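Because generated speech can be cached and replayed, an application can avoid synthesizing the same text twice. The following is a minimal sketch of that idea, not an official sample. It assumes the AWS SDK for Python (boto3) is installed and credentials are configured for the real call. The names `cache_key`, `cached_speech`, and `polly_synthesize` are hypothetical helpers introduced for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable

# Signature of any function that turns text into audio bytes.
Synthesizer = Callable[[str, str, str, str], bytes]


def cache_key(text: str, voice_id: str, engine: str, output_format: str) -> str:
    """Derive a stable file name from every parameter that affects the audio."""
    payload = "\x1f".join([engine, voice_id, output_format, text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_speech(
    text: str,
    synthesize: Synthesizer,
    cache_dir: str,
    voice_id: str = "Joanna",
    engine: str = "neural",
    output_format: str = "mp3",
) -> bytes:
    """Return audio for `text`, calling `synthesize` only on a cache miss."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    key = cache_key(text, voice_id, engine, output_format)
    path = directory / f"{key}.{output_format}"
    if path.exists():
        # Replay previously generated speech without another request.
        return path.read_bytes()
    audio = synthesize(text, voice_id, engine, output_format)
    path.write_bytes(audio)
    return audio


def polly_synthesize(text: str, voice_id: str, engine: str, output_format: str) -> bytes:
    """Call the SynthesizeSpeech operation (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the caching logic works without the SDK

    client = boto3.client("polly")
    response = client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        Engine=engine,
        OutputFormat=output_format,
    )
    return response["AudioStream"].read()
```

For example, `cached_speech("Hello from Amazon Polly", polly_synthesize, "polly-cache")` would send one request the first time and read the saved file on later calls. Hashing the voice, engine, and output format along with the text keeps differently configured requests from sharing a cache entry.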
Amazon Polly offers several voice options, including generative, long-form, neural, and standard
text-to-speech (TTS) voices. These voices use machine learning technology to improve speech
quality and to sound as natural and human-like as possible. Neural TTS technology also supports
a Newscaster speaking style, tailored to news narration use cases.
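As an illustration of the Newscaster speaking style, the sketch below builds the parameters for a SynthesizeSpeech request that wraps text in the SSML `<amazon:domain name="news">` tag. It is a hedged example rather than an official sample: `newscaster_request` and `NEWSCASTER_VOICES` are hypothetical names, and the voice set is limited to the two US English voices this guide names for the style.

```python
from xml.sax.saxutils import escape

# Assumption: only the voices this guide names for the Newscaster style.
NEWSCASTER_VOICES = {"Matthew", "Joanna"}


def newscaster_request(text: str, voice_id: str = "Matthew") -> dict:
    """Build SynthesizeSpeech parameters that apply the Newscaster style."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id!r} is not in the assumed Newscaster voice set")
    # Escape &, <, and > so the plain text is safe inside SSML.
    ssml = (
        "<speak>"
        '<amazon:domain name="news">'
        f"{escape(text)}"
        "</amazon:domain>"
        "</speak>"
    )
    return {
        "Engine": "neural",      # the style is a Neural TTS feature
        "VoiceId": voice_id,
        "TextType": "ssml",      # tells the service that Text contains SSML
        "Text": ssml,
        "OutputFormat": "mp3",
    }
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("polly").synthesize_speech(**newscaster_request("Markets closed higher today."))`.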
Common use cases for Amazon Polly include, but are not limited to: mobile applications such as
newsreaders, games, eLearning platforms, accessibility applications for visually impaired people,
and the rapidly growing segment of Internet of Things (IoT).
Amazon Polly is certified for use with regulated workloads for HIPAA (the Health Insurance
Portability and Accountability Act of 1996) and the Payment Card Industry Data Security Standard
(PCI DSS).
Benefits
Some of the benefits of using Amazon Polly include:
High quality – Amazon Polly offers highly-performant generative, long-form, neural, and high-
quality text-to-speech (TTS) voices. These technologies synthesize natural speech with high
pronunciation accuracy (including abbreviations, acronym expansions, date/time interpretations,
and homograph disambiguation).
Low latency – Amazon Polly achieves fast responses, which makes it a viable option for low-
latency use cases such as dialogue systems.
Support for a large portfolio of languages and voices – Amazon Polly supports dozens of
voices and languages, offering male and female voice options for most languages. This number
will continue to increase as we bring more neural voices online. US English voices Matthew and
Joanna can also use the Neural Newscaster speaking style, similar to what you might hear from a
professional news anchor.
Cost-effective – Amazon Polly's pay-per-use model means that there are no setup costs. Start
small and scale up as your application grows.
Cloud-based solution – On-device TTS solutions require significant computing resources,
notably CPU power, RAM, and disk space. These can result in higher development costs and
higher power consumption on devices such as tablets, smartphones, and so on. In contrast,
TTS conversion done in the AWS Cloud dramatically reduces local resource requirements. This
enables support of all the available languages and voices with outstanding quality. Moreover,
speech improvements are instantly available to all end users and don't require additional
updates for devices.
Note
To hear example Amazon Polly voices in your browser, see the Amazon Polly product
overview.
Are you a first-time user?
If you're a first-time user of Amazon Polly, we recommend that you read the following sections in
the listed order:
1. How Amazon Polly works – This section introduces various Amazon Polly inputs and options
that you can work with in order to create a simple experience.
2. Getting started with Amazon Polly – In this section, you set up your account and test Amazon
Polly speech synthesis.
3. Example applications – This section provides additional examples that you can use to explore
Amazon Polly.
How Amazon Polly works
Amazon Polly converts input text into lifelike speech. To use an Amazon Polly voice, choose a
voice engine, call a speech synthesis method, provide the text that you want to synthesize, then
specify an audio output format. Amazon Polly then synthesizes the provided text into a high-
quality speech audio stream.
Input text – Provide the text that you want to synthesize, and Amazon Polly returns an audio
stream. You can provide the input as plaintext or in Speech Synthesis Markup Language (SSML)
format. With SSML you can control various aspects of speech, such as pronunciation, volume,
pitch, and speech rate. For more information, see Generating speech from SSML documents.
Available voices – Amazon Polly provides a portfolio of languages and a variety of voices,
including a bilingual voice (for both English and Hindi). For most languages you can choose from
several voices, both male and female. When launching a speech synthesis task, you specify the
voice ID, and then Amazon Polly uses this voice to convert the text to speech. Amazon Polly is
not a translation service—the synthesized speech is in the same language as the text. Numbers
represented as digits (for example, 53, not fifty-three) are synthesized in the language of the
voice and not the text. For more information, see Voices in Amazon Polly.
Output format – Amazon Polly can deliver the synthesized speech in multiple formats. You can
select the audio format that suits your needs. For example, you might request the speech in
the MP3 or Ogg Vorbis format for consumption by web and mobile applications. Or, you might
request the PCM output format for consumption by AWS IoT devices and telephony solutions.
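The three choices described above (input text, voice, and output format), together with the voice engine, map onto the parameters of a SynthesizeSpeech request. The following is a minimal sketch, assuming the AWS SDK for Python (Boto3) parameter names Text, TextType, VoiceId, OutputFormat, and Engine. The helper name build_synthesis_request is hypothetical; it only assembles and checks the request and does not call the service.

```python
"""Sketch: assemble the parameters for a SynthesizeSpeech request."""

# Values named in this guide; the service may accept others over time.
ENGINES = {"standard", "neural", "long-form", "generative"}
OUTPUT_FORMATS = {"mp3", "ogg_vorbis", "pcm", "json"}


def build_synthesis_request(text, voice_id, output_format="mp3",
                            engine="neural", ssml=False):
    """Return keyword arguments for synthesize_speech (hypothetical helper)."""
    if engine not in ENGINES:
        raise ValueError("unknown engine: %s" % engine)
    if output_format not in OUTPUT_FORMATS:
        raise ValueError("unknown output format: %s" % output_format)
    return {
        "Text": text,
        # Plaintext and SSML input are distinguished by TextType.
        "TextType": "ssml" if ssml else "text",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": engine,
    }


request = build_synthesis_request("Hello world!", "Joanna")
print(request["TextType"], request["OutputFormat"], request["Engine"])

# With credentials configured, the request could be sent like this:
#   import boto3
#   response = boto3.client("polly").synthesize_speech(**request)
```

Keeping the request as a plain dictionary makes it easy to check the chosen engine and format before any call is made to Amazon Polly.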
Note
To hear example Amazon Polly voices in your browser, see the Amazon Polly product
overview.
Are you a first-time user?
If you're new to Amazon Polly, we recommend that you read the following topics in order:
Getting started with Amazon Polly
Example applications
Quotas in Amazon Polly
Getting started with Amazon Polly
Amazon Polly provides several API operations that you can easily integrate with your existing
applications. For a list of supported operations, see Actions. You can use either of the following
options:
AWS SDKs – When using the SDKs, your requests to Amazon Polly are automatically signed and
authenticated using the credentials you provide. This is the recommended choice for building
your applications.
AWS CLI – You can use the AWS CLI to access Amazon Polly without writing any code.
The following sections describe how to get started using Amazon Polly.
Topics
Setting up Amazon Polly
Using Amazon Polly on the console
Using Amazon Polly on the AWS CLI
Python examples
Setting up Amazon Polly
Before you use Amazon Polly for the first time, you must sign up for AWS. When you sign up for
Amazon Web Services (AWS), your AWS account is automatically signed up for all services in AWS,
including Amazon Polly. You're charged only for the services and resources that you use. If you're a
new AWS customer, you can get started with Amazon Polly with no charge. For more information,
see AWS Free Usage Tier.
If you already have an AWS account, you can move on to either of the following activities:
Using Amazon Polly on the console
Using Amazon Polly on the AWS CLI
Sign up for an AWS account
If you do not have an AWS account, complete the following steps to create one.
To sign up for an AWS account
1. Open https://portal.aws.amazon.com/billing/signup.
2. Follow the online instructions.
Part of the sign-up procedure involves receiving a phone call and entering a verification code
on the phone keypad.
When you sign up for an AWS account, an AWS account root user is created. The root user
has access to all AWS services and resources in the account. As a security best practice, assign
administrative access to a user, and use only the root user to perform tasks that require root
user access.
AWS sends you a confirmation email after the sign-up process is complete. At any time, you can
view your current account activity and manage your account by going to https://aws.amazon.com/
and choosing My Account.
Create a user with administrative access
After you sign up for an AWS account, secure your AWS account root user, enable AWS IAM Identity
Center, and create an administrative user so that you don't use the root user for everyday tasks.
Secure your AWS account root user
1. Sign in to the AWS Management Console as the account owner by choosing Root user and
entering your AWS account email address. On the next page, enter your password.
For help signing in by using root user, see Signing in as the root user in the AWS Sign-In User
Guide.
2. Turn on multi-factor authentication (MFA) for your root user.
For instructions, see Enable a virtual MFA device for your AWS account root user (console) in
the IAM User Guide.
Create a user with administrative access
1. Enable IAM Identity Center.
For instructions, see Enabling AWS IAM Identity Center in the AWS IAM Identity Center User
Guide.
2. In IAM Identity Center, grant administrative access to a user.
For a tutorial about using the IAM Identity Center directory as your identity source, see
Configure user access with the default IAM Identity Center directory in the AWS IAM Identity
Center User Guide.
Sign in as the user with administrative access
To sign in with your IAM Identity Center user, use the sign-in URL that was sent to your email
address when you created the IAM Identity Center user.
For help signing in using an IAM Identity Center user, see Signing in to the AWS access portal in
the AWS Sign-In User Guide.
Assign access to additional users
1. In IAM Identity Center, create a permission set that follows the best practice of applying least-
privilege permissions.
For instructions, see Create a permission set in the AWS IAM Identity Center User Guide.
2. Assign users to a group, and then assign single sign-on access to the group.
For instructions, see Add groups in the AWS IAM Identity Center User Guide.
For more information about IAM, see the following:
AWS Identity and Access Management (IAM)
Getting started
IAM User Guide
Note
Note your AWS account ID. You will need it in the next steps.
Using Amazon Polly on the console
From the Amazon Polly console, you can quickly start testing and using Amazon Polly's speech
synthesizing features. The Amazon Polly console supports synthesizing speech from either
plaintext or SSML input.
Topics
Step 1.1: Synthesize speech quick start on the console
Step 1.2: Synthesize speech with plaintext input on the console
Step 1.1: Synthesize speech quick start on the console
From the console, you can quickly test Amazon Polly speech synthesis for speech quality.
To listen to an Amazon Polly voice on the console
1. Sign in to the AWS Management Console and open the Amazon Polly console at https://
console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab. The text field will load with example text so you can quickly
try out Amazon Polly.
3. Turn off SSML.
4. Under Engine, choose Generative, Long Form, Neural, or Standard.
5. Choose a language and AWS Region, then choose a voice. (If you select Neural for Engine, only
the languages and voices that support neural TTS (NTTS) are available. All Standard and Long Form
voices are disabled.)
6. Choose Listen.
For more in-depth testing, see the following topics:
Step 1.2: Synthesize speech with plaintext input on the console
Using SSML on the console
Applying lexicons on the console (Synthesize Speech)
Step 1.2: Synthesize speech with plaintext input on the console
The following procedure synthesizes speech using plaintext input. (Note how "W3C" and the date
"10/3" (October 3) are synthesized.)
To synthesize speech using plaintext input on the console
1. After logging on to the Amazon Polly console, choose Try Amazon Polly. Then choose the
Text-to-Speech tab.
2. Turn off SSML.
3. Type or paste this text into the input box.
He was caught up in the game.
In the middle of the 10/3/2014 W3C meeting
he shouted, "Score!" quite loudly.
4. For Engine, choose Generative, Long Form, Neural, or Standard.
5. Choose a language and AWS Region, then choose a voice. (If you choose Neural for Engine,
only the languages and voices that support NTTS are available. All Standard and Long Form
voices are disabled.)
6. To listen to the speech immediately, choose Listen.
7. To save the speech to a file, do one of the following:
a. Choose Download.
b. To change to a different file format, expand Additional settings, turn on Speech file
format settings, choose the file format that you want, and then choose Download.
For more in-depth examples, see the following topics:
Applying lexicons on the console (Synthesize Speech)
Using SSML on the console
Using Amazon Polly on the AWS CLI
You can perform almost all of the same operations on the Amazon Polly console and the AWS CLI.
However, you can't listen to synthesized speech on the AWS CLI. To work with audio on the AWS
CLI, save the synthesized speech to a file. Then open the file in an audio application of your choice.
Topics
Step 2.1: Set up the AWS CLI
Step 2.2: Getting started exercise using the AWS CLI
Step 2.1: Set up the AWS CLI
Follow these steps to download and configure the AWS CLI to work with Amazon Polly.
Important
You don't need the AWS CLI to perform the steps in this exercise. However, some of the
exercises in this guide use the AWS CLI. You can skip this step and go to Step 2.2: Getting
started exercise using the AWS CLI, and then set up the AWS CLI later when you need it.
Set up the AWS CLI
To set up the AWS Command Line Interface
1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS
Command Line Interface User Guide:
Getting Set Up with the AWS Command Line Interface
Configuring the AWS Command Line Interface
2. Add a named profile for the administrator user in the AWS CLI config file. You can use
this profile when running the AWS CLI commands. For more information about named profiles,
see Named Profiles in the AWS Command Line Interface User Guide.
[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region
For a list of available AWS Regions and those supported by Amazon Polly, see Regions and
Endpoints in the Amazon Web Services General Reference.
Note
If you're using a Region supported by Amazon Polly that you specified when you
configured the AWS CLI, omit the following line from the AWS CLI code examples.
--region aws-region
3. Verify the setup by typing the following help command at the command prompt.
aws help
A list of valid AWS commands should appear in the AWS CLI window.
Activate Amazon Polly from the AWS CLI
If you've previously downloaded and configured the AWS CLI, Amazon Polly may be unavailable
unless you reconfigure the AWS CLI. The following procedure checks to see if this is necessary.
To activate Amazon Polly from the AWS CLI
1. Verify the availability of Amazon Polly by typing the following help command at the AWS CLI
command prompt.
aws polly help
If you see a description of Amazon Polly and a list of valid commands appears in the AWS CLI
window, you can use Amazon Polly from the AWS CLI immediately. In this case, you can skip
the rest of this procedure. If this is not displayed, continue with Step 2.
2. Activate Amazon Polly using one of the two following options:
a. Uninstall and reinstall the AWS CLI.
For instructions, see Installing the AWS Command Line Interface in the AWS Command
Line Interface User Guide.
or
b. Download the file service-2.json.
At the command prompt, run the following command.
aws configure add-model --service-model file://service-2.json --service-name
polly
3. Reverify the availability of Amazon Polly.
aws polly help
The description of Amazon Polly should be visible.
Set up a voice engine from the AWS CLI
From the AWS CLI, the engine parameter is optional, with four possible values: generative,
long-form, neural, and standard. For example, the following command runs
start-speech-synthesis-task with the neural engine in the US West (Oregon) Region (us-west-2):
aws polly start-speech-synthesis-task \
--engine neural \
--region us-west-2 \
--endpoint-url "https://polly.us-west-2.amazonaws.com/" \
--output-format mp3 \
--output-s3-bucket-name your-bucket-name \
--output-s3-key-prefix optional/prefix/path/file \
--voice-id Joanna \
--text file://text_file.txt
The output will resemble the following:
"SynthesisTask":
{
"CreationTime": [..],
"Engine": "neural",
"OutputFormat": "mp3",
"OutputUri": "https://s3.us-west-2.amazonaws.com/your-bucket-name/optional/prefix/path/file.<task_id>.mp3",
"TextType": "text",
"RequestCharacters": [..],
"TaskStatus": "scheduled",
"TaskId": [task_id],
"VoiceId": "Joanna"
}
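The same task can be started from Python. The following is a minimal sketch, assuming the Boto3 start_speech_synthesis_task operation and its Engine, OutputFormat, OutputS3BucketName, OutputS3KeyPrefix, VoiceId, and Text parameters. The function name start_neural_task and its arguments are illustrative, and the bucket name and key prefix are the placeholders used in the command above.

```python
"""Sketch: start an asynchronous synthesis task that writes audio to Amazon S3."""


def start_neural_task(polly, text, bucket, key_prefix):
    """Start a synthesis task with the neural engine and return its description.

    `polly` is any object that exposes start_speech_synthesis_task, such as
    boto3.client("polly"). The function name and arguments are illustrative.
    """
    response = polly.start_speech_synthesis_task(
        Engine="neural",
        OutputFormat="mp3",
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=key_prefix,
        VoiceId="Joanna",
        Text=text,
    )
    return response["SynthesisTask"]


# With credentials configured, the helper could be used like this:
#   import boto3
#   client = boto3.client("polly", region_name="us-west-2")
#   task = start_neural_task(client, "Hello world!",
#                            "your-bucket-name", "optional/prefix/path/file")
#   print(task["TaskId"], task["TaskStatus"])
```

Passing the client in as an argument keeps the Region and credentials in one place and lets the helper be exercised with a stand-in object during testing.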
Step 2.2: Getting started exercise using the AWS CLI
If you've already set up the AWS CLI, you can test the speech synthesis offered by Amazon Polly. In
this exercise, you call the SynthesizeSpeech operation by passing input text. You can save the
resulting audio as a file and verify its content.
1. Run the synthesize-speech AWS CLI command to synthesize sample text to an audio file
(hello.mp3).
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use
full quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
--output-format mp3 \
--voice-id Joanna \
--text 'Hello, my name is Joanna. I learned about the W3C on 10/3 of last year.' \
hello.mp3
In the call to synthesize-speech, you provide sample text to be synthesized by a voice
of your choice. You must provide a voice ID (explained in the following step) and an output
format. The command saves the resulting audio to the hello.mp3 file. In addition to the MP3
file, the operation sends the following output to the console.
{
"ContentType": "audio/mpeg",
"RequestCharacters": "71"
}
2. Play the resulting hello.mp3 file to verify the synthesized speech.
3. Get the list of available voices by using the DescribeVoices operation. Run the following
describe-voices AWS CLI command.
aws polly describe-voices
In response, Amazon Polly returns the list of all available voices. For each voice, the response
provides the following metadata: voice ID, name, language code, language name, gender, and the
engines that the voice supports. The following is a sample response.
{
"Voices": [
{
"Gender": "Female",
"Name": "Salli",
"LanguageName": "US English",
"Id": "Salli",
"LanguageCode": "en-US",
"SupportedEngines": [
"neural",
"standard",
"generative"
]
},
{
"Gender": "Female",
"Name": "Danielle",
"LanguageName": "US English",
"Id": "Danielle",
"LanguageCode": "en-US",
"SupportedEngines": [
"long-form"
]
}
]
}
Optionally, you can specify the language code to find the available voices for a specific
language. Amazon Polly supports dozens of voices. The following example lists all the voices
for Brazilian Portuguese.
aws polly describe-voices \
--language-code pt-BR
For a list of language codes, see Languages in Amazon Polly. These language codes are
W3C language identification tags (the ISO 639 language code, a hyphen, and the ISO 3166
country code), for example en-US (US English), en-GB (British English), and es-ES (Spanish).
You can also use the help option in the AWS CLI to get the list of language codes:
aws polly describe-voices help
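Because each voice in the DescribeVoices response lists its supported engines, the response can be filtered before a voice is chosen. The following is a minimal sketch that works on a response shaped like the sample shown earlier in this step. The helper name voices_for_engine is hypothetical, and the sample data is a trimmed copy of that response rather than live service output.

```python
"""Sketch: pick the voices that support a given engine from a DescribeVoices response."""


def voices_for_engine(response, engine):
    """Return the IDs of voices whose SupportedEngines list contains `engine`."""
    return [
        voice["Id"]
        for voice in response.get("Voices", [])
        if engine in voice.get("SupportedEngines", [])
    ]


# Trimmed copy of the sample response shown earlier in this step.
sample = {
    "Voices": [
        {"Id": "Salli", "LanguageCode": "en-US",
         "SupportedEngines": ["neural", "standard", "generative"]},
        {"Id": "Danielle", "LanguageCode": "en-US",
         "SupportedEngines": ["long-form"]},
    ]
}

print(voices_for_engine(sample, "neural"))     # ['Salli']
print(voices_for_engine(sample, "long-form"))  # ['Danielle']

# With credentials configured, a live response could be obtained with:
#   import boto3
#   response = boto3.client("polly").describe_voices(LanguageCode="pt-BR")
```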
Python examples
This guide provides a few Python code examples that use AWS SDK for Python (Boto) to make API
calls to Amazon Polly. We recommend that you set up Python and test the example code provided
in the following section. For additional examples, see Example applications.
Set up Python and test an example (SDK)
To test the Python example code, you need the AWS SDK for Python (Boto). For instructions, see
AWS SDK for Python (Boto3).
To test the example Python code
The following Python code example performs the following actions:
Invokes the AWS SDK for Python (Boto) to send a SynthesizeSpeech request to Amazon Polly
(by providing some text as input).
Accesses the resulting audio stream in the response and saves the audio to a file (speech.mp3)
on your local disk.
Plays the audio file with the default audio player for your local system.
Save the code to a file (example.py) and run it.
"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3",
                                       VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important because the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")
        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)
else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # The following works on macOS and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])
For additional examples including an example application, see Example applications.
Voices in Amazon Polly
Amazon Polly provides dozens of lifelike voices and support for a variety of languages. Each voice
is created using native language speakers, so there are variations from voice to voice, even within
the same language. You can also use the AWS Management Console to test each voice with text of
your choice. For most languages, there will be at least one male and one female voice, and often
more than one of each. A few languages only have a single voice.
Note
To hear example Amazon Polly voices in your browser, see the Amazon Polly product
overview.
Topics
Listening to voices
Available voices
Voice speed
Bilingual voices
Newscaster voices
Listening to voices
Once you have set up Amazon Polly, you can test voices using custom text on the console.
To listen to Amazon Polly voices on the console
1. Sign in to the AWS Management Console and open the Amazon Polly console at https://
console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab.
3. For Engine, choose Generative, Long Form, Neural, or Standard.
4. Select a language and a Region. Then choose a voice.
5. Enter text for the voice to speak or use the default phrase, and then choose Listen.
Note
The inventory of voices and the number of languages included is continually being updated
to include additional choices. To suggest a new language or voice, provide feedback on
this page. Unfortunately, we are not able to comment on plans for specific new languages
before they are released.
Available voices
Amazon Polly provides a variety of lifelike voices in multiple languages for synthesizing speech
from text. The following table shows all the voices that Amazon Polly offers.
No. | Language and language variants | Language code | Name/ID | Gender | Generative voice | Long Form voice | Neural voice | Standard voice
1 | Arabic | arb | Zeina | Female | No | No | No | Yes
2 | Arabic (Gulf) | ar-AE | Hala* | Female | No | No | Yes | No
2 | Arabic (Gulf) | ar-AE | Zayd* | Male | No | No | Yes | No
3 | Dutch (Belgian) | nl-BE | Lisa | Female | No | No | Yes | No
4 | Catalan | ca-ES | Arlet | Female | No | No | Yes | No
5 | Czech | cs-CZ | Jitka | Female | No | No | Yes | No
6 | Chinese (Cantonese) | yue-CN | Hiujin | Female | No | No | Yes | No
7 | Chinese (Mandarin) | cmn-CN | Zhiyu | Female | No | No | Yes | Yes
8 | Danish | da-DK | Naja | Female | No | No | No | Yes
8 | Danish | da-DK | Mads | Male | No | No | No | Yes
8 | Danish | da-DK | Sofie | Female | No | No | Yes | No
9 | Dutch | nl-NL | Laura | Female | No | No | Yes | No
9 | Dutch | nl-NL | Lotte | Female | No | No | No | Yes
9 | Dutch | nl-NL | Ruben | Male | No | No | No | Yes
10 | English (Australian) | en-AU | Nicole | Female | No | No | No | Yes
10 | English (Australian) | en-AU | Olivia | Female | No | No | Yes | No
10 | English (Australian) | en-AU | Russell | Male | No | No | No | Yes
11 | English (British) | en-GB | Amy** | Female | Yes | No | Yes | Yes
11 | English (British) | en-GB | Emma | Female | No | No | Yes | Yes
11 | English (British) | en-GB | Brian | Male | No | No | Yes | Yes
11 | English (British) | en-GB | Arthur | Male | No | No | Yes | No
12 | English (Indian) | en-IN | Aditi* | Female | No | No | No | Yes
12 | English (Indian) | en-IN | Raveena | Female | No | No | No | Yes
12 | English (Indian) | en-IN | Kajal* | Female | No | No | Yes | No
13 | English (Ireland) | en-IE | Niamh | Female | No | No | Yes | No
14 | English (New Zealand) | en-NZ | Aria | Female | No | No | Yes | No
15 | English (South African) | en-ZA | Ayanda | Female | No | No | Yes | No
16 | English (US) | en-US | Danielle | Female | No | Yes | Yes | No
16 | English (US) | en-US | Gregory | Male | No | Yes | Yes | No
16 | English (US) | en-US | Ivy | Female (child) | No | No | Yes | Yes
16 | English (US) | en-US | Joanna** | Female | No | No | Yes | Yes
16 | English (US) | en-US | Kendra | Female | No | No | Yes | Yes
16 | English (US) | en-US | Kimberly | Female | No | No | Yes | Yes
16 | English (US) | en-US | Salli | Female | No | No | Yes | Yes
16 | English (US) | en-US | Joey | Male | No | No | Yes | Yes
16 | English (US) | en-US | Justin | Male (child) | No | No | Yes | No
16 | English (US) | en-US | Kevin | Male (child) | No | No | Yes | Yes
16 | English (US) | en-US | Matthew** | Male | Yes | No | Yes | No
16 | English (US) | en-US | Ruth | Female | Yes | Yes | Yes | No
16 | English (US) | en-US | Stephen | Male | No | No | Yes | No
17 | English (Welsh) | en-GB-WLS | Geraint | Male | No | No | No | Yes
18 | Finnish | fi-FI | Suvi | Female | No | No | Yes | No
19 | French | fr-FR | Céline/Celine | Female | No | No | No | Yes
19 | French | fr-FR | Léa | Female | No | No | Yes | Yes
19 | French | fr-FR | Mathieu | Male | No | No | No | Yes
19 | French | fr-FR | Rémi | Male | No | No | Yes | No
20 | French (Belgian) | fr-BE | Isabelle | Female | No | No | Yes | No
21 | French (Canadian) | fr-CA | Chantal | Female | No | No | No | Yes
21 | French (Canadian) | fr-CA | Gabrielle | Female | No | No | Yes | No
21 | French (Canadian) | fr-CA | Liam | Male | No | No | Yes | No
22 | German | de-DE | Marlene | Female | No | No | No | Yes
22 | German | de-DE | Vicki | Female | No | No | Yes | Yes
22 | German | de-DE | Hans | Male | No | No | No | Yes
22 | German | de-DE | Daniel | Male | No | No | Yes | No
23 | German (Austrian) | de-AT | Hannah | Female | No | No | Yes | No
24 | German (Swiss) | de-CH | Sabrina | Female | No | No | Yes | No
25 | Hindi | hi-IN | Aditi* | Female | No | No | No | Yes
25 | Hindi | hi-IN | Kajal* | Female | No | No | Yes | No
26 | Icelandic | is-IS | Dóra/Dora | Female | No | No | No | Yes
26 | Icelandic | is-IS | Karl | Male | No | No | No | Yes
27 | Italian | it-IT | Carla | Female | No | No | No | Yes
27 | Italian | it-IT | Bianca | Female | No | No | Yes | Yes
27 | Italian | it-IT | Giorgio | Male | No | No | No | Yes
27 | Italian | it-IT | Adriano | Male | No | No | Yes | No
28 | Japanese | ja-JP | Mizuki | Female | No | No | No | Yes
28 | Japanese | ja-JP | Takumi | Male | No | No | Yes | Yes
28 | Japanese | ja-JP | Kazuha | Female | No | No | Yes | No
28 | Japanese | ja-JP | Tomoko | Female | No | No | Yes | No
29 | Korean | ko-KR | Seoyeon | Female | No | No | Yes | Yes
30 | Norwegian | nb-NO | Liv | Female | No | No | No | Yes
30 | Norwegian | nb-NO | Ida | Female | No | No | Yes | No
31 | Polish | pl-PL | Ewa | Female | No | No | No | Yes
31 | Polish | pl-PL | Maja | Female | No | No | No | Yes
31 | Polish | pl-PL | Jacek | Male | No | No | No | Yes
31 | Polish | pl-PL | Jan | Male | No | No | No | Yes
31 | Polish | pl-PL | Ola | Female | No | No | Yes | No
32 | Portuguese (Brazilian) | pt-BR | Camila | Female | No | No | Yes | Yes
32 | Portuguese (Brazilian) | pt-BR | Vitória/Vitoria | Female | No | No | Yes | Yes
32 | Portuguese (Brazilian) | pt-BR | Ricardo | Male | No | No | No | Yes
32 | Portuguese (Brazilian) | pt-BR | Thiago | Male | No | No | Yes | No
33 | Portuguese (European) | pt-PT | Inês/Ines | Female | No | No | Yes | Yes
33 | Portuguese (European) | pt-PT | Cristiano | Male | No | No | No | Yes
34 | Romanian | ro-RO | Carmen | Female | No | No | No | Yes
35 | Russian | ru-RU | Tatyana | Female | No | No | No | Yes
35 | Russian | ru-RU | Maxim | Male | No | No | No | Yes
36 | Spanish (European) | es-ES | Conchita | Female | No | No | No | Yes
36 | Spanish (European) | es-ES | Lucia | Female | No | No | Yes | Yes
36 | Spanish (European) | es-ES | Enrique | Male | No | No | No | Yes
36 | Spanish (European) | es-ES | Sergio | Male | No | No | Yes | No
37 | Spanish (Mexican) | es-MX | Mia | Female | No | No | Yes | Yes
37 | Spanish (Mexican) | es-MX | Andrés | Male | No | No | Yes | No
38 | Spanish (US) | es-US | Lupe** | Female | No | No | Yes | Yes
38 | Spanish (US) | es-US | Penélope/Penelope | Female | No | No | No | Yes
38 | Spanish (US) | es-US | Miguel | Male | No | No | No | Yes
38 | Spanish (US) | es-US | Pedro | Male | No | No | Yes | No
39 | Swedish | sv-SE | Astrid | Female | No | No | No | Yes
39 | Swedish | sv-SE | Elin | Female | No | No | Yes | No
40 | Turkish | tr-TR | Filiz | Female | No | No | No | Yes
40 | Turkish | tr-TR | Burcu | Female | No | No | Yes | No
41 | Welsh | cy-GB | Gwyneth | Female | No | No | No | Yes
* This voice is bilingual. For more information, see Bilingual voices.
** These voices can be used with Newscaster speaking styles when used with the Neural format. For
more information, see Newscaster voices.
Each Amazon Polly voice engine has unique features. Learn more about features and Region
availability for the voice engines offered by Amazon Polly:
Generative voices
Long-form voices
Neural voices
Standard voices
Brand voices
In addition to the available voices listed in the previous table, you can use Amazon Polly to build a
custom voice for your brand persona. With a brand voice, you can offer unique and exclusive voices
to your customers. To learn more about Amazon Polly brand voices, see Brand Voice.
Voice speed
Because of the natural variation between voices, each available voice speaks at a slightly different
speed. For instance, with US English voices, Ivy and Joanna are slightly faster than Matthew, and
considerably faster than Joey. Since there is so much variation between voices, there is no standard
speed (words per minute) available for Amazon Polly voices. However, you can find how long it
takes for your voice to say the selected text using Speech Marks.
To time the length of a spoken text passage
1. Open the AWS CLI.
2. Run the following code, filling in as needed.
aws polly synthesize-speech \
--language-code [optional language code, if needed] \
--output-format json \
--voice-id [name of desired voice] \
--text '[desired text]' \
--speech-mark-types='["viseme"]' \
LengthOfText.txt
3. Open LengthOfText.txt.
If the text were "Mary had a little lamb," the last few lines returned by Amazon Polly would be:
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
The last viseme, essentially the sound for the final letters in "lamb," starts 1082 milliseconds after
the beginning of the speech. While this is not exactly the length of the audio, it's close and can
serve as the basis for comparison between voices.
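The comparison described above can be automated by reading the speech marks file. The following is a minimal sketch, assuming the output contains one JSON object per line with time, type, and value fields, as in the sample lines shown. The helper name last_viseme_time is hypothetical.

```python
"""Sketch: estimate speech length from viseme speech marks."""
import json


def last_viseme_time(speech_marks_text):
    """Return the start time (ms) of the last viseme mark, or None if there is none."""
    last = None
    for line in speech_marks_text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark = json.loads(line)  # each non-empty line is one JSON object
        if mark.get("type") == "viseme":
            last = mark["time"]
    return last


# The last three speech marks from the "Mary had a little lamb" example.
sample = """
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
"""
print(last_viseme_time(sample))  # 1082
```

To compare voices, you could run the same text through each voice, read each resulting file (for example, LengthOfText.txt), and compare the values returned by this helper.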
Changing your voice speed
For certain applications, you might want a voice to speak more slowly or more quickly. Amazon Polly
lets you change the speaking rate using SSML tags. For example, if your organization is building an
application that reads books to immigrant audiences who speak English with limited fluency, you
can slow down the rate of speech using the SSML <prosody> tag.
You can use a percentage:
<speak>
In some cases, it might help your audience to <prosody rate="85%">slow
the speaking rate slightly to aid in comprehension.</prosody>
</speak>
Or a preset speed:
<speak>
In some cases, it might help your audience to <prosody rate="slow">slow
the speaking rate slightly to aid in comprehension.</prosody>
</speak>
Two speed options are available to you when using SSML with Amazon Polly:
Preset speeds: x-slow, slow, medium, fast, and x-fast. In these cases, the speed of each
option is approximate, depending on your preferred voice. The medium option is the normal
speed of the voice.
n% of speech rate: any percentage of the speech rate, between 20% and 200% can be used. In
these cases, you can choose exactly the speed you want. However, the actual speed of the voice
is approximate, depending on the voice you've chosen. 100% is considered to be the normal
speed of the voice.
Note
Test your selected voice at various speeds. The speed of each option is approximate and
depends on the voice you choose.
For more information on using the prosody tag, see Controlling volume, speaking rate, and pitch.
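When the speaking rate is set by an application, it can help to check the value before building the SSML. The following is a minimal sketch, assuming only the preset names and the 20% to 200% range described in this section. The helper name with_rate is hypothetical, and the resulting string would be sent to Amazon Polly as SSML input.

```python
"""Sketch: wrap text in an SSML <prosody> tag with a checked speaking rate."""
from xml.sax.saxutils import escape

PRESET_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def with_rate(text, rate):
    """Return an SSML document that speaks `text` at `rate`.

    `rate` is a preset name or an integer percentage from 20 to 200.
    """
    if isinstance(rate, int):
        if not 20 <= rate <= 200:
            raise ValueError("percentage must be between 20 and 200")
        rate_value = "%d%%" % rate
    elif rate in PRESET_RATES:
        rate_value = rate
    else:
        raise ValueError("unknown rate: %r" % (rate,))
    # escape() protects characters such as & and < that are reserved in XML.
    return '<speak><prosody rate="%s">%s</prosody></speak>' % (
        rate_value, escape(text))


print(with_rate("Mary had a little lamb.", 85))
# <speak><prosody rate="85%">Mary had a little lamb.</prosody></speak>
```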
Bilingual voices
Amazon Polly has two ways of producing bilingual voices:
Accented bilingual voices
Fully bilingual voices
Accented bilingual voices
Accented bilingual voices can be created using any Amazon Polly voice, but only when using SSML
tags.
Normally, all words in the input text are spoken in the default language of the voice you're
using.
For example, if you're using the voice of Joanna (who speaks US English), Amazon Polly speaks the
following in the Joanna voice without a French accent:
<speak>
Why didn't she just say, 'Je ne parle pas français?'
</speak>
In this case, the words Je ne parle pas français are spoken as they would be if they were English.
However, if you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the
Joanna voice in American-accented French:
<speak>
Why didn't she just say, <lang xml:lang="fr-FR">'Je ne parle pas français?'</lang>.
</speak>
Because Joanna is not a native French voice, pronunciation is based on her native language, US
English. For instance, although perfect French pronunciation features a uvular trill /R/ in the word
français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.
If you use the voice of Giorgio, who speaks Italian, with the following text, Amazon Polly speaks the
sentence in Giorgio's voice with an Italian pronunciation:
<speak>
Mi piace Bruce Springsteen.
</speak>
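When foreign phrases are inserted programmatically, the <lang> wrapper can be produced by a helper. This is a minimal sketch under the assumption that you assemble SSML as strings; in_language and build_sentence are hypothetical names, not Amazon Polly APIs.

```python
from xml.sax.saxutils import escape, quoteattr


def in_language(text, language_code):
    """Wrap text in <lang xml:lang="..."> so it is pronounced as that language."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    return f"<lang xml:lang={quoteattr(language_code)}>{escape(text)}</lang>"


def build_sentence():
    """Recreate the Joanna example: an English sentence with a French phrase."""
    french = in_language("'Je ne parle pas français?'", "fr-FR")
    return f"<speak>Why didn't she just say, {french}.</speak>"
```

The resulting string is sent as SSML input with the voice of your choice; the voice itself does not change, only the pronunciation rules applied to the wrapped phrase.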
Fully bilingual voices
A fully bilingual voice like Aditi or Kajal (Indian English and Hindi) can speak two languages
fluently. This gives you the ability to use words and phrases from both languages in a single text
using the same voice.
Currently, Aditi, Kajal, Hala, and Zayd are the only fully bilingual voices available.
Using a Bilingual Voice (example: Aditi)
Aditi speaks both Indian English (en-IN) and Hindi (hi-IN) fluently. You can synthesize speech in
both English and Hindi, and the voice can switch between the two languages even within the same
sentence.
Hindi can be used in two different forms:
Devanagari: "उसने कहाँ, खेल तोह अब शुरू होगा"
Romanagari (using the Latin alphabet): "Usne kahan, khel toh ab shuru hoga"
Additionally, it's possible to mix English and Hindi of either or both forms within a single sentence:
Devanagari + English: "This is the song कभी कभी अदिति"
Romanagari + English: "This is the song from the movie Jaane Tu Ya Jaane Na."
Devanagari + Romanagari + English: "This is the song कभी कभी अदिति from the movie Jaane Tu Ya Jaane Na."
Because Aditi is a bilingual voice, text in all of these cases will be read correctly, as Amazon Polly
can differentiate between the languages and scripts.
Amazon Polly also supports numbers, dates, times, and currency expansion in both English (Arabic
numerals) and Hindi (Devanagari numerals). By default, Arabic numerals are read in Indian English.
To make Amazon Polly read them in Hindi, you must use the hi-IN language code parameter.
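As an illustration of the language code parameter, the sketch below builds a SynthesizeSpeech request for the AWS SDK for Python (boto3). The helper names hindi_request and synthesize are hypothetical, and the defaults (Aditi, mp3) are assumptions chosen for the example.

```python
def hindi_request(text, voice_id="Aditi", output_format="mp3"):
    """Build keyword arguments for a SynthesizeSpeech call that reads text as Hindi."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        # Without LanguageCode, a bilingual voice uses its default language.
        "LanguageCode": "hi-IN",
    }


def synthesize(params, path):
    """Send the request with boto3 and save the audio (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    response = boto3.client("polly").synthesize_speech(**params)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling synthesize(hindi_request("123"), "number.mp3") would request the numerals in Hindi rather than Indian English, assuming credentials and a Region are configured.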
Newscaster voices
People use different speaking styles, depending on context. Casual conversation, for example,
sounds very different from a TV or radio newscast. Because of the way standard voices are made,
they can't produce different speaking styles. However, neural voices can. They can be trained for a
specific speaking style, with the variations and emphasis on certain parts of speech inherent in that
style.
In addition to the default neural voices, Amazon Polly provides a newscaster speaking style that
uses the neural system to generate speech in the style of a TV or radio newscaster. The Newscaster
style is available with the Matthew and Joanna voices in US English (en-US), the Lupe voice in US
Spanish (es-US), and the Amy voice in British English (en-GB).
To use the Newscaster style, first choose the neural engine and then use the syntax described in
the following steps in your input text.
Note
To use any neural speaking style, you must use one of the AWS Regions that support
neural voices. This option is not available in all Regions. For more information, see
Feature and region compatibility.
To apply the Newscaster style (console)
1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. Make sure that you are using an AWS Region where neural voices are supported.
3. On the Text-to-Speech page, for Engine, choose Neural.
4. Choose the language and voice you want to use. Only Matthew and Joanna for US English
(en-US), Lupe for US Spanish (es-US), and Amy for British English (en-GB) support the
Newscaster style.
5. Turn on SSML.
6. Add input text to your text-to-speech request using the Newscaster style SSML syntax.
<amazon:domain name="news">text</amazon:domain>
For example, you might use the newscaster tag as follows:
<speak>
<amazon:domain name="news">
From the Tuesday, April 16th, 1912 edition of The Guardian newspaper:
The maiden voyage of the White Star liner Titanic, the largest ship ever launched
ended in disaster.
The Titanic started her trip from Southampton for New York on Wednesday. Late on
Sunday night she struck an iceberg off the Grand Banks of Newfoundland. By
wireless telegraphy she sent out signals of distress, and several liners were
near enough to catch and respond to the call.
</amazon:domain>
</speak>
7. Choose Listen.
To apply the Newscaster style (CLI)
1. In your API request, include the engine parameter with the neural value:
--engine neural
2. Add input text to your API request using the Newscaster style SSML syntax.
<amazon:domain name="news">text</amazon:domain>
For example, you might use the newscaster tag as follows:
<speak>
<amazon:domain name="news">
From the Tuesday, April 16th, 1912 edition of The Guardian newspaper:
The maiden voyage of the White Star liner Titanic, the largest ship ever launched
ended in disaster.
The Titanic started her trip from Southampton for New York on Wednesday. Late on
Sunday night she struck an iceberg off the Grand Banks of Newfoundland. By
wireless telegraphy she sent out signals of distress, and several liners were
near enough to catch and respond to the call.
</amazon:domain>
</speak>
For more information about SSML, see Supported SSML tags.
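The same request can be prepared from code. The following sketch assumes the AWS SDK for Python (boto3) parameter names for SynthesizeSpeech; newscaster_request is a hypothetical helper, and the voice check only reflects the voices listed in this section.

```python
from xml.sax.saxutils import escape

# Voices this section lists as supporting the Newscaster style.
NEWSCASTER_VOICES = {"Matthew", "Joanna", "Lupe", "Amy"}


def newscaster_request(text, voice_id, output_format="mp3"):
    """Build SynthesizeSpeech keyword arguments for the Newscaster style."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id} is not listed as a Newscaster voice")
    ssml = (
        '<speak><amazon:domain name="news">'
        + escape(text)
        + "</amazon:domain></speak>"
    )
    return {
        "Engine": "neural",   # the style requires the neural engine
        "TextType": "ssml",   # the input is SSML, not plain text
        "Text": ssml,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }
```

The returned dictionary can be passed as boto3.client("polly").synthesize_speech(**params) in a Region that supports neural voices.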
Languages in Amazon Polly
The following languages are supported by Amazon Polly and can be used to synthesize speech.
Each language has a unique language code. These language codes are W3C language identification
tags (ISO 639-3 for the language name and ISO 3166 for the country code).
Select a language from the following table for details on the phonemes and visemes that Amazon
Polly provides.
Language Language code
1 Arabic arb
2 Arabic (Gulf) ar-AE
3 Catalan ca-ES
4 Chinese (Cantonese) yue-CN
5 Chinese (Mandarin) cmn-CN
6 Danish da-DK
7 Dutch (Belgian) nl-BE
8 Dutch nl-NL
9 English (Australian) en-AU
10 English (British) en-GB
11 English (Indian) en-IN
12 English (New Zealand) en-NZ
13 English (South African) en-ZA
14 English (US) en-US
15 English (Welsh) en-GB-WLS
16 Finnish fi-FI
17 French fr-FR
18 French (Belgian) fr-BE
19 French (Canadian) fr-CA
20 Hindi hi-IN
21 German de-DE
22 German (Austrian) de-AT
23 Icelandic is-IS
24 Italian it-IT
25 Japanese ja-JP
26 Korean ko-KR
27 Norwegian nb-NO
28 Polish pl-PL
29 Portuguese (Brazilian) pt-BR
30 Portuguese (European) pt-PT
31 Romanian ro-RO
32 Russian ru-RU
33 Spanish (European) es-ES
34 Spanish (Mexican) es-MX
35 Spanish (US) es-US
36 Swedish sv-SE
37 Turkish tr-TR
38 Welsh cy-GB
For more information, see Phoneme and Viseme Tables for Supported Languages.
Phoneme and Viseme Tables for Supported Languages
The following tables list the phonemes for the languages supported by Amazon Polly, along with
examples and the corresponding visemes.
Topics
Arabic (arb)
Arabic (Gulf) (ar-AE)
Catalan (ca-ES)
Chinese (Cantonese) (yue-CN)
Chinese (Mandarin) (cmn-CN)
Danish (da-DK)
Dutch (Belgian) (nl-BE)
Dutch (nl-NL)
English (US) (en-US)
English (Australian) (en-AU)
English (British) (en-GB)
English (Indian) (en-IN)
English (Ireland) (en-IE)
English (New Zealand) (en-NZ)
English (South African) (en-ZA)
English (Welsh) (en-GB-WLS)
Finnish (fi-FI)
French (fr-FR)
French (Belgian) (fr-BE)
French (Canadian) (fr-CA)
German (de-DE)
German (Austrian) (de-AT)
Hindi (hi-IN)
Icelandic (is-IS)
Italian (it-IT)
Japanese (ja-JP)
Korean (ko-KR)
Norwegian (nb-NO)
Polish (pl-PL)
Portuguese (pt-PT)
Portuguese (Brazilian) (pt-BR)
Romanian (ro-RO)
Russian (ru-RU)
Spanish (es-ES)
Spanish (Mexican) (es-MX)
Spanish (US) (es-US)
Swedish (sv-SE)
Turkish (tr-TR)
Welsh (cy-GB)
Arabic (arb)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Arabic voice of Zeina that is supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ʔ
? glottal stop
انَأ
ʕ
?\ voiced pharyngeal
fricative
رَمُع
k
b b voiced bilabial
plosive
دَلَب
p
d d voiced alveolar
plosive
يراد
t
d_?\ emphatic voiced
alveolar plosive
ءوَض
t
d͡ʒ
dZ voiced postalveo
lar affricate
ليمَج
S
ð D voiced dental
fricative
َكِلذ
T
ðˤ
D_?\ emphatic voiced
dental fricative
مالَظ
T
f f voiceless labiodent
al fricative
لصَف
f
ɡ
g voiced velar
plosive
ارتلجنإ
k
ɣ
G voiced velar
fricative
برَغ
k
h h voiceless glottal
fricative
اذه
k
j j palatal approxima
nt
يشمَي
i
k k voiceless velar
plosive
بلَك
k
l l alveolar lateral
approximant
ىقال
t
l_G emphatic alveolar
lateral approxima
nt
هللادبع
t
m m bilabial nasal
اذام
p
n n alveolar nasal
رون
t
p p voiceless bilabial
plosive
سبَح
p
q q voiceless uvular
plosive
بيرَق
k
r r alveolar trill
لمَر
r
s s voiceless alveolar
fricative
لاؤُس
s
s_?\ emphatic voiceless
alveolar fricative
بِحاص
s
ʃ
S voiceless postalveo
lar fricative
ركُش
S
t t voiceless alveolar
plosive
رمَت
t
t_?\ emphatic voiceless
alveolar plosive
بِلاط
t
θ T voiceless dental
fricative
ثالَث
T
v v voiced labiodental
fricative
نيماتيف
f
w w labio-velar
approximant
دَلَو
u
x x voiceless velar
fricative
فْوَخ
k
ħ X\ voiceless
pharyngeal
fricative
َلْوَح
k
z z voiced alveolar
fricative
روهُز
s
Vowels
a a open front
unrounded vowel
درَب
a
a: long open front
unrounded vowel
راد
a
ɑˤ
A_?\ emphatic open
back unrounded
vowel
لبَط
a
ɑˤː
A_?\: emphatic long
open back
unrounded vowel
مِلاظ
a
u u close back
rounded vowel
برُش
u
u: u: long close back
rounded vowel
روس
u
u_?\ emphatic close
back rounded
vowel
ّدُب
u
uˤː
u_?\: emphatic long
close back
rounded vowel
لوط
u
i i close front
unrounded vowel
تنِب
i
i: long close front
unrounded vowel
نيزَح
i
i_?\ emphatic close
front unrounded
vowel
ّدِض
i
iˤː
i_?\: emphatic long
close front
unrounded vowel
يضام
i
e e close-mid front
unrounded vowel
تكرام
e
e: long close-mid
front unrounded
vowel
ليدوم
e
ɔ
O open-mid back
rounded vowel
يجولونكت
O
ɔː
O: long open-mid
back rounded
vowel
نويزفيلت
O
Arabic (Gulf) (ar-AE)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Arabic voice of Hala that is supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Pronunciation Viseme
Consonants
b b voiced bilabial
plosive
دلب
/ " b a . l a d / b
d d voiced alveolar
plosive
در
/ " r a d d / d
d_?\ pharyngea
lised voiced
alveolar
plosive
ءوض
/ " d_?\ a w ? / D
f f voiceless
labiodental
fricative
نرف
/ " f I . r I n / f
g g voiced velar
plosive
لاق
/ " g a: l / k
j j voiced palatal
approximant
يشمي
/ " j I m . S i: / i
k k voiceless velar
plosive
لماك
/ " k a: . m i l / k
l l voiced alveolar
lateral
approximant
ليل
/ " l e: l / t
l_G pharyngea
lised voiced
alveolar lateral
approximant
هللادبع
/ ?\ a b . " d
A_?\ l_G . l_G
A_?\ /
t
m m bilabial nasal
stop
ةئم
/ " m I j . j a / p
n n alveolar nasal
stop
رون
/ " n u: r / t
p p voiceless
bilabial plosive
اربوأ
/ " ? O . p e . r
a: /
p
q q voiceless
uvular plosive
رصق
/ " q A_?\ s_?\
r /
k
r r alveolar trill
لمر
/ " r a . m I l / r
s s voiceless
alveolar
fricative
مسمس
/ " s I m . s I
m /
s
s_?\ pharyngea
lised voiceless
alveolar
fricative
بحاص
/ " s_?\ A_?: . X
\ I b /
s
t t voiceless
alveolar
plosive
رمت
/ "t a . m a r / t
t_?\ pharyngea
lised voiceless
alveolar
plosive
بلاط
/ " t_?\ A_?: . l I
b /
t
v v voiced
labiodental
fricative
نيماتيف
/ v i: . t A . " m
i: n /
f
w w voiced
labiovelar
approximant
دياو
/ " w a: . j I d / u
x x voiceless velar
fricative
فورخ
/ x a . " r u: f / k
z z voiced alveolar
fricative
روهز
/ " z h u: r / s
ð D voiced
interdental
fricative
كلذ
/ " D a: . l I k / D
ðˤ
D_?\ pharyngea
lised voiced
interdental
fricative
مالظ
/ D_?\ A_?\ . " l
a: m /
D
ħ X\ voiceless
pharyngeal
fricative
نيجلا
/ ? a l . " X\ i:
n /
k
ŋ N velar nasal
stop
غنوك
غنوه
/ h O N . " k O
N g /
k
ɣ
G voiced velar
fricative
ةبيرغ
/ G I . " r i: . b
a /
k
ʃ
S voiceless
postalveolar
fricative
سمش
/ " S a m s / S
ʒ
Z voiced
postalveolar
fricative
تيكاج
/ Z a . " k e: t / S
ʔ
? glottal stop
ةسسؤم
/ m u . " ? a s .
s a . s a /
ʕ
?\ voiced
pharyngeal
fricative
ماع
/ " ?\ a: m m / k
ʤ
dZ voiced
postalveolar
affricate
ةعماج
/ " dZ a: m . ?\
a /
S
θ T voiceless
interdental
fricative
ةثالث
/ T a . " l a: . T
a /
T
ɦ
h voiced glottal
fricative
لاله
/ " h l a: l / k
Vowels
æ a mid-open
front
unrounded
short vowel
رفس
/ " s a . f a r / a
ɑˤ
A_?\ pharyngea
lised open
back
unrounded
short vowel
بلص
/ " s_?\ A_?\ l
b /
a
æː
a: mid-open
front
unrounded
long vowel
باب
/ " b a: b / a
ɑˤː
A_?\: pharyngea
lised open
back
unrounded
long vowel
جضان
/ " n A_?: . D_?
\ i_?\ dZ /
a
a A open central
unrounded
short vowel
wifi / " w A j . f A j / a
i i tense
close front
unrounded
short vowel
(MSA)
قاحسإ
/ ? i s . " X\ A_?
\: q /
i
ɪ
I lax close front
unrounded
short vowel
تنب
/ " b I n t / i
i_?\ pharyngea
lised close
front
unrounded
short vowel
لفط
/ " t_?\ i_?\ f I
l /
i
close front
unrounded
long vowel
ليبس
/ s a . " b i: l / i
iˤː
i_?: pharyngea
lised close
front
unrounded
long vowel
بيطر
/ r A_?\ . " t_?\
i_?: b /
i
u u tense close
back rounded
short vowel
(MSA)
عرتخم
/ " m u x . t a .
r i ?\ /
u
ʊ
U lax close back
rounded short
vowel
موسر
/ r U . " s u: m / u
u_?\ pharyngea
lised close
back rounded
short vowel
روفصع
/ ?\ u_?\ s_?\ .
" f u: r /
u
u: u: close back
rounded long
vowel
توت
/ " t u: t / u
uˤː
u_?\: pharyngea
lised close
back rounded
long vowel
روص
/ " s_?\ u_?\:
r /
u
e e mid front
unrounded
short vowel
تِنْرَتْنِإ
/ " s e n t / e
e: e: mid front
unrounded
long vowel
شيإ
/ " ? e: S / e
ɔ
O open-mid back
rounded short
vowel
رالود
/ d O . " l A r / O
ɔː
O: open-mid back
rounded long
vowel
نول
/ " l O: n / O
Catalan (ca-ES)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Catalan voice of Arlet that is supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
p p voiceless bilabial
plosive
ploure p
t t voiceless alveolar
plosive
Tarragona t
k k voiceless velar
plosive
com k
b b voiced bilabial
plosive
bata p
d d voiced alveolar
plosive
endoll t
g g voiced velar
plosive
gros k
m m voiced bilabial
nasal
manera p
n n voiced alveolar
nasal
donar t
ɲ
J voiced palatal
nasal
any J
ŋ N voiced velar nasal pingüí k
ɫ
5 voiced velarized
alveolar lateral
approximant (dark
l)
albercoc l
ʎ
L voiced palatal
lateral approxima
nt
llop J
r r voiced alveolar trill parra r
ɾ
4 voiced alveolar tap para t
f f voiceless labiodent
al fricative
èmfasi f
s s voiceless alveolar
fricative
sac s
z z voiced alveolar
fricative
calzes s
ʃ
S voiceless postalveo
lar fricative
guix S
ʒ
Z voiced postalveo
lar fricative
col·legi S
t͡ʃ
tS voiceless postalveo
lar affricate
cotxe S
d͡ʒ
dZ voiced postalveo
lar affricate
platja S
β B voiced bilabial
approximant
obert B
ð D voiced dental
approximant
bedoll T
j j voiced palatal
approximant
noia i
ɣ
G voiced velar
approximant
pega k
v v voiced labiodental
fricative
af f
w w voiced labiovelar
approximant
aigua u
x x voiceless velar
fricative
Jiménez k
ʝ
j\ voiced palatal
fricative
yeso J
l l voiced alveolar
lateral approxima
nt
alondra t
θ T voiceless dental
fricative
González T
Vowels
a a open back vowel casa a
e e close-mid front
unrounded vowel
llenya e
ɛ
E open-mid front
unrounded vowel
xec E
i i closed front
unrounded vowel
visca i
o o close-mid back
rounded vowel
gos o
ɔ
O open-mid back
rounded vowel
joc O
u u closed back
rounded vowel
un u
ə
@ mid-central vowel casa @
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Chinese (Cantonese) (yue-CN)
The following table lists the Jyutping and International Phonetic Alphabet (IPA) phonemes for
the Cantonese voice that is supported by Amazon Polly. Jyutping is a romanization system of
Cantonese which is commonly used in academia and among Cantonese speakers. IPA and X-SAMPA
are not commonly used but are available for English support. The IPA and X-SAMPA symbols in the
table are for reference only and should not be used for Chinese transcription. Jyutping examples
and the corresponding visemes are also shown.
To make Amazon Polly use phonetic pronunciation with Jyutping, use the <phoneme
alphabet="x-amazon-jyutping"> tag.
The following examples show this with each standard.
Jyutping:
<speak>
## <phoneme alphabet="x-amazon-jyutping" ph="sing2">#</phoneme>#
## <phoneme alphabet="x-amazon-jyutping" ph="seng2">#</phoneme>#
</speak>
IPA:
<speak>
## <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>#
## <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>#
</speak>
X-SAMPA:
<speak>
## <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>#
## <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>#
</speak>
Note
Amazon Polly accepts Cantonese input encoded in UTF-8 only.
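The X-SAMPA examples above use single quotes around the ph attribute because the transcription itself contains a double quote (the stress marker). If you build these tags in code, the quoting can be handled automatically. This is a minimal sketch; phoneme is a hypothetical helper name, not an Amazon Polly API.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(alphabet, ph, text):
    """Return a <phoneme> element for the given alphabet and transcription."""
    # quoteattr() switches to single quotes when the value contains a double
    # quote, which matters for X-SAMPA stress marks such as pI"kA:n.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )
```

The same helper can be used with the x-amazon-jyutping, ipa, and x-sampa alphabets shown in this section; save the resulting SSML as UTF-8, as the note above requires.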
Phoneme/Viseme Table
Jyutping IPA X-SAMPA Description Jyutping Example Viseme
Consonants
b p p voiceless bilabial plosive
巴,
baa1 p
c
tsʰ
ts_h aspirated voiceless
alveolar affricate
叉,
caa1 s
d t t voiceless alveolar
plosive
打,
daa2 t
f f f voiceless labiodental
fricative
花,
faa1 f
g k k voiceless velar plosive
家,
gaa1 k
gw
k_w labialized voiceless velar
plosive
瓜,
gwaa1 u
h h h voiceless glottal
fricative
哈,
haa1 k
k
k_h aspirated voiceless velar
plosive
卡,
kaa1 k
kw
kʷʰ
k_wh labialized aspirated
voiceless velar plosive
誇,
kwaa1 u
l l l alveolar lateral
approximant
啦,
laa1 t
m m m bilabial nasal
媽,
maa1 p
m m m= syllabic bilabial nasal
唔,
m4 p
ng ŋ N velar nasal
牙,
ngaa4 k
ng ŋ N= syllabic velar nasal
吳,
ng4 k
n n n alveolar nasal
拿,
naa4 t
p
p_h aspirated voiceless
bilabial plosive
趴,
paa1 p
s s s voiceless alveolar
fricative
沙,
saa1 s
t
t_h aspirated voiceless
alveolar plosive
他,
taa1 t
w w w labio-velar approximant
娃,
waa1 u
y j j palatal approximant
也,
jaa5 i
z ts ts voiceless alveolar
affricate
渣,
zaa1 s
Vowels
a
ɐ
6 near-open central vowel
吉,
gat1 a
aa
ɑ
A open back unrounded
vowel
家,
gaa1 a
aai
ɑi
Ai diphthong
街,
gaai1 a
aau
ɑu
Au diphthong
交,
gaau1 a
ai
ɐi
6i diphthong
雞,
gai1 a
au
ɐu
6u diphthong
溝,
kau1 a
e
ɛ
E open-mid front
unrounded vowel
爹,
de1 E
ei ei ei diphthong
基,
gei1 e
eo
ɵ
8 close-mid central
rounded vowel
春,
ceon1 o
eoi
ɵy
8y diphthong
居,
geoi1 o
eu
ɛu
Eu diphthong
in
掉垃圾,
deu6
E
i i i close front unrounded
vowel
斯,
si1 i
i ɪ I near-close near-front
unrounded vowel
激,
gik1 i
iu iu iu diphthong
驕,
giu1 i
o
ɔ
O open-mid back rounded
vowel
哥,
go1 O
oe œ 9 open-mid front rounded
vowel
鋸,
goe3 O
oi
ɔi
Oi diphthong
該,
goi1 O
ou ou ou diphthong
高,
gou1 o
u u u close back rounded
vowel
姑,
gu1 u
u
ʊ
U near-close near-back
rounded vowel
谷,
guk5 u
ui ui ui diphthong
攰,
gui6 u
yu y y close front rounded
vowel
於,
jyu1 u
Tone marks and Additional Symbols
1 high level
詩,
si1
2 medium rising
史,
si2
3 medium level
試,
si3
4 very low level
時,
si4
5 low rising
市,
si5
6 low level
是,
si6
- . . syllable boundary
語音
jyu5-
jam1
Chinese (Mandarin) (cmn-CN)
The following table lists the Pinyin and International Phonetic Alphabet (IPA) phonemes for the
Mandarin Chinese voice that is supported by Amazon Polly. Pinyin is the international standard for
Standard Chinese romanization. IPA and X-SAMPA are not commonly used but are available for
English support. The IPA and X-SAMPA symbols in the table are for reference only and should not
be used for Chinese transcription. Pinyin examples and the corresponding visemes are also shown.
To make Amazon Polly use phonetic pronunciation with Pinyin, use the <phoneme
alphabet="x-amazon-phonetic standard used"> tag, where the phonetic standard used is pinyin.
The following examples show this with each standard.
Pinyin:
<speak>
## <phoneme alphabet="x-amazon-pinyin" ph="bo2">#</phoneme>#
## <phoneme alphabet="x-amazon-pinyin" ph="bao2">#</phoneme>#
</speak>
IPA:
<speak>
## <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>#
## <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>#
</speak>
X-SAMPA:
<speak>
## <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>#
## <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>#
</speak>
Note
Amazon Polly accepts Mandarin Chinese input encoded in UTF-8 only. The GB 18030
encoding standard is not currently supported by Amazon Polly.
Phoneme/Viseme Table
Pinyin IPA X-SAMPA Description Pinyin Example Viseme
Consonants
f f f voiceless labiodental
fricative
发,
fa1 f
h h h voiceless glottal
fricative
和,
he2 k
g k k voiceless velar plosive
古,
gu3 k
k
k_h aspirated voiceless velar
plosive
苦,
ku3 k
l l l alveolar lateral
approximant
拉,
la1 t
m m m bilabial nasal
骂,
ma4 p
n n n alveolar nasal
那,
na4 t
ng ŋ N velar nasal
正,
zheng4 k
b p p voiceless bilabial plosive
爸,
ba4 p
p
p_h aspirated voiceless
bilabial plosive
怕,
pa4 p
s s s voiceless alveolar
fricative
四,
si4 s
x
ɕ
s\ voiceless alveolo-palatal
fricative
西,
xi1 J
sh
ʂ
s` voiceless retroflex
fricative
是,
shi4 S
d t t voiceless alveolar
plosive
打,
da3 t
t
t_h aspirated voiceless
alveolar plosive
他,
ta1 t
zh
ʈ͡ʂ
t`s` voiceless retroflex
affricate
之,
zhi1 S
ch
ʈ͡ʂʰ
t`s`_h aspirated voiceless
retroflex affricate
吃,
chi1 S
z
t͡s
ts voiceless alveolar
affricate
字,
zi4 s
j
t͡ɕ
ts\ voiceless alveolo-palatal
affricate
鸡,
ji1 J
q
t͡ɕʰ
ts\_h aspirated voiceless
alveolo-palatal affricate
七,
qi1 J
c
t͡sʰ
ts_h aspirated voiceless
alveolar affricate
次,
ci4 s
w w w labio-velar approximant
我,
wo3 u
r
ʐ
z` voiced retroflex fricative
日,
ri4 S
"er" and "r" colored syllables
er
ɚ
@` r-coloured mid central
vowel
二,
er4 @
-r r-colored syllable
馅儿,
xianr4 @
Vowels
e
ɤ
7 close-mid back
unrounded vowel
恶,
e4 e
e
ə
@ mid central vowel
恩,
en1 @
a a a open front unrounded
vowel
安,
an1 a
ai
aI diphthong
爱,
ai4 a
ao
aU diphthong
奥,
ao4 a
ei
eI diphthong
诶,
ei4 e
e
ɛ
E open-mid front
unrounded vowel
姐,
jie3 E
i i i close front unrounded
vowel
鸡,
ji1 i
ou
oU diphthong
欧,
ou1 o
o
ɔ
O open-mid back rounded
vowel
哦,
o4 o
u u u close back rounded
vowel
主,
zhu3 u
yu y y close front rounded
vowel
于,
yu2 u
Tone marks and Additional Symbols
1 high level tone
淤,
yu1
2 rising tone
鱼,
yu2
3 low (falling-rising) tone
语,
yu3
4 falling tone
育,
yu4
0 neutral tone
的,
de0
- . . syllable boundary
语音
yu3-yin1
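The tone digits and the syllable boundary in the table above suggest a simple shape for a Pinyin ph value: syllables joined by hyphens, each ending in a tone digit from 0 to 4. The sketch below checks that shape before a value is placed in a <phoneme> tag. It is an illustration only: split_pinyin is a hypothetical helper, the restriction to lowercase ASCII letters is an assumption, and it does not reproduce whatever validation Amazon Polly itself performs.

```python
import re

# One syllable: lowercase letters followed by a tone digit 0-4
# (0 is the neutral tone in the table above).
_SYLLABLE = re.compile(r"[a-z]+[0-4]")


def split_pinyin(ph):
    """Split a value such as 'yu3-yin1' into (syllable, tone) pairs."""
    pairs = []
    for syllable in ph.split("-"):
        if not _SYLLABLE.fullmatch(syllable):
            raise ValueError(f"not a tone-numbered syllable: {syllable!r}")
        pairs.append((syllable[:-1], int(syllable[-1])))
    return pairs
```

For instance, split_pinyin("yu3-yin1") separates the two syllables of the syllable boundary example, and a value such as "yu5" is rejected because the table defines no fifth tone.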
Danish (da-DK)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Danish voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bat p
d d voiced alveolar
plosive
da t
ð D voiced dental
fricative
mad, thriller T
f f voiceless labiodent
al fricative
fat f
g g voiced velar
plosive
gat k
h h voiceless glottal
fricative
hat k
j j palatal approxima
nt
jo i
k k voiceless velar
plosive
kat k
l l alveolar lateral
approximant
ladt t
m m bilabial nasal mat p
n n alveolar nasal nay t
ŋ N velar nasal lang k
p p voiceless bilabial
plosive
pande p
r r alveolar trill thriller, story r
ʁ
R voiced uvular
fricative
rat k
s s voiceless alveolar
fricative
sat s
t t voiceless alveolar
plosive
tal t
v v voiced labiodental
fricative
vat f
w w labial-velar
approximant
hav, weekend u
Vowels
ø 2 close-mid front
rounded vowel
øst o
ø: 2: long close-mid
front rounded
vowel
øse o
ɐ
6 near-open central
vowel
mor a
œ 9 open-mid front
rounded vowel
skøn, grønt O
œ: 9: long open-mid
front rounded
vowel
høne, gøre O
ə
@ mid central vowel ane @
æː
{: long near-open
front unrounded
vowel
male a
a a open front
unrounded vowel
man a
æ { near-open front
unrounded vowel
adresse a
ɑ
A open back
unrounded vowel
lak, tak a
ɑ:
A: long open back
unrounded vowel
rase a
e e close-mid front
unrounded vowel
midt e
e: e: long close-mid
front unrounded
vowel
mele e
ɛ
E open-mid front
unrounded vowel
mæt E
ɛ:
E: long open-mid
front unrounded
vowel
mæle E
i i close front
unrounded vowel
mit i
i: i: long close front
unrounded vowel
mile i
o o close-mid back
rounded vowel
foto o
o: o: long close-mid
back rounded
vowel
mole o
ɔ
O open-mid back
rounded vowel
mund O
ɔ:
O: long open-mid
back rounded
vowel
måle O
ɒː
Q: long open back
rounded vowel
morse O
u u close back
rounded vowel
lusk u
u: u: long close back
rounded vowel
mule u
ʌ
V open-mid back
unrounded
kører E
y y close front
rounded vowel
yt u
y: y: long close front
rounded vowel
hyle u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Dutch (Belgian) (nl-BE)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Belgian Dutch (Flemish) voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bak p
d d voiced alveolar
plosive
dak t
d͡ʒ
dZ voiced postalveo
lar affricate
manager S
f f voiceless labiodent
al fricative
fel f
g g voiced velar
plosive
goal k
ɣ
G voiced velar
fricative
hoed k
ɦ
h\ voiced glottal
fricative
hand k
j j palatal approxima
nt
ja i
k k voiceless velar
plosive
kap k
l l alveolar lateral
approximant
land t
m m bilabial nasal met p
n n alveolar nasal net t
ŋ N velar nasal bang k
p p voiceless bilabial
plosive
pak p
r r alveolar trill rand r
s s voiceless alveolar
fricative
sein s
ʃ
S voiceless postalveo
lar fricative
show S
t t voiceless alveolar
plosive
tak t
v v voiced labiodental
fricative
vel f
ʋ
v\ labiodental
approximant
wit f
x x voiceless velar
fricative
toch k
z z voiced alveolar
fricative
zijn s
ʒ
Z voiced postalveo
lar fricative
bagage S
Vowels
øː
2: long close-mid
front rounded
vowel
neus o
œy 9y diphthong buit O
ə
@ mid central vowel de @
a: a: long open front
unrounded vowel
baad a
ɑ:
A open back
unrounded vowel
bad a
e: e: long close-mid
front unrounded
vowel
beet e
ɜː
3: long open-mid
central unrounded
vowel
barrière E
ɛ
E open-mid front
unrounded vowel
bed E
ɛi
Ei diphthong beet E
i i close front
unrounded vowel
vier i
ɪ
I near-close near-
front unrounded
vowel
pit i
o: o: long close-mid
back rounded
vowel
boot o
ɔ
O open-mid back
rounded vowel
pot O
u u close back
rounded vowel
hoed u
ʌu
Vu diphthong fout E
y: long close front
rounded vowel
fuut u
ʏ
Y near-close near-
front rounded
vowel
hut u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Dutch (nl-NL)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Dutch voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bak p
d d voiced alveolar
plosive
dak t
d͡ʒ
dZ voiced postalveo
lar affricate
manager S
f f voiceless labiodent
al fricative
fel f
g g voiced velar
plosive
goal k
ɣ
G voiced velar
fricative
hoed k
ɦ
h\ voiced glottal
fricative
hand k
j j palatal approxima
nt
ja i
k k voiceless velar
plosive
kap k
l l alveolar lateral
approximant
land t
m m bilabial nasal met p
n n alveolar nasal net t
ŋ N velar nasal bang k
p p voiceless bilabial
plosive
pak p
r r alveolar trill rand r
s s voiceless alveolar
fricative
sein s
ʃ
S voiceless postalveo
lar fricative
show S
t t voiceless alveolar
plosive
tak t
v v voiced labiodental
fricative
vel f
ʋ
v\ labiodental
approximant
wit f
x x voiceless velar
fricative
toch k
z z voiced alveolar
fricative
zijn s
ʒ
Z voiced postalveo
lar fricative
bagage S
Vowels
øː
2: long close-mid
front rounded
vowel
neus o
œy 9y diphthong buit O
ə
@ mid central vowel de @
a: a: long open front
unrounded vowel
baad a
ɑ:
A open back
unrounded vowel
bad a
e: e: long close-mid
front unrounded
vowel
beet e
ɜː
3: long open-mid
central unrounded
vowel
barrière E
ɛ
E open-mid front
unrounded vowel
bed E
ɛi
Ei diphthong beet E
i i close front
unrounded vowel
vier i
ɪ
I near-close near-
front unrounded
vowel
pit i
o: o: long close-mid
back rounded
vowel
boot o
ɔ
O open-mid back
rounded vowel
pot O
u u close back
rounded vowel
hoed u
ʌu
Vu diphthong fout E
y: long close front
rounded vowel
fuut u
ʏ
Y near-close near-
front rounded
vowel
hut u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
English (US) (en-US)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the American English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
ɡ
g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay l
m m bilabial nasal mouse p
n n alveolar nasal nap t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
speak p
ɹ
r\ alveolar approxima
nt
red r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
trap t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
z z voiced alveolar
fricative
zero s
ʒ
Z voiced postalveo
lar fricative
vision S
Vowels
ə
@ mid-central vowel arena @
ɚ
@` mid-central r-
colored vowel
reader @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑ
A long open-back
unrounded vowel
father a
eI diphthong face e
ɝ
3` open mid-centr
al unrounded r-
colored vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
i i long close front
unrounded vowel
fleece i
ɪ
I near-close near-
front unrounded
vowel
kit i
oU diphthong goat o
ɔ
O long open mid-
back rounded
vowel
thought O
ɔɪ
OI diphthong choice O
u u long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʌ
V open-mid-back
unrounded vowel
strut E
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
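The X-SAMPA column above can be placed in an SSML phoneme tag to control pronunciation. The following Python sketch is illustrative only: the helper names `phoneme_tag` and `speak` are hypothetical, and the transcription `r\Ed` for "red" is assembled by hand from the table rows for red, dress, and dig, so check the result by listening before relying on it.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(text, ph, alphabet="x-sampa"):
    """Wrap text in an SSML phoneme tag (hypothetical helper)."""
    if alphabet not in ("ipa", "x-sampa"):
        raise ValueError("alphabet must be 'ipa' or 'x-sampa'")
    # quoteattr picks single quotes when ph contains the X-SAMPA
    # primary stress mark ("), so the attribute stays well-formed.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


def speak(*parts):
    """Join already-escaped SSML fragments inside a speak element."""
    return "<speak>" + "".join(parts) + "</speak>"


# r\ (red) + E (dress) + d (dig), taken from the en-US table above.
ssml = speak("The light is ", phoneme_tag("red", "r\\Ed"), ".")
print(ssml)
```

The resulting string would be passed as SSML text to a synthesis request; that call is left out here so the sketch runs without AWS credentials.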
English (Australian) (en-AU)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Australian English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
g g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay t
l= syllabic alveolar
lateral approxima
nt
battle t
m m bilabial nasal mouse p
m= syllabic bilabial
nasal
anthem p
n n alveolar nasal nap t
n= syllabic alveolar
nasal
button t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
pin p
ɹ
r\ alveolar approxima
nt
red r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
task t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
z z voiced alveolar
fricative
zero s
ʒ
Z voiced postalveo
lar fricative
vision S
Vowels
ə
@ mid central vowel arena @
əʊ
@U diphthong goat @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑː
A: long open-back
unrounded vowel
father a
eI diphthong face e
ɜː
3: long open mid-
central unrounded
vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
ɛə
E@ diphthong square E
i: i long close front
unrounded vowel
fleece i
ɪ
I near-close near-
front unrounded
vowel
kit i
ɪə
I@ diphthong near i
ɔː
O: long open-mid
back rounded
vowel
thought O
ɔɪ
OI Diphthong choice O
ɒ
Q open back
rounded vowel
lot O
u: u: long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʊə
U@ diphthong cure u
ʌ
V Open-mid-back
unrounded vowel
strut E
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
English (British) (en-GB)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the British English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
g g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay t
l= syllabic alveolar
lateral approxima
nt
battle t
m m bilabial nasal mouse p
m= syllabic bilabial
nasal
anthem p
n n alveolar nasal nap t
n= syllabic alveolar
nasal
button t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
pin p
ɹ
r\ alveolar approxima
nt
red r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
task t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
z z voiced alveolar
fricative
zero s
ʒ
Z voiced postalveo
lar fricative
vision S
Vowels
ə
@ mid central vowel arena @
əʊ
@U diphthong goat @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑː
A: long open-back
unrounded vowel
father a
eI diphthong face e
ɜː
3: long open mid-
central unrounded
vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
ɛə
E@ diphthong square E
i: i long close front
unrounded vowel
fleece i
ɪ
I near-close near-
front unrounded
vowel
kit i
ɪə
I@ diphthong near i
ɔː
O: long open-mid
back rounded
vowel
thought O
ɔɪ
OI Diphthong choice O
ɒ
Q open back
rounded vowel
lot O
u: u: long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʊə
U@ diphthong cure u
ʌ
V Open-mid-back
unrounded vowel
strut E
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
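The Viseme column above can drive simple lip-sync logic. The Python sketch below is an assumption-laden illustration, not how Amazon Polly computes visemes: the `VISEME` dictionary copies only a handful of rows from the en-GB table, and the function name `visemes` is hypothetical.

```python
# X-SAMPA symbol -> viseme, copied from a few rows of the en-GB table above.
# This is a partial map for illustration, not the full table.
VISEME = {
    "b": "p", "p": "p",
    "d": "t", "n": "t",
    "T": "T",
    "N": "k",
    "S": "S",
    "E": "E",
    "I": "i",
}


def visemes(phonemes):
    """Map X-SAMPA phonemes to visemes, merging adjacent repeats."""
    result = []
    for ph in phonemes:
        v = VISEME[ph]  # a KeyError flags a symbol missing from this partial map
        if not result or result[-1] != v:
            result.append(v)
    return result


print(visemes(["b", "E", "d"]))  # phonemes of "bed"
print(visemes(["T", "I", "N"]))  # phonemes of "thing"
```

Merging adjacent identical visemes is a design choice of this sketch, made because two consecutive phonemes with the same mouth shape need only one animation frame; the guide itself does not prescribe it.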
English (Indian) (en-IN)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Indian English voice supported by Amazon Polly.
For additional phonemes used in conjunction with Indian English, see Hindi (hi-IN).
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
g g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay t
l= syllabic alveolar
lateral approxima
nt
battle t
m m bilabial nasal mouse p
m= syllabic bilabial
nasal
anthem p
n n alveolar nasal nap t
n= syllabic alveolar
nasal
button t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
pin p
ɹ
r\ alveolar approxima
nt
red r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
task t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
z z voiced alveolar
fricative
zero s
ʒ
Z voiced postalveo
lar fricative
vision S
Vowels
ə
@ mid central vowel arena @
əʊ
@U diphthong goat @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑː
A: long open-back
unrounded vowel
father a
eI diphthong face e
ɜː
3: long open mid-
central unrounded
vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
ɛə
E@ diphthong square E
i: i long close front
unrounded vowel
fleece i
ɪ
I near-close near-
front unrounded
vowel
kit i
ɪə
I@ diphthong near i
ɔː
O: long open-mid
back rounded
vowel
thought O
ɔɪ
OI Diphthong choice O
ɒ
Q open back
rounded vowel
lot O
u: u: long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʊə
U@ diphthong cure u
ʌ
V Open-mid-back
unrounded vowel
strut E
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
English (Ireland) (en-IE)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Irish English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
ɡ
g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay t
m m bilabial nasal mouse p
n n alveolar nasal nap t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
speak p
ɹ
r\ alveolar approxima
nt
red r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
trap t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
z z voiced alveolar
fricative
zero s
ʒ
Z voiced postalveo
lar fricative
vision S
Vowels
ə
@ mid-central vowel arena @
ɚ
@` mid-central r-
colored vowel
reader @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑ
A long open-back
unrounded vowel
father a
eI diphthong face e
ɝ
3` open mid-centr
al unrounded r-
colored vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
i i long close front
unrounded vowel
fleece i
ɪ
I near-close near-
front unrounded
vowel
kit i
oU diphthong goat o
ɔ
O long open mid-
back rounded
vowel
thought O
ɔɪ
OI diphthong choice O
u u long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʌ
V open-mid-back
unrounded vowel
strut E
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
English (New Zealand) (en-NZ)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the New Zealand English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
g g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay t
l= syllabic alveolar
lateral approxima
nt
battle t
m m bilabial nasal mouse p
m= syllabic bilabial
nasal
anthem p
n n alveolar nasal nap t
n= syllabic alveolar
nasal
button t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
pin p
ɹ
r\ alveolar approxima
nt
red r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
task t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
z z voiced alveolar
fricative
zero s
ʒ
Z voiced postalveo
lar fricative
vision S
Vowels
ə
@ mid central vowel arena @
əʊ
@U diphthong goat @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑː
A: long open-back
unrounded vowel
father a
eI diphthong face e
ɜː
3: long open mid-
central unrounded
vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
ɛə
E@ diphthong square E
i: i long close front
unrounded vowel
fleece i
ɪ
I near-close near-
front unrounded
vowel
kit i
ɪə
I@ diphthong near i
ɔː
O: long open-mid
back rounded
vowel
thought O
ɔɪ
OI Diphthong choice O
ɒ
Q open back
rounded vowel
lot O
u: u: long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʊə
U@ diphthong cure u
ʌ
V Open-mid-back
unrounded vowel
strut E
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
The Aria voice speaks New Zealand English and offers limited support for Maori. It can pronounce
the following Maori words and phrases. The Maori phrases are case-sensitive.
English Maori
Hello/cheers Kia ora
Welcome (to) Nau mai (ki)
Hello (one person)/thank you Tēnā koe
Hello (three or more people)/thank you Tēnā koutou
Good morning Ata mārie
Good morning Mōrena
Thank you Ngā mihi
Take care Ngā manaakitanga
See you Ka kite
See you later Mā te wā
Have a good day Kia pai tō rā
Merry Christmas Meri Kirihimete
Maori Māori
Maori language te reo Māori
Maori language week Te wiki o te reo Māori
New Zealand Aotearoa
Maori New Year Mātariki
Town in New Zealand / Waitangi Day is the national day of New Zealand Waitangi
One tahi
Two rua
Three toru
Four whā
Five rima
Six ono
Seven whitu
Eight waru
Nine iwa
Ten tekau
Twenty rua tekau
Thirty Toru tekau
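Because the phrases above are case-sensitive, it can help to check input text before sending it for synthesis. The Python sketch below is a hypothetical pre-check, not part of Amazon Polly: `SUPPORTED` holds only four phrases copied from the table, and `miscased_phrases` is an illustrative name.

```python
# A few phrases copied from the table above, with their exact casing.
SUPPORTED = ("Kia ora", "Ka kite", "Aotearoa", "Ngā mihi")


def miscased_phrases(text):
    """Return supported phrases that appear in text, but never with the exact casing."""
    lowered = text.casefold()
    return [
        phrase
        for phrase in SUPPORTED
        if phrase.casefold() in lowered and phrase not in text
    ]


print(miscased_phrases("kia ora and welcome to Aotearoa"))
```

A caller could use the returned list to warn the author or to correct the casing before building the request.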
English (South African) (en-ZA)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the South African English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
g g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay t
l= syllabic alveolar
lateral approxima
nt
battle t
ɬ̩
K voiceless lateral
fricative
umhlanga t
m m bilabial nasal mouse p
m= syllabic bilabial
nasal
anthem p
n n alveolar nasal nap t
n= syllabic alveolar
nasal
button t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
pin p
ɹ
r\ alveolar approxima
nt
red r
r r alveolar trill pareis r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
task t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
x x voiceless velar
fricative
gauteng k
z z voiced alveolar
fricative
zero s
! !\ post-alveolar click gqeberha k
| |\ dental click ncube t
|| ||\ lateral click xhosa t
Vowels
ə
@ mid central vowel arena @
əi
@i diphthong nelspruit i
əʊ
@U diphthong goat @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑː
A: long open-back
unrounded vowel
father a
eI diphthong face e
ɜː
3: long open mid-
central unrounded
vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
ɛə
E@ diphthong square E
i: i long close front
unrounded vowel
fleece i
I@ diphthong du preez i
ɪ
I near-close near-
front unrounded
vowel
kit i
ɪə
I@ diphthong near i
ɔː
O: long open-mid
back rounded
vowel
thought O
ɔɪ
OI Diphthong choice O
ɒ
Q open back
rounded vowel
lot O
u: u: long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʊə
U@ diphthong cure u
ʌ
V Open-mid-back
unrounded vowel
strut E
y y close front
rounded vowel
van vuuren u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
English (Welsh) (en-GB-WLS)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Welsh English voice supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bed p
d d voiced alveolar
plosive
dig t
d͡ʒ
dZ voiced postalveo
lar affricate
jump S
ð D voiced dental
fricative
then T
f f voiceless labiodent
al fricative
five f
g g voiced velar
plosive
game k
h h voiceless glottal
fricative
house k
j j palatal approxima
nt
yes i
k k voiceless velar
plosive
cat k
l l alveolar lateral
approximant
lay t
l= syllabic alveolar
lateral approxima
nt
battle t
m m bilabial nasal mouse p
m= syllabic bilabial
nasal
anthem p
n n alveolar nasal nap t
n= syllabic alveolar
nasal
button t
ŋ N velar nasal thing k
p p voiceless bilabial
plosive
pin p
ɹ
r\ alveolar approxima
nt
red r
s s voiceless alveolar
fricative
seem s
ʃ
S voiceless postalveo
lar fricative
ship S
t t voiceless alveolar
plosive
task t
t͡ʃ
tS voiceless postalveo
lar affricate
chart S
θ T voiceless dental
fricative
thin T
v v voiced labiodental
fricative
vest f
w w labial-velar
approximant
west u
z z voiced alveolar
fricative
zero s
ʒ
Z voiced postalveo
lar fricative
vision S
Vowels
ə
@ mid central vowel arena @
əʊ
@U diphthong goat @
æ { near open-front
unrounded vowel
trap a
aI diphthong price a
aU diphthong mouth a
ɑː
A: long open-back
unrounded vowel
father a
eI diphthong face e
ɜː
3: long open mid-
central unrounded
vowel
nurse E
ɛ
E open mid-front
unrounded vowel
dress E
ɛə
E@ diphthong square E
i: i long close front
unrounded vowel
fleece i
ɪ
I near-close near-
front unrounded
vowel
kit i
ɪə
I@ diphthong near i
ɔː
O: long open-mid
back rounded
vowel
thought O
ɔɪ
OI Diphthong choice O
ɒ
Q open back
rounded vowel
lot O
u: u: long close-back
rounded vowel
goose u
ʊ
U near-close near-
back rounded
vowel
foot u
ʊə
U@ diphthong cure u
ʌ
V Open-mid-back
unrounded vowel
strut E
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Finnish (fi-FI)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Finnish voice that is supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Finnish consonants
p p voiceless bilabial
plosive
[p]ankki p
t t voiceless alveolar
plosive
[t]alo t
k k voiceless velar
plosive
[k]aali k
d d voiced alveolar
plosive
[d]ata t
s s voiceless alveolar
fricative
[s]ali s
h h voiceless glottal
fricative
[h]attu k
ʋ
v\ voiced labiodental
approximant
[v]aiva
v
j j palatal approxima
nt
[j]oki i
l l alveolar lateral
approximant
[l]oma t
r r voiced alveolar trill [r]iita r
m m bilabial nasal [m]ato p
n n alveolar nasal [n]enä t
ŋ N velar nasal he[n]ki k
Consonants found in loanwords
b b voiced bilabial
plosive
[b]ussi p
f f voiceless labiodent
al fricative
[f]irma v
w w labial-velar
approximant
[w]iki u
z z voiced alveolar
fricative
[z]ulu s
g g voiced velar
plosive
[g]aala k
ʃ
S voiceless postalveo
lar fricative
[sh]akki S
ʒ
Z voiced postalveo
lar fricative
[g]enre S
θ T voiceless dental
fricative
ear[th] T
ð D voiced dental
fricative
ei[th]er T
Short vowels
i i close front
unrounded vowel
k[i]lo i
ɛ
E open mid-front
unrounded vowel
k[e]sä E
æ { near open-front
unrounded vowel
k[ä]ly A
y y close front
rounded vowel
k[y]lä u
ø 2 close mid-front
rounded vowel
p[ö]ly O
u u close back
rounded vowel
k[u]lo u
ɔ
O open mid-back
rounded vowel
k[o]lo O
ɑ
A open back
unrounded vowel
k[a]la A
Long vowels
i: i: long close front
unrounded vowel
s[ii]li i
ɛː
E: long open mid-
front unrounded
vowel
[ee]tu E
æː
{: long near open-
front unrounded
vowel
t[ää]llä A
y: y: long close front
rounded vowel
t[yy]li u
øː
2: long close mid-
front rounded
vowel
t[öö]lö O
u: u: long close back
rounded vowel
t[uu]li u
ɔː
O: long open mid-
back rounded
vowel
r[oo]li O
ɑː
A: long open back
unrounded vowel
k[aa]su A
Diphthongs
ɛi
Ei diphthong l[ei]pä E
æi {i diphthong [äi]ti A
ui ui diphthong k[ui]n u
ɑi
Ai diphthong k[ai]kki A
ɔi
Oi diphthong p[oi]ka O
øi 2i diphthong s[öi]n O
yi yi diphthong l[yi]jy u
ɑu
Au diphthong s[au]na A
ɔu
Ou diphthong k[ou]lu O
ɛu
Eu diphthong r[eu]na E
iu iu diphthong v[iu]lu i
æy {y diphthong t[äy]nnä A
øy 2y diphthong k[öy]hä O
ɛy
Ey diphthong pes[ey]t E
iy iy diphthong käär[iy]tyä i
iE diphthong t[ie] i
y2 diphthong [yö] u
uO diphthong t[uo] u
Vowels found in English loanwords
ɪ
I near-close near-
front unrounded
vowel
b[i]t i
ʊ
U near-close near-
back rounded
vowel
b[oo]k u
ə
@ mid-central vowel [a]bout @
ʌ
V open-mid-back
unrounded vowel
c[u]t E
French (fr-FR)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the French voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
boire p
d d voiced alveolar
plosive
madame t
f f voiceless labiodent
al fricative
femme f
g g voiced velar
plosive
grand k
ɥ
H labial-palatal
approximant
bruit u
j j palatal approxima
nt
meilleur i
k k voiceless velar
plosive
quatre k
l l alveolar lateral
approximant
malade t
m m bilabial nasal maison p
n n alveolar nasal astronome t
ɲ
J palatal nasal baigner J
ŋ N velar nasal parking k
p p voiceless bilabial
plosive
pomme p
ʁ
R voiced uvular
fricative
amoureux k
s s voiceless alveolar
fricative
santé s
ʃ
S voiceless postalveo
lar fricative
chat S
t t voiceless alveolar
plosive
téléphone t
v v voiced labiodental
fricative
vrai f
w w labial-velar
approximant
soir u
z z voiced alveolar
fricative
raison s
ʒ
Z voiced postalveo
lar fricative
aubergine S
Vowels
ø 2 close-mid front
rounded vowel
deux o
œ 9 open-mid front
rounded vowel
neuf O
œ̃ 9~ nasal open-mid
front rounded
vowel
brun O
ə
@ mid central vowel je @
a a open front
unrounded vowel
table a
ɑ̃
A~ nasal open back
unrounded vowel
camembert a
e e close-mid front
unrounded vowel
marché e
ɛ
E open-mid front
unrounded vowel
neige E
ɛ̃
E~ nasal open-mid
front unrounded
vowel
sapin E
i i close front
unrounded vowel
mille i
o o close-mid back
rounded vowel
hôpital o
ɔ
O open-mid back
rounded vowel
homme O
ɔ̃
O~ nasal open-mid
back rounded
vowel
bon O
u u close back
rounded vowel
sous u
y y close front
rounded vowel
dur u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
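The IPA and X-SAMPA columns above describe the same phonemes in two notations, so one can be converted to the other by lookup. The Python sketch below is a minimal illustration under stated assumptions: `IPA_TO_XSAMPA` covers only the French rows whose two columns differ, `ipa_to_xsampa` is a hypothetical name, and the sample transcriptions are assembled by hand from the example words in the table.

```python
# IPA -> X-SAMPA for French rows whose symbols differ between the columns.
# Nasal vowels are a base letter followed by a combining tilde (U+0303).
IPA_TO_XSAMPA = {
    "\u0254\u0303": "O~",  # nasal open-mid back rounded vowel (bon)
    "\u0251\u0303": "A~",  # nasal open back unrounded vowel (camembert)
    "\u025b\u0303": "E~",  # nasal open-mid front unrounded vowel (sapin)
    "\u0153\u0303": "9~",  # nasal open-mid front rounded vowel (brun)
    "\u0254": "O", "\u025b": "E", "\u0153": "9", "\u00f8": "2",
    "\u0259": "@", "\u0281": "R", "\u0283": "S", "\u0292": "Z",
    "\u0272": "J", "\u014b": "N", "\u0265": "H",
}


def ipa_to_xsampa(ipa):
    """Convert an IPA string to X-SAMPA using longest-match lookup."""
    out = []
    i = 0
    while i < len(ipa):
        for size in (2, 1):  # try the two-character nasal vowels first
            chunk = ipa[i:i + size]
            if chunk in IPA_TO_XSAMPA:
                out.append(IPA_TO_XSAMPA[chunk])
                i += len(chunk)
                break
        else:
            out.append(ipa[i])  # symbol is written the same in both alphabets
            i += 1
    return "".join(out)


print(ipa_to_xsampa("s\u0251\u0303te"))  # hand-built transcription of "santé"
print(ipa_to_xsampa("b\u0254\u0303"))    # hand-built transcription of "bon"
```

Trying the longer key first matters because each nasal vowel begins with a symbol that is also a valid phoneme on its own.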
French (Belgian) (fr-BE)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Belgian French voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
boire p
d d voiced alveolar
plosive
madame t
f f voiceless labiodent
al fricative
femme f
g g voiced velar
plosive
grand k
ɥ
H labial-palatal
approximant
bruit u
j j palatal approxima
nt
meilleur i
k k voiceless velar
plosive
quatre k
l l alveolar lateral
approximant
malade t
m m bilabial nasal maison p
n n alveolar nasal astronome t
ɲ
J palatal nasal baigner J
ŋ N velar nasal parking k
p p voiceless bilabial
plosive
pomme p
ʁ
R voiced uvular
fricative
amoureux k
s s voiceless alveolar
fricative
santé s
ʃ
S voiceless postalveo
lar fricative
chat S
t t voiceless alveolar
plosive
téléphone t
v v voiced labiodental
fricative
vrai f
w w labial-velar
approximant
soir u
z z voiced alveolar
fricative
raison s
ʒ
Z voiced postalveo
lar fricative
aubergine S
Vowels
ø 2 close-mid front
rounded vowel
deux o
œ 9 open-mid front
rounded vowel
neuf O
œ̃ 9~ nasal open-mid
front rounded
vowel
brun O
ə
@ mid central vowel je @
a a open front
unrounded vowel
table a
ɑ̃
A~ nasal open back
unrounded vowel
camembert a
e e close-mid front
unrounded vowel
marché e
ɛ
E open-mid front
unrounded vowel
neige E
ɛ̃
E~ nasal open-mid
front unrounded
vowel
sapin E
i i close front
unrounded vowel
mille i
o o close-mid back
rounded vowel
hôpital o
ɔ
O open-mid back
rounded vowel
homme O
ɔ̃
O~ nasal open-mid
back rounded
vowel
bon O
u u close back
rounded vowel
sous u
y y close front
rounded vowel
dur u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
French (Canadian) (fr-CA)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the French Canadian voice supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
boire p
d d voiced alveolar
plosive
madame t
f f voiceless labiodent
al fricative
femme f
g g voiced velar
plosive
grand k
ɥ
H labial-palatal
approximant
bruit u
j j palatal approxima
nt
meilleur i
k k voiceless velar
plosive
quatre k
l l alveolar lateral
approximant
malade t
m m bilabial nasal maison p
n n alveolar nasal astronome t
ɲ
J palatal nasal baigner J
ŋ N velar nasal parking k
p p voiceless bilabial
plosive
pomme p
ʁ
R voiced uvular
fricative
amoureux k
s s voiceless alveolar
fricative
santé s
ʃ
S voiceless postalveo
lar fricative
chat S
t t voiceless alveolar
plosive
téléphone t
v v voiced labiodental
fricative
vrai f
w w labial-velar
approximant
soir u
z z voiced alveolar
fricative
raison s
ʒ
Z voiced postalveo
lar fricative
aubergine S
Vowels
ø 2 close-mid front
rounded vowel
deux o
œ 9 open-mid front
rounded vowel
neuf O
œ̃ 9~ nasal open-mid
front rounded
vowel
brun O
ə
@ mid central vowel je @
a a open front
unrounded vowel
table a
ɑ̃
A~ nasal open back
unrounded vowel
camembert a
e e close-mid front
unrounded vowel
marché e
ɛ
E open-mid front
unrounded vowel
neige E
ɛ̃
E~ nasal open-mid
front unrounded
vowel
sapin E
i i close front
unrounded vowel
mille i
o o close-mid back
rounded vowel
hôpital o
ɔ
O open-mid back
rounded vowel
homme O
ɔ̃
O~ nasal open-mid
back rounded
vowel
bon O
u u close back
rounded vowel
sous u
y y close front
rounded vowel
dur u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
German (de-DE)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the German voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ʔ
? glottal stop
b b voiced bilabial
plosive
Bier p
d d voiced alveolar
plosive
Dach t
ç C voiceless palatal
fricative
ich k
d͡ʒ dZ voiced postalveolar affricate Dschungel S
f f Voiceless labiodental fricative Vogel f
g g Voiced velar
plosive
Gabel k
h h Voiceless glottal
fricative
Haus k
j j Palatal approximant jemand i
k k Voiceless velar
plosive
Kleid k
l l Alveolar lateral
approximant
Loch t
m m Bilabial nasal Milch p
n n Alveolar nasal Natur t
ŋ N Velar nasal klingen k
p p Voiceless bilabial
plosive
Park p
p͡f pf Voiceless labiodental affricate Apfel
ʀ
R Uvular trill Regen
s s voiceless alveolar
fricative
Messer s
ʃ
S Voiceless
postalveolar
fricative
Fischer S
t t Voiceless alveolar
plosive
Topf T
t͡s ts Voiceless alveolar affricate Zahl
t͡ʃ
tS Voiceless
postalveolar
affricate
deutsch S
v v Voiced labiodental
fricative
Wasser f
x x Voiceless velar
fricative
kochen k
z z Voiced alveolar
fricative
See s
ʒ Z Voiced postalveolar fricative Orange S
Vowels
øː
2: long close-mid
front rounded
vowel
böse o
ɐ
6 near-open central
vowel
besser a
ɐ̯
6_^ non-syllabic near-
open central vowel
Klar a
œ 9 open-mid front
rounded vowel
können O
ə
@ mid central vowel Rede @
a a open front
unrounded vowel
Salz a
a: a: long open front
unrounded vowel
Sahne a
aI diphthong nein a
aU diphthong Augen a
ɑ̃
A~ nasal open back
unrounded vowel
Restaurant a
e: e: long close-mid
front unrounded
vowel
Rede e
ɛ
E open-mid front
unrounded vowel
Keller E
ɛ̃
E~ nasal open-mid
front unrounded
vowel
Terrain E
i: i: long close front
unrounded vowel
Lied i
ɪ
I near-close near-
front unrounded
vowel
bitte i
o: o: long close-mid
back rounded
vowel
Kohl o
ɔ
O open-mid back
rounded vowel
Koffer O
ɔ̃
O~ nasal open-mid
back rounded
vowel
Annonce O
ɔʏ
OY diphthong neu O
u: u: long close back
rounded vowel
Bruder u
ʊ
U near-close near-
back rounded
vowel
Wunder u
y: y: long close front
rounded vowel
kühl u
ʏ
Y near-close near-
front rounded
vowel
Küche u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
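The IPA and X-SAMPA columns above describe the same phonemes in two notations, so a transcription in one can be rewritten in the other by table lookup. The following is a minimal sketch of that lookup, assuming only a hand-picked subset of the German rows; the names IPA_TO_XSAMPA and ipa_to_xsampa are hypothetical and are not part of Amazon Polly.

```python
# Illustrative subset of the German (de-DE) table: IPA symbol -> X-SAMPA symbol.
# Affricates are written with the combining tie bar (U+0361), as in the table.
IPA_TO_XSAMPA = {
    "d\u0361\u0292": "dZ",  # d + tie bar + ʒ
    "t\u0361\u0283": "tS",  # t + tie bar + ʃ
    "ɔʏ": "OY",
    "ʃ": "S",
    "ŋ": "N",
    "ʒ": "Z",
    "ɐ": "6",
    "ə": "@",
    "ɛ": "E",
    "ɔ": "O",
    "ɪ": "I",
    "ʊ": "U",
    "ʏ": "Y",
    "ˈ": '"',  # primary stress
    "ˌ": "%",  # secondary stress
}


def ipa_to_xsampa(ipa: str) -> str:
    """Rewrite an IPA string in X-SAMPA, trying the longest symbols first.

    Characters that are not in the lookup table (for example b, d, f, k)
    are copied through unchanged, because the two notations share them.
    """
    keys = sorted(IPA_TO_XSAMPA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(ipa):
        for key in keys:
            if ipa.startswith(key, i):
                out.append(IPA_TO_XSAMPA[key])
                i += len(key)
                break
        else:
            out.append(ipa[i])
            i += 1
    return "".join(out)
```

Matching the longest symbol first keeps a multi-character entry such as the affricate from being split into its parts. A complete converter would need every row of the table, including the length-marked vowels.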
German (Austrian) (de-AT)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Austrian German voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ʔ
? glottal stop
b b voiced bilabial
plosive
Bier p
d d voiced alveolar
plosive
Dach t
ç C voiceless palatal
fricative
ich k
d͡ʒ dZ voiced postalveolar affricate Dschungel S
f f Voiceless labiodental fricative Vogel f
g g Voiced velar
plosive
Gabel k
h h Voiceless glottal
fricative
Haus k
j j Palatal approximant jemand i
k k Voiceless velar
plosive
Kleid k
l l Alveolar lateral
approximant
Loch t
m m Bilabial nasal Milch p
n n Alveolar nasal Natur t
ŋ N Velar nasal klingen k
p p Voiceless bilabial
plosive
Park p
p͡f pf Voiceless labiodental affricate Apfel
ʀ
R Uvular trill Regen
s s voiceless alveolar
fricative
Messer s
ʃ
S Voiceless
postalveolar
fricative
Fischer S
t t Voiceless alveolar
plosive
Topf T
t͡s ts Voiceless alveolar affricate Zahl
t͡ʃ
tS Voiceless
postalveolar
affricate
deutsch S
v v Voiced labiodental
fricative
Wasser f
x x Voiceless velar
fricative
kochen k
z z Voiced alveolar
fricative
See s
ʒ Z Voiced postalveolar fricative Orange S
Vowels
øː
2: long close-mid
front rounded
vowel
böse o
ɐ
6 near-open central
vowel
besser a
ɐ̯
6_^ non-syllabic near-
open central vowel
Klar a
œ 9 open-mid front
rounded vowel
können O
ə
@ mid central vowel Rede @
a a open front
unrounded vowel
Salz a
a: a: long open front
unrounded vowel
Sahne a
aI diphthong nein a
aU diphthong Augen a
ɑ̃
A~ nasal open back
unrounded vowel
Restaurant a
e: e: long close-mid
front unrounded
vowel
Rede e
ɛ
E open-mid front
unrounded vowel
Keller E
ɛ̃
E~ nasal open-mid
front unrounded
vowel
Terrain E
i: i: long close front
unrounded vowel
Lied i
ɪ
I near-close near-
front unrounded
vowel
bitte i
o: o: long close-mid
back rounded
vowel
Kohl o
ɔ
O open-mid back
rounded vowel
Koffer O
ɔ̃
O~ nasal open-mid
back rounded
vowel
Annonce O
ɔʏ
OY diphthong neu O
u: u: long close back
rounded vowel
Bruder u
ʊ
U near-close near-
back rounded
vowel
Wunder u
y: y: long close front
rounded vowel
kühl u
ʏ
Y near-close near-
front rounded
vowel
Küche u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Hindi (hi-IN)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the phoneme's sound type for
the Hindi voices that are supported by Amazon Polly.
For additional phonemes used in conjunction with Hindi, see English (Indian) (en-IN).
Phoneme/Viseme Table
IPA X-SAMPA Description Example
Consonants
pʰ p_h voiceless aspirated bilabial plosive फूल (phool)
bʱ b_h voiced aspirated bilabial plosive भारी (bhaari)
t̪ t_d voiceless dental plosive तापमान (taapmaan)
t̪ʰ t_d_h voiceless aspirated dental plosive थोड़ा (thoda)
d̪ d_d voiced dental plosive दिल्ली (dilli)
d̪ʱ d_d_h voiced aspirated dental plosive धोबी (dhobi)
ʈ t` voiceless retroflex plosive कटोरा (katora)
ʈʰ t`_h voiceless aspirated retroflex plosive ठंड (thand)
ɖ d` voiced retroflex plosive डर (darr)
ɖʱ d`_h voiced aspirated retroflex plosive ढाल (dhal)
tʃʰ tS_h voiceless aspirated palatal affricate छाल (chaal)
dʒʱ dZ_h voiced aspirated palatal affricate झाल (jhaal)
kʰ k_h voiceless aspirated velar plosive खान (khan)
ɡʱ g_h voiced aspirated velar plosive घान (ghaan)
ɳ n` retroflex nasal क्षण (kshan)
ɾ 4 alveolar flap राम (ram)
ɽ r` plain retroflex flap बड़ा (bada)
ɽʱ r`_h voiced aspirated retroflex flap बढ़ी (barhi)
ʋ v\ bilabial approximant वसूल (wasool)
Vowels
ə @_o mid central vowel अच्छा (achhaa)
ə̃ @~ nasalised mid central vowel हँसना (hansnaa)
a A_o open front unrounded vowel आग (aag)
ã A~ nasalised open front unrounded vowel घड़ियाँ (ghariyaan)
ɪ I_o near-close near-front unrounded vowel इक्कीस (ikkees)
ɪ̃ I~ nasalised near-close near-front unrounded vowel सिंचाई (sinchai)
i i_o close front unrounded vowel बिल्ली (billee)
ĩ i~ nasalised close front unrounded vowel नहीं (nahin)
ʊ U_o near-close near-back rounded vowel उल्लू (ullu)
ʊ̃ U~ nasalised near-close near-back rounded vowel मुँह (munh)
u u_o close back rounded vowel फूल (phool)
ũ u~ nasalised close back rounded vowel ऊँट (oont)
ɔ O_o open-mid back rounded vowel कौन (kaun)
ɔ̃ O~ nasalised open-mid back rounded vowel भौं (bhaun)
o o close-mid back rounded vowel सोना (sona)
õ o~ nasalised close-mid back rounded vowel क्यों (kyon)
ɛ E_o open-mid front unrounded vowel पैसा (paisa)
ɛ̃ E~ nasalised open-mid front unrounded vowel मैं (main)
e e close-mid front unrounded vowel एक (ek)
ẽ e~ nasalised close-mid front unrounded vowel किताबें (kitabein)
Icelandic (is-IS)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Icelandic voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
grasbakkanum 0
c c voiceless palatal
plosive
pakkin k
cʰ c_h aspirated voiceless palatal plosive anarkistai k
ç C voiceless palatal
fricative
héðan k
d d voiced alveolar
plosive
bóndi t
ð D voiced dental
fricative
borð T
f f voiceless labiodental fricative duft f
g g voiced velar
plosive
holgóma k
ɣ
G voiced velar
fricative
hugur k
h h voiceless glottal
fricative
heili k
j j palatal approximant jökull i
kʰ k_h aspirated voiceless velar plosive ósköpunum k
l l alveolar lateral
approximant
lf t
l̥ l_0 voiceless alveolar lateral approximant lk t
m m bilabial nasal september p
m̥ m_0 voiceless bilabial nasal kompa p
n n alveolar nasal númer t
n̥ n_0 voiceless alveolar nasal ntun t
ɲ
J palatal nasal pælingar J
ŋ N velar nasal ngvarann k
ŋ̊ N_0 voiceless velar
nasal
frænka k
pʰ p_h aspirated voiceless bilabial plosive afplánun p
r r alveolar trill afskrifta r
r̥ r_0 voiceless alveolar trill andvörpum r
s s voiceless alveolar
fricative
baðhús s
tʰ t_h aspirated voiceless alveolar plosive tanki t
θ T voiceless dental
fricative
þeldökki T
v v voiced labiodental
fricative
silfur f
w w labial-velar
approximant
u
x x voiceless velar
fricative
samfélags k
Vowels
œ 9 open-mid front
rounded vowel
þröskuldinum O
œː
9: long open-mid
front rounded
vowel
tvö O
a a open front
unrounded vowel
nefna a
a: a: long open front
unrounded vowel
fara a
au au diphthong átta a
au: au: diphthong átján a
ɛ
E open-mid front
unrounded vowel
kennari E
ɛ:
E: long open-mid
front unrounded
vowel
dreka E
i i close front
unrounded vowel
Gúlíver i
i: i: long close front
unrounded vowel
þrír i
ɪ
I near-close near-
front unrounded
vowel
samspil i
ɪ: I: long near-close near-front unrounded vowel stig i
ɔ
O open-mid back
rounded vowel
regndropar O
ɔ:
O: long open-mid
back rounded
vowel
ullarbolur O
ɔu
Ou diphthong tólf O
ɔu:
Ou: diphthong fjórir O
u u close back
rounded vowel
stúlkan u
u: u: long close back
rounded vowel
frú u
ʏ
Y near-close near-
front rounded
vowel
tíu u
ʏ: Y: long near-close near-front rounded vowel gruninn u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Italian (it-IT)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Italian voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bacca p
d d voiced alveolar
plosive
dama t
d͡z
dz voiced alveolar
affricate
zero s
d͡ʒ dZ voiced postalveolar affricate giro S
f f voiceless labiodental fricative famiglia f
g g voiced velar
plosive
gatto k
h h voiceless glottal
fricative
horror k
j j palatal approximant dieci i
k k voiceless velar
plosive
campo k
l l alveolar lateral
approximant
lido t
ʎ
L palatal lateral
approximant
aglio J
m m bilabial nasal mille p
n n alveolar nasal nove t
ɲ
J palatal nasal lasagne J
p p voiceless bilabial
plosive
pizza p
r r alveolar trill risata r
s s voiceless alveolar
fricative
sei s
ʃ S voiceless postalveolar fricative scienza S
t t voiceless alveolar
plosive
tavola t
t͡s
ts voiceless alveolar
affricate
forza s
t͡ʃ tS voiceless postalveolar affricate cielo S
v v voiced labiodental
fricative
venti f
w w labial-velar
approximant
quattro u
z z voiced alveolar
fricative
bisogno s
ʒ Z voiced postalveolar fricative bijou S
Vowels
a a open front
unrounded vowel
arco a
e e close-mid front
unrounded vowel
tre e
ɛ
E open-mid front
unrounded vowel
ettaro E
i i close front
unrounded vowel
impero i
o o close-mid back
rounded vowel
cento o
ɔ
O open-mid back
rounded vowel
otto O
u u close back
rounded vowel
uno u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
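The Viseme column above assigns each phoneme a mouth shape, which is the information a lip-sync routine needs. The following is a minimal sketch of using the Italian table that way, assuming the phonemes are already available as a list of X-SAMPA symbols; XSAMPA_TO_VISEME and visemes_for are hypothetical names chosen for this illustration.

```python
# X-SAMPA symbol -> viseme, copied from the Italian (it-IT) table.
XSAMPA_TO_VISEME = {
    # Consonants
    "b": "p", "d": "t", "dz": "s", "dZ": "S", "f": "f", "g": "k",
    "h": "k", "j": "i", "k": "k", "l": "t", "L": "J", "m": "p",
    "n": "t", "J": "J", "p": "p", "r": "r", "s": "s", "S": "S",
    "t": "t", "ts": "s", "tS": "S", "v": "f", "w": "u", "z": "s",
    "Z": "S",
    # Vowels
    "a": "a", "e": "e", "E": "E", "i": "i", "o": "o", "O": "O", "u": "u",
}

# Stress and syllable-boundary markers carry no mouth shape of their own.
MARKERS = {'"', "%", "."}


def visemes_for(phonemes):
    """Map a sequence of X-SAMPA symbols to their visemes.

    Markers from the Additional Symbols rows are skipped. An unknown
    symbol raises KeyError so that a typo does not pass silently.
    """
    return [XSAMPA_TO_VISEME[p] for p in phonemes if p not in MARKERS]
```

For example, the symbols p, i, ts, a map to the visemes p, i, s, a. Several phonemes share one viseme, so the mapping cannot be reversed.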
Japanese (ja-JP)
Amazon Polly supports the Pronunciation Kana and Yomigana alphabets for Japanese. To make
Amazon Polly use phonetic pronunciation with these alphabets, use the phoneme
alphabet="x-amazon-phonetic standard used" attribute, replacing the placeholder value with one
of the following:
x-amazon-pron-kana – indicates that Pronunciation Kana is used. Pronunciation Kana are
special Katakana characters used for phonetic transcription and can encode pitch accent.
x-amazon-yomigana – indicates that Yomigana is used. Yomigana can be written in conventional
Katakana, Hiragana, or Latin characters interpreted as Hepburn romanization.
The following examples show how these are used:
Pronunciation Kana
<speak>
###<phoneme alphabet="x-amazon-pron-kana" ph="###'#">##</phoneme>###
</speak>
Yomigana
<speak>
###<phoneme alphabet="x-amazon-yomigana" ph="####">##</phoneme>###
###<phoneme alphabet="x-amazon-yomigana" ph="####">##</phoneme>###
###<phoneme alphabet="x-amazon-yomigana" ph="Hirokazu">##</phoneme>###
</speak>
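The tags shown above can also be assembled in code, which helps when the reading comes from user data and has to be escaped. The following is a minimal sketch using only the Python standard library; the helper names phoneme and speak are hypothetical, and the sample word in the usage note is an assumption made for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

# The two Japanese alphabet values described above.
JAPANESE_ALPHABETS = {"x-amazon-pron-kana", "x-amazon-yomigana"}


def phoneme(text: str, ph: str, alphabet: str = "x-amazon-yomigana") -> str:
    """Wrap text in a phoneme tag that carries the reading ph.

    quoteattr adds the surrounding quotes and escapes the attribute value;
    escape protects &, < and > in the element text.
    """
    if alphabet not in JAPANESE_ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet}")
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


def speak(*parts: str) -> str:
    """Join already-escaped SSML fragments inside a speak element."""
    return "<speak>" + "".join(parts) + "</speak>"
```

Calling speak(phoneme("弘和", "Hirokazu")) returns a document of the same shape as the last Yomigana line above. The resulting string would be sent to Amazon Polly as SSML text.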
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Japanese voice supported by Amazon Polly.
IPA X-SAMPA Description Example Viseme
Consonants
ɾ
4 alveolar flap
練習,
renshuu t
ʔ
? glottal stop
あつっ,
atsu'
b b voiced bilabial
plosive
舞踊,
buyou p
β B voiced bilabial
fricative
ヴィンテージ,
vinteeji
B
c c voiceless palatal
plosive
ききょう,
kikyou k
ç C voiceless palatal
fricative
人,
hito k
d d voiced alveolar
plosive
濁点,
dakuten t
d͡ʑ dz\ voiced alveolo-palatal affricate 純, jun J
ɡ
g voiced velar
plosive
ご飯,
gohan k
h h voiceless glottal
fricative
本,
hon k
j j palatal approximant 屋根, yane i
ɟ
J\ voiced palatal
plosive
行儀,
gyougi J
k k voiceless velar
plosive
漢字,
kanji k
ɺ
l\ alveolar lateral
flap
釣り,
tsuri r
ɺj
l\j alveolar lateral
flap, palatal
approximant
流行,
ryuukou r
m m bilabial nasal
飯,
meshi p
n n alveolar nasal
猫,
neko t
ɲ
J palatal nasal
日本,
nippon J
ɴ
N\ uvular nasal
缶,
kan k
p p voiceless bilabial
plosive
パン,
pan p
ɸ
p\ voiceless bilabial
fricative
福,
huku f
s s voiceless alveolar
fricative
層,
sou s
ɕ s\ voiceless alveolo-palatal fricative 書簡, shokan J
t t voiceless alveolar
plosive
手紙,
tegami t
t͡s
ts voiceless alveolar
affricate
釣り,
tsuri s
t͡ɕ ts\ voiceless alveolo-palatal affricate 吉, kichi J
w w labial-velar
approximant
電話,
denwa u
z z voiced alveolar
fricative
座敷,
zashiki s
Vowels
äː
a:_" long open central
unrounded vowel
羽蟻,
haari a
ä a_" open central
unrounded vowel
仮名,
kana a
e:_o long mid front
unrounded vowel
学生,
gakusei @
e e_o mid front
unrounded vowel
歴,
reki @
i i close front
unrounded vowel
気,
ki i
i: long close front
unrounded vowel
詩歌,
shiika i
ɯ
M close back
unrounded vowel
運,
un i
ɯː
M: long close back
unrounded vowel
宗教,
shuukyou i
o:_o long mid back
rounded vowel
購読,
koodoku o
o o_o mid back rounded
vowel
読者,
dokusha o
Korean (ko-KR)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the
Korean voice supported by Amazon Polly.
IPA X-SAMPA Description Example Viseme
Consonants
k k voiceless velar
plosive
강,
[g]ang k
k͈ k_t strong voiceless velar plosive 깨, [kk]e k
n n alveolar nasal 남, [n]am t
t t voiceless alveolar plosive 도, [d]o t
t͈ t_t strong voiceless alveolar plosive 때, [tt]e t
ɾ
4 alveolar flap
사랑,
sa[r]ang t
l l alveolar lateral
approximant
돌,
do[l] t
m m bilabial nasal
무,
[m]u p
p p voiceless bilabial
plosive
봄,
[b]om p
p͈ p_t strong voiceless bilabial plosive 뻘, [pp]eol p
s s voiceless alveolar fricative 새, [s]e s
s͈ s_t strong voiceless alveolar fricative 씨, [ss]i s
ŋ N velar nasal
방,
ba[ng] k
t͡ɕ ts\ voiceless alveolo-palatal affricate 조, [j]o J
t͈͡ɕ ts\_t strong voiceless alveolo-palatal affricate 찌, [jj]i J
t͡ɕʰ ts\_h aspirated voiceless alveolo-palatal affricate 차, [ch]a J
kʰ k_h aspirated voiceless velar plosive 코, [k]o k
tʰ t_h aspirated voiceless alveolar plosive 통, [t]ong t
pʰ p_h aspirated voiceless bilabial plosive 패, [p]e p
h h voiceless glottal
fricative
힘,
[h]im k
j j palatal approximant 양, [y]ang i
w w labial-velar approximant 왕, [w]ang u
ɰ M\ velar approximant 의, [wj]i i
Vowels
a a open front
unrounded vowel
밥,
b[a]b a
ʌ
V open-mid back
unrounded vowel
정,
j[eo]ng E
ɛ
E open-mid front
unrounded vowel
배,
b[e] E
o o close-mid back
rounded vowel
노,
n[o] o
u u close back
rounded vowel
둘,
d[u]l u
ɯ
M close back
unrounded vowel
은,
[eu]n i
i i close front
unrounded vowel
김,
k[i]m i
Norwegian (nb-NO)
The following chart lists the full set of International Phonetic Alphabet (IPA) phonemes and the
Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols as well as the
corresponding visemes as supported by Amazon Polly for Norwegian language voices.
IPA X-SAMPA Description Example Viseme
Consonants
ɾ
4 alveolar flap prøv t
b b voiced bilabial
plosive
labb p
ç C voiceless palatal
fricative
kino k
d d voiced alveolar
plosive
ladd t
ɖ
d` voiced retroflex
plosive
verdi t
f f voiceless labiodental fricative fot f
ɡ g voiced velar plosive tagg k
h h voiceless glottal
fricative
ha k
j j palatal approximant gi i
k k voiceless velar
plosive
takk k
l l alveolar lateral
approximant
fall, ball t
ɭ
l` retroflex lateral
approximant
ærlig t
m m bilabial nasal lam p
n n alveolar nasal vann t
ɳ
n` retroflex nasal garn t
ŋ N velar nasal sang k
p p voiceless bilabial
plosive
hopp p
s s voiceless alveolar
fricative
lass s
ʂ
s` voiceless retroflex
fricative
års S
ʃ S voiceless postalveolar fricative skyt S
t t voiceless alveolar
plosive
lat t
ʈ
t` voiceless retroflex
plosive
hardt t
ʋ
v\ labiodental
approximant
vin f
w w labial-velar
approximant
will x
Vowels
øː
2: long close-mid
front rounded
vowel
søt o
œ 9 open-mid front
rounded vowel
søtt O
ə
@ mid central vowel ape @
æː
{: long near-open
front unrounded
vowel
vær a
ʉ
} close central
rounded vowel
lund u
ʉː
}: long close central
rounded vowel
lun u
æ { near-open front
unrounded vowel
vært a
ɑ
A open back
unrounded vowel
hatt a
ɑː
A: long open back
unrounded vowel
hat a
e: e: long close-mid
front unrounded
vowel
sen e
ɛ
E open-mid front
unrounded vowel
send E
i: i: long close front
unrounded vowel
vin i
ɪ
I near-close near-
front unrounded
vowel
vind i
o: o: long close-mid back rounded vowel våt o
ɔ
O open-mid back
rounded vowel
vått O
u: u: long close back
rounded vowel
bok u
ʊ
U near-close near-
back rounded
vowel
bukk u
y: y: long close front
rounded vowel
lyn u
ʏ
Y near-close near-
front rounded
vowel
lynne u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Polish (pl-PL)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Polish voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bobas, belka p
d d voiced alveolar
plosive
dar, do t
d͡z
dz voiced alveolar
affricate
dzwon, widzowie s
d͡ʑ dz\ voiced alveolo-palatal affricate dźwięk J
d͡ʐ dz` voiced retroflex affricate dżem, dżungla S
f f voiceless labiodental fricative furtka, film f
g g voiced velar
plosive
gazeta, waga k
h h voiceless glottal
fricative
chleb, handel k
j j palatal approximant jak, maja i
k k voiceless velar
plosive
kura, marek k
l l alveolar lateral
approximant
lipa, alicja t
m m bilabial nasal matka, molo p
n n alveolar nasal norka t
ɲ
J palatal nasal koń, toruń J
p p voiceless bilabial
plosive
pora, stop p
r r alveolar trill rok, park r
s s voiceless alveolar
fricative
sum, pas s
ɕ s\ voiceless alveolo-palatal fricative śruba, śnieg J
ʂ
s` voiceless retroflex
fricative
szum, masz S
t t voiceless alveolar
plosive
tok, stół t
t͡s
ts voiceless alveolar
affricate
car, co s
t͡ɕ ts\ voiceless alveolo-palatal affricate ćma, mieć J
t͡ʂ
ts` voiceless retroflex
affricate
czas, raczej S
v v voiced labiodental
fricative
worek, mewa f
w w labial-velar
approximant
łaska, mało u
z z voiced alveolar
fricative
zero s
ʑ z\ voiced alveolo-palatal fricative źrebię, bieliźnie J
ʐ
z` voiced retroflex
fricative
żar, żona S
Vowels
a a open front
unrounded vowel
ja a
ɛ
E open-mid front
unrounded vowel
echo E
ɛ̃
E~ nasal open-mid
front unrounded
vowel
węże E
i i close front
unrounded vowel
ile i
ɔ
O open-mid back
rounded vowel
oczy O
ɔ̃
O~ nasal open-mid
back rounded
vowel
wąż O
u u close back
rounded vowel
uczta u
ɨ
1 close central
unrounded vowel
byk i
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Portuguese (pt-PT)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Portuguese voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ɾ
4 alveolar flap pira t
b b voiced bilabial plosive bato p
d d voiced alveolar
plosive
dato t
f f voiceless labiodental fricative facto f
g g voiced velar
plosive
gato k
j j palatal approximant paraguay i
k k voiceless velar
plosive
cacto k
l l alveolar lateral
approximant
galo t
ʎ
L palatal lateral
approximant
galho J
m m bilabial nasal mato p
n n alveolar nasal nato t
ɲ
J palatal nasal pinha J
p p voiceless bilabial
plosive
pato p
ʀ
R\ uvular trill barroso k
s s voiceless alveolar
fricative
saca s
ʃ S voiceless postalveolar fricative chato S
t t voiceless alveolar
plosive
tacto t
v v voiced labiodental
fricative
vaca f
w w labial-velar
approximant
mau u
z z voiced alveolar
fricative
zaca s
ʒ Z voiced postalveolar fricative jacto S
Vowels
a a open front
unrounded vowel
parto a
a ̃ a~ nasal open front
unrounded vowel
pega a
e e close-mid front
unrounded vowel
pega e
e ̃ e~ nasal close-mid
front unrounded
vowel
movem e
ɛ
E open-mid front
unrounded vowel
café E
i i close front
unrounded vowel
lingueta i
ı̃ i~ nasal close front
unrounded vowel
cinto i
o o close-mid back
rounded vowel
poder o
o ̃ o~ nasal close-mid
back rounded
vowel
compra o
ɔ
O open-mid back
rounded vowel
cotó O
u u close back
rounded vowel
fui u
u ̃ u~ nasal close back
rounded vowel
sunto u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Portuguese (Brazilian) (pt-BR)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Brazilian Portuguese voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ɾ
4 alveolar flap pira t
b b voiced bilabial
plosive
bato p
d d voiced alveolar
plosive
dato t
d͡ʒ dZ voiced postalveolar affricate idade S
f f voiceless labiodental fricative facto f
g g voiced velar
plosive
gato k
j j palatal approximant paraguay i
k k voiceless velar
plosive
cacto k
l l alveolar lateral
approximant
galo t
ʎ
L palatal lateral
approximant
galho J
m m bilabial nasal mato p
n n alveolar nasal nato t
ɲ
J palatal nasal pinha J
p p voiceless bilabial
plosive
pato p
s s voiceless alveolar
fricative
saca s
ʃ S voiceless postalveolar fricative chato S
t t voiceless alveolar
plosive
tacto t
t͡ʃ tS voiceless postalveolar affricate noite S
v v voiced labiodental
fricative
vaca f
w w labial-velar
approximant
mau u
χ X voiceless uvular
fricative
carro k
z z voiced alveolar
fricative
zaca s
ʒ Z voiced postalveolar fricative jacto S
Vowels
a a open front
unrounded vowel
parto a
a ̃ a~ nasal open front
unrounded vowel
pensamos a
e e close-mid front
unrounded vowel
pega e
e ̃ e~ nasal close-mid
front unrounded
vowel
movem e
ɛ
E open-mid front
unrounded vowel
café E
i i close front
unrounded vowel
lingueta i
ı̃ i~ nasal close front
unrounded vowel
cinto i
o o close-mid back
rounded vowel
poder o
o ̃ o~ nasal close-mid
back rounded
vowel
compra o
ɔ
O open-mid back
rounded vowel
cotó O
u u close back
rounded vowel
fui u
u ̃ u~ nasal close back
rounded vowel
sunto u
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Romanian (ro-RO)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Romanian voice supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
bubă p
d d voiced alveolar
plosive
după t
d͡ʒ dZ voiced postalveolar affricate george S
f f voiceless labiodental fricative afacere f
g g voiced velar
plosive
agriș k
h h voiceless glottal
fricative
harpă k
j j palatal approximant baie i
k k voiceless velar
plosive
c k
l l alveolar lateral
approximant
lampa t
m m bilabial nasal mama p
n n alveolar nasal nor t
p p voiceless bilabial
plosive
pilă p
r r alveolar trill rampă r
s s voiceless alveolar
fricative
soare s
ʃ S voiceless postalveolar fricative mașină S
t t voiceless alveolar
plosive
tata t
t͡s
ts voiceless alveolar
affricate
țară s
t͡ʃ tS voiceless postalveolar affricate ceai S
v v voiced labiodental
fricative
viață f
w w labial-velar
approximant
beau u
z z voiced alveolar
fricative
mozol s
ʒ Z voiced postalveolar fricative joacă S
Vowels
ə
@ mid central vowel babă @
a a open front
unrounded vowel
casa a
e e close-mid front
unrounded vowel
elan e
e_^ non-syllabic
close-mid front
unrounded vowel
beau e
i i close front
unrounded vowel
mie i
o o close-mid back rounded vowel
o o
oa o_^a diphthong oare o
u u close back
rounded vowel
unde u
ɨ
1 close central
unrounded vowel
România i
Additional Symbols
ˈ
" primary stress Alabama
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Russian (ru-RU)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Russian voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial
plosive
борт p
bʲ b' palatalized voiced bilabial plosive бюро p
d d voiced alveolar
plosive
дом t
dʲ d' palatalized voiced alveolar plosive дядя t
f f voiceless labiodental fricative флаг f
fʲ f' palatalized voiceless labiodental fricative февраль f
g g voiced velar
plosive
нога k
ɡʲ
g' palatalized voiced
velar plosive
герой k
j j palatal approximant дизайн, ящик i
k k voiceless velar
plosive
кот k
kʲ k' palatalized voiceless velar plosive кино k
l l alveolar lateral approximant лампа t
lʲ l' palatalized alveolar lateral approximant лес t
m m bilabial nasal мама p
mʲ m' palatalized bilabial nasal мяч p
n n alveolar nasal нос t
nʲ n' palatalized alveolar nasal няня t
p p voiceless bilabial plosive папа p
pʲ p' palatalized voiceless bilabial plosive перо p
r r alveolar trill роза r
rʲ r' palatalized alveolar trill рюмка r
s s voiceless alveolar fricative сыр s
sʲ s' palatalized voiceless alveolar fricative сердце, русь s
ɕ:
s\: long voiceless
alveolo-palatal
fricative
щека J
ʂ
s` voiceless retroflex
fricative
шум S
t t voiceless alveolar
plosive
точка t
tʲ t' palatalized voiceless alveolar plosive тётя t
t͡s ts voiceless alveolar affricate царь s
t͡ɕ ts\ voiceless alveolo-palatal affricate час J
v v voiced labiodental
fricative
вор f
vʲ v' palatalized voiced labiodental fricative верфь f
x x voiceless velar
fricative
хор k
xʲ x' palatalized voiceless velar fricative химия k
z z voiced alveolar fricative зуб s
zʲ z' palatalized voiced alveolar fricative зима s
ʑ:
z\: long voiced
alveolo-palatal
fricative
уезжать J
ʐ
z` voiced retroflex
fricative
жена S
Vowels
ə
@ mid central vowel канарейка @
a a open front
unrounded vowel
два, яблоко a
e e close-mid front
unrounded vowel
печь e
ɛ
E open-mid front
unrounded vowel
это E
i i close front
unrounded vowel
один, четыре i
o o close-mid back
rounded vowel
кот o
u u close back
rounded vowel
муж, вьюга u
ɨ
1 close central
unrounded vowel
мышь i
Spanish (es-ES)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Spanish voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ɾ
4 alveolar flap pero, bravo, amor,
eterno
t
b b voiced bilabial
plosive
bestia p
β B voiced bilabial
fricative
bebé B
d d voiced alveolar
plosive
cuando t
ð D voiced dental
fricative
arder T
f f voiceless labiodental fricative fase, café f
g g voiced velar
plosive
gato, lengua,
guerra
k
ɣ
G voiced velar
fricative
trigo, Argos k
j j palatal approximant hacia, tierra, radio, viuda i
ʝ
j\ voiced palatal
fricative
enhielar, sayo,
inyectado,
desyerba
J
k k voiceless velar
plosive
caña, laca,
quisimos
k
l l alveolar lateral
approximant
lino, calor,
principal
t
ʎ
L palatal lateral
approximant
llave, pollo J
m m bilabial nasal madre, comer,
anfibio
p
n n alveolar nasal nido, anillo, sin t
ɲ
J palatal nasal cabaña, ñoquis J
ŋ N velar nasal cinco, venga k
p p voiceless bilabial
plosive
pozo, topo p
r r alveolar trill perro, enrachado r
s s voiceless alveolar
fricative
saco, casa, puertas s
Spanish (es-ES) 169
Amazon Polly Developer Guide
IPA X-SAMPA Description Example Viseme
t t voiceless alveolar
plosive
tamiz, átomo t
t͡ʃ
tS voiceless postalveo
lar affricate
chubasco S
θ T voiceless dental
fricative
cereza, zorro,
lacero, paz
T
w w labial-velar
approximant
fuego, fuimos,
cuota, cuadro
u
x x voiceless velar
fricative
jamón, general,
suje, reloj
k
z z voiced alveolar
fricative
rasgo, mismo s
Vowels
a a open front
unrounded vowel
tanque a
e e close-mid front
unrounded vowel
peso e
i i close front
unrounded vowel
cinco i
o o close-mid back
rounded vowel
bosque o
u u close-mid front
unrounded vowel
publicar u
Additional Symbols
ˈ
" primary stress Alabama
Spanish (es-ES) 170
Amazon Polly Developer Guide
IPA X-SAMPA Description Example Viseme
ˌ
% secondary stress Alabama
. . syllable boundary A.la.ba.ma
Spanish (Mexican) (es-MX)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Mexican Spanish voice that is supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ɾ 4 alveolar flap pero, bravo, amor, eterno t
b b voiced bilabial plosive bestia p
β B voiced bilabial fricative bebé B
d d voiced alveolar plosive cuando t
ð D voiced dental fricative arder T
f f voiceless labiodental fricative fase, café f
g g voiced velar plosive gato, lengua, guerra k
ɣ G voiced velar fricative trigo, Argos k
j j palatal approximant hacia, tierra, radio, viuda i
ʝ j\ voiced palatal fricative enhielar, sayo, inyectado, desyerba J
k k voiceless velar plosive caña, laca, quisimos k
l l lateral alveolar approximant lino, calor, principal t
m m bilabial nasal madre, comer, anfibio p
n n alveolar nasal nido, anillo, sin t
ɲ J palatal nasal cabaña, ñoquis J
ŋ N velar nasal angosto, increíble k
p p voiceless bilabial plosive pozo, topo p
r r alveolar trill perro, enrachado r
s s voiceless alveolar fricative saco, casa, puertas s
ʃ S voiceless postalveolar fricative show, flash S
t t voiceless alveolar plosive tamiz, átomo t
t͡ʃ tS voiceless postalveolar affricate chubasco S
w w labial-velar approximant fuego, fuimos, cuota, cuadro u
x x voiceless velar fricative jamón, general, peaje, reloj k
z z voiced alveolar fricative rasgo, mismo s
Vowels
a a central open unrounded vowel tanque a
e e close-mid front unrounded vowel peso e
i i close front unrounded vowel cinco i
o o close-mid back rounded vowel bosque o
u u close back rounded vowel publicar u
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
Spanish (US) (es-US)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the US Spanish voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ɾ 4 alveolar flap pero, bravo, amor, eterno t
b b voiced bilabial plosive bestia p
β B voiced bilabial fricative bebé B
d d voiced alveolar plosive cuando t
ð D voiced dental fricative arder T
f f voiceless labiodental fricative fase, café f
g g voiced velar plosive gato, lengua, guerra k
ɣ G voiced velar fricative trigo, Argos k
j j palatal approximant hacia, tierra, radio, viuda i
ʝ j\ voiced palatal fricative enhielar, sayo, inyectado, desyerba J
k k voiceless velar plosive caña, laca, quisimos k
l l lateral alveolar approximant lino, calor, principal t
m m bilabial nasal madre, comer, anfibio p
n n alveolar nasal nido, anillo, sin t
ɲ J palatal nasal cabaña, ñoquis J
ŋ N velar nasal angosto, increíble k
p p voiceless bilabial plosive pozo, topo p
r r alveolar trill perro, enrachado r
s s voiceless alveolar fricative saco, casa, puertas s
ʃ S voiceless postalveolar fricative show, flash S
t t voiceless alveolar plosive tamiz, átomo t
t͡ʃ tS voiceless postalveolar affricate chubasco S
w w labial-velar approximant fuego, fuimos, cuota, cuadro u
x x voiceless velar fricative jamón, general, peaje, reloj k
z z voiced alveolar fricative rasgo, mismo s
Vowels
a a central open unrounded vowel tanque a
e e close-mid front unrounded vowel peso e
i i close front unrounded vowel cinco i
o o close-mid back rounded vowel bosque o
u u close back rounded vowel publicar u
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
Swedish (sv-SE)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Swedish voice supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive bil p
d d voiced alveolar plosive dal t
ɖ d` voiced retroflex plosive bord t
f f voiceless labiodental fricative fil f
g g voiced velar plosive gås k
h h voiceless glottal fricative hal k
j j palatal approximant jag i
k k voiceless velar plosive kal k
l l alveolar lateral approximant lös t
ɭ l` retroflex lateral approximant rlig t
m m bilabial nasal mil p
n n alveolar nasal nålar t
ɳ n` retroflex nasal barn t
ŋ N velar nasal ring k
p p voiceless bilabial plosive pil p
r r alveolar trill ris r
s s voiceless alveolar fricative sil s
ɕ s\ voiceless alveolo-palatal fricative tjock J
ʂ s` voiceless retroflex fricative fors, schlager S
t t voiceless alveolar plosive tal t
ʈ t` voiceless retroflex plosive hjort t
v v voiced labiodental fricative vår f
w w labial-velar approximant aula, airways u
ɧ x\ voiceless palatal-velar fricative sjuk k
Vowels
ø 2 close-mid front rounded vowel föll, förr o
øː 2: long close-mid front rounded vowel föl, nöt, för o
ɵ 8 close-mid central rounded vowel buss, full o
ə @ mid central vowel pojken @
ʉː }: long close central rounded vowel hus, ful u
a a open front unrounded vowel hall, matt a
æ { near-open front unrounded vowel herr a
ɑː A: long open back unrounded vowel hal, mat a
e: e: long close-mid front unrounded vowel vet, hel e
ɛ E open-mid front unrounded vowel vett, rätt, hetta, häll E
ɛː E: long open-mid front unrounded vowel säl, häl, här E:
i: i: long close front unrounded vowel vit, sil i:
ɪ I near-close near-front unrounded vowel vitt, sill i
o: o: long close-mid back rounded vowel hål, mål o
ɔ O open-mid back rounded vowel håll, moll O
u: u: long close back rounded vowel sol, bot u
ʊ U near-close near-back rounded vowel bott u
y y close front rounded vowel bytt u
y: y: long close front rounded vowel syl, syl u
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
Turkish (tr-TR)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Turkish voice supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ɾ 4 alveolar flap durum t
ɾ̝̊ 4_0_r voiceless fricated alveolar flap bir t
ɾ̝ 4_r fricated alveolar flap raf t
b b voiced bilabial plosive raf p
c c voiceless palatal plosive kedi k
d d voiced alveolar plosive dede t
d͡ʒ dZ voiced postalveolar affricate cam S
f f voiceless labiodental fricative fare f
g g voiced velar plosive galibi k
h h voiceless glottal fricative hasta k
j j palatal approximant yat i
ɟ J\ voiced palatal plosive genç J
k k voiceless velar plosive akıl k
l l alveolar lateral approximant lale t
ɫ 5 velarized alveolar lateral approximant labirent t
m m bilabial nasal maaş p
n n alveolar nasal anı t
p p voiceless bilabial plosive ip p
s s voiceless alveolar fricative ses s
ʃ S voiceless postalveolar fricative aşı S
t t voiceless alveolar plosive ütü t
t͡ʃ tS voiceless postalveolar affricate çaba S
v v voiced labiodental fricative ekvator, kahveci, akvaryum, isveçli, teşviki, cetvel f
z z voiced alveolar fricative ver s
ʒ Z voiced postalveolar fricative azık S
Vowels
ø 2 close-mid front rounded vowel göl 0
œ 9 open-mid front rounded vowel banliyö O
a a open front unrounded vowel kal a
a: a: long open front unrounded vowel dava a
æ { near-open front unrounded vowel özlem, güvenlik, gürel, somersault a
e e close-mid front unrounded vowel keçi e
ɛ E open-mid front unrounded vowel dede E
i i close front unrounded vowel bir i
i: i: long close front unrounded vowel izah i
ɪ I near-close near-front unrounded vowel ki i
ɯ M close back unrounded vowel kıl i
o o close-mid back rounded vowel kol o
o: o: long close-mid back rounded vowel dolar o
u u close back rounded vowel durum u
u: u: long close back rounded vowel ruhum u
ʊ U near-close near-back rounded vowel dolu u
y y close front rounded vowel güvenlik u
ʏ Y near-close near-front rounded vowel ı u
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
Welsh (cy-GB)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech
Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for
the Welsh voice supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive baban p
d d voiced alveolar plosive deg t
d͡ʒ dZ voiced postalveolar affricate garej S
ð D voiced dental fricative deuddeg T
f f voiceless labiodental fricative acs f
g g voiced velar plosive gadael k
h h voiceless glottal fricative haearn k
j j palatal approximant astudio i
k k voiceless velar plosive cant k
l l alveolar lateral approximant lan t
ɬ K voiceless alveolar lateral fricative llan t
m m bilabial nasal mae p
m̥ m_0 voiceless bilabial nasal ymhen p
n n alveolar nasal naw t
n̥ n_0 voiceless alveolar nasal anhawster t
ŋ N velar nasal argyfwng k
ŋ̊ N_0 voiceless velar nasal anghenion k
p p voiceless bilabial plosive pump p
r r alveolar trill rhoi r
r̥ r_0 voiceless alveolar trill garw r
s s voiceless alveolar fricative saith s
ʃ S voiceless postalveolar fricative siawns S
t t voiceless alveolar plosive tegan t
t͡ʃ tS voiceless postalveolar affricate cytsain S
θ T voiceless dental fricative aberth T
v v voiced labiodental fricative prawf f
w w labial-velar approximant rhagweld u
χ X voiceless uvular fricative chwech k
z z voiced alveolar fricative aids s
ʒ Z voiced postalveolar fricative rouge S
Vowels
ə @ mid central vowel ychwanega @
a a open front unrounded vowel acen a
ai ai diphthong dau a
au au diphthong awdur a
ɑː A: long open back unrounded vowel mab a
ɑːɨ A:1 diphthong aelod a
e: e: long close-mid front unrounded vowel peth e
ɛ E open-mid front unrounded vowel pedwar E
ɛi Ei diphthong beic E
i: i: long close front unrounded vowel tri i
ɪ I near-close near-front unrounded vowel miliwn i
ɨu 1u diphthong unigryw i
o: o: long close-mid back rounded vowel oddi o
ɔ O open-mid back rounded vowel oddieithr O
ɔi Oi diphthong troi O
ɔu Ou diphthong rownd O
u: u: long close back rounded vowel cwch u
ʊ U near-close near-back rounded vowel acwstig u
ʊi Ui diphthong wyth u
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
Amazon Polly voice engines
Amazon Polly has four voice engines that convert input text into lifelike speech: Generative,
Long-form, Neural, and Standard. To use an Amazon Polly voice, select an engine and a speech
synthesis API operation. Then provide the input text for the engine to synthesize and select an
audio output format. Given these inputs, Amazon Polly synthesizes the text into a high-quality
speech audio stream.
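As a rough aid for selecting an engine, the following Python sketch groups voices by the engines they support. It assumes the AWS SDK for Python (boto3) and the DescribeVoices operation, whose response lists a `SupportedEngines` value for each voice. The helper names `voices_for_engine` and `fetch_voices` are illustrative, and the `sample` list is hand-written to mirror the voice tables in this guide rather than real API output.

```python
def voices_for_engine(voices, engine):
    """Return the sorted IDs of voices whose SupportedEngines include `engine`."""
    return sorted(
        voice["Id"] for voice in voices
        if engine in voice.get("SupportedEngines", [])
    )


def fetch_voices(region="us-east-1"):
    """Collect every voice from DescribeVoices (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above has no dependencies

    polly = boto3.client("polly", region_name=region)
    voices, token = [], None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = polly.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        token = page.get("NextToken")
        if not token:
            return voices


# Offline sample shaped like the "Voices" list of a DescribeVoices response.
sample = [
    {"Id": "Ruth", "SupportedEngines": ["generative", "long-form", "neural"]},
    {"Id": "Zeina", "SupportedEngines": ["standard"]},
    {"Id": "Amy", "SupportedEngines": ["generative", "neural", "standard"]},
]
print(voices_for_engine(sample, "generative"))  # ['Amy', 'Ruth']
```

With credentials configured, `voices_for_engine(fetch_voices(), "neural")` would apply the same filter to the live voice list.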
The following sections include details about the voice engines offered by Amazon Polly.
Topics
Generative voices
Long-form voices
Neural voices
Standard voices
Generative voices
Amazon Polly's Generative text-to-speech (TTS) engine offers the most human-like, emotionally
engaged, and adaptive conversational voices available for use through the Amazon Polly console.
The Generative engine is the largest Amazon Polly TTS model to date. It deploys a
billion-parameter transformer that converts raw text into speech codes, followed by a
convolution-based decoder that converts these speech codes into waveforms in an incremental,
streamable manner. This method shows the widely reported emergent abilities of Large Language
Models (LLMs) when trained on increasing volumes of publicly available and proprietary data
comprising a variety of voices, languages, and styles.
The Generative engine creates synthetic speech that is emotionally engaged, assertive, and
highly colloquial in a way that is remarkably similar to a human voice. You can use these voices
for a knowledgeable customer assistant, a virtual trainer, or an advertiser with near-human
synthetic speech.
Note
The state-of-the-art technology underlying these voices falls within the paradigm of
generative AI for language and voice modelling. A side effect of the technology is that any
updates to the training data and the model could result in slight variations to the way the
voices sound, even when their overall quality improves with model updates. This
could have an impact on use cases with different content parts synthesized over a long
time period – for example, a season of podcasts.
Available generative voices
Amazon Polly currently offers two female English voices and one male English voice in a
generative variant. These generative voices are also available in a conversational NTTS variant.
Language Language code Name/ID Gender
1 English (UK) en-GB Amy Female
2 English (US) en-US Matthew Male
2 English (US) en-US Ruth Female
Note
The cost of generative voices is specified on the Amazon Polly pricing information page.
Feature and region compatibility
Amazon Polly generative voices are available in the following regions:
US East (N. Virginia): us-east-1
Europe (Frankfurt): eu-central-1
Generative voices are not available in other Regions.
The following features are supported for generative voices:
Real-time and asynchronous speech synthesis operations.
Newscaster speaking style is not supported in the Generative engine.
Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information
about NTTS-supported SSML tags, see Supported SSML tags.
As with standard voices, you can choose from various sampling rates to optimize the bandwidth
and audio quality for your application. Valid sampling rates for standard and neural voices are
8 kHz, 16 kHz, 22 kHz, or 24 kHz. The default for standard voices is 22 kHz. The default for
generative voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio
stream formats.
Support for generating speech marks is currently not available.
Note
In the unlikely event of model hallucination, a built-in emergency stop mechanism stops the
model from rendering any further speech. (The Generative engine renders speech token by
token.) This safety feature is based on data analysis of where the model has the potential to
hallucinate, which is usually at the end of a sentence.
In some cases, the model might predict a hallucination that doesn't occur and cut a word off
during a generation step, rendering only half of the word. This could potentially generate
inappropriate results.
Using the Generative engine on the console
You can access Amazon Polly generative voices through the Amazon Polly console or AWS CLI.
From the console, select the Generative engine, then select a corresponding generative voice from
the list to hear the synthesized speech in that voice. You can also explore generative voices with the
SynthesizeSpeech and StartSpeechSynthesisTask API operations. For the API operations,
you can specify the engine and the name of the voice in the API request. For quick-start code
examples using Python, see Python examples.
To use the generative engine on the console
1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. From the Amazon Polly console, choose the Generative engine.
3. Choose the desired voice from the voice dropdown menu.
4. Generate TTS audio with text of your choice.
Note
Generative voices can also be used with the SynthesizeSpeech and
StartSpeechSynthesisTask API operations. For the API operations, you can
specify the engine and the name of the voice in the API request. For quick-start
code samples, see Python examples.
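The API route described above can be sketched in Python as follows. This is a minimal illustration that assumes the AWS SDK for Python (boto3) and configured AWS credentials. The helper names `generative_request` and `speak`, the default voice, and the output path are hypothetical. The voice and Region lists are copied from this section.

```python
# Voices and Regions listed for the Generative engine in this guide.
GENERATIVE_VOICES = {"Amy": "en-GB", "Matthew": "en-US", "Ruth": "en-US"}
GENERATIVE_REGIONS = ("us-east-1", "eu-central-1")


def generative_request(text, voice_id="Ruth"):
    """Build SynthesizeSpeech keyword arguments for a generative voice."""
    if voice_id not in GENERATIVE_VOICES:
        raise ValueError(f"{voice_id!r} is not listed as a generative voice")
    return {
        "Engine": "generative",
        "VoiceId": voice_id,
        "LanguageCode": GENERATIVE_VOICES[voice_id],
        "Text": text,
        "OutputFormat": "mp3",
    }


def speak(text, voice_id="Ruth", region="us-east-1", path="generative.mp3"):
    """Synthesize `text` and write the audio to `path` (needs AWS credentials)."""
    if region not in GENERATIVE_REGIONS:
        raise ValueError(f"generative voices are not listed for {region!r}")
    import boto3  # imported lazily; only needed for the real API call

    polly = boto3.client("polly", region_name=region)
    response = polly.synthesize_speech(**generative_request(text, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


request = generative_request("Welcome back. How can I help you today?", "Matthew")
```

Keeping the request builder separate from the network call makes it easy to inspect the parameters before any synthesis is billed.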
Long-form voices
Amazon Polly has a Long-form engine that produces human-like, highly expressive, and
emotionally adept voices. Long-form voices are designed to captivate listeners’ attention for longer
content, such as news articles, training materials, or marketing videos.
Amazon Polly Long-form voices are developed with cutting-edge deep learning TTS technology.
The model learns to replicate phonemes, prosody, intonation, and other phonetic and acoustic
aspects of human language, resulting in a highly natural speech output.
The Long-form engine uses text embeddings to interpret the meaning of a text. Using text
embeddings, the Long-form engine can generate the correct emphasis, pauses, and tone of a
natural voice. The result is a voice that combines the complete range of emotional elements
present in human communication. This includes mimicking surprisal or differentiating dialogue
from narration. Together, this creates a premium speech product that sounds like a live human
being.
Note
The state-of-the-art technology underlying these voices falls within the paradigm of
generative AI for language and voice modelling. A side effect of the technology is that any
updates to the training data and the model could result in slight variations to the way
the voices sound, even when their overall quality improves with model updates.
This could have an impact on use cases with different content parts synthesized over a long
time period – for example, a season of podcasts.
Available long-form voices
Amazon Polly currently offers two female en-US long-form voices and one male en-US long-form
voice. These long-form voices are also available in a conversational NTTS variant.
Language Language code Name/ID Gender
1 English (US) en-US Danielle Female
1 English (US) en-US Gregory Male
1 English (US) en-US Ruth Female
Feature and region compatibility
Amazon Polly long-form voices are available in the following regions:
US East (N. Virginia): us-east-1
Long-form voices are not available in other Regions.
The Amazon Polly Long-form engine supports the following features:
Real-time and asynchronous speech synthesis operations.
All speech marks.
Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information
about NTTS-supported SSML tags, see Supported SSML tags.
As with standard voices, you can choose from various sampling rates to optimize the bandwidth
and audio quality for your application. Valid sampling rates for standard, long-form, and neural
voices are 8 kHz, 16 kHz, 22 kHz, or 24 kHz. The default for standard voices is 22 kHz. The
default for long-form and neural voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis),
and raw PCM audio stream formats.
Note
The cost of long-form voices is specified on the Amazon Polly pricing information page.
Using the Long-form engine on the console
You can access Amazon Polly long-form voices through the Amazon Polly console or AWS CLI.
To use the Long-form engine on the console
1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. From the Amazon Polly console, choose the Long Form engine.
3. Choose the desired voice from the voice dropdown menu.
4. Generate TTS audio with text of your choice.
Note
Long-form voices can also be used with the SynthesizeSpeech and
StartSpeechSynthesisTask API operations. For the API operations, you can
specify the engine and the name of the voice in the API request. For quick-start
code samples, see Python examples.
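Because long-form voices target longer content, the asynchronous StartSpeechSynthesisTask operation is often a natural fit. The sketch below assumes the AWS SDK for Python (boto3), configured credentials, and an existing Amazon S3 bucket. The bucket name `my-polly-output` and the helper names are hypothetical placeholders.

```python
# Voices listed for the Long-form engine in this guide (us-east-1 only).
LONG_FORM_VOICES = ("Danielle", "Gregory", "Ruth")


def long_form_task_request(text, bucket, voice_id="Danielle"):
    """Build StartSpeechSynthesisTask keyword arguments for a long-form voice."""
    if voice_id not in LONG_FORM_VOICES:
        raise ValueError(f"{voice_id!r} is not listed as a long-form voice")
    return {
        "Engine": "long-form",
        "VoiceId": voice_id,
        "Text": text,
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,  # the finished audio is written here
    }


def start_task(request):
    """Submit the task and return its ID and status (needs AWS credentials)."""
    import boto3  # imported lazily; only needed for the real API call

    polly = boto3.client("polly", region_name="us-east-1")
    task = polly.start_speech_synthesis_task(**request)["SynthesisTask"]
    return task["TaskId"], task["TaskStatus"]


# "my-polly-output" is a placeholder bucket name, not a real resource.
request = long_form_task_request(
    "Chapter one. The morning began quietly.", "my-polly-output", "Gregory"
)
```

The returned task ID can later be passed to the GetSpeechSynthesisTask operation to check whether the audio file is ready.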
Neural voices
Amazon Polly has a Neural text-to-speech (NTTS) engine that can produce even higher quality
voices than its standard voices. Standard TTS voices use concatenative synthesis. The standard
engine concatenates phonemes of recorded speech, producing very natural-sounding synthesized
speech. However, the inevitable variations in speech and the techniques used to segment the
waveforms limit the quality of speech. The Amazon Polly NTTS engine doesn't use standard
concatenative synthesis to produce speech. It has two parts:
A neural network that converts a sequence of phonemes (the most basic units of language)
into a sequence of spectrograms. (Spectrograms are snapshots of the energy levels in different
frequency bands.)
A vocoder that converts spectrograms into a nearly continuous audio signal.
The first component of the neural TTS system is a sequence-to-sequence model. This model
doesn’t create its results solely from the corresponding input but also considers how the sequence
of the elements of the input work together. The model chooses the spectrograms that it outputs so
that their frequency bands emphasize acoustic features that the human brain uses when processing
speech.
The output of this model then passes to a neural vocoder. This converts the spectrograms
into speech waveforms. When trained on the large datasets used to build general-purpose
concatenative-synthesis systems, this sequence-to-sequence approach will yield higher-quality,
more natural-sounding voices.
Available neural voices
Neural voices are available in 35 languages and language variants. The following table lists the
voices.
Language and language variants Language code Name/ID Gender
1 Arabic (Gulf) ar-AE Hala Female
1 Arabic (Gulf) ar-AE Zayd Male
2 Belgian Dutch (Flemish) nl-BE Lisa Female
3 Catalan ca-ES Arlet Female
4 Czech cs-CZ Jitka Female
5 Chinese (Cantonese) yue-CN Hiujin Female
6 Chinese (Mandarin) cmn-CN Zhiyu Female
7 Danish da-DK Sofie Female
8 Dutch nl-NL Laura Female
9 English (Australian) en-AU Olivia Female
10 English (British) en-GB Amy* Female
10 English (British) en-GB Emma Female
10 English (British) en-GB Brian Male
10 English (British) en-GB Arthur Male
11 English (Indian) en-IN Kajal Female
12 English (Irish) en-IE Niamh Female
13 English (New Zealand) en-NZ Aria Female
14 English (South African) en-ZA Ayanda Female
15 English (US) en-US Danielle Female
15 English (US) en-US Gregory Male
15 English (US) en-US Ivy Female (child)
15 English (US) en-US Joanna* Female
15 English (US) en-US Kendra Female
15 English (US) en-US Kimberly Female
15 English (US) en-US Salli Female
15 English (US) en-US Joey Male
15 English (US) en-US Justin Male (child)
15 English (US) en-US Kevin Male (child)
15 English (US) en-US Matthew* Male
15 English (US) en-US Ruth Female
15 English (US) en-US Stephen Male
16 Finnish fi-FI Suvi Female
17 French (Belgian) fr-BE Isabelle Female
18 French (Canadian) fr-CA Gabrielle Female
18 French (Canadian) fr-CA Liam Male
19 French fr-FR Léa Female
19 French fr-FR Rémi Male
20 German de-DE Vicki Female
20 German de-DE Daniel Male
21 German (Austrian) de-AT Hannah Female
22 German (Swiss) de-CH Sabrina Female
23 Hindi hi-IN Kajal Female
24 Italian it-IT Bianca Female
24 Italian it-IT Adriano Male
25 Japanese ja-JP Takumi Male
25 Japanese ja-JP Kazuha Female
25 Japanese ja-JP Tomoko Female
26 Korean ko-KR Seoyeon Female
27 Norwegian nb-NO Ida Female
28 Polish pl-PL Ola Female
29 Portuguese (Brazilian) pt-BR Camila Female
29 Portuguese (Brazilian) pt-BR Vitória/Vitoria Female
29 Portuguese (Brazilian) pt-BR Thiago Male
30 Portuguese (European) pt-PT Inês/Ines Female
31 Spanish (European) es-ES Lucia Female
31 Spanish (European) es-ES Sergio Male
32 Spanish (Mexican) es-MX Mia Female
32 Spanish (Mexican) es-MX Andrés Male
33 Spanish (US) es-US Lupe* Female
33 Spanish (US) es-US Pedro Male
34 Swedish sv-SE Elin Female
35 Turkish tr-TR Burcu Female
*The Amy, Joanna, Lupe, and Matthew voices can be used with the Newscaster speaking style. For
more information, see Newscaster voices.
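As an illustration of the Newscaster speaking style, the following Python sketch wraps input text in the `<amazon:domain name="news">` SSML tag and builds a request for the Neural engine. The helper name `newscaster_request` is hypothetical, and the voice list is the one given in the footnote above. Sending the request would additionally require boto3 and AWS credentials.

```python
from xml.sax.saxutils import escape

# Voices that this guide lists as supporting the Newscaster speaking style.
NEWSCASTER_VOICES = ("Amy", "Joanna", "Lupe", "Matthew")


def newscaster_request(text, voice_id="Joanna"):
    """Build SynthesizeSpeech keyword arguments that use the Newscaster style."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id!r} does not support the Newscaster style")
    # Escape &, <, and > so that the text cannot break the SSML document.
    ssml = (
        '<speak><amazon:domain name="news">'
        f"{escape(text)}"
        "</amazon:domain></speak>"
    )
    return {
        "Engine": "neural",
        "VoiceId": voice_id,
        "TextType": "ssml",
        "Text": ssml,
        "OutputFormat": "mp3",
    }


request = newscaster_request("Markets & weather, coming up next.", "Matthew")
print(request["Text"])
```

The resulting dictionary could be passed to `boto3.client("polly").synthesize_speech(**request)` in a Region where neural voices are available.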
Topics
Feature and region compatibility
Using the Neural engine on the console
Feature and region compatibility
Neural voices aren't available in all AWS Regions, nor do they support all Amazon Polly features.
Neural voices are supported in the following regions:
US East (N. Virginia): us-east-1
US West (Oregon): us-west-2
Africa (Cape Town): af-south-1
Asia Pacific (Tokyo): ap-northeast-1
Asia Pacific (Seoul): ap-northeast-2
Asia Pacific (Osaka): ap-northeast-3
Asia Pacific (Mumbai): ap-south-1
Asia Pacific (Singapore): ap-southeast-1
Asia Pacific (Sydney): ap-southeast-2
Canada (Central): ca-central-1
Europe (Frankfurt): eu-central-1
Europe (Ireland): eu-west-1
Europe (London): eu-west-2
Europe (Paris): eu-west-3
AWS GovCloud (US-West): us-gov-west-1
Endpoints and protocols for these Regions are identical to those used for standard voices. For more
information, see Amazon Polly endpoints and quotas.
The following features are supported for neural voices:
Real-time and asynchronous speech synthesis operations.
Newscaster speaking style. For more information about the speaking styles, see Newscaster
voices.
All speech marks.
Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information
about NTTS-supported SSML tags, see Supported Tags.
As with standard voices, you can choose from various sampling rates to optimize the bandwidth
and audio quality for your application. Valid sampling rates for standard and neural voices are 8
kHz, 16 kHz, 22 kHz, or 24 kHz. The default for standard voices is 22 kHz. The default for neural
voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio stream formats.
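The sampling-rate rules above can be captured in a small Python helper. This is a sketch under stated assumptions: the string values (`"8000"`, `"16000"`, `"22050"`, `"24000"`) and the restriction of PCM output to 8 kHz and 16 kHz (with a 16 kHz default) come from the SynthesizeSpeech API reference rather than from this section, and the function name `pick_sample_rate` is hypothetical.

```python
# SampleRate values as the API expects them (strings, in Hz).
VALID_RATES = {
    "mp3": ("8000", "16000", "22050", "24000"),
    "ogg_vorbis": ("8000", "16000", "22050", "24000"),
    "pcm": ("8000", "16000"),  # assumption taken from the API reference
}

# Per-engine defaults stated in this guide (22 kHz is sent as "22050").
DEFAULT_RATE = {
    "standard": "22050",
    "neural": "24000",
    "long-form": "24000",
    "generative": "24000",
}


def pick_sample_rate(engine, output_format="mp3", requested=None):
    """Return a valid SampleRate string for the engine and output format."""
    allowed = VALID_RATES[output_format]
    if requested is None:
        # PCM cannot use the 22.05 kHz or 24 kHz engine defaults.
        return "16000" if output_format == "pcm" else DEFAULT_RATE[engine]
    rate = str(requested)
    if rate not in allowed:
        raise ValueError(f"{rate} Hz is not valid for {output_format}")
    return rate


print(pick_sample_rate("standard"))             # 22050
print(pick_sample_rate("neural"))               # 24000
print(pick_sample_rate("neural", "pcm", 8000))  # 8000
```

The returned string can be supplied as the `SampleRate` parameter of a synthesis request.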
Using the Neural engine on the console
You can access Amazon Polly Neural voices through the Amazon Polly console or AWS CLI.
To use the neural engine on the console
1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. From the console, choose the Neural engine.
3. Choose the desired voice from the voice dropdown menu.
4. Generate TTS audio with text of your choice.
Standard voices
Amazon Polly has a standard engine that uses concatenative synthesis. The standard engine
concatenates phonemes of recorded speech, producing very natural-sounding synthesized speech.
Available Standard voices
Amazon Polly currently offers 40 female and 20 male standard voices in 29 languages and
language variants.
Language Language code Name/ID Gender
1 Arabic arb Zeina Female
2 Chinese (Mandarin) cmn-CN Zhiyu Female
3 Danish da-DK Naja Female
3 Danish da-DK Mads Male
4 Dutch nl-NL Lotte Female
4 Dutch nl-NL Ruben Male
5 English (Australian) en-AU Nicole Female
5 English (Australian) en-AU Russell Male
6 English (British) en-GB Amy Female
6 English (British) en-GB Emma Female
6 English (British) en-GB Brian Male
7 English (Indian) en-IN Aditi Female
7 English (Indian) en-IN Raveena Female
8 English (US) en-US Ivy Female
8 English (US) en-US Joanna Female
8 English (US) en-US Kendra Female
8 English (US) en-US Kimberly Female
8 English (US) en-US Salli Female
8 English (US) en-US Joey Male
8 English (US) en-US Kevin Male
9 English (Welsh) en-GB-WLS Geraint Male
10 French fr-FR Céline/Celine Female
10 French fr-FR Léa Female
10 French fr-FR Mathieu Male
11 French (Canadian) fr-CA Chantal Female
12 German de-DE Marlene Female
12 German de-DE Vicki Female
12 German de-DE Hans Male
13 Hindi hi-IN Aditi Female
14 Icelandic is-IS Dóra/Dora Female
14 Icelandic is-IS Karl Male
15 Italian it-IT Carla Female
15 Italian it-IT Bianca Female
15 Italian it-IT Giorgio Male
16 Japanese ja-JP Mizuki Female
16 Japanese ja-JP Takumi Male
17 Korean ko-KR Seoyeon Female
18 Norwegian nb-NO Liv Female
19 Polish pl-PL Ewa Female
19 Polish pl-PL Maja Female
19 Polish pl-PL Jacek Male
19 Polish pl-PL Jan Male
20 Portuguese (Brazilian) pt-BR Camila Female
20 Portuguese (Brazilian) pt-BR Vitória/Vitoria Female
20 Portuguese (Brazilian) pt-BR Ricardo Male
21 Portuguese (European) pt-PT Inês/Ines Female
21 Portuguese (European) pt-PT Cristiano Male
22 Romanian ro-RO Carmen Female
23 Russian ru-RU Tatyana Female
23 Russian ru-RU Maxim Male
24 Spanish (European) es-ES Conchita Female
24 Spanish (European) es-ES Lucia Female
24 Spanish (European) es-ES Enrique Male
25 Spanish (Mexican) es-MX Mia Female
26 Spanish (US) es-US Lupe Female
26 Spanish (US) es-US Penélope/Penelope Female
26 Spanish (US) es-US Miguel Male
27 Swedish sv-SE Astrid Female
28 Turkish tr-TR Filiz Female
29 Welsh cy-GB Gwyneth Female
Feature and region compatibility
Amazon Polly standard voices are available in the following Amazon Polly regions:
US East (N. Virginia): us-east-1
US East (Ohio): us-east-2
US West (N. California): us-west-1
US West (Oregon): us-west-2
Africa (Cape Town): af-south-1
Asia Pacific (Hong Kong): ap-east-1
Asia Pacific (Tokyo): ap-northeast-1
Asia Pacific (Seoul): ap-northeast-2
Asia Pacific (Osaka): ap-northeast-3
Asia Pacific (Mumbai): ap-south-1
Asia Pacific (Singapore): ap-southeast-1
Asia Pacific (Sydney): ap-southeast-2
China (Ningxia): cn-northwest-1
Canada (Central): ca-central-1
Europe (Frankfurt): eu-central-1
Europe (Ireland): eu-west-1
Europe (London): eu-west-2
Europe (Paris): eu-west-3
Europe (Stockholm): eu-north-1
Middle East (Bahrain): me-south-1
South America (São Paulo): sa-east-1
AWS GovCloud (US-West): us-gov-west-1
Endpoints and protocols for these Regions are identical to those used for Neural voices. For more
information, see Amazon Polly endpoints and quotas.
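The Region lists given for the four engines in this guide can be combined into a small lookup. The Python sketch below is transcribed from those lists as they appear here and might lag behind the service. The names `ENGINE_REGIONS` and `engines_in` are illustrative only.

```python
# Regions listed for neural voices in this guide.
NEURAL_REGIONS = {
    "us-east-1", "us-west-2", "af-south-1", "ap-northeast-1",
    "ap-northeast-2", "ap-northeast-3", "ap-south-1", "ap-southeast-1",
    "ap-southeast-2", "ca-central-1", "eu-central-1", "eu-west-1",
    "eu-west-2", "eu-west-3", "us-gov-west-1",
}

# Regions listed only for standard voices.
STANDARD_ONLY_REGIONS = {
    "us-east-2", "us-west-1", "ap-east-1", "cn-northwest-1",
    "eu-north-1", "me-south-1", "sa-east-1",
}

ENGINE_REGIONS = {
    "generative": {"us-east-1", "eu-central-1"},
    "long-form": {"us-east-1"},
    "neural": NEURAL_REGIONS,
    "standard": NEURAL_REGIONS | STANDARD_ONLY_REGIONS,
}


def engines_in(region):
    """Return the engines this guide lists as available in `region`."""
    return sorted(
        engine for engine, regions in ENGINE_REGIONS.items()
        if region in regions
    )


print(engines_in("eu-central-1"))  # ['generative', 'neural', 'standard']
print(engines_in("sa-east-1"))     # ['standard']
```

A check like this can run before creating a client, so that an unsupported engine and Region combination fails early with a clear message.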
The Amazon Polly standard engine supports the following features:
Real-time and asynchronous speech synthesis operations.
All speech marks.
Many (but not all) SSML tags are supported by Amazon Polly. For more information about NTTS-
supported SSML tags, see Supported SSML tags.
You can choose from various sampling rates to optimize the bandwidth and audio quality for
your application. The default sampling rate for standard voices is 22 kHz. Amazon Polly
supports MP3, OGG (Vorbis), and raw PCM audio stream formats.
Note
The cost of standard voices is specified on the Amazon Polly pricing information page.
Using the Standard engine on the console
You can access Amazon Polly standard voices through the Amazon Polly console or AWS CLI.
To use a standard voice on the console
1. Open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. From the Amazon Polly console, choose the Standard engine.
3. Choose the desired voice from the voice dropdown menu.
4. Generate TTS audio with text of your choice.
Note
Standard voices can also be used with the SynthesizeSpeech and
StartSpeechSynthesisTask API operations. For these operations, you specify
the engine and the name of the voice in the API request. You can find more
quick-start code samples.
Speech marks
Speech marks are metadata that describe the speech that you synthesize, such as where a sentence
or word starts and ends in the audio stream. When you request speech marks for your text,
Amazon Polly returns this metadata instead of synthesized speech. By using speech marks in
conjunction with the synthesized speech audio stream, you can provide your applications with an
enhanced visual experience.
For example, combining the metadata with the audio stream from your text can enable you to
synchronize speech with facial animation (lip-syncing) or to highlight written words as they're
spoken.
Speech marks are available when using either neural or standard text-to-speech formats.
Topics
Speech mark types
Using speech marks
Requesting speech marks on the console
Speech mark types
You request speech marks using the SpeechMarkTypes option for either the SynthesizeSpeech or
StartSpeechSynthesisTask commands. You specify the metadata elements that you want to return
from your input text. You can request as many as four types of metadata but you must specify at
least one per request. No audio output is generated with the request.
In the AWS CLI, for example:
--speech-mark-types='["sentence", "word", "viseme", "ssml"]'
Amazon Polly generates speech marks using the following elements:
sentence – Indicates a sentence element in the input text.
word – Indicates a word element in the text.
viseme – Describes the face and mouth movements corresponding to each phoneme being
spoken. For more information, see Visemes and Amazon Polly.
ssml – Describes a <mark> element from the SSML input text. For more information, see
Generating speech from SSML documents.
Visemes and Amazon Polly
A viseme represents the position of the face and mouth when saying a word. It is the visual
equivalent of a phoneme, which is the basic acoustic unit from which a word is formed. Visemes are
the basic visual building blocks of speech.
Each language has a set of visemes that correspond to its specific phonemes. In a language,
each phoneme has a corresponding viseme that represents the shape that the mouth makes when
forming the sound. However, a viseme can't always be mapped back to a single phoneme because
numerous phonemes look the same when spoken, even though they sound different. For
example, in English, the words "pet" and "bet" are acoustically different. However, when observed
visually (without sound), they look exactly the same.
The following chart shows a partial list of International Phonetic Alphabet (IPA) phonemes and
Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols as well as their
corresponding visemes for US English voices.
For the complete table and tables for all available languages, see Phoneme and Viseme Tables for
Supported Languages.
IPA | X-SAMPA | Description | Example | Viseme
Consonants
b | b | Voiced bilabial plosive | bed | p
d | d | Voiced alveolar plosive | dig | t
d͡ʒ | dZ | Voiced postalveolar affricate | jump | S
ð | D | Voiced dental fricative | then | T
f | f | Voiceless labiodental fricative | five | f
g | g | Voiced velar plosive | game | k
h | h | Voiceless glottal fricative | house | k
... | ... | ... | ... | ...
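The many-to-one relationship between phonemes and visemes can be seen by grouping the rows of the partial table above. The following Python sketch is illustrative only: the dictionary holds just the seven X-SAMPA consonants listed here, and the names XSAMPA_TO_VISEME and phonemes_by_viseme are hypothetical, not part of any Amazon Polly API.

```python
from collections import defaultdict

# X-SAMPA phoneme -> viseme, copied from the partial US English table above.
XSAMPA_TO_VISEME = {
    "b": "p",
    "d": "t",
    "dZ": "S",
    "D": "T",
    "f": "f",
    "g": "k",
    "h": "k",
}


def phonemes_by_viseme(mapping):
    """Group phonemes by the viseme they share."""
    grouped = defaultdict(list)
    for phoneme, viseme in mapping.items():
        grouped[viseme].append(phoneme)
    return dict(grouped)


groups = phonemes_by_viseme(XSAMPA_TO_VISEME)
print(groups["k"])  # ['g', 'h']
```

Because "g" and "h" share the viseme "k", a viseme speech mark alone doesn't tell you which of the two sounds was spoken.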
Using speech marks
Requesting speech marks
To request speech marks for input text, use the synthesize-speech command. Besides the input
text, the following elements are required to return this metadata:
output-format
Amazon Polly supports only the JSON format when returning speech marks.
--output-format json
If you use an unsupported output format, Amazon Polly throws an exception.
voice-id
To ensure that the metadata matches the associated audio stream, specify the same voice that is
used to generate the synthesized speech audio stream. The available voices don't have identical
speech rates. If you use a voice other than the one used to generate the speech, the metadata
will not match the audio stream.
--voice-id Joanna
speech-mark-types
Specify the type or types of speech marks you want. You can request any or all of the speech
mark types, but must specify at least one type.
--speech-mark-types='["sentence", "word", "viseme", "ssml"]'
text-type
Plain text is the default input text for Amazon Polly, so you must use text-type ssml if you
want to return SSML speech marks.
outfile
Specify the output file to which the metadata is written.
MaryLamb.txt
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full
quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
--output-format json \
--voice-id Voice ID \
--text 'Input text' \
--speech-mark-types='["sentence", "word", "viseme"]' \
outfile
Speech mark output
Amazon Polly returns speech mark objects in a line-delimited JSON stream. A speech mark object
contains the following fields:
time – the timestamp in milliseconds from the beginning of the corresponding audio stream
type – the type of speech mark (sentence, word, viseme, or ssml)
start – the offset in bytes (not characters) of the start of the object in the input text (not
including viseme marks)
end – the offset in bytes (not characters) of the object's end in the input text (not including
viseme marks)
value – this varies depending on the type of speech mark
ssml: the name attribute of the <mark> SSML tag
viseme: the viseme name
word or sentence: a substring of the input text, as delimited by the start and end fields
For example, Amazon Polly generates the following word speech mark object from the text "Mary
had a little lamb":
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
The described word ("had") begins 373 milliseconds after the audio stream begins, and starts at
byte 5 and ends at byte 8 of the input text.
Note
This metadata is for the Joanna voice-id. If you use another voice with the same input text,
the metadata might differ.
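Because start and end are byte offsets, slicing the input as a character string can give the wrong substring when the text contains multi-byte characters. The following Python sketch parses a line-delimited stream and recovers the text that a word or sentence mark covers. It assumes that the offsets refer to the UTF-8 encoding of the input text. The function names are hypothetical helpers, not part of an AWS SDK.

```python
import json


def parse_speech_marks(raw):
    """Parse a line-delimited JSON stream into a list of speech mark dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def mark_text(input_text, mark):
    """Return the part of input_text covered by a word or sentence mark.

    Assumption: start and end are offsets into the UTF-8 bytes of the input,
    so the slice is taken on bytes rather than on the str.
    """
    data = input_text.encode("utf-8")
    return data[mark["start"]:mark["end"]].decode("utf-8")


raw = '{"time":373,"type":"word","start":5,"end":8,"value":"had"}\n'
marks = parse_speech_marks(raw)
print(mark_text("Mary had a little lamb", marks[0]))  # had
```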
Speech mark examples
The following examples of speech mark requests show how to make common requests and the
output that they generate.
Example 1: Speech Marks Without SSML
The following example shows you what requested metadata looks like on your screen for the
simple sentence: "Mary had a little lamb." For simplicity, we don't include SSML speech marks in
this example.
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full
quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
--output-format json \
--voice-id Joanna \
--text 'Mary had a little lamb.' \
--speech-mark-types='["viseme", "word", "sentence"]' \
MaryLamb.txt
When you make this request, Amazon Polly returns the following in the .txt file:
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":73,"type":"viseme","value":"E"}
{"time":180,"type":"viseme","value":"r"}
{"time":292,"type":"viseme","value":"i"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":373,"type":"viseme","value":"k"}
{"time":460,"type":"viseme","value":"a"}
{"time":521,"type":"viseme","value":"t"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
{"time":604,"type":"viseme","value":"@"}
{"time":643,"type":"word","start":11,"end":17,"value":"little"}
{"time":643,"type":"viseme","value":"t"}
{"time":739,"type":"viseme","value":"i"}
{"time":769,"type":"viseme","value":"t"}
{"time":799,"type":"viseme","value":"t"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
In this output, each part of the text is broken out in terms of speech marks:
The sentence "Mary had a little lamb."
Each word in the text: "Mary", "had", "a", "little", and "lamb."
The viseme for each sound in the corresponding audio stream: "p", "E", "r", "i", and so on. For
more information on visemes, see Visemes and Amazon Polly.
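One use of this output is to highlight each word while it is spoken. The following Python sketch builds a simple schedule from the word marks in the example above: each word is shown from its own timestamp until the next word starts. The last word has no following mark, so its duration is left as None. The word_schedule name and the tuple layout are illustrative choices, not an Amazon Polly feature.

```python
import json

# A subset of the Example 1 output (the word marks and a few viseme marks).
EXAMPLE_OUTPUT = """
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":373,"type":"viseme","value":"k"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
{"time":643,"type":"word","start":11,"end":17,"value":"little"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}
"""


def word_schedule(raw):
    """Return (word, start_ms, duration_ms) tuples for the word marks."""
    marks = [json.loads(line) for line in raw.splitlines() if line.strip()]
    words = [m for m in marks if m["type"] == "word"]
    schedule = []
    for current, following in zip(words, words[1:] + [None]):
        # Duration runs until the next word starts; unknown for the last word.
        duration = following["time"] - current["time"] if following else None
        schedule.append((current["value"], current["time"], duration))
    return schedule


schedule = word_schedule(EXAMPLE_OUTPUT)
for word, start_ms, duration_ms in schedule:
    print(word, start_ms, duration_ms)
```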
Example 2: Speech marks with SSML
The process of generating speech marks from SSML-enhanced text is similar to the process when
SSML is not present. Use the synthesize-speech command, and specify the SSML-enhanced
text and the type of speech marks that you want, as shown in the following example. To make the
example easier to read, we don't include viseme speech marks, but these could be included as well.
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full
quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
--output-format json \
--voice-id Joanna \
--text-type ssml \
--text '<speak><prosody volume="+20dB">Mary had <break time="300ms"/>a little <mark name="animal"/>lamb</prosody></speak>' \
--speech-mark-types='["sentence", "word", "ssml"]' \
output.txt
When you make this request, Amazon Polly returns the following in the .txt file:
{"time":0,"type":"sentence","start":31,"end":95,"value":"Mary had <break time=\"300ms\"\/>a little <mark name=\"animal\"\/>lamb"}
{"time":6,"type":"word","start":31,"end":35,"value":"Mary"}
{"time":325,"type":"word","start":36,"end":39,"value":"had"}
{"time":897,"type":"word","start":40,"end":61,"value":"<break time=\"300ms\"\/>"}
{"time":1291,"type":"word","start":61,"end":62,"value":"a"}
{"time":1373,"type":"word","start":63,"end":69,"value":"little"}
{"time":1635,"type":"ssml","start":70,"end":91,"value":"animal"}
{"time":1635,"type":"word","start":91,"end":95,"value":"lamb"}
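An application usually reacts to ssml marks by name, for example to trigger an animation when the "animal" mark is reached. The following Python sketch collects the time of each named mark from a few lines of the output above. The function name is hypothetical. Note that the JSON parser also converts escaped sequences such as \" and \/ in the values back to plain characters.

```python
import json

# Three lines from the Example 2 output; a raw string keeps the backslashes.
RAW = r'''{"time":897,"type":"word","start":40,"end":61,"value":"<break time=\"300ms\"\/>"}
{"time":1635,"type":"ssml","start":70,"end":91,"value":"animal"}
{"time":1635,"type":"word","start":91,"end":95,"value":"lamb"}'''


def named_mark_times(raw):
    """Map each <mark> name to its timestamp in milliseconds."""
    marks = (json.loads(line) for line in raw.splitlines() if line.strip())
    return {m["value"]: m["time"] for m in marks if m["type"] == "ssml"}


mark_times = named_mark_times(RAW)
print(mark_times)  # {'animal': 1635}
```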
Requesting speech marks on the console
You can use the console to request speech marks from Amazon Polly. You can then view the
metadata or save it to a file.
To generate speech marks (console)
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab.
3. Turn on SSML to use SSML.
4. Type or paste your text into the input box.
5. For Language, choose the language for your text.
6. For Voice, choose the voice you want to use for the text.
7. To change text pronunciation, expand Additional settings, turn on Customize pronunciation,
and for Apply lexicon, choose the desired lexicon.
8. To verify that the speech is in its final form, choose Listen.
9. Turn on Speech file format settings.
Note
Downloading MP3, OGG, or PCM formats will not generate speech marks.
10. For File Format, choose Speech marks.
11. For Speech mark types, choose the types of speech marks to generate. The option to choose
SSML metadata is only available when SSML is on. For more information on using SSML with
Amazon Polly see Generating speech from SSML documents.
12. Choose Download.
Generating speech from SSML documents
You can use Amazon Polly to generate speech from either plain text or from documents marked up
with Speech Synthesis Markup Language (SSML). Using SSML-enhanced text gives you additional
control over how Amazon Polly generates speech from the text you provide.
For example, you can include a long pause within your text, or change the speech rate or pitch.
Other options include:
emphasizing specific words or phrases
using phonetic pronunciation
including breathing sounds
whispering
using the Newscaster speaking style.
For complete details on the SSML tags supported by Amazon Polly and how to use them, see
Supported SSML tags.
When using SSML, there are several reserved characters that require special treatment. This is
because SSML uses these characters as part of its code. In order to use them, you use a specific
entity to escape them. For more information, see Reserved characters in SSML.
Amazon Polly provides these types of control with a subset of the SSML markup tags that are
defined by Speech Synthesis Markup Language (SSML) Version 1.1, W3C Recommendation.
You can use SSML within the Amazon Polly console or by using the AWS CLI. The following topics
show you how you can use SSML to generate speech and control the output so that it precisely fits
your needs.
Topics
Reserved characters in SSML
Using SSML on the console
Using SSML on the AWS CLI
Supported SSML tags
Reserved characters in SSML
There are five predefined characters that can't normally be used within an SSML statement. These
entities are reserved by the language specification. These characters are:
Name | Character | Escape code
quotation mark (double quotation mark) | " | &quot;
ampersand | & | &amp;
apostrophe or single quotation mark | ' | &apos;
less than sign | < | &lt;
greater than sign | > | &gt;
Because SSML uses these characters as part of its code, to use these symbols in SSML, you must
escape the character when you use it. You use the escape code instead of the actual character so it
displays properly while still creating a valid SSML document. For example, the following sentence
We're using the lawyer at Peabody & Chambers, attorneys-at-law.
would be rendered in SSML as
<speak>
We&apos;re using the lawyer at Peabody &amp; Chambers, attorneys-at-law.
</speak>
In this case, the special characters for the apostrophe and ampersand are escaped so the SSML
document remains valid.
For the &, <, and > symbols, escape codes are always necessary when you use SSML. Additionally,
when you use the apostrophe/single quotation mark (') as an apostrophe, you must also use the
escape code.
However, when you use the double quotation mark ("), or the apostrophe/single quotation mark (')
as a quotation mark, then whether or not you use the escape code is dependent on context.
Double quotation marks
Must be escaped when in an attribute value delimited by double quotes. For example, in the
following AWS CLI code
--text "Pete &quot;Maverick&quot; Mitchell"
Do not need to be escaped when in textual context. For example, in the following
He said, "Turn right at the corner."
Do not need to be escaped when in an attribute value delimited by single quotes. For example, in
the following AWS CLI code
--text 'Pete "Maverick" Mitchell'
Single quotation marks
Must be escaped when used as an apostrophe. For example, in the following
We&apos;ve got to leave quickly.
Do not need to be escaped when in textual context. For example, in the following
"And then I said, 'Don't quote me.'"
Do not need to be escaped when in a code attribute delimited by double quotes. For example, in
the following AWS CLI code
--text "Pete 'Maverick' Mitchell"
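If the input is plain text that you wrap in SSML yourself, one way to avoid deciding case by case is to escape all five reserved characters before adding the <speak> tags. Escaping a quotation mark that didn't strictly need it still produces a valid document. The following Python sketch uses the standard library's xml.sax.saxutils.escape for this. The to_ssml helper is hypothetical, and it is only suitable for text that contains no SSML tags, because it would escape those too.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the two quotation marks are added here.
_QUOTES = {'"': "&quot;", "'": "&apos;"}


def to_ssml(plain_text):
    """Escape the five reserved characters and wrap the text in <speak>."""
    return "<speak>" + escape(plain_text, _QUOTES) + "</speak>"


ssml = to_ssml("We're using the lawyer at Peabody & Chambers, attorneys-at-law.")
print(ssml)
# <speak>We&apos;re using the lawyer at Peabody &amp; Chambers, attorneys-at-law.</speak>
```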
Using SSML on the console
With SSML tags, you can customize and control aspects of speech such as pronunciation, volume,
and speech rate. In the AWS Management Console, the SSML-enhanced text that you want to
convert to audio is entered on the SSML tab of the Text-to-Speech page. Although text entered
in plain text relies on default settings for the language and voice you've chosen, text enhanced
with SSML tells Amazon Polly not only what you want to say, but how you want to say it. Except
for the added SSML tags, Amazon Polly synthesizes SSML-enhanced text in the same way as it
synthesizes plain text. See Step 1.2: Synthesize speech with plaintext input on the console for more
information.
When using SSML, you enclose the entire text in a <speak> tag to let Amazon Polly know that
you're using SSML. For example:
<speak>Hi! My name is Joanna. I will read any text you type here.</speak>
You then use specific SSML tags on the text inside the <speak> tags to customize the way you
want the text to sound. You can add a pause, change the pace of the speech, lower or raise the
volume of the voice, or add many other customizations so that the text sounds right for you. For a
full list of the SSML tags that you can use, see Supported SSML tags.
In the following example, you use an SSML tag to tell Amazon Polly to substitute "World Wide Web
Consortium" for "W3C" when it speaks a short paragraph. You also use tags to introduce a pause
and whisper a word. Compare the results of this exercise with that of Applying lexicons on the
console (Synthesize Speech).
For more information on SSML, with examples, see Supported SSML tags.
To synthesize speech from SSML-enhanced text (console)
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. If it isn't already displayed, choose the Text-to-Speech tab.
3. Turn on SSML.
4. Type or paste the following text in the text box:
<speak>
He was caught up in the game.<break time="1s"/> In the middle of the
10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting,
he shouted, "Nice job!" quite loudly. When his boss stared at him, he
repeated
<amazon:effect name="whispered">"Nice job,"</amazon:effect> in a
whisper.
</speak>
The SSML tags tell Amazon Polly how to render the text:
<break time="1s"/> tells Amazon Polly to pause 1 second between the first two
sentences.
<sub alias="World Wide Web Consortium">W3C</sub> tells Amazon Polly to
substitute World Wide Web Consortium for the acronym W3C.
<amazon:effect name="whispered">Nice job</amazon:effect> tells Amazon
Polly to whisper the second instance of "Nice job."
Note
When you use the AWS CLI, you enclose the input text in quotation marks to
differentiate it from the surrounding code. The Amazon Polly console doesn't show
you code, so you don't enclose input text in quotation marks when you use it.
5. For Language, choose English, US, then choose a voice.
6. To listen to the speech, choose Listen.
7. To save the speech file, choose Download. If you want to save it in a different format, expand
Additional settings, turn on Speech file format settings and choose the format that you
want, then choose Download.
Using SSML on the AWS CLI
You can use the AWS CLI to synthesize SSML input text. The following examples show how to
perform common tasks using the AWS CLI.
Topics
Using SSML with the Synthesize-Speech command
Synthesizing an SSML-enhanced document
Using SSML for common Amazon Polly tasks
Using SSML with the Synthesize-Speech command
This example shows how to use the synthesize-speech command with an SSML string. When
you use the synthesize-speech command, you typically provide the following:
The input text (required)
Opening and closing tags (required)
The output format
A voice
In this example, you specify a simple text string in quotation marks along with the required
opening and closing <speak></speak> tags.
Important
Although you don't use quotation marks around input text in the Amazon Polly console,
you must use them in the AWS CLI. It's also important that you differentiate between
the quotation marks around input text and the quotation marks required for individual tags.
For example, you can use standard quotation marks (") to enclose the input text, and single
quotation marks (') for interior tags, or vice versa. Either option works for Unix, Linux, and
macOS. However, with Windows you must enclose the input text in standard quotation
marks and use single quotation marks for the tags.
For all operating systems, you can use standard quotation marks (") to enclose the input
text, and single quotation marks (') for interior tags. For example:
--text "<speak>Hello <break time='300ms'/> World</speak>"
For Unix, Linux, and macOS, you can also use the reverse, with single quotation marks (')
enclosing the input text and standard quotation marks (") for interior tags:
--text '<speak>Hello <break time="300ms"/> World</speak>'
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full
quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
--text-type ssml \
--text '<speak>Hello world</speak>' \
--output-format mp3 \
--voice-id Joanna \
speech.mp3
To hear the synthesized speech, play the resulting speech.mp3 file using any audio player.
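When a script generates the command, quoting the SSML by hand is easy to get wrong. The following Python sketch builds the same synthesize-speech command as an argument list and lets the standard library's shlex.join apply POSIX shell quoting. It only builds the command string and doesn't call AWS. The synthesize_command helper is hypothetical, and the quoting it produces is for Unix, Linux, and macOS shells, not for the Windows command prompt.

```python
import shlex


def synthesize_command(ssml, voice_id="Joanna", outfile="speech.mp3"):
    """Return a POSIX shell command line for aws polly synthesize-speech."""
    args = [
        "aws", "polly", "synthesize-speech",
        "--text-type", "ssml",
        "--text", ssml,
        "--output-format", "mp3",
        "--voice-id", voice_id,
        outfile,
    ]
    # shlex.join quotes each argument so that a POSIX shell reads it back intact.
    return shlex.join(args)


ssml_text = '<speak>Hello <break time="300ms"/> World</speak>'
command = synthesize_command(ssml_text)
print(command)
```

If you run the command from Python instead of printing it, passing the argument list directly to subprocess.run avoids shell quoting altogether.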
Synthesizing an SSML-enhanced document
For longer input text, you may find it easier to save your SSML content to a file and simply specify
the file name in the synthesize-speech command. For example, you could save the following to
a file called example.xml:
<?xml version="1.0"?>
<speak version="1.1"
xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
xml:lang="en-US">Hello World</speak>
The xml:lang attribute specifies en-US (US English) as the language of the input text. For
information about how the language of the input text and the language of the chosen voice affect
the SynthesizeSpeech operation, see Improving the pronunciation of foreign words.
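Before you send a longer document, it can help to confirm locally that it is well-formed XML with a <speak> root. The following Python sketch does that with the standard library's xml.etree.ElementTree and returns the xml:lang value. It is a basic sanity check under these assumptions, not a validation against the SSML schema or against the tags that Amazon Polly supports. The check_ssml name is hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

EXAMPLE_XML = """<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">Hello World</speak>"""


def check_ssml(document):
    """Return the xml:lang of a well-formed SSML document, or None if unset.

    Raises xml.etree.ElementTree.ParseError for malformed XML and
    ValueError when the root element is not <speak>.
    """
    root = ET.fromstring(document)
    if root.tag not in ("speak", "{%s}speak" % SSML_NS):
        raise ValueError("root element must be <speak>, found %s" % root.tag)
    return root.get("{%s}lang" % XML_NS)


language = check_ssml(EXAMPLE_XML)
print(language)  # en-US
```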
To run an SSML-enhanced file
1. Save the SSML to a file (for example, example.xml).
2. Run the following synthesize-speech command from the path where the XML file is stored
and specify the SSML file as input by substituting file://example.xml for the input text.
Because this command points to a file instead of containing the actual input text, you don't
use quotation marks.
Note
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows,
replace the backslash (\) Unix continuation character at the end of each line with a
caret (^).
aws polly synthesize-speech \
--text-type ssml \
--text file://example.xml \
--output-format mp3 \
--voice-id Joanna \
speech.mp3
3. To hear the synthesized speech, play the resulting speech.mp3 file using any audio player.
Using SSML for common Amazon Polly tasks
The following examples show how to use SSML tags to complete common Amazon Polly tasks. For
more SSML tags, see Supported SSML tags.
To test the following examples, use the following synthesize-speech command with the
appropriate SSML-enhanced text:
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full
quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
--text-type ssml \
--text '<speak>Hello <break time="300ms"/> World</speak>' \
--output-format mp3 \
--voice-id Joanna \
speech.mp3
Adding a pause
To add a pause between words, use the <break> element. The following SSML uses the
<break> element to add a 300-millisecond delay between the words
"Hello" and "World."
<speak>
Hello <break time="300ms"/> World.
</speak>
Controlling volume, pitch, and speed
To control pitch, speaking rate, and speech volume, use the <prosody> element.
The following synthesize-speech command uses the <prosody> element to control volume:
<speak>
<prosody volume="+20dB">Hello world</prosody>
</speak>
The following synthesize-speech command uses the <prosody> element to control pitch:
<speak>
<prosody pitch="x-high">Hello world.</prosody>
</speak>
The following synthesize-speech command uses the <prosody> element to specify the
speech rate (speaking speed):
<speak>
<prosody rate="x-fast">Hello world.</prosody>
</speak>
You can specify multiple attributes in a <prosody> element, as shown in the following
example:
<speak>
<prosody volume="x-loud" pitch="x-high" rate="x-fast">Hello world.</prosody>
</speak>
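If you generate this kind of SSML from code, building the <prosody> element with proper attribute quoting keeps the document valid. The following Python sketch is a minimal, hypothetical helper that uses xml.sax.saxutils from the standard library. It doesn't check whether the attribute values are ones that Amazon Polly accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, **attributes):
    """Wrap text in <speak><prosody ...> with the given attributes."""
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value)) for name, value in attributes.items()
    )
    # escape() protects &, < and > in the spoken text itself.
    return "<speak><prosody%s>%s</prosody></speak>" % (attrs, escape(text))


fast_and_loud = prosody("Hello world.", volume="x-loud", pitch="x-high", rate="x-fast")
print(fast_and_loud)
# <speak><prosody volume="x-loud" pitch="x-high" rate="x-fast">Hello world.</prosody></speak>
```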
Whispering
To whisper words, use the <amazon:effect name="whispered"> element. In the following
example, the <amazon:effect name="whispered"> element tells Amazon Polly to whisper
"little lamb":
<speak>
Mary has a <amazon:effect name="whispered">little lamb.</amazon:effect>
</speak>
To enhance this effect, use the <prosody> element to slightly slow down the whispered speech.
Emphasizing words
To stress a word or phrase, use the <emphasis> element.
<speak>
<emphasis level="strong">Hello</emphasis> world how are you?
</speak>
Specifying how to say certain words
To provide information about the type of text to be spoken, use the <say-as> element.
For instance, in the following SSML, <say-as> indicates that the text 4/6 should be interpreted as
a date. The attributes interpret-as="date" format="dm" indicate that it should be spoken as
a date with the format day/month.
You can also use the <say-as> element to tell Amazon Polly to say numbers as fractions, telephone
numbers, measurement units, and more.
<speak>
Today is <say-as interpret-as="date" format="dm">4/6</say-as>
</speak>
The resulting speech is "Today is June 4th." The <say-as> tag describes how the text should be
interpreted by providing additional context with the interpret-as attribute.
To verify the accuracy of the synthesized speech, play the resulting speech.mp3 file.
For more information on this element, see Controlling how special types of words are spoken.
Improving the pronunciation of foreign words
Amazon Polly assumes that the input text is in the same language as the language spoken by
the voice you choose. To improve the pronunciation of foreign words within input text, specify
the target language with the xml:lang attribute in the synthesize-speech call. This tells
Amazon Polly to apply different pronunciation rules for the foreign words that you tag.
The following examples show how to use different combinations of languages in the input text,
and how to specify voices and the pronunciation of foreign words. For a complete list of available
languages, see Languages in Amazon Polly.
In the following example, the voice (Joanna) is a US English voice. By default, Amazon Polly
assumes that the input text is in the same language as the voice (in this case, US English). When
you use the <lang> tag with the xml:lang attribute, Amazon Polly interprets the tagged text as
Spanish and speaks it as the selected voice would pronounce Spanish words, according to the
pronunciation rules of the foreign language. Without this tag, the text is spoken using the
pronunciation rules of the selected voice.
<speak>
That restaurant is terrific. <lang xml:lang="es-ES">Mucho gusto.</lang>
</speak>
Because the language of the input text is English, Amazon Polly maps the Spanish phonemes
to the closest English phonemes. As a result, Joanna speaks the text as a native US speaker who
pronounces the words correctly in Spanish, but with a US English accent.
Note
Some languages are more similar than others, and so some language combinations work
better than others.
Supported SSML tags
Amazon Polly supports the following SSML tags:
Action | SSML tag | Availability with neural voices | Availability with long-form voices | Availability with generative voices
Adding a pause | <break> | Full availability | Full availability | Full availability
Emphasizing words | <emphasis> | Not available | Not available | Not available
Specifying another language for specific words | <lang> | Full availability | Full availability | Full availability
Placing a custom tag in your text | <mark> | Full availability | Full availability | Full availability
Adding a pause between paragraphs | <p> | Full availability | Full availability | Full availability
Using phonetic pronunciation | <phoneme> | Full availability | Full availability | Not available
Controlling volume, speaking rate, and pitch | <prosody> | Partial availability | Partial availability | Not available
Setting a maximum duration for synthesized speech | <prosody amazon:max-duration> | Not available | Not available | Not available
Adding a pause between sentences | <s> | Full availability | Full availability | Full availability
Controlling how special types of words are spoken | <say-as> | Partial availability | Partial availability | Partial availability
Identifying SSML-enhanced text | <speak> | Full availability | Full availability | Full availability
Pronouncing acronyms and abbreviations | <sub> | Full availability | Full availability | Full availability
Improving pronunciation by specifying parts of speech | <w> | Full availability | Full availability | Full availability
Adding the sound of breathing | <amazon:auto-breaths> | Not available | Not available | Not available
Newscaster speaking style | <amazon:domain name="news"> | Select neural voices only | Not available | Not available
Adding dynamic range compression | <amazon:effect name="drc"> | Full availability | Full availability | Not available
Speaking softly | <amazon:effect phonation="soft"> | Not available | Not available | Not available
Controlling timbre | <amazon:effect vocal-tract-length> | Not available | Not available | Not available
Whispering | <amazon:effect name="whispered"> | Not available | Not available | Not available
Note
If you use unsupported SSML tags in standard, neural, or long-form format, you will get an
error.
Identifying SSML-enhanced text
<speak>
This tag is supported by generative, long-form, neural, and standard TTS formats.
The <speak> tag is the root element of all Amazon Polly SSML text. All SSML-enhanced text must
be enclosed within a pair of <speak> tags.
<speak>Mary had a little lamb.</speak>
Adding a pause
<break>
This tag is supported by generative, long-form, neural, and standard TTS formats.
To add a pause to your text, use the <break> tag. You can set a pause based on strength
(equivalent to the pause after a comma, a sentence, or a paragraph), or you can set it to a specific
length of time in seconds or milliseconds. If you don't specify an attribute to determine the pause
length, Amazon Polly uses the default, <break strength="medium"/>, which adds a pause
the same length as the pause after a comma.
strength attribute values:
none: No pause. Use none to remove a normally occurring pause, such as after a period.
x-weak: Has the same strength as none, no pause.
weak: Sets a pause of the same duration as the pause after a comma.
medium: Has the same strength as weak.
strong: Sets a pause of the same duration as the pause after a sentence.
x-strong: Sets a pause of the same duration as the pause after a paragraph.
time attribute values:
[number]s: The duration of the pause, in seconds. The maximum duration is 10s.
[number]ms: The duration of the pause, in milliseconds. The maximum duration is 10000ms.
For example:
<speak>
Mary had a little lamb <break time="3s"/>Whose fleece was white as snow.
</speak>
If you don't use an attribute with the break tag, the result depends on the surrounding text:
If there is no other punctuation next to the break tag, it creates a <break
strength="medium"/> (comma-length pause).
If the tag is next to a comma, it upgrades the tag to a <break strength="strong"/>
(sentence-length pause).
If the tag is next to a period, it upgrades the tag to <break strength="x-strong"/>
(paragraph-length pause).
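The attribute rules above can be captured in a small helper. The following Python sketch is illustrative only: break_tag is a hypothetical name, not part of any AWS SDK, and it assumes whole-number durations and that the caller passes either strength or time, not both.

```python
import re

# Strength values listed in this section.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_PAUSE_MS = 10_000  # documented maximum: 10s or 10000ms


def break_tag(strength=None, time=None):
    """Return a <break> tag string (hypothetical helper)."""
    if strength is not None and time is not None:
        # Design choice of this sketch, to keep the helper simple.
        raise ValueError("pass either strength or time, not both")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        return f'<break strength="{strength}"/>'
    if time is not None:
        # Assumption: only whole numbers of seconds or milliseconds.
        match = re.fullmatch(r"(\d+)(ms|s)", time)
        if match is None:
            raise ValueError(f"time must look like '3s' or '250ms': {time!r}")
        millis = int(match.group(1)) * (1000 if match.group(2) == "s" else 1)
        if millis > MAX_PAUSE_MS:
            raise ValueError("pause exceeds the 10-second maximum")
        return f'<break time="{time}"/>'
    # No attribute: Amazon Polly chooses the pause from nearby punctuation.
    return "<break/>"


print(break_tag(time="3s"))
```

Validating the value before sending the request surfaces a mistyped strength or an over-long pause locally instead of at synthesis time.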
Emphasizing words
<emphasis>
This tag is supported only by the standard TTS format.
To emphasize words, use the <emphasis> tag. Emphasizing words changes the speaking rate and
volume. More emphasis makes Amazon Polly speak the text louder and slower. Less emphasis
makes it speak quieter and faster. To specify the degree of emphasis, use the level attribute.
level attribute values:
strong: Increases the volume and slows the speaking rate so that the speech is louder and
slower.
moderate: Increases the volume and slows the speaking rate, but less than strong. moderate
is the default.
reduced: Decreases the volume and speeds up the speaking rate. Speech is softer and faster.
Note
The normal speaking rate and volume for a voice fall between the moderate and
reduced levels.
For example:
<speak>
I already told you I <emphasis level="strong">really like</emphasis> that person.
</speak>
Specifying another language for specific words
<lang>
This tag is supported by generative, long-form, neural, and standard TTS formats.
Specify another language for a specific word, phrase, or sentence with the <lang> tag. Foreign
language words and phrases are generally spoken better when they are enclosed within a pair of
<lang> tags. To specify the language, use the xml:lang attribute. For a complete list of available
languages, see Languages in Amazon Polly.
Unless you apply the <lang> tag, all of the words in the input text are spoken in the language of
the voice specified in the voice-id. If you apply the <lang> tag, the words are spoken in that
language.
For example, if the voice-id is Joanna (who speaks US English), Amazon Polly speaks the
following in the Joanna voice without a French accent:
<speak>
Je ne parle pas français.
</speak>
If you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the Joanna
voice in American-accented French:
<speak>
<lang xml:lang="fr-FR">Je ne parle pas français.</lang>
</speak>
Because Joanna is not a native French voice, pronunciation is based on her native language, US
English. For example, although perfect French pronunciation features a uvular trill /R/ in the word
français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.
If you use the voice-id of Giorgio, who speaks Italian, with the following text, Amazon Polly
speaks the sentence in Giorgio's voice with an Italian pronunciation:
<speak>
Mi piace Bruce Springsteen.
</speak>
If you use the same voice with the following <lang> tag, Amazon Polly pronounces Bruce
Springsteen in Italian-accented English:
<speak>
Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang>
</speak>
This tag can also be used as a substitute for the optional DefaultLangCode option when
synthesizing speech. However, doing so requires that you format your text using SSML.
Placing a custom tag in your text
<mark>
This tag is supported by generative, long-form, neural, and standard TTS formats.
To put a custom tag within the text, use the <mark> tag. Amazon Polly takes no action on the tag,
but returns the location of the tag in the SSML metadata. This tag can be anything you want to call
out, as long as it maintains the following format:
<mark name="tag_name"/>
For example, suppose that the tag name is "animal" and the input text is:
<speak>
Mary had a little <mark name="animal"/>lamb.
</speak>
Amazon Polly might return the following SSML metadata:
{"time":767,"type":"ssml","start":25,"end":46,"value":"animal"}
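A client can read this metadata with any JSON parser. The sketch below is a minimal illustration in Python, assuming the metadata arrives as one JSON object per line; parse_speech_marks is a hypothetical helper name, and the sample line is the one shown above.

```python
import json


def parse_speech_marks(raw):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Sample metadata line copied from the example above.
raw = '{"time":767,"type":"ssml","start":25,"end":46,"value":"animal"}'

marks = parse_speech_marks(raw)
# Map each <mark> name to the time value reported for it.
mark_times = {m["value"]: m["time"] for m in marks if m["type"] == "ssml"}
print(mark_times)  # {'animal': 767}
```

A lookup like mark_times lets an application trigger an action, such as showing an image, when playback reaches the position of a named mark.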
Adding a pause between paragraphs
<p>
This tag is supported by generative, long-form, neural, and standard TTS formats.
To add a pause between paragraphs in your text, use the <p> tag. Using this tag provides a longer
pause than native speakers usually place at commas or the end of a sentence. Use the <p> tag to
enclose the paragraph:
<speak>
<p>This is the first paragraph. There should be a pause after this text is
spoken.</p>
<p>This is the second paragraph.</p>
</speak>
This is equivalent to specifying a pause using <break strength="x-strong"/>.
Using phonetic pronunciation
<phoneme>
This tag is supported by long-form, neural, and standard TTS formats.
To make Amazon Polly use phonetic pronunciation for specific text, use the <phoneme> tag.
Two attributes are required with the <phoneme> tag. They indicate the phonetic alphabet Amazon
Polly uses and the phonetic symbols of the corrected pronunciation:
alphabet
ipa: Indicates that the International Phonetic Alphabet (IPA) will be used.
x-sampa: Indicates that the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA)
will be used.
ph
Specifies the phonetic symbols for pronunciation. For more information, see Phoneme and
Viseme Tables for Supported Languages.
With the <phoneme> tag, Amazon Polly uses the pronunciation specified by the ph attribute
instead of the standard pronunciation associated by default with the language used by the selected
voice.
For instance, the word "pecan" can be pronounced two ways. In the following example, “pecan” is
assigned a different pronunciation in each line. Amazon Polly pronounces pecan as specified in the
ph attributes, instead of using the default pronunciation.
International Phonetic Alphabet (IPA)
<speak>
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak>
Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA)
<speak>
You say, <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>.
I say, <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>.
</speak>
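X-SAMPA strings often contain a double quotation mark (the stress marker), which is why the example above delimits attributes with single quotation marks. When generating such tags in code, the quoting can be delegated to an XML library. The following Python sketch uses the standard library's xml.sax.saxutils; phoneme_tag is a hypothetical helper name, not an Amazon Polly API.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(text, ph, alphabet="ipa"):
    """Build a <phoneme> element with safely quoted attribute values."""
    # quoteattr picks single or double quotes depending on the value.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


tag = phoneme_tag("pecan", 'pI"kA:n', alphabet="x-sampa")
print(tag)

# Round-trip check: an XML parser recovers the original ph value.
assert ET.fromstring(tag).get("ph") == 'pI"kA:n'
```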
Mandarin Chinese uses Pinyin for phonetic pronunciation.
Pinyin
<speak>
## <phoneme alphabet="x-amazon-pinyin" ph="bo2">#</phoneme>#
## <phoneme alphabet="x-amazon-pinyin" ph="bao2">#</phoneme>#
</speak>
Japanese uses Yomigana and Pronunciation Kana.
Yomigana
<speak>
###<phoneme alphabet="x-amazon-yomigana" ph="####">##</phoneme>###
###<phoneme alphabet="x-amazon-yomigana" ph="####">##</phoneme>###
###<phoneme alphabet="x-amazon-yomigana" ph="Hirokazu">##</phoneme>###
</speak>
Pronunciation Kana
<speak>
###<phoneme alphabet="x-amazon-pron-kana" ph="##'##">##</phoneme>###
</speak>
Controlling volume, speaking rate, and pitch
<prosody>
Prosody tag attributes are fully supported by the standard TTS voices. Neural and long-form voices
support the volume and rate attributes, but don't support the pitch attribute.
To control the volume, rate, or pitch of your selected voice, use the prosody tag.
Volume, speech rate, and pitch are dependent on the specific voice selected. In addition to
differences between voices for different languages, there are differences between individual voices
speaking the same language. Because of this, while attributes are similar across all languages,
there are clear variations from language to language and no absolute value is available.
The prosody tag has three attributes, each of which has several available values to set the
attribute. Each attribute uses the same syntax:
<prosody attribute="value"></prosody>
volume
default: Resets volume to the default level for the current voice.
silent, x-soft, soft, medium, loud, x-loud: Sets the volume to a predefined value for the
current voice.
+ndB, -ndB: Changes volume relative to the current level. A value of +0dB means no change,
+6dB means approximately twice the current volume, and -6dB means approximately half the
current volume.
For example, you could set the volume for a passage as follows:
<speak>
Sometimes it can be useful to <prosody volume="loud">increase the volume
for a specific speech.</prosody>
</speak>
Or you could set it this way:
<speak>
And sometimes a lower volume <prosody volume="-6dB">is a more effective way of
interacting with your audience.</prosody>
</speak>
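The "+6dB is approximately twice the volume" rule follows from the usual decibel formula for amplitude, ratio = 10^(dB/20). The short Python sketch below works through the arithmetic. It assumes that the volume change refers to signal amplitude, which this guide does not state explicitly, and db_to_amplitude_ratio is a name chosen for illustration.

```python
def db_to_amplitude_ratio(db):
    """Convert a relative decibel change to an amplitude ratio."""
    return 10 ** (db / 20)


for db in (0, 6, -6):
    print(f"{db:+d}dB -> x{db_to_amplitude_ratio(db):.2f}")
# +0dB -> x1.00
# +6dB -> x2.00
# -6dB -> x0.50
```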
rate
x-slow, slow, medium, fast, x-fast: Sets the speaking rate to a predefined value for the
selected voice.
n%: A non-negative percentage change in the speaking rate. For example, a value of 100%
means no change in speaking rate, a value of 200% means a speaking rate twice the default
rate, and a value of 50% means a speaking rate of half the default rate. This value has a range
of 20-200%.
For example, you could set the speech rate for a passage as follows:
<speak>
For dramatic purposes, you might wish to <prosody rate="slow">slow down the
speaking rate of your text.</prosody>
</speak>
Or you could set it this way:
<speak>
Although in some cases, it might help your audience to <prosody rate="85%">slow
the speaking rate slightly to aid in comprehension.</prosody>
</speak>
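To get a feel for percentage rates, it can help to estimate how long a passage will take at a given setting. The Python sketch below uses a deliberately simplified model in which duration scales inversely with the rate. This is an assumption for illustration (real output also depends on pauses and on the voice), and duration_at_rate is a hypothetical helper.

```python
def duration_at_rate(baseline_seconds, rate_percent):
    """Estimate speech duration after applying <prosody rate="n%">.

    Simplified model (an assumption): duration is inversely
    proportional to the speaking rate.
    """
    if not 20 <= rate_percent <= 200:
        raise ValueError("rate must be between 20% and 200%")
    return baseline_seconds * 100 / rate_percent


# A passage that takes 10 seconds at the default rate, spoken at 85%.
print(round(duration_at_rate(10.0, 85), 2))  # 11.76
```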
pitch
default: Resets pitch to the default level for the current voice.
x-low, low, medium, high, x-high: Sets the pitch to a predefined value for the current voice.
+n% or -n%: Adjusts pitch by a relative percentage. For example, a value of +0% means no
baseline pitch change, +5% gives a little higher baseline pitch, and -5% results in a little lower
baseline pitch.
For example, you could set the pitch for a passage as follows:
<speak>
Do you like synthesized speech <prosody pitch="high">with a pitch that is higher
than normal?</prosody>
</speak>
Or you could set it this way:
<speak>
Or do you prefer your speech <prosody pitch="-10%">with a somewhat lower pitch?
</prosody>
</speak>
The <prosody> tag must contain at least one attribute, but can include more within the same tag.
<speak>
Each morning when I wake up, <prosody volume="loud" rate="x-slow">I speak
quite slowly and deliberately until I have my coffee.</prosody>
</speak>
It can also be combined with nested tags, as follows:
<speak>
<prosody rate="85%">Sometimes combining attributes <prosody pitch="-10%">can
change the impression your audience has of a voice</prosody> as well.</prosody>
</speak>
Setting a maximum duration for synthesized speech
<prosody amazon:max-duration>
This tag is currently supported only by the standard TTS format.
To control how long you want a speech to take when it is synthesized, use the <prosody> tag with
the amazon:max-duration attribute.
The duration of synthesized speech varies slightly, depending on the voice you select. This can
make it difficult to match synthesized speech with visuals or other activities that require precise
timing. This issue is magnified for translation applications because the time it takes to say
particular phrases can vary widely with different languages.
The <prosody amazon:max-duration> tag matches synthesized speech to the amount of time
you want it to take (the duration).
This tag uses the following syntax:
<prosody amazon:max-duration="time duration">
With the <prosody amazon:max-duration> tag, you can specify duration in either seconds or
milliseconds:
ns: the maximum duration in seconds
nms: the maximum duration in milliseconds
For example, the following spoken text has a maximum duration of 2 seconds:
<speak>
<prosody amazon:max-duration="2s">
Human speech is a powerful way to communicate.
</prosody>
</speak>
Text placed within the tag doesn't exceed the specified duration. If the chosen voice or language
would normally take longer than that duration, Amazon Polly speeds up the speech so that it fits
into the specified duration.
If the specified duration is longer than it takes to read the text at a normal rate, Amazon Polly
reads the speech normally. It doesn't slow down the speech or add silence, so the resulting audio is
shorter than requested.
Note
Amazon Polly increases the speed to no more than 5 times the normal rate. Text spoken
faster than this is usually unintelligible. If the speech can't fit within your specified
duration even when sped up to the maximum, the audio is sped up but lasts
longer than the specified duration.
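The three cases described above (speech already fits, speech is sped up to fit, speech hits the 5x cap) can be sketched as a small calculation. The following Python model is a simplification for illustration and is not how Amazon Polly is implemented. fitted_duration is a hypothetical name, and the natural duration is assumed to be known in advance.

```python
MAX_SPEEDUP = 5.0  # the 5x cap described in the note above


def fitted_duration(natural_seconds, max_seconds):
    """Return (resulting_duration, speedup) under a simplified model."""
    # Never slow down (speedup >= 1) and never exceed the cap.
    speedup = min(MAX_SPEEDUP, max(1.0, natural_seconds / max_seconds))
    return natural_seconds / speedup, speedup


print(fitted_duration(1.0, 2.0))   # already fits: (1.0, 1.0)
print(fitted_duration(3.0, 2.0))   # sped up to fit: (2.0, 1.5)
print(fitted_duration(20.0, 2.0))  # cap reached: (4.0, 5.0)
```

In the last case the result is still longer than the requested 2 seconds, which matches the behavior described in the note.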
You can include a single sentence or multiple sentences within a <prosody amazon:max-
duration> tag, and you can use multiple <prosody amazon:max-duration> tags within your
text.
For example:
<speak>
<prosody amazon:max-duration="2400ms">
Human speech is a powerful way to communicate.
</prosody>
<break strength="strong"/>
<prosody amazon:max-duration="5100ms">
Even a simple ‘Hello’ can convey a lot of information depending on the pitch,
intonation, and tempo.
</prosody>
<break strength="strong"/>
<prosody amazon:max-duration="8900ms">
We naturally understand this information, which is why speech is ideal for
creating applications where
a screen isn’t practical or possible, or simply isn’t convenient.
</prosody>
</speak>
Using the <prosody amazon:max-duration> tag can increase latency when Amazon Polly
returns synthesized speech. The degree of latency depends on the passage and its length. We
recommend using relatively short text passages.
Limitations
There are limitations both in how you use the <prosody amazon:max-duration> tag and in how it
works with other SSML tags:
The text inside a <prosody amazon:max-duration> tag can't be longer than 1500 characters.
You can't nest <prosody amazon:max-duration> tags. If you put one <prosody
amazon:max-duration> tag inside another, Amazon Polly ignores the inner tag.
For example, in the following, the <prosody amazon:max-duration="5s"> tag is ignored:
<speak>
<prosody amazon:max-duration="16s">
Human speech is a powerful way to communicate.
<prosody amazon:max-duration="5s">
Even a simple ‘Hello’ can convey a lot of information depending on the
pitch, intonation, and tempo.
</prosody>
We naturally understand this information, which is why speech is ideal for
creating applications where a screen isn’t practical or possible, or simply isn’t
convenient.
</prosody>
</speak>
You can't use a <prosody> tag with the rate attribute within a <prosody amazon:max-duration>
tag, because both affect the speed at which text is spoken.
In the following example, Amazon Polly ignores the <prosody rate="2"> tag:
<speak>
<prosody amazon:max-duration="7500ms">
Human speech is a powerful way to communicate.
<prosody rate="2">
Even a simple ‘Hello’ can convey a lot of information depending on the
pitch, intonation, and tempo.
</prosody>
</prosody>
</speak>
Pauses and max-duration
When using the max-duration tag, you can still insert pauses within your text. However, Amazon
Polly includes the length of the pause when calculating the maximum duration for speech.
Additionally, Amazon Polly preserves the short pauses that occur where commas and periods are
placed within a passage and includes them in the maximum duration.
For example, in the following block, the 600-millisecond break and the breaks caused by the
commas and periods occur within the 8-second speech:
<speak>
<prosody amazon:max-duration="8s">
Human speech is a powerful way to communicate.
<break time="600ms"/>
Even a simple ‘Hello’ can convey a lot of information depending on the pitch,
intonation, and tempo.
</prosody>
</speak>
Adding a pause between sentences
<s>
This tag is supported by generative, long-form, neural, and standard TTS formats.
To add a pause between lines or sentences in your text, use the <s> tag. Using this tag has the
same effect as:
Ending a sentence with a period (.)
Specifying a pause with <break strength="strong"/>
Unlike the <break> tag, the <s> tag encloses the sentence. This is useful for synthesizing speech
that is organized in lines rather than sentences, such as poetry.
In the following example, the <s> tag creates a short pause after both the first and second
sentences. The final sentence has no <s> tag, but it is also followed by a short pause because it
ends with a period.
<speak>
<s>Mary had a little lamb</s>
<s>Whose fleece was white as snow</s>
And everywhere that Mary went, the lamb was sure to go.
</speak>
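For line-oriented text such as the verse above, the <s> tags can be added programmatically. The following Python sketch wraps each non-empty line in an <s> element and escapes reserved XML characters. lines_to_ssml is a hypothetical helper, not part of any AWS SDK.

```python
from xml.sax.saxutils import escape


def lines_to_ssml(lines):
    """Wrap each non-empty line in <s> tags inside a <speak> root."""
    body = "\n".join(
        f"<s>{escape(line.strip())}</s>" for line in lines if line.strip()
    )
    return f"<speak>\n{body}\n</speak>"


poem = [
    "Mary had a little lamb",
    "Whose fleece was white as snow",
]
print(lines_to_ssml(poem))
```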
Controlling how special types of words are spoken
<say-as>
Except for the characters option, the <say-as> tag is supported by generative, long-form,
neural, and standard TTS formats. Note that if Amazon Polly is using a neural voice and encounters
the <say-as> tag with the characters option at runtime, the affected sentence will be
synthesized using the related standard voice. However, the affected sentence will still be billed as if
it uses a neural voice.
Use the <say-as> tag with the interpret-as attribute to tell Amazon Polly how to say certain
characters, words, and numbers. This enables you to provide additional context to eliminate any
ambiguity on how Amazon Polly should render the text.
The <say-as> tag takes one attribute, interpret-as, which accepts a number of values.
Each uses the same syntax:
<say-as interpret-as="value">[text to be interpreted]</say-as>
The following values are available with interpret-as:
characters or spell-out: Spells out each letter of the text, as in a-b-c.
Note
This option is not currently supported for neural voices. If you're using a neural voice and
this SSML code is encountered by Amazon Polly at run-time, the affected sentence will
be synthesized using the related standard voice. Please note, however, that this sentence
will still be billed as if it uses a neural voice.
cardinal or number: Interprets the numerical text as a cardinal number, as in 1,234.
ordinal: Interprets the numerical text as an ordinal number, as in 1,234th.
digits: Spells out each digit individually, as in 1-2-3-4.
fraction: Interprets the numerical text as a fraction. This works for both common fractions,
such as 3/20, and mixed numbers, such as 3+1/2. See below for more information.
unit: Interprets the numerical text as a measurement. The value should be a number or
a fraction followed by a unit with no space in between, as in 1/2inch or 1meter.
date: Interprets the text as a date. The format of the date must be specified with the format
attribute. See below for more information.
time: Interprets the numerical text as duration, in minutes and seconds, as in 1'21".
address: Interprets the text as part of a street address.
expletive: "Beeps out" the content included within the tag.
telephone: Interprets the numerical text as a 7-digit or 10-digit telephone number,
as in 2025551212. You can also use this value to handle telephone extensions, as in
2025551212x345. See below for more information.
Note
Currently the telephone option is not available for all languages. However, it is
available for voices speaking English language variants (en-AU, en-GB, en-IN, en-US,
and en-GB-WLS), Spanish language variants (es-ES, es-MX, and es-US), French language
variants (fr-FR and fr-CA), and Portuguese variants (pt-BR and pt-PT), as well as German
(de-DE), Italian (it-IT), Japanese (ja-JP), and Russian (ru-RU). It should also be noted that
in some cases, languages such as Arabic (arb) automatically handle the number set as a
telephone number and so don't actually implement the telephone SSML tag.
Fractions
Amazon Polly interprets values within the say-as tag that have the interpret-as="fraction"
attribute as common fractions. The following is the syntax for fractions:
Fraction
Syntax: cardinal number/cardinal number, such as 2/9.
For example: <say-as interpret-as="fraction">2/9</say-as> is pronounced "two
ninths."
Non-negative Mixed Number
Syntax: cardinal number+cardinal number/cardinal number, such as 3+1/2.
For example, <say-as interpret-as="fraction">3+1/2</say-as> is pronounced "three
and a half."
Note
There must be a + between the "3" and the "1/2". Amazon Polly doesn't support a mixed
number without the +, such as "3 1/2".
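When the value comes from application data rather than hand-written text, the required whole+numerator/denominator form can be produced in code. The following Python sketch uses the standard library's Fraction type. fraction_say_as is a hypothetical helper, and note that Fraction reduces values to lowest terms (2/4 becomes 1/2), which may not be what you want to hear.

```python
from fractions import Fraction


def fraction_say_as(value):
    """Wrap a non-negative fraction in <say-as interpret-as="fraction">."""
    value = Fraction(value)  # reduces to lowest terms
    if value < 0:
        raise ValueError("only non-negative values are handled here")
    whole, rest = divmod(value, 1)
    if rest == 0:
        raise ValueError("value has no fractional part")
    text = f"{rest.numerator}/{rest.denominator}"
    if whole:
        # Mixed number: the + separator is required.
        text = f"{whole}+{text}"
    return f'<say-as interpret-as="fraction">{text}</say-as>'


print(fraction_say_as(Fraction(7, 2)))
# <say-as interpret-as="fraction">3+1/2</say-as>
```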
Dates
When interpret-as is set to date, you also need to indicate the format of the date.
This uses the following syntax:
<say-as interpret-as="date" format="format">[date]</say-as>
For example:
<speak>
I was born on <say-as interpret-as="date" format="mdy">12-31-1900</say-as>.
</speak>
The following formats can be used with the format attribute.
mdy: Month-day-year.
dmy: Day-month-year.
ymd: Year-month-day.
md: Month-day.
dm: Day-month.
ym: Year-month.
my: Month-year.
d: Day.
m: Month.
y: Year.
yyyymmdd: Year-month-day. If you use this format, you can make Amazon Polly skip parts of the
date using question marks.
For example, Amazon Polly renders the following as "September 22nd":
<say-as interpret-as="date">????0922</say-as>
With this format, you don't need the format attribute.
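A partial date in this form is easy to generate from optional components. The Python sketch below pads each known part and substitutes question marks for each missing part. partial_date is a hypothetical helper, and it assumes that months and days can be skipped the same way as the year in the example above.

```python
def partial_date(year=None, month=None, day=None):
    """Build a yyyymmdd date string, using ? for parts to skip."""
    y = f"{year:04d}" if year is not None else "????"
    m = f"{month:02d}" if month is not None else "??"
    d = f"{day:02d}" if day is not None else "??"
    return f'<say-as interpret-as="date">{y}{m}{d}</say-as>'


print(partial_date(month=9, day=22))
# <say-as interpret-as="date">????0922</say-as>
```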
Telephone
Amazon Polly attempts to interpret the text you provide correctly based on the text’s formatting
even without the <say-as> tag. For example, if your text includes "202-555-1212," Amazon Polly
interprets it as a 10-digit telephone number and says each digit individually, with a brief pause
for each dash. In this case, you don't need to use <say-as interpret-as="telephone">.
However, if you provide the text “2025551212” and want Amazon Polly to say it as a phone
number, you would specify <say-as interpret-as="telephone">.
The logic for interpreting each element is language-specific. For example, US and UK English differ
in how phone numbers are pronounced (in UK English, sequences of the same digit are grouped
together, as in "double five" or "triple four"). To see the difference, test the following example with
a US voice and with a UK voice:
<speak>
Richard's number is <say-as interpret-as="telephone">2122241555</say-as>
</speak>
Pronouncing acronyms and abbreviations
<sub>
This tag is supported by generative, long-form, neural, and standard TTS formats.
Use the <sub> tag with the alias attribute to substitute a different word (or pronunciation) for
selected text such as an acronym or abbreviation.
This uses the syntax:
<sub alias="new word">abbreviation</sub>
In the following example, the name "Mercury" is substituted for the element's chemical symbol to
make the audio content clearer.
<speak>
My favorite chemical element is <sub alias="Mercury">Hg</sub>, because it looks so
shiny.
</speak>
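If the same abbreviations recur throughout your content, the substitutions can be applied from a lookup table. The following Python sketch is one possible approach, not an Amazon Polly feature: apply_aliases is a hypothetical helper that wraps whole-word matches in <sub> tags and escapes the surrounding text.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text, aliases):
    """Wrap each whole-word occurrence of an alias key in a <sub> tag."""
    if not aliases:
        return escape(text)
    # Longer keys first so that overlapping keys match greedily.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        word = match.group(0)
        parts.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


sentence = "My favorite chemical element is Hg, because it looks so shiny."
print("<speak>" + apply_aliases(sentence, {"Hg": "Mercury"}) + "</speak>")
```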
Improving pronunciation by specifying parts of speech
<w>
This tag is supported by generative, long-form, neural, and standard TTS formats.
You can use the <w> tag to customize the pronunciation of words by specifying the word’s part of
speech or alternate meaning. This is done using the role attribute.
This tag uses the following syntax:
<w role="attribute">text</w>
The following values can be used for the role attribute:
To specify the part of speech:
amazon:VB: interprets the word as a verb (present simple).
amazon:VBD: interprets the word as past tense verb.
amazon:DT: interprets the word as a determiner.
amazon:IN: interprets the word as a preposition.
amazon:JJ: interprets the word as an adjective.
amazon:NN: interprets the word as a noun.
For example, depending on its part of speech, the US English pronunciation of the word "read"
varies based on the tag:
<speak>
The word <say-as interpret-as="characters">read</say-as> may be interpreted
as either the present simple form <w role="amazon:VB">read</w>, or the past
tense form <w role="amazon:VBD">read</w>.
</speak>
To specify a specific meaning:
amazon:DEFAULT: uses the default sense of the word.
amazon:SENSE_1: uses the non-default sense of the word when present. For example, the noun
"bass" is pronounced differently depending on its meaning. The default meaning is the lowest
part of the musical range. The alternate meaning is a species of freshwater fish, also called "bass"
but pronounced differently. Using <w role="amazon:SENSE_1">bass</w> renders the non-
default pronunciation (freshwater fish) for the audio text.
This difference in pronunciation and meaning can be heard if you synthesize the following:
<speak>
Depending on your meaning, the word <say-as interpret-as="characters">bass</say-as>
may be interpreted as either a musical element: bass, or as its alternative meaning,
a freshwater fish <w role="amazon:SENSE_1">bass</w>.
</speak>
Note
Some languages may have a different selection of supported parts of speech.
Adding the sound of breathing
<amazon:breath> and <amazon:auto-breaths>
This tag is supported only by the standard TTS format.
Natural-sounding speech includes both correctly spoken words and breathing sounds. By
adding breathing sounds to synthesized speech, you can make it sound more natural. The
<amazon:breath> and <amazon:auto-breaths> tags provide breaths. You have the following
options:
Manual mode: you set the location, length, and volume of a breath sound within the text
Automated mode: Amazon Polly automatically inserts breathing sounds into the speech output
Mixed mode: both you and Amazon Polly add breathing sounds
Manual Mode
In manual mode, you place the <amazon:breath/> tag in the input text where you want to locate
a breath. You can customize the length and volume of breaths with the duration and volume
attributes, respectively:
duration: Controls the length of the breath. Valid values are: default, x-short, short,
medium, long, x-long. The default value is medium.
volume: Controls how loud the breathing sounds. Valid values are: default, x-soft, soft,
medium, loud, x-loud. The default value is medium.
Note
The exact length and volume of each attribute value is dependent on the specific Amazon
Polly voice used.
To set a breath sound using the defaults, use <amazon:breath/> without attributes.
For example, to set the duration of a breath to medium and its volume to x-loud, you would
set the attributes as follows:
<speak>
Sometimes you want to insert only <amazon:breath duration="medium" volume="x-loud"/>a
single breath.
</speak>
To use the defaults, you would just use the tag:
<speak>
Sometimes you need <amazon:breath/>to insert one or more average breaths
<amazon:breath/> so that the
text sounds correct.
</speak>
You can add individual breathing sounds within a passage, as follows:
<speak>
<amazon:breath duration="long" volume="x-loud"/> <prosody rate="120%"> <prosody
volume="loud">
Wow! <amazon:breath duration="long" volume="loud"/> </prosody> That was quite
fast. <amazon:breath
duration="medium" volume="x-loud"/> I almost beat my personal best time on this
track. </prosody>
</speak>
Automated Mode
In automated mode, you use the <amazon:auto-breaths> tag to tell Amazon Polly to
automatically create breathing noises at appropriate intervals. You can set the frequency of the
intervals, their volume, and their duration. Place the opening <amazon:auto-breaths> tag at the
beginning of the text that you want to apply automated breathing to, and place the closing
</amazon:auto-breaths> tag at the end.
Note
Unlike the manual mode tag, <amazon:breath/>, the <amazon:auto-breaths> tag
requires a closing tag (</amazon:auto-breaths>).
You can use the following optional attributes with the <amazon:auto-breaths> tag:
volume: Controls how loud the breathing sounds. Valid values are: default, x-soft, soft,
medium, loud, x-loud. The default value is medium.
frequency: Controls how often breathing sounds occur in the text. Valid values are: default,
x-low, low, medium, high, x-high. The default value is medium.
duration: Controls the length of the breath. Valid values are: default, x-short, short,
medium, long, x-long. The default value is medium.
By default, the frequency of breathing sounds depends on the input text. However, breathing
sounds often occur after commas and periods.
The following examples show how to use the <amazon:auto-breaths> tag. To decide which
options to use for your content, copy the applicable examples to the Amazon Polly console and
listen to the differences.
Using automated mode without optional parameters.
<speak>
<amazon:auto-breaths>Amazon Polly is a service that turns text into lifelike
speech, allowing you to create applications that talk and build entirely new
categories of speech-enabled products. Amazon Polly is a text-to-speech service
that uses advanced deep learning technologies to synthesize speech that sounds
like a human voice. With dozens of lifelike voices across a variety of languages,
you can select the ideal voice and build speech-enabled applications that work
in many different countries.</amazon:auto-breaths>
</speak>
Using automated mode with volume control. The unspecified parameters (duration and
frequency) are set to the default values (medium).
<speak>
<amazon:auto-breaths volume="x-soft">Amazon Polly is a service that turns text
into lifelike speech, allowing you to create applications that talk and build
entirely new categories of speech-enabled products. Amazon Polly is a
text-to-speech service, that uses advanced deep learning technologies to
synthesize speech that sounds like a human voice. With dozens of lifelike voices
across a variety of languages, you can select the ideal voice and build
speech-enabled applications that work in many different countries.</amazon:auto-breaths>
</speak>
Using automated mode with frequency control. The unspecified parameters (duration and
volume) are set to the default values (medium).
<speak>
<amazon:auto-breaths frequency="x-low">Amazon Polly is a service that turns text
into lifelike speech, allowing you to create applications that talk and build
entirely new categories of
speech-enabled products. Amazon Polly is a text-to-speech service, that uses
advanced deep learning technologies to synthesize speech that sounds like a human
voice. With dozens of lifelike voices across a variety of languages, you can select
the ideal voice and build speech-enabled applications that work in many different
countries.</amazon:auto-breaths>
</speak>
Using automated mode with multiple parameters. For the unspecified duration parameter,
Amazon Polly uses the default value (medium).
<speak>
<amazon:auto-breaths volume="x-loud" frequency="x-low">Amazon Polly is a service
that turns text into lifelike speech, allowing you to create applications that
talk and build entirely new categories of speech-enabled products. Amazon Polly
is a text-to-speech service, that uses advanced deep learning technologies to
synthesize speech that sounds like a human voice. With dozens of lifelike voices
across a variety of languages, you can select the ideal voice and build
speech-enabled applications that work in many different countries.</amazon:auto-breaths>
</speak>
Newscaster speaking style
<amazon:domain name="news">
The newscaster style is available only for the Matthew and Joanna voices in American English
(en-US), the Lupe voice in US Spanish (es-US), and the Amy voice in British English (en-GB). It is
supported only when using the neural TTS format.
To use the newscaster style, use SSML tags and the following syntax:
<amazon:domain name="news">text</amazon:domain>
For example, you might use the newscaster style with the Amy voice as follows:
<speak>
<amazon:domain name="news">
From the Tuesday, April 16th, 1912 edition of The Guardian newspaper:
The maiden voyage of the White Star liner Titanic, the largest ship ever launched, has
ended in disaster.
The Titanic started her trip from Southampton for New York on Wednesday. Late on Sunday
night she struck
an iceberg off the Grand Banks of Newfoundland. By wireless telegraphy she sent out
signals of distress,
and several liners were near enough to catch and respond to the call.
</amazon:domain>
</speak>
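When the text comes from your application rather than a hand-written file, the wrapper can be generated. The following Python sketch is illustrative only: the helper name newscaster_ssml is hypothetical, and the voice set is copied from the list above rather than queried from the service.

```python
from xml.sax.saxutils import escape

# Voices that the text above lists as supporting the newscaster style.
NEWSCASTER_VOICES = {"Matthew", "Joanna", "Lupe", "Amy"}


def newscaster_ssml(text: str, voice_id: str) -> str:
    """Wrap plain text in the newscaster domain tag (hypothetical helper)."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id} is not listed as a newscaster voice")
    # escape() replaces &, <, and > so the result stays well-formed XML.
    body = escape(text)
    return f'<speak><amazon:domain name="news">{body}</amazon:domain></speak>'


print(newscaster_ssml("Markets rose & bonds fell.", "Amy"))
```

The returned string would then be sent as SSML input with a neural voice. Check the SynthesizeSpeech reference for the exact request parameters.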
Adding dynamic range compression
<amazon:effect name="drc">
This tag is supported by long-form, neural, and standard TTS formats.
Depending on the text, language, and voice used in an audio file, the sounds range from soft to
loud. Environmental sounds, such as the sound of a moving vehicle, can often mask the softer
sounds, which makes the audio track difficult to hear clearly. To enhance the volume of certain
sounds in your audio file, use the dynamic range compression (drc) tag.
The drc tag sets a midrange "loudness" threshold for your audio, and increases the volume (the
gain) of the sounds around that threshold. It applies the greatest gain increase closest to the
threshold, and the gain increase is lessened farther away from the threshold.
This makes the middle-range sounds easier to hear in a noisy environment, which makes the entire
audio file clearer.
The drc tag is a Boolean parameter (it's either present or it isn't). It uses the syntax:
<amazon:effect name="drc"> and is closed with </amazon:effect>.
You can use the drc tag with any voice or language supported by Amazon Polly. You can apply it to
an entire section of the recording, or for only a few words. For example:
<speak>
Some audio is difficult to hear in a moving vehicle, but <amazon:effect
name="drc"> this audio
is less difficult to hear in a moving vehicle.</amazon:effect>
</speak>
Note
When you use "drc" in the amazon:effect syntax, it is case-sensitive.
Using drc with the prosody volume tag
As the following graphic shows, the prosody volume tag evenly increases the volume of an entire
audio file from the original level (dotted line) to an adjusted level (solid line). To further increase
the volume of certain parts of the file, use the drc tag with the prosody volume tag. Combining
tags doesn't affect the settings of the prosody volume tag.
When you use the drc and prosody volume tags together, Amazon Polly applies the drc tag
first, increasing the middle-range sounds (those near the threshold). It then applies the prosody
volume tag and further increases the volume of the entire audio track evenly.
To use the tags together, nest one inside the other. For example:
<speak>
<prosody volume="loud">This text needs to be understandable and loud.
<amazon:effect name="drc">
This text also needs to be more understandable in a moving car.</amazon:effect>
</prosody>
</speak>
In this text, the prosody volume tag increases the volume of the entire passage to "loud." The
drc tag enhances the volume of the middle-range values in the second sentence.
Note
When using the drc and prosody volume tags together, use standard XML practices for
nesting tags.
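One way to catch incorrectly nested tags before sending a request is a local well-formedness check. The sketch below uses Python's standard expat parser. The function name is_well_formed is hypothetical, and the check covers only XML nesting, not whether Amazon Polly supports a given tag or attribute.

```python
import xml.parsers.expat


def is_well_formed(ssml: str) -> bool:
    """Return True if the SSML string is well-formed XML."""
    # Namespace processing is off by default in ParserCreate(), so prefixed
    # tags such as amazon:effect are parsed as ordinary element names.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(ssml, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# drc nested correctly inside prosody.
nested = (
    '<speak><prosody volume="loud">Loud text. '
    '<amazon:effect name="drc">Loud and compressed text.</amazon:effect>'
    "</prosody></speak>"
)
# Closing tags in the wrong order.
crossed = (
    '<speak><prosody volume="loud"><amazon:effect name="drc">text'
    "</prosody></amazon:effect></speak>"
)

print(is_well_formed(nested))
print(is_well_formed(crossed))
```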
Speaking softly
<amazon:effect phonation="soft">
This tag is currently supported only by the standard TTS format.
To specify that input text should be spoken in a softer-than-normal voice, use the <amazon:effect
phonation="soft"> tag.
This uses the syntax:
<amazon:effect phonation="soft">text</amazon:effect>
For example, you might use this tag with the Matthew voice as follows:
<speak>
This is Matthew speaking in my normal voice. <amazon:effect phonation="soft">This
is Matthew speaking in my softer voice.</amazon:effect>
</speak>
Controlling timbre
<amazon:effect vocal-tract-length>
This tag is currently supported only by the standard TTS format.
Timbre is the tonal quality of a voice that helps you tell the difference between voices, even when
they have the same pitch and loudness. One of the most important physiological features that
contributes to speech timbre is the length of the vocal tract. The vocal tract is a cavity of air that
spans from the top of the vocal folds up to the edge of the lips.
To control the timbre of output speech in Amazon Polly, use the vocal-tract-length tag. This
tag has the effect of changing the length of the speaker's vocal tract, which sounds like a change
in the speaker's size. When you increase the vocal-tract-length, the speaker sounds physically
bigger. When you decrease it, the speaker sounds smaller. You can use this tag with any of the
voices in the Amazon Polly Text-to-Speech portfolio.
To change timbre, use the following values:
+n% or -n%: Adjusts the vocal tract length by a relative percentage change in the current voice.
For example, +4% or -2%. Valid values range from +100% to -50%. Values outside this range are
clipped. For example, +111% sounds like +100% and -60% sounds like -50%.
n%: Changes the vocal tract length to an absolute percentage of the tract length of the current
voice. For example, 110% or 75%. An absolute value of 110% is equivalent to a relative value of
+10%. An absolute value of 100% is the same as the default value for the current voice.
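The relationship between relative and absolute values, and the clipping of relative values, can be worked through in a few lines of Python. This is a sketch of the arithmetic described above, not Amazon Polly's implementation. The function name is hypothetical, and because the text states a clipping range only for relative values, absolute values are returned unchanged here.

```python
def effective_tract_length(value: str) -> float:
    """Return the tract length as an absolute percentage of the current voice."""
    text = value.strip()
    if not text.endswith("%"):
        raise ValueError(f"expected a value such as '+15%' or '110%', got {value!r}")
    number = text[:-1]
    if number[:1] in ("+", "-"):
        # Relative change: clip to the documented range of -50% to +100%.
        relative = max(-50.0, min(100.0, float(number)))
        return 100.0 + relative
    # Absolute percentage: 100% is the default for the current voice.
    return float(number)


for sample in ("+111%", "-60%", "110%", "+10%"):
    print(sample, effective_tract_length(sample))
```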
The following example shows how to change the vocal tract length to change timbre:
<speak>
This is my original voice, without any modifications.
<amazon:effect vocal-tract-length="+15%">
Now, imagine that I am much bigger. </amazon:effect>
<amazon:effect vocal-tract-length="-15%">
Or, perhaps you prefer my voice when I'm very small. </amazon:effect>
You can also control the timbre of my voice by making minor adjustments.
<amazon:effect vocal-tract-length="+10%">
For example, by making me sound just a little bigger. </amazon:effect>
<amazon:effect vocal-tract-length="-10%">
Or, making me sound only somewhat smaller. </amazon:effect>
</speak>
Combining multiple tags
You can combine the vocal-tract-length tag with any other SSML tag that is supported by
Amazon Polly. Because timbre (vocal tract length) and pitch are closely connected, you might get
the best results by using both the vocal-tract-length and the <prosody pitch> tags. To
produce the most realistic voice, we recommend that you use different percentages of change for
the two tags. Experiment with various combinations to get the results you want.
The following example shows how to combine tags.
<speak>
The pitch and timbre of a person's voice are connected in human speech.
<amazon:effect vocal-tract-length="-15%"> If you are going to reduce the vocal
tract length,
</amazon:effect><amazon:effect vocal-tract-length="-15%"> <prosody pitch="+20%">
you
might consider increasing the pitch, too. </prosody></amazon:effect>
<amazon:effect vocal-tract-length="+15%"> If you choose to lengthen the vocal
tract,
</amazon:effect> <amazon:effect vocal-tract-length="+15%"> <prosody pitch="-10%">
you might also want to lower the pitch. </prosody></amazon:effect>
</speak>
Whispering
<amazon:effect name="whispered">
This tag is currently supported only by the standard TTS format.
This tag indicates that the input text should be spoken in a whispered voice rather than as normal
speech. This can be used with any of the voices in the Amazon Polly Text-to-Speech portfolio.
This uses the following syntax:
<amazon:effect name="whispered">text</amazon:effect>
For example:
<speak>
<amazon:effect name="whispered">If you make any noise, </amazon:effect>
she said, <amazon:effect name="whispered">they will hear us.</amazon:effect>
</speak>
In this case, the synthesized speech spoken by the character is whispered, but the phrase "she said"
is spoken in the normal synthesized speech of the selected Amazon Polly voice.
You can enhance the "whispered" effect by slowing down the prosody rate by up to 10%,
depending on the effect you want.
For example:
<speak>
When any voice is made to whisper, <amazon:effect name="whispered">
<prosody rate="-10%">the sound is slower and quieter than normal speech
</prosody></amazon:effect>
</speak>
When generating speech marks for a whispered voice, the audio stream must also include the
whispered voice to ensure that the speech marks match the audio stream.
Managing lexicons
Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides
API operations that you can use to store lexicons in an AWS region. Those lexicons are then
specific to that particular region. You can use one or more of the lexicons from that region when
synthesizing the text by using the SynthesizeSpeech operation. This applies the specified
lexicon to the input text before the synthesis begins. For more information, see SynthesizeSpeech.
Note
These lexicons must conform with the Pronunciation Lexicon Specification (PLS) W3C
recommendation. For more information, see Pronunciation Lexicon Specification (PLS)
Version 1.0 on the W3C website.
The following are examples of ways to use lexicons with speech synthesis engines:
Common words are sometimes stylized with numbers taking the place of letters, as with "g3t
sm4rt" (get smart). Humans can read these words correctly. However, a Text-to-Speech (TTS)
engine reads the text literally, pronouncing the name exactly as it is spelled. This is where
you can leverage lexicons to customize the synthesized speech by using Amazon Polly. In this
example, you can specify an alias (get smart) for the word "g3t sm4rt" in the lexicon.
Your text might include an acronym, such as W3C. You can use a lexicon to define an alias for the
word W3C so that it is read in the full, expanded form (World Wide Web Consortium).
Lexicons give you additional control over how Amazon Polly pronounces words uncommon to the
selected language. For example, you can specify the pronunciation using a phonetic alphabet. For
more information, see Pronunciation Lexicon Specification (PLS) Version 1.0 on the W3C website.
Topics
Applying multiple lexicons
Managing lexicons on the Amazon Polly console
Managing lexicons on the AWS CLI
Applying multiple lexicons
You can apply up to five lexicons to your text. If the same grapheme appears in more than one
lexicon that you apply to your text, the order in which they are applied can make a difference in the
resulting speech. For example, consider the text "Hello, my name is Bob." and two lexemes
in different lexicons that both use the grapheme Bob.
LexA
<lexeme>
<grapheme>Bob</grapheme>
<alias>Robert</alias>
</lexeme>
LexB
<lexeme>
<grapheme>Bob</grapheme>
<alias>Bobby</alias>
</lexeme>
If the lexicons are listed in the order LexA and then LexB, the synthesized speech is "Hello, my
name is Robert." If they are listed in the order LexB and then LexA, the synthesized speech is "Hello,
my name is Bobby."
Example – Applying LexA Before LexB
aws polly synthesize-speech \
--lexicon-names LexA LexB \
--output-format mp3 \
--text 'Hello, my name is Bob' \
--voice-id Justin \
bobAB.mp3
Speech output: "Hello, my name is Robert."
Example – Applying LexB before LexA
aws polly synthesize-speech \
--lexicon-names LexB LexA \
--output-format mp3 \
--text 'Hello, my name is Bob' \
--voice-id Justin \
bobBA.mp3
Speech output: "Hello, my name is Bobby."
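The ordering rule in these two examples can be modeled as "the first lexicon that defines a grapheme wins". The following Python sketch is a simplified model for reasoning about lexicon order, not the service's implementation. Lexicons are represented as plain dictionaries, and the function name is hypothetical.

```python
def effective_aliases(lexicons):
    """Merge lexicons in the order given; the first definition of a grapheme wins."""
    merged = {}
    for lexicon in lexicons:
        for grapheme, alias in lexicon.items():
            # setdefault keeps an earlier lexicon's alias if one already exists.
            merged.setdefault(grapheme, alias)
    return merged


LEX_A = {"Bob": "Robert"}
LEX_B = {"Bob": "Bobby"}

print(effective_aliases([LEX_A, LEX_B])["Bob"])  # LexA listed first
print(effective_aliases([LEX_B, LEX_A])["Bob"])  # LexB listed first
```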
For information about applying lexicons using the Amazon Polly console, see Applying lexicons on
the console (Synthesize Speech).
Managing lexicons on the Amazon Polly console
You can use the Amazon Polly console to upload, download, apply, filter, and delete lexicons. The
following procedures demonstrate each of these processes.
Uploading lexicons on the console
To use a pronunciation lexicon, you must first upload it. There are two locations on the console
from which you can upload a lexicon: the Text-to-Speech tab and the Lexicons tab.
The following processes describe how to add lexicons that you can use to customize how words and
phrases uncommon to the chosen language are pronounced.
To add a lexicon from the Lexicons tab
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Choose the Lexicons tab.
3. Choose Upload lexicon.
4. Provide a name for the lexicon and then use Choose a lexicon file to find the lexicon to
upload. You can only upload PLS files with .pls or .xml extensions.
5. Choose Upload lexicon. If a lexicon by the same name (whether a .pls or .xml file) already
exists, uploading the lexicon overwrites the existing lexicon.
To add a lexicon from the Text-to-Speech tab
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab.
3. Expand Additional settings, turn on Customize pronunciation, and then choose Upload
lexicon.
4. Provide a name for the lexicon and then use Choose a lexicon file to find the lexicon to
upload. You can only use PLS files with .pls or .xml extensions.
5. Choose Upload lexicon. If a lexicon with the same name (whether a .pls or .xml file) already
exists, uploading the lexicon overwrites the existing lexicon.
Applying lexicons on the console (Synthesize Speech)
The following procedure demonstrates how to apply a lexicon to your input text by applying the
W3C.pls lexicon to substitute "World Wide Web Consortium" for "W3C". If you apply multiple
lexicons to your text, they are applied in top-down order, with the first match taking precedence
over later matches. A lexicon is applied to the text only if the language specified in the lexicon is
the same as the language chosen.
You can apply a lexicon to plain text or SSML input.
Example – Applying the W3C.pls Lexicon
To create the lexicon you'll need for this exercise, see Using the PutLexicon Operation. Use a plain
text editor to create the W3C.pls lexicon shown at the top of the topic. Remember where you save
this file.
To apply the W3C.pls lexicon to your input
In this example we introduce a lexicon to substitute "World Wide Web Consortium" for "W3C".
Compare the results of this exercise with that of Using SSML on the console for both US English
and another language.
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Do one of the following:
Turn off SSML and then type or paste this text into the text input box.
He was caught up in the game.
In the middle of the 10/3/2014 W3C meeting
he shouted, "Score!" quite loudly.
Turn on SSML and then type or paste this text into the text input box.
<speak>He wasn't paying attention.<break time="1s"/>
In the middle of the 10/3/2014 W3C meeting
he shouted, "Score!" quite loudly.</speak>
3. From the Language list, choose English, US, then choose the voice you want to use for this
text.
4. Expand Additional settings and turn on Customize pronunciation.
5. From the list of lexicons, choose W3C (English, US).
If the W3C (English, US) lexicon is not listed, choose Upload lexicon and upload it, then
choose it from the list. To create this lexicon, see Using the PutLexicon Operation.
6. To listen to the speech immediately, choose Listen.
7. To save the speech to a file, do one of the following:
a. Choose Download.
b. To change to a different file format, turn on Speech file format settings, choose the file
format you want, and then choose Download.
Repeat the previous steps, but choose a different language and notice the difference in the output.
Filtering the lexicon list on the console
The following procedure describes how to filter the lexicons list so that only lexicons of a chosen
language are displayed.
To filter the lexicons listed by language
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Choose the Lexicons tab.
3. Choose Any language.
4. From the list of languages, choose the language you want to filter on.
The list displays only the lexicons for the chosen language.
Downloading lexicons on the console
The following process describes how to download one or more lexicons. You can add, remove, or
modify lexicon entries in the file and then upload it again to keep your lexicon up-to-date.
To download one or more lexicons
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Choose the Lexicons tab.
3. Choose the lexicon or lexicons you want to download.
a. To download a single lexicon, choose its name from the list.
b. To download multiple lexicons as a single compressed archive file, select the check box
next to each entry in the list that you want to download.
4. Choose Download.
5. Open the folder where you want to download the lexicon.
6. Choose Save.
Deleting a lexicon on the console
The following process describes how to delete a lexicon. After deleting the lexicon, you must add it
back before you can use it again. You can delete one or more lexicons at the same time by selecting
the check boxes next to individual lexicons.
To delete a lexicon
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Choose the Lexicons tab.
3. Choose one or more lexicons that you want to delete from the list.
4. Choose Delete.
5. Enter confirmation text and then choose Delete to remove the lexicon from the Region or
Cancel to keep it.
Managing lexicons on the AWS CLI
The following topics cover the AWS CLI commands needed to manage your pronunciation lexicons.
Topics
Using the PutLexicon Operation
Using the GetLexicon operation
Using the ListLexicons operation
Using the DeleteLexicon operation
Using the PutLexicon Operation
With Amazon Polly, you can use PutLexicon to store pronunciation lexicons in a specific AWS
Region for your account. You can then specify one or more of these stored lexicons in a
SynthesizeSpeech request, and Amazon Polly applies them before it starts synthesizing the text.
For more information, see Managing lexicons.
This section provides example lexicons and step-by-step instructions for storing and testing them.
Note
These lexicons must conform to the Pronunciation Lexicon Specification (PLS) W3C
recommendation. For more information, see Pronunciation Lexicon Specification (PLS)
Version 1.0 on the W3C website.
Example 1: Lexicon with one lexeme
Consider the following W3C PLS-compliant lexicon.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa"
xml:lang="en-US">
<lexeme>
<grapheme>W3C</grapheme>
<alias>World Wide Web Consortium</alias>
</lexeme>
</lexicon>
Note the following:
The two attributes specified in the <lexicon> element:
The xml:lang attribute specifies the language code, en-US, to which the lexicon applies.
Amazon Polly can use this example lexicon if the voice you specify in the SynthesizeSpeech
call has the same language code (en-US).
Note
You can use the DescribeVoices operation to find the language code associated
with a voice.
The alphabet attribute specifies IPA, which means that the International Phonetic
Alphabet (IPA) is used for pronunciations. IPA is one of the alphabets for writing
pronunciations. Amazon Polly also supports the Extended Speech Assessment Methods
Phonetic Alphabet (X-SAMPA).
The <lexeme> element describes the mapping between <grapheme> (that is, a textual
representation of the word) and <alias>.
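The attributes and elements noted above can be read back from a lexicon file before you upload it, which is a quick way to confirm the language code and the mappings. The following Python sketch uses only the standard library. The function name describe_lexicon is hypothetical, and the sample omits the schema attributes for brevity.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def describe_lexicon(pls_text: str) -> dict:
    """Return the language, alphabet, and lexemes found in a PLS document."""
    root = ET.fromstring(pls_text.encode("utf-8"))
    ns = {"pls": PLS_NS}
    lexemes = []
    for lexeme in root.findall("pls:lexeme", ns):
        lexemes.append({
            "grapheme": lexeme.findtext("pls:grapheme", namespaces=ns),
            "alias": lexeme.findtext("pls:alias", namespaces=ns),
            "phoneme": lexeme.findtext("pls:phoneme", namespaces=ns),
        })
    return {
        "language": root.get(f"{{{XML_NS}}}lang"),
        "alphabet": root.get("alphabet"),
        "lexemes": lexemes,
    }


print(describe_lexicon(SAMPLE))
```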
To test this lexicon, do the following:
1. Save the lexicon as example.pls.
2. Run the put-lexicon AWS CLI command to store the lexicon (with the name w3c) in the
us-east-2 Region.
aws polly put-lexicon \
--name w3c \
--content file://example.pls
3. Run the synthesize-speech command to synthesize sample text to an audio file
(speech.mp3), and specify the optional --lexicon-names parameter.
aws polly synthesize-speech \
--text 'W3C is a Consortium' \
--voice-id Joanna \
--output-format mp3 \
--lexicon-names="w3c" \
speech.mp3
4. Play the resulting speech.mp3, and notice that the word W3C in the text is replaced by World
Wide Web Consortium.
The preceding example lexicon uses an alias. The IPA alphabet mentioned in the lexicon is not used.
The following lexicon specifies a phonetic pronunciation using the <phoneme> element with the
IPA alphabet.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa"
xml:lang="en-US">
<lexeme>
<grapheme>pecan</grapheme>
<phoneme>pɪˈkɑːn</phoneme>
</lexeme>
</lexicon>
Follow the same steps to test this lexicon. Make sure that you specify input text that contains the
word "pecan" (for example, "Pecan pie is delicious").
Example 2: Lexicon with multiple lexemes
In this example, the lexeme that you specify in the lexicon applies exclusively to the input text for
the synthesis. Consider the following lexicon:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa" xml:lang="en-US">
<lexeme>
<grapheme>W3C</grapheme>
<alias>World Wide Web Consortium</alias>
</lexeme>
<lexeme>
<grapheme>W3C</grapheme>
<alias>WWW Consortium</alias>
</lexeme>
<lexeme>
<grapheme>Consortium</grapheme>
<alias>Community</alias>
</lexeme>
</lexicon>
The lexicon specifies three lexemes, two of which define an alias for the grapheme W3C as follows:
The first <lexeme> element defines an alias (World Wide Web Consortium).
The second <lexeme> defines an alternative alias (WWW Consortium).
Amazon Polly uses the first replacement for any given grapheme in a lexicon.
The third <lexeme> defines a replacement (Community) for the word Consortium.
First, let's test this lexicon. Suppose you want to synthesize the following sample text to an audio
file (speech.mp3), and you specify the lexicon in a call to SynthesizeSpeech.
The W3C is a Consortium
SynthesizeSpeech first applies the lexicon as follows:
As per the first lexeme, the word W3C is revised as World Wide Web Consortium. The revised text
appears as follows:
The World Wide Web Consortium is a Consortium
The alias defined in the third lexeme applies only to the word Consortium that was part of the
original text, resulting in the following text:
The World Wide Web Consortium is a Community.
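The two rules in this walkthrough (the first lexeme for a grapheme wins, and an alias is not itself rewritten) can be reproduced with a single-pass substitution. The following Python sketch is a simplified model of that behavior for experimentation, not Amazon Polly's implementation. The function name and the word-boundary matching are assumptions.

```python
import re

# Lexemes in document order, as (grapheme, alias) pairs.
LEXEMES = [
    ("W3C", "World Wide Web Consortium"),
    ("W3C", "WWW Consortium"),
    ("Consortium", "Community"),
]


def apply_lexemes(text, lexemes):
    """Replace graphemes in one pass; the first lexeme per grapheme wins."""
    aliases = {}
    for grapheme, alias in lexemes:
        aliases.setdefault(grapheme, alias)  # ignore later duplicates
    if not aliases:
        return text
    # Longest graphemes first so a shorter one cannot shadow a longer one.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(g) for g in ordered) + r")\b")
    # A single pass over the original text means inserted aliases are
    # never scanned again, so "Consortium" inside an alias is left alone.
    return pattern.sub(lambda match: aliases[match.group(1)], text)


print(apply_lexemes("The W3C is a Consortium", LEXEMES))
```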
You can test this using the AWS CLI as follows:
1. Save the lexicon as example.pls.
2. Run the put-lexicon command to store the lexicon with the name w3c in the us-east-2 Region.
aws polly put-lexicon \
--name w3c \
--content file://example.pls
3. Run the list-lexicons command to verify that the w3c lexicon is in the list of lexicons
returned.
aws polly list-lexicons
4. Run the synthesize-speech command to synthesize sample text to an audio file
(speech.mp3), and specify the optional --lexicon-names parameter.
aws polly synthesize-speech \
--text 'The W3C is a Consortium' \
--voice-id Joanna \
--output-format mp3 \
--lexicon-names="w3c" \
speech.mp3
5. Play the resulting speech.mp3 file to verify that the synthesized speech reflects the text
changes.
Example 3: Specifying multiple lexicons
In a call to SynthesizeSpeech, you can specify multiple lexicons. In this case, the first lexicon
specified (in order from left to right) takes precedence over any lexicons that follow it.
Consider the following two lexicons. Note that each lexicon describes different aliases for the same
grapheme W3C.
Lexicon 1: w3c.pls
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa" xml:lang="en-US">
<lexeme>
<grapheme>W3C</grapheme>
<alias>World Wide Web Consortium</alias>
</lexeme>
</lexicon>
Lexicon 2: w3cAlternate.pls
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa" xml:lang="en-US">
<lexeme>
<grapheme>W3C</grapheme>
<alias>WWW Consortium</alias>
</lexeme>
</lexicon>
Suppose you store these lexicons as w3c and w3cAlternate respectively. If you specify lexicons in
order (w3c followed by w3cAlternate) in a SynthesizeSpeech call, the alias for W3C defined in
the first lexicon has precedence over the second. To test the lexicons, do the following:
1. Save the lexicons locally in files called w3c.pls and w3cAlternate.pls.
2. Upload these lexicons using the put-lexicon AWS CLI command.
Upload the w3c.pls lexicon and store it as w3c.
aws polly put-lexicon \
--name w3c \
--content file://w3c.pls
Upload the w3cAlternate.pls lexicon and store it as w3cAlternate.
aws polly put-lexicon \
--name w3cAlternate \
--content file://w3cAlternate.pls
3. Run the synthesize-speech command to synthesize sample text to an audio file
(speech.mp3), and specify both lexicons using the --lexicon-names parameter.
aws polly synthesize-speech \
--text 'PLS is a W3C recommendation' \
--voice-id Joanna \
--output-format mp3 \
--lexicon-names '["w3c","w3cAlternate"]' \
speech.mp3
4. Play the resulting speech.mp3. It should read as follows:
PLS is a World Wide Web Consortium recommendation
Additional code samples for the PutLexicon API
Java Sample: PutLexicon
Python (Boto3) Sample: PutLexicon
Using the GetLexicon operation
Amazon Polly provides the GetLexicon API operation to retrieve the content of a pronunciation
lexicon you stored in your account in a specific region.
The following get-lexicon AWS CLI command retrieves the content of the example lexicon.
aws polly get-lexicon \
--name example
If you don't already have a lexicon stored in your account, you can use the PutLexicon operation
to store one. For more information, see Using the PutLexicon Operation.
The following is a sample response. In addition to the lexicon content, the response returns the
metadata, such as the language code to which the lexicon applies, number of lexemes defined in
the lexicon, the Amazon Resource Name (ARN) of the resource, and the size of the lexicon in bytes.
The LastModified value is a Unix timestamp.
{
"Lexicon": {
"Content": "lexicon content in plain text PLS format",
"Name": "example"
},
"LexiconAttributes": {
"LanguageCode": "en-US",
"LastModified": 1474222543.989,
"Alphabet": "ipa",
"LexemesCount": 1,
"LexiconArn": "arn:aws:polly:us-east-2:account-id:lexicon/example",
"Size": 495
}
}
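Because the AWS CLI prints LastModified as a Unix timestamp, converting it to a readable date is a common first step when inspecting this output. The following Python sketch parses a response shaped like the sample above. The function name summarize_lexicon is hypothetical, and the embedded JSON is copied from the sample rather than fetched from the service.

```python
import json
from datetime import datetime, timezone

SAMPLE_RESPONSE = """
{
  "Lexicon": {"Content": "lexicon content in plain text PLS format", "Name": "example"},
  "LexiconAttributes": {
    "LanguageCode": "en-US",
    "LastModified": 1474222543.989,
    "Alphabet": "ipa",
    "LexemesCount": 1,
    "LexiconArn": "arn:aws:polly:us-east-2:account-id:lexicon/example",
    "Size": 495
  }
}
"""


def summarize_lexicon(response_json: str) -> dict:
    """Pick out the lexicon metadata and convert LastModified to UTC."""
    response = json.loads(response_json)
    attributes = response["LexiconAttributes"]
    modified = datetime.fromtimestamp(attributes["LastModified"], tz=timezone.utc)
    return {
        "name": response["Lexicon"]["Name"],
        "language": attributes["LanguageCode"],
        "lexemes": attributes["LexemesCount"],
        "size_bytes": attributes["Size"],
        "last_modified": modified,
    }


summary = summarize_lexicon(SAMPLE_RESPONSE)
print(summary["name"], summary["last_modified"].isoformat())
```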
Additional code samples for the GetLexicon API
Java Sample: GetLexicon
Python (Boto3) Sample: GetLexicon
Using the ListLexicons operation
Amazon Polly provides the ListLexicons API operation that you can use to get the list of
pronunciation lexicons in your account in a specific AWS Region. The following AWS CLI call lists
the lexicons in your account in the us-east-2 region.
aws polly list-lexicons
The following is an example response, showing two lexicons named w3c and tomato. For each
lexicon, the response returns metadata such as the language code to which the lexicon applies, the
number of lexemes defined in the lexicon, the size in bytes, and so on. The language code describes
a language and locale to which the lexemes defined in the lexicon apply.
{
"Lexicons": [
{
"Attributes": {
"LanguageCode": "en-US",
"LastModified": 1474222543.989,
"Alphabet": "ipa",
"LexemesCount": 1,
"LexiconArn": "arn:aws:polly:aws-region:account-id:lexicon/w3c",
"Size": 495
},
"Name": "w3c"
},
{
"Attributes": {
"LanguageCode": "en-US",
"LastModified": 1473099290.858,
"Alphabet": "ipa",
"LexemesCount": 1,
"LexiconArn": "arn:aws:polly:aws-region:account-id:lexicon/tomato",
"Size": 645
},
"Name": "tomato"
}
]
}
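From Python, the same listing can be collected with the Boto3 list_lexicons call. The sketch below assumes the response carries a NextToken field when more results remain (this is not shown in the sample above, so confirm it against the API reference). StubPolly is a hypothetical stand-in that lets the example run without AWS credentials. With a real client you would pass boto3.client("polly") instead.

```python
def iter_lexicon_names(client):
    """Yield the name of every lexicon, following NextToken if present."""
    kwargs = {}
    while True:
        page = client.list_lexicons(**kwargs)
        for entry in page.get("Lexicons", []):
            yield entry["Name"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs = {"NextToken": token}


class StubPolly:
    """Hypothetical offline stand-in for a Polly client (illustration only)."""

    def __init__(self, pages):
        self._pages = pages

    def list_lexicons(self, **kwargs):
        # The stub uses the page index as its token.
        index = int(kwargs.get("NextToken", 0))
        page = {"Lexicons": self._pages[index]}
        if index + 1 < len(self._pages):
            page["NextToken"] = str(index + 1)
        return page


stub = StubPolly([[{"Name": "w3c"}], [{"Name": "tomato"}]])
print(list(iter_lexicon_names(stub)))
```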
Additional code samples for the ListLexicons API
Java Sample: ListLexicons
Python (Boto3) Sample: ListLexicons
Using the DeleteLexicon operation
Amazon Polly provides the DeleteLexicon API operation to delete a pronunciation lexicon from a
specific AWS Region in your account. The following AWS CLI command deletes the specified lexicon.
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full
quotation marks (") around the input text with single quotes (') for interior tags.
aws polly delete-lexicon \
--name example
Additional code samples for the DeleteLexicon API
Java Sample: DeleteLexicon
Python (Boto3) Sample: DeleteLexicon
Creating long audio files
To create TTS files for large passages of text, use Amazon Polly's asynchronous synthesis
functionality. This uses the three SpeechSynthesisTask APIs:
StartSpeechSynthesisTask: starts a new synthesis task.
GetSpeechSynthesisTask: returns details about a previously submitted synthesis task.
ListSpeechSynthesisTasks: lists all submitted synthesis tasks.
The SynthesizeSpeech operation produces audio in near-real time, with relatively little latency
in most cases. To achieve this, the operation can synthesize only 3000 characters per request.
Amazon Polly's Asynchronous Synthesis feature overcomes the challenge of processing a larger
text document by changing the way the document is both synthesized and returned. When you
submit input text using the StartSpeechSynthesisTask operation, Amazon Polly queues the
request and then processes it asynchronously in the background as soon as system resources
are available. Amazon Polly then uploads the resulting speech
or speech marks stream directly to your (required) Amazon Simple Storage Service (Amazon S3)
bucket, and notifies you about the completed file's availability through your (optional) SNS topic.
In this way, all of the functionality except near-real time processing is available for texts of up to
100,000 billable characters (or 200,000 total characters) in length.
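As a rough illustration of the limits described above, the following Python sketch picks an operation based on the length of the text. It assumes plain text, so that every character is treated as billable, and approximates the count with len(). The function name choose_operation and the two constants are hypothetical and are not part of any AWS SDK.

```python
# Limits quoted in this guide. The names below are illustrative only.
SYNC_CHAR_LIMIT = 3000           # real-time SynthesizeSpeech
ASYNC_BILLABLE_LIMIT = 100000    # asynchronous synthesis, billable characters


def choose_operation(text):
    """Return the name of the Polly operation suited to plain text of this length."""
    # Approximation: assumes plain text with no SSML tags, so len() ~ billable count
    length = len(text)
    if length <= SYNC_CHAR_LIMIT:
        return "SynthesizeSpeech"
    if length <= ASYNC_BILLABLE_LIMIT:
        return "StartSpeechSynthesisTask"
    raise ValueError("Text exceeds the asynchronous limit; split it into several tasks")
```

For SSML input the billable count differs from the total length, so a real application would need a more careful count than len().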
To synthesize a document using this method, you must have a writable Amazon S3 bucket in which
the audio file can be saved. You can be notified when the synthesized audio is ready by
providing an optional SNS topic identifier. When the synthesis task is complete, Amazon Polly
publishes a message on that topic. This message may also contain useful error information in cases
where the synthesis task didn't succeed. To receive these notifications, make sure that the user
creating the synthesis task can also publish to the SNS topic. See the Amazon SNS documentation
for more information on how to create and subscribe to an SNS topic.
Encryption
You can store the output file in an encrypted form in your S3 bucket if desired. To do this,
enable Amazon S3 bucket encryption, which uses one of the strongest block ciphers available,
256-bit Advanced Encryption Standard (AES-256).
Topics
Setting up the IAM policy for asynchronous synthesis
Creating long audio files on the console
Creating long audio files on the AWS CLI
Setting up the IAM policy for asynchronous synthesis
In order to use the asynchronous synthesis functionality, you will need an IAM policy that allows
the following:
use of the Amazon Polly asynchronous synthesis operations
writing to the output S3 bucket
publishing to the status SNS topic [optional]
The following policy grants only the permissions required for asynchronous synthesis
and can be attached to an IAM user.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "polly:StartSpeechSynthesisTask",
                "polly:GetSpeechSynthesisTask",
                "polly:ListSpeechSynthesisTasks"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::bucket-name/*"
        },
        {
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:region:account:topic"
        }
    ]
}
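If you generate policies programmatically, the same document can be assembled in code. The following Python sketch builds the policy shown above for a given bucket and includes the sns:Publish statement only when a topic ARN is supplied. The function name build_async_synthesis_policy is hypothetical, and the bucket name in the example call is a placeholder.

```python
import json


def build_async_synthesis_policy(bucket_name, sns_topic_arn=None):
    """Return the asynchronous synthesis policy as a dictionary."""
    statements = [
        {
            "Effect": "Allow",
            "Action": [
                "polly:StartSpeechSynthesisTask",
                "polly:GetSpeechSynthesisTask",
                "polly:ListSpeechSynthesisTasks",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::{0}/*".format(bucket_name),
        },
    ]
    # The SNS statement is optional, matching the optional status topic
    if sns_topic_arn is not None:
        statements.append({
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": sns_topic_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Print the policy for a placeholder bucket without an SNS topic
print(json.dumps(build_async_synthesis_policy("bucket-name"), indent=4))
```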
Creating long audio files on the console
You can use the Amazon Polly console to create long speeches using asynchronous synthesis with
the same functionality as you can use with the AWS CLI. This is done using the Text-to-Speech tab
much like any other synthesis.
The other asynchronous synthesis functionality is also available via the console. The S3 synthesis
tasks tab reflects the ListSpeechSynthesisTasks functionality, displaying all tasks saved to
the S3 bucket and enabling you to filter them if you want. Clicking on a specific single task shows
its details, reflecting GetSpeechSynthesisTask functionality.
To synthesize a large text using the Amazon Polly console
1. Sign in to the AWS Management Console and open the Amazon Polly console at
https://console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab. Select Long Form as the engine if appropriate.
3. With SSML on or off, type or paste your text into the input box.
4. Choose the language, region, and voice for your text.
5. Choose Save to S3.
Note
Both the Download and Listen options are greyed out if the text length is above the
3,000 character limit for the real-time SynthesizeSpeech operation.
6. The console opens a form so that you can choose where to store the output file.
a. Fill in the name of the destination Amazon S3 bucket.
b. Optionally, fill in the prefix key of the output.
Note
The output S3 bucket must be writable.
c. If you want to be notified when the synthesis task is complete, provide an optional SNS
topic identifier.
Note
The SNS topic must be open for publication by the current console user to use this
option. For more information, see Amazon Simple Notification Service (SNS).
d. Choose Save to S3.
To retrieve information on your speech synthesis tasks
1. In the console, choose the S3 Synthesis Tasks tab.
2. The tasks are displayed in date order. To filter the tasks by status, choose All statuses and
then choose the status to use.
3. To view the details of a specific task, choose the linked Task ID.
Creating long audio files on the AWS CLI
Amazon Polly asynchronous synthesis functionality uses three SpeechSynthesisTask APIs to
work with large amounts of text:
StartSpeechSynthesisTask: starts a new synthesis task.
GetSpeechSynthesisTask: returns details about a previously submitted synthesis task.
ListSpeechSynthesisTasks: lists all submitted synthesis tasks.
Synthesizing large amounts of text (StartSpeechSynthesisTask)
When you want to create an audio file larger than one that you can create with the real-time
SynthesizeSpeech operation, use the StartSpeechSynthesisTask operation. In addition to the
arguments needed for the SynthesizeSpeech operation, StartSpeechSynthesisTask also
requires the name of an Amazon S3 bucket. Two optional arguments are also available: a key
prefix for the output file and the ARN of an SNS topic if you want to receive status notifications
about the task.
OutputS3BucketName: The name of the Amazon S3 bucket where the synthesis should be
uploaded. This bucket should be in the same region as the Amazon Polly service. Additionally,
the IAM user being used to make the call should have access to the bucket. [Required]
OutputS3KeyPrefix: Key prefix for the output file. Use this parameter if you want to save the
output speech file in a custom directory-like key in your bucket. [Optional]
SnsTopicArn: The SNS topic ARN to use if you want to receive notification about status of the
task. This SNS topic should be in the same region as the Amazon Polly service. Additionally, the
IAM user being used to make the call should have access to the topic. [Optional]
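To make the required and optional arguments concrete, the following Python sketch assembles the keyword arguments for the start_speech_synthesis_task call in the AWS SDK for Python (Boto3). The helper name build_task_request is hypothetical. The dictionary keys are the parameter names listed above, plus the Text, VoiceId, and OutputFormat arguments shared with SynthesizeSpeech.

```python
def build_task_request(text, voice_id, bucket_name, key_prefix=None,
                       sns_topic_arn=None, output_format="mp3"):
    """Assemble keyword arguments for start_speech_synthesis_task."""
    if not bucket_name:
        raise ValueError("OutputS3BucketName is required")
    request = {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "OutputS3BucketName": bucket_name,
    }
    # Optional arguments are sent only when the caller supplies them
    if key_prefix is not None:
        request["OutputS3KeyPrefix"] = key_prefix
    if sns_topic_arn is not None:
        request["SnsTopicArn"] = sns_topic_arn
    return request
```

Given a Boto3 Polly client named polly, the result can be passed as polly.start_speech_synthesis_task(**build_task_request(...)).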
For example, the following command runs the start-speech-synthesis-task
AWS CLI command in the US East (Ohio) Region:
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace
the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full
quotation marks (") around the input text with single quotes (') for interior tags.
aws polly start-speech-synthesis-task \
    --region us-east-2 \
    --endpoint-url "https://polly.us-east-2.amazonaws.com/" \
    --output-format mp3 \
    --output-s3-bucket-name your-bucket-name \
    --output-s3-key-prefix optional/prefix/path/file \
    --voice-id Joanna \
    --text file://text_file.txt
This will result in a response that looks similar to this:
{
    "SynthesisTask": {
        "OutputFormat": "mp3",
        "OutputUri": "https://s3.us-east-2.amazonaws.com/your-bucket-name/optional/prefix/path/file.<task_id>.mp3",
        "TextType": "text",
        "CreationTime": [..],
        "RequestCharacters": [..],
        "TaskStatus": "scheduled",
        "TaskId": [task_id],
        "VoiceId": "Joanna"
    }
}
The start-speech-synthesis-task operation returns several new fields:
OutputUri: the location of your output speech file.
TaskId: a unique identifier for the speech synthesis task generated by Amazon Polly.
CreationTime: a timestamp for when the task was initially submitted.
RequestCharacters: the number of billable characters in the task.
TaskStatus: provides information on the status of the submitted task.
When your task is submitted, the initial status will show scheduled. When Amazon Polly
starts processing the task, the status will change to inProgress and later, to completed
or failed. If the task fails, an error message will be returned when calling either the
GetSpeechSynthesisTask or ListSpeechSynthesisTasks operation.
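The status progression described above can be followed with a simple polling loop. The following Python sketch assumes a Boto3 Polly client is passed in as client. The function name wait_for_task and its timing parameters are hypothetical. It reads the TaskStatus field and, on failure, the TaskStatusReason field of the SynthesisTask object returned by get_speech_synthesis_task.

```python
import time


def wait_for_task(client, task_id, poll_seconds=5, timeout_seconds=300,
                  sleep=time.sleep):
    """Poll a synthesis task until it is completed, failed, or timed out."""
    waited = 0
    while True:
        task = client.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
        status = task["TaskStatus"]
        if status == "completed":
            return task
        if status == "failed":
            # Surface the reason reported by the service, if any
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if waited >= timeout_seconds:
            raise TimeoutError(
                "task {0} is still {1} after {2} seconds".format(task_id, status, waited))
        # Status is scheduled or inProgress: wait before asking again
        sleep(poll_seconds)
        waited += poll_seconds
```

Polling is the simplest option for scripts. For long-running workloads, the optional SNS topic avoids repeated calls.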
When the task is completed, the speech file is available at the location specified in OutputUri.
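To download the finished file with an Amazon S3 client, you typically need the bucket name and object key rather than the full URL. The following Python sketch splits an OutputUri into those two parts. It assumes the path-style URL form shown in the sample response above (https://s3.region.amazonaws.com/bucket/key), and the function name split_output_uri is hypothetical.

```python
from urllib.parse import urlparse


def split_output_uri(output_uri):
    """Split a path-style S3 URL into a (bucket, key) tuple."""
    # The path looks like /bucket-name/optional/prefix/file.<task_id>.mp3
    path = urlparse(output_uri).path.lstrip("/")
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError("Not a path-style S3 URL: {0}".format(output_uri))
    return bucket, key
```

The resulting pair can then be passed to an S3 client call such as download_file(bucket, key, local_filename).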
Retrieving information on your speech synthesis task
You can get information about a task, such as its status and any errors, using the
GetSpeechSynthesisTask operation. To do this, you need the task ID returned by the
StartSpeechSynthesisTask operation.
For example, the following command runs the get-speech-synthesis-task AWS
CLI command:
aws polly get-speech-synthesis-task \
    --region us-east-2 \
    --endpoint-url "https://polly.us-east-2.amazonaws.com/" \
    --task-id task identifier
You can also list all speech synthesis tasks that you've run in the current region using the
ListSpeechSynthesisTasks operation.
For example, the following command runs the list-speech-synthesis-tasks
AWS CLI command:
aws polly list-speech-synthesis-tasks \
    --region us-east-2 \
    --endpoint-url "https://polly.us-east-2.amazonaws.com/"
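The same listing can be done from code. The following Python sketch collects all tasks by following the NextToken value returned by list_speech_synthesis_tasks, optionally filtered by status. It assumes a Boto3 Polly client is passed in as client, and the function name list_all_tasks is hypothetical.

```python
def list_all_tasks(client, status=None):
    """Return every synthesis task in the client's Region, following pagination."""
    kwargs = {}
    if status is not None:
        # For example: scheduled, inProgress, completed, or failed
        kwargs["Status"] = status
    tasks = []
    while True:
        response = client.list_speech_synthesis_tasks(**kwargs)
        tasks.extend(response.get("SynthesisTasks", []))
        token = response.get("NextToken")
        if not token:
            return tasks
        # Ask for the next page on the following iteration
        kwargs["NextToken"] = token
```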
Code and application examples
This section provides code samples and example applications that you can use to explore Amazon
Polly.
Topics
Sample code
Example applications
The Sample Code topic contains snippets of code organized by programming language and
separated into examples for different Amazon Polly functionality. The Example Application topic
contains applications organized by programming language that can be used independently to
explore Amazon Polly.
Before you start using these examples, we recommend that you first read How Amazon Polly works
and follow the steps described in Getting started with Amazon Polly.
Sample code
This topic contains code samples for various functionality which can be used to explore Amazon
Polly.
Sample Code by Programming Language
Java samples
Python samples
Java samples
The following code samples show how to use Java-based applications to accomplish various
tasks with Amazon Polly. These samples are not full examples, but can be included in larger Java
applications that use the AWS SDK for Java.
Code Snippets
DeleteLexicon
DescribeVoices
GetLexicon
ListLexicons
PutLexicon
StartSpeechSynthesisTask
Speech Marks
SynthesizeSpeech
DeleteLexicon
The following Java code sample shows how to use Java-based applications to delete a specific
lexicon stored in an AWS Region. A lexicon that has been deleted is not available for speech
synthesis, nor can it be retrieved using either the GetLexicon or ListLexicons APIs.
For more information on this operation, see the reference for the DeleteLexicon API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.DeleteLexiconRequest;

public class DeleteLexiconSample {
    private String LEXICON_NAME = "SampleLexicon";

    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void deleteLexicon() {
        DeleteLexiconRequest deleteLexiconRequest =
                new DeleteLexiconRequest().withName(LEXICON_NAME);

        try {
            client.deleteLexicon(deleteLexiconRequest);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}
DescribeVoices
The following Java code sample shows how to use Java-based applications to produce a list of the
voices that are available for use when requesting speech synthesis. You can optionally specify
a language code to filter the available voices. For example, if you specify en-US, the operation
returns a list of all available US English voices.
For more information on this operation, see the reference for the DescribeVoices API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.DescribeVoicesRequest;
import com.amazonaws.services.polly.model.DescribeVoicesResult;

public class DescribeVoicesSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void describeVoices() {
        DescribeVoicesRequest allVoicesRequest = new DescribeVoicesRequest();
        DescribeVoicesRequest enUsVoicesRequest =
                new DescribeVoicesRequest().withLanguageCode("en-US");

        try {
            String nextToken;
            do {
                DescribeVoicesResult allVoicesResult =
                        client.describeVoices(allVoicesRequest);
                nextToken = allVoicesResult.getNextToken();
                allVoicesRequest.setNextToken(nextToken);

                System.out.println("All voices: " + allVoicesResult.getVoices());
            } while (nextToken != null);

            do {
                DescribeVoicesResult enUsVoicesResult =
                        client.describeVoices(enUsVoicesRequest);
                nextToken = enUsVoicesResult.getNextToken();
                enUsVoicesRequest.setNextToken(nextToken);

                System.out.println("en-US voices: " + enUsVoicesResult.getVoices());
            } while (nextToken != null);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}
GetLexicon
The following Java code sample shows how to use Java-based applications to produce the content
of a specific pronunciation lexicon stored in an AWS Region.
For more information on this operation, see the reference for the GetLexicon API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.GetLexiconRequest;
import com.amazonaws.services.polly.model.GetLexiconResult;

public class GetLexiconSample {
    private String LEXICON_NAME = "SampleLexicon";

    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void getLexicon() {
        GetLexiconRequest getLexiconRequest =
                new GetLexiconRequest().withName(LEXICON_NAME);

        try {
            GetLexiconResult getLexiconResult = client.getLexicon(getLexiconRequest);
            System.out.println("Lexicon: " + getLexiconResult.getLexicon());
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}
ListLexicons
The following Java code sample shows how to use Java-based applications to produce a list of
pronunciation lexicons stored in an AWS Region.
For more information on this operation, see the reference for the ListLexicons API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.LexiconAttributes;
import com.amazonaws.services.polly.model.LexiconDescription;
import com.amazonaws.services.polly.model.ListLexiconsRequest;
import com.amazonaws.services.polly.model.ListLexiconsResult;

public class ListLexiconsSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void listLexicons() {
        ListLexiconsRequest listLexiconsRequest = new ListLexiconsRequest();

        try {
            String nextToken;
            do {
                ListLexiconsResult listLexiconsResult =
                        client.listLexicons(listLexiconsRequest);
                nextToken = listLexiconsResult.getNextToken();
                listLexiconsRequest.setNextToken(nextToken);

                for (LexiconDescription lexiconDescription :
                        listLexiconsResult.getLexicons()) {
                    LexiconAttributes attributes = lexiconDescription.getAttributes();
                    System.out.println("Name: " + lexiconDescription.getName()
                            + ", Alphabet: " + attributes.getAlphabet()
                            + ", LanguageCode: " + attributes.getLanguageCode()
                            + ", LastModified: " + attributes.getLastModified()
                            + ", LexemesCount: " + attributes.getLexemesCount()
                            + ", LexiconArn: " + attributes.getLexiconArn()
                            + ", Size: " + attributes.getSize());
                }
            } while (nextToken != null);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}
PutLexicon
The following Java code sample shows how to use Java-based applications to store a pronunciation
lexicon in an AWS Region.
For more information on this operation, see the reference for the PutLexicon API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.PutLexiconRequest;

public class PutLexiconSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    // The lexicon is a PLS document; the string is split only to keep lines short
    private String LEXICON_CONTENT = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<lexicon version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2005/01/pronunciation-lexicon\" " +
            "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
            "xsi:schemaLocation=\"http://www.w3.org/2005/01/pronunciation-lexicon " +
            "http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd\" " +
            "alphabet=\"ipa\" xml:lang=\"en-US\">" +
            "<lexeme><grapheme>test1</grapheme><alias>test2</alias></lexeme>" +
            "</lexicon>";
    private String LEXICON_NAME = "SampleLexicon";

    public void putLexicon() {
        PutLexiconRequest putLexiconRequest = new PutLexiconRequest()
                .withContent(LEXICON_CONTENT)
                .withName(LEXICON_NAME);

        try {
            client.putLexicon(putLexiconRequest);
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}
StartSpeechSynthesisTask
The following Java code sample shows how to use Java-based applications to synthesize a long
speech (up to 100,000 billed characters) and store it directly in an Amazon S3 bucket.
For more information, see the reference for the StartSpeechSynthesisTask API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.*;
// Polling uses the Awaitility library; org.awaitility.Duration is provided by Awaitility 3.x
import org.awaitility.Duration;

import java.util.concurrent.TimeUnit;

import static org.awaitility.Awaitility.await;

public class StartSpeechSynthesisTaskSample {
    private static final int SYNTHESIS_TASK_TIMEOUT_SECONDS = 300;
    private static final AmazonPolly AMAZON_POLLY_CLIENT =
            AmazonPollyClientBuilder.defaultClient();
    private static final String PLAIN_TEXT = "This is a sample text to be synthesized.";
    private static final String OUTPUT_FORMAT_MP3 = OutputFormat.Mp3.toString();
    private static final String OUTPUT_BUCKET = "synth-books-buckets";
    private static final String SNS_TOPIC_ARN =
            "arn:aws:sns:eu-west-2:123456789012:synthesize-finish-topic";
    private static final Duration SYNTHESIS_TASK_POLL_INTERVAL = Duration.FIVE_SECONDS;
    private static final Duration SYNTHESIS_TASK_POLL_DELAY = Duration.TEN_SECONDS;

    public static void main(String... args) {
        StartSpeechSynthesisTaskRequest request = new StartSpeechSynthesisTaskRequest()
                .withOutputFormat(OUTPUT_FORMAT_MP3)
                .withText(PLAIN_TEXT)
                .withTextType(TextType.Text)
                .withVoiceId(VoiceId.Amy)
                .withOutputS3BucketName(OUTPUT_BUCKET)
                .withSnsTopicArn(SNS_TOPIC_ARN)
                .withEngine("neural");

        StartSpeechSynthesisTaskResult result =
                AMAZON_POLLY_CLIENT.startSpeechSynthesisTask(request);
        String taskId = result.getSynthesisTask().getTaskId();

        // Wait until the task reports the completed status or the timeout expires
        await().with()
                .pollInterval(SYNTHESIS_TASK_POLL_INTERVAL)
                .pollDelay(SYNTHESIS_TASK_POLL_DELAY)
                .atMost(SYNTHESIS_TASK_TIMEOUT_SECONDS, TimeUnit.SECONDS)
                .until(
                        () -> getSynthesisTaskStatus(taskId)
                                .equals(TaskStatus.Completed.toString())
                );
    }

    private static SynthesisTask getSynthesisTask(String taskId) {
        GetSpeechSynthesisTaskRequest getSpeechSynthesisTaskRequest =
                new GetSpeechSynthesisTaskRequest().withTaskId(taskId);
        GetSpeechSynthesisTaskResult result =
                AMAZON_POLLY_CLIENT.getSpeechSynthesisTask(getSpeechSynthesisTaskRequest);
        return result.getSynthesisTask();
    }

    private static String getSynthesisTaskStatus(String taskId) {
        return getSynthesisTask(taskId).getTaskStatus();
    }
}
Speech Marks
The following code sample shows how to use Java-based applications to synthesize speech marks
for input text. This functionality uses the SynthesizeSpeech API.
For more information on this functionality, see Speech marks.
For more information on the API, see the reference for SynthesizeSpeech API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SpeechMarkType;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.VoiceId;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;

public class SynthesizeSpeechMarksSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void synthesizeSpeechMarks() {
        String outputFileName = "/tmp/speechMarks.json";

        SynthesizeSpeechRequest synthesizeSpeechRequest = new SynthesizeSpeechRequest()
                .withOutputFormat(OutputFormat.Json)
                .withSpeechMarkTypes(SpeechMarkType.Viseme, SpeechMarkType.Word)
                .withVoiceId(VoiceId.Joanna)
                .withText("This is a sample text to be synthesized.");

        try (FileOutputStream outputStream =
                     new FileOutputStream(new File(outputFileName))) {
            SynthesizeSpeechResult synthesizeSpeechResult =
                    client.synthesizeSpeech(synthesizeSpeechRequest);
            byte[] buffer = new byte[2 * 1024];
            int readBytes;

            try (InputStream in = synthesizeSpeechResult.getAudioStream()) {
                while ((readBytes = in.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, readBytes);
                }
            }
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}
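The file written by the sample above contains one JSON object per line rather than a single JSON document, so it can't be parsed in one call. The following Python sketch shows one way to read such output, assuming the line-delimited layout described in the Speech marks topic. The function name parse_speech_marks and the sample bytes are illustrative only.

```python
import json


def parse_speech_marks(raw_bytes):
    """Parse line-delimited speech mark output into a list of dictionaries."""
    marks = []
    for line in raw_bytes.decode("utf-8").splitlines():
        line = line.strip()
        if line:
            # Each non-empty line is an independent JSON object
            marks.append(json.loads(line))
    return marks


# Hypothetical two-line sample: one word mark and one viseme mark
sample = (b'{"time":6,"type":"word","start":0,"end":4,"value":"This"}\n'
          b'{"time":6,"type":"viseme","value":"T"}\n')
for mark in parse_speech_marks(sample):
    print(mark["time"], mark["type"], mark["value"])
```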
SynthesizeSpeech
The following Java code sample shows how to use Java-based applications to synthesize speech
with shorter texts for near-real time processing.
For more information, see the reference for the SynthesizeSpeech API.
package com.amazonaws.polly.samples;

import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.VoiceId;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;

public class SynthesizeSpeechSample {
    AmazonPolly client = AmazonPollyClientBuilder.defaultClient();

    public void synthesizeSpeech() {
        String outputFileName = "/tmp/speech.mp3";

        SynthesizeSpeechRequest synthesizeSpeechRequest = new SynthesizeSpeechRequest()
                .withOutputFormat(OutputFormat.Mp3)
                .withVoiceId(VoiceId.Joanna)
                .withText("This is a sample text to be synthesized.")
                .withEngine("neural");

        try (FileOutputStream outputStream =
                     new FileOutputStream(new File(outputFileName))) {
            SynthesizeSpeechResult synthesizeSpeechResult =
                    client.synthesizeSpeech(synthesizeSpeechRequest);
            byte[] buffer = new byte[2 * 1024];
            int readBytes;

            try (InputStream in = synthesizeSpeechResult.getAudioStream()) {
                while ((readBytes = in.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, readBytes);
                }
            }
        } catch (Exception e) {
            System.err.println("Exception caught: " + e);
        }
    }
}
Python samples
The following code samples show how to use Python (boto3)-based applications to accomplish
various tasks with Amazon Polly. These samples are not intended to be full examples, but can be
included in larger Python applications that use the AWS SDK for Python (Boto).
Code Snippets
DeleteLexicon
GetLexicon
ListLexicons
PutLexicon
StartSpeechSynthesisTask
SynthesizeSpeech
DeleteLexicon
The following Python code example uses the AWS SDK for Python (Boto) to delete a lexicon in the
region specified in your local AWS configuration. The example deletes only the specified lexicon. It
asks you to confirm that you want to proceed before actually deleting the lexicon.
The following code example uses default credentials stored in the AWS SDK configuration file. For
information about creating the configuration file, see Step 2.1: Set up the AWS CLI.
For more information on this operation, see the reference for the DeleteLexicon API.
from argparse import ArgumentParser
from sys import version_info

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Define and parse the command line arguments
cli = ArgumentParser(description="DeleteLexicon example")
cli.add_argument("name", type=str, metavar="LEXICON_NAME")
arguments = cli.parse_args()

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

# Request confirmation
prompt = input if version_info >= (3, 0) else raw_input
proceed = prompt((u"This will delete the \"{0}\" lexicon,"
                  " do you want to proceed? [y,n]: ").format(arguments.name))

if proceed in ("y", "Y"):
    print(u"Deleting {0}...".format(arguments.name))

    try:
        # Request deletion of a lexicon by name
        response = polly.delete_lexicon(Name=arguments.name)
    except (BotoCoreError, ClientError) as error:
        # The service returned an error, exit gracefully
        cli.error(error)

    print("Done.")
else:
    print("Cancelled.")
GetLexicon
The following Python code uses the AWS SDK for Python (Boto) to retrieve a lexicon stored in an
AWS Region. The example accepts a lexicon name as a command line parameter, fetches that
lexicon only, and prints the temporary path where it has been saved locally.
The following code example uses default credentials stored in the AWS SDK configuration file. For
information about creating the configuration file, see Step 2.1: Set up the AWS CLI.
For more information on this operation, see the reference for the GetLexicon API.
from argparse import ArgumentParser
from os import path
from tempfile import gettempdir

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Define and parse the command line arguments
cli = ArgumentParser(description="GetLexicon example")
cli.add_argument("name", type=str, metavar="LEXICON_NAME")
arguments = cli.parse_args()

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

print(u"Fetching {0}...".format(arguments.name))

try:
    # Fetch lexicon by name
    response = polly.get_lexicon(Name=arguments.name)
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    cli.error(error)

# Get the lexicon data from the response
lexicon = response.get("Lexicon", {})

# Access the lexicon's content
if "Content" in lexicon:
    output = path.join(gettempdir(), u"%s.pls" % arguments.name)
    print(u"Saving to %s..." % output)

    try:
        # Save the lexicon contents to a local file
        with open(output, "w") as pls_file:
            pls_file.write(lexicon["Content"])
    except IOError as error:
        # Could not write to file, exit gracefully
        cli.error(error)
else:
    # The response didn't contain lexicon data, exit gracefully
    cli.error("Could not fetch lexicons contents")

print("Done.")
ListLexicons
The following Python code example uses the AWS SDK for Python (Boto) to list the lexicons in your
account in the region specified in your local AWS configuration. For information about creating the
configuration file, see Step 2.1: Set up the AWS CLI.
For more information on this operation, see the reference for the ListLexicons API.
import sys

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request the list of available lexicons
    response = polly.list_lexicons()
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Get the list of lexicons in the response
lexicons = response.get("Lexicons", [])
print("{0} lexicon(s) found".format(len(lexicons)))

# Output a formatted list of lexicons with some of the attributes
for lexicon in lexicons:
    print((u" - {Name} ({Attributes[LanguageCode]}), "
           "{Attributes[LexemesCount]} lexeme(s)").format(**lexicon))
PutLexicon
The following code sample shows how to use Python (boto3)-based applications to store a
pronunciation lexicon in an AWS Region.
For more information on this operation, see the reference for the PutLexicon API.
Note the following:
You need to update the code by providing a local lexicon file name and a stored lexicon name.
The example assumes you have lexicon files created in a subdirectory called pls. You need to
update the path as appropriate.
The following code example uses default credentials stored in the AWS SDK configuration file. For
information about creating the configuration file, see Step 2.1: Set up the AWS CLI.
from argparse import ArgumentParser

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

# Define and parse the command line arguments
cli = ArgumentParser(description="PutLexicon example")
cli.add_argument("path", type=str, metavar="FILE_PATH")
cli.add_argument("-n", "--name", type=str, required=True,
                 metavar="LEXICON_NAME", dest="name")
arguments = cli.parse_args()

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")

# Open the PLS lexicon file for reading
try:
    with open(arguments.path, "r") as lexicon_file:
        # Read the pls file contents
        lexicon_data = lexicon_file.read()

        # Store the PLS lexicon on the service.
        # If a lexicon with that name already exists,
        # its contents will be updated
        response = polly.put_lexicon(Name=arguments.name,
                                     Content=lexicon_data)
except (IOError, BotoCoreError, ClientError) as error:
    # Could not open/read the file or the service returned an error,
    # exit gracefully
    cli.error(error)

print(u"The \"{0}\" lexicon is now available for use.".format(arguments.name))
StartSpeechSynthesisTask
The following Python code example uses the AWS SDK for Python (Boto) to start an asynchronous
speech synthesis task that stores its output in an Amazon S3 bucket, and then retrieves the
status of that task.
For more information, see the reference for StartSpeechSynthesisTask API.
import boto3
import time

polly_client = boto3.Session(
    aws_access_key_id='',
    aws_secret_access_key='',
    region_name='eu-west-2').client('polly')

response = polly_client.start_speech_synthesis_task(VoiceId='Joanna',
                                                    OutputS3BucketName='synth-books-buckets',
                                                    OutputS3KeyPrefix='key',
                                                    OutputFormat='mp3',
                                                    Text='This is a sample text to be synthesized.',
                                                    Engine='neural')

taskId = response['SynthesisTask']['TaskId']
print("Task id is {} ".format(taskId))

task_status = polly_client.get_speech_synthesis_task(TaskId=taskId)
print(task_status)
SynthesizeSpeech
The following Python code example uses the AWS SDK for Python (Boto) to synthesize speech
with shorter texts for near real-time processing. For more information, see the reference for the
SynthesizeSpeech operation.
This example uses a short string of plain text. You can use SSML text for more control over the
output. For more information, see Generating speech from SSML documents.
import boto3

# Replace the empty strings with your credentials, or omit both arguments
# to use the default credential provider chain
polly_client = boto3.Session(
    aws_access_key_id='',
    aws_secret_access_key='',
    region_name='us-west-2').client('polly')

response = polly_client.synthesize_speech(VoiceId='Joanna',
                                          OutputFormat='mp3',
                                          Text='This is a sample text to be synthesized.',
                                          Engine='neural')

# Write the returned audio stream to a local MP3 file
with open('speech.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())
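If you want to send SSML instead of plain text, the input has to be wrapped in a speak element and any reserved XML characters must be escaped. The following Python sketch shows one way to do that. The helper name to_ssml and its pause_ms parameter are hypothetical, and the only tags it uses are speak and break.

```python
from xml.sax.saxutils import escape


def to_ssml(text, pause_ms=0):
    """Wrap plain text in a minimal SSML document, escaping XML characters."""
    body = escape(text)  # converts &, < and > to XML entities
    if pause_ms > 0:
        # Optionally add a trailing pause after the text
        body += '<break time="{0}ms"/>'.format(pause_ms)
    return "<speak>{0}</speak>".format(body)


print(to_ssml("Tom & Jerry", pause_ms=500))
```

The result would be passed to synthesize_speech as the Text argument together with TextType='ssml'.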
Example applications
This section contains example applications that you can use to explore Amazon Polly.
Example Applications by Programming Language
Python example (HTML5 Client and Python Server)
Java example
iOS example
Android example
Python example (HTML5 Client and Python Server)
This example application consists of the following:
An HTTP 1.1 server using the HTTP chunked transfer coding (see Chunked Transfer Coding)
A simple HTML5 user interface that interacts with the HTTP 1.1 server
The goal of this example is to show how to use Amazon Polly to stream speech from a browser-
based HTML5 application. Consuming the audio stream produced by Amazon Polly as the text gets
synthesized is the recommended approach for use cases where responsiveness is an important
factor (for example, dialog systems, screen readers, etc.).
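Chunked transfer coding is what allows the server to start sending audio before the total length is known: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk ends the body. A minimal sketch of that framing follows. The function names are illustrative and are not part of the example application, and the decoder ignores chunk extensions and trailers.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings using HTTP/1.1 chunked transfer coding."""
    body = b""
    for data in chunks:
        if data:  # an empty chunk would be read as the terminator
            body += b"%X\r\n%s\r\n" % (len(data), data)
    # A zero-length chunk marks the end of the body
    return body + b"0\r\n\r\n"


def decode_chunked(body):
    """Recover the original payload from a chunked body (no trailers or extensions)."""
    payload, pos = b"", 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)
        if size == 0:
            return payload
        start = line_end + 2
        payload += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

A browser's audio element performs the decoding side automatically, which is why playback can begin as soon as the first chunks arrive.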
To run this example application you need the following:
Web browser compliant with the HTML5 and ECMAScript 5 standards (for example, Chrome 23.0
or higher, Firefox 21.0 or higher, or Internet Explorer 9.0 or higher)
Python version greater than 3.0
To test the application
1. Save the server code as server.py. For the code, see Python example: Python Server Code
(server.py).
2. Save the HTML5 client code as index.html. For the code, see Python example: HTML5 User
Interface (index.html).
3. Run the following command from the path where you saved server.py to start the application
(on some systems you might need to use python3 instead of python when running the
command).
$ python server.py
After the application starts, a URL appears on the terminal.
4. Open the URL shown in the terminal in a web browser.
You can pass the address and port for the application server to use as a parameter to
server.py. For more information, run python server.py -h.
5. To listen to speech, choose a voice from the list, type some text, and then choose Read. The
speech starts playing as soon as Amazon Polly transfers the first usable chunk of audio data.
6. To stop the Python server when you're finished testing the application, press Ctrl+C in the
terminal where the server is running.
Note
The server creates a Boto3 client using the AWS SDK for Python (Boto). The client uses the
credentials stored in the AWS config file on your computer to sign and authenticate the
requests to Amazon Polly. For more information on how to create the AWS config file and
store credentials, see Configuring the AWS Command Line Interface in the AWS Command
Line Interface User Guide.
Python example: HTML5 User Interface (index.html)
This section provides the code for the HTML5 client described in Python example (HTML5 Client
and Python Server).
<html>
<head>
<title>Text-to-Speech Example Application</title>
<script>
/*
* This sample code requires a web browser with support for both the
* HTML5 and ECMAScript 5 standards; the following is a non-comprehensive
* list of compliant browsers and their minimum version:
*
* - Chrome 23.0+
* - Firefox 21.0+
* - Internet Explorer 9.0+
* - Edge 12.0+
* - Opera 15.0+
* - Safari 6.1+
* - Android (stock web browser) 4.4+
* - Chrome for Android 51.0+
* - Firefox for Android 48.0+
* - Opera Mobile 37.0+
* - iOS (Safari Mobile and Chrome) 3.2+
* - Internet Explorer Mobile 10.0+
* - Blackberry Browser 10.0+
*/
// Mapping of the OutputFormat parameter of the SynthesizeSpeech API
// and the audio format strings understood by the browser
var AUDIO_FORMATS = {
'ogg_vorbis': 'audio/ogg',
'mp3': 'audio/mpeg',
'pcm': 'audio/wave; codecs=1'
};
/**
* Handles fetching JSON over HTTP
*/
function fetchJSON(method, url, onSuccess, onError) {
var request = new XMLHttpRequest();
request.open(method, url, true);
request.onload = function () {
// If loading is complete
if (request.readyState === 4) {
// if the request was successful
if (request.status === 200) {
var data;
// Parse the JSON in the response
try {
data = JSON.parse(request.responseText);
} catch (error) {
onError(request.status, error.toString());
// Parsing failed, so don't fall through to onSuccess
return;
}
onSuccess(data);
} else {
onError(request.status, request.responseText)
}
}
};
request.send();
}
/**
* Returns a list of audio formats supported by the browser
*/
function getSupportedAudioFormats(player) {
return Object.keys(AUDIO_FORMATS)
.filter(function (format) {
var supported = player.canPlayType(AUDIO_FORMATS[format]);
return supported === 'probably' || supported === 'maybe';
});
}
// Initialize the application when the DOM is loaded and ready to be
// manipulated
document.addEventListener("DOMContentLoaded", function () {
var input = document.getElementById('input'),
voiceMenu = document.getElementById('voice'),
text = document.getElementById('text'),
player = document.getElementById('player'),
submit = document.getElementById('submit'),
supportedFormats = getSupportedAudioFormats(player);
// Display a message and don't allow submitting the form if the
// browser doesn't support any of the available audio formats
if (supportedFormats.length === 0) {
submit.disabled = true;
alert('The web browser in use does not support any of the' +
' available audio formats. Please try with a different' +
' one.');
}
// Play the audio stream when the form is submitted successfully
input.addEventListener('submit', function (event) {
// Validate the fields in the form, display a message if
// unexpected values are encountered
if (voiceMenu.selectedIndex <= 0 || text.value.length === 0) {
alert('Please fill in all the fields.');
} else {
var selectedVoice = voiceMenu
.options[voiceMenu.selectedIndex]
.value;
// Point the player to the streaming server
player.src = '/read?voiceId=' +
encodeURIComponent(selectedVoice) +
'&text=' + encodeURIComponent(text.value) +
'&outputFormat=' + supportedFormats[0];
player.play();
}
// Stop the form from submitting,
// Submitting the form is allowed only if the browser doesn't
// support Javascript to ensure functionality in such a case
event.preventDefault();
});
// Load the list of available voices and display them in a menu
fetchJSON('GET', '/voices',
// If the request succeeds
function (voices) {
var container = document.createDocumentFragment();
// Build the list of options for the menu
voices.forEach(function (voice) {
var option = document.createElement('option');
option.value = voice['Id'];
option.innerHTML = voice['Name'] + ' (' +
voice['Gender'] + ', ' +
voice['LanguageName'] + ')';
container.appendChild(option);
});
// Add the options to the menu and enable the form field
voiceMenu.appendChild(container);
voiceMenu.disabled = false;
},
// If the request fails
function (status, response) {
// Display a message in case loading data from the server
// fails
alert(status + ' - ' + response);
});
});
</script>
<style>
#input {
min-width: 100px;
max-width: 600px;
margin: 0 auto;
padding: 50px;
}
#input div {
margin-bottom: 20px;
}
#text {
width: 100%;
height: 200px;
display: block;
}
#submit {
width: 100%;
}
</style>
</head>
<body>
<form id="input" method="GET" action="/read">
<div>
<label for="voice">Select a voice:</label>
<select id="voice" name="voiceId" disabled>
<option value="">Choose a voice...</option>
</select>
</div>
<div>
<label for="text">Text to read:</label>
<textarea id="text" maxlength="1000" minlength="1" name="text"
placeholder="Type some text here..."></textarea>
</div>
<input type="submit" value="Read" id="submit" />
</form>
<audio id="player"></audio>
</body>
</html>
Python example: Python Server Code (server.py)
This section provides the code for the Python server described in Python example (HTML5 Client
and Python Server).
"""
Example Python 2.7+/3.3+ Application

This application consists of a HTTP 1.1 server using the HTTP chunked transfer
coding (https://tools.ietf.org/html/rfc2616#section-3.6.1) and a minimal HTML5
user interface that interacts with it.

The goal of this example is to start streaming the speech to the client (the
HTML5 web UI) as soon as the first consumable chunk of speech is returned in
order to start playing the audio as soon as possible.
For use cases where low latency and responsiveness are strong requirements,
this is the recommended approach.

The service documentation contains examples for non-streaming use cases where
waiting for the speech synthesis to complete and fetching the whole audio stream
at once are an option.

To test the application, run 'python server.py' and then open the URL
displayed in the terminal in a web browser (see index.html for a list of
supported browsers). The address and port for the server can be passed as
parameters to server.py. For more information, run: 'python server.py -h'
"""
from argparse import ArgumentParser
from collections import namedtuple
from contextlib import closing
from io import BytesIO
from json import dumps as json_encode
import os
import sys

if sys.version_info >= (3, 0):
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from socketserver import ThreadingMixIn
    from urllib.parse import parse_qs
else:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
    from SocketServer import ThreadingMixIn
    from urlparse import parse_qs

from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError

ResponseStatus = namedtuple("HTTPStatus",
                            ["code", "message"])

ResponseData = namedtuple("ResponseData",
                          ["status", "content_type", "data_stream"])

# Mapping the output format used in the client to the content type for the
# response
AUDIO_FORMATS = {"ogg_vorbis": "audio/ogg",
                 "mp3": "audio/mpeg",
                 "pcm": "audio/wave; codecs=1"}
CHUNK_SIZE = 1024
HTTP_STATUS = {"OK": ResponseStatus(code=200, message="OK"),
               "BAD_REQUEST": ResponseStatus(code=400, message="Bad request"),
               "NOT_FOUND": ResponseStatus(code=404, message="Not found"),
               "INTERNAL_SERVER_ERROR": ResponseStatus(
                   code=500, message="Internal server error")}
PROTOCOL = "http"
ROUTE_INDEX = "/index.html"
ROUTE_VOICES = "/voices"
ROUTE_READ = "/read"

# Create a client using the credentials and region defined in the adminuser
# section of the AWS credentials and configuration files
session = Session(profile_name="adminuser")
polly = session.client("polly")


class HTTPStatusError(Exception):
    """Exception wrapping a value from http.server.HTTPStatus"""

    def __init__(self, status, description=None):
        """
        Constructs an error instance from a tuple of
        (code, message, description), see http.server.HTTPStatus
        """
        super(HTTPStatusError, self).__init__()
        self.code = status.code
        self.message = status.message
        self.explain = description


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """An HTTP Server that handles each request in a new thread"""
    daemon_threads = True


class ChunkedHTTPRequestHandler(BaseHTTPRequestHandler):
    """HTTP 1.1 Chunked encoding request handler"""
    # Use HTTP 1.1 as 1.0 doesn't support chunked encoding
    protocol_version = "HTTP/1.1"

    def query_get(self, queryData, key, default=""):
        """Helper for getting values from a pre-parsed query string"""
        return queryData.get(key, [default])[0]

    def do_GET(self):
        """Handles GET requests"""

        # Extract values from the query string
        path, _, query_string = self.path.partition('?')
        query = parse_qs(query_string)

        response = None

        print(u"[START]: Received GET for %s with query: %s" % (path, query))

        try:
            # Handle the possible request paths
            if path == ROUTE_INDEX:
                response = self.route_index(path, query)
            elif path == ROUTE_VOICES:
                response = self.route_voices(path, query)
            elif path == ROUTE_READ:
                response = self.route_read(path, query)
            else:
                response = self.route_not_found(path, query)

            self.send_headers(response.status, response.content_type)
            self.stream_data(response.data_stream)

        except HTTPStatusError as err:
            # Respond with an error and log debug
            # information
            if sys.version_info >= (3, 0):
                self.send_error(err.code, err.message, err.explain)
            else:
                self.send_error(err.code, err.message)

            self.log_error(u"%s %s %s - [%d] %s", self.client_address[0],
                           self.command, self.path, err.code, err.explain)

        print("[END]")

    def route_not_found(self, path, query):
        """Handles routing for unexpected paths"""
        raise HTTPStatusError(HTTP_STATUS["NOT_FOUND"], "Page not found")

    def route_index(self, path, query):
        """Handles routing for the application's entry point"""
        try:
            return ResponseData(status=HTTP_STATUS["OK"], content_type="text/html",
                                # Open a binary stream for reading the index
                                # HTML file
                                data_stream=open(os.path.join(sys.path[0],
                                                              path[1:]), "rb"))
        except IOError as err:
            # Couldn't open the stream
            raise HTTPStatusError(HTTP_STATUS["INTERNAL_SERVER_ERROR"],
                                  str(err))

    def route_voices(self, path, query):
        """Handles routing for listing available voices"""
        params = {}
        voices = []

        while True:
            try:
                # Request list of available voices, if a continuation token
                # was returned by the previous call then use it to continue
                # listing
                response = polly.describe_voices(**params)
            except (BotoCoreError, ClientError) as err:
                # The service returned an error
                raise HTTPStatusError(HTTP_STATUS["INTERNAL_SERVER_ERROR"],
                                      str(err))

            # Collect all the voices
            voices.extend(response.get("Voices", []))

            # If a continuation token was returned continue, stop iterating
            # otherwise
            if "NextToken" in response:
                params = {"NextToken": response["NextToken"]}
            else:
                break

        json_data = json_encode(voices)
        bytes_data = bytes(json_data, "utf-8") if sys.version_info >= (3, 0) \
            else bytes(json_data)

        return ResponseData(status=HTTP_STATUS["OK"],
                            content_type="application/json",
                            # Create a binary stream for the JSON data
                            data_stream=BytesIO(bytes_data))

    def route_read(self, path, query):
        """Handles routing for reading text (speech synthesis)"""
        # Get the parameters from the query string
        text = self.query_get(query, "text")
        voiceId = self.query_get(query, "voiceId")
        outputFormat = self.query_get(query, "outputFormat")

        # Validate the parameters, set error flag in case of unexpected
        # values
        if len(text) == 0 or len(voiceId) == 0 or \
                outputFormat not in AUDIO_FORMATS:
            raise HTTPStatusError(HTTP_STATUS["BAD_REQUEST"],
                                  "Wrong parameters")
        else:
            try:
                # Request speech synthesis
                response = polly.synthesize_speech(Text=text,
                                                   VoiceId=voiceId,
                                                   OutputFormat=outputFormat,
                                                   Engine="neural")
            except (BotoCoreError, ClientError) as err:
                # The service returned an error
                raise HTTPStatusError(HTTP_STATUS["INTERNAL_SERVER_ERROR"],
                                      str(err))

            return ResponseData(status=HTTP_STATUS["OK"],
                                content_type=AUDIO_FORMATS[outputFormat],
                                # Access the audio stream in the response
                                data_stream=response.get("AudioStream"))

    def send_headers(self, status, content_type):
        """Send out the group of headers for a successful request"""
        # Send HTTP headers
        self.send_response(status.code, status.message)
        self.send_header('Content-type', content_type)
        self.send_header('Transfer-Encoding', 'chunked')
        self.send_header('Connection', 'close')
        self.end_headers()

    def stream_data(self, stream):
        """Consumes a stream in chunks to produce the response's output"""
        print("Streaming started...")

        if stream:
            # Note: Closing the stream is important as the service throttles on
            # the number of parallel connections. Here we are using
            # contextlib.closing to ensure the close method of the stream object
            # will be called automatically at the end of the with statement's
            # scope.
            with closing(stream) as managed_stream:
                # Push out the stream's content in chunks
                while True:
                    data = managed_stream.read(CHUNK_SIZE)
                    self.wfile.write(b"%X\r\n%s\r\n" % (len(data), data))

                    # If there's no more data to read, stop streaming
                    if not data:
                        break

                # Ensure any buffered output has been transmitted and close the
                # stream
                self.wfile.flush()

            print("Streaming completed.")
        else:
            # The stream passed in is empty
            self.wfile.write(b"0\r\n\r\n")
            print("Nothing to stream.")


# Define and parse the command line arguments
cli = ArgumentParser(description='Example Python Application')
cli.add_argument(
    "-p", "--port", type=int, metavar="PORT", dest="port", default=8000)
cli.add_argument(
    "--host", type=str, metavar="HOST", dest="host", default="localhost")
arguments = cli.parse_args()

# If the module is invoked directly, initialize the application
if __name__ == '__main__':
    # Create and configure the HTTP server instance
    server = ThreadedHTTPServer((arguments.host, arguments.port),
                                ChunkedHTTPRequestHandler)
    print("Starting server, use <Ctrl-C> to stop...")
    print(u"Open {0}://{1}:{2}{3} in a web browser.".format(PROTOCOL,
                                                            arguments.host,
                                                            arguments.port,
                                                            ROUTE_INDEX))

    try:
        # Listen for requests indefinitely
        server.serve_forever()
    except KeyboardInterrupt:
        # A request to terminate has been received, stop the server
        print("\nShutting down...")
        server.socket.close()
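Once the server is running, its /read route can also be called without the HTML5 client. A minimal sketch of building such a request URL follows, assuming the default host and port from the argument parser above. The helper name build_read_url is hypothetical; the query parameter names are the ones the server reads in route_read.

```python
from urllib.parse import urlencode


def build_read_url(text, voice_id, output_format="mp3",
                   host="localhost", port=8000):
    """Return the URL that asks the example server to synthesize `text`."""
    # urlencode percent-encodes the values so arbitrary text is safe in a URL
    query = urlencode({"voiceId": voice_id,
                       "text": text,
                       "outputFormat": output_format})
    return "http://{}:{}/read?{}".format(host, port, query)


if __name__ == "__main__":
    print(build_read_url("Hello from Amazon Polly", "Joanna"))
```

Opening the resulting URL in a browser, or fetching it with any HTTP client that understands chunked responses, returns the synthesized audio in the requested format.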
Java example
This example shows how to use Amazon Polly to stream speech from a Java-based application. The
example uses the AWS SDK for Java to read the specified text using a voice selected from a list.
The code shown covers major tasks, but does only minimal error checking. If Amazon Polly
encounters an error, the application terminates.
To run this example application, you need the following:
Java 8 Java Development Kit (JDK)
AWS SDK for Java
Apache Maven
To test the application
1. Ensure that the JAVA_HOME environment variable is set for the JDK.
For example, if you installed JDK 1.8.0_121 on Windows at
C:\Program Files\Java\jdk1.8.0_121, you would type the following at the command prompt:
set JAVA_HOME="C:\Program Files\Java\jdk1.8.0_121"
If you installed JDK 1.8.0_121 on Linux at /usr/lib/jvm/java8-openjdk-amd64, you
would type the following at the command prompt:
export JAVA_HOME=/usr/lib/jvm/java8-openjdk-amd64
2. Set the Maven environment variables to run Maven from the command line.
For example, if you installed Maven 3.3.9 on Windows at
C:\Program Files\apache-maven-3.3.9, you would type the following:
set M2_HOME="C:\Program Files\apache-maven-3.3.9"
set M2=%M2_HOME%\bin
set PATH=%M2%;%PATH%
If you installed Maven 3.3.9 on Linux at /home/ec2-user/opt/apache-maven-3.3.9, you
would type the following:
export M2_HOME=/home/ec2-user/opt/apache-maven-3.3.9
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
3. Create a new directory called polly-java-demo.
4. In the polly-java-demo directory, create a new file called pom.xml, and paste the following
code into it:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.amazonaws.polly</groupId>
<artifactId>java-demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-polly -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-polly</artifactId>
<version>1.11.77</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.googlecode.soundlibs/jlayer -->
<dependency>
<groupId>com.googlecode.soundlibs</groupId>
<artifactId>jlayer</artifactId>
<version>1.0.1-1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.2.1</version>
<executions>
<execution>
<goals>
<goal>java</goal>
</goals>
</execution>
</executions>
<configuration>
<mainClass>com.amazonaws.demos.polly.PollyDemo</mainClass>
</configuration>
</plugin>
</plugins>
</build>
</project>
5. Create a new directory called polly at src/main/java/com/amazonaws/demos.
6. In the polly directory, create a new Java source file called PollyDemo.java, and paste in the
following code:
package com.amazonaws.demos.polly;
import java.io.IOException;
import java.io.InputStream;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.polly.AmazonPollyClient;
import com.amazonaws.services.polly.model.DescribeVoicesRequest;
import com.amazonaws.services.polly.model.DescribeVoicesResult;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.Voice;
import javazoom.jl.player.advanced.AdvancedPlayer;
import javazoom.jl.player.advanced.PlaybackEvent;
import javazoom.jl.player.advanced.PlaybackListener;
public class PollyDemo {
private final AmazonPollyClient polly;
private final Voice voice;
private static final String SAMPLE = "Congratulations. You have successfully built this working demo "
+ "of Amazon Polly in Java. Have fun building voice enabled apps with Amazon Polly (that's me!), and always "
+ "look at the AWS website for tips and tricks on using Amazon Polly and other great services from AWS";
public PollyDemo(Region region) {
// create an Amazon Polly client in a specific region
polly = new AmazonPollyClient(new DefaultAWSCredentialsProviderChain(),
new ClientConfiguration());
polly.setRegion(region);
// Create describe voices request.
DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();
// Synchronously ask Amazon Polly to describe available TTS voices.
DescribeVoicesResult describeVoicesResult =
polly.describeVoices(describeVoicesRequest);
voice = describeVoicesResult.getVoices().get(0);
}
public InputStream synthesize(String text, OutputFormat format) throws IOException
{
SynthesizeSpeechRequest synthReq =
new SynthesizeSpeechRequest().withText(text).withVoiceId(voice.getId())
.withOutputFormat(format).withEngine("neural");
SynthesizeSpeechResult synthRes = polly.synthesizeSpeech(synthReq);
return synthRes.getAudioStream();
}
public static void main(String args[]) throws Exception {
//create the test class
PollyDemo helloWorld = new PollyDemo(Region.getRegion(Regions.US_EAST_1));
//get the audio stream
InputStream speechStream = helloWorld.synthesize(SAMPLE, OutputFormat.Mp3);
//create an MP3 player
AdvancedPlayer player = new AdvancedPlayer(speechStream,
javazoom.jl.player.FactoryRegistry.systemRegistry().createAudioDevice());
player.setPlayBackListener(new PlaybackListener() {
@Override
public void playbackStarted(PlaybackEvent evt) {
System.out.println("Playback started");
System.out.println(SAMPLE);
}
@Override
public void playbackFinished(PlaybackEvent evt) {
System.out.println("Playback finished");
}
});
// play it!
player.play();
}
}
7. Return to the polly-java-demo directory to clean, compile, and execute the demo:
mvn clean compile exec:java
iOS example
The following example uses the iOS SDK for Amazon Polly to read the specified text using a voice
selected from a list of voices.
The code shown here covers the major tasks but does not handle errors. For the complete code, see
AWS Mobile SDK for iOS Amazon Polly demo.
Initialize
// Region of Amazon Polly.
let AwsRegion = AWSRegionType.usEast1
// Cognito pool ID. Pool needs to be unauthenticated pool with
// Amazon Polly permissions.
let CognitoIdentityPoolId = "YourCognitoIdentityPoolId"
// Initialize the Amazon Cognito credentials provider.
let credentialProvider = AWSCognitoCredentialsProvider(regionType: AwsRegion,
identityPoolId: CognitoIdentityPoolId)
// Create an audio player
var audioPlayer = AVPlayer()
Get List of Available Voices
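// Create a service configuration from the Region and credentials provider
// defined above, and use it as the default
let configuration = AWSServiceConfiguration(region: AwsRegion,
credentialsProvider: credentialProvider)
AWSServiceManager.default().defaultServiceConfiguration = configuration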
// Get all the voices (no parameters specified in input) from Amazon Polly
// This creates an async task.
let task = AWSPolly.default().describeVoices(AWSPollyDescribeVoicesInput())
// When the request is done, asynchronously do the following block
// (we ignore all the errors, but in a real-world scenario they need
// to be handled)
task.continue(successBlock: { (awsTask: AWSTask) -> Any? in
// awsTask.result is an instance of AWSPollyDescribeVoicesOutput in
// case of the "describeVoices" method
let voices = (awsTask.result! as AWSPollyDescribeVoicesOutput).voices
return nil
})
Synthesize Speech
// First, Amazon Polly requires an input, which we need to prepare.
// Again, we ignore the errors, however this should be handled in
// real applications. Here we are using the URL Builder Request,
// since in order to make the synthesis quicker we will pass the
// presigned URL to the system audio player.
let input = AWSPollySynthesizeSpeechURLBuilderRequest()
// Text to synthesize
input.text = "Sample text"
// We expect the output in MP3 format
input.outputFormat = AWSPollyOutputFormat.mp3
// Choose the voice ID
input.voiceId = AWSPollyVoiceId.joanna
// Create a task to synthesize speech using the given synthesis input
let builder = AWSPollySynthesizeSpeechURLBuilder.default().getPreSignedURL(input)
// Request the URL for synthesis result
builder.continueOnSuccessWith(block: { (awsTask: AWSTask<NSURL>) -> Any? in
// The result of getPresignedURL task is NSURL.
// Again, we ignore the errors in the example.
let url = awsTask.result!
// Try playing the data using the system AVPlayer
self.audioPlayer.replaceCurrentItem(with: AVPlayerItem(url: url as URL))
self.audioPlayer.play()
return nil
})
Android example
The following example uses the Android SDK for Amazon Polly to read the specified text using a
voice selected from a list of voices.
The code shown here covers the major tasks but does not handle errors. For the complete code, see
the AWS Mobile SDK for Android Amazon Polly demo.
Initialize
// Cognito pool ID. Pool needs to be unauthenticated pool with
// Amazon Polly permissions.
String COGNITO_POOL_ID = "YourCognitoIdentityPoolId";
// Region of Amazon Polly.
Regions MY_REGION = Regions.US_EAST_1;
// Initialize the Amazon Cognito credentials provider.
CognitoCachingCredentialsProvider credentialsProvider = new
CognitoCachingCredentialsProvider(
 getApplicationContext(),
 COGNITO_POOL_ID,
 MY_REGION
);
// Create a client that supports generation of presigned URLs.
AmazonPollyPresigningClient client = new
AmazonPollyPresigningClient(credentialsProvider);
Get List of Available Voices
// Create describe voices request.
DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();
// Synchronously ask Amazon Polly to describe available TTS voices.
DescribeVoicesResult describeVoicesResult =
client.describeVoices(describeVoicesRequest);
List<Voice> voices = describeVoicesResult.getVoices();
Get URL for Audio Stream
// Create speech synthesis request.
SynthesizeSpeechPresignRequest synthesizeSpeechPresignRequest =
 new SynthesizeSpeechPresignRequest()
 // Set the text to synthesize.
 .withText("Hello world!")
 // Select voice for synthesis.
 .withVoiceId(voices.get(0).getId()) // "Joanna"
 // Set format to MP3.
 .withOutputFormat(OutputFormat.Mp3);
// Get the presigned URL for synthesized speech audio stream.
URL presignedSynthesizeSpeechUrl =
 client.getPresignedSynthesizeSpeechUrl(synthesizeSpeechPresignRequest);
Play Synthesized Speech
// Use MediaPlayer: https://developer.android.com/guide/topics/media/mediaplayer.html
// Create a media player to play the synthesized audio stream.
MediaPlayer mediaPlayer = new MediaPlayer();
mediaPlayer.setAudioStreamType(AudioManager.STREAM_MUSIC);
try {
 // Set media player's data source to previously obtained URL.
 mediaPlayer.setDataSource(presignedSynthesizeSpeechUrl.toString());
} catch (IOException e) {
 Log.e(TAG, "Unable to set data source for the media player! " + e.getMessage());
}
// Prepare the MediaPlayer asynchronously (since the data source is a network stream).
mediaPlayer.prepareAsync();
// Set the callback to start the MediaPlayer when it's prepared.
mediaPlayer.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {
 @Override
 public void onPrepared(MediaPlayer mp) {
 mp.start();
 }
});
// Set the callback to release the MediaPlayer after playback is completed.
mediaPlayer.setOnCompletionListener(new MediaPlayer.OnCompletionListener() {
@Override
public void onCompletion(MediaPlayer mp) {
mp.release();
}
});
Quotas in Amazon Polly
Amazon Polly applies quotas to customer traffic by rejecting excessive requests. The default
quota for the SynthesizeSpeech request with standard voices is 80 transactions per second
(tps), in a single Region, for a single AWS account. If the quota has not been increased and you
send 100 SynthesizeSpeech requests per second using a standard voice, 80 requests per second
succeed and 20 requests per second are throttled by Amazon Polly. The throttled requests
return a response with HTTP status 400 and a response header indicating
ThrottlingException. Amazon Polly also throttles traffic to all operations based on the request
rate.
Speech synthesis limit examples
Synthesize the first 24 letters of the English alphabet one letter at a time. If the synthesis
of each letter took less than 50 milliseconds, with an operation limit of eight tps, synthesizing
24 letters would take at least three seconds. During that time, you could synthesize up to eight
letters per second. Any further requests would be throttled. As the requests last a short time,
they would be synthesized serially without overlap.
Synthesize 16 paragraphs of text. If each paragraph was synthesized and fully received on the
client side in two seconds or less, with an operation limit of eight concurrent requests, it would
take at least four seconds to synthesize all 16 paragraphs. In the first second, you could start up
to eight requests. While those requests are in progress, any attempt to start a new synthesis would
be throttled due to the concurrency limit. You could synthesize the remaining eight paragraphs
after the first two seconds, when the first batch of requests has finished.
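The arithmetic behind both examples can be sketched as follows. The function names are illustrative, and the model is deliberately simplified: it ignores burst limits and assumes that every request in the concurrency example takes the full two seconds.

```python
import math


def min_seconds_rate_limited(requests, tps):
    """Minimum whole seconds needed to start `requests` requests at `tps` per second."""
    return math.ceil(requests / tps)


def min_seconds_concurrency_limited(requests, seconds_per_request, max_concurrent):
    """Minimum time when at most `max_concurrent` requests run at once and
    each request takes `seconds_per_request` seconds."""
    batches = math.ceil(requests / max_concurrent)
    return batches * seconds_per_request


# 24 letters at 8 tps
print(min_seconds_rate_limited(24, 8))            # 3
# 16 paragraphs, 2 seconds each, 8 concurrent requests
print(min_seconds_concurrency_limited(16, 2, 8))  # 4
```

The same functions can be used with your own request counts and quotas to estimate whether a workload fits within the default limits.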
Keep the following limits in mind when using Amazon Polly.
Topics
Supported regions
Quotas and throttle rates
Pronunciation lexicons
SynthesizeSpeech API operations
SpeechSynthesisTask API operations
Speech Synthesis Markup Language (SSML)
Supported regions
For a list of AWS Regions where Amazon Polly is available, see Amazon Polly Endpoints and Quotas
in the Amazon Web Services General Reference.
For Regions that support generative voices, see Generative voices.
For Regions that support long-form voices, see Long-form voices.
For Regions that support neural voices, see Feature and region compatibility for neural TTS.
Quotas and throttle rates
The following table defines throttle rates per Amazon Polly operation. You can use the AWS
Management Console to request quota increases for the adjustable quotas when needed.
Operation                        Limit
-------------------------------  ----------------------------------------------------
Lexicon
  DeleteLexicon, PutLexicon,     Any 2 transactions per second (tps) from these
  GetLexicon, ListLexicons       operations combined.
                                 Maximum allowed burst of 4 tps.
Speech
  DescribeVoices                 80 tps with a burst limit of 100 tps
  SynthesizeSpeech               Generative voice: 8 tps
                                 Long-form voice: 8 tps with a burst limit of 10 tps
                                 Neural voice: 8 tps with a burst limit of 10 tps
                                 Standard voice: 80 tps with a burst limit of 100 tps
  StartSpeechSynthesisTask       Generative voice: 1 tps
                                 Long-form voice: 1 tps
                                 Neural voice: 1 tps
                                 Standard voice: 10 tps with a burst limit of 12 tps
  GetSpeechSynthesisTask and     Maximum allowed 10 tps combined
  ListSpeechSynthesisTasks
Concurrent requests
Amazon Polly also limits the number of concurrent requests. For generative voice, Amazon Polly
supports up to 26 concurrent requests. For long-form voice, Amazon Polly supports up to 26
concurrent requests. For neural voice, Amazon Polly supports 8 tps with a burst limit of 10 tps,
for up to 18 concurrent requests. For standard voice, Amazon Polly supports 80 tps for up to 80
concurrent requests.
Best practices to mitigate throttling
Retry throttles with backoff and jitter so you can spread the load over a short period of time,
and handle unexpected peaks in usage without compromising availability. AWS Code Sample
Catalog is already configured to do this by default in many programming languages. Visit feature
retry behavior to see the details.
Use Amazon Polly metrics. Amazon Polly automatically publishes metrics to CloudWatch, which you
can use to analyze your current usage and forecast usage growth.
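For clients that don't already retry automatically, the backoff-and-jitter practice above can be sketched as follows. This is a minimal illustration of exponential backoff with full jitter, not SDK code: ThrottledError, backoff_delay, and call_with_retries are hypothetical names, and the base delay, cap, and attempt count are assumed values that you would tune for your workload.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for whatever throttling error your client raises."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(operation, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call operation(); on ThrottledError, wait with jittered backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            # Give up after the final attempt and surface the error to the caller.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, rng=rng))
```

The sleep and rng parameters are injectable so the retry schedule can be exercised in tests without real delays.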
Note
Before requesting a quota increase (where applicable), calculate your tps needs following
the guidelines on this page. Amazon Polly secures only the required computational
resources according to customer demand in order to keep your costs low.
Pronunciation lexicons
You can store up to 100 lexicons per account.
Lexicon names can be an alphanumeric string up to 20 characters long.
Each lexicon can be up to 40,000 characters in size. (Note that the size of the lexicon affects the
latency of the SynthesizeSpeech operation.)
You can specify up to 100 characters for each <phoneme> or <alias> replacement in a lexicon.
For information about using lexicons, see Managing lexicons.
SynthesizeSpeech API operations
When estimating the usage of SynthesizeSpeech, keep in mind that the audio produced by
Amazon Polly, especially for interactive applications, usually takes at least several seconds to be
played. This reduces the rate of requests to SynthesizeSpeech, even for a large number of
concurrent consumers. Additionally, Amazon Polly throttles SynthesizeSpeech requests by the
number of concurrent requests that it synthesizes. There is no separate setting for concurrent
requests. The concurrent requests limit always has the same value as the number of tps allowed
and scales with it.
Short story example application. You can use Amazon Polly to build an application that plays a
series of short stories. With this kind of app, the first story would start playing, and then the next,
and so on, until a user quit the application. Each story would take around 0.5 seconds to synthesize
and 10 seconds to play. In this scenario, you could expect one call to SynthesizeSpeech for
every 10 seconds that the customer spent using the application. This would translate to one
call per second for every 10 customers who were concurrently using the application. If you had
1000 customers concurrently using the application, you could expect an average call rate to
SynthesizeSpeech of only 100 transactions per second.
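The estimate in the short story example can be written as a one-line calculation. The function below is a hypothetical helper, and it assumes that each active customer triggers one SynthesizeSpeech call per story and that stories play back to back.

```python
def estimated_tps(concurrent_customers: int, seconds_per_story: float) -> float:
    """Average SynthesizeSpeech calls per second across all active customers."""
    if seconds_per_story <= 0:
        raise ValueError("seconds_per_story must be positive")
    # Each customer needs one new story every `seconds_per_story` seconds.
    return concurrent_customers / seconds_per_story


print(estimated_tps(1000, 10))  # 100.0
```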
Note the following limits related to using the SynthesizeSpeech API operation:
The size of the input text can be up to 3000 billed characters (6000 total characters). SSML tags
are not counted as billed characters.
You can specify up to five lexicons to apply to the input text.
The output audio stream (synthesis) is limited to 10 minutes. After this is reached, any remaining
speech is cut off.
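One way to stay under the input size limit is to split long plain text into several requests. The sketch below is an assumption-laden illustration: split_text is a hypothetical helper, it counts characters with len(), which only approximates billed characters, and it splits on whitespace, so runs of whitespace are collapsed to single spaces.

```python
from typing import List


def split_text(text: str, limit: int = 3000) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word exceeds the chunk limit")
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full; start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent as the input text of a separate request. Splitting at sentence boundaries instead of word boundaries would probably sound more natural, but it is not shown here.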
For more information, see SynthesizeSpeech.
Note
Some limitations of the SynthesizeSpeech API operation can be bypassed using the
StartSpeechSynthesisTask API operation. For more information, see Creating long
audio files.
SpeechSynthesisTask API operations
Note the following limits related to using the StartSpeechSynthesisTask,
GetSpeechSynthesisTask, and ListSpeechSynthesisTasks API operations:
The size of the input text can be up to 100,000 billed characters (200,000 total characters). SSML
tags are not counted as billed characters.
You can specify up to five lexicons to apply to the input text.
Speech Synthesis Markup Language (SSML)
Note the following limits related to using SSML:
The <audio>, <lexicon>, <lookup>, and <voice> tags are not supported.
<break> elements can specify a maximum duration of 10 seconds each.
The <prosody> tag doesn't support values for the rate attribute lower than -80%.
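The limits above can be checked on the client before a request is sent. The following is a minimal sketch that uses only the Python standard library. The check_ssml function and its helpers are hypothetical, it assumes the SSML is well-formed XML, and it only recognizes break times written as seconds or milliseconds (for example 2s or 500ms) and prosody rates written as percentages.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

UNSUPPORTED_TAGS = {"audio", "lexicon", "lookup", "voice"}
MAX_BREAK_SECONDS = 10.0
MIN_PROSODY_RATE_PERCENT = -80.0


def _local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}speak' down to 'speak'."""
    return tag.rsplit("}", 1)[-1]


def _break_seconds(value: str) -> Optional[float]:
    """Convert '2s' or '500ms' to seconds; return None for other formats."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)(ms|s)\s*", value)
    if not match:
        return None
    amount = float(match.group(1))
    return amount / 1000.0 if match.group(2) == "ms" else amount


def check_ssml(ssml: str) -> List[str]:
    """Return a list of human-readable problems found in the SSML document."""
    problems: List[str] = []
    for element in ET.fromstring(ssml).iter():
        name = _local_name(element.tag)
        if name in UNSUPPORTED_TAGS:
            problems.append(f"<{name}> is not supported")
        elif name == "break":
            seconds = _break_seconds(element.get("time", ""))
            if seconds is not None and seconds > MAX_BREAK_SECONDS:
                problems.append(f"<break> of {seconds}s exceeds {MAX_BREAK_SECONDS}s")
        elif name == "prosody":
            match = re.fullmatch(r"\s*([+-]?\d+(?:\.\d+)?)%\s*", element.get("rate", ""))
            if match and float(match.group(1)) < MIN_PROSODY_RATE_PERCENT:
                problems.append(f"<prosody> rate {match.group(1)}% is below -80%")
    return problems
```

An empty list means that none of the three limits checked here were violated. It does not guarantee that Amazon Polly will accept the document.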
For more information, see Generating speech from SSML documents.
Security in Amazon Polly
Cloud security at AWS is the highest priority. As an AWS customer, you benefit from a data center
and network architecture that is built to meet the requirements of the most security-sensitive
organizations.
Security is a shared responsibility between AWS and you. The shared responsibility model describes
this as security of the cloud and security in the cloud:
Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS
services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-
party auditors regularly test and verify the effectiveness of our security as part of the AWS
Compliance Programs. To learn about the compliance programs that apply to Amazon Polly, see
AWS Services in Scope by Compliance Program.
Security in the cloud – Your responsibility is determined by the AWS service that you use.
You're also responsible for other factors including the sensitivity of your data, your company’s
requirements, and applicable laws and regulations.
This documentation helps you understand how to apply the shared responsibility model when
using Amazon Polly. The following topics show you how to configure Amazon Polly to meet your
security and compliance objectives. You also learn how to use other AWS services that help you to
monitor and secure your Amazon Polly resources.
Topics
Data Protection in Amazon Polly
Identity and Access Management in Amazon Polly
Logging and Monitoring in Amazon Polly
Compliance Validation for Amazon Polly
Resilience in Amazon Polly
Infrastructure Security in Amazon Polly
Security Best Practices for Amazon Polly
Using Amazon Polly with interface VPC endpoints
Data Protection in Amazon Polly
Amazon Polly conforms to the AWS shared responsibility model, which includes regulations and
guidelines for data protection. AWS is responsible for protecting the global infrastructure that runs
all the AWS services. AWS maintains control over data hosted on this infrastructure, including the
security configuration controls for handling customer content and personal data. AWS customers
and APN partners, acting either as data controllers or data processors, are responsible for any
personal data that they put in the AWS Cloud.
For data protection purposes, we recommend that you protect AWS account credentials and set up
individual users with AWS Identity and Access Management (IAM), so that each user is given only
the permissions necessary to fulfill their job duties. We also recommend that you secure your data
in the following ways:
Use multi-factor authentication (MFA) with each account.
Use SSL/TLS to communicate with AWS resources.
Set up API and user activity logging with AWS CloudTrail.
Use AWS encryption solutions, along with all default security controls within AWS services.
We strongly recommend that you never put sensitive identifying information, such as your
customers' account numbers, into free-form fields such as a Name field. This includes when you
work with Amazon Polly or other AWS services using the console, API, AWS CLI, or AWS SDKs.
Any data that you enter into Amazon Polly or other services might get picked up for inclusion
in diagnostic logs. When you provide a URL to an external server, don't include credentials
information in the URL to validate your request to that server.
For more information about data protection, see the AWS Shared Responsibility Model and GDPR
blog post on the AWS Security Blog.
Encryption at Rest
Output of your Amazon Polly voice synthesis can be saved on your own system. You can also call
Amazon Polly, and then encrypt the file with any encryption key of your choice and store it in
Amazon Simple Storage Service (Amazon S3) or another secure storage location. The Amazon Polly
SynthesizeSpeech operation is stateless and is not associated with a customer
identity. You can't retrieve the synthesized output from Amazon Polly later.
Encryption in Transit
All text submissions are protected by Secure Sockets Layer (SSL) while in transit. Amazon Polly
does not retain the content of text submissions.
Internetwork Traffic Privacy
Access to Amazon Polly is via the AWS console, CLI, or SDKs. Communications utilize Transport
Layer Security (TLS) session encryption for confidentiality and digital signatures for authentication
and integrity.
Identity and Access Management in Amazon Polly
AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely
control access to AWS resources. IAM administrators control who can be authenticated (signed in)
and authorized (have permissions) to use Amazon Polly resources. IAM is an AWS service that you
can use with no additional charge.
Topics
Audience
Authenticating with identities
Managing access using policies
How Amazon Polly works with IAM
Identity-based policy examples for Amazon Polly
Amazon Polly API Permissions: Actions, Permissions, and Resources Reference
Troubleshooting Amazon Polly identity and access
Audience
How you use AWS Identity and Access Management (IAM) differs, depending on the work that you
do in Amazon Polly.
Service user – If you use the Amazon Polly service to do your job, then your administrator provides
you with the credentials and permissions that you need. As you use more Amazon Polly features to
do your work, you might need additional permissions. Understanding how access is managed can
help you request the right permissions from your administrator. If you cannot access a feature in
Amazon Polly, see Troubleshooting Amazon Polly identity and access.
Service administrator – If you're in charge of Amazon Polly resources at your company, you
probably have full access to Amazon Polly. It's your job to determine which Amazon Polly features
and resources your service users should access. You must then submit requests to your IAM
administrator to change the permissions of your service users. Review the information on this page
to understand the basic concepts of IAM. To learn more about how your company can use IAM with
Amazon Polly, see How Amazon Polly works with IAM.
IAM administrator – If you're an IAM administrator, you might want to learn details about how you
can write policies to manage access to Amazon Polly. To view example Amazon Polly identity-based
policies that you can use in IAM, see Identity-based policy examples for Amazon Polly.
Authenticating with identities
Authentication is how you sign in to AWS using your identity credentials. You must be
authenticated (signed in to AWS) as the AWS account root user, as an IAM user, or by assuming an
IAM role.
You can sign in to AWS as a federated identity by using credentials provided through an identity
source. AWS IAM Identity Center (IAM Identity Center) users, your company's single sign-on
authentication, and your Google or Facebook credentials are examples of federated identities.
When you sign in as a federated identity, your administrator previously set up identity federation
using IAM roles. When you access AWS by using federation, you are indirectly assuming a role.
Depending on the type of user you are, you can sign in to the AWS Management Console or the
AWS access portal. For more information about signing in to AWS, see How to sign in to your AWS
account in the AWS Sign-In User Guide.
If you access AWS programmatically, AWS provides a software development kit (SDK) and a
command line interface (CLI) to cryptographically sign your requests by using your credentials. If
you don't use AWS tools, you must sign requests yourself. For more information about using the
recommended method to sign requests yourself, see Signing AWS API requests in the IAM User
Guide.
Regardless of the authentication method that you use, you might be required to provide additional
security information. For example, AWS recommends that you use multi-factor authentication
(MFA) to increase the security of your account. To learn more, see Multi-factor authentication in the
AWS IAM Identity Center User Guide and Using multi-factor authentication (MFA) in AWS in the IAM
User Guide.
AWS account root user
When you create an AWS account, you begin with one sign-in identity that has complete access to
all AWS services and resources in the account. This identity is called the AWS account root user and
is accessed by signing in with the email address and password that you used to create the account.
We strongly recommend that you don't use the root user for your everyday tasks. Safeguard your
root user credentials and use them to perform the tasks that only the root user can perform. For
the complete list of tasks that require you to sign in as the root user, see Tasks that require root
user credentials in the IAM User Guide.
Federated identity
As a best practice, require human users, including users that require administrator access, to use
federation with an identity provider to access AWS services by using temporary credentials.
A federated identity is a user from your enterprise user directory, a web identity provider, the AWS
Directory Service, the Identity Center directory, or any user that accesses AWS services by using
credentials provided through an identity source. When federated identities access AWS accounts,
they assume roles, and the roles provide temporary credentials.
For centralized access management, we recommend that you use AWS IAM Identity Center. You can
create users and groups in IAM Identity Center, or you can connect and synchronize to a set of users
and groups in your own identity source for use across all your AWS accounts and applications. For
information about IAM Identity Center, see What is IAM Identity Center? in the AWS IAM Identity
Center User Guide.
IAM users and groups
An IAM user is an identity within your AWS account that has specific permissions for a single person
or application. Where possible, we recommend relying on temporary credentials instead of creating
IAM users who have long-term credentials such as passwords and access keys. However, if you have
specific use cases that require long-term credentials with IAM users, we recommend that you rotate
access keys. For more information, see Rotate access keys regularly for use cases that require long-
term credentials in the IAM User Guide.
An IAM group is an identity that specifies a collection of IAM users. You can't sign in as a group. You
can use groups to specify permissions for multiple users at a time. Groups make permissions easier
to manage for large sets of users. For example, you could have a group named IAMAdmins and give
that group permissions to administer IAM resources.
Users are different from roles. A user is uniquely associated with one person or application, but
a role is intended to be assumable by anyone who needs it. Users have permanent long-term
credentials, but roles provide temporary credentials. To learn more, see When to create an IAM user
(instead of a role) in the IAM User Guide.
IAM roles
An IAM role is an identity within your AWS account that has specific permissions. It is similar to an
IAM user, but is not associated with a specific person. You can temporarily assume an IAM role in
the AWS Management Console by switching roles. You can assume a role by calling an AWS CLI or
AWS API operation or by using a custom URL. For more information about methods for using roles,
see Using IAM roles in the IAM User Guide.
IAM roles with temporary credentials are useful in the following situations:
Federated user access – To assign permissions to a federated identity, you create a role
and define permissions for the role. When a federated identity authenticates, the identity
is associated with the role and is granted the permissions that are defined by the role. For
information about roles for federation, see Creating a role for a third-party Identity Provider
in the IAM User Guide. If you use IAM Identity Center, you configure a permission set. To control
what your identities can access after they authenticate, IAM Identity Center correlates the
permission set to a role in IAM. For information about permissions sets, see Permission sets in
the AWS IAM Identity Center User Guide.
Temporary IAM user permissions – An IAM user or role can assume an IAM role to temporarily
take on different permissions for a specific task.
Cross-account access – You can use an IAM role to allow someone (a trusted principal) in a
different account to access resources in your account. Roles are the primary way to grant cross-
account access. However, with some AWS services, you can attach a policy directly to a resource
(instead of using a role as a proxy). To learn the difference between roles and resource-based
policies for cross-account access, see Cross account resource access in IAM in the IAM User Guide.
Cross-service access – Some AWS services use features in other AWS services. For example, when
you make a call in a service, it's common for that service to run applications in Amazon EC2 or
store objects in Amazon S3. A service might do this using the calling principal's permissions,
using a service role, or using a service-linked role.
Forward access sessions (FAS) – When you use an IAM user or role to perform actions in
AWS, you are considered a principal. When you use some services, you might perform an
action that then initiates another action in a different service. FAS uses the permissions of the
principal calling an AWS service, combined with the requesting AWS service to make requests
to downstream services. FAS requests are only made when a service receives a request that
requires interactions with other AWS services or resources to complete. In this case, you must
have permissions to perform both actions. For policy details when making FAS requests, see
Forward access sessions.
Service role – A service role is an IAM role that a service assumes to perform actions on your
behalf. An IAM administrator can create, modify, and delete a service role from within IAM. For
more information, see Creating a role to delegate permissions to an AWS service in the IAM
User Guide.
Service-linked role – A service-linked role is a type of service role that is linked to an AWS
service. The service can assume the role to perform an action on your behalf. Service-linked
roles appear in your AWS account and are owned by the service. An IAM administrator can
view, but not edit the permissions for service-linked roles.
Applications running on Amazon EC2 – You can use an IAM role to manage temporary
credentials for applications that are running on an EC2 instance and making AWS CLI or AWS API
requests. This is preferable to storing access keys within the EC2 instance. To assign an AWS role
to an EC2 instance and make it available to all of its applications, you create an instance profile
that is attached to the instance. An instance profile contains the role and enables programs that
are running on the EC2 instance to get temporary credentials. For more information, see Using
an IAM role to grant permissions to applications running on Amazon EC2 instances in the IAM
User Guide.
To learn whether to use IAM roles or IAM users, see When to create an IAM role (instead of a user)
in the IAM User Guide.
Managing access using policies
You control access in AWS by creating policies and attaching them to AWS identities or resources.
A policy is an object in AWS that, when associated with an identity or resource, defines their
permissions. AWS evaluates these policies when a principal (user, root user, or role session) makes
a request. Permissions in the policies determine whether the request is allowed or denied. Most
policies are stored in AWS as JSON documents. For more information about the structure and
contents of JSON policy documents, see Overview of JSON policies in the IAM User Guide.
Administrators can use AWS JSON policies to specify who has access to what. That is, which
principal can perform actions on what resources, and under what conditions.
By default, users and roles have no permissions. To grant users permission to perform actions on
the resources that they need, an IAM administrator can create IAM policies. The administrator can
then add the IAM policies to roles, and users can assume the roles.
IAM policies define permissions for an action regardless of the method that you use to perform the
operation. For example, suppose that you have a policy that allows the iam:GetRole action. A
user with that policy can get role information from the AWS Management Console, the AWS CLI, or
the AWS API.
Identity-based policies
Identity-based policies are JSON permissions policy documents that you can attach to an identity,
such as an IAM user, group of users, or role. These policies control what actions users and roles can
perform, on which resources, and under what conditions. To learn how to create an identity-based
policy, see Creating IAM policies in the IAM User Guide.
Identity-based policies can be further categorized as inline policies or managed policies. Inline
policies are embedded directly into a single user, group, or role. Managed policies are standalone
policies that you can attach to multiple users, groups, and roles in your AWS account. Managed
policies include AWS managed policies and customer managed policies. To learn how to choose
between a managed policy or an inline policy, see Choosing between managed policies and inline
policies in the IAM User Guide.
Resource-based policies
Resource-based policies are JSON policy documents that you attach to a resource. Examples of
resource-based policies are IAM role trust policies and Amazon S3 bucket policies. In services that
support resource-based policies, service administrators can use them to control access to a specific
resource. For the resource where the policy is attached, the policy defines what actions a specified
principal can perform on that resource and under what conditions. You must specify a principal
in a resource-based policy. Principals can include accounts, users, roles, federated users, or AWS
services.
Resource-based policies are inline policies that are located in that service. You can't use AWS
managed policies from IAM in a resource-based policy.
Access control lists (ACLs)
Access control lists (ACLs) control which principals (account members, users, or roles) have
permissions to access a resource. ACLs are similar to resource-based policies, although they do not
use the JSON policy document format.
Amazon S3, AWS WAF, and Amazon VPC are examples of services that support ACLs. To learn more
about ACLs, see Access control list (ACL) overview in the Amazon Simple Storage Service Developer
Guide.
Other policy types
AWS supports additional, less-common policy types. These policy types can set the maximum
permissions granted to you by the more common policy types.
Permissions boundaries – A permissions boundary is an advanced feature in which you set
the maximum permissions that an identity-based policy can grant to an IAM entity (IAM user
or role). You can set a permissions boundary for an entity. The resulting permissions are the
intersection of an entity's identity-based policies and its permissions boundaries. Resource-based
policies that specify the user or role in the Principal field are not limited by the permissions
boundary. An explicit deny in any of these policies overrides the allow. For more information
about permissions boundaries, see Permissions boundaries for IAM entities in the IAM User Guide.
Service control policies (SCPs) – SCPs are JSON policies that specify the maximum permissions
for an organization or organizational unit (OU) in AWS Organizations. AWS Organizations is a
service for grouping and centrally managing multiple AWS accounts that your business owns. If
you enable all features in an organization, then you can apply service control policies (SCPs) to
any or all of your accounts. The SCP limits permissions for entities in member accounts, including
each AWS account root user. For more information about Organizations and SCPs, see Service
control policies in the AWS Organizations User Guide.
Session policies – Session policies are advanced policies that you pass as a parameter when you
programmatically create a temporary session for a role or federated user. The resulting session's
permissions are the intersection of the user or role's identity-based policies and the session
policies. Permissions can also come from a resource-based policy. An explicit deny in any of these
policies overrides the allow. For more information, see Session policies in the IAM User Guide.
Multiple policy types
When multiple types of policies apply to a request, the resulting permissions are more complicated
to understand. To learn how AWS determines whether to allow a request when multiple policy
types are involved, see Policy evaluation logic in the IAM User Guide.
How Amazon Polly works with IAM
Before you use IAM to manage access to Amazon Polly, learn what IAM features are available to use
with Amazon Polly.
IAM features you can use with Amazon Polly
IAM feature Amazon Polly support
Identity-based policies Yes
Resource-based policies No
Policy actions Yes
Policy resources Yes
Policy condition keys (service-specific) No
ACLs No
ABAC (tags in policies) No
Temporary credentials Yes
Forward access sessions (FAS) for Amazon Polly Yes
Service roles No
Service-linked roles No
To get a high-level view of how Amazon Polly and other AWS services work with most IAM
features, see AWS services that work with IAM in the IAM User Guide.
Identity-based policies for Amazon Polly
Supports identity-based policies: Yes
Identity-based policies are JSON permissions policy documents that you can attach to an identity,
such as an IAM user, group of users, or role. These policies control what actions users and roles can
perform, on which resources, and under what conditions. To learn how to create an identity-based
policy, see Creating IAM policies in the IAM User Guide.
With IAM identity-based policies, you can specify allowed or denied actions and resources as well
as the conditions under which actions are allowed or denied. You can't specify the principal in an
identity-based policy because it applies to the user or role to which it is attached. To learn about all
of the elements that you can use in a JSON policy, see IAM JSON policy elements reference in the
IAM User Guide.
Identity-based policy examples for Amazon Polly
To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for
Amazon Polly.
Resource-based policies within Amazon Polly
Supports resource-based policies: No
Resource-based policies are JSON policy documents that you attach to a resource. Examples of
resource-based policies are IAM role trust policies and Amazon S3 bucket policies. In services that
support resource-based policies, service administrators can use them to control access to a specific
resource. For the resource where the policy is attached, the policy defines what actions a specified
principal can perform on that resource and under what conditions. You must specify a principal
in a resource-based policy. Principals can include accounts, users, roles, federated users, or AWS
services.
To enable cross-account access, you can specify an entire account or IAM entities in another
account as the principal in a resource-based policy. Adding a cross-account principal to a resource-
based policy is only half of establishing the trust relationship. When the principal and the resource
are in different AWS accounts, an IAM administrator in the trusted account must also grant
the principal entity (user or role) permission to access the resource. They grant permission by
attaching an identity-based policy to the entity. However, if a resource-based policy grants access
to a principal in the same account, no additional identity-based policy is required. For more
information, see Cross account resource access in IAM in the IAM User Guide.
Policy actions for Amazon Polly
Supports policy actions: Yes
Administrators can use AWS JSON policies to specify who has access to what. That is, which
principal can perform actions on what resources, and under what conditions.
The Action element of a JSON policy describes the actions that you can use to allow or deny
access in a policy. Policy actions usually have the same name as the associated AWS API operation.
There are some exceptions, such as permission-only actions that don't have a matching API
operation. There are also some operations that require multiple actions in a policy. These
additional actions are called dependent actions.
Include actions in a policy to grant permissions to perform the associated operation.
To see a list of Amazon Polly actions, see Actions defined by Amazon Polly in the Service
Authorization Reference.
Policy actions in Amazon Polly use the following prefix before the action:
polly
To specify multiple actions in a single statement, separate them with commas.
"Action": [
"polly:action1",
"polly:action2"
]
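To show where the Action element sits in a complete policy document, the following sketch builds a small identity-based policy as a Python dictionary and prints it as JSON. The build_polly_policy helper is hypothetical, and the choice of the SynthesizeSpeech and DescribeVoices actions with a wildcard resource is an assumption made for illustration only. Scope the actions and resources to what your users actually need.

```python
import json


def build_polly_policy(action_names):
    """Build an identity-based policy that allows the given Amazon Polly actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Every Amazon Polly action uses the "polly:" prefix.
                "Action": [f"polly:{name}" for name in action_names],
                # Wildcard used here for illustration only.
                "Resource": "*",
            }
        ],
    }


policy = build_polly_policy(["SynthesizeSpeech", "DescribeVoices"])
print(json.dumps(policy, indent=2))
```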
To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for
Amazon Polly.
Policy resources for Amazon Polly
Supports policy resources: Yes
Administrators can use AWS JSON policies to specify who has access to what. That is, which
principal can perform actions on what resources, and under what conditions.
The Resource JSON policy element specifies the object or objects to which the action applies.
Statements must include either a Resource or a NotResource element. As a best practice,
specify a resource using its Amazon Resource Name (ARN). You can do this for actions that support
a specific resource type, known as resource-level permissions.
For actions that don't support resource-level permissions, such as listing operations, use a wildcard
(*) to indicate that the statement applies to all resources.
"Resource": "*"
To see a list of Amazon Polly resource types and their ARNs, see Resources defined by Amazon
Polly in the Service Authorization Reference. To learn with which actions you can specify the ARN of
each resource, see Actions defined by Amazon Polly.
To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for
Amazon Polly.
Policy condition keys for Amazon Polly
Supports service-specific policy condition keys: No
Administrators can use AWS JSON policies to specify who has access to what. That is, which
principal can perform actions on what resources, and under what conditions.
The Condition element (or Condition block) lets you specify conditions in which a statement
is in effect. The Condition element is optional. You can create conditional expressions that use
condition operators, such as equals or less than, to match the condition in the policy with values in
the request.
If you specify multiple Condition elements in a statement, or multiple keys in a single
Condition element, AWS evaluates them using a logical AND operation. If you specify multiple
values for a single condition key, AWS evaluates the condition using a logical OR operation. All of
the conditions must be met before the statement's permissions are granted.
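The AND/OR rule above can be sketched in a few lines of Python. This is a simplified model for illustration only, not the IAM evaluation engine: it handles just the StringEquals operator, and the request context is a plain dictionary. The function name and the sample tag key (team) are hypothetical.

```python
def condition_matches(condition_block, request_context):
    """Model the rule described above: every key must match (logical AND),
    and a key matches if any one of its listed values matches (logical OR)."""
    for operator, keys in condition_block.items():
        if operator != "StringEquals":
            # Only one operator is modeled in this sketch.
            raise ValueError(f"operator not modeled in this sketch: {operator}")
        for key, allowed in keys.items():
            allowed_values = allowed if isinstance(allowed, list) else [allowed]
            # A key that is absent from the request never matches.
            if request_context.get(key) not in allowed_values:
                return False
    return True


# Two keys (AND); the first key lists two acceptable values (OR).
condition = {
    "StringEquals": {
        "aws:RequestedRegion": ["us-east-2", "us-west-2"],
        "aws:PrincipalTag/team": "audio",  # hypothetical tag key and value
    }
}

matching_request = {"aws:RequestedRegion": "us-west-2", "aws:PrincipalTag/team": "audio"}
wrong_region = {"aws:RequestedRegion": "eu-west-1", "aws:PrincipalTag/team": "audio"}

print(condition_matches(condition, matching_request))
print(condition_matches(condition, wrong_region))
```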
You can also use placeholder variables when you specify conditions. For example, you can grant
an IAM user permission to access a resource only if it is tagged with their IAM user name. For more
information, see IAM policy elements: variables and tags in the IAM User Guide.
AWS supports global condition keys and service-specific condition keys. To see all AWS global
condition keys, see AWS global condition context keys in the IAM User Guide.
To see a list of Amazon Polly condition keys, see Condition keys for Amazon Polly in the Service
Authorization Reference. To learn with which actions and resources you can use a condition key, see
Actions defined by Amazon Polly.
To view examples of Amazon Polly identity-based policies, see Identity-based policy examples for
Amazon Polly.
ACLs in Amazon Polly
Supports ACLs: No
Access control lists (ACLs) control which principals (account members, users, or roles) have
permissions to access a resource. ACLs are similar to resource-based policies, although they do not
use the JSON policy document format.
ABAC with Amazon Polly
Supports ABAC (tags in policies): No
Attribute-based access control (ABAC) is an authorization strategy that defines permissions based
on attributes. In AWS, these attributes are called tags. You can attach tags to IAM entities (users or
roles) and to many AWS resources. Tagging entities and resources is the first step of ABAC. Then
you design ABAC policies to allow operations when the principal's tag matches the tag on the
resource that they are trying to access.
ABAC is helpful in environments that are growing rapidly and helps with situations where policy
management becomes cumbersome.
To control access based on tags, you provide tag information in the condition element of a policy
using the aws:ResourceTag/key-name, aws:RequestTag/key-name, or aws:TagKeys
condition keys.
If a service supports all three condition keys for every resource type, then the value is Yes for the
service. If a service supports all three condition keys for only some resource types, then the value is
Partial.
For more information about ABAC, see What is ABAC? in the IAM User Guide. To view a tutorial with
steps for setting up ABAC, see Use attribute-based access control (ABAC) in the IAM User Guide.
Using temporary credentials with Amazon Polly
Supports temporary credentials: Yes
Some AWS services don't work when you sign in using temporary credentials. For additional
information, including which AWS services work with temporary credentials, see AWS services that
work with IAM in the IAM User Guide.
You are using temporary credentials if you sign in to the AWS Management Console using
any method except a user name and password. For example, when you access AWS using your
company's single sign-on (SSO) link, that process automatically creates temporary credentials. You
also automatically create temporary credentials when you sign in to the console as a user and then
switch roles. For more information about switching roles, see Switching to a role (console) in the
IAM User Guide.
You can manually create temporary credentials using the AWS CLI or AWS API. You can then use
those temporary credentials to access AWS. AWS recommends that you dynamically generate
temporary credentials instead of using long-term access keys. For more information, see
Temporary security credentials in IAM.
Cross-service forward access sessions (FAS) for Amazon Polly
Supports forward access sessions (FAS): Yes
When you use an IAM user or role to perform actions in AWS, you are considered a principal.
When you use some services, you might perform an action that then initiates another action in a
different service. FAS uses the permissions of the principal calling an AWS service, combined with
the requesting AWS service to make requests to downstream services. FAS requests are only made
when a service receives a request that requires interactions with other AWS services or resources to
complete. In this case, you must have permissions to perform both actions. For policy details when
making FAS requests, see Forward access sessions.
Service roles for Amazon Polly
Supports service roles: No
A service role is an IAM role that a service assumes to perform actions on your behalf. An IAM
administrator can create, modify, and delete a service role from within IAM. For more information,
see Creating a role to delegate permissions to an AWS service in the IAM User Guide.
Warning
Changing the permissions for a service role might break Amazon Polly functionality. Edit
service roles only when Amazon Polly provides guidance to do so.
Service-linked roles for Amazon Polly
Supports service-linked roles: No
A service-linked role is a type of service role that is linked to an AWS service. The service can
assume the role to perform an action on your behalf. Service-linked roles appear in your AWS
account and are owned by the service. An IAM administrator can view, but not edit, the permissions
for service-linked roles.
For details about creating or managing service-linked roles, see AWS services that work with IAM.
Find a service in the table that includes a Yes in the Service-linked role column. Choose the Yes
link to view the service-linked role documentation for that service.
Amazon Polly IAM roles
You can attach an identity-based permissions policy to an IAM role to grant cross-account
permissions. For example, the administrator in account A can create a role to grant cross-account
permissions to another AWS account (for example, account B) or an AWS service as follows:
1. Account A administrator creates an IAM role and attaches a permissions policy to the role that
grants permissions on resources in account A.
2. Account A administrator attaches a trust policy to the role identifying account B as the principal
who can assume the role.
3. Account B administrator can then delegate permissions to assume the role to any users in
account B. Doing this allows users in account B to create or access resources in account A. The
principal in the trust policy can also be an AWS service principal if you want to grant an AWS
service permissions to assume the role.
For more information about using IAM to delegate permissions, see Access Management in the IAM
User Guide.
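As a rough illustration of step 2 above, the following Python sketch builds a trust policy document that names account B as the principal allowed to assume the role. The helper name and the account ID are fictitious; the resulting JSON would still need to be attached to the role by an administrator in account A.

```python
import json


def build_trust_policy(trusted_account_id):
    """Return a trust policy that lets principals in the trusted account
    assume the role, subject to their own identity-based permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


ACCOUNT_B_ID = "111122223333"  # fictitious account ID for account B
trust_policy = build_trust_policy(ACCOUNT_B_ID)
print(json.dumps(trust_policy, indent=2))
```

The permissions policy from step 1 is a separate document; the trust policy only controls who can assume the role.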
Amazon Polly supports identity-based policies for actions at the resource level. In some
cases, the resource can be limited by an ARN. This is true for the SynthesizeSpeech,
StartSpeechSynthesisTask, PutLexicon, GetLexicon, and DeleteLexicon operations.
In these cases, the Resource value is indicated by the ARN. For example,
arn:aws:polly:us-east-2:account-id:lexicon/* as the Resource value specifies permissions on all
owned lexicons within the us-east-2 Region.
The following is an example policy that grants permissions to put and get lexicons as well as to list
those lexicons currently available.
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowPut-Get-ListActions",
"Effect": "Allow",
"Action": [
"polly:PutLexicon",
"polly:GetLexicon",
"polly:ListLexicons"],
"Resource": "arn:aws:polly:us-east-2:account-id:lexicon/*"
}
]
}
However, not all operations use ARNs. This is the case with the DescribeVoices, ListLexicons,
GetSpeechSynthesisTask, and ListSpeechSynthesisTasks operations.
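One way to reflect this distinction when generating a policy is to put the ARN-scoped actions and the actions that don't use ARNs in separate statements. The sketch below is an assumption about how you might structure such a helper; the function name, the Sid values, and the account ID are illustrative.

```python
import json

# Actions that the text above says can be limited by a lexicon ARN.
ARN_SCOPED_ACTIONS = ["polly:PutLexicon", "polly:GetLexicon"]
# Actions that the text above says do not use ARNs.
UNSCOPED_ACTIONS = ["polly:ListLexicons"]


def build_lexicon_policy(region, account_id):
    """Build a policy with one ARN-scoped statement and one wildcard statement."""
    lexicon_arn = f"arn:aws:polly:{region}:{account_id}:lexicon/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPutGetLexicon",
                "Effect": "Allow",
                "Action": ARN_SCOPED_ACTIONS,
                "Resource": lexicon_arn,
            },
            {
                "Sid": "AllowListLexicons",
                "Effect": "Allow",
                "Action": UNSCOPED_ACTIONS,
                "Resource": "*",
            },
        ],
    }


lexicon_policy = build_lexicon_policy("us-east-2", "123456789012")  # fictitious account ID
print(json.dumps(lexicon_policy, indent=2))
```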
For more information about users, groups, roles, and permissions, see Identities (Users, Groups, and
Roles) in the IAM User Guide.
Identity-based policy examples for Amazon Polly
By default, users and roles don't have permission to create or modify Amazon Polly resources. They
also can't perform tasks by using the AWS Management Console, AWS Command Line Interface
(AWS CLI), or AWS API. To grant users permission to perform actions on the resources that they
need, an IAM administrator can create IAM policies. The administrator can then add the IAM
policies to roles, and users can assume the roles.
To learn how to create an IAM identity-based policy by using these example JSON policy
documents, see Creating IAM policies in the IAM User Guide.
For details about actions and resource types defined by Amazon Polly, including the format of the
ARNs for each of the resource types, see Actions, resources, and condition keys for Amazon Polly in
the Service Authorization Reference.
Topics
Policy best practices
Using the Amazon Polly console
Allow users to view their own permissions
AWS managed (predefined) policies for Amazon Polly
Customer-managed policy examples
Policy best practices
Identity-based policies determine whether someone can create, access, or delete Amazon Polly
resources in your account. These actions can incur costs for your AWS account. When you create or
edit identity-based policies, follow these guidelines and recommendations:
Get started with AWS managed policies and move toward least-privilege permissions – To
get started granting permissions to your users and workloads, use the AWS managed policies
that grant permissions for many common use cases. They are available in your AWS account. We
recommend that you reduce permissions further by defining AWS customer managed policies
that are specific to your use cases. For more information, see AWS managed policies or AWS
managed policies for job functions in the IAM User Guide.
Apply least-privilege permissions – When you set permissions with IAM policies, grant only the
permissions required to perform a task. You do this by defining the actions that can be taken on
specific resources under specific conditions, also known as least-privilege permissions. For more
information about using IAM to apply permissions, see Policies and permissions in IAM in the
IAM User Guide.
Use conditions in IAM policies to further restrict access – You can add a condition to your
policies to limit access to actions and resources. For example, you can write a policy condition to
specify that all requests must be sent using SSL. You can also use conditions to grant access to
service actions if they are used through a specific AWS service, such as AWS CloudFormation. For
more information, see IAM JSON policy elements: Condition in the IAM User Guide.
Use IAM Access Analyzer to validate your IAM policies to ensure secure and functional
permissions – IAM Access Analyzer validates new and existing policies so that the policies
adhere to the IAM policy language (JSON) and IAM best practices. IAM Access Analyzer provides
more than 100 policy checks and actionable recommendations to help you author secure and
functional policies. For more information, see IAM Access Analyzer policy validation in the IAM
User Guide.
Require multi-factor authentication (MFA) – If you have a scenario that requires IAM users
or a root user in your AWS account, turn on MFA for additional security. To require MFA when
API operations are called, add MFA conditions to your policies. For more information, see
Configuring MFA-protected API access in the IAM User Guide.
For more information about best practices in IAM, see Security best practices in IAM in the IAM User
Guide.
Using the Amazon Polly console
To access the Amazon Polly console, you must have a minimum set of permissions. These
permissions must allow you to list and view details about the Amazon Polly resources in your AWS
account. If you create an identity-based policy that is more restrictive than the minimum required
permissions, the console won't function as intended for entities (users or roles) with that policy.
You don't need to allow minimum console permissions for users that are making calls only to the
AWS CLI or the AWS API. Instead, allow access to only the actions that match the API operation
that they're trying to perform.
To ensure that users and roles can still use the Amazon Polly console, also attach the
AmazonPollyFullAccess or AmazonPollyReadOnlyAccess AWS managed policy to the entities. For more information, see
Adding permissions to a user in the IAM User Guide.
To use the Amazon Polly console, grant permissions to all the Amazon Polly APIs. There are no
additional permissions needed. To get full console functionality, you can use the following policy:
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "Console-AllowAllPollyActions",
"Effect": "Allow",
"Action": [
"polly:*"],
"Resource": "*"
}
]
}
Allow users to view their own permissions
This example shows how you might create a policy that allows IAM users to view the inline and
managed policies that are attached to their user identity. This policy includes permissions to
complete this action on the console or programmatically using the AWS CLI or AWS API.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ViewOwnUserInfo",
"Effect": "Allow",
"Action": [
"iam:GetUserPolicy",
"iam:ListGroupsForUser",
"iam:ListAttachedUserPolicies",
"iam:ListUserPolicies",
"iam:GetUser"
],
"Resource": ["arn:aws:iam::*:user/${aws:username}"]
},
{
"Sid": "NavigateInConsole",
"Effect": "Allow",
"Action": [
"iam:GetGroupPolicy",
"iam:GetPolicyVersion",
"iam:GetPolicy",
"iam:ListAttachedGroupPolicies",
"iam:ListGroupPolicies",
"iam:ListPolicyVersions",
"iam:ListPolicies",
"iam:ListUsers"
],
"Resource": "*"
}
]
}
AWS managed (predefined) policies for Amazon Polly
AWS addresses many common use cases by providing standalone IAM policies that are created
and administered by AWS. These AWS managed policies grant necessary permissions for common
use cases so that you can avoid having to investigate what permissions are needed. For more
information, see AWS Managed Policies in the IAM User Guide.
The following AWS managed policies, which you can attach to users in your account, are specific to
Amazon Polly:
AmazonPollyReadOnlyAccess – Grants read-only access to resources: listing lexicons,
fetching lexicons, listing available voices, and synthesizing speech (including applying lexicons to
the synthesized speech).
AmazonPollyFullAccess – Grants full access to resources and all the supported operations.
Note
You can review these permissions policies by signing in to the IAM console and searching
for specific policies there.
You can also create your own custom IAM policies to allow permissions for Amazon Polly actions
and resources. You can attach these custom policies to the IAM users or groups that require those
permissions.
Customer-managed policy examples
In this section, you can find example user policies that grant permissions for various Amazon Polly
actions. These policies work when you're using AWS SDKs or the AWS CLI. When you're using the
console, grant permissions to all the Amazon Polly APIs.
Note
All examples use the us-east-2 Region and contain fictitious account IDs.
Examples
Example 1: Allow All Amazon Polly Actions
Example 2: Allow all Amazon Polly actions except DeleteLexicon
Example 3: Allow DeleteLexicon
Example 4: Allow DeleteLexicon in a specified Region
Example 5: Allow DeleteLexicon for specified Lexicon
Example 1: Allow All Amazon Polly Actions
After you sign up (see Setting up Amazon Polly), create an administrator user to manage your
account, including creating users and managing their permissions.
You might create a user who has permissions for all Amazon Polly actions. Think of this user as
a service-specific administrator for working with Amazon Polly. You can attach the following
permissions policy to this user.
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowAllPollyActions",
"Effect": "Allow",
"Action": [
"polly:*"],
"Resource": "*"
}
]
}
Example 2: Allow all Amazon Polly actions except DeleteLexicon
The following permissions policy grants the user permissions to perform all actions except
DeleteLexicon, with the permissions for delete explicitly denied in all Regions.
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowAllActions-DenyDelete",
"Effect": "Allow",
"Action": [
"polly:DescribeVoices",
"polly:GetLexicon",
"polly:PutLexicon",
"polly:SynthesizeSpeech",
"polly:ListLexicons"],
"Resource": "*"
},
{
"Sid": "DenyDeleteLexicon",
"Effect": "Deny",
"Action": [
"polly:DeleteLexicon"],
"Resource": "*"
}
]
}
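The interaction between the Allow and Deny statements can be modeled with a small evaluator. This is a simplified sketch, not the IAM evaluation logic: it ignores conditions and other policy types, matches wildcards with Python's fnmatchcase (case-sensitive, unlike IAM action matching), and the function names are hypothetical. It shows that an explicit Deny wins over any Allow, and that an action no statement mentions is implicitly denied.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy elements may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(statements, action, resource="*"):
    """Return True only if some statement allows the request and none denies it."""
    allowed = False
    for statement in statements:
        action_hit = any(fnmatchcase(action, p) for p in _as_list(statement["Action"]))
        resource_hit = any(fnmatchcase(resource, p) for p in _as_list(statement["Resource"]))
        if not (action_hit and resource_hit):
            continue
        if statement["Effect"] == "Deny":
            return False  # an explicit Deny overrides every Allow
        allowed = True
    return allowed  # False here means an implicit deny


# The two statements from Example 2.
EXAMPLE_2 = [
    {
        "Effect": "Allow",
        "Action": [
            "polly:DescribeVoices",
            "polly:GetLexicon",
            "polly:PutLexicon",
            "polly:SynthesizeSpeech",
            "polly:ListLexicons",
        ],
        "Resource": "*",
    },
    {"Effect": "Deny", "Action": ["polly:DeleteLexicon"], "Resource": "*"},
]

for name in ("polly:SynthesizeSpeech", "polly:DeleteLexicon", "polly:StartSpeechSynthesisTask"):
    print(name, is_allowed(EXAMPLE_2, name))
```

In this model, StartSpeechSynthesisTask is refused because the Allow statement doesn't list it, not because of the Deny statement.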
Example 3: Allow DeleteLexicon
The following permissions policy grants the user permissions to delete any lexicon that you own,
regardless of the Region in which it is located.
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowDeleteLexicon",
"Effect": "Allow",
"Action": [
"polly:DeleteLexicon"],
"Resource": "*"
}
]
}
Example 4: Allow DeleteLexicon in a specified Region
The following permissions policy grants the user permissions to delete any lexicon that you own
that is located in a single Region (in this case, us-east-2).
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowDeleteSpecifiedRegion",
"Effect": "Allow",
"Action": [
"polly:DeleteLexicon"],
"Resource": "arn:aws:polly:us-east-2:123456789012:lexicon/*"
}
]
}
Example 5: Allow DeleteLexicon for specified Lexicon
The following permissions policy grants the user permissions to delete a specific lexicon that you
own (in this case, myLexicon) in a specific Region (in this case, us-east-2).
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowDeleteForSpecifiedLexicon",
"Effect": "Allow",
"Action": [
"polly:DeleteLexicon"],
"Resource": "arn:aws:polly:us-east-2:123456789012:lexicon/myLexicon"
}
]
}
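To see how the Resource values in Examples 4 and 5 differ in scope, the following sketch checks lexicon ARNs against each pattern. It is an approximation for illustration: only the * and ? wildcards are translated, and the helper name and the sample lexicon ARNs are hypothetical.

```python
import re


def arn_pattern_matches(pattern, arn):
    """Treat * as 'any run of characters' and ? as 'any single character'."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, arn) is not None


# Resource values from Example 4 and Example 5.
REGION_WIDE = "arn:aws:polly:us-east-2:123456789012:lexicon/*"
SINGLE_LEXICON = "arn:aws:polly:us-east-2:123456789012:lexicon/myLexicon"

# Sample lexicon ARNs to test against the patterns (illustrative names).
my_lexicon = "arn:aws:polly:us-east-2:123456789012:lexicon/myLexicon"
other_lexicon = "arn:aws:polly:us-east-2:123456789012:lexicon/otherLexicon"
other_region = "arn:aws:polly:us-west-2:123456789012:lexicon/myLexicon"

for arn in (my_lexicon, other_lexicon, other_region):
    print(arn, arn_pattern_matches(REGION_WIDE, arn), arn_pattern_matches(SINGLE_LEXICON, arn))
```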
Amazon Polly API Permissions: Actions, Permissions, and Resources Reference
When you're setting up a permissions policy that you can attach to an IAM identity (identity-based
policies), you can use the following list as a reference. The list includes each Amazon Polly API
operation, the corresponding action for which you can grant permissions,
and the AWS resource for which you can grant the permissions. You specify the actions in the
policy's Action field, and you specify the resource value in the policy's Resource field.
You can use AWS-wide condition keys in your Amazon Polly policies to express conditions. For a
complete list of AWS-wide keys, see available keys in the IAM User Guide.
Note
To specify an action, use the polly prefix followed by the API operation name (for
example, polly:GetLexicon).
Amazon Polly supports identity-based policies for actions at the resource level. For actions that
support resource-level permissions, the Resource value is indicated by the ARN. For example,
arn:aws:polly:us-east-2:account-id:lexicon/* as the Resource value specifies permissions on all
owned lexicons within the us-east-2 Region.
For actions that don't support resource-level permissions, policies specify a wildcard character (*)
as the Resource value. To limit lexicon permissions to a specific Region, replace the wildcard
character with the appropriate ARN: arn:aws:polly:region:account-id:lexicon/*.
Amazon Polly API and Required Permissions for Actions
API Operation: DeleteLexicon
Required Permissions (API Action): polly:DeleteLexicon
Resources: arn:aws:polly:region:account-id:lexicon/LexiconName
API Operation: DescribeVoices
Required Permissions (API Action): polly:DescribeVoices
Resources: *
API Operation: GetLexicon
Required Permissions (API Action): polly:GetLexicon
Resources: arn:aws:polly:region:account-id:lexicon/LexiconName
API Operation: ListLexicons
Required Permissions (API Action): polly:ListLexicons
Resources: *
API Operation: PutLexicon
Required Permissions (API Action): polly:PutLexicon
Resources: *
API Operation: SynthesizeSpeech
Required Permissions (API Action): polly:SynthesizeSpeech
Resources: *
Troubleshooting Amazon Polly identity and access
Use the following information to help you diagnose and fix common issues that you might
encounter when working with Amazon Polly and IAM.
Topics
I am not authorized to perform an action in Amazon Polly
I am not authorized to perform iam:PassRole
I want to allow people outside of my AWS account to access my Amazon Polly resources
I am not authorized to perform an action in Amazon Polly
If you receive an error that you're not authorized to perform an action, your policies must be
updated to allow you to perform the action.
The following example error occurs when the mateojackson IAM user tries to use the console
to view details about a fictional my-example-widget resource but doesn't have the fictional
polly:GetWidget permissions.
User: arn:aws:iam::123456789012:user/mateojackson is not authorized to perform:
polly:GetWidget on resource: my-example-widget
In this case, the policy for the mateojackson user must be updated to allow access to the
my-example-widget resource by using the polly:GetWidget action.
If you need help, contact your AWS administrator. Your administrator is the person who provided
you with your sign-in credentials.
I am not authorized to perform iam:PassRole
If you receive an error that you're not authorized to perform the iam:PassRole action, your
policies must be updated to allow you to pass a role to Amazon Polly.
Some AWS services allow you to pass an existing role to that service instead of creating a new
service role or service-linked role. To do this, you must have permissions to pass the role to the
service.
The following example error occurs when an IAM user named marymajor tries to use the console
to perform an action in Amazon Polly. However, the action requires the service to have permissions
that are granted by a service role. Mary does not have permissions to pass the role to the service.
User: arn:aws:iam::123456789012:user/marymajor is not authorized to perform:
iam:PassRole
In this case, Mary's policies must be updated to allow her to perform the iam:PassRole action.
If you need help, contact your AWS administrator. Your administrator is the person who provided
you with your sign-in credentials.
I want to allow people outside of my AWS account to access my Amazon Polly resources
You can create a role that users in other accounts or people outside of your organization can use to
access your resources. You can specify who is trusted to assume the role. For services that support
resource-based policies or access control lists (ACLs), you can use those policies to grant people
access to your resources.
To learn more, consult the following:
To learn whether Amazon Polly supports these features, see How Amazon Polly works with IAM.
To learn how to provide access to your resources across AWS accounts that you own, see
Providing access to an IAM user in another AWS account that you own in the IAM User Guide.
To learn how to provide access to your resources to third-party AWS accounts, see Providing
access to AWS accounts owned by third parties in the IAM User Guide.
To learn how to provide access through identity federation, see Providing access to externally
authenticated users (identity federation) in the IAM User Guide.
To learn the difference between using roles and resource-based policies for cross-account access,
see Cross account resource access in IAM in the IAM User Guide.
Logging and Monitoring in Amazon Polly
Monitoring is an important part of maintaining the reliability, availability, and performance of your
Amazon Polly applications. To monitor Amazon Polly API calls, you can use AWS CloudTrail. To
monitor the status of your jobs, use Amazon CloudWatch Logs.
Amazon CloudWatch Alarms – Using CloudWatch alarms, you watch a single metric over a
time period that you specify. If the metric exceeds a given threshold, a notification is sent to
an Amazon Simple Notification Service topic or AWS Auto Scaling policy. CloudWatch alarms
don't invoke actions simply because a metric is in a particular state. Rather, the state must have
changed and been maintained for a specified number of periods. For more information, see Integrating
CloudWatch with Amazon Polly.
CloudTrail logs – CloudTrail provides a record of actions taken by a user, role, or an AWS
service in Amazon Polly. Using the information collected by CloudTrail, you can determine the
request that was made to Amazon Polly. You can also determine the IP address from which the
request was made, who made the request, when it was made, and additional details. For more
information, see Logging Amazon Polly API calls with AWS CloudTrail.
Compliance Validation for Amazon Polly
Third-party auditors assess the security and compliance of Amazon Polly as part of multiple AWS
compliance programs. These include SOC, PCI, FedRAMP, HIPAA, and others.
For a list of AWS services in scope of specific compliance programs, see AWS Services in Scope by
Compliance Program. For general information, see AWS Compliance Programs.
You can download third-party audit reports using AWS Artifact. For more information, see
Downloading Reports in AWS Artifact.
Your compliance responsibility when using Amazon Polly is determined by the sensitivity of your
data, your company's compliance objectives, and applicable laws and regulations. AWS provides the
following resources to help with compliance:
Security and Compliance Quick Start Guides – These deployment guides discuss architectural
considerations and provide steps for deploying security- and compliance-focused baseline
environments on AWS.
Architecting for HIPAA Security and Compliance Whitepaper – This whitepaper describes how
companies can use AWS to create HIPAA-compliant applications.
AWS Compliance Resources – This collection of workbooks and guides might apply to your
industry and location.
Evaluating Resources with Rules in the AWS Config Developer Guide – The AWS Config service
assesses how well your resource configurations comply with internal practices, industry
guidelines, and regulations.
AWS Security Hub – This AWS service provides a comprehensive view of your security state within
AWS that helps you check your compliance with security industry standards and best practices.
Resilience in Amazon Polly
The AWS global infrastructure is built around AWS Regions and Availability Zones. AWS Regions
provide multiple physically separated and isolated Availability Zones, which are connected with
low-latency, high-throughput, and highly redundant networking. With Availability Zones, you
can design and operate applications and databases that automatically fail over between zones
without interruption. Availability Zones are more highly available, fault tolerant, and scalable than
traditional single or multiple data center infrastructures.
For more information about AWS Regions and Availability Zones, see AWS Global Infrastructure.
Infrastructure Security in Amazon Polly
As a managed service, Amazon Polly is protected by the AWS global network security procedures
that are described in the Amazon Web Services: Overview of Security Processes whitepaper.
You use AWS published API calls to access Amazon Polly through the network. Clients must
support Transport Layer Security (TLS) 1.0 or later. We recommend TLS 1.2 or later. Clients must
also support cipher suites with perfect forward secrecy (PFS) such as Ephemeral Diffie-Hellman
(DHE) or Elliptic Curve Ephemeral Diffie-Hellman (ECDHE). Most modern systems such as Java 7
and later support these modes.
Additionally, requests must be signed by using an access key ID and a secret access key that is
associated with an IAM principal. Or you can use the AWS Security Token Service (AWS STS) to
generate temporary security credentials to sign requests.
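AWS SDKs typically negotiate TLS for you. If you build your own HTTPS client, a minimal sketch of enforcing the recommended TLS 1.2 floor with the Python standard library might look like the following. The function name is hypothetical, and the sketch covers only the TLS settings, not request signing.

```python
import ssl


def make_tls_context():
    """Create a client-side TLS context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()  # verifies certificates and host names
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


tls_context = make_tls_context()
print(tls_context.minimum_version)
```

A context like this can be passed to standard-library HTTPS clients, for example through the context parameter of http.client.HTTPSConnection.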
Security Best Practices for Amazon Polly
Your trust, privacy, and the security of your content are our highest priorities. We implement
responsible and sophisticated technical and physical controls designed to prevent unauthorized
access to, or disclosure of, your content and ensure that our use complies with our commitments to
you. For more information, see AWS Data Privacy FAQ.
Amazon Polly does not retain the content of text submissions.
For a broad view of AWS security, including compliance, penetration testing, bulletins, and
resources, visit the AWS Cloud Security website.
Using Amazon Polly with interface VPC endpoints
If you use Amazon Virtual Private Cloud (Amazon VPC) to host your AWS resources, you can
establish a private connection between your VPC and Amazon Polly. You can use this connection to
synthesize speech with Amazon Polly without traversing the public internet.
Amazon VPC is an AWS service that you can use to launch AWS resources in a virtual network that
you define. With a VPC, you have control over your network settings, such as the IP address range,
subnets, route tables, and network gateways. To connect your VPC to Amazon Polly, you define an
interface VPC endpoint for Amazon Polly. This type of endpoint enables you to connect your VPC
to AWS services. The endpoint provides reliable, scalable connectivity to Amazon Polly without
requiring an internet gateway, network address translation (NAT) instance, or VPN connection. For
more information, see What is Amazon VPC? in the Amazon VPC User Guide.
Interface VPC endpoints are powered by AWS PrivateLink, an AWS technology that enables private
communication between AWS services using an elastic network interface with private IP addresses.
For more information, see New - AWS PrivateLink for AWS services.
The following steps are for users of Amazon VPC. For more information, see Getting Started in the
Amazon VPC User Guide.
Availability
VPC endpoints are supported in all the Regions where Amazon Polly is supported. For more
information about AWS Regions and Availability Zones, see AWS Global Infrastructure.
Creating a VPC endpoint for Amazon Polly
To start using Amazon Polly with your VPC, create an interface VPC endpoint for Amazon Polly.
The service to choose is com.amazonaws.Region.polly. You don't need to change any settings for
Amazon Polly. For more information, see Creating an Interface Endpoint in the Amazon VPC User
Guide.
Testing the connection between your VPC and Amazon Polly
After you create the endpoint, you can test the connection.
To test the connection between your VPC and your Amazon Polly endpoint
1. Connect to an Amazon EC2 instance that resides in your VPC. For information about connecting,
see Connect to your Linux instance or Connecting to your Windows instance in the Amazon EC2
documentation.
2. From the instance, use aws polly describe-voices from the AWS CLI to list available
Amazon Polly voices.
If the response to the command includes the list of available Amazon Polly voices, the command
has succeeded, and your VPC endpoint is working.
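The same check can be scripted. The sketch below assumes the third-party boto3 package is installed on the instance and that credentials are configured; the helper names and the default Region are illustrative. The client is passed in as a parameter so that the listing logic can be exercised without network access.

```python
def make_polly_client(region_name="us-east-2"):
    """Create an Amazon Polly client. Requires boto3 (third-party) and credentials."""
    import boto3  # imported here so the rest of the sketch loads without boto3

    return boto3.client("polly", region_name=region_name)


def list_voice_ids(polly_client):
    """Call DescribeVoices, following NextToken if the result is paginated."""
    voice_ids = []
    kwargs = {}
    while True:
        response = polly_client.describe_voices(**kwargs)
        voice_ids.extend(voice["Id"] for voice in response.get("Voices", []))
        token = response.get("NextToken")
        if not token:
            return voice_ids
        kwargs = {"NextToken": token}


# On an instance inside the VPC, you might run:
#   print(list_voice_ids(make_polly_client()))
```

If the call returns a non-empty list of voice IDs, the request reached Amazon Polly, which is the same success signal as the CLI command above.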
Controlling access to your Amazon Polly endpoint
A VPC endpoint policy is an IAM resource policy that you attach to an endpoint when you create or
modify the endpoint. If you don't attach a policy when you create an endpoint, we attach a default
policy for you that allows full access to the service. An endpoint policy doesn't override or replace
IAM user policies or service-specific policies. It's a separate policy for controlling access from the
endpoint to the specified service.
Endpoint policies must be written in JSON format.
For more information, see Controlling Access to Services with VPC Endpoints in the Amazon VPC
User Guide.
The following is an example of an endpoint policy for Amazon Polly. This policy enables users
connecting to Amazon Polly through the VPC to describe voices and synthesize speech with
Amazon Polly, and prevents them from performing other Amazon Polly actions.
{
"Statement": [
{
"Sid": "SynthesisAndDescribeVoicesOnly",
"Principal": "*",
"Action": [
"polly:DescribeVoices",
"polly:SynthesizeSpeech"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
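A policy of this shape can also be generated in code, for example when the list of allowed actions varies by environment. This Python sketch reproduces the structure of the example above. The function name and its parameters are illustrative assumptions.

```python
import json


def polly_endpoint_policy(actions, sid="SynthesisAndDescribeVoicesOnly"):
    """Return an endpoint policy document (JSON text) allowing only the given Polly actions."""
    statement = {
        "Sid": sid,
        "Principal": "*",
        # Prefix each operation name with the polly: service namespace.
        "Action": [f"polly:{name}" for name in actions],
        "Effect": "Allow",
        "Resource": "*",
    }
    return json.dumps({"Statement": [statement]}, indent=2)
```

Calling polly_endpoint_policy(["DescribeVoices", "SynthesizeSpeech"]) yields a document equivalent to the example policy.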
To modify the VPC endpoint policy for Amazon Polly
1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc.
2. In the navigation pane, choose Endpoints.
3. If you have not already created the endpoint for Amazon Polly, choose Create endpoint. Then
select com.amazonaws.Region.polly and choose Create endpoint.
4. Select the com.amazonaws.Region.polly endpoint, and choose the Policy tab in the lower half
of the screen.
5. Choose Edit Policy and make the changes to the policy.
Support for VPC context keys
Amazon Polly supports the aws:SourceVpc and aws:SourceVpce context keys that can limit
access to specific VPCs or specific VPC endpoints. These keys work only when the user is using VPC
endpoints. For more information, see Keys Available for Some Services in the IAM User Guide.
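One common way to apply these keys is an identity policy that denies Amazon Polly actions unless the request arrives through a particular VPC endpoint. This sketch builds such a policy. The Sid, the function name, and the endpoint ID are hypothetical, and the deny-unless pattern is an illustration, not a recommendation from this guide.

```python
import json


def deny_polly_outside_endpoint(vpc_endpoint_id):
    """Build a policy denying Polly calls that do not come through one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPollyOutsideVpce",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": "polly:*",
                "Resource": "*",
                # aws:SourceVpce is only present when the request uses a VPC endpoint.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_polly_outside_endpoint("vpce-0123456789abcdef0"), indent=2))
```

Replacing aws:SourceVpce with aws:SourceVpc and supplying a VPC ID would restrict access to a whole VPC instead of a single endpoint.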
Logging Amazon Polly API calls with AWS CloudTrail
Amazon Polly is integrated with AWS CloudTrail, a service that provides a record of actions taken
by a user, role, or an AWS service in Amazon Polly. CloudTrail captures all API calls for Amazon
Polly as events. The calls captured include calls from the Amazon Polly console and code calls
to the Amazon Polly API operations. If you create a trail, you can enable continuous delivery
of CloudTrail events to an Amazon S3 bucket, including events for Amazon Polly. If you don't
configure a trail, you can still view the most recent events in the CloudTrail console in Event
history. Using the information collected by CloudTrail, you can determine the request that was
made to Amazon Polly, the IP address from which the request was made, who made the request,
when it was made, and additional details.
To learn more about CloudTrail, including how to configure and enable it, see the AWS CloudTrail
User Guide.
Amazon Polly information in CloudTrail
CloudTrail is enabled on your AWS account when you create the account. When supported event
activity occurs in Amazon Polly, that activity is recorded in a CloudTrail event along with other AWS
service events in Event history. You can view, search, and download recent events in your AWS
account. For more information, see Viewing Events with CloudTrail Event History.
For an ongoing record of events in your AWS account, including events for Amazon Polly, create
a trail. A trail enables CloudTrail to deliver log files to an Amazon S3 bucket. By default, when
you create a trail in the console, the trail applies to all AWS Regions. The trail logs events from all
Regions in the AWS partition and delivers the log files to the Amazon S3 bucket that you specify.
Additionally, you can configure other AWS services to further analyze and act upon the event data
collected in CloudTrail logs. For more information, see the following:
Overview for Creating a Trail
CloudTrail Supported Services and Integrations
Configuring Amazon SNS Notifications for CloudTrail
Receiving CloudTrail Log Files from Multiple Regions and Receiving CloudTrail Log Files from
Multiple Accounts
Amazon Polly supports logging the following actions as events in CloudTrail log files:
DeleteLexicon
DescribeVoices
GetLexicon
GetSpeechSynthesisTask
ListLexicons
ListSpeechSynthesisTasks
PutLexicon
StartSpeechSynthesisTask
SynthesizeSpeech
Every event or log entry contains information about who generated the request. The identity
information helps you determine the following:
Whether the request was made with root user or AWS Identity and Access Management (IAM)
user credentials.
Whether the request was made with temporary security credentials for a role or federated user.
Whether the request was made by another AWS service.
For more information, see the CloudTrail userIdentity Element.
Example: Amazon Polly Log File Entries
A trail is a configuration that enables delivery of events as log files to an Amazon S3 bucket that
you specify. CloudTrail log files contain one or more log entries. An event represents a single
request from any source and includes information about the requested action, the date and time of
the action, request parameters, and so on. CloudTrail log files aren't an ordered stack trace of the
public API calls, so they don't appear in any specific order.
The following example shows a CloudTrail log entry that demonstrates the SynthesizeSpeech action.
{
"Records": [
{
"awsRegion": "us-east-2",
"eventID": "19bd70f7-5e60-4cdc-9825-936c552278ae",
"eventName": "SynthesizeSpeech",
"eventSource": "polly.amazonaws.com",
"eventTime": "2016-11-02T03:49:39Z",
"eventType": "AwsApiCall",
"eventVersion": "1.05",
"recipientAccountId": "123456789012",
"requestID": "414288c2-a1af-11e6-b17f-d7cfc06cb461",
"requestParameters": {
"lexiconNames": [
"SampleLexicon"
],
"engine": "neural",
"outputFormat": "mp3",
"sampleRate": "22050",
"text": "**********",
"textType": "text",
"voiceId": "Kendra"
},
"responseElements": null,
"sourceIPAddress": "1.2.3.4",
"userAgent": "Amazon CLI/Polly 1.10 API 2016-06-10",
"userIdentity": {
"accessKeyId": "EXAMPLE_KEY_ID",
"accountId": "123456789012",
"arn": "arn:aws:iam::123456789012:user/Alice",
"principalId": "EX_PRINCIPAL_ID",
"type": "IAMUser",
"userName": "Alice"
}
}
]
}
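Entries like this one can be summarized with a few lines of Python. The sketch assumes the log file has already been downloaded and is available as JSON text. The function name is hypothetical, and only fields shown in the example entry are read.

```python
import json


def summarize_polly_events(log_text):
    """Yield (time, action, caller ARN, source IP) for each Amazon Polly record."""
    for record in json.loads(log_text).get("Records", []):
        # Skip events that belong to other AWS services.
        if record.get("eventSource") != "polly.amazonaws.com":
            continue
        identity = record.get("userIdentity", {})
        yield (
            record.get("eventTime"),
            record.get("eventName"),
            identity.get("arn"),
            record.get("sourceIPAddress"),
        )
```

Because log files are not ordered, sort the results by event time if sequence matters.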
Integrating CloudWatch with Amazon Polly
When you interact with Amazon Polly, it sends the following metrics and dimensions to
CloudWatch every minute. You can use the following procedures to view the metrics for Amazon
Polly.
You can monitor Amazon Polly using CloudWatch, which collects and processes raw data from
Amazon Polly into readable, near real-time metrics. These statistics are recorded for a period of
two weeks, so that you can access historical information and gain a better perspective on
how your web application or service is performing. By default, Amazon Polly metric data is sent to
CloudWatch in 1-minute intervals. For more information, see What Is Amazon CloudWatch? in the
Amazon CloudWatch User Guide.
Getting CloudWatch Metrics (Console)
1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, choose Metrics.
3. In the CloudWatch Metrics by Category pane, under the metrics category for Amazon Polly,
select a metrics category, and then in the upper pane, scroll down to view the full list of
metrics.
Getting CloudWatch metrics on the AWS CLI
The following command displays the available metrics for Amazon Polly.
aws cloudwatch list-metrics --namespace "AWS/Polly"
The preceding command returns a list of Amazon Polly metrics similar to the following. The
MetricName element identifies what the metric is.
{
"Metrics": [
{
"Namespace": "AWS/Polly",
"Dimensions": [
{
"Name": "Operation",
"Value": "SynthesizeSpeech"
}
],
"MetricName": "ResponseLatency"
},
{
"Namespace": "AWS/Polly",
"Dimensions": [
{
"Name": "Operation",
"Value": "SynthesizeSpeech"
}
],
"MetricName": "RequestCharacters"
}
]
}
For more information, see GetMetricStatistics in the Amazon CloudWatch API Reference.
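To retrieve data points for one of the listed metrics, a GetMetricStatistics request needs a namespace, metric name, dimension, time range, period, and statistics. The following sketch assembles those parameters for ResponseLatency. The function name, the one-hour default window, and the chosen statistics are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def latency_query(operation="SynthesizeSpeech", hours=1, now=None):
    """Build GetMetricStatistics parameters for the ResponseLatency metric."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Polly",
        "MetricName": "ResponseLatency",
        "Dimensions": [{"Name": "Operation", "Value": operation}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        # 60 seconds matches the one-minute interval at which metrics are sent.
        "Period": 60,
        "Statistics": ["Average", "Maximum"],
    }
```

With boto3, the dictionary could be passed as keyword arguments to the get_metric_statistics method of a CloudWatch client.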
Amazon Polly Metrics
Amazon Polly produces the following metrics for each request. These metrics are aggregated and
sent to CloudWatch in one-minute intervals.
Metric Description
RequestCharacters
The number of characters in the request. This is
billable characters only and does not include SSML
tags.
Valid Dimensions: Operation
Valid Statistics: Minimum, Maximum, Average,
SampleCount, Sum
Unit: Count
ResponseLatency
The latency between when the request was made
and the start of the streaming response.
Valid Dimensions: Operation
Valid Statistics: Minimum, Maximum, Average,
SampleCount
Unit: milliseconds
2XXCount
HTTP 200 level code returned upon a successful
response.
Valid Dimensions: Operation
Valid Statistics: Average, SampleCount, Sum
Unit: Count
4XXCount
HTTP 400 level error code returned upon an error.
For each successful response, a zero (0) is emitted.
Valid Dimensions: Operation
Valid Statistics: Average, SampleCount, Sum
Unit: Count
5XXCount
HTTP 500 level error code returned upon an error.
For each successful response, a zero (0) is emitted.
Valid Dimensions: Operation
Valid Statistics: Average, SampleCount, Sum
Unit: Count
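The count metrics are easier to read once you see how the statistics relate. The sketch below assumes that each request emits one data point for 4XXCount or 5XXCount: zero for a successful response (as stated above) and one for an error response (an assumption not spelled out in this guide). Under that assumption, Sum is the number of errors, SampleCount is the number of requests, and Average is the error rate.

```python
def summarize_error_metric(datapoints):
    """Compute Sum, SampleCount, and Average for per-request 0/1 data points."""
    sample_count = len(datapoints)
    total = sum(datapoints)
    # Guard against an empty interval, where no rate is defined.
    average = total / sample_count if sample_count else 0.0
    return {"SampleCount": sample_count, "Sum": total, "Average": average}
```

For example, four requests with one error give a Sum of 1, a SampleCount of 4, and an Average of 0.25.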
Dimensions for Amazon Polly Metrics
Amazon Polly metrics use the AWS/Polly namespace and provide metrics for the following
dimension:
Dimension Description
Operation
Metrics are grouped by the API method they refer
to. Possible values are SynthesizeSpeech,
PutLexicon, DescribeVoices, etc.
Amazon Polly API Reference
This section contains the Amazon Polly API reference.
Note
Authenticated API calls must be signed using the Signature Version 4 Signing Process.
For more information, see Signing AWS API Requests in the Amazon Web Services General
Reference.
Topics
Actions
Data Types
Actions
The following actions are supported:
DeleteLexicon
DescribeVoices
GetLexicon
GetSpeechSynthesisTask
ListLexicons
ListSpeechSynthesisTasks
PutLexicon
StartSpeechSynthesisTask
SynthesizeSpeech
DeleteLexicon
Deletes the specified pronunciation lexicon stored in an AWS Region. A lexicon that has been
deleted is not available for speech synthesis, and it can't be retrieved using either the
GetLexicon or ListLexicons APIs.
For more information, see Managing Lexicons.
Request Syntax
DELETE /v1/lexicons/LexiconName HTTP/1.1
URI Request Parameters
The request uses the following URI parameters.
LexiconName
The name of the lexicon to delete. Must be an existing lexicon in the region.
Pattern: [0-9A-Za-z]{1,20}
Required: Yes
Request Body
The request does not have a request body.
Response Syntax
HTTP/1.1 200
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
LexiconNotFoundException
Amazon Polly can't find the specified lexicon. The lexicon might be missing, its name might be
misspelled, or it might be in a different Region.
Verify that the lexicon exists, that it is in the Region (see ListLexicons), and that its name is
spelled correctly. Then try again.
HTTP Status Code: 404
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
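A caller that only wants the lexicon gone can treat LexiconNotFoundException as a non-error. The sketch below assumes a boto3-style Polly client whose exceptions expose the service error code in a response attribute (as botocore's ClientError does). The helper name is hypothetical.

```python
def delete_lexicon_if_present(polly_client, name):
    """Delete a lexicon; return False instead of raising if it does not exist."""
    try:
        polly_client.delete_lexicon(Name=name)
    except Exception as error:
        # Read the service error code if the exception carries one.
        code = getattr(error, "response", {}).get("Error", {}).get("Code")
        if code == "LexiconNotFoundException":
            return False
        # Anything else (for example, a service failure) is re-raised.
        raise
    return True
```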
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
DescribeVoices
Returns the list of voices that are available for use when requesting speech synthesis. Each voice
speaks a specified language, is either male or female, and is identified by an ID, which is the ASCII
version of the voice name.
When synthesizing speech (SynthesizeSpeech), you provide the voice ID for the voice you want
from the list of voices returned by DescribeVoices.
For example, suppose you want your news reader application to read news in a specific language
but give the user the option to choose the voice. Using the DescribeVoices operation, you can
provide the user with a list of available voices to select from.
You can optionally specify a language code to filter the available voices. For example, if you specify
en-US, the operation returns a list of all available US English voices.
This operation requires permissions to perform the polly:DescribeVoices action.
Request Syntax
GET /v1/voices?Engine=Engine&IncludeAdditionalLanguageCodes=IncludeAdditionalLanguageCodes&LanguageCode=LanguageCode&NextToken=NextToken HTTP/1.1
URI Request Parameters
The request uses the following URI parameters.
Engine
Specifies the engine (standard, neural, long-form or generative) used by Amazon Polly
when processing input text for speech synthesis.
Valid Values: standard | neural | long-form | generative
IncludeAdditionalLanguageCodes
Boolean value indicating whether to return any bilingual voices that use the specified language
as an additional language. For instance, if you request all languages that use US English (en-US),
and there is an Italian voice that speaks both Italian (it-IT) and US English, that voice will be
included if you specify yes but not if you specify no.
LanguageCode
The language identification tag (ISO 639 code for the language name-ISO 3166 country code)
for filtering the list of voices returned. If you don't specify this optional parameter, all available
voices are returned.
Valid Values: arb | cmn-CN | cy-GB | da-DK | de-DE | en-AU | en-GB | en-GB-WLS |
en-IN | en-US | es-ES | es-MX | es-US | fr-CA | fr-FR | is-IS |
it-IT | ja-JP | hi-IN | ko-KR | nb-NO | nl-NL | pl-PL | pt-BR | pt-PT |
ro-RO | ru-RU | sv-SE | tr-TR | en-NZ | en-ZA | ca-ES | de-AT | yue-CN |
ar-AE | fi-FI | en-IE | nl-BE | fr-BE
NextToken
An opaque pagination token returned from the previous DescribeVoices operation. If
present, this indicates where to continue the listing.
Length Constraints: Minimum length of 0. Maximum length of 4096.
Request Body
The request does not have a request body.
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
"NextToken": "string",
"Voices": [
{
"AdditionalLanguageCodes": [ "string" ],
"Gender": "string",
"Id": "string",
"LanguageCode": "string",
"LanguageName": "string",
"Name": "string",
"SupportedEngines": [ "string" ]
}
]
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
NextToken
The pagination token to use in the next request to continue the listing of voices. NextToken is
returned only if the response is truncated.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 4096.
Voices
A list of voices with their properties.
Type: Array of Voice objects
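Because NextToken is returned only when the response is truncated, a caller that wants every voice has to loop until the token is absent. The sketch below assumes a boto3-style Polly client is passed in. The generator name iter_voices is hypothetical.

```python
def iter_voices(polly_client, **filters):
    """Yield every voice, following NextToken until the listing is complete."""
    params = dict(filters)
    while True:
        page = polly_client.describe_voices(**params)
        yield from page.get("Voices", [])
        token = page.get("NextToken")
        if not token:
            # No token means this was the last page.
            break
        params["NextToken"] = token
```

Filters such as LanguageCode="en-US" or Engine="neural" can be passed as keyword arguments and are sent with each page request.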
Errors
InvalidNextTokenException
The NextToken is invalid. Verify that it's spelled correctly, and then try again.
HTTP Status Code: 400
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
GetLexicon
Returns the content of the specified pronunciation lexicon stored in an AWS Region. For more
information, see Managing Lexicons.
Request Syntax
GET /v1/lexicons/LexiconName HTTP/1.1
URI Request Parameters
The request uses the following URI parameters.
LexiconName
Name of the lexicon.
Pattern: [0-9A-Za-z]{1,20}
Required: Yes
Request Body
The request does not have a request body.
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
"Lexicon": {
"Content": "string",
"Name": "string"
},
"LexiconAttributes": {
"Alphabet": "string",
"LanguageCode": "string",
"LastModified": number,
"LexemesCount": number,
"LexiconArn": "string",
"Size": number
}
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
Lexicon
Lexicon object that provides name and the string content of the lexicon.
Type: Lexicon object
LexiconAttributes
Metadata of the lexicon, including phonetic alphabetic used, language code, lexicon ARN,
number of lexemes defined in the lexicon, and size of lexicon in bytes.
Type: LexiconAttributes object
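The two response elements are typically read together. This sketch formats a one-line summary from a response that has already been parsed into a dictionary with the structure shown in the response syntax. The function name is hypothetical.

```python
def describe_lexicon(response):
    """Summarize a GetLexicon response given as a parsed dictionary."""
    lexicon = response["Lexicon"]
    attributes = response["LexiconAttributes"]
    return (
        f"{lexicon['Name']}: {attributes['LexemesCount']} lexemes, "
        f"{attributes['Size']} bytes, alphabet {attributes['Alphabet']}, "
        f"language {attributes['LanguageCode']}"
    )
```

The PLS document itself is available as the string in response["Lexicon"]["Content"].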
Errors
LexiconNotFoundException
Amazon Polly can't find the specified lexicon. The lexicon might be missing, its name might be
misspelled, or it might be in a different Region.
Verify that the lexicon exists, that it is in the Region (see ListLexicons), and that its name is
spelled correctly. Then try again.
HTTP Status Code: 404
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
GetSpeechSynthesisTask
Retrieves a specific SpeechSynthesisTask object based on its TaskId. This object contains
information about the given speech synthesis task, including the status of the task, and a link to
the S3 bucket containing the output of the task.
Request Syntax
GET /v1/synthesisTasks/TaskId HTTP/1.1
URI Request Parameters
The request uses the following URI parameters.
TaskId
The Amazon Polly generated identifier for a speech synthesis task.
Pattern: ^[a-zA-Z0-9_-]{1,100}$
Required: Yes
Request Body
The request does not have a request body.
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
"SynthesisTask": {
"CreationTime": number,
"Engine": "string",
"LanguageCode": "string",
"LexiconNames": [ "string" ],
"OutputFormat": "string",
"OutputUri": "string",
"RequestCharacters": number,
"SampleRate": "string",
"SnsTopicArn": "string",
"SpeechMarkTypes": [ "string" ],
"TaskId": "string",
"TaskStatus": "string",
"TaskStatusReason": "string",
"TextType": "string",
"VoiceId": "string"
}
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
SynthesisTask
SynthesisTask object that provides information from the requested task, including output
format, creation time, task status, and so on.
Type: SynthesisTask object
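Because synthesis tasks run asynchronously, a caller usually polls this operation until TaskStatus reaches completed or failed. The following sketch assumes a boto3-style Polly client. The function name, the polling interval, and the attempt limit are illustrative assumptions.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(polly_client, task_id, poll_seconds=5.0, max_attempts=60,
                  sleep=time.sleep):
    """Poll GetSpeechSynthesisTask until the task finishes or attempts run out."""
    for _ in range(max_attempts):
        response = polly_client.get_speech_synthesis_task(TaskId=task_id)
        task = response["SynthesisTask"]
        if task["TaskStatus"] in TERMINAL_STATUSES:
            return task
        # Still scheduled or inProgress: wait before checking again.
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} checks")
```

When the returned task has a failed status, TaskStatusReason describes what went wrong. For completed tasks, OutputUri points to the output. Supplying an SnsTopicArn when the task is started is an alternative to polling.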
Errors
InvalidTaskIdException
The provided Task ID is not valid. Please provide a valid Task ID and try again.
HTTP Status Code: 400
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
SynthesisTaskNotFoundException
The Speech Synthesis task with requested Task ID cannot be found.
HTTP Status Code: 400
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
ListLexicons
Returns a list of pronunciation lexicons stored in an AWS Region. For more information, see
Managing Lexicons.
Request Syntax
GET /v1/lexicons?NextToken=NextToken HTTP/1.1
URI Request Parameters
The request uses the following URI parameters.
NextToken
An opaque pagination token returned from the previous ListLexicons operation. If present,
this indicates where to continue the list of lexicons.
Length Constraints: Minimum length of 0. Maximum length of 4096.
Request Body
The request does not have a request body.
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
"Lexicons": [
{
"Attributes": {
"Alphabet": "string",
"LanguageCode": "string",
"LastModified": number,
"LexemesCount": number,
"LexiconArn": "string",
"Size": number
},
"Name": "string"
}
],
"NextToken": "string"
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
Lexicons
A list of lexicon names and attributes.
Type: Array of LexiconDescription objects
NextToken
The pagination token to use in the next request to continue the listing of lexicons. NextToken
is returned only if the response is truncated.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 4096.
Errors
InvalidNextTokenException
The NextToken is invalid. Verify that it's spelled correctly, and then try again.
HTTP Status Code: 400
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
ListSpeechSynthesisTasks
Returns a list of SpeechSynthesisTask objects ordered by their creation date. This operation can
filter the tasks by their status, for example, allowing users to list only tasks that are completed.
Request Syntax
GET /v1/synthesisTasks?MaxResults=MaxResults&NextToken=NextToken&Status=Status HTTP/1.1
URI Request Parameters
The request uses the following URI parameters.
MaxResults
Maximum number of speech synthesis tasks returned in a List operation.
Valid Range: Minimum value of 1. Maximum value of 100.
NextToken
The pagination token to use in the next request to continue the listing of speech synthesis
tasks.
Length Constraints: Minimum length of 0. Maximum length of 4096.
Status
Status of the speech synthesis tasks returned in a List operation.
Valid Values: scheduled | inProgress | completed | failed
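The constraints on these parameters can be checked before a request is sent. This sketch builds the request path and query string from the request syntax above and enforces the documented range and values. The function name is hypothetical, and request signing is deliberately left out.

```python
from urllib.parse import urlencode

VALID_STATUSES = ("scheduled", "inProgress", "completed", "failed")


def list_tasks_path(max_results=None, next_token=None, status=None):
    """Return the ListSpeechSynthesisTasks request path with a validated query string."""
    query = {}
    if max_results is not None:
        if not 1 <= max_results <= 100:
            raise ValueError("MaxResults must be between 1 and 100")
        query["MaxResults"] = max_results
    if next_token:
        query["NextToken"] = next_token
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"Status must be one of {VALID_STATUSES}")
        query["Status"] = status
    path = "/v1/synthesisTasks"
    return f"{path}?{urlencode(query)}" if query else path
```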
Request Body
The request does not have a request body.
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
"NextToken": "string",
"SynthesisTasks": [
{
"CreationTime": number,
"Engine": "string",
"LanguageCode": "string",
"LexiconNames": [ "string" ],
"OutputFormat": "string",
"OutputUri": "string",
"RequestCharacters": number,
"SampleRate": "string",
"SnsTopicArn": "string",
"SpeechMarkTypes": [ "string" ],
"TaskId": "string",
"TaskStatus": "string",
"TaskStatusReason": "string",
"TextType": "string",
"VoiceId": "string"
}
]
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
NextToken
An opaque pagination token returned from the previous List operation in this request. If
present, this indicates where to continue the listing.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 4096.
SynthesisTasks
List of SynthesisTask objects that provides information from the specified task in the list
request, including output format, creation time, task status, and so on.
Type: Array of SynthesisTask objects
Errors
InvalidNextTokenException
The NextToken is invalid. Verify that it's spelled correctly, and then try again.
HTTP Status Code: 400
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
PutLexicon
Stores a pronunciation lexicon in an AWS Region. If a lexicon with the same name already exists
in the Region, it is overwritten by the new lexicon. Lexicon operations are eventually consistent,
so it might take some time before the lexicon is available to the SynthesizeSpeech
operation.
For more information, see Managing Lexicons.
Request Syntax
PUT /v1/lexicons/LexiconName HTTP/1.1
Content-type: application/json
{
"Content": "string"
}
URI Request Parameters
The request uses the following URI parameters.
LexiconName
Name of the lexicon. The name must follow the regular expression format [0-9A-Za-z]{1,20}. That
is, the name is a case-sensitive alphanumeric string up to 20 characters long.
Pattern: [0-9A-Za-z]{1,20}
Required: Yes
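The name pattern can be checked locally before a request is made. This sketch validates the name against the documented pattern and assembles the request path and JSON body from the request syntax above. The function name is hypothetical, and the PLS content is passed through unchanged.

```python
import json
import re

# Pattern from the LexiconName parameter: 1 to 20 alphanumeric characters.
LEXICON_NAME = re.compile(r"[0-9A-Za-z]{1,20}")


def put_lexicon_request(name, pls_content):
    """Return (path, JSON body) for a PutLexicon request after validating the name."""
    if not LEXICON_NAME.fullmatch(name):
        raise ValueError(f"invalid lexicon name: {name!r}")
    return f"/v1/lexicons/{name}", json.dumps({"Content": pls_content})
```

Using fullmatch rather than match ensures that the whole name, not just a prefix, satisfies the pattern.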
Request Body
The request accepts the following data in JSON format.
Content
Content of the PLS lexicon as string data.
Type: String
Required: Yes
Response Syntax
HTTP/1.1 200
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
InvalidLexiconException
Amazon Polly can't find the specified lexicon. Verify that the lexicon's name is spelled correctly,
and then try again.
HTTP Status Code: 400
LexiconSizeExceededException
The maximum size of the specified lexicon would be exceeded by this operation.
HTTP Status Code: 400
MaxLexemeLengthExceededException
The maximum size of the lexeme would be exceeded by this operation.
HTTP Status Code: 400
MaxLexiconsNumberExceededException
The maximum number of lexicons would be exceeded by this operation.
HTTP Status Code: 400
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
UnsupportedPlsAlphabetException
The alphabet specified by the lexicon is not a supported alphabet. Valid values are x-sampa
and ipa.
HTTP Status Code: 400
UnsupportedPlsLanguageException
The language specified in the lexicon is unsupported. For a list of supported languages, see
Lexicon Attributes.
HTTP Status Code: 400
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
StartSpeechSynthesisTask
Creates an asynchronous synthesis task by starting a new SpeechSynthesisTask.
This operation requires all the standard information needed for speech synthesis, plus the name
of an Amazon S3 bucket for the service to store the output of the synthesis task and two optional
parameters (OutputS3KeyPrefix and SnsTopicArn). Once the synthesis task is created, this
operation will return a SpeechSynthesisTask object, which will include an identifier of this task
as well as the current status. The SpeechSynthesisTask object is available for 72 hours after
starting the asynchronous synthesis task.
Request Syntax
POST /v1/synthesisTasks HTTP/1.1
Content-type: application/json
{
"Engine": "string",
"LanguageCode": "string",
"LexiconNames": [ "string" ],
"OutputFormat": "string",
"OutputS3BucketName": "string",
"OutputS3KeyPrefix": "string",
"SampleRate": "string",
"SnsTopicArn": "string",
"SpeechMarkTypes": [ "string" ],
"Text": "string",
"TextType": "string",
"VoiceId": "string"
}
URI Request Parameters
The request does not use any URI parameters.
Request Body
The request accepts the following data in JSON format.
Engine
Specifies the engine (standard, neural, long-form or generative) for Amazon Polly to
use when processing input text for speech synthesis. Using a voice that is not supported for the
engine selected will result in an error.
Type: String
Valid Values: standard | neural | long-form | generative
Required: No
LanguageCode
Optional language code for the Speech Synthesis request. This is only necessary if using a
bilingual voice, such as Aditi, which can be used for either Indian English (en-IN) or Hindi (hi-IN).
If a bilingual voice is used and no language code is specified, Amazon Polly uses the default
language of the bilingual voice. The default language for any voice is the one returned by the
DescribeVoices operation for the LanguageCode parameter. For example, if no language code
is specified, Aditi will use Indian English rather than Hindi.
Type: String
Valid Values: arb | cmn-CN | cy-GB | da-DK | de-DE | en-AU | en-GB | en-GB-WLS |
en-IN | en-US | es-ES | es-MX | es-US | fr-CA | fr-FR | is-IS |
it-IT | ja-JP | hi-IN | ko-KR | nb-NO | nl-NL | pl-PL | pt-BR | pt-PT |
ro-RO | ru-RU | sv-SE | tr-TR | en-NZ | en-ZA | ca-ES | de-AT | yue-CN |
ar-AE | fi-FI | en-IE | nl-BE | fr-BE
Required: No
LexiconNames
List of one or more pronunciation lexicon names you want the service to apply during synthesis.
Lexicons are applied only if the language of the lexicon is the same as the language of the voice.
Type: Array of strings
Array Members: Maximum number of 5 items.
Pattern: [0-9A-Za-z]{1,20}
Required: No
OutputFormat
The format in which the returned output will be encoded. For audio stream, this will be mp3,
ogg_vorbis, or pcm. For speech marks, this will be json.
Type: String
Valid Values: json | mp3 | ogg_vorbis | pcm
Required: Yes
OutputS3BucketName
Amazon S3 bucket name to which the output file will be saved.
Type: String
Pattern: ^[a-z0-9][\.\-a-z0-9]{1,61}[a-z0-9]$
Required: Yes
OutputS3KeyPrefix
The Amazon S3 key prefix for the output speech file.
Type: String
Pattern: ^[0-9a-zA-Z\/\!\-_\.\*\'\(\):;\$@=+\,\?&]{0,800}$
Required: No
SampleRate
The audio frequency specified in Hz.
The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The
default value for standard voices is "22050". The default value for neural voices is "24000". The
default value for long-form voices is "24000". The default value for generative voices is "24000".
Valid values for pcm are "8000" and "16000". The default value is "16000".
Type: String
Required: No
SnsTopicArn
ARN for the SNS topic optionally used for providing status notification for a speech synthesis
task.
Type: String
Pattern: ^arn:aws(-(cn|iso(-b)?|us-gov))?:sns:[a-z0-9_-]{1,50}:\d{12}:[a-zA-Z0-9_-]{1,251}([a-zA-Z0-9_-]{0,5}|\.fifo)$
Required: No
SpeechMarkTypes
The type of speech marks returned for the input text.
Type: Array of strings
Array Members: Maximum number of 4 items.
Valid Values: sentence | ssml | viseme | word
Required: No
Text
The input text to synthesize. If you specify ssml as the TextType, follow the SSML format for the
input text.
Type: String
Required: Yes
TextType
Specifies whether the input text is plain text or SSML. The default value is plain text.
Type: String
Valid Values: ssml | text
Required: No
VoiceId
Voice ID to use for the synthesis.
Type: String
Valid Values: Aditi | Amy | Astrid | Bianca | Brian | Camila | Carla |
Carmen | Celine | Chantal | Conchita | Cristiano | Dora | Emma | Enrique
| Ewa | Filiz | Gabrielle | Geraint | Giorgio | Gwyneth | Hans | Ines
| Ivy | Jacek | Jan | Joanna | Joey | Justin | Karl | Kendra | Kevin
| Kimberly | Lea | Liv | Lotte | Lucia | Lupe | Mads | Maja | Marlene
| Mathieu | Matthew | Maxim | Mia | Miguel | Mizuki | Naja | Nicole
| Olivia | Penelope | Raveena | Ricardo | Ruben | Russell | Salli |
Seoyeon | Takumi | Tatyana | Vicki | Vitoria | Zeina | Zhiyu | Aria
| Ayanda | Arlet | Hannah | Arthur | Daniel | Liam | Pedro | Kajal |
Hiujin | Laura | Elin | Ida | Suvi | Ola | Hala | Andres | Sergio | Remi
| Adriano | Thiago | Ruth | Stephen | Kazuha | Tomoko | Niamh | Sofie |
Lisa | Isabelle | Zayd | Danielle | Gregory | Burcu
Required: Yes
Response Syntax
HTTP/1.1 200
Content-type: application/json
{
"SynthesisTask": {
"CreationTime": number,
"Engine": "string",
"LanguageCode": "string",
"LexiconNames": [ "string" ],
"OutputFormat": "string",
"OutputUri": "string",
"RequestCharacters": number,
"SampleRate": "string",
"SnsTopicArn": "string",
"SpeechMarkTypes": [ "string" ],
"TaskId": "string",
"TaskStatus": "string",
"TaskStatusReason": "string",
"TextType": "string",
"VoiceId": "string"
}
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
SynthesisTask
SynthesisTask object that provides information and attributes about a newly submitted speech
synthesis task.
Type: SynthesisTask object
Errors
EngineNotSupportedException
This engine is not compatible with the voice that you have designated. Choose a new voice that
is compatible with the engine or change the engine and restart the operation.
HTTP Status Code: 400
InvalidS3BucketException
The provided Amazon S3 bucket name is invalid. Please check your input with S3 bucket naming
requirements and try again.
HTTP Status Code: 400
InvalidS3KeyException
The provided Amazon S3 key prefix is invalid. Please provide a valid S3 object key name.
HTTP Status Code: 400
InvalidSampleRateException
The specified sample rate is not valid.
HTTP Status Code: 400
InvalidSnsTopicArnException
The provided SNS topic ARN is invalid. Please provide a valid SNS topic ARN and try again.
HTTP Status Code: 400
InvalidSsmlException
The SSML you provided is invalid. Verify the SSML syntax, spelling of tags and values, and then
try again.
HTTP Status Code: 400
LanguageNotSupportedException
The language specified is not currently supported by Amazon Polly in this capacity.
HTTP Status Code: 400
LexiconNotFoundException
Amazon Polly can't find the specified lexicon. The lexicon might be missing, its name might be
misspelled, or it might be stored in a different region.
Verify that the lexicon exists, that it is in the region (see ListLexicons), and that its name is
spelled correctly. Then try again.
HTTP Status Code: 404
MarksNotSupportedForFormatException
Speech marks are not supported for the OutputFormat selected. Speech marks are only
available for content in json format.
HTTP Status Code: 400
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
SsmlMarksNotSupportedForTextTypeException
SSML speech marks are not supported for plain text-type input.
HTTP Status Code: 400
TextLengthExceededException
The value of the "Text" parameter is longer than the accepted limits. For the
SynthesizeSpeech API, the limit for input text is a maximum of 6000 characters total, of
which no more than 3000 can be billed characters. For the StartSpeechSynthesisTask API,
the maximum is 200,000 characters, of which no more than 100,000 can be billed characters.
SSML tags are not counted as billed characters.
HTTP Status Code: 400
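The errors above fall into client errors (400 and 404) and one server error (500). The following sketch records the documented status codes and applies one possible retry policy. The policy itself, retrying only 5xx errors without changing the request, is an assumption and not something this reference prescribes.

```python
# HTTP status codes as listed in the Errors section above.
ERROR_STATUS_CODES = {
    "EngineNotSupportedException": 400,
    "InvalidS3BucketException": 400,
    "InvalidS3KeyException": 400,
    "InvalidSampleRateException": 400,
    "InvalidSnsTopicArnException": 400,
    "InvalidSsmlException": 400,
    "LanguageNotSupportedException": 400,
    "LexiconNotFoundException": 404,
    "MarksNotSupportedForFormatException": 400,
    "ServiceFailureException": 500,
    "SsmlMarksNotSupportedForTextTypeException": 400,
    "TextLengthExceededException": 400,
}


def is_retryable(error_name: str) -> bool:
    """Assumed policy: retry unchanged only on server-side (5xx) errors.

    Errors that are not in the table are treated as not retryable.
    """
    return ERROR_STATUS_CODES.get(error_name, 0) >= 500
```

Under this policy, a 4xx error signals that the request itself needs to change (for example, a different voice, engine, or bucket name) before it is sent again.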
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
SynthesizeSpeech
Synthesizes UTF-8 input, plain text or SSML, to a stream of bytes. SSML input must be valid,
well-formed SSML. Some alphabets might not be available with all the voices (for example, Cyrillic
might not be read at all by English voices) unless phoneme mapping is used. For more information,
see How it Works.
Request Syntax
POST /v1/speech HTTP/1.1
Content-type: application/json
{
"Engine": "string",
"LanguageCode": "string",
"LexiconNames": [ "string" ],
"OutputFormat": "string",
"SampleRate": "string",
"SpeechMarkTypes": [ "string" ],
"Text": "string",
"TextType": "string",
"VoiceId": "string"
}
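As an illustration of the request syntax above, the following Python sketch assembles the JSON body for POST /v1/speech. The function name and the check for unknown fields are assumptions made for this example. The sketch stops short of sending the request, which would also need AWS Signature Version 4 signing (normally handled by an AWS SDK).

```python
import json

# Optional fields from the request syntax above.
OPTIONAL_FIELDS = {
    "Engine",
    "LanguageCode",
    "LexiconNames",
    "SampleRate",
    "SpeechMarkTypes",
    "TextType",
}


def build_synthesize_speech_body(text, voice_id, output_format, **optional):
    """Return the SynthesizeSpeech request body as a JSON string."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown request fields: {sorted(unknown)}")
    # OutputFormat, Text, and VoiceId are the required fields.
    body = {"OutputFormat": output_format, "Text": text, "VoiceId": voice_id}
    body.update(optional)
    return json.dumps(body)
```

A call such as build_synthesize_speech_body("Hello", "Joanna", "mp3", Engine="neural") produces a body with four fields. Whether a given voice supports a given engine still has to be checked against the voice list.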
URI Request Parameters
The request does not use any URI parameters.
Request Body
The request accepts the following data in JSON format.
Engine
Specifies the engine (standard, neural, long-form, or generative) for Amazon Polly to
use when processing input text for speech synthesis. Provide an engine that is supported by the
voice you select. If you don't provide an engine, the standard engine is selected by default. If a
chosen voice isn't supported by the standard engine, this will result in an error. For information
on Amazon Polly voices and which voices are available for each engine, see Available Voices.
Type: String
Valid Values: standard | neural | long-form | generative
Required: No
LanguageCode
Optional language code for the Synthesize Speech request. This is only necessary if using a
bilingual voice, such as Aditi, which can be used for either Indian English (en-IN) or Hindi (hi-IN).
If a bilingual voice is used and no language code is specified, Amazon Polly uses the default
language of the bilingual voice. The default language for any voice is the one returned by the
DescribeVoices operation for the LanguageCode parameter. For example, if no language code
is specified, Aditi will use Indian English rather than Hindi.
Type: String
Valid Values: arb | cmn-CN | cy-GB | da-DK | de-DE | en-AU | en-GB | en-GB-WLS |
en-IN | en-US | es-ES | es-MX | es-US | fr-CA | fr-FR | is-IS |
it-IT | ja-JP | hi-IN | ko-KR | nb-NO | nl-NL | pl-PL | pt-BR | pt-PT |
ro-RO | ru-RU | sv-SE | tr-TR | en-NZ | en-ZA | ca-ES | de-AT | yue-CN |
ar-AE | fi-FI | en-IE | nl-BE | fr-BE
Required: No
LexiconNames
List of one or more pronunciation lexicon names you want the service to apply during synthesis.
Lexicons are applied only if the language of the lexicon is the same as the language of the voice.
For information about storing lexicons, see PutLexicon.
Type: Array of strings
Array Members: Maximum number of 5 items.
Pattern: [0-9A-Za-z]{1,20}
Required: No
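The LexiconNames constraints (at most 5 names, each matching the pattern above) can be checked before a request is made. This is a minimal sketch with a hypothetical helper name. It assumes the pattern is meant to match the whole name, and it does not check that the lexicons exist or that their language matches the voice.

```python
import re

# Constraints copied from the LexiconNames parameter description above.
LEXICON_NAME_PATTERN = re.compile(r"[0-9A-Za-z]{1,20}")
MAX_LEXICON_NAMES = 5


def validate_lexicon_names(names):
    """Return a list of problems; an empty list means the names look valid."""
    problems = []
    if len(names) > MAX_LEXICON_NAMES:
        problems.append(
            f"at most {MAX_LEXICON_NAMES} lexicon names are allowed, got {len(names)}"
        )
    for name in names:
        # fullmatch treats the pattern as anchored at both ends (an assumption).
        if LEXICON_NAME_PATTERN.fullmatch(name) is None:
            problems.append(f"invalid lexicon name: {name!r}")
    return problems
```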
OutputFormat
The format in which the returned output will be encoded. For audio stream, this will be mp3,
ogg_vorbis, or pcm. For speech marks, this will be json.
When pcm is used, the content returned is audio/pcm in a signed 16-bit, 1 channel (mono),
little-endian format.
Type: String
Valid Values: json | mp3 | ogg_vorbis | pcm
Required: Yes
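Because pcm output is raw signed 16-bit, mono, little-endian audio with no header, many players cannot open it directly. One option is to wrap the bytes in a WAV container. The sketch below uses Python's standard wave module. The function name is illustrative, and the sample rate passed in must match the SampleRate used in the request.

```python
import io
import wave


def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono little-endian PCM audio in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # 1 channel (mono)
        wav_file.setsampwidth(2)            # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)  # 8000 or 16000 for pcm
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

The default of 16000 mirrors the documented default sample rate for pcm.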
SampleRate
The audio frequency specified in Hz.
The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The
default value for standard voices is "22050". The default value for neural voices is "24000". The
default value for long-form voices is "24000". The default value for generative voices is "24000".
Valid values for pcm are "8000" and "16000". The default value is "16000".
Type: String
Required: No
SpeechMarkTypes
The type of speech marks returned for the input text.
Type: Array of strings
Array Members: Maximum number of 4 items.
Valid Values: sentence | ssml | viseme | word
Required: No
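Speech marks are returned only when OutputFormat is json. The sketch below assumes the response body is a stream of JSON objects, one per line, each with a "type" field and fields such as "time" and "value". The helper names and the sample data in the usage note are illustrative, not taken from this reference.

```python
import json


def parse_speech_marks(stream_bytes: bytes) -> list:
    """Parse a line-delimited JSON speech mark stream into a list of dicts."""
    marks = []
    for line in stream_bytes.decode("utf-8").splitlines():
        line = line.strip()
        if line:  # skip blank lines between objects
            marks.append(json.loads(line))
    return marks


def marks_of_type(marks: list, mark_type: str) -> list:
    """Return only the marks whose "type" equals mark_type."""
    return [mark for mark in marks if mark.get("type") == mark_type]
```

For instance, after requesting SpeechMarkTypes of sentence and word, marks_of_type(marks, "word") would keep only the word-level entries.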
Text
Input text to synthesize. If you specify ssml as the TextType, follow the SSML format for the
input text.
Type: String
Required: Yes
TextType
Specifies whether the input text is plain text or SSML. The default value is plain text. For more
information, see Using SSML.
Type: String
Valid Values: ssml | text
Required: No
VoiceId
Voice ID to use for the synthesis. You can get a list of available voice IDs by calling the
DescribeVoices operation.
Type: String
Valid Values: Aditi | Amy | Astrid | Bianca | Brian | Camila | Carla |
Carmen | Celine | Chantal | Conchita | Cristiano | Dora | Emma | Enrique
| Ewa | Filiz | Gabrielle | Geraint | Giorgio | Gwyneth | Hans | Ines
| Ivy | Jacek | Jan | Joanna | Joey | Justin | Karl | Kendra | Kevin
| Kimberly | Lea | Liv | Lotte | Lucia | Lupe | Mads | Maja | Marlene
| Mathieu | Matthew | Maxim | Mia | Miguel | Mizuki | Naja | Nicole
| Olivia | Penelope | Raveena | Ricardo | Ruben | Russell | Salli |
Seoyeon | Takumi | Tatyana | Vicki | Vitoria | Zeina | Zhiyu | Aria
| Ayanda | Arlet | Hannah | Arthur | Daniel | Liam | Pedro | Kajal |
Hiujin | Laura | Elin | Ida | Suvi | Ola | Hala | Andres | Sergio | Remi
| Adriano | Thiago | Ruth | Stephen | Kazuha | Tomoko | Niamh | Sofie |
Lisa | Isabelle | Zayd | Danielle | Gregory | Burcu
Required: Yes
Response Syntax
HTTP/1.1 200
Content-Type: ContentType
x-amzn-RequestCharacters: RequestCharacters
AudioStream
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The response returns the following HTTP headers.
ContentType
Specifies the type of audio stream. This should reflect the OutputFormat parameter in your
request.
If you request mp3 as the OutputFormat, the ContentType returned is audio/mpeg.
If you request ogg_vorbis as the OutputFormat, the ContentType returned is audio/ogg.
If you request pcm as the OutputFormat, the ContentType returned is audio/pcm in a
signed 16-bit, 1 channel (mono), little-endian format.
If you request json as the OutputFormat, the ContentType returned is
application/x-json-stream.
RequestCharacters
Number of characters synthesized.
The response returns the following as the HTTP body.
AudioStream
Stream containing the synthesized speech.
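The mapping from OutputFormat to ContentType described above can be used to sanity-check a response before the stream is consumed. This is a sketch with a hypothetical helper name. Ignoring any parameters after a semicolon in the header value is an assumption made for robustness.

```python
# Expected ContentType for each OutputFormat, as described above.
CONTENT_TYPE_BY_FORMAT = {
    "mp3": "audio/mpeg",
    "ogg_vorbis": "audio/ogg",
    "pcm": "audio/pcm",
    "json": "application/x-json-stream",
}


def content_type_matches(output_format: str, content_type: str) -> bool:
    """Return True if the response ContentType fits the requested format."""
    expected = CONTENT_TYPE_BY_FORMAT[output_format]
    # Compare only the media type, ignoring parameters such as charset.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == expected
```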
Errors
EngineNotSupportedException
This engine is not compatible with the voice that you have designated. Choose a new voice that
is compatible with the engine or change the engine and restart the operation.
HTTP Status Code: 400
InvalidSampleRateException
The specified sample rate is not valid.
HTTP Status Code: 400
InvalidSsmlException
The SSML you provided is invalid. Verify the SSML syntax, spelling of tags and values, and then
try again.
HTTP Status Code: 400
LanguageNotSupportedException
The language specified is not currently supported by Amazon Polly in this capacity.
HTTP Status Code: 400
LexiconNotFoundException
Amazon Polly can't find the specified lexicon. The lexicon might be missing, its name might be
misspelled, or it might be stored in a different region.
Verify that the lexicon exists, that it is in the region (see ListLexicons), and that its name is
spelled correctly. Then try again.
HTTP Status Code: 404
MarksNotSupportedForFormatException
Speech marks are not supported for the OutputFormat selected. Speech marks are only
available for content in json format.
HTTP Status Code: 400
ServiceFailureException
An unknown condition has caused a service failure.
HTTP Status Code: 500
SsmlMarksNotSupportedForTextTypeException
SSML speech marks are not supported for plain text-type input.
HTTP Status Code: 400
TextLengthExceededException
The value of the "Text" parameter is longer than the accepted limits. For the
SynthesizeSpeech API, the limit for input text is a maximum of 6000 characters total, of
which no more than 3000 can be billed characters. For the StartSpeechSynthesisTask API,
the maximum is 200,000 characters, of which no more than 100,000 can be billed characters.
SSML tags are not counted as billed characters.
HTTP Status Code: 400
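The limits quoted above can be checked before a request is sent. The sketch below is an approximation: it estimates billed characters for SSML input by stripping anything that looks like a tag, which may not match how the service counts in every case. The names are hypothetical.

```python
import re

# (maximum total characters, maximum billed characters) per operation,
# as quoted in the TextLengthExceededException description above.
LIMITS = {
    "SynthesizeSpeech": (6000, 3000),
    "StartSpeechSynthesisTask": (200_000, 100_000),
}

_TAG = re.compile(r"<[^>]+>")


def estimate_billed_characters(text: str, text_type: str = "text") -> int:
    """Roughly estimate billed characters; SSML tags are not counted."""
    if text_type == "ssml":
        return len(_TAG.sub("", text))
    return len(text)


def exceeds_limits(text: str, operation: str, text_type: str = "text") -> bool:
    """Return True if text is likely to trigger TextLengthExceededException."""
    max_total, max_billed = LIMITS[operation]
    billed = estimate_billed_characters(text, text_type)
    return len(text) > max_total or billed > max_billed
```

Text that is too long for SynthesizeSpeech may still fit within the StartSpeechSynthesisTask limits.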
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS Command Line Interface
AWS SDK for .NET
AWS SDK for C++
AWS SDK for Go v2
AWS SDK for Java V2
AWS SDK for JavaScript V3
AWS SDK for PHP V3
AWS SDK for Python
AWS SDK for Ruby V3
Data Types
The following data types are supported:
Lexicon
LexiconAttributes
LexiconDescription
SynthesisTask
Voice
Lexicon
Provides lexicon name and lexicon content in string format. For more information, see
Pronunciation Lexicon Specification (PLS) Version 1.0.
Contents
Content
Lexicon content in string format. The content of a lexicon must be in PLS format.
Type: String
Required: No
Name
Name of the lexicon.
Type: String
Pattern: [0-9A-Za-z]{1,20}
Required: No
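To make the Content field more concrete, the following sketch parses a small PLS document with Python's standard XML parser and reports its alphabet, language, and number of lexemes. The sample lexicon and the function name are illustrative assumptions. The namespace URI is the one defined by the PLS 1.0 specification.

```python
import xml.etree.ElementTree as ET

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative lexicon content with a single lexeme.
SAMPLE_LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def summarize_lexicon(content: str) -> dict:
    """Return the alphabet, language, and lexeme count of a PLS document."""
    root = ET.fromstring(content.encode("utf-8"))
    lexemes = root.findall(f"{{{PLS_NAMESPACE}}}lexeme")
    return {
        "alphabet": root.get("alphabet"),
        "language": root.get(XML_LANG),
        "lexemes": len(lexemes),
    }
```

Calling summarize_lexicon(SAMPLE_LEXICON) reports the ipa alphabet, the en-US language, and one lexeme.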
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS SDK for C++
AWS SDK for Java V2
AWS SDK for Ruby V3
LexiconAttributes
Contains metadata describing the lexicon such as the number of lexemes, language code, and so
on. For more information, see Managing Lexicons.
Contents
Alphabet
Phonetic alphabet used in the lexicon. Valid values are ipa and x-sampa.
Type: String
Required: No
LanguageCode
Language code that the lexicon applies to. A lexicon with a language code such as "en" would be
applied to all English languages (en-GB, en-US, en-AUS, en-WLS, and so on).
Type: String
Valid Values: arb | cmn-CN | cy-GB | da-DK | de-DE | en-AU | en-GB | en-GB-WLS |
en-IN | en-US | es-ES | es-MX | es-US | fr-CA | fr-FR | is-IS |
it-IT | ja-JP | hi-IN | ko-KR | nb-NO | nl-NL | pl-PL | pt-BR | pt-PT |
ro-RO | ru-RU | sv-SE | tr-TR | en-NZ | en-ZA | ca-ES | de-AT | yue-CN |
ar-AE | fi-FI | en-IE | nl-BE | fr-BE
Required: No
LastModified
Date lexicon was last modified (a timestamp value).
Type: Timestamp
Required: No
LexemesCount
Number of lexemes in the lexicon.
Type: Integer
Required: No
LexiconArn
Amazon Resource Name (ARN) of the lexicon.
Type: String
Required: No
Size
Total size of the lexicon, in characters.
Type: Integer
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS SDK for C++
AWS SDK for Java V2
AWS SDK for Ruby V3
LexiconDescription
Describes the content of the lexicon.
Contents
Attributes
Provides lexicon metadata.
Type: LexiconAttributes object
Required: No
Name
Name of the lexicon.
Type: String
Pattern: [0-9A-Za-z]{1,20}
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS SDK for C++
AWS SDK for Java V2
AWS SDK for Ruby V3
SynthesisTask
SynthesisTask object that provides information about a speech synthesis task.
Contents
CreationTime
Timestamp for the time the synthesis task was started.
Type: Timestamp
Required: No
Engine
Specifies the engine (standard, neural, long-form or generative) for Amazon Polly to
use when processing input text for speech synthesis. Using a voice that is not supported for the
engine selected will result in an error.
Type: String
Valid Values: standard | neural | long-form | generative
Required: No
LanguageCode
Optional language code for a synthesis task. This is only necessary if using a bilingual voice,
such as Aditi, which can be used for either Indian English (en-IN) or Hindi (hi-IN).
If a bilingual voice is used and no language code is specified, Amazon Polly uses the default
language of the bilingual voice. The default language for any voice is the one returned by the
DescribeVoices operation for the LanguageCode parameter. For example, if no language code
is specified, Aditi will use Indian English rather than Hindi.
Type: String
Valid Values: arb | cmn-CN | cy-GB | da-DK | de-DE | en-AU | en-GB | en-GB-WLS |
en-IN | en-US | es-ES | es-MX | es-US | fr-CA | fr-FR | is-IS |
it-IT | ja-JP | hi-IN | ko-KR | nb-NO | nl-NL | pl-PL | pt-BR | pt-PT |
ro-RO | ru-RU | sv-SE | tr-TR | en-NZ | en-ZA | ca-ES | de-AT | yue-CN |
ar-AE | fi-FI | en-IE | nl-BE | fr-BE
Required: No
LexiconNames
List of one or more pronunciation lexicon names you want the service to apply during synthesis.
Lexicons are applied only if the language of the lexicon is the same as the language of the voice.
Type: Array of strings
Array Members: Maximum number of 5 items.
Pattern: [0-9A-Za-z]{1,20}
Required: No
OutputFormat
The format in which the returned output will be encoded. For audio stream, this will be mp3,
ogg_vorbis, or pcm. For speech marks, this will be json.
Type: String
Valid Values: json | mp3 | ogg_vorbis | pcm
Required: No
OutputUri
Pathway for the output speech file.
Type: String
Required: No
RequestCharacters
Number of billable characters synthesized.
Type: Integer
Required: No
SampleRate
The audio frequency specified in Hz.
The valid values for mp3 and ogg_vorbis are "8000", "16000", "22050", and "24000". The
default value for standard voices is "22050". The default value for neural voices is "24000". The
default value for long-form voices is "24000". The default value for generative voices is "24000".
Valid values for pcm are "8000" and "16000". The default value is "16000".
Type: String
Required: No
SnsTopicArn
ARN for the SNS topic optionally used for providing status notification for a speech synthesis
task.
Type: String
Pattern: ^arn:aws(-(cn|iso(-b)?|us-gov))?:sns:[a-z0-9_-]{1,50}:\d{12}:[a-zA-Z0-9_-]{1,251}([a-zA-Z0-9_-]{0,5}|\.fifo)$
Required: No
SpeechMarkTypes
The type of speech marks returned for the input text.
Type: Array of strings
Array Members: Maximum number of 4 items.
Valid Values: sentence | ssml | viseme | word
Required: No
TaskId
The Amazon Polly generated identifier for a speech synthesis task.
Type: String
Pattern: ^[a-zA-Z0-9_-]{1,100}$
Required: No
TaskStatus
Current status of the individual speech synthesis task.
Type: String
Valid Values: scheduled | inProgress | completed | failed
Required: No
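A caller that does not use SNS notifications typically polls the task until TaskStatus stops changing. The sketch below assumes that completed and failed are the terminal states. The get_status argument is a hypothetical callable that returns the current TaskStatus string, for example by wrapping a GetSpeechSynthesisTask call. The polling interval and the cap on polls are arbitrary choices.

```python
import time

# Assumed terminal states among the valid TaskStatus values.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(get_status, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll get_status() until the task finishes or max_polls is reached.

    Returns the last status seen, which may still be non-terminal if the
    polling budget ran out.
    """
    status = get_status()
    polls = 1
    while status not in TERMINAL_STATUSES and polls < max_polls:
        sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status
```

When the returned status is failed, TaskStatusReason is the field to inspect for the cause.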
TaskStatusReason
Reason for the current status of a specific speech synthesis task, including errors if the task has
failed.
Type: String
Required: No
TextType
Specifies whether the input text is plain text or SSML. The default value is plain text.
Type: String
Valid Values: ssml | text
Required: No
VoiceId
Voice ID to use for the synthesis.
Type: String
Valid Values: Aditi | Amy | Astrid | Bianca | Brian | Camila | Carla |
Carmen | Celine | Chantal | Conchita | Cristiano | Dora | Emma | Enrique
| Ewa | Filiz | Gabrielle | Geraint | Giorgio | Gwyneth | Hans | Ines
| Ivy | Jacek | Jan | Joanna | Joey | Justin | Karl | Kendra | Kevin
| Kimberly | Lea | Liv | Lotte | Lucia | Lupe | Mads | Maja | Marlene
| Mathieu | Matthew | Maxim | Mia | Miguel | Mizuki | Naja | Nicole
| Olivia | Penelope | Raveena | Ricardo | Ruben | Russell | Salli |
Seoyeon | Takumi | Tatyana | Vicki | Vitoria | Zeina | Zhiyu | Aria
| Ayanda | Arlet | Hannah | Arthur | Daniel | Liam | Pedro | Kajal |
Hiujin | Laura | Elin | Ida | Suvi | Ola | Hala | Andres | Sergio | Remi
| Adriano | Thiago | Ruth | Stephen | Kazuha | Tomoko | Niamh | Sofie |
Lisa | Isabelle | Zayd | Danielle | Gregory | Burcu
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS SDK for C++
AWS SDK for Java V2
AWS SDK for Ruby V3
Voice
Description of the voice.
Contents
AdditionalLanguageCodes
Additional codes for languages available for the specified voice in addition to its default
language.
For example, the default language for Aditi is Indian English (en-IN) because it was first used
for that language. Since Aditi is bilingual and fluent in both Indian English and Hindi, this
parameter would show the code hi-IN.
Type: Array of strings
Valid Values: arb | cmn-CN | cy-GB | da-DK | de-DE | en-AU | en-GB | en-GB-WLS |
en-IN | en-US | es-ES | es-MX | es-US | fr-CA | fr-FR | is-IS |
it-IT | ja-JP | hi-IN | ko-KR | nb-NO | nl-NL | pl-PL | pt-BR | pt-PT |
ro-RO | ru-RU | sv-SE | tr-TR | en-NZ | en-ZA | ca-ES | de-AT | yue-CN |
ar-AE | fi-FI | en-IE | nl-BE | fr-BE
Required: No
Gender
Gender of the voice.
Type: String
Valid Values: Female | Male
Required: No
Id
Amazon Polly assigned voice ID. This is the ID that you specify when calling the
SynthesizeSpeech operation.
Type: String
Valid Values: Aditi | Amy | Astrid | Bianca | Brian | Camila | Carla |
Carmen | Celine | Chantal | Conchita | Cristiano | Dora | Emma | Enrique
| Ewa | Filiz | Gabrielle | Geraint | Giorgio | Gwyneth | Hans | Ines
| Ivy | Jacek | Jan | Joanna | Joey | Justin | Karl | Kendra | Kevin
| Kimberly | Lea | Liv | Lotte | Lucia | Lupe | Mads | Maja | Marlene
| Mathieu | Matthew | Maxim | Mia | Miguel | Mizuki | Naja | Nicole
| Olivia | Penelope | Raveena | Ricardo | Ruben | Russell | Salli |
Seoyeon | Takumi | Tatyana | Vicki | Vitoria | Zeina | Zhiyu | Aria
| Ayanda | Arlet | Hannah | Arthur | Daniel | Liam | Pedro | Kajal |
Hiujin | Laura | Elin | Ida | Suvi | Ola | Hala | Andres | Sergio | Remi
| Adriano | Thiago | Ruth | Stephen | Kazuha | Tomoko | Niamh | Sofie |
Lisa | Isabelle | Zayd | Danielle | Gregory | Burcu
Required: No
LanguageCode
Language code of the voice.
Type: String
Valid Values: arb | cmn-CN | cy-GB | da-DK | de-DE | en-AU | en-GB | en-GB-WLS |
en-IN | en-US | es-ES | es-MX | es-US | fr-CA | fr-FR | is-IS |
it-IT | ja-JP | hi-IN | ko-KR | nb-NO | nl-NL | pl-PL | pt-BR | pt-PT |
ro-RO | ru-RU | sv-SE | tr-TR | en-NZ | en-ZA | ca-ES | de-AT | yue-CN |
ar-AE | fi-FI | en-IE | nl-BE | fr-BE
Required: No
LanguageName
Human readable name of the language in English.
Type: String
Required: No
Name
Name of the voice (for example, Salli or Kendra). This provides a human-readable voice name
that you might display in your application.
Type: String
Required: No
SupportedEngines
Specifies which engines (standard, neural, long-form or generative) are supported by a
given voice.
Type: Array of strings
Valid Values: standard | neural | long-form | generative
Required: No
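The Voice fields above are enough to select voices by engine and language on the client. The following sketch filters a list of Voice-shaped dictionaries, such as the list a DescribeVoices call returns. The function name is hypothetical, and any sample entries used with it are illustrative rather than an authoritative statement of which engines a voice supports.

```python
def voices_for(voices, engine, language_code):
    """Return the Ids of voices that support engine and speak language_code.

    A voice speaks a language if it is either its default LanguageCode or
    one of its AdditionalLanguageCodes.
    """
    matches = []
    for voice in voices:
        languages = [voice.get("LanguageCode")]
        languages += voice.get("AdditionalLanguageCodes", [])
        if engine in voice.get("SupportedEngines", []) and language_code in languages:
            matches.append(voice["Id"])
    return matches
```

With a bilingual entry such as Aditi (LanguageCode en-IN, AdditionalLanguageCodes hi-IN), a query for hi-IN returns Aditi even though Hindi is not its default language.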
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the
following:
AWS SDK for C++
AWS SDK for Java V2
AWS SDK for Ruby V3
Document History for Amazon Polly
The following table describes important changes in each release of the Amazon Polly Developer
Guide. For notification about updates to this documentation, you can subscribe to an RSS feed.
Latest documentation update: August 27, 2024
Change Description Date
New voices added for NTTS Amazon Polly now provides
two new NTTS voices: Jitka
and Sabrina. See Neural
voices for a list of NTTS
voices.
August 27, 2024
New generative voice engine
added
Amazon Polly now offers
a generative voice engine
designed for longer content,
with three English voices in
a generative variant: Amy,
Matthew, and Ruth. See
Generative voices for more
information.
March 28, 2024
New voice added for NTTS Amazon Polly now provides
the NTTS Turkish voice Burcu.
See Neural voices for a list of
NTTS voices.
February 14, 2024
New long-form voice engine
added
Amazon Polly now offers
a long-form voice engine
designed for longer content,
with three en-US voices:
Danielle, Gregory, and Ruth.
See Long-form voices for
more information.
November 16, 2023
412
Amazon Polly Developer Guide
New voices added for NTTS Amazon Polly now provides
two new NTTS US English
voices: Danielle and Gregory.
See Neural voices for a list of
NTTS voices.
October 5, 2023
Amazon Polly for Windows The Amazon Polly Windows
Speech Application
Programming Interface (SAPI)
plugin will no longer be
supported.
September 26, 2023
Updated quota guidance for
Amazon Polly
Updated Amazon Polly quotas
guide. Added examples and
clarification of terms. Refer to
Quotas in Amazon Polly for
the updates.
August 17, 2023
New voice added for NTTS Amazon Polly now provides
the Gulf Arabic NTTS voice
Zayd. See Neural voices for a
list of NTTS voices.
August 16, 2023
New voice added for NTTS Amazon Polly now provides
the Belgian French NTTS voice
Isabelle. See Neural voices for
a list of NTTS voices.
August 1, 2023
New voice added for NTTS Amazon Polly now provides
the Belgian Dutch (Flemish)
NTTS voice Lisa. See Neural
voices for a list of NTTS
voices.
June 7, 2023
413
Amazon Polly Developer Guide
New voices added for NTTS Amazon Polly now provides
two new NTTS voices: Irish
English (Niamh), and Danish
(Sofie). See Neural voices for a
list of NTTS voices.
May 30, 2023
Updated the IAM guidance for
Amazon Polly
Updated guide to align
with the IAM best practices
. For more information, see
Security best practices in IAM.
April 19, 2023
WordPress update The Amazon Polly WordPress
plugin will no longer be
supported.
April 6, 2023
New Region added Amazon Polly is now available
in the Asia Pacific (Osaka)
AWS Region. This Region
supports neural TTS (NTTS).
For more information, see
Feature and Region Compatibi
lity for a list of regions that
support NTTS.
April 5, 2023
New voices added for NTTS Amazon Polly now provides
two new Japanese NTTS
voices: Kazuha and Tomoko.
See Neural voices for a list of
NTTS voices.
February 7, 2023
New voices added for NTTS Amazon Polly now provides
two new US English NTTS
voices: Stephen and Ruth.
See Neural voices for a list of
NTTS voices.
January 31, 2023
414
Amazon Polly Developer Guide
New voices added for NTTS Amazon Polly now provides
new NTTS voices for: Brazilian
Portuguese (Thiago), Castilian
Spanish (Sergio), French
(Rémi), Italian (Adriano), and
Mexican Spanish (Andrés).
See Neural voices for a list of
NTTS voices.
January 24, 2023
New voices added for NTTS Amazon Polly now provides
NTTS voices for Arabic (Hala)
and Polish (Ola). See Neural
voices for a list of NTTS
voices.
November 17, 2022
Release AWS PrivateLink
support
Amazon Polly now provides
AWS PrivateLink support. See
Using Amazon Polly with VPC
endpoints to learn more.
November 9, 2022
New voices and languages
added for NTTS
Amazon Polly now provides
NTTS voices for Finnish (Suvi),
Norwegian (Ida), and Swedish
(Elin). See Neural voices for a
list of NTTS voices.
November 8, 2022
New voice added for NTTS Amazon Polly now provides
the Dutch NTTS voice Laura.
See Neural voices for a list of
NTTS voices.
November 2, 2022
415
Amazon Polly Developer Guide
New Region added Amazon Polly is now available
in the Europe (Paris) AWS
Region. This Region supports
neural TTS (NTTS). For more
information, see Feature and
Region Compatibility for a list
of regions that support NTTS.
September 22, 2022
New voice and language
added for NTTS
Amazon Polly now provides
the Cantonese NTTS voice
Hiujin. See Neural voices for a
list of NTTS voices.
September 20, 2022
New Region added Amazon Polly is now available
in the Asia Pacific (Mumbai)
AWS Region. This Region
supports neural TTS (NTTS).
For more information, see
Feature and Region Compatibi
lity for a list of regions that
support NTTS.
September 1, 2022
New voice added for NTTS Amazon Polly now provides
the Mandarin voice Zhiyu as
an NTTS voice. See Neural
voices for a list of NTTS
voices.
August 23, 2022
New voice added for NTTS Amazon Polly now provides
the Hindi NTTS voice Kajal.
See Neural voices for a list of
NTTS voices.
July 27, 2022
416
Amazon Polly Developer Guide
New voices added for NTTS Amazon Polly now provides
NTTS voices for US Spanish
(Pedro), German (Daniel),
Canadian French (Liam), and
UK English (Arthur). See
Neural voices for a list of
NTTS voices.
June 28, 2022
New voice added for NTTS Amazon Polly now provides
the Portuguese (Brazilian)
voice Vitória as an NTTS voice.
See Neural voices for a list of
NTTS voices.
April 27, 2022
New voice added for NTTS Amazon Polly now provides
the Portuguese (European)
voice Inês as an NTTS voice.
See Neural voices for a list of
NTTS voices.
April 26, 2022
New voice and language
added for NTTS
Amazon Polly now provides
the German (Austrian)
language and the NTTS voice
Hannah. See Neural voices for
a list of NTTS voices.
April 19, 2022
New voices and language
added for NTTS
Amazon Polly now provides
the Spanish (Mexican) voice
Mia as an NTTS voice. A new
language, Catalan, was added
along with the NTTS voice
Arlet. See Neural voices for a
list of NTTS voices.
March 22, 2022
417
Amazon Polly Developer Guide
New voice added for NTTS Amazon Polly now provides
the Japanese voice Takumi
as an NTTS voice. See Neural
voices for a list of NTTS
voices.
December 6, 2021
New voice added for NTTS Amazon Polly now provides
the French voice Léa as an
NTTS voice. See Neural voices
for a list of NTTS voices.
November 18, 2021
New voices added for NTTS Amazon Polly now provides
the Italian voice Bianca and
the European Spanish voice
Lucia as NTTS voices. See
Neural voices for a list of
NTTS voices.
November 8, 2021
New voice added for NTTS Amazon Polly now provides
a new South African English
voice, Ayanda. The voice is
available as an NTTS voice
only. See Neural voices for a
list of NTTS voices.
September 1, 2021
New Region added Amazon Polly is now available
in the Africa (Cape Town) AWS
Region. This Region supports
neural TTS (NTTS). For more
information, see Feature and
Region Compatibility for a list
of regions that support NTTS.
September 1, 2021
418
Amazon Polly Developer Guide
New language and voice
added
Amazon Polly now supports
New Zealand English (en-
NZ). A new NTTS voice, Aria,
speaks New Zealand English
and a selection of Maori
words.
August 24, 2021
New feature Amazon Polly makes the
conversational speaking
style the default version
for the neural Matthew and
Joanna voices. We removed
references to the conversat
ional speaking style.
June 28, 2021
New voice added for NTTS Amazon Polly now provides
the German voice Vicki as an
NTTS voice.
June 15, 2021
New voice added A new female voice, Gabrielle,
has been added to the French
(Canadian) (fr-CA) locale. The
voice is high quality and only
available as an NTTS voice.
Like all neural voices, it is only
available in certain regions.
For a list of regions, see
Feature and region compatibi
lity.
June 1, 2021
New voice added for NTTS Amazon Polly now provides
the Korean voice Seoyeon as
an NTTS voice.
May 11, 2021
419
Amazon Polly Developer Guide
New Region added for NTTS Amazon Polly now supports
neural TTS (NTTS) in the
Canada (Central) AWS Region.
For more information, see
Feature and Region Compatibi
lity for NTTS.
March 17, 2021
New voice available for
newscaster style
In addition to the Matthew,
Joanna, and Lupe voices for
the Newscaster speaking
style, Amazon Polly now
provides an additional option
for this speaking style. Using
the neural engine, you can
use the Amy voice in British
English for the Newscaster
style. For more information,
see NTTS Speaking Styles.
November 10, 2020
New Regions added for NTTS
In addition to the existing Regions for NTTS (us-east-1, us-west-2, eu-west-1, and ap-southeast-2), neural voices are now supported in four additional Regions: ap-northeast-1 (Tokyo), ap-southeast-1 (Singapore), eu-central-1 (Frankfurt), and eu-west-2 (London). For more information, see Feature and Region Compatibility for NTTS.
September 3, 2020
New voice added
In addition to child voices Ivy and Justin, a new male child voice, Kevin, has been added to American English (en-US). This new voice is very high quality and is only available as an NTTS voice. Like all neural voices, it is only supported in four Regions: us-east-1 (N. Virginia), us-west-2 (Oregon), eu-west-1 (Ireland), and ap-southeast-2 (Sydney). For more information, see NTTS Voices.
June 16, 2020
New voice available for newscaster style
In addition to the Matthew and Joanna voices for the Newscaster speaking style, Amazon Polly now provides an additional option for this speaking style. Using the neural engine, you can use the Lupe voice in Spanish (American) for the Newscaster style. For more information, see NTTS Speaking Styles.
April 16, 2020
New feature
In addition to the Newscaster speaking style, Amazon Polly now provides a second NTTS speaking style to help you synthesize even better text-to-speech passages. The Conversational style uses the neural system to generate speech in a friendlier, more expressive conversational style that suits many use cases. For more information, see NTTS Speaking Styles.
November 25, 2019
New voices added
Two new voices added: Camila (female, Portuguese-Brazil) and Lupe (female, Spanish-US).
October 23, 2019
New feature added
Addition of the Amazon Polly for Windows plugin, which incorporates the full range of Amazon Polly voices into Windows SAPI-compliant applications.
September 26, 2019
Major new feature
In addition to the standard text-to-speech (TTS) voices supported by Amazon Polly since its launch, Amazon Polly now provides an improved Neural TTS (NTTS) system that can produce even higher quality voices, giving you the most natural and human-like text-to-speech voices possible. For more information, see Neural Text-to-Speech.
July 30, 2019
New voices added
New voices added: Lucia (female, Spanish) and Bianca (female, Italian).
August 2, 2018
New language added
New language added: Mexican Spanish (es-MX). This language uses the female voice of Mia.
August 2, 2018
New language added
New language added: Hindi (hi-IN). This language uses the female voice of Aditi, which is also used for Indian English, making Aditi Amazon Polly's first bilingual voice.
August 2, 2018
New feature added
Addition of Speech synthesis of long text passages (up to 100,000 billed characters).
July 17, 2018
New SSML feature added
Addition of Maximum Duration for Synthesized Speech.
July 17, 2018
New voice added
New voice added: Léa (female, French).
June 5, 2018
Region expansion
Expansion of the Amazon Polly service to all commercial Regions.
June 4, 2018
New language added
New language added: Korean (ko-KR).
June 4, 2018
Expanded feature
Expansion of the Amazon Polly WordPress Plugin feature, including the addition of Amazon Translate capabilities.
June 4, 2018
New voices added
Two new voices added: Aditi (female, Indian English) and Seoyeon (female, Korean).
November 15, 2017
New feature
Addition of the new Speech Marks feature, as well as an expansion of SSML capabilities.
April 19, 2017
New guide
This is the first release of the Amazon Polly Developer Guide.
November 30, 2016
AWS Glossary
For the latest AWS terminology, see the AWS glossary in the AWS Glossary Reference.