Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis

Slava Shechtman (1), Raul Fernandez (2) and David Haws (2)
(1) IBM Haifa Research Lab, Haifa, Israel
(2) IBM TJ Watson Research Lab, Yorktown Heights, NY, USA

Accepted to SLT 2021 (Full paper)

Audio Samples


Transplant condition: NO labeled word-emphasis training data available for the target synthesis voice
Systems: Base (NoEmph), Base (Sup), PC-Unsup, PC-Hybrid
Emphasized word:
WALKED
COMPLEX
ONLY
AGAIN
MILDER
LITERALLY
FIVE
THOUGHT
ONLY
PROMPTLY
EXIT
BEGINNER'S
PATTERN
ANYONE
MUCH


Matched condition: WITH labeled word-emphasis training data available for the target synthesis voice
Systems: Base (NoEmph), Base (Sup), PC-Unsup, PC-Hybrid
Emphasized word:
WALKED
COMPLEX
ONLY
AGAIN
MILDER
LITERALLY
FIVE
THOUGHT
ONLY
PROMPTLY
EXIT
BEGINNER'S
PATTERN
ANYONE
MUCH

Sample screen from the MOS test experiment

[Image: LT screen]