Automatic Diphone Extraction for an Italian Text-to-Speech Synthesis System

Angelini, Bianca; Falavigna, Giuseppe Daniele; Omologo, Maurizio
1997-01-01

Abstract

This paper describes a system for the automatic extraction of diphone units from given speech utterances. The method is based on automatic phonetic segmentation followed by rule-driven detection of diphone boundaries. The phonetic segmenter, developed at IRST, was trained and tested in both speaker-independent and speaker-dependent mode. A rule formalism involving acoustic parameters and arithmetical and logical operators was defined to express the acoustic-phonetic knowledge acquired during previous work on manual diphone segmentation. A specialized rule-parsing tool was designed that processes a given sequence of automatically derived phone boundaries using a corresponding sequence of predefined acoustic parameters. Several sets of rules were developed, covering both general principles and specific details of the diphone database of 'Eloquens', the CSELT text-to-speech synthesis system for Italian. Accuracy was evaluated by comparing manual and automatic segmentations of utterances from a female speaker: nearly 95% of the boundaries were correctly positioned, given a tolerance of 20 ms.
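The abstract does not list the actual rules, acoustic parameters, or frame rate, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. First, a hypothetical rule that combines a logical condition and arithmetic over a per-frame acoustic parameter (here an assumed "spectral change" track with an assumed 10 ms frame shift) to place a diphone boundary inside one automatically segmented phone. Second, the tolerance-based accuracy measure: the percentage of automatic boundaries lying within 20 ms of the corresponding manual boundary, assuming a one-to-one pairing of boundaries. The names `place_boundary`, `boundary_accuracy`, `FRAME_MS`, and the fallback-to-midpoint behaviour are illustrative inventions, not the authors' formalism.

```python
import math
from typing import Sequence

FRAME_MS = 10.0  # hypothetical frame shift of the acoustic parameter track


def place_boundary(start_ms: float, end_ms: float,
                   spectral_change: Sequence[float],
                   threshold: float) -> float:
    """Hypothetical rule: inside the phone segment [start_ms, end_ms),
    return the time of the frame with the smallest spectral change if that
    value is below `threshold`; otherwise fall back to the segment midpoint."""
    # Frame i is assumed to sit at time i * FRAME_MS.
    first = math.ceil(start_ms / FRAME_MS)
    last = min(math.ceil(end_ms / FRAME_MS), len(spectral_change))
    if first >= last:  # no frame falls inside the segment
        return (start_ms + end_ms) / 2.0
    best = min(range(first, last), key=lambda i: spectral_change[i])
    if spectral_change[best] < threshold:  # logical condition of the rule
        return best * FRAME_MS             # arithmetic: frame index -> ms
    return (start_ms + end_ms) / 2.0


def boundary_accuracy(manual_ms: Sequence[float],
                      automatic_ms: Sequence[float],
                      tolerance_ms: float = 20.0) -> float:
    """Percentage of automatic boundaries within `tolerance_ms` of the
    corresponding manual boundary (one-to-one pairing assumed)."""
    if len(manual_ms) != len(automatic_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not manual_ms:
        return 0.0
    hits = sum(abs(m - a) <= tolerance_ms
               for m, a in zip(manual_ms, automatic_ms))
    return 100.0 * hits / len(manual_ms)


if __name__ == "__main__":
    track = [5.0, 4.0, 3.0, 1.0, 2.0, 6.0, 7.0]
    print(place_boundary(0.0, 50.0, track, threshold=2.5))  # 30.0
    print(boundary_accuracy([100, 250, 400], [110, 280, 395]))
```

In the usage lines, the phone spans 0 to 50 ms, the lowest spectral change (1.0) falls on the frame at 30 ms and passes the threshold, so the boundary is placed there. In the accuracy call, two of the three automatic boundaries fall within 20 ms of their manual counterparts (offsets of 10 ms and 5 ms; the 30 ms offset misses), giving about 66.7%. A real rule set would condition on the phone classes on either side of the boundary, which this sketch omits.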

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/1398