Research on new acoustic parameters has continued. The use of broad-band and temporal acoustic cues in Hidden Markov Models (HMMs) has been investigated. Simple parameters, such as the ratio between low- and high-frequency energies and the distance of the current frame from the beginning of an energy peak or dip, increased phoneme recognition accuracy in a classical experiment on the TIMIT corpus. Using simulated annealing to determine better HMM topologies did not yield a significant improvement in phoneme recognition, whereas the same algorithm proved effective for determining context clusters, which were used to introduce a new type of Context-Dependent (CD) phoneme model. Interesting results were also obtained by merging the Gaussian mixtures of the CD models for the allophones of each phoneme into a single Context-Independent (CI) model per phoneme. With these CI models, close to 70% phoneme correctness and 65% phoneme accuracy were obtained on the TIMIT corpus without bigram probabilities or second-derivative features. The details of this research are described in a paper in press in Computer Speech and Language.

Interesting results on the same corpus have also been obtained with Parallel Synchronous HMMs (PHMMs). These models consist of master HMMs (the ones used in the research described above) and slave HMMs whose transition probabilities and mixture coefficients are modulated by the probability of being in a given state of the corresponding master. Modeling phoneme durations in the slave HMMs improved phoneme recognition accuracy.

Methods were also developed for acoustic channel normalization and for speaker adaptation, the latter using statistical parameter re-estimation based on samples stored dynamically in cache memories.
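The two broad-band and temporal cues mentioned above, the low/high-frequency energy ratio and the distance of the current frame from the start of an energy peak or dip, can be illustrated with a minimal sketch. It rests on assumptions the report does not state: each frame is given as a power spectrum with bins spaced evenly from 0 Hz to half the sampling rate, the band split sits at a hypothetical 1 kHz (`split_hz`), the ratio is expressed in dB, and a peak (or dip) is taken to begin at the frame where frame energy starts to rise (or fall).

```python
import math


def band_energy_ratio(frame_spectrum, sample_rate, split_hz=1000.0, eps=1e-10):
    """Low- to high-frequency energy ratio, in dB, for one frame.

    frame_spectrum: power values for bins spaced evenly from 0 Hz to
    sample_rate / 2. split_hz is a hypothetical boundary for illustration.
    """
    n_bins = len(frame_spectrum)
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)
    low = sum(p for k, p in enumerate(frame_spectrum) if k * bin_hz < split_hz)
    high = sum(p for k, p in enumerate(frame_spectrum) if k * bin_hz >= split_hz)
    # eps avoids division by zero and log of zero on silent frames.
    return 10.0 * math.log10((low + eps) / (high + eps))


def distance_from_energy_event(energy):
    """Frames elapsed since the current energy rise (peak) or fall (dip) began.

    Assumption: an event starts wherever the sign of the energy slope changes;
    flat stretches extend the current event. Returns (distances, in_peak).
    """
    distances, in_peak = [], []
    start, direction = 0, 0
    prev = energy[0] if energy else 0.0
    for t, e in enumerate(energy):
        slope = (e > prev) - (e < prev)  # +1 rising, -1 falling, 0 flat
        if slope != 0 and slope != direction:
            start, direction = t, slope  # a new rise or fall begins here
        distances.append(t - start)
        in_peak.append(direction > 0)
        prev = e
    return distances, in_peak
```

For example, `distance_from_energy_event([1, 2, 3, 2, 1, 1, 2])` returns distances `[0, 0, 1, 0, 1, 2, 0]`, restarting the count each time the energy begins to rise or fall. In an HMM front end, values like these would simply be appended to each frame's feature vector.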
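Merging the Gaussian mixtures of a phoneme's CD allophone models into one CI model can be pictured as pooling all of their components into a single, larger mixture. The report does not say how the pooled components are weighted; the sketch below assumes, for illustration only, that each allophone's mixture weights are scaled by that allophone's relative training frequency, so the merged weights still sum to one. The `Gaussian` record, the allophone names and the counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gaussian:
    weight: float
    mean: List[float]
    var: List[float]  # diagonal covariance


def merge_allophone_mixtures(allophones: Dict[str, List[Gaussian]],
                             counts: Dict[str, int]) -> List[Gaussian]:
    """Pool CD allophone mixtures (for one state) into one CI mixture.

    Assumption: each allophone's components are scaled by its relative
    training frequency, so the pooled weights still sum to one.
    """
    total = sum(counts[name] for name in allophones)
    merged = []
    for name, mixture in allophones.items():
        prior = counts[name] / total
        for g in mixture:
            merged.append(Gaussian(g.weight * prior, list(g.mean), list(g.var)))
    return merged


# Hypothetical allophones of /t/ in two contexts.
allophones = {
    "t_a": [Gaussian(0.5, [0.0], [1.0]), Gaussian(0.5, [1.0], [1.0])],
    "t_i": [Gaussian(1.0, [2.0], [1.0])],
}
ci_mixture = merge_allophone_mixtures(allophones, {"t_a": 30, "t_i": 10})
```

With the two allophones observed 30 and 10 times, the pooled weights become 0.375, 0.375 and 0.25. Keeping every component, rather than collapsing them, preserves the context-dependent detail inside a single CI model.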
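The cache-based speaker adaptation described above can be sketched for a single Gaussian mean. The report only says that parameters are re-estimated from samples stored dynamically in cache memories; the details here are assumptions: the cache is a first-in, first-out buffer of fixed capacity (`cache_size`), and the adapted mean is a MAP-style interpolation between the speaker-independent mean and the cache average, controlled by a hypothetical prior weight `tau`.

```python
from collections import deque
from typing import Deque, List


class CachedMeanAdapter:
    """Adapt one Gaussian mean from samples kept in a fixed-size cache.

    Assumptions (not from the report): the cache is FIFO with capacity
    cache_size, and the mean is re-estimated by MAP interpolation between
    the speaker-independent mean and the cache average, with weight tau.
    """

    def __init__(self, prior_mean: List[float], cache_size: int = 100,
                 tau: float = 10.0):
        self.prior_mean = list(prior_mean)
        self.cache: Deque[List[float]] = deque(maxlen=cache_size)
        self.tau = tau

    def add_sample(self, frame: List[float]) -> None:
        # Once the cache is full, the oldest sample is evicted automatically.
        self.cache.append(list(frame))

    def adapted_mean(self) -> List[float]:
        n = len(self.cache)
        if n == 0:
            return list(self.prior_mean)
        dims = len(self.prior_mean)
        cache_avg = [sum(f[d] for f in self.cache) / n for d in range(dims)]
        # Few cached samples keep the estimate near the prior; many pull it
        # toward the speaker's data.
        return [(self.tau * m + n * x) / (self.tau + n)
                for m, x in zip(self.prior_mean, cache_avg)]
```

Using `deque(maxlen=...)` makes eviction of old samples implicit, so the estimate tracks the most recent speech from the current speaker; `tau` sets how many cached samples are needed before the adapted mean moves substantially away from the prior.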
R. AbuHosan, P. Boucher, F. Brugnara (IRST), R. DeMori, M. Galler, M. Snow