A Cluster-Based Speaker Diarization System Combined with Dimensionality Reduction Techniques
Keywords:
Speaker diarization, statistical model, segmentation, clustering, KL divergence.
Abstract
In this article, we introduce an unsupervised speaker diarization system for speaker detection in noisy environments. A statistical mixture-based model is used to model each input segment and to cluster the features obtained from Mel-frequency cepstral coefficients (MFCC), so that the speaker of the segment can be identified effectively. KL divergence is used to identify a speaker based on the maximum likelihood estimate of the speaker.
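As a rough illustration of how KL divergence can be combined with maximum likelihood estimates to match a segment to a speaker, the sketch below fits a single diagonal-covariance Gaussian to each set of feature frames and assigns the segment to the speaker whose model has the smallest symmetric KL divergence. This is a simplification under stated assumptions, not the system described in the article: the article uses a mixture-based model, whereas a single Gaussian is used here because its KL divergence has a closed form. The function names, the speaker labels `spk_a` and `spk_b`, and the synthetic 13-dimensional frames standing in for MFCC features are all hypothetical.

```python
import numpy as np


def fit_diag_gaussian(frames, var_floor=1e-6):
    """Maximum likelihood mean and per-dimension variance of an
    (n_frames, n_dims) array. The small floor avoids zero variances."""
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + var_floor
    return mean, var


def kl_diag_gaussians(p, q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians,
    each given as a (mean, var) pair."""
    mean_p, var_p = p
    mean_q, var_q = q
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0
    )


def symmetric_kl(p, q):
    """Symmetrised divergence, since KL(p || q) != KL(q || p) in general."""
    return kl_diag_gaussians(p, q) + kl_diag_gaussians(q, p)


def closest_speaker(segment_frames, speaker_models):
    """Fit a Gaussian to the segment and return the label of the speaker
    model with the smallest symmetric KL divergence, plus all distances."""
    segment_model = fit_diag_gaussian(segment_frames)
    distances = {
        name: symmetric_kl(segment_model, model)
        for name, model in speaker_models.items()
    }
    return min(distances, key=distances.get), distances


# Synthetic frames standing in for MFCC vectors (assumption: 13 coefficients).
rng = np.random.default_rng(0)
n_dims = 13
speaker_models = {
    "spk_a": fit_diag_gaussian(rng.normal(0.0, 1.0, size=(500, n_dims))),
    "spk_b": fit_diag_gaussian(rng.normal(3.0, 1.0, size=(500, n_dims))),
}

# A new segment drawn from the same distribution as spk_b's training frames.
segment = rng.normal(3.0, 1.0, size=(200, n_dims))
label, distances = closest_speaker(segment, speaker_models)
print(label, distances)
```

For Gaussian mixtures the KL divergence has no closed form, so a mixture-based system would need an approximation (for example, Monte Carlo sampling) in place of `kl_diag_gaussians`.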
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.