A Cluster-Based Speaker Diarization System Combined with Dimensionality Reduction Techniques

Authors

  • D. Indu, Research Scholar, Department of Computer Science and Engineering, GITAM School of Technology, GITAM (Deemed to be University), Visakhapatnam, Andhra Pradesh, India
  • Y. Srinivas, Professor, Department of Computer Science and Engineering, GITAM School of Technology, GITAM (Deemed to be University), Visakhapatnam, Andhra Pradesh, India

Keywords:

Speaker diarization, statistical model, segmentation, clustering, KL divergence.

Abstract

In this article, we introduce an unsupervised speaker diarization system for speaker detection in noisy environments. A statistical mixture-based model represents each input segment and clusters the features extracted as Mel-frequency cepstral coefficients (MFCC), so that the speaker of the segment can be identified effectively. KL divergence is then used to identify a speaker based on the maximum likelihood estimate of that speaker.
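As a rough illustration of how KL divergence can compare a segment against speaker models, the sketch below fits a maximum likelihood diagonal Gaussian to a segment's feature frames and picks the speaker model with the smallest symmetric KL divergence. It is a simplification under stated assumptions, not the authors' method: the abstract does not specify the mixture family, so a single diagonal Gaussian stands in for it, the feature frames stand in for MFCC vectors, and the names `fit_diag_gaussian`, `kl_diag`, `symmetric_kl`, and `closest_speaker` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (means, variances), one value per feature dimension.
Gaussian = Tuple[List[float], List[float]]


def fit_diag_gaussian(frames: Sequence[Sequence[float]],
                      var_floor: float = 1e-6) -> Gaussian:
    """Maximum likelihood mean and variance per dimension.

    `frames` stands in for the MFCC vectors of one segment (assumption).
    Variances are floored so the KL terms below stay finite.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two diagonal-covariance Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrised divergence, so the comparison does not depend on order."""
    return kl_diag(p, q) + kl_diag(q, p)


def closest_speaker(segment: Sequence[Sequence[float]],
                    speakers: Dict[str, Gaussian]) -> str:
    """Return the name of the speaker model closest to the segment."""
    segment_model = fit_diag_gaussian(segment)
    return min(speakers, key=lambda name: symmetric_kl(segment_model, speakers[name]))
```

A call such as `closest_speaker(segment_frames, {"spk1": model1, "spk2": model2})` returns the key of the nearest model. A fuller system would replace the single Gaussian with a mixture, for which the KL divergence has no closed form and must be approximated.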

Published

02.02.2024

How to Cite

Indu, D., & Srinivas, Y. (2024). A Cluster-Based Speaker Diarization System Combined with Dimensionality Reduction Techniques. International Journal of Intelligent Systems and Applications in Engineering, 12(14s), 125–132. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4644

Section

Research Article