da/sec scientific talk on Biometrics

Topic: Deep Learning in Speaker Recognition

by Hong Hao
FBI D14/0.13, July 02, 2015 (Thursday), 12.00 noon

Keywords — speaker recognition, deep learning, RBM, PLDA

Abstract

A novel supervised dimensionality reduction technique called PLDA-RBM was introduced in [1]. The model in [1] performs comparably to PLDA and outperforms LDA. In this work, instead of serving as a dimensionality reduction technique, PLDA-RBM is used as a sparse coding method combined with a cosine classifier. This improves on the PLDA baseline system by about 0.5% for both female and male speakers on the MOBIO database. In addition, the HTER on the female evaluation set of the MOBIO database is the best result among the systems that participated in the 2013 speaker recognition evaluation in mobile environment [2]. Moreover, a dedicated multi-layer PLDA-RBM, which extracts additional information from the channel hidden units, improves on the performance of the single-layer PLDA-RBM.
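
For readers unfamiliar with the two evaluation ingredients named above, the following is a minimal sketch of cosine scoring and of the half total error rate (HTER). It is not the speaker's implementation: the function names, the toy scores, and the fixed threshold are assumptions made for illustration, and the PLDA-RBM coding step itself is not shown. In evaluations of this kind the threshold is usually chosen on a development set and then applied unchanged to the evaluation set.

```python
import numpy as np


def cosine_score(enrol, test):
    """Cosine similarity between an enrolment vector and a test vector."""
    enrol = np.asarray(enrol, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(np.dot(enrol, test) / (np.linalg.norm(enrol) * np.linalg.norm(test)))


def hter(genuine_scores, impostor_scores, threshold):
    """Half total error rate: mean of false acceptance and false rejection rates.

    A trial is accepted when its score is >= threshold (an assumed convention).
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    far = np.mean(impostor >= threshold)  # impostor trials wrongly accepted
    frr = np.mean(genuine < threshold)    # genuine trials wrongly rejected
    return float((far + frr) / 2.0)
```

Given two code vectors `a` and `b`, `cosine_score(a, b)` yields the trial score, and `hter(genuine, impostor, threshold)` summarises the error of a set of scored trials at one operating point.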

[1] Stafylakis, Themos, Patrick Kenny, Mohammed Senoussaoui, and Pierre Dumouchel. "PLDA using Gaussian Restricted Boltzmann Machines with application to Speaker Verification." In INTERSPEECH, 2012.
Available at: http://www.crim.ca/perso/patrick.kenny/Stafyl_Interspeech12.pdf

[2] Khoury et al. "The 2013 speaker recognition evaluation in mobile environment." In International Conference on Biometrics (ICB), 2013.
Available at: http://www.inesc-id.pt/pt/indicadores/Ficheiros/9351.pdf