NaijaFaceVoice: A Large-Scale Deep Learning Model and Database of Nigerian Faces and Voices
dc.creator | Akinrinmade, A., Adetiba, E., Badejo, J. A., Oshin, Oluwadamilola | |
dc.date | 2023-06-05 | |
dc.date.accessioned | 2025-04-15T12:20:06Z | |
dc.description | The fusion of two or more traits in multimodal biometrics generally improves recognition accuracy. The question is, by how much? Deep learning models generalize better and reach higher accuracy when trained on large-scale databases, so a large-scale multimodal database is beneficial. However, publicly available large-scale multimodal databases are scarce, especially for faces and voices. Moreover, because a face image is 2-D while a voice signal is 1-D, it is unclear how best to fuse the two. As a result, fusion has hitherto yielded only marginal improvements. This study proposes a semi-automated curation algorithm that extracts the faces and voices of target individuals from videos to create a large-scale face-voice database. The curation technique involves observing the positions in each video at which the target subject’s face and voice occur. These positions are supplied to a MATLAB2017b script that detects the faces in the observed regions, then crops, resizes, and auto-labels them and writes them to disk. A second MATLAB2017b script extracts the audio content within the observed regions, auto-labels the voice segments, and writes them to disk. The resulting database, named NaijaFaceVoice, consists of 2,656 subjects with over 2 million faces and 195 hours of utterances. The database was used to develop a large-scale recognition system based on Convolutional Neural Networks. Robust fusion methods incorporating the proposed Spectrogram-Voting concept significantly improved performance, achieving a record equal error rate of 0.0003519%, an improvement by a factor of over 450. | |
dc.format | application/pdf | |
dc.identifier | http://eprints.covenantuniversity.edu.ng/18410/ | |
dc.identifier.uri | https://repository.covenantuniversity.edu.ng/handle/123456789/49024 | |
dc.language | en | |
dc.publisher | IEEE | |
dc.subject | TK Electrical engineering. Electronics Nuclear engineering | |
dc.title | NaijaFaceVoice: A Large-Scale Deep Learning Model and Database of Nigerian Faces and Voices | |
dc.type | Article |
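The voice-extraction step described in dc.description can be illustrated with a minimal Python sketch. It is not the authors' MATLAB script. It assumes the audio track has already been exported to an uncompressed WAV file, that the observed regions are given as (start, end) times in seconds, and that segments are labelled `<subject_id>_<index>.wav`. The function name, its parameters, and the naming scheme are hypothetical.

```python
import wave
from pathlib import Path


def extract_voice_segments(audio_path, segments, subject_id, out_dir):
    """Cut observed (start_s, end_s) regions out of a WAV file.

    Hypothetical helper: each kept region is written to
    <out_dir>/<subject_id>_<index>.wav, and the written paths are returned.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(audio_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for index, (start_s, end_s) in enumerate(segments, start=1):
            # Convert seconds to frame offsets, clamped to the file length.
            first = max(0, int(round(start_s * rate)))
            last = min(total, int(round(end_s * rate)))
            if last <= first:
                continue  # skip empty or out-of-range regions
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = out_dir / f"{subject_id}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                dst.setparams(params)    # same channels, sample width, rate
                dst.writeframes(frames)  # header frame count is fixed on close
            written.append(out_path)
    return written
```

Calling `extract_voice_segments("clip.wav", [(12.0, 15.5)], "S0001", "voices")` would, under these assumptions, write one labelled segment for a hypothetical subject `S0001`. A face-extraction counterpart would follow the same pattern, with a face detector applied to the frames inside each observed region.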
Files
Original bundle
- Name: NaijaFaceVoice.pdf
- Size: 227.11 KB
- Format: Adobe Portable Document Format