Publications
A collection of my research work.
Localizing Speech Deepfakes Beyond Transitions via Segment-Aware Learning
Yuchen Mao, Wen Huang, Yanmin Qian†
ICASSP 2026
A segment-aware framework introducing positional labeling and cross-segment mixing to mitigate boundary over-reliance and enhance the localization of manipulated content in partial deepfake audio.
A Data-Centric Approach to Generalizable Speech Deepfake Detection
Wen Huang, Yuchen Mao, Yanmin Qian†
Preprint
A data-centric investigation establishing scaling laws and proposing a Diversity-Optimized Sampling Strategy (DOSS), achieving state-of-the-art generalization with superior data efficiency.
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian†
ACL 2025
A 3,000+ hour multilingual dataset with 40 diverse synthesis methods, designed to enable robust speech deepfake detection.
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation
Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian†
ICASSP 2025
A latent space algorithm integrating refinement and augmentation to enhance the generalization of speech deepfake detection.
Unified Audio Event Detection
Yidi Jiang, Ruijie Tao†, Wen Huang, Qian Chen, Wen Wang
ICASSP 2025
A novel framework unifying sound event detection and speaker diarization for the comprehensive analysis of both speech and non-speech events.
Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning
Bing Han, Wen Huang, Zhengyang Chen, Anbai Jiang, Pingyi Fan, Cheng Lu, Zhiqiang Lv, Jia Liu, Wei-Qiang Zhang, Yanmin Qian
ICASSP 2025
A data-efficient, low-complexity acoustic scene classification system utilizing the Rep-Mobile architecture and progressive pruning, achieving 1st place in the DCASE2024 Challenge.
Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters
Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian†
ICASSP 2024
A multi-level domain adapter module for speaker verification that effectively mitigates domain shift, improving cross-domain performance with minimal parameter overhead.
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian†
ISCSLP 2024
A dual-level contrastive learning method for unsupervised domain adaptation in speaker verification, achieving state-of-the-art performance across diverse cross-domain settings.
Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection
Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu, Wei-Qiang Zhang, Pingyi Fan, Jia Liu, Yanmin Qian†
ICASSP 2024
A robust anomaly sound detection framework leveraging large-scale pre-trained speech models and condition-based self-supervision, achieving 2nd place in DCASE 2023 Task 2.
Semi-Supervised Acoustic Scene Classification with Test-Time Adaptation
Wen Huang, Anbai Jiang, Bing Han, Xinhu Zheng, Yihong Qiu, Wenxi Chen, Yuzhe Liang, Pingyi Fan, Wei-Qiang Zhang, Cheng Lu, Xie Chen, Jia Liu, Yanmin Qian†
ICMEW (Competition Track) 2024
An acoustic scene classification system addressing label scarcity and domain mismatch through semi-supervised learning and test-time adaptation, achieving 3rd place in the ICME 2024 Grand Challenge.
Improving DINO-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training
Bing Han, Wen Huang, Zhengyang Chen, Yanmin Qian†
ICASSPW (Self-supervision in Audio, Speech and Beyond) 2023
A cluster-aware training strategy for DINO-based self-supervised speaker verification, achieving state-of-the-art performance and outperforming fully supervised systems with only 10% labeled data.