Publications

A collection of my research work.

Localizing Speech Deepfakes Beyond Transitions via Segment-Aware Learning

Yuchen Mao, Wen Huang, Yanmin Qian

ICASSP 2026

A segment-aware framework introducing positional labeling and cross-segment mixing to mitigate boundary over-reliance and enhance the localization of manipulated content in partial deepfake audio.
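
A minimal sketch of the cross-segment mixing idea, splicing a fake segment into a real utterance and deriving frame-level labels; the frame length and segment sizes here are illustrative, not the paper's exact recipe:

```python
import numpy as np

def cross_segment_mix(real, fake, frame_len=160, rng=None):
    """Splice a random fake segment into a real utterance and return
    the mixed waveform plus frame-level real/fake labels."""
    rng = rng or np.random.default_rng()
    n = min(len(real), len(fake))
    real, fake = real[:n], fake[:n]
    # Replace a random span of the real audio with fake audio.
    seg_len = int(rng.integers(n // 8, n // 2))
    start = int(rng.integers(0, n - seg_len))
    mixed = real.copy()
    mixed[start:start + seg_len] = fake[start:start + seg_len]
    # Frame-level labels: 1 for frames overlapping the fake span.
    labels = np.zeros(n // frame_len, dtype=np.int64)
    labels[start // frame_len:(start + seg_len - 1) // frame_len + 1] = 1
    return mixed, labels
```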

A Data-Centric Approach to Generalizable Speech Deepfake Detection

Wen Huang, Yuchen Mao, Yanmin Qian

Preprint

A data-centric investigation establishing scaling laws and proposing a Diversity-Optimized Sampling Strategy (DOSS), achieving state-of-the-art generalization with superior data efficiency.
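
DOSS itself is defined in the paper; as a generic illustration of diversity-driven data selection, a greedy farthest-point sampler over utterance embeddings might look like this (the function name and selection criterion are my own, not the paper's):

```python
import numpy as np

def diversity_sample(embeddings, budget, rng=None):
    """Greedily pick the item farthest from the current selection,
    so the chosen subset covers the embedding space broadly."""
    rng = rng or np.random.default_rng()
    chosen = [int(rng.integers(len(embeddings)))]  # random seed point
    dist = np.linalg.norm(embeddings - embeddings[chosen[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(dist.argmax())                   # farthest from the set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen
```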

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods

Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian

ACL 2025

A 3,000+ hour multilingual dataset with 40 diverse synthesis methods, designed to enable robust speech deepfake detection.

From Sharpness to Better Generalization for Speech Deepfake Detection

Wen Huang, Xuechen Liu, Xin Wang, Junichi Yamagishi, Yanmin Qian

Interspeech 2025

An empirical investigation validating loss-landscape sharpness as an indicator of generalization in speech deepfake detection.
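
As a rough sketch of how sharpness can be probed, here is a SAM-style worst-case weight perturbation in PyTorch; the paper's exact measure and protocol may differ:

```python
import torch

def sharpness(model, loss_fn, x, y, rho=0.05):
    """Estimate sharpness as the loss increase under a worst-case
    weight perturbation of norm rho (a SAM-style ascent step)."""
    loss = loss_fn(model(x), y)
    loss.backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    scale = rho / (torch.stack([g.norm() for g in grads]).norm() + 1e-12)
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.add_(p.grad * scale)      # climb to the nearby worst case
        perturbed = loss_fn(model(x), y)
        for p in model.parameters():
            if p.grad is not None:
                p.sub_(p.grad * scale)      # restore the original weights
    model.zero_grad()
    return (perturbed - loss.detach()).item()
```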

Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation

Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian

ICASSP 2025

A latent space algorithm integrating refinement and augmentation to enhance the generalization of speech deepfake detection.
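
The paper's refinement and augmentation operate on the latent space; as one generic example of latent-space augmentation (mixup on embeddings, not necessarily the paper's operator):

```python
import torch

def latent_mixup(z, y, alpha=0.2):
    """Convexly mix embeddings and (soft/one-hot) labels to densify
    and smooth the latent space seen by the classifier head."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(z.size(0))
    return lam * z + (1 - lam) * z[perm], lam * y + (1 - lam) * y[perm]
```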

Unified Audio Event Detection

Yidi Jiang, Ruijie Tao, Wen Huang, Qian Chen, Wen Wang

ICASSP 2025

A novel framework unifying sound event detection and speaker diarization for the comprehensive analysis of both speech and non-speech events.

Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning

Bing Han, Wen Huang, Zhengyang Chen, Anbai Jiang, Pingyi Fan, Cheng Lu, Zhiqiang Lv, Jia Liu, Wei-Qiang Zhang, Yanmin Qian

ICASSP 2025

A data-efficient, low-complexity acoustic scene classification system utilizing the Rep-Mobile architecture and progressive pruning, achieving 1st place in the DCASE 2024 Challenge.
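
The Rep-Mobile architecture and the distillation schedule are described in the paper; the "progressive pruning" part, in generic PyTorch form, amounts to an iterative prune-then-recover loop (a sketch, with an assumed finetune callback):

```python
import torch.nn.utils.prune as prune

def progressive_prune(model, steps=5, amount=0.2, finetune=None):
    """Prune gradually: remove a fraction of the smallest weights,
    recover accuracy by fine-tuning, then repeat."""
    for _ in range(steps):
        for module in model.modules():
            # Prune conv/linear weight matrices, skip biases and norms.
            if getattr(module, "weight", None) is not None and module.weight.dim() > 1:
                prune.l1_unstructured(module, name="weight", amount=amount)
        if finetune is not None:
            finetune(model)  # e.g. a few epochs with a distillation loss
```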

Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters

Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian

ICASSP 2024

A multi-level domain adapter module for speaker verification that effectively mitigates domain shift, improving cross-domain performance with minimal parameter overhead.
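
The multi-level design inserts small adapters at several depths of the backbone; the basic residual bottleneck unit, as a sketch (dimensions are illustrative):

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: trains only a few parameters per
    domain while leaving the pre-trained backbone untouched."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual keeps backbone behavior
```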

Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification

Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian

ISCSLP 2024

A dual-level contrastive learning method for unsupervised domain adaptation in speaker verification, achieving state-of-the-art performance across diverse cross-domain settings.
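
A sketch of the dual-level objective, assuming cluster prototypes and assignments are already computed; temperatures and the exact loss weighting in the paper may differ:

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss(z, z_aug, prototypes, proto_ids, tau=0.1):
    """Instance level: pull two views of the same utterance together.
    Prototype level: pull each embedding toward its cluster prototype."""
    z, z_aug = F.normalize(z, dim=1), F.normalize(z_aug, dim=1)
    protos = F.normalize(prototypes, dim=1)
    # Instance-level InfoNCE: positives are matching rows of z and z_aug.
    inst = F.cross_entropy(z @ z_aug.t() / tau,
                           torch.arange(z.size(0), device=z.device))
    # Prototype-level: classify each embedding to its assigned prototype.
    proto = F.cross_entropy(z @ protos.t() / tau, proto_ids)
    return inst + proto
```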

Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection

Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu, Wei-Qiang Zhang, Pingyi Fan, Jia Liu, Yanmin Qian

ICASSP 2024

A robust anomalous sound detection framework leveraging large-scale pre-trained speech models and condition-based self-supervision, achieving 2nd place in DCASE 2023 Task 2.

Semi-Supervised Acoustic Scene Classification with Test-Time Adaptation

Wen Huang, Anbai Jiang, Bing Han, Xinhu Zheng, Yihong Qiu, Wenxi Chen, Yuzhe Liang, Pingyi Fan, Wei-Qiang Zhang, Cheng Lu, Xie Chen, Jia Liu, Yanmin Qian

ICMEW (Competition Track) 2024

An acoustic scene classification system addressing label scarcity and domain mismatch through semi-supervised learning and test-time adaptation, achieving 3rd place in the ICME 2024 Grand Challenge.
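
The system's adaptation recipe is described in the paper; as one common instance of test-time adaptation, a Tent-style entropy-minimization step on an unlabeled test batch looks like this:

```python
import torch

def tta_step(model, x, optimizer):
    """Minimize prediction entropy on unlabeled test data, updating
    only the parameters the optimizer holds (e.g. batch-norm affines)."""
    probs = torch.softmax(model(x), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```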

Improving DINO-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training

Bing Han, Wen Huang, Zhengyang Chen, Yanmin Qian

ICASSPW (Self-supervision in Audio, Speech and Beyond) 2023

A cluster-aware training strategy for DINO-based self-supervised speaker verification, achieving state-of-the-art performance and outperforming fully supervised systems with only 10% labeled data.
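
A minimal sketch of the cluster-aware ingredient, assuming scikit-learn k-means; the paper's progressive schedule, which refines clusters across training stages, is omitted:

```python
from sklearn.cluster import KMeans

def refresh_pseudo_labels(embeddings, n_clusters):
    """Re-cluster speaker embeddings; the cluster ids then serve as
    pseudo speaker labels for the cluster-aware loss in the next stage."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)
    return km.labels_, km.cluster_centers_
```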
