Wen Huang

M.E. Student

Shanghai Jiao Tong University

Research Interests

Generalization & Robustness
Speech & Audio Processing
Multimodal & Generative AI

About

I am a graduating Master's student at the Auditory Cognition and Computational Acoustics Laboratory (AudioCC Lab) at Shanghai Jiao Tong University, advised by Prof. Yanmin Qian.

My current research focuses on bridging the generalization gap in speech and audio AI. I am particularly interested in exploring paradigms where generalization is defined not only by single-task robustness but also by versatility across complex, open-ended tasks.

Research Interests:

  • Generalization & Robustness: Domain adaptation and generalization, self-supervised learning, data-centric AI, and foundation models.
  • Speech & Audio Processing: Universal audio understanding and processing across a broad spectrum of tasks including synthesis, recognition, and analysis.
  • Multimodal & Generative AI: Cross-modal reasoning and generative modeling involving audio, vision, and text.

Selected Publications

A Data-Centric Approach to Generalizable Speech Deepfake Detection

Wen Huang, Yuchen Mao, Yanmin Qian

Preprint

A data-centric investigation establishing scaling laws and proposing a Diversity-Optimized Sampling Strategy (DOSS), achieving state-of-the-art generalization with superior data efficiency.

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods

Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian

ACL 2025

A 3,000+ hour multilingual dataset with 40 diverse synthesis methods, designed to enable robust speech deepfake detection.

From Sharpness to Better Generalization for Speech Deepfake Detection

Wen Huang, Xuechen Liu, Xin Wang, Junichi Yamagishi, Yanmin Qian

Interspeech 2025

An empirical investigation validating sharpness as a theoretical indicator for generalization in speech deepfake detection.

Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation

Wen Huang, Yanmei Gu, Zhiming Wang, Huijia Zhu, Yanmin Qian

ICASSP 2025

A latent space algorithm integrating refinement and augmentation to enhance the generalization of speech deepfake detection.

Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters

Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian

ICASSP 2024

A multi-level domain adapter module for speaker verification that effectively mitigates domain shift, improving cross-domain performance with minimal parameter overhead.

News

2026.01

One paper accepted to ICASSP 2026.

2025.05

One paper accepted to Interspeech 2025.

2025.05

One paper accepted to ACL 2025.

2025.04

Presented virtually at ICASSP 2025.

2024.12

Three papers accepted to ICASSP 2025.

2024.08

One paper accepted to ISCSLP 2024.

2024.06

Achieved 1st place in Task 1 and 4th place in Task 4 of the DCASE 2024 Challenge.

2024.04

Presented one poster at ICASSP 2024 in Seoul, Korea.

2024.03

Achieved 2nd and 3rd place in the ICME 2024 ASC Grand Challenge.

2023.12

Two papers accepted to ICASSP 2024.