
[ICASSP ORAL 2026] INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS

Dong-Hyuk Lee, Dae Hyeon Kim, Young-Seok Choi*

Department of Electronics and Communications Engineering, Kwangwoon University, Seoul, South Korea



📢 News

  • [Apr. 2026] 🚀 The official code is released!
  • [Mar. 2026] 🎉 Our paper "INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS" has been accepted for an oral presentation at ICASSP 2026! See you in Barcelona, Spain!
  • [Jan. 2026] 🎉 Our paper "INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS" has been accepted to ICASSP 2026!

📦 Package Usage

You can install the IDCL loss function directly from GitHub:

pip install git+https://github.com/hyuki0003/IDCL.git@official

Arguments

| Argument | Type | Description |
| --- | --- | --- |
| K | int | Number of top-K neighbors used as positives |
| temperature | float | Softmax temperature τ |

Quick Start

from idcl import IDCL

# Initialize
loss_fn = IDCL(K=15, temperature=0.05)

# Forward pass
# audio_feat (anchor) and text_feat: [B, L, D]  (batch, sequence length, feature dim)
loss = loss_fn(audio_feat, text_feat)

Multimodal Alignment

loss_ta = loss_fn(audio_feat, text_feat)   # audio → text
loss_at = loss_fn(text_feat, audio_feat)   # text → audio
loss = (loss_ta + loss_at) / 2.0
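In a full training loop, the symmetric alignment term above is typically combined with a supervised classification objective. Below is a minimal, self-contained sketch of one such training step; the classifier head, the 0.1 loss weight, and the simple diagonal-positive `alignment` stand-in (used here instead of the actual IDCL package, purely so the snippet runs on its own) are all assumptions, not the official training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
B, D, num_classes = 8, 16, 4

# Simple fusion classifier: concatenated audio+text features -> emotion logits
classifier = nn.Linear(2 * D, num_classes)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

def alignment(x, y, temperature=0.05):
    # Stand-in contrastive term with diagonal positives (NOT the IDCL loss);
    # substitute the installed IDCL loss_fn here in practice.
    logits = F.normalize(x, dim=-1) @ F.normalize(y, dim=-1).t() / temperature
    return F.cross_entropy(logits, torch.arange(x.size(0)))

# Dummy batch of per-utterance features and emotion labels
audio_feat, text_feat = torch.randn(B, D), torch.randn(B, D)
labels = torch.randint(0, num_classes, (B,))

# One training step: cross-entropy + weighted symmetric alignment
logits = classifier(torch.cat([audio_feat, text_feat], dim=-1))
loss_ce = F.cross_entropy(logits, labels)
loss_align = 0.5 * (alignment(audio_feat, text_feat) + alignment(text_feat, audio_feat))
loss = loss_ce + 0.1 * loss_align  # the 0.1 weight is an illustrative choice
optimizer.zero_grad()
loss.backward()
optimizer.step()
```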

📝 Abstract

Multimodal Emotion Recognition in Conversations (MERC) is challenging due to the complex interplay between modalities and the critical role of contextual information. While previous studies have primarily focused on context within a single conversation (intra-dialog), this work explores a new dimension: the contextual information shared across different conversations.

We introduce Inter-Dialog Contrastive Learning (IDCL), a novel framework that leverages inter-dialog similarities to enhance multimodal representation learning. IDCL operates on the hypothesis that conversations with similar emotional trajectories share underlying contextual patterns. By maximizing the similarity between these emotionally congruent dialogs and minimizing it for incongruent ones, IDCL learns more robust and generalizable representations.
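The core mechanism described above can be sketched as an InfoNCE-style objective in which, for each anchor utterance, the top-K most similar utterances from the other view are treated as positives. This is a hedged, self-contained illustration of that idea, not the official implementation; the function name, the flattened [N, D] input shape, and the assumption that each row comes from a different dialog are simplifications.

```python
import torch
import torch.nn.functional as F

def inter_dialog_contrastive(anchor, other, K=3, temperature=0.05):
    """anchor, other: [N, D] utterance embeddings.
    Each row is assumed (for this sketch) to come from a different dialog."""
    a = F.normalize(anchor, dim=-1)
    o = F.normalize(other, dim=-1)
    sim = (a @ o.t()) / temperature            # [N, N] similarity logits
    topk = sim.topk(K, dim=-1).indices         # [N, K] top-K neighbors as positives
    log_prob = sim.log_softmax(dim=-1)         # log p(j | anchor i)
    pos_log_prob = log_prob.gather(-1, topk)   # log-probs of the K positives
    return -pos_log_prob.mean()                # maximize similarity to positives

# Example with random audio/text features
audio = torch.randn(8, 16)
text = torch.randn(8, 16)
loss = inter_dialog_contrastive(audio, text, K=3, temperature=0.05)
```

Increasing K admits more cross-dialog neighbors as positives, while a lower temperature sharpens the softmax and penalizes near-misses more strongly.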

Experiments on the IEMOCAP dataset demonstrate the effectiveness of our approach, establishing the importance of inter-dialog context for advancing emotion recognition.

Architecture

Figure 1: Overall Architecture
Figure 2: IDCL Framework

📊 Experimental Results

To validate the explicit contribution of our proposed Inter-Dialog Contrastive Learning (IDCL) framework, we conducted ablation studies on the IEMOCAP dataset. We compared three settings to isolate the effect of the IDCL objective from transfer learning.

| Model Setting | IEMOCAP (4-way) Acc (%) / WF1 (%) | IEMOCAP (6-way) Acc (%) / WF1 (%) |
| --- | --- | --- |
| (A) Baseline (Cross-Entropy only) | 80.8 / 80.8 | 65.4 / 65.7 |
| (B) IDCL (w/o Pre-training) | 82.5 / 82.5 | 65.8 / 66.2 |
| (C) Proposed Full Model (with Pre-training) | 85.9 / 85.8 | 66.4 / 66.6 |

Key Findings:

  1. Intrinsic Robustness: Comparing (B) with (A), the IDCL objective alone improved 4-way accuracy by 1.7 points, showing that leveraging inter-dialog context yields more robust representations even without external data.
  2. Synergy with Pre-training: The proposed full model (C) achieved the highest performance (+5.1 points over the baseline in 4-way accuracy), confirming a strong synergy between IDCL and transfer learning strategies.

About

[ICASSP 2026] Official Implementation of Inter-Dialog Contrastive Learning for MERC
