
[ICASSP ORAL 2026] INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS

Dong-Hyuk Lee, Dae Hyeon Kim, Young-Seok Choi*

Department of Electronics and Communications Engineering, Kwangwoon University, Seoul, South Korea



📢 News

  • [Apr. 2026] 🚀 The official code is released!
  • [Mar. 2026] 🎉 Our paper "INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS" has been accepted for an oral presentation at ICASSP 2026! See you in Barcelona, Spain!
  • [Jan. 2026] 🎉 Our paper "INTER-DIALOG CONTRASTIVE LEARNING FOR MULTIMODAL EMOTION RECOGNITION IN CONVERSATIONS" has been accepted to ICASSP 2026!

📦 Package Usage

You can install the IDCL loss function directly from GitHub:

pip install git+https://github.com/hyuki0003/IDCL.git@official

Arguments

| Argument | Type | Description |
| --- | --- | --- |
| K | int | Number of top-K neighbors used as positives |
| temperature | float | Softmax temperature τ |

Quick Start

from idcl import IDCL

# Initialize
loss_fn = IDCL(K=15, temperature=0.05)

# Forward pass
# audio_feat (anchor) and text_feat: [B, L, D]  (batch, sequence length, feature dim)
loss = loss_fn(audio_feat, text_feat)

Multimodal Alignment

loss_ta = loss_fn(audio_feat, text_feat)   # audio → text
loss_at = loss_fn(text_feat, audio_feat)   # text → audio
loss = (loss_ta + loss_at) / 2.0
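In a full training loop, the symmetric alignment term above is typically combined with a supervised classification objective. Below is a minimal, self-contained sketch of one such training step; the classifier head, the 0.1 loss weight, and the simple diagonal-positive `alignment` stand-in (used here instead of the actual IDCL package, purely so the snippet runs on its own) are all assumptions, not the official training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
B, D, num_classes = 8, 16, 4

# Simple fusion classifier: concatenated audio+text features -> emotion logits
classifier = nn.Linear(2 * D, num_classes)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

def alignment(x, y, temperature=0.05):
    # Stand-in contrastive term with diagonal positives (NOT the IDCL loss);
    # substitute the installed IDCL loss_fn here in practice.
    logits = F.normalize(x, dim=-1) @ F.normalize(y, dim=-1).t() / temperature
    return F.cross_entropy(logits, torch.arange(x.size(0)))

# Dummy batch of per-utterance features and emotion labels
audio_feat, text_feat = torch.randn(B, D), torch.randn(B, D)
labels = torch.randint(0, num_classes, (B,))

# One training step: cross-entropy + weighted symmetric alignment
logits = classifier(torch.cat([audio_feat, text_feat], dim=-1))
loss_ce = F.cross_entropy(logits, labels)
loss_align = 0.5 * (alignment(audio_feat, text_feat) + alignment(text_feat, audio_feat))
loss = loss_ce + 0.1 * loss_align  # the 0.1 weight is an illustrative choice
optimizer.zero_grad()
loss.backward()
optimizer.step()
```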

📝 Abstract

Multimodal Emotion Recognition in Conversations (MERC) is challenging due to the complex interplay between modalities and the critical role of contextual information. While previous studies have primarily focused on context within a single conversation (intra-dialog), this work explores a new dimension: the contextual information shared across different conversations.

We introduce Inter-Dialog Contrastive Learning (IDCL), a novel framework that leverages inter-dialog similarities to enhance multimodal representation learning. IDCL operates on the hypothesis that conversations with similar emotional trajectories share underlying contextual patterns. By maximizing the similarity between these emotionally congruent dialogs and minimizing it for incongruent ones, IDCL learns more robust and generalizable representations.
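The core mechanism described above can be sketched as an InfoNCE-style objective in which, for each anchor utterance, the top-K most similar utterances from the other view are treated as positives. This is a hedged, self-contained illustration of that idea, not the official implementation; the function name, the flattened [N, D] input shape, and the assumption that each row comes from a different dialog are simplifications.

```python
import torch
import torch.nn.functional as F

def inter_dialog_contrastive(anchor, other, K=3, temperature=0.05):
    """anchor, other: [N, D] utterance embeddings.
    Each row is assumed (for this sketch) to come from a different dialog."""
    a = F.normalize(anchor, dim=-1)
    o = F.normalize(other, dim=-1)
    sim = (a @ o.t()) / temperature            # [N, N] similarity logits
    topk = sim.topk(K, dim=-1).indices         # [N, K] top-K neighbors as positives
    log_prob = sim.log_softmax(dim=-1)         # log p(j | anchor i)
    pos_log_prob = log_prob.gather(-1, topk)   # log-probs of the K positives
    return -pos_log_prob.mean()                # maximize similarity to positives

# Example with random audio/text features
audio = torch.randn(8, 16)
text = torch.randn(8, 16)
loss = inter_dialog_contrastive(audio, text, K=3, temperature=0.05)
```

Increasing K admits more cross-dialog neighbors as positives, while a lower temperature sharpens the softmax and penalizes near-misses more strongly.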

Experiments on the IEMOCAP dataset demonstrate the effectiveness of our approach, establishing the importance of inter-dialog context for advancing emotion recognition.

Architecture

Figure 1: Overall Architecture
Figure 2: IDCL Framework

📊 Experimental Results

To validate the explicit contribution of our proposed Inter-Dialog Contrastive Learning (IDCL) framework, we conducted ablation studies on the IEMOCAP dataset. We compared three settings to isolate the effect of the IDCL objective from transfer learning.

| Model Setting | IEMOCAP (4-way) Acc (%) / WF1 (%) | IEMOCAP (6-way) Acc (%) / WF1 (%) |
| --- | --- | --- |
| (A) Baseline (Cross-Entropy only) | 80.8 / 80.8 | 65.4 / 65.7 |
| (B) IDCL (w/o Pre-training) | 82.5 / 82.5 | 65.8 / 66.2 |
| (C) Proposed Full Model (with Pre-training) | 85.9 / 85.8 | 66.4 / 66.6 |

Key Findings:

  1. Intrinsic Robustness: Comparing (B) with (A), the IDCL objective alone improved 4-way accuracy by 1.7 points, showing that leveraging inter-dialog context yields more robust representations even without external data.
  2. Synergy with Pre-training: The proposed full model (C) achieved the highest performance (+5.1 points over the baseline in 4-way accuracy), confirming a strong synergy between IDCL and transfer learning strategies.

About

[ICASSP 2026] Official Implementation of Inter-Dialog Contrastive Learning for MERC
