AMALIA

AMALIA is a fully open Large Language Model for European Portuguese.

This repository serves as a central entry point for resources related to the paper:

“AMALIA: A Fully Open Large Language Model for European Portuguese”
Accepted at PROPOR 2026
📄 https://aclanthology.org/2026.propor-1.38/


📚 Overview

Despite recent advances in open Large Language Models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and evaluation benchmarks. Existing evaluations often rely on machine-translated datasets, which fail to capture important linguistic and cultural nuances of the language.

AMALIA addresses this gap by:

  • Prioritizing high-quality pt-PT data during all training stages
  • Providing a fully open LLM tailored specifically for European Portuguese
  • Introducing new evaluation benchmarks for pt-PT

Experimental results show that AMALIA remains competitive with strong baselines, while achieving substantial improvements on pt-PT-specific evaluations, highlighting the importance of targeted training and native benchmarking for underrepresented language variants.

For implementation details, refer to the official organization repositories:

🔗 https://github.com/orgs/AMALIA-LLM/repositories


📖 Citation

If you use AMALIA in your work, please cite:

@inproceedings{simplicio-etal-2026-amalia,
    title = "{AMALIA}: A Fully Open Large Language Model for {E}uropean {P}ortuguese",
    author = "Simpl{\'i}cio, Afonso and
              Vinagre, Gon{\c{c}}alo and
              Ramos, Miguel Moura and
              Tavares, Diogo and
              Ferreira, Rafael and
              Attanasio, Giuseppe and
              Alves, Duarte M. and
              Calvo, In{\^e}s and
              Vieira, In{\^e}s and
              Guerra, Rui and
              Furtado, James and
              Canaverde, Beatriz and
              Paulo, Iago and
              Ramos, Vasco and
              Gl{\'o}ria-Silva, Diogo and
              Faria, Miguel and
              Treviso, Marcos and
              Gomes, Daniel and
              Gomes, Pedro and
              Semedo, David and
              Martins, Andr{\'e} and
              Magalh{\~a}es, Jo{\~a}o",
    booktitle = "Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1",
    month = apr,
    year = "2026",
    address = "Salvador, Brazil",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.propor-1.38/",
    pages = "380--391",
    ISBN = "979-8-89176-387-6"
}
