Fix PostgreSQL connection exhaustion (too many clients) with gevent workers #1408

Draft
Copilot wants to merge 2 commits into main from copilot/fix-django-cursor-connection

Copilot AI commented Mar 17, 2026

What does this PR do?

Fixes the FATAL: sorry, too many clients already error that is taking scielo.org down in production.

Root cause: With gevent (1000 greenlets/worker × 3 workers = up to 3000 greenlets), CONN_MAX_AGE=60 keeps one persistent connection per greenlet, which exhausts PostgreSQL's max_connections (typically 100).
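The arithmetic behind the root cause can be sketched as follows. This is only an illustration using the figures quoted above; the function name is hypothetical and the numbers are worst-case bounds, not measurements.

```python
# Worst-case connection demand when every greenlet keeps its own connection.
GREENLETS_PER_WORKER = 1000  # greenlets per gunicorn worker (from the PR description)
WORKERS = 3                  # gunicorn workers (from the PR description)
PG_MAX_CONNECTIONS = 100     # typical PostgreSQL max_connections


def worst_case_connections(workers: int, greenlets_per_worker: int) -> int:
    """Upper bound on open connections if each greenlet holds one."""
    return workers * greenlets_per_worker


demand = worst_case_connections(WORKERS, GREENLETS_PER_WORKER)
print(f"worst-case demand: {demand}, server limit: {PG_MAX_CONNECTIONS}")
print(f"over budget by a factor of {demand / PG_MAX_CONNECTIONS:.0f}")
```

Note that even with CONN_MAX_AGE=0, every request that is in flight still holds a connection while it runs, so the number of concurrent greenlets touching the database remains bounded by max_connections.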

Changes:

  • CONN_MAX_AGE default 60→0: Closes connections after each request, which is the right behavior for gevent, where each greenlet acts as a "thread" with its own connection
  • Fix env var bug: in env.int("CONN_MAX_AGE", default=0) or env.int("DJANGO_CONN_MAX_AGE", default=60), the Python or operator treats 0 as falsy, so CONN_MAX_AGE=0 could not be set explicitly via the environment
  • Remove POOL_OPTIONS: POOL_SIZE, MAX_OVERFLOW, and RECYCLE are SQLAlchemy options that have no effect with django_prometheus.db.backends.postgresql; no pooling library is installed
  • Gunicorn configurable via env vars: GUNICORN_WORKERS, GUNICORN_WORKER_CONNECTIONS, GUNICORN_WORKER_CLASS, GUNICORN_TIMEOUT
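The falsy-zero problem in the env var logic can be reproduced with the standard library alone. The sketch below is an assumption-laden illustration: it uses a plain dict in place of the project's env helper, both function names are hypothetical, and whether the PR keeps the DJANGO_CONN_MAX_AGE fallback is not stated; it is kept here only to mirror the original expression.

```python
def read_conn_max_age_buggy(environ):
    # Same shape as the original expression: `or` treats 0 as falsy, so an
    # explicit CONN_MAX_AGE=0 falls through to the second lookup.
    return int(environ.get("CONN_MAX_AGE", 0)) or int(
        environ.get("DJANGO_CONN_MAX_AGE", 60)
    )


def read_conn_max_age_fixed(environ):
    # Test for presence instead of truthiness so that "0" is honoured.
    for name in ("CONN_MAX_AGE", "DJANGO_CONN_MAX_AGE"):
        if name in environ:
            return int(environ[name])
    return 0  # new default: close the connection at the end of each request


explicit_zero = {"CONN_MAX_AGE": "0"}
print(read_conn_max_age_buggy(explicit_zero))  # 60: the explicit 0 is ignored
print(read_conn_max_age_fixed(explicit_zero))  # 0
```

In real settings code, `os.environ` (or the project's own env helper) would be passed in place of the dict.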

Where should the review start?

config/settings/production.py: the CONN_MAX_AGE change is the main fix.

How can this be tested manually?

  1. Deploy to staging with the default configuration (no override env vars)
  2. Verify that the effective CONN_MAX_AGE is 0 via django.conf.settings.DATABASES["default"]["CONN_MAX_AGE"]
  3. Under load, monitor PostgreSQL connections with SELECT count(*) FROM pg_stat_activity; the count should stay within max_connections
  4. Test the override: CONN_MAX_AGE=60 must be respected (previously, CONN_MAX_AGE=0 was ignored because of the or bug)
  5. Test the gunicorn env vars: GUNICORN_WORKERS=4 GUNICORN_WORKER_CONNECTIONS=500 should change gunicorn's behavior
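The last step depends on the Gunicorn settings being read from the environment. One way this could look in a Python config file (for example a gunicorn.conf.py) is sketched below. How the PR actually wires the variables is not shown on this page, so the helper name and the fallback defaults are assumptions: 3 workers and 1000 connections come from the PR description, while 30 seconds is Gunicorn's own default timeout.

```python
import os


def gunicorn_settings(environ=os.environ):
    """Read Gunicorn settings from the environment, with fallback defaults."""
    return {
        "workers": int(environ.get("GUNICORN_WORKERS", "3")),
        "worker_connections": int(environ.get("GUNICORN_WORKER_CONNECTIONS", "1000")),
        "worker_class": environ.get("GUNICORN_WORKER_CLASS", "gevent"),
        "timeout": int(environ.get("GUNICORN_TIMEOUT", "30")),
    }


# Gunicorn reads these module-level names when the file is passed with -c.
_settings = gunicorn_settings()
workers = _settings["workers"]
worker_connections = _settings["worker_connections"]
worker_class = _settings["worker_class"]
timeout = _settings["timeout"]
```

Passing a dict instead of `os.environ` makes the override easy to check in isolation: `gunicorn_settings({"GUNICORN_WORKERS": "4"})["workers"]` evaluates to 4.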

Any background context you would like to give?

The configured POOL_OPTIONS gave the false impression that connection pooling existed. If real pooling is needed, consider PgBouncer, or install django-db-connection-pool and reconfigure the engine.
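If the django-db-connection-pool route were taken, the settings might look roughly like the fragment below. This is a hedged sketch rather than part of the PR: the pool sizes and the database name are placeholders, and the behavior of this pool under gevent has not been verified here. The sizing check at the end reflects the fact that each worker process owns its own pool.

```python
# Hypothetical settings fragment; values are illustrative, not taken from the PR.
WORKERS = 3               # gunicorn worker processes (from the PR description)
PG_MAX_CONNECTIONS = 100  # typical PostgreSQL max_connections

POOL_OPTIONS = {
    "POOL_SIZE": 10,     # connections kept open per process
    "MAX_OVERFLOW": 10,  # extra connections allowed under burst load
    "RECYCLE": 60 * 60,  # seconds before a pooled connection is replaced
}

DATABASES = {
    "default": {
        # Engine provided by django-db-connection-pool. It replaces the
        # django_prometheus backend, so that backend's DB metrics would be lost.
        "ENGINE": "dj_db_conn_pool.backends.postgresql",
        "NAME": "example_db",  # placeholder
        "POOL_OPTIONS": POOL_OPTIONS,
    }
}

# Each worker process owns its own pool, so the budget is per process.
per_process = POOL_OPTIONS["POOL_SIZE"] + POOL_OPTIONS["MAX_OVERFLOW"]
total = WORKERS * per_process
print(f"{total} of {PG_MAX_CONNECTIONS} connections in the worst case")
```

PgBouncer, by contrast, sits outside Django and needs no engine change, only a different host and port in the database settings.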

Screenshots

N/A: infrastructure configuration changes.

What are the relevant tickets?

References

Original prompt

This section details the original issue you should resolve

<issue_title>scielo.org struggles to stay up</issue_title>
<issue_description>### Problem description

The Django log in production shows messages such as:

cursor = self.connection.cursor()
2026-03-17T13:10:18.731109419Z ^^^^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731135526Z File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
2026-03-17T13:10:18.731153429Z return func(*args, **kwargs)
2026-03-17T13:10:18.731169591Z ^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731186525Z File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 320, in cursor
2026-03-17T13:10:18.731203245Z return self._cursor()
2026-03-17T13:10:18.731218124Z ^^^^^^^^^^^^^^
2026-03-17T13:10:18.731233119Z File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 296, in _cursor
2026-03-17T13:10:18.731250537Z self.ensure_connection()
File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
2026-03-17T13:10:18.731280895Z return func(*args, **kwargs)
2026-03-17T13:10:18.731297069Z ^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731312102Z File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 278, in ensure_connection
2026-03-17T13:10:18.731327108Z with self.wrap_database_errors:
2026-03-17T13:10:18.731342051Z File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
2026-03-17T13:10:18.731356615Z raise dj_exc_value.with_traceback(traceback) from exc_value
2026-03-17T13:10:18.731370811Z File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 279, in ensure_connection
2026-03-17T13:10:18.731385757Z self.connect()
2026-03-17T13:10:18.731402735Z File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
2026-03-17T13:10:18.731418725Z return func(*args, **kwargs)
2026-03-17T13:10:18.731433473Z ^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731449340Z File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 256, in connect
self.connection = self.get_new_connection(conn_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/django_prometheus/db/backends/postgresql/base.py", line 9, in get_new_connection
conn = super().get_new_connection(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731583998Z File "/usr/local/lib/python3.11/site-packages/django_prometheus/db/common.py", line 45, in get_new_connection
return super().get_new_connection(*args, **kwargs)
2026-03-17T13:10:18.731617109Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
2026-03-17T13:10:18.731663564Z return func(*args, **kwargs)
2026-03-17T13:10:18.731684798Z ^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731705148Z File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 332, in get_new_connection
2026-03-17T13:10:18.731723543Z connection = self.Database.connect(**conn_params)
2026-03-17T13:10:18.731740876Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731760124Z File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
2026-03-17T13:10:18.731776242Z conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2026-03-17T13:10:18.731826098Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-03-17T13:10:18.731842735Z django.db.utils.OperationalError: connection to server at "192.168.2.190", port 5432 failed: FATAL: sorry, too many clients already
2026-03-17T13:10:18.731862247Z
2026-03-17T13:10:19.137316691Z WARNING 2026-03-17 13:10:19,136 log 16 139884299102432 Not Found: /wp-content/uploads/2018/08/bvs.gif
2026-03-17T13:10:21.318225371Z WARNING 2026-03-17 13:10:21,314 log 16 139884299102432 Not Found: /wp-content/uploads/2018/08/Logo_Fap-Unifesp.png
WARNING 2026-03-17 13:10:21,324 log 15 139884423662304 Not Found: /wp-content/uploads/2018/08/fapesp_patrocinadores.png
WARNING 2026-03-17 13:10:26,379 log 16 139884299102432 Not Found: /pt-br/apps/servicesplatform/client/controller/authentication/origin/aHR0cDovL3NjaWVsby5zbGQuY3Uvc2NpZWxvLnBocD9sbmc9ZXMmbnJtPWlzbyZwaWQ9UzIyMTgtMzYyMDIwMjMwMDA0MDA3NzUmc2NyaXB0PXNjaV9hcnR0ZXh0/
WARNING 2026-03-17 13:10:26,944 log 16 139884299102432 Not Found: /apple-touch-icon-precomposed.png
WARNING 2026-03-17 13:10:27,373 log 15 139884423662304 Not Found: /wp-content/uploads/2018/08/fapesp_patrocinadores.png
WARNING 2026-03-17 13:10:29,769 log 17 139884360979840 Not Found: /wp-content/uploads/2018/08/capes_patrocinadores.png
2026-03-17T13:10:29.775141592Z WARNING 2026-03-17 13:10:29,774 log 15 139884423662304 Not Found: /wp-content/uploads/2018/08/ops-oms_es_pt.gif
2026-03-17T13:10:29.781661078Z WARNING 2026-03-17 13:10:29,780 log 16 139884299102432 Not Found...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes scieloorg/core#1407


- Set CONN_MAX_AGE default to 0 (close DB connections after each request)
  to prevent connection exhaustion when using gevent workers where each
  greenlet holds its own persistent connection
- Fix buggy CONN_MAX_AGE env var logic that used Python `or` operator,
  which prevented explicitly setting CONN_MAX_AGE=0
- Remove non-functional POOL_OPTIONS (no pooling library installed)
- Make gunicorn settings configurable via environment variables
- Update .envs/.production/.django with new configuration docs

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix django cursor connection issues in production" to "Fix PostgreSQL connection exhaustion (too many clients) with gevent workers" Mar 17, 2026
Copilot AI requested a review from robertatakenaka March 17, 2026 13:46