
On ARM64, with libpq from gaussdb 506.0SPC0100, the Python process core dumps when a connection fails #30

@Dark-Athena

(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# cat test1.py 
#!/usr/bin/env python3
import argparse
import os
import sys

def main() -> int:

    try:
        import gaussdb
    except Exception as exc:
        print(f"[ERROR] Failed to import gaussdb: {exc}")
        return 3

    print("[INFO] gaussdb module:", getattr(gaussdb, "__file__", "<unknown>"))
    print("[INFO] gaussdb version:", getattr(gaussdb, "__version__", "<unknown>"))

    conn = None
    try:
        conn = gaussdb.connect('host=127.0.0.1 port=3333 dbname=database_name user=username password=password', connect_timeout=10)
        with conn.cursor() as cur:
            cur.execute("select 1")
            row = cur.fetchone()
        print("[OK] Connection succeeded, query result:", row)
        return 0
    except Exception as exc:
        print(f"[ERROR] Connection/query failed: {exc}")
        return 1
    finally:
        if conn is not None:
            try:
                conn.close()
            except Exception:
                pass


if __name__ == "__main__":
    sys.exit(main())
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# export LD_LIBRARY_PATH=/workspace/test_gaussdb/gaussdb-505.2-libpq/lib  # with libpq 505.2 there is no core dump
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# python test1.py 
[INFO] gaussdb module: /workspace/test_gaussdb/gaussdb-python/gaussdb/gaussdb/__init__.py
[INFO] gaussdb version: 1.0.4
[ERROR] Connection/query failed: connection failed: could not connect to server: Operation now in progress
        Is the server running on host "127.0.0.1" and accepting
        TCP/IP connections on port 3333?
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# export LD_LIBRARY_PATH=/workspace/test_gaussdb/gaussdb-506.0-libpq/lib  # with libpq 506.0 it core dumps
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# python test1.py 
[INFO] gaussdb module: /workspace/test_gaussdb/gaussdb-python/gaussdb/gaussdb/__init__.py
[INFO] gaussdb version: 1.0.4
Segmentation fault (core dumped)
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# arch
aarch64
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# python --version
Python 3.11.14
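
A bare `Segmentation fault (core dumped)` does not show which Python frame was active at the time. As a rough sketch, the script can be rerun in a child interpreter with the standard `faulthandler` module enabled, which prints the Python traceback when the process receives SIGSEGV. The helper name `run_with_faulthandler` is made up for illustration; `-X faulthandler` is a standard CPython option.

```python
import subprocess
import sys


def run_with_faulthandler(args, env=None):
    """Run a child Python with faulthandler enabled and capture its stderr.

    args: arguments passed to the interpreter, e.g. ["test1.py"].
    env:  optional environment, e.g. one where LD_LIBRARY_PATH points at
          the libpq build under test.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *args],
        env=env,
        capture_output=True,
        text=True,
    )
    # On POSIX a negative return code means the child was killed by that
    # signal number (for example -11 for SIGSEGV).
    return proc.returncode, proc.stderr
```

Calling `run_with_faulthandler(["test1.py"])` with `LD_LIBRARY_PATH` set to the 506.0 lib directory should leave the traceback of the crashing thread in the returned stderr text, assuming the crash reproduces in the child process.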

A C program that calls the 506.0 libpq directly does not core dump:

(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# cat test_pq.c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main() {
    // Connection string; adjust the database name, user, password, and host as needed
    PGconn *conn = PQconnectdb("dbname=postgres user=postgres password=secret host=localhost");

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "Connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }

    printf("Connected to PostgreSQL\n");
    PQfinish(conn);
    return 0;
}
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# gcc -o test_pq test_pq.c -I/workspace/test_gaussdb/gaussdb-506.0-libpq/include -L/workspace/test_gaussdb/gaussdb-506.0-libpq/lib -lpq
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# ./test_pq 
Connection failed: could not connect to server: Operation now in progress
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
could not connect to server: Operation now in progress
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# 

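Below is the analysis from GPT-5.3-CODEX

Detailed analysis

  • The crash path is: after the connection fails, generators.py calls finish_pgconn(), and errors.py then reads a snapshot of the pgconn attributes.
  • These attributes (such as host and password) go through pq_ctypes.py, which calls the C API (PQhost/PQpass) via _call_bytes().
  • The c_char_p return type used by _call_bytes() makes ctypes dereference the returned pointer. If the C layer returns a non-NULL but invalid address, the process receives SIGSEGV before Python can raise an exception. The core dump you saw is this kind of crash.
  • I built an independent minimal reproduction: calling print(conn.host) right after connect_poll fails segfaults immediately, with no higher-level framework involved. So the problem is not in constructing the exception object, but in the combination of a failed-state PGconn and PQhost/PQpass access.
  • Your C program does not crash because it only calls PQstatus/PQerrorMessage/PQfinish and never reaches the PQhost/PQpass path.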
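
The c_char_p point above can be checked without crashing the interpreter. The sketch below is not the driver's code: it declares the getters with a `c_void_p` return type, so ctypes returns the raw address as an integer (or `None` for NULL) and never reads the memory behind it. The names `deref`, `load_libpq`, and `raw_getter_addresses` are hypothetical; `PQconnectdb`, `PQhost`, `PQpass`, and `PQfinish` are standard libpq entry points.

```python
import ctypes
import ctypes.util

GETTERS = ("PQhost", "PQpass")  # the two getters named in the analysis


def deref(addr):
    """Do explicitly what a c_char_p restype does implicitly:
    read bytes from addr up to the first NUL (None for a NULL pointer)."""
    return ctypes.cast(ctypes.c_void_p(addr), ctypes.c_char_p).value


def load_libpq():
    """Locate libpq via find_library("pq"); returns None if it is not found."""
    name = ctypes.util.find_library("pq")
    return ctypes.CDLL(name) if name else None


def raw_getter_addresses(lib, conninfo: bytes) -> dict:
    """Return the raw pointer each getter yields for a (possibly failed) PGconn."""
    lib.PQconnectdb.argtypes = [ctypes.c_char_p]
    lib.PQconnectdb.restype = ctypes.c_void_p
    lib.PQfinish.argtypes = [ctypes.c_void_p]
    lib.PQfinish.restype = None

    conn = lib.PQconnectdb(conninfo)
    try:
        addresses = {}
        for name in GETTERS:
            getter = getattr(lib, name)
            getter.argtypes = [ctypes.c_void_p]
            getter.restype = ctypes.c_void_p  # raw address, no dereference
            addresses[name] = getter(conn)
        return addresses
    finally:
        lib.PQfinish(conn)
```

With the 506.0 library on `LD_LIBRARY_PATH`, `raw_getter_addresses(load_libpq(), b"host=127.0.0.1 port=3333 connect_timeout=2")` would show what the getters return on a failed connection. An address that looks implausible (very small, or outside the mapped ranges in `/proc/self/maps`) would be consistent with the "non-NULL but invalid address" explanation; passing such an address to `deref` is what would crash.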

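Why this looks like a libpq combination problem

  • The driver locates the library dynamically in misc.py with find_library("pq"), so which library is actually loaded depends on LD_LIBRARY_PATH and on system library search priority.
  • The library you are currently loading reports PQlibVersion = 90204 (9.2.4), yet it comes from the gaussdb-506.0-libpq package directory. That points to a specific combination of the Gauss-distributed library and the current system runtime, and in this combination it is not safe to call certain string getters on a failed-state connection.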
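
To confirm which libpq a Python process really ends up with, something like the following can help. It is a sketch under assumptions: the helper names are invented, the `/proc/self/maps` lookup only works on Linux, and the version decoding follows the upstream PostgreSQL numbering (three-part versions before 10, two-part versions from 10 on), which GaussDB builds may or may not follow beyond the 90204 value quoted above.

```python
import ctypes
import ctypes.util


def decode_pq_version(v: int) -> str:
    """Turn a PQlibVersion() integer into a dotted version string."""
    if v >= 100000:  # PostgreSQL 10 and later: major * 10000 + minor
        return f"{v // 10000}.{v % 10000}"
    major, rest = divmod(v, 10000)  # before 10: major, minor, revision
    minor, rev = divmod(rest, 100)
    return f"{major}.{minor}.{rev}"


def loaded_libpq_info():
    """Return (resolved name, version string, mapped file paths),
    or None when find_library cannot locate libpq."""
    name = ctypes.util.find_library("pq")
    if name is None:
        return None
    lib = ctypes.CDLL(name)
    lib.PQlibVersion.argtypes = []
    lib.PQlibVersion.restype = ctypes.c_int
    version = decode_pq_version(lib.PQlibVersion())

    paths = set()
    try:
        with open("/proc/self/maps") as maps:  # Linux only
            for line in maps:
                if "libpq" in line:
                    paths.add(line.split()[-1])
    except OSError:
        pass  # not Linux, or /proc unavailable
    return name, version, sorted(paths)
```

Running `loaded_libpq_info()` once with each `LD_LIBRARY_PATH` setting from the transcript should show whether the 505.2 and 506.0 directories are really the files being mapped, rather than a system copy.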

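Limits of the conclusion

  • I can confirm with 100% certainty that the crash is triggered by calling PQhost/PQpass on a failed-state PGconn.
  • Without libpq source code or symbols, I cannot say with 100% certainty whether this is a bug inside libpq or an ABI compatibility problem, but from an engineering standpoint both should be handled as an unsafe call path.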
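
Treating it as an unsafe call path could look roughly like the guard below, whichever root cause turns out to be right. This is a hypothetical mitigation sketch, not a change that exists in gaussdb-python: `snapshot_conn_strings` is an invented name, and the getters are passed in as plain callables so the idea can be shown without loading libpq. `CONNECTION_OK = 0` matches the first member of `ConnStatusType` in libpq-fe.h.

```python
from typing import Callable, Dict, Optional

CONNECTION_OK = 0  # first member of ConnStatusType in libpq-fe.h


def snapshot_conn_strings(
    status: int,
    getters: Dict[str, Callable[[], Optional[bytes]]],
) -> Dict[str, Optional[bytes]]:
    """Collect string attributes for an error report.

    When the connection is not in the OK state, no getter is called at all
    and every attribute is reported as None, so a failed-state PGconn never
    reaches PQhost/PQpass.
    """
    if status != CONNECTION_OK:
        return {name: None for name in getters}
    return {name: getter() for name, getter in getters.items()}
```

The trade-off is that error messages for failed connections lose the host and similar details, which would then have to come from the original connection string instead.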
