humanlab/MAQuA-IRT-framework

MAQuA Multi-Outcome Adaptive Questionnaire

This repository contains the runnable pipeline used for the MAQuA experiments. The code is organized as a staged workflow where each stage writes outputs used by the next stage.

Data is private and not distributed in this repository.


1) Environment Setup

Create environment:

conda env create -f environment.yml
conda activate maqua

If the environment already exists, update it:

conda env update -n maqua -f environment.yml --prune

Verify CUDA-enabled PyTorch:

/chronos_data/conda_envs/maqua/bin/python -c "import torch; print('torch', torch.__version__); print('cuda_compiled', torch.version.cuda); print('cuda_available', torch.cuda.is_available())"

If CUDA is still unavailable, force-install the CUDA build:

conda install -n maqua -c pytorch -c nvidia pytorch pytorch-cuda=12.4

Optional dependencies (only if needed):

pip install vllm
pip install --prefer-binary llama-cpp-python

2) Required Input Data

At minimum, place these files under formatted_data/:

  • question_embeddings.csv
  • input_dataset_words.csv
  • input_dataset_essay.csv
  • output_questionnaire_outcomes.csv
  • questions_symptom_outcomes_type.csv

Recommended:

  • diagnoses.csv (needed for diagnosis correlations)

Optional private input:

  • private_data/transcripts_with_code.csv (used by GPT/Qwen scoring scripts)
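Before launching the pipeline, a quick preflight check can catch missing inputs early. This is a minimal sketch; `preflight` is a hypothetical helper, not part of the repo:

```python
from pathlib import Path

# Minimum required inputs under formatted_data/ (from the list above).
REQUIRED = [
    "question_embeddings.csv",
    "input_dataset_words.csv",
    "input_dataset_essay.csv",
    "output_questionnaire_outcomes.csv",
    "questions_symptom_outcomes_type.csv",
]

def preflight(root: str = "formatted_data") -> bool:
    """Raise if any required input file is missing."""
    missing = [name for name in REQUIRED if not (Path(root) / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing inputs under {root}: {missing}")
    return True
```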

3) Fastest Way To Run (Recommended)

From repo root:

bash optional/sh/run_minimal_pipeline.sh

Preview commands only:

bash optional/sh/run_minimal_pipeline.sh --dry-run

Include optional plots:

bash optional/sh/run_minimal_pipeline.sh --with-plots

Include user-level reports (point-biserial + Table-A-aligned Pearson/MSE):

bash optional/sh/run_minimal_pipeline.sh --with-user-level-report

Parallel singletask shards (faster ALBA stage):

bash optional/sh/run_minimal_pipeline.sh --singletask-jobs 3

Parallel singletask shards across multiple GPUs (round-robin by shard):

MAQUA_PYTHON=/chronos_data/conda_envs/maqua/bin/python \
  bash optional/sh/run_minimal_pipeline.sh --singletask-jobs 3 --singletask-gpus 0,1,2

Notes:

  • --singletask-jobs N controls concurrent singletask shards.
  • --singletask-gpus 0,1,2 binds shards to those GPUs using CUDA_VISIBLE_DEVICES.
  • If jobs exceed listed GPUs, shard-to-GPU assignment is round-robin.
  • --with-user-level-report runs user_level_correlations.py and writes user-level report CSVs to analysis_outputs/.
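The round-robin shard-to-GPU assignment described above can be sketched as follows (a hypothetical helper for illustration, not the script's actual implementation):

```python
def assign_gpus(n_jobs: int, gpus: list[int]) -> dict[int, int]:
    # Shard i is bound to gpus[i % len(gpus)] via CUDA_VISIBLE_DEVICES,
    # so jobs beyond the listed GPUs wrap around (round-robin).
    return {shard: gpus[shard % len(gpus)] for shard in range(n_jobs)}
```

For example, `assign_gpus(5, [0, 1, 2])` maps shards 3 and 4 back onto GPUs 0 and 1.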

Use a specific Python interpreter:

MAQUA_PYTHON=/chronos_data/conda_envs/maqua/bin/python \
  bash optional/sh/run_minimal_pipeline.sh --dry-run

Run on a specific GPU (example: third GPU, index 2):

CUDA_VISIBLE_DEVICES=2 \
MAQUA_PYTHON=/chronos_data/conda_envs/maqua/bin/python \
  bash optional/sh/run_minimal_pipeline.sh

Check GPU binding quickly:

CUDA_VISIBLE_DEVICES=2 /chronos_data/conda_envs/maqua/bin/python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count(), torch.cuda.get_device_name(0))"

4) Manual Stage-By-Stage Run

Stage 1: Model training and QA outputs

python train_user_level.py
python run_qa_level_save_outputs.py --strategy all_questions

Notes:

  • train_user_level.py covers both user-level multitask and singletask training, depending on its arguments and defaults.
  • run_qa_level_save_outputs.py --strategy all_questions generates the QA-level multitask outputs.

Optional single-task QA:

python optional/python/run_qa_level_singletask.py --outcome PHQ --strategy all_questions

The command above is QA-level singletask and is separate from run_qa_level_save_outputs.py.

Stage 2: Discretize

python discretize_question_scores.py

Stage 3: Adaptive testing

LATEST_DISCRETIZED_DIR=$(ls -td polytomized_data/*_discretized | head -n 1)
python adaptive_testing.py --input-dir "$LATEST_DISCRETIZED_DIR" --run-both

Stage 4: Analysis tables

python compute_tables.py
python diagnosis_correlations.py

Stage 5: User-level report (optional but recommended)

python user_level_correlations.py --model both --metrics both

This generates:

  • analysis_outputs/user_level_multitask_pointbiserial.csv
  • analysis_outputs/user_level_singletask_pointbiserial.csv
  • analysis_outputs/user_level_pointbiserial_all.csv
  • analysis_outputs/user_level_multitask_regression.csv
  • analysis_outputs/user_level_singletask_regression.csv
  • analysis_outputs/user_level_regression_all.csv

Metric notes:

  • pointbiserial: correlation between the binary (0/1) diagnosis label and the predicted score.
  • regression: user-level Pearson and MSE between true questionnaire scores and predicted scores (same metric family as Table A/ALBA).
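The two metric families can be sketched with NumPy. The function and argument names here are illustrative, not the script's API; point-biserial r is computed as Pearson r where one variable is binary:

```python
import numpy as np

def user_level_metrics(y_true, y_pred, diagnosis):
    """Illustrative user-level metrics: point-biserial, Pearson, and MSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    labels = np.asarray(diagnosis, dtype=float)  # binary 0/1 diagnosis labels
    # Point-biserial correlation is Pearson r with one binary variable.
    r_pb = np.corrcoef(labels, y_pred)[0, 1]
    r = np.corrcoef(y_true, y_pred)[0, 1]         # user-level Pearson
    mse = float(np.mean((y_true - y_pred) ** 2))  # user-level MSE
    return {"pointbiserial": r_pb, "pearson": r, "mse": mse}
```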

5) Optional Plots / Figure 2

Adaptive plots:

python plot_adaptive_results.py \
  --adaptive-dir "adaptive_outputs/$(basename "$LATEST_DISCRETIZED_DIR")" \
  --input-dir "$LATEST_DISCRETIZED_DIR"

Figure 2 correlation plots:

LATEST_QA_DIR=$(python - <<'PY'
import glob, json
from pathlib import Path

candidates = []

def is_complete(d: Path, n_folds: int) -> bool:
  if not (d / 'qa_level_test_outputs.csv').exists():
    return False
  for i in range(n_folds):
    if not (d / f'fold_{i}' / 'qa_level_test_predictions.csv').exists():
      return False
  return True

for pattern in ('model_outputs/qa_level_outputs_*_*', 'model_outputs/all_questions_qa_level_outputs_*'):
  for path in glob.glob(pattern):
    d = Path(path)
    if not d.is_dir():
      continue
    summary = d / 'cross_validation_summary.json'
    if not summary.exists():
      summary = d / 'overall_summary.json'
    if not summary.exists():
      summary = d / 'config.json'
    if not summary.exists():
      continue
    try:
      with summary.open() as f:
        info = json.load(f)
    except Exception:
      continue
    if info.get('question_strategy') == 'all_questions':
      try:
        n_folds = int(info.get('n_folds', 9))
      except Exception:
        continue
      if n_folds <= 0:
        continue
      if not is_complete(d, n_folds):
        continue
      candidates.append((d.stat().st_mtime, str(d)))

if not candidates:
  raise SystemExit('No complete QA output directory found for strategy=all_questions')

candidates.sort(key=lambda x: x[0])
print(candidates[-1][1])
PY)
python plot_adaptive_correlations.py \
  --adaptive-dir "adaptive_outputs/$(basename "$LATEST_DISCRETIZED_DIR")" \
  --input-dir "$LATEST_DISCRETIZED_DIR" \
  --qa-dir "$LATEST_QA_DIR" \
  --rolling-window 5 --std-threshold 0.01

6) Main Output Locations

  • model_outputs/ (stage 1)
  • polytomized_data/ (stage 2)
  • adaptive_outputs/ (stage 3)
  • analysis_outputs/ (stage 4 and plots)

7) Optional LLM Baselines

GPT baseline:

python gpt4_adaptive_baseline.py --model gpt-4o --out-csv gpt4_predictions.csv

Self-hosted OpenAI-compatible endpoint example:

python gpt4_adaptive_baseline.py --model gpt-oss-20b --base-url http://localhost:8000/v1 --api-key dummy --api-mode chat

Qwen scoring (optional):

python optional/python/run_qwen_scores.py

8) Reproduction Mapping

For table/figure-to-script mapping, see:

  • docs/repro_checklist.md
  • docs/run_with_formatted_and_private.md

9) Notes

  • Paths are centralized in maqua_paths.py.
  • Set MAQUA_ROOT if running from outside repo root.
  • Legacy and experimental scripts are in legacy/.
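A typical pattern for the MAQUA_ROOT override looks like this (a sketch only; the actual contents of maqua_paths.py may differ):

```python
import os
from pathlib import Path

def resolve_root() -> Path:
    # MAQUA_ROOT, when set, overrides the default of the current directory,
    # so scripts can run from outside the repo root.
    return Path(os.environ.get("MAQUA_ROOT", ".")).resolve()

FORMATTED_DATA = resolve_root() / "formatted_data"
ANALYSIS_OUTPUTS = resolve_root() / "analysis_outputs"
```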

About

Code for the framework MAQuA described in the paper "MAQUA: Multi-outcome Adaptive Question-Asking for Mental Health using Item Response Theory" (EACL 2026)
