Merge TPR simulation function into Development by swaraj-neu · Pull Request #6 · Vitek-Lab/MSstatsResponse

swaraj-neu · 2026-03-20T20:50:39Z

Transfer the TPR simulation function from MSstatsShiny and export it for external use

Motivation and Context

This PR moves the True Positive Rate (TPR) simulation and interactive plotting functionality from MSstatsShiny into MSstatsResponse and exports it for external use. The goal is to provide standalone functions for sweeping experimental designs (varying concentrations and replicates) to estimate detection power (TPR) and to visualize results interactively with plotly.

Short solution summary: two new exported functions were added — run_tpr_simulation(rep_range, n_proteins = 1000) to run grid simulations across concentration counts (2–9) and replicate counts (bounded to max 5), and plot_tpr_power_curve(simulation_results) to produce a two-panel interactive plotly visualization (Strong / Weak interactions). Package metadata and NAMESPACE were updated and man pages for the new functions were added.

Detailed Changes

New R source
- Added R/TPR_Power_Curve.R defining:
  - run_tpr_simulation(rep_range, n_proteins = 1000)
    - Validates rep_range (numeric length-2, min ≤ max), enforces max replicates = 5.
    - Uses hardcoded CONC_MAP for concentration ladders (k = 2..9).
    - Builds grid over replicate levels and k values; calls futureExperimentSimulation() per combination with deterministic seeding.
    - Extracts and returns Hit_Rates_Data filtered to "TPR (Strong)" and "TPR (Weak)" with columns Interaction, TPR, N_rep, NumConcs.
    - Wraps individual runs in tryCatch and warns on failures; errors if all runs fail.
  - plot_tpr_power_curve(simulation_results)
    - Builds ggplot2 line+point panels (NumConcs × TPR) with linetype mapped to replicate count (max 5).
    - Requires plotly (stop() with message if missing), converts ggplots to plotly and composes a 1×2 subplot with shared Y axis and annotations.
    - Uses custom linetype mapping and theme_bw styling.
NAMESPACE
- Exports: run_tpr_simulation, plot_tpr_power_curve
- Added imports: dplyr::if_else, ggplot2::scale_linetype_manual, ggplot2::theme_bw (plus other ggplot2/dplyr functions referenced via roxygen imports in file).
DESCRIPTION
- Added plotly to Suggests.
Documentation (man/)
- Added man/run_tpr_simulation.Rd
- Added man/plot_tpr_power_curve.Rd
Documentation removals
- Removed man pages: ConvertGroupToNumericDose.Rd, DoseResponseFit.Rd, FutureExperimentSimulation.Rd, PredictIC50Parallel.Rd, VisualizeResponseProtein.Rd (these .Rd files are missing in the repo tree).
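
As an illustration of the grid construction and per-cell seeding described for run_tpr_simulation() above, here is a minimal base-R sketch. It is not the code in R/TPR_Power_Curve.R: `stub_simulation()` is a hypothetical stand-in for futureExperimentSimulation(), `base_seed` is an illustrative name, and only the two concentration ladders quoted later in the review thread (k = 8 and k = 9) are included. The seed formula follows the `seed + n_rep * 100 + k_conc` pattern shown in the review diff.

```r
# Concentration ladders keyed by ladder size. Only the two ladders quoted in
# the review thread are reproduced; the PR's CONC_MAP covers k = 2..9.
CONC_MAP <- list(
  "8" = c(0, 1, 10, 30, 100, 300, 1000, 3000),
  "9" = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000)
)

rep_range <- c(2, 3)

# One row per (replicate count, ladder size) combination.
grid <- expand.grid(
  N_rep = seq(rep_range[1], rep_range[2]),
  NumConcs = as.integer(names(CONC_MAP))
)

# Derive the seed from the grid coordinates so every cell is reproducible.
base_seed <- 123
grid$seed <- base_seed + grid$N_rep * 100 + grid$NumConcs

# Hypothetical stub: draws one pseudo-TPR per interaction class instead of
# running the real simulation.
stub_simulation <- function(n_rep, concs, seed) {
  set.seed(seed)
  data.frame(
    Interaction = c("TPR (Strong)", "TPR (Weak)"),
    TPR = runif(2),
    N_rep = n_rep,
    NumConcs = length(concs),
    stringsAsFactors = FALSE
  )
}

# Run every grid cell and stack the per-cell results into one data.frame.
results <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  k <- grid$NumConcs[i]
  stub_simulation(grid$N_rep[i], CONC_MAP[[as.character(k)]], grid$seed[i])
}))
```

Because the seed depends only on the grid coordinates, rerunning a single cell gives the same draw regardless of the order in which the cells are processed.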

Unit Tests

No new unit tests were added for run_tpr_simulation() or plot_tpr_power_curve().
Existing test suite references TPR/hit-rate behavior (e.g., test-ExperimentalDesignSimulation.R) but there are no dedicated tests validating:
- run_tpr_simulation argument validation, grid coverage, or error handling.
- Correct filtering and column structure of run_tpr_simulation output.
- plot_tpr_power_curve behavior when plotly is missing or when replicate levels exceed 5.
  Recommendation: add unit tests for input validation, a small deterministic simulation mock (or stub futureExperimentSimulation) to validate output structure/contents, and a test for plot_tpr_power_curve requiring plotly (or using requireNamespace mocking).
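
A rough sketch of the kind of output-structure check recommended above, written with base-R `stopifnot()` so it runs without extra packages. `extract_tpr()` is a hypothetical helper that mirrors the filtering step, and the input column names (`Category`, `Rate`) are assumptions; the real Hit_Rates_Data layout may differ.

```r
# Hypothetical helper mirroring the filter-and-annotate step. The input column
# names "Category" and "Rate" are assumptions, not the package's real schema.
extract_tpr <- function(hit_rates, n_rep, k_conc) {
  keep <- hit_rates$Category %in% c("TPR (Strong)", "TPR (Weak)")
  data.frame(
    Interaction = hit_rates$Category[keep],
    TPR = hit_rates$Rate[keep],
    N_rep = n_rep,
    NumConcs = k_conc,
    stringsAsFactors = FALSE
  )
}

# Deterministic mock of simulator output, including one row to be dropped.
mock_hit_rates <- data.frame(
  Category = c("TPR (Strong)", "TPR (Weak)", "FPR"),
  Rate = c(0.9, 0.4, 0.05),
  stringsAsFactors = FALSE
)

out <- extract_tpr(mock_hit_rates, n_rep = 3, k_conc = 5)

# Structure checks: expected columns, only the two TPR rows survive.
stopifnot(
  identical(names(out), c("Interaction", "TPR", "N_rep", "NumConcs")),
  nrow(out) == 2,
  !("FPR" %in% out$Interaction)
)
```

In a testthat suite the same checks would typically be expressed with `expect_named()` and `expect_equal()`, with the simulator replaced by a mock so the test stays fast.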

Coding Guidelines / Policy Violations

Documentation mismatch: multiple previously present .Rd files were removed (ConvertGroupToNumericDose, DoseResponseFit, FutureExperimentSimulation, PredictIC50Parallel, VisualizeResponseProtein) while their corresponding R implementations remain in source and exports appear unchanged. Exported functions should have corresponding documentation; removing .Rd files for still-exported functions creates a documentation/export inconsistency and violates the guideline that all exported/public functions must be documented. Either restore/regenerate these man pages or remove the associated exports/implementations as appropriate.
Hardcoded plotting limit: run_tpr_simulation enforces max replicates = 5 due to plotting linetype availability. This coupling of simulation input constraints to plotting aesthetics may be surprising; consider decoupling or documenting the rationale clearly.
External dependency handling: plot_tpr_power_curve requires plotly at runtime (requireNamespace check). Because plotly is only in Suggests, callers in non-interactive contexts may encounter runtime errors; acceptable if documented, but tests and examples should account for optional dependency.
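
For the optional-dependency point above, one possible pattern is to check for the package and degrade gracefully instead of stopping. This is a sketch only: `has_optional_pkg()` and `maybe_interactive()` are hypothetical names, and whether skipping the plot is acceptable depends on how the package wants to treat a missing plotly.

```r
# TRUE when an optional package can be loaded, FALSE otherwise. The package is
# not attached and no error is raised when it is missing.
has_optional_pkg <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical wrapper: run the plotting closure only when plotly is available,
# otherwise emit a message and return NULL invisibly.
maybe_interactive <- function(build_plot) {
  if (!has_optional_pkg("plotly")) {
    message("plotly is not installed; skipping interactive plot.")
    return(invisible(NULL))
  }
  build_plot()
}

result <- maybe_interactive(function() "plot placeholder")
```

The same guard can be used in examples and tests so they pass on machines where the suggested package is absent.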

coderabbitai · 2026-03-20T20:51:33Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f6b30687-ca8d-4042-a433-5603fb8d2d75

📥 Commits

Reviewing files that changed from the base of the PR and between 5a26bdc and 8b7eaf9.

📒 Files selected for processing (1)

R/TPR_Power_Curve.R

🚧 Files skipped from review as they are similar to previous changes (1)

R/TPR_Power_Curve.R

📝 Walkthrough

Walkthrough

Adds TPR power-curve simulation and interactive plotting: new R/TPR_Power_Curve.R with run_tpr_simulation() and plot_tpr_power_curve(), updates NAMESPACE and DESCRIPTION (adds plotly to Suggests), removes five existing man pages and adds two new man pages for the TPR functions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Package Configuration `DESCRIPTION` | Added `plotly` to `Suggests`. |
| Namespace & Imports `NAMESPACE` | Exported `run_tpr_simulation`, `plot_tpr_power_curve`; imported `if_else` (dplyr), `scale_linetype_manual`, `theme_bw` (ggplot2). |
| TPR Power Curve Implementation `R/TPR_Power_Curve.R` | New file adding `run_tpr_simulation(rep_range, n_proteins=1000)` (sweeps replicate counts and concentration ladder sizes, calls `futureExperimentSimulation`, collects TPR results) and `plot_tpr_power_curve(simulation_results)` (builds ggplot panels, requires `plotly::ggplotly`, combines into interactive 1×2 subplot). |
| Removed Documentation `man/ConvertGroupToNumericDose.Rd`, `man/DoseResponseFit.Rd`, `man/FutureExperimentSimulation.Rd`, `man/PredictIC50Parallel.Rd`, `man/VisualizeResponseProtein.Rd` | Five man pages deleted (documentation-only removals). |
| New Documentation `man/run_tpr_simulation.Rd`, `man/plot_tpr_power_curve.Rd` | Added man pages documenting `run_tpr_simulation()` and `plot_tpr_power_curve()` (parameters, usage, return values). |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Runner as run_tpr_simulation
    participant Simulator as futureExperimentSimulation
    participant Plotter as plot_tpr_power_curve
    participant GG as ggplot2
    participant Plotly as plotly

    User->>Runner: rep_range, n_proteins
    loop for each (N_rep, NumConcs)
        Runner->>Simulator: invoke simulation (params, template)
        Simulator-->>Runner: Hit_Rates_Data
        Runner->>Runner: filter TPR Strong/Weak, add N_rep & NumConcs
    end
    Runner-->>User: consolidated data.frame

    User->>Plotter: simulation_results
    Plotter->>GG: create Strong panel
    GG-->>Plotter: ggplot (Strong)
    Plotter->>GG: create Weak panel
    GG-->>Plotter: ggplot (Weak)
    Plotter->>Plotly: ggplotly + subplot(1x2)
    Plotly-->>Plotter: interactive subplot
    Plotter-->>User: plotly object (1×2)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇✨ I hopped through ladders, counts, and code,
measuring TPR down each winding road.
Two panels now sparkle with interactive cheer —
Strong and Weak sing what testers want to hear.
— a rabbit celebrating plots and nodes 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Merge TPR simulation function into Development' is vague and generic, using 'Merge' and 'into Development' which are standard version-control operations that don't convey what actually changed. | Use a more descriptive title like 'Add TPR power curve simulation and visualization functions' to clearly indicate the main additions to the codebase. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

R/TPR_Power_Curve.R (1)

34-46: Consider simplifying by removing the intermediate sim_args list.

The sim_args list is created and then immediately unpacked in the function call. This adds verbosity without benefit.

♻️ Simplified version

   run_one <- function(n_rep, k_conc, seed = 123) {
     set.seed(seed + n_rep * 100 + k_conc)
     concs_k <- CONC_MAP[[as.character(k_conc)]]
 
-    sim_args <- list(
-      N_proteins = n_proteins,
-      N_rep = n_rep,
-      Concentrations = concs_k,
-      IC50_Prediction = FALSE
-    )
-
-    temp_res <- futureExperimentSimulation(
-      N_proteins = sim_args$N_proteins,
-      N_rep = sim_args$N_rep,
-      Concentrations = sim_args$Concentrations,
-      IC50_Prediction = sim_args$IC50_Prediction
-    )
+    temp_res <- futureExperimentSimulation(
+      N_proteins = n_proteins,
+      N_rep = n_rep,
+      Concentrations = concs_k,
+      IC50_Prediction = FALSE
+    )
     temp_res$Hit_Rates_Data |>

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 34 - 46, The sim_args list is unnecessary
boilerplate: remove the sim_args variable and call futureExperimentSimulation
directly with the original variables (n_proteins, n_rep, concs_k, and FALSE) by
passing them to the corresponding parameters N_proteins, N_rep, Concentrations,
and IC50_Prediction in the futureExperimentSimulation call so you only use the
function name futureExperimentSimulation and the original symbols n_proteins,
n_rep, concs_k, and FALSE.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 91-92: The linetype mapping currently uses a fixed vector ltypes
and assigns ltype_values <- setNames(ltypes[seq_along(rep_levels)],
as.character(rep_levels)), which will produce NAs when length(rep_levels) >
length(ltypes); change the assignment to repeat or recycle ltypes to cover all
replicate levels (e.g., generate a vector of length length(rep_levels) by
repeating ltypes with length.out = length(rep_levels) or using rep(ltypes,
length.out = length(rep_levels))) and then call setNames(...) so ltype_values
maps every element of rep_levels to a valid linetype; refer to the variables
ltypes, ltype_values and rep_levels when making the change.
- Around line 26-28: The function run_tpr_simulation currently assumes rep_range
is a two-element integer vector; add input validation at the top of
run_tpr_simulation to check that rep_range is a length-2 numeric (or integer)
vector with no NA/NaN/Inf values, both elements are whole numbers (or can be
safely coerced to integers), and rep_range[1] <= rep_range[2]; if any check
fails, stop with a clear error message mentioning rep_range and expected form
(e.g., "rep_range must be a length-2 integer vector with min <= max"). After
validation, coerce to integers (if needed) before creating rep_grid to avoid
downstream surprises when using rep_grid.
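
The two inline findings above can be sketched in base R as follows. This is an illustration under assumptions, not the committed fix: `validate_rep_range()` is a hypothetical helper name, and the seven replicate levels are chosen only to show recycling past the five available line types.

```r
# Hypothetical validator: accepts a length-2, finite, whole-number vector with
# min <= max and returns it as integers; anything else raises an error.
validate_rep_range <- function(rep_range) {
  ok <- is.numeric(rep_range) &&
    length(rep_range) == 2 &&
    all(is.finite(rep_range)) &&
    all(rep_range == round(rep_range)) &&
    rep_range[1] <= rep_range[2]
  if (!ok) {
    stop("rep_range must be a length-2 integer vector with min <= max",
         call. = FALSE)
  }
  as.integer(rep_range)
}

# Recycle the fixed line types so every replicate level gets a valid value,
# even when there are more levels than line types.
ltypes <- c("dotted", "dotdash", "dashed", "longdash", "solid")
rep_levels <- 1:7
ltype_values <- setNames(
  rep(ltypes, length.out = length(rep_levels)),
  as.character(rep_levels)
)
```

Recycling avoids NA line types but makes some replicate levels share a style, so it trades a hard failure for an ambiguous legend.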

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b0cfe25-885d-428f-890c-886741a75f85

📥 Commits

Reviewing files that changed from the base of the PR and between 8739ce4 and 5a26bdc.

📒 Files selected for processing (10)

DESCRIPTION
NAMESPACE
R/TPR_Power_Curve.R
man/ConvertGroupToNumericDose.Rd
man/DoseResponseFit.Rd
man/FutureExperimentSimulation.Rd
man/PredictIC50Parallel.Rd
man/VisualizeResponseProtein.Rd
man/plot_tpr_power_curve.Rd
man/run_tpr_simulation.Rd

💤 Files with no reviewable changes (5)

man/ConvertGroupToNumericDose.Rd
man/VisualizeResponseProtein.Rd
man/PredictIC50Parallel.Rd
man/DoseResponseFit.Rd
man/FutureExperimentSimulation.Rd

R/TPR_Power_Curve.R

tonywu1999 · 2026-03-22T22:09:24Z

R/TPR_Power_Curve.R

+      IC50_Prediction = FALSE
+    )
+
+    temp_res <- futureExperimentSimulation(


I'm a little confused. Users should also pass in the actual dataset and the Protein ID that is considered a strong interaction. But I don't see where this is being passed to the function.

tonywu1999 · 2026-03-22T22:10:50Z

R/TPR_Power_Curve.R

+  "8" = c(0, 1, 10, 30, 100, 300, 1000, 3000),
+  "9" = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000)
+)
+


As discussed with Sarah, this should not be hard coded but rather each subsequent value should be picked based on farthest distance from the log(median) among a user's set of doses, not this hard-coded list.

A user will have any arbitrary set of doses as well (i.e. it's not just 0, 1, 3, 10. You might have 0, 2, 4, 20, ...)

This also means that another input parameter should be the concentrations themselves (e.g. 0,1,3,10,30,100,300,1000,3000) and the dose_range (e.g. c(2,9) being doses 2 through 9).
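
One literal reading of the selection rule proposed above can be sketched like this. Everything here is an assumption made for illustration: `select_doses()` is a hypothetical helper, `log1p()` is used so that a zero dose stays finite, and distance is measured from the log of the median dose. The intended rule may differ, for example by always keeping the control dose.

```r
# Hypothetical helper: keep the k doses whose log-scale value lies farthest
# from the log of the median dose. log1p() is an assumption to handle dose 0.
select_doses <- function(doses, k) {
  stopifnot(is.numeric(doses), k >= 1, k <= length(doses))
  center <- log1p(median(doses))
  dist <- abs(log1p(doses) - center)
  picked <- doses[order(dist, decreasing = TRUE)][seq_len(k)]
  sort(picked)
}

# User-supplied doses and the range of ladder sizes to sweep.
doses <- c(0, 1, 3, 10, 30, 100, 300, 1000, 3000)
dose_range <- c(2, 9)

# One ladder per size in dose_range, named by the number of doses.
ladders <- lapply(seq(dose_range[1], dose_range[2]), function(k) {
  select_doses(doses, k)
})
names(ladders) <- seq(dose_range[1], dose_range[2])
```

Under this reading the two-dose ladder contains only the two highest doses and no control, which suggests the rule needs an explicit decision about whether dose 0 is always retained.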

tonywu1999 · 2026-03-22T22:16:52Z

R/TPR_Power_Curve.R

+    stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.")
+  }
+  if (rep_range[2] > 5) {
+    stop("Maximum replicates is 5 (limited by available line styles for plotting).")


(limited by available line styles for plotting) - if a user sees this, it's not going to make sense. You can remove this part.

I don't think there needs to be a maximum replicate range in this function. It should go inside the plotting function, since that's where the plotting limitation applies.

tonywu1999 · 2026-03-22T22:17:00Z

R/TPR_Power_Curve.R

+#' @param n_proteins Integer. Number of proteins to simulate. Default: 1000.
+#'
+#' @return A data.frame with columns: Interaction, TPR, N_rep, NumConcs.
+#'


Can you provide examples here?

tonywu1999 · 2026-03-22T22:19:11Z

R/TPR_Power_Curve.R

+#' Run TPR simulation across a grid of concentration counts and replicate counts
+#'
+#' Sweeps over combinations of dose counts (2-9) and replicate counts,
+#' calling \code{futureExperimentSimulation()} for each combination.


People don't know what TPR simulation means or what it means to sweep over dose counts / replicate counts calling this futureExperimentSimulation function.

Can you be more thorough in a way so that a biologist with minimal coding experience would understand what this function does?

tonywu1999 · 2026-03-22T22:26:02Z

R/TPR_Power_Curve.R

+#'
+#' @importFrom dplyr filter mutate select if_else
+#' @export
+run_tpr_simulation <- function(rep_range, n_proteins = 1000) {


There should be unit tests for this function

tonywu1999 · 2026-03-22T22:28:48Z

R/TPR_Power_Curve.R

+  if (!is.numeric(rep_range) || length(rep_range) != 2 || rep_range[1] > rep_range[2]) {
+    stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.")
+  }
+  if (rep_range[2] > 5) {


Shouldn't this validation be for the difference between max and min? Not the max replicate value.

tonywu1999 · 2026-03-22T22:33:40Z

R/TPR_Power_Curve.R

+  if (!requireNamespace("plotly", quietly = TRUE)) {
+    stop("Package 'plotly' is required for interactive plots. Please install it.")


We don't need this because plotly should already be a dependency of this package, meaning users will have it automatically installed when they install MSstatsResponse

tonywu1999 · 2026-03-22T22:34:22Z

R/TPR_Power_Curve.R

+  }
+  ltype_values <- setNames(ltypes[seq_along(rep_levels)], as.character(rep_levels))
+
+  make_panel <- function(data, color, show_legend = FALSE) {


Put this outside of the plotting function instead of inside it and rename it so that it's clear what it's doing, aka plotting the TPR curves using ggplot2/plotly.

tonywu1999 · 2026-03-22T22:36:32Z

man/VisualizeResponseProtein.Rd

@@ -1,93 +0,0 @@
-% Generated by roxygen2: do not edit by hand


All of the documentation files in the man folder should NOT be deleted. Can you fix it so that devtools::document() does not remove these files?

tonywu1999 · 2026-03-23T12:52:47Z

R/TPR_Power_Curve.R

+  k_grid <- sort(unique(simulation_results$NumConcs))
+  rep_levels <- sort(unique(simulation_results$N_rep))
+
+  ltypes <- c("dotted", "dotdash", "dashed", "longdash", "solid")


On second thought - I thought we discussed doing a color gradient instead? In that case, 5 replicate validation is not needed either.
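
A small sketch of the color-gradient idea, assuming replicate counts are mapped to a sequential palette built with base R. The endpoint colors and the seven replicate levels are illustrative choices, not values from the PR.

```r
# Illustrative replicate levels; a gradient has no fixed upper limit.
rep_levels <- c(1, 2, 3, 4, 5, 6, 7)

# Interpolate between two endpoint colors, one color per replicate level.
palette_fn <- grDevices::colorRampPalette(c("lightblue", "darkblue"))
rep_colors <- setNames(
  palette_fn(length(rep_levels)),
  as.character(rep_levels)
)
```

A named vector like `rep_colors` could then be supplied to `ggplot2::scale_color_manual(values = rep_colors)`, which removes the dependence on a fixed set of five line types.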

Transfer the TPR simulation function from MSstatsShiny and export it for external use

5a26bdc

swaraj-neu requested a review from tonywu1999 March 20, 2026 20:50

swaraj-neu self-assigned this Mar 20, 2026

swaraj-neu added the enhancement New feature or request label Mar 20, 2026

coderabbitai bot reviewed Mar 20, 2026

Resolve TPR function nitpicks

8b7eaf9

tonywu1999 reviewed Mar 22, 2026

tonywu1999 reviewed Mar 23, 2026

Rollback the documentation files from dev branch

5d87188
