Merge TPR simulation function into Development#6
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds TPR power-curve simulation and interactive plotting: new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Runner as run_tpr_simulation
participant Simulator as futureExperimentSimulation
participant Plotter as plot_tpr_power_curve
participant GG as ggplot2
participant Plotly as plotly
User->>Runner: rep_range, n_proteins
loop for each (N_rep, NumConcs)
Runner->>Simulator: invoke simulation (params, template)
Simulator-->>Runner: Hit_Rates_Data
Runner->>Runner: filter TPR Strong/Weak, add N_rep & NumConcs
end
Runner-->>User: consolidated data.frame
User->>Plotter: simulation_results
Plotter->>GG: create Strong panel
GG-->>Plotter: ggplot (Strong)
Plotter->>GG: create Weak panel
GG-->>Plotter: ggplot (Weak)
Plotter->>Plotly: ggplotly + subplot(1x2)
Plotly-->>Plotter: interactive subplot
Plotter-->>User: plotly object (1×2)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
R/TPR_Power_Curve.R (1)
34-46: Consider simplifying by removing the intermediatesim_argslist.The
sim_argslist is created and then immediately unpacked in the function call. This adds verbosity without benefit.♻️ Simplified version
run_one <- function(n_rep, k_conc, seed = 123) { set.seed(seed + n_rep * 100 + k_conc) concs_k <- CONC_MAP[[as.character(k_conc)]] - sim_args <- list( - N_proteins = n_proteins, - N_rep = n_rep, - Concentrations = concs_k, - IC50_Prediction = FALSE - ) - - temp_res <- futureExperimentSimulation( - N_proteins = sim_args$N_proteins, - N_rep = sim_args$N_rep, - Concentrations = sim_args$Concentrations, - IC50_Prediction = sim_args$IC50_Prediction - ) + temp_res <- futureExperimentSimulation( + N_proteins = n_proteins, + N_rep = n_rep, + Concentrations = concs_k, + IC50_Prediction = FALSE + ) temp_res$Hit_Rates_Data |>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/TPR_Power_Curve.R` around lines 34 - 46, The sim_args list is unnecessary boilerplate: remove the sim_args variable and call futureExperimentSimulation directly with the original variables (n_proteins, n_rep, concs_k, and FALSE) by passing them to the corresponding parameters N_proteins, N_rep, Concentrations, and IC50_Prediction in the futureExperimentSimulation call so you only use the function name futureExperimentSimulation and the original symbols n_proteins, n_rep, concs_k, and FALSE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 91-92: The linetype mapping currently uses a fixed vector ltypes
and assigns ltype_values <- setNames(ltypes[seq_along(rep_levels)],
as.character(rep_levels)), which will produce NAs when length(rep_levels) >
length(ltypes); change the assignment to repeat or recycle ltypes to cover all
replicate levels (e.g., generate a vector of length length(rep_levels) by
repeating ltypes with length.out = length(rep_levels) or using rep(ltypes,
length.out = length(rep_levels))) and then call setNames(...) so ltype_values
maps every element of rep_levels to a valid linetype; refer to the variables
ltypes, ltype_values and rep_levels when making the change.
- Around line 26-28: The function run_tpr_simulation currently assumes rep_range
is a two-element integer vector; add input validation at the top of
run_tpr_simulation to check that rep_range is a length-2 numeric (or integer)
vector with no NA/NaN/Inf values, both elements are whole numbers (or can be
safely coerced to integers), and rep_range[1] <= rep_range[2]; if any check
fails, stop with a clear error message mentioning rep_range and expected form
(e.g., "rep_range must be a length-2 integer vector with min <= max"). After
validation, coerce to integers (if needed) before creating rep_grid to avoid
downstream surprises when using rep_grid.
---
Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 34-46: The sim_args list is unnecessary boilerplate: remove the
sim_args variable and call futureExperimentSimulation directly with the original
variables (n_proteins, n_rep, concs_k, and FALSE) by passing them to the
corresponding parameters N_proteins, N_rep, Concentrations, and IC50_Prediction
in the futureExperimentSimulation call so you only use the function name
futureExperimentSimulation and the original symbols n_proteins, n_rep, concs_k,
and FALSE.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0b0cfe25-885d-428f-890c-886741a75f85
📒 Files selected for processing (10)
DESCRIPTIONNAMESPACER/TPR_Power_Curve.Rman/ConvertGroupToNumericDose.Rdman/DoseResponseFit.Rdman/FutureExperimentSimulation.Rdman/PredictIC50Parallel.Rdman/VisualizeResponseProtein.Rdman/plot_tpr_power_curve.Rdman/run_tpr_simulation.Rd
💤 Files with no reviewable changes (5)
- man/ConvertGroupToNumericDose.Rd
- man/VisualizeResponseProtein.Rd
- man/PredictIC50Parallel.Rd
- man/DoseResponseFit.Rd
- man/FutureExperimentSimulation.Rd
| IC50_Prediction = FALSE | ||
| ) | ||
|
|
||
| temp_res <- futureExperimentSimulation( |
There was a problem hiding this comment.
I'm a little confused. Users should also pass in the actual dataset and the Protein ID that is considered a strong interaction. But I don't see where this is being passed to the function.
| "8" = c(0, 1, 10, 30, 100, 300, 1000, 3000), | ||
| "9" = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000) | ||
| ) | ||
|
|
There was a problem hiding this comment.
As discussed with Sarah, this should not be hard coded but rather each subsequent value should be picked based on farthest distance from the log(median) among a user's set of doses, not this hard-coded list.
A user will have any arbitrary set of doses as well (i.e. it's not just 0, 1, 3, 10. You might have 0, 2, 4, 20, ...)
This also means that another input parameter should be the concentrations themselves (e.g. 0,1,3,10,30,100,300,1000,3000) and the dose_range (e.g. c(2,9) being doses 2 through 9).
| stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.") | ||
| } | ||
| if (rep_range[2] > 5) { | ||
| stop("Maximum replicates is 5 (limited by available line styles for plotting).") |
There was a problem hiding this comment.
(limited by available line styles for plotting) - if a user sees this, it's not going to make sense. You can remove this part.
I don't think there needs to be a maximum replicate range in this function. It should go inside the plotting function since that's where the plotting limitation
| #' @param n_proteins Integer. Number of proteins to simulate. Default: 1000. | ||
| #' | ||
| #' @return A data.frame with columns: Interaction, TPR, N_rep, NumConcs. | ||
| #' |
There was a problem hiding this comment.
Can you provide examples here?
| #' Run TPR simulation across a grid of concentration counts and replicate counts | ||
| #' | ||
| #' Sweeps over combinations of dose counts (2-9) and replicate counts, | ||
| #' calling \code{futureExperimentSimulation()} for each combination. |
There was a problem hiding this comment.
People don't know what TPR simulation means or what it means to sweep over dose counts / replicate counts calling this futureExperimentSimulation function.
Can you be more thorough in a way so that a biologist with minimal coding experience would understand what this function does?
| #' | ||
| #' @importFrom dplyr filter mutate select if_else | ||
| #' @export | ||
| run_tpr_simulation <- function(rep_range, n_proteins = 1000) { |
There was a problem hiding this comment.
There should be unit tests for this function
| if (!is.numeric(rep_range) || length(rep_range) != 2 || rep_range[1] > rep_range[2]) { | ||
| stop("rep_range must be a numeric vector of length 2 with c(min, max) where min <= max.") | ||
| } | ||
| if (rep_range[2] > 5) { |
There was a problem hiding this comment.
Shouldn't this validation be for the difference between max and min? Not the max replicate value.
| if (!requireNamespace("plotly", quietly = TRUE)) { | ||
| stop("Package 'plotly' is required for interactive plots. Please install it.") |
There was a problem hiding this comment.
We don't need this because plotly should already be a dependency of this package, meaning users will have it automatically installed when they install MSstatsResponse
| } | ||
| ltype_values <- setNames(ltypes[seq_along(rep_levels)], as.character(rep_levels)) | ||
|
|
||
| make_panel <- function(data, color, show_legend = FALSE) { |
There was a problem hiding this comment.
Put this outside of the plotting function instead of inside it and rename it so that it's clear what it's doing, aka plotting the TPR curves using ggplot2/plotly.
| @@ -1,93 +0,0 @@ | |||
| % Generated by roxygen2: do not edit by hand | |||
There was a problem hiding this comment.
All of the documentation files in the man folder should NOT be deleted. Can you fix it so that devtools::document() does not remove these files?
| k_grid <- sort(unique(simulation_results$NumConcs)) | ||
| rep_levels <- sort(unique(simulation_results$N_rep)) | ||
|
|
||
| ltypes <- c("dotted", "dotdash", "dashed", "longdash", "solid") |
There was a problem hiding this comment.
On second thought - I thought we discussed doing a color gradient instead? In that case, 5 replicate validation is not needed either.
Transfer the TPR simulation function from MSstatsShiny and export it for external use
Motivation and Context
This PR moves the True Positive Rate (TPR) simulation and interactive plotting functionality from MSstatsShiny into MSstatsResponse and exports it for external use. The goal is to provide standalone functions for sweeping experimental designs (varying concentrations and replicates) to estimate detection power (TPR) and to visualize results interactively with plotly.
Short solution summary: two new exported functions were added — run_tpr_simulation(rep_range, n_proteins = 1000) to run grid simulations across concentration counts (2–9) and replicate counts (bounded to max 5), and plot_tpr_power_curve(simulation_results) to produce a two-panel interactive plotly visualization (Strong / Weak interactions). Package metadata and NAMESPACE were updated and man pages for the new functions were added.
Detailed Changes
New R source
NAMESPACE
DESCRIPTION
Documentation (man/)
Documentation removals
Unit Tests
Recommendation: add unit tests for input validation, a small deterministic simulation mock (or stub futureExperimentSimulation) to validate output structure/contents, and a test for plot_tpr_power_curve requiring plotly (or using requireNamespace mocking).
Coding Guidelines / Policy Violations