Balanced partition of samples into groups using one numeric column (mean + SD)
Source: R/partition.R
balanced_partition.Rd
User-facing wrapper around balance_partition_core.
Searches for the grouping that best balances the mean and SD of score_col
across groups, and optionally generates a visualization and/or writes
outputs to files.
Usage
balanced_partition(
  data,
  score_col,
  K = NULL,
  group_sizes = NULL,
  id_col = NULL,
  lambda = 1,
  B = 50000,
  method = c("blocked_permute", "random_assign"),
  allow_unequal = TRUE,
  seed = NULL,
  output_dir = NULL,
  file_prefix = "balanced_partition",
  output_csv = TRUE,
  output_plot = FALSE
)
Arguments
- data
A data.frame with samples in rows.
- score_col
Name of the numeric column to balance.
- K
Integer. Number of groups. Ignored if group_sizes is provided.
- group_sizes
Optional integer vector of target group sizes; must sum to nrow(data).
- id_col
Optional sample ID column name. If NULL, uses row index.
- lambda
Numeric >= 0. Weight on SD-balance term. Default 1.
- B
Integer >= 1. Number of random tries. Default 50000.
- method
"blocked_permute" (default) or "random_assign".
- allow_unequal
Logical. If TRUE and n is not divisible by K, group sizes differ by at most 1. Ignored if group_sizes is provided.
- seed
Optional integer seed for reproducibility.
- output_dir
Optional directory path. If provided, outputs are written here.
- file_prefix
Character. Prefix for output filenames.
- output_csv
Logical. If TRUE and output_dir is provided, write CSV of original data plus appended .__group and .__score.
- output_plot
Logical. If TRUE, generate a ggplot object and (if output_dir is provided) write a PNG.
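The roles of K, allow_unequal, lambda, and B can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper names target_sizes, partition_loss, and random_search are hypothetical, and the loss form (spread of group means plus lambda times spread of group SDs) is an assumption, since this page does not define the loss. Only a plain random search in the spirit of method = "random_assign" is shown; "blocked_permute" is not reproduced.

```r
# Hypothetical helper: target sizes for K groups over n samples that
# differ by at most 1, as described for allow_unequal = TRUE.
target_sizes <- function(n, K) {
  base <- n %/% K
  extra <- n %% K
  base + as.integer(seq_len(K) <= extra)
}

# Assumed loss (the real one may differ): spread of group means plus
# lambda times spread of group SDs. Needs K >= 2 and >= 2 samples per group.
partition_loss <- function(score, group, lambda = 1) {
  means <- tapply(score, group, mean)
  sds <- tapply(score, group, sd)
  sd(means) + lambda * sd(sds)
}

# Plain random search: shuffle the group labels B times and keep the
# labelling with the lowest loss.
random_search <- function(score, sizes, lambda = 1, B = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  labels <- rep(seq_along(sizes), times = sizes)
  best <- list(group = labels, loss = Inf)
  for (b in seq_len(B)) {
    g <- sample(labels)
    l <- partition_loss(score, g, lambda)
    if (l < best$loss) best <- list(group = g, loss = l)
  }
  best
}

sizes <- target_sizes(10, 3)  # one group of 4, two groups of 3
set.seed(1)
score <- rnorm(10)
fit <- random_search(score, sizes, lambda = 1, B = 500, seed = 42)
```

In this sketch a larger B can only keep or lower the best loss found, at proportionally higher cost, and lambda = 0 balances the group means alone.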
Value
A list with:
- assignment: data.frame with sample_id, group, score.
- group_stats: data.frame with per-group n, mean, sd.
- loss: best achieved loss.
- group_sizes: integer vector of target group sizes used.
- plot: ggplot object if output_plot = TRUE, else NULL.
- files: named list of written file paths if any were written, else NULL.
- params: list of parameters used.
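To make the relationship between assignment and group_stats concrete, the sketch below rebuilds a per-group summary from a mock assignment table in base R. The data values are invented purely for illustration, and the layout of the real group_stats element (for example, whether it carries a group column) is assumed rather than taken from the package.

```r
# Mock of the documented `assignment` element; values are made up.
assignment <- data.frame(
  sample_id = paste0("s", 1:6),
  group = c(1, 2, 1, 2, 1, 2),
  score = c(1, 2, 3, 4, 5, 6)
)

# Per-group n, mean, and sd, mirroring the columns listed for group_stats.
per_group <- lapply(split(assignment, assignment$group), function(d) {
  data.frame(
    group = d$group[1],
    n = nrow(d),
    mean = mean(d$score),
    sd = sd(d$score)
  )
})
group_stats <- do.call(rbind, per_group)
```

A summary like this is a quick way to check that the group means and SDs of a returned assignment are close to one another.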