Rtoolset provides a collection of utility functions for common tasks in R programming and data analysis. These functions help streamline workflows, improve code readability, and handle common programming challenges.
Overview
This guide covers utility functions organized by category:
- String Matching: Fuzzy string matching for handling typos and variations
- Formatting & Printing: Professional formatting for reports and publications
- File & Directory Management: Safe file operations and directory creation
- Function Utilities: Helper functions for debugging and data structures
- Package Management: Unified interface for installing packages from multiple sources
String Matching
closestMatch()
Find the closest match for a string using Levenshtein distance. Useful for matching gene names, IDs, or any strings with potential typos.
library(Rtoolset)
# Find closest match
result <- closestMatch("DFNB31", c("DNMT1", "DTNBP1", "IFNB1"))
result
#> [[1]]
#> [1] "DTNBP1" "IFNB1"
#>
#> [[2]]
#> [1] 2 3
The function returns a list with:
- Matched strings (may be multiple if tied)
- Indices of matches in the vector
Use Cases:
- Matching gene names with typos
- Finding similar identifiers
- Fuzzy string matching
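The tie behaviour in the example can be reproduced in base R. The following is a minimal sketch, not the Rtoolset implementation: the helper name closest_by_levenshtein is hypothetical, and it assumes plain, case-sensitive Levenshtein distance as computed by utils::adist().

```r
# Hypothetical helper (not part of Rtoolset): closest match by Levenshtein distance
closest_by_levenshtein <- function(query, candidates) {
  # adist() returns a 1 x length(candidates) matrix of edit distances
  d <- drop(utils::adist(query, candidates))
  # Keep every candidate that attains the minimum, so ties are preserved
  idx <- which(d == min(d))
  list(candidates[idx], idx)
}

res <- closest_by_levenshtein("DFNB31", c("DNMT1", "DTNBP1", "IFNB1"))
res[[1]]  # "DTNBP1" and "IFNB1" are both two edits away from the query
res[[2]]  # their positions in the candidate vector: 2 and 3
```

Returning all tied candidates, rather than only the first, leaves the final choice to the caller, which matters when two gene symbols are equally plausible.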
Formatting & Printing
print_pval()
Format p-values for readability in reports and publications.
# Format p-values
print_pval(c(0.0101, 0.00003, 0.00000001))
#> [1] "0.0101" "3.00e-05" "1.00e-08"
Options:
- threshold: P-values below this are shown in scientific notation (default: 1e-4)
- rm_tailing0: Remove trailing zeros (default: TRUE)
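To make the role of the threshold concrete, here is a rough base-R sketch of threshold-based formatting. It is an illustration under assumptions, not the code behind print_pval(): the name format_p_sketch is hypothetical, and the choice of three significant digits is an assumption.

```r
# Hypothetical helper (not part of Rtoolset): switch to scientific notation
# for values below `threshold`, keep fixed notation otherwise
format_p_sketch <- function(p, threshold = 1e-4) {
  vapply(p, function(x) {
    if (x < threshold) {
      # two decimals in the mantissa, e.g. 3.00e-05
      formatC(x, format = "e", digits = 2)
    } else {
      # three significant digits (an assumption), fixed notation
      format(signif(x, 3), scientific = FALSE)
    }
  }, character(1))
}

format_p_sketch(c(0.0101, 0.00003, 0.00000001))
format_p_sketch(0.0005)                    # above the default threshold: fixed
format_p_sketch(0.0005, threshold = 1e-3)  # below the raised threshold: scientific
```

Raising the threshold moves more values into scientific notation, which is why 0.0005 changes form between the last two calls.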
# Custom threshold
print_pval(c(0.0101, 0.00003), threshold = 1e-3)
#> [1] "0.0101" "3.00e-05"
# Keep trailing zeros
print_pval(c(0.0101, 0.00003), rm_tailing0 = FALSE)
#> [1] "0.0101" "3.00e-05"
vec2print()
Print vectors in copy-paste friendly format for different languages.
# R format (default)
vec2print(c("Apple", "Orange", "Banana"))
#> c("Apple", "Orange", "Banana")
# Python format
vec2print(c("Apple", "Orange", "Banana"), lang = "python")
#> ["Apple", "Orange", "Banana"]
# Shell format
vec2print(c("Apple", "Orange", "Banana"), lang = "shell")
#> ("Apple" "Orange" "Banana")
Use Cases:
- Quick code generation
- Copying vectors between languages
- Creating configuration files
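The three output formats differ only in their brackets and separators. A minimal base-R sketch of that idea follows; vec_to_literal is a hypothetical name, and the sketch assumes a character vector whose elements contain no quotes that would need escaping.

```r
# Hypothetical helper (not part of Rtoolset): build a vector literal as a string
vec_to_literal <- function(x, lang = c("r", "python", "shell")) {
  lang <- match.arg(lang)
  quoted <- paste0('"', x, '"')  # assumes elements contain no double quotes
  switch(lang,
    r      = paste0("c(", paste(quoted, collapse = ", "), ")"),
    python = paste0("[", paste(quoted, collapse = ", "), "]"),
    shell  = paste0("(", paste(quoted, collapse = " "), ")")
  )
}

fruits <- c("Apple", "Orange", "Banana")
cat(vec_to_literal(fruits), "\n")
cat(vec_to_literal(fruits, lang = "python"), "\n")
cat(vec_to_literal(fruits, lang = "shell"), "\n")
```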
File & Directory Management
save2pdf()
Save plots to PDF with automatic file management and append capability.
# Save a plot
save2pdf(
  file = "my_plot.pdf",
  width = 6,
  height = 6,
  overwrite = TRUE,
  plot_code = quote({
    plot(1:10, 1:10, main = "My Plot")
  })
)
# Append to existing file
save2pdf(
  file = "my_plot.pdf",
  width = 6,
  height = 6,
  overwrite = FALSE, # Don't overwrite
  append = TRUE,     # Append instead
  plot_code = quote({
    plot(10:1, 1:10, main = "Second Plot")
  })
)
Features:
- Auto-creates directories
- Prevents accidental overwrites (creates numbered files)
- Can append multiple plots to one PDF
- Supports all pdf() arguments (width, height, etc.)
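For readers curious how quoted plot code can be turned into a file, the sketch below shows the general pattern in base R. It is an assumption about the approach, not save2pdf() itself: save_plot_sketch is a hypothetical name, and the sketch covers only directory creation and forwarding of pdf() arguments, not overwrite protection or appending to an existing file.

```r
# Hypothetical helper (not part of Rtoolset): evaluate quoted plot code
# inside a PDF device
save_plot_sketch <- function(file, plot_code, ...) {
  # Create the target directory if needed; silent when it already exists
  dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
  grDevices::pdf(file, ...)                  # extra arguments go to pdf()
  on.exit(grDevices::dev.off(), add = TRUE)  # close the device even on error
  eval(plot_code, envir = parent.frame())    # run the plot code in the caller's scope
  invisible(file)
}

out <- file.path(tempdir(), "sketch", "my_plot.pdf")
save_plot_sketch(
  out,
  quote({
    plot(1:10, 1:10, main = "My Plot")
  }),
  width = 6,
  height = 6
)
file.exists(out)
```

Registering dev.off() with on.exit() means a failing plot call does not leave a dangling graphics device open.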
Function Utilities
extract_params()
Extract default parameters from functions for debugging.
# Define a function
my_function <- function(x, n = 100, seed = 2024, verbose = FALSE) {
  # function body
}
# Extract defaults
extract_params(my_function, verbose = TRUE)
Use Cases:
- Quick debugging
- Understanding function defaults
- Documentation
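Default arguments can also be inspected with base R's formals(). The sketch below illustrates the idea under that assumption; it is not the implementation of extract_params(), and the variable names are for illustration only.

```r
my_function <- function(x, n = 100, seed = 2024, verbose = FALSE) {
  # function body
}

fmls <- as.list(formals(my_function))
# An argument without a default (here: x) is stored as the empty symbol
has_default <- vapply(
  names(fmls),
  function(nm) !identical(fmls[[nm]], quote(expr = )),
  logical(1)
)
defaults <- fmls[has_default]
str(defaults)
```

When debugging interactively, list2env(defaults, envir = globalenv()) would assign these values in the global environment so that the function body can be stepped through line by line.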
createNamedList()
Create a named list from a character vector.
# Create named list
my_list <- createNamedList(c("x", "y", "z"))
my_list
#> $x
#> NULL
#>
#> $y
#> NULL
#>
#> $z
#> NULL
# Populate it
my_list$x <- 1
my_list$y <- 2
my_list$z <- 3
my_list
#> $x
#> [1] 1
#>
#> $y
#> [1] 2
#>
#> $z
#> [1] 3
Use Cases:
- Initializing lists with known names
- Creating structured data containers
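An equivalent structure can be built in base R, which may help when reading code that does not use Rtoolset. This is a sketch of the same idea, not the function's source.

```r
# Base-R equivalent: a list of NULL elements named after a character vector
nms <- c("x", "y", "z")
empty <- setNames(vector("list", length(nms)), nms)

# Fill the slots by name, for example inside a loop
for (nm in nms) {
  empty[[nm]] <- nchar(nm)
}
names(empty)
```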
Package Management
install_packages()
Install packages from CRAN, Bioconductor, or GitHub with a unified interface.
# Install from CRAN
install_packages(c("dplyr", "ggplot2"))
# Install from Bioconductor
install_packages("SummarizedExperiment", repo = "bioc")
# Install from GitHub
install_packages("rstudio/gt", repo = "github")
# With options
install_packages(
  c("dplyr", "ggplot2"),
  dependencies = TRUE,
  force = FALSE
)
Features:
- Skips already installed packages (unless force = TRUE)
- Handles dependencies automatically
- Works with CRAN, Bioconductor, and GitHub
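The skip-then-dispatch logic described in the feature list could look roughly like the sketch below. It is an assumption about the general approach, not the code of install_packages(): install_missing_sketch is a hypothetical name, and the sketch assumes that BiocManager and remotes are available when the "bioc" or "github" branches are reached.

```r
# Hypothetical helper (not part of Rtoolset): install only what is missing
install_missing_sketch <- function(pkgs,
                                   repo = c("cran", "bioc", "github"),
                                   force = FALSE) {
  repo <- match.arg(repo)
  # For GitHub references such as "owner/name", assume the package name
  # is the part after the slash
  pkg_names <- if (repo == "github") basename(pkgs) else pkgs
  installed <- pkg_names %in% rownames(utils::installed.packages())
  todo <- if (force) pkgs else pkgs[!installed]
  if (length(todo) == 0) {
    return(invisible(character(0)))  # nothing to do
  }
  switch(repo,
    cran   = utils::install.packages(todo),
    bioc   = BiocManager::install(todo),
    github = remotes::install_github(todo)
  )
  invisible(todo)
}

# Packages that ship with R are already installed, so nothing is installed here
skipped <- install_missing_sketch(c("stats", "utils"))
```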
check_packages()
Check if packages are installed.
# Check packages
check_packages(c("dplyr", "ggplot2", "nonexistent"))
Returns a logical vector indicating which packages are installed.
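A comparable check can be written in base R without loading any package. The sketch below is illustrative only; is_installed_sketch is a hypothetical name and not part of Rtoolset.

```r
# Hypothetical helper (not part of Rtoolset): TRUE for each installed package
is_installed_sketch <- function(pkgs) {
  # system.file() returns "" when the package cannot be found
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

status <- is_installed_sketch(c("stats", "nonexistent"))
status
names(status)[!status]  # packages that still need to be installed
```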
Use Cases
1. Gene Name Matching
When working with gene names that may have typos or variations:
# Match gene names with potential typos
gene_names <- c("TP53", "BRCA1", "EGFR", "MYC")
query <- "TP5" # Typo
match_result <- closestMatch(query, gene_names)
match_result
2. Publication-Ready Tables
Format p-values for manuscripts and reports:
# Format p-values for a results table
p_values <- c(0.045, 0.0001, 0.0000005, 0.23)
formatted <- print_pval(p_values)
data.frame(
  Gene = c("Gene1", "Gene2", "Gene3", "Gene4"),
  P_value = formatted
)
Tips and Best Practices
- String Matching: Use closestMatch() when exact matches might fail due to typos
- P-values: Always use print_pval() for publication-ready formatting
- File Management: Use mkdir() and save2pdf() to avoid file system errors
- Package Installation: Use install_packages() for consistent installation across sources
- Error Handling: Functions like mkdir() are safe to call multiple times without errors
- Performance: Caching is enabled for faster vignette rebuilds
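The error-handling tip refers to idempotent directory creation. As an illustration of what "safe to call multiple times" means, here is a base-R sketch; ensure_dir_sketch is a hypothetical name and does not show how mkdir() is implemented.

```r
# Hypothetical helper (not part of Rtoolset): create a directory only if missing
ensure_dir_sketch <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)  # also creates missing parent directories
  }
  invisible(dir.exists(path))
}

target <- file.path(tempdir(), "results", "figures")
ensure_dir_sketch(target)
ensure_dir_sketch(target)  # second call finds the directory and does nothing
dir.exists(target)
```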
See Also
- Function reference: ?closestMatch, ?print_pval, ?vec2print, ?mkdir, ?save2pdf
- Full documentation website