Utility Functions for DNA Methylation Analysis
Source: vignettes/dnam-workflow.Rmd
This vignette demonstrates DNA methylation data preprocessing using Rtoolset functions.
Overview
DNA methylation data is typically represented as beta values (ranging from 0 to 1), but many statistical analyses work better on M values (logit-transformed beta values), which are closer to normally distributed and have more uniform variance.
The beta2M() function in Rtoolset provides a simple and
efficient way to convert beta values to M values, handling edge cases (0
and 1) and preserving data structure.
Beta to M Value Conversion
Basic Usage
The beta2M() function takes beta values and returns the corresponding M values.
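A minimal sketch of a call on a small matrix, assuming Rtoolset is installed. The matrix `beta_mat` and its probe and sample names are made up for illustration; only the single-argument call `beta2M(x)` is taken from this vignette.

```r
# Toy beta-value matrix: 3 probes (rows) x 2 samples (columns).
# Probe and sample names are illustrative, not real data.
beta_mat <- matrix(
  c(0.10, 0.50, 0.90,   # sampleA
    0.20, 0.55, 0.85),  # sampleB
  nrow = 3,
  dimnames = list(c("probe1", "probe2", "probe3"), c("sampleA", "sampleB"))
)

# Run the conversion only if Rtoolset is available on this machine.
if (requireNamespace("Rtoolset", quietly = TRUE)) {
  m_mat <- Rtoolset::beta2M(beta_mat)
  print(m_mat)
}
```

Beta values below 0.5 should come out as negative M values and values above 0.5 as positive ones.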
Why Convert to M Values?
M values have several advantages for statistical analysis:
- Better distribution: M values are approximately normally distributed
- Improved variance: Variance is more stable across the methylation range, whereas beta values are compressed near 0 and 1
- Better for linear models: More suitable for regression and differential analysis
Complete Workflow Example
library(Rtoolset)
# 1. Load beta values (e.g., from Illumina array)
beta_data <- read.csv("methylation_beta_values.csv", row.names = 1)
# 2. Convert to M values
M_data <- beta2M(beta_data)
# 3. Continue with downstream analysis
# - Differential methylation analysis
# - Clustering
# - Visualization
# - etc.
Use Cases
Best Practices
- Check for extreme values: Beta values of exactly 0 or 1 may indicate technical issues
- Alpha parameter: Use default (1e-6) unless you have specific requirements
- Data structure: Function preserves matrix/data.frame structure and names
- Memory: For very large datasets, consider processing in chunks
- Reproducibility: Always document the alpha parameter used in your analysis
- Performance: Caching is enabled for faster vignette rebuilds
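The first and fourth points above can be sketched in base R as follows. The helper names `count_extremes()` and `convert_in_chunks()` are hypothetical and not part of Rtoolset. The chunked helper takes the conversion function as an argument, so it could be used with `beta2M` or any other element-wise transform. Here a plain base-2 logit stands in so the sketch runs without the package.

```r
# Count values that sit exactly on the boundaries, plus missing values.
count_extremes <- function(beta) {
  beta <- as.matrix(beta)
  c(
    zeros   = sum(beta == 0, na.rm = TRUE),
    ones    = sum(beta == 1, na.rm = TRUE),
    missing = sum(is.na(beta))
  )
}

# Apply an element-wise conversion to blocks of rows and stack the results.
# `convert` maps a matrix of beta values to a matrix of the same shape.
convert_in_chunks <- function(beta, convert, chunk_size = 10000L) {
  beta <- as.matrix(beta)
  if (nrow(beta) == 0L) return(beta)
  starts <- seq(1L, nrow(beta), by = chunk_size)
  pieces <- lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1L, nrow(beta))
    convert(beta[rows, , drop = FALSE])
  })
  do.call(rbind, pieces)
}

# Simulated data with one exact 0, one exact 1 and one missing value.
set.seed(1)
beta <- matrix(runif(20), nrow = 5,
               dimnames = list(paste0("cg", 1:5), paste0("s", 1:4)))
beta[1, 1] <- 0
beta[2, 3] <- 1
beta[4, 2] <- NA

print(count_extremes(beta))

# Stand-in transform (assumption: plain base-2 logit, no alpha offset).
plain_logit2 <- function(b) log2(b / (1 - b))
m_chunked <- convert_in_chunks(beta, plain_logit2, chunk_size = 2L)
```

Chunking an object that is already in memory only bounds the size of the intermediate results. For real memory savings, each chunk would have to be read from disk separately.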
Technical Details
Conversion Formula
The conversion from beta to M values uses the logit transformation, where:
- $\beta$ is the beta value (0 to 1)
- $\alpha$ is a small constant (default: 1e-6) to prevent log(0)
- $M$ is the resulting M value
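How the small constant enters the formula is not shown on this page, so the following is only a sketch of one common convention: a base-2 logit with the constant added to both numerator and denominator, M = log2((beta + alpha) / (1 - beta + alpha)). The function name `beta_to_m_sketch()` is hypothetical. It is not the Rtoolset implementation, and `?beta2M` remains the reference for the actual behavior.

```r
# Assumed convention: offset alpha added to numerator and denominator.
# Arithmetic and log2() are vectorised, so vectors and matrices keep
# their shape and names.
beta_to_m_sketch <- function(beta, alpha = 1e-6) {
  log2((beta + alpha) / (1 - beta + alpha))
}

# Boundary values stay finite; 0.5 maps to 0.
m <- beta_to_m_sketch(c(0, 0.5, 1))
print(m)
```

With the default offset, the two boundary values map to roughly -19.93 and 19.93 instead of negative and positive infinity.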
Why M Values?
- Normal Distribution: M values are approximately normally distributed, making them suitable for parametric tests
- Stable Variance: Variance is more stable across the methylation range
- Linear Models: Better suited for linear regression and differential analysis
- Symmetric Scale: M values range from $-\infty$ to $+\infty$, centered around 0
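The symmetry and the stretching near the boundaries can be checked numerically. This sketch assumes the conventional base-2 logit without any offset; `logit2()` is an illustrative helper, not an Rtoolset function.

```r
# Plain base-2 logit (assumption: no alpha offset).
logit2 <- function(beta) log2(beta / (1 - beta))

beta <- c(0.1, 0.25, 0.5, 0.75, 0.9)
m <- logit2(beta)

# Mirror-image beta values give M values of equal size and opposite sign.
print(all.equal(m, -rev(m)))

# The same 0.01 step in beta is a much larger step in M near the boundary.
step_mid  <- diff(logit2(c(0.50, 0.51)))
step_edge <- diff(logit2(c(0.98, 0.99)))
print(c(mid = step_mid, edge = step_edge))
```

The step near the boundary is more than ten times the step in the middle of the range, which is the compression of beta values that the M scale undoes.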
Integration with Other Packages
M values work well with:
- limma: For differential methylation analysis
- minfi: For Illumina array preprocessing
- ChAMP: For comprehensive methylation analysis
- DSS: For differential analysis of bisulfite sequencing data
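As an illustration of the limma case, the sketch below fits a two-group model to simulated M values. The data, group labels and effect size are invented for the example, and the limma calls run only if the package is installed.

```r
set.seed(42)
n_probes <- 100
group <- factor(rep(c("control", "case"), each = 4),
                levels = c("control", "case"))

# Simulated M values: rows are probes, columns are samples (not real data).
m_values <- matrix(
  rnorm(n_probes * length(group)),
  nrow = n_probes,
  dimnames = list(paste0("probe", seq_len(n_probes)),
                  paste0("s", seq_along(group)))
)
# Shift the first five probes in the case group to mimic a true difference.
m_values[1:5, group == "case"] <- m_values[1:5, group == "case"] + 2

# Design matrix with an intercept and a case-vs-control coefficient.
design <- model.matrix(~ group)

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(m_values, design)   # probe-wise linear models
  fit <- limma::eBayes(fit)               # moderated t-statistics
  print(limma::topTable(fit, coef = "groupcase", number = 5))
}
```

In a real analysis, `m_values` would be the output of the beta-to-M conversion and `group` would come from the sample sheet.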
See Also
- Function reference: ?beta2M
- Full documentation website