| Title: | Carbon Footprint Estimation for R Computations |
|---|---|
| Description: | Computes the carbon footprint and ecological impact of computational tasks in R. Based on the Green Algorithms framework (Lannelongue et al. 2021, <https://calculator.green-algorithms.org/>), this package provides functions to estimate energy consumption and CO2 emissions from R computations. It includes specialized support for targets pipelines and provides visualization tools for carbon footprint analysis. The package helps researchers and data scientists to measure the environmental impact of their computational work. |
| Authors: | Adrien Taudière [aut, cre, cph] |
| Maintainer: | Adrien Taudière <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.2 |
| Built: | 2026-05-10 08:07:51 UTC |
| Source: | https://github.com/adrientaudiere/greenAlgoR |
greenAlgoR package: Carbon Footprint Estimation for R Computations
The greenAlgoR package provides tools to estimate the carbon footprint
and energy consumption of computational tasks in R. Based on the Green Algorithms
framework developed by Lannelongue et al. (2021), this package helps researchers
and data scientists understand and minimize the environmental impact of their work.
The package includes some internal datasets so that it can work offline. You can access them directly after loading the package (TDP_cpu_internal, carbon_intensity_internal and ref_value_internal). You can also replace the default data by overwriting them. By default, the data are loaded as follows:
TDP_cpu_internal <- csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CPUs.csv")
carbon_intensity_internal <- csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CI_aggregated.csv")
ref_value_internal <- csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/referenceValues.csv")
ga_footprint: Calculate carbon footprint for individual computations
ga_targets: Calculate carbon footprint for targets pipelines
session_runtime: Compute session runtime and memory usage
Estimate CO2 emissions based on runtime, CPU usage, and memory consumption
Support for different geographical locations with varying carbon intensities
Integration with the targets package for pipeline analysis
Visualization tools for carbon footprint comparisons
Configurable hardware specifications (CPU models, memory, storage)
To get started with greenAlgoR, try:
# Basic usage - estimate footprint of a 12-hour computation
result <- ga_footprint(runtime_h = 12, location_code = "WORLD")

# For your current R session
session_footprint <- ga_footprint(runtime_h = "session")

# For targets pipelines (in a targets project)
targets_footprint <- ga_targets()
Adrien Taudière [email protected]
Lannelongue, L., Grealey, J., Inouye, M. (2021). Green Algorithms: Quantifying the Carbon Footprint of Computation. Advanced Science, 8(12), 2100707. doi:10.1002/advs.202100707
Helper function to download and parse CSV data from the Green Algorithms project repositories. This function handles the specific format used by Green Algorithms data files, which often have headers in specific rows.
csv_from_url_ga(url, remove_first_line = TRUE)
url |
Character string with URL to a raw CSV file from Green Algorithms repository |
remove_first_line |
Logical (default TRUE). Whether to remove the first line from the CSV file (often contains metadata rather than column headers). |
A data.frame with properly formatted column names and data
Adrien Taudière
## Not run:
# Download carbon intensity data
carbon_intensity <- csv_from_url_ga(
  paste0(
    "https://raw.githubusercontent.com/GreenAlgorithms/GA-data",
    "/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CI_aggregated.csv"
  )
)
head(carbon_intensity)
## End(Not run)
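The "remove the first line, then parse" behaviour described for csv_from_url_ga() can be illustrated offline with base R. This is only a sketch of the idea, not the package's implementation: the file content, column names and values below are made up for illustration, and the drop_first_line variable is a hypothetical stand-in for remove_first_line.

```r
# Hypothetical raw file content: one metadata line, then the real header.
# All names and values here are invented for illustration.
raw_lines <- c(
  "illustrative metadata line (not a header)",
  "location,carbonIntensity",
  "AAA,100",
  "BBB,250"
)

# Stand-in for the remove_first_line argument (assumption).
drop_first_line <- TRUE
if (drop_first_line) {
  raw_lines <- raw_lines[-1]
}

# read.csv() accepts the remaining lines through its `text` argument,
# so the second line of the file becomes the header.
parsed <- read.csv(text = raw_lines, stringsAsFactors = FALSE)
print(parsed)
```

Without dropping the first line, the metadata row would be read as the header and the column names would be wrong.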
Please cite Lannelongue, L., Grealey, J., Inouye, M., Green Algorithms: Quantifying the Carbon Footprint of Computation. Adv. Sci. 2021, 2100707. https://doi.org/10.1002/advs.202100707
Default values are from https://github.com/GreenAlgorithms/green-algorithms-tool:
PUE: https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/default_PUE.csv
TDP_per_core: https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CPUs.csv
power_draw_per_gb: https://onlinelibrary.wiley.com/doi/10.1002/advs.202100707
Description of the algorithm from the green-algorithms website:
"""
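The carbon footprint is calculated by estimating the energy draw of the algorithm and the carbon intensity of producing this energy at a given location:
carbon footprint = energy needed × carbon intensity
Where the energy needed is:
energy needed = runtime × (power draw for cores × usage + power draw for memory) × PUE × PSF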
The power draw for the computing cores depends on the model and number of cores, while the memory power draw only depends on the size of memory available. The usage factor corrects for the real core usage (default is 1, i.e. full usage). The PUE (Power Usage Effectiveness) measures how much extra energy is needed to operate the data centre (cooling, lighting etc.).
The PSF (Pragmatic Scaling Factor) is used to take into account multiple identical runs (e.g. for testing or optimisation).
The Carbon Intensity depends on the location and the technologies used to produce electricity. But note that the "energy needed" [...] is independent of the location.
"""
ga_footprint(
  runtime_h = NULL,
  location_code = "WORLD",
  PUE = 1.67,
  TDP_per_core = 12,
  n_cores = 1,
  cpu_model = "Any",
  memory_ram = NULL,
  power_draw_per_gb = 0.3725,
  PSF = 1,
  usage_core = 1,
  add_ref_values = TRUE,
  add_storage_estimation = FALSE,
  mass_storage = NULL,
  carbon_intensity = NULL,
  TDP_cpu = NULL,
  ref_value = NULL
)
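The calculation quoted above from the Green Algorithms website can be sketched in a few lines of base R, using the default values shown in the usage. This is an illustration of the published formula, not the internal code of ga_footprint(): the helper name energy_needed_kwh is hypothetical, the 16 GB of RAM is an arbitrary choice, and the carbon intensity of 475 g CO2 per kWh is a placeholder rather than a value taken from the Green Algorithms data.

```r
# Hypothetical helper: energy needed in kWh, following
# runtime x (cores power x usage + memory power) x PUE x PSF.
energy_needed_kwh <- function(runtime_h,
                              n_cores = 1,
                              TDP_per_core = 12,
                              usage_core = 1,
                              memory_ram = 16, # GB, arbitrary for the example
                              power_draw_per_gb = 0.3725,
                              PUE = 1.67,
                              PSF = 1) {
  power_cores_w <- n_cores * TDP_per_core * usage_core
  power_memory_w <- memory_ram * power_draw_per_gb
  # Watts x hours gives Wh; divide by 1000 to get kWh.
  runtime_h * (power_cores_w + power_memory_w) * PUE * PSF / 1000
}

# Placeholder carbon intensity in g CO2 per kWh (assumption, not real data).
carbon_intensity_g_per_kwh <- 475

energy <- energy_needed_kwh(runtime_h = 12)
footprint_g <- energy * carbon_intensity_g_per_kwh

energy      # 0.3599184 kWh
footprint_g # 170.9612 g CO2
```

Because PUE and PSF are plain multipliers, doubling either one doubles both the energy needed and the footprint, whereas changing the location only changes the carbon intensity term.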
runtime_h |
Runtime in hours (numeric). Use a positive number for
explicit runtime, or "session" to automatically calculate based on
current R session time using |
location_code |
Character string specifying geographical location for carbon intensity. Available options include country codes (e.g., "FR", "US", "CN") or "WORLD" for global average. See the Green Algorithms database for complete list of supported locations. |
PUE |
Power Usage Effectiveness (numeric, default 1.67). Measures data center efficiency - how much extra energy is needed for cooling, lighting, etc. Use 1.05 for personal computers, 1.2-1.7 for data centers. See https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/default_PUE.csv |
TDP_per_core |
Thermal Design Power per core in Watts (numeric, default 12).
CPU power consumption per core. Find values at https://www.techpowerup.com/cpu-specs/
or http://calculator.green-algorithms.org/. Overridden by |
n_cores |
Number of CPU cores (integer, default 1).
Overridden by |
cpu_model |
Character string specifying exact CPU model. Must match entries
in the Green Algorithms database. When specified, automatically sets
|
memory_ram |
RAM memory in GB (numeric). If NULL, attempts to detect
automatically using |
power_draw_per_gb |
Power consumption per GB of RAM in Watts (numeric, default 0.3725). |
PSF |
Pragmatic Scaling Factor (numeric, default 1). Accounts for multiple runs of the same computation. As noted by Lannelongue et al. (2021), "computations are rarely performed only once": use values > 1 to account for repeated runs, parameter sweeps, or iterative development. GHG emissions are multiplied by this factor. |
usage_core |
(int, default 1). The usage factor corrects for the real core usage (default is 1, i.e. full usage). |
add_ref_values |
(logical, default TRUE) Whether to compute and return reference values to compare with your footprint. |
add_storage_estimation |
(logical, default FALSE) Whether to compute the footprint of mass storage. FALSE by default because storage matters far less than CPU and memory usage. Note that the original Green Algorithms tool does not compute mass storage usage. |
mass_storage |
(int. in GB, default NULL) The size of the mass_storage.
Only used if add_storage_estimation is set to TRUE. If set to NULL, use
the |
carbon_intensity |
(default NULL). Advanced users only.
A dataframe with |
TDP_cpu |
(default NULL). Advanced users only.
A dataframe with |
ref_value |
(default NULL). Advanced users only.
A dataframe with |
A list of values
runtime_h: the input run time in hours
location_code: the input location code
TDP_per_core: the input TDP_per_core (if cpu_model is set, corresponds to
the TDP_per_core for this CPU)
n_cores: the input n_cores (if cpu_model is set, corresponds to
the n_cores for this CPU)
cpu_model: the input CPU model. If set to "Any", TDP_per_core and n_cores are used
memory_ram: the input memory ram in GB
power_draw_per_gb: the input power draw per GB
usage_core: the input usage core
carbon_intensity: the input carbon intensity (depends on the location code)
PUE: the input PUE
PSF: the input PSF
power_draw_for_cores_kWh: the output power draw for cores in kWh
power_draw_for_memory_kWh: the output power draw for RAM memory in kWh
energy_needed_kWh: the output energy needed in kWh
carbon_footprint_cores: the output carbon footprint in grams of CO2 for
cores usage
carbon_footprint_memory: the output carbon footprint in grams of CO2 for
memory usage
carbon_footprint_total_gCO2: the total output carbon footprint in grams of CO2
ref_value: (optional, returned if add_ref_values is TRUE) a dataframe
power_draw_storage_kWh: (optional, returned if add_storage_estimation is TRUE)
the output power draw for mass storage in kWh
Adrien Taudière
# Basic usage with explicit parameters
result <- ga_footprint(
  runtime_h = 2,
  n_cores = 4,
  TDP_per_core = 15,
  memory_ram = 16,
  location_code = "WORLD"
)
result$carbon_footprint_total_gCO2

# Using specific CPU model (automatically sets cores and TDP)
ga_footprint(
  runtime_h = 1,
  cpu_model = "Core i5-9600KF",
  location_code = "FR"
)

# Calculate footprint for current R session
ga_footprint(runtime_h = "session")

# Compare different locations
locations <- c("WORLD", "FR", "US", "NO")
sapply(locations, function(loc) {
  ga_footprint(runtime_h = 1, location_code = loc)$carbon_footprint_total_gCO2
})

# Advanced usage with storage estimation and reference values
res_ga <- ga_footprint(
  runtime_h = 4,
  n_cores = 8,
  memory_ram = 32,
  add_storage_estimation = TRUE,
  add_ref_values = TRUE
)

# The plot below needs ggplot2
library(ggplot2)

ggplot(res_ga$ref_value, aes(y = variable, x = as.numeric(value), fill = log10(prop_footprint))) +
  geom_col() +
  geom_col(data = data.frame(
    variable = "Total",
    value = res_ga$carbon_footprint_total_gCO2
  ), fill = "grey30") +
  geom_col(data = data.frame(
    variable = "Cores",
    value = res_ga$carbon_footprint_cores
  ), fill = "darkred") +
  geom_col(data = data.frame(
    variable = "Memory",
    value = res_ga$carbon_footprint_memory
  ), fill = "orange") +
  geom_col(data = data.frame(
    variable = "Mass storage",
    value = res_ga$carbon_footprint_storage
  ), fill = "violet") +
  scale_x_continuous(
    trans = "log1p",
    breaks = c(0, 10^c(1:max(log1p(as.numeric(res_ga$ref_value$value)))))
  ) +
  geom_vline(
    xintercept = res_ga$carbon_footprint_total_gCO2,
    col = "grey30", lwd = 1.2
  ) +
  geom_label(aes(label = round_conditionaly(prop_footprint)),
    fill = "grey90",
    position = position_stack(vjust = 1.1)
  ) +
  labs(
    title = "Carbon footprint of the analysis",
    subtitle = paste0(
      "(", res_ga$carbon_footprint_total_gCO2,
      " g CO2", ")"
    ),
    caption = "Please cite Lannelongue et al. 2021 (10.1002/advs.202100707)"
  ) +
  xlab("Carbon footprint (g CO2) in log10") +
  ylab("Modality") +
  theme(legend.position = "none")
Calculates the total carbon footprint of a targets pipeline by analyzing
the metadata from completed targets. This function is a wrapper around
ga_footprint() that automatically extracts runtime and storage information
from the targets metadata and computes the cumulative environmental impact.
The function aggregates:
Total runtime across all targets
Memory usage patterns (when storage estimation is enabled)
Hardware specifications you provide
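The aggregation step listed above can be pictured with a small base R sketch. It uses a hand-made data frame that mimics the seconds and bytes columns of targets metadata; the target names and numbers are invented, and whether ga_targets() sums exactly these columns in this way is an assumption, not something confirmed here.

```r
# Hand-made stand-in for targets metadata (invented values).
meta <- data.frame(
  name = c("data_prep", "model_fit", "results"),
  seconds = c(120, 3300, 180), # runtime of each target, in seconds
  bytes = c(2e6, 5e8, 1e6)     # stored size of each target, in bytes
)

# Cumulative runtime across all targets, converted to hours.
total_runtime_h <- sum(meta$seconds, na.rm = TRUE) / 3600

# Cumulative storage across all targets, converted to GB (decimal).
total_storage_gb <- sum(meta$bytes, na.rm = TRUE) / 1e9

total_runtime_h  # 1
total_storage_gb # 0.503
```

Totals of this kind are what a single footprint calculation would then take as its runtime and, when storage estimation is enabled, its mass storage input.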
ga_targets(
  names_targets = NULL,
  targets_only = TRUE,
  complete_only = FALSE,
  store = targets::tar_config_get("store"),
  tar_meta_raw = NULL,
  ...
)
names_targets |
Character vector of target names to include in analysis.
If NULL (default), analyzes all available targets. See |
targets_only |
Logical (default TRUE). Whether to analyze only actual targets or also include metadata on functions and other global objects. |
complete_only |
Logical (default FALSE). Whether to return only targets with complete metadata (no NA values in critical fields). |
store |
Character string, path to the targets data store.
See |
tar_meta_raw |
Optional data.frame. If provided, uses this metadata directly
instead of calling |
... |
Additional arguments passed to
|
A list with the same structure as ga_footprint().
See ?ga_footprint for complete details on return values.
Adrien Taudière
## Not run:
# Basic usage in a targets project directory
pipeline_footprint <- ga_targets()

# With specific hardware configuration
pipeline_footprint <- ga_targets(
  location_code = "FR",
  n_cores = 4,
  memory_ram = 16,
  PUE = 1.2
)

# Analyze specific targets only
pipeline_footprint <- ga_targets(
  names_targets = c("data_prep", "model_fit", "results"),
  add_storage_estimation = TRUE
)
## End(Not run)

# The next example emulates a mini targets pipeline before requesting tar_meta()
library(targets)
library(ggplot2)

tar_dir({ # tar_dir() runs code from a temp dir for CRAN.
  tar_script(
    {
      list(
        tar_target(
          name = waiting,
          command = Sys.sleep(2),
          description = "Sleep 2 seconds"
        ),
        tar_target(x, writeLines(
          targets::tar_option_get("error"),
          "error.txt"
        ))
      )
    },
    ask = FALSE
  )
  tar_make()
  tm <- tar_meta()
  res_gat <- ga_targets(
    tar_meta_raw = tm,
    n_cores = 6,
    TDP_per_core = 15.8,
    location_code = "FR",
    PUE = 2,
    add_storage_estimation = TRUE
  )

  ggplot(res_gat$ref_value, aes(
    y = reorder(variable, as.numeric(value)),
    x = as.numeric(value),
    fill = log10(prop_footprint)
  )) +
    geom_col() +
    geom_col(data = data.frame(
      variable = "Total ",
      value = res_gat$carbon_footprint_total_gCO2
    ), fill = "grey30") +
    geom_col(
      data = data.frame(
        variable = "Cores",
        value = res_gat$carbon_intensity * res_gat$power_draw_for_cores_kWh
      ),
      fill = "darkred"
    ) +
    geom_col(
      data = data.frame(
        variable = "Memory",
        value = res_gat$carbon_intensity * res_gat$power_draw_for_memory_kWh
      ),
      fill = "orange"
    ) +
    geom_col(
      data = data.frame(
        variable = "Storage",
        # Storage energy in kWh (returned when add_storage_estimation = TRUE)
        value = res_gat$carbon_intensity * res_gat$power_draw_storage_kWh
      ),
      fill = "violet"
    ) +
    scale_x_continuous(trans = "log1p") +
    geom_vline(
      xintercept = res_gat$carbon_footprint_total_gCO2,
      col = "grey30", lwd = 1.2
    ) +
    geom_label(aes(label = round(prop_footprint, 1)), fill = "grey90") +
    xlab("g CO2") +
    ylab("Modality")
})
Applies different rounding rules based on the magnitude of values. Larger values are rounded to fewer decimal places, while smaller values retain more precision. This is useful for presenting results with appropriate precision across different scales.
round_conditionaly(
  vec,
  cond = cbind(c(1e-05, 5), c(0.001, 3), c(0.01, 3), c(1, 2), c(10, 1), c(100, 0))
)
vec |
A numeric vector to be rounded |
cond |
A matrix with 2 rows and n columns where:
The function automatically sorts conditions in decreasing order of thresholds. Default provides reasonable rounding for most carbon footprint values. |
A numeric vector of the same length as vec with values rounded
according to the conditional rules
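To make the idea of magnitude-dependent rounding concrete, here is a small base R sketch. It is not the implementation of round_conditionaly(): the helper name round_by_magnitude is hypothetical, and the rule it applies (row 1 of cond holds thresholds, row 2 the number of decimals, and each value uses the largest threshold it reaches) is an assumption inferred from the default cond matrix. Values below every threshold are left unchanged in this sketch.

```r
# Hypothetical helper: round each value with the number of decimals
# attached to the largest threshold that the value reaches.
round_by_magnitude <- function(vec, cond) {
  ord <- order(cond[1, ], decreasing = TRUE)
  thresholds <- cond[1, ord]
  digits <- cond[2, ord]
  out <- vec
  for (i in seq_along(vec)) {
    hit <- which(abs(vec[i]) >= thresholds)
    if (length(hit) > 0) {
      out[i] <- round(vec[i], digits[hit[1]])
    }
  }
  out
}

# Same matrix as the default `cond` shown in the usage.
cond <- cbind(c(1e-05, 5), c(0.001, 3), c(0.01, 3), c(1, 2), c(10, 1), c(100, 0))

values <- c(1000.27890, 10.87988, 1.769869, 0.99796, 0.000179)
rounded <- round_by_magnitude(values, cond)
rounded # 1000, 10.9, 1.77, 0.998, 0.00018
```

Under this rule large footprints lose their decimals while very small ones keep enough digits to remain distinguishable from zero.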
Adrien Taudière
# Default rounding behavior
values <- c(1000.27890, 10.87988, 1.769869, 0.99796, 0.000179)
round_conditionaly(values)

# Custom rounding rules
custom_rules <- cbind(c(10e-5, 5), c(10, 2)) # 5 decimals for tiny values, 2 for others
round_conditionaly(c(1000.27890, 0.000179, 10e-11), cond = custom_rules)

# Useful for carbon footprint reporting
footprint_values <- c(0.001234, 1.23456, 123.456, 12345.6)
round_conditionaly(footprint_values)
Analyzes the current R session to extract timing and memory usage information.
This function is particularly useful for understanding resource consumption
patterns and can be used with ga_footprint(runtime_h = "session").
The function uses base::proc.time() to get CPU timing information
and base::gc() to estimate memory usage when requested.
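The two base functions named above can be explored directly. The sketch below only shows what proc.time() and gc() expose; the variable names are illustrative, and how session_runtime() combines these values internally is not shown here.

```r
# Timing information for the current R process (all values in seconds).
timing <- proc.time()
cpu_user_s <- timing[["user.self"]]
cpu_system_s <- timing[["sys.self"]]
elapsed_s <- timing[["elapsed"]]

# Elapsed time converted to hours, the unit used for runtime_h.
elapsed_h <- elapsed_s / 3600

# gc() returns a matrix with one row per memory pool (Ncells, Vcells).
# Column 2 is memory currently used in Mb; the last column is the
# maximum used in Mb since the last reset.
mem <- gc()
used_mb <- sum(mem[, 2])
max_used_mb <- sum(mem[, ncol(mem)])

cat("Elapsed:", round(elapsed_h, 4), "h; CPU:", cpu_user_s + cpu_system_s, "s\n")
cat("Memory used:", used_mb, "Mb; max used:", max_used_mb, "Mb\n")
```

Calling gc() triggers a garbage collection, which is why skipping the memory part is faster when only timing is needed.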
session_runtime(compute_mass_storage = TRUE)
compute_mass_storage |
Logical (default TRUE). Whether to compute
memory usage statistics using the |
A list containing:
cpu_times_users: User CPU time in seconds
cpu_times_system: System CPU time in seconds
time_elapsed: Total elapsed time in seconds
cpu_times: Combined user and system CPU time
mass_storage_used: Memory currently used (if requested)
mass_storage_max: Maximum memory used (if requested)
Adrien Taudière
# Get complete session information
session_info <- session_runtime()
print(session_info)

# Get only timing information (faster)
timing_only <- session_runtime(compute_mass_storage = FALSE)
cat("Session has been running for", timing_only$time_elapsed, "seconds\n")