Package 'greenAlgoR'

Title: Carbon Footprint Estimation for R Computations
Description: Computes the carbon footprint and ecological impact of computational tasks in R. Based on the Green Algorithms framework (Lannelongue et al. 2021, <https://calculator.green-algorithms.org/>), this package provides functions to estimate energy consumption and CO2 emissions from R computations. It includes specialized support for targets pipelines and provides visualization tools for carbon footprint analysis. The package helps researchers and data scientists to measure the environmental impact of their computational work.
Authors: Adrien Taudière [aut, cre, cph]
Maintainer: Adrien Taudière <[email protected]>
License: GPL (>= 3)
Version: 0.1.2
Built: 2026-05-10 08:07:51 UTC
Source: https://github.com/adrientaudiere/greenAlgoR

Help Index


greenAlgoR package

Description

Carbon Footprint Estimation for R Computations

The greenAlgoR package provides tools to estimate the carbon footprint and energy consumption of computational tasks in R. Based on the Green Algorithms framework developed by Lannelongue et al. (2021), this package helps researchers and data scientists understand and minimize the environmental impact of their work.

Internal Data

The package includes some internal datasets so that it can work offline. You can access them directly after loading the package (TDP_cpu_internal, carbon_intensity_internal and ref_value_internal). You can also replace the default data by overwriting them. By default, the data are loaded using the following calls:

TDP_cpu_internal <- csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CPUs.csv")

carbon_intensity_internal <- csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CI_aggregated.csv")

ref_value_internal <- csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/referenceValues.csv")

Main Functions

  • ga_footprint: Calculate carbon footprint for individual computations

  • ga_targets: Calculate carbon footprint for targets pipelines

  • session_runtime: Compute session runtime and memory usage

Key Features

  • Estimate CO2 emissions based on runtime, CPU usage, and memory consumption

  • Support for different geographical locations with varying carbon intensities

  • Integration with the targets package for pipeline analysis

  • Visualization tools for carbon footprint comparisons

  • Configurable hardware specifications (CPU models, memory, storage)

Getting Started

To get started with greenAlgoR, try:

# Basic usage - estimate footprint of a 12-hour computation
result <- ga_footprint(runtime_h = 12, location_code = "WORLD")

# For your current R session
session_footprint <- ga_footprint(runtime_h = "session")

# For targets pipelines (in a targets project)
targets_footprint <- ga_targets()

Author(s)

Adrien Taudière [email protected]

References

Lannelongue, L., Grealey, J., Inouye, M. (2021). Green Algorithms: Quantifying the Carbon Footprint of Computation. Advanced Science, 8(12), 2100707. doi:10.1002/advs.202100707


Load CSV files from Green Algorithms GitHub repositories

Description

[Experimental]

Helper function to download and parse CSV data from the Green Algorithms project repositories. This function handles the specific format used by Green Algorithms data files, which often have headers in specific rows.

Usage

csv_from_url_ga(url, remove_first_line = TRUE)

Arguments

url

Character string with URL to a raw CSV file from Green Algorithms repository

remove_first_line

Logical (default TRUE). Whether to remove the first line from the CSV file (often contains metadata rather than column headers).

Value

A data.frame with properly formatted column names and data
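
The effect of remove_first_line can be illustrated with base R alone. This is a minimal sketch, not the package's implementation: the helper name parse_skipping_first and the file content are hypothetical, and it only assumes that the first line holds metadata while the second holds the column names.

```r
# Hypothetical file content: line 1 is a metadata note, line 2 holds the column names.
raw_lines <- c(
  "illustrative metadata line",
  "location,carbonIntensity",
  "SITE_A,50",
  "SITE_B,400"
)

# Optionally drop the first line, then parse the remainder as a regular CSV.
# parse_skipping_first() is a hypothetical helper, not part of greenAlgoR.
parse_skipping_first <- function(lines, remove_first_line = TRUE) {
  if (remove_first_line) {
    lines <- lines[-1]
  }
  read.csv(text = paste(lines, collapse = "\n"), stringsAsFactors = FALSE)
}

parsed <- parse_skipping_first(raw_lines)
str(parsed)
```

Without dropping the first line, the metadata note would be read as the header row and the real column names would end up in the data.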

Author(s)

Adrien Taudière

Examples

## Not run: 
# Download carbon intensity data
carbon_intensity <- csv_from_url_ga(
  paste0(
    "https://raw.githubusercontent.com/GreenAlgorithms/GA-data",
    "/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CI_aggregated.csv"
  )
)
head(carbon_intensity)

## End(Not run)

Compute footprint in grams of CO2 using Lannelongue et al. 2021 algorithm

Description

[Experimental]

Please cite Lannelongue, L., Grealey, J., Inouye, M., Green Algorithms: Quantifying the Carbon Footprint of Computation. Adv. Sci. 2021, 2100707. https://doi.org/10.1002/advs.202100707

Default values are from https://github.com/GreenAlgorithms/green-algorithms-tool:

  • PUE: https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/default_PUE.csv

  • TDP_per_core: https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CPUs.csv

  • power_draw_per_gb: https://onlinelibrary.wiley.com/doi/10.1002/advs.202100707

Description of the algorithm from the green-algorithms website:

"""

The carbon footprint is calculated by estimating the energy draw of the algorithm and the carbon intensity of producing this energy at a given location:

carbon footprint = energy needed * carbon intensity

Where the energy needed is:

energy needed = runtime * (power draw for cores * usage + power draw for memory) * PUE * PSF

The power draw for the computing cores depends on the model and number of cores, while the memory power draw only depends on the size of memory available. The usage factor corrects for the real core usage (default is 1, i.e. full usage). The PUE (Power Usage Effectiveness) measures how much extra energy is needed to operate the data centre (cooling, lighting etc.).

The PSF (Pragmatic Scaling Factor) is used to take into account multiple identical runs (e.g. for testing or optimisation).

The Carbon Intensity depends on the location and the technologies used to produce electricity. But note that the "energy needed" [...] is independent of the location.

"""

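The formula quoted above can be worked through by hand in base R. This is a minimal sketch that does not call the package: the hardware defaults are taken from the Usage section, while the 16 GB of RAM and the carbon intensity of 475 gCO2e/kWh are placeholder assumptions chosen only for illustration. Whether ga_footprint() combines the terms in exactly this order is also an assumption.

```r
# Inputs: defaults from the Usage section, plus assumed values flagged below.
runtime_h <- 12
n_cores <- 1
TDP_per_core <- 12          # W per core
usage_core <- 1             # 1 means full core usage
memory_ram <- 16            # GB, assumed for illustration
power_draw_per_gb <- 0.3725 # W per GB of RAM
PUE <- 1.67
PSF <- 1
carbon_intensity <- 475     # gCO2e per kWh, placeholder value (assumption)

# Power draw in watts for cores and memory.
power_cores_w <- n_cores * TDP_per_core * usage_core
power_memory_w <- memory_ram * power_draw_per_gb

# Energy needed in kWh (divide by 1000 to convert Wh to kWh).
energy_needed_kwh <- runtime_h * (power_cores_w + power_memory_w) * PUE * PSF / 1000

# Carbon footprint in grams of CO2 equivalent.
carbon_footprint_g <- energy_needed_kwh * carbon_intensity

c(energy_kWh = energy_needed_kwh, footprint_g = carbon_footprint_g)
```

With these inputs the energy needed is about 0.36 kWh and the footprint about 171 g. Because the energy term does not depend on location, changing only carbon_intensity rescales the footprint linearly.
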
Usage

ga_footprint(
  runtime_h = NULL,
  location_code = "WORLD",
  PUE = 1.67,
  TDP_per_core = 12,
  n_cores = 1,
  cpu_model = "Any",
  memory_ram = NULL,
  power_draw_per_gb = 0.3725,
  PSF = 1,
  usage_core = 1,
  add_ref_values = TRUE,
  add_storage_estimation = FALSE,
  mass_storage = NULL,
  carbon_intensity = NULL,
  TDP_cpu = NULL,
  ref_value = NULL
)

Arguments

runtime_h

Runtime in hours (numeric). Use a positive number for explicit runtime, or "session" to automatically calculate based on current R session time using proc.time().

location_code

Character string specifying geographical location for carbon intensity. Available options include country codes (e.g., "FR", "US", "CN") or "WORLD" for global average. See the Green Algorithms database for complete list of supported locations.

PUE

Power Usage Effectiveness (numeric, default 1.67). Measures data center efficiency - how much extra energy is needed for cooling, lighting, etc. Use 1.05 for personal computers, 1.2-1.7 for data centers. See https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/default_PUE.csv

TDP_per_core

Thermal Design Power per core in Watts (numeric, default 12). CPU power consumption per core. Find values at https://www.techpowerup.com/cpu-specs/ or http://calculator.green-algorithms.org/. Overridden by cpu_model parameter.

n_cores

Number of CPU cores (integer, default 1). Overridden by cpu_model parameter.

cpu_model

Character string specifying exact CPU model. Must match entries in the Green Algorithms database. When specified, automatically sets TDP_per_core and n_cores. Use "Any" for generic calculation.

memory_ram

RAM memory in GB (numeric). If NULL, attempts to detect automatically using benchmarkme::get_ram().

power_draw_per_gb

Power consumption per GB of RAM in Watts (numeric, default 0.3725).

PSF

Pragmatic Scaling Factor (numeric, default 1). Accounts for multiple runs of the same computation. As noted by Lannelongue et al. (2021): "computations are rarely performed only once". Use values > 1 to account for repeated runs, parameter sweeps, or iterative development. GHG emissions are multiplied by this factor.

usage_core

(numeric, default 1). The usage factor corrects for the real core usage (1 means full usage).

add_ref_values

(logical, default TRUE) Whether to compute and return reference values to compare with your footprint.

add_storage_estimation

(logical, default FALSE) Whether to compute the footprint of mass storage. FALSE by default because storage matters far less than CPU and memory usage. Note that the original Green Algorithms tool does not compute mass storage usage.

mass_storage

(integer, in GB, default NULL) The size of the mass storage. Only used if add_storage_estimation is set to TRUE. If NULL, the base::gc() function is used to estimate the storage used.

carbon_intensity

(default NULL). Advanced users only. A data frame with location and carbonIntensity columns. If NULL, carbon_intensity_internal is used. carbon_intensity_internal is built with the call csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CI_aggregated.csv")

TDP_cpu

(default NULL). Advanced users only. A data frame with model, n_cores and TDP_per_core columns. If NULL, TDP_cpu_internal is used. TDP_cpu_internal is built with the call csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/CPUs.csv")

ref_value

(default NULL). Advanced users only. A data frame with variable and value columns. If NULL, ref_value_internal is used. ref_value_internal is built with the call csv_from_url_ga("https://raw.githubusercontent.com/GreenAlgorithms/GA-data/5266caba6601dae0ffc93af8971e758f55292e08/v3.0/referenceValues.csv")
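
A custom carbon intensity table only needs the location and carbonIntensity columns described for the carbon_intensity argument. The sketch below builds such a table in base R and looks up one location. The site codes, the intensity values and the helper lookup_intensity are all hypothetical placeholders, and the lookup logic is an assumption about how such a table could be used, not the package's own code.

```r
# Hypothetical custom table; the intensity values are placeholders, not real data.
custom_ci <- data.frame(
  location = c("SITE_A", "SITE_B"),
  carbonIntensity = c(50, 400), # gCO2e per kWh, illustrative only
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the intensity for one location code
# and fail loudly when the code is missing or duplicated.
lookup_intensity <- function(code, table) {
  hit <- table$carbonIntensity[table$location == code]
  if (length(hit) != 1) {
    stop("Expected exactly one row for location code: ", code)
  }
  hit
}

energy_kwh <- 0.36 # assumed energy needed, in kWh
footprint_g <- energy_kwh * lookup_intensity("SITE_A", custom_ci)
footprint_g
```

A table of this shape could then be supplied through the carbon_intensity argument, with location_code set to one of its location values.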

Value

A list of values

  • runtime_h: the input run time in hours

  • location_code: the input location code

  • TDP_per_core: the input TDP_per_core (if cpu_model is set, corresponds to the TDP_per_core for this CPU)

  • n_cores: the input n_cores (if cpu_model is set, corresponds to the n_cores for this CPU)

  • cpu_model: the input CPU model. If set to "Any", TDP_per_core and n_cores are used

  • memory_ram: the input memory ram in GB

  • power_draw_per_gb: the input power draw per GB

  • usage_core: the input usage core

  • carbon_intensity: the input carbon intensity (depends on the location code)

  • PUE: the input PUE

  • PSF: the input PSF

  • power_draw_for_cores_kWh: the output power draw for cores in kWh

  • power_draw_for_memory_kWh: the output power draw for RAM memory in kWh

  • energy_needed_kWh: the output energy needed in kWh

  • carbon_footprint_cores: the output carbon footprint in grams of CO2 for cores usage

  • carbon_footprint_memory: the output carbon footprint in grams of CO2 for memory usage

  • carbon_footprint_total_gCO2: the total output carbon footprint in grams of CO2

  • ref_value: (optional, returned if add_ref_values is TRUE) a data frame of reference values

  • power_draw_storage_kWh: (optional, returned if add_storage_estimation is TRUE) the output power draw for mass storage in kWh

Author(s)

Adrien Taudière

Examples

# Basic usage with explicit parameters
result <- ga_footprint(
  runtime_h = 2,
  n_cores = 4,
  TDP_per_core = 15,
  memory_ram = 16,
  location_code = "WORLD"
)
result$carbon_footprint_total_gCO2

# Using specific CPU model (automatically sets cores and TDP)
ga_footprint(
  runtime_h = 1,
  cpu_model = "Core i5-9600KF",
  location_code = "FR"
)

# Calculate footprint for current R session
ga_footprint(runtime_h = "session")

# Compare different locations
locations <- c("WORLD", "FR", "US", "NO")
sapply(locations, function(loc) {
  ga_footprint(runtime_h = 1, location_code = loc)$carbon_footprint_total_gCO2
})

# Advanced usage with storage estimation and reference values
res_ga <- ga_footprint(
  runtime_h = 4,
  n_cores = 8,
  memory_ram = 32,
  add_storage_estimation = TRUE,
  add_ref_values = TRUE
)

library(ggplot2) # ggplot(), aes() and the geoms below come from ggplot2

ggplot(res_ga$ref_value, aes(y = variable, x = as.numeric(value), fill = log10(prop_footprint))) +
  geom_col() +
  geom_col(data = data.frame(
    variable = "Total",
    value = res_ga$carbon_footprint_total_gCO2
  ), fill = "grey30") +
  geom_col(data = data.frame(
    variable = "Cores",
    value = res_ga$carbon_footprint_cores
  ), fill = "darkred") +
  geom_col(data = data.frame(
    variable = "Memory",
    value = res_ga$carbon_footprint_memory
  ), fill = "orange") +
  geom_col(data = data.frame(
    variable = "Mass storage",
    value = res_ga$carbon_footprint_storage
  ), fill = "violet") +
  scale_x_continuous(
    trans = "log1p",
    breaks = c(0, 10^c(1:max(log1p(as.numeric(res_ga$ref_value$value)))))
  ) +
  geom_vline(
    xintercept = res_ga$carbon_footprint_total_gCO2,
    col = "grey30", lwd = 1.2
  ) +
  geom_label(aes(label = round_conditionaly(prop_footprint)),
    fill = "grey90", position = position_stack(vjust = 1.1)
  ) +
  labs(
    title = "Carbon footprint of the analysis",
    subtitle = paste0(
      "(", res_ga$carbon_footprint_total_gCO2,
      " g CO2", ")"
    ),
    caption = "Please cite Lannelongue et al. 2021 (10.1002/advs.202100707)"
  ) +
  xlab("Carbon footprint (g CO2) in log10") +
  ylab("Modality") +
  theme(legend.position = "none")

Calculate carbon footprint for targets pipelines

Description

[Experimental]

Calculates the total carbon footprint of a targets pipeline by analyzing the metadata from completed targets. This function is a wrapper around ga_footprint() that automatically extracts runtime and storage information from the targets metadata and computes the cumulative environmental impact.

The function aggregates:

  • Total runtime across all targets

  • Memory usage patterns (when storage estimation is enabled)

  • Hardware specifications you provide
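
The aggregation listed above can be sketched in base R on a small stand-in for the metadata table. The data frame below is hypothetical and only mimics the name, type, seconds and bytes columns returned by targets::tar_meta(). Treating the summed seconds as the runtime and the summed bytes as the storage size is an assumption about what ga_targets() passes on to ga_footprint().

```r
# Hypothetical stand-in for the output of targets::tar_meta();
# only the columns used in this sketch are included.
meta <- data.frame(
  name = c("data_prep", "model_fit", "helper_fun"),
  type = c("stem", "stem", "function"),
  seconds = c(120, 3480, NA),
  bytes = c(2e6, 5e7, NA),
  stringsAsFactors = FALSE
)

# Keep actual targets (drop functions and other globals).
targets_rows <- meta[meta$type != "function", ]

# Total runtime in hours and total stored size in GB.
runtime_h <- sum(targets_rows$seconds, na.rm = TRUE) / 3600
storage_gb <- sum(targets_rows$bytes, na.rm = TRUE) / 1e9

c(runtime_h = runtime_h, storage_gb = storage_gb)
```

Here the two targets add up to one hour of runtime and 0.052 GB of stored objects. The hardware settings (cores, TDP, memory, PUE) are not part of the metadata and still have to be provided by the user.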

Usage

ga_targets(
  names_targets = NULL,
  targets_only = TRUE,
  complete_only = FALSE,
  store = targets::tar_config_get("store"),
  tar_meta_raw = NULL,
  ...
)

Arguments

names_targets

Character vector of target names to include in analysis. If NULL (default), analyzes all available targets. See ?targets::tar_meta()

targets_only

Logical (default TRUE). Whether to analyze only actual targets or also include metadata on functions and other global objects.

complete_only

Logical (default FALSE). Whether to return only targets with complete metadata (no NA values in critical fields).

store

Character string, path to the targets data store. See ?targets::tar_meta() for details.

tar_meta_raw

Optional data.frame. If provided, uses this metadata directly instead of calling targets::tar_meta(). Useful for custom analyses or when working with pre-loaded metadata.

...

Additional arguments passed to ga_footprint(), such as:

  • location_code: geographical location for carbon intensity

  • n_cores: number of CPU cores used

  • TDP_per_core: thermal design power per core

  • memory_ram: RAM memory in GB

  • PUE: power usage effectiveness

Value

A list with the same structure as ga_footprint(). See ?ga_footprint for complete details on return values.

Author(s)

Adrien Taudière

Examples

## Not run: 
# Basic usage in a targets project directory
pipeline_footprint <- ga_targets()

# With specific hardware configuration
pipeline_footprint <- ga_targets(
  location_code = "FR",
  n_cores = 4,
  memory_ram = 16,
  PUE = 1.2
)

# Analyze specific targets only
pipeline_footprint <- ga_targets(
  names_targets = c("data_prep", "model_fit", "results"),
  add_storage_estimation = TRUE
)

## End(Not run)

# The next example builds a minimal targets pipeline before calling tar_meta()
library(targets)
library(ggplot2)
tar_dir({ # tar_dir() runs code from a temp dir for CRAN.
  tar_script(
    {
      list(
        tar_target(
          name = waiting,
          command = Sys.sleep(2),
          description = "Sleep 2 seconds"
        ),
        tar_target(x, writeLines(
          targets::tar_option_get("error"),
          "error.txt"
        ))
      )
    },
    ask = FALSE
  )

  tar_make()
  tm <- tar_meta()

  res_gat <-
    ga_targets(
      tar_meta_raw = tm,
      n_cores = 6,
      TDP_per_core = 15.8,
      location_code = "FR",
      PUE = 2,
      add_storage_estimation = TRUE
    )

  ggplot(res_gat$ref_value, aes(
    y = reorder(variable, as.numeric(value)),
    x = as.numeric(value), fill = log10(prop_footprint)
  )) +
    geom_col() +
    geom_col(data = data.frame(
      variable = "Total ",
      value = res_gat$carbon_footprint_total_gCO2
    ), fill = "grey30") +
    geom_col(
      data = data.frame(
        variable = "Cores",
        value = res_gat$carbon_intensity * res_gat$power_draw_for_cores_kWh
      ),
      fill = "darkred"
    ) +
    geom_col(
      data = data.frame(
        variable = "Memory",
        value = res_gat$carbon_intensity * res_gat$power_draw_for_memory_kWh
      ),
      fill = "orange"
    ) +
    geom_col(
      data = data.frame(
        variable = "Storage",
        value = res_gat$carbon_intensity * res_gat$power_draw_storage_kWh
      ),
      fill = "violet"
    ) +
    scale_x_continuous(trans = "log1p") +
    geom_vline(
      xintercept = res_gat$carbon_footprint_total_gCO2,
      col = "grey30", lwd = 1.2
    ) +
    geom_label(aes(label = round(prop_footprint, 1)), fill = "grey90") +
    xlab("g CO2") +
    ylab("Modality")
})

Conditionally round numeric values based on magnitude

Description

Applies different rounding rules based on the magnitude of values. Larger values are rounded to fewer decimal places, while smaller values retain more precision. This is useful for presenting results with appropriate precision across different scales.

Usage

round_conditionaly(
  vec,
  cond = cbind(c(1e-05, 5), c(0.001, 3), c(0.01, 3), c(1, 2), c(10, 1), c(100, 0))
)

Arguments

vec

A numeric vector to be rounded

cond

A matrix with 2 rows and n columns where:

  • First row: threshold values for applying rounding rules

  • Second row: number of decimal places to round to

The function automatically sorts conditions in decreasing order of thresholds. Default provides reasonable rounding for most carbon footprint values.

Value

A numeric vector of the same length as vec with values rounded according to the conditional rules
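
The threshold logic described for cond can be illustrated with a small base R re-implementation. This is a sketch under stated assumptions, not the package's source: the function name round_by_magnitude is hypothetical, each value is assumed to take the rule of the largest threshold it reaches, and values below every threshold are assumed to be left unchanged.

```r
# Hypothetical re-implementation of threshold-based rounding.
# cond: 2-row matrix, row 1 = thresholds, row 2 = number of decimal places.
round_by_magnitude <- function(vec, cond) {
  # Sort the rules by decreasing threshold so the largest matching rule wins.
  cond <- cond[, order(cond[1, ], decreasing = TRUE), drop = FALSE]
  out <- vec
  done <- rep(FALSE, length(vec))
  for (j in seq_len(ncol(cond))) {
    # Values not yet rounded that reach the current threshold.
    hit <- !done & !is.na(vec) & abs(vec) >= cond[1, j]
    out[hit] <- round(vec[hit], digits = cond[2, j])
    done <- done | hit
  }
  out # values below every threshold are returned unchanged (assumption)
}

rules <- cbind(c(1e-05, 5), c(0.001, 3), c(0.01, 3), c(1, 2), c(10, 1), c(100, 0))
rounded <- round_by_magnitude(c(1000.27890, 10.87988, 1.769869, 0.99796, 0.000179), rules)
rounded
```

Under these assumptions 1000.2789 is rounded to 0 decimals, 10.87988 to 1, 1.769869 to 2, 0.99796 to 3 and 0.000179 to 5. The output of round_conditionaly() itself may differ in edge cases, for example for values below the smallest threshold.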

Author(s)

Adrien Taudière

Examples

# Default rounding behavior
values <- c(1000.27890, 10.87988, 1.769869, 0.99796, 0.000179)
round_conditionaly(values)

# Custom rounding rules
custom_rules <- cbind(c(10e-5, 5), c(10, 2)) # 5 decimals for tiny values, 2 for others
round_conditionaly(c(1000.27890, 0.000179, 10e-11), cond = custom_rules)

# Useful for carbon footprint reporting
footprint_values <- c(0.001234, 1.23456, 123.456, 12345.6)
round_conditionaly(footprint_values)

Compute session runtime and memory usage statistics

Description

[Experimental]

Analyzes the current R session to extract timing and memory usage information. This function is particularly useful for understanding resource consumption patterns and can be used with ga_footprint(runtime_h = "session").

The function uses base::proc.time() to get CPU timing information and base::gc() to estimate memory usage when requested.

Usage

session_runtime(compute_mass_storage = TRUE)

Arguments

compute_mass_storage

Logical (default TRUE). Whether to compute memory usage statistics using the base::gc() function. Set to FALSE if you only need timing information.

Value

A list containing:

  • cpu_times_users: User CPU time in seconds

  • cpu_times_system: System CPU time in seconds

  • time_elapsed: Total elapsed time in seconds

  • cpu_times: Combined user and system CPU time

  • mass_storage_used: Memory currently used (if requested)

  • mass_storage_max: Maximum memory used (if requested)
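
The raw quantities behind these fields can be inspected directly with base R. The sketch below only shows what base::proc.time() and base::gc() report. How session_runtime() maps them onto the list elements above, and which timing ga_footprint(runtime_h = "session") converts to hours, are assumptions here.

```r
# Timing information for the current R session, in seconds.
pt <- proc.time()
cpu_s <- pt[["user.self"]] + pt[["sys.self"]] # user + system CPU time
elapsed_h <- pt[["elapsed"]] / 3600           # wall-clock time in hours

# Memory report: gc() returns a matrix with one row per cell type.
# The "(Mb)" column that follows "used" and "max used" holds the sizes in Mb.
g <- gc()
used_mb <- sum(g[, match("used", colnames(g)) + 1])
max_used_mb <- sum(g[, match("max used", colnames(g)) + 1])

c(cpu_s = cpu_s, elapsed_h = elapsed_h, used_mb = used_mb, max_used_mb = max_used_mb)
```

Note that gc() itself triggers a garbage collection, which is why skipping it (compute_mass_storage = FALSE) is faster when only timings are needed.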

Author(s)

Adrien Taudière

Examples

# Get complete session information
session_info <- session_runtime()
print(session_info)

# Get only timing information (faster)
timing_only <- session_runtime(compute_mass_storage = FALSE)
cat("Session has been running for", timing_only$time_elapsed, "seconds\n")