ggbetween_pq() to facilitate comparison of Hill numbers using the power of ggstatsplot::ggbetweenstats()
plot_SCBD_pq() to plot species contributions to beta diversity (SCBD) of samples
LCBD_pq() and plot_LCBD_pq() to compute, test and plot local contributions to beta diversity (LCBD) of samples
tbl_sum_samdata() to summarize information from sample data in a table
mumu_pq() to use mumu, a fast and robust C++ implementation of lulu.
install_pkg_needed() to install pkg (mostly for package list in Suggest in DESCRIPTION) if needed by a function.
add_funguild_info() and plot_guild_pq() to add and plot fungal guild information from taxonomy using FUNGuild package
build_phytree_pq() to build 3 phylogenetic trees (NJ, UPGMA and ML using phangorn R package) from the refseq slot of a phyloseq object, possibly with bootstrap values. See the vignette Tree visualization for an introduction to tree visualization using ggtree R package.
one_plot (default FALSE, same behavior as before) to hill_pq function in order to return a unique ggplot2 object with the four plots inside.
correction_for_sample_size (default TRUE, same behavior as before) to hill_pq and hill_tuckey_pq function to allow removing any correction for uneven sampling depth.
multitax_bar_pq() to plot 3 levels of taxonomy as a function of sample attributes
ridges_pq() to plot ridges of one taxonomic level as a function of sample attributes
treemap_pq() to plot treemap of two taxonomic levels
iNEXT_pq() to calculate Hill diversity using the iNEXT package.
pairs to multi_biplot_pq() in order to indicate all pairs of samples we want to print.
compare_pairs_pq() with information about the number of shared sequences among pairs.
upset_pq() to plot upset of phyloseq object using the ComplexUpset package.
upset_test_pq() to test for differences between intersections (wrapper of ComplexUpset::upset_test() for phyloseq-object).
add_info) in subtitle of the hill_pq() function.
remove_space to simplify_taxo() function.
simplify_taxo to clean_pq() function.
rarefy_nb_seq by rarefy_before_merging and add arguments rarefy_after_merging and add_nb_seq to ggvenn_pq() function.
rarefy_after_merging to biplot_pq() and upset_pq() functions.
taxa_fill to upset_pq() function in order to fill the bar with taxonomic rank.
subsample_fastq() to make subset of fastq files in order to test your pipeline with all samples but with a low number of reads.
accu_samp_threshold() to compute the number of sequences needed to obtain a given proportion of ASV in accumulation curves (accu_plot()).
tax_bar_pq() in order to plot taxonomic distribution across samples.
multi_biplot_pq() to visualize a collection of couples of samples for comparison through a list of biplot_pq().
add_info, na_remove, and clean_pq to plot_tax_pq() function.
vsearch_cluster_method and vsearch_args to asv2otu() for more detailed control of the vsearch software.
MM_idtaxa().
write_pq() called save_pq() to save a phyloseq object in the three possible formats () at the same time
add_blast_info() to add information from blast_pq() to the tax_table slot of a phyloseq object.
keep_temporary_files in asv2otu() function.
asv2otu() and fix a little bug in the name of the conserved ASV after asv2otu().
search_exact_seq_pq() to search for exact matching of sequences using complement, reverse and reverse-complement against a phyloseq object.
add_new_taxonomy_pq() to add new taxonomic rank to a phyloseq object. For example to add taxonomic assignment from a new database.
test_that package and improve code compatibility with CRAN recommendations.
asv2otu() with method="vsearch" change two default values (to reproduce the previous behavior, use asv2otu(..., vsearch_cluster_method = "--cluster_fast", tax_adjust = 1)):
add_nb_samples to ggvenn_pq() which adds the number of samples to level name in the plot. Useful to see imbalance in the number of samples among the factor's levels.
args_makedb and args_blastn to functions blast_pq(), blast_to_phyloseq(), blast_to_derep() and filter_asv_blast().
rarefy_nb_seqs to ggvenn_pq() in order to rarefy samples before plotting.
SRS_curve_pq() to plot scaling with ranked subsampling (SRS) curves using the SRS::SRS_curve() function (see citation("SRS") for reference).
nb_samples_info to biplot_pq() in order to add the number of samples merged by level of factors.
biplot_pq() and ggvenn_pq().
na_remove, dist_method (including Aitchison and robust-Aitchison distance), correction_for_sample_size and rarefy_nb_seqs options to adonis_pq() function.
na_remove to graph_test_pq() function.
plot_tax_pq() to plot taxonomic distribution (nb of sequences or nb of ASV) across factor.
add_points and make better axis of hill_pq() function
blast_to_derep() in order to facilitate searching some fasta sequences in dereplicated sequences (obtained by dada2::derepFastq)

| Function | Database (makeblastdb) | Sequences to blast (blastn) |
|-----------------------|------------------------------------------------|-----------------------------------|
| blast_to_phyloseq() | Built from ref_seq slot (physeq-class) | Custom fasta file |
| blast_to_derep() | Built from dereplicated sequences (derep-class) | Custom fasta file |
| blast_pq() | Custom database or custom fasta file | ref_seq slot of a physeq object |

tsne_pq() and plot_tsne_pq() to quickly visualize results of the t-SNE multidimensional analysis based on the Rtsne::Rtsne() function.
count_seq()
track_wkflow_samples() and select_one_sample()
sam_data_first in function write_pq()
reorder_asv and rename_asv in function write_pq() and clean_pq()
rotl_pq() to build a phylogenetic tree using the ASV binomial names of a physeq object and the Open Tree of Life tree.
split_by to make multiple plots given a variable in sam_data slot (function ggvenn_pq())
seq_names in asv2otu() function allows clustering sequences from a character vector of DNA.
blast_pq() function to blast the sequences of the @ref_seq slot against a custom database
filter_asv_blast() function to filter ASV in phyloseq dataset using blast against a custom database
subset_taxa_pq() function to filter ASV based on a named conditional vector. Used in filter_asv_blast().
force_taxa_as_columns (default FALSE) and force_taxa_as_rows (default FALSE) to clean_pq().
count_fastq_seq() to count sequences from fastq.gz files directly from R.
track_wkflow() function (parameter taxonomy_rank)
Adapt the function asv2otu() to IdClusters change in the DECIPHER package (commit 254100922f2093cc789d018c18a26752a3cda1e3). Then change the IdClusters function that was removed from DECIPHER to Clusterize function.
Better functioning of blast_to_phyloseq() when no query sequences are found.
Add tax_adjust argument to asv2otu() function
Add some functions useful for the targets package
Add a biplot_physeq() function to visualize two samples of a physeq object for comparison
Add an argument modality in the tax_datatable() function to split OTU abundance by level of the sample modality
Add a function multiple_share_bisamples() to help compare samples by pairs
Add a new function (ggVenn_phyloseq()) for better venn diagram but without area calculation (use venn_phyloseq() in this case).
Add two functions helpful for beta-diversity analysis (adonis_phyloseq() and physeq_graph_test())
accu_plot(), dist_pos_control(), hill_curves_pq() and tsne_pq() no longer error under recent R-devel: the otu_table is now coerced to a base matrix before being passed to vegan functions (renyi(), renyiaccum(), specaccum(), vegdist()), which previously triggered an assignment of an object of class "numeric" is not valid for @'.Data' error because plain as.matrix() does not strip the S4 otu_table class.
phyloseq::rarefy_even_depth(), whose replace = FALSE code path errors with invalid 'length.out' value under recent R-devel (phyloseq issue #1753). The reimplementation is bit-identical to phyloseq::rarefy_even_depth() for the same seed, depth and replace value (and is more correct in the degenerate case where a retained sample has a single read). This affects rarefy_pq(), adonis_pq(), adonis_rarperm_pq(), hill_test_rarperm_pq(), hill_pq(), biplot_pq(), ggvenn_pq(), upset_pq(), ggaluv_pq() and ggscatt_pq().
rarefy_pq() gains a replace argument (default FALSE, sampling without replacement) and accepts seed = FALSE to leave the random number generator untouched, mirroring phyloseq::rarefy_even_depth().
\dontrun{} (kept for documentation, not run during checks).
skip_on_cran() to some heavy test files.
shQuote()
verify_tax_table() is now ~10× faster on full-size taxonomy tables.
divent_hill_matrix_pq() no longer recomputes the per-sample
positive-subset (x <- x[x > 0]) once per Hill order. The loop is
now sample-outer / q-inner, so each row is sliced once. Numeric
output is bitwise-identical. Speeds up every Hill-diversity
computation in the package: hill_pq(), hill_bar_pq(),
hill_tuckey_pq(), profile_hill_pq(), psmelt_samples_pq(),
plot_refseq_extremity_pq(), and the *_rarperm_pq family.
circle_pq() replaces a nested
pbapply(., 2, pbtapply(., group, sum)) over the OTU table with
two rowsum() calls. On data_fungi (1420 taxa × 185 samples)
the example dropped from ~18 s to ~1.8 s (≈ 10× faster). Output
unchanged.
format2dada2(fasta_db = …), hill_acc_pq(type = "sample") and adonis_rarperm_pq() are also faster.
vignettes/articles/timing.Rmd documents
wall-clock cost of the main functions on data_fungi and
data_fungi_mini, with a CSV refreshed by inst/benchmark/function_timings.R.
R CMD check time to keep CRAN's 10-minute budget. Examples for
verify_tax_table(), adonis_pq(), plot_SCBD_pq(), multipatt_pq(),
hill_pq(), plot_tsne_pq(), upset_test_pq(), summary_plot_pq(),
ggvenn_pq(), plot_refseq_pq(), plot_seq_ratio_pq(),
plot_refseq_extremity_pq(), glmutli_pq(), adonis_rarperm_pq(),
lefser_pq(), var_par_pq(), var_par_rarperm_pq(),
taxa_only_in_one_level(), distri_1_taxa(), accu_plot_balanced_modality(),
multi_biplot_pq(), tax_bar_pq(), plot_var_part_pq(), track_wkflow(),
reorder_taxa_pq() and transform_pq() now use data_fungi_mini
(137 × 45) instead of the full data_fungi (185 × 1420), keeping behaviour
identical but much faster.
hill_acc_pq(), iNEXT_pq(), format2dada2() and hill_test_rarperm_pq()
examples moved from \donttest{} to \dontrun{}. These functions are
inherently CPU-bound (sample-based accumulation, fasta reformatting,
permutation × rarefaction × q-loop) and were the largest individual
contributors to the 10-min CRAN budget. Their behaviour is documented in
the corresponding vignettes.
verify_pq() example switched from data_fungi to data_fungi_mini
(82 s → < 5 s).
plot_LCBD_pq() / LCBD_pq() smoke tests in
tests/testthat/test_figures_beta_div.R lowered nperm from 100 to 9
(they only assert return class, not numeric stability).
nperm, n_permutations) are unchanged.
funguild_assign() and rotl_pq() examples now use \dontrun{} instead of
\donttest{}. Both examples call external APIs (www.stbates.org and the
Open Tree of Life respectively) that are not always reachable during CRAN's
--run-donttest check, causing spurious ERRORs.
verify_tax_table()'s introductory example was moved inside the existing
\donttest{} block. The call against the full data_fungi dataset took
~70 s, which triggered the CRAN "examples > 5 s" NOTE on every check.
XVector removed from DESCRIPTION Imports. It was declared but never
imported in NAMESPACE or used directly; Biostrings already loads it
transitively. CRAN flagged this as "Namespace in Imports field not imported
from".
paper/bibliography.bib, paper/paper.bib, and the two
vignettes/*.bib files (was the journal ISSN landing
10.1002/(issn)2637-4943, now the paper DOI
10.1111/j.1365-294X.2012.05542.x). README.md and the pkgdown site
regenerate accordingly.
verify_tax_table() now recognises non-breaking space (U+00A0) and other
Unicode separators (em space, ideographic space, ...) as border / internal
whitespace. Previously the detection regex ^\s|\s$ (TRE) and the stripping
call trimws() only handled ASCII [ \t\r\n], so taxonomic values padded
with NBSP — common in spreadsheet- or copy-paste-derived metadata — were
silently kept as e.g. "Archaeospora ", causing duplicate genera and
broken grouping downstream. Detection now uses
grepl("^[\\s\\p{Z}]|[\\s\\p{Z}]$", val, perl = TRUE) and stripping uses
gsub("^[\\s\\p{Z}]+|[\\s\\p{Z}]+$", "", val, perl = TRUE). Both
clean_pq(..., tax_remove_border_spaces = TRUE) and
clean_pq(..., tax_remove_all_space = TRUE) benefit from the fix.
verify_tax_table() gains a new check for invisible / unusual characters
in taxonomic values: anything in Unicode category \p{C} (control / format /
surrogate / private use / unassigned) or any \p{Z} separator other than
plain ASCII space or tab. Typical offenders are NBSP (U+00A0), zero-width
space (U+200B), zero-width joiner (U+200D) and C0 control characters.
Three new parameters drive the check: detect_invisible_chars (default
TRUE, warns when verbose = TRUE), replace_invisible_chars (default
FALSE, requires modify_phyloseq = TRUE to strip), and
invisible_chars_replacement (default ""). Warnings/messages report each
offending value with the hexadecimal code points of the offending characters
so the user can see what is hiding inside the string.
clean_pq() gains tax_replace_invisible_chars (default FALSE) which
forwards to verify_tax_table() and strips invisible characters from the
cleaned tax_table.
write_pq() no longer passes a DNAStringSet refseq slot directly to
utils::write.table() — sequences are now coerced via as.character()
first. This avoids dispatching to as.data.frame,XStringSet-method from
R-devel's data.frame(), which now forwards an internal validRN = FALSE
argument that the XStringSet method's .local does not accept.
Biostrings is now an Imports (moved from Suggests), so that the
XVector classes stored in data/data_fungi*.rda are covered by
MiscMetabar's recursive strong dependency graph.
ggstatsplot link in NEWS.md (www.indrapatil.com) with the CRAN page.
clean_pq() gains four FALSE-by-default toggles to apply verify_tax_table() modifications on the cleaned tax_table: remove_border_spaces (trim leading/trailing whitespace), remove_all_space (replace internal whitespace via replace_space_with, default "_"), replace_to_NA (set values matching unwanted_tax_patterns to NA; accepts a custom pattern vector), and redundant_suffix (drop redundant "_sp" tips where the genus is already filled; accepts a custom suffix string such as "_var"). Toggles can be enabled independently or combined in a single call; each modification emits a message and nothing fires when all toggles are FALSE.
cutadapt_remove_primers() gains a cutadapt_args parameter (default "") to pass additional arguments directly to cutadapt, such as "-e 0.01" to lower the maximum error rate from the cutadapt default of 10% to 1%.
hill_test_rarperm_pq(): fixed default type from "non-parametrique" to "nonparametric" to match the documented valid values and avoid confusion.
hill_test_rarperm_pq(): fixed example that incorrectly passed p.val = 0.9 (not a valid parameter); it now uses p_val_signif = 0.9 as intended.
var.equal, nboot, and effsize.type from ggbetweenstats(); if you were passing these through ... to ggbetween_pq() or hill_test_rarperm_pq(), they will now be silently ignored. The palette argument now requires "package::palette" format (e.g. palette = "ggthemes::gdoc"), and the separate package argument has been removed from ggstatsplot.
hill_bar_pq() gains five parameters: error_fun (a function returning c(lower, upper) bounds, enabling asymmetric intervals such as quantile ranges; default mean ± SE), error_fun_lab (caption label; default "mean ± SE"), error_bar_alpha (transparency of the secondary top-half error bar drawn over jittered points; default 0.35), point_alpha (transparency of jittered data points; default 0.7), and letters_below_bar (when TRUE, compact letters are placed below the x-axis at a fixed position, giving a clean layout independent of data spread; default FALSE). Groups with NA values in the grouping variable now receive "n.d." letters when Tukey HSD is run, instead of being silently dropped.
umap_pq() no longer emits a tibble .name_repair deprecation warning when using pkg = "umap" (fixes #134).
hill_bar_pq() new function plotting Hill diversity bar charts (mean ± SE, jittered points, Kruskal-Wallis subtitle, optional Tukey HSD compact letter display) for one or multiple Hill orders via a patchwork layout.
tax_bar_pq() fixes a bug where nb_seq = FALSE with a grouping fact would sum binary per-sample presence values across samples sharing the same modality, inflating bar heights beyond the true OTU count. Each OTU is now counted at most once per group (present in ≥1 sample of that group), so bar segments correctly show the number of distinct OTUs in each taxonomic rank per modality.
tax_bar_pq() gains a n_sample_text_size parameter (default 2) controlling the font size of the per-group sample count label. The (n=X) annotation is now displayed below each bar rather than appended to the group x-axis label.
R/normalize_pq.R, documented in a new article (articles/normalization.html).
css_pq() new function wrapping metagenomeSeq::cumNorm() for Cumulative Sum Scaling normalization.
gmpr_pq() new function implementing the Geometric Mean of Pairwise Ratios normalization (Chen et al. 2018) in pure R.
mcknight_residuals_pq() new function computing depth-robust alpha diversity as residuals of log-richness on log-depth (McKnight 2018; Mikryukov 2023).
rarefy_pq() new function wrapping phyloseq::rarefy_even_depth() with optional averaging over n rarefaction repetitions.
srs_pq() new function wrapping SRS::SRS() for Scaling with Ranked Subsampling normalization.
tmm_pq() new function wrapping edgeR::calcNormFactors(method = "TMM") for Trimmed Mean of M-values normalization.
transform_pq() new function providing a unified interface to common count transformations (tss, hellinger, clr, rclr, log1p, z, pa, rank) via vegan::decostand().
vst_pq() new function wrapping DESeq2::varianceStabilizingTransformation().
biplot_pq() gains a color_rank parameter (default NULL): when set to a taxonomic rank (e.g. "Class"), bars are colored by that rank instead of by sample modality, giving a taxonomic-composition view of the biplot. The fill legend is automatically titled with the rank name.
biplot_pq() gains a taxa_names_rank parameter (default NULL): when set to a taxonomic rank (e.g. "Genus"), the taxon axis labels display that rank instead of taxa_names(). Each OTU remains a separate bar regardless of shared rank values.
biplot_pq() no longer displays "Samples" on the taxon axis; the position used for the modality name annotations is now unlabeled.
unwanted_tax_patterns is a new exported named character vector of regex patterns for common problematic taxonomy values (NA-like strings, "unclassified", "unknown", "Incertae_sedis", empty QIIME-style ranks, etc.). verify_tax_table() now uses it as the default for replace_to_NA, and other pqverse packages (e.g. dbpq::count_unwanted_tax()) can reuse it to keep patterns in sync.
compare_pairs_pq(), ggbetween_pq(), hill_pq(), hill_tuckey_pq(), plot_refseq_extremity_pq(), and psmelt_samples_pq() now use divent::div_hill() instead of vegan::renyi() for Hill number computation, and compare_pairs_pq() uses divent::ent_shannon() / divent::ent_simpson() instead of vegan::diversity() for Shannon and Simpson indices. The default estimator is now "UnveilJ" (bias-corrected) rather than the naive plug-in estimator; diversity values will differ from previous versions. Pass estimator = "naive" via ... to restore old numeric behavior.
divent_hill_matrix_pq() new exported utility to compute Hill numbers for all samples in an OTU table using divent::div_hill(). Accepts ... to forward any argument to divent::div_hill().
ggbetween_pq() gains a q parameter (default c(0, 1, 2)) to control which Hill diversity orders are computed. One plot is produced per value.
hill_acc_pq() gains a type parameter ("individual" or "sample"). type = "sample" computes sample-based accumulation curves by pooling samples incrementally across random permutations using divent::div_hill(), with a confidence ribbon. When merge_sample_by is set, one curve per group is drawn on the same plot. type = "individual" preserves the previous individual-based behaviour.
profile_hill_pq() new function wrapping divent::profile_hill() |> autoplot() to visualize Hill diversity profiles across all orders for all samples in a phyloseq object.
hill_scales parameter in hill_pq(), hill_tuckey_pq(), and psmelt_samples_pq() is deprecated in favour of q. Use q = c(0, 1, 2) going forward.
Add find_vsearch() and install_vsearch() to make vsearch-based functions work on all platforms including Windows. install_vsearch() downloads the vsearch binary from GitHub, and find_vsearch() automatically locates it. All vsearch-calling functions now default to find_vsearch() instead of a hard-coded "vsearch" path. Users can also set options(MiscMetabar.vsearchpath = "/path/to/vsearch") for custom installations.
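
The "custom option first, automatic lookup second" idea behind this entry can be sketched in base R. This is a minimal illustration only: resolve_vsearch_path() is a hypothetical helper, not the package's find_vsearch(), and the fallback to Sys.which() is an assumption about how such a lookup might work. Only the option name MiscMetabar.vsearchpath comes from the entry above.

```r
# Hypothetical helper (NOT MiscMetabar's find_vsearch()): returns the path
# stored in the MiscMetabar.vsearchpath option when it is set, otherwise
# falls back to searching the PATH (assumed fallback, for illustration).
resolve_vsearch_path <- function() {
  custom <- getOption("MiscMetabar.vsearchpath", default = NULL)
  if (!is.null(custom) && nzchar(custom)) {
    return(custom)
  }
  # Sys.which() returns "" when the executable is not found on the PATH
  unname(Sys.which("vsearch"))
}

# Set the option for this session, resolve the path, then restore the
# previous option state.
old <- options(MiscMetabar.vsearchpath = "/path/to/vsearch")
resolved <- resolve_vsearch_path()
options(old)
```

Storing the path in an option keeps function signatures unchanged while still letting users with non-standard installations point to their own binary.
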
Add ridges_sam_pq(), the sample-centric counterpart of ridges_pq(): each ridge represents a taxon (at a given taxonomic level) and the x-axis shows the abundance distribution across samples, colored by a sample factor.
Add param output_data_frame to function track_wkflow_samples()
cutadapt_remove_primers() gains a verbose parameter (default TRUE). Set verbose = FALSE to fully silence cutadapt stdout/stderr and the completion message — unlike suppressMessages() or capture.output(), which cannot intercept system command output.
Fix a bug in chimera_removal_vs() where matrix dimensions were dropped when the input had only one sample (one row), causing downstream [, ...] indexing to fail with "incorrect number of dimensions". All three subsetting branches now use drop = FALSE.
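
The dimension-dropping behaviour behind this fix is plain base R and can be reproduced without the package. The matrix below is made-up illustration data, not the internal object used by chimera_removal_vs().

```r
# A one-sample abundance table: 1 row (sample) x 3 columns (sequences).
# Made-up values, for illustration only.
m <- matrix(
  c(5, 0, 2),
  nrow = 1,
  dimnames = list("sample_1", c("seq_a", "seq_b", "seq_c"))
)
keep <- c(TRUE, FALSE, TRUE)

# Default subsetting simplifies a single-row result to a named vector,
# so the dim attribute is lost.
dropped <- m[, keep]

# drop = FALSE keeps the result as a 1 x 2 matrix.
kept <- m[, keep, drop = FALSE]

# Two-index subsetting on the simplified vector fails with
# "incorrect number of dimensions".
failure <- try(dropped[, 1], silent = TRUE)
```

With drop = FALSE the same downstream indexing code works whether the input holds one sample or many.
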
Many functions accepting a fact parameter now handle single-level factors gracefully: functions that require multiple groups (hill_pq(), hill_test_rarperm_pq(), graph_test_pq(), multipatt_pq(), ancombc_pq(), ggbetween_pq(), venn_pq(), ggvenn_pq(), upset_pq(), accu_plot(), accu_plot_balanced_modality(), plot_tsne_pq()) now emit an informative error message, while functions that can produce meaningful output with a single level (circle_pq(), sankey_pq(), are_modality_even_depth()) no longer crash.
Fix a bug in format2sintax() where the pattern_tax parameter was referenced by the wrong internal name (pattern_k), causing an error when using the taxnames argument.
Add reorder_distinct_colors() to reassign fill and color scales in ggplot objects so that adjacent segments have maximally different colors, with optional colorblind optimization and lightness alternation.
tax_bar_pq() gains show_values and minimum_value_to_show parameters to display abundance values (or percentages when percent_bar = TRUE) inside bar segments.
treemap_pq() now uses log10(x + 1) instead of log10(x) so that taxa with a count of 1 are still visible. New parameters show_na (default TRUE) to display NA taxa as a grey area, na_label to customize the NA label, and min_text_size (default 0) to control the minimum font size for tile labels.
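
The effect of the log10(x + 1) change can be checked on a few made-up counts (illustration values only, not taken from any dataset in the package):

```r
# Made-up sequence counts for three taxa.
counts <- c(1, 10, 100)

# Plain log10 maps a count of 1 to exactly 0, i.e. a tile with no area.
plain <- log10(counts)

# Adding 1 before the log keeps every non-zero count strictly positive.
shifted <- log10(counts + 1)
```

Here plain[1] is 0, so a taxon seen once would vanish from the treemap, while every element of shifted is positive.
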
biplot_pq() gains split_by_sample, sample_border_col, and sample_border_width parameters. When split_by_sample = TRUE, bars are stacked by sample with visible borders, showing the distribution of sequences across individual samples instead of a merged total.
Add two parameters to tax_bar_pq(): bar_internal_color to color each cell of the colored bars and linewidth_bar_internal to set the linewidth.
tax_bar_pq() with label_taxa = TRUE now also draws left-side labels for taxa that appear in the first bar but are absent from the last bar, making all taxa visible when using add_ribbon = TRUE across a time factor. A warning is emitted when taxa only appear in intermediate levels and cannot be labelled on either side.
Bug fix in normalize_prop_pq() when taxa_are_rows(physeq) was FALSE.
Improve the verify_pq() function for cases where taxa_names or sample_names are not consistent and to test for duplicate sequences in @refseq slot.
Add a function verify_tax_table() to verify some classic issues in tax_table.
Fix a bug in aldex_pq() and plot_ordination_pq(). Also fix a bug in plot_ordination_pq() when using phyloseq object where taxa are rows.
Add parameters show_count, facet_by, growing_text and text_size to treemap_pq(): show_count appends raw abundance counts to labels, facet_by splits the treemap into facets by a sample metadata column, and growing_text=FALSE forces all tile labels to the same font size (determined by text_size).
Extend track_wkflow_samples() to accept all input types supported by track_wkflow(): matrix, dada-class, derep-class, lists of dada/derep, and character vectors of fastq file paths (previously only phyloseq objects were accepted).
Fix a bug occurring when the @sam_data slot has only one column
Fix a bug in the name of plot in the result of hill_pq()
Fix a bug in mumu_pq() not deleting temporary file log.txt when keep_temporary_files=FALSE
Fix a bug in adonis_pq() when using na_remove = TRUE and multiple terms in formula.
Add parameter by to adonis_pq() to choose how to compute p-values (overall model, sequential terms, marginal effects, one-degree-of-freedom contrasts). The default is now by = "terms" that will assess significance for each term.
Add function lefser_pq() to run LEfSe analysis (differential analysis) from a phyloseq object using the package lefser.
Add function aldex_pq() to run ALDEx2 analysis (differential analysis) from a phyloseq object using the package ALDEx2 with the default parameter gamma=0.5.
Add the parameter rngseed to all functions that use phyloseq::rarefy_even_depth
to set the seed for random number generator in order to increase reproducibility.
Better messages (instead of an error) in filter_asv_blast() when the resulting OTU table is empty
Improve ancombc_pq() function by allowing custom names in the tax_levels parameter.
Fix a bug in filt_taxa_pq when using both min_nb_seq and min_nb_occurence parameters.
Add function plot_seq_ratio_pq() to explore the number of sequences per samples using difference ratio of the number of sequences per samples ordered by the number of sequences.
Add params discard_genus_alone, pattern_to_remove_tip and pattern_to_remove_node to rotl_pq() to enhance the default naming of nodes and tips
Improve documentation consistency following the style guide
Allow DNAStringSet object as input of swarm_clustering() and physeq_or_string_to_dna()
Add param rank_propagation in merge_taxa_vec() to disable the rank propagation of NA when merging taxa. It is useful when merging taxa with information in the tax_table slot that does not follow a strict taxonomic hierarchical structure (e.g. functional guilds).
Add param lulu_exact in mumu_pq() to force the use of the unmodified lulu algorithm (with possible errors) thanks to the option --legacy in mumu software. Add param extra_mumu_args to mumu_pq() to pass extra arguments to mumu software (--minimum_match, --minimum_ratio_type, --minimum_ratio, --minimum_relative_cooccurence, --threads).
Add function plot_ordination_pq to plot ordination from vegan::vegdist object (useful when using aitchison and robust aitchison distances)
Fix a bug in subset_taxa_pq() when the condition was TRUE only for one taxon
Fix warnings in graph_test_pq() with ggplot2 v.4.0.0
Fix a bug in upset_pq() when using min_nb_seq parameter.
Fix a bug in blast function by allowing value to be equal (not strictly greater) to the threshold values id_cut, bit_score_cut, min_cover_cut and e_value_cut.
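
The difference between a strict and an inclusive threshold can be seen on a toy table. Everything below is hypothetical illustration data: the data frame and its pct_id column are not the package's internal blast output; only the parameter name id_cut comes from the entry above.

```r
# Hypothetical blast hits with a percent-identity column (made-up values).
hits <- data.frame(
  taxa = c("ASV_1", "ASV_2", "ASV_3"),
  pct_id = c(90, 95, 99)
)
id_cut <- 95

# Strict comparison: a hit exactly at the threshold is discarded.
strict <- hits[hits$pct_id > id_cut, ]

# Inclusive comparison: a hit exactly at the threshold is kept.
inclusive <- hits[hits$pct_id >= id_cut, ]
```

With the inclusive comparison, ASV_2 (exactly 95% identity) passes the filter, which matches the intuitive reading of a cut-off value.
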
Fix a bug in swarm associated functions (swarm_clustering(), add_swarms_to_pq()) to take into account the d parameter. Also add a parameter fastidious that is automatically set to FALSE if d is different from 1.
species_colnames by taxonomic_ranks in rotl_pq()
plot_mt() and krona()
plot_mt(): alpha → pval (aligns with existing pval pattern in other functions)
krona(): file → file_path (aligns with existing file_path pattern)
subset_taxa_tax_control()
text_size and text_size_info to expand or minimize text annotation in summary_plot_pq().
filt_taxa_wo_NA() to filter out taxa with NA values at given taxonomic rank(s)
format2dada2() by adding semicolons to fill all the taxonomic levels if from_sintax is TRUE
adonis_pq() for method aitchison and robust.aitchison.
resolve_vector_ranks() in the assign_sintax() function.
assign_sintax(), in particular vote_algorithm to choose the algorithm resolving conflicts.
pattern_to_remove in format2dada2()
add_new_taxonomy_pq(). Only parameters used by the assign_* function corresponding to method are used.
format2sintax(), format2dada2() and format2dada2_species to format fasta database in sintax, dada2 (dada2::assignTaxonomy()) and dada2 Species (dada2::assignSpecies()) format
assign_dada2() to assign Taxonomy (with missing ranks if needed) and to assign species using dada2::assignSpecies() with only one database input. Add method dada2_2steps in function add_new_taxonomy_pq() which uses the assign_dada2() function.
assign_blastn() and add a method blast in the function add_new_taxonomy_pq().
resolve_vector_ranks() to resolve conflict in a vector of taxonomy values
min_bootstrap in add_new_taxonomy_pq()
assign_idtaxa()
pattern_to_remove and remove_NA to simplify_taxo()
assign_idtaxa() and learn_idtaxa() to facilitate the taxonomic assignment using the idtaxa algorithm from the DECIPHER R package.
idtaxa to method in add_new_taxonomy_pq()
tbl_sum_taxtable() to summarize tax_table from a phyloseq object
assign_sintax(), add params too_few (default value "align_start") and too_many (default "merge") to allow databases with a variable number of ranks and parentheses in taxonomic names,
suffix to add_blast_info() allowing multiple uses of the function on the same phyloseq object (e.g. in order to use different databases)
return_DNAStringSet to write_temp_fasta() function to return a DNAStringSet object in place of a temporary file.
count_seq()
filt_taxa_pq() to filter taxa based on the number of sequences/occurrences
no_legend() and hill_curves_pq() to plot hill diversity accumulation curves for phyloseq
umap_pq() to compute Dimensionality Reduction with UMAP
plot_complexity_pq() to plot kmer complexity of reference sequences of a phyloseq object
type to ridges_pq() to plot a cumulative version (type="ecdf") of ridges
assign_vsearch_lca(), assign_sintax() and internal function write_temp_fasta()
method to add_new_taxonomy_pq() to allow the use of dada2::assignTaxonomy() (default, previously the only method available), assign_sintax() or assign_vsearch_lca()
plot_refseq_pq() and plot_refseq_extremity_pq() to plot the proportion of each nucleotide and the diversity of nucleotides from @refseq of a phyloseq object.
type, na_remove and verbose to ggvenn_pq(). The type = "nb_seq" allows plotting a Venn diagram with the number of shared sequences instead of shared ASV.
cutadapt_remove_primers().
verbose to track_wkflow() and improve examples for track_wkflow() and list_fastq_files
return_file_path to cutadapt_remove_primers() in order to facilitate targets pipeline
sam_data_matching_names() to match and verify congruence between fastq files names and sample metadata (sam_data)
CRAN 2024-09-10
heat_tree_pq() because {metacoder} package is archived from CRAN.
build_tree_pq to resubmit to CRAN
Add a param return_a_vector in function filter_trim() to make it possible to return a vector of paths, which is useful when used with targets::tar_target(..., format="file")
list() by vector(list, ...)
CRAN 2024-09-09
filter_taxa_blast() for filter_asv_blast()
postcluster_pq() for asv2otu()
return_data_for_venn in function ggvenn_pq in order to make more customizable plot following ggVennDiagram tutorial
rename_asv by rename_taxons in clean_pq()
reorder_asv by reorder_taxons in clean_pq()
default_fun in function merge_samples2() in order to replace the default function that changes the sample data in case of merging. A useful parameter is default_fun=diff_fct_diff_class.
kruskal_test to hill_pq() function to prevent users from misinterpreting the Tukey HSD result (and letters) if the global effect of the tested factor on Hill diversity is not significant.
vioplot to hill_pq() function to allow violin plot instead of boxplot.
rarefy_sample_count_by_modality to debug the case of modality with level of length one.
CRAN 2024-04-28
taxa_as_rows() and taxa_as_columns() to replace verbose calls to clean_pq()
ggscatt_pq() to plot and test for the effect of a numerical column in sam_data on Hill number. It is the equivalent for numerical variables of ggbetween_pq(), which focuses on the effect of a factor.
var_par_pq(), var_par_rarperm_pq() and plot_var_part_pq() to compute the partition of the variation of community and plot it. It introduces the notion of rarperm in the function name: the function computes permutations of sample depth rarefaction to measure the variation due to the random process in rarefaction.
hill_test_rarperm_pq() to test the effect of a factor on hill diversity accounting for the variation due to the random nature of the rarefaction by sample depth.
rarefy_sample_count_by_modality() to equalize the number of samples for each level of a modality (factor)
accu_plot_balanced_modality() to plot accumulation curves with balanced modality (same number of samples per level) and depth rarefaction (same number of sequences per sample)
adonis_rarperm_pq() to compute multiple Permanova analyses on different sample depth rarefactions.
ggaluv_pq() to plot taxonomic distribution in alluvial fashion with ggplot2 (using the ggalluvial package)
glmutli_pq() to use automated model selection and multimodel inference with (G)LMs for phyloseq objects
taxa_ranks in function psmelt_samples_pq() to group results by samples AND taxonomic ranks.
q in functions hill_tuckey_pq() and hill_p() to choose the level of the hill number.
na_remove in function hill_pq() to remove samples with NA in the factor fact.
plot_with_tuckey to hill_pq().
formattable_pq() to make beautiful tables of the distribution of taxa across a modality using visualization inside the table.
fac2col() and transp() to facilitate manipulation of colors, especially in function formattable_pq()
signif_ancombc() and plot_ancombc_pq() to plot significant results from ancombc_pq() function
distri_1_taxa() to summarize the distribution of one given taxa across levels of a modality
normalize_prop_pq() to implement the method proposed by McKnight et al. 2018
psmelt_samples_pq() to build a data frame of sample information including the number of sequences (Abundance) and Hill diversity metrics. Useful with the ggstatsplot package (see examples).
variable by fact in function ggbetween_pq() and hill_pq() (keeping the variable option in hill_pq() for backward compatibility)
chimera_removal_vs(). It now returns a matrix that can be passed on to dada2::getUniques()
CRAN 2024-03-08
Add functions chimera_detection_vs() and chimera_removal_vs() to process chimera detection and removal using vsearch software
Add functions filter_trim(), sample_data_with_new_names() and rename_samples() to facilitate the use of targets for bioinformatic pipeline.
Add function add_info_to_sam_data() to expand sam_data slot using a data.frame and using nb_asv and nb_seq
Add functions swarm_clustering() and vsearch_clustering() and add swarm method in the function asv2otu()
Add function physeq_or_string_to_dna() mostly for internal use
Add function cutadapt_remove_primers() to remove primers using cutadapt
Add internal functions is_swarm_installed(), is_cutadapt_installed(), is_vsearch_installed() and is_falco_installed() to test for the availability of external software in order to run examples and tests from testthat.
Submit to CRAN and change code to comply with their rules (patch 0.7.1 to 0.7.9)
Numerous examples and tests are skipped on CRAN because they take too much time to run. The Rules vignette is updated to detail the strategy for this.
Harmonization of parameters names:
add_nb_sequences -> add_nb_seq in ggvenn_pq()
db -> db_url in get_funguild_db()
db -> db_funguild in get_funguild_db()
file -> file_path in get_file_extension()
n_seq -> nb_seq in subsample_fastq()
otutable -> otu_table in lulu()
alpha -> pval in plot_edgeR_pq() and plot_deseq2_pq(), and change default value from 0.01 to the more classical 0.05
sequences -> seq2search in function search_exact_seq_pq()
seq_names -> dna_seq in function asv2otu
Removing the function install_pkg_needed(), which does not comply with CRAN policies
ancombc_pq() to simplify the call to ANCOMBC::ancombc2(): ANalysis of COmpositions of Microbiomes with Bias Correction 2
taxa_names_from_physeq (default FALSE) to subset_taxa_pq()
rarefy_by_sample (default FALSE) to function ggbetween_pq()
are_modality_even_depth() to test if sample depth varies significantly among the modalities of a factor
merge_taxa_vec() and merge_samples2() from the speedyseq package into MiscMetabar to decrease package dependencies (Thanks to Mike R. McLaren)
reorder_taxa_pq() in order to replace the single call to package MicroViz to decrease package dependencies.
get_funguild_db() and funguild_assign() from the FUNGuildR package into MiscMetabar to decrease package dependencies
goodpractice::gp() and devtools::check() function
verify_pq() with args verbose=TRUE
multitax_bar_pq() when using nb_seq = FALSE
ggvenn_pq() thanks to issue #31
log_10 in function biplot_pq() into log10trans
log10transform in function circle_pq() into log10trans
physeq by pk.
\tabular{rl}{
graph_test_pq() \tab now a synonym for physeq_graph_test\cr
adonis_pq() \tab now a synonym for adonis_phyloseq\cr
clean_pq() \tab now a synonym for clean_physeq\cr
lulu_pq() \tab now a synonym for lulu_phyloseq\cr
circle_pq() \tab now a synonym for otu_circle\cr
biplot_pq() \tab now a synonym for biplot_physeq\cr
read_pq() \tab now a synonym for read_phyloseq\cr
write_pq() \tab now a synonym for write_phyloseq\cr
sankey_pq() \tab now a synonym for sankey_phyloseq\cr
summary_plot_pq() \tab now a synonym for summary_plot_phyloseq\cr
plot_edgeR_pq() \tab now a synonym for plot_edgeR_phyloseq\cr
plot_deseq2_pq() \tab now a synonym for plot_deseq2_phyloseq\cr
venn_pq() \tab now a synonym for venn_phyloseq\cr
ggvenn_pq() \tab now a synonym for ggVenn_phyloseq\cr
hill_tuckey_pq() \tab now a synonym for hill_tuckey_phyloseq\cr
hill_pq() \tab now a synonym for hill_phyloseq\cr
heat_tree_pq() \tab now a synonym for physeq_heat_tree\cr
compare_pairs_pq() \tab now a synonym for multiple_share_bisamples\cr
}
sam_names() to read_pq()
data_fungi and data_fungi_sp_known metadata