Protistan grazing impacts microbial communities and carbon cycling at deep-sea hydrothermal vents
Hu, S.K., Herrera, E., Smith, A., Pachiadaki, M.G., Edgcomb, V.P., Sylva, S.P., Chan, E.W., Seewald, J.S., German, C.R., & Huber, J.A. (In press) Protistan grazing impacts microbial communities and carbon cycling at deep-sea hydrothermal vents. PNAS
Code for all data analysis and figure generation, including grazing experiment analysis and sequence data processing.
Abstract: Microbial eukaryotes (or protists) in marine ecosystems are a link between microbial primary producers and all higher trophic levels. The rate at which heterotrophic protistan grazers consume microbial prey and recycle organic matter is an important component of marine microbial food webs and carbon cycling. At deep-sea hydrothermal vents, chemosynthetic bacteria and archaea form the basis of a food web in the absence of sunlight, but the role of protistan grazers in these highly productive ecosystems is largely unexplored. We investigated protistan activity, in the context of hydrothermal vent food web dynamics, in low-temperature venting fluids from Gorda Ridge in the North East (NE) Pacific Ocean. In this study we:
The following analysis was performed in R version 3.6.1. All input files are available on the Gorda Ridge GitHub repo. Data used in analysis is available in /data-input/
. ASV tables generated using QIIME2 (with DADA2); raw sequences are publicly available under BioProject PRJNA637089.
library(tidyverse)
library(reshape2)
library(cowplot)
library(phyloseq)
library(decontam)
Import metadata for each site sampled, cell count data, and details for each grazing experiment. Count data includes concentration of bacteria and archaea (cells per ml) and FLP per time point (cells per ml) for replicates in the grazing experiments.
# Metadata for each grazing experiment Including dive ID, vent/site name,
# incubation parameters
<- read.table("data-input/Table1_grazingexp_list.txt", header = T, fill = T,
exp_list sep = "\t")
# Import all cell count information from FLP disappearance experiments
<- read.csv("data-input/GordaRidge-cell-count-results.csv")
counts <- counts %>% separate(Site, c("SampleOrigin", "SampleNumber", "Stain"),
counts_df "-", remove = FALSE) %>% separate(ID, c("TimePoint", "Bottle", "Replicate"),
"-", remove = FALSE) %>% add_column(excess = "NA108") %>% unite(Sample.ID, excess,
sep = "-") %>% data.frame
SampleNumber,
# Import prok counts
<- read.table("data-input/prok_counts.txt", header = T, fill = T, sep = "\t") prok
Correction for sample name
Sample Plume003, is more appropriately considered near vent bottom water. Modify sample name and entry below.
<- exp_list %>% type.convert(as.is = TRUE) %>% mutate(Vent.name = case_when(Sample.ID ==
exp_IDs_mod "NA108-001" ~ "Near vent BW", TRUE ~ Vent.name), Sample.Location = case_when(Sample.ID ==
"NA108-001" ~ "BW", TRUE ~ Sample.Location)) %>% mutate(Vent.name = case_when(Sample.Location ==
"Plume" ~ "Plume", TRUE ~ Vent.name))
<- counts_df %>% mutate(SampleOrigin = case_when(Sample.ID == "NA108-001" ~
counts_df_mod "BW", TRUE ~ SampleOrigin)) %>% select(-Site)
<- prok %>% type.convert(as.is = TRUE) %>% mutate(Vent.name = case_when(Specific_Site ==
prok_mod "Plume001" ~ "Near vent BW", TRUE ~ Vent.name), Sample.location = case_when(Specific_Site ==
"Plume001" ~ "BW", TRUE ~ Sample.location)) %>% mutate(Specific_Site = case_when(Sample.location ==
"Plume" ~ "Plume", TRUE ~ Specific_Site), Vent.name = case_when(Sample.location ==
"Plume" ~ "Plume", TRUE ~ Vent.name))
# Join count data with experiment IDs:
<- counts_df_mod %>% left_join(exp_IDs_mod, by = c(SampleOrigin = "Sample.Location",
counts_df_ids Sample.ID = "Sample.ID")) %>% unite(Sample, TimePoint, Bottle, sep = "_", remove = FALSE) %>%
data.frame
Counts from Vent sample 110 T0 control were repeated three times (3 separate slides). Results are used below as technical replicates to estimate the percentage error rate. Calculating error rate is necessary for determining when FLP cells per ml were significantly different from one another.
By determining error rate from microscopy counting we can be more confident in evaluating true differences in values.
# This is the % max and min that we will consider to be a margin of error
<- counts_df_ids %>% filter(Sample.ID %in% "NA108-110" & TimePoint %in%
tech_check "T0" & Bottle %in% "Ctrl" & !(Replicate %in% "R2")) %>% group_by(SampleOrigin,
%>% summarise(MEAN = mean(Cellsperml), STDEV = sd(Cellsperml), ERR_PER = (100 *
Sample.ID) /MEAN))) %>% data.frame
(STDEV
<- tech_check[["ERR_PER"]]
PERCENT_ERR # Change in FLP time point to time point must exceed 16% PERCENT_ERR
## [1] 16.14934
Get average FLP concentration across experiment replicates and average cells/ml from prokaryotic counts.
<- counts_df_ids %>%
calc_FLP_avg group_by(SampleOrigin, Sample.ID, T, Bottle, Vent.name, Sample, Stain, T1, T2) %>%
summarise(Avg_cellmL = mean(Cellsperml), # Average cells per ml across replicates
sem=sd(Cellsperml)/sqrt(length(Cellsperml)), # Standard mean error
SD=sd(Cellsperml), #standard deviation
var=sqrt(SD), # variance
Num = n()) %>% #Total number of
data.frame
Separate T0 from other time points to calculate % differences in DTAF counts from T0 to T1 and T0 to T2
<- filter(calc_FLP_avg, (T == "T0" & Stain == "DTAF")) %>% select(-T1, -T2, -Stain,
t0 -Num, -T, -Sample, -SD, -var, Avg_cellmL_T0 = Avg_cellmL, sem_T0 = sem) %>% data.frame
# Isolate non-T0 time points
<- filter(calc_FLP_avg, (!(T == "T0") & Stain == "DTAF")) %>% select(-Stain,
t_ex -Num, -Sample, -SD, -var) %>% pivot_wider(names_from = T, values_from = c(Avg_cellmL,
%>% data.frame
sem))
<- calc_FLP_avg %>% filter(Stain %in% "DAPI") %>% select(-Bottle, -Stain,
bac_exp -T1, -T2, -SD, -var, -Num, bac_cellmL = Avg_cellmL, bac_sem = sem) %>% unite(SAMPLE,
sep = "-", remove = FALSE) %>% data.frame
SampleOrigin, Vent.name,
<- as.character(unique(bac_exp$SAMPLE))
dapi
<- prok_mod %>% group_by(Sample.location, Vent.name) %>% summarise(prok_avg = mean(Prok_count)) %>%
prok_avg unite(SAMPLE, Sample.location, Vent.name, sep = "-", remove = FALSE) %>% data.frame
colnames(t0)
## [1] "SampleOrigin" "Sample.ID" "Bottle" "Vent.name"
## [5] "Avg_cellmL_T0" "sem_T0"
Calculate percent difference between T0 and T1, and T1 and T2. These will be compared to the percent error rate.
<- t0 %>% left_join(t_ex) %>% unite(SAMPLE, SampleOrigin, Vent.name,
flp_exp_summary sep = "-", remove = FALSE) %>% left_join(prok_avg) %>% mutate(T0_T1_PercDiff = 100 *
abs(Avg_cellmL_T1 - Avg_cellmL_T0)/Avg_cellmL_T0), T0_T2_PercDiff = 100 * (abs(Avg_cellmL_T2 -
(/Avg_cellmL_T0)) %>% data.frame Avg_cellmL_T0)
Above data frame created lists the T0 FLP concentration and the T1 and T2 separately. The difference between T0 and T1 or T0 and T2 must exceed the percent error rate to be considered a reliable difference.
# Compare to those that exceed error rate
PERCENT_ERR
## [1] 16.14934
Generate data frame with timepoints to consider for downstream calculations.
<- flp_exp_summary %>% select(SAMPLE, Bottle, Vent.name, Avg_cellmL_T0,
cells_long %>% pivot_longer(cols = starts_with("Avg_cellmL"),
Avg_cellmL_T1, Avg_cellmL_T2, T1, T2) names_to = "CountID", values_to = "cellmL") %>% separate(CountID, c("avg", "excess",
"Tx"), sep = "_", remove = FALSE) %>% select(-avg, -excess) %>% data.frame
<- flp_exp_summary %>% select(SAMPLE, Bottle, Vent.name, sem_T0, sem_T1,
sem_long %>% pivot_longer(cols = starts_with("sem"), names_to = "semID", values_to = "sem") %>%
sem_T2) separate(semID, c("excess", "Tx"), sep = "_", remove = FALSE) %>% select(-excess) %>%
data.frame
# Combine and fix Timepoint
<- cells_long %>% left_join(sem_long) %>% select(-semID) %>% add_column(Hrs = 0) %>%
flp_long_toplot mutate(Hrs = case_when(Tx == "T1" ~ T1, Tx == "T2" ~ T2, TRUE ~ (as.integer(.$Hrs)))) %>%
select(-T1, -T2) %>% data.frame
# head(flp_long_toplot)
Supplementary plot showing loss in FLP over all time points. Shaded area reports percent error rate from T0.
# Factor for plotting
<- c("Near vent BW", "Mt Edwards", "Venti latte", "Candelabra", "SirVentsalot")
sample_order <- c("Near vent BW", "Mt. Edwards", "Venti latte", "Candelabra", "Sir Ventsalot")
sample_label <- c("#6f88af", "#61ac86", "#711518", "#dfa837", "#ce536b")
sample_color $SAMPLE_ORDER <- factor(flp_long_toplot$Vent.name, levels = (sample_order),
flp_long_toplotlabels = sample_label)
names(sample_color) <- sample_label
<- c("Ctrl", "Exp")
bottle_order $BOTTLE <- factor(flp_long_toplot$Bottle, levels = bottle_order, labels = c("Control",
flp_long_toplot"Experimental"))
Generate plot with all time points.
# svg('figs/Supplementary-FLP-CTRL-PercError-plot.svg', w = 7, h = 6)
ggplot(flp_long_toplot, aes(x = Hrs, y = cellmL, fill = SAMPLE_ORDER)) + geom_rect(data = (subset(flp_long_toplot,
%in% "T0")), aes(xmin = 0, xmax = 40, ymin = (cellmL - ((PERCENT_ERR/100) *
Tx ymax = (cellmL + ((PERCENT_ERR/100) * cellmL))), color = NA, alpha = 0.3) +
cellmL)), geom_line(stat = "identity", linetype = 1, aes(group = SAMPLE)) + geom_errorbar(aes(ymin = (cellmL -
ymax = (cellmL + sem)), width = 0.1) + geom_point(stat = "identity", size = 3,
sem), color = "black", aes(fill = SAMPLE_ORDER, shape = SAMPLE_ORDER)) + scale_y_log10() +
scale_fill_manual(values = sample_color) + scale_shape_manual(values = c(23,
21, 21, 21, 21)) + labs(y = bquote("FLP cells " ~ mL^-1), x = "Incubation hours") +
facet_grid(SAMPLE_ORDER ~ BOTTLE, scales = "free") + theme_bw() + theme(panel.grid.minor = element_blank(),
legend.title = element_blank(), strip.text.x = element_text(face = "bold", color = "black",
hjust = 0, size = 10), strip.text.y = element_text(size = 10), strip.background = element_blank(),
panel.background = element_blank(), panel.border = element_blank(), axis.line = element_line(colour = "black"),
axis.text = element_text(color = "black", size = 9))
# dev.off()
Subset FLP results to select time points with significant loss in FLP/
<- flp_exp_summary %>%
flp_sig filter(Bottle %in% "Exp") %>%
select(-T1, -T2) %>%
mutate(T1_sig = case_when(
> PERCENT_ERR ~ "exceeds"),
T0_T1_PercDiff T2_sig = case_when(T0_T2_PercDiff > PERCENT_ERR ~ "exceeds")
%>%
)
data.frame
# Select experiments that T1 exceeds percent difference
<- flp_sig %>%
T1_tmp filter(T1_sig == "exceeds") %>%
select(SAMPLE) %>%
data.frame$Tx = "T1"
T1_tmp$Keep = "yes"
T1_tmp
# Select experiments that T1 was NA, but T2 was significant
<- flp_sig %>%
T2_tmp filter(is.na(T1_sig) & T2_sig == "exceeds") %>%
select(SAMPLE) %>%
data.frame$Tx = "T2"
T2_tmp$Keep = "yes"
T2_tmp
<- rbind(T1_tmp, T2_tmp); #keep_status
keep_status # # KEPT:
# # near vent point T2, Candelabra T2
# # Mt Edwards time point T1, Sirventsalot T1, & venti latte T1
These values are used for all downstream grazing rate calculations, as the loss in FLP was found to exceed the microscopy count error percentage.
There was one exception to the above, where the Candelabra experiment T2 exceeded the error rate, and the control FLP also exceeded the error rate. Due to the change between T0 T1 and T2 in the control Candelabra experiment, this was likely an imprecise collection of T0 (poor mixing of control treatment bottle). Below, the average T0 FLP concentration in the control treatments (excluding Candelabra) was determined, so the T0 value for Candelabra could be corrected.
<- flp_trend_sig_ctrls %>% mutate(TimePoint = case_when(Tx == "T0" ~
flp_ctrl_trend "T0", Tx == "T2" ~ "TF", Tx == "T1" ~ "TF")) %>% select(-Tx, -Bottle, -SampleOrigin,
-CountID, -BOTTLE, -SAMPLE, -SAMPLE_ORDER) %>% pivot_wider(names_from = TimePoint,
names_sep = "_", values_from = c(cellmL, sem, Hrs)) %>% mutate(Perc_loss = 100 *
- cellmL_TF)/cellmL_T0))
((cellmL_T0
<- flp_ctrl_trend %>% filter(Vent.name != "Near vent BW") %>%
flp_ctrl_trend_vent filter(Vent.name != "Candelabra")
# Extract average set of controls
<- mean((flp_ctrl_trend_vent$cellmL_T0))
flp_controls_t0_avg # flp_controls_t0_avg
Create Figure S3 with corrected T0 FLP concentration.
# head(flp_long_toplot %>% filter(Vent.name == 'Candelabra'))
<- flp_long_toplot %>% filter(Vent.name == "Candelabra" & Bottle == "Ctrl") %>%
tmp mutate(cellmL = case_when((Vent.name == "Candelabra" & Bottle == "Ctrl" & Tx ==
"T0") ~ flp_controls_t0_avg, TRUE ~ cellmL)) %>% add_column(corr = "corrected")
<- flp_long_toplot %>% add_column(corr = "uncorrected") %>%
flp_long_to_plot_corr rbind(flp_long_toplot %>% filter(Vent.name == "Candelabra" & Bottle == "Ctrl") %>%
mutate(cellmL = case_when((Vent.name == "Candelabra" & Bottle == "Ctrl" &
== "T0") ~ flp_controls_t0_avg, TRUE ~ cellmL)) %>% add_column(corr = "corrected"))
Tx
# svg('corrected_flp_FigS3.svg', w=7, h = 6)
ggplot(flp_long_to_plot_corr, aes(x = Hrs, y = cellmL, fill = SAMPLE_ORDER)) + geom_rect(data = (subset(flp_long_to_plot_corr,
%in% "T0")), aes(xmin = 0, xmax = 40, ymin = (cellmL - ((PERCENT_ERR/100) *
Tx ymax = (cellmL + ((PERCENT_ERR/100) * cellmL))), color = NA, alpha = 0.3) +
cellmL)), geom_line(stat = "identity", aes(group = SAMPLE, linetype = corr)) + geom_errorbar(aes(ymin = (cellmL -
ymax = (cellmL + sem)), width = 0.1) + geom_point(stat = "identity", size = 3,
sem), color = "black", aes(fill = SAMPLE_ORDER, shape = SAMPLE_ORDER)) + scale_y_log10() +
scale_fill_manual(values = sample_color) + scale_shape_manual(values = c(23,
21, 21, 21, 21)) + scale_linetype_manual(values = c(1, 1)) + labs(y = bquote("FLP cells " ~
^-1), x = "Incubation hours") + facet_grid(SAMPLE_ORDER ~ BOTTLE, scales = "free") +
mLtheme_bw() + theme(panel.grid.minor = element_blank(), legend.title = element_blank(),
strip.text.x = element_text(face = "bold", color = "black", hjust = 0, size = 10),
strip.text.y = element_text(size = 10), strip.background = element_blank(), panel.background = element_blank(),
panel.border = element_blank(), axis.line = element_line(colour = "black"), axis.text = element_text(color = "black",
size = 9))
# dev.off()
Generate plot that reports the FLP loss in the significant experimental treatments.
# Factor for plotting use characterise lists from above
$SAMPLE_ORDER <- factor(flp_trend_sig$Vent.name, levels = (sample_order),
flp_trend_siglabels = sample_label)
<- ggplot(flp_trend_sig, aes(x = Hrs, y = cellmL, fill = SAMPLE_ORDER,
plot_graze_trends shape = SampleOrigin)) + geom_line(stat = "identity", aes(group = SAMPLE_ORDER,
linetype = SampleOrigin)) + geom_errorbar(aes(ymin = (cellmL - sem), ymax = (cellmL +
size = 0.5, width = 0.1) + geom_point(stat = "identity", size = 3, color = "black") +
sem)), scale_linetype_manual(values = c(1, 1)) + scale_fill_manual(values = sample_color) +
scale_shape_manual(values = c(23, 21)) + scale_y_log10(limits = c(5000, 1e+05)) +
labs(y = bquote("FLP cells " ~ mL^-1), x = "Incubation hours") + theme_minimal() +
theme(panel.grid.major = element_line(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
axis.text = element_text(color = "black"), legend.title = element_blank()) +
guides(fill = guide_legend(override.aes = list(shape = c(23, 21, 21, 21, 21))),
shape = guide_legend(override.aes = list(fill = "black"))) + annotation_logticks(sides = "l")
# plot_graze_trends
See reference: Salat, J. and Marrasé, C. (1994) Exponential and linear estimations of grazing on bacteria: effects of changes in the proportion of marked cells. Mar Ecol Prog Ser 104: 205–209.
<- flp_trend_sig %>% type.convert(as.is = TRUE) %>% mutate(TimePoint = case_when(Tx ==
processed_data "T0" ~ "T0", Tx != "T0" ~ "Tf")) %>% select(-Tx, -CountID) %>% pivot_wider(names_from = TimePoint,
values_from = c(cellmL, sem, Hrs)) %>% select(-Hrs_T0) %>% left_join(prok_avg) %>%
data.frame
Perform all calculations
# cellmL = prokaryote average cells per ml
<- processed_data %>%
graze_rate # type.convert(as.is = TRUE) %>%
group_by(SAMPLE, SampleOrigin, Vent.name, Hrs_Tf, SAMPLE_ORDER) %>%
mutate(
# Calculate mortality factor (m)
MORTALITY = (log(cellmL_Tf/cellmL_T0))*(-1/(Hrs_Tf/24)),
MORTALITY_min = (log((cellmL_Tf-sem_Tf)/(cellmL_T0-sem_T0)))*(-1/(Hrs_Tf/24)),
MORTALITY_max = (log((cellmL_Tf+sem_Tf)/(cellmL_T0+sem_T0)))*(-1/(Hrs_Tf/24)),
# Calculate model I G - Rate over given amount of time
G = ((cellmL_T0 - cellmL_Tf) * (prok_avg / cellmL_T0)),
G_min = (((cellmL_T0-sem_T0) - (cellmL_Tf-sem_Tf)) * (prok_avg / (cellmL_T0-sem_T0))),
G_max = (((cellmL_T0+sem_T0) - (cellmL_Tf+sem_Tf)) * (prok_avg / (cellmL_T0+sem_T0))),
# Calculate Grazing per hour
GrazingRate_hr = (G/Hrs_Tf),
GrazingRate_hr_min = (G_min/Hrs_Tf),
GrazingRate_hr_max = (G_max/Hrs_Tf),
# Estimate prokaryote turnover % per day
Prok_turnover = (100*(G / prok_avg)), #Convert to per day (*24)
Prok_turnover_min = (100*(G_min / prok_avg)),
Prok_turnover_max = (100*(G_max / prok_avg)),
# Prok_turnover = (100*((rate * cellmL)/cellmL)), #ARCHIVE
# Prok_turnover_min = (100*((rate_min * cellmL)/cellmL)), #ARCHIVE
# Prok_turnover_max = (100*((rate_max * cellmL)/cellmL)) #ARCHIVE
# Model II
N_avg = ((prok_avg + prok_avg)/2),
F_avg = ((cellmL_T0 + cellmL_Tf)/2),
q = ((cellmL_T0 - cellmL_Tf)/F_avg),
# G_II a and b should be equivalent
G_II_a = q * (N_avg),
G_II_b = ((cellmL_T0 - cellmL_Tf) * ((prok_avg+prok_avg)/(cellmL_T0+cellmL_Tf))),
GrazingRate_hr_II = (G_II_a/Hrs_Tf)
%>%
)
data.frame# graze_rate
Plot grazing rate for each site.
# Factor for plotting
<- c("Near vent BW", "Mt Edwards", "Venti latte", "Candelabra", "SirVentsalot")
sample_order <- c("Near vent BW", "Mt. Edwards", "Venti latte", "Candelabra", "Sir Ventsalot")
sample_label <- c("#6f88af", "#61ac86", "#711518", "#dfa837", "#ce536b")
sample_color
$SAMPLE_ORDER <- factor(graze_rate$Vent.name, levels = (sample_order),
graze_ratelabels = (sample_label))
<- ggplot(graze_rate, aes(y = SAMPLE_ORDER, x = GrazingRate_hr, fill = SAMPLE_ORDER,
mortality shape = SampleOrigin)) + geom_errorbar(aes(xmin = GrazingRate_hr_min, xmax = GrazingRate_hr_max),
size = 0.5, width = 0.1) + geom_point(stat = "identity", size = 3, color = "black",
aes(shape = SampleOrigin)) + scale_fill_manual(values = (sample_color)) + scale_shape_manual(values = c(23,
21)) + coord_flip() + labs(y = "", x = bquote("Cells " ~ mL^-1 ~ hr^-1)) + theme_minimal() +
theme(panel.grid.major = element_line(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
axis.text.x = element_text(color = "black", angle = 45, hjust = 1, vjust = 1),
axis.ticks = element_line(), axis.text.y = element_text(color = "black"),
legend.position = "none", strip.text = element_blank())
# mortality
Plot daily prokaryote turnover percentage.
<- ggplot(graze_rate, aes(x = SAMPLE_ORDER, y = Prok_turnover)) + geom_bar(stat = "identity",
bar_plot position = "stack", width = 0.6, aes(fill = SAMPLE_ORDER)) + geom_errorbar(aes(ymin = Prok_turnover_min,
ymax = Prok_turnover_max), size = 0.5, width = 0.1) + scale_fill_manual(values = (sample_color)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 100)) + labs(x = "", y = bquote("Prokaryote turnover %" ~
^-1)) + # coord_flip() +
dtheme_minimal() + theme(panel.grid.major = element_line(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
axis.text.y = element_text(color = "black"), axis.ticks = element_line(), axis.text.x = element_text(color = "black",
angle = 45, hjust = 1, vjust = 1), legend.position = "none", strip.text = element_blank())
bar_plot
Generate Figure 1.
# svg("figs/Grazing-results-panel-VERT-27-06-2021.svg", h = 9, w = 3)
plot_grid(plot_graze_trends + theme(legend.position = "none"),
# mortality,
# bar_plot,
+ theme(axis.text.x = element_blank()),
mortality
bar_plot,axis = c("lrtb"), align = c("hv"), labels = c("A", "B", "B"), nrow = 3, ncol = 1)
# dev.off()
Continue calculations to place into context with McNichol et al. work. Consider Morono et al. 2011 value for fg C per prokaryote cell.
# G = number of cells grazed during experiment duration
<- graze_rate %>%
graze_rate_wCarbon add_column(fgC_cell_morono = 86) %>% # Add in Morono et al. 2011 value
add_column(fgC_cell_mcnic = 173) %>%
mutate(
cells_consumed_perday = (G / (Hrs_Tf /24)), # Rate of cells consumed * in situ prok, per day (day = hours of incubation reported in days)
fgC_ml_perday_morono = (cells_consumed_perday * fgC_cell_morono),
fgC_ml_perday_mcnic = (cells_consumed_perday * fgC_cell_mcnic),# Convert cell amount to fg C
ugC_L_perday_morono = (fgC_ml_perday_morono * (1e-09) * 1000), # Convert to ug C per L
ugC_L_perday_mcnic = (fgC_ml_perday_mcnic * (1e-09) * 1000),
lower_mcnichol_morono = 100*(ugC_L_perday_morono / 17.3),
upper_mcnichol_morono = 100*(ugC_L_perday_morono / 321.4),
lower_mcnichol_mcnic = 100*(ugC_L_perday_mcnic / 17.3),
upper_mcnichol_mcnic = 100*(ugC_L_perday_mcnic / 321.4)
%>%
)
data.frame# write_delim(graze_rate_wCarbon, path = "Grazing-calc-wCarbon-results.txt", delim = "\t")
Set up working R environment and import 18S ASV table. Modify input tables and import as phyloseq objects in order to perform quality control removal of contaminant ASVs (decontam).
load("data-input/GR-ASVtables-updatedTax.RData", verbose = TRUE)
## Loading objects:
## GR_tagseq_longformat
## GR_tagseq_wideformat
Import ASV table as phyloseq object, note control samples. Control samples derived from lab and shipboard milliQ water samples.
<- GR_tagseq_wideformat %>% select(Feature.ID, Taxon_updated) %>% separate(Taxon_updated,
taxmat c("Kingdom", "Supergroup", "Division", "Class", "Order", "Family", "Genus", "Species"),
sep = ";", remove = FALSE) %>% column_to_rownames(var = "Feature.ID") %>% as.matrix
# class(taxmat) head(taxmat)
Note that Axial ID originates from a laboratory blank sample that was extracted at the same time.
<- GR_tagseq_wideformat %>% select(Feature.ID, starts_with(c("Gorda", "Axial"))) %>%
asvmat column_to_rownames(var = "Feature.ID") %>% as.matrix
Import metadata below and combine with phyloseq object.
## SAMPLE LOCATION LOCATION_SPECIFIC SAMPLEID
## 1 Axial_ExtractControl_CTRL_2019 ExtractControl ExtractControl CTRL
## 2 GordaRidge_Plume001_T0_2019_REP12 GordaRidge Plume001 T0
## 3 GordaRidge_Plume001_T24_2019_REP12 GordaRidge Plume001 T24
## 4 GordaRidge_Plume001_T36_2019_REP12 GordaRidge Plume001 T36
## 5 GordaRidge_Vent013_T0_2019_REP13 GordaRidge Vent013 T0
## 6 GordaRidge_Vent013_T36_2019_REP12 GordaRidge Vent013 T36
## Sampletype LocationName Sample_or_Control Sample_or_BSW
## 1 Control Lab blank Control Sample Control
## 2 Grazing Near vent BW True Sample True Sample
## 3 Grazing Near vent BW True Sample True Sample
## 4 Grazing Near vent BW True Sample True Sample
## 5 Grazing Mt Edwards Vent True Sample True Sample
## 6 Grazing Mt Edwards Vent True Sample True Sample
Decontam will identify putative contaminate ASVs based on the difference in prevalence between control blank and environmental samples. First review the library size or number of sequences within each sample to see how varied the control samples are to the experimental samples.
# Decontam:
physeq_names
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 9175 taxa and 34 samples ]
## sample_data() Sample Data: [ 34 samples by 9 sample variables ]
## tax_table() Taxonomy Table: [ 9175 taxa by 9 taxonomic ranks ]
# Check out library size of my data
<- as.data.frame(sample_data(physeq_names))
df $LibrarySize <- sample_sums(physeq_names)
df<- df[order(df$LibrarySize), ]
df $Index <- seq(nrow(df))
df#
ggplot(data = df, aes(x = Index, y = LibrarySize, fill = Sample_or_Control, shape = LOCATION)) +
geom_point(color = "black", size = 3, aes(shape = LOCATION)) + scale_shape_manual(values = c(21,
22, 23)) + theme_bw()
Shows that out of the 3 ship blanks I have, one of the sames has a pretty large library size, otherwise, control samples have very small library sizes.
# Assign negative control designation
sample_data(physeq_names)$is.neg <- sample_data(physeq_names)$Sample_or_Control ==
"Control Sample"
# ID contaminants using Prevalence information
<- isContaminant(physeq_names, method = "prevalence", neg = "is.neg",
contamdf.prev threshold = 0.5, normalize = TRUE)
table(contamdf.prev$contaminant) # Report number of ASVs IDed as contamintants
##
## FALSE TRUE
## 9141 34
0.5 - this threshold will ID contaminants in all samples that are more prevalent in negative controls than in positive samples. In this study, control samples included 1 lab-based blank and 3 ship-board blanks taken at the time of field study. Results showed 34 ASVs to be considered “contaminants”
# Make phyloseq object of presence-absence in negative controls and true samples
# change to presence absence
<- transform_sample_counts(physeq_names, function(abund) 1 * (abund > 0))
gr.pa
# isolate PA of positive and negative samples
<- prune_samples(sample_data(gr.pa)$Sample_or_Control == "Control Sample",
gr.pa.neg
gr.pa)<- prune_samples(sample_data(gr.pa)$Sample_or_Control == "True Sample",
gr.pa.pos gr.pa)
# Subset TRUE contaminants
<- subset(contamdf.prev, contaminant == "TRUE")
contams $Feature.ID <- row.names(contams)
contams# head(contams);dim(contams)
<- as.character(contams$Feature.ID)
list_of_contams # Explore taxa IDed as contaminants
<- as.data.frame(taxmat)
taxa_list $Feature.ID <- row.names(taxa_list)
taxa_list
<- left_join(contams, taxa_list)
taxa_contams # write_delim(taxa_contams, path = 'List-of-contaminant-ASVs.txt', delim = '\t')
# Plot total sequences and which are contaminants Remove contaminant and count
# sequence sums per sample to see which samples had the highest number of
# contamiant sequences removed. After remove contaminants, what % of sequences
# is removed? head(GR_tagseq_counts[1:2,])
$CONTAM <- "Pass"
GR_tagseq_longformat# head(contams[1:2,]) str(list_of_contams)
$CONTAM[GR_tagseq_longformat$Feature.ID %in% list_of_contams] = "Fail"
GR_tagseq_longformat# head(GR_tagseq_counts[1:2,])
# Make character list of all feature.ids to KEEP:
<- subset(GR_tagseq_longformat, CONTAM %in% "Pass")
keep1 # length(unique(keep1$Feature.ID))
<- as.character(unique(keep1$Feature.ID)) #see below
keep_asvs #
<- GR_tagseq_longformat %>% group_by(SAMPLE, CONTAM) %>% summarise(SUM_CONTAM = sum(COUNT)) %>%
passfail data.frame
Report sequence stats
<- left_join(passfail, ventnames, by = "SAMPLE") passfail_wID
# Original number of reads
sum(GR_tagseq_longformat$COUNT)
## [1] 1569829
# Original number of ASVs
length(unique(GR_tagseq_longformat$Feature.ID))
## [1] 9175
# unique(GR_tagseq_counts$SAMPLEID)
<- subset(GR_tagseq_longformat, !(SAMPLEID %in% "CTRL"))
GR_tagseq_counts_noCTRL # New total number of sequences
sum(GR_tagseq_counts_noCTRL$COUNT)
## [1] 1479273
<- GR_tagseq_longformat %>% filter(!(Feature.ID %in% list_of_contams))
counts_decont_tmp # Check
length(unique(counts_decont_tmp$Feature.ID)) - length(unique(GR_tagseq_longformat$Feature.ID)) # Confirm 34 lines removed
## [1] -34
# % of sequences was removed following decontam; this is counting the ship blank
# samples themselves
100 * (1 - (sum(counts_decont_tmp$COUNT)/sum(GR_tagseq_counts_noCTRL$COUNT)))
## [1] 1.23581
# Breakdown by samples:
<- dcast(passfail, SAMPLE ~ CONTAM) passfail_wide
## Using SUM_CONTAM as value column: use value.var to override.
$PercLossSeq <- paste(100 * (passfail_wide$Fail/(passfail_wide$Fail +
passfail_wide$Pass))) passfail_wide
Also remove sample with too few sequences and the control samples for an R data file.
# Remove controls
<- counts_decont_tmp %>% filter(!(SAMPLE == "GordaRidge_BSW020_sterivex_2019_REPa")) %>%
counts_decont filter(!(SAMPLEID %in% "CTRL")) %>% data.frame
# Save as R Data
save(counts_decont, file = "data-input/GR-ASV-table-clean.RData")
Import cleaned ASV data, curate taxonomic assignments specific to protists, create bar plot to demonstrate protistan diversity at Gorda Ridge.
load("data-input/GR-ASV-table-clean.RData", verbose = TRUE) # after decontam clenaing
## Loading objects:
## counts_decont
<- counts_decont %>% filter(COUNT > 0) %>% separate(Taxon_updated, c("Kingdom",
gr_counts "Supergroup", "Division", "Class", "Order", "Family", "Genus", "Species"), sep = ";",
remove = FALSE) %>% data.frame
<- gr_counts %>% select(Taxon_updated, Kingdom, Supergroup, Division,
tax_only_tmp %>% distinct() %>% data.frame Class, Order, Family, Genus, Species)
Import metadata for all vent sites.
<- read.delim("data-input/ventnames-gordaridge.txt")
ventnames colnames(ventnames)[1] <- "SAMPLE"
# Join with dataframe
<- gr_counts %>% left_join(select(ventnames, SAMPLE, LOCATION_SPECIFIC,
gr_counts_name %>% data.frame
Sampletype, LocationName))
$LocationName[gr_counts_name$LOCATION == "Shipblank"] = "Shipblank" gr_counts_name
Function below pr2_curate() is the custom manual curation of the taxonomic assignments from the PR2 database. The function creates new columns with taxonomic information that summarizes the core groups in the dataset.
<- function(df) {
pr2_curate # Add a column
$Taxa <- "Unassigned-Eukaryote"
df$Taxa[df$Supergroup == "Alveolata"] = "Alveolata-Other"
df$Taxa[df$Division == "Ciliophora"] = "Alveolata-Ciliates"
df$Taxa[df$Division == "Dinoflagellata"] = "Alveolata-Dinoflagellates"
df$Taxa[df$Class == "Syndiniales"] = "Alveolata-Syndiniales"
df$Taxa[df$Class == "Apicomplexa"] = "Alveolata-Apicomplexa"
df$Taxa[df$Supergroup == "Hacrobia"] = "Hacrobia-Other"
df$Taxa[df$Division == "Cryptophyta"] = "Hacrobia-Cryptophyta"
df$Taxa[df$Division == "Haptophyta"] = "Hacrobia-Haptophyta"
df$Taxa[df$Supergroup == "Opisthokonta"] = "Opisthokonta-Other"
df$Taxa[df$Division == "Fungi"] = "Opisthokonta-Fungi"
df$Taxa[df$Division == "Metazoa"] = "Opisthokonta-Metazoa"
df$Taxa[df$Supergroup == "Stramenopiles"] = "Stramenopiles-Other"
df$Taxa[df$Class == "Bicoecea"] = "Stramenopiles-Bicoecea"
df$Taxa[df$Division == "Ochrophyta"] = "Stramenopiles-Ochrophyta"
df<- unique(filter(df, grepl("MAST", Class)) %>% select(Class))
mast <- as.character(mast$Class)
mast_list $Taxa[df$Class %in% mast_list] = "Stramenopiles-MAST"
df$Taxa[df$Supergroup == "Archaeplastida"] = "Archaeplastida-Other"
df$Taxa[df$Division == "Chlorophyta"] = "Archaeplastida-Chlorophyta"
df$Taxa[df$Supergroup == "Excavata"] = "Excavata"
df$Taxa[df$Supergroup == "Apusozoa"] = "Apusozoa"
df$Taxa[df$Supergroup == "Amoebozoa"] = "Amoebozoa"
df$Taxa[df$Supergroup == "Rhizaria"] = "Rhizaria-Other"
df$Taxa[df$Division == "Cercozoa"] = "Rhizaria-Cercozoa"
df$Taxa[df$Division == "Radiolaria"] = "Rhizaria-Radiolaria"
dfreturn(df)
}
Apply PR2 curation to 18S data.
<- pr2_curate(gr_counts_name) gr_counts_wtax
Output is the full ASV table with added columns for curated taxonomy. Above also provides a list of the unique taxonomic names assigned.
<- subset(gr_counts_wtax, !(Sampletype == "control"))
gr_counts_wtax_samplesonly
## To average across replicates, modify SUPR sample names
<- gr_counts_wtax_samplesonly
gr_counts_filter $SAMPLEID <- sub("SUPRS9", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID <- sub("SUPRS11", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID <- sub("SUPRS10", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID <- sub("SUPRS2", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID <- sub("SUPRS3", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID <- sub("SUPRS1", "SUPR", gr_counts_filter$SAMPLEID) gr_counts_filter
# Sum of all sequences
<- sum(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% select(COUNT))
a a
## [1] 1434477
# Total ASVs
dim(unique(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% select(Feature.ID)))[1]
## [1] 9027
Percentage of all sequences Unassigned Eukaryote
<- sum(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Taxon_updated ==
x "Eukaryota") %>% select(COUNT))
100 * (x/a)
## [1] 2.823886
Total ASVs left “Unassigned-Eukaryote”
dim(unique(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Taxon_updated ==
"Eukaryota") %>% select(Feature.ID)))[1]
## [1] 1058
Percentage of all sequences assigned Opisthokonts
<- sum(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Supergroup ==
x "Opisthokonta") %>% select(COUNT))
100 * (x/a)
## [1] 12.9261
dim(unique(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Supergroup ==
"Opisthokonta") %>% select(Feature.ID)))[1]
## [1] 615
Average ASV sequence counts across replicate samples, COUNT_AVG
column will now equal the ASV sequence count value across replicates
<- gr_counts_filter %>% mutate(LocationName = case_when(LOCATION_SPECIFIC ==
gr_counts_avg_wtax "Plume036" ~ "Candelabra Plume", LOCATION_SPECIFIC == "Plume096" ~ "Mt Edwards Plume",
TRUE ~ as.character(LocationName))) %>% group_by(Feature.ID, SAMPLEID, Sampletype,
LOCATION_SPECIFIC, LocationName, Taxon_updated, Kingdom, Supergroup, Division, %>% summarise(COUNT_AVG = mean(COUNT)) %>%
Class, Order, Family, Genus, Species, Taxa)
as.data.frame# dim(gr_counts_filter);dim(gr_counts_avg_wtax) tmp <- gr_counts_avg_wtax %>%
# select(Taxa, Taxon_updated, Kingdom, Supergroup, Division, Class, Order,
# Family, Genus, Species) %>% distinct() %>% data.frame write_delim(tmp, path =
# 'tax-tmp-2.txt', delim = '\t') unique(gr_counts_avg_wtax$Taxa)
# unique(gr_counts_avg_wtax$LocationName)
Save output file
# save(gr_counts_filter,gr_counts_wtax, gr_counts_avg_wtax,
# file='data-input/GordaRidge-ASVtable-avg.RData')
Sum ASV sequence counts to taxonomic level
# See above load(file='data-input/GordaRidge-ASVtable-avg.RData', verbose = T)
Now sum ASV counts by curated taxonomic level. Below generates both summed sequences from samples averages across replicates and for samples with replicates.
# Sum averaged counts at curated taxa level
<- gr_counts_avg_wtax %>% # Remove control samples & bsw with too few sequences
gr_counts_avg_TAXA filter(!(Sampletype == "Control")) %>% filter(!(LOCATION_SPECIFIC == "BSW020")) %>%
# sum by like taxa
group_by(SAMPLEID, Sampletype, LocationName, Taxa) %>% summarise(SUM = sum(COUNT_AVG)) %>%
unite(SAMPLE, LocationName, Sampletype, SAMPLEID, sep = " ", remove = FALSE) %>%
data.frame
# Sum each replicate separately to curated taxa level
<- gr_counts_filter %>% mutate(LocationName = case_when(LOCATION_SPECIFIC ==
gr_counts_wreps_TAXA "Plume036" ~ "Candelabra Plume", LOCATION_SPECIFIC == "Plume096" ~ "Mt Edwards Plume",
TRUE ~ as.character(LocationName))) %>% # Remove control samples & bsw with too few sequences
filter(!(Sampletype == "Control")) %>% filter(!(LOCATION_SPECIFIC == "BSW020")) %>%
# sum by like taxa
group_by(SAMPLEID, Sampletype, LocationName, LOCATION_SPECIFIC, Taxa) %>% summarise(SUM = sum(COUNT)) %>%
mutate(locationspecific_mod = case_when(LOCATION_SPECIFIC == "Plume001" ~ "NearVent001",
TRUE ~ as.character(LOCATION_SPECIFIC))) %>% unite(SAMPLE, LocationName,
sep = " ", remove = FALSE) %>% unite(SAMPLE_REPS, LocationName,
Sampletype, SAMPLEID, sep = " ", remove = FALSE) %>% data.frame Sampletype, SAMPLEID, locationspecific_mod,
Make supplementary tables to summarize protist results.
<- c("Shallow seawater in situ sterivex", "Deep seawater in situ sterivex",
sample_order_all "Near vent BW in situ sterivex", "Near vent BW Grazing T0", "Near vent BW Grazing T24",
"Near vent BW Grazing T36", "Mt Edwards Plume in situ sterivex", "Mt Edwards Vent in situ SUPR",
"Mt Edwards Vent Grazing T0", "Mt Edwards Vent Grazing T36", "Venti Latte Vent in situ SUPR",
"Venti Latte Vent Grazing T0", "Venti Latte Vent Grazing T36", "Candelabra Plume in situ sterivex",
"Candelabra Vent in situ SUPR", "Candelabra Vent Grazing T24", "SirVentsAlot Vent in situ SUPR",
"SirVentsAlot Vent Grazing T24")
<- gr_counts_avg_TAXA %>% select(SAMPLE, Taxa, SUM) %>% pivot_wider(names_from = SAMPLE,
supp_table_seq values_from = SUM, values_fill = 0) %>% arrange(Taxa) %>% select(Taxa, sample_order_all)
# write_delim(supp_table_seq, path = 'Suppl-18s-seq-total.txt', delim = '\t')
# head(gr_counts_avg_wtax)
<- gr_counts_avg_wtax %>% # Remove control samples
supp_table_ASV filter(!(Sampletype == "Control")) %>% # total ASVs by like taxa
group_by(SAMPLEID, Sampletype, LocationName, Taxa) %>% summarise(ASV_total = n_distinct(Feature.ID)) %>%
unite(SAMPLE, LocationName, Sampletype, SAMPLEID, sep = " ", remove = TRUE) %>%
pivot_wider(names_from = SAMPLE, values_from = ASV_total, values_fill = 0) %>%
arrange(Taxa) %>% select(Taxa, sample_order_all)
# write_delim(supp_table_ASV, path = 'Suppl-18s-asv-total.txt', delim = '\t')
Plot factoring and parameters, ahead of bar plot (Figure 2)
<- c("Alveolata-Ciliates", "Alveolata-Dinoflagellates", "Alveolata-Syndiniales",
level2ORDER "Alveolata-Other", "Rhizaria-Cercozoa", "Rhizaria-Radiolaria", "Rhizaria-Other",
"Stramenopiles-MAST", "Stramenopiles-Ochrophyta", "Stramenopiles-Bicoecea", "Stramenopiles-Other",
"Hacrobia-Cryptophyta", "Hacrobia-Haptophyta", "Hacrobia-Other", "Amoebozoa",
"Excavata", "Apusozoa", "Archaeplastida-Chlorophyta", "Archaeplastida-Other",
"Opisthokonta-Fungi", "Opisthokonta-Metazoa", "Opisthokonta-Other", "Unassigned-Eukaryote")
<- c("Alveolates-Ciliates", "Alveolates-Dinoflagellates", "Alveolates-Syndiniales",
level2ORDER_LABEL "Alveolates-Other", "Rhizaria-Cercozoa", "Rhizaria-Radiolaria", "Rhizaria-Other",
"Stramenopiles-MAST", "Stramenopiles-Ochrophytes", "Stramenopiles-Bicoecea",
"Stramenopiles-Other", "Hacrobia-Cryptophytes", "Hacrobia-Haptophytes", "Hacrobia-Other",
"Amoebozoa", "Excavates", "Apusozoa", "Archaeplastid-Chlorophytes", "Archaeplastid-Other",
"Opisthokont-Fungi", "Opisthokont-Metazoa", "Opisthokont-Other", "Unassigned-Eukaryote")
<- c("#f1eef6", "#d7b5d8", "#df65b0", "#ce1256", "#fc9272", "#ef3b2c",
level2color "#800026", "#fff7bc", "#fec44f", "#d95f0e", "#a63603", "#74c476", "#238b45",
"#00441b", "#7fcdbb", "#084081", "#c6dbef", "#2b8cbe", "#016c59", "#bcbddc",
"#807dba", "#54278f", "#bdbdbd", "black")
$LEVEL2ORDER <- factor(gr_counts_avg_TAXA$Taxa, levels = level2ORDER,
gr_counts_avg_TAXAlabels = level2ORDER_LABEL)
names(level2color) <- level2ORDER
<- c("Shallow seawater in situ sterivex", "Deep seawater in situ sterivex",
sample_order_all "Near vent BW in situ sterivex", "Near vent BW Grazing T0", "Near vent BW Grazing T24",
"Near vent BW Grazing T36", "Mt Edwards Plume in situ sterivex", "Mt Edwards Vent in situ SUPR",
"Mt Edwards Vent Grazing T0", "Mt Edwards Vent Grazing T36", "Venti Latte Vent in situ SUPR",
"Venti Latte Vent Grazing T0", "Venti Latte Vent Grazing T36", "Candelabra Plume in situ sterivex",
"Candelabra Vent in situ SUPR", "Candelabra Vent Grazing T24", "SirVentsAlot Vent in situ SUPR",
"SirVentsAlot Vent Grazing T24")
<- c("Shallow seawater in situ", "Deep seawater in situ", "Near vent BW in situ",
sample_name_all "Near vent BW Grazing T0", "Near vent BW Grazing T24", "Near vent BW Grazing T36",
"Mt Edwards Plume in situ", "Mt Edwards Vent in situ", "Mt Edwards Vent Grazing T0",
"Mt Edwards Vent Grazing T36", "Venti Latte Vent in situ", "Venti Latte Vent Grazing T0",
"Venti Latte Vent Grazing T36", "Candelabra Plume in situ", "Candelabra Vent in situ",
"Candelabra Vent Grazing T24", "SirVentsAlot Vent in situ", "SirVentsAlot Vent Grazing T24")
$SAMPLE_ORDER <- factor(gr_counts_avg_TAXA$SAMPLE, levels = sample_order_all,
gr_counts_avg_TAXAlabels = sample_name_all)
<- c("sterivex", "SUPR", "T0", "T24", "T36")
exporder $SAMPLEID_ORDER <- factor(gr_counts_avg_TAXA$SAMPLEID, levels = exporder)
gr_counts_avg_TAXA$LOCATION_ORDER <- factor(gr_counts_avg_TAXA$LocationName, levels = c("Shallow seawater",
gr_counts_avg_TAXA"Deep seawater", "Near vent BW", "Mt Edwards Plume", "Mt Edwards Vent", "Venti Latte Vent",
"Candelabra Plume", "Candelabra Vent", "SirVentsAlot Vent"))
# Factor for dataframe with replicates
$LEVEL2ORDER <- factor(gr_counts_wreps_TAXA$Taxa, levels = level2ORDER,
gr_counts_wreps_TAXAlabels = level2ORDER_LABEL)
$SAMPLE_ORDER <- factor(gr_counts_wreps_TAXA$SAMPLE, levels = sample_order_all,
gr_counts_wreps_TAXAlabels = sample_name_all) # Factor by sample, but will plot x as sample with reps
$SAMPLEID_ORDER <- factor(gr_counts_wreps_TAXA$SAMPLEID, levels = exporder)
gr_counts_wreps_TAXA$LOCATION_ORDER <- factor(gr_counts_wreps_TAXA$LocationName,
gr_counts_wreps_TAXAlevels = c("Shallow seawater", "Deep seawater", "Near vent BW", "Mt Edwards Plume",
"Mt Edwards Vent", "Venti Latte Vent", "Candelabra Plume", "Candelabra Vent",
"SirVentsAlot Vent"))
Created a function to plot protist bar plots. This way we can plot samples averaged across replicates (Figure 2) and by individual samples.
<- function(df) {
barplot_lev2 ggplot(df, aes(x = SAMPLE_ORDER, y = SUM, fill = LEVEL2ORDER)) + geom_bar(stat = "identity",
position = "fill", color = "black") + scale_fill_manual(values = level2color) +
scale_y_continuous(expand = c(0, 0)) + theme(legend.position = "right", panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",
size = 12), axis.text.y = element_text(color = "black", size = 12), axis.title = element_text(color = "black",
size = 14), legend.text = element_text(color = "black", size = 14), strip.text = element_blank(),
legend.title = element_blank()) + labs(x = "", y = "Relative abundance") +
facet_grid(. ~ LOCATION_ORDER, space = "free", scales = "free") + guides(fill = guide_legend(ncol = 1))
}<- function(df) {
barplot_lev2_wreps ggplot(df, aes(x = SAMPLE_REPS, y = SUM, fill = LEVEL2ORDER)) + geom_bar(stat = "identity",
position = "fill", color = "black") + scale_fill_manual(values = level2color) +
scale_y_continuous(expand = c(0, 0)) + theme(legend.position = "right", panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",
size = 12), axis.text.y = element_text(color = "black", size = 12), axis.title = element_text(color = "black",
size = 14), legend.text = element_text(color = "black", size = 14), strip.text = element_blank(),
legend.title = element_blank()) + labs(x = "", y = "Relative abundance") +
facet_grid(. ~ SAMPLE_ORDER, space = "free", scales = "free") + guides(fill = guide_legend(ncol = 1))
}
<- c("sterivex", "SUPR")
insitu <- c("Unassigned", "Opisthokonta-Other", "Opisthokonta-Fungi", "Opisthokonta-Metazoa") rm
Protist bar plots with all replicates/samples.
# svg('figs/barplot-taxalevel-wReps.svg', h = 9, w = 13)
barplot_lev2_wreps(gr_counts_wreps_TAXA)
# dev.off()
Bar plot for Figure 2.
<- c("Unassigned", "Opisthokonta-Other", "Opisthokonta-Fungi", "Opisthokonta-Metazoa")
rm <- barplot_lev2(filter(gr_counts_avg_TAXA, !(Taxa %in% rm)))
nometaz_all nometaz_all
Generate protist bar plot with only in situ samples for presentation purposes and an additional plot with metazoa. Not report in manuscript.
# svg('figs/SUPPLEMENTARY-GR-tax-barplot-wmetaz.svg', w = 10, h = 8)
# barplot_lev2(filter(gr_counts_avg_TAXA)) dev.off()
# rm <- c('Unassigned', 'Opisthokonta-Other', 'Opisthokonta-Fungi',
# 'Opisthokonta-Metazoa') insitu <- c('sterivex', 'SUPR')
# head(gr_counts_avg_TAXA) svg('figs/GR-tax-barplot-insitu-only.svg', w = 8, h =
# 8) barplot_lev2(filter(gr_counts_avg_TAXA, !(Taxa %in% rm) & SAMPLEID %in%
# insitu)) dev.off()
Import 18S-derived data and metdata as needed.
load("data-input/GordaRidge-ASVtable-avg.RData")
<- read.delim("data-input/ventnames-gordaridge.txt")
ventnames library(vegan)
library(compositions)
library(purrr)
library(cluster)
library(ape)
library(reshape2)
library(ggdendro)
library(dendextend)
Remove controls samples and reformat data frame.
<- counts_decont %>% filter(!(SAMPLEID == "CTRL")) %>% filter(!(grepl("GordaRidge_BSW020",
gr_nums_wide %>% select(Feature.ID, SAMPLE, COUNT, REP) %>% left_join(ventnames,
SAMPLE))) by = c(SAMPLE = "SAMPLENAME")) %>% mutate(LocationName = case_when(LOCATION_SPECIFIC ==
"Plume036" ~ "Candelabra Plume", LOCATION_SPECIFIC == "Plume096" ~ "Mt Edwards Plume",
TRUE ~ as.character(LocationName))) %>% unite(SAMPLE_MOD, LocationName, Sampletype,
sep = "-") %>% select(Feature.ID, COUNT, SAMPLE_MOD) %>% pivot_wider(names_from = SAMPLE_MOD,
SAMPLEID, REP, values_from = COUNT, values_fill = 0) %>% column_to_rownames(var = "Feature.ID")
# Fix column names
<- gr_nums_wide
gr_for_dendro colnames(gr_for_dendro) <- gsub(x = names(gr_for_dendro), pattern = "-", replacement = " ")
# Relative abundance
<- decostand(gr_for_dendro, MARGIN = 2, method = "total")
relabun # colSums(relabun) # Should all equal to 1
# Cluster dendrogram (average hierarchical clustering)
<- hclust(dist(t(relabun)), method = "average") cluster_gr
<- as.dendrogram(cluster_gr)
dendro <- dendro_data(dendro, type = "rectangle") gr_dendro
Plot base dendrogram.
<- ggplot(segment(gr_dendro)) + geom_segment(aes(x = x, y = y, xend = xend,
gr_dendro_plot yend = yend)) + coord_flip() + scale_y_reverse(expand = c(0.2, 0.5), breaks = c(0,
0.2, 0.4, 0.6, 0.8)) + geom_text(aes(x = x, y = y, label = label, angle = 0,
hjust = 0), data = label(gr_dendro)) + theme_dendro() + labs(y = "Dissimilarity") +
theme(axis.text.x = element_text(color = "black", size = 14), axis.line.x = element_line(color = "#252525"),
axis.ticks.x = element_line(), axis.title.x = element_text(color = "black",
size = 14))
# svg('figs/SUPPLEMENTARY-dendrogram-wreps.svg', w = 10, h = 8)
gr_dendro_plot
# dev.off()
Transform ASV count data using center log-ratio. Look at eigenvalues after PCA analysis to determine how many axes are appropriate to show diversity among samples.
Reference: Coenen, A.R., Hu, S.K., Luo, E., Muratore, D., and Weitz, J.S. (2020) A Primer for Microbiome Time-Series Analysis. Front Genet 11: 310.
# CLR
<- data.frame(compositions::clr(t(gr_nums_wide)))
log_rats
# look at eigenvalues
<- prcomp(log_rats)
pca_lr <- (pca_lr$sdev^2)/sum(pca_lr$sdev^2)
variance_lr # head(variance_lr)
barplot(variance_lr, main = "Log-Ratio PCA Screeplot", xlab = "PC Axis", ylab = "% Variance",
cex.names = 1.5, cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
Based on this screeplot - 2 axis are OK, as they show 0.079 and 0.077, respectively, of the variance.
Extract PCA data points from analysis.
# Extract PCA points
<- data.frame(pca_lr$x, SAMPLE = rownames(pca_lr$x))
pca_lr_frame
<- pca_lr_frame %>% rownames_to_column(var = "SAMPLENAME") %>%
pca_lr_frame_wNames separate(SAMPLENAME, c("LocationName", "Sampletype", "SampleID", "REP"), "-",
remove = FALSE) %>% unite(shape_sample, LocationName, Sampletype, sep = " ",
remove = FALSE)
Factor for plotting.
<- c("Shallow seawater", "Deep seawater", "Near vent BW", "Mt Edwards Plume",
sample_order_all "Mt Edwards Vent", "Venti Latte Vent", "Candelabra Plume", "Candelabra Vent",
"SirVentsAlot Vent")
<- c("Shallow BSW", "Deep BSW", "Near vent BW", "Mt. Edwards Plume",
sample_label_all "Mt. Edwards", "Venti latte", "Candelabra Plume", "Candelabra", "Sir Ventsalot")
<- c("#bfbbb0", "#413f44", "#6f88af", "#61ac86", "#61ac86", "#711518",
sample_color_all "#dfa837", "#dfa837", "#ce536b")
names(sample_color_all) <- sample_label_all
<- c("Candelabra Plume in situ", "Candelabra Vent Grazing", "Candelabra Vent in situ",
shape_order "Deep seawater in situ", "Mt Edwards Plume in situ", "Mt Edwards Vent Grazing",
"Mt Edwards Vent in situ", "Near vent BW Grazing", "Near vent BW in situ", "Shallow seawater in situ",
"SirVentsAlot Vent Grazing", "SirVentsAlot Vent in situ", "Venti Latte Vent Grazing",
"Venti Latte Vent in situ")
<- c(24, 21, 21, 22, 24, 21, 21, 23, 23, 22, 21, 21, 21, 21)
shapes <- c("#dfa837", "white", "#dfa837", "#413f44", "#61ac86", "white", "#61ac86",
fill_color "white", "#6f88af", "#bfbbb0", "white", "#ce536b", "white", "#711518")
<- c("#dfa837", "#dfa837", "#dfa837", "#413f44", "#61ac86", "#61ac86",
color_color "#61ac86", "#6f88af", "#6f88af", "#bfbbb0", "#ce536b", "#ce536b", "#711518",
"#711518")
$SAMPLE_ORDER <- factor(pca_lr_frame_wNames$LocationName, levels = rev(sample_order_all),
pca_lr_frame_wNameslabels = rev(sample_label_all))
$SHAPE_ORDER <- factor(pca_lr_frame_wNames$shape_sample, levels = shape_order) pca_lr_frame_wNames
<- ggplot(pca_lr_frame_wNames,
pca_18s aes(x = PC1, y = PC2,
fill = SHAPE_ORDER,
color = SHAPE_ORDER,
shape = SHAPE_ORDER)) + #Replace label=SAMPLEID.y
# geom_text_repel(size = 3,
# box.padding = unit(0.5, "lines"))+
geom_hline(yintercept = 0) + geom_vline(xintercept = 0, color = "#525252") +
geom_point(size=4, stroke = 1.5, aes(fill=SHAPE_ORDER, color = SHAPE_ORDER, shape = SHAPE_ORDER)) +
ylab(paste0('PC2 ',round(variance_lr[2]*100,2),'%')) +
xlab(paste0('PC1 ',round(variance_lr[1]*100,2),'%')) +
scale_shape_manual(values = shapes) +
scale_fill_manual(values = fill_color) +
scale_color_manual(values = color_color) +
theme_bw() +
theme(axis.text = element_text(color="black", size=12),
legend.title = element_blank(),
axis.title = element_text(color="black", size=14),
legend.text = element_text(color = "black", size = 14))
# pca_18s
# svg('figs/panel-barplot-pca-18S-nolabel.svg', w = 20, h = 8)
plot_grid(nometaz_all, pca_18s, nrow = 1, labels = c("a", "b"), rel_widths = c(1,
0.85), align = c("hv"), axis = c("tblr"))
# dev.off()
To test the hypothesis that protistan species may be enriched at vent sites compared to surrounding seawater, 18S-derived ASVs were characterized by distribution.
Import data and classify ASVs
# Import averaged across replicates data
load("data-input/GordaRidge-ASVtable-avg.RData", verbose = T)
## Loading objects:
## gr_counts_filter
## gr_counts_wtax
## gr_counts_avg_wtax
# unique(gr_counts_avg_wtax[, c('Sampletype', 'LocationName')]) #categories to
# consider unique(gr_counts_avg_wtax$LocationName)
Classify resident versus cosmopolitan
<- gr_counts_avg_wtax %>% type.convert(as.is = TRUE) %>% filter(!(Sampletype ==
gr_wide "Control")) %>% filter(COUNT_AVG > 0) %>% unite(sample_type, LocationName, Sampletype,
sep = "_") %>% select(Feature.ID, sample_type, COUNT_AVG) %>% pivot_wider(names_from = sample_type,
values_from = COUNT_AVG, values_fill = 0, values_fn = sum) %>% rowwise() %>%
mutate_at(vars(Feature.ID), factor) %>% mutate(total = sum(c_across(where(is.numeric)))) %>%
data.frame
Import classification definition.
<- read.delim("data-input/vent-asv-classification.txt")
classifcation_schema
# From the purr function
<- function(gr_wide) reduce(gr_wide, `|`)
any_cols
<- gr_wide %>% mutate(VENT_x = ifelse(any_cols(across(contains("Vent_in.situ"),
gr_classified ~. > 0)), "vent", ""), VENTGRAZE_x = ifelse(any_cols(across(contains("Vent_Grazing"),
~. > 0)), "ventgraze", ""), NEAR_x = ifelse(any_cols(across(contains("Near.vent.BW_in.situ"),
~. > 0)), "near", ""), NEARGRAZE_x = ifelse(any_cols(across(contains("Near.vent.BW_Grazing"),
~. > 0)), "neargraze", ""), PLUME_x = ifelse(any_cols(across(contains("Plume_in.situ"),
~. > 0)), "plume", ""), BACK_x = ifelse(any_cols(across(contains("seawater"),
~. > 0)), "bsw", "")) %>% unite(COMPILED, ends_with("_x"), sep = "", remove = FALSE) %>%
left_join(classifcation_schema) %>% mutate(CLASS_COMPLEX = case_when(total ==
1 ~ "Unique", TRUE ~ as.character(CLASS_COMPLEX)), CLASS_SIMPLE_I = case_when(total ==
1 ~ "Unique", TRUE ~ as.character(CLASS_SIMPLE_I)), CLASS_SIMPLE_II = case_when(total ==
1 ~ "Unique", TRUE ~ as.character(CLASS_SIMPLE_II))) %>% mutate(sirvents_graze = case_when((SirVentsAlot.Vent_Grazing >
0 & SirVentsAlot.Vent_in.situ > 0) ~ "sirvents"), candelabra_graze = case_when((Candelabra.Vent_Grazing >
0 & Candelabra.Vent_in.situ > 0) ~ "candelabra"), edwards_graze = case_when((Mt.Edwards.Vent_Grazing >
0 & Mt.Edwards.Vent_in.situ > 0) ~ "edwards"), latte_graze = case_when((Venti.Latte.Vent_Grazing >
0 & Venti.Latte.Vent_in.situ > 0) ~ "latte"), near_graze = case_when((Near.vent.BW_Grazing >
0 & Near.vent.BW_in.situ > 0) ~ "near")) %>% unite(COMPILED_graze, ends_with("_graze"),
sep = "", remove = FALSE) %>% select(Feature.ID, starts_with("CLASS_"), ends_with("_graze")) %>%
distinct() %>% data.frame
Print report on total ASV counts that fall into each category.
<- left_join(gr_counts_avg_wtax, gr_classified) %>% filter(!(Sampletype ==
gr_sorted "Control"))
# head(gr_sorted)
# Stats
<- sum(gr_sorted$COUNT_AVG)
total #1.26 million sequences total
## [1] 1260873
<- gr_sorted %>% group_by(CLASS_SIMPLE_I) %>% summarise(totalasv = n_distinct(Feature.ID),
gr_sorted_summary_simpleI totalseq = sum(COUNT_AVG)) %>% mutate(Perc_seq = 100 * (totalseq/total)) %>%
data.frame
<- gr_sorted %>% group_by(CLASS_SIMPLE_II) %>% summarise(totalasv = n_distinct(Feature.ID),
gr_sorted_summary_simpleII totalseq = sum(COUNT_AVG)) %>% mutate(Perc_seq = 100 * (totalseq/total)) %>%
data.frame
<- gr_sorted %>% group_by(CLASS_COMPLEX) %>% summarise(totalasv = n_distinct(Feature.ID),
gr_sorted_summary_complex totalseq = sum(COUNT_AVG)) %>% mutate(Perc_seq = 100 * (totalseq/total)) %>%
data.frame# View(gr_sorted_summary_simpleI) View(gr_sorted_summary_simpleII)
# View(gr_sorted_summary_complex)
# distribution_simple vs detailed
<- gr_sorted %>% select(Feature.ID, CLASS_SIMPLE_I, CLASS_SIMPLE_II) %>%
gr_dist distinct() %>% mutate(DIST_simple = case_when(CLASS_SIMPLE_I == "Background" ~
"Other", CLASS_SIMPLE_I == "Unique" ~ "Other", TRUE ~ CLASS_SIMPLE_I)) %>% select(Feature.ID,
DIST_detail = CLASS_SIMPLE_II) %>% data.frame
DIST_simple,
# Select grazing enriched samples
<- gr_sorted %>% select(Feature.ID, ends_with("_graze")) %>% distinct() %>%
gr_dist_grazing filter(!(COMPILED_graze == "NANANANANA")) %>% add_column(Graze_enriched = "Enriched") %>%
data.frame# dim(gr_dist_grazing) table(gr_dist_grazing$COMPILED_graze)
Include distribution with taxonomic annotations
<- left_join(gr_counts_avg_wtax, gr_dist) %>% filter(!(Sampletype ==
gr_stats_wtax "Control")) %>% data.frame
<- gr_stats_wtax %>% group_by(Taxa, DIST_simple) %>% summarise(totalasv = n(),
gr_wtax_dist_simple totalseq = sum(COUNT_AVG)) %>% ungroup() %>% group_by(Taxa, DIST_simple) %>%
summarise(totalasvs = sum(totalasv), sumseqs = sum(totalseq)) %>% mutate(percentseq = sumseqs/sum(sumseqs) *
100) %>% pivot_wider(names_from = DIST_simple, names_glue = "{DIST_simple}_{.value}",
values_from = c(totalasvs, sumseqs, percentseq)) %>% data.frame
# View(gr_wtax_dist_simple) write_delim(gr_wtax_dist_simple, path =
# 'Distribution-ASVs-bytax.txt', delim = '\t')
<- gr_stats_wtax %>% group_by(Taxa, DIST_detail) %>% summarise(totalasv = n(),
gr_wtax_dist_detailed totalseq = sum(COUNT_AVG)) %>% ungroup() %>% group_by(Taxa, DIST_detail) %>%
summarise(totalasvs = sum(totalasv), sumseqs = sum(totalseq)) %>% mutate(percentseq = sumseqs/sum(sumseqs) *
100) %>% pivot_wider(names_from = DIST_detail, names_glue = "{DIST_detail}_{.value}",
values_from = c(totalasvs, sumseqs, percentseq)) %>% data.frame
# View(gr_wtax_dist_detailed) write_delim(gr_wtax_dist_detailed, path =
# 'Distribution-ASVs-bytax-detailed.txt', delim = '\t')
Generate bar plot that summarized sequence and ASV abundance by distribution of ASV. Simple distribution determined above mentioned in text of manuscript, more detailed outline of ASV classifications for the supplementary section.
<- gr_stats_wtax %>% unite(sample, LocationName, Sampletype,
gr_stats_wtax_toplot sep = " ", remove = FALSE) %>% group_by(sample, LocationName, Sampletype,
SAMPLEID, %>% summarise(totalasvs = n_distinct(Feature.ID), sumseqs = sum(COUNT_AVG)) %>%
SAMPLEID, DIST_detail) data.frame
<- c("Shallow seawater in situ sterivex", "Deep seawater in situ sterivex",
sample_order_all "Near vent BW in situ sterivex", "Near vent BW Grazing T0", "Near vent BW Grazing T24",
"Near vent BW Grazing T36", "Mt Edwards Plume in situ sterivex", "Mt Edwards Vent in situ SUPR",
"Mt Edwards Vent Grazing T0", "Mt Edwards Vent Grazing T36", "Venti Latte Vent in situ SUPR",
"Venti Latte Vent Grazing T0", "Venti Latte Vent Grazing T36", "Candelabra Plume in situ sterivex",
"Candelabra Vent in situ SUPR", "Candelabra Vent Grazing T24", "SirVentsAlot Vent in situ SUPR",
"SirVentsAlot Vent Grazing T24")
<- c("Shallow seawater in situ", "Deep seawater in situ", "Near vent BW in situ",
sample_name_all "Near vent BW Grazing T0", "Near vent BW Grazing T24", "Near vent BW Grazing T36",
"Mt Edwards Plume in situ", "Mt Edwards Vent in situ", "Mt Edwards Vent Grazing T0",
"Mt Edwards Vent Grazing T36", "Venti Latte Vent in situ", "Venti Latte Vent Grazing T0",
"Venti Latte Vent Grazing T36", "Candelabra Plume in situ", "Candelabra Vent in situ",
"Candelabra Vent Grazing T24", "SirVentsAlot Vent in situ", "SirVentsAlot Vent Grazing T24")
<- c("Shallow seawater", "Deep seawater", "Near vent BW", "Mt Edwards Plume",
location_order "Mt Edwards Vent", "Venti Latte Vent", "Candelabra Plume", "Candelabra Vent",
"SirVentsAlot Vent")
<- c("Shallow BSW", "Deep BSW", "Near vent BW", "Mt. Edwards Plume",
location_order_name "Mt. Edwards", "Venti latte", "Candelabra Plume", "Candelabra", "Sir Ventsalot")
$SAMPLE_ORDER <- factor(gr_stats_wtax_toplot$sample, levels = sample_order_all,
gr_stats_wtax_toplotlabels = sample_name_all)
<- c("sterivex", "SUPR", "T0", "T24", "T36")
exporder $SAMPLEID_ORDER <- factor(gr_stats_wtax_toplot$SAMPLEID, levels = exporder)
gr_stats_wtax_toplot
<- c("Shallow seawater", "Deep seawater", "Near vent BW", "Mt Edwards Plume",
location_order "Mt Edwards Vent", "Venti Latte Vent", "Candelabra Plume", "Candelabra Vent",
"SirVentsAlot Vent")
<- c("Shallow BSW", "Deep BSW", "Near vent BW", "Mt. Edwards Plume",
location_order_name "Mt. Edwards", "Venti latte", "Candelabra Plume", "Candelabra", "Sir Ventsalot")
$LOCATION_ORDER <- factor(gr_stats_wtax_toplot$LocationName,
gr_stats_wtax_toplotlevels = location_order, labels = location_order_name)
<- c("Vent local", "Vent local (no background, no vent)", "Vent resident",
category_order "Vent resident and background", "Background and vent local (w vent)", "Background and vent local (no vent)",
"Background", "Other", "Unique")
<- c("#e31a1c", "#fc4e2a", "#feb24c", "#ffeda0", "#c7e9b4", "#7fcdbb",
category_color "#1d91c0", "#225ea8", "#0c2c84")
$CATEGORY_ORDER <- factor(gr_stats_wtax_toplot$DIST_detail, levels = category_order)
gr_stats_wtax_toplotnames(category_color) <- category_order
Generate plots.
<- ggplot(gr_stats_wtax_toplot, aes(x = SAMPLE_ORDER, y = sumseqs, fill = CATEGORY_ORDER)) +
totalseq geom_bar(stat = "identity", color = "black", position = "fill") + # scale_fill_brewer(palette = 'Accent') +
scale_fill_manual(values = category_color) + scale_y_continuous(expand = c(0, 0)) +
facet_grid(. ~ LOCATION_ORDER, space = "free", scales = "free") + theme(legend.position = "right",
panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(),
panel.background = element_blank(), axis.text.x = element_text(angle = 90, hjust = 1,
vjust = 0.5, color = "black", face = "bold"), axis.text.y = element_text(color = "black",
face = "bold"), strip.text = element_blank(), strip.background = element_blank(),
legend.title = element_blank()) + labs(x = "", y = "")
# totalseq
<- ggplot(gr_stats_wtax_toplot, aes(x = SAMPLE_ORDER, y = totalasvs, fill = CATEGORY_ORDER)) +
totalasv geom_bar(stat = "identity", color = "black", position = "fill") + scale_fill_manual(values = category_color) +
scale_y_continuous(expand = c(0, 0)) + facet_grid(. ~ LOCATION_ORDER, space = "free",
scales = "free") + theme(legend.position = "right", panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",
face = "bold"), axis.text.y = element_text(color = "black", face = "bold"),
strip.text = element_blank(), strip.background = element_blank(), legend.title = element_blank()) +
labs(x = "", y = "")
# totalasv
Generate supplementary figure that summarizes resident versus cosmopolitan ASVs by ASV and sequence count.
# Supplementary figure svg('figs/SUPPLEMENTARY-seq-asv-distribution.svg', h = 8,
# w = 16)
plot_grid(totalseq + labs(y = "Relative sequence abundance") + theme(legend.position = "none"),
+ labs(y = "Relative ASV abundance"), totalseq + geom_bar(stat = "identity",
totalasv color = "black", position = "stack") + labs(y = "Total sequences") + theme(legend.position = "none"),
+ geom_bar(stat = "identity", color = "black", position = "stack") +
totalasv labs(y = "Total ASVs"), labels = c("a", "b", "c", "d"), axis = c("lrtb"),
align = c("vh"))
# dev.off()
# save(gr_stats_wtax_toplot, gr_stats_wtax, gr_dist_grazing, gr_dist, file =
# 'data-input/GR-countinfo-withASVdistribution.RData')
# gr_dist = all ASV classification gr_dist_grazing = ASVs that are found in situ
# and in grazing gr_stats_wtax = complete full table with ASV classifications
# gr_stats_wtax_tplot = summarized for making figures
load("data-input/GordaRidge-ASVtable-avg.RData", verbose = T)
## Loading objects:
## gr_counts_filter
## gr_counts_wtax
## gr_counts_avg_wtax
load("data-input/GR-countinfo-withASVdistribution.RData", verbose = T)
## Loading objects:
## gr_stats_wtax_toplot
## gr_stats_wtax
## gr_dist_grazing
## gr_dist
# head(gr_stats_wtax)
unique(gr_stats_wtax$Taxa)
## [1] "Rhizaria-Radiolaria" "Stramenopiles-MAST"
## [3] "Opisthokonta-Metazoa" "Alveolata-Ciliates"
## [5] "Alveolata-Syndiniales" "Stramenopiles-Ochrophyta"
## [7] "Stramenopiles-Other" "Alveolata-Dinoflagellates"
## [9] "Unassigned-Eukaryote" "Rhizaria-Cercozoa"
## [11] "Opisthokonta-Other" "Hacrobia-Haptophyta"
## [13] "Alveolata-Other" "Stramenopiles-Bicoecea"
## [15] "Archaeplastida-Chlorophyta" "Hacrobia-Other"
## [17] "Archaeplastida-Other" "Hacrobia-Cryptophyta"
## [19] "Excavata" "Opisthokonta-Fungi"
## [21] "Amoebozoa" "Apusozoa"
## [23] "Rhizaria-Other"
# View(gr_stats_wtax %>% filter(Taxa == 'Stramenopiles-Bicoecea'))
This is a another function for further refinement of the taxonomic assignments.
# Add Taxa2 level
<- function(df){
expand_taxa2 <- sum(df$COUNT_AVG)
sumseq # Sum asv totals
<- df %>% group_by(Feature.ID) %>%
df_tmp summarise(SUM_ASV = sum(COUNT_AVG)) %>%
mutate(RelAbun = 100*(SUM_ASV/sumseq)) %>%
filter(RelAbun > 0.01) %>%
data.frame<- as.character(unique(df_tmp$Feature.ID)) #Select ASVs > 0.01% of data
topASVs18s $Taxa <- as.character(df$Taxa); df$Order <- as.character(df$Order)
df$Class <- as.character(df$Class); df$Division <- as.character(df$Division)
df<- c("Alveolata-Syndiniales", "Alveolata-Dinoflagellates", "Alveolata-Other")
non_ciliate <- c("Alveolata-Syndiniales", "Alveolata-Dinoflagellates")
order <- c("Alveolata-Ciliates", "Opisthokonta-Metazoa", "Opisthokonta-Fungi", "Opisthokonta-Other", "Rhizaria-Cercozoa", "Rhizaria-Radiolaria")
class <- df %>%
df2 type.convert(as.is = TRUE) %>%
mutate(Taxa2 = Taxa) %>% # Duplicate Taxa column
mutate(Taxa2 = ifelse(Taxa %in% order, Order, Taxa2),
Taxa2 = ifelse(Taxa %in% class, Class, Taxa2),
Taxa2 = ifelse(Taxa %in% "Amoebozoa", Division, Taxa2),#
Taxa2 = ifelse(Taxa %in% "Apusozoa", Division, Taxa2),#
# Curate Ciliates
Taxa2 = ifelse(Class == "Spirotrichea", paste(Class, Order, sep = "-"), Taxa2),
Taxa2 = ifelse(grepl("Spirotrichea_", Taxa2), "Spirotrichea-Other", Taxa2),
Taxa2 = ifelse(grepl("Spirotrichea-NA", Taxa2), "Spirotrichea-Other", Taxa2),
Taxa2 = ifelse(grepl("Spirotrichea-Strombidiida_", Taxa2), "Spirotrichea-Strombidiida", Taxa2),
Taxa2 = ifelse(grepl("Prostomatea_1", Taxa2), "Prostomatea", Taxa2),
Taxa2 = ifelse(grepl("CONTH_", Taxa2), "Alveolata-Ciliates", Taxa2),
Taxa2 = ifelse(grepl("CONThreeP", Taxa2), "Alveolata-Ciliates", Taxa2),
# Curate dinoflagellates and Syndiniales
Taxa2 = ifelse(grepl("Dino-Group-", Taxa2), "Syndiniales Dino-Groups (I-V)", Taxa2),
Taxa2 = ifelse(Taxa2 %in% non_ciliate, "Alveolata-Other", Taxa2),
Taxa2 = ifelse(Division == "Apicomplexa", "Apicomplexa", Taxa2),
# Curate Radiolaria
Taxa2 = ifelse(Class == "Acantharea", "Rhizaria-Acantharea", Taxa2),
# Taxa2 = ifelse(grepl("Acantharea-Group-", Taxa2), "Acantharea-Groups (I,II,VI)", Taxa2),
Taxa2 = ifelse(Class == "Polycystinea", paste(Class, Order, sep = "-"), Taxa2),
Taxa2 = ifelse(Taxa2 == "Rhizaria-Radiolaria-Other", "Rhizaria-Radiolaria", Taxa2),
Taxa2 = ifelse(Taxa2 == "Rhizaria-Cercozoa-Other", "Rhizaria-Cercozoa", Taxa2),
Taxa2 = ifelse(Taxa2 == "Endomyxa-Ascetosporea", "Endomyxa", Taxa2),
Taxa2 = ifelse(Taxa2 == "Novel-clade-10-12", "Rhizaria-Cercozoa", Taxa2),
Taxa2 = ifelse(Taxa2 == "Chlorarachniophyceae", "Rhizaria-Cercozoa", Taxa2),
Taxa2 = ifelse(Taxa2 == "Rhizaria-Other", "Rhizaria-Other", Taxa2),
# Add hacrobia resolution
Taxa2 = ifelse(Taxa2 == "Hacrobia-Other", Division, Taxa2),
# Add Excavata resolution
Taxa2 = ifelse(Taxa2 == "Excavata", Division, Taxa2),
# Curate Stramenopiles
Taxa2 = ifelse(Taxa2 == "Stramenopiles-Ochrophyta", Class, Taxa2),
Taxa2 = ifelse(Taxa2 == "Stramenopiles-MAST", "MAST", Taxa2),
Taxa2 = ifelse(grepl("MOCH-", Taxa2), "MOCH", Taxa2),
Taxa2 = ifelse(Taxa2 == "Stramenopiles-Bicoecea", Family, Taxa2),
# Archaeplastidia
Taxa2 = ifelse(Division == "Streptophyta", "Archaeplastida-Streptophyta", Taxa2),
# Curate other unknown - Move low abundance ASVs to "Other"
Taxa2 = ifelse(grepl("_X", Taxa2), Taxa, Taxa2),
Taxa2 = ifelse(is.na(Taxa2), Taxa, Taxa2),
Taxa2 = ifelse(Taxa2 == "Stramenopiles-Ochrophyta", "Stramenopiles-Other", Taxa2),
Taxa2 = ifelse(Taxa2 == "Unassigned-Eukaryote-Other", "Unassigned-Eukaryote", Taxa2),
# Fixing the designation of "Other"
Taxa2 = ifelse(Taxa2 == "Alveolata-Syndiniales", "Alveolata-Other", Taxa2),
Taxa2 = ifelse(Taxa2 == "Alveolata-Dinoflagellates", "Alveolata-Other", Taxa2),
Taxa2 = ifelse(Taxa2 == "Alveolata-Dinoflagellates", "Alveolata-Other", Taxa2),
Taxa2 = ifelse(Taxa2 == "Alveolata-Ciliates", "Ciliates-Other", Taxa2),
Taxa2 = ifelse(Taxa2 == "Alveolata-Ciliates", "Ciliates-Other", Taxa2)
%>%
) mutate(Broad_Taxa = Taxa) %>%
mutate(Broad_Taxa = ifelse(Broad_Taxa %in% non_ciliate, "Alveolata", Broad_Taxa),
Broad_Taxa = ifelse(grepl("Rhizaria", Broad_Taxa), "Rhizaria", Broad_Taxa),
Broad_Taxa = ifelse(grepl("Stramenopiles", Broad_Taxa), "Stramenopiles", Broad_Taxa),
Broad_Taxa = ifelse(grepl("Archaeplastida", Broad_Taxa), "Archaeplastida", Broad_Taxa),
Broad_Taxa = ifelse(grepl("Hacrobia", Broad_Taxa), "Hacrobia", Broad_Taxa),
Broad_Taxa = ifelse(grepl("Opisthokonta", Broad_Taxa), "Opisthokonta", Broad_Taxa)) %>%
data.framereturn(df2)
}
# Apply to ASV table
<- expand_taxa2(gr_stats_wtax)
gr_counts_avg_wtax2 # View(gr_counts_avg_wtax2)
# Add categories & set up for plotting function
<- gr_counts_avg_wtax2 %>% left_join(gr_dist) %>% unite(sample, LocationName,
gr_tax_res sep = " ", remove = FALSE) %>% data.frame
Sampletype, SAMPLEID, # save(gr_tax_res, file = 'GR-alltax-dist-tax2.RData')
Option to export a series of tables for supplementary information.
# Make table summarizing taxa stats for each sample.
<- gr_tax_res %>% group_by(LocationName, Sampletype, Taxa) %>% summarize(ASV_total = n_distinct(Feature.ID),
asv_seq_taxa SEQ_sum = sum(COUNT_AVG)) %>% unite(sample, LocationName, Sampletype, sep = "-") %>%
pivot_wider(names_from = sample, values_from = c(ASV_total, SEQ_sum), values_fill = 0) %>%
data.frame# head(asv_seq_taxa) dim(asv_seq_taxa)
<- gr_tax_res %>% group_by(LocationName, Sampletype, Taxa, Taxa2) %>%
asv_seq_taxa2 summarize(ASV_total = n_distinct(Feature.ID), SEQ_sum = sum(COUNT_AVG)) %>% unite(sample,
sep = "-") %>% pivot_wider(names_from = sample, values_from = c(ASV_total,
LocationName, Sampletype, values_fill = 0) %>% data.frame
SEQ_sum),
# write_delim(asv_seq_taxa, path = 'taxa-asv-seq-stats.txt', delim = '\t')
# write_delim(asv_seq_taxa2, path = 'taxa2-asv-seq-stats.txt', delim = '\t')
ID most abundant ASVs
# What are the most abundant ASVs in each of the taxa2 categories?
<- gr_tax_res %>% select(Feature.ID, RES_COSMO = DIST_simple, Taxa,
gr_topASV_taxa2 %>% group_by(Feature.ID, RES_COSMO, Taxa, Taxa2,
Taxa2, Taxon_updated, COUNT_AVG) %>% summarise(Total = sum(COUNT_AVG)) %>% ungroup() %>% group_by(Taxa,
Taxon_updated) %>% arrange(Taxa2, desc(Total)) %>% top_n(10, Total) %>% data.frame
Taxa2) # gr_topASV_taxa2 write_delim(gr_topASV_taxa2, path =
# 'supptable-topASVs-taxa2.txt', delim = '\t') save(gr_tax_res, file =
# 'GR-18S-ASV-list.RData')
Formatting to generate tile plot.
<- c("Shallow seawater in situ sterivex", "Deep seawater in situ sterivex",
sample_order_all "Near vent BW in situ sterivex", "Near vent BW Grazing T0", "Near vent BW Grazing T24",
"Near vent BW Grazing T36", "Mt Edwards Plume in situ sterivex", "Mt Edwards Vent in situ SUPR",
"Mt Edwards Vent Grazing T0", "Mt Edwards Vent Grazing T36", "Venti Latte Vent in situ SUPR",
"Venti Latte Vent Grazing T0", "Venti Latte Vent Grazing T36", "Candelabra Plume in situ sterivex",
"Candelabra Vent in situ SUPR", "Candelabra Vent Grazing T24", "SirVentsAlot Vent in situ SUPR",
"SirVentsAlot Vent Grazing T24")
<- c("Shallow seawater in situ", "Deep seawater in situ", "Near vent BW in situ",
sample_name_all "Near vent BW Grazing T0", "Near vent BW Grazing T24", "Near vent BW Grazing T36",
"Mt Edwards Plume in situ", "Mt Edwards Vent in situ", "Mt Edwards Vent Grazing T0",
"Mt Edwards Vent Grazing T36", "Venti Latte Vent in situ", "Venti Latte Vent Grazing T0",
"Venti Latte Vent Grazing T36", "Candelabra Plume in situ", "Candelabra Vent in situ",
"Candelabra Vent Grazing T24", "SirVentsAlot Vent in situ", "SirVentsAlot Vent Grazing T24")
$SAMPLE_ORDER <- factor(gr_tax_res$sample, levels = sample_order_all, labels = sample_name_all) gr_tax_res
Perform CLR transformation.
# Format input data frame to plot heatmap across resident and cosmopolitan populations.
<- gr_tax_res %>%
tax_key select(Broad_Taxa, Taxa2) %>%
distinct() %>%
data.frame
<- gr_tax_res %>%
expand_tmp expand(SAMPLE = sample, RES_COSMO = DIST_simple, Taxa2) %>%
filter(!(RES_COSMO == "Other"))
# Subsetting to taxa of interest for the geom_tile:
# RM
<- c("Unassigned-Eukaryote","Ciliates-Other", "Spirotrichea-Other", "Alveolata-Other", "Rhizaria-Other", "Stramenopiles-Other", "Stramenopiles-Bicoecea", "Amoebozoa", "Apusozoa", "Hacrobia-Other", "Archaeplastida-Other", "Archaeplastida-Streptophyta", "Archaeplastida-Chlorophyta")
others
# Perform first formatting of data
<- gr_tax_res %>%
gr_relAbun_toheat_ONE # Remove unwanted samples - controls and ASVs that are not classified as cosmopolitan or resident
filter(!(Sampletype == "Control")) %>%
filter(!(DIST_simple == "Other")) %>%
filter(Broad_Taxa != "Opisthokonta") %>%
filter(!(Taxa2 %in% others)) %>%
# Average across replicates - ASV sequence count averaged when ASV appears in replicate samples
group_by(Feature.ID, RES_COSMO = DIST_simple, SAMPLE = sample, Broad_Taxa, Taxa2) %>%
summarise(COUNT_AVG2 = mean(COUNT_AVG)) %>%
ungroup() %>%
group_by(RES_COSMO, SAMPLE, Broad_Taxa, Taxa2) %>%
summarise(SUM_TAXA2 = sum(COUNT_AVG2),
RICHNESS_TAXA2 = n_distinct(Feature.ID)) %>%
data.frame
## `summarise()` regrouping output by 'Feature.ID', 'RES_COSMO', 'SAMPLE', 'Broad_Taxa' (override with `.groups` argument)
## `summarise()` regrouping output by 'RES_COSMO', 'SAMPLE', 'Broad_Taxa' (override with `.groups` argument)
# Get CLR transformed data
# head(gr_relAbun_toheat_ONE)
<- gr_relAbun_toheat_ONE %>%
gr_taxa2_clr select(RES_COSMO, SAMPLE, Broad_Taxa, Taxa2, SUM_TAXA2) %>%
pivot_wider(names_from = c("RES_COSMO", "SAMPLE"), values_from = SUM_TAXA2) %>%
unite(tax, Broad_Taxa, Taxa2, sep = "_") %>%
column_to_rownames(var = "tax") %>%
data.frame# Perform CLR
<- as.data.frame(clr(gr_taxa2_clr))
gr_taxa2_clr_df # Reformat CLR data
<- gr_taxa2_clr_df %>%
gr_taxa2_clr_df_mod rownames_to_column(var = "tax") %>%
separate(tax, c("Broad_Taxa", "Taxa2"), sep = "_") %>%
pivot_longer(cols = starts_with(c("Cosmopolitan", "Resident")), values_to = "CLR") %>%
separate(name, c("RES_COSMO", "SAMPLE_TMP"), sep = "_") %>%
mutate(SAMPLE = gsub("\\.", " ", SAMPLE_TMP)) %>%
select(-SAMPLE_TMP) %>%
data.frame
# Perform second series of data re-formatting
<- gr_relAbun_toheat_ONE %>%
gr_relAbun_toheat group_by(Broad_Taxa) %>%
mutate(SUM_BROAD_TAXA = sum(SUM_TAXA2),
RelAbun_Broad_Taxa = (100*(SUM_TAXA2/SUM_BROAD_TAXA))) %>%
ungroup() %>%
group_by(Taxa2) %>%
mutate(SUM_ALL_TAXA2 = sum(SUM_TAXA2),
RelAbun_Taxa2 = (100*(SUM_TAXA2/SUM_ALL_TAXA2))) %>%
select(-Broad_Taxa) %>% #Temporarily remove
# Artifically match with expanded columns to include complete dataset
right_join(expand_tmp) %>%
# Re-join with taxonomic information and replace NAs with zeroes
left_join(tax_key) %>%
left_join(gr_taxa2_clr_df_mod) %>%
replace(is.na(.), 0) %>%
# Need to replace TRUE zero with NA in CLR column
mutate(CLR = ifelse(
== 0, NA, CLR
SUM_TAXA2 %>%
)) separate(SAMPLE, c("locale", "site", "type"), sep = " ", remove = FALSE) %>%
select(-site, -type) %>%
data.frame
## Joining, by = c("RES_COSMO", "SAMPLE", "Taxa2")
## Joining, by = "Taxa2"
## Joining, by = c("RES_COSMO", "SAMPLE", "Taxa2", "Broad_Taxa")
## Warning: Expected 3 pieces. Additional pieces discarded in 3528 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# View(gr_relAbun_toheat)
# Re-factor
<- c("Colpodea", "Heterotrichea", "Karyorelictea", "Litostomatea",
tax2_order_all "Nassophorea", "Oligohymenophorea", "Phyllopharyngea", "Plagiopylea", "Prostomatea",
"Spirotrichea-Choreotrichida", "Spirotrichea-Euplotia", "Spirotrichea-Hypotrichia",
"Spirotrichea-Other", "Spirotrichea-Strombidiida", "Spirotrichea-Tintinnida",
"Ciliates-Other", "Gonyaulacales", "Gymnodiniales", "Noctilucales", "Peridiniales",
"Prorocentrales", "Suessiales", "Torodiniales", "Apicomplexa", "Syndiniales Dino-Groups (I-V)",
"Alveolata-Other", "Amoebozoa", "Breviatea", "Lobosa", "Apusomonadidae", "Apusozoa",
"Hilomonadea", "Mantamonadidea", "Discoba", "Metamonada", "Hacrobia-Cryptophyta",
"Hacrobia-Haptophyta", "Centroheliozoa", "Katablepharidophyta", "Picozoa", "Telonemia",
"Hacrobia-Other", "Archaeplastida-Chlorophyta", "Archaeplastida-Streptophyta",
"Archaeplastida-Other", "Ascomycota", "Basidiomycota", "Chytridiomycota", "Microsporidiomycota",
"Opisthokonta-Fungi", "Annelida", "Arthropoda", "Cnidaria", "Craniata", "Ctenophora",
"Echinodermata", "Gastrotricha", "Mollusca", "Nematoda", "Nemertea", "Opisthokonta-Metazoa",
"Platyhelminthes", "Porifera", "Rotifera", "Urochordata", "Choanoflagellatea",
"Filasterea", "Ichthyosporea", "Opisthokonta-Other", "Endomyxa", "Filosa", "Filosa-Granofilosea",
"Filosa-Imbricatea", "Filosa-Sarcomonadea", "Filosa-Thecofilosea", "Rhizaria-Cercozoa",
"Polycystinea-Collodaria", "Polycystinea-Nassellaria", "Polycystinea-Spumellarida",
"RAD-A", "RAD-B", "RAD-C", "Rhizaria-Acantharea", "Rhizaria-Radiolaria", "Rhizaria-Other",
"MAST", "Bacillariophyta", "Bolidophyceae", "Chrysophyceae", "Dictyochophyceae",
"MOCH", "Pelagophyceae", "Synurophyceae", "Caecitellaceae", "Cafeteriaceae",
"Stramenopiles-Other", "Unassigned-Eukaryote")
<- c("Colpodea", "Heterotrichea", "Karyorelictea", "Litostomatea",
tax2_LABEL_all "Nassophorea", "Oligohymenophorea", "Phyllopharyngea", "Plagiopylea", "Prostomatea",
"Spirotrichea-Choreotrichida", "Spirotrichea-Euplotia", "Spirotrichea-Hypotrichia",
"Spirotrichea-Other", "Spirotrichea-Strombidiida", "Spirotrichea-Tintinnida",
"Ciliates-Other", "Gonyaulacales", "Gymnodiniales", "Noctilucales", "Peridiniales",
"Prorocentrales", "Suessiales", "Torodiniales", "Apicomplexa", "Syndiniales Dino-Groups (I-V)",
"Alveolates-Other", "Amoebozoa", "Breviatea", "Lobosa", "Apusomonadidae", "Apusozoa",
"Hilomonadea", "Mantamonadidea", "Discoba", "Metamonada", "Hacrobia-Cryptophytes",
"Hacrobia-Haptophytes", "Centroheliozoa", "Katablepharidophyta", "Picozoa", "Telonemia",
"Hacrobia-Other", "Archaeplastid-Chlorophytes", "Archaeplastid-Streptophytes",
"Archaeplastid-Other", "Ascomycota", "Basidiomycota", "Chytridiomycota", "Microsporidiomycota",
"Opisthokonta-Fungi", "Annelida", "Arthropoda", "Cnidaria", "Craniata", "Ctenophora",
"Echinodermata", "Gastrotricha", "Mollusca", "Nematoda", "Nemertea", "Opisthokonta-Metazoa",
"Platyhelminthes", "Porifera", "Rotifera", "Urochordata", "Choanoflagellatea",
"Filasterea", "Ichthyosporea", "Opisthokonta-Other", "Endomyxa", "Filosa", "Filosa-Granofilosea",
"Filosa-Imbricatea", "Filosa-Sarcomonadea", "Filosa-Thecofilosea", "Rhizaria-Cercozoa",
"Polycystine-Collodaria", "Polycystine-Nassellaria", "Polycystine-Spumellarida",
"RAD-A", "RAD-B", "RAD-C", "Rhizaria-Acantharea", "Rhizaria-Radiolaria", "Rhizaria-Other",
"MAST", "Bacillariophyta", "Bolidophyceae", "Chrysophyceae", "Dictyochophyceae",
"MOCH", "Pelagophyceae", "Synurophyceae", "Caecitellaceae", "Cafeteriaceae",
"Stramenopiles-Other", "Unassigned-Eukaryote")
$TAXA2_ORDER <- factor(gr_relAbun_toheat$Taxa2, levels = rev(tax2_order_all),
gr_relAbun_toheatlabels = rev(tax2_LABEL_all))
<- c("Alveolata-Ciliates", "Alveolata", "Rhizaria", "Stramenopiles",
broad_taxa_order "Amoebozoa", "Apusozoa", "Excavata", "Hacrobia", "Archaeplastida", "Opisthokonta",
"Unassigned-Eukaryote")
<- c("Alveolates-Ciliates", "Alveolates", "Rhizaria", "Stramenopiles",
broad_taxa_order_label "Amoebozoa", "Apusozoa", "Excavates", "Hacrobia", "Archaeplastid", "Opisthokonts",
"Unassigned-Eukaryote")
$Broad_ORDER <- factor(gr_relAbun_toheat$Broad_Taxa, levels = broad_taxa_order,
gr_relAbun_toheatlabels = broad_taxa_order_label)
# Factor sample order
$SAMPLENAME <- factor(gr_relAbun_toheat$SAMPLE, levels = sample_order_all,
gr_relAbun_toheatlabels = sample_name_all)
# head(gr_relAbun_toheat)
Generate tile plot function with CLR heat map and bubble plot representing ASV richness.
# Make geom tile plot relative abundance
<- ggplot(gr_relAbun_toheat, aes(x = SAMPLENAME, fill = RelAbun_Taxa2, y = TAXA2_ORDER)) +
tile_tax geom_tile(color = "white") + theme(legend.position = "right", panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",
size = 8), axis.text.y = element_text(color = "black", size = 8), strip.background = element_blank(),
strip.text.y = element_text(hjust = 1, vjust = 0.5, angle = 0), strip.text.x = element_blank(),
legend.title = element_blank()) + labs(x = "", y = "") + facet_grid(Broad_ORDER ~
space = "free", scales = "free")
RES_COSMO, # ?geom_tile
# Richness plot
<- gr_relAbun_toheat %>% filter(RelAbun_Taxa2 > 0) %>% ggplot(aes(x = RES_COSMO,
richness_plot y = TAXA2_ORDER)) + geom_point(aes(size = RICHNESS_TAXA2)) + scale_size_continuous(range = c(0.1,
5)) + facet_grid(Broad_ORDER ~ RES_COSMO, space = "free", scales = "free") +
theme(legend.position = "right", panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(), axis.text.x = element_text(angle = 90,
size = 8), axis.text.y = element_text(size = 8), strip.text = element_blank(),
strip.background = element_blank()) + labs(x = "", y = "") + guides(size = guide_legend(title = "Total ASVs")) +
coord_cartesian(clip = "off")
# richness_plot
<- ggplot(gr_relAbun_toheat, aes(x = SAMPLENAME, fill = CLR, y = TAXA2_ORDER)) +
tile_tax_CLR geom_tile(color = "#bdbdbd") + theme(legend.position = "right", panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",
size = 8), axis.text.y = element_text(color = "black", size = 8), strip.background = element_blank(),
strip.text.y = element_text(hjust = 0, vjust = 0.5, angle = 0), strip.text.x = element_text(hjust = 0,
vjust = 0.5, angle = 0)) + labs(x = "", y = "") + facet_grid(Broad_ORDER ~
space = "free", scales = "free") RES_COSMO,
Figure 3 plot
<- c("#8e0152", "#c51b7d", "#de77ae", "#f1b6da", "#f7f7f7", "#b8e186",
green_pink "#7fbc41", "#4d9221", "#276419")
# svg('figs/Tileplot-CLR-21-04-2021.svg', h = 8, w = 16)
plot_grid(tile_tax_CLR %+% subset(gr_relAbun_toheat, RelAbun_Taxa2 > 0) + scale_fill_gradientn(colors = rev(green_pink),
na.value = NA), richness_plot, ncol = 2, axis = c("tb"), align = c("hv"))
# dev.off()
Extra tile plot for presentations with the insitu samples only.
# head(gr_relAbun_toheat) svg('insitu-tileplot.svg', h=8, w = 13) plot_grid(
# tile_tax_CLR %+% subset(gr_relAbun_toheat, (RelAbun_Taxa2 > 0 & grepl('in
# situ', SAMPLENAME))) + scale_fill_gradientn(colors = rev(green_pink), na.value
# = NA), richness_plot, ncol = 2, axis = c('tb'), align = c('hv')) dev.off()
Import 16S tag-sequence results (ASV table) and metadata. Modify sample names and conduct taxonomic curation.
# Import metadata for 16S
<- read.delim("data-input/ventnames-gordaridge-16S.txt")
ventnames_16 # View(ventnames_16)
<- ventnames_16 %>% mutate(location = case_when(grepl("NA108001",
ventnames_16_mod ~ "NearVent", grepl("NA108036", SAMPLEID_16S) ~ "Plume", grepl("NA108096",
SAMPLEID_16S) ~ "Plume", grepl("BSW", SAMPLE_AMY) ~ "BSW", grepl("Vent", LocationName) ~
SAMPLEID_16S) "Vent"), NA_NUM = SAMPLEID_16S) %>% mutate(NA_NUM = str_replace(NA_NUM, "NA108",
"")) %>% mutate(NA_NUM = str_replace(NA_NUM, "NA080", "")) %>% mutate(NA_NUM = str_replace(NA_NUM,
"aSTEP", "")) %>% mutate(NA_NUM = str_replace(NA_NUM, "bSTEP", "")) %>% mutate(NA_NUM = str_replace(NA_NUM,
"STEP20200226", "")) %>% mutate(NA_NUM = str_replace(NA_NUM, "STEP", "")) %>%
unite(NEW_SAMPLEID, location, NA_NUM, sep = "") %>% mutate(LocationName = case_when(grepl("NearVent",
~ "Near vent BW", NEW_SAMPLEID == "Plume036" ~ "Candelabra Plume",
NEW_SAMPLEID) == "Plume096" ~ "Mt Edwards Plume", grepl("SirVentsAlot", LocationName) ~
NEW_SAMPLEID "Sir Ventsalot", TRUE ~ as.character(LocationName))) %>% data.frame
# View(ventnames_16_mod)
<- read.delim("data-input/CountTable-wtax-16s-plus3-2020-06-23.txt")
countbac
# Remove samples that were repeated
<- c("NA108003STEP", "NA108039STEP", "NA108087STEP") rm
Report stats on 16S tag-sequence data.
<- countbac %>% select(-all_of(rm)) %>% pivot_longer(starts_with("NA"), names_to = "SAMPLEID_16S") %>%
tmp left_join(ventnames_16_mod) %>% data.frame
## Joining, by = "SAMPLEID_16S"
sum(tmp$value) # total sequences
## [1] 1190997
length(unique(tmp$Feature.ID)) # Total ASVs
## [1] 6532
<- countbac %>%
bac_df_plot separate(Taxon, sep = ";D_[[:digit:]]__", into = c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"), remove = TRUE, extra = "merge") %>% # Warnings are OK with NAs
mutate_if(is.character, str_replace_all, pattern = "D_0__", replacement = "") %>%
filter(Domain %in% "Archaea" | Domain %in% "Bacteria") %>% #Select only archaea and bacteria, removing unassigned
select(-all_of(rm)) %>% # Remove samples we are replacing
pivot_longer(starts_with("NA"), names_to = "SAMPLEID_16S") %>%
left_join(ventnames_16_mod) %>%
data.frame# head(bac_df_plot)
Sequence and ASV stats for 16S amplicon dataset
# head(bac_df_plot)
sum(bac_df_plot$value) # total sequences
## [1] 1190058
length(unique(bac_df_plot$Feature.ID)) # total ASVs
## [1] 6497
Subset 16S ASVs by abundance.
<- bac_df_plot %>% ungroup() %>% mutate(TOTALSEQ = sum(value)) %>%
bac_df_filtered group_by(Feature.ID) %>% summarise(SUM = sum(value), RELABUN = SUM/TOTALSEQ) %>%
filter(RELABUN >= 0.001) %>% add_column(KEEP = "YES") %>% right_join(bac_df_plot) %>%
filter(KEEP == "YES") %>% data.frame
## `summarise()` regrouping output by 'Feature.ID' (override with `.groups` argument)
## Joining, by = "Feature.ID"
Here we are curating the 16S taxonomy assignments so we can get an informative look at the in situ bacteria population diversity and biogeography. Places ASVs below a user designated THRESHOLD into the “Other” category - ASVs that make up < X% of the total data set. For this work, a threshold of 0.1% was used. The other curation of taxonomic assignment was to highlight those groups known to inhabit the region or other chemosynthetic habitats.
Function to modify 16S taxonomy
# Add a column for updated taxonomy name
<- function(df, THRESHOLD){
curate_16s_tax # List the class and genus level designations that should be named at class level
<- c("Alphaproteobacteria", "Deltaproteobacteria", "Gammaproteobacteria", "Nitrososphaeria", "Thermoplasmata")
class <- c("Arcobacter","Campylobacter","Hydrogenimonas","Nitratiruptor","Nitratifractor","Sulfurovum","Sulfurimonas","Caminibacter", "Psychrilyobacter", "SUP05 cluster")
genus # List the appropriate taxonomic names for this whole level to be placed into "Other" category
<- c("Verrucomicrobiae")
class_other <- c("Planctomycetes", "Poribacteria", "Cyanobacteria", "WPS-2")
phylum_other <- c("Synechococcales")
order_other <- sum(df$value) # total number of sequences
totalsumseq <- df %>%
tmp_filter group_by(Feature.ID) %>%
summarise(SUM = sum(value)) %>%
mutate(RELABUN = 100*(SUM/totalsumseq)) %>%
filter(RELABUN >= THRESHOLD) %>% #User-defined relabun threshold
data.frame<- as.character(unique(tmp_filter$Feature.ID))
keep_asvs_relabun <- df %>%
df2 mutate(Tax_update = Phylum) %>% # Default to filling new taxa level to phylum
mutate(Tax_update = ifelse(Feature.ID %in% keep_asvs_relabun, Tax_update, "Other"), # Change name to other if it falls below relative abundance Threshold
Tax_update = ifelse(Class %in% class, paste(Phylum, Class, sep = "-"), Tax_update),
Tax_update = ifelse(Order == "Methylococcales", paste(Phylum, "Methylococcales", sep = "-"), Tax_update),
Tax_update = ifelse(Order == "Oceanospirillales", paste(Phylum, "Oceanospirillales", sep = "-"), Tax_update),
Tax_update = ifelse(Order == "Thioglobaceae", paste(Phylum, "Thioglobaceae", sep = "-"), Tax_update),
Tax_update = ifelse(Family == "Nitrospinaceae", paste(Phylum, "Nitrospinaceae", sep = "-"), Tax_update),
Tax_update = ifelse(Class %in% class_other, "Other", Tax_update),
Tax_update = ifelse(Phylum %in% phylum_other, "Other", Tax_update),
Tax_update = ifelse(Order %in% order_other, "Other", Tax_update),
Tax_update = ifelse(Genus %in% genus, paste(Phylum, Genus, sep = "-"), Tax_update))
return(df2)
}
Removal of known Kitome contamination.
# head(bac_df_plot) # Add updated taxa list to this dataframe
# unique(bac_df_plot$LocationName)
<- curate_16s_tax(bac_df_plot %>% filter(!(Genus == "Ralstonia")),
bac_wcuratedtax 0.1) #Will place ASVs <0.1% abundance into 'Other category'
# unique(bac_wcuratedtax$Tax_update) length(unique(bac_wcuratedtax$Tax_update))
<- bac_wcuratedtax %>% select(Tax_update, Domain:Species) %>% distinct()
tax_16s # write_delim(tax_16s, path = 'tax-key-16s-21-08-2020.txt', delim = '\t')
# Average sequence count for ASVs across replicates (by location name)
<- bac_wcuratedtax %>% # Average ASV seq count across replicates
bac_gr_avg group_by(Feature.ID, LocationName, Tax_update) %>% summarise(AVG_count = mean(value)) %>%
data.frame
## `summarise()` regrouping output by 'Feature.ID', 'LocationName' (override with `.groups` argument)
# write_delim(bac_gr_avg, path = 'data-input/16s-gr-data-curated-avg.txt', delim
# = '\t')
Average across replicates and sum to taxa
# update exisiting taxonomy
<- bac_wcuratedtax %>% # Average ASV seq count across replicates
bac_wcuratedtax_toplot group_by(Feature.ID, LocationName, Tax_update) %>% summarise(AVG_count = mean(value)) %>%
ungroup() %>% group_by(LocationName, Tax_update) %>% summarise(SUM_bytax = sum(AVG_count)) %>%
data.frame
# unique(bac_wcuratedtax_toplot$LocationName)
$LOCATION <- factor(bac_wcuratedtax_toplot$LocationName, levels = c("Shallow seawater",
bac_wcuratedtax_toplot"Deep seawater", "Near vent BW", "Mt Edwards Plume", "Mt Edwards Vent", "Venti Latte Vent",
"Sir Ventsalot", "Candelabra Plume", "Candelabra Vent", "Purple Rain Vent", "Octopus Springs Vent",
"Blue Ciliate"))
Generate 16S bar plots
<- c("#7fcdbb", "#014636", "#41b6c4", "#016c59", "#d0d1e6", "#02818a",
tax_color "#a6bddb", "#3690c0", "#00441b", "#99d8c9", "#006d2c", "#66c2a4", "#238b45",
"#c7e9b4", "#fdd0a2", "#fd8d3c", "#d94801", "#7f2704", "#dadaeb", "#54278f",
"#bcbddc", "#6a51a3", "#9e9ac8", "#8c6bb1", "#737373")
<- c("Epsilonbacteraeota-Arcobacter", "Epsilonbacteraeota-Caminibacter",
tax_order "Epsilonbacteraeota-Campylobacter", "Epsilonbacteraeota-Hydrogenimonas", "Epsilonbacteraeota-Nitratifractor",
"Epsilonbacteraeota-Nitratiruptor", "Epsilonbacteraeota-Sulfurimonas", "Epsilonbacteraeota-Sulfurovum",
"Proteobacteria-Alphaproteobacteria", "Proteobacteria-Deltaproteobacteria", "Proteobacteria-Gammaproteobacteria",
"Proteobacteria-Methylococcales", "Proteobacteria-Oceanospirillales", "Proteobacteria-SUP05 cluster",
"Acidobacteria", "Actinobacteria", "Aquificae", "Bacteroidetes", "Chloroflexi",
"Thaumarchaeota-Nitrososphaeria", "Euryarchaeota-Thermoplasmata", "Fusobacteria-Psychrilyobacter",
"Marinimicrobia (SAR406 clade)", "Nitrospinae-Nitrospinaceae", "Other")
names(tax_color) <- tax_order
$TAX_ORDER <- factor(bac_wcuratedtax_toplot$Tax_update, levels = tax_order)
bac_wcuratedtax_toplot
<- function(df) {
barplot_16s ggplot(df, aes(x = LOCATION, y = SUM_bytax, fill = TAX_ORDER)) + geom_bar(stat = "identity",
position = "fill", color = "black") + scale_fill_manual(values = tax_color) +
scale_y_continuous(expand = c(0, 0)) + theme(legend.position = "right", panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, color = "black",
face = "bold", size = 12), axis.text.y = element_text(color = "black",
face = "bold", size = 12), strip.text = element_blank(), legend.title = element_blank()) +
labs(x = "", y = "Relative abundance") + facet_grid(. ~ LOCATION, space = "free",
scales = "free") + guides(fill = guide_legend(ncol = 1))
}
Remove unused samples and make supplementary bar plot.
<- c("Purple Rain Vent", "Octopus Springs Vent", "Blue Ciliate")
rm_loc barplot_16s(bac_wcuratedtax_toplot %>% filter(!(LOCATION %in% rm_loc)))
16S barplot with all samples.
# barplot_16s(bac_wcuratedtax_toplot)
Format ahead of data transformation of 16S analysis.
# head(bac_wcuratedtax)
<- c("Purple Rain Vent", "Octopus Springs Vent", "Blue Ciliate")
rm_loc
<- bac_wcuratedtax %>% type.convert(as.is = TRUE) %>% filter(!LocationName %in%
bac_df_num %>% unite(SAMPLENAME, LocationName, SAMPLEID, NEW_SAMPLEID, SAMPLEID_16S,
rm_loc) sep = "-") %>% select(Feature.ID, SAMPLENAME, value) %>% pivot_wider(names_from = SAMPLENAME,
values_from = value, values_fill = 0) %>% column_to_rownames(var = "Feature.ID") %>%
as.matrix
CLR transformation for 16S amplicons.
library(compositions)
<- data.frame(clr(t(bac_df_num)))
df_log_clr
# Ordination - PCA
<- prcomp(df_log_clr)
pca_clr
# Check variance
<- (pca_clr$sdev^2)/sum(pca_clr$sdev^2)
check_variance
# Screeplot, how many axes are appropriate?
barplot(check_variance, main = "Log-Ratio PCA Screeplot", xlab = "PC Axis", ylab = "% Variance",
cex.names = 1.5, cex.axis = 1.5, cex.lab = 1.5, cex.main = 1.5)
Extract samples and include metadata
# Convert to dataframe and parse metadata
<- data.frame(pca_clr$x, SAMPLENAME = rownames(pca_clr$x))
df_pca_clr
<- df_pca_clr %>% separate(SAMPLENAME, c("LocationName", "Sampletype",
df_pca_clr_wnames "SAMPLEID", "Fastq"), sep = "-") %>% data.frame
# head(df_pca_clr_wnames)
unique(df_pca_clr_wnames$LocationName)
## [1] "Near vent BW" "Deep seawater" "Mt Edwards Vent" "Candelabra Plume"
## [5] "Venti Latte Vent" "Shallow seawater" "Candelabra Vent" "Mt Edwards Plume"
## [9] "Sir Ventsalot"
Factor for plotting
<- c("Candelabra Vent", "Mt Edwards Vent", "Sir Ventsalot",
sample_order_all_16s "Venti Latte Vent", "Deep seawater", "Shallow seawater", "Near vent BW", "Candelabra Plume",
"Mt Edwards Plume")
<- c("#dfa837", "#61ac86", "#ce536b", "#711518", "#413f44", "#bfbbb0", "#6f88af",
all "#dfa837", "#61ac86")
names(sample_color_all) <- sample_order_all_16s
<- c(21, 21, 21, 21, 22, 22, 23, 24, 24)
shapes $SAMPLE_ORDER <- factor(df_pca_clr_wnames$LocationName, levels = sample_order_all_16s) df_pca_clr_wnames
<- ggplot(df_pca_clr_wnames, aes(x = PC1, y = PC2, fill = SAMPLE_ORDER, shape = SAMPLE_ORDER,
pca_16s color = SAMPLE_ORDER)) + geom_point(aes(x = PC1, y = PC2, fill = SAMPLE_ORDER,
shape = SAMPLE_ORDER, color = SAMPLE_ORDER), size = 4) + scale_fill_manual(values = sample_color_all) +
scale_color_manual(values = sample_color_all) + scale_shape_manual(values = shapes) +
ylab(paste0("PC2 ", round(check_variance[2] * 100, 2), "%")) + xlab(paste0("PC1 ",
round(check_variance[1] * 100, 2), "%")) + ggtitle("16S - CLR PCA Ordination") +
theme_bw() + theme(axis.text = element_text(color = "black", size = 12), legend.title = element_blank()) +
geom_hline(yintercept = 0) + geom_vline(xintercept = 0)
Supplementary plot with 16S information.
# svg('16s-panel-supplementary.svg', w=16, h=8)
plot_grid(barplot_16s(bac_wcuratedtax_toplot %>% filter(!(LOCATION %in% rm_loc))),
axis = c("tblr"), align = c("hv"), labels = c("a", "b")) pca_16s,
# dev.off()
Use these ASVs downstream to explore hypotheses with correlation results. Below set up 16S and 18S rRNA gene output data as phyloseq objects to import into SPIEC-EASI. Following SPIEC-EASI analysis, export as dataframe, add metadata, and process.
Format input 18S and 16S data, save for correlation analysis. Only in situ samples that were found in both the 18S rRNA and 16S rRNA gene amplicon results were considered for the network analysis. Both datasets were subsampled to include ASVs that appeared in at least 3 samples, had at least 50 sequences each, and made up at least 0.001% of the sequenced reads.
# Import curated 16S and 18S datasets
load("data-input/GR-countinfo-withASVdistribution.RData", verbose = T)
## Loading objects:
## gr_stats_wtax_toplot
## gr_stats_wtax
## gr_dist_grazing
## gr_dist
<- read.delim("data-input/16s-gr-data-curated-avg.txt") bac_wtax
Subset in situ samples.
# unique(gr_stats_wtax$LocationName) unique(bac_wtax$LocationName)
<- c("Purple Rain Vent", "Octopus Springs Vent", "Blue Ciliate")
rm_loc <- bac_wtax %>% filter(!(LocationName %in% rm_loc)) %>% mutate(LocationName = case_when(LocationName ==
bac_wtax_mod "Sir Ventsalot" ~ "SirVentsAlot Vent", TRUE ~ as.character(LocationName)))
# Sort and filter eukaryote ASVs to consider:
<- sum(gr_stats_wtax$COUNT_AVG)
sumseq <- c("Opisthokonta-Fungi", "Opisthokonta-Other", "Opisthokonta-Metazoa")
metaz # head(gr_stats_wtax)
<- gr_stats_wtax %>%
euk_data_interact type.convert(as.is = TRUE) %>%
filter(Sampletype == "in situ") %>% #select only in situ samples
filter(!Taxa %in% metaz) %>%
filter(!Taxa == "Unassigned-Eukaryote") %>%
select(Feature.ID, Taxon_updated, COUNT_AVG, LocationName) %>%
group_by(Feature.ID, Taxon_updated, LocationName) %>%
summarise(COUNT_TOTAL = sum(COUNT_AVG)) %>%
ungroup() %>%
# Calculate relative abundance
mutate(RelAbun = 100*(COUNT_TOTAL/sumseq)) %>%
# Remove ASVs ahead of network analysis
group_by(Feature.ID) %>%
filter(RelAbun > 0.001) %>%
mutate(sample_appear = n_distinct(LocationName)) %>% #Calculate how many times an ASV appears
filter(sample_appear > 3) %>% #ASV must appear in at least 3 samples
filter(COUNT_TOTAL >= 50) %>% #ASV must have at least 10 sequences
add_column(domain = "euk") %>%
unite(FEATURE, domain, Feature.ID, sep = "_", remove = TRUE) %>%
select(FEATURE, LocationName, Taxon_EUK = Taxon_updated, COUNT = COUNT_TOTAL) %>%
data.frame
## `summarise()` regrouping output by 'Feature.ID', 'Taxon_updated' (override with `.groups` argument)
length(unique(gr_stats_wtax$Feature.ID)); length(unique(euk_data_interact$FEATURE))
## [1] 9028
## [1] 328
# View(euk_data_interact)
# head(euk_data_interact)
<- sum(bac_wtax_mod$AVG_count)
sumseq <- unique(gr_stats_wtax$LocationName)
locations_gr
<- bac_wtax_mod %>%
bac_data_interact filter(LocationName %in% locations_gr) %>%
filter(!(Tax_update == "Other")) %>% #Remove "other"
group_by(Feature.ID, Tax_update, LocationName) %>%
summarise(COUNT_TOTAL = sum(AVG_count)) %>%
ungroup() %>%
add_column(domain = "prok") %>%
# Calculate relative abundance
mutate(RelAbun = 100*(COUNT_TOTAL/sumseq)) %>%
# Remove ASVs ahead of network analysis
group_by(Feature.ID) %>%
filter(RelAbun > 0.001) %>%
mutate(sample_appear = n_distinct(LocationName)) %>% #Calculate how many times an ASV appears
filter(sample_appear > 3) %>% #ASV must appear in at least 3 samples
filter(COUNT_TOTAL >= 50) %>% #ASV must have at least 10 sequences
unite(FEATURE, domain, Feature.ID, sep = "_", remove = TRUE) %>%
select(FEATURE, LocationName, Taxon_BAC = Tax_update, COUNT = COUNT_TOTAL) %>%
data.frame
## `summarise()` regrouping output by 'Feature.ID', 'Tax_update' (override with `.groups` argument)
length(unique(bac_wtax$Feature.ID)); length(unique(bac_data_interact$FEATURE))
## [1] 3650
## [1] 152
Save subset 16S and 18S data.
# save(euk_data_interact, bac_data_interact, file =
# 'data-input/Filtered-correlation-R-objects-10-11-2020.RData')
# load('data-input/Filtered-correlation-R-objects-10-11-2020.RData')
Phyloseq import for 18S data.
<- euk_data_interact %>% pivot_wider(names_from = LocationName, values_from = COUNT,
euk_df values_fill = 0) %>% select(order(colnames(.))) %>% data.frame
# head(euk_df)
<- as.matrix(select(euk_df, -Taxon_EUK) %>% column_to_rownames(var = "FEATURE"))
euk_asv <- as.matrix(select(euk_df, FEATURE, Taxon_EUK) %>% column_to_rownames(var = "FEATURE"))
euk_tax # head(bac_asv); head(bac_tax)
row.names(euk_asv) <- row.names(euk_tax)
# Phyloseq import
<- otu_table(euk_asv, taxa_are_rows = TRUE)
euk_asv_table <- tax_table(euk_tax)
euk_tax_table <- phyloseq(euk_asv_table, euk_tax_table)
euk_phy euk_phy
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 328 taxa and 9 samples ]
## tax_table() Taxonomy Table: [ 328 taxa by 1 taxonomic ranks ]
Phyloseq import for 16S data.
<- bac_data_interact %>% pivot_wider(names_from = LocationName, values_from = COUNT,
bac_df values_fill = 0) %>% select(order(colnames(.))) %>% data.frame
<- as.matrix(select(bac_df, -Taxon_BAC) %>% column_to_rownames(var = "FEATURE"))
bac_asv <- as.matrix(select(bac_df, FEATURE, Taxon_BAC) %>% column_to_rownames(var = "FEATURE"))
bac_tax # head(bac_asv); head(bac_tax)
row.names(bac_asv) <- row.names(bac_tax)
# Phyloseq import
<- otu_table(bac_asv, taxa_are_rows = TRUE)
bac_asv_table <- tax_table(bac_tax)
bac_tax_table <- phyloseq(bac_asv_table, bac_tax_table)
bac_phy bac_phy
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 152 taxa and 9 samples ]
## tax_table() Taxonomy Table: [ 152 taxa by 1 taxonomic ranks ]
Save 16S and 18S phyloseq R objects to run Spiec-easi.
# save phyloseq objects to run SpiecEasi in another script
save(bac_phy, euk_phy, file = "data-input/phyloseq-18s-16s-12-11-2020.RData")
# save(bac_phy_tmp, euk_phy_tmp, file = 'phyloseq-18s-16s-TMP.RData') ?spiec.easi
18S rRNA and 16S rRNA gene datasets were each center-log ratio transformed then SPIEC-EASI was run using the Meinshausen-Buhlmann’s neighborhood selection estimation method.
SPIEC EASI was run on the HPC, due to computational load. Commented out commands below detail the run process, where both 16S and 18S data were used. Save output and bring locally below.
# library(SpiecEasi) ?spiec.easi Cross Domain approach se_GR <-
# spiec.easi(list(bac_phy, euk_phy), method = 'mb', nlambda = 40,
# lambda.min.ratio = 1e-2, pulsar.params = list(thresh = 0.05))
## Check output getStability(se_GR) sum(getRefit(se_GR))/2
## Extract weighted matrix se.beta <- as.matrix(symBeta(getOptBeta(se_GR)))
## df_beta <- as.data.frame(se.beta)
## Extract adajency matrix adj.mat <- getRefit(se_GR) df_adj <-
## as.data.frame(as.matrix(adj.mat))
## Assign names from original dataframes colnames(df_beta) <-
## colnames(se_GR$est$data) colnames(df_adj) <- colnames(se_GR$est$data)
## row.names(df_adj) <- colnames(se_GR$est$data) row.names(df_beta) <-
## colnames(se_GR$est$data)
## Save output save(df_adj, df_beta, se_GR, file =
## 'gr-spieceasi-output-20-08-2020.RData')
Transform into dataframes to look at relationship of pairs. Available on GitHub repo. Significant interactions to infer putative predator-prey relationships were determined by subsetting only interactions between 18S rRNA and 16S rRNA-derived ASVs.
# From SPEIC-EASI output run on HPC, import results:
load("data-input/gr-spieceasi-dfs-12-11-2020.RData", verbose = T)
## Loading objects:
## df_adj
## df_beta_weighted
load("data-input/gr-spieceasi-objs-12-11-2020.RData", verbose = T)
## Loading objects:
## se_GR
## adj.mat
## se.beta
Import taxonomy keys from previously formatted data tables.
<- read.delim("data-input/CountTable-wtax-16s-plus3-2020-06-23.txt")
countbac # colnames(countbac)
<- bac_data_interact %>% select(FEATURE, TAX_SHORT = Taxon_BAC) %>%
bac_data_interact_fulltax separate(FEATURE, c("domain", "Feature.ID"), sep = "_", remove = FALSE) %>% left_join(select(countbac,
TAX_FULL = Taxon)) %>% select(FEATURE, TAX_FULL, TAX_SHORT) %>% data.frame Feature.ID,
## Joining, by = "Feature.ID"
# Make taxonomy key
<- euk_data_interact %>% select(FEATURE, TAX_FULL = Taxon_EUK) %>% separate(FEATURE,
tax_key_se c("domain", "Feature.ID"), sep = "_", remove = FALSE) %>% left_join(select(gr_tax_res,
TAX_SHORT = Taxa, EUK_2 = Taxa2, EUK_DIST = DIST_simple)) %>% select(FEATURE,
Feature.ID, %>% bind_rows(bac_data_interact_fulltax) %>%
TAX_FULL, TAX_SHORT, EUK_2, EUK_DIST) distinct() %>% data.frame
## Joining, by = "Feature.ID"
# View(tax_key_se)
Format SPIEC EASI output. Select Prok-Euk and Euk-Prok interactiosn only.
<- function(df_in) {
reformat_spieceasi <- c("PROK-EUK", "EUK-PROK")
interaction %>% rownames_to_column(var = "SIDEA") %>% pivot_longer(cols = starts_with(c("prok",
df_in "euk")), names_to = "SIDEB") %>% mutate(domain_a = case_when(grepl("prok",
~ "PROK", grepl("euk", SIDEA) ~ "EUK"), domain_b = case_when(grepl("prok",
SIDEA) ~ "PROK", grepl("euk", SIDEB) ~ "EUK")) %>% mutate(COMBO = paste(domain_a,
SIDEB) sep = "-")) %>% mutate(COMBO_TYPE = case_when(COMBO %in% interaction ~
domain_b, "cross", TRUE ~ "same"), SIG_ID = paste(SIDEA, SIDEB, sep = "-")) %>% select(-starts_with("domain")) %>%
left_join(select(tax_key_se, TAX_SIDEA = TAX_FULL, everything()), by = c(SIDEA = "FEATURE")) %>%
left_join(select(tax_key_se, TAX_SIDEB = TAX_FULL, everything()), by = c(SIDEB = "FEATURE"),
suffix = c(".A", ".B")) %>% data.frame
}
<- reformat_spieceasi(df_adj)
df_adj_long <- reformat_spieceasi(df_beta_weighted) df_beta_long
Evaluate the range of weighted outputs from SpiecEasi. Determine if a threshold can be set.
# Get list of these parameters Adjacency matrix - binary, where 1 = significant
# interaction Boot strapped pvalue, showing weight of each correlation
<- df_adj_long %>% filter(value == 1) %>% filter(COMBO_TYPE == "cross") %>%
adj_sig select(everything(), Adjacency = value) %>% left_join(select(df_beta_long, SIG_ID,
Weight = value)) %>% data.frame
## Joining, by = "SIG_ID"
# colnames(adj_sig)
dim(adj_sig)
## [1] 1074 15
# head(df_adj_long) table(df_adj_long$value)
dim(adj_sig) # 1074 significant interactions that are cross-domain
## [1] 1074 15
head(adj_sig)
## SIDEA SIDEB
## 1 prok_01dd6ee6ebb76ef5250378057597a969 euk_704617bd30c6df21f779ff5300baf810
## 2 prok_0606870e7caf9d39f42f23dff84c6190 euk_908baaf2bec72eafc520025ef78d0b01
## 3 prok_08932eb86e915caa9c4034ae623d0f45 euk_607390a6a39c3a2bdd7ef41282083418
## 4 prok_08932eb86e915caa9c4034ae623d0f45 euk_75b879fa0e65e7dab54ceb63b5ce5ad3
## 5 prok_08932eb86e915caa9c4034ae623d0f45 euk_a3a866756aa4943b2f4dfbf95badcab0
## 6 prok_08932eb86e915caa9c4034ae623d0f45 euk_ac8ef156389ffd84799bf78d382a0595
## Adjacency COMBO COMBO_TYPE
## 1 1 PROK-EUK cross
## 2 1 PROK-EUK cross
## 3 1 PROK-EUK cross
## 4 1 PROK-EUK cross
## 5 1 PROK-EUK cross
## 6 1 PROK-EUK cross
## SIG_ID
## 1 prok_01dd6ee6ebb76ef5250378057597a969-euk_704617bd30c6df21f779ff5300baf810
## 2 prok_0606870e7caf9d39f42f23dff84c6190-euk_908baaf2bec72eafc520025ef78d0b01
## 3 prok_08932eb86e915caa9c4034ae623d0f45-euk_607390a6a39c3a2bdd7ef41282083418
## 4 prok_08932eb86e915caa9c4034ae623d0f45-euk_75b879fa0e65e7dab54ceb63b5ce5ad3
## 5 prok_08932eb86e915caa9c4034ae623d0f45-euk_a3a866756aa4943b2f4dfbf95badcab0
## 6 prok_08932eb86e915caa9c4034ae623d0f45-euk_ac8ef156389ffd84799bf78d382a0595
## TAX_SIDEA
## 1 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Arcobacteraceae;D_5__Arcobacter
## 2 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Oceanospirillales;D_4__Halomonadaceae;D_5__Halomonas
## 3 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Thiovulaceae;D_5__Sulfurimonas
## 4 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Thiovulaceae;D_5__Sulfurimonas
## 5 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Thiovulaceae;D_5__Sulfurimonas
## 6 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Thiovulaceae;D_5__Sulfurimonas
## TAX_SHORT.A EUK_2.A EUK_DIST.A
## 1 Epsilonbacteraeota-Arcobacter <NA> <NA>
## 2 Proteobacteria-Oceanospirillales <NA> <NA>
## 3 Epsilonbacteraeota-Sulfurimonas <NA> <NA>
## 4 Epsilonbacteraeota-Sulfurimonas <NA> <NA>
## 5 Epsilonbacteraeota-Sulfurimonas <NA> <NA>
## 6 Epsilonbacteraeota-Sulfurimonas <NA> <NA>
## TAX_SIDEB
## 1 Eukaryota;Hacrobia;Telonemia;Telonemia_X;Telonemia_XX;Telonemia-Group-2;Telonemia-Group-2_X;Telonemia-Group-2_X_sp.
## 2 Eukaryota;Alveolata;Ciliophora;Spirotrichea;Strombidiida_D;Strombidiida_D_X;Strombidiida_D_XX;Strombidiida_D_XX_sp.
## 3 Eukaryota;Alveolata;Ciliophora;Oligohymenophorea;Scuticociliatia_1;Philasterida
## 4 Eukaryota;Hacrobia;Haptophyta;Prymnesiophyceae;Prymnesiales;Chrysochromulinaceae;Chrysochromulina;Chrysochromulina_sp.
## 5 Eukaryota;Alveolata;Ciliophora;Litostomatea;Haptoria_6;Lacrymariidae
## 6 Eukaryota;Stramenopiles;Opalozoa;MAST-3;MAST-3I;MAST-3I_X;MAST-3I_XX;MAST-3I_XX_sp.
## TAX_SHORT.B EUK_2.B EUK_DIST.B Weight
## 1 Hacrobia-Other Telonemia Cosmopolitan 1.147447e-06
## 2 Alveolata-Ciliates Spirotrichea-Strombidiida Cosmopolitan -7.159271e-03
## 3 Alveolata-Ciliates Oligohymenophorea Resident 1.146349e-02
## 4 Hacrobia-Haptophyta Hacrobia-Haptophyta Cosmopolitan -2.044291e-01
## 5 Alveolata-Ciliates Litostomatea Resident 2.509552e-01
## 6 Stramenopiles-MAST MAST Resident 1.911483e-02
# Isolate the unique interactions and make a table for export
<- adj_sig %>% filter(COMBO == "EUK-PROK") %>% separate(SIDEA, c("sideA",
complete_list "ASV_18S"), sep = "_") %>% separate(SIDEB, c("sideB", "ASV_16S"), sep = "_") %>%
select(-COMBO, -COMBO_TYPE, -SIG_ID, TAX_18S = TAX_SIDEA, TAX_16S = TAX_SIDEB) %>%
data.frame
Option to save supplementary tables to analyze data outside of R.
# write_delim(complete_list, path = 'Complete-cross-domain-interactions.txt',
# delim = '\t') Write to visualize in cytoscape write.csv(complete_list,
# 'cross-domain-gr.csv')
Of the interactions between 18S- and 16S-derived data, we are interested in capturing the putative predator prey relationships
<- adj_sig %>% filter(COMBO == "EUK-PROK") %>% separate(SIDEA, c("domain",
tax_sum_interact "ASV_18S"), sep = "_") %>% separate(SIDEB, c("domain2", "ASV_16S"), sep = "_") %>%
select(-starts_with("domain"), -COMBO, -COMBO_TYPE, -SIG_ID, -Adjacency) %>%
unite(INTERACTION, TAX_SHORT.A, TAX_SHORT.B, sep = "_", remove = FALSE) %>% add_column(COUNT = 1) %>%
data.frame# View(tax_sum_interact)
length(unique(tax_sum_interact$INTERACTION)) #Total significant interactions between euk and bac
## [1] 141
# table(tax_sum_interact$INTERACTION)
# How many 18S ASVs are involved? what taxonomic groups do the interactions
# belong to? head(tax_sum_interact)
unique(tax_sum_interact$TAX_SHORT.A)
## [1] "Rhizaria-Radiolaria" "Hacrobia-Haptophyta"
## [3] "Alveolata-Syndiniales" "Stramenopiles-Other"
## [5] "Alveolata-Ciliates" "Alveolata-Dinoflagellates"
## [7] "Stramenopiles-MAST" "Hacrobia-Other"
## [9] "Stramenopiles-Ochrophyta" "Archaeplastida-Chlorophyta"
## [11] "Rhizaria-Cercozoa" "Hacrobia-Cryptophyta"
# Table of significant interactions
<- tax_sum_interact %>% select(ASV_18S, ASV_16S, TAX_SHORT.A,
summary_sig_interactions %>% # distinct() %>%
COUNT) group_by(TAX_SHORT.A) %>% summarise(UNIQUE_18S_ASVs = n_distinct(ASV_18S), TOTAL_18S_ASVs = sum(COUNT)) %>%
data.frame
## `summarise()` ungrouping output (override with `.groups` argument)
# View(summary_sig_interactions) # Included in Table 2
# Classify interactions to taxa level 2
<- tax_sum_interact %>% select(ASV_18S, ASV_16S, TAX_SHORT.A,
summary_sig_interactions_2 %>% # distinct() %>%
EUK_2.A, COUNT) group_by(TAX_SHORT.A, EUK_2.A) %>% summarise(UNIQUE_18S_ASVs = n_distinct(ASV_18S),
TOTAL_18S_ASVs = sum(COUNT)) %>% data.frame
## `summarise()` regrouping output by 'TAX_SHORT.A' (override with `.groups` argument)
# View(summary_sig_interactions_2)
head(tax_sum_interact)
## ASV_18S ASV_16S
## 1 01d1a4a17e3a26ee76b34b62cb0cbef8 29b36587344bb929651696c2a41e56cc
## 2 01d1a4a17e3a26ee76b34b62cb0cbef8 9023b3075fc598bad518430ee25519bc
## 3 020295103ca8304135054e04d9110899 2806f0957cc10412ad6a887f25abc970
## 4 020295103ca8304135054e04d9110899 66c28633afa706a1e8785165a4ce933e
## 5 02c7b94c00a919db1d1ef6d9d1ce810c 6e8d876077c3eae3a1f703ac2357d76c
## 6 02c7b94c00a919db1d1ef6d9d1ce810c 929cbf36f791dd363157d90871061cee
## TAX_SIDEA
## 1 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 2 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 3 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 4 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 5 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 6 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## INTERACTION TAX_SHORT.A
## 1 Rhizaria-Radiolaria_Bacteroidetes Rhizaria-Radiolaria
## 2 Rhizaria-Radiolaria_Epsilonbacteraeota-Sulfurimonas Rhizaria-Radiolaria
## 3 Rhizaria-Radiolaria_Proteobacteria-Alphaproteobacteria Rhizaria-Radiolaria
## 4 Rhizaria-Radiolaria_Proteobacteria-SUP05 cluster Rhizaria-Radiolaria
## 5 Rhizaria-Radiolaria_Epsilonbacteraeota-Sulfurovum Rhizaria-Radiolaria
## 6 Rhizaria-Radiolaria_Proteobacteria-Gammaproteobacteria Rhizaria-Radiolaria
## EUK_2.A EUK_DIST.A
## 1 RAD-B Resident
## 2 RAD-B Resident
## 3 RAD-B Resident
## 4 RAD-B Resident
## 5 RAD-B Resident
## 6 RAD-B Resident
## TAX_SIDEB
## 1 D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Flavobacteriales;D_4__Flavobacteriaceae;D_5__Mesonia
## 2 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Thiovulaceae;D_5__Sulfurimonas
## 3 D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__S25-593;D_5__marine metagenome;D_6__marine metagenome
## 4 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Thiomicrospirales;D_4__Thioglobaceae;D_5__SUP05 cluster
## 5 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Sulfurovaceae;D_5__Sulfurovum
## 6 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__UBA10353 marine group;D_4__uncultured organism;D_5__uncultured organism;D_6__uncultured organism
## TAX_SHORT.B EUK_2.B EUK_DIST.B Weight COUNT
## 1 Bacteroidetes <NA> <NA> 0.004163727 1
## 2 Epsilonbacteraeota-Sulfurimonas <NA> <NA> -0.139107326 1
## 3 Proteobacteria-Alphaproteobacteria <NA> <NA> -0.002829730 1
## 4 Proteobacteria-SUP05 cluster <NA> <NA> 0.006002560 1
## 5 Epsilonbacteraeota-Sulfurovum <NA> <NA> 0.065705183 1
## 6 Proteobacteria-Gammaproteobacteria <NA> <NA> -0.012716197 1
# Classify interactions to taxa level 2
<- tax_sum_interact %>% select(ASV_18S, ASV_16S, TAX_SHORT.B,
summary_sig_interactions_16s %>% # distinct() %>%
COUNT) group_by(TAX_SHORT.B) %>% summarise(UNIQUE_16S_ASVs = n_distinct(ASV_16S), TOTAL_16S_ASVs = sum(COUNT)) %>%
data.frame
## `summarise()` ungrouping output (override with `.groups` argument)
# View(summary_sig_interactions_16s)
# What is the breakdown of bacteria and archaea ASVs?
head(tax_sum_interact)
## ASV_18S ASV_16S
## 1 01d1a4a17e3a26ee76b34b62cb0cbef8 29b36587344bb929651696c2a41e56cc
## 2 01d1a4a17e3a26ee76b34b62cb0cbef8 9023b3075fc598bad518430ee25519bc
## 3 020295103ca8304135054e04d9110899 2806f0957cc10412ad6a887f25abc970
## 4 020295103ca8304135054e04d9110899 66c28633afa706a1e8785165a4ce933e
## 5 02c7b94c00a919db1d1ef6d9d1ce810c 6e8d876077c3eae3a1f703ac2357d76c
## 6 02c7b94c00a919db1d1ef6d9d1ce810c 929cbf36f791dd363157d90871061cee
## TAX_SIDEA
## 1 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 2 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 3 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 4 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 5 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## 6 Eukaryota;Rhizaria;Radiolaria;RAD-B;RAD-B_X;RAD-B-Group-I;RAD-B-Group-I_X;RAD-B-Group-I_X_sp.
## INTERACTION TAX_SHORT.A
## 1 Rhizaria-Radiolaria_Bacteroidetes Rhizaria-Radiolaria
## 2 Rhizaria-Radiolaria_Epsilonbacteraeota-Sulfurimonas Rhizaria-Radiolaria
## 3 Rhizaria-Radiolaria_Proteobacteria-Alphaproteobacteria Rhizaria-Radiolaria
## 4 Rhizaria-Radiolaria_Proteobacteria-SUP05 cluster Rhizaria-Radiolaria
## 5 Rhizaria-Radiolaria_Epsilonbacteraeota-Sulfurovum Rhizaria-Radiolaria
## 6 Rhizaria-Radiolaria_Proteobacteria-Gammaproteobacteria Rhizaria-Radiolaria
## EUK_2.A EUK_DIST.A
## 1 RAD-B Resident
## 2 RAD-B Resident
## 3 RAD-B Resident
## 4 RAD-B Resident
## 5 RAD-B Resident
## 6 RAD-B Resident
## TAX_SIDEB
## 1 D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Flavobacteriales;D_4__Flavobacteriaceae;D_5__Mesonia
## 2 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Thiovulaceae;D_5__Sulfurimonas
## 3 D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__S25-593;D_5__marine metagenome;D_6__marine metagenome
## 4 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Thiomicrospirales;D_4__Thioglobaceae;D_5__SUP05 cluster
## 5 D_0__Bacteria;D_1__Epsilonbacteraeota;D_2__Campylobacteria;D_3__Campylobacterales;D_4__Sulfurovaceae;D_5__Sulfurovum
## 6 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__UBA10353 marine group;D_4__uncultured organism;D_5__uncultured organism;D_6__uncultured organism
## TAX_SHORT.B EUK_2.B EUK_DIST.B Weight COUNT
## 1 Bacteroidetes <NA> <NA> 0.004163727 1
## 2 Epsilonbacteraeota-Sulfurimonas <NA> <NA> -0.139107326 1
## 3 Proteobacteria-Alphaproteobacteria <NA> <NA> -0.002829730 1
## 4 Proteobacteria-SUP05 cluster <NA> <NA> 0.006002560 1
## 5 Epsilonbacteraeota-Sulfurovum <NA> <NA> 0.065705183 1
## 6 Proteobacteria-Gammaproteobacteria <NA> <NA> -0.012716197 1
<- tax_sum_interact %>% group_by(INTERACTION, EUK_DIST.A) %>% summarise(TOTAL_INTERACTIONS = sum(COUNT)) %>%
summary_int data.frame
## `summarise()` regrouping output by 'INTERACTION' (override with `.groups` argument)
# View(summary_int %>% filter(EUK_DIST.A != 'Resident'))
# head(tax_sum_interact)
<- tax_sum_interact %>% unite(EUK, TAX_SHORT.A, EUK_2.A, sep = "_",
tax_interact_cor remove = TRUE) %>% select(EUK, PROK = TAX_SHORT.B, COUNT) %>% group_by(EUK, PROK) %>%
summarise(SUM_COUNT = sum(COUNT)) %>% # pivot_wider(names_from = PROK, values_from = COUNT, values_fn = sum,
# values_fill = 0) %>%
data.frame
## `summarise()` regrouping output by 'EUK' (override with `.groups` argument)
# head(tax_interact_cor) table(tax_sum_interact$EUK_DIST.A)
# View(tax_sum_interact)
Plot distribution of 16S-18S interactions
library(ggalluvial)
# head(tax_sum_interact) unique(tax_sum_interact$TAX_SHORT.A)
<- tax_sum_interact %>% # Remove NA
putative_prey filter(!(is.na(TAX_SHORT.B))) %>% group_by(TAX_SHORT.A, TAX_SHORT.B, EUK_DIST.A) %>%
summarise(count_sum = sum(COUNT)) %>% data.frame
## `summarise()` has grouped output by 'TAX_SHORT.A', 'TAX_SHORT.B'. You can override using the `.groups` argument.
<- c("Alveolata-Ciliates", "Alveolata-Dinoflagellates", "Alveolata-Syndiniales",
alluvial_level2order "Rhizaria-Cercozoa", "Rhizaria-Radiolaria", "Stramenopiles-MAST", "Stramenopiles-Ochrophyta",
"Stramenopiles-Other", "Hacrobia-Cryptophyta", "Hacrobia-Haptophyta", "Hacrobia-Other",
"Archaeplastida-Chlorophyta")
<- c("Alveolates-Ciliates", "Alveolates-Dinoflagellates", "Alveolates-Syndiniales",
alluvial_level2labels "Rhizaria-Cercozoa", "Rhizaria-Radiolaria", "Stramenopiles-MAST", "Stramenopiles-Ochrophytes",
"Stramenopiles-Other", "Hacrobia-Cryptophytes", "Hacrobia-Haptophytes", "Hacrobia-Other",
"Archaeplastid-Chlorophytes")
<- c("#f1eef6", "#d7b5d8", "#df65b0", "#fc9272", "#ef3b2c",
alluvial_level2colors "#fff7bc", "#fec44f", "#a63603", "#74c476", "#238b45", "#00441b", "#2b8cbe")
$LEVEL2ORDER <- factor(putative_prey$TAX_SHORT.A, levels = alluvial_level2order,
putative_preylabels = alluvial_level2labels)
names(alluvial_level2colors) <- alluvial_level2labels
# Add line for prokaryotes
<- c("Alveolata-Ciliates", "Alveolata-Dinoflagellates", "Alveolata-Syndiniales",
donut_joint_tax "Alveolata-Other", "Rhizaria-Cercozoa", "Rhizaria-Radiolaria", "Rhizaria-Other",
"Stramenopiles-MAST", "Stramenopiles-Ochrophyta", "Stramenopiles-Other", "Hacrobia-Cryptophyta",
"Hacrobia-Haptophyta", "Hacrobia-Other", "Amoebozoa", "Excavata", "Apusozoa",
"Archaeplastida-Chlorophyta", "Archaeplastida-Other", "Opisthokonta-Fungi", "Opisthokonta-Metazoa",
"Opisthokonta-Other", "Unassigned-Eukaryote", "Epsilonbacteraeota-Arcobacter",
"Epsilonbacteraeota-Caminibacter", "Epsilonbacteraeota-Campylobacter", "Epsilonbacteraeota-Hydrogenimonas",
"Epsilonbacteraeota-Nitratifractor", "Epsilonbacteraeota-Nitratiruptor", "Epsilonbacteraeota-Sulfurimonas",
"Epsilonbacteraeota-Sulfurovum", "Proteobacteria-Alphaproteobacteria", "Proteobacteria-Deltaproteobacteria",
"Proteobacteria-Gammaproteobacteria", "Proteobacteria-Methylococcales", "Proteobacteria-Oceanospirillales",
"Proteobacteria-SUP05 cluster", "Acidobacteria", "Actinobacteria", "Aquificae",
"Bacteroidetes", "Chloroflexi", "Thaumarchaeota-Nitrososphaeria", "Euryarchaeota-Thermoplasmata",
"Fusobacteria-Psychrilyobacter", "Marinimicrobia (SAR406 clade)", "Nitrospinae-Nitrospinaceae",
"Other")
$TAX_ORDER_BAC <- factor(putative_prey$TAX_SHORT.B, levels = donut_joint_tax) putative_prey
Generate series of alluvial donut plots. Compile plots outside of R for Figure 4.
Alluvial donut for protists.
# svg('figs/alluvial-donut-resident.svg', h = 12, w = 24)
%>% # Resident only
putative_prey filter(EUK_DIST.A != "Cosmopolitan") %>% ggplot(aes(axis1 = LEVEL2ORDER, axis2 = TAX_ORDER_BAC,
y = count_sum)) + scale_x_discrete(limits = c("TAX_SHORT.A", "TAX_SHORT.B"),
expand = c(0.5, 0)) + geom_alluvium(aes(fill = LEVEL2ORDER), alpha = 1, width = 1/10,
color = "black") + scale_fill_manual(values = alluvial_level2colors, drop = F) +
geom_stratum(width = 1/4, alpha = 1, color = "black", aes(fill = LEVEL2ORDER)) +
::geom_text_repel(stat = "stratum", aes(label = str_wrap(after_stat(stratum),
ggrepel6)), force = 1, size = 4, color = "black", direction = "y", nudge_x = 0.4) +
coord_polar("y", start = 0) + theme_void() + theme(legend.title = element_blank()) +
labs(title = "Resident interactions (n = 165)") + guides(fill = guide_legend(ncol = 1))
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
# dev.off()
# svg('figs/alluvial-donut-cosmopolitan.svg', h = 12, w = 24)
%>% # Cosmopiltan only
putative_prey filter(EUK_DIST.A == "Cosmopolitan") %>% ggplot(aes(axis1 = LEVEL2ORDER, axis2 = TAX_ORDER_BAC,
y = count_sum)) + scale_x_discrete(limits = c("TAX_SHORT.A", "TAX_SHORT.B"),
expand = c(0.5, 0)) + geom_alluvium(aes(fill = LEVEL2ORDER), alpha = 1, width = 1/10,
color = "black") + scale_fill_manual(values = alluvial_level2colors, drop = F) +
geom_stratum(width = 1/4, alpha = 1, color = "black", aes(fill = LEVEL2ORDER)) +
::geom_text_repel(stat = "stratum", aes(label = str_wrap(after_stat(stratum),
ggrepel6)), force = 1, size = 4, color = "black", direction = "y", nudge_x = 0.4) +
coord_polar("y", start = 0) + theme_void() + theme(legend.title = element_blank()) +
labs(title = "Cosmopolitan interactions (n = 165)") + guides(fill = guide_legend(ncol = 1))
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
# dev.off()
Format and factor for prokaryote alluvial plots.
<- putative_prey %>% # Resident only
bac_resident_putative_prey filter(EUK_DIST.A != "Cosmopolitan") %>% data.frame
<- c("Epsilonbacteraeota-Arcobacter", "Epsilonbacteraeota-Caminibacter",
bac_tax_res "Epsilonbacteraeota-Campylobacter", "Epsilonbacteraeota-Nitratifractor", "Epsilonbacteraeota-Nitratiruptor",
"Epsilonbacteraeota-Sulfurimonas", "Epsilonbacteraeota-Sulfurovum", "Proteobacteria-Alphaproteobacteria",
"Proteobacteria-Gammaproteobacteria", "Proteobacteria-Oceanospirillales", "Proteobacteria-SUP05 cluster",
"Actinobacteria", "Bacteroidetes", "Chloroflexi", "Thaumarchaeota-Nitrososphaeria",
"Euryarchaeota-Thermoplasmata", "Fusobacteria-Psychrilyobacter", "Marinimicrobia (SAR406 clade)",
"Nitrospinae-Nitrospinaceae")
<- c("#7fcdbb", "#014636", "#41b6c4", "#d0d1e6", "#02818a", "#a6bddb",
bac_col_res "#3690c0", "#00441b", "#006d2c", "#238b45", "#c7e9b4", "#fd8d3c", "#7f2704",
"#dadaeb", "#54278f", "#bcbddc", "#6a51a3", "#9e9ac8", "#8c6bb1")
$BAC_RES <- factor(bac_resident_putative_prey$TAX_SHORT.B,
bac_resident_putative_preylevels = bac_tax_res)
names(bac_col_res) <- bac_tax_res
##
<- putative_prey %>% # Resident only
bac_cosmo_putative_prey filter(EUK_DIST.A == "Cosmopolitan") %>% data.frame
# unique(bac_cosmo_putative_prey$TAX_SHORT.B)
<- c("Epsilonbacteraeota-Arcobacter", "Epsilonbacteraeota-Caminibacter",
bac_tax_cos "Epsilonbacteraeota-Campylobacter", "Epsilonbacteraeota-Hydrogenimonas", "Epsilonbacteraeota-Nitratifractor",
"Epsilonbacteraeota-Nitratiruptor", "Epsilonbacteraeota-Sulfurimonas", "Epsilonbacteraeota-Sulfurovum",
"Proteobacteria-Alphaproteobacteria", "Proteobacteria-Gammaproteobacteria", "Proteobacteria-Methylococcales",
"Proteobacteria-Oceanospirillales", "Proteobacteria-SUP05 cluster", "Actinobacteria",
"Aquificae", "Bacteroidetes", "Chloroflexi", "Thaumarchaeota-Nitrososphaeria",
"Euryarchaeota-Thermoplasmata", "Fusobacteria-Psychrilyobacter", "Marinimicrobia (SAR406 clade)",
"Nitrospinae-Nitrospinaceae")
<- c("#7fcdbb", "#014636", "#41b6c4", "#016c59", "#d0d1e6", "#02818a",
bac_col_cos "#a6bddb", "#3690c0", "#00441b", "#006d2c", "#66c2a4", "#238b45", "#c7e9b4",
"#fd8d3c", "#d94801", "#7f2704", "#dadaeb", "#54278f", "#bcbddc", "#6a51a3",
"#9e9ac8", "#8c6bb1")
$BAC_COS <- factor(bac_cosmo_putative_prey$TAX_SHORT.B, levels = bac_tax_cos)
bac_cosmo_putative_preynames(bac_col_cos) <- bac_tax_cos
Alluvial donut for bacteria and archaea
# svg('figs/alluvial-donut-resident-16S.svg', h = 12, w = 24)
ggplot(bac_resident_putative_prey, aes(axis1 = LEVEL2ORDER, axis2 = BAC_RES, y = count_sum)) +
scale_x_discrete(limits = c("TAX_SHORT.A", "BAC_RES"), expand = c(0.5, 0)) +
geom_alluvium(aes(fill = BAC_RES), alpha = 1, width = 1/10, color = "black") +
geom_stratum(width = 1/4, alpha = 1, color = "black", aes(fill = TAX_ORDER_BAC)) +
scale_fill_manual(values = bac_col_res) + ggrepel::geom_text_repel(stat = "stratum",
aes(label = str_wrap(after_stat(stratum), 6)), force = 1, size = 4, color = "black",
direction = "y", nudge_x = 0.4) + coord_polar("y", start = 0) + theme_void() +
theme(legend.title = element_blank()) + labs(title = "Resident interactions") +
guides(fill = guide_legend(ncol = 1))
# dev.off()
# svg('figs/alluvial-donut-cosmopolitan-16S.svg', h = 12, w = 24)
ggplot(bac_cosmo_putative_prey, aes(axis1 = LEVEL2ORDER, axis2 = BAC_COS, y = count_sum)) +
scale_x_discrete(limits = c("TAX_SHORT.A", "BAC_COS"), expand = c(0.5, 0)) +
geom_alluvium(aes(fill = BAC_COS), alpha = 1, width = 1/10, color = "black") +
geom_stratum(width = 1/4, alpha = 1, color = "black", aes(fill = TAX_ORDER_BAC)) +
scale_fill_manual(values = bac_col_cos) + ggrepel::geom_text_repel(stat = "stratum",
aes(label = str_wrap(after_stat(stratum), 6)), force = 1, size = 4, color = "black",
direction = "y", nudge_x = 0.4) + coord_polar("y", start = 0) + theme_void() +
theme(legend.title = element_blank()) + labs(title = "Cosmopolitan interactions") +
guides(fill = guide_legend(ncol = 1))
# dev.off()
<- read.delim("data-input/Grazing-calc-wCarbon-results.txt")
gr <- read.delim("data-input/GR-environ-SAMPLE.txt")
env env
## SAMPLE DEPTH TEMP PH MG SEA_PER MICRO
## 1 BW-Near vent BW 2745 1.7 7.8 51.8 1.00 51959.11
## 2 Vent-Mt Edwards 2707 30.0 5.8 42.8 0.83 51439.52
## 3 Vent-Venti latte 2708 23.0 5.5 50.4 0.97 111192.50
## 4 Vent-Candelabra 2730 68.0 5.8 35.7 0.88 55076.66
## 5 Vent-SirVentsalot 2732 72.0 NA 50.8 0.98 52998.29
Join environmental and grazing data.
<- gr %>% left_join(env, by = "SAMPLE") %>% select(SAMPLE, SampleOrigin, Vent.name,
gr_env
SAMPLE_ORDER, GrazingRate_hr, Prok_turnover, ugC_L_perday_morono, DEPTH, TEMP, %>% pivot_longer(cols = c(GrazingRate_hr, Prok_turnover,
PH, MG, SEA_PER, MICRO) names_to = "Grazing_variable", values_to = "grazing_value") %>%
ugC_L_perday_morono), pivot_longer(cols = c(DEPTH, TEMP, PH, MG, SEA_PER, MICRO), names_to = "Env_variable",
values_to = "env_value")
# ?pivot_longer head(gr_env) colnames(gr_env)
unique(gr_env$Grazing_variable)
## [1] "GrazingRate_hr" "Prok_turnover" "ugC_L_perday_morono"
Run regression analysis.
library(broom)
<- gr_env %>% filter(!(Env_variable == "DEPTH")) %>% filter(!is.na(env_value)) %>%
regression_gr_tmp select(SampleOrigin, Vent.name, Grazing_variable, grazing_value, Env_variable,
%>% group_by(Grazing_variable, Env_variable) %>% nest(-Grazing_variable,
env_value) -Env_variable) %>% mutate(lm_fit = map(data, ~lm(grazing_value ~ env_value, data = .)),
tidied = map(lm_fit, tidy)) %>% unnest(tidied) %>% select(Grazing_variable, Env_variable,
%>% pivot_wider(names_from = term, values_from = estimate) %>%
term, estimate) select(everything(), SLOPE = env_value) %>% data.frame
## Warning: All elements of `...` must be named.
## Did you want `data = c(SampleOrigin, Vent.name, grazing_value, env_value)`?
# head(regression_gr_tmp)
<- gr_env %>% filter(!(Env_variable == "DEPTH")) %>% filter(!is.na(env_value)) %>%
regression_gr_env select(SampleOrigin, Vent.name, Grazing_variable, grazing_value, Env_variable,
%>% group_by(Grazing_variable, Env_variable) %>% nest(-Grazing_variable,
env_value) -Env_variable) %>% mutate(lm_fit = map(data, ~lm(grazing_value ~ env_value, data = .)),
glanced = map(lm_fit, glance)) %>% unnest(glanced) %>% select(Grazing_variable,
%>% right_join(regression_gr_tmp) %>%
Env_variable, r.squared, adj.r.squared) right_join(gr_env) %>% data.frame
## Warning: All elements of `...` must be named.
## Did you want `data = c(SampleOrigin, Vent.name, grazing_value, env_value)`?
## Joining, by = c("Grazing_variable", "Env_variable")
## Joining, by = c("Grazing_variable", "Env_variable")
# View(regression_gr) range(regression_gr$r.squared)
Factor results for plots.
<- c("Near vent BW", "Mt Edwards", "Venti latte", "Candelabra", "SirVentsalot")
sampleorder <- c("Near vent BW", "Mt. Edwards", "Venti latte", "Candelabra",
sampleorder_names "Sir Ventsalot")
<- c(23, 21, 21, 21, 21)
shapes <- c("#6f88af", "#61ac86", "#711518", "#dfa837", "#ce536b")
samplecolor $SAMPLE_ORDER <- factor(regression_gr_env$Vent.name, levels = sampleorder,
regression_gr_envlabels = sampleorder_names)
names(sampleorder_names) <- samplecolor
$ENV_LABEL <- factor(regression_gr_env$Env_variable, levels = c("TEMP",
regression_gr_env"MICRO", "SEA_PER", "PH", "MG"), labels = c(expression("Temperature"^o ~ "C"),
bquote("Cells " ~ mL^-1), bquote("Seawater~Percent"), bquote("pH"), bquote("Mg (mM)")))
# head(regression_gr_env)
Generate plot for supplementary.
# X = GrazingRate_hr, Prok_turnover, ugC_L_perday Y = DEPTH, TEMP, PH, MG,
# SEA_PER, MICRO svg('figs/SUPPLEMENTARY-grazing-env-relationship.svg', h = 10, w
# = 10)
plot_grid(regression_gr_env %>% filter(!(Env_variable == "DEPTH")) %>% filter(!(is.na(env_value))) %>%
filter(Grazing_variable == "GrazingRate_hr") %>% ggplot(aes(x = env_value, y = grazing_value,
fill = SAMPLE_ORDER)) + geom_abline(aes(slope = SLOPE, intercept = X.Intercept.),
color = "black", linetype = "dashed", size = 0.5) + geom_point(color = "black",
size = 4, aes(shape = SAMPLE_ORDER)) + geom_smooth(method = lm) + scale_fill_manual(values = samplecolor) +
scale_shape_manual(values = shapes) + facet_wrap(. ~ ENV_LABEL + round(r.squared,
3), scales = "free", ncol = 5, strip.position = "bottom", labeller = label_parsed) +
theme_bw() + theme(strip.background = element_blank(), strip.placement = "outside",
strip.text = element_text(color = "black", size = 10), axis.title = element_text(color = "black",
size = 10), legend.title = element_blank()) + labs(y = bquote("Cells " ~
^-1 ~ hr^-1), x = ""), regression_gr_env %>% filter(!(Env_variable == "DEPTH")) %>%
mLfilter(!(is.na(env_value))) %>% filter(Grazing_variable == "Prok_turnover") %>%
ggplot(aes(x = env_value, y = grazing_value, fill = SAMPLE_ORDER)) + geom_abline(aes(slope = SLOPE,
intercept = X.Intercept.), color = "black", linetype = "dashed", size = 0.5) +
geom_point(color = "black", size = 4, aes(shape = SAMPLE_ORDER)) + geom_smooth(method = lm) +
scale_fill_manual(values = samplecolor) + scale_shape_manual(values = shapes) +
facet_wrap(. ~ ENV_LABEL + round(r.squared, 3), scales = "free", ncol = 5, strip.position = "bottom",
labeller = label_parsed) + theme_bw() + theme(strip.background = element_blank(),
strip.placement = "outside", strip.text = element_text(color = "black", size = 10),
axis.title = element_text(color = "black", size = 10), legend.title = element_blank()) +
labs(y = bquote("Prokaryote Turnover %" ~ day^-1), x = ""), regression_gr_env %>%
filter(!(Env_variable == "DEPTH")) %>% filter(!(is.na(env_value))) %>% filter(Grazing_variable ==
"ugC_L_perday_morono") %>% ggplot(aes(x = env_value, y = grazing_value, fill = SAMPLE_ORDER)) +
geom_abline(aes(slope = SLOPE, intercept = X.Intercept.), color = "black", linetype = "dashed",
size = 0.5) + geom_point(color = "black", size = 4, aes(shape = SAMPLE_ORDER)) +
geom_smooth(method = lm) + scale_fill_manual(values = samplecolor) + scale_shape_manual(values = shapes) +
facet_wrap(. ~ ENV_LABEL + round(r.squared, 3), scales = "free", ncol = 5, strip.position = "bottom",
labeller = label_parsed) + theme_bw() + theme(strip.background = element_blank(),
strip.placement = "outside", strip.text = element_text(color = "black", size = 10),
axis.title = element_text(color = "black", size = 10), legend.title = element_blank()) +
labs(y = bquote("ug C" ~ L^{
-1
~ day^{
} -1
x = ""), nrow = 3, labels = c("a", "b", "c")) }),
# dev.off()
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS/LAPACK: /Users/sarahhu/anaconda3/envs/r_3.6.0/lib/R/lib/libRblas.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] broom_0.7.5 ggalluvial_0.12.1 dendextend_1.14.0
## [4] ggdendro_0.1-20 ape_5.3 RColorBrewer_1.1-2
## [7] cluster_2.1.0 compositions_1.40-5 bayesm_3.1-4
## [10] robustbase_0.93-6 tensorA_0.36.1 ade4_1.7-15
## [13] vegan_2.5-6 lattice_0.20-41 permute_0.9-5
## [16] decontam_1.6.0 phyloseq_1.30.0 cowplot_1.0.0
## [19] reshape2_1.4.4 forcats_0.5.0 stringr_1.4.0
## [22] dplyr_1.0.5 purrr_0.3.4 readr_1.3.1
## [25] tidyr_1.1.3 tibble_3.1.0 ggplot2_3.3.1
## [28] tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] colorspace_2.0-0 ellipsis_0.3.1 XVector_0.26.0
## [4] fs_1.4.1 rstudioapi_0.11 farver_2.1.0
## [7] ggrepel_0.8.2 fansi_0.4.2 lubridate_1.7.8
## [10] xml2_1.3.2 codetools_0.2-16 splines_3.6.1
## [13] knitr_1.31 jsonlite_1.6.1 dbplyr_1.4.4
## [16] compiler_3.6.1 httr_1.4.2 backports_1.2.1
## [19] assertthat_0.2.1 Matrix_1.2-18 cli_2.4.0
## [22] formatR_1.7 htmltools_0.5.1.1 tools_3.6.1
## [25] igraph_1.2.5 gtable_0.3.0 glue_1.4.2
## [28] Rcpp_1.0.5 Biobase_2.46.0 cellranger_1.1.0
## [31] vctrs_0.3.7 Biostrings_2.54.0 multtest_2.42.0
## [34] debugme_1.1.0 nlme_3.1-148 iterators_1.0.12
## [37] xfun_0.20 rvest_0.3.5 lifecycle_1.0.0
## [40] DEoptimR_1.0-8 zlibbioc_1.32.0 MASS_7.3-51.6
## [43] scales_1.1.1 hms_0.5.3 parallel_3.6.1
## [46] biomformat_1.14.0 rhdf5_2.30.1 yaml_2.2.1
## [49] gridExtra_2.3 stringi_1.5.3 highr_0.8
## [52] S4Vectors_0.24.4 foreach_1.5.0 BiocGenerics_0.32.0
## [55] rlang_0.4.10 pkgconfig_2.0.3 evaluate_0.14
## [58] Rhdf5lib_1.8.0 labeling_0.4.2 tidyselect_1.1.0
## [61] plyr_1.8.6 magrittr_2.0.1 R6_2.5.0
## [64] IRanges_2.20.2 generics_0.1.0 DBI_1.1.0
## [67] pillar_1.5.1 haven_2.3.1 withr_2.4.1
## [70] mgcv_1.8-31 survival_3.1-12 modelr_0.1.8
## [73] crayon_1.4.1 utf8_1.2.1 rmarkdown_2.6
## [76] viridis_0.5.1 grid_3.6.1 readxl_1.3.1
## [79] data.table_1.14.0 blob_1.2.1 reprex_0.3.0
## [82] digest_0.6.27 stats4_3.6.1 munsell_0.5.0
## [85] viridisLite_0.3.0