library(pr2database)
library(tidyverse)
Database curation page
0.1 Compare with PR2
Explore the data alongside the PR2 database. See the pr2database R package for this.
Set up your R environment by loading the pr2database and tidyverse packages.
0.1.1 Searching the PR2 database
Take a look at the whole PR2 database. Import it and assign it to pr2.
pr2 <- pr2_database()
glimpse(pr2)
View(pr2)
Use View() to search the database, or filter().
pr2 %>%
  filter(family == "Pseudocolliniidae")
Isolate the metadata that we want about the taxonomic group and alternate taxonomic names.
load(file = "input-data/taxonomic-lineages.RData")
colnames(taxonomic_lineages)
pr2_metadata <- pr2 %>%
  select(Domain = domain, Supergroup = supergroup, Division = division, Phylum = subdivision, Class = class, Order = order, Family = family, Genus = genus, Species = species, gb_taxonomy, metadata_remark, eukribo_UniEuk_taxonomy_string, silva_taxonomy, species_url)
taxonomic_lineages_pr2 <- taxonomic_lineages %>%
  left_join(pr2_metadata)
glimpse(taxonomic_lineages_pr2)
dim(taxonomic_lineages)
# write.csv(taxonomic_lineages_pr2, file = "taxonomic-assignments-p2.csv")
0.2 Compare with Functional Trait Database
Import the data
fxn_trait <- read.delim("input-data/functional-traits-ramond.csv", sep = ";")
glimpse(fxn_trait)
colnames(fxn_trait)
fxn_trait %>%
  filter(grepl("Pseudocolliniidae", Lineage))
The query above returns no rows, so the family is either absent from this database or listed under a different name.
We can try again with View().
# View(fxn_trait)
Using a partial text match, I see that Eukaryota|Harosa|Alveolata|Ciliophora|Intramacronucleata|Oligohymenophorea|Apostomatida|Colliniidae|Pseudocollinia is an entry. But now we need to cross-check with other rows in our database to see whether Colliniidae was used as a different name for this group at some point. In fact, it was!
fxn_trait %>%
  filter(grepl("Colliniidae", Lineage))
Now we can take the above information and fill in the other descriptive features.
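Because Lineage stores the whole taxonomic path as a single pipe-delimited string, splitting it into ranks can make it easier to see why the first pattern failed and the second succeeded. The following is a minimal base R sketch, assuming the "|" delimiter seen in the entry quoted above. The helper name split_lineage is hypothetical and is not part of the functional trait database or any package.

```r
# Hypothetical helper: split a pipe-delimited lineage string into its ranks.
split_lineage <- function(lineage) {
  # fixed = TRUE treats "|" as a literal character, not regex alternation
  strsplit(lineage, split = "|", fixed = TRUE)[[1]]
}

# The entry quoted above from the functional trait database
lineage <- "Eukaryota|Harosa|Alveolata|Ciliophora|Intramacronucleata|Oligohymenophorea|Apostomatida|Colliniidae|Pseudocollinia"

ranks <- split_lineage(lineage)

# The full family name is not a substring of this lineage ...
family_found <- grepl("Pseudocolliniidae", lineage, fixed = TRUE)

# ... but a shorter, case-insensitive stem matches two of the ranks
matches <- ranks[grepl("collini", ranks, ignore.case = TRUE)]

print(family_found)
print(matches)
```

Searching on a short stem rather than the full family name is a design choice: it trades precision for recall, so any hits still need to be checked by eye, as done above.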
0.3 Compare with PIDA database
This database explores microbe-microbe interactions.
Bjorbækmo MFM, Evenstad A, Røsæg LL, Krabberød AK, Logares R. The planktonic protist interactome: where do we stand after a century of research? ISME J 2020; 14: 544–559.
Reference: https://github.com/ramalok/PIDA/actions
Import PIDA database.
pida <- read.csv("input-data/PIDA_v_1.11_FORMATTED.csv")
head(pida)
# View(pida)
In the PIDA database, org 1 corresponds to the host, and org 2 corresponds to the symbiont. See the GitHub page README for a complete description of the database.
The query below returns no matches.
pida %>%
  filter(grepl("Pseudocolli", Taxonomic.level.3..org2)) # this doesn't reveal anything.
pida %>%
  filter(grepl("Pseudocolli", Taxonomic.level.2..org2))
The query below returns no matches either.
pida %>%
  filter(grepl("Pseudocolli", Genus.org2)) # this doesn't reveal anything.
A free-text search of the PIDA database turned up only one close match, Pseudocohnilembus. This is a different organism, so we can conclude that it is not related to the species in our database.
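The free-text search described above can also be done in code rather than through View(). The following is a minimal base R sketch, assuming the goal is to keep any row in which at least one column contains the pattern. The helper search_all_columns and the toy table are hypothetical: the column Genus.org1 is assumed by analogy with Genus.org2, and the values other than Pseudocohnilembus are placeholders, not PIDA records.

```r
# Hypothetical helper: return rows where any column contains the pattern.
search_all_columns <- function(df, pattern) {
  # One logical vector per column: does each cell match the pattern?
  hits <- lapply(df, function(col) {
    grepl(pattern, as.character(col), ignore.case = TRUE)
  })
  # Keep a row if at least one of its columns matched
  keep <- Reduce(`|`, hits)
  df[keep, , drop = FALSE]
}

# Toy stand-in for the PIDA table (illustrative values only)
toy <- data.frame(
  Genus.org1 = c("Host_A", "Host_B"),
  Genus.org2 = c("Pseudocohnilembus", "Symbiont_B"),
  stringsAsFactors = FALSE
)

loose  <- search_all_columns(toy, "Pseudoco")     # short stem: matches Pseudocohnilembus
strict <- search_all_columns(toy, "Pseudocolli")  # longer stem: no rows

print(loose)
print(nrow(strict))
```

With the real table, the same call would be search_all_columns(pida, "Pseudoco"). A short stem casts a wide net, so each hit still needs a manual check of whether it is the organism of interest.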