Database curation page
0.1 Compare with PR2
Explore our taxa alongside the PR2 database. The pr2database R package provides access to it.
Set up your R environment.
library(pr2database)
library(tidyverse)
0.1.1 Searching the PR2 database
Take a look at the whole PR2 database. Import it and assign it to pr2.
pr2 <- pr2_database()
glimpse(pr2)
View(pr2)
Use View() to browse the database interactively, or filter() to query it.
pr2 %>%
  filter(family == "Pseudocolliniidae")
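A name is not always stored at the rank we expect, so an exact match on family can miss records. As a rough sketch, a partial, case-insensitive match across several rank columns can be written with if_any(). The tibble pr2_toy below is an invented stand-in that only borrows the column names family, genus, and species from pr2; its rows are illustrative and are not real PR2 records.

```r
library(dplyr)

# Toy stand-in for pr2; rows are invented for illustration only.
pr2_toy <- tibble(
  family  = c("Pseudocolliniidae", "Colliniidae", "Bacillariaceae"),
  genus   = c("Pseudocollinia", "Collinia", "Nitzschia"),
  species = c("Pseudocollinia_sp.", "Collinia_sp.", "Nitzschia_sp.")
)

# Keep rows where any of the three rank columns contains the pattern,
# ignoring case.
hits <- pr2_toy %>%
  filter(if_any(c(family, genus, species),
                ~ grepl("pseudocolli", .x, ignore.case = TRUE)))

hits
```

Swapping pr2_toy for pr2 should work the same way, assuming the same column names are present.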
Isolate the metadata we want for each taxonomic group, including alternate taxonomic names.
load(file = "input-data/taxonomic-lineages.RData")
colnames(taxonomic_lineages)
pr2_metadata <- pr2 %>%
  select(Domain = domain, Supergroup = supergroup, Division = division,
         Phylum = subdivision, Class = class, Order = order,
         Family = family, Genus = genus, Species = species,
         gb_taxonomy, metadata_remark, eukribo_UniEuk_taxonomy_string,
         silva_taxonomy, species_url)
taxonomic_lineages_pr2 <- taxonomic_lineages %>%
  left_join(pr2_metadata)
glimpse(taxonomic_lineages_pr2)
dim(taxonomic_lineages)
# write.csv(taxonomic_lineages_pr2, file = "taxonomic-assignments-p2.csv")
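PR2 stores one row per reference sequence, so the same taxon usually appears many times in pr2_metadata, and a left join can then repeat rows of taxonomic_lineages. Comparing the row counts before and after the join is one way to spot this. The sketch below uses invented tables and assumes, purely for illustration, that Species is the join key; the columns actually shared with taxonomic_lineages are not shown here.

```r
library(dplyr)

# Invented stand-ins; the real tables come from the steps above.
lineages_toy <- tibble(Species = c("sp_A", "sp_B"))
metadata_toy <- tibble(
  Species     = c("sp_A", "sp_A", "sp_B"),
  gb_taxonomy = c("tax_A", "tax_A", "tax_B")
)

# Joining directly repeats sp_A once per matching metadata row.
joined_raw <- left_join(lineages_toy, metadata_toy, by = "Species")

# Dropping duplicate metadata rows first keeps one row per species here.
joined_dedup <- lineages_toy %>%
  left_join(distinct(metadata_toy), by = "Species")

nrow(joined_raw)
nrow(joined_dedup)
```

Naming the key with by also makes the join explicit rather than relying on whichever column names the two tables happen to share.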
0.2 Compare with Functional Trait Database
Import the data
fxn_trait <- read.delim("input-data/functional-traits-ramond.csv", sep = ";")
glimpse(fxn_trait)
colnames(fxn_trait)
fxn_trait %>%
  filter(grepl("Pseudocolliniidae", Lineage))
The query above returns no rows, so the family is either absent from this database or listed under a different name.
We can try again with View()
# View(fxn_trait)
Using a partial text match, I see that Eukaryota|Harosa|Alveolata|Ciliophora|Intramacronucleata|Oligohymenophorea|Apostomatida|Colliniidae|Pseudocollinia is an entry. We now need to cross-check against the other rows in our database to see whether Colliniidae was an alternative name for this group at some point. In fact, it was!
fxn_trait %>%
  filter(grepl("Colliniidae", Lineage))
Now we can take the above information and fill in the other descriptive features.
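To see why the first search found nothing while the partial match succeeded, it can help to split a lineage string into its individual ranks. This is a minimal base R sketch using the lineage string quoted above; the variable names are illustrative.

```r
# Lineage string copied from the entry found above.
lineage <- "Eukaryota|Harosa|Alveolata|Ciliophora|Intramacronucleata|Oligohymenophorea|Apostomatida|Colliniidae|Pseudocollinia"

# Split on the literal "|" (fixed = TRUE, because "|" is a regex operator).
ranks <- strsplit(lineage, "|", fixed = TRUE)[[1]]

# Exact match on a whole rank name versus a partial match anywhere.
exact_family   <- "Colliniidae" %in% ranks
exact_pseudo   <- "Pseudocolliniidae" %in% ranks
partial_pseudo <- any(grepl("Pseudocolli", ranks))

ranks
```

Here the family rank is Colliniidae, so an exact search for Pseudocolliniidae fails, while the partial pattern still matches the genus Pseudocollinia.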
0.3 Compare with PIDA database
This database explores microbe-microbe interactions.
Bjorbækmo MFM, Evenstad A, Røsæg LL, Krabberød AK, Logares R. The planktonic protist interactome: where do we stand after a century of research? ISME J 2020; 14: 544–559.
Reference: https://github.com/ramalok/PIDA/actions
Import PIDA database.
pida <- read.csv("input-data/PIDA_v_1.11_FORMATTED.csv")
head(pida)
# View(pida)
In the PIDA database, org 1 corresponds to the host and org 2 corresponds to the symbiont. See the README on the GitHub page for a complete description of the database.
The query below returns no matches.
pida %>%
  filter(grepl("Pseudocolli", Taxonomic.level.3..org2)) # returns no rows
pida %>%
  filter(grepl("Pseudocolli", Taxonomic.level.2..org2))
The query below also returns no matches.
pida %>%
  filter(grepl("Pseudocolli", Genus.org2)) # returns no rows
When searching the database by text, the only close match in PIDA was Pseudocohnilembus. This is a different organism, so it is not related to the species in our database.
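Rather than querying one column at a time, a single filter() can scan every column for a pattern. The sketch below is one possible approach using if_any(); pida_toy is an invented stand-in that reuses three column names from the queries above, and its values are not taken from PIDA.

```r
library(dplyr)

# Invented stand-in for pida; only the column names come from the queries above.
pida_toy <- tibble(
  Taxonomic.level.2..org2 = c("Ciliophora", "Dinoflagellata"),
  Taxonomic.level.3..org2 = c("Oligohymenophorea", "Syndiniales"),
  Genus.org2              = c("Pseudocohnilembus", "Amoebophrya")
)

# Strict pattern: no column contains "Pseudocolli".
hits_strict <- pida_toy %>%
  filter(if_any(everything(), ~ grepl("Pseudocolli", .x)))

# Shorter pattern: picks up the near match Pseudocohnilembus.
hits_loose <- pida_toy %>%
  filter(if_any(everything(), ~ grepl("Pseudoco", .x)))

nrow(hits_strict)
nrow(hits_loose)
```

A shorter pattern casts a wider net, so any hits still need to be checked by eye, as with Pseudocohnilembus here.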