Database curation page

1 Major taxonomic groups

Visual summaries of what groups are present

1.1 Taxonomic levels in database

Visual summaries of taxonomic groups
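
Before plotting, it can help to tabulate what each taxonomic level contains. The following is a minimal sketch, assuming a lineage table with one row per taxon and one column per rank; lineages_toy is a hypothetical stand-in with only a few illustrative rows.

```r
library(dplyr)

# Hypothetical toy lineage table: one row per taxon, one column per rank.
lineages_toy <- data.frame(
  Division = c("Alveolata", "Stramenopiles", "Stramenopiles"),
  Phylum   = c("Ciliophora", "Ochrophyta", "Sagenista"),
  Class    = c("Nassophorea", "Pelagophyceae", NA)
)

# Number of taxa that fall under each division.
per_division <- lineages_toy %>%
  count(Division, name = "n_taxa")

# Number of distinct, non-missing names at each taxonomic level.
per_level <- lineages_toy %>%
  summarise(across(everything(), ~ n_distinct(.x, na.rm = TRUE)))

per_division
per_level
```

Either table could then feed a bar chart of taxa per group or per level.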

1.2 Compare with PR2

Explore our database alongside the PR2 database. The pr2database R package provides access to it.

Set up your R environment.

# install.packages("devtools")
# devtools::install_github("pr2database/pr2database")
library(pr2database)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

1.2.1 Searching the PR2 database

Import the whole PR2 database, assign it to pr2, and take a look.

pr2 <- pr2_database()
glimpse(pr2)
Rows: 221,085
Columns: 95
$ pr2_accession                  <chr> "AB353770.1.1740_U", "AB284159.1.1765_U…
$ domain                         <chr> "Eukaryota", "Eukaryota", "Eukaryota", …
$ supergroup                     <chr> "TSAR", "TSAR", "Obazoa", "Obazoa", "Ob…
$ division                       <chr> "Alveolata", "Alveolata", "Opisthokonta…
$ subdivision                    <chr> "Dinoflagellata", "Dinoflagellata", "Fu…
$ class                          <chr> "Dinophyceae", "Dinophyceae", "Ascomyco…
$ order                          <chr> "Peridiniales", "Peridiniales", "Pezizo…
$ family                         <chr> "Kryptoperidiniaceae", "Protoperidiniac…
$ genus                          <chr> "Unruhdinium", "Protoperidinium", "Sord…
$ species                        <chr> "Unruhdinium_kevei", "Protoperidinium_b…
$ genbank_accession              <chr> "AB353770", "AB284159", "AY123745", "FJ…
$ start                          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ end                            <dbl> 1740, 1765, 924, 1907, 853, 1731, 1627,…
$ label                          <chr> "U", "U", "UC", "U", "U", "U", "U", "U"…
$ gene                           <chr> "18S_rRNA", "18S_rRNA", "18S_rRNA", "18…
$ organelle                      <chr> "nucleus", "nucleus", "nucleus", "nucle…
$ reference_sequence             <int> 1, 1, NA, NA, NA, NA, NA, NA, NA, 1, NA…
$ added_version                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ edited_version                 <chr> "4.9.0", "4.10.0", NA, NA, NA, NA, NA, …
$ edited_by                      <chr> "Mordret S.", "Vaulot D.", NA, NA, NA, …
$ edited_remark                  <chr> NA, "length of sequence fixed (before t…
$ remark                         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "re…
$ taxo_id                        <int> 49091, 2087, 13926, 12048, 13926, 41939…
$ taxo_edited_version            <chr> "4.9.0", "4.9.0", "4.5", NA, "4.5", NA,…
$ taxo_edited_by                 <chr> "Mordret S.", "Mordret S.", NA, NA, NA,…
$ taxo_remark                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ reference                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ mixoplankton                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ worms_id                       <int> 1380090, NA, NA, NA, NA, NA, NA, NA, NA…
$ worms_marine                   <chr> "0", NA, NA, NA, NA, NA, NA, NA, NA, "0…
$ worms_brackish                 <int> 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, N…
$ worms_freshwater               <int> 1, NA, NA, NA, NA, NA, NA, NA, NA, 1, N…
$ worms_terrestrial              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ seq_id                         <int> 5, 8, 10, 11, 14, 15, 16, 17, 18, 19, 2…
$ sequence                       <chr> "ATGCTTGTCTCAAAGATTAAGCCATGCATGTCTCAGTA…
$ sequence_length                <int> 1740, 1765, 924, 1907, 853, 1731, 1627,…
$ ambiguities                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ sequence_hash                  <chr> "26c3890d597f9d39e45e91eaa4f01ba6c603b6…
$ gb_date                        <chr> "04-SEP-2007", "22-MAY-2007", "07-AUG-2…
$ gb_division                    <chr> "PLN", "PLN", "ENV", "PLN", "ENV", "ENV…
$ gb_definition                  <chr> "Peridiniopsis cf. kevei gene for 18S r…
$ gb_organism                    <chr> "Peridiniopsis cf. kevei", NA, "uncultu…
$ gb_organelle                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ gb_taxonomy                    <chr> "Eukaryota; Sar; Alveolata; Dinophyceae…
$ gb_strain                      <chr> NA, NA, NA, "CBS 120353", NA, NA, NA, N…
$ gb_culture_collection          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ gb_clone                       <chr> NA, NA, NA, NA, "G913P33FB4.T0", "SType…
$ gb_isolate                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, "145Br"…
$ gb_isolation_source            <chr> NA, "seawater sample", "soil", NA, "air…
$ gb_specimen_voucher            <chr> NA, NA, NA, NA, NA, NA, "Ed Biffin 9102…
$ gb_host                        <chr> NA, NA, NA, NA, NA, NA, NA, "yellow cat…
$ gb_collection_date             <chr> "26-juil.-03", NA, NA, NA, "17-May-2006…
$ gb_environmental_sample        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ gb_country                     <chr> "Japan: Toyama, Tomi-iwa Canal Park", "…
$ gb_lat_lon                     <chr> NA, NA, NA, NA, "40.01 N 105.27 W", NA,…
$ gb_collected_by                <chr> NA, NA, NA, NA, "Noah Fierer", NA, NA, …
$ gb_note                        <chr> NA, "acquired from 2 individual cells c…
$ gb_publication                 <chr> "Serial replacement of diatom endosymbi…
$ gb_authors                     <chr> "Takano,Y.", "Yamaguchi,A., Kawamura,H.…
$ gb_journal                     <chr> "Unpublished", "Unpublished", "Phytopat…
$ eukref_name                    <chr> NA, NA, NA, NA, NA, NA, NA, "YCRPS2", N…
$ eukref_source                  <chr> NA, NA, NA, NA, NA, NA, NA, "Environmen…
$ eukref_env_material            <chr> NA, NA, NA, NA, NA, NA, NA, "ruminal fl…
$ eukref_env_biome               <chr> "freshwater biome", "marine pelagic bio…
$ eukref_biotic_relationship     <chr> "host of diatom symbiont", NA, NA, NA, …
$ eukref_specific_host           <chr> NA, NA, NA, NA, NA, NA, NA, "yellow cat…
$ eukref_geo_loc_name            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ eukref_notes                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pr2_sample_type                <chr> NA, NA, "environmental", "culture", "en…
$ pr2_sample_method              <chr> "single cell isolation", "single cell i…
$ pr2_latitude                   <dbl> NA, NA, NA, NA, 40.010, NA, NA, NA, NA,…
$ pr2_longitude                  <dbl> NA, NA, NA, NA, -105.270, NA, NA, NA, N…
$ pr2_depth                      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pr2_ocean                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pr2_sea                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pr2_sea_lat                    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pr2_sea_lon                    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pr2_country                    <chr> "Japan", "Japan", NA, NA, "United State…
$ pr2_location                   <chr> "Toyama, Tomi-iwa Canal Park", "Hokkaid…
$ pr2_location_geoname           <chr> NA, NA, NA, NA, "Boulder", "Glasgow", N…
$ pr2_location_geotype           <chr> NA, NA, NA, NA, "seat of a second-order…
$ pr2_location_lat               <dbl> NA, NA, NA, NA, 40, 56, NA, NA, NA, NA,…
$ pr2_location_lon               <dbl> NA, NA, NA, NA, -105, -4, NA, NA, NA, N…
$ pr2_sequence_origin            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ metadata_remark                <chr> "Metadata from DinoRef (Mordret S.)", "…
$ pr2_continent                  <chr> "Asia", "Asia", NA, NA, NA, NA, NA, NA,…
$ pr2_country_geocode            <chr> "JP", "JP", NA, NA, NA, NA, NA, NA, "PT…
$ pr2_country_lat                <dbl> 36, 36, NA, NA, NA, NA, NA, NA, 40, NA,…
$ pr2_country_lon                <dbl> 140, 140, NA, NA, NA, NA, NA, NA, -8, N…
$ eukribo_UniEuk_taxonomy_string <chr> "Eukaryota|Diaphoretickes|Sar|Alveolata…
$ eukribo_V4                     <chr> "yes - complete", "yes - complete", NA,…
$ eukribo_V9                     <chr> "yes - partial", "yes - complete", NA, …
$ silva_taxonomy                 <chr> "Eukaryota;SAR;Alveolata;Dinoflagellata…
$ organelle_code                 <chr> "", "", "", "", "", "", "", "", "", "",…
$ species_url                    <glue> "<a href='https://www.marinespecies.or…
# View(pr2)

Search the database interactively with View(), or programmatically with filter().

pr2 %>% 
  filter(family == "Pseudocolliniidae")
# A tibble: 25 × 95
   pr2_accession domain supergroup division subdivision class order family genus
   <chr>         <chr>  <chr>      <chr>    <chr>       <chr> <chr> <chr>  <chr>
 1 HQ591486.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 2 HQ591477.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 3 HQ591480.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 4 HM561004.1.9… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 5 HQ591482.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 6 HQ591473.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 7 HQ591485.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 8 HQ591474.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
 9 HQ591479.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
10 HQ591472.1.1… Eukar… TSAR       Alveola… Ciliophora  Olig… Apos… Pseud… Pseu…
# ℹ 15 more rows
# ℹ 86 more variables: species <chr>, genbank_accession <chr>, start <dbl>,
#   end <dbl>, label <chr>, gene <chr>, organelle <chr>,
#   reference_sequence <int>, added_version <chr>, edited_version <chr>,
#   edited_by <chr>, edited_remark <chr>, remark <chr>, taxo_id <int>,
#   taxo_edited_version <chr>, taxo_edited_by <chr>, taxo_remark <chr>,
#   reference <chr>, mixoplankton <chr>, worms_id <int>, worms_marine <chr>, …
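
An exact match on one column misses spelling variants and names stored at a different rank. The sketch below shows a case-insensitive partial match across several rank columns at once. pr2_toy is a hypothetical stand-in for a few pr2 columns, and its accession values are made up.

```r
library(dplyr)

# Toy stand-in for a few pr2 columns (accessions are made up).
pr2_toy <- data.frame(
  pr2_accession = c("toy_1", "toy_2", "toy_3"),
  family = c("Pseudocolliniidae", "Kryptoperidiniaceae", "Colliniidae"),
  genus  = c("Pseudocollinia", "Unruhdinium", "Collinia")
)

# Keep rows where the pattern appears in either rank column, ignoring case.
hits <- pr2_toy %>%
  filter(if_any(c(family, genus), ~ grepl("collini", .x, ignore.case = TRUE)))

hits
```

On the toy table this keeps the first and third rows. The same filter() call should work on pr2 itself if the column names match.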

Isolate the metadata we want for each taxonomic group, including its alternate taxonomic names (GenBank, EukRibo, SILVA).

load(file = "input-data/taxonomic-lineages.RData")
colnames(taxonomic_lineages)
[1] "Taxon"    "Domain"   "Division" "Phylum"   "Class"    "Order"    "Family"  
[8] "Genus"    "Species" 
pr2_metadata <- pr2 %>% 
  select(
    Domain = domain, Supergroup = supergroup, Division = division,
    Phylum = subdivision, Class = class, Order = order,
    Family = family, Genus = genus, Species = species,
    gb_taxonomy, metadata_remark, eukribo_UniEuk_taxonomy_string,
    silva_taxonomy, species_url
  )
taxonomic_lineages_pr2 <- taxonomic_lineages %>% 
  left_join(pr2_metadata)
Joining with `by = join_by(Domain, Division, Phylum, Class, Order, Family,
Genus, Species)`
glimpse(taxonomic_lineages_pr2)
Rows: 40,013
Columns: 15
$ Taxon                          <chr> "Eukaryota;Alveolata;Ciliophora;Nassoph…
$ Domain                         <chr> "Eukaryota", "Eukaryota", "Eukaryota", …
$ Division                       <chr> "Alveolata", "Stramenopiles", "Strameno…
$ Phylum                         <chr> "Ciliophora", "Ochrophyta", "Sagenista"…
$ Class                          <chr> "Nassophorea", "Pelagophyceae", NA, "Cn…
$ Order                          <chr> "Nassophorea_X", "Pelagomonadales", NA,…
$ Family                         <chr> "Discotrichidae", "Pelagomonadaceae", N…
$ Genus                          <chr> "NASSO_1", "Pelagomonas", NA, NA, "Stro…
$ Species                        <chr> "NASSO_1_sp.", "Pelagomonas_calceolata"…
$ Supergroup                     <chr> NA, NA, NA, NA, NA, NA, NA, "TSAR", "TS…
$ gb_taxonomy                    <chr> NA, NA, NA, NA, NA, NA, NA, "Eukaryota;…
$ metadata_remark                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ eukribo_UniEuk_taxonomy_string <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ silva_taxonomy                 <chr> NA, NA, NA, NA, NA, NA, NA, "Eukaryota;…
$ species_url                    <glue> NA, NA, NA, NA, NA, NA, NA, "Dino-Grou…
dim(taxonomic_lineages)
[1] 1094    9
# write.csv(taxonomic_lineages_pr2, file = "taxonomic-assignments-p2.csv")
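
Note that the joined table has 40,013 rows while taxonomic_lineages has 1,094. PR2 holds one row per sequence, so a taxon that matches several sequences is repeated in the join. The sketch below reproduces the effect on toy data and shows one possible remedy, which is to collapse the metadata to unique rows before joining. The table names and values are hypothetical.

```r
library(dplyr)

# Toy lineage table: one row per taxon (hypothetical values).
lineages_toy <- data.frame(
  Taxon   = c("taxon_a", "taxon_b"),
  Genus   = c("Pelagomonas", "NASSO_1"),
  Species = c("Pelagomonas_calceolata", "NASSO_1_sp.")
)

# Toy PR2-style metadata: one row per sequence, so the species repeats.
metadata_toy <- data.frame(
  Genus      = rep("Pelagomonas", 3),
  Species    = rep("Pelagomonas_calceolata", 3),
  Supergroup = rep("TSAR", 3)
)

# Joining as-is repeats taxon_a once per matching sequence.
joined_raw <- left_join(lineages_toy, metadata_toy, by = c("Genus", "Species"))

# Collapsing the metadata to unique rows first keeps one row per taxon.
joined_unique <- lineages_toy %>%
  left_join(distinct(metadata_toy), by = c("Genus", "Species"))

nrow(joined_raw)
nrow(joined_unique)
```

distinct() only collapses rows that are identical across every selected column. Per-sequence fields such as gb_taxonomy may still leave several rows per species in the real data.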

2 Compare with Functional Trait Database

Import the data

fxn_trait <- read.delim("input-data/functional-traits-ramond.csv", sep = ";")
glimpse(fxn_trait)
Rows: 2,007
Columns: 37
$ Lineage                 <chr> "Eukaryota", "Eukaryota|Amoebozoa|Amoebozoa_U1…
$ Fam                     <chr> "Undetermined_Eukaryota", "Amoebozoa", "Amoebo…
$ Taxogroup               <chr> "Undetermined_Eukaryota", "Amoebozoa", "Amoebo…
$ Taxo1                   <chr> "Undetermined_Eukaryota", "Amoebozoa", "Amoebo…
$ Last                    <chr> "Eukaryota", "PRSlineage", "Protostelium", "Fl…
$ SizeMin                 <dbl> NA, NA, 20.0, 17.0, 17.0, 70.0, 11.0, 13.5, NA…
$ SizeMax                 <dbl> NA, NA, 100, 80, 80, 185, 192, 50, NA, NA, 40,…
$ Cover                   <chr> NA, "Naked", "Naked", "Naked", "Naked", "Organ…
$ Shape                   <chr> NA, "Amoeboid", "Amoeboid", "Amoeboid", "Amoeb…
$ Spicule                 <int> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ Symmetry                <chr> NA, "Asymetrical", "Asymetrical", "Asymetrical…
$ Polarity                <chr> NA, "Heteropolar", "Heteropolar", "Heteropolar…
$ Colony                  <chr> NA, "None", "None", "None", "None", "None", "N…
$ Motility                <chr> NA, "Gliding", "Gliding", "Gliding", "Gliding"…
$ Chloroplast             <int> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ Plast_Origin            <chr> NA, "None", "None", "None", "None", "None", "N…
$ Ingestion               <chr> NA, "Phagotrophic", "Saprotrophic", "Phagotrop…
$ Behaviour               <chr> NA, "Active_Ambush_Feeder", "Passive_Ambush_Fe…
$ Mutualistic_Host        <chr> NA, "No", "No", "Other_Mutualist", "Other_Mutu…
$ Symbiontic              <chr> NA, NA, "No", "No", "No", "No", "No", "No", NA…
$ Symbiont_Location       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Host_Specialisation     <chr> NA, NA, NA, "Bacteria", "Bacteria", NA, NA, NA…
$ Symbiont_Specialisation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Prey_Specialisation     <chr> NA, "0", "deadmatter__bacteria__yeasts__and_fu…
$ Mucilage                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Chemical_Signal         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Nutrient_Afinity        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Oxygen_Tolerance        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Salinity                <chr> NA, NA, "Soil", "Aquatic", "Aquatic", "Aquatic…
$ Temperature             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Depth                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Toxygenity              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Benthic_Phase           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Longevity               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cyst_Spore              <int> NA, 1, 1, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1, 1…
$ Ploidy                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Genome_Size             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
colnames(fxn_trait)
 [1] "Lineage"                 "Fam"                    
 [3] "Taxogroup"               "Taxo1"                  
 [5] "Last"                    "SizeMin"                
 [7] "SizeMax"                 "Cover"                  
 [9] "Shape"                   "Spicule"                
[11] "Symmetry"                "Polarity"               
[13] "Colony"                  "Motility"               
[15] "Chloroplast"             "Plast_Origin"           
[17] "Ingestion"               "Behaviour"              
[19] "Mutualistic_Host"        "Symbiontic"             
[21] "Symbiont_Location"       "Host_Specialisation"    
[23] "Symbiont_Specialisation" "Prey_Specialisation"    
[25] "Mucilage"                "Chemical_Signal"        
[27] "Nutrient_Afinity"        "Oxygen_Tolerance"       
[29] "Salinity"                "Temperature"            
[31] "Depth"                   "Toxygenity"             
[33] "Benthic_Phase"           "Longevity"              
[35] "Cyst_Spore"              "Ploidy"                 
[37] "Genome_Size"            
fxn_trait %>% 
  filter(grepl("Pseudocolliniidae", Lineage))
 [1] Lineage                 Fam                     Taxogroup              
 [4] Taxo1                   Last                    SizeMin                
 [7] SizeMax                 Cover                   Shape                  
[10] Spicule                 Symmetry                Polarity               
[13] Colony                  Motility                Chloroplast            
[16] Plast_Origin            Ingestion               Behaviour              
[19] Mutualistic_Host        Symbiontic              Symbiont_Location      
[22] Host_Specialisation     Symbiont_Specialisation Prey_Specialisation    
[25] Mucilage                Chemical_Signal         Nutrient_Afinity       
[28] Oxygen_Tolerance        Salinity                Temperature            
[31] Depth                   Toxygenity              Benthic_Phase          
[34] Longevity               Cyst_Spore              Ploidy                 
[37] Genome_Size            
<0 rows> (or 0-length row.names)

The query above returns no rows, so the family is either absent from this database or listed under a different name.

We can try again with View().

# View(fxn_trait)

Using a partial text match, I see that Eukaryota|Harosa|Alveolata|Ciliophora|Intramacronucleata|Oligohymenophorea|Apostomatida|Colliniidae|Pseudocollinia is an entry. We now need to cross-check the other rows in our database to see whether Colliniidae was used as a different name at some point. In fact, it was!

fxn_trait %>% 
  filter(grepl("Colliniidae", Lineage))
                                                                                                             Lineage
1                Eukaryota|Harosa|Alveolata|Ciliophora|Intramacronucleata|Oligohymenophorea|Apostomatida|Colliniidae
2 Eukaryota|Harosa|Alveolata|Ciliophora|Intramacronucleata|Oligohymenophorea|Apostomatida|Colliniidae|Pseudocollinia
         Fam  Taxogroup        Taxo1           Last SizeMin SizeMax Cover
1 Alveolates Ciliophora Apostomatida    Colliniidae      10     100 Naked
2 Alveolates Ciliophora Apostomatida Pseudocollinia      21      39 Naked
      Shape Spicule    Symmetry    Polarity Colony Motility Chloroplast
1 Elongated       0 Asymetrical Heteropolar   None Attached           0
2 Elongated       0 Asymetrical Heteropolar   None Attached           0
  Plast_Origin   Ingestion             Behaviour Mutualistic_Host Symbiontic
1         None Myzocytotic Passive_Ambush_Feeder             <NA>   Parasite
2         None Myzocytotic Passive_Ambush_Feeder             <NA>   Parasite
  Symbiont_Location Host_Specialisation Symbiont_Specialisation
1              <NA>                <NA>                    <NA>
2              <NA>                <NA>                   Krill
  Prey_Specialisation Mucilage Chemical_Signal Nutrient_Afinity
1                <NA>     <NA>            <NA>             <NA>
2                <NA>     <NA>            <NA>             <NA>
  Oxygen_Tolerance Salinity Temperature Depth Toxygenity Benthic_Phase
1             <NA>     <NA>        <NA>  <NA>       <NA>          <NA>
2             <NA>     <NA>        <NA>  <NA>       <NA>          <NA>
  Longevity Cyst_Spore Ploidy Genome_Size
1        NA          1   <NA>          NA
2        NA          1   <NA>          NA

Now we can take the above information and fill in the other descriptive features.
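
One way to carry the trait values over is a small synonym lookup that translates our family name into the name used in the trait table, followed by a join. This is a sketch only. family_synonyms, traits_toy, and our_taxa are hypothetical tables, and the trait values are copied from the output above.

```r
library(dplyr)

# Hypothetical lookup from the family name in our database to the name
# used in the trait table (the pairing found by hand above).
family_synonyms <- data.frame(
  Family       = "Pseudocolliniidae",
  trait_family = "Colliniidae"
)

# Toy excerpt of the trait table (values copied from the output above).
traits_toy <- data.frame(
  Last       = c("Colliniidae", "Pseudocollinia"),
  SizeMin    = c(10, 21),
  SizeMax    = c(100, 39),
  Symbiontic = c("Parasite", "Parasite")
)

# Toy row standing in for one of our taxa.
our_taxa <- data.frame(Family = "Pseudocolliniidae", Genus = "Pseudocollinia")

# Translate the family name, then pull in the family-level traits.
our_taxa_traits <- our_taxa %>%
  left_join(family_synonyms, by = "Family") %>%
  left_join(traits_toy, by = c("trait_family" = "Last"))

our_taxa_traits
```

Keeping the synonyms in their own table records each manual name decision in one place, so it can be reviewed or extended later.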

3 Compare with PIDA database

This database explores microbe-microbe interactions.

Bjorbækmo MFM, Evenstad A, Røsæg LL, Krabberød AK, Logares R. The planktonic protist interactome: where do we stand after a century of research? ISME J 2020; 14: 544–559.

Reference: https://github.com/ramalok/PIDA/actions

Import PIDA database.

pida <- read.csv("input-data/PIDA_v_1.11_FORMATTED.csv")
head(pida)
  Observation.type Taxonomic.interaction Interaction Accession.org1
1                1            Prot - Bac        symb           <NA>
2                3            Prot - Bac        symb           <NA>
3                3            Prot - Bac        symb         Y19166
4                3            Prot - Bac        symb       KT023596
5                1            Prot - Arc        symb           <NA>
6                1            Prot - Arc        symb           <NA>
  Taxonomic.level.1..org1 Taxonomic.level.2..org1 Taxonomic.level.3..org1
1               Eukaryote               Alveolata          Dinoflagellata
2               Eukaryote               Alveolata              Ciliophora
3               Eukaryote               Alveolata              Ciliophora
4               Eukaryote                  Obazoa               Breviatea
5               Eukaryote               Alveolata              Ciliophora
6               Eukaryote               Alveolata              Ciliophora
   Genus.org1 Species.org1 Accession.org2 Taxonomic.level.1..org2
1  Histioneis      milneri           <NA>              Prokaryote
2 Euplotidium         itoi           <NA>              Prokaryote
3 Euplotidium    arenarium         Y19169              Prokaryote
4     Lenisia       limosa           <NA>              Prokaryote
5  Plagiopyla       nasuta           <NA>              Prokaryote
6     Metopus     striatus           <NA>              Prokaryote
  Taxonomic.level.2..org2 Taxonomic.level.3..org2              Genus.org2
1           Cyanobacteria            Cyanophyceae      unknown_cyanophyte
2         Verrucomicrobia unknown_verrucomicrobia unknown_verrucomicrobia
3         Verrucomicrobia unknown_verrucomicrobia unknown_verrucomicrobia
4          Proteobacteria   Epsilonproteobacteria             Arcrobacter
5                 Archaea         Methanobacteria        Methanobacterium
6                 Archaea         Methanobacteria        Methanobacterium
             Species.org2               Reference
1      unknown_cyanophyte             Gordon 1994
2 unknown_verrucomicrobia     Petroni et al. 2000
3 unknown_verrucomicrobia     Petroni et al. 2000
4                    <NA>      Hamann et al. 2016
5               formicium      Goosen et al. 1988
6               formicium van Bruggen et al. 1984
                                  Link        Database Interaction_ID
1 http://www.jstor.org/stable/24844780 MicroEcoSystems          ID_70
2               10.1073/pnas.030438197 MicroEcoSystems           ID_2
3               10.1073/pnas.030438197 MicroEcoSystems           ID_3
4                  10.1038/nature18297 MicroEcoSystems           ID_4
5                   10.1007/BF00425157 MicroEcoSystems           ID_5
6                   10.1007/BF00692703 MicroEcoSystems           ID_6
       Ecological.interaction    Habitat
1           nitrogen fixation     Marine
2                host defense    Marine 
3                host defense    Marine 
4 hydrogen-oxidizing symbiont     Marine
5       methanogenic symbiont Freshwater
6       methanogenic symbiont     Marine
                                                 Source
1     Water samples (Northern end of the Gulf of Aqaba)
2 tidal pools, Cape Peninsula, Republic of South Africa
3 tidal pools, Cape Peninsula, Republic of South Africa
4                          Intertidal zone (Wadden Sea)
5                        Sediment samples from aquarium
6                                collected from seaweed
# View(pida)

In the PIDA database, org1 corresponds to the host and org2 corresponds to the symbiont. See the README on the GitHub page for a complete description of the database.

The two queries below, on the taxonomic level 3 and level 2 columns for org2, return no rows.

pida %>% 
  filter(grepl("Pseudocolli", Taxonomic.level.3..org2)) #this doesn't reveal anything.
 [1] Observation.type        Taxonomic.interaction   Interaction            
 [4] Accession.org1          Taxonomic.level.1..org1 Taxonomic.level.2..org1
 [7] Taxonomic.level.3..org1 Genus.org1              Species.org1           
[10] Accession.org2          Taxonomic.level.1..org2 Taxonomic.level.2..org2
[13] Taxonomic.level.3..org2 Genus.org2              Species.org2           
[16] Reference               Link                    Database               
[19] Interaction_ID          Ecological.interaction  Habitat                
[22] Source                 
<0 rows> (or 0-length row.names)
pida %>% 
  filter(grepl("Pseudocolli", Taxonomic.level.2..org2))
 [1] Observation.type        Taxonomic.interaction   Interaction            
 [4] Accession.org1          Taxonomic.level.1..org1 Taxonomic.level.2..org1
 [7] Taxonomic.level.3..org1 Genus.org1              Species.org1           
[10] Accession.org2          Taxonomic.level.1..org2 Taxonomic.level.2..org2
[13] Taxonomic.level.3..org2 Genus.org2              Species.org2           
[16] Reference               Link                    Database               
[19] Interaction_ID          Ecological.interaction  Habitat                
[22] Source                 
<0 rows> (or 0-length row.names)

The query below, on the org2 genus column, also returns no rows.

pida %>% 
  filter(grepl("Pseudocolli", Genus.org2)) #this doesn't reveal anything.
 [1] Observation.type        Taxonomic.interaction   Interaction            
 [4] Accession.org1          Taxonomic.level.1..org1 Taxonomic.level.2..org1
 [7] Taxonomic.level.3..org1 Genus.org1              Species.org1           
[10] Accession.org2          Taxonomic.level.1..org2 Taxonomic.level.2..org2
[13] Taxonomic.level.3..org2 Genus.org2              Species.org2           
[16] Reference               Link                    Database               
[19] Interaction_ID          Ecological.interaction  Habitat                
[22] Source                 
<0 rows> (or 0-length row.names)

A free-text search of PIDA turns up only one close match, Pseudocohnilembus. This is a different organism, so we conclude that PIDA has no entry related to the species in our database.
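
Rather than querying one column at a time, the text search can be run over every character column in a single call. The sketch below assumes a toy table, pida_toy, with a few PIDA-style columns. The helper search_all_text() is hypothetical, and the second toy row is invented for illustration.

```r
library(dplyr)

# Toy table with PIDA-style columns; the second row is invented.
pida_toy <- data.frame(
  Genus.org1     = c("Histioneis", "toy_host", "Metopus"),
  Genus.org2     = c("unknown_cyanophyte", "Pseudocohnilembus", "Methanobacterium"),
  Interaction_ID = c("ID_a", "ID_b", "ID_c")
)

# Hypothetical helper: keep rows where any character column matches
# the pattern, ignoring case.
search_all_text <- function(df, pattern) {
  df %>%
    filter(if_any(where(is.character), ~ grepl(pattern, .x, ignore.case = TRUE)))
}

exact_hits <- search_all_text(pida_toy, "Pseudocolli")  # no matching rows
loose_hits <- search_all_text(pida_toy, "Pseudoco")     # catches the near match

exact_hits
loose_hits
```

Shortening the pattern widens the search, which is how a near match such as Pseudocohnilembus surfaces. Each hit still needs a manual check.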

4 Phylogenetic relatedness by 18S rRNA gene

4.1 Open tree of life

Explore the Open Tree of Life website. The specific branches we want to explore are covered there, and we plan to submit projects as well, so we need to be aware of the submission requirements.

Detailed information on how to submit can be found here. You’ll need a GitHub account for access.

We still need to figure out how to use rotl, the Open Tree of Life R package: https://cran.r-project.org/web/packages/rotl/index.html
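
As a possible starting point, the sketch below looks up a name with the package's name-resolution function, tnrs_match_names(). It assumes rotl is installed and the Open Tree of Life web service is reachable. The wrapper lookup_ott() is a hypothetical helper, and no particular result is implied for Pseudocollinia.

```r
# Assumes the rotl package is installed and the Open Tree of Life web
# service is reachable; this sketch needs a network connection.
lookup_ott <- function(taxa) {
  # tnrs_match_names() sends the names to the Open Tree name-resolution
  # service and returns a table of matched names and their OTT ids.
  rotl::tnrs_match_names(names = taxa)
}

# Wrapped in tryCatch so that a missing package, a failed connection, or
# an unmatched name gives NULL instead of stopping the script.
ott_hits <- tryCatch(lookup_ott("Pseudocollinia"), error = function(e) NULL)
ott_hits
```

If a match comes back, its OTT id is the handle the other rotl functions use to retrieve subtrees and taxonomy.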