Contributors | Affiliation | Role |
---|---|---|
Hartline, Daniel K. | University of Hawaii at Manoa (PBRC) | Principal Investigator |
Lenz, Petra H. | University of Hawaii at Manoa (PBRC) | Scientist, Contact |
Cieslak, Matthew C. | University of Hawaii at Manoa (PBRC) | Data Manager |
Merchant, Lynne M. | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager |
These data are further described in the following publications:
Hartline, D. K., Cieslak, M. C., Castelfranco, A. M., Lieberman, B., Roncalli, V., & Lenz, P. H. (2023). De novo transcriptomes of six calanoid copepods (Crustacea): a resource for the discovery of novel genes. Scientific Data, 10(1). https://doi.org/10.1038/s41597-023-02130-1
Roncalli, V., Niestroy, J., Cieslak, M. C., Castelfranco, A. M., Hopcroft, R. R., & Lenz, P. H. (2022). Physiological acclimatization in high‐latitude zooplankton. Molecular Ecology, 31(6), 1753–1765. Portico. https://doi.org/10.1111/mec.16354
Roncalli, V., Cieslak, M. C., Germano, M., Hopcroft, R. R., & Lenz, P. H. (2019). Regional heterogeneity impacts gene expression in the subarctic zooplankter Neocalanus flemingeri in the northern Gulf of Alaska. Communications Biology, 2(1). https://doi.org/10.1038/s42003-019-0565-5
Sample collection: Zooplankton were collected from depth (2015, 2017, 2018, and 2019) at two stations in Prince William Sound: “PWS2” (Lat: 60°32′. N, Long: -147°48.2′ W) and “PWS3” (Lat: 60°40.0′ N, Long: -147°40.0′ W), and at the Gulf station “GAK1” (Lat: 59°50.7′ N, Long: -149°28′ W). Collection date, station and depth stratum for each individual are given in Hartline et al. (2023) and Roncalli et al. (2019). Zooplankton collections were made using vertical net tows with either a QuadNet with two 150 µm and two 53 µm mesh nets (April and May collections), or a multiple opening and closing plankton net (0.25 m² cross-sectional area; 150 µm mesh nets; Multinet-Midi, Hydro-Bios; September collections). Zooplankton samples were diluted, and copepods were sorted under a dissection microscope to select individuals from the target species. Briefly, live and undamaged individuals were identified and staged using morphological criteria and preserved in RNALater Stabilization Reagent. Preserved copepods were first frozen at −20°C during the cruises, and then transferred to −80°C until further processing. Species identifications were confirmed through the COI sequence in the assembled transcriptomes.
Total RNA extraction, library construction, RNA sequencing and quality control: For each target species, total RNA was extracted from individuals using the QIAGEN RNeasy Plus Mini Kit (catalog # 74134) in combination with a Qiashredder column (catalog # 79654). Selection for sequencing was based on high RNA yields and purity of extraction (RIN>8). The final list included pre-adults (CV) for Neocalanus flemingeri (n=3), Neocalanus cristatus (n=1), Calanus marshallae (n=2), Eucalanus bungii (n=1), an adult male (developmental stage CVI) for Neocalanus plumchrus (n=1) and an adult female for Metridia pacifica (n=1). Total RNA was shipped on dry ice to the Georgia Genomics Bioinformatics Core (https://dna.uga.edu) for RNA-Seq. There, double-stranded cDNA libraries (KAPA Stranded mRNA-Seq Kit with KAPA mRNA Capture Beads [cat #KK8421]) from each individual were multiplexed and sequenced using an Illumina Next-Seq 500 instrument (High-Output Flow Cell, 150 bp, paired end). Quality of each RNA-Seq library was reviewed with the FastQC software. From each RNA-Seq library, low quality reads were removed using FASTQ Toolkit (v. 2.2.5 within BaseSpace). Illumina adaptors, reads <50 bp long, reads with an average Phred score <30 and the first 12 bp from each read were removed from each library. The same workflow was applied to all nine datasets.
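The read-level filters described above can be sketched in Python as follows. This is a minimal illustration only, not the FASTQ Toolkit implementation. The function names are hypothetical, quality strings are assumed to be Phred+33 encoded, trimming is assumed to happen before the length and quality checks (the text does not state the order), and adaptor removal is omitted.

```python
from typing import Optional, Tuple


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_read(
    sequence: str,
    quality: str,
    trim_5p: int = 12,        # first 12 bp removed from each read
    min_length: int = 50,     # reads <50 bp are discarded
    min_mean_q: float = 30,   # reads with average Phred <30 are discarded
) -> Optional[Tuple[str, str]]:
    """Return the trimmed (sequence, quality) pair, or None if the read is discarded.

    Assumption: the 5' trim is applied first, then the length and
    quality thresholds are evaluated on the trimmed read.
    """
    sequence, quality = sequence[trim_5p:], quality[trim_5p:]
    if len(sequence) < min_length:
        return None
    if mean_phred(quality) < min_mean_q:
        return None
    return sequence, quality
```

Under these assumptions, a 150 bp read with uniformly high quality is kept as a 138 bp read, while a read that falls below 50 bp after trimming is dropped.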
De novo assembly, mapping, core-gene statistics: Individual de novo transcriptomes were generated from each RNA-Seq dataset at the National Center for Genome Analysis Support's (NCGAS; Indiana University, Bloomington, IN, USA) Mason Linux cluster using Trinity software (v. 2.4.0, except N. plumchrus, v. 2.0.6). Initial evaluation involved self-mapping of reads against the respective de novo assembly using Bowtie2 software (v. 2.3.5.1). Completeness of each de novo assembly was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) software by searching each assembly for the presence of eukaryote “core” genes using the Arthropoda database as reference (BUSCO version 5.3.2, dataset: arthropoda_odb10 [2020-09-10, 90 genomes, 1,013 BUSCOs]). RNA-Seq data and transcriptome shotgun assemblies (TSAs) have been deposited with links to BioProject accession numbers PRJNA496596 and PRJNA662858 in the NCBI (National Center for Biotechnology Information) BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/).
Functional annotation: Assemblies were functionally annotated against the NCBI Swiss-Prot protein and UniProt databases. Initial annotations were obtained by using the BLASTx algorithm on a local BLAST webserver with a Beowulf cluster using the Swiss-Prot protein database (downloaded February 2021) as reference and a threshold E-value of 10⁻⁵. Transcripts with BLAST annotations were then searched against the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases using UniProt.
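The thresholding step can be illustrated with a short Python sketch. It is not the authors' pipeline: the function name and the (query, subject, E-value) tuple layout are hypothetical, the cut-off is assumed to be inclusive, and the "top hit" is assumed here to be the hit with the lowest E-value. Transcripts without a qualifying hit receive the fill value '#N/A' used in the dataset.

```python
E_VALUE_CUTOFF = 1e-5   # threshold E-value stated in the methods
NO_HIT = "#N/A"         # fill value used in the dataset for "no result found"


def top_hits(hits, query_ids):
    """Map each query to its best hit at or below the E-value cut-off.

    hits: iterable of (query_id, subject_id, evalue) tuples (hypothetical layout).
    query_ids: all transcript identifiers, including those without any hit.
    Assumption: the best hit is the one with the lowest E-value.
    """
    best = {}
    for query_id, subject_id, evalue in hits:
        if evalue > E_VALUE_CUTOFF:
            continue  # hit does not meet the threshold
        if query_id not in best or evalue < best[query_id][1]:
            best[query_id] = (subject_id, evalue)
    return {query_id: best.get(query_id, NO_HIT) for query_id in query_ids}
```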
# Processing of submitted annotated individual transcriptome files and metadata
* Submitted annotated transcriptome files
Filename: n-flem_CV2015_GAK1_AccAnnots_multispp_Aug23_q-fix-2.csv
Description: Neocalanus flemingeri, annotated transcriptome #1
Filename: Nf2018_CV_Acc&Annot-single-rev_quoted.csv
Description: Neocalanus flemingeri, annotated transcriptome #2
Filename: Nf2019_CV-PWS2_Acc&Annot.csv
Description: Neocalanus flemingeri, annotated transcriptome #3
Filename: Np2015_maleR1_Acc&Annot.csv
Description: Neocalanus plumchrus, annotated transcriptome, adult male
Filename: Nc2017_CV_Acc&Annot.csv
Description: Neocalanus cristatus, annotated transcriptome
Filename: Cm2017_CV_Acc&Annot.csv
Description: Calanus marshallae, annotated transcriptome #1
Filename: Cm2018_CV_Acc&Annot.csv
Description: Calanus marshallae, annotated transcriptome #2
Filename: Eb2017_CV_Acc&Annot.csv
Description: Eucalanus bungii, annotated transcriptome
Filename: Mp2017_AF_Acc&Annot.csv
Description: Metridia pacifica, annotated transcriptome, adult female
* The metadata table with NCBI accessions was created using information from the submitter and by gathering accession numbers from NCBI for the BioProjects.
Metadata from Submitter via email:
File Name: Summary Table for the Annotated Transcriptomes in BCO.docx
Title: Summary table from Lenz of metadata for transcriptome annotation files
Table columns
Species: Species of the collected organisms
Annot Filename: Name of the submitted annotated transcriptomes file
Stage/Collection information: Life Stage, Sex if indicated, Collection date, station, depth range in meters
NCBI TSA#: NCBI TSA project accession
Fixes: Changed GHLB01000000 to GHLB00000000 because a TSA project master record accession consists of the four-letter prefix of the TSA project followed by eight zeroes.
* Clarifying metadata values with submitter
In the corresponding NCBI BioSample records, the depth listed does not fall within the collection depth range listed in the file “Summary Table for the Annotated Transcriptomes in BCO.docx”. To clarify this difference, I emailed the submitter a spreadsheet containing both the BioSample depths and the depth ranges given. According to the correspondence, the BioSample depth refers to the depth of the water column at the station's latitude and longitude, not to the collection depth range of the plankton net. The BioSample depth values are therefore not used in the metadata table.
File name: transcriptomes_species_date_station_depth_table-PL-1.xlsx
Title: Submitter modified version of DM created metadata file
# Using the BCO-DMO data processor laminar, load the submitted annotated transcriptome files and the DM-created metadata table. Keep the fill value '#N/A', which stands for 'no result found'.
* Data Manager (DM) created metadata table used:
File name: transcriptome_annotations_metadata_table.csv
Title: DM created metadata table of supporting information for each transcriptome annotation
Station latitude and longitude values were retrieved from the 'Methods & Sampling' section of the submission tool. They were then converted to decimal degrees with 4-digit precision.
A column TSA_project_accession was included to join the metadata table with the annotated transcriptome files later in processing, and is removed in the final metadata table.
From the National Center for Biotechnology Information (NCBI), various accession numbers, titles, and urls were located for each annotated transcriptome file.
Table columns:
Species, Station, Latitude, Longitude, date, Depth_range, Maximum_depth, Life_stage, Sex, TSA_project_accession, TSA_master_accession, SRA_accession, Experiment_title, Experiement_accession, BioSample, Sample_accession, Study_title, Study_accession, BioProject
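The conversion of station coordinates from degrees and decimal minutes to decimal degrees can be sketched as follows. This is an illustrative helper with a hypothetical name, not the tool actually used. It assumes the sign is determined by the hemisphere letter alone, so a leading minus sign on a "W" longitude (as written in the methods) does not flip the sign twice.

```python
def dm_to_decimal(degrees: float, minutes: float, hemisphere: str) -> float:
    """Convert degrees and decimal minutes to signed decimal degrees.

    South and West are negative, matching the Latitude and Longitude
    parameter descriptions. The result is rounded to 4 decimal places.
    """
    value = abs(degrees) + minutes / 60.0
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return round(value, 4)
```

For example, 60°40.0′ N converts to 60.6667 and 147°40.0′ W converts to -147.6667.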
- Create a new column, TSA_project_accession, in each annotated transcriptome file to serve as the join key with the metadata table. Regular expressions are used to extract the first 4 letters of each TSA accession number, to which 8 zeros are then appended. This is the NCBI format for a TSA project accession number.
- Join each annotated transcriptome file with the metadata table on the key TSA_project_accession.
- Remove the parameter TSA_project_accession, since it was only used for the Join process. The TSA project accession number with the version is already recorded as the parameter TSA_master_accession.
- Add a suffix ‘.1’ to each TSA accession number to indicate version 1.
- Rename Accession# to Genbank_accession which is the TSA accession value with a version.
- Rename parameters in each annotated transcriptome file by removing commas and parentheses and replacing spaces with underscores, to follow BCO-DMO parameter naming conventions.
- Reorder the parameters so that experiment metadata parameters are at the beginning of each annotated transcriptome file and most NCBI accession numbers are at the end.
- Using a second laminar processor pipeline, load in the individual annotated transcriptome files that were created by laminar processing above and concatenate all the individual annotated transcriptome files into the primary dataset file.
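The key-derivation, join, and renaming steps listed above can be sketched in plain Python. This is not the laminar pipeline itself: the function names and the dictionary-per-row representation are assumptions made for illustration, and the reordering and concatenation steps are omitted.

```python
import re


def tsa_project_accession(accession: str) -> str:
    """First 4 letters of a TSA accession followed by 8 zeros."""
    match = re.match(r"[A-Za-z]{4}", accession)
    if match is None:
        raise ValueError(f"unexpected TSA accession: {accession!r}")
    return match.group(0).upper() + "0" * 8


def clean_parameter_name(name: str) -> str:
    """Drop commas and parentheses, then replace spaces with underscores."""
    return re.sub(r"[,()]", "", name).strip().replace(" ", "_")


def process_rows(rows, metadata):
    """Join annotation rows to the metadata table and rename parameters.

    rows: list of dicts, one per transcript, with an 'Accession#' column.
    metadata: list of dicts, each with a 'TSA_project_accession' key.
    """
    by_key = {entry["TSA_project_accession"]: entry for entry in metadata}
    processed = []
    for row in rows:
        accession = row["Accession#"]
        meta = dict(by_key[tsa_project_accession(accession)])
        meta.pop("TSA_project_accession")  # used only for the join
        # Accession# becomes Genbank_accession with the version-1 suffix.
        record = {"Genbank_accession": accession + ".1"}
        for name, value in row.items():
            if name != "Accession#":
                record[clean_parameter_name(name)] = value
        record.update(meta)
        processed.append(record)
    return processed
```

With this sketch, any accession beginning with GHLB maps to the project accession GHLB00000000, which is the join key format described above.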
Parameter | Description | Units |
seq_id | Sequence identifier assigned by the Trinity software; assembled sequences are named hierarchically, grouping sequences by similarity | unitless |
Genbank_accession | NCBI (National Center for Biotechnology Information) Accession number for nucleotide sequence in the Transcriptome Shotgun Assembly (TSA) Sequence Database. The accession number is a unique identifier assigned to a record in the sequence database GenBank at NCBI. It is of the format [alphabetical prefix][series of digits].[version]. A change in the record is tracked by an integer extension of the accession number, an Accession.version identifier. The initial version of a sequence has the extension “.1”. | unitless |
Genbank_accession_url | URL of the GenBank accession | unitless |
TSA_Master_Accession | NCBI (National Center for Biotechnology Information) Accession number for the Transcriptome Shotgun Assembly (TSA) master record. The accession number is a unique identifier assigned to a master record in the database GenBank at NCBI. It is of the format [alphabetical prefix][00000000].[version]. A change in the record is tracked by an integer extension of the accession number, an Accession.version identifier. The initial version of a sequence has the extension “.1”. | unitless |
Species | A taxonomic binomial that consists of a genus name followed by the species name | unitless |
Station | Station identifier | unitless |
Latitude | Sampling location latitude, south is negative | decimal degrees |
Longitude | Sampling location longitude, west is negative | decimal degrees |
Collection_date | Collection date of organism | unitless |
Depth_range | Collection depth range | meters (m) |
Maximum_depth | Maximum depth of collection area | meters (m) |
Life_stage | Organism life history stage | unitless |
Sex | Sex of the organism | unitless |
Entry | UniProtKB entry identifying the top BLAST (Basic Local Alignment Search Tool) hit for the assembled nucleotide sequence; searches were conducted using the blastx algorithm; #N/A indicates that there was no positive BLAST hit that met the E-value threshold | unitless |
Entry_name | UniProt entry name for the top hit, #N/A = no hit | unitless |
evalue | Expect value (E-value): the number of hits of similar quality (score) expected to be found just by chance; cut-off value used for the annotation: E-value = 10⁻⁵; #N/A = no hit | unitless |
Protein_names | Protein names, #N/A = no hit | unitless |
Gene_names | Gene name based on the reference genome of the top hit, #N/A = no hit | unitless |
Organism | Species name of top hit sequence, #N/A = no hit | unitless |
Cross_reference_KEGG | KEGG identification based on the cross-reference of protein annotation and the Kyoto Encyclopedia of Genes and Genomes database, #N/A = no hit | unitless |
Gene_ontology_IDs | GO terms based on cross-reference of protein annotation to the Gene Ontology Resource for functional identification as to Biological Process (BP), Cellular Component (CC) and Molecular Function (MF), proteins are typically involved in multiple processes, #N/A = no hit | unitless |
Gene_ontology_GO | Identification of GO term with a functional description, #N/A = no hit | unitless |
Gene_ontology_biological_process | Gene ontology terms and descriptions associated with Biological Process, #N/A = no hit | unitless |
Gene_ontology_cellular_component | Gene ontology terms and descriptions associated with Cellular Component, #N/A = no hit | unitless |
Gene_ontology_molecular_function | Gene ontology terms and descriptions associated with Molecular Function, #N/A = no hit | unitless |
Dataset-specific Instrument Name | QuadNet |
Generic Instrument Name | Plankton Net |
Dataset-specific Description | Two 150 µm and two 53 µm mesh nets |
Generic Instrument Description | A Plankton Net is a generic term for a sampling net that is used to collect plankton. It is used only when detailed instrument documentation is not available. |
Dataset-specific Instrument Name | Multinet-Midi, Hydro-Bios |
Generic Instrument Name | MultiNet |
Dataset-specific Description | Hydro-Bios Multinet-Midi with a 0.25 m² cross-sectional area and 150 µm mesh |
Generic Instrument Description | The MultiNet© Multiple Plankton Sampler is designed as a sampling system for horizontal and vertical collections in successive water layers. Equipped with 5 or 9 net bags, the MultiNet© can be delivered in 3 sizes (apertures): Mini (0.125 m²), Midi (0.25 m²) and Maxi (0.5 m²). The system consists of a shipboard Deck Command Unit and a stainless steel frame to which 5 (or 9) net bags are attached by means of zippers to canvas. The net bags are opened and closed by means of an arrangement of levers that are triggered by a battery powered Motor Unit. The commands for actuation of the net bags are given via single or multi-conductor cable between the Underwater Unit and the Deck Command Unit. Although horizontal collections typically use a mesh size of 300 microns, mesh sizes from 100 to 500 microns may also be used. Vertical collections are also common. The shipboard Deck Command Unit displays all relevant system data, including the actual operating depth of the net system. |
Dataset-specific Instrument Name | Illumina Next-Seq 500 |
Generic Instrument Name | Automated DNA Sequencer |
Dataset-specific Description | Desktop sequencer |
Generic Instrument Description | General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step. |
Dataset-specific Instrument Name | Dissection microscope |
Generic Instrument Name | Microscope - Optical |
Dataset-specific Description | An optical microscope variant |
Generic Instrument Description | Instruments that generate enlarged images of samples using the phenomena of reflection and absorption of visible light. Includes conventional and inverted instruments. Also called a "light microscope". |
Website | |
Platform | R/V Tiglax |
Report | |
Start Date | 2018-09-11 |
End Date | 2018-09-25 |
Description | NGA LTER Fall cruise |
Website | |
Platform | R/V Tiglax |
Report | |
Start Date | 2019-04-26 |
End Date | 2019-05-08 |
Description | NGA LTER Summer cruise |
Website | |
Platform | R/V Tiglax |
Start Date | 2015-09-09 |
End Date | 2015-09-21 |
Description | Latitude North boundary (decimal degrees): 60.5298
Latitude South boundary (decimal degrees): 57.7747
Longitude West Boundary (decimal degrees): -149.4755
Longitude East Boundary (decimal degrees): -147.5105 |
Website | |
Platform | R/V Tiglax |
Start Date | 2017-09-09 |
End Date | 2017-09-22 |
Description | Latitude North boundary (decimal degrees): 60.6753
Latitude South boundary (decimal degrees): 57.7923
Longitude West Boundary (decimal degrees): -149.4853
Longitude East Boundary (decimal degrees): -147.503 |
Website | |
Platform | R/V Tiglax |
Start Date | 2015-05-05 |
End Date | 2015-05-11 |
NSF Award Abstract:
The sub-arctic Pacific sustains major fisheries with nearly all commercially important species depending either directly or indirectly on lipid-rich copepods (Neocalanus flemingeri, Neocalanus plumchrus, Neocalanus cristatus and Calanus marshallae). In turn, these species depend on a short-lived spring algal bloom for growth and the accumulation of lipid stores in order to complete an annual life cycle that includes a period of dormancy. The intellectual thrust of this project measures how the timing and magnitude of algal blooms affect preparation for dormancy using a combination of field and experimental observations. The Northern Gulf of Alaska - with four calanid species that experience dormancy, steep environmental gradients, well-described phytoplankton bloom dynamics, and a concurrent NSF-LTER program - provides an unusual opportunity to identify the factors that affect dormancy preparation. Education and outreach plans are integrated with the research. Educational efforts focus on interdisciplinary opportunities for undergraduate, graduate and post-doctoral trainees. The project will generate content for existing graduate and undergraduate courses. U. of Alaska Fairbanks and U. Hawaii at Manoa are Alaska Native and Native Hawaiian Serving Institutions, and students from these groups will be recruited to participate in the project. Because fishing is a major industry in the Gulf of Alaska, outreach will communicate the role copepods play in marine ecosystems using the concept of a dynamic food web tied to production cycles.
Diapause (dormancy) and the accompanying accumulation of lipids in copepods have been identified as key drivers in high latitude ecosystems that support economically important fisheries, including those of the Gulf of Alaska. While the disappearance of lipid-rich copepods has been linked to severe declines in fish stocks, little is known about the environmental conditions that are required for the successful completion of the copepod's life cycle. A physiological profiling approach that measures relative gene expression will be used to test two alternative hypotheses: the lipid accumulation window hypothesis, which holds that individuals enter diapause only after they have accumulated sufficient lipid stores, and the developmental program hypothesis, which holds that once the diapause program is activated, progression occurs independent of lipid accumulation. The specific objectives are: 1) determine the effect of food levels during N. flemingeri copepodite stages on progression towards diapause using multiple physiological and developmental markers; 2) characterize the seasonal changes in the physiological profile of N. flemingeri across environmental gradients and across years; 3) compare physiological profiles across co-occurring calanid species (N. flemingeri, Neocalanus plumchrus, Neocalanus cristatus and Calanus marshallae); and 4) estimate the reproductive potential of the overwintering populations of N. flemingeri. The broader scientific significance includes the acquisition of new genomic data and molecular resources that will be made publicly available through established data repositories, and the development of new tools for routinely obtaining physiological profiles of copepods.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NOTE: Petra Lenz is a former Principal Investigator (PI) and Andrew Christie is a former Co-Principal Investigator (Co-PI) on this project (award #1756767). Daniel Hartline is the PI listed for the award #1756767 and is now a former Co-PI on this project.
Funding Source | Award |
---|---|
NSF Division of Ocean Sciences (NSF OCE) | |
NSF Division of Ocean Sciences (NSF OCE) | |
NSF Division of Ocean Sciences (NSF OCE) | |