Annotated de novo transcriptomes generated from six co-occurring species of calanoid copepods from the R/V Tiglax in the Gulf of Alaska from 2015 to 2019

Website: https://www.bco-dmo.org/dataset/908689
Data Type: experimental
Version: 1
Version Date: 2023-09-19

Project
» Collaborative Research: Molecular profiling of the ecophysiology of dormancy induction in calanid copepods of the Northern Gulf of Alaska LTER site (Diapause preparation)
Contributors | Affiliation | Role
Hartline, Daniel K. | University of Hawaii at Manoa (PBRC) | Principal Investigator
Lenz, Petra H. | University of Hawaii at Manoa (PBRC) | Scientist, Contact
Cieslak, Matthew C. | University of Hawaii at Manoa (PBRC) | Data Manager
Merchant, Lynne M. | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager

Abstract
The dataset includes the annotation files of nine high-quality de novo transcriptomes generated from shotgun assemblies of short-sequence reads. The species are ecologically important members of subarctic North Pacific marine zooplankton communities. The nine de novo assemblies comprise one generated several years ago and eight new ones, together covering six co-occurring species of calanoid copepods in the Gulf of Alaska. They include the first published transcriptomes for Neocalanus plumchrus, Neocalanus cristatus, Eucalanus bungii and Metridia pacifica, as well as three for Neocalanus flemingeri and two for Calanus marshallae. Total RNA from single individuals was used to construct cDNA libraries that were sequenced on an Illumina Next-Seq platform. Short-sequence reads were assembled with Trinity software, and the resulting transcripts were annotated against the SwissProt database, with additional functional annotation using gene ontology terms and enzyme function. The annotation files are the first published for these species. Using the annotations, the integrated database supports quantitative inter- and intra-species comparisons of gene expression patterns across biological processes.


Coverage

Spatial Extent: N:60.667 E:-147.667 S:59.845 W:-149.467
Temporal Extent: 2015-05-10 - 2019-04-30

Methods & Sampling

Sample collection: Zooplankton were collected from depth in 2015, 2017, 2018, and 2019 at two stations in Prince William Sound, “PWS2” (Lat: 60°32′. N, Long: 147°48.2′ W, depth 798 m) and “PWS3” (Lat: 60°40.0′ N, Long: 147°40.0′ W, depth 742 m), and at the Gulf station “GAK1” (Lat: 59°50.7′ N, Long: 149°28′ W, depth 264 m). Collection date, station and depth stratum for each individual are given in Hartline et al. (2023) and Roncalli et al. (2019). Zooplankton collections were made using vertical net tows with either a QuadNet with two 150 µm and two 53 µm mesh nets (April and May collections), or a multiple opening and closing plankton net (0.25 m² cross-sectional area; 150 µm mesh nets; Multinet-Midi, Hydro-Bios; September collections). Zooplankton samples were diluted, and copepods were sorted under a dissection microscope to select individuals from the target species. Live, undamaged individuals were identified and staged using morphological criteria and preserved in RNALater Stabilization Reagent. Preserved copepods were first frozen at −20°C during the cruises and then transferred to −80°C until further processing. Species identifications were confirmed using the COI sequence in the assembled transcriptomes.
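The station positions above are given in degrees and decimal minutes, while the Spatial Extent is given in decimal degrees. The following is a minimal sketch of the conversion, not part of the original workflow; the function name dm_to_decimal is hypothetical. Applied to PWS3 and GAK1, the results round to the N, E, S and W bounds listed under Coverage (60.667, -147.667, 59.845, -149.467).

```python
def dm_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and decimal minutes to signed decimal degrees."""
    value = degrees + minutes / 60.0
    # South and West hemispheres are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# Station coordinates as stated in the sample collection text.
pws3 = (dm_to_decimal(60, 40.0, "N"), dm_to_decimal(147, 40.0, "W"))
gak1 = (dm_to_decimal(59, 50.7, "N"), dm_to_decimal(149, 28.0, "W"))

print("PWS3:", [round(v, 3) for v in pws3])
print("GAK1:", [round(v, 3) for v in gak1])
```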

Total RNA extraction, library construction, RNA sequencing and quality control: For each target species, total RNA was extracted from individuals using the QIAGEN RNeasy Plus Mini Kit (catalog # 74134) in combination with a Qiashredder column (catalog # 79654). Individuals were selected for sequencing based on high RNA yield and purity of extraction (RIN>8). The final list included pre-adults (CV) for Neocalanus flemingeri (n=3), Neocalanus cristatus (n=1), Calanus marshallae (n=2) and Eucalanus bungii (n=1), an adult male (developmental stage CVI) for Neocalanus plumchrus (n=1) and an adult female for Metridia pacifica (n=1). Total RNA was shipped on dry ice to the Georgia Genomics and Bioinformatics Core (https://dna.uga.edu) for RNA-Seq. There, double-stranded cDNA libraries (KAPA Stranded mRNA-Seq Kit, with KAPA mRNA Capture Beads [cat #KK8421]) from each individual were multiplexed and sequenced using an Illumina Next-Seq 500 instrument (High-Output Flow Cell, 150 bp, paired end). The quality of each RNA-Seq library was reviewed with the FastQC software (Andrews, 2010). Low-quality reads were removed from each RNA-Seq library using FASTQ Toolkit (v. 2.2.5 within BaseSpace): Illumina adaptors, reads <50 bp long, reads with an average Phred score <30, and the first 12 bp of each read were removed. The same workflow was applied to all nine datasets.
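The read-level filters described above (trimming the first 12 bp, minimum length 50 bp, minimum average Phred score 30) can be illustrated with a small sketch. This is not the FASTQ Toolkit implementation. The function names are hypothetical, Phred+33 quality encoding is assumed, and the order of operations (trim first, then test length and mean quality on the trimmed read) is an assumption, since the text does not state it.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def trim_and_filter(seq, quality, trim=12, min_len=50, min_mean_q=30):
    """Return the trimmed (seq, quality) pair, or None if the read is rejected.

    Assumed order: remove the first `trim` bases, then apply the length
    and mean-quality thresholds to what remains.
    """
    seq, quality = seq[trim:], quality[trim:]
    if not seq or len(seq) < min_len:
        return None
    if mean_phred(quality) < min_mean_q:
        return None
    return seq, quality


# Toy reads: "I" encodes Q40 and "#" encodes Q2 under Phred+33.
good = trim_and_filter("A" * 70, "I" * 70)
too_short = trim_and_filter("A" * 55, "I" * 55)
low_quality = trim_and_filter("A" * 70, "#" * 70)
print(good is not None, too_short, low_quality)
```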


Data Processing Description

De novo assembly, mapping, core-gene statistics: Individual de novo transcriptomes were generated from each RNA-Seq dataset on the Mason Linux cluster at the National Center for Genome Analysis Support (NCGAS; Indiana University, Bloomington, IN, USA) using Trinity software (v. 2.4.0, except N. plumchrus, v. 2.0.6). Initial evaluation involved self-mapping of reads against the respective de novo assembly using Bowtie2 software (v. 2.3.5.1). Completeness of each de novo assembly was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) software by searching each assembly for the presence of eukaryote “core” genes using the Arthropoda database as reference (BUSCO version 5.3.2, dataset: arthropoda_odb10 (2020-09-10, 90 genomes, 1,013 BUSCOs)). RNA-Seq data and transcriptome shotgun assemblies (TSAs) have been deposited, with links to BioProject accession numbers PRJNA496596 and PRJNA662858, in the NCBI (National Center for Biotechnology Information) BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/).
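BUSCO reports completeness as a one-line summary of the percentages of complete (C), single-copy (S), duplicated (D), fragmented (F) and missing (M) orthologs out of n searched. The sketch below parses that one-line notation so values can be compared across the nine assemblies. The function name is hypothetical, and the example percentages are made up for illustration; they are not results from this dataset. Only n = 1013 comes from the text.

```python
import re

# One-line notation used in BUSCO v5 short summaries.
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO one-line summary into a dict of percentages plus n."""
    match = BUSCO_LINE.search(line)
    if match is None:
        raise ValueError("not a BUSCO one-line summary: %r" % line)
    result = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


# Hypothetical values, for illustration only.
example = "C:90.0%[S:60.0%,D:30.0%],F:4.0%,M:6.0%,n:1013"
summary = parse_busco_summary(example)
print(summary)
```

In a de novo transcriptome, a high duplicated fraction often reflects multiple isoforms per gene rather than true gene duplication, so C is usually the more informative figure for completeness.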

Functional annotation: Assemblies were functionally annotated against the NCBI Swiss-Prot protein and UniProt databases. Initial annotations were obtained by using the BLASTx algorithm on a local BLAST webserver with a Beowulf cluster, using the Swiss-Prot protein database (downloaded February 2021) as reference and a threshold E-value of 10⁻⁵. Transcripts with BLAST annotations were then searched against the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases using UniProt.
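Applying the E-value threshold to BLASTx results can be sketched as follows. The sketch assumes the hits are available in the standard 12-column tabular BLAST output and that one best hit per transcript (highest bit score) is retained; neither detail is stated in the text, and the function name and the transcript and protein identifiers in the example are hypothetical.

```python
import csv
import io

# Standard column order of tabular BLAST output (12 default fields).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text, max_evalue=1e-5):
    """Keep, per query, the highest-bitscore hit with E-value <= max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical hits, for illustration only.
example = (
    "t1\tsp|P00001|PROT_A\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t200\n"
    "t1\tsp|P00002|PROT_B\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-10\t90\n"
    "t2\tsp|P00003|PROT_C\t35.0\t50\t32\t1\t1\t150\t1\t50\t0.01\t30\n"
)
hits = best_hits(example)
print({query: hit["sseqid"] for query, hit in hits.items()})
```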



Related Publications

Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Software
Hartline, D. K., Cieslak, M. C., Castelfranco, A. M., Lieberman, B., Roncalli, V., & Lenz, P. H. (2023). De novo transcriptomes of six calanoid copepods (Crustacea): a resource for the discovery of novel genes. Scientific Data, 10(1). https://doi.org/10.1038/s41597-023-02130-1
Results
Roncalli, V., Cieslak, M. C., Germano, M., Hopcroft, R. R., & Lenz, P. H. (2019). Regional heterogeneity impacts gene expression in the subarctic zooplankter Neocalanus flemingeri in the northern Gulf of Alaska. Communications Biology, 2(1). https://doi.org/10.1038/s42003-019-0565-5
Methods
Roncalli, V., Niestroy, J., Cieslak, M. C., Castelfranco, A. M., Hopcroft, R. R., & Lenz, P. H. (2022). Physiological acclimatization in high‐latitude zooplankton. Molecular Ecology, 31(6), 1753–1765. https://doi.org/10.1111/mec.16354
Results


Related Datasets

IsDerivedFrom
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GHLB00000000.1, TSA: Neocalanus flemingeri, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GHLB00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJAO00000000.1, TSA: Metridia pacifica isolate Monoisolate, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJAO00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJRF00000000.1, TSA: Calanus marshallae isolate Monoisolate, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJRF00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJRG00000000.1, TSA: Eucalanus bungii isolate Monoisolate, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJRG00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJRH00000000.1, TSA: Neocalanus cristatus isolate Monoisolate, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJRH00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJRL00000000.1, TSA: Calanus marshallae isolate Monoisolate, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJRL00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJRT00000000.1, TSA: Neocalanus flemingeri, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJRT00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJRU00000000.1, TSA: Neocalanus plumchrus isolate Monoisolate, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJRU00000000.1
Nucleotide [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – . Accession No. GJSD00000000.1, TSA: Neocalanus flemingeri isolate Monoisolate, transcriptome shotgun assembly; [cited 2023 Oct 3]. Available from: https://www.ncbi.nlm.nih.gov/nuccore/GJSD00000000.1


Parameters

Parameters for this dataset have not yet been identified



Instruments

Dataset-specific Instrument Name
QuadNet
Generic Instrument Name
Plankton Net
Dataset-specific Description
Two 150 µm and two 53 µm mesh nets
Generic Instrument Description
A Plankton Net is a generic term for a sampling net that is used to collect plankton. It is used only when detailed instrument documentation is not available.

Dataset-specific Instrument Name
Multinet-Midi, Hydro-Bios
Generic Instrument Name
MultiNet
Dataset-specific Description
Hydro-Bios Multinet-Midi with a 0.25 m² cross-sectional area and 150 µm mesh
Generic Instrument Description
The MultiNet© Multiple Plankton Sampler is designed as a sampling system for horizontal and vertical collections in successive water layers. Equipped with 5 or 9 net bags, the MultiNet© can be delivered in 3 sizes (apertures): Mini (0.125 m²), Midi (0.25 m²) and Maxi (0.5 m²). The system consists of a shipboard Deck Command Unit and a stainless steel frame to which 5 (or 9) net bags are attached by means of zippers to canvas. The net bags are opened and closed by means of an arrangement of levers that are triggered by a battery-powered Motor Unit. The commands for actuation of the net bags are given via single or multi-conductor cable between the Underwater Unit and the Deck Command Unit. Although horizontal collections typically use a mesh size of 300 microns, mesh sizes from 100 to 500 microns may also be used. Vertical collections are also common. The shipboard Deck Command Unit displays all relevant system data, including the actual operating depth of the net system.

Dataset-specific Instrument Name
Illumina Next-Seq 500
Generic Instrument Name
Automated DNA Sequencer
Dataset-specific Description
Desktop sequencer that can carry out whole transcriptome sequencing
Generic Instrument Description
General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary, or pyrosequencing, methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemiluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base at a time, and detecting which base was actually added at each step.



Deployments

TXF18

Platform
R/V Tiglax
Start Date
2018-09-11
End Date
2018-09-25
Description
NGA LTER Fall cruise

TXS19

Platform
R/V Tiglax
Start Date
2019-04-26
End Date
2019-05-08
Description
NGA LTER Spring cruise

TXF15

Platform
R/V Tiglax
Start Date
2015-09-09
End Date
2015-09-21
Description
Latitude North boundary (decimal degrees): 60.5298
Latitude South boundary (decimal degrees): 57.7747
Longitude West boundary (decimal degrees): -149.4755
Longitude East boundary (decimal degrees): -147.5105

TXF17

Platform
R/V Tiglax
Start Date
2017-09-09
End Date
2017-09-22
Description
Latitude North boundary (decimal degrees): 60.6753
Latitude South boundary (decimal degrees): 57.7923
Longitude West boundary (decimal degrees): -149.4853
Longitude East boundary (decimal degrees): -147.503



Project Information

Collaborative Research: Molecular profiling of the ecophysiology of dormancy induction in calanid copepods of the Northern Gulf of Alaska LTER site (Diapause preparation)

Coverage: Northern Gulf of Alaska LTER


NSF Award Abstract:
The sub-arctic Pacific sustains major fisheries with nearly all commercially important species depending either directly or indirectly on lipid-rich copepods (Neocalanus flemingeri, Neocalanus plumchrus, Neocalanus cristatus and Calanus marshallae). In turn, these species depend on a short-lived spring algal bloom for growth and the accumulation of lipid stores in order to complete an annual life cycle that includes a period of dormancy. The intellectual thrust of this project measures how the timing and magnitude of algal blooms affect preparation for dormancy using a combination of field and experimental observations. The Northern Gulf of Alaska - with four calanid species that experience dormancy, steep environmental gradients, well-described phytoplankton bloom dynamics, and a concurrent NSF-LTER program - provides an unusual opportunity to identify the factors that affect dormancy preparation. Education and outreach plans are integrated with the research. Educational efforts focus on interdisciplinary opportunities for undergraduate, graduate and post-doctoral trainees. The project will generate content for existing graduate and undergraduate courses. U. of Alaska Fairbanks and U. Hawaii at Manoa are Alaska Native and Native Hawaiian Serving Institutions, and students from these groups will be recruited to participate in the project. Because fishing is a major industry in the Gulf of Alaska, outreach will communicate the role copepods play in marine ecosystems using the concept of a dynamic food web tied to production cycles.

Diapause (dormancy) and the accompanying accumulation of lipids in copepods have been identified as key drivers in high latitude ecosystems that support economically important fisheries, including those of the Gulf of Alaska. While the disappearance of lipid-rich copepods has been linked to severe declines in fish stocks, little is known about the environmental conditions that are required for the successful completion of the copepod's life cycle. A physiological profiling approach that measures relative gene expression will be used to test two alternative hypotheses: the lipid accumulation window hypothesis, which holds that individuals enter diapause only after they have accumulated sufficient lipid stores, and the developmental program hypothesis, which holds that once the diapause program is activated, progression occurs independent of lipid accumulation. The specific objectives are: 1) determine the effect of food levels during N. flemingeri copepodite stages on progression towards diapause using multiple physiological and developmental markers; 2) characterize the seasonal changes in the physiological profile of N. flemingeri across environmental gradients and across years; 3) compare physiological profiles across co-occurring calanid species (N. flemingeri, Neocalanus plumchrus, Neocalanus cristatus and Calanus marshallae); and 4) estimate the reproductive potential of the overwintering populations of N. flemingeri. The broader scientific significance includes the acquisition of new genomic data and molecular resources that will be made publicly available through established data repositories, and the development of new tools for routinely obtaining physiological profiles of copepods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

 

NOTE: Petra Lenz is a former Principal Investigator (PI) and Andrew Christie is a former Co-Principal Investigator (Co-PI) on this project (award #1756767). Daniel Hartline is the PI listed for the award #1756767 and is now a former Co-PI on this project.




Funding

Funding Source | Award
NSF Division of Ocean Sciences (NSF OCE)
NSF Division of Ocean Sciences (NSF OCE)
NSF Division of Ocean Sciences (NSF OCE)
