GTseq DNA sequencing of Kelletia kelletii collected in California, USA and Baja, Mexico from 2015 to 2017

Website: https://www.bco-dmo.org/dataset/995189
Data Type: Other Field Results
Version: 1
Version Date: 2026-03-19

Project
» Collaborative Research: RUI: Combined spatial and temporal analyses of population connectivity during a northern range expansion (KW connectivity)
Contributors | Affiliation | Role
White, Crow | California Polytechnic State University (Cal Poly) | Principal Investigator
Christie, Mark | Purdue University | Co-Principal Investigator
Toonen, Robert J. | University of Hawaiʻi at Mānoa (HIMB) | Co-Principal Investigator
Davidson, Jean | California Polytechnic State University (Cal Poly) | Scientist
Daniels, Benjamin | Oregon State University (OSU) | Student
Lee, Andy | Purdue University | Student
López, Cataixa | Hawaii Pacific University (HPU) | Student
York, Amber D. | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager

Abstract
Climate-driven warming and changes in major ocean currents enable poleward transport and range expansions of many marine species. Here, we report the population genetic assignment of post-dispersal recruits to pre-dispersal natal source locations for the gastropod Kellet’s whelk (Kelletia kelletii), a commercial fisheries species and subtidal predator with top-down food web effects, whose populations have recently undergone climate-driven northward range expansion. This dataset includes sample ID, collection site and year, tissue type, length, and extraction information for the samples sequenced. Samples were collected between 2015 and 2017. We genotyped 2,874 individuals from 24 locations across the species’ entire biogeographic range using 305 genotyping-in-thousands by sequencing (GT-Seq) loci. Analysis shows a large contribution of 1-year-old recruits from the historical range to the expanded range, variable post-settlement survival of those recruits in the expanded range in relation to their natal origin, and that the El Niño Southern Oscillation may play a role in long-distance dispersal.


Coverage

Location: Southern and Central California, USA subtidal coastal waters 
Spatial Extent: N:36.618167 E:-114.36197 S:27.15326 W:-121.939167
Temporal Extent: 2015-06-19 - 2017-08-19

Dataset Description

"KW" or "kw" in this dataset's files, sample IDs, or metadata indicate the organism of interest:
Kellet’s whelk, gastropod, Kelletia kelletii, LSID (urn:lsid:marinespecies.org:taxname:491054)


Methods & Sampling

Field Collections 

Using SCUBA, we collected adult and recruit Kellet’s whelks by hand from subtidal (approximately 15 m depth) locations across Kellet’s whelk’s entire biogeographic range, from Isla San Roque in Baja California, Mexico to Monterey Bay in California, USA. Collections occurred across three years, from 2015 to 2017. We used Kellet’s whelk’s growth function to classify the ages of recruit whelks based on shell length (White et al., 2025).

DNA extraction

DNA was extracted from whelk tissue using a salting-out protocol (described in detail by Daniels et al., 2023), cleaned using the ZR-96 DNA Clean-Up Kit (Zymo Research, USA), and sequenced using the custom GT-Seq panel described under Sequencing.

Sequencing

We developed a novel GT-Seq panel (Campbell et al., 2015) using SNPs located in genes that were differentially expressed between Kellet’s whelks’ expanded and historical range sites (MON and NAP, respectively) (Lee et al., 2024). Individuals were genotyped by GTSeek (Twin Falls, ID).


Data Processing Description

Briefly, RNA reads from the expanded and historical ranges were aligned to a de novo reference transcriptome (Daniels et al., 2023) using bowtie2/2.4.2 (Langmead & Salzberg, 2012), sorted using samtools/1.9 (Li et al., 2009), and merged using stringtie2/2.1.1 (Kovaka et al., 2019). The count matrix was created using the featureCounts tool of subread/2.0.1 (Liao et al., 2014). Differentially expressed genes (DEGs) were identified with the R package DESeq2/1.34 (Love et al., 2014) at a significance threshold of 0.05 after false discovery rate correction via the Benjamini–Hochberg method (Benjamini & Hochberg, 1995). SNPs were identified on DEG contigs using the GATK pipeline (Van der Auwera & O’Connor, 2020), following best practices for RNA-Seq. We then selected the 1,000 SNPs with the highest pairwise FST values for GT-Seq multiplex primer design by GTSeek (Twin Falls, ID).
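The false discovery rate step above can be illustrated with a minimal Python sketch of the Benjamini–Hochberg adjustment. This is not the DESeq2 implementation used for the dataset (DESeq2 applies the correction internally in R, together with its own filtering); the function names `benjamini_hochberg` and `significant_genes` and the gene labels in the usage note are hypothetical and serve only to show how adjusted p-values are compared against the 0.05 threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(gene_p_values, alpha=0.05):
    """Names of genes whose adjusted p-value is below alpha (hypothetical helper)."""
    names = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < alpha]
```

For example, `significant_genes({"geneA": 0.001, "geneB": 0.5, "geneC": 0.02})` keeps geneA and geneC, because their adjusted p-values (0.003 and 0.03) fall below 0.05 while geneB's does not.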


BCO-DMO Processing Description

- Loaded two CSV files: "kw_gtseq_allinds_samplemeta v3.csv" as resource "995189_v1_kw_gtseq_allinds_samplemeta" and "kw_gtseq_genotypes_combined v3.csv" as resource "kw_gtseq_genotypes"; both with missing values defined as empty strings and "NA"
- Applied extensive field-level metadata (descriptions, standard name IDs, supplied units) to all fields in the sample metadata resource
- Renamed four columns in the sample metadata resource: "OriginalCalculatedConcentration_ng_ul_" → "OriginalCalculatedConcentration_ng_ul", "ExtractionCap_" → "ExtractionCap", "Elution_TEBuffer_" → "Elution_TEBuffer", "DNAYield_ug_" → "DNAYield_ug"
- Removed trailing " 0:00" time padding from six date fields (DateOfExtraction, SurveyDate, tfert_date, thatch_date, tmiddisp_date, tsettle_date), because no time of day was provided
- Converted all six date fields from "%m/%d/%Y" format to ISO 8601 "%Y-%m-%d" format
- Set data types for all columns in the sample metadata resource: numeric fields (Age_year, Lat, Lon, etc.), integer fields (CLNo, Cohort_year, GTseq_On_Target_Reads, etc.), string fields (CapLabel, GTseq_Sample, Region, etc.), and date fields with "%Y-%m-%d" format
- Fixed longitude values in the sample metadata resource by prepending a negative sign, as these decimal degree locations should be negative (West)
- Output two CSV files: "995189_v1_kw_gtseq_allinds_samplemeta.csv" and supplemental table "kw_gtseq_genotypes.csv"



Related Publications

Campbell, N. R., Harmon, S. A., & Narum, S. R. (2014). Genotyping‐in‐Thousands by sequencing (GT‐seq): A cost effective SNP genotyping method based on custom amplicon sequencing. Molecular Ecology Resources, 15(4), 855–867. https://doi.org/10.1111/1755-0998.12357
Methods
Daniels, B. N., Nurge, J., Sleeper, O., Lee, A., López, C., Christie, M. R., Toonen, R. J., White, C., & Davidson, J. M. (2023). Genomic DNA extraction optimization and validation for genome sequencing using the marine gastropod Kellet’s whelk. PeerJ, 11, e16510. https://doi.org/10.7717/peerj.16510
Methods
Lee, A., Daniels, B. N., Hemstrom, W., López, C., Kagaya, Y., Kihara, D., Davidson, J. M., Toonen, R. J., White, C., & Christie, M. R. (2024). Genetic adaptation despite high gene flow in a range‐expanding population. Molecular Ecology. https://doi.org/10.1111/mec.17511
Results
White, C., Tett, P., Kushner, D. J., Beas, R., Zacherl, D., Lonhart, S. I., Lorda, J., Roy, S., Toonen, R. J., Christie, M., Daniels, B. N., Lee, A., & Lopez, C. (2025). Cohort tracking using size‐frequency population survey data to estimate individual growth. Ecosphere, 16(10). https://doi.org/10.1002/ecs2.70436
Methods


Related Datasets

IsRelatedTo
White, C. (2025) Kelletia kelletii size-frequency population survey data collected by scientific SCUBA divers at 36 kelp forest habitat sites across the species’ biogeographic range in 2015, 2016 and 2017. Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 1) Version Date 2025-03-12 doi:10.26008/1912/bco-dmo.955710.1
Relationship Description: Related datasets from the same study
White, C., Christie, M., Toonen, R. J. (2022) Wild adult and recruit Kelletia kelletii samples from 2015 to 2017 (KW connectivity project). Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 1) Version Date 2022-05-17 doi:10.26008/1912/bco-dmo.874458.1
Relationship Description: Related datasets from the same study
White, C., Christie, M., Toonen, R. J., López, C., Lee, A., Davidson, J., Daniels, B., Evan, F. (2025) Restriction site-associated DNA sequence metadata of Kelletia kelletii collected in California, USA and Baja, Mexico in 2015 to 2017. Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 1) Version Date 2025-04-09 doi:10.26008/1912/bco-dmo.958359.1
Relationship Description: Related datasets from the same study
White, C., Toonen, R. J., Christie, M., Davidson, J., Anderson, P., Daniels, B., Lee, A., López, C. (2024) Full genome and transcriptome sequence assembly of the non-model organism Kellet’s whelk, Kelletia kelletii. Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 1) Version Date 2024-12-04 doi:10.26008/1912/bco-dmo.945292.1
Relationship Description: Related datasets from the same study


Parameters

Parameter | Description | Units
CLNo | DNA extraction cap label number | unitless
CapLabel | DNA extraction cap label | unitless
SiteDescription | Name of Kelletia kelletii tissue sample collection site | unitless
SiteCode | Code name of Kelletia kelletii tissue sample collection site | unitless
Lat | Latitude of Kelletia kelletii tissue sample collection site, North is positive | decimal degrees
Lon | Longitude of Kelletia kelletii tissue sample collection site, West is negative | decimal degrees
SiteCodeLetter_cap | DNA extraction cap label letter | unitless
Whelk_ID | Unique sample ID | unitless
TissueType | Sample tissue type (adult or recruit) | unitless
Year | Year of Kelletia kelletii tissue sample collection | unitless
ExtractionCap | DNA extraction cap label number | unitless
DateOfExtraction | Date DNA extraction was performed | unitless
PerformedBy | Number identifying the researcher who performed the DNA extraction | unitless
OriginalCalculatedConcentration | DNA extraction concentration (ng/ul), if measured | ug/ul or ng/ul (conflicting units description)
Elution_TEBuffer | TE elution buffer used in DNA extraction protocol | microliters (uL)
DNAYield_ug | DNA extraction yield (ug), if measured | micrograms (ug)
Measurement_mm | Kelletia kelletii maximum shell length (mm), if measured (recruits only) | millimeters (mm)
GTseq_Sample | Unique sample ID used by GTSeek for sequencing (same meaning as GTseq_Sample in the supplemental kw_gtseq_genotypes.csv) | unitless
GTseq_Raw_Reads | Raw number of reads sequenced | unitless
GTseq_On_Target_Reads | Number of reads containing in-silico probe sequences | unitless
GTseq_Percent_On_Target | Percentage of raw reads containing in-silico probe sequences | percent (%)
GTseq_Percent_GT | Percentage of loci that were genotyped | percent (%)
GTseq_IFI | Individual fuzziness index of each sample. This is a measure of DNA cross-contamination and is calculated using read counts from the background signal at homozygous and No-Call loci. Low scores are better than high scores | unitless
GTseq_Sample_reprep | Unique sample ID for resequenced individuals | unitless
GTseq_Raw_Reads_reprep | Raw number of reads sequenced for resequenced individuals | count
GTseq_On_Target_Reads_reprep | Number of reads containing in-silico probe sequences for resequenced individuals | count
GTseq_Percent_On_Target_reprep | Percentage of raw reads containing in-silico probe sequences for resequenced individuals | percent (%)
GTseq_Percent_GT_reprep | Percentage of loci that were genotyped for resequenced individuals | percent (%)
GTseq_IFI_reprep | Individual fuzziness index of each sample for resequenced individuals. This is a measure of DNA cross-contamination and is calculated using read counts from the background signal at homozygous and No-Call loci. Low scores are better than high scores | unitless
GTseq_Run_reprep | Whether or not a sample was re-run, 0 = no, 2 = yes | unitless
PoolRADseq | Flag if sample used in Pooled RADseq (1=yes, 0=no) | unitless
Recruit_label | CLNo for just recruits | unitless
SL_mm | Recruit shell length | millimeters (mm)
Age_year | Recruit age calculated using growth function from White et al. (2025) | years
SurveyDate | Date sample was collected in the wild | unitless
tfert_date | Date recruit was estimated to have been fertilized | unitless
thatch_date | Date recruit was estimated to have hatched from its egg capsule | unitless
tmiddisp_date | Date recruit was estimated to be at the midpoint of its dispersal period | unitless
tsettle_date | Date recruit was estimated to have settled post-dispersal | unitless
Cohort_year | Year recruit was estimated to be at the midpoint of its dispersal period | unitless
Region | Region where recruit was collected (species' historical or expanded range region) | unitless
Recruit_type | Categorical age group of recruits | unitless
Lee_etal | Flag if sample used in GTseq (included=yes, excluded=no) | unitless
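As a rough illustration of how the read-count parameters relate to one another and how the sample metadata can be linked to the supplemental genotype table, here is a minimal Python sketch. It assumes that GTseq_Percent_On_Target is on-target reads divided by raw reads, times 100 (inferred from the parameter descriptions, not confirmed by the data provider), and that both tables have been read into lists of dictionaries (for example with `csv.DictReader`). The function names and the locus column name in the usage note are hypothetical.

```python
def percent_on_target(raw_reads, on_target_reads):
    """Percentage of raw reads that are on target; None when no reads exist."""
    if raw_reads <= 0:
        return None
    return 100.0 * on_target_reads / raw_reads


def join_genotypes(sample_rows, genotype_rows, key="GTseq_Sample"):
    """Attach genotype columns to sample rows that share the same key value.

    Samples without a matching genotype row are left out of the result.
    """
    genotypes_by_id = {row[key]: row for row in genotype_rows}
    joined = []
    for sample in sample_rows:
        genotype = genotypes_by_id.get(sample.get(key))
        if genotype is None:
            continue
        merged = dict(sample)
        # Copy every genotype column except the shared key itself.
        merged.update({k: v for k, v in genotype.items() if k != key})
        joined.append(merged)
    return joined
```

For instance, `percent_on_target(2000, 500)` gives 25.0, and joining a sample row with `GTseq_Sample` equal to "S1" to a genotype row with the same ID and a hypothetical locus column "locus_1" produces one merged row carrying both the metadata and the genotype call.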



Instruments

Dataset-specific Instrument Name: not provided
Generic Instrument Name: Self-Contained Underwater Breathing Apparatus
Dataset-specific Description: Tissue samples collected via SCUBA and small vessel
Generic Instrument Description: The self-contained underwater breathing apparatus or scuba diving system is the result of technological developments and innovations that began almost 300 years ago. Scuba diving is the most extensively used system for breathing underwater by recreational divers throughout the world and in various forms is also widely used to perform underwater work for military, scientific, and commercial purposes. Reference: https://oceanexplorer.noaa.gov/technology/technical/technical.html



Project Information

Collaborative Research: RUI: Combined spatial and temporal analyses of population connectivity during a northern range expansion (KW connectivity)

Coverage: California, USA and Baja, Mexico coast


NSF Award Abstract:
Where do young marine fish and shellfish come from? This project aims to improve our understanding of how coastal marine populations are connected in space and time. Coastal populations are replenished through the arrival of minuscule larvae that have been dispersed for weeks to months in the open ocean after spawning at remote sites. The combination of the long dispersal period of marine fish and shellfish larvae and the varying ocean currents results in complex patterns of "connectivity" among populations near and far. Identifying these patterns of connectivity is fundamental to marine science and critical for effective fisheries management and conservation, yet it remains an unresolved component of marine ecology. The study species is currently expanding its biogeographic range up the U.S. west coast. By genetically analyzing individuals from across the species' range, including offspring spawned in the laboratory by experimentally-crossed individuals collected in the field from throughout the species historical and expanded range, certain genes can serve to differentiate populations along the coast. The team leverages the statistical power of these geographically-informative genes to assign thousands of young collected in the field to the source populations that spawned them (across the species' range and over multiple years). The team then quantifies patterns of connectivity over multiple years, and tests fundamental hypotheses on the spatial scale, temporal variability, biogeographic patterns, and biophysical drivers of population connectivity. The project trains approximately two dozen U.S. university students in molecular ecology and marine science, as well as creating intellectual linkages among Ph.D.-granting and non-Ph.D.-granting universities. 
The project also supports further development of a K-12 education program that uses SCUBA diving and videography to teach elementary school students Next Generation Science Standards and train them for careers in science, technology, engineering and mathematics.

Using a kelp forest gastropod and fisheries species (Kellet's whelk, Kelletia kelletii), this project combines genome-wide Restriction site Associated DNA (RAD) loci with transcriptomic loci identified from common-garden laboratory crosses of individuals from the species' historical and expanded range to identify geographically-informative loci that maximize power for individual assignment testing. Leveraging the combined power of these loci, genetic assignment of approximately three thousand recruit samples to 20 putative source populations allows the team to construct three independent years of connectivity matrices and test some of the most fundamental questions in marine ecology, including: 1) Are marine populations open or closed and at what scales? 2) To what degree is the evolutionary pattern of gene flow represented by single versus multiple generations of connectivity events? And, 3) How spatially heterogeneous and temporally variable is population connectivity? Can one year of connectivity data predict anything about the next? Additionally, by focusing on a range-expanding species with common life history traits, the team addresses a number of questions with broad applicability and significant ecological and societal implications: 4) How much is population connectivity influenced by post-recruitment demographic and evolutionary processes? 5) How well-connected are historic- and expanded-range populations? And, of particular relevance to climate change, 6) Are El Nino oceanographic conditions, which are predicted to increase in frequency and intensity this century, driving the poleward range expansion of this coastal marine species? By coupling common-garden experimental crosses to identify maximally-informative transcriptomic loci with genomic RAD analysis of field samples, this project aims to accurately and precisely quantify marine population connectivity in high gene flow species with large population sizes.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.




Funding

Funding Source | Award
NSF Division of Ocean Sciences (NSF OCE)
NSF Division of Ocean Sciences (NSF OCE)
NSF Division of Ocean Sciences (NSF OCE)
