Annotated genome of Nucella lapillus (Atlantic dog whelk) collected from Nahant, MA in May 2024

Website: https://www.bco-dmo.org/dataset/991343
Data Type: experimental
Version: 1
Version Date: 2026-03-23

Project
» Local adaptation and the evolution of plasticity under predator invasion and warming seas: consequences for individuals, populations and communities (evolution of plasticity)
Contributors | Affiliation | Role
Trussell, Geoffrey C. | Northeastern University | Principal Investigator
Vollmer, Steven V. | Northeastern University | Scientist
Ford, Meghan | Northeastern University | Student
Gerlach, Dana Stuart | Woods Hole Oceanographic Institution (WHOI BCO-DMO) | BCO-DMO Data Manager

Abstract
Nucella lapillus has been a focal organism of ecological and evolutionary studies on rocky shores for decades. As a direct developer, this species has limited dispersal but its broad geographic range makes it ideal for studies of population structure, isolation by distance, and selection across different environmental gradients. A fully annotated genome was generated using Oxford Nanopore sequencing at ~37x coverage, resulting in a genome assembly of 2.32 Gbp. Genomic resources for mollusks are relatively limited compared to other phyla and this new genome will enhance our understanding of genomic variation in mollusks and studies seeking to link genomic variation with organismal performance and community-level processes under various dimensions of environmental change.


Coverage

Location: Laboratory extraction of whelk specimen from Nahant, MA (42.419732, −70.902171)
Spatial Extent: Lat:42.419732 Lon:-70.902171
Temporal Extent: 2024-05 - 2024-05

Dataset Description

Nucella lapillus is an important player in rocky shore food chains and has been a focal organism of ecological and evolutionary studies for decades. Despite its poor dispersal, the species has a broad geographic range, which makes it ideal for examining isolation by distance and selection across environmental gradients. A fully annotated genome of N. lapillus, generated with Oxford Nanopore Technologies (ONT) sequencing at ∼37× coverage, is described in detail in the following publication:

Ford, M. R., Vollmer, S. V., & Trussell, G. C. (2025). Annotated genome of the Atlantic dog whelk, Nucella lapillus. G3: Genes, Genomes, Genetics, 15(10). https://doi.org/10.1093/g3journal/jkaf182

This dataset provides information on the study and includes links to the genetic dataset at the National Center for Biotechnology Information (NCBI) BioProject PRJNA1238877.


Methods & Sampling

Sample collection and DNA extraction
An adult Nucella lapillus individual was collected in May 2024 from Nahant, MA (42.419732, −70.902171). The foot tissue was used for DNA extraction and isolation. 

High molecular weight (HMW) DNA was extracted from the foot via the CTAB method (1.4 M NaCl and 2% CTAB), followed by three chloroform reactions. The complete extraction protocol is available on the project GitHub repository (https://github.com/meghanclownfish/Nucella-lapillus-genome/tree/main/1_extraction). The HMW DNA was precipitated in ethanol (EtOH) and resuspended in Tris-EDTA buffer. The sample was further purified using a Genomic DNA Clean and Concentrator kit (gDCC-10, ZYMO Research, Irvine, CA, USA) per the manufacturer's instructions. Sample quality and concentration were assessed by running 2 μL of the sample on a NanoDrop spectrophotometer (Thermo Fisher Scientific, Singapore). A sample was deemed ready for sequencing if the 260/280 ratio was ∼1.9 and the 260/230 ratio fell between 2.0 and 2.2, following Sun et al. (2020).
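The purity criteria above can be written as a small check. This is a minimal sketch, not the authors' procedure: the function name and the tolerance `tol_280` around the ∼1.9 target are assumptions, since the text gives no exact window for the 260/280 ratio.

```python
def ready_for_sequencing(a260_280: float, a260_230: float, tol_280: float = 0.1) -> bool:
    """Return True if both NanoDrop purity ratios meet the stated targets.

    tol_280 is a hypothetical tolerance around the ~1.9 target for 260/280;
    the 2.0-2.2 window for 260/230 is taken from the text.
    """
    ok_280 = abs(a260_280 - 1.9) <= tol_280
    ok_230 = 2.0 <= a260_230 <= 2.2
    return ok_280 and ok_230


# Example readings (made-up values, for illustration only)
print(ready_for_sequencing(1.88, 2.10))
print(ready_for_sequencing(1.60, 2.10))
```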

Library prep and sequencing
N. lapillus long reads were sequenced on ONT platforms. Libraries were prepared with the ONT Ligation Sequencing Kit (SQK-LSK114, ONT, Oxford, UK) and the NEBNext Companion Module (E7180S, NEB). The standard manufacturer's protocol was followed with a few exceptions (see Ford et al., 2025 for details). Sequencing was done on a PromethION. Six flow cells in total were used to generate 103,553,219,099 bp of raw data. PromethION flow cells (FLO-PRO114M) were primed and loaded per the standard manufacturer's protocol. To increase the yield of each flow cell, runs were paused, flushed using the EXP-WSH004 (ONT) kit, and reloaded. High-quality base calling was performed with Dorado 0.7.1 (ONT).
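As a rough arithmetic check, mean sequencing depth is total bases divided by genome size. The sketch below uses the raw yield stated above and the 2.32 Gbp assembly size from the abstract; the variable and function names are illustrative. The raw-yield depth it computes (about 44.6×) is higher than the reported ∼37× coverage; this record does not state which read set or genome-size estimate underlies the reported figure, so the two numbers are not directly comparable.

```python
RAW_BASES = 103_553_219_099   # raw yield from six flow cells (from the text)
ASSEMBLY_BP = 2.32e9          # assembly size in bp (from the abstract)
FLOW_CELLS = 6


def mean_depth(total_bases: float, genome_bp: float) -> float:
    """Mean depth of coverage: total sequenced bases per genome base."""
    return total_bases / genome_bp


raw_depth = mean_depth(RAW_BASES, ASSEMBLY_BP)
per_cell_gbp = RAW_BASES / FLOW_CELLS / 1e9

print(f"raw-yield depth: {raw_depth:.1f}x")
print(f"mean yield per flow cell: {per_cell_gbp:.1f} Gbp")
```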

Genome size and heterozygosity
Genome size was estimated using JELLYFISH v2.2.10 (Marçais and Kingsford 2011) to count canonical 41-mers from high-quality ONT reads (min quality: 5) and compute a histogram of k-mer occurrences. The histogram was then used to estimate heterozygosity with GenomeScope (Ranallo-Benavidez et al. 2020).
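To illustrate what "canonical k-mers" and a "histogram of k-mer occurrence" mean, here is a toy pure-Python sketch. It is not how JELLYFISH is implemented and would be far too slow for real reads; the function names are hypothetical. A canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement, so both strands are counted together.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k):
    """Map occurrence count -> number of distinct canonical k-mers with that count."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):   # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Toy example with k=3 (the dataset used k=41)
print(kmer_histogram(["ACGTA"], 3))
```

In the toy read, ACG and CGT are reverse complements of each other and collapse into one canonical k-mer seen twice, while GTA is seen once.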


Data Processing Description

Assembly and annotation
Briefly, we assembled the genome from all reads of 2 kb or longer with Hifiasm (0.25.0-r726). BlobTools2 (v4.4.0, Challis et al. 2020) was used to visually assess the assembly and filter contigs. RepeatModeler (v2.0.6, Flynn et al. 2020) and RepeatMasker (Smit et al.) identified and soft-masked repetitive regions in the genome. RNA-seq data were mapped to the soft-masked genome with HISAT2. This information, along with a custom protein database, was supplied as evidence for BRAKER3. TSEBRA was used to merge the BRAKER outputs. Functional annotation was carried out with InterProScan and Funannotate. For full details, see Ford et al. (2025).
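The read-length cutoff applied before assembly can be sketched as a simple FASTQ filter. This is an illustrative stand-in, not the tool the authors used (see Ford et al. 2025 for the actual pipeline); it assumes plain four-line FASTQ records and takes 2 kb to mean 2,000 bases.

```python
import io


def filter_fastq_by_length(handle, min_len=2000):
    """Yield four-line FASTQ records whose sequence has at least min_len bases."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            break
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if len(seq.strip()) >= min_len:
            yield header + seq + plus + qual


# Usage with an in-memory example: one 2,000-base read and one 4-base read
example = io.StringIO(
    "@long\n" + "A" * 2000 + "\n+\n" + "I" * 2000 + "\n"
    "@short\nACGT\n+\nIIII\n"
)
kept = list(filter_fastq_by_length(example))
print(len(kept))
```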


BCO-DMO Processing Description

- Loaded source files "ncbi_dataset.tsv" (annotated genome) and "Nucleotide_Accessions_2525.xlsx" that had been downloaded from NCBI BioProject PRJNA1238877
- Added fields for Latitude, Longitude, Habitat, Tissue_type, Sampling_Date, and Specimen_type to both tables
- Added fields/columns for taxon_IDs, isolate, species, and GenBank_assembly for the annotated genome table
- Renamed fields/column headings to conform with BCO-DMO naming conventions and improve interoperability
- Exported two CSV files: 991343_v1_nucella_lapillus_annotated_genome.csv and 991343_v1_nucella_lapillus_nucleotides.csv
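The processing steps listed above (load a tab-separated table, add constant sample fields, rename headers, export CSV) could look roughly like the sketch below. It is not BCO-DMO's actual processing code. The header mapping in `RENAME` is hypothetical, and the exact values and formats stored in the published files are assumptions; only the coordinates and tissue come from this record's text.

```python
import csv
import io

# Constant sample metadata taken from this record; stored formats are an assumption.
SAMPLE_FIELDS = {"Latitude": "42.419732", "Longitude": "-70.902171", "Tissue_type": "foot"}

# Hypothetical source-to-output header mapping, for illustration only.
RENAME = {"Gene ID": "Gene_ID", "Protein length": "Protein_length"}


def process(tsv_text):
    """Rename headers and append the constant sample fields to every row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        renamed = {RENAME.get(key, key): value for key, value in row.items()}
        renamed.update(SAMPLE_FIELDS)
        rows.append(renamed)
    return rows


def to_csv(rows):
    """Serialize rows (a non-empty list of dicts) as comma-separated text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(process("Gene ID\tProtein length\n123\t456\n")))
```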



Data Files

File
991343_v1_nucella_lapillus_annotated_genome.csv
(Comma Separated Values (.csv), 16.18 MB)
MD5:6305806518415c58584b0226d453c3c7
Annotated genome of Nucella lapillus (NCBI BioProject PRJNA1238877); Primary data file for dataset ID 991343, version 1
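A downloaded file can be checked against the MD5 values listed in this record. The sketch below uses Python's standard `hashlib`; the function names are illustrative, and it assumes the files were saved locally under their published names.

```python
import hashlib

# Checksums as listed in this record
EXPECTED_MD5 = {
    "991343_v1_nucella_lapillus_annotated_genome.csv": "6305806518415c58584b0226d453c3c7",
    "991343_v1_nucella_lapillus_nucleotides.csv": "3a9ed310d7d37fe0bf056f4014d63b0e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the expected value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify(name, EXPECTED_MD5[name])` returns True for an intact download saved as `name` in the working directory.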


Supplemental Files

File
991343_v1_nucella_lapillus_nucleotides.csv
(Comma Separated Values (.csv), 517.09 KB)
MD5:3a9ed310d7d37fe0bf056f4014d63b0e
Nucleotide accessions for Nucella lapillus (NCBI BioProject PRJNA1238877)

Column_name,Description,Units,Term_match,Type
BioProject,NCBI BioProject,unitless,BioProject,String
Sampling_Date,Date of sampling for whelk specimen,unitless,date,String
Latitude,Latitude of sampling,decimal degrees,lat,Float
Longitude,Longitude of sampling,decimal degrees,lon,Float
Habitat,Site description of specimen location,unitless,site_descrip,String
Specimen_type,Specimen description with life stage and environment,unitless,sample_descrip,String
Tissue_type,Tissue used for analyses,unitless,sample_descrip,String
Entry_num,NCBI Nucleotide entry number,unitless,exp_id,Integer
Species,Species,unitless,species,String
Isolate,The specific biological sample or individual organism from which the genome was obtained.,unitless,sample_descrip,String
Contig_ID,Continuous sequence identifier,unitless,sequence,String
Sequence_type,Sequence type,unitless,brief_desc,String
Length_base_pairs,Number of base pairs,base pairs,length,Integer
DNA_type,DNA type for the analyses,unitless,brief_desc,String
Accession,GenBank whole genome shotgun accession,unitless,GenBank_accession,String
GI_num,GenInfo identifier,unitless,accession_number,Integer
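The column types listed above can be applied when reading the nucleotides file. This is a minimal sketch using only the standard library; it assumes the CSV header matches the column names in the table and that the typed columns contain no missing values.

```python
import csv

# Column names and types as documented in the table above
INT_COLUMNS = ("Entry_num", "Length_base_pairs", "GI_num")
FLOAT_COLUMNS = ("Latitude", "Longitude")


def read_nucleotides(handle):
    """Read rows from an open CSV handle, converting the documented numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        for name in INT_COLUMNS:
            row[name] = int(row[name])
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def total_length(rows):
    """Sum of Length_base_pairs over all listed sequences."""
    return sum(row["Length_base_pairs"] for row in rows)
```

Assuming the file is in the working directory, usage would be along the lines of `with open("991343_v1_nucella_lapillus_nucleotides.csv", newline="") as fh: rows = read_nucleotides(fh)`.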


Related Publications

Challis, R., Richards, E., Rajan, J., Cochrane, G., & Blaxter, M. (2020). BlobToolKit – Interactive Quality Assessment of Genome Assemblies. G3 Genes|Genomes|Genetics, 10(4), 1361–1374. https://doi.org/10.1534/g3.119.400908
Software
Cheng, H. (2025). Efficient near telomere-to-telomere assembly of Nanopore Simplex reads (hifiasm-0.25.0-r726). Zenodo. https://doi.org/10.5281/zenodo.18079612
Software
Flynn, J. M., Hubley, R., Goubert, C., Rosen, J., Clark, A. G., Feschotte, C., & Smit, A. F. (2020). RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences, 117(17), 9451–9457. https://doi.org/10.1073/pnas.1921046117
Software
Ford, M. R. (2025). Nucella-lapillus-genome [Software]. GitHub. https://github.com/meghanclownfish/Nucella-lapillus-genome
Methods, Results
Ford, M. R., Vollmer, S. V., & Trussell, G. C. (2025). Annotated genome of the Atlantic dog whelk, Nucella lapillus. G3: Genes, Genomes, Genetics, 15(10). https://doi.org/10.1093/g3journal/jkaf182
Results
Gabriel, L., Brůna, T., Hoff, K. J., Ebel, M., Lomsadze, A., Borodovsky, M., & Stanke, M. (2023). BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. https://doi.org/10.1101/2023.06.10.544449
Software
Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M., & Stanke, M. (2021). TSEBRA: transcript selector for BRAKER. BMC Bioinformatics, 22(1). https://doi.org/10.1186/s12859-021-04482-0
Software
Jones, P., Binns, D., Chang, H.-Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., Pesseat, S., Quinn, A. F., Sangrador-Vegas, A., Scheremetjew, M., Yong, S.-Y., Lopez, R., & Hunter, S. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics, 30(9), 1236–1240. https://doi.org/10.1093/bioinformatics/btu031
Software
Kim, D., Paggi, J. M., Park, C., Bennett, C., & Salzberg, S. L. (2019). Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology, 37(8), 907–915. https://doi.org/10.1038/s41587-019-0201-4
Software
Marçais, G., & Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27(6), 764–770. https://doi.org/10.1093/bioinformatics/btr011
Methods
Oxford Nanopore Technologies. (2025). Dorado (Version 0.7.1) [Computer software]. https://github.com/nanoporetech/dorado.
Software
Palmer, J. M., & Stajich, J. (2020). Funannotate v1.8.1: Eukaryotic genome annotation (Version v1.8.1) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.1134477
Software
Ranallo-Benavidez, T. R., Jaron, K. S., & Schatz, M. C. (2020). GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-14998-3
Software
Smit, A. F. A., Hubley, R., & Green, P. (2013–2015). RepeatMasker Open-4.0. http://www.repeatmasker.org/
Software
Sun, J., Chen, C., Miyamoto, N., Li, R., Sigwart, J. D., Xu, T., Sun, Y., Wong, W. C., Ip, J. C. H., Zhang, W., Lan, Y., Bissessur, D., Watsuji, T., Watanabe, H. K., Takaki, Y., Ikeo, K., Fujii, N., Yoshitake, K., Qiu, J.-W., … Qian, P.-Y. (2020). The Scaly-foot Snail genome and implications for the origins of biomineralised armour. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-15522-3
Methods


Related Datasets

Results
Northeastern University. Nucella lapillus genome sequencing and assembly. 2025/03. In: BioProject [Internet]. Bethesda, MD: National Library of Medicine (US), National Center for Biotechnology Information; 2011-. Available from: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA1238877. NCBI:BioProject: PRJNA1238877. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1238877/


Parameters

Parameter | Description | Units
BioProject | NCBI BioProject | unitless
Sampling_Date | Date of sampling for whelk specimen | unitless
Latitude | Latitude of sampling | decimal degrees
Longitude | Longitude of sampling | decimal degrees
Habitat | Site description of specimen location | unitless
Specimen_type | Specimen description with life stage and environment | unitless
Species | Species | unitless
Taxon_IDs | NCBI and Aphia ID taxon identifiers | unitless
Tissue_type | Tissue used for analyses | unitless
GenBank_Assembly | The GenBank assembly accession identifier corresponding to the genome assembly from which the annotation was derived | unitless
Isolate | The specific biological sample or individual organism from which the genome was obtained | unitless
Accession | A unique identifier generated by NCBI for each contig (continuous sequence) in the genome | unitless
Begin | The starting genomic coordinate of the annotated feature on the sequence | base pair (bp)
End | The ending genomic coordinate of the annotated feature on the sequence | base pair (bp)
Orientation | The strand on which the feature is encoded (plus = forward, minus = reverse) | unitless
Gene_ID | A unique identifier assigned to the gene | unitless
Gene_Type | Classification of the gene | unitless
Protein_accession | Accession number for the translated protein product(s) associated with the gene | unitless
Protein_length | Length of the predicted protein sequence | amino acids (aa)
Locus_tag | A systematic, unique identifier assigned to each gene within the genome annotation | unitless
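The Begin, End, and Orientation parameters can be combined to derive a feature's length and strand symbol. The sketch below assumes Begin and End are 1-based, inclusive coordinates, which this record does not state explicitly; the function names are illustrative.

```python
def feature_length(begin: int, end: int) -> int:
    """Feature length in bp, ASSUMING 1-based inclusive Begin/End coordinates."""
    if end < begin:
        raise ValueError("End must not be smaller than Begin")
    return end - begin + 1


def strand_symbol(orientation: str) -> str:
    """Map the Orientation values (plus/minus) to the usual +/- strand symbols."""
    return {"plus": "+", "minus": "-"}[orientation.strip().lower()]


# Example with made-up coordinates
print(feature_length(100, 199), strand_symbol("minus"))
```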



Instruments

Dataset-specific Instrument Name
PromethION (Oxford Nanopore Technologies)
Generic Instrument Name
Oxford Nanopore Technologies PromethION sequencer
Dataset-specific Description
Sequencing was done on a PromethION
Generic Instrument Description
A high-throughput, high-sample-number benchtop DNA sequencer manufactured by Oxford Nanopore Technologies, capable of running up to 48 flow cells and producing up to 7.6 Tb of data per run. The sequencer produces real-time results and uses nanopore technology, with each flow cell allowing up to 3,000 nanopores to sequence simultaneously.

Dataset-specific Instrument Name
NanoDrop spectrophotometer (ThermoFisher Scientific, Singapore)
Generic Instrument Name
Thermo Scientific NanoDrop spectrophotometer
Dataset-specific Description
Sample quality and concentration were assessed by running 2 μL of the sample on NanoDrop (Thermo Fisher Scientific, Singapore).
Generic Instrument Description
Thermo Scientific NanoDrop spectrophotometers provide microvolume quantification and purity assessments of DNA, RNA, and protein samples. NanoDrop spectrophotometers work on the principle of ultraviolet-visible spectrum (UV-Vis) absorbance. The range consists of the NanoDrop One/OneC UV-Vis Spectrophotometers, NanoDrop Eight UV-Vis Spectrophotometer and NanoDrop Lite Plus UV Spectrophotometer.



Project Information

Local adaptation and the evolution of plasticity under predator invasion and warming seas: consequences for individuals, populations and communities (evolution of plasticity)


NSF Award Abstract:
Over the past two decades, the Gulf of Maine has experienced unprecedented warming that, among other things, has further enabled the invasive green crab to expand its range in rocky shore habitats. The adverse ecological impacts of this invasive predator have been documented worldwide. This study examines how geographic variation in the capacity of two common prey species to respond to the combination of this predator and warming ocean temperatures can shape prey feeding and performance and impact community structure and dynamics. Hence, this research enhances understanding of the evolution of phenotypes, their plasticity, and the nature of adaptation and its role in eco-evolutionary dynamics. More broadly, it informs understanding of how organisms and marine communities may respond to future environmental change. In addition, this project makes contributions to the STEM pipeline by providing middle and high school, undergraduate, and graduate students with cross-disciplinary training in evolutionary and community ecology. In collaboration with an institutional outreach program, the investigator is also developing web-based multimedia projects and teacher resource materials based on this research.

A central principle in ecology is that species residing in the middle of food chains must balance the benefits of eating with the risk of being eaten by their predators. Solving this foraging-predation risk trade-off often involves plasticity in prey traits with consequences for the evolution of adaptation and species interactions that drive community-level processes. Hence, the foraging-predation risk trade-off provides a powerful conceptual framework that links evolutionary and community ecology. Yet at the same time, other environmental stressors like temperature can shape this trade-off, adding complexity that makes it difficult to predict the capacity of organisms to adapt to environmental change and the consequences for communities. The investigator is conducting this study in rocky shore habitats of the Gulf of Maine (GOM) which have long been influenced by strong latitudinal temperature gradients and non-native species invasions. The overarching hypothesis is that predation risk and temperature are factors shaping geographic variation in plasticity and adaptation, with consequences for individuals, populations, and communities. First, the investigator is conducting field experiments to document geographic variation in the trait plasticity of two common prey species in the green crab's diet. Second, he is using reciprocal transplant experiments to examine trait plasticity in response to risk and water temperature, generating data to compare with similar experiments conducted in the late 90s prior to recent ocean warming and expansion in range of green crabs. Third, he is conducting a laboratory common garden experiment to evaluate the effects of risk and water temperature on trait plasticity. Finally, he is using reciprocal transplant experiments in the field to understand the interactive effects of risk and water temperature on prey foraging rates and the abundance of a species that plays an important role in intertidal community structure and dynamics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.




Funding

Funding Source | Award
NSF Division of Ocean Sciences (NSF OCE) |
