Diversity and distribution of sediment bacteria across an ecological and trophic gradient from 2018 to 2019 (Cyanos Great Lakes project)

Website: https://www.bco-dmo.org/dataset/986587

Version: 1

Version Date: 2025-10-13

Project

» Collaborative Research: Cyanobacteria, Nitrogen Cycling, and Export Production in the Laurentian Great Lakes (Cyanos Great Lakes)

Contributors	Affiliation	Role
Hamilton, Trinity	University of Minnesota (UMN)	Principal Investigator
Sauer, Hailey	University of Minnesota (UMN)	Principal Investigator

Abstract

The microbial communities of lake sediments have the potential to serve as valuable bioindicators and integrators of watershed land use and water quality; however, the relative sensitivity of these communities to physicochemical and geographical parameters must be demonstrated at taxonomic resolutions that are feasible with current sequencing and bioinformatic approaches. The geologically diverse and lake-rich state of Minnesota (USA) is uniquely suited to address this potential because of its variability in ecological region, lake type, and watershed land use. In this study, we selected twenty lakes with varying physicochemical properties across four ecological regions of Minnesota. Our objectives were to (i) evaluate the diversity and composition of the bacterial community at the sediment–water interface and (ii) determine how lake location and watershed land use impact aqueous chemistry and influence bacterial community structure. Our 16S rRNA amplicon data from lake sediment cores at two depth intervals indicate that sediment communities are more likely to cluster by ecological region rather than by any individual lake properties (e.g., trophic status, total phosphorus concentration, lake depth). However, composition is tied to a given lake, wherein samples from the same core were more alike than samples collected at similar depths across lakes. Our results illustrate the diversity within lake sediment microbial communities and provide insight into relationships between taxonomy, physicochemical, and geographic properties of north temperate lakes. All relevant data are in Sauer et al., 2022. All 16S rRNA amplicon data are available from the SRA database at BioProject PRJNA763898.

Coverage

Location: Minnesota Lakes, United States

Spatial Extent: N:47.98746 E:-90.16812 S:43.90347 W:-96.339307

Temporal Extent: 2018-07-31 - 2019-07-25

Methods & Sampling

Site description

For this study, we selected twenty lakes within Minnesota’s Sentinel Lakes in a Changing Environment (SLICE) program. SLICE is a collaborative research initiative that provides long-term data on a representative sub-sampling of Minnesota’s lakes spanning the diverse geographic, land-use, and climatic gradients present in Minnesota (Fig. 1 in Sauer et al., 2022). The lakes span four of the seven Environmental Protection Agency/Commission for Environmental Cooperation (Level III) ecological regions. These regions are characterized by differences in underlying geology, soils, vegetation, and land use (Table S1 in Sauer et al., 2022). To our knowledge, this is the first comprehensive sediment bacterial survey of these lakes.

Water Sample Collection & Analysis

At each site, we collected water profile measurements of temperature, pH, conductivity, turbidity, and dissolved oxygen using a YSI EXO2 multi-parameter sonde (YSI, Inc.). We also collected an integrated epilimnetic water sample (0–2 m) and a hypolimnetic water sample (maximum lake depth – 1 m) when thermal stratification was present. All samples were stored on ice in the field and at either 4°C or −20°C in the laboratory, depending on methodology, until processed.

Samples for soluble reactive phosphorus (SRP), dissolved organic carbon (DOC), and dissolved inorganic carbon (DIC) were filtered, processed, and analyzed within 36 hours of sampling using standard methods for SRP (4500-P) on a SmartChem 170 (Unity Scientific, Inc.) and for DIC/DOC (Method 5310-C) using a Torch Combustion TOC Analyzer (Teledyne Tekmar, Inc.) (American Public Health Association, 2012). Samples for total nitrogen (TN) and total phosphorus (TP) were frozen and analyzed using standard methods for TN (4500-N) and TP (4500-P). Samples for ammonia (NH₃) and nitrate (NO₃) were filtered and frozen prior to analysis following methods 4500-NH₃ and 4500-NO₃. All TP, TN, NH₃, and NO₃ samples were analyzed within six months of sampling on a SmartChem 170 (Unity Scientific, Inc.) discrete analyzer (APHA, 2012).

Additionally, samples for chlorophyll-a were filtered, frozen, and analyzed via fluorometry following EPA Method 445.0 (Arar et al., 1997). A complete summary of aqueous chemistry results, including sampling dates, is provided in Table S2 (Sauer et al., 2022).

Sediment Sample Collection & DNA Isolation

Sediment cores were collected from July 2018 through July 2019 using a rod-driven piston corer with a 7 cm diameter polycarbonate tube (Wright, 1997). Coring locations (i.e., flat areas near the deepest basin) were determined using publicly available bathymetric maps (https://www.dnr.state.mn.us/lakefind/index.html), while avoiding steep-sided “holes” where sediment focusing may be high. Following retrieval, core tops were stabilized in the field using a gelling agent (e.g., Zorbitrol), and intact cores were returned to the laboratory, where they were stored vertically at 4°C for no more than seven days prior to processing. In cases where the upper sediments were extremely flocculent, the uppermost sections (~0–30 cm) were immediately sectioned in the field to prevent mixing during transport.

Cores were vertically extruded in the laboratory at 1–2 cm intervals, depending on lake productivity, and subsamples from two intervals were collected for DNA analysis. Subsamples were collected from the 0–2 cm interval (hereafter referred to as shallow) and from either the 3–4 cm or 4–6 cm interval (hereafter referred to as deep). Subsamples were frozen under nitrogen for up to three months prior to DNA extraction (Table S3 in Sauer et al., 2022).

DNA was extracted from 0.25 g of wet sediment from each subsample using a PowerSoil DNA Isolation Kit (Qiagen, Inc.) following the manufacturer’s protocols. Negative controls were performed by carrying out extractions on blanks containing only reagents and no sample. Final bulk DNA concentrations were determined using a Qubit™ dsDNA HS Assay Kit (Molecular Probes, Eugene, OR, USA) and a Qubit™ Fluorometer (Invitrogen, Carlsbad, CA, USA). The detection limit of the Qubit™ dsDNA HS Assay Kit is 10 pg μL⁻¹. All samples that yielded detectable amounts of DNA were submitted for sequencing (Table S3 in Sauer et al., 2022). Although DNA was not detected in negative controls, these samples were submitted for sequencing; they failed quality control performed by the University of Minnesota Genomics Center (UMGC), and no sequencing data were obtained.

Nucleic acid preparation, amplification, and sequencing

DNA samples were submitted to the University of Minnesota Genomics Center (UMGC), where library preparation for Illumina high-throughput sequencing was performed using a Nextera XT workflow with 2 × 300 bp chemistry. This workflow utilizes transposome-based shearing, which fragments DNA and adds adapter sequences in a single step. DNA was amplified and dual-indexed with adapter sequences through PCR using primers 515F (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT-3′), targeting the V4 hypervariable region of the bacterial 16S SSU rRNA gene.
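The structure of the two primers listed above can be illustrated with a short sketch. It assumes that each full primer is an adapter overhang followed by the widely used 515F/806R locus-specific sequence at the 3′ end; the function names are hypothetical and are not part of the UMGC workflow.

```python
# Full primer sequences as given in the text (adapter overhang + locus-specific part).
PRIMER_515F = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT"

# Assumption: the locus-specific parts are the commonly cited 515F/806R V4 sequences.
LOCUS_515F = "GTGCCAGCMGCCGCGGTAA"
LOCUS_806R = "GGACTACHVGGGTWTCTAAT"

# IUPAC nucleotide codes that occur in these primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "W": "AT", "H": "ACT", "V": "ACG",
}


def split_primer(full, locus):
    """Return (adapter_overhang, locus_specific) for a fused primer."""
    if not full.endswith(locus):
        raise ValueError("locus-specific sequence not found at the 3' end")
    return full[: -len(locus)], locus


def degeneracy(seq):
    """Count the distinct oligonucleotides encoded by a degenerate sequence."""
    n = 1
    for base in seq:
        n *= len(IUPAC[base])
    return n


overhang_f, locus_f = split_primer(PRIMER_515F, LOCUS_515F)
overhang_r, locus_r = split_primer(PRIMER_806R, LOCUS_806R)
```

Under these assumptions, both overhangs end in the same 19-base motif, and the degenerate positions (M in 515F; H, V, and W in 806R) are what allow one primer pool to match a broad range of bacterial 16S sequences.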

The amplicon library preparation methods developed and employed by the UMGC have been shown to be more quantitatively accurate and qualitatively complete, detecting taxonomic groups that often go undetected with existing methods (Gohl et al., 2016). Indexed samples were sequenced once using an Illumina MiSeq at the UMGC. A total of 3.29 million (3,290,170) raw reads were obtained from 40 samples.

Temporal bounds within the dataset

The date range associated with this dataset represents the sediment core collection dates.

Data Availability in SRA

All relevant data are reported in the results paper, Sauer et al. (2022). All 16S rRNA amplicon data are available from the Sequence Read Archive (SRA) under BioProject accession PRJNA763898.

Data Processing Description

Following sequencing and initial quality control performed by the University of Minnesota Genomics Center, we conducted all downstream sequence processing and analyses using established bioinformatic and statistical workflows.

We conducted post-sequence processing in Mothur (v1.43.0) following the MiSeq SOP (Schloss et al., 2009; Kozich et al., 2013). Briefly, we merged forward and reverse reads, then screened and trimmed the merged sequences and removed those containing ambiguous bases. We aligned reads to references in the SILVA database (v.132) and identified and removed chimeras using vsearch (v2.13.3) (Quast et al., 2013; Edgar et al., 2011). Finally, given the nature of the study (i.e., broad-scale patterns of diversity), we clustered sequences into operational taxonomic units (OTUs) using a 97% similarity threshold and assigned taxonomy using the SILVA database (Stackebrandt et al., 1994; Glassman et al., 2018).

Unless otherwise stated, all statistical analyses were conducted in R (v4.0.0) (R Core Team, 2018; Wickham et al., 2019). Both environmental and community data were loaded into R using Phyloseq (v1.32.0) (McMurdie et al., 2013), and reads classified as mitochondrial or chloroplast were removed. The final dataset after all post-processing contained 2,181,132 reads assigned to 53,854 taxa across 40 samples (two sediment depths per lake).

Alpha diversity

All singletons (OTUs observed only once across all 40 samples) were removed prior to calculating alpha diversity statistics. Given the observed correlation between richness and sample read depth across sequencing batches (Fig. S1 in Sauer et al., 2022), the data were rarefied to 90% of the read depth of the lowest-depth sample (15,771 reads per sample; Fig. S2 and Table S3 in Sauer et al., 2022). The final dataset used for alpha diversity analyses included 630,840 read counts representing 25,563 taxa across 40 samples.
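As an illustration of the rarefaction step, the sketch below subsamples one sample's OTU counts to a fixed depth without replacement. It is not the Phyloseq implementation used for this dataset; the function name, the toy counts, and the seed are hypothetical. The final lines check that the reported per-sample depth and sample count reproduce the reported total.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement."""
    total = sum(counts.values())
    if total < depth:
        raise ValueError("sample has fewer reads than the target depth")
    # Expand counts into one entry per read, then draw `depth` reads at random.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(reads, depth)
    out = {}
    for otu in picked:
        out[otu] = out.get(otu, 0) + 1
    return out


# Toy sample (hypothetical counts, not taken from the dataset).
toy = {"OTU1": 500, "OTU2": 300, "OTU3": 5}
rarefied = rarefy(toy, depth=100)

# Consistency check on the figures reported in the text.
DEPTH_PER_SAMPLE = 15_771
N_SAMPLES = 40
TOTAL_READS = DEPTH_PER_SAMPLE * N_SAMPLES  # 630,840
```

Because every sample is reduced to the same depth, the total read count after rarefaction is simply the depth multiplied by the number of samples.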

Alpha diversity metrics were calculated using the Phyloseq package in R (Fig. S3 and Table S4 in Sauer et al., 2022) (McMurdie et al., 2013). Richness (observed number of OTUs) and evenness (Shannon index) were compared between sediment depths (shallow n = 20, deep n = 19) using a Wilcoxon test, and among trophic status categories (hypereutrophic n = 4, eutrophic n = 16, mesotrophic n = 16, oligotrophic n = 3) and ecological regions (Western Cornbelt Plains n = 12, North Central Hardwood Forests n = 14, Northern Lakes & Forests n = 8, Canadian Shield n = 5) using Kruskal–Wallis tests with Dunn post hoc tests and Bonferroni correction. In all analyses, one outlying sample (Trout, Deep) was excluded due to uncharacteristically low diversity.
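The two alpha diversity metrics named above can be written out in a few lines. This is a minimal sketch and not the Phyloseq code path; it assumes the Shannon index is computed with the natural logarithm, and the function names are hypothetical. The group sizes are copied from the text to confirm that each grouping accounts for the 39 samples retained after excluding the outlier.

```python
import math


def richness(counts):
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Group sizes as reported in the text (40 samples minus one excluded outlier).
DEPTH = {"shallow": 20, "deep": 19}
TROPHIC = {"hypereutrophic": 4, "eutrophic": 16, "mesotrophic": 16, "oligotrophic": 3}
ECOREGION = {
    "Western Cornbelt Plains": 12,
    "North Central Hardwood Forests": 14,
    "Northern Lakes & Forests": 8,
    "Canadian Shield": 5,
}
```

For a perfectly even community of four OTUs, `shannon([10, 10, 10, 10])` equals ln 4, and a community with a single OTU has a Shannon index of zero.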

The predictive relationships between environmental parameters measured at the time of sampling (Table S2 in Sauer et al., 2022) and alpha diversity metrics were evaluated using multiple regression. The significance and variance explained by each predictor were assessed using the relaimpo (v.2.2.3) and vegan (v.2.5-6) packages in R (Groemping et al., 2006; Oksanen et al., 2009). Final models for richness (observed) and evenness (Shannon) were selected based on AIC scores.

Beta diversity

Prior to beta diversity analyses, OTUs were filtered by removing those with fewer than two total counts and occurring in fewer than 10% of samples. Following filtering, the average number of reads per sample was 47,605, with a minimum read depth of 15,150 and a maximum read depth of 99,561. Because OTU count data exhibit strong positive skew, a variance-stabilizing transformation (VST) was applied to reduce heteroscedasticity (Love et al., 2020). Log-like transformations such as VST have been shown to transform count data toward near-normal distributions and produce larger eigengap values, resulting in more consistent correlation estimates that influence downstream analyses (Badri et al., 2020). After filtering and transformation, the final dataset for beta diversity analyses included 5,512 taxa across 40 samples.

Sample dissimilarity was visualized using principal component analysis (PCA) with the ordinate function in Phyloseq (McMurdie et al., 2013). Differences in community composition among ecological regions were assessed using permutational multivariate analysis of variance (PERMANOVA) with the adonis function in vegan (Oksanen et al., 2009), based on Bray–Curtis dissimilarity. Dispersion within groups was evaluated using permutation tests with the betadisper and permutest functions in vegan. Prior to calculating the dissimilarity matrix, negative VST values were converted to zero, as these values likely represent zero or near-zero counts and were considered negligible for distance calculations and hypothesis testing. Cluster analysis was performed using Ward’s (D2) method with the same dissimilarity matrix used for PERMANOVA.
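The handling of negative VST values before the Bray–Curtis calculation can be sketched as follows. This is an illustration only, not the vegan implementation; the function names and the toy abundance vectors are hypothetical.

```python
def clamp_negatives(values):
    """Set negative transformed abundances to zero, as described in the text."""
    return [max(v, 0.0) for v in values]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must have the same number of taxa")
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        raise ValueError("both samples are empty")
    return num / den


# Toy transformed abundances for two samples (hypothetical values).
a = clamp_negatives([2.0, -0.5, 4.0])  # becomes [2.0, 0.0, 4.0]
b = clamp_negatives([1.0, 3.0, -1.2])  # becomes [1.0, 3.0, 0.0]
d = bray_curtis(a, b)  # (1 + 3 + 4) / (3 + 3 + 4) = 0.8
```

Clamping matters because Bray–Curtis assumes non-negative abundances; with negative inputs the denominator can shrink or change sign and the index is no longer bounded between 0 and 1.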

BCO-DMO Processing Description

* The primary data file of this dataset has been converted from its original format (.tsv) to csv.
* Within the primary data file (filename: 986587_v1_sediment_bacteria_in_MN_lakes.csv), lat and lon values have been split into two separate columns. Originally, both values were provided in a column named "lat_lon." The published file has both a "lat" column and a "lon" column.
* Unit values (grams, represented by "g") have been removed from the "samp_size" column so the data within this column can be rendered accurately as numeric values.
* Country, state, lake name and sample depth range values have been parsed from the column "geo_loc_name" into individual columns. The original "geo_loc_name" column has been retained within the data file.
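Two of the processing steps listed above can be sketched in code. This is not the BCO-DMO pipeline: it assumes that the original "lat_lon" values followed the NCBI BioSample convention of decimal degrees with hemisphere letters (for example "45.0 N 93.5 W") and that sample sizes were written as a number followed by "g". The function names and example values are hypothetical.

```python
import re

# Assumed input format: "<degrees> N|S <degrees> E|W" in decimal degrees.
LAT_LON_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)
# Assumed input format: a number followed by the unit letter "g".
GRAMS_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*g\s*$")


def split_lat_lon(value):
    """Split a combined lat_lon string into signed (lat, lon) decimal degrees."""
    m = LAT_LON_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognized lat_lon value: {value!r}")
    lat = float(m.group(1)) * (1 if m.group(2) == "N" else -1)
    lon = float(m.group(3)) * (1 if m.group(4) == "E" else -1)
    return lat, lon


def strip_grams(value):
    """Remove the trailing 'g' unit and return the sample size as a float."""
    m = GRAMS_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognized sample size: {value!r}")
    return float(m.group(1))


lat, lon = split_lat_lon("45.0 N 93.5 W")  # hypothetical coordinates
size = strip_grams("0.25 g")
```

Signed output matches the parameter descriptions, in which positive latitudes are northern and negative longitudes are western.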

Parameters

Parameter	Description	Units
accession	NCBI Sequence Read Archive (SRA) run accession identifying a unique sequencing run.	unitless
message	Message from NCBI - should indicate successfully loaded.	unitless
sample_name	Unique name of the environmental or biological sample.	unitless
sample_title	Sample title (this is an optional field within the dataset and may not apply to every sample).	unitless
organism	DNA source - metagenome for environmental sample.	unitless
host	Host organism (if relevant, can be left blank).	unitless
isolation_source	Type of substrate DNA was extracted from (e.g. sediment, soil, tissue).	unitless
collection_date	Date of sample collection.	unitless
geo_loc_name	Country, state, lake name, and sample depth range of sample collection.	unitless
lat	Latitude of sample collection location in decimal degrees; a positive value indicates a northern coordinate.	decimal degrees
lon	Longitude of sample collection location in decimal degrees; a negative value indicates a western coordinate.	decimal degrees
ref_biomaterial	DNA source.	unitless
samp_collect_device	Device used for sample collection (core, pump, etc).	unitless
samp_size	Size of sample collected.	g
country	Country of sample collection derived from geo_loc_name.	unitless
state	State of sample collection, derived from geo_loc_name.	unitless
lake_name	Name of the lake where the sample was collected, derived from geo_loc_name.	unitless
sample_depth_range_min	Minimum depth of the sampled sediment core interval, derived from geo_loc_name.	cm
sample_depth_range_max	Maximum depth of the sampled sediment core interval, derived from geo_loc_name.	cm

Project Information

Collaborative Research: Cyanobacteria, Nitrogen Cycling, and Export Production in the Laurentian Great Lakes (Cyanos Great Lakes)

Coverage: Lake Superior and Lake Erie

NSF Award Abstract:
The Great Lakes hold about 20% of the freshwater on Earth and have been increasingly impacted by human activities in recent decades. Lake Erie suffers from large, annually recurring, toxic cyanobacterial blooms in summer, whereas Lake Superior experiences smaller, localized cyanobacterial blooms after storm events. Cyanobacterial blooms have harmful ecological, human health, and economic implications. These blooms are a global phenomenon, observed in lakes and oceans, and can lead to low oxygen conditions and the production of toxins, both of which can be harmful for ecosystems. Understanding how different types of cyanobacteria influence nutrient cycling remains a major knowledge gap. This project aims to provide a deeper understanding of the long-term state of the Great Lakes ecosystem. The research approach combines new and established methods. Project results and implications will be shared with local and regional water interests in partnership with the Pittsburgh Collaboratory for Water Research, Education, and Outreach, the Great Lakes Commission Harmful Algal Blooms Collaborative, and the Lake Erie Area Research Network. Education is a central part of this project and training opportunities target next generation of scientists, including postdoctoral, graduate, and undergraduate students. The students and postdoc will receive state-of-the-art training in the rapidly developing fields of biogeochemistry and geomicrobiology, while working with an interdisciplinary team of scientists.

This study will examine nitrogen cycling, phytoplankton community composition, and the nitrogen isotopic composition of chloropigments in order to evaluate cyanobacterial productivity in the modern Laurentian Great Lakes as well as the historical record of cyanobacterial blooms over the past several hundred years. The nitrogen isotope composition of chloropigments is expected to provide a powerful new proxy for understanding primary productivity and the relative importance of cyanobacteria to export production and nitrogen cycling. This proxy would be valuable not only for management of modern systems but has important implications for increasing our understanding of the role of cyanobacteria throughout Earth history. This project would test this molecular isotopic proxy in contemporary aquatic ecosystems to assess its efficacy for: (1) determining the relative contributions of cyanobacteria vs eukaryotic algae (e.g., diatoms) to primary production; (2) evaluating export production of cyanobacterial productivity (including blooms); and (3) constraining historical cyanobacteria productivity in the sedimentary record. Comparison of a system characterized by eutrophication and seasonal cyanobacterial blooms (Lake Erie) with one characterized by picocyanobacteria productivity, but the near-absence of large-scale cyanobacterial blooms (Lake Superior), will provide information about the range of impacts that cyanobacteria can have on carbon and nitrogen cycling. Further information regarding nitrogen cycling will be derived from analysis of solid and dissolved nitrogen species throughout the annual cycle, as well as seasonal studies of sediment processes to measure associated sediment nitrogen removal rates through different processes.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding

Funding Source	Award
NSF Division of Ocean Sciences (NSF OCE)	OCE-1948058
