SEQUENCE DATA FILES
Illumina HiSeq reads are available from the NCBI Sequence Read Archive (SRA). Libraries were prepared following the ezRAD protocol (Toonen et al. 2013) and sequenced on an Illumina HiSeq 2500. Reads were quality trimmed and adapters removed using TrimGalore, as follows.
#ADAPTERS
#Illumina TruSeq HT dual-indexed Adapters (96 barcode combinations)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG #Read1 w/ 8-base wildcard i7 barcode
GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTNNNNNNNNGTGTAGATCTCGGTGGTCGCCGTATCATT #Read2 w/ wildcard i5 barcode (reverse complemented)
#TrimGalore Command
#first make directory for cleaned files
mkdir cleaned_for_stacks
##FOR R1 loop for trim_galore##
declare -a TEST=(site09_12 site09_15 site09_18 site09_21 site09_24 site09_13 site09_16 site09_19 site09_22 site09_14 site09_17 site09_20 site09_23)
for i in "${TEST[@]}"; do perl ~/ddocent/trim_galore --phred33 --dont_gzip -a gatcggaagagcacacgtctgaactccagtcacnnnnnnnnatctcgtatgccgtcttctgcttg --stringency 5 -e 0.1 -r1 100 --output_dir ./cleaned_for_stacks "$i.R1.fq"; done
##FOR R2 loop for trim_galore##
declare -a TEST=(site09_12 site09_15 site09_18 site09_21 site09_24 site09_13 site09_16 site09_19 site09_22 site09_14 site09_17 site09_20 site09_23)
for i in "${TEST[@]}"; do perl ~/ddocent/trim_galore --phred33 --dont_gzip -a gatcggaagagcgtcgtgtagggaaagagtgtnnnnnnnngtgtagatctcggtggtcgccgtatcatt --stringency 5 -e 0.1 -r1 100 --output_dir ./cleaned_for_stacks "$i.R2.fq"; done
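The two loops differ only in the adapter sequence and the read suffix, so they could be combined. The following is a minimal sketch of that idea, not the authors' script: the names TRIM_GALORE, DRY_RUN, SAMPLES, trim_one and trim_all are hypothetical, and the trim_galore flags are copied unchanged from the commands above. By default it only prints each command (a dry run) so the full list can be checked before anything is executed.

```shell
# Sketch only: variable and function names here are illustrative assumptions.
# The trim_galore options are taken as-is from the original commands.
TRIM_GALORE="${TRIM_GALORE:-$HOME/ddocent/trim_galore}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = run them

ADAPTER_R1=gatcggaagagcacacgtctgaactccagtcacnnnnnnnnatctcgtatgccgtcttctgcttg
ADAPTER_R2=gatcggaagagcgtcgtgtagggaaagagtgtnnnnnnnngtgtagatctcggtggtcgccgtatcatt

SAMPLES="site09_12 site09_15 site09_18 site09_21 site09_24 site09_13 site09_16 site09_19 site09_22 site09_14 site09_17 site09_20 site09_23"

# Build (and optionally run) the trim_galore command for one sample and read.
trim_one() {
  sample=$1
  read_dir=$2
  case $read_dir in
    R1) adapter=$ADAPTER_R1 ;;
    R2) adapter=$ADAPTER_R2 ;;
    *)  echo "unknown read direction: $read_dir" >&2; return 1 ;;
  esac
  set -- perl "$TRIM_GALORE" --phred33 --dont_gzip -a "$adapter" \
    --stringency 5 -e 0.1 -r1 100 \
    --output_dir ./cleaned_for_stacks "$sample.$read_dir.fq"
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Loop over every sample and both read directions.
trim_all() {
  for s in $SAMPLES; do
    for r in R1 R2; do
      trim_one "$s" "$r" || return 1
    done
  done
}

trim_all
```

Setting DRY_RUN=0 in the environment would execute the commands instead of printing them; the output directory still needs to exist first (mkdir cleaned_for_stacks, as above).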
Contact Erica Goetze with any questions or regarding subsequent use of these data.
BCO-DMO Processing Notes:
added conventional header with dataset name, PI name, version date
modified parameter names to conform with BCO-DMO naming conventions
combined SRA metadata with collection information
converted latitude and longitude to decimal degrees
added links to NCBI GenBank BioProject and BioSample pages