Lake ice is an important resource supporting water quality, local biodiversity, Arctic transportation, and regional economies (Knoll et al. 2019). However, many studies have identified the threat that climate change poses to the long-term persistence of lake ice (Magnuson et al. 2000; Livingstone et al. 2009; Sharma et al. 2019; Filazzola et al. 2020; Sharma et al. 2021). Long-term surveys of lake ice collected in situ are the gold standard for monitoring in environmental science (Filazzola & Cahill 2021). Here, we present ice phenology records for 78 lakes in the Northern Hemisphere spanning up to 578 years. These surveys cover 12 countries across North America and Eurasia and were collected by researchers, community scientists, and priests, as well as through digital observations. This database includes information about ice phenology (ice-on dates, ice-off dates, ice cover duration), lake characteristics (e.g., size, location, names), and meta-data about each data source (e.g., how the data were collected). Our intention is that this database may be used to understand the factors driving lake ice patterns and their biological or socio-economic consequences.
## Load libraries
library(tidyverse)
library(DT)
## Load data
lakeChar <- read.csv("data/LakeCharacteristics.csv")
icePheno <- read.csv("data/PhenologyData.csv")
datatable(lakeChar, caption="Lake Characteristics")
datatable(icePheno, caption="Lake Ice Phenology")
The data within this database are separated into three main files:
- PhenologyData.csv: has the lake ice phenology for all 78 lakes.
- LakeCharacteristics.csv: has the physical characteristics and coordinates of the lakes in the database.
- Definitions.csv: the meta-data associated with each lake, including the range of the time series, the number of missing observations, and the definitions of ice-on/ice-off.
The qaqc.r file was used to convert 78_lakes_ts_minimal.csv into “long” format, where one column each represents ice-on and ice-off dates. The qaqc.r file also performs some basic quality assurance and quality control of the dataset. Two source files, create_lake_ice_time_series.py and additional_functions.py, consolidated lake names, conducted some quality control, and were responsible for the original data aggregation across multiple files.
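The wide-to-long step can be illustrated with a small sketch. This is not the code in qaqc.r: the toy table, its `<lake>_iceOn` / `<lake>_iceOff` column naming, and the dates are all hypothetical, since the layout of 78_lakes_ts_minimal.csv is not described here. It only shows one way `tidyr::pivot_longer()` can produce a table with one row per lake and winter and separate `iceOn` and `iceOff` columns.

```r
library(tidyr)

# Hypothetical "wide" table: one row per winter, two columns per lake.
# Column names and dates are made up for illustration only.
wide <- data.frame(
  start_year     = c(1900, 1901),
  geneva_iceOn   = c("1900-12-20", NA),
  geneva_iceOff  = c("1901-04-02", "1902-03-28"),
  detroit_iceOn  = c("1900-11-25", "1901-11-30"),
  detroit_iceOff = c("1901-04-20", NA)
)

# Split each column name into a lake name and an event type.
# ".value" tells pivot_longer() to turn the event type (iceOn / iceOff)
# into output columns instead of stacking it into a single column.
# A regex is used rather than names_sep so that lake names containing
# underscores (e.g. grand_traverse) would still be parsed correctly.
long <- pivot_longer(
  wide,
  cols          = -start_year,
  names_to      = c("lake", ".value"),
  names_pattern = "(.*)_(iceOn|iceOff)"
)

print(long)
```

The resulting `long` table has the columns `start_year`, `lake`, `iceOn`, and `iceOff`, which matches the shape that the summary code in this README expects from `icePheno`.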
library(tidyverse)
lakeChar %>%
select(lake = lakename, ManuscriptName) %>%
right_join(icePheno) %>%
group_by(ManuscriptName, lake) %>%
summarize(minYear = min(start_year),
maxYear = max(start_year),
totalTimeseries = length(iceOn),
missingIceOn = sum(is.na(iceOn)),
missingIceOff = sum(is.na(iceOff)))
## # A tibble: 78 x 7
## # Groups: ManuscriptName [78]
## ManuscriptName lake minYear maxYear totalTimeseries missingIceOn
## <chr> <chr> <int> <int> <int> <int>
## 1 Cazenovia Lake cazenovia 1838 2018 191 14
## 2 China Lake china 1873 2018 101 101
## 3 Christmas Lake christmas 1886 2017 131 128
## 4 Clear Lake clear 1873 2019 128 126
## 5 Cobbosseecontee Lake cobbossee 1839 2018 178 178
## 6 Damariscotta Lake damariscot~ 1836 2018 182 182
## 7 Detroit Lake detroit 1892 2019 128 18
## 8 Geneva Lake geneva 1862 2018 157 6
## 9 Grand Traverse Bay grand_trav~ 1850 2016 166 44
## 10 Green Lake green 1896 2015 91 21
## # ... with 68 more rows, and 1 more variable: missingIceOff <int>