Abstract
Precise crop yield predictions are of national importance for ensuring food
security and sustainable agricultural practices. While AI-for-science
approaches have shown promising results on many scientific
problems, such as drug discovery and precipitation nowcasting, the
development of deep learning models for predicting crop yields is constantly
hindered by the lack of an open and large-scale deep learning-ready dataset
with multiple modalities to accommodate sufficient information. To remedy this,
we introduce the CropNet dataset, the first terabyte-sized, publicly available,
and multi-modal dataset specifically targeting climate change-aware crop yield
predictions for the contiguous United States (U.S.) at the county
level. Our CropNet dataset is composed of three modalities of data, i.e.,
Sentinel-2 Imagery, WRF-HRRR Computed Dataset, and USDA Crop Dataset, for over
2200 U.S. counties spanning 6 years (2017-2022). It is expected to help
researchers develop versatile deep learning models for timely and
precise crop yield predictions at the county level, by accounting for the
effects of both short-term growing season weather variations and long-term
climate change on crop yields. In addition, we develop the CropNet package,
which offers three types of APIs that help researchers download the
CropNet data on the fly over the time and region of interest and flexibly
build their deep learning models for accurate crop yield predictions.
Extensive experiments on our CropNet dataset, employing
various types of deep learning solutions, validate the
general applicability and the efficacy of the CropNet dataset in climate
change-aware crop yield predictions.