Time series of the Advanced Very High-Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (NDVI) have been used for crop yield forecasting since the 1980s. Image masking was a critical component of several of these yield forecasting efforts as researchers attempted to isolate subsets of a region’s pixels that would improve their modeling results. Approaches generally sought to identify cropped pixels and, when possible, pixels that corresponded to the particular crop type under investigation. In this paper, the former approach will be referred to as cropland masking, and the latter approach will be called crop-specific masking.
The research presented in this paper examines the underlying assumptions made when image masking for the purpose of regional crop yield forecasting. An alternative statistical image masking approach (called yield-correlation masking) is proposed that is objective (i.e., can be automated) and has the flexibility to be applied to any region with several years of time series imagery and corresponding historical crop yield information. The primary appeal of yield-correlation masking is that, unlike cropland masking, no land cover map is required, yet we will show that NDVI models generated using the two methods demonstrate comparable predictive ability.
The goal of this research is very specific: We will establish the yield-correlation masking procedure as a viable image masking technique in the context of crop yield forecasting. To accomplish this objective, we empirically evaluate and compare cropland masking and yield-correlation masking for the purpose of crop yield forecasting. Cropland masking has been shown to benefit yield forecasting models, thus providing a practical benchmark. In the process, we present a robust statistical yield forecasting protocol that can be applied to any (region, crop)-pair possessing the requisite data, and this protocol is used to evaluate the two masking methods that are being compared.
This paper is developed as follows. A brief review of related research is presented. The two primary data sets, AVHRR NDVI time-series imagery and United States Department of Agriculture (USDA) regional crop yield data, are described. Details of the study regions, crop types, and time periods under investigation are specified. Three image masking procedures are discussed, two of which are evaluated in the research.
Finally, the modeling approach and performance evaluation framework are described, along with a summary of results and conclusions drawn from the analysis. Details of the modeling strategy used are described in the Appendix.
Traditionally, yield estimations are made through agro-meteorological modeling or by compiling survey information provided throughout the growing season. Yield estimates derived from agro-meteorological models use soil properties and daily weather data as inputs to simulate various plant processes at a field level (Wiegand & Richardson, 1990; Wiegand et al., 1986). At this scale, agro-meteorological crop yield modeling provides useful results. However, at regional scales these models are of limited practical use because of spatial differences in soil characteristics and crop growth-determining factors such as nutrition levels, plant disease, herbicide and insecticide use, crop type, and crop variety, which would make informational and analytical costs excessive. Additionally, Rudorff and Batista (1991) indicated that, at a regional level, agro-meteorological models are unable to completely simulate the different crop growing conditions that result from differences in climate, local weather conditions, and land management practices. The scale of applicability of agro-meteorological models is increasing, but presently only through the integration of remotely sensed imagery. For instance, Doraiswamy et al. (2003) developed a method using AVHRR NDVI data as proxy inputs to an agro-meteorological model in estimating spring wheat yields at county and subcounty scales in the U.S. state of North Dakota.
In the past 25 years, many scientists have utilized remote sensing techniques to assess agricultural yield, production, and crop condition. Wiegand et al. (1979) and Tucker et al. (1980) first identified a relationship between the NDVI and crop yield using experimental fields and ground-based spectral radiometer measurements. Final grain yields were found to be highly correlated with accumulated NDVI (a summation of NDVI between two dates) around the time of maximum greenness (Tucker et al., 1980). In another experimental study, Das et al. (1993) used remotely sensed data to predict wheat yield 85–110 days before harvest in India. These early experiments identified relationships between NDVI and crop response, paving the way for crop yield estimation using satellite imagery.
Rasmussen (1992) used 34 AVHRR images of Burkina Faso, Africa, for a single growing season to estimate millet yield. Using accumulated NDVI and statistical regression techniques, he found strong correlations between accumulated NDVI and yield, but only during the reproductive stages of crop growth. The lack of a strong correlation between accumulated NDVI and yield during other stages of growth was attributed to the limited temporal profile of imagery used in the study and the high variability of millet yield in his study area. Potdar (1993) estimated sorghum yield in India using 14 AVHRR images from the same growing season. He was able to forecast actual yield at an accuracy of ±15% up to 45 days before harvest. Rudorff and Batista (1991) used NDVI values as inputs into an agro-meteorological model to explain nearly 70% of the variation in 1986 wheat yields in Brazil. Hayes and Decker (1996) used AVHRR NDVI data to explain more than 50% of the variation in corn yields in the United States Corn Belt. Each of these studies found positive relationships between crop yield and NDVI, but the strength of the relationships depended upon the amount and quality of the imagery used.
Some studies have used large, multi-year AVHRR NDVI data sets. Maselli et al. (1992) found strong correlations between NDVI and final crop yields in the Sahel region of Niger using 3 years of AVHRR imagery (60 images). In India, Gupta et al. (1993) used 3 years of AVHRR data to estimate wheat yields within ±5% up to 75 days before harvest. The success of this study was dependent on the fact that over 80% of the study area was covered with wheat. In Greece, 2 years of AVHRR imagery were used to estimate crop yields (Quarmby et al., 1993). Actual harvested rice yields were predicted with an accuracy of ±10%, and wheat yields were predicted with an accuracy of ±12% at the time of maximum greenness. Groten (1993) was able to predict crop yield with a ±15% estimation error 60 days before harvest in Burkina Faso using regression techniques and 5 years of AVHRR NDVI data (41 images).
Doraiswamy and Cook (1995) used 3 years of AVHRR NDVI imagery to assess spring wheat yields in North and South Dakota in the United States. They concluded that the most promising way to improve the use of AVHRR NDVI for estimating crop yields at regional scales would be to use larger temporal data sets, better crop masks, and climate data. Lee et al. (1999) used a 10-year, biweekly AVHRR data set to forecast corn yields in the U.S. state of Iowa. They found that the most accurate forecasts of crop yield were made using a cropland mask and measurements of accumulated NDVI. Maselli and Rembold (2001) used multi-year series of annual crop yields and monthly NDVI to develop cropland masks for four Mediterranean African countries. They found that application of the derived cropland masks improved relationships between NDVI and final yield during optimal yield prediction periods. Ferencz et al. (2004) found yields of eight different crops in Hungary to be highly correlated with optimized, weighted seasonal NDVI sums using 1-km AVHRR NDVI from 1996 to 2000. They used non-forest vegetation masks and a novel time series interpolation approach and actually obtained their best results when using a greenness index equivalent to the numerator of the NDVI formula (NIR-RED; see Section 3). Additionally, many researchers have found that crop condition and yield estimation are improved through the inclusion of metrics that characterize crop development stage (Badhwar & Henderson, 1981; Groten, 1993; Kastens, 1998; Lee et al., 1999; Quarmby et al., 1993; Rasmussen, 1992). Ancillary data have been found useful as well. For example, Rasmussen (1997) found that soil type information improved the explanation of millet and ground nut yield variation using 3 years of AVHRR NDVI from the Peanut Basin in Senegal. 
In a later study, Rasmussen (1998) found that the inclusion of tropical livestock unit density further improved the explanation of millet yield variation in intensively cultivated regions of the Peanut Basin.
Based on the studies described, for the purpose of crop yield forecasting, longer time series of NDVI imagery are preferred to shorter ones. Also, few image masking techniques have been thoroughly and comparatively explored, likely due to the inherent complexities underlying this phase in any remote sensing-based yield forecasting methodology. Thus, to help achieve the goal of this project, an important objective of this research is to use historical yield information and historical time series AVHRR NDVI imagery to devise a thorough and robust statistical procedure for obtaining early to mid-season crop yield forecasts, with particular emphasis on image masking. The techniques described in this paper can be applied to any (region, crop)-pair that possesses sufficient historical yield information and corresponding time series NDVI imagery. Since few meaningful crop phenology metrics can be accurately derived at early points in the growing season, our research does not attempt to use this information. Also, no ancillary information is used, to prevent dependence on the availability of such data.
Description of data
The research presented in this paper relies on two data sets. The first is a time series of biweekly AVHRR NDVI composite imagery from 1989 to 2000, obtained from the U.S. Geological Survey Earth Resources Observation Systems (EROS) Data Center (EDC) in Sioux Falls, SD (Eidenshink, 1992). This data set was chosen because it is relatively inexpensive, reliable, and updated in near real-time. NDVI is defined by the formula (NIR − RED)/(NIR + RED), where NIR is reflectance in the near-infrared spectrum (0.75–1.10 μm) and RED is reflectance in the red band of the visible spectrum (0.58–0.68 μm). Chlorophyll uses electromagnetic energy in the RED band for photosynthesis, and plant structure is reflective of energy in the NIR band. So, for vegetated surfaces, NDVI increases if plant biomass increases or if photosynthetic activity increases.
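As a minimal illustration of the NDVI formula above, the following Python sketch computes the index from a pair of reflectance values. The function name and the sample reflectances are hypothetical and chosen only for illustration; they are not values from the data set.

```python
def ndvi(nir: float, red: float) -> float:
    """Return (NIR - RED) / (NIR + RED) for two reflectance values."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + RED must be nonzero")
    return (nir - red) / total


# Hypothetical reflectances: dense vegetation reflects strongly in the NIR band
# and absorbs in the RED band, so its NDVI is well above that of bare soil.
vegetated = ndvi(nir=0.50, red=0.08)
bare_soil = ndvi(nir=0.30, red=0.25)
```

Because the numerator can never exceed the denominator in magnitude for non-negative reflectances, the result always falls in the range [−1, 1].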
The NDVI data were received in unsigned 8-bit integer format, with the original NDVI range [−1, 1] linearly scaled to the integer range 0–200. For analysis purposes, the integer data were rescaled to their native range of [−1, 1]. As a consequence of the limited precision of the original 8-bit data, the precision of the rescaled data is 0.01, so there is an implicit expected numerical error of 0.005 in the pixel-level NDVI values.
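The rescaling described above can be sketched as follows. This is a sketch under the assumption that the linear scaling maps 0 to −1 and 200 to 1 exactly, so that one integer step equals 0.01; the function names are hypothetical, and any special handling of integer values above 200 in the delivered product is not covered.

```python
def dn_to_ndvi(dn: int) -> float:
    """Rescale a stored integer (0..200) to NDVI in [-1, 1] (assumed mapping)."""
    if not 0 <= dn <= 200:
        raise ValueError("expected an integer in the range 0..200")
    return (dn - 100) / 100.0


def ndvi_to_dn(value: float) -> int:
    """Quantize an NDVI value in [-1, 1] to the nearest stored integer."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("expected NDVI in [-1, 1]")
    return round((value + 1.0) * 100.0)


# A hypothetical true NDVI of 0.4537 is stored as 145 and recovered as 0.45,
# so the round-trip error stays within half of the 0.01 quantization step.
true_value = 0.4537
recovered = dn_to_ndvi(ndvi_to_dn(true_value))
round_trip_error = abs(recovered - true_value)
```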
The NDVI data set is not without uncertainty, both temporal and spatial. From 1989 to 2000, two polar-orbiting National Oceanic and Atmospheric Administration (NOAA) satellites (NOAA-11 [1989–1994] and NOAA-14 [1995–2000]) carried the AVHRR sensors that collected the data comprising our data set.
[Figure: The U.S. annually invariant target curve, with NDVI from periods 5–21 (February 26–October 21) shown for each year.]
This curve represents the average time series of nearly 3500 pixels selectively sampled from the 48 states in the conterminous U.S. to possess highly regular annual periodicity, thus exposing any artificial interannual NDVI value drift (Kastens et al., 2003). The NDVI data originating from NOAA-11 are fairly consistent over time. The data from NOAA-14 are less so, exhibiting a large artificial oscillation from 1997 to 2000. The range of the trend curve from 1989 to 2000 has width 0.0464. Comparing this width to the overall effective range of the AVHRR NDVI data being used (which is approximately [0.05, 0.95] for the full U.S. terrestrial range, but narrower in most practical situations), it follows that roughly 5% of the effective AVHRR NDVI data range can be attributed to artificial interannual NDVI value drift. In retrospect, we know that sensor orbit decay and sensor calibration degradation were the primary sources of the interannual NDVI value drift found in the NOAA-14 data.
Image resolution (1 km²/pixel, or 100 ha/pixel) of the AVHRR NDVI data is also an issue because pixel size is more than twice as large as the typical field size for soybeans and major grains in the U.S., which is roughly 40 ha (Kastens & Dhuyvetter, 2002). Furthermore, when considering spatial error of the image registration performed during the NDVI compositing process, the area of the region from which a single pixel’s values can be obtained grows to more than 4 km², or more than 400 ha (Eidenshink, 1992). A combination of sensor factors (e.g., sensor stability, view angle, orbit integrity) and effects of image pre-processing and compositing induce this spatial variation.
The second data set is historical, final, state-level yield data, obtained from the USDA National Agricultural Statistics Service (NASS) through its publicly accessible website (http://www.usda.gov/nass). The database is updated annually for all crops, with each particular crop’s final regional yield estimates released well after harvest completion. Updates to the final regional yield estimates can occur up to 3 years after their initial release, but generally these changes are not large. No historical or expected error statistics for these estimates are published below the national spatial scale, but they are nonetheless accepted by the industry as the best widely available record for average regional crop yield in the U.S.
Description of crops, regions, and time periods under investigation
The crops and regions under investigation in the present research are corn and soybeans in the U.S. states of Iowa (IA) and Illinois (IL), winter wheat and grain sorghum in the state of Kansas (KS), and spring wheat and barley in the state of North Dakota (ND). Compared to other states, during 1989–2000, Iowa ranked first in corn production (100.2 million mt/year; "mt" = metric ton) and second in soybean production (26.9 million mt/year). Illinois ranked second in corn production (89.9 million mt/year) and first in soybean production (27.1 million mt/year). Kansas was the top-producing winter wheat state (25.5 million mt/year) and the top-producing grain sorghum state (14.2 million mt/year). North Dakota was the top producer of both spring wheat (16.6 million mt/year) and barley (6.3 million mt/year).
For each crop, a six-period window of early to mid-season NDVI imagery is considered. The source data for these six biweekly composites span nearly 3 months of raw AVHRR imagery, corresponding to Julian biweekly periods 5–10 (approximately February 26–May 20) for winter wheat, 9–14 (approximately April 23–July 15) for spring wheat and barley, and 11–16 (approximately May 21–August 12) for corn, soybeans, and sorghum. Labeling the six biweekly periods 1 to 6, yields are modeled using data from periods 1–4, 1–5, and 1–6, with each of these three ranges providing a unique yield forecasting opportunity corresponding to a different point in the growing season. To obtain the dates for the crop-specific ranges, the initial release dates of USDA NASS yield forecasts were considered. The first NDVI image generated after the release of the initial USDA state-level estimates for the season is assigned to period 6, which fixes periods 1–5 as well. Initial release dates for USDA state-level estimates are approximately May 11 for winter wheat, July 11 for spring wheat and barley, and August 11 for corn, soybeans, and grain sorghum. Hereafter, winter wheat will be classified as an early-season crop, spring wheat and barley as mid-season crops, and corn, soybeans, and grain sorghum as late-season crops. With this timing, forecasts generated at periods 4 and 5 for each crop are produced before any state-level USDA yield estimates are released.
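The period bookkeeping described above can be sketched in a few lines of Python. The starting biweekly periods are taken from the text; the dictionary and function names are hypothetical and serve only to make the three forecasting windows explicit.

```python
# First Julian biweekly period of each crop's six-period window (from the text).
FIRST_PERIOD = {
    "winter wheat": 5,
    "spring wheat": 9,
    "barley": 9,
    "corn": 11,
    "soybeans": 11,
    "grain sorghum": 11,
}


def forecast_windows(crop: str) -> dict[int, list[int]]:
    """Map each forecast point (relative period 4, 5, or 6) to the list of
    Julian biweekly periods whose NDVI composites are available to the model."""
    start = FIRST_PERIOD[crop]
    return {last: list(range(start, start + last)) for last in (4, 5, 6)}


corn_windows = forecast_windows("corn")
winter_wheat_windows = forecast_windows("winter wheat")
```

For corn, the window for the forecast at relative period 4 covers biweekly periods 11 through 14, and the forecast at relative period 6 uses the full range 11 through 16.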
Approaches to image masking in crop yield forecasting
The purpose of image masking in the context of crop yield forecasting is to identify subsets of a region’s pixels that lead to NDVI variable values that are optimal indicators of a particular crop’s final yield. A good image mask should capture the essence (i.e., salient features) of the present year’s growing season with respect to how the crop of interest is progressing. This growing season essence is a combination of climatic and terrestrial factors.
In theory, the ideal approach to image masking for the purpose of crop yield forecasting would be to use crop-specific masking. This would allow one to consider only NDVI information pertaining to the crop of interest. However, when such masking is applied to multiple years of imagery, several difficulties are encountered. Principal among these is the widespread practice of crop rotation, which suggests that year-specific masks are needed rather than a single crop-specific mask that can be applied to all years. Regional trending in crop area (increase or decrease in the amount of a region’s area planted to a particular crop over time), if severe enough, also may call for year-specific masking. Identifying a particular crop in the year to be forecasted presents even greater difficulties, as only incomplete growing season NDVI information is available. This is especially true early in the season when the crop has low biomass and does not produce a large NDVI response. In addition to hindering crop classification, this low NDVI response of a crop early in its development also stifles crop yield modeling efforts, as AVHRR NDVI measurements from pixels corresponding to immature crops are not very sensitive and are thus minimally informative (see Wardlow et al., in press, for an example of such insensitivity occurring with 250-m Moderate Resolution Imaging Spectroradiometer [MODIS] NDVI data, even though MODIS has better radiometric resolution than AVHRR). Moreover, with the coarse-resolution (about 100 hectares/pixel) AVHRR NDVI imagery used in this study, identifying monocropped pixels becomes an improbable task. This is particularly true in low-producing regions and in regions with sparse crop distribution. As noted, a single pixel covers an area well over twice the average field size, and when error of the image registration is considered, a pixel’s effective ground coverage can exceed 400 hectares, or roughly ten times the typical field size.
A more feasible alternative to crop-specific masking is cropland masking, which refers to using pixels dominated by land in general agricultural crop production. Kastens (1998, 2000) and Lee et al. (1999) obtained some of their best yield modeling results using this approach. Rasmussen (1998) used a percent-cropland map to improve his yield modeling by splitting the data into two categories based on cropland density and building different models for the two classes. Maselli and Rembold (2001) used correlations between 13 years of monthly NDVI composites and 13-year series of national crop yields to estimate pixel-level cropland fraction for four Mediterranean African countries. Upon application of these derived cropland masks, the authors found improved relationships between NDVI and final estimated yield.
Cropland masks usually are derived from existing land use/land cover maps (one exception being Maselli and Rembold, 2001). If relatively small amounts of land in a study area have been taken out of or put into agricultural crop production during a study period, a single mask can be obtained and applied to all years of data. Considering that all traditional agricultural crops are now lumped in the general class of "cropland," heavily cropped pixels are more prevalent in heavily cropped regions, which allows for the construction of well-populated masks dominated by cropland. But the generation of such masks becomes difficult when low-producing regions are encountered, as well as in regions where cropland is widely interspersed with non-cropland.
As with crop-specific masking, cropland masking also can suffer the effects of minimally informative NDVI response early in a crop’s growing season (e.g., March for early-season crops, May for mid-season crops, and late May and early June for late-season crops). Many important agricultural regions are almost completely dominated by single-season crop types (e.g., Iowa produces predominantly late-season crops). In such cases, cropland AVHRR NDVI from time periods early in the particular growing season may not be very useful for predicting final yield.
By late May and June, some of the year’s terrestrial and weather-based growth-limiting factors may have already been established for some crops or regions. For instance, in the U.S., soil moisture has largely been set by this time, and soil moisture is an important determinant of crop yields in the four states comprising our study area. Such moisture information is not readily detectable in immature crops because it is usually not a limiting factor until the plant’s water needs become significant and its roots penetrate deeper into the soil. On the other hand, available soil moisture can noticeably affect other regional vegetation that is already well developed, such as grasslands, shrublands, and wooded areas, and in some cases, early- and mid-season crops.
For the reasons noted above, we propose a new masking technique, which we call yield-correlation masking. All vegetation in a region integrates the season’s cumulative growing conditions in some fashion and may be more indicative of a crop’s potential than the crop itself. Thus, all pixels are considered for use in crop yield prediction. This premise is most sound early in a crop growing season (especially for mid- and late-season crops), when the NDVI response of the immature crop is not yet strong enough to be a useful indicator of final yield. Also, as noted, when the crop is in early growth stages, problems such as lack of subsoil moisture may not yet have impacted the immature crop while having already affected more mature nearby vegetation.
Each NDVI-based variable captures a different aspect of the current growing season. This aspect manifests itself in different ways within the region’s vegetation, suggesting that optimal masks for the different NDVI-based variables likely will not be identical. Thus, for each (region, crop)-pair, yield-correlation masking generates a unique mask for each NDVI variable. The technique is initiated by correlating each of the historical, pixel-level NDVI variable values with the region’s final yield history, a strategy similar to the initial step of the cropland classification strategy presented in Maselli and Rembold (2001). The highest correlating pixels, thresholded so that some pre-specified number of pixels is included in the mask (this issue is addressed later), are retained for further processing and evaluation of the variable at hand.
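The correlate-then-threshold step described above can be sketched as follows. This is an illustrative sketch, not the implementation used in the study: it assumes the Pearson correlation coefficient, ranks pixels by signed correlation (the text does not specify how negative correlations are treated), and uses hypothetical pixel identifiers, NDVI values, and yields.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def yield_correlation_mask(pixel_series: dict[str, list[float]],
                           yields: list[float],
                           mask_size: int) -> set[str]:
    """Keep the mask_size pixels whose yearly NDVI variable values correlate
    most strongly (signed, by assumption) with the regional yield history."""
    scores = {pid: pearson(series, yields) for pid, series in pixel_series.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:mask_size])


# Hypothetical data: four years of regional yield and one NDVI variable per pixel.
yields = [7.0, 8.0, 6.5, 9.0]
pixel_series = {
    "a": [0.40, 0.45, 0.38, 0.50],  # tracks yield closely
    "b": [0.50, 0.45, 0.52, 0.40],  # moves opposite to yield
    "c": [0.30, 0.30, 0.30, 0.30],  # constant, carries no information
    "d": [0.41, 0.40, 0.42, 0.47],  # tracks yield loosely
}
mask = yield_correlation_mask(pixel_series, yields, mask_size=2)
```

In this toy example the mask retains pixels "a" and "d". Note that nothing in the procedure asks whether a retained pixel is cropland, which is the point of the technique.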
Though much more computationally intensive, the yield-correlation masking technique overcomes the major problems afflicting crop-specific masking and cropland masking. Unlike these approaches, yield-correlation masking readily can be applied to low-producing regions and regions possessing sparse crop distribution. Also, since yield-correlation masks are not constrained to include pixels dominated by cropland, they are not necessarily hindered by the weak and insensitive NDVI responses exhibited by crops early in their respective growing seasons. Furthermore, once the issue of identifying optimal mask size (i.e., determining how many pixels should be included in the masks) is addressed, the entire masking/modeling procedure becomes completely objective.
Description of cropland masks
For Iowa, Illinois, and North Dakota, the cropland masks used in this study were derived from the United States Geological Survey (USGS) National Land Cover Database (NLCD) (Vogelmann et al., 2001). The original 30-m resolution land cover maps can be obtained from the website http://landcover.usgs.gov/natllandcover.html. After generalizing the classes to cropland and non-cropland, the data were aggregated to a 1-km grid corresponding to the NDVI imagery used in this study. All annual crops, as well as alfalfa, were assigned to the cropland category, and all other cover types were classified as non-cropland. Pixel values in the resulting map corresponded to percent cropland within the 1-km² footprint of the pixel.
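The aggregation from a binary cropland map to a percent-cropland grid can be sketched as a block average. This is a simplified sketch under stated assumptions: the fine grid is taken to divide evenly into coarse blocks, whereas aggregating real 30-m cells to a 1-km grid also involves reprojection and resampling that are not shown. The function name, block size, and toy grid are hypothetical.

```python
def percent_cropland(binary: list[list[int]], block: int) -> list[list[float]]:
    """Aggregate a binary grid (1 = cropland, 0 = non-cropland) into coarse
    cells of block x block fine cells, returning percent cropland per cell."""
    rows, cols = len(binary), len(binary[0])
    if rows % block or cols % block:
        raise ValueError("grid dimensions must be multiples of the block size")
    coarse = []
    for r in range(0, rows, block):
        coarse_row = []
        for c in range(0, cols, block):
            cells = [binary[i][j]
                     for i in range(r, r + block)
                     for j in range(c, c + block)]
            coarse_row.append(100.0 * sum(cells) / len(cells))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 fine-resolution map aggregated into 2 x 2 blocks.
fine_map = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
coarse_map = percent_cropland(fine_map, block=2)
```

A cropland mask can then be obtained by thresholding the coarse grid, for example by keeping only cells above a chosen percent-cropland value.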
(Source – http://kufs.ku.edu/media/uploads/work/Kastens_RSE2005_Image_masking_for_crop_yield_forecasting.pdf)