Pollen traps are frequently installed in places that are not easily accessible, and their mechanism is usually checked only once a week. Consequently, trap failures are not detected immediately, and days with missing data are common in databases. Sometimes the lost data are not relevant for the research, but at other times these gaps may compromise the results. The aim of this study is to test the effectiveness of different methods for interpolating missing data, seeking the most accurate strategy for each set of circumstances and each pollen type.
Interpolation was performed using different strategies implemented in the R package “AeRobiology”: mathematical approaches, time series analysis and data from nearby stations. Each method was applied to 11 different pollen types, using aerobiological data from the Ronda station to test the accuracy of the interpolation methods. For each pollen type, the natural year was split into five periods: the pre-season, the post-season and, within the main pollen season, the pre-peak, peak and post-peak periods. Gaps of 3, 5, 7 and 10 days were considered. Errors were expressed as the Mean Absolute Error (MAE).
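The gap-filling experiment described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the AeRobiology implementation: it assumes a complete daily series stored as a NumPy array, blanks out an artificial gap, fills it, and scores only the filled days with the MAE. `linear_fill` stands in for one simple "mathematical approach" (linear interpolation between the nearest observed days), and `neighbour_fill` is a hypothetical nearby-station method that rescales the neighbour's values by the ratio of the two stations' totals on the observed days around the gap. All function names and the `window` parameter are illustrative assumptions.

```python
import numpy as np


def make_gap(series, start, length):
    """Return a copy of a daily series with `length` days set to NaN from `start`."""
    gapped = np.asarray(series, dtype=float).copy()
    gapped[start:start + length] = np.nan
    return gapped


def linear_fill(gapped):
    """Illustrative 'mathematical approach': straight-line interpolation
    between the nearest observed days on either side of each gap.
    Gaps at the very start or end take the nearest observed value,
    because np.interp holds the end values constant outside the data."""
    filled = gapped.copy()
    days = np.arange(len(filled))
    missing = np.isnan(filled)
    filled[missing] = np.interp(days[missing], days[~missing], filled[~missing])
    return filled


def neighbour_fill(gapped, neighbour, window=15):
    """Hypothetical nearby-station method (an assumption, not AeRobiology's):
    scale the neighbour station's values by the ratio of the two stations'
    totals over observed days within `window` days of the gap.
    Assumes the target series contains at least one missing day."""
    neighbour = np.asarray(neighbour, dtype=float)
    filled = gapped.copy()
    missing = np.isnan(filled)
    gap_days = np.flatnonzero(missing)
    lo = max(gap_days[0] - window, 0)
    hi = min(gap_days[-1] + window + 1, len(filled))
    context = np.zeros(len(filled), dtype=bool)
    context[lo:hi] = True
    context &= ~missing  # keep only days observed at the target station
    neighbour_total = neighbour[context].sum()
    ratio = filled[context].sum() / neighbour_total if neighbour_total > 0 else 0.0
    filled[missing] = ratio * neighbour[missing]
    return filled


def mae(observed, predicted):
    """Mean Absolute Error between observed and interpolated values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))


def evaluate_gap(series, start, length, method):
    """Blank out `length` days from `start`, fill them with `method`,
    and return the MAE computed on the gap days only."""
    observed = np.asarray(series, dtype=float)
    filled = method(make_gap(observed, start, length))
    gap = slice(start, start + length)
    return mae(observed[gap], filled[gap])


if __name__ == "__main__":
    # Synthetic example: a single peak that a 3-day gap hides entirely,
    # so linear interpolation between the zero-valued neighbours misses it.
    peak = np.array([0, 0, 0, 100, 0, 0, 0], dtype=float)
    print(evaluate_gap(peak, start=2, length=3, method=linear_fill))
```

Repeating `evaluate_gap` for gap lengths of 3, 5, 7 and 10 days, with start days placed inside each of the five periods, mirrors the structure of the comparison; a nearby-station method can be passed in as, for example, `lambda g: neighbour_fill(g, other_station)`. The resulting numbers depend entirely on the input series.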
For all methods, the highest MAEs were obtained when the gaps fell in the peak period. Regarding pollen types, the highest MAEs were found for the most abundant ones. Errors were also highest for 3-day gaps and lowest for 7-day gaps. Mathematical approaches yielded the lowest MAEs, whereas the highest were obtained using data from nearby stations.
According to our results, mathematical approaches proved to be the most accurate methods for interpolating missing data. Small gaps are more strongly influenced by daily oscillations in pollen concentrations and are therefore more difficult to predict than larger gaps, over which the general trend of pollen concentrations can be observed and daily fluctuations are smoothed out. Abundant pollen types generally show larger daily oscillations and therefore yield larger MAEs.
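A short derivation makes the last point concrete. Assuming the standard definition of the MAE over the $n$ interpolated days, with observed concentrations $y_i$ and interpolated values $\hat{y}_i$, multiplying both series by a factor $c > 0$ multiplies the MAE by the same factor. A pollen type whose concentrations, and daily swings, are ten times larger will therefore show an MAE ten times larger for the same relative interpolation error:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,
\qquad
\frac{1}{n}\sum_{i=1}^{n} \left| c\,y_i - c\,\hat{y}_i \right|
  = \frac{c}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
  = c \cdot \mathrm{MAE}, \quad c > 0.
```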