Optimizing Field Data Collection for Individual Tree Attribute Predictions Using Active Learning Methods

Light detection and ranging (lidar) data are now a standard data source in studies of forest ecology and environmental mapping. Medium- to high-density lidar point clouds allow individual tree crowns (ITCs) to be detected automatically, and they provide useful information for predicting the stem diameter and aboveground biomass of each tree represented by a detected ITC. However, field data are still needed to construct the prediction models that relate field measurements to lidar data, and to validate those models. At the ITC level, field data collection is often expensive and time-consuming because accurate tree positions are needed. Active learning (AL) can be very useful in this context: it helps select the optimal field trees to measure, reducing the cost of field data collection. In this study, we propose a new AL method for regression that jointly optimizes the field data collection cost, measured as the distance travelled between field sample trees, and the accuracy, measured as the root mean square error (RMSE) of the predictions. The method is applied to the prediction of diameter at breast height (DBH) and aboveground biomass (AGB) of individual trees, using tree height and crown diameter as independent variables and support vector regression as the model. The proposed method was tested on two boreal forest datasets, and the results show that the proposed selection strategy yields substantial improvements across iterations compared with random selection. For the first dataset, the multi-objective selection method gave an RMSE for DBH/AGB of 5.09 cm/95.5 kg at a cost of 8,256/6,173 m, whereas random selection gave an RMSE of 5.20 cm/102.1 kg at a cost of 28,391/30,086 m. The proposed approach can deliver more accurate predictions at lower cost, especially when a large forest area with no previous field data is to be inventoried and analyzed.
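The two quantities traded off above, travel distance between sample trees and RMSE of the predictions, can be illustrated with a minimal Python sketch. This is not the selection method of the study, whose details are not given here. The function names, the per-tree `informativeness` score, and the weighted-sum rule in `pick_next` are all hypothetical, introduced only to show how a cost term and an accuracy-related term might be combined when choosing the next tree to measure.

```python
import math


def travel_cost(route):
    """Total straight-line distance (m) walked when visiting trees in order."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def rmse(observed, predicted):
    """Root mean square error between field values and model predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pick_next(current_xy, candidates, weight=0.5):
    """Return the index of the candidate tree to measure next.

    candidates: list of (xy, informativeness) pairs, where informativeness
    is a hypothetical non-negative score of how much a tree is expected to
    help the regression model.
    weight: assumed trade-off in [0, 1]; 1 ignores distance, 0 ignores
    informativeness. The weighted sum is an illustrative assumption.
    """
    dists = [math.dist(current_xy, xy) for xy, _ in candidates]
    infos = [info for _, info in candidates]
    max_d = max(dists) or 1.0  # avoid dividing by zero
    max_i = max(infos) or 1.0
    scores = [
        weight * i / max_i - (1 - weight) * d / max_d
        for d, i in zip(dists, infos)
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    route = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
    print(travel_cost(route))
    trees = [((100.0, 0.0), 1.0), ((10.0, 0.0), 0.9)]
    print(pick_next((0.0, 0.0), trees))
```

With equal weighting, a nearby tree that is almost as informative as a distant one is preferred; setting `weight=1.0` reduces the rule to choosing the most informative tree regardless of walking distance.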