Aggregated outputs by linear models: An application on marine litter beaching prediction

Abstract: In regression, a predictive model that anticipates the output of a new case is learnt from a set of previous examples whose output (response) values are known. When learning with aggregated outputs, the training examples are individually unlabeled; instead, the aggregated output of each of several subsets of training examples is provided. In this paper, we propose an iterative methodology to learn linear models from this type of data. Despite its simplicity, it performs competitively against a straightforward solution and state-of-the-art techniques. We also illustrate a real-world problem that naturally fits the aggregated-outputs framework: the estimation of marine litter beaching along the south-east coastline of the Bay of Biscay.
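To make the aggregated-outputs setting concrete, the following minimal sketch fits a linear model when only per-subset totals of the response are observed. It is not the iterative methodology proposed in the paper, and not necessarily the straightforward solution the paper compares against. It assumes the aggregate is a plain sum, so that summing y = x·w + b over a subset gives (sum of x)·w + (subset size)·b, which can be solved by ordinary least squares. The function name `fit_linear_from_aggregates`, the synthetic data, and the subset layout are all hypothetical.

```python
import numpy as np

def fit_linear_from_aggregates(X, bags, bag_sums):
    """Least-squares fit of y ~ X @ w + b when only per-bag sums of y are known.

    Assumption (not from the paper): each aggregate is the plain sum of the
    individual outputs of the examples in that bag.
    """
    # Summing the linear model over a bag gives:
    #   sum_y = (sum of feature vectors) @ w + (bag size) * b
    rows = [np.append(X[idx].sum(axis=0), len(idx)) for idx in bags]
    A = np.vstack(rows)
    coef, *_ = np.linalg.lstsq(A, np.asarray(bag_sums, dtype=float), rcond=None)
    return coef[:-1], coef[-1]  # weights w, intercept b

# Synthetic, noise-free illustration: individual outputs are hidden from the learner.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
y = X @ true_w + true_b

# 12 disjoint bags of 5 examples each; only the bag totals are observed.
bags = [np.arange(i, i + 5) for i in range(0, 60, 5)]
bag_sums = [y[idx].sum() for idx in bags]

w, b = fit_linear_from_aggregates(X, bags, bag_sums)
print(w, b)
```

With noise-free data and more subsets than parameters, this recovers the generating coefficients up to numerical precision. With noisy outputs or few subsets, the fit degrades, which is the regime where more elaborate approaches such as the iterative one described in the abstract become relevant.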
