Machine-Learning Applications of Algorithmic Randomness

Volodya Vovk, Alex Gammerman, Craig Saunders
Computer Learning Research Centre and Department of Computer Science
Royal Holloway, University of London, Egham, Surrey TW20 0EX, England
{vovk, alex, craig}@dcs.rhbnc.ac.uk

Abstract

Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well.

1 INTRODUCTION

Two important differences of most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) from classical statistical methods are that:

- machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, e.g., prediction of future observations in traditional statistics (Guttman [5], 1970));
- many machine learning methods are designed to work (and their performance is analysed) under the general iid assumption (unlike classical parametric statistics), and they are able to deal with extremely high-dimensional hypothesis spaces; cf. Vapnik [21] (1998).

In this paper we will further develop the approach of Gammerman et al [4] (1998) and Saunders et al [17] (1999), where the goal is to obtain confidences for predictions under the general iid assumption in high-dimensional situations. Figure 1 demonstrates the desirability of confidences.

Figure 1: If the training set only contains clear 2s and 7s, we would like to attach much lower confidence to the middle image than to the right and left ones.

The main contribution of this paper is embedding the approaches of Gammerman et al [4] (1998) and Saunders et al [17] (1999) into a general scheme based on the notion of algorithmic randomness. As will become clear later, the problem of assigning confidences to predictions is closely connected to the problem of defining random sequences. The latter problem was solved by Kolmogorov [8] (1965), who based his definition on the existence of the Universal Turing Machine (though it became clear that Kolmogorov's definition does solve the problem of defining random sequences only after Martin-Löf's paper [15], 1966); Kolmogorov's definition moved the notion of randomness from the grey area surrounding probability theory and statistics to mathematical computer science. Kolmogorov believed his notion of randomness to be a suitable basis for applications of probability. Unfortunately, the fate of this idea was different from Kolmogorov's 1933 axioms (Kolmogorov [7], 1933), which …
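The kind of confidence that Figure 1 calls for can be illustrated with a small transductive sketch. This is not the algorithm developed in the paper, which obtains its scores from Support Vector machines and justifies them via algorithmic randomness; it is a minimal illustration, with assumed names (p_value, nn_strangeness) and a toy nearest-neighbour strangeness measure, of how a tentative label for a test example can be turned into a rank-based p-value.

import numpy as np

def nn_strangeness(xs, ys, i):
    # Toy strangeness score for example i: distance to the nearest example of
    # the same class divided by distance to the nearest example of another
    # class (large value = the example looks out of place in its class).
    # Assumes the completed set contains at least one example of another class.
    d = np.linalg.norm(xs - xs[i], axis=1)
    d[i] = np.inf
    return d[ys == ys[i]].min() / d[ys != ys[i]].min()

def p_value(train_x, train_y, test_x, tentative_label):
    # Complete the training set with the test example carrying the tentative
    # label, score every example, and return the fraction of examples that are
    # at least as strange as the test example (a rank-based p-value).
    xs = np.vstack([train_x, test_x])
    ys = np.append(train_y, tentative_label)
    scores = np.array([nn_strangeness(xs, ys, i) for i in range(len(ys))])
    return np.mean(scores >= scores[-1])

In use, one would compute such a p-value for every candidate label, predict the label with the largest p-value, and judge the prediction's reliability by how small the p-values of the alternative labels are; the ambiguous digit in the middle of Figure 1 would then receive a low confidence because no labelling of it makes the completed data set look typical.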

[1] Andrei N. Kolmogorov, et al. Logical basis for information theory and probability theory. IEEE Trans. Inf. Theory, 1968.

[2] Péter Gács, et al. Exact Expressions for Some Randomness Tests. Math. Log. Q., 1979.

[3] V. Vovk. On the concept of the Bernoulli property, 1986.

[4] A. Gammerman, et al. Bayesian diagnostic probabilities without assuming independence of symptoms. Methods of Information in Medicine, 1991.

[5] J. K. Ord, et al. Statistical Tolerance Regions: Classical and Bayesian, 1971.

[6] Luc Longpre, et al. Resource Bounded Kolmogorov Complexity and Statistical Tests, 1992.

[7] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2000.

[8] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[9] D. Fraser. Nonparametric methods in statistics, 1957.

[10] M. Schervish. Theory of Statistics, 1995.

[11] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications. Texts in Computer Science, 2019.

[12] Alexander Gammerman, et al. Learning by Transduction. UAI, 1998.

[13] Ming Li, et al. Computational Machine Learning in Theory and Praxis. Computer Science Today, 1995.

[14] David Haussler, et al. Predicting {0,1}-functions on randomly drawn points. COLT '88, 1988.

[15] C. Saunders, A. Gammerman, V. Vovk. Transduction with Confidence and Credibility. IJCAI, 1999.

[16] V. Vovk. A logic of probability, with application to the foundations of statistics, 1993.

[17] Lawrence D. Jackel, et al. Handwritten Digit Recognition with a Back-Propagation Network. NIPS, 1989.

[18] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.

[19] F. Y. Edgeworth, et al. The theory of statistics, 1996.

[20] Alexander Gammerman, et al. Ridge Regression Learning Algorithm in Dual Variables. ICML, 1998.