Machine-Learning Applications of Algorithmic Randomness

Volodya Vovk, Alex Gammerman, Craig Saunders
Computer Learning Research Centre and Department of Computer Science
Royal Holloway, University of London, Egham, Surrey TW20 0EX, England
{vovk,alex,craig}@dcs.rhbnc.ac.uk

Abstract

Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence, but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well.

1 INTRODUCTION

Two important differences of most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) from classical statistical methods are that:

- machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, e.g., prediction of future observations in traditional statistics (Guttman [5], 1970));

- many machine learning methods are designed to work (and their performance is analysed) under the general iid assumption (unlike classical parametric statistics), and they are able to deal with extremely high-dimensional hypothesis spaces; cf. Vapnik [21] (1998).

In this paper we will further develop the approach of Gammerman et al [4] (1998) and Saunders et al [17] (1999), where the goal is to obtain confidences for predictions under the general iid assumption in high-dimensional situations. Figure 1 demonstrates the desirability of confidences.

Figure 1: If the training set only contains clear 2s and 7s, we would like to attach much lower confidence to the middle image than to the right and left ones.

The main contribution of this paper is embedding the approaches of Gammerman et al [4] (1998) and Saunders et al [17] (1999) into a general scheme based on the notion of algorithmic randomness. As will become clear later, the problem of assigning confidences to predictions is closely connected to the problem of defining random sequences. The latter problem was solved by Kolmogorov [8] (1965), who based his definition on the existence of a Universal Turing Machine (though it became clear that Kolmogorov's definition does solve the problem of defining random sequences only after Martin-Löf's paper [15], 1966); Kolmogorov's definition moved the notion of randomness from the grey area surrounding probability theory and statistics to mathematical computer science. Kolmogorov believed his notion of randomness to be a suitable basis for applications of probability. Unfortunately, the fate of this idea was different from that of Kolmogorov's 1933 axioms (Kolmogorov [7], 1933), which
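The connection between confidence and randomness can be made concrete with a small sketch: to attach a confidence to a candidate label for a new example, extend the training set with that labelled example and test how typical (random-looking) the extended set is. The paper builds such tests from Support Vector machines; the nearest-neighbour "strangeness" measure and toy data below are our own simplified stand-in for illustration only.

```python
# Sketch: a computable analogue of a randomness test used as a confidence.
# Hypothetical nonconformity measure: distance to the nearest same-label
# example divided by distance to the nearest other-label example
# (the paper's actual construction uses Support Vector machines instead).
import math

def nonconformity(examples, labels, i):
    """Score example i: large means it looks strange within its own class."""
    xi, yi = examples[i], labels[i]
    same = [math.dist(xi, x) for j, (x, y) in enumerate(zip(examples, labels))
            if j != i and y == yi]
    diff = [math.dist(xi, x) for x, y in zip(examples, labels) if y != yi]
    if not same or not diff or min(diff) == 0:
        return float("inf")
    return min(same) / min(diff)

def p_value(train_x, train_y, new_x, candidate_label):
    """Fraction of examples at least as strange as the new one when it is
    given the candidate label -- high means the extension looks 'random'."""
    xs = train_x + [new_x]
    ys = train_y + [candidate_label]
    scores = [nonconformity(xs, ys, i) for i in range(len(xs))]
    new_score = scores[-1]
    return sum(1 for s in scores if s >= new_score) / len(scores)

# Toy data: two well-separated classes on the real line.
train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
train_y = [2, 2, 2, 7, 7, 7]

# A point near the "2" cluster: high p-value for label 2, low for label 7.
print(p_value(train_x, train_y, (0.15,), 2))  # high: label 2 is plausible
print(p_value(train_x, train_y, (0.15,), 7))  # low: label 7 is implausible
```

The p-value for the true label tends to be large while the p-values for wrong labels are small, which is what lets such a scheme report confidence alongside a bare prediction.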
References

[1] Andrei N. Kolmogorov, et al. Logical basis for information theory and probability theory, 1968, IEEE Trans. Inf. Theory.
[2] Péter Gács, et al. Exact Expressions for Some Randomness Tests, 1979, Math. Log. Q.
[3] V. Vovk. On the concept of the Bernoulli property, 1986.
[4] A. Gammerman, et al. Bayesian diagnostic probabilities without assuming independence of symptoms, 1991, Methods of Information in Medicine.
[5] J. K. Ord, et al. Statistical Tolerance Regions: Classical and Bayesian, 1971.
[6] Luc Longpre, et al. Resource Bounded Kolmogorov Complexity and Statistical Tests, 1992.
[7] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[8] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[9] D. Fraser. Nonparametric methods in statistics, 1957.
[10] M. Schervish. Theory of Statistics, 1995.
[11] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.
[12] Alexander Gammerman, et al. Learning by Transduction, 1998, UAI.
[13] Ming Li, et al. Computational Machine Learning in Theory and Praxis, 1995, Computer Science Today.
[14] David Haussler, et al. Predicting {0,1}-functions on randomly drawn points, 1988, COLT '88.
[15] C. Saunders, et al. Transduction with Confidence and Credibility, 1999, IJCAI.
[16] V. Vovk. A logic of probability, with application to the foundations of statistics, 1993.
[17] Lawrence D. Jackel, et al. Handwritten Digit Recognition with a Back-Propagation Network, 1989, NIPS.
[18] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.
[19] F. Y. Edgeworth, et al. The theory of statistics, 1996.
[20] Alexander Gammerman, et al. Ridge Regression Learning Algorithm in Dual Variables, 1998, ICML.