Single photon in hierarchical architecture for physical reinforcement learning: Photon intelligence

Understanding and using natural processes for intelligent functionalities, referred to as natural intelligence, has recently attracted interest from a variety of fields, including post-silicon computing for artificial intelligence and decision making in the behavioural sciences. In a past study, we successfully used the wave-particle duality of single photons to solve the two-armed bandit problem, which constitutes the foundation of reinforcement learning and decision making. In this study, we propose and experimentally confirm a hierarchical architecture for single-photon-based reinforcement learning and decision making, which verifies the scalability of the principle. Specifically, the four-armed bandit problem is solved with zero prior knowledge in a two-layer hierarchical architecture, where the polarization is adapted autonomously, on the basis of single-photon measurements, so that adequate decisions are made. In the hierarchical structure, the notion of layer-dependent decisions emerges. In some problems, which we call contradictive, the optimal solution in the coarse layer conflicts with the optimal solution in the fine layer. We show that what we call a tournament strategy resolves such contradictions, and that the probabilistic nature of single photons nevertheless allows the optimal solution to be located directly even for contradictive problems, which manifests the exploration ability of single photons. This study provides insights into photon intelligence in hierarchical architectures for future artificial intelligence, as well as into the potential of natural processes for intelligent functionalities.
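The conflict between coarse-layer and fine-layer decisions can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the reward probabilities, the grouping of arms into two pairs, the reading of the coarse layer as comparing total reward probability per group, and the reading of a tournament as comparing the best arm of each group. It uses known probabilities only to show why the layers can disagree; it does not model single photons, polarization control, or learning from zero prior knowledge.

```python
# Hypothetical reward probabilities for four arms (not taken from the study).
# Arms are grouped in pairs: group 0 = arms (0, 1), group 1 = arms (2, 3).
REWARD_PROBS = [0.7, 0.5, 0.9, 0.1]
GROUPS = [(0, 1), (2, 3)]


def coarse_then_fine(probs):
    """Choose the group with the larger total reward probability,
    then the best arm inside that group."""
    best_group = max(GROUPS, key=lambda group: sum(probs[a] for a in group))
    return max(best_group, key=lambda a: probs[a])


def tournament(probs):
    """Choose the best arm inside each group first,
    then compare the group winners against each other."""
    winners = [max(group, key=lambda a: probs[a]) for group in GROUPS]
    return max(winners, key=lambda a: probs[a])


if __name__ == "__main__":
    print("coarse-then-fine choice:", coarse_then_fine(REWARD_PROBS))
    print("tournament choice:", tournament(REWARD_PROBS))
```

With these assumed values, group 0 has the larger total (1.2 versus 1.0), so the coarse-then-fine rule settles on arm 0, while the globally best arm is arm 2 in group 1. Comparing group winners recovers arm 2.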
