Inductive Biases, Invariances and Generalization in RL (BIG)

International Conference on Machine Learning (ICML)

July 18, 2020

@BIGICML · #BIGICML

Contact: generalizationworkshop@gmail.com

One proposed route towards machines that can extrapolate experience across environments and tasks is the use of inductive biases. Equipping algorithms with inductive biases might help them learn invariances, e.g. a causal graph structure, which in turn would allow the agent to generalize across environments and tasks. While some inductive biases are already available and correspond to common knowledge, one key requirement for learning inductive biases from data seems to be the ability to perform interventions and learn from them. This assumption is partially motivated by the accepted hypothesis in psychology that experimentation is needed to discover causal relationships. This corresponds to a reinforcement learning environment, where the agent can discover causal factors by performing interventions and observing their effects.

We believe that one reason progress on building intelligent agents has been hampered is the limited availability of good inductive biases. Learning inductive biases from data is difficult because it corresponds to an interactive learning setting, which is far less understood than classical regression or classification frameworks; for example, even formal definitions of generalization in RL have not been developed.

While reinforcement learning has already achieved impressive results, the sample complexity required to achieve consistently good performance is often prohibitively high. This has limited most RL to either games or settings where an accurate simulator is available. Another issue is that RL agents are often brittle in the face of even tiny changes to the environment (either visual or mechanistic) that were not seen during training.

The question of generalization in reinforcement learning is essential to the field’s future, both in theory and in practice. However, there are still open questions about the right way to think about generalization in RL, the right way to formalize the problem, and the most important tasks. This workshop aims to help address these questions by bringing together researchers from different backgrounds to discuss these challenges. In our workshop we hope to explore research and new ideas on topics related to inductive biases, invariances and generalization.

Key questions to be addressed and discussed include:

What are efficient ways to learn inductive biases from data?
Which inductive biases are most suitable to achieve generalization?
Can we make the problem of generalization, in particular for RL, more concrete and agree on standard terms for discussing it?
Causality and generalization, especially in RL.
Model-based RL and generalization.
Can we create models that are robust to changes in the visual environment, assuming all the underlying mechanics are the same? Should this count as generalization or transfer learning?
Can we create a theoretical understanding of generalization in RL, and understand how it is related to the well-developed ideas from statistical learning theory?
What is the difference between a prediction that is made with a causal model and that with a non‐causal model?

Organizers

Anirudh Goyal (Mila, University of Montreal)
Rosemary Nan Ke (Mila, University of Montreal)
Stefan Bauer (Max Planck Institute for Intelligent Systems)
Jane Wang (DeepMind)
Fabio Viola (DeepMind)
Theophane Weber (DeepMind)
Bernhard Schölkopf (Max Planck Institute for Intelligent Systems)
