Awesome Information Bottleneck Paper List

In memory of Professor Naftali Tishby.
Last updated in October 2022.

0. Introduction

To learn, you must forget. This is probably one of the most intuitive lessons from Naftali Tishby's Information Bottleneck (IB) method, which grew out of the fundamental tradeoff (rate vs. distortion) in Claude Shannon's information theory, and which was later used to explain the learning behavior of deep neural networks through the fitting & compression framework.
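For reference, the tradeoff is usually written as the IB Lagrangian. The notation here is an assumption that matches the symbols used later in this list: $X$ is the input, $Y$ the target, $T$ the learned representation, and $\beta > 0$ the tradeoff multiplier.

$$\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)$$

The first term rewards forgetting (compressing $X$), and the second rewards keeping what is predictive of $Y$.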

It has been four years since the dazzling talk on Opening the Black Box of Deep Neural Networks, and more than twenty years since the first paper on the Information Bottleneck method. It is time for us to look back, to celebrate what has been established, and to prepare for the future.

This repository is organized as follows:

All papers are selected and sorted by topic/conference/year/importance. Please send a pull request if you would like to add any paper.

We also made slides on the theory, applications, and controversy surrounding the initial Information Bottleneck principle in deep learning (p.s., some of the controversy has been addressed by recent publications, e.g., Lorenzen et al., 2021).

1. Classics

Agglomerative Information Bottleneck [link]
Noam Slonim, Naftali Tishby
NIPS, 1999

🐤 The Information Bottleneck Method [link]
Naftali Tishby, Fernando C. Pereira, William Bialek
Preprint, 2000

Predictability, complexity and learning [link]
William Bialek, Ilya Nemenman, Naftali Tishby
Neural Computation, 2001

Sufficient Dimensionality Reduction: A novel analysis principle [link]
Amir Globerson, Naftali Tishby
ICML, 2002

The information bottleneck: Theory and applications [link]
Noam Slonim
PhD Thesis, 2002

An Information Theoretic Tradeoff between Complexity and Accuracy [link]
Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
COLT, 2003

Information Bottleneck for Gaussian Variables [link]
Gal Chechik, Amir Globerson, Naftali Tishby, Yair Weiss
NIPS, 2003

Information and Fitness [link]
Samuel F. Taylor, Naftali Tishby and William Bialek
Preprint, 2007

Efficient representation as a design principle for neural coding and computation [link]
William Bialek, Rob R. de Ruyter van Steveninck, and Naftali Tishby
Preprint, 2007

The Information Bottleneck Revisited or How to Choose a Good Distortion Measure [link]
Peter Harremoes and Naftali Tishby
ISIT, 2007

🐤 Learning and Generalization with the Information Bottleneck [link]
Ohad Shamir, Sivan Sabato, Naftali Tishby
Journal of Theoretical Computer Science, 2009

🐤 Information-Theoretic Bounded Rationality [link]
Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby
Preprint, 2015

🐤 Opening the Black Box of Deep Neural Networks via Information [link]
Ravid Shwartz-Ziv, Naftali Tishby
ICRI, 2017

2. Reviews

Information Bottleneck and its Applications in Deep Learning [link]
Hassan Hafez-Kolahi, Shohreh Kasaei
Preprint, 2019

The Information Bottleneck Problem and Its Applications in Machine Learning [link]
Ziv Goldfeld, Yury Polyanskiy
Preprint, 2020

On the Information Bottleneck Problems: Models, Connections, Applications and Information Theoretic Views [link]
Abdellatif Zaidi, Iñaki Estella-Aguerri, Shlomo Shamai
Entropy, 2020

Information Bottleneck: Theory and Applications in Deep Learning [link]
Bernhard C. Geiger, Gernot Kubin
Entropy, 2020

On Information Plane Analyses of Neural Network Classifiers – A Review [link]
Bernhard C. Geiger
Preprint, 2021

Table 1 (p. 2) gives a nice summary of how different architectures and MI estimators affect the existence of compression phases and of causal links between compression and generalization.

A Critical Review of Information Bottleneck Theory and its Applications to Deep Learning [link]
Mohammad Ali Alomrani
Preprint, 2021

Information Flow in Deep Neural Networks [link]
Ravid Shwartz-Ziv
PhD Thesis, 2022

3. Theories

Gaussian Lower Bound for the Information Bottleneck Limit [link]
Amichai Painsky, Naftali Tishby
JMLR, 2017

Information-theoretic analysis of generalization capability of learning algorithms [link]
Aolin Xu, Maxim Raginsky
NeurIPS, 2017

Caveats for information bottleneck in deterministic scenarios [link] [ICLR version]
Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk
UAI, 2018

🐤🔥 Emergence of Invariance and Disentanglement in Deep Representations [link]
Alessandro Achille, Stefano Soatto
JMLR, 2018

  • This paper is a gem. At a high level, it shows the relationship between generalization and the information bottleneck in weights (IIW).
    • Note how this differs from Tishby's original definition of the information bottleneck in representations.
  • Specifically, if we approximate SGD by a stochastic differential equation, we can see that SGD naturally leads to minimization of IIW.
  • The authors argue that an optimal representation should have 4 properties: sufficiency, minimality, invariance, and disentanglement. Notably, the last two properties can naturally emerge from minimizing the mutual information between the dataset and the network weights, i.e., IIW.

On the Information Bottleneck Theory of Deep Learning [link]
Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, David Daniel Cox
ICLR, 2018

The Dual Information Bottleneck [link]
Zoe Piran, Ravid Shwartz-Ziv, Naftali Tishby
Preprint, 2019

🐤 Learnability for the Information Bottleneck [link] [slides] [poster] [journal version] [workshop version]
Tailin Wu, Ian Fischer, Isaac L. Chuang, Max Tegmark
UAI, 2019

🐤 Phase Transitions for the Information Bottleneck in Representation Learning [link] [video]
Tailin Wu, Ian Fischer
ICLR, 2020

Bottleneck Problems: Information and Estimation-Theoretic View [link]
Shahab Asoodeh, Flavio Calmon
Preprint, 2020

Information Bottleneck: Exact Analysis of (Quantized) Neural Networks [link]
Stephan Sloth Lorenzen, Christian Igel, Mads Nielsen
Preprint, 2021

  • This paper shows that different ways of binning when computing the mutual information lead to qualitatively different results.
  • It then confirms the original IB paper's results on the fitting & compression phases, using quantized nets with exact computation of mutual information.
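The binning effect can be illustrated with a toy example. This is a minimal sketch, not the paper's experiment: the one-unit tanh "layer", the weight values, and the helper names `binned_entropy` and `activations` are all hypothetical. For a deterministic layer $T = f(X)$ with uniformly distributed inputs, the binned estimate of $I(X; T)$ equals the entropy of the binned activations, so the bin count alone can change the reported value.

```python
import math
from collections import Counter

def binned_entropy(values, n_bins, lo=-1.0, hi=1.0):
    """Entropy (in bits) of values after discretizing [lo, hi] into n_bins equal bins."""
    width = (hi - lo) / n_bins
    # Clamp the top edge so that a value equal to hi falls in the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical setup: 101 equally likely scalar inputs and a one-unit tanh "layer".
inputs = [i / 50.0 - 1.0 for i in range(101)]

def activations(weight):
    return [math.tanh(weight * x) for x in inputs]

# Because T is a deterministic function of X, I(X; T_binned) = H(T_binned).
for weight in (0.5, 5.0):
    for n_bins in (4, 30, 1000):
        estimate = binned_entropy(activations(weight), n_bins)
        print(f"weight={weight} bins={n_bins} I(X;T) estimate={estimate:.3f} bits")
```

With few bins the estimate is capped at the log of the bin count, while with many bins it approaches the entropy of the inputs, so any apparent "compression" depends on the discretization.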

Perturbation Theory for the Information Bottleneck [link]
Vudtiwat Ngampruetikorn, David J. Schwab
Preprint, 2021

PAC-Bayes Information Bottleneck [link]
Zifeng Wang, Shao-Lun Huang, Ercan Engin Kuruoglu, Jimeng Sun, Xi Chen, Yefeng Zheng
ICLR, 2022

  • This paper discusses using $I(w, S)$ instead of $I(T, X)$ as the information bottleneck.
  • However, activations should play a crucial role in a network's generalization, yet they are not explicitly captured by $I(w, S)$.

4. Models

Deep Variational Information Bottleneck [link]
Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
ICLR, 2017

The Deterministic Information Bottleneck [link] [UAI Version]
DJ Strouse, David J. Schwab
Neural Computation, 2017

This replaces the mutual information term with entropy in the original IB objective.
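
In symbols, the change can be sketched as follows. The notation is an assumption, since this list entry does not define it: $X$ is the input, $Y$ the target, $T$ the representation, and $\beta$ the tradeoff multiplier.

$$\mathcal{L}_{\text{IB}} = I(X; T) - \beta \, I(T; Y) \qquad \longrightarrow \qquad \mathcal{L}_{\text{DIB}} = H(T) - \beta \, I(T; Y)$$

Since $I(X; T) = H(T) - H(T \mid X)$, the two objectives differ only by the term $H(T \mid X)$, which vanishes when the encoder is deterministic.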


Learning Sparse Latent Representations with the Deep Copula Information Bottleneck [link]
Aleksander Wieczorek, Mario Wieser, Damian Murezzan, Volker Roth
ICLR, 2018

Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck [link]
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann
NeurIPS, 2019

Information bottleneck through variational glasses [link]
Slava Voloshynovskiy, Mouad Kondah, Shideh Rezaeifar, Olga Taran, Taras Holotyak, Danilo Jimenez Rezende
NeurIPS Bayesian Deep Learning Workshop, 2019

🐤 Variational Discriminator Bottleneck [link]
Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine
ICLR, 2019

Nonlinear Information Bottleneck [link]
Artemy Kolchinsky, Brendan Tracey, David Wolpert
Entropy, 2019

This formulation shows better performance than VIB.


General Information Bottleneck Objectives and their Applications to Machine Learning [link]
Sayandev Mukherjee
Preprint, 2019

This paper synthesizes IB and Predictive IB, and provides a new variational bound.


🐤 Graph Information Bottleneck [link] [code] [slides]
Tailin Wu, Hongyu Ren, Pan Li, Jure Leskovec
NeurIPS, 2020

🐤 Learning Optimal Representations with the Decodable Information Bottleneck [link]
Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam
NeurIPS, 2020

🐤 Concept Bottleneck Models [link]
Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
ICML, 2020

Disentangled Representations for Sequence Data using Information Bottleneck Principle [link] [talk]
Masanori Yamada, Heecheol Kim, Kosuke Miyoshi, Tomoharu Iwata, Hiroshi Yamakawa
ICML, 2020

🐤 IBA: Restricting the Flow: Information Bottlenecks for Attribution [link] [code]
Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf
ICLR, 2020

On the Difference between the Information Bottleneck and the Deep Information Bottleneck [link]
Aleksander Wieczorek, Volker Roth
Entropy, 2020

The Convex Information Bottleneck Lagrangian [link]
Borja Rodríguez Gálvez, Ragnar Thobaben, Mikael Skoglund
Preprint, 2020

The HSIC Bottleneck: Deep Learning without Back-Propagation [link] [code]
Wan-Duo Kurt Ma, J.P. Lewis, W. Bastiaan Kleijn
AAAI, 2020

  • This paper uses the Hilbert-Schmidt independence criterion (HSIC) as a surrogate for mutual information in the IB objective.
  • It shows an alternative way to train a neural network without backpropagation, inspired by the IB principle.
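To make the surrogate concrete, here is a minimal sketch of the standard biased empirical HSIC estimate, $\operatorname{tr}(KHLH) / (n-1)^2$, for scalar samples with a Gaussian kernel. It is not taken from the paper's code: the kernel width, the sample data, and the helper names `gaussian_gram`, `center`, and `hsic` are assumptions chosen for illustration.

```python
import math
import random

def gaussian_gram(samples, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(a_i - a_j)^2 / (2 sigma^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in samples]
            for a in samples]

def center(gram):
    """Return H K H with H = I - (1/n) 11^T, using the symmetry of K."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, sigma))
    l_gram = gaussian_gram(ys, sigma)
    # tr((H K H) L) equals the elementwise sum because L is symmetric.
    trace = sum(k_centered[i][j] * l_gram[i][j]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys_dependent = [x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]  # nearly a copy of xs
ys_independent = [rng.gauss(0.0, 1.0) for _ in range(200)]  # unrelated to xs

print("HSIC, dependent pair:  ", hsic(xs, ys_dependent))
print("HSIC, independent pair:", hsic(xs, ys_independent))
```

The estimate is close to zero for independent samples and grows with dependence, which is what lets it stand in for the mutual information terms.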

Disentangled Information Bottleneck [link] [code]
Ziqi Pan, Li Niu, Jianfu Zhang, Liqing Zhang
AAAI, 2021

🐤 IB-GAN: Disentangled Representation Learning [link] [code] [talk]
Insu Jeon, Wonkwang Lee, Myeongjang Pyeon, Gunhee Kim
AAAI, 2021

This model adds an additional IB constraint on top of InfoGAN.


Deciding What to Learn: A Rate-Distortion Approach [link]
Dilip Arumugam, Benjamin Van Roy
ICML, 2021

🐤 Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization [link]
Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish
Preprint, 2021

Multi-Task Variational Information Bottleneck [link]
Weizhu Qian, Bowei Chen, Yichao Zhang, Guanghui Wen, Franck Gechter
Preprint, 2021

5. Applications (General)

🐤 Analyzing neural codes using the information bottleneck method [link]
Elad Schneidman, Noam Slonim, Naftali Tishby, Rob R. de Ruyter van Steveninck, William Bialek
NIPS, 2001

Past-future information bottleneck in dynamical systems [link]
Felix Creutzig, Amir Globerson, Naftali Tishby
Physical Review, 2009

Compressing Neural Networks using the Variational Information Bottleneck [link]
Bin Dai, Chen Zhu, Baining Guo, David Wipf
ICML, 2018

🐤 InfoMask: Masked Variational Latent Representation to Localize Chest Disease [link]
Saeid Asgari Taghanaki, Mohammad Havaei, Tess Berthier, Francis Dutil, Lisa Di Jorio, Ghassan Hamarneh, Yoshua Bengio
MICCAI, 2019

Note how this differs from the IBA paper.


Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics [link]
Yihang Wang, João Marcelo Lamim Ribeiro, Pratyush Tiwary
Nature Communications, 2019

Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks [link]
Roman Pogodin, Peter Latham
NeurIPS, 2020

Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification [link]
Lynton Ardizzone, Radek Mackowiak, Carsten Rother, Ullrich Köthe
NeurIPS, 2020

Unsupervised Speech Decomposition via Triple Information Bottleneck [link] [code]
Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson, David Cox
ICML, 2020

Learning Efficient Multi-agent Communication: An Information Bottleneck Approach [link]
Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich
ICML, 2020

🐤 Inserting Information Bottlenecks for Attribution in Transformers [link]
Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin
EMNLP, 2020

Information Bottleneck for Estimating Treatment Effects with Systematically Missing Covariates [link]
Sonali Parbhoo, Mario Wieser, Aleksander Wieczorek, and Volker Roth
Entropy, 2020

Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding [link]
Yigit Ugur, George Arvanitakis, Abdellatif Zaidi
Entropy, 2020

Learning to Learn with Variational Information Bottleneck for Domain Generalization [link]
Yingjun Du, Jun Xu, Huan Xiong, Qiang Qiu, Xiantong Zhen, Cees G. M. Snoek, Ling Shao
ECCV, 2020

The information bottleneck and geometric clustering [link]
DJ Strouse, David J Schwab
Preprint, 2020

Causal learning with sufficient statistics: an information bottleneck approach [link]
Daniel Chicharro, Michel Besserve, Stefano Panzeri
Preprint, 2020

Learning Robust Representations via Multi-View Information Bottleneck [link]
Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata
Preprint, 2020

🐤 Information Bottleneck Disentanglement for Identity Swapping [link]
Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He
CVPR, 2021

A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition [link]
Ayush Srivastava, Oshin Dutta, Jigyasa Gupta, Sumeet Agarwal, Prathosh AP
WACV, 2021

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget [link]
Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine
ICLR, 2020

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning [link]
Rabeeh Karimi mahabadi, Yonatan Belinkov, James Henderson
ICLR, 2021

Dynamic Bottleneck for Robust Self-Supervised Exploration [link]
Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
NeurIPS, 2021

Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck [link] [talk]
Junho Kim, Byung-Kwan Lee, Yong Man Ro
NeurIPS, 2021

Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness [link] [talk]
Zifeng Wang, Tong Jian, Aria Masoomi, Stratis Ioannidis, Jennifer Dy
NeurIPS, 2021

A Variational Information Bottleneck Approach to Multi-Omics Data Integration [link]
Changhee Lee, Mihaela van der Schaar
AISTATS, 2021

Information Bottleneck Approach to Spatial Attention Learning [link]
Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu
IJCAI, 2021

Unsupervised Hashing with Contrastive Information Bottleneck [link]
Zexuan Qiu, Qinliang Su, Zijing Ou, Jianxing Yu, Changyou Chen
IJCAI, 2021

Information Theoretic Meta Learning with Gaussian Processes [link]
Michalis K. Titsias, Francisco J. R. Ruiz, Sotirios Nikoloutsopoulos, Alexandre Galashov
UAI, 2021

A Closer Look at the Adversarial Robustness of Information Bottleneck Models [link]
Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven Gowal
ICML Workshop on A Blessing in Disguise, 2021

Information Bottleneck Attribution for Visual Explanations of Diagnosis and Prognosis [link]
Ugur Demir, Ismail Irmakci, Elif Keles, Ahmet Topcu, Ziyue Xu, Concetto Spampinato, Sachin Jambawalikar, Evrim Turkbey, Baris Turkbey, Ulas Bagci
Preprint, 2021

State Predictive Information Bottleneck [link] [code]
Dedi Wang, Pratyush Tiwary
Preprint, 2021

Disentangled Variational Information Bottleneck for Multiview Representation Learning [link] [code]
Feng Bao
Preprint, 2021

Invariant Information Bottleneck for Domain Generalization [link]
Bo Li, Yifei Shen, Yezhen Wang, Wenzhen Zhu, Colorado J. Reed, Jun Zhang, Dongsheng Li, Kurt Keutzer, Han Zhao
Preprint, 2021

Information-Bottleneck-Based Behavior Representation Learning for Multi-agent Reinforcement learning [link]
Yue Jin, Shuangqing Wei, Jian Yuan, Xudong Zhang
Preprint, 2021

Generalization in Quantum Machine Learning: a Quantum Information Perspective [link]
Leonardo Banchi, Jason Pereira, Stefano Pirandola
Preprint, 2021

Causal Effect Estimation using Variational Information Bottleneck [link]
Zhenyu Lu, Yurong Cheng, Mingjun Zhong, George Stoian, Ye Yuan, Guoren Wang
Preprint, 2021

🐤 Neuron Campaign for Initialization Guided by Information Bottleneck Theory [link]
Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang
CIKM, 2021

Improving Subgraph Recognition with Variational Graph Information Bottleneck [link]
Junchi Yu, Jie Cao, Ran He
CVPR, 2022

Graph Structure Learning with Variational Information Bottleneck [link]
Qingyun Sun, Jianxin Li, Hao Peng, Jia Wu, Xingcheng Fu, Cheng Ji, Philip S. Yu
AAAI, 2022

Renyi Fair Information Bottleneck for Image Classification [link]
Adam Gronowski, William Paul, Fady Alajaji, Bahman Gharesifard, Philippe Burlina
Preprint, 2022

The Distributed Information Bottleneck reveals the explanatory structure of complex systems [link]
Kieran A. Murphy, Dani S. Bassett
Preprint, 2021

Sparsity-Inducing Categorical Prior Improves Robustness of the Information Bottleneck [link]
Anirban Samaddar, Sandeep Madireddy, Prasanna Balaprakash
Preprint, 2022

Pareto-optimal clustering with the primal deterministic information bottleneck [link]
Andrew K. Tan, Max Tegmark, Isaac L. Chuang
Preprint, 2022

Information-Theoretic Odometry Learning [link]
Sen Zhang, Jing Zhang, Dacheng Tao
Preprint, 2022

6. Applications (RL)

InfoBot: Transfer and Exploration via the Information Bottleneck [paper] [code]
Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine
ICLR, 2019

The idea is simply to constrain the dependence on a certain goal, so that the agent can learn a default behavior.
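
One way to write this idea down is as a regularized return. This is a sketch under assumed notation, not a quote from the paper: $A$ is the action, $S$ the state, $G$ the goal, $r$ the reward, $\pi_\theta$ the policy, and $\beta$ a tradeoff weight.

$$J(\theta) = \mathbb{E}_{\pi_\theta}[r] - \beta \, I(A; G \mid S)$$

Penalizing $I(A; G \mid S)$ pushes the policy to act the same way regardless of the goal wherever possible, which is the "default behavior" mentioned above.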


Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck [link] [code] [talk]
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann
NeurIPS, 2019

Learning Task-Driven Control Policies via Information Bottlenecks [link] [spotlight talk]
Vincent Pacelli, Anirudha Majumdar
RSS, 2020

🐤 The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach [journal '20] [arxiv '18]
Iulian Vlad Serban, Chinnadhurai Sankar, Michael Pieper, Joelle Pineau, Yoshua Bengio
Journal of Artificial Intelligence Research (JAIR), 2020

Learning Robust Representations via Multi-View Information Bottleneck [link] [code] [talk]
Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata
ICLR, 2020

DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck [paper] [code]
Jiameng Fan, Wenchao Li
ICML, 2022

Learning Representations in Reinforcement Learning: an Information Bottleneck Approach [link] [code]
Yingjun Pei, Xinwen Hou
Rejected by ICLR, 2020

Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [link]
Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin
ArXiv, 2020

Dynamic Bottleneck for Robust Self-Supervised Exploration [paper] [code]
Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
NeurIPS, 2021

Regret Bounds for Information-Directed Reinforcement Learning [paper]
Botao Hao, Tor Lattimore
ArXiv, 2022

7. Methods for Mutual Information Estimation

😣😣😣 Mutual information is notoriously hard to estimate!

🐤 Benchmarking Mutual Information [link] [code] [doc]
Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx
NeurIPS, 2023

Variational f-Divergence and Derangements for Discriminative Mutual Information Estimation [link] [code]
Nunzio A. Letizia, Nicola Novello, Andrea M. Tonello
ArXiv, 2023

Estimating Mutual Information [link] [code]
Alexander Kraskov, Harald Stoegbauer, Peter Grassberger
Physical Review, 2004

Efficient Estimation of Mutual Information for Strongly Dependent Variables [link] [code]
Shuyang Gao, Greg Ver Steeg, Aram Galstyan
AISTATS, 2015

  • This shows that KNN-based estimators require a number of samples that scales exponentially with the true MI; that is, they become inaccurate as MI gets large.
  • Thus, as the variables become more dependent, the MI estimate becomes less accurate. In other words, KNN-based estimators are only good at detecting independence of variables.
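For readers who have not seen a KNN-based estimator, the following is a minimal brute-force sketch in the style of the Kraskov et al. estimator listed above (its first variant, with the max-norm in the joint space). It handles scalar variables only, runs in quadratic time, and the function names `digamma_int` and `ksg_mi` as well as the test data are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def ksg_mi(xs, ys, k=3):
    """KNN-based MI estimate in nats for paired scalar samples (brute force)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distance in the joint space from point i to every other point.
        joint = sorted(max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
                       for j in range(n) if j != i)
        eps = joint[k - 1]  # distance to the k-th nearest neighbor
        # Count neighbors strictly closer than eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n

rng = random.Random(0)
rho = 0.9
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]
ys_dependent = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
ys_independent = [rng.gauss(0.0, 1.0) for _ in range(300)]

true_mi = -0.5 * math.log(1 - rho ** 2)  # closed form for a bivariate Gaussian
print("true MI (nats):      ", true_mi)
print("estimate, dependent: ", ksg_mi(xs, ys_dependent))
print("estimate, independent:", ksg_mi(xs, ys_independent))
```

At this moderate level of dependence the estimate is usable with a few hundred samples; the point of the paper above is that the sample size needed grows quickly as the true MI increases.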

🐤 MINE: Mutual Information Neural Estimation [link] [code]
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm
ICML, 2018

Evaluating Capability of Deep Neural Networks for Image Classification via Information Plane [link] [code]
Hao Cheng, Dongze Lian, Shenghua Gao, Yanlin Geng
ECCV, 2018

🐤 InfoMax: Learning Deep representations by Mutual Information Estimation and Maximization [link] [code]
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio
ICLR, 2019 (Oral)

🐤 On Variational Bounds of Mutual Information [link] [PyTorch]
Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker
ICML, 2019

🐤 Estimating Information Flow in Deep Neural Networks [link] [PyTorch]
Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy
ICML, 2019

Neural Estimators for Conditional Mutual Information Using Nearest Neighbors Sampling [link] [code]
Sina Molavipour, Germán Bassi, Mikael Skoglund
Preprint, 2020

CCMI: Classifier based Conditional Mutual Information Estimation [link] [code]
Sudipto Mukherjee, Himanshu Asnani, Sreeram Kannan
UAI, 2020

MIGE: Mutual Information Gradient Estimation for Representation Learning [link] [code]
Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu
ICLR, 2020

🐤 Information Bottleneck: Exact Analysis of (Quantized) Neural Networks [link]
Stephan Sloth Lorenzen, Christian Igel, Mads Nielsen
Preprint, 2021

  • This paper shows that different ways of binning when computing the mutual information lead to qualitatively different results.
  • It then confirms the original IB paper's results on the fitting & compression phases, using quantized nets with exact computation of mutual information.

🐤 Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [link] [code]
Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao
Preprint, 2021

Entropy and mutual information in models of deep neural networks [link]
Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová
NeurIPS, 2018

🐤 Understanding the Limitations of Variational Mutual Information Estimators [link] [PyTorch]
Jiaming Song, Stefano Ermon
ICLR, 2020

  • This implementation includes InfoNCE, NWJ, NWJ-JS, MINE, and their own method SMILE.
  • Basically, they show that the variance of traditional MI estimators can grow exponentially with the true MI. In other words, just as with KNN estimators, the more dependent the variables (the higher the MI), the less accurate the estimate.
  • Also, those estimators do not satisfy some important self-consistency properties, such as the data processing inequality.
  • They propose SMILE which aims to reduce the variance issue.
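As a small illustration of why variational estimators struggle at high MI, the sketch below implements the InfoNCE bound from the list above and shows a related, well-known limitation: with a batch of $K$ pairs, the estimate can never exceed $\log K$, no matter how good the critic is. This is not the paper's implementation; the function name `info_nce` and the hand-built score matrix are hypothetical.

```python
import math

def info_nce(scores):
    """InfoNCE lower bound (nats) from a K x K critic matrix.

    scores[i][j] is the critic value f(x_i, y_j); the diagonal holds the
    positive (jointly sampled) pairs.
    """
    k = len(scores)
    total = 0.0
    for i in range(k):
        row = scores[i]
        m = max(row)  # subtract the max for a numerically stable log-mean-exp
        log_mean_exp = m + math.log(sum(math.exp(s - m) for s in row) / k)
        total += row[i] - log_mean_exp
    return total / k

batch_size = 8
# An idealized critic: very confident on positives, indifferent on negatives.
perfect_scores = [[50.0 if i == j else 0.0 for j in range(batch_size)]
                  for i in range(batch_size)]
# An uninformative critic: the same score for every pair.
flat_scores = [[0.0] * batch_size for _ in range(batch_size)]

print("log K:                 ", math.log(batch_size))
print("InfoNCE, perfect critic:", info_nce(perfect_scores))
print("InfoNCE, flat critic:   ", info_nce(flat_scores))
```

Even the idealized critic saturates at $\log K$, so estimating a large MI this way requires a batch size that is exponential in the MI.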

🐤🐤 Sliced Mutual Information: A Scalable Measure of Statistical Dependence [link]
Ziv Goldfeld, Kristjan Greenewald
NeurIPS, 2021 (spotlight)

🐤 Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [link]
Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao
ICLR, 2022

Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [link] [code]
Danqi Liao*, Chen Liu*, Benjamin W Christensen, Alexander Tong, Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita Krishnaswamy
ICML Workshop, 2023

This paper leverages diffusion geometry to estimate entropy and MI in high-dimensional representations of modern neural networks.


8. Other Information Theory Driven Work

f-GANs in an Information Geometric Nutshell [link]
Richard Nock, Zac Cranko, Aditya K. Menon, Lizhen Qu, Robert C. Williamson
NeurIPS, 2017

Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach [link]
Roel Dobbe, David Fridovich-Keil, Claire Tomlin
NeurIPS, 2017

Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications [link]
Linus Hamilton, Frederic Koehler, Ankur Moitra
NeurIPS, 2017

Information-theoretic analysis of generalization capability of learning algorithms [link]
Aolin Xu, Maxim Raginsky
NeurIPS, 2017

Learning Discrete Representations via Information Maximizing Self-Augmented Training [link]
Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama
ICML, 2017

🐣 Nonparanormal Information Estimation [link]
Shashank Singh, Barnabás Póczos
ICML, 2017

This paper shows how to robustly estimate mutual information using i.i.d. samples from an unknown distribution.


Entropy and mutual information in models of deep neural networks [link]
Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová
NeurIPS, 2018

Chaining Mutual Information and Tightening Generalization Bounds [link]
Amir Asadi, Emmanuel Abbe, Sergio Verdu
NeurIPS, 2018

Information Constraints on Auto-Encoding Variational Bayes [link]
Romain Lopez, Jeffrey Regier, Michael I. Jordan, Nir Yosef
NeurIPS, 2018

Adaptive Learning with Unknown Information Flows [link]
Yonatan Gur, Ahmadreza Momeni
NeurIPS, 2018

Information-based Adaptive Stimulus Selection to Optimize Communication Efficiency in Brain-Computer Interfaces [link]
Boyla Mainsah, Dmitry Kalika, Leslie Collins, Siyuan Liu, Chandra Throckmorton
NeurIPS, 2018

Information Theoretic Guarantees for Empirical Risk Minimization with Applications to Model Selection and Large-Scale Optimization [link]
Ibrahim Alabdulmohsin
ICML, 2018

Mutual Information Neural Estimation [link]
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm
ICML, 2018

Learning to Explain: An Information-Theoretic Perspective on Model Interpretation [link]
Jianbo Chen, Le Song, Martin Wainwright, Michael Jordan
ICML, 2018

Fast Information-theoretic Bayesian Optimisation [link]
Binxin Ru, Michael A. Osborne, Mark Mcleod, Diego Granziol
ICML, 2018

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond [link]
Lin Chen, Hossein Esfandiari, Gang Fu, Vahab Mirrokni
NeurIPS, 2019

Information-Theoretic Confidence Bounds for Reinforcement Learning [link]
Xiuyuan Lu, Benjamin Van Roy
NeurIPS, 2019

L-DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise [link]
Yilun Xu, Peng Cao, Yuqing Kong, Yizhou Wang
NeurIPS, 2019

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio [link]
Julian Zimmert, Tor Lattimore
NeurIPS, 2019

Region Mutual Information Loss for Semantic Segmentation [link]
Shuai Zhao, Yang Wang, Zheng Yang, Deng Cai
NeurIPS, 2019

Learning Representations by Maximizing Mutual Information Across Views [link]
Philip Bachman, R Devon Hjelm, William Buchwalter
NeurIPS, 2019

Icebreaker: Element-wise Efficient Information Acquisition with a Bayesian Deep Latent Gaussian Model [link]
Wenbo Gong, Sebastian Tschiatschek, Sebastian Nowozin, Richard E. Turner, José Miguel Hernández-Lobato, Cheng Zhang
NeurIPS, 2019

Thompson Sampling with Information Relaxation Penalties [link]
Seungki Min, Costis Maglaras, Ciamac C. Moallemi
NeurIPS, 2019

InfoMax: Learning deep representations by mutual information estimation and maximization [link] [code]
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio
ICLR, 2019

Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds [link]
Peng Cao, Yilun Xu, Yuqing Kong, Yizhou Wang
ICLR, 2019

Information-Directed Exploration for Deep Reinforcement Learning [link]
Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause
ICLR, 2019

Soft Q-Learning with Mutual-Information Regularization [link]
Jordi Grau-Moya, Felix Leibfried, Peter Vrancx
ICLR, 2019

Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization [link]
Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
ICLR, 2019

Information Asymmetry in KL-regularized RL [link]
Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess
ICLR, 2019

Adaptive Estimators Show Information Compression in Deep Neural Networks [link]
Ivan Chelombiev, Conor Houghton, Cian O'Donnell
ICLR, 2019

Information Theoretic lower bounds on negative log likelihood [link]
Luis A. Lastras-Montaño
ICLR, 2019

New results on information theoretic clustering [link] [code]
Ferdinando Cicalese, Eduardo Laber, Lucas Murtinho
ICML, 2019

Estimating Information Flow in Deep Neural Networks [link]
Ziv Goldfeld, Ewout Van Den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy
ICML, 2019

🐣 The information-theoretic value of unlabeled data in semi-supervised learning [link]
Alexander Golovnev, David Pal, Balazs Szorenyi
ICML, 2019

EMI: Exploration with Mutual Information [link] [code]
Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song
ICML, 2019

🐣 On Variational Bounds of Mutual Information [link]
Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, George Tucker
ICML, 2019

Where is the Information in a Deep Neural Network? [link]
Alessandro Achille, Giovanni Paolini, Stefano Soatto
Preprint, 2020

Information Maximization for Few-Shot Learning [link]
Malik Boudiaf, Imtiaz Ziko, Jérôme Rony, Jose Dolz, Pablo Piantanida, Ismail Ben Ayed
NeurIPS, 2020

Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information [link]
Genevieve Flaspohler, Nicholas A. Roy, John W. Fisher III
NeurIPS, 2020

Predictive Information Accelerates Learning in RL [link]
Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama
NeurIPS, 2020

"The predictive information is the mutual information between the past and the future, $I(X_{\text{past}}; X_{\text{future}})$."


Information Theoretic Regret Bounds for Online Nonlinear Control [link]
Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun
NeurIPS, 2020

Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds [link]
Hassan Hafez-Kolahi, Zeinab Golgooni, Shohreh Kasaei, Mahdieh Soleymani
NeurIPS, 2020

Variational Interaction Information Maximization for Cross-domain Disentanglement [link]
HyeongJoo Hwang, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim
NeurIPS, 2020

Information theoretic limits of learning a sparse rule [link]
Clément Luneau, Jean Barbier, Nicolas Macris
NeurIPS, 2020

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks [link]
Ryo Karakida, Kazuki Osawa
NeurIPS, 2020

🐣 On Mutual Information Maximization for Representation Learning [link]
Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic
ICLR, 2020

🐣 Understanding the Limitations of Variational Mutual Information Estimators [link]
Jiaming Song, Stefano Ermon
ICLR, 2020

Expected Information Maximization: Using the I-Projection for Mixture Density Estimation [link]
Philipp Becker, Oleg Arenz, Gerhard Neumann
ICLR, 2020

Mutual Information Gradient Estimation for Representation Learning [link]
Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu
ICLR, 2020

InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization [link]
Fan-Yun Sun, Jordan Hoffman, Vikas Verma, Jian Tang
ICLR, 2020

A Mutual Information Maximization Perspective of Language Representation Learning [link]
Lingpeng Kong, Cyprien de Masson d'Autume, Lei Yu, Wang Ling, Zihang Dai, Dani Yogatama
ICLR, 2020

CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [link] [code]
Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, Lawrence Carin
ICML, 2020

Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains [link] [code]
Johannes Fischer, Ömer Sahin Tas
ICML, 2020

Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation [link] [code]
Steven Kleinegesse, Michael U. Gutmann
ICML, 2020

FR-Train: A Mutual Information-Based Approach to Fair and Robust Training [link] [code]
Yuji Roh, Kangwook Lee, Steven Whang, Changho Suh
ICML, 2020

Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information [link] [code]
Karl Stratos, Sam Wiseman
ICML, 2020

Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective [link]
Ruixiang Zhang, Masanori Koyama, Katsuhiko Ishiguro
ICML, 2020

Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization [link] [code]
Sicheng Zhu, Xiao Zhang, David Evans
ICML, 2020

Usable Information and Evolution of Optimal Representations During Training [link]
Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan Kao
ICLR, 2021

Domain-Robust Visual Imitation Learning with Mutual Information Constraints [link]
Edoardo Cetin, Oya Celiktutan
ICLR, 2021

Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning [link]
Kanil Patel, William H. Beluch, Bin Yang, Michael Pfeiffer, Dan Zhang
ICLR, 2021

Graph Information Bottleneck for Subgraph Recognition [link]
Junchi Yu, Tingyang Xu, Yu Rong, Yatao Bian, Junzhou Huang, Ran He
ICLR, 2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [link]
Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
ICLR, 2021

Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [link] [slides]
Willie Neiswanger, Ke Alexander Wang, Stefano Ermon
ICML, 2021

Decomposed Mutual Information Estimation for Contrastive Representation Learning [link]
Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Philip Bachman, Remi Tachet Des Combes
ICML, 2021

ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction [link] [code]
Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
Preprint, 2021

Intelligence, physics and information – the tradeoff between accuracy and simplicity in machine learning [link]
Tailin Wu
PhD Thesis, 2021

The Information Geometry of Unsupervised Reinforcement Learning [link]
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
Preprint, 2021

9. Citation

If you would like to cite this repository 🐣:

@misc{git2022ib,
      title = {Awesome Information Bottleneck},
      author = {Ziyu Ye},
      howpublished = {\url{https://github.com/ZIYU-DEEP/Awesome-Information-Bottleneck}},
      year = 2022}