Scaling Bias Mitigation with Multiple Fairness Tasks and Multiple Protected Attributes
Eric Zhao, De-An Huang, Hao Liu, Zhiding Yu, Anqi Liu, Olga Russakovsky, and Anima Anandkumar
Working paper, last update: May 2022. (Preprint)
Bias mitigation methods are commonly evaluated with a single fairness task, which aims to reduce performance disparity with respect to a single protected attribute (e.g., gender) while maintaining predictive performance for target labels (e.g., is-cooking). In this work, we question whether this mode of evaluation provides reliable insights into the effectiveness of bias mitigation methods. First, real-world applications involve multiple protected attributes, such as skin color, gender, and age. Second, we find that the results of such studies vary greatly depending on the choice of fairness task used for evaluation. We address these shortcomings by first evaluating bias mitigation methods on 54 different fairness tasks on the CelebA dataset, involving various selections and intersections of multiple protected attributes. Our thorough analysis shows that simple importance weighting remains a consistently competitive method for bias mitigation. We then extend our analysis to ImageNet's People Subtree, which poses qualitatively different real-world challenges from CelebA: it contains hundreds of protected groups, and fewer than 10% of the training dataset carries protected attribute labels. We find that strategies to reduce model complexity are important in this setting. We show that leveraging these insights reduces the bias amplification of empirical risk minimization by 28% on ImageNet's People Subtree.
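To make the notion of importance weighting concrete, the following is a minimal sketch of one common variant: weighting each training example inversely to the frequency of its (target label, protected attribute) group, so that rare groups contribute comparably to the weighted empirical risk. The grouping scheme, the normalization to mean 1, and the function name inverse_frequency_weights are illustrative assumptions for this sketch, not the exact recipe evaluated in the paper.

import numpy as np

def inverse_frequency_weights(targets, protected):
    # Per-sample weights inversely proportional to the frequency of each
    # (target label, protected attribute) group, normalized to mean 1.
    groups = np.stack([targets, protected], axis=1)
    _, group_ids, counts = np.unique(
        groups, axis=0, return_inverse=True, return_counts=True
    )
    weights = 1.0 / counts[group_ids]
    return weights * len(weights) / weights.sum()

# Toy example: a binary target with a skewed protected attribute.
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
a = np.array([0, 0, 1, 0, 1, 1, 1, 1])
w = inverse_frequency_weights(y, a)
# Rare (target, attribute) combinations receive larger weights; during
# training the weighted loss would be, e.g., (w * per_sample_loss).mean().

In this form, importance weighting requires no changes to the model or training loop beyond multiplying per-sample losses by the precomputed weights, which is part of why it is a natural baseline to compare against more elaborate bias mitigation methods.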