Learning to Make Decisions in Statistical and Strategic Environments

Speaker: Zhengyuan Zhou, Stanford University
Date: 2/11/2019
Time: 10 a.m.

Location: 303 Transportation Building


ISE Grad Seminar

Event Type: Seminar/Symposium

     Data-driven decision making, lying at the intersection of learning and decision making, has emerged as an important paradigm in operations research and data science. In the first part of the talk, we focus on data-driven decision making in a statistical environment. In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as determining which medication to prescribe to a patient. Here we study the offline multi-action policy learning problem with observational data, where the policy may need to respect budget constraints or belong to a restricted policy class such as decision trees. We build on the theory of efficient semi-parametric inference to propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over existing learning algorithms. We then discuss optimization schemes for implementing decision-tree-based policies.
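To give a flavor of the semi-parametric machinery the abstract refers to, the sketch below shows a standard doubly robust (augmented inverse-propensity-weighted) estimate of a policy's value in the multi-action setting. This is a generic illustration of the technique, not the speaker's implementation; the function and variable names (`pi`, `mu_hat`, `e_hat`) are hypothetical, and the nuisance estimates are assumed to be supplied by some fitted outcome and propensity models.

```python
import numpy as np

def doubly_robust_policy_value(chosen, A, Y, mu_hat, e_hat):
    """Doubly robust estimate of the value of a policy from logged data.

    chosen : (n,) action indices the policy would pick for each unit
    A      : (n,) logged (observed) action indices
    Y      : (n,) observed outcomes under the logged actions
    mu_hat : (n, K) estimated outcome model, mu_hat[i, a] ~ E[Y | X_i, a]
    e_hat  : (n, K) estimated propensities, e_hat[i, a] ~ P(A = a | X_i)
    """
    n = len(Y)
    idx = np.arange(n)
    # Direct (model-based) term: predicted outcome of the policy's action.
    direct = mu_hat[idx, chosen]
    # IPW correction: importance-weighted residual of the outcome model,
    # active only on units where the logged action matches the policy.
    match = (A == chosen).astype(float)
    correction = match * (Y - mu_hat[idx, A]) / e_hat[idx, A]
    return np.mean(direct + correction)

# Tiny worked example with two units and two actions.
chosen = np.array([0, 1])              # policy's actions
A = np.array([0, 0])                   # logged actions
Y = np.array([1.5, 3.5])               # observed outcomes
mu_hat = np.array([[1.0, 2.0],
                   [3.0, 4.0]])        # outcome-model predictions
e_hat = np.full((2, 2), 0.5)           # uniform logging propensities
value = doubly_robust_policy_value(chosen, A, Y, mu_hat, e_hat)
print(value)  # unit 0: 1.0 + (1.5 - 1.0)/0.5 = 2.0; unit 1: 4.0; mean = 3.0
```

The estimator is "doubly robust" in that it remains consistent if either the outcome model or the propensity model is correctly specified; policy learning then searches a restricted class (e.g., depth-limited decision trees) for the policy maximizing such an estimate.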

     In the second part of the talk, we consider a model of multi-agent online strategic decision making in which the reward structures of the agents are given by a general continuous game. After introducing a general equilibrium stability notion for continuous games, called variational stability, we examine the well-known online mirror descent (OMD) learning algorithm and show that OMD converges to variationally stable Nash equilibria. Subsequently, by developing various algorithmic variants, we show that convergence to Nash equilibria still holds even in the presence of severely imperfect information, including noise, (fully asynchronous and unbounded) delays, and loss. We also discuss two applications of this theoretical framework: one in distributed stochastic optimization and the other in power management.
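As background on the algorithm named above, the sketch below implements one common instance of online mirror descent: the entropic mirror map on the probability simplex, which yields a multiplicative-weights update. For illustration it is run on a single strongly concave payoff (a variationally stable setting), where the iterates converge to the unique interior maximizer; this is a minimal sketch of the generic OMD update, not the speaker's algorithm or its game-theoretic variants, and the `target` payoff is an assumed toy objective.

```python
import numpy as np

def omd_step(x, grad, eta):
    """One online mirror descent (ascent) step with the entropic mirror map.

    With the negative-entropy regularizer, the mirror update on the simplex
    reduces to an exponentiated-gradient / multiplicative-weights rule:
        x_new  ∝  x * exp(eta * grad)
    """
    logits = np.log(x) + eta * grad
    w = np.exp(logits - logits.max())   # subtract max for numerical stability
    return w / w.sum()

# Toy payoff: u(x) = -||x - target||^2, maximized at the interior point
# `target`; its gradient is -2 (x - target). This payoff is strongly
# concave, so its maximizer is variationally stable and OMD converges to it.
target = np.array([0.7, 0.3])
x = np.array([0.5, 0.5])                # start from the uniform strategy
for _ in range(500):
    x = omd_step(x, grad=-2.0 * (x - target), eta=0.1)
print(x)  # close to [0.7, 0.3]
```

In the multi-agent setting each agent runs such an update on the gradient of its own payoff; the talk's contribution concerns when these coupled dynamics reach Nash equilibria, including under noisy, delayed, or lost gradient feedback.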

To request disability-related accommodations for this event, please contact the person listed above, or the unit hosting the event.