Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent
Published in preprint, 2022
Recommended citation: Chen, Xi, Zehua Lai, He Li, and Yichen Zhang. "Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent." preprints (2022). http://laizehua.github.io/files/bandit.pdf
Abstract:
With the fast development of big data, it has become easier than before to learn the optimal decision rule by updating the decision rule recursively and making online decisions. We study the online statistical inference of model parameters in a contextual bandit framework of sequential decision-making. We propose a general framework for online and adaptive data collection environments that can update decision rules via weighted stochastic gradient descent. We allow different weighting schemes of the stochastic gradient and establish the asymptotic normality of the parameter estimator. Our proposed estimator significantly improves the asymptotic efficiency over the previous averaged SGD approach via inverse probability weights. We also conduct an optimality analysis on the weights in a linear regression setting. We provide a Bahadur representation of the proposed estimator and show that the remainder term in the Bahadur representation entails a slower convergence rate compared to traditional SGD due to the adaptive data collection.
The paper is available at http://laizehua.github.io/files/bandit.pdf.
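To make the idea of a weighted stochastic gradient update under adaptive data collection concrete, here is a minimal toy sketch. It is not the algorithm from the paper. It assumes a two-armed linear reward model, an epsilon-greedy decision rule, bounded contexts, and a polynomially decaying step size. The function name `weighted_sgd_bandit`, the parameter `weight_fn`, and all constants are hypothetical choices made for illustration. The default weight is the inverse of the probability with which the chosen arm was selected.

```python
import numpy as np

def weighted_sgd_bandit(n_steps=20000, dim=3, eps=0.4,
                        weight_fn=lambda p: 1.0 / p, seed=0):
    """Toy weighted averaged SGD in a two-armed linear contextual bandit.

    Illustrative assumptions (not taken from the paper): epsilon-greedy
    exploration, squared loss, step size 0.3 * t**(-0.667).
    """
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=(2, dim))  # one parameter vector per arm
    theta = np.zeros((2, dim))              # current SGD iterate
    theta_bar = np.zeros((2, dim))          # running average of the iterates

    for t in range(1, n_steps + 1):
        x = rng.uniform(-1.0, 1.0, size=dim)   # observed context

        # Epsilon-greedy decision rule based on the current iterate.
        greedy = int(np.argmax(theta @ x))
        probs = np.full(2, eps / 2)
        probs[greedy] += 1.0 - eps
        a = rng.choice(2, p=probs)

        # Noisy linear reward for the chosen arm.
        y = x @ theta_true[a] + 0.5 * rng.normal()

        # Weighted stochastic gradient step on the squared loss;
        # the weight depends on the selection probability of the chosen arm.
        eta = 0.3 * t ** -0.667
        grad = (x @ theta[a] - y) * x
        theta[a] -= eta * weight_fn(probs[a]) * grad

        # Update the running average of all iterates so far.
        theta_bar += (theta - theta_bar) / t

    return theta_true, theta_bar

theta_true, theta_bar = weighted_sgd_bandit()
print(np.max(np.abs(theta_bar - theta_true)))  # largest coordinate-wise error
```

Passing a different `weight_fn`, for example `lambda p: 1.0` for an unweighted update, changes the weighting scheme while leaving the decision rule and the averaging untouched.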