Distributionally Robust Batch Contextual Bandits | Perpustakaan Universitas Katolik Parahyangan (Parahyangan Catholic University Library)

Text


Si, Nian - Personal Name; Blanchet, Jose - Personal Name; Zhang, Fan - Personal Name; Zhou, Zhengyuan - Personal Name

Policy learning using historical observational data is an important problem with widespread applications. Examples include selecting offers, prices, or advertisements for consumers; choosing bids in contextual first-price auctions; and selecting medication based on patients' characteristics. However, the existing literature rests on the crucial assumption that the future environment in which the learned policy will be deployed is the same as the past environment that generated the data: an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy from incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well a policy performs under a worst-case environment shift. We then establish a central-limit-theorem-type guarantee for this policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that learns a policy robust to adversarial perturbations and unknown covariate shifts, with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of the proposed algorithm on synthetic datasets and demonstrate that it provides the robustness that standard policy learning algorithms lack. We conclude the paper with a comprehensive application of our methods to a real-world voting data set.
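To make "assessing a policy under a worst-case environment shift" concrete, here is a minimal, hypothetical sketch in Python. It assumes (the record does not say) that the shift is modeled as a KL-divergence ball of radius delta around the data-generating distribution, that actions are discrete, and that the logging propensities pi0(A_i | X_i) are known. Under that assumption the worst-case expected reward has the standard dual form sup over alpha > 0 of { -alpha * log E[exp(-Y / alpha)] - alpha * delta }, and the expectation is estimated here by a self-normalized inverse-propensity-weighted mean. The function name `robust_value_kl`, its parameters, and the grid search over alpha are illustrative choices, not the authors' stated implementation.

```python
import numpy as np


def robust_value_kl(rewards, actions, policy_actions, propensities, delta, alphas=None):
    """Hypothetical sketch: worst-case value of a deterministic policy over a
    KL ball of radius `delta` around the logging distribution.

    Assumption (not stated in the record): the uncertainty set is a KL ball, so
        inf_{KL(P || P0) <= delta} E_P[Y]
            = sup_{alpha > 0} { -alpha * log E_P0[exp(-Y / alpha)] - alpha * delta }.
    E_P0[.] is replaced by a self-normalized inverse-propensity-weighted mean.
    """
    rewards = np.asarray(rewards, dtype=float)
    propensities = np.asarray(propensities, dtype=float)
    # Only logged rounds where the target policy agrees with the logged action
    # carry information about the policy's reward.
    match = np.asarray(actions) == np.asarray(policy_actions)
    if not match.any():
        raise ValueError("policy never agrees with the logged actions")
    y = rewards[match]
    log_w = -np.log(propensities[match])  # log of IPW weights 1 / pi0(A_i | X_i)

    def logsumexp(v):
        m = v.max()
        return m + np.log(np.exp(v - m).sum())

    log_den = logsumexp(log_w)  # log of sum_i w_i (the normalizer)
    if alphas is None:
        alphas = np.logspace(-3, 3, 2000)  # grid over the dual variable alpha
    best = -np.inf
    for a in alphas:
        # Log of the weighted mean of exp(-y_i / a), computed stably.
        log_mean = logsumexp(log_w - y / a) - log_den
        best = max(best, -a * log_mean - a * delta)
    return best


# Usage: five logged rounds with uniform logging propensity 0.5.
value = robust_value_kl(
    rewards=[1.0, 0.5, 0.0, 1.0, 0.2],
    actions=[0, 1, 0, 1, 0],
    policy_actions=[0, 1, 1, 1, 0],
    propensities=[0.5] * 5,
    delta=0.1,
)
```

A larger `delta` gives a more pessimistic estimate, and as `delta` shrinks toward 0 the value approaches the ordinary self-normalized IPW estimate. The dual objective is concave in alpha, so a scalar optimizer could replace the coarse grid search used here for simplicity.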

Availability

Barcode	Collection Type	Call Number	Location	Status
art147973	Article	-	Gdg9-Lt3 (Building 9, Floor 3)	Available but not for loan (No Loan)

Detailed Information

Series Title: Management Science, Vol. 69, No. 10 (October 2023)
Call Number: -
Publisher: -
Physical Description: pp. 5772-5793
Language: English
ISBN/ISSN: -
Classification: None
Content Type: -
Media Type: -
Carrier Type: -
Edition: -
Subjects: PERSONALIZATION
DISTRIBUTIONAL ROBUSTNESS
CONTEXTUAL BANDITS
POLICY LEARNING
Additional Details: https://doi.org/10.1287/mnsc.2023.4678
Statement of Responsibility: Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet

Other/Related Versions

No other versions available

File Attachments

No data
