State-Temporal Compression in Reinforcement Learning With the Reward-Restricted Geodesic Metric | PERPUSTAKAAN UNIVERSITAS KATOLIK PARAHYANGAN

Text

State-Temporal Compression in Reinforcement Learning With the Reward-Restricted Geodesic Metric

Guo, Shangqi - Personal Name; Chen, Feng - Personal Name; Yan, Qi - Personal Name; Hu, Xiaolin - Personal Name; Su, Xin - Personal Name;

It is difficult to solve complex tasks that involve large state spaces and long-term decision processes by reinforcement learning (RL) algorithms. A common and promising method to address this challenge is to compress a large RL problem into a small one. Towards this goal, the compression should be state-temporal and optimality-preserving (i.e., the optimal policy of the compressed problem should correspond to that of the uncompressed problem). In this paper, we propose a reward-restricted geodesic (RRG) metric, which can be learned by a neural network, to perform state-temporal compression in RL. We prove that compression based on the RRG metric is approximately optimality-preserving for the raw RL problem endowed with temporally abstract actions. With this compression, we design an RRG metric-based reinforcement learning (RRG-RL) algorithm to solve complex tasks. Experiments in both discrete (2D Minecraft) and continuous (Doom) environments demonstrated the superiority of our method over existing RL approaches.
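The abstract does not define the RRG metric, so the sketch below only illustrates the general idea of compressing a state space with a geodesic-style distance. It makes several assumptions that are not taken from the paper: the distance is the breadth-first step count along paths that never pass through a rewarding state, states within a fixed radius of a centre are merged greedily, and the names `restricted_geodesic` and `compress` are hypothetical. The paper learns its metric with a neural network; here the distance is computed exactly on a seven-state chain.

```python
from collections import deque


def restricted_geodesic(adj, rewarding, source):
    """Step counts from `source` by BFS.

    Assumption (not from the paper): a rewarding state may be reached,
    but a path may not continue through it.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        if s in rewarding and s != source:
            continue  # do not expand past a rewarding state
        for t in adj[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


def compress(adj, rewarding, radius):
    """Greedily map each state to a centre at most `radius` steps away.

    Rewarding states are kept as their own abstract states.
    """
    label = {}
    for s in sorted(adj):
        if s in label:
            continue
        if s in rewarding:
            label[s] = s
            continue
        for t, d in restricted_geodesic(adj, rewarding, s).items():
            if d <= radius and t not in label and t not in rewarding:
                label[t] = s
    return label


# Toy chain 0 - 1 - ... - 6 with a single rewarding state in the middle.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
rewarding = {3}
labels = compress(adj, rewarding, radius=1)
print(labels)
```

In this toy run the seven raw states collapse into five abstract ones, and the rewarding state stays separate. Whether such a grouping preserves the optimal policy is exactly what the paper's analysis addresses; this sketch makes no such guarantee.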

Availability

Barcode		Collection Type	Call Number	Location	Status
art143896	null	Article		Gdg9-Lt3	Available but not for loan - No Loan

Detailed Information

Series Title: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE; Vol.44 No.9 Part 2 September 2022
Call No.: -
Publisher: -
Physical Description: p. 5572-5589
Language: English
ISBN/ISSN: -
Classification: NONE
Content Type: -
Media Type: -
Carrier Type: -
Edition: -
Subject: OPTION
REINFORCEMENT LEARNING (RL)
SEMI-MARKOV DECISION PROCESS (SMDP)
REWARD-RESTRICTED GEODESIC (RRG) METRIC
STATE COMPRESSION
STATE-TEMPORAL COMPRESSION
Specific Detail Info: DOI: 10.1109/TPAMI.2021.3069005
Statement of Responsibility: Shangqi Guo, Qi Yan, Xin Su, Xiaolin Hu, Feng Chen

Other/Related Versions

No other versions available

File Attachments

No Data