A Stochastic Composite Augmented Lagrangian Method for Reinforcement Learning
In this paper, we consider the linear programming (LP) formulation for deep reinforcement learning. The number of constraints depends on the size of the state and action spaces, which makes the problem intractable in large or continuous environments. The general augmented Lagrangian method suffers from the double-sampling obstacle when solving the linear program. Motivated by the multiplier updates, we overcome this obstacle in minimizing the augmented Lagrangian function by replacing the intractable conditional expectations with the multipliers. A deep parameterized augmented Lagrangian method is therefore proposed. This replacement opens a promising route to integrating the two steps of the augmented Lagrangian method into a single quadratic penalty problem. A general theoretical analysis shows that the solutions generated from a sequence of constrained optimization problems converge to the optimal solution of the linear program if the error is controlled properly. A theoretical analysis of the quadratic penalty algorithm, without using target networks, under a neural tangent kernel setting shows that the residual can be made arbitrarily small if the parameters of the network and the optimization algorithm are chosen suitably. Preliminary experiments illustrate that our method is competitive with other state-of-the-art algorithms.
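For readers unfamiliar with the two-step scheme the abstract refers to, a classical augmented Lagrangian method alternates between (approximately) minimizing the augmented Lagrangian in the primal variable and taking a dual ascent step on the multiplier. The sketch below illustrates this on a hypothetical toy equality-constrained problem, not the paper's deep RL algorithm; the objective, penalty value, and step sizes are all illustrative assumptions:

```python
import numpy as np

# Toy illustration (not the paper's method): augmented Lagrangian method
# for  min_x ||x||^2  subject to  a^T x = 1,  with a = (1, 1).
# The analytic solution is x* = (0.5, 0.5) with multiplier lam* = -1.

a = np.array([1.0, 1.0])

def h(x):
    # Equality-constraint residual a^T x - 1.
    return a @ x - 1.0

x = np.zeros(2)     # primal iterate
lam = 0.0           # Lagrange multiplier estimate
rho = 10.0          # penalty parameter

for _ in range(20):  # outer loop: multiplier updates
    # Inner loop: gradient descent on the augmented Lagrangian
    #   L(x) = ||x||^2 + lam * h(x) + (rho / 2) * h(x)**2
    for _ in range(200):
        grad = 2.0 * x + (lam + rho * h(x)) * a
        x -= 0.05 * grad
    # Dual ascent step on the multiplier.
    lam += rho * h(x)

print(np.round(x, 3), round(lam, 3))  # ≈ [0.5, 0.5] and -1.0
```

The quadratic penalty problem mentioned in the abstract corresponds to keeping only the `(rho / 2) * h(x)**2` term; the paper's contribution, per the abstract, is collapsing the two alternating steps above into a single such penalty problem.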
Barcode | Collection Type | Call Number | Location | Status
---|---|---|---|---
art146411 | null | Article | Gdg9-Lt3 | Available but not for loan - No Loan
No other versions available.