Markov Decision Processes and Reinforcement Learning
By Martin L. Puterman and Timothy C. Y. Chan
Keywords: Markov decision processes, reinforcement learning, operations research, sequential decision making, decision making under uncertainty, dynamic programming
Abstract: This book provides an accessible introduction to the theory, applications and algorithms of Markov decision processes (MDPs). It delves into the mathematical and algorithmic foundations of decision making under uncertainty, focusing on problems where trade-offs, uncertainty and dynamic considerations are paramount. By clearly describing the core principles of these decision problems, the book aims to equip readers with the tools to formulate, analyze and solve a wide range of sequential decision problems. The book consists of three parts, each comprising several chapters. The first part establishes the fundamentals of MDPs, in particular defining the basic model components and optimality criteria, and describing and rigorously formulating a diverse set of applications. The second part covers classical MDP models, including finite horizon MDPs, infinite horizon MDPs under discounted, total and average reward optimality criteria, and partially observable MDPs. The final part focuses on approximate dynamic programming and reinforcement learning (RL). It describes the behavioral foundations of RL and its main methods.
The book is in production and will be published by Cambridge University Press in 2026. Pre-prints of the Preface and first five chapters are provided below. These materials are free to view and download for personal use only; they are not for re-distribution, re-sale, or use in derivative works. Please reach out if you are an instructor wishing to use these or other chapters for teaching. An up-to-date repository is maintained at https://github.com/martyput/MDP_book.