Week 22: Markov Decision Process

Topics covered:

Sequential decision problems
Rewards, utilities, and policies
Value iteration
Policy iteration
Reinforcement learning


Definition:

MDP formulation

An MDP needs a structure that keeps track of the sequence of decisions.
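
A minimal sketch of such a structure, assuming the standard MDP ingredients (states, actions, a transition model, per-state rewards, and a discount factor). The class name `MDP`, its field names, the two-state toy problem, and the value `gamma = 0.9` are all hypothetical choices made for illustration; they are not taken from the course material.

```python
from dataclasses import dataclass, field


@dataclass
class MDP:
    states: list
    actions: list
    transitions: dict   # (state, action) -> list of (probability, next_state)
    rewards: dict       # state -> immediate reward
    gamma: float = 0.9  # discount factor (assumed value)
    history: list = field(default_factory=list)  # decisions taken so far

    def is_valid(self):
        """Check that the outcome probabilities of every (state, action) sum to 1."""
        return all(
            abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
            for outcomes in self.transitions.values()
        )

    def record(self, state, action, next_state):
        """Append one decision to the tracked sequence."""
        self.history.append((state, action, next_state))


# Hypothetical two-state problem: "go" switches state with probability 0.8.
toy = MDP(
    states=["A", "B"],
    actions=["stay", "go"],
    transitions={
        ("A", "stay"): [(1.0, "A")],
        ("A", "go"): [(0.8, "B"), (0.2, "A")],
        ("B", "stay"): [(1.0, "B")],
        ("B", "go"): [(0.8, "A"), (0.2, "B")],
    },
    rewards={"A": 0.0, "B": 1.0},
)
toy.record("A", "go", "B")
```

Storing the transition model as a dictionary keyed by `(state, action)` keeps the stochastic outcomes explicit, and the `history` list is one simple way to track the decision sequence.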


Policy iteration

A solution should specify what the robot does in every state; this is called a policy, π.
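
As a rough illustration, a deterministic policy can be written as a plain mapping from each state to one action. The corridor world below (states 0 to 3, actions `"left"` and `"right"`) and the helper `is_complete` are hypothetical, introduced only to show that a policy must cover every state.

```python
# Hypothetical 1-D corridor: the robot can be in cell 0, 1, 2, or 3.
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]

# A deterministic policy pi: exactly one action for every state.
policy = {0: "right", 1: "right", 2: "right", 3: "left"}


def is_complete(pi, states, actions):
    """Return True if pi assigns a legal action to every state."""
    return all(s in pi and pi[s] in actions for s in states)


def act(pi, state):
    """Look up what the robot does in the given state."""
    return pi[state]
```

A mapping that leaves some state out is not a solution in this sense, since the robot would have no instruction if it ever ended up there.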


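Each time a given policy is executed from the initial state, the stochastic nature of the environment may lead to a different environment history.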
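
A small simulation can make this concrete. The sketch below assumes a hypothetical corridor with cells 0 to 3, a goal at cell 3, and a made-up noise model in which the intended move succeeds with probability 0.8 and the robot otherwise stays in place. None of these numbers come from the notes.

```python
import random

GOAL = 3  # hypothetical terminal cell


def step(state, action, rng):
    """Move one cell in the intended direction with probability 0.8, else stay."""
    if state == GOAL:
        return state
    move = 1 if action == "right" else -1
    if rng.random() < 0.8:
        return min(max(state + move, 0), GOAL)
    return state


def rollout(pi, start, rng, max_steps=20):
    """Execute policy pi from start and return the sequence of visited states."""
    history = [start]
    state = start
    for _ in range(max_steps):
        if state == GOAL:
            break
        state = step(state, pi[state], rng)
        history.append(state)
    return history


# The same fixed policy, executed five times with different random seeds.
policy = {s: "right" for s in range(GOAL + 1)}
histories = [rollout(policy, 0, random.Random(seed)) for seed in range(5)]
for h in histories:
    print(h)
```

The policy is identical in every run; any differences between the printed histories come only from the random outcomes of the actions.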