Shuwa Miura has shown** that for some sets of acceptable policies no scalar Markov reward function exists that characterizes them, but that a set of deterministic policies can always be characterized by multidimensional Markov rewards.
* See Capitalism is Wrong, chapter 2 of my book Twelve Papers, on my website www.robert-w-jones.com under "Book" and my blog of 19 February 2011.
** On the Expressivity of Multidimensional Markov Rewards, arXiv:2307.12184v1, 22 July 2023.