There are at least two distinct issues here: one about the optimal policy, and another about transparency.
Again, one can always find an optimal policy amongst stationary policies.
Market monetarists maintain that a nominal income target is the optimal monetary policy.
When such an action-value function is learned, the optimal policy can be constructed by simply selecting the action with the highest value in each state.
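A minimal sketch of that greedy construction, assuming the learned action-value function is stored as a plain nested dictionary. The names `greedy_policy` and `q_table`, and the states, actions, and values in it, are hypothetical and chosen only for illustration.

```python
def greedy_policy(q_values):
    """Map each state to its highest-valued action.

    q_values: dict of state -> dict of action -> estimated value.
    Ties go to whichever action appears first in the inner dict.
    """
    return {
        state: max(action_values, key=action_values.get)
        for state, action_values in q_values.items()
    }


# Hypothetical learned action values for two states and two actions.
q_table = {
    "s0": {"left": 0.1, "right": 0.5},
    "s1": {"left": 0.7, "right": 0.2},
}

policy = greedy_policy(q_table)
print(policy)
```

The resulting policy is only as good as the value estimates: it is optimal when the learned values equal the true optimal action values.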
The optimal policy is obtained by optimizing the long-term reward.
"Root-canal economics also does not constitute optimal economic policy."
Computing an actual description of an optimal policy may be a harder problem.
The levels of the actual policies and the optimal policies are shown in Table 4.
In fact, for every $n$, the probability of best choice with the optimal policy is at least $1/e$.
The optimal policy for the problem is a stopping rule.