AI 24/25 Project Software
Documentation for the AI 24/25 course programming project software
#include "probfd/algorithms/interval_iteration.h"
Implementation of interval iteration [7].
While classical value iteration algorithms converge to the optimal value function in the limit, it is not clear how to derive a termination condition that guarantees a fixed error bound on the computed value function. Interval iteration remedies this by performing two value iterations in parallel, one starting from a lower bound and one from an upper bound, and stopping when the two bounds are less than epsilon apart.
Interval iteration consists of two steps:
The two resulting sequences of value functions are adjacent sequences: the lower sequence is non-decreasing, the upper sequence is non-increasing, and both converge to the optimal value function. Interval iteration stops when the lower and upper bounding value functions are less than epsilon apart, which ensures that either of them is at most epsilon away from the optimal value function.
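The stopping rule above can be illustrated with a minimal, self-contained sketch. It does not use the probfd API: `ToyMDP`, `Outcome`, `Bounds` and `toy_interval_iteration` are hypothetical names introduced only for this example. The sketch assumes a goal-probability objective (values in [0, 1]), terminal goal and dead-end states, and an MDP without end components among the non-terminal states, so that the upper bound also converges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration only; these are not probfd types.
struct Outcome {
    std::size_t successor;
    double probability;
};
using ToyAction = std::vector<Outcome>;   // probabilistic outcomes of one action
using ToyState = std::vector<ToyAction>;  // applicable actions; empty = terminal
using ToyMDP = std::vector<ToyState>;

struct Bounds {
    std::vector<double> lower;
    std::vector<double> upper;
};

// One Bellman backup: maximum over actions of the expected successor value.
static double backup(const ToyState& actions, const std::vector<double>& v)
{
    double best = 0.0;
    for (const ToyAction& action : actions) {
        double q = 0.0;
        for (const Outcome& o : action) q += o.probability * v[o.successor];
        best = std::max(best, q);
    }
    return best;
}

// Runs two value iterations in lockstep, one from the lower bound 0 and one
// from the upper bound 1, until they are less than epsilon apart everywhere.
// Assumption: no end components among non-terminal states.
Bounds toy_interval_iteration(
    const ToyMDP& mdp,
    const std::vector<bool>& is_goal,
    double epsilon)
{
    assert(epsilon > 0.0);
    const std::size_t n = mdp.size();
    Bounds b{std::vector<double>(n, 0.0), std::vector<double>(n, 1.0)};
    for (std::size_t s = 0; s < n; ++s) {
        if (is_goal[s]) b.lower[s] = 1.0;            // goal: value is exactly 1
        else if (mdp[s].empty()) b.upper[s] = 0.0;   // dead end: exactly 0
    }

    double gap;
    do {
        gap = 0.0;
        Bounds next = b;
        for (std::size_t s = 0; s < n; ++s) {
            if (is_goal[s] || mdp[s].empty()) continue;  // terminal: fixed
            next.lower[s] = backup(mdp[s], b.lower);
            next.upper[s] = backup(mdp[s], b.upper);
            gap = std::max(gap, next.upper[s] - next.lower[s]);
        }
        b = std::move(next);
    } while (gap >= epsilon);
    return b;
}
```

On return, every state satisfies `upper[s] - lower[s] < epsilon`, and since the optimal value lies between the two bounds, either bound can be reported with an error below epsilon. The real implementation additionally has to handle end components, which this sketch deliberately excludes by assumption.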
Public Member Functions
void print_statistics (std::ostream &out) const override
    Prints algorithm statistics to the specified output stream.
virtual std::unique_ptr< PolicyType > compute_policy (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double maxtime)=0
    Computes a partial policy for the input state.
virtual Interval solve (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double max_time)=0
    Runs the MDP algorithm for the initial state state with a maximum time limit.
print_statistics() [override, virtual]
Prints algorithm statistics to the specified output stream.
Reimplemented from probfd::MDPAlgorithm< State, Action >.
compute_policy() [pure virtual, inherited]
Computes a partial policy for the input state.
solve() [pure virtual, inherited]
Runs the MDP algorithm for the initial state state with a maximum time limit.