AI 24/25 Project Software
Documentation for the AI 24/25 course programming project software
Loading...
Searching...
No Matches
probfd::algorithms::interval_iteration::IntervalIteration< State, Action > Class Template Referenceabstract

#include "probfd/algorithms/interval_iteration.h"

Inheritance diagram for probfd::algorithms::interval_iteration::IntervalIteration< State, Action >:
[legend]

Description

template<typename State, typename Action>
class probfd::algorithms::interval_iteration::IntervalIteration< State, Action >

Implemention of interval iteration [7].

While classical value iteration algorithms converge against the optimal value function in a mathematical sense, it is not clear how a termination condition can be derived that ensures a fixed error bound on the computed value function. Interval iteration remedies this issue by performing two value iterations in parallel, starting from a lower and upper bound respectively, and stopping when both bounds are less than epsilon away from each other.

Interval iteration consists of two steps:

  1. Build the MEC decomposition of the underlying MDP to ensure convergence from any initial value function.
  2. Performs two value iterations in parallel, one from an initial lower bound and one from an initial upper bound.

The respective sequences of value functions are adjacent sequences. Interval iteration stops when the lower and upper bounding value functions are less than epsilon away, ensuring that any of the value functions is at most epsilon away from the optimal value function.

Note
This implementation outputs the values of the upper bounding value function.
See also
EndComponentDecomposition
Template Parameters
StateThe state type of the underlying MDP model.
ActionThe action type of the underlying MDP model.

Public Member Functions

void print_statistics (std::ostream &out) const override
 Prints algorithm statistics to the specified output stream.
 
virtual std::unique_ptr< PolicyType > compute_policy (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double maxtime)=0
 Computes a partial policy for the input state.
 
virtual Interval solve (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double max_time)=0
 Runs the MDP algorithm for the initial state state with a maximum time limit.
 

Member Function Documentation

◆ print_statistics()

template<typename State , typename Action >
void probfd::algorithms::interval_iteration::IntervalIteration< State, Action >::print_statistics ( std::ostream & ) const
overridevirtual

Prints algorithm statistics to the specified output stream.

Reimplemented from probfd::MDPAlgorithm< State, Action >.

◆ compute_policy()

template<typename State , typename Action >
virtual std::unique_ptr< PolicyType > probfd::MDPAlgorithm< State, Action >::compute_policy ( MDPType & mdp,
EvaluatorType & heuristic,
param_type< State > state,
ProgressReport progress,
double maxtime )
pure virtualinherited

Computes a partial policy for the input state.

◆ solve()

template<typename State , typename Action >
virtual Interval probfd::MDPAlgorithm< State, Action >::solve ( MDPType & mdp,
EvaluatorType & heuristic,
param_type< State > state,
ProgressReport progress,
double max_time )
pure virtualinherited

Runs the MDP algorithm for the initial state state with a maximum time limit.