#include "probfd/algorithms/interval_iteration.h"

Inheritance diagram for probfd::algorithms::interval_iteration::IntervalIteration< State, Action >:

Description

template<typename State, typename Action>
class probfd::algorithms::interval_iteration::IntervalIteration< State, Action >

Implemention of interval iteration [7].

While classical value iteration algorithms converge against the optimal value function in a mathematical sense, it is not clear how a termination condition can be derived that ensures a fixed error bound on the computed value function. Interval iteration remedies this issue by performing two value iterations in parallel, starting from a lower and upper bound respectively, and stopping when both bounds are less than epsilon away from each other.

Interval iteration consists of two steps:

Build the MEC decomposition of the underlying MDP to ensure convergence from any initial value function.
Performs two value iterations in parallel, one from an initial lower bound and one from an initial upper bound.

The respective sequences of value functions are adjacent sequences. Interval iteration stops when the lower and upper bounding value functions are less than epsilon away, ensuring that any of the value functions is at most epsilon away from the optimal value function.

Note: This implementation outputs the values of the upper bounding value function.

See also: EndComponentDecomposition

Template Parameters

State	The state type of the underlying MDP model.
Action	The action type of the underlying MDP model.

Public Member Functions
void	print_statistics (std::ostream &out) const override
	Prints algorithm statistics to the specified output stream.

virtual std::unique_ptr< PolicyType >	compute_policy (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double maxtime)=0
	Computes a partial policy for the input state.

virtual Interval	solve (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double max_time)=0
	Runs the MDP algorithm for the initial state `state` with a maximum time limit.

Member Function Documentation

◆ print_statistics()

template<typename State , typename Action >

void probfd::algorithms::interval_iteration::IntervalIteration< State, Action >::print_statistics ( std::ostream & ) const

overridevirtual

Prints algorithm statistics to the specified output stream.

Reimplemented from probfd::MDPAlgorithm< State, Action >.

◆ compute_policy()

template<typename State , typename Action >

virtual std::unique_ptr< PolicyType > probfd::MDPAlgorithm< State, Action >::compute_policy	(	MDPType &	mdp,
		EvaluatorType &	heuristic,
		param_type< State >	state,
		ProgressReport	progress,
		double	maxtime )

pure virtualinherited

Computes a partial policy for the input state.

◆ solve()

template<typename State , typename Action >

virtual Interval probfd::MDPAlgorithm< State, Action >::solve	(	MDPType &	mdp,
		EvaluatorType &	heuristic,
		param_type< State >	state,
		ProgressReport	progress,
		double	max_time )

pure virtualinherited

Runs the MDP algorithm for the initial state state with a maximum time limit.

Description

Public Member Functions

Member Function Documentation

◆ print_statistics()

◆ compute_policy()

◆ solve()