#include "probfd/algorithms/topological_value_iteration.h"

Inheritance diagram for probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >:

Description

template<typename State, typename Action, bool UseInterval = false>
class probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >

Implements Topological Value Iteration [5].

This algorithm performs value iteration on each strongly connected component of the MDP in reverse topological order. This implementation exhaustively explores the entire reachable state space and computes a full optimal state value table by default. A heuristic can be supplied to explicitly prune parts of the state space or to accelerate convergence.

This implementation also supports value intervals. However, convergence is not guaranteed with value intervals if traps exist within the reachable state space. In this case, traps must be removed prior to running topological value iteration, or the trap-aware variant TATopologicalValueIteration of this algorithm must be used, which eliminates as traps on-the-fly to guarantee convergence.

See also: interval_iteration::IntervalIteration; ta_topological_value_iteration::TATopologicalValueIteration

Template Parameters

State	- The state type of the underlying MDP model.
Action	- The action type of the underlying MDP model.
UseInterval	- Whether value intervals are used.

Public Member Functions
void	print_statistics (std::ostream &out) const override
	Prints algorithm statistics to the specified output stream.

Statistics	get_statistics () const
	Retreive the algorithm statistics.

template<typename ValueStore >
Interval	solve (MDPType &mdp, EvaluatorType &heuristic, StateID init_state_id, ValueStore &value_store, double max_time=std::numeric_limits< double >::infinity(), MapPolicy *policy=nullptr)
	Runs the algorithm with the supplied state value storage.

virtual std::unique_ptr< PolicyType >	compute_policy (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double maxtime)=0
	Computes a partial policy for the input state.

virtual Interval	solve (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double max_time)=0
	Runs the MDP algorithm for the initial state `state` with a maximum time limit.

Member Function Documentation

◆ print_statistics()

template<typename State , typename Action , bool UseInterval>

void probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >::print_statistics ( std::ostream & ) const

overridevirtual

Prints algorithm statistics to the specified output stream.

Reimplemented from probfd::MDPAlgorithm< State, Action >.

◆ get_statistics()

template<typename State , typename Action , bool UseInterval>

Statistics probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >::get_statistics ( ) const

nodiscard

Retreive the algorithm statistics.

◆ solve() [1/2]

template<typename State , typename Action , bool UseInterval>

template<typename ValueStore >

Interval probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >::solve	(	MDPType &	mdp,
		EvaluatorType &	heuristic,
		StateID	init_state_id,
		ValueStore &	value_store,
		double	max_time = std::numeric_limits<double>::infinity(),
		MapPolicy *	policy = nullptr )

Runs the algorithm with the supplied state value storage.

Computes the full optimal value function for the entire state space reachable from initial_state. Stores the state values in the output parameter value_store. Returns the value of the initial state.

◆ compute_policy()

template<typename State , typename Action >

virtual std::unique_ptr< PolicyType > probfd::MDPAlgorithm< State, Action >::compute_policy	(	MDPType &	mdp,
		EvaluatorType &	heuristic,
		param_type< State >	state,
		ProgressReport	progress,
		double	maxtime )

pure virtualinherited

Computes a partial policy for the input state.

◆ solve() [2/2]

template<typename State , typename Action >

virtual Interval probfd::MDPAlgorithm< State, Action >::solve	(	MDPType &	mdp,
		EvaluatorType &	heuristic,
		param_type< State >	state,
		ProgressReport	progress,
		double	max_time )

pure virtualinherited

Runs the MDP algorithm for the initial state state with a maximum time limit.

Description

Public Member Functions

Member Function Documentation

◆ print_statistics()

◆ get_statistics()

◆ solve() [1/2]

◆ compute_policy()

◆ solve() [2/2]