AI 24/25 Project Software
Documentation for the AI 24/25 course programming project software
Loading...
Searching...
No Matches
probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval > Class Template Referenceabstract

#include "probfd/algorithms/topological_value_iteration.h"

Inheritance diagram for probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >:
[legend]

Description

template<typename State, typename Action, bool UseInterval = false>
class probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >

Implements Topological Value Iteration [5].

This algorithm performs value iteration on each strongly connected component of the MDP in reverse topological order. This implementation exhaustively explores the entire reachable state space and computes a full optimal state value table by default. A heuristic can be supplied to explicitly prune parts of the state space or to accelerate convergence.

This implementation also supports value intervals. However, convergence is not guaranteed with value intervals if traps exist within the reachable state space. In this case, traps must be removed prior to running topological value iteration, or the trap-aware variant TATopologicalValueIteration of this algorithm must be used, which eliminates as traps on-the-fly to guarantee convergence.

See also
interval_iteration::IntervalIteration
ta_topological_value_iteration::TATopologicalValueIteration
Template Parameters
State- The state type of the underlying MDP model.
Action- The action type of the underlying MDP model.
UseInterval- Whether value intervals are used.

Public Member Functions

void print_statistics (std::ostream &out) const override
 Prints algorithm statistics to the specified output stream.
 
Statistics get_statistics () const
 Retreive the algorithm statistics.
 
template<typename ValueStore >
Interval solve (MDPType &mdp, EvaluatorType &heuristic, StateID init_state_id, ValueStore &value_store, double max_time=std::numeric_limits< double >::infinity(), MapPolicy *policy=nullptr)
 Runs the algorithm with the supplied state value storage.
 
virtual std::unique_ptr< PolicyType > compute_policy (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double maxtime)=0
 Computes a partial policy for the input state.
 
virtual Interval solve (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double max_time)=0
 Runs the MDP algorithm for the initial state state with a maximum time limit.
 

Member Function Documentation

◆ print_statistics()

template<typename State , typename Action , bool UseInterval>
void probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >::print_statistics ( std::ostream & ) const
overridevirtual

Prints algorithm statistics to the specified output stream.

Reimplemented from probfd::MDPAlgorithm< State, Action >.

◆ get_statistics()

template<typename State , typename Action , bool UseInterval>
Statistics probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >::get_statistics ( ) const
nodiscard

Retreive the algorithm statistics.

◆ solve() [1/2]

template<typename State , typename Action , bool UseInterval>
template<typename ValueStore >
Interval probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >::solve ( MDPType & mdp,
EvaluatorType & heuristic,
StateID init_state_id,
ValueStore & value_store,
double max_time = std::numeric_limits<double>::infinity(),
MapPolicy * policy = nullptr )

Runs the algorithm with the supplied state value storage.

Computes the full optimal value function for the entire state space reachable from initial_state. Stores the state values in the output parameter value_store. Returns the value of the initial state.

◆ compute_policy()

template<typename State , typename Action >
virtual std::unique_ptr< PolicyType > probfd::MDPAlgorithm< State, Action >::compute_policy ( MDPType & mdp,
EvaluatorType & heuristic,
param_type< State > state,
ProgressReport progress,
double maxtime )
pure virtualinherited

Computes a partial policy for the input state.

◆ solve() [2/2]

template<typename State , typename Action >
virtual Interval probfd::MDPAlgorithm< State, Action >::solve ( MDPType & mdp,
EvaluatorType & heuristic,
param_type< State > state,
ProgressReport progress,
double max_time )
pure virtualinherited

Runs the MDP algorithm for the initial state state with a maximum time limit.