AI 24/25 Project Software
Documentation for the AI 24/25 course programming project software
#include "probfd/algorithms/topological_value_iteration.h"
Implements Topological Value Iteration [5].
This algorithm performs value iteration on each strongly connected component of the MDP in reverse topological order. This implementation exhaustively explores the entire reachable state space and computes a full optimal state value table by default. A heuristic can be supplied to explicitly prune parts of the state space or to accelerate convergence.
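The following is a minimal, self-contained sketch of the general technique described above, not the probfd implementation. It assumes a cost-minimizing MDP in which a state without actions is a goal state with value 0. The types Outcome, Action and Mdp and the function topological_vi are hypothetical stand-ins, not the library's interfaces. Tarjan's algorithm completes a strongly connected component only after every component reachable from it, so running value iteration at the moment a component is completed gives the reverse topological order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical stand-in types; these are NOT the probfd interfaces.
struct Outcome { int succ; double prob; };
struct Action { double cost; std::vector<Outcome> outcomes; };
// state index -> applicable actions; a state without actions is a goal.
using Mdp = std::vector<std::vector<Action>>;

// Bellman backup: minimum expected cost-to-go over the actions of s.
double backup(const Mdp& mdp, const std::vector<double>& v, int s) {
    if (mdp[s].empty()) return 0.0;  // goal state
    double best = std::numeric_limits<double>::infinity();
    for (const Action& a : mdp[s]) {
        double q = a.cost;
        for (const Outcome& o : a.outcomes) q += o.prob * v[o.succ];
        best = std::min(best, q);
    }
    return best;
}

// Values of all states reachable from init; unreachable states keep 0.
std::vector<double> topological_vi(const Mdp& mdp, int init, double eps = 1e-9) {
    const int n = static_cast<int>(mdp.size());
    assert(init >= 0 && init < n);
    std::vector<double> v(n, 0.0);
    std::vector<int> index(n, -1), low(n, 0), stack;
    std::vector<bool> on_stack(n, false);
    int counter = 0;

    // Recursive Tarjan SCC search (recursion kept for brevity).
    std::function<void(int)> dfs = [&](int s) {
        index[s] = low[s] = counter++;
        stack.push_back(s);
        on_stack[s] = true;
        for (const Action& a : mdp[s]) {
            for (const Outcome& o : a.outcomes) {
                if (index[o.succ] < 0) {
                    dfs(o.succ);
                    low[s] = std::min(low[s], low[o.succ]);
                } else if (on_stack[o.succ]) {
                    low[s] = std::min(low[s], index[o.succ]);
                }
            }
        }
        if (low[s] != index[s]) return;  // s is not the root of an SCC

        // Pop the finished SCC. Every SCC reachable from it is already solved.
        std::vector<int> scc;
        int t;
        do {
            t = stack.back();
            stack.pop_back();
            on_stack[t] = false;
            scc.push_back(t);
        } while (t != s);

        // Value iteration restricted to this SCC until the residual is small.
        double residual;
        do {
            residual = 0.0;
            for (int u : scc) {
                const double updated = backup(mdp, v, u);
                residual = std::max(residual, std::abs(updated - v[u]));
                v[u] = updated;
            }
        } while (residual > eps);
    };

    dfs(init);
    return v;
}
```

As a usage example, take three states where state 0 moves to state 1 at cost 1, state 1 moves at cost 1 to the goal state 2 or back to state 0 with probability 0.5 each, and state 2 has no actions. States 0 and 1 form one component and the goal forms another. The sketch solves the goal component first and then converges to values of approximately 4 and 3 for states 0 and 1.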
This implementation also supports value intervals. However, convergence is not guaranteed with value intervals if traps exist within the reachable state space. In this case, either remove the traps before running topological value iteration, or use the trap-aware variant TATopologicalValueIteration, which eliminates traps on the fly to guarantee convergence.
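The convergence problem can be illustrated with a tiny hand-built example. The sketch below assumes that a trap is a set of states in which the agent can cycle forever at zero cost; it does not use any probfd types, and the names Bounds, backup_a and iterate_trap are hypothetical. State A can either move to B at cost 0 or exit to a goal at cost 1, and state B moves back to A at cost 0. The Bellman operator then has more than one fixed point on {A, B}, so a lower bound started at 0 and an upper bound started at a large value settle on different fixed points and the interval never closes.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Lower and upper value bound of one state (hypothetical helper type).
struct Bounds { double lower; double upper; };

// Bellman backup for A: "loop" (cost 0, to B) or "exit" (cost 1, to a goal
// whose value is 0). The backup for B simply copies the value of A.
inline double backup_a(double value_of_b) {
    return std::min(0.0 + value_of_b, 1.0 + 0.0);
}

// Runs the given number of sweeps over {A, B}, updating both bounds.
std::pair<Bounds, Bounds> iterate_trap(int sweeps, double upper_init) {
    assert(sweeps >= 0);
    Bounds a{0.0, upper_init};
    Bounds b{0.0, upper_init};
    for (int i = 0; i < sweeps; ++i) {
        a.lower = backup_a(b.lower);  // stays at 0: the loop looks free
        a.upper = backup_a(b.upper);  // drops to 1 through the exit action
        b.lower = a.lower;
        b.upper = a.upper;
    }
    return {a, b};
}
```

With an initial upper bound of 10, the bounds of A are [0, 1] after the first sweep and remain there for any number of further sweeps. Collapsing A and B into a single state removes the zero-cost cycle, which is the idea behind trap elimination.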
Public Member Functions | |
void | print_statistics (std::ostream &out) const override |
Prints algorithm statistics to the specified output stream. | |
Statistics | get_statistics () const |
Retrieve the algorithm statistics. | |
template<typename ValueStore > | |
Interval | solve (MDPType &mdp, EvaluatorType &heuristic, StateID init_state_id, ValueStore &value_store, double max_time=std::numeric_limits< double >::infinity(), MapPolicy *policy=nullptr) |
Runs the algorithm with the supplied state value storage. | |
virtual std::unique_ptr< PolicyType > | compute_policy (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double maxtime)=0 |
Computes a partial policy for the input state. | |
virtual Interval | solve (MDPType &mdp, EvaluatorType &heuristic, param_type< State > state, ProgressReport progress, double max_time)=0 |
Runs the MDP algorithm on the given initial state, subject to a maximum time limit. | |
print_statistics() [override, virtual]
Prints algorithm statistics to the specified output stream.
Reimplemented from probfd::MDPAlgorithm< State, Action >.
get_statistics() [nodiscard]
Retrieve the algorithm statistics.
Interval probfd::algorithms::topological_vi::TopologicalValueIteration< State, Action, UseInterval >::solve (
    MDPType & mdp,
    EvaluatorType & heuristic,
    StateID init_state_id,
    ValueStore & value_store,
    double max_time = std::numeric_limits<double>::infinity(),
    MapPolicy * policy = nullptr )
Runs the algorithm with the supplied state value storage.
Computes the full optimal value function for the entire state space reachable from the initial state init_state_id. Stores the state values in the output parameter value_store. Returns the value of the initial state.
compute_policy() [pure virtual, inherited]
Computes a partial policy for the input state.
solve() [pure virtual, inherited]
Runs the MDP algorithm on the given initial state, subject to a maximum time limit.