Probabilistic and Reinforcement Learning Track

2023 International Planning Competition

Mountain Car Control

Example name MountainCar
Action space Dict
State space Dict

Description

This domain is a recreation of the Mountain Car domain from the OpenAI Gym repository.

The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically at the bottom of a sinusoidal valley, with the only possible actions being the accelerations that can be applied to the car in either direction. The goal of the MDP is to strategically accelerate the car to reach the goal state on top of the right hill. There are two versions of the mountain car domain in gym: one with discrete actions and one with continuous. This version is the one with discrete actions.

This MDP first appeared in Andrew Moore's PhD Thesis (1990).

Constants (non-fluents)

Constant Type Desc
GRAVITY_MAG float32 Magnitude of the gravitational force acting on the cart
FORCE_MAG float32 Magnitude of the force applied to the cart by an action
DEPTH float32 Depth of the valley
MIN_POS float32 Minimum position of the cart
MAX_POS float32 Maximum position of the cart
MAX_VEL float32 Maximum velocity of the cart
GOAL_MIN float32 Minimum x position at which the cart is considered to have reached the goal

All of these can be read from the RDDLEnv interface and from the RDDL files.

Action Space

There is a single discrete action taking values in {0, 1, 2}, indicating whether the cart should be pushed to the left, not pushed at all, or pushed to the right.

Action Type Desc
action Discrete(3) Whether to accelerate left, do nothing, or accelerate right

If action is 0, the cart is pushed to the left with force FORCE_MAG.
If action is 1, no force acts on the cart.
If action is 2, the cart is pushed to the right with force FORCE_MAG.
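The effect of each action, combined with gravity along the sinusoidal valley, follows the classic Gym update rule. A minimal sketch, assuming the default Gym values for the constants (FORCE_MAG = 0.001, GRAVITY_MAG = 0.0025) and the standard cos(3·pos) slope term; the RDDL non-fluents for a given instance may differ:

```python
import math

# Assumed default Gym constants; read the actual values from the RDDL files.
FORCE_MAG = 0.001
GRAVITY_MAG = 0.0025
MIN_POS, MAX_POS = -1.2, 0.6
MAX_VEL = 0.07

def step(pos, vel, action):
    """One transition of the discrete Mountain Car dynamics.

    action: 0 = push left, 1 = no push, 2 = push right.
    """
    # (action - 1) maps {0, 1, 2} to force directions {-1, 0, +1}.
    vel += (action - 1) * FORCE_MAG - GRAVITY_MAG * math.cos(3 * pos)
    vel = max(-MAX_VEL, min(MAX_VEL, vel))
    pos = max(MIN_POS, min(MAX_POS, pos + vel))
    if pos == MIN_POS and vel < 0:  # inelastic collision with the left wall
        vel = 0.0
    return pos, vel

# Pushing right (action 2) from rest at the bottom of the valley
# produces a small positive velocity.
pos, vel = step(-0.5, 0.0, 2)
```

The cart's engine is weaker than gravity on the steep parts of the hill, which is why a single sustained push to the right is not enough to reach the goal.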

State Space

The state space consists of the position and velocity of the cart.

State Type Desc
pos Box(1, MIN_POS, MAX_POS, float32) Cart position
vel Box(1, -MAX_VEL, MAX_VEL, float32) Cart velocity

Rewards

The goal is to reach the flag placed on top of the right hill as quickly as possible; accordingly, the agent receives a reward of -1 for each timestep.
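Since every timestep costs -1, the agent minimizes episode length. One way to see the structure of the problem is the classic energy-pumping heuristic: always push in the direction of the current velocity, gaining momentum on each swing until the cart clears the right hill. A self-contained sketch, again assuming the default Gym constants and a goal threshold of 0.5 (both may differ in the RDDL instances):

```python
import math

# Assumed default Gym constants; read the actual values from the RDDL files.
FORCE_MAG, GRAVITY_MAG = 0.001, 0.0025
MIN_POS, MAX_POS, MAX_VEL = -1.2, 0.6, 0.07
GOAL_MIN = 0.5  # assumed goal threshold

def step(pos, vel, action):
    """One transition; action: 0 = push left, 1 = no push, 2 = push right."""
    vel += (action - 1) * FORCE_MAG - GRAVITY_MAG * math.cos(3 * pos)
    vel = max(-MAX_VEL, min(MAX_VEL, vel))
    pos = max(MIN_POS, min(MAX_POS, pos + vel))
    if pos == MIN_POS and vel < 0:  # inelastic collision with the left wall
        vel = 0.0
    return pos, vel

def energy_pumping_episode(pos=-0.5, vel=0.0, max_steps=1000):
    """Push in the direction of motion; return (total_reward, steps_taken)."""
    total = 0
    for t in range(1, max_steps + 1):
        action = 2 if vel >= 0 else 0  # accelerate with the current velocity
        pos, vel = step(pos, vel, action)
        total -= 1  # reward of -1 per timestep
        if pos >= GOAL_MIN:
            return total, t
    return total, max_steps
```

Because the return is simply minus the number of steps, any improvement a learned policy makes over a heuristic like this one shows up directly as a shorter episode.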

References

Moore, A. W. (1990). Efficient Memory-based Learning for Robot Control. PhD thesis, University of Cambridge.
