Mountain car Control
Example name | MountainCar |
Action space | Dict |
State space | Dict |
This domain is a recreation of the Mountain Car domain from the OpenAI Gym repository.
The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically at the bottom of a sinusoidal valley. The only possible actions are the accelerations that can be applied to the car in either direction. The goal of the MDP is to strategically accelerate the car to reach the goal state on top of the right hill. There are two versions of the mountain car domain in Gym: one with discrete actions and one with continuous actions. This version is the one with discrete actions.
This MDP first appeared in Andrew Moore’s PhD thesis (1990).
Constant | Type | Desc |
---|---|---|
GRAVITY_MAG | float32 | Force of gravity acting down |
FORCE_MAG | float32 | Force applied to the side of the cart |
DEPTH | float32 | Depth of the valley |
MIN_POS | float32 | Minimum position of cart |
MAX_POS | float32 | Maximum position of cart |
MAX_VEL | float32 | Maximum velocity of cart |
GOAL_MIN | float32 | Desired x position of cart |
All of these can be read from the RDDLEnv interface and from the RDDL files.
There is a single action taking values in {0, 1, 2}, indicating whether the cart should be pushed to the left, not pushed at all, or pushed to the right.
Action | Type | Desc |
---|---|---|
action | Discrete(3) | Whether to accelerate left, not at all, or right |
If action is 0, the cart is pushed to the left with force FORCE_MAG.
If action is 1, no force acts on the cart.
If action is 2, the cart is pushed to the right with force FORCE_MAG.
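The mapping from the discrete action to a signed force can be sketched in a few lines of Python. This is only an illustration of the three cases above, not the domain's RDDL code: the function name `applied_force` is hypothetical, and the value of `FORCE_MAG` is a placeholder, since the real value is defined by the problem's constants.

```python
# Placeholder value (assumption); the real FORCE_MAG comes from the RDDL constants.
FORCE_MAG = 0.001


def applied_force(action: int, force_mag: float = FORCE_MAG) -> float:
    """Return the signed force for a discrete action in {0, 1, 2}.

    0 -> push left  (-force_mag)
    1 -> no push    (0)
    2 -> push right (+force_mag)
    """
    if action not in (0, 1, 2):
        raise ValueError(f"action must be 0, 1 or 2, got {action!r}")
    # Shifting the action by one gives a direction of -1, 0 or +1.
    return (action - 1) * force_mag


for a in (0, 1, 2):
    print(a, applied_force(a))
```

Writing the direction as `action - 1` keeps the three cases in a single expression; an explicit `if`/`elif` chain would work equally well.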
The state space represents the position and velocity of the cart. The constants that describe the valley and the goal are not part of the state, but are available through the non-fluents in the problem.
State | Type | Desc |
---|---|---|
pos | Box(1, MIN_POS, MAX_POS, float32) | Cart position |
vel | Box(1, -MAX_VEL, MAX_VEL, float32) | Cart velocity |
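As a rough illustration of the bounds in the table above, the following sketch checks whether a state dictionary with keys `pos` and `vel` lies inside its box. The numeric bounds are placeholders borrowed from the classic Gym Mountain Car and are an assumption here; the actual values are set by the domain's constants. The helper `state_in_bounds` is hypothetical, and the state entries are treated as plain floats for simplicity.

```python
# Placeholder bounds (assumption, taken from the classic Gym Mountain Car);
# the actual values are defined by MIN_POS, MAX_POS and MAX_VEL in the domain.
MIN_POS, MAX_POS, MAX_VEL = -1.2, 0.6, 0.07


def state_in_bounds(state: dict) -> bool:
    """Check that 'pos' and 'vel' fall inside their declared ranges."""
    pos_ok = MIN_POS <= state["pos"] <= MAX_POS
    vel_ok = -MAX_VEL <= state["vel"] <= MAX_VEL
    return pos_ok and vel_ok


print(state_in_bounds({"pos": -0.5, "vel": 0.0}))  # inside both ranges
print(state_in_bounds({"pos": 0.9, "vel": 0.0}))   # position above MAX_POS
```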
The goal is to reach the flag placed on top of the right hill as quickly as possible. To encourage this, the agent is penalised with a reward of -1 for each timestep.
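To see why this reward favours fast solutions, the sketch below sums the per-step penalty over an episode. It assumes the reward is exactly -1 on every step taken and ignores discounting and any horizon limit; the function name `episode_return` is hypothetical.

```python
def episode_return(steps_taken: int) -> int:
    """Undiscounted return for an episode with a reward of -1 per timestep."""
    if steps_taken < 0:
        raise ValueError("steps_taken must be non-negative")
    # One penalty of -1 is collected for each timestep of the episode.
    return sum(-1 for _ in range(steps_taken))


fast = episode_return(120)  # a policy that reaches the flag in 120 steps
slow = episode_return(200)  # a policy that needs 200 steps
print(fast, slow, fast > slow)
```

Under these assumptions the return equals the negative episode length, so maximising return is the same as minimising the number of steps needed to reach the flag.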