Cart Pole Control

This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems”. A pole is attached by an unactuated joint to a cart, which moves along a frictionless track. The pole starts upright on the cart, and the goal is to keep it balanced by applying forces to the cart in the left and right directions.

Constants (non-fluents)

| Constant | Type | Desc |
|:---|:---|:---|
| GRAVITY | float32 | Force of gravity acting down |
| FORCE_MAG | float32 | Force applied to the side of the cart (discrete version) |
| FORCE_MAX | float32 | Maximum force applied to the side of the cart (continuous version) |
| CART_MASS | float32 | Mass of the cart |
| POLE_MASS | float32 | Mass of the pole |
| POLE_LEN | float32 | Half of the pole length |
| TIME_STEP | float32 | Seconds between state updates |
| POS_LIMIT | float32 | Limit of cart position |
| ANG_LIMIT | float32 | Limit of pole angle |

All of these can be read from the RDDLEnv interface and from the RDDL files.

Action Space

Discrete version

There is a single action taking {0,1} values, indicating if the cart should be pushed to the left or to the right.

| Action | Type | Desc |
|:---|:---|:---|
| force_side | Discrete(2) | Whether to apply force to the left or to the right side of the cart |

If force_side is 0, the cart is pushed to the left with FORCE_MAG force.
If force_side is 1, the cart is pushed to the right with FORCE_MAG force.
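The mapping from the discrete action to a signed force can be sketched as follows. This is a minimal illustration, not the environment's own code: the helper name `discrete_force`, the value of `FORCE_MAG`, and the sign convention (positive means "to the right") are assumptions made for the example.

```python
# Assumed value for illustration only; the real value is defined in the RDDL files.
FORCE_MAG = 10.0


def discrete_force(force_side: int, force_mag: float = FORCE_MAG) -> float:
    """Map the discrete action to a signed horizontal force on the cart.

    Hypothetical helper: positive force is taken to mean "push right".
    """
    if force_side not in (0, 1):
        raise ValueError("force_side must be 0 (left) or 1 (right)")
    # 1 pushes right with +FORCE_MAG, 0 pushes left with -FORCE_MAG.
    return force_mag if force_side == 1 else -force_mag
```

With these assumed values, `discrete_force(0)` gives a force of magnitude FORCE_MAG to the left and `discrete_force(1)` the same magnitude to the right.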

Continuous version

| Action | Type | Desc |
|:---|:---|:---|
| force_side | Box(1, -FORCE_MAX, FORCE_MAX) | Force applied to the side of the cart |
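An agent producing continuous actions may want to keep its output inside the action box before passing it to the environment. The sketch below shows one way to do that; the helper name `clip_force` and the value of `FORCE_MAX` are assumptions for illustration, and how the environment itself treats out-of-range actions is not specified here.

```python
# Assumed value for illustration only; the real value is defined in the RDDL files.
FORCE_MAX = 10.0


def clip_force(force_side: float, force_max: float = FORCE_MAX) -> float:
    """Clamp a proposed force to the interval [-force_max, force_max].

    Hypothetical helper applied on the agent side, before the action is sent.
    """
    return max(-force_max, min(force_max, force_side))
```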

Note: The change in cart velocity produced by the applied force is not fixed; it depends on the angle of the pole, because the pole's center of gravity changes the amount of energy needed to move the cart underneath it.
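To make the angle dependence concrete, the sketch below uses the standard frictionless cart-pole equations of motion with a simple Euler update. It is assumed, not confirmed, that the RDDL domain follows the same equations and integration scheme; the numeric constants and the function names `accelerations` and `euler_step` are illustrative only.

```python
import math

# Illustrative values only (assumed); the real ones are defined in the RDDL files.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_LEN = 0.5    # half of the pole length
TIME_STEP = 0.02


def accelerations(force, ang_pos, ang_vel):
    """Return (cart acceleration, pole angular acceleration) for a given force."""
    sin_a, cos_a = math.sin(ang_pos), math.cos(ang_pos)
    total_mass = CART_MASS + POLE_MASS
    # Force plus the centrifugal term of the pole, normalised by total mass.
    temp = (force + POLE_MASS * POLE_LEN * ang_vel ** 2 * sin_a) / total_mass
    ang_acc = (GRAVITY * sin_a - cos_a * temp) / (
        POLE_LEN * (4.0 / 3.0 - POLE_MASS * cos_a ** 2 / total_mass)
    )
    acc = temp - POLE_MASS * POLE_LEN * ang_acc * cos_a / total_mass
    return acc, ang_acc


def euler_step(state, force):
    """Advance (pos, vel, ang_pos, ang_vel) by one TIME_STEP with explicit Euler."""
    pos, vel, ang_pos, ang_vel = state
    acc, ang_acc = accelerations(force, ang_pos, ang_vel)
    return (
        pos + TIME_STEP * vel,
        vel + TIME_STEP * acc,
        ang_pos + TIME_STEP * ang_vel,
        ang_vel + TIME_STEP * ang_acc,
    )
```

Calling `accelerations` with the same force but different values of `ang_pos` returns different cart accelerations, which is the effect the note describes.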

State Space

| State | Type | Desc |
|:---|:---|:---|
| pos | Box(1, -POS_LIMIT, POS_LIMIT, float32) | Cart position |
| ang_pos | Box(1, -ANG_LIMIT, ANG_LIMIT, float32) | Pole angle |
| vel | Box(1, -np.inf, np.inf, float32) | Cart velocity |
| ang_vel | Box(1, -np.inf, np.inf, float32) | Pole angular velocity |

Note: The bounds above denote the possible values of each state element. Once any of these bounds is violated, the episode continues without change for the remainder of the episode: the state is frozen and the reward is zero.
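One reading of this freezing behavior is sketched below. It is an interpretation of the note, not the environment's implementation: the limit values, the inclusive bound check, the state ordering `(pos, vel, ang_pos, ang_vel)`, and the helper names `in_bounds` and `freeze_step` are all assumptions.

```python
# Illustrative limits only (assumed); the real ones are defined in the RDDL files.
POS_LIMIT = 2.4
ANG_LIMIT = 0.21  # radians


def in_bounds(state):
    """True while cart position and pole angle are within their limits."""
    pos, _vel, ang_pos, _ang_vel = state
    # Inclusive comparison is an assumption; the source does not say which is used.
    return abs(pos) <= POS_LIMIT and abs(ang_pos) <= ANG_LIMIT


def freeze_step(current_state, proposed_state):
    """Return (next_state, reward) under the freezing rule described above.

    If the current state already violates a bound, it is kept unchanged and the
    reward is 0. Otherwise the proposed state is accepted and the reward is 1.
    """
    if not in_bounds(current_state):
        return current_state, 0.0
    return proposed_state, 1.0
```

Under this reading, the step that first carries the state out of bounds still earns its reward, and every later step returns the frozen state with zero reward.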


Rewards

Since the goal is to keep the pole upright for as long as possible, a reward of +1 is given for every step taken, including the termination step.


