Probabilistic and Reinforcement Learning Track

2023 International Planning Competition

Cart Pole Control

| | |
|:-------------|:------------|
| Example name | Cartpole ** |
| Action space | Dict |
| State space | Dict |

** stands for either discrete or continuous.

Description

This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems”. A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pole starts upright on the cart, and the goal is to keep it balanced by applying forces to the cart in the left or right direction.

Constants (non-fluents)

| Constant | Type | Description |
|:----------|:--------|:------------|
| GRAVITY | float32 | Force of gravity acting down |
| FORCE_MAG | float32 | Force applied to the side of the cart (discrete version) |
| FORCE_MAX | float32 | Maximum force applied to the side of the cart (continuous version) |
| CART_MASS | float32 | Mass of the cart |
| POLE_MASS | float32 | Mass of the pole |
| POLE_LEN | float32 | Half of the pole length |
| TIME_STEP | float32 | Seconds between state updates |
| POS_LIMIT | float32 | Limit of cart position |
| ANG_LIMIT | float32 | Limit of pole angle |

All of these can be read from the RDDLEnv interface and from the RDDL files.

Action Space

Discrete version

There is a single action taking values in {0, 1}, indicating whether the cart should be pushed to the left or to the right.

| Action | Type | Description |
|:-----------|:------------|:------------|
| force_side | Discrete(2) | Whether to push the cart to the left (0) or to the right (1) |

If force_side is 0, the cart is pushed to the left with force FORCE_MAG.
If force_side is 1, the cart is pushed to the right with force FORCE_MAG.

Continuous version

| Action | Type | Description |
|:-----------|:------------------------------|:--------------------------------------|
| force_side | Box(1, -FORCE_MAX, FORCE_MAX) | Force applied to the side of the cart |

Note: The change in cart velocity produced by the applied force is not fixed; it depends on the angle at which the pole is pointing. The position of the pole's center of gravity changes the amount of energy needed to move the cart underneath it.
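The dependence on the pole angle can be illustrated with a minimal sketch of the cart-pole equations of motion as they are commonly implemented for the Barto, Sutton, and Anderson formulation. This is an assumption, not a transcription of the RDDL domain: the numeric constant values, the helper names, and the clipping of the continuous action are hypothetical, and the RDDL files remain the authoritative definition.

```python
import math

# Assumed values for illustration only; the real values are the RDDL non-fluents.
GRAVITY = 9.8
FORCE_MAG = 10.0
FORCE_MAX = 10.0
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_LEN = 0.5  # half of the pole length


def discrete_force(force_side: int) -> float:
    """Map the discrete action to a signed force: 0 pushes left, 1 pushes right."""
    return FORCE_MAG if force_side == 1 else -FORCE_MAG


def continuous_force(force_side: float) -> float:
    """Clip the continuous action to [-FORCE_MAX, FORCE_MAX] (assumed behavior)."""
    return max(-FORCE_MAX, min(FORCE_MAX, force_side))


def accelerations(ang_pos: float, ang_vel: float, force: float):
    """Return (cart acceleration, pole angular acceleration) for one state."""
    total_mass = CART_MASS + POLE_MASS
    sin_a, cos_a = math.sin(ang_pos), math.cos(ang_pos)
    temp = (force + POLE_MASS * POLE_LEN * ang_vel ** 2 * sin_a) / total_mass
    ang_acc = (GRAVITY * sin_a - cos_a * temp) / (
        POLE_LEN * (4.0 / 3.0 - POLE_MASS * cos_a ** 2 / total_mass)
    )
    acc = temp - POLE_MASS * POLE_LEN * ang_acc * cos_a / total_mass
    return acc, ang_acc
```

Calling `accelerations` with the same force at two different pole angles gives different cart accelerations, which is the effect the note describes.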

State Space

The state space represents the position and velocity of the cart, as well as the angle and angular velocity of the pole. The position and angle limits are not part of the state, but are available through the non-fluents in the problem.

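| State | Type | Description |
|:--------|:---------------------------------------|:----------------------|
| pos | Box(1, -POS_LIMIT, POS_LIMIT, float32) | Cart position |
| ang_pos | Box(1, -ANG_LIMIT, ANG_LIMIT, float32) | Pole angle |
| vel | Box(1, -np.inf, np.inf, float32) | Cart velocity |
| ang_vel | Box(1, -np.inf, np.inf, float32) | Pole angular velocity |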

Note: The bounds above denote the possible values of each element of the state. Upon violation of one of these bounds, the episode continues without change (the state is frozen and the reward is zero) for the remainder of the episode.

Rewards

Since the goal is to keep the pole upright for as long as possible, a reward of +1 is given for every step taken, including the termination step.
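The interaction between the reward and the frozen-state rule can be sketched as follows. This is one reading of the text above, not the RDDL definition: it assumes the step on which a limit is first violated still earns +1 and every later step earns 0. The limit values and the helper names `in_bounds` and `episode_return` are hypothetical.

```python
# Hypothetical limit values; the real ones are the POS_LIMIT and ANG_LIMIT non-fluents.
POS_LIMIT = 2.4
ANG_LIMIT = 0.21


def in_bounds(pos: float, ang_pos: float) -> bool:
    """True while the cart position and pole angle are both within their limits."""
    return -POS_LIMIT <= pos <= POS_LIMIT and -ANG_LIMIT <= ang_pos <= ANG_LIMIT


def episode_return(trajectory) -> float:
    """Sum the rewards over a list of (pos, ang_pos) states, one per step."""
    total = 0.0
    frozen = False
    for pos, ang_pos in trajectory:
        if frozen:
            continue  # state is frozen: reward 0 for the rest of the episode
        total += 1.0  # +1 per step, including the step that first violates a limit
        if not in_bounds(pos, ang_pos):
            frozen = True
    return total
```

Under these assumptions, an episode's return equals the number of steps up to and including the first violation, or the full horizon if no limit is ever violated.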
