Mars Rover Science Mission Navigation
Example name | MarsRover |
Action space | Dict |
State space | Dict |
A multi-agent path finding (MAPF) problem in which each agent starts from some initial position and should harvest as many minerals as possible. Each mineral is placed randomly when the problem is instantiated, and each has a different value. The agent dynamics along each axis are a second-order integrator: the acceleration changes the velocity, and the velocity changes the position.
For each agent, the state vector consists of its position and velocity, and the action is its acceleration. The full state vector is the stacking of all agents' states, and likewise for the actions. The reward is the total value collected from harvesting minerals, minus the power consumed throughout the process.
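The second-order integrator described above can be sketched in a few lines. This is a minimal illustration, not the domain's actual RDDL code: it assumes an explicit Euler discretization with time step `dt` (playing the role of SCALE-FACTOR), and the helper names `step_axis` and `step_drone` and the plain-dict state layout are hypothetical.

```python
def step_axis(pos, vel, accel, dt):
    """One explicit-Euler step of a double integrator along a single axis.

    Assumption: position integrates the current velocity and velocity
    integrates the commanded acceleration, both scaled by dt.
    """
    new_pos = pos + dt * vel
    new_vel = vel + dt * accel
    return new_pos, new_vel


def step_drone(state, action, dt):
    """Advance one drone by one step; the x and y axes are decoupled."""
    pos_x, vel_x = step_axis(state["pos-x"], state["vel-x"], action["power-x"], dt)
    pos_y, vel_y = step_axis(state["pos-y"], state["vel-y"], action["power-y"], dt)
    return {"pos-x": pos_x, "vel-x": vel_x, "pos-y": pos_y, "vel-y": vel_y}


# Hypothetical example: drone moving along x, accelerating along y.
state = {"pos-x": 0.0, "vel-x": 1.0, "pos-y": 0.0, "vel-y": 0.0}
action = {"power-x": 0.0, "power-y": 2.0}
next_state = step_drone(state, action, dt=0.5)
```

Because the axes are decoupled, the same scalar update is applied independently to x and y; stacking the per-drone dictionaries gives the full state.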
Constant | Type | Desc |
---|---|---|
MAX-POWER(drone) | float32 | Norm upper bound constraint on the power inputs |
SCALE-FACTOR | float32 | Time scale factor for dynamic equations (Delta T) |
MINERAL-AREA(mineral) | float32 | Mineral harvesting radius area |
MINERAL-VALUE(mineral) | float32 | Mineral harvest value |
MINERAL-POS_X(mineral) | float32 | Mineral X position |
MINERAL-POS_Y(mineral) | float32 | Mineral Y position |
All of these can be read from the RDDLEnv interface and from the RDDL files.
The actions are the forces applied to the drones by their motors along the x and y axes (a decoupled model), plus a harvest action that a drone can apply when it is inside a mineral's harvest region. When the harvest action applies, the mineral is harvested and cannot be harvested again.
Action | Type | Desc |
---|---|---|
power-x(drone) | Box(1, -MAX-POWER(drone), MAX-POWER(drone), float32) | Propelling force in x axis |
power-y(drone) | Box(1, -MAX-POWER(drone), MAX-POWER(drone), float32) | Propelling force in y axis |
harvest(drone) | Discrete(2) | Harvest if in mineral area |
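The following sketch illustrates how the action bounds and the harvest condition might be checked outside the environment. It rests on assumptions: the power bound is applied per axis as in the Box ranges above, the harvest region is treated as a disc of radius MINERAL-AREA around the mineral position, and the helper names `clip_power` and `can_harvest` are hypothetical rather than part of any library.

```python
import math


def clip_power(value, max_power):
    """Clamp a power command to [-max_power, max_power] (per-axis bound)."""
    return max(-max_power, min(max_power, value))


def can_harvest(drone_xy, mineral_xy, mineral_area, already_harvested):
    """Return True if a harvest action would take effect.

    Assumption: the harvest region is a disc of radius `mineral_area`
    centred on the mineral, and a mineral can be harvested only once.
    """
    if already_harvested:
        return False
    distance = math.hypot(drone_xy[0] - mineral_xy[0],
                          drone_xy[1] - mineral_xy[1])
    return distance <= mineral_area


# Hypothetical values: a command beyond the bound, and a drone 5 units away.
power_x = clip_power(3.0, max_power=1.0)
in_range = can_harvest((0.0, 0.0), (3.0, 4.0), mineral_area=5.0,
                       already_harvested=False)
```

A controller could use a check like `can_harvest` to decide when to set `harvest(drone)`, since the reward formula below subtracts a `harvest-action(d)` term for each drone.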
The state space represents the positions and velocities of all the drones in the problem, as well as the harvest status of all the minerals in the domain. The locations and harvesting regions of the minerals are not part of the state, but are available through the non-fluents of the problem.
State | Type | Desc |
---|---|---|
pos-x(drone) | Box(1, -np.inf, np.inf, float32) | Position in x axis |
vel-x(drone) | Box(1, -np.inf, np.inf, float32) | Velocity in x axis |
pos-y(drone) | Box(1, -np.inf, np.inf, float32) | Position in y axis |
vel-y(drone) | Box(1, -np.inf, np.inf, float32) | Velocity in y axis |
mineral_harvested(mineral) | Discrete(2) | True if the mineral has been harvested |
The reward function is defined as
\[r_t = \sum_{d \in \text{drones}} \left[ -\text{power-x}(d)^2 - \text{power-y}(d)^2 - \text{harvest-action}(d) + \text{harvest}(d,m) \right]\]
where,