RDDL and the pyRDDLGym Infrastructure
The problems in this year’s competition will be described in the RDDL language. RDDL is intended to compactly represent a wide range of relational MDPs and POMDPs and to support the efficient simulation of these domains. The domains are simulated via an autogenerated environment simulator and interacted with via the standard Gym interface. Simply put, pyRDDLGym takes a textual problem description in RDDL and generates a Gym environment, without requiring a single line of Python code.
The following tutorial covers RDDL basics and how to run it in pyRDDLGym:
For those who wish to learn RDDL at a later time, it is possible to skip ahead to the next section without knowing RDDL (and use the existing environments): pyRDDLGym is a fully Gym-compatible simulator and can be treated as such, with the knowledge that the environments are written not in Python but in RDDL.
The RDDL language guide which documents all the language components is also available here:
Please cite as
@unpublished{Sanner:RDDL,
author = "Scott Sanner",
title = "Relational Dynamic Influence Diagram Language (RDDL): Language Description",
note = "http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf",
year = 2010}
RDDL has been around since 2010, with a Java simulator and an excellent tutorial that explains, step by step with the help of a simple and illustrative example, the power of RDDL and how to describe an MDP as an RDDL domain and instance.
Note: the 2023 competition is fully in Python and the old simulator will not be used.
pyRDDLGym is a generic simulator that automatically generates OpenAI Gym environments from RDDL files.
Please see our paper describing pyRDDLGym.
pyRDDLGym supports a major subset of the original RDDL language:
The following components are omitted (or marked as deprecated) from the language variant implemented in pyRDDLGym:
Additional components and structures have been added to the language to increase expressivity and to accommodate the learning interaction type. These are listed here:
There are two options at the moment to obtain the pyRDDLGym infrastructure.
Either install it with pip (optionally inside a fresh conda environment):
pip install pyRDDLGym
conda create -n rddl python=3.8
conda activate rddl
pip install pyRDDLGym
or clone the repository directly:
git clone https://github.com/pyrddlgym-project/pyRDDLGym.git
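Once installed by either route, a quick sanity check (a minimal sketch using the calls described later in this section) is to import the package and list the bundled examples:

```python
# minimal post-installation check (illustrative)
from pyRDDLGym import RDDLEnv, ExampleManager

ExampleManager.ListExamples()   # lists the example environments shipped with pyRDDLGym
```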
Please refer to the README page for information on the framework contents, requirements, examples, and more.
Initializing environments is very easy in pyRDDLGym and can be done via:
from pyRDDLGym import RDDLEnv
myEnv = RDDLEnv.RDDLEnv(domain="domain.rddl", instance='instance.rddl')
where domain.rddl and instance.rddl are RDDL files of your choosing.
pyRDDLGym is shipped with 12 environments designed completely in RDDL. The RDDL files are part of the distribution and can be accessed directly. In order to use the built-in environments and keep the API of the RDDLEnv standard, we supply an ExampleManager class:
from pyRDDLGym import ExampleManager
ExampleManager.ListExamples()
The ListExamples() static function lists all the example environments in pyRDDLGym.
Then, in order to retrieve the information of a specific environment:
EnvInfo = ExampleManager.GetEnvInfo(ENV)
where ENV is the string name of the desired example.
Setting up an environment at this point is just:
myEnv = RDDLEnv.RDDLEnv(domain=EnvInfo.get_domain(), instance=EnvInfo.get_instance(0))
where the argument of the method get_instance(<num>) is the ID number of the instance (0 in this case).
All the available instances of the problem can be listed via
EnvInfo.list_instances()
Last, setting up the dedicated visualizer for the example is done via
myEnv.set_visualizer(EnvInfo.get_visualizer())
pyRDDLGym is built on Gym and so implements the classic “agent-environment loop”. The infrastructure comes with two simple agents:
Using a pre-existing agent, or one of your own, is as simple as:
from pyRDDLGym.Policies.Agents import RandomAgent
agent = RandomAgent(action_space=myEnv.action_space, num_actions=myEnv.NumConcurrentActions)
Let’s see what a complete agent-environment loop looks like in pyRDDLGym.
This example runs the MarsRover example.
The loop runs for the number of time steps specified in the environment’s horizon
field. If the env.render() function is used, we will also see a window pop up rendering the environment:
from pyRDDLGym import RDDLEnv
from pyRDDLGym import ExampleManager
from pyRDDLGym.Policies.Agents import RandomAgent

# get the environment info
EnvInfo = ExampleManager.GetEnvInfo('MarsRover')

# set up the environment class, choose instance 0 because every example has at least one instance
myEnv = RDDLEnv.RDDLEnv(domain=EnvInfo.get_domain(), instance=EnvInfo.get_instance(0))

# set up the environment visualizer
myEnv.set_visualizer(EnvInfo.get_visualizer())

# set up an agent
agent = RandomAgent(action_space=myEnv.action_space, num_actions=myEnv.NumConcurrentActions)

total_reward = 0
state = myEnv.reset()
for _ in range(myEnv.horizon):
    myEnv.render()
    next_state, reward, done, info = myEnv.step(agent.sample_action())
    total_reward += reward
    state = next_state
    if done:
        break
myEnv.close()
RDDL is a lifted language, which means it compactly describes variables and processes in a general non-specific way. It is best explained with an example. The following block describes the behavior of an abstract entity car, with first order dynamics:
types {
    car : object;
};
pvariables {
    DT            : { non-fluent, real, default=0.1 };
    position(car) : { state-fluent, real, default=0.0 };
    velocity(car) : { action-fluent, real, default=0.0 };
};
cpfs {
    position'(car) = position(car) + DT * velocity(car);
};
This is a behavior description; this type of code can be found in the domain block of the RDDL code. In the code above no specific car is described; that will be done in the instance and non-fluents blocks. First we should define the objects in the problem:
objects {
    car : {car1, car2};
};
Now that we have specific car objects, we can define their initial state:
init_state {
    position(car1) = -1.0;
    position(car2) = 1.0;
};
So in the lifted description we have behavior, types, and objects for instantiation. When pyRDDLGym instantiates an environment it grounds everything, which means we will no longer have types and objects, only effects and evolutions over the explicit variables of the problem, i.e., the variables of the problem will be (with their initial values):
states = { position_car1 : -1.0, position_car2 : 1.0 }
actions = { velocity_car1 : 0.0, velocity_car2 : 0.0 }
and the explicit effect will be:
position_car1 = position_car1 + DT * velocity_car1;
position_car2 = position_car2 + DT * velocity_car2;
The power of the lifted representation is the ability to easily specify different objects (and many of them), and to reason over them. When interacting with a pyRDDLGym environment, the states received and actions submitted are always specific (e.g., setting the velocity of car1) and thus grounded.
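To illustrate, here is a minimal sketch of such a grounded interaction, assuming an environment myEnv built from the car domain above (the grounded variable names follow the convention shown and are for illustration only):

```python
# illustrative only: grounded interaction with the hypothetical car environment above
state = myEnv.reset()
print(state)   # e.g. {'position_car1': -1.0, 'position_car2': 1.0}

# actions are likewise submitted per grounded variable
action = {'velocity_car1': 0.5, 'velocity_car2': -0.5}
next_state, reward, done, info = myEnv.step(action)
```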
The state and action spaces of pyRDDLGym are standard gym.spaces, and can be queried through the standard API: env.state_space and env.action_space. State and action spaces are of type gym.spaces.Dict, where each key-value pair maps a state/action name to its current value or the value to apply. Thus, RDDL types are converted to gym.spaces with the appropriate bounds as specified in the RDDL action-preconditions and state-invariants fields.
The conversion is as follows: bounds are taken from the RDDL action-preconditions and state-invariants expressions where available; otherwise np.inf with symmetric bounds is used. There is no need in pyRDDLGym to specify values for all of the actions declared in the RDDL domain description; only those the agent wishes to assign non-default values need to be supplied, and the infrastructure will construct the full action vector as necessary, filling in the default action values according to the RDDL description.
Note: enum types are not supported by pyRDDLGym at this stage.
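As a brief sketch, the spaces can be inspected directly and a partial action dictionary passed to the environment (the grounded action name below is hypothetical):

```python
# the Dict spaces can be inspected through the standard API
print(myEnv.state_space)
print(myEnv.action_space)

# only non-default actions need to be supplied; the rest are filled in automatically
action = {'velocity_car1': 0.5}   # hypothetical grounded action name
next_state, reward, done, info = myEnv.step(action)
```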
RDDL allows the constants of the problem, instead of being hardcoded, to be specified in the non-fluents block of the instance. This means every instance can have different constants, e.g., different bounds on actions, different static object locations, etc.
While these constants are not available through the state of the problem, it is possible to access them through Gym (or directly through the RDDL description) with a dedicated API: env.non_fluents. The non_fluents property returns a Python dictionary where the keys are the grounded non-fluents and the values are the appropriate values.
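For example, for the car domain sketched above this might look as follows (values are illustrative only):

```python
# illustrative: the instance constants, keyed by grounded non-fluent name
non_fluents = myEnv.non_fluents
print(non_fluents)   # e.g. {'DT': 0.1} for the car domain above
```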
An addition made to the RDDL language during the development of this infrastructure is the termination block.
The termination block is intended to specify terminal states of the MDP; when one is reached, the simulation ends. A terminal state is a valid state of the MDP (to emphasize the difference from state-invariants). An example of a terminal state can be any state within the goal set for which the simulation should not continue, or a state from which there are no possible actions and the simulation should end, e.g., hitting a wall when it is not allowed. When a terminal state is reached the state is returned from the environment and the done
flag is returned as True. The reward is handled independently by the reward function, thus if there is a specific reward for the terminal state, it should be specified in the reward formula. The termination block has the following syntax:
termination {
    Terminal_condition1;
    Terminal_condition2;
    ...
};
where Terminal_condition# are boolean formulas. The termination decision is a disjunction of all the conditions in the block (termination if at least one is True).
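Through the Gym interface this simply surfaces as the done flag; a brief sketch, reusing the myEnv and agent objects from the example above:

```python
# terminal states declared in the termination block surface as done=True
state = myEnv.reset()
for t in range(myEnv.horizon):
    state, reward, done, info = myEnv.step(agent.sample_action())
    if done:   # a termination condition was met, so the episode ends here
        break
```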
pyRDDLGym visualization works just like regular Gym. Users can visualize the current state of the simulation by calling env.render(). The standard visualizer that comes out of the box with every pyRDDLGym domain (even user-defined domains will have it without doing anything explicitly) is TextViz. TextViz simply renders an image with a textual description of the states and their current values.
Replacing the built-in TextViz is as simple as calling the environment method env.set_visualizer(viz) with viz as the desired visualization object.
from pyRDDLGym import RDDLEnv
from pyRDDLGym import ExampleManager
EnvInfo = ExampleManager.GetEnvInfo('MarsRover')
myEnv = RDDLEnv.RDDLEnv(domain=EnvInfo.get_domain(), instance=EnvInfo.get_instance(0))
# set up the environment visualizer
myEnv.set_visualizer(EnvInfo.get_visualizer())
In order to build custom visualizations (for new user-defined domains), one just needs to inherit from the class Visualizer.StateViz.StateViz() and return from the visualizer.render() method a PIL image for the Gym to render to the screen. The environment initialization will look something like this:
```python
from pyRDDLGym import RDDLEnv
from pyRDDLGym.Visualizer.StateViz import StateViz

class MyDomainViz(StateViz):
    # here goes the visualization implementation
    ...

myEnv = RDDLEnv.RDDLEnv(domain='myDomain.rddl', instance='myInstance.rddl')

# set up the environment visualizer
myEnv.set_visualizer(MyDomainViz)
```
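Filling in the skeleton above, a minimal (purely illustrative) implementation might simply draw the state values onto a blank image; this sketch assumes render() receives the current state dictionary and should return a PIL image, as described above:

```python
from PIL import Image, ImageDraw

from pyRDDLGym.Visualizer.StateViz import StateViz

class MyDomainViz(StateViz):
    # a minimal sketch: draw each state fluent and its value as text on a blank image
    def render(self, state):
        img = Image.new('RGB', (400, 200), color='white')
        draw = ImageDraw.Draw(img)
        for i, (name, value) in enumerate(state.items()):
            draw.text((10, 10 + 15 * i), f'{name} = {value}', fill='black')
        return img
```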
By default, calling EnvInfo.get_visualizer() on a domain without a dedicated visualizer sub-class will return a Visualizer.TextViz.TextVisualizer() instance, which simply prints a textual representation of the state. An alternative ChartVisualizer is also available to better track the evolution of the fluents over time.
Like the textual visualizer, ChartVisualizer handles both continuous and discrete fluent variables. It can be instantiated as simply as follows:
from pyRDDLGym import RDDLEnv
from pyRDDLGym.Visualizer.ChartViz import ChartVisualizer
myEnv = RDDLEnv.RDDLEnv(domain='myDomain.rddl', instance='myInstance.rddl')
# set up the graphical visualizer
myEnv.set_visualizer(ChartVisualizer)
and, when run on the Wildfire domain, it produces a chart of the fluent values over time.
Writing new user-defined domains is as easy as writing a few lines of text in a mathematical fashion!
All that is required is to specify the lifted constants and variables (all referred to as fluents in RDDL) and the behavior/dynamics of the problem, and to generate an instance with the actual objects and initial state in RDDL - and pyRDDLGym will do the rest.
For more information about how to create new domains in RDDL please see the RDDL tutorial at the top of this page.