Power Unit Commitment
A number of power producers cooperate to meet daily demand, which fluctuates according to the maximum temperature on a given day. A cost is incurred for every unit of power produced, and income is received for every unit consumed by the demand. There is a large penalty for failing to meet demand on a given day. Each plant also incurs its own penalty for deviating from its previous day’s production, because some plants pay higher operating costs for changes in production. Power generation is in integer units, consumption is real-valued, and each time step is assumed to span 24 hours.
|PROD_UNITS_MIN(plant)||int||Minimum number of units the plant can produce|
|PROD_UNITS_MAX(plant)||int||Maximum number of units the plant can produce|
|PROD_CHANGE_PENALTY(plant)||float32||Penalty for changing the production amount|
|COST_PER_UNIT(plant)||float32||Cost per power unit|
|INCOME_PER_UNIT||float32||Income per power unit|
|TEMP_VARIANCE||float32||Temperature change variance|
|DEMAND_EXP_COEF||float32||Exp coefficient for the demand U shape|
|MIN_DEMAND_TEMP||float32||Center of the demand U shape|
|MIN_CONSUMPTION||float32||Baseline (DC) level of the demand U shape|
|UNFULFILLED_DEMAND_PENALTY||float32||Penalty for failing to meet demand|
All of these can be read from the RDDLEnv interface and from the RDDL files.
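To make the three demand constants concrete, here is a minimal sketch of a U-shaped demand curve. The quadratic form is an assumption made for illustration; the description above only says that the curve is U-shaped, is centered at MIN_DEMAND_TEMP, has MIN_CONSUMPTION as its baseline, and is scaled by DEMAND_EXP_COEF. The function name `demand` and the numeric values in the usage note are hypothetical.

```python
def demand(temperature, min_consumption, demand_exp_coef, min_demand_temp):
    """Assumed U-shaped demand curve (illustrative, not taken from the RDDL file).

    Demand is lowest (equal to min_consumption) when the temperature equals
    min_demand_temp, and grows quadratically as the temperature moves away
    from that center in either direction.
    """
    offset = temperature - min_demand_temp
    return min_consumption + demand_exp_coef * offset ** 2
```

With hypothetical values `min_consumption=2.0`, `demand_exp_coef=0.01`, and `min_demand_temp=11.7`, the sketch returns the baseline 2.0 at 11.7 degrees and a larger value on both hotter and colder days.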
The actions are the current amount of power each plant is required to produce.
|curProd(plant)||Discrete(PROD_UNITS_MIN, PROD_UNITS_MAX)||current production command to each plant|
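Because each command must be an integer within the plant's own bounds, a policy that outputs arbitrary numbers may need to be projected onto the valid range first. The helper below is a sketch of one way to do that; `clamp_commands` and the plain plant-name dictionary keys are hypothetical and are not part of the RDDLEnv interface.

```python
def clamp_commands(requested, units_min, units_max):
    """Round each requested command and clamp it to the plant's bounds.

    requested, units_min, and units_max are dicts keyed by a plant name
    (an illustrative convention, not the environment's own key format).
    """
    return {
        plant: max(units_min[plant], min(units_max[plant], int(round(value))))
        for plant, value in requested.items()
    }
```

For example, a request of 7.6 units for a plant bounded to the range 0 to 5 is clamped to 5, and a negative request is raised to the plant's minimum.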
The state space represents the temperature and the previous production of each of the plants in the problem.
|temperature||Box(1, -inf, inf, float32)||Current temperature|
|prevProd(plant)||Discrete(-inf, inf)||previous power produced per plant|
The reward function is defined as a combination of the following terms: the cost of supply per plant, the demand income, the penalty incurred when demand exceeds supply, and the steady-state penalties for changing production.
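The four reward terms can be sketched in Python as follows. This is an illustration under stated assumptions, not the definition from the RDDL file: it assumes that income is earned only on the demand actually served, that the unmet-demand penalty is a flat amount applied whenever demand exceeds total production, and that the change penalty scales with the absolute change in each plant's production. The function name `reward` and its argument names are hypothetical.

```python
def reward(cur_prod, prev_prod, demand, cost_per_unit, change_penalty,
           income_per_unit, unfulfilled_penalty):
    """Illustrative reward for one time step (assumed forms, see lead-in).

    cur_prod, prev_prod, cost_per_unit, and change_penalty are dicts keyed
    by plant name; the remaining arguments are scalars.
    """
    total_supply = sum(cur_prod.values())
    # Assumption: only demand that is actually met generates income.
    fulfilled = min(demand, total_supply)

    supply_cost = sum(cur_prod[p] * cost_per_unit[p] for p in cur_prod)
    income = fulfilled * income_per_unit
    # Assumption: flat penalty whenever demand exceeds supply.
    shortfall_penalty = unfulfilled_penalty if demand > total_supply else 0.0
    # Assumption: penalty proportional to the absolute production change.
    change_cost = sum(
        abs(cur_prod[p] - prev_prod[p]) * change_penalty[p] for p in cur_prod
    )
    return income - supply_cost - shortfall_penalty - change_cost
```

Under these assumptions, overproducing costs only the extra supply cost, while underproducing triggers the large penalty, which matches the incentive described in the overview.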