The recommender system maintains a limited inventory of items (for example, only so many items can be indexed or processed in a given day). Providers compete for these slots, and the probability that a provider wins a slot is proportional to its satisfaction. In effect, unsatisfied providers become unrecommendable. Consumer satisfaction depends on the item features and the consumer's affinity. The goal of the recommender system is to maximize consumer satisfaction with the recommended items.
There is a single type of action in this domain: a boolean decision whether to recommend item t to consumer s.
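The slot competition described above can be sketched as sampling proportional to satisfaction. This is only an illustration under stated assumptions: the function name `sample_slot_winners`, sampling with replacement, and the uniform fallback when every provider has zero satisfaction are hypothetical choices, not taken from the domain definition.

```python
import numpy as np


def sample_slot_winners(provider_satisfaction, num_slots, rng):
    """Draw one provider index per slot, with probability proportional to satisfaction."""
    # Negative satisfaction is treated as zero weight (an assumption of this sketch).
    weights = np.clip(np.asarray(provider_satisfaction, dtype=np.float64), 0.0, None)
    total = weights.sum()
    if total == 0.0:
        # Nobody has positive satisfaction: fall back to uniform (an assumption).
        probs = np.full(weights.shape, 1.0 / weights.size)
    else:
        probs = weights / total
    # Sampling with replacement: one provider may win several slots.
    return rng.choice(weights.size, size=num_slots, p=probs)


rng = np.random.default_rng(0)
# Provider 0 has zero satisfaction, so it can never win a slot.
winners = sample_slot_winners([0.0, 1.0, 3.0], num_slots=5, rng=rng)
```

A provider whose satisfaction is zero receives zero probability mass, which matches the remark that unsatisfied providers become unrecommendable.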
|recommend(consumer, item)||Discrete(2)||recommend an item to a consumer|
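One way to picture this action space is as a 0/1 matrix with one entry per (consumer, item) pair. The sketch below assumes integer indices for consumers and items and a NumPy array as the container; both are illustrative choices and not the environment's actual action encoding.

```python
import numpy as np

num_consumers, num_items = 2, 3

# actions[s, t] == 1 means "recommend item t to consumer s"; 0 means "do not".
actions = np.zeros((num_consumers, num_items), dtype=np.int8)
actions[0, 2] = 1  # recommend item 2 to consumer 0
actions[1, 0] = 1  # recommend item 0 to consumer 1

# List the (consumer, item) pairs that are currently recommended.
recommended_pairs = [(int(s), int(t)) for s, t in zip(*np.nonzero(actions))]
```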
The state comprises the satisfaction levels of the providers and consumers, and all the item features.
|provider-satisfaction(provider)||Box(1, np.float32)||provider satisfaction level|
|consumer-satisfaction(consumer)||Box(1, np.float32)||consumer satisfaction level|
|item-feature(item, feature)||Box(1, np.float32)||how much of a feature is present in an item|
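The three state fluents can be held together in a small container of `np.float32` arrays, one array per fluent. The class name `RecommenderState`, the helper `initial_state`, and the all-zero initial values are assumptions made for this sketch; the real environment may expose the state differently.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RecommenderState:
    provider_satisfaction: np.ndarray  # shape (num_providers,)
    consumer_satisfaction: np.ndarray  # shape (num_consumers,)
    item_feature: np.ndarray           # shape (num_items, num_features)


def initial_state(num_providers, num_consumers, num_items, num_features):
    """Build a placeholder state; zeros are an arbitrary starting point."""
    return RecommenderState(
        provider_satisfaction=np.zeros(num_providers, dtype=np.float32),
        consumer_satisfaction=np.zeros(num_consumers, dtype=np.float32),
        item_feature=np.zeros((num_items, num_features), dtype=np.float32),
    )


state = initial_state(num_providers=2, num_consumers=3, num_items=4, num_features=5)
```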
The reward is the sum of all consumers' satisfaction levels.
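As a minimal sketch, the reward computation reduces to a single sum over the consumer satisfaction vector. The function name `reward` and the array representation are assumptions for illustration.

```python
import numpy as np


def reward(consumer_satisfaction):
    """Return the sum of all consumers' satisfaction levels as a Python float."""
    return float(np.sum(consumer_satisfaction))


satisfaction = np.array([0.5, 0.25, 1.0], dtype=np.float32)
total_reward = reward(satisfaction)
```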