Tuesday, February 12, 2013

Basic Planning Algorithm

You can think of planning as a graph search problem where each node in the graph represents a possible "state" of the reality. A directed edge from nodeA to nodeB representing an "action" is available to transition stateA to stateB.

Planning can be thought of as another form of a constraint optimization problem which is quite different from the one I described in my last blog. In this case, the constraint is the goal state we want to achieve, where a sequence of actions needs to be found to meet the constraint. The sequence of actions will incur cost and our objective is to minimize the cost associated with our chosen actions

Basic Concepts 

A "domain" defined the structure of the problem.
  • A set of object types.  e.g. ObjectTypeX, ObjectTypeY ... etc.
  • A set of relation types  e.g. [ObjectTypeX RelationTypeA ObjectTypeY] or [ObjectTypeX RelationTypeA ValueTypeY]
A "state" is composed of a set of relation instances,  It can either be a "reality state" or a "required state".

A reality state contains tuples of +ve atoms.  e.g. [(personX in locationA), (personX is male)].  Notice that -ve atoms will not exist in reality state.  e.g. If personX is NOT in locationB, such tuple will just not show up in the state.

A required state contains both +ve and -ve atoms.  e.g. [(personX in locationA), NOT(personX is male)]  The required state is used to check against the reality state.  The required state is reached if all of the following is true.
  • All +ve atoms in the required state is contained in the +ve atoms of the reality state
  • None of the -ve atoms in the required state is contained in the +ve atoms of the reality state
Notice that there can be huge (or even infinite) number of nodes and edges in the graph if we are to expand the whole graph (with all possible states and possible actions).  Normally we will expressed only a subset of nodes and edges in an analytical way.  Instead of enumerating all possible states, we describe the state as a set of relations that we care, in particular we describe the initial state of the environment with all the things we observed and the goal state as what we want to reach.  Similarity, we are not enumerate every possible edges, instead we describe actions with variables such that it describe rules that can transition multiple situations of states.

An "action" causes transition from one state to the other.  It is defined as action(variable1, variable2 ...) and contains the following components.
  • Pre-conditions: a required state containing a set of tuples (expressed by variables).  The action is feasible if the current reality state contains all the +ve atoms but not any -ve atoms specified in the pre-conditions.
  • Effects: A set of +ve atoms and -ve atoms (also expressed by variables).  After the action is taken, it removes all the -ve atoms from the current reality state and then insert all the +ve atoms into the current reality state.
  • Cost of executing this actio.
Notice that since actions contains variables but the reality state does not, therefore before an action can be execution, we need to bind the variables in the pre-conditions to a specific value such that it will match the current reality state.  This binding will propagate to the variable in the effects of the actions and new atoms will be insert / removed from the reality state.

Planning Algorithm

This can be think of a Search problem.  Given an initial state and a goal state, our objective is to search for a sequence of actions such that the goal state is reached.

We can perform the search from the initial state, expand all the possible states that can be reached by taking some actions, and check during this process whether the goal state has been reached.  If so, terminate the process and return the path.

Forward planning build the plan from the initial state.  It works as follows ...
  1. Put the initial state into the exploration queue, with an empty path.
  2. Pick a state (together with its path from initial state) from the exploration queue as the current state according to some heuristics.
  3. If this current state is the goal state, then return its path that contains the sequence of action and we are done.  Else move on.
  4. For this current state, explore which action is possible by seeing whether the current state meet the pre-conditions (ie: contains all +ve and no -ve state specified in the action pre-conditions).
  5. If the action is feasible, compute the next reachable state, and the path (by adding this action to the original path), insert the next state into the exploration queue.
  6. Repeat 5 for all feasible actions of current state.

Alternatively, we can perform the search from the goal state.  We looked at what need to be accomplished and identify what possible actions can accomplish that (ie: the effect of the action meets the goal state).  Then we looked at whether those actions are feasible (ie: the initial state meets the action's pre-conditions).  If so we can execute the action, otherwise we take the action's pre-conditions as our sub-goal and expand our over goal state.

Backward planning build the plan from the goal state.  It works as follows ...
  1. Put the goal state into the exploration queue, with an empty path.
  2. Pick a regression state (a state that can reach the goal state, can be considered as a sub-goal) from the exploration queue according to some heuristics.
  3. If the regression state is contained in initial state, then we are done and return the path as the plan.  Else move on.
  4. From this regression state, identify all "relevant actions"; those actions who has some +ve effect is contained in the regression state; and all of its +ve effect is not overlap with the -ve regression state; and all of its -ve effect is not overlap with the +ve regression state.
  5. If the action is relevant, compute the next regression state by removing the action effect from the current regression state and adding the action pre-conditions into the current regression state, insert the next regression state into the exploration queue.
  6. Repeat 5 from all relevant actions of current regression state.

Heuristic Function

In above algorithms, to pick the next candidate from the exploration queue.  We can employ many strategies.
  • If we pick the oldest element in the queue, this is a breathe-first search
  • If we pick the youngest element in the queue, this is a depth-first search
  • We can pick the best element in the queue based on some value function.
Notice that what is "best" is very subjective and is also domain specific. A very popular approach is using the A* search whose value function = g(thisState) + h(thisState).

Notice that g(thisState) is the accumulative cost to move from initial state to "thisState", while h(thisState) is a domain-specific function that estimate the cost from "thisState" to the goal state.  It can be proved that in order for A* search to return an optimal solution (ie: the least cost path), the chosen h(state) function must not over-estimate (ie: need to underestimate) the actual cost to move from "thisState" to the goal state.

Here is some detail of A* search.

No comments: