REINFORCED ENCODING FOR PLANNING AS SAT

Solving planning problems via translation to satisfiability (SAT) is one of the most successful approaches to automated planning. We propose a new encoding scheme, called Reinforced Encoding, which encodes a planning problem represented in the SAS+ formalism into SAT. The Reinforced Encoding combines the transition-based SASE encoding with the classical propositional encoding. In our experiments we compare the new encoding to other known SAS+ based encodings. The results indicate that the Reinforced encoding performs well on the benchmark problems of the 2011 International Planning Competition and can outperform all the other known encodings in several domains.


Introduction
Planning is the problem of finding a sequence of actions (a plan) that transforms the world from an initial state to a state that satisfies some goal conditions. The world is fully observable, deterministic, and static (only the agent we make the plan for changes the world). The number of possible states of the world, as well as the number of possible actions, is finite, though possibly very large. We will assume that the actions are instantaneous (take a constant time), and therefore we only need to deal with their sequencing. Actions have preconditions, which specify in which states of the world they can be applied, as well as effects, which dictate how the world will be changed after the action is executed.
One of the most successful approaches to planning is encoding the planning problem into a series of satisfiability (SAT) formulas and then using a SAT solver to solve them. The method was first introduced by Kautz and Selman [1] and is still very popular and competitive. This is partly due to the power of SAT solvers, which are getting more efficient year by year. Since its introduction, many improvements have been made to the method, such as new compact and efficient encodings [2][3][4][5], better ways of scheduling the SAT solvers [3], or modifying the SAT solver's heuristics to be more suitable for solving planning problems [6].
In this paper we present a new encoding scheme. It is inspired by the SASE transition-based encoding [2], which was the first SAT encoding based on the SAS+ planning formalism. The motivation for our work is to make the SASE encoding more robust by incorporating the strengths of older encoding schemes. We will prove the correctness of our encoding and compute an upper bound on the size of the encoded formula. In the experimental section of the paper we compare our new encoding to other SAS+ encodings on benchmark problems from the 2011 International Planning Competition (IPC) [7].

Preliminaries
In this section we give the basic definitions of satisfiability and of planning with parallel plans.

Satisfiability
A Boolean variable is a variable with two possible values, True and False. A literal of a Boolean variable $x$ is either $x$ or $\neg x$ (positive or negative literal). A clause is a disjunction (OR) of literals. A clause with only one literal is called a unit clause, and a clause with two literals a binary clause. An implication of the form $x \Rightarrow (y_1 \vee \dots \vee y_k)$ is equivalent to the clause $(\neg x \vee y_1 \vee \dots \vee y_k)$. A conjunctive normal form (CNF) formula is a conjunction (AND) of clauses. A truth assignment $\phi$ of a formula $F$ assigns a truth value to its variables. The assignment $\phi$ satisfies a positive (negative) literal if it assigns the value True (False) to its variable, and $\phi$ satisfies a clause if it satisfies any of its literals. Finally, $\phi$ satisfies a CNF formula if it satisfies all of its clauses. A formula $F$ is said to be satisfiable if there is a truth assignment $\phi$ that satisfies $F$. Such an assignment is called a satisfying assignment. The satisfiability problem (SAT) is to find a satisfying assignment of a given CNF formula or determine that it is unsatisfiable.
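As a small illustration of these definitions, the following Python sketch represents a clause as a list of signed integers (a DIMACS-like convention assumed here, not taken from the paper) and decides satisfiability by exhaustive search. The helper names `satisfies` and `brute_force_sat` are hypothetical, and the search is exponential, so it only serves to make the definitions concrete.

```python
from itertools import product


def satisfies(assignment, cnf):
    """Return True if the assignment (dict: variable -> bool) satisfies every clause.

    A literal +i stands for variable i, a literal -i for its negation.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )


def brute_force_sat(cnf):
    """Return a satisfying assignment, or None if the formula is unsatisfiable."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment, cnf):
            return assignment
    return None


# x1 => (x2 or x3) written as the clause (-x1 or x2 or x3),
# together with the unit clauses (x1) and (-x2).
example_cnf = [[-1, 2, 3], [1], [-2]]
model = brute_force_sat(example_cnf)
```

Here the two unit clauses force $x_1$ to True and $x_2$ to False, so the implication clause can only be satisfied by setting $x_3$ to True.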

Planning
The introduction briefly described what planning is; this section gives the formal definitions. We will use the multivalued SAS+ formalism [8] instead of the classical STRIPS formalism [9], which is based on propositional logic.
A planning task $\Pi$ in the SAS+ formalism is defined as a tuple $\Pi = \{X, O, s_I, s_G\}$ where
• $X$ is a finite set of multivalued state variables. Each variable $x_i \in X$ has a finite domain $\mathrm{dom}(x_i)$.
• $O$ is a set of actions (or operators). Each action $a \in O$ is a tuple $(\mathrm{pre}(a), \mathrm{eff}(a))$ where $\mathrm{pre}(a)$ is the set of preconditions of $a$ and $\mathrm{eff}(a)$ is the set of effects of $a$. Both preconditions and effects are of the form $x_i = v$ where $v \in \mathrm{dom}(x_i)$.
• A state is a set of assignments to the state variables. Each state variable has exactly one value assigned from its respective domain. We denote by $S$ the set of all states. $s_I \in S$ is the initial state. $s_G$ is a partial assignment of the state variables (not all variables have assigned values), and a state $s \in S$ is a goal state if $s_G \subseteq s$.
An action $a$ is applicable in a given state $s$ if $\mathrm{pre}(a) \subseteq s$. By $s' = \mathrm{apply}(a, s)$ we denote the state after executing the action $a$ in the state $s$, where $a$ is applicable in $s$. All the assignments in $s'$ are the same as in $s$ except for the assignments in $\mathrm{eff}(a)$, which replace the corresponding (same variable) assignments in $s$.
If $P = [a_1, \dots, a_k]$ is a sequence of actions, then $\mathrm{apply}(P, s) = \mathrm{apply}(a_k, \mathrm{apply}(a_{k-1}, \dots \mathrm{apply}(a_2, \mathrm{apply}(a_1, s)) \dots))$. A sequential plan $P$ of length $k$ for a given planning task $\Pi$ is a sequence of $k$ actions $P$ such that $s_G \subseteq \mathrm{apply}(P, s_I)$.
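The definitions of applicability, apply, and sequential plans can be sketched directly in Python. This is a minimal sketch under the assumption that a state, a precondition set, and an effect set are each a dictionary from variable names to values; the toy logistics task (a truck, a package, two locations) is hypothetical and only serves as an example.

```python
def applicable(action, state):
    """An action (pre, eff) is applicable if pre(a) is a subset of the state."""
    pre, _eff = action
    return all(state.get(x) == v for x, v in pre.items())


def apply(action, state):
    """Return the successor state; eff(a) overwrites the matching assignments."""
    if not applicable(action, state):
        raise ValueError("action is not applicable in this state")
    _pre, eff = action
    successor = dict(state)
    successor.update(eff)
    return successor


def apply_plan(plan, state):
    """apply(P, s): apply the actions of the sequence one after another."""
    for action in plan:
        state = apply(action, state)
    return state


def is_sequential_plan(plan, s_init, s_goal):
    """Check that s_G is a subset of apply(P, s_I)."""
    try:
        final = apply_plan(plan, s_init)
    except ValueError:
        return False
    return all(final.get(x) == v for x, v in s_goal.items())


# Hypothetical toy task: load a package at A, drive to B, unload it there.
load = ({"truck": "A", "package": "A"}, {"package": "truck"})
drive = ({"truck": "A"}, {"truck": "B"})
unload = ({"truck": "B", "package": "truck"}, {"package": "B"})
s_init = {"truck": "A", "package": "A"}
s_goal = {"package": "B"}
```

With these definitions, `[load, drive, unload]` is a sequential plan of length 3, whereas `[drive, load, unload]` is not, because `load` is no longer applicable once the truck has left A.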

Parallel Plans
A parallel plan $P$ with makespan $k$ for a given planning task $\Pi$ is a sequence of sets of actions (called parallel steps) $P = [A_1, \dots, A_k]$ such that $\prec(A_1) \oplus \dots \oplus \prec(A_k)$ is a sequential plan for $\Pi$, where $\prec$ is an ordering function, which transforms a set of actions $A_i$ into a sequence of actions $\prec(A_i)$, and $\oplus$ denotes the concatenation of sequences.
Let us denote by $s_j$ the world state in between the parallel steps $A_j$ and $A_{j+1}$, which is obtained by applying the sequence $\prec(A_j)$ on $s_{j-1}$, i.e., $s_j = \mathrm{apply}(\prec(A_j), s_{j-1})$ (except for $s_0 = s_I$). In this paper we will use the ∀-Step parallel planning semantics [10], which requires that each action $a \in A_j$ is applicable in the state $s_{j-1}$, that the effects of all actions in $A_j$ hold in $s_j$, and that all possible orderings of the sets $A_j$ make valid sequential plans (hence the name ∀-Step semantics).
To ensure that each ordering of the sets of actions in a parallel plan leads to a valid sequential plan, it is sufficient to check that the actions in each set are pairwise independent [3]. We say that two actions $a_1$ and $a_2$ are independent if they do not share common variables, i.e., $\mathrm{scope}(a_1) \cap \mathrm{scope}(a_2) = \emptyset$, where $\mathrm{scope}(a) \subseteq X$ is the set of all state variables that appear in $\mathrm{pre}(a)$ or $\mathrm{eff}(a)$.
Note that the pairwise independence of actions is a sufficient but not a necessary condition for the parallel steps in a ∀-Step semantics plan, as the following example demonstrates.

Example 1.
Let $a_1$ and $a_2$ be two actions such that $\mathrm{pre}(a_1) = \mathrm{pre}(a_2) = \{x = 1\}$, $\mathrm{eff}(a_1) = \{y = 2\}$, and $\mathrm{eff}(a_2) = \{z = 2\}$. Clearly, $a_1$ and $a_2$ are not independent (they share the variable $x$). However, they can be ordered arbitrarily to achieve the same changes between two given states.
The pairwise independence of actions in each step of a parallel plan also implies that they can be executed in parallel (at the same time).
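Example 1 can be checked mechanically. The sketch below assumes the same dictionary representation of actions as pairs (pre, eff); the starting values $y = 0$ and $z = 0$ are assumptions added only so that a full state exists, and the function names are hypothetical.

```python
from itertools import permutations


def scope(action):
    """All state variables that appear in pre(a) or eff(a)."""
    pre, eff = action
    return set(pre) | set(eff)


def independent(a1, a2):
    """Two actions are independent if their scopes are disjoint."""
    return scope(a1).isdisjoint(scope(a2))


def apply_sequence(actions, state):
    """Apply actions in the given order; return None if a precondition fails."""
    state = dict(state)
    for pre, eff in actions:
        if any(state.get(x) != v for x, v in pre.items()):
            return None
        state.update(eff)
    return state


# The two actions of Example 1.
a1 = ({"x": 1}, {"y": 2})
a2 = ({"x": 1}, {"z": 2})
start = {"x": 1, "y": 0, "z": 0}  # y = 0 and z = 0 are assumed start values

# Resulting state for every ordering of the step {a1, a2}.
outcomes = [apply_sequence(order, start) for order in permutations([a1, a2])]
```

Both orderings reach the same state even though `independent(a1, a2)` is False, which is exactly why independence is sufficient but not necessary.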

Finding Plans using SAT
The basic idea of solving planning as SAT is the following [1]. We construct (by encoding the planning task) a series of SAT formulas $F_1, F_2, \dots$ such that $F_i$ is satisfiable if there is a parallel plan of makespan $\leq i$. Then we solve them one by one starting from $F_1$ until we reach the first satisfiable formula $F_k$. From the satisfying assignment of $F_k$ we can extract a plan of makespan $k$. The pseudo-code of this algorithm is presented in Figure 1. Clever ways of solver scheduling [3] can significantly improve the performance of the planning algorithm at the cost of possibly longer-makespan plans. Nevertheless, we will use the basic one-by-one scheduling since we are interested only in comparing the properties of encodings, i.e., the construction of the formulas. In the following section we will describe how a formula encoding a planning task can be constructed using our new Reinforced encoding.
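The one-by-one scheduling described above can be sketched as a short loop. This is an illustrative sketch, not the code used in the experiments: `encode`, `solve`, and `extract` are hypothetical callbacks, the bound `max_makespan` is an assumption added so that the loop always terminates, and the stand-in encoder below merely pretends that the task becomes solvable at makespan 3.

```python
from itertools import product


def brute_force_sat(cnf):
    """Tiny exhaustive SAT solver over clauses given as lists of signed ints."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(l)] == (l > 0) for l in clause) for clause in cnf):
            return model
    return None


def plan_by_sat(encode, solve, extract, max_makespan):
    """Solve F_1, F_2, ... in turn and stop at the first satisfiable formula."""
    for k in range(1, max_makespan + 1):
        model = solve(encode(k))
        if model is not None:
            return k, extract(model, k)
    return None  # no plan found within the assumed makespan bound


def toy_encode(k):
    """Stand-in encoder: unsatisfiable below makespan 3, satisfiable from 3 on."""
    return [[1]] if k >= 3 else [[1], [-1]]


result = plan_by_sat(toy_encode, brute_force_sat, lambda model, k: model, 10)
```

In a real planner, `solve` would call an external SAT solver and `extract` would read the action variables of the satisfying assignment.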

Reinforced Encoding
Our goal is (given a planning task $\Pi = \{X, O, s_I, s_G\}$ and an integer $k$) to construct a CNF formula $F_k$ such that $F_k$ is satisfiable only if there is a parallel plan of at most $k$ steps for $\Pi$. We also want to construct $F_k$ in such a way that, if it is satisfiable, we can easily extract a plan from its satisfying assignment. Before we describe the formula, we need to introduce the notion of transitions [2].
A transition represents a change of a state variable $x \in X$ from one value to another from its domain $\mathrm{dom}(x)$ or from an arbitrary value to a specific value. There are the following three kinds of transitions.
• An active transition changes the value of the variable $x$ from $d$ to $e$ such that $d \neq e$, $\{d, e\} \subseteq \mathrm{dom}(x)$; it is denoted by $\delta_{x: d \to e}$. An action $a$ has an active transition $\delta_{x: d \to e}$ if $(x = d) \in \mathrm{pre}(a)$ and $(x = e) \in \mathrm{eff}(a)$.
• A prevailing transition conserves the value of the variable $x$ (if it was $d$, then it remains $d$, $d \in \mathrm{dom}(x)$); it is denoted by $\delta_{x: d \to d}$. An action $a$ has a prevailing transition $\delta_{x: d \to d}$ if $(x = d) \in \mathrm{pre}(a)$ and there is no assignment related to $x$ in $\mathrm{eff}(a)$.
• A mechanical transition changes the value of the variable $x$ from any value to the value $d$ ($d \in \mathrm{dom}(x)$); it is denoted by $\delta_{x: * \to d}$. An action $a$ has a mechanical transition $\delta_{x: * \to d}$ if $(x = d) \in \mathrm{eff}(a)$ and there is no assignment related to $x$ in $\mathrm{pre}(a)$.
The transition set of an action $a$ is the set of all transitions that $a$ has; it is denoted by $\Delta_a$. By $\Delta_p$ we will mean the set of all possible prevailing transitions of a planning task, i.e., $\Delta_p = \{\delta_{x: d \to d} \mid x \in X, d \in \mathrm{dom}(x)\}$. The set of all transitions $\Delta$ is the union of all the prevailing transitions and the transition sets of all the actions, $\Delta = \Delta_p \cup \bigcup_{a \in O} \Delta_a$. By $\Delta_x \subseteq \Delta$, where $x \in X$, we will denote the set of all transitions related to the variable $x$.
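The three kinds of transitions can be computed from an action's preconditions and effects. The sketch below assumes actions are pairs of dictionaries and represents a transition $\delta_{x: d \to e}$ as a tuple `(x, d, e)`, with the string `"*"` standing for the arbitrary source value of a mechanical transition. These representation choices and the example actions are assumptions made for illustration.

```python
MECHANICAL = "*"  # placeholder for "any value" in a mechanical transition


def transition_set(action):
    """Return Delta_a as a set of (variable, source, target) tuples."""
    pre, eff = action
    transitions = set()
    for x, e in eff.items():
        # Active if x also has a precondition, mechanical otherwise.
        # (A precondition equal to the effect yields (x, d, d); this
        # degenerate case is not covered by the definitions above.)
        transitions.add((x, pre.get(x, MECHANICAL), e))
    for x, d in pre.items():
        if x not in eff:
            transitions.add((x, d, d))  # prevailing transition
    return transitions


def all_transitions(domains, actions):
    """Delta: all prevailing transitions plus every action's transition set."""
    prevailing = {(x, d, d) for x, dom in domains.items() for d in dom}
    return prevailing.union(*(transition_set(a) for a in actions))


# Hypothetical example task.
domains = {"truck": ["A", "B"], "package": ["A", "B", "truck"], "light": ["off", "on"]}
drive = ({"truck": "A"}, {"truck": "B"})
load = ({"truck": "A", "package": "A"}, {"package": "truck"})
switch_on = ({}, {"light": "on"})
delta = all_transitions(domains, [drive, load, switch_on])
```

In this example `drive` has one active transition, `load` has one prevailing and one active transition, and `switch_on` has one mechanical transition; together with the seven prevailing transitions of the task, $\Delta$ contains ten transitions.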
The constructed formula $F_k$ will have the following three kinds of Boolean variables.
• Action variables $a^t_i$ indicating whether the $i$-th action is used in the $t$-th step. We will have one such variable for each action from the description of the planning task and for each of the $k$ parallel steps.
• Assignment variables $b^t_{x=v}$ indicating whether the value of the variable $x$ is equal to $v$ at the end of the $t$-th step (after applying the actions of the $t$-th step). We will have one such Boolean variable for each state variable $x \in X$ and each value $v \in \mathrm{dom}(x)$ for each of the $k$ parallel steps.
• Transition variables $c^t_\delta$ (or $c^t_{x: d \to e}$ where $\delta = \delta_{x: d \to e}$) indicating whether the transition $\delta$ occurred during the $t$-th step. We will have one such variable for each $\delta \in \Delta$ for each of the $k$ parallel steps.
Now we are ready to define the clauses contained in $F_k$.
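The following set of binary clauses will enforce that at most one value is assigned to each state variable $x \in X$.
$(\neg b^t_{x=v_i} \vee \neg b^t_{x=v_j}) \quad \forall x \in X,\ v_i \neq v_j,\ \{v_i, v_j\} \subseteq \mathrm{dom}(x),\ \forall t \in \{1, \dots, k\}$ (1)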
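The following three kinds of clauses connect the assignment variables with the transition variables. The first set of clauses ensures that each transition $\delta_{x: d \to e}$ (including prevailing transitions $\delta_{x: e \to e}$ and mechanical transitions $\delta_{x: * \to e}$) implies that $x = e$ at the end of the step in which it occurs.
$c^t_{x: d \to e} \Rightarrow b^t_{x=e} \quad \forall \delta_{x: d \to e} \in \Delta,\ \forall t \in \{1, \dots, k\}$ (2)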
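Similarly, we need to add clauses for each transition $\delta_{x: d \to e}$ (except for mechanical transitions) to enforce that $x = d$ holds at the end of the previous step. The first step is an exception: there we explicitly disable all the transitions that are not compatible with the initial state (using the clauses from equation (8)).
$c^t_{x: d \to e} \Rightarrow b^{t-1}_{x=d} \quad \forall \delta_{x: d \to e} \in \Delta,\ d \neq *,\ \forall t \in \{2, \dots, k\}$ (3)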
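The third kind of clauses is needed to guarantee that if a variable $x$ has the value $v$, then there is a transition which changes the value of $x$ to $v$.
$b^t_{x=v} \Rightarrow \bigvee_{\delta_{x: d \to v} \in \Delta_x} c^t_{x: d \to v} \quad \forall x \in X,\ \forall v \in \mathrm{dom}(x),\ \forall t \in \{1, \dots, k\}$ (4)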
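Next we describe the clauses that connect the action variables with the transition variables. If an action $a$ is selected, then all the transitions in its transition set $\Delta_a$ must be selected as well. This implication is expressed via the following clauses.
$a^t_i \Rightarrow c^t_\delta \quad \forall a_i \in O,\ \forall \delta \in \Delta_{a_i},\ \forall t \in \{1, \dots, k\}$ (5)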
We also need to make sure that transitions (except for prevailing transitions) cannot happen without actions that have them in their transition sets. The following set of clauses will ensure this.
$c^t_\delta \Rightarrow \bigvee_{i \in \mathrm{support}(\delta)} a^t_i \quad \forall \delta \in \Delta \setminus \Delta_p,\ \forall t \in \{1, \dots, k\}$ (6)
By $\mathrm{support}(\delta)$ we mean the set of indices of actions that have $\delta$ in their transition set, i.e., $\mathrm{support}(\delta) = \{i \mid \delta \in \Delta_{a_i}\}$.
Next we need to deal with the interfering actions inside a parallel step. As discussed earlier, it is sufficient to ensure that only pairwise independent actions are together in each parallel step. We will achieve this by disabling all pairs of non-independent (interfering) actions. To exclude interfering actions from the parallel steps we will add binary clauses for all the interfering action pairs.
$(\neg a^t_i \vee \neg a^t_j) \quad \forall a_i, a_j \in O,\ a_i, a_j \text{ not independent},\ \forall t \in \{1, \dots, k\}$
There might be plenty of interfering action pairs, producing a lot of clauses. But if we look carefully at the clauses we have already described, we can see that most of the interfering actions cannot occur together anyway, as we will show via the following notion of compatible actions.
Two sets of conditions (assignments) are compatible if they assign the same values to the variables they share. Two actions $a_1$ and $a_2$ are compatible if the preconditions of $a_1$ are compatible with the preconditions of $a_2$ and also the effects of $a_1$ are compatible with the effects of $a_2$.
Due to the clauses that enforce that actions imply their transitions (5) and the connection of transitions to assignment variables ((2) and (3)), together with the clauses that forbid a state variable to have more than one value (1), actions that are not compatible cannot be in a parallel step together. Therefore it is enough to suppress compatible interfering action pairs.
$(\neg a^t_i \vee \neg a^t_j) \quad \forall a_i, a_j \in O,\ a_i, a_j \text{ compatible and not independent},\ \forall t \in \{1, \dots, k\}$ (7)
Lastly, we add the clauses that enforce the initial state to hold in the beginning and the goal conditions to be satisfied in the end. As for the initial state, we will disable all the transitions that are not compatible with the initial state, i.e., if a variable $x$ has the value $d$ in the initial state, then all the transitions that change $x$ from a value other than $d$ are disabled by using a unit clause. Note that mechanical transitions are always compatible with the initial state (or any other state), and therefore no mechanical transition is disabled.
$(\neg c^1_{x: d \to e}) \quad \forall \delta_{x: d \to e} \in \Delta,\ d \neq *,\ (x = d) \notin s_I$ (8)
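The reduction from all interfering pairs to compatible interfering pairs can be illustrated with a short sketch. It assumes actions are pairs of dictionaries (pre, eff); the function names and the three example actions are hypothetical.

```python
from itertools import combinations


def scope(action):
    pre, eff = action
    return set(pre) | set(eff)


def independent(a1, a2):
    return scope(a1).isdisjoint(scope(a2))


def conditions_compatible(c1, c2):
    """Two sets of assignments agree on every variable they share."""
    return all(c2[x] == v for x, v in c1.items() if x in c2)


def compatible(a1, a2):
    """Preconditions agree with preconditions and effects with effects."""
    return conditions_compatible(a1[0], a2[0]) and conditions_compatible(a1[1], a2[1])


def interference_pairs(actions, only_compatible=True):
    """Index pairs (i, j) that receive a clause (not a_i or not a_j) per step."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(actions), 2):
        if independent(a, b):
            continue
        if only_compatible and not compatible(a, b):
            continue
        pairs.append((i, j))
    return pairs


# Hypothetical actions: two opposite drives and a load at location A.
drive_ab = ({"truck": "A"}, {"truck": "B"})
drive_ba = ({"truck": "B"}, {"truck": "A"})
load_a = ({"truck": "A", "package": "A"}, {"package": "truck"})
actions = [drive_ab, drive_ba, load_a]
```

All three pairs interfere because every action mentions the truck, but only the pair `(drive_ab, load_a)` is compatible under the definition above, so restricting the exclusion clauses to compatible pairs keeps one clause per step instead of three.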
To encode the goal conditions we will use unit clauses with assignment variables. For each goal condition $(x = v) \in s_G$ we will have a unit clause which forces the value of $x$ to be $v$ after the last parallel step.
$(b^k_{x=v}) \quad \forall (x = v) \in s_G$ (9)
The formula $F_k$ for the Reinforced encoding is a conjunction of the clauses defined in equations (1), (2), (3), (4), (5), (6), (7), (8), and (9).
A ∀-Step parallel plan can be extracted from any satisfying assignment of $F_k$ in the following way. Let $\phi$ be a satisfying assignment of $F_k$. $P_\phi$ is a sequence of action sets such that its $t$-th set contains those actions $a_i \in O$ for which $\phi(a^t_i) = \mathrm{True}$.
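To make the construction concrete, the following sketch assembles the nine clause groups for a tiny task and extracts $P_\phi$ from a model. It is one reading of the prose description, not the Java implementation evaluated later: the data layout (dictionaries for conditions, tuples `(x, d, e)` for transitions, `"*"` for mechanical sources), the function names, and the exhaustive solver are all assumptions made for illustration.

```python
from itertools import combinations, product


def transition_set(action):
    pre, eff = action
    ts = {(x, pre.get(x, "*"), e) for x, e in eff.items()}
    return ts | {(x, d, d) for x, d in pre.items() if x not in eff}


def encode(domains, actions, s_init, s_goal, k):
    """Return (cnf, ids); comments name the clause group each part follows."""
    ids = {}

    def var(*key):  # map a symbolic variable to a positive integer
        return ids.setdefault(key, len(ids) + 1)

    def scope(a):
        return set(a[0]) | set(a[1])

    def agree(c1, c2):
        return all(c2[x] == v for x, v in c1.items() if x in c2)

    tsets = [transition_set(a) for a in actions]
    delta = {(x, d, d) for x, dom in domains.items() for d in dom}.union(*tsets)
    cnf = []
    for t in range(1, k + 1):
        for x, dom in domains.items():
            for v, w in combinations(dom, 2):  # (1) at most one value
                cnf.append([-var("b", t, x, v), -var("b", t, x, w)])
            for v in dom:  # (4) a value needs a transition into it
                into_v = [var("c", t, *tr) for tr in delta if tr[0] == x and tr[2] == v]
                cnf.append([-var("b", t, x, v)] + into_v)
        for x, d, e in delta:
            cnf.append([-var("c", t, x, d, e), var("b", t, x, e)])  # (2)
            if d != "*" and t > 1:  # (3) source value held one step earlier
                cnf.append([-var("c", t, x, d, e), var("b", t - 1, x, d)])
            if d != e:  # (6) non-prevailing transitions need a supporting action
                support = [var("a", t, i) for i, ts in enumerate(tsets) if (x, d, e) in ts]
                cnf.append([-var("c", t, x, d, e)] + support)
        for i, ts in enumerate(tsets):
            for tr in ts:  # (5) actions imply their transitions
                cnf.append([-var("a", t, i), var("c", t, *tr)])
        for (i, a), (j, b) in combinations(enumerate(actions), 2):  # (7)
            if scope(a) & scope(b) and agree(a[0], b[0]) and agree(a[1], b[1]):
                cnf.append([-var("a", t, i), -var("a", t, j)])
    for x, d, e in delta:  # (8) first-step transitions must match s_I
        if d != "*" and s_init[x] != d:
            cnf.append([-var("c", 1, x, d, e)])
    for x, v in s_goal.items():  # (9) goal values after the last step
        cnf.append([var("b", k, x, v)])
    return cnf, ids


def brute_force_sat(cnf):
    variables = sorted({abs(l) for clause in cnf for l in clause})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(l)] == (l > 0) for l in clause) for clause in cnf):
            return model
    return None


def extract_plan(model, ids, n_actions, k):
    """P_phi: the t-th set holds the actions whose variable a^t_i is True."""
    return [{i for i in range(n_actions) if model.get(ids.get(("a", t, i)), False)}
            for t in range(1, k + 1)]


# Tiny task: one variable x in {0, 1}, one action that changes x from 0 to 1.
domains = {"x": [0, 1]}
actions = [({"x": 0}, {"x": 1})]
cnf, ids = encode(domains, actions, {"x": 0}, {"x": 1}, k=1)
plan = extract_plan(brute_force_sat(cnf), ids, len(actions), k=1)
```

For this task the formula has six Boolean variables and a single satisfying assignment, whose only step contains the one action. Replacing the goal by $x = 0$ yields a plan whose single step is empty, since the prevailing transition $\delta_{x: 0 \to 0}$ needs no supporting action.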

Correctness
In this subsection we prove the correctness of our encoding, i.e., the following proposition.
Proposition 1. If the formula $F_k$ obtained using the Reinforced encoding of the planning task $\Pi$ is satisfied by a truth assignment $\phi$, then $P_\phi$ is a valid ∀-Step parallel plan of makespan $k$ for the planning task $\Pi$.
Proof. The requirements for the action sets given by the ∀-Step semantics are clearly satisfied:
• the preconditions of actions in each parallel step are satisfied due to (2), (3), and (5),
• the effects are propagated also due to (2), (3), and (5),
• the actions can be ordered arbitrarily thanks to (7).
It remains to prove that $P^S_\phi = \prec(A_1) \oplus \dots \oplus \prec(A_k)$ is a valid (sequential) plan for $\Pi$, where $A_t$ is the $t$-th action set of $P_\phi$, $\oplus$ denotes the concatenation of sequences, and $\prec$ is an arbitrary ordering of an action set.
Let us observe that the transitions of the state variables are consistent at each step, i.e., exactly one transition is allowed for each state variable (due to (2), (3), and (1)) and a non-prevailing transition cannot happen without an action that has it (thanks to (6)). Prevailing transitions do not have to be supported by any actions since they are used to preserve the values of the variables that are not changed in the given step by any actions. Furthermore, all the transitions between two neighboring parallel steps must be compatible due to (2), (3), (4), and (1). Note that it may happen that a variable $x$ has no value assigned at the end of a step $t$ (all the $b^t_{x=v}$, $v \in \mathrm{dom}(x)$, are False) and no transition related to the variable is selected (all the $c^t_\delta$, $\delta \in \Delta_x$, are False). However, this can only occur for variables that are not used in the goal conditions or by actions that appear in the $t$-th step or later.
Since the action variables imply the proper transition variables thanks to (5), the actions must be applicable if their action variable is True, and the transitions connected to the action must happen.
Thanks to (8), only transitions compatible with the initial state can happen in the first step, and because of (2) and (9), only transitions that change the variables to their goal values are allowed in the last step. This fact, together with the consistency of the transitions during all the $k$ steps, implies the validity of $P^S_\phi$ for the planning task $\Pi$.
The reversed implication, which is that if $P_\phi$ is a valid ∀-Step parallel plan of makespan $k$, then $\phi$ satisfies $F_k$, does not hold. This is because $P_\phi$ may contain a non-independent pair of actions in one of its steps and still be a valid ∀-Step plan (see Example 1). Such a pair of actions would make one of the clauses of type (7) unsatisfied.

Size of The Encoded Formula
The size of the formulas will of course depend on the parameters of the planning task being encoded. We will use the following quantitative properties of a planning task $\Pi = \{X, O, s_I, s_G\}$ to compute the upper bounds.
• $v$ - The number of state variables ($v = |X|$).
• $d$ - The maximum domain size of any state variable ($d = \max_{x \in X} |\mathrm{dom}(x)|$).
• $p$ - The maximum number of preconditions or effects an action has ($p = \max_{a \in O} \max\{|\mathrm{pre}(a)|, |\mathrm{eff}(a)|\}$).
• $n$ - The number of actions ($n = |O|$).
Typically, the number of actions is much higher than the other parameters. From these values we can compute the following upper bounds related to the planning task.
• The number of assignments is at most $vd$.
• The number of transitions is at most $v(d^2 + d)$ (for each variable at most $d^2$ active or prevailing transitions and at most $d$ mechanical transitions).
From these bounds it is apparent that $F_k$ (a formula for makespan $k$) has at most $kn$ action variables, $kvd$ assignment variables, and $kv(d^2 + d)$ transition variables. Therefore the total number of Boolean variables in $F_k$ is at most $k(n + vd + v(d^2 + d))$.
Now let us compute an upper bound on the number of clauses in $F_k$. We will count separately the number of unit clauses (clauses with one literal), binary clauses (clauses with two literals), and Horn clauses (clauses with at most one positive literal). The formula $F_k$ obtained by the Reinforced encoding is the conjunction of the clauses defined in equations (1), (2), (3), (4), (5), (6), (7), (8), and (9).
• There are at most $kvd^2$ clauses of type (1): one for each step, each variable, and each pair of different values from its domain. These clauses are binary and Horn.
• There are at most $kv(d^2 + d)$ clauses of each of the types (2) and (3): one for each step and transition. These clauses are binary and Horn.
• There are at most $kvd$ clauses of type (4): one for each step and assignment.
• There are at most $2knp$ clauses of type (5): one for each step, action, and each of its transitions (there are at most $2p$ transitions connected to each action). These clauses are binary and Horn.
• There are at most $kv(d^2 + d)$ clauses of type (6): one for each step and transition.
• There are at most $kn^2$ clauses of type (7): one for each step and each pair of compatible interfering actions (at most each pair of actions). These clauses are binary and Horn.
• There are at most $v(d^2 + d)$ clauses of type (8): one for each transition that is not compatible with the initial state (at most all the transitions). These are unit clauses.
• There are at most $v$ clauses of type (9): one for each goal condition. These clauses are unit.
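The bounds above are simple arithmetic and can be evaluated for concrete parameters. The sketch below only restates the listed bounds; the function name `size_bounds` and the example parameter values are hypothetical and were chosen to show a task in which the number of actions dominates.

```python
def size_bounds(v, d, p, n, k):
    """Upper bounds on the number of variables and clauses of F_k.

    v: state variables, d: maximum domain size, p: maximum number of
    preconditions or effects per action, n: actions, k: makespan.
    """
    transitions = v * (d * d + d)
    variables = k * (n + v * d + transitions)
    clauses = {
        1: k * v * d * d,     # at most one value per state variable
        2: k * transitions,   # transition implies target value
        3: k * transitions,   # transition implies source value
        4: k * v * d,         # value implies some transition into it
        5: 2 * k * n * p,     # action implies its transitions
        6: k * transitions,   # transition implies a supporting action
        7: k * n * n,         # compatible interfering action pairs
        8: transitions,       # initial state (unit clauses)
        9: v,                 # goal conditions (unit clauses)
    }
    return variables, clauses


# Hypothetical parameters: 10 variables, domains of size 5, 200 actions, makespan 20.
variables, clauses = size_bounds(v=10, d=5, p=3, n=200, k=20)
total_clauses = sum(clauses.values())
```

For these values the bound on type (7) clauses (800000) accounts for almost all of the total clause bound (848310), which illustrates why the action interference constraints are the natural target for a more compact encoding.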

Experimental Evaluation
To evaluate the performance of our new Reinforced encoding, we compared it with three other SAS+ based encodings of planning as SAT. We ran experiments with a 30-minute time limit using the following four encodings.
• Reinforced Encoding (Reinf).A Java implementation of our new Reinforced encoding as described in the previous section.
• Direct Encoding (Dir). We implemented a simple encoding based on the historically first encoding of planning as SAT [1] and adapted it for the SAS+ formalism. This encoding is similar to our Reinforced encoding but uses only action and assignment variables.
• SASE Encoding (SASE). The transition-based SASE encoding [2], which inspired the Reinforced encoding.
• $R^2\exists$-Step Encoding ($R^2\exists$). The original Java implementation of the $R^2\exists$-Step encoding [5]. This encoding differs from the previous three encodings significantly since it uses a different parallel planning semantics. The $R^2\exists$-Step encoding allows more actions inside the parallel steps, and therefore it often finds plans with much lower makespans. A lower makespan indicates that fewer SAT solver calls are required to find a plan; however, it does not say anything about the plan's length, i.e., the total number of actions it contains.

Experimental Setting
To compare the performance of the encodings we created a simple script, which iteratively constructed and solved the formulas for time steps 1, 2, . . . until a satisfiable formula was reached (see Figure 1). For each encoding we used the same SAT solver, Lingeling [11] (version ats). The time limit was 30 minutes for the SAT solving part, i.e., the total time the SAT solver could spend solving the formulas $F_1, F_2, \dots$ for each problem instance was 30 minutes. The time required for the generation of $F_1, F_2, \dots$ is usually negligible compared to the time required to solve them, and therefore we ignore it. Hence the overall planning time could exceed the given time limit for a problem instance. The experiments were run on a computer with an Intel i7 920 CPU @ 2.67 GHz and 6 GB of memory.
The benchmark problems of the IPC are organized into domains. Each domain contains 20 problems and there are 14 domains, which results in a total of 280 problems. The benchmark problems are provided in the PDDL format; however, the encodings require input in the SAS+ format. We used Helmert's translation tool, which is a part of the Fast Downward planning system [12], to obtain the SAS+ files from the PDDL files. The translation is very fast, requiring only a few seconds for all domains.

Experimental Results
The number of solved instances is presented in Table 1. Looking at the results from the perspective of the domains, we can observe that the elevators, parcprinter, and woodworking domains are entirely solved by every encoding. On the other hand, the parking domain is so difficult that not even a single problem is solved by any of the encodings. The openstacks domain is very difficult for all but the $R^2\exists$-Step encoding. The sokoban and tidybot domains are also very hard for all of the encodings; only two of the twenty problems are solved by each encoding.
If we compare the encodings, we can observe that the $R^2\exists$-Step encoding has the highest total number of solved instances, followed by our new Reinforced encoding. As for the individual domains, the $R^2\exists$-Step encoding solves strictly more problems than the other encodings in four cases. The Reinforced encoding achieves this for three domains, while the Direct and SASE encodings cannot outperform the other encodings in any of the domains. In seven domains the Reinforced encoding solves the same number of problems as the best of the other encodings. Except for the visitall domain, the Reinforced encoding is never worse than the Direct or SASE encoding.
Looking at the makespans of found plans displayed in Table 2, we can observe that the makespans for the $R^2\exists$-Step plans are indeed significantly lower than the makespans of plans found by the other three encodings. As expected, in the cases when the Direct, SASE, and Reinforced encodings solve all the problems (or solve the same problems), their total makespans are identical. This is due to the fact that these three encodings use the same ∀-Step parallel planning semantics.
The times required to solve the problems are presented in Table 3. If we look at the results for the domains where each encoding solved the same number of problems, i.e., the highlighted domains, we can notice that, except for the parcprinter and sokoban problems, the runtime of the $R^2\exists$-Step encoding is much higher than the runtime of the other methods. If we also look at Table 2, which contains the total makespan of the found plans, we can deduce that a lower makespan, i.e., fewer SAT calls, does not necessarily mean faster planning, especially not in the case of the easy domains. Nevertheless, for the domains where $R^2\exists$-Step significantly outperformed the other methods (openstacks, pegsol, and visitall), its total makespans are much lower than those of the other methods, even though the other methods solved fewer problems.

Table 3. The time in seconds required to solve all the problems that were solved within the time limit. The presented time is the sum of times the SAT solver alone required; formula generation time is not included. Domains with the same number of solved problems for each encoding are highlighted.

Conclusion
In this paper we have introduced a new encoding of a planning problem represented in the SAS+ formalism into SAT. Our new encoding performs well on the benchmark problems of the 2011 International Planning Competition. It strictly outperforms all the other evaluated SAS+ encodings in three domains and matches the best of the other encodings in seven domains out of fourteen. On the remaining four domains our encoding is outperformed by the $R^2\exists$-Step encoding, which uses a different parallel planning semantics. As for future work, we believe that the Reinforced encoding can be improved by decreasing the number of its clauses through a more compact encoding of the action interference constraints.

Figure 1. Pseudo-code of the basic planning as satisfiability algorithm.

• $d$ - The maximum domain size of any state variable ($d = \max_{x \in X} |\mathrm{dom}(x)|$).
• $p$ - The maximum number of preconditions or effects an action has ($p = \max_{a \in O} \max\{|\mathrm{pre}(a)|, |\mathrm{eff}(a)|\}$).

Table 1. The number of problems (out of 20) in each domain that the encodings solved within the time limit (30 minutes for SAT solving).

Table 2. The sum of makespans of the plans found within the time limit for each domain. A lower makespan means fewer SAT solver calls; it does not indicate better plan quality. Domains with the same number of solved problems for each encoding are highlighted.