## Tuesday, August 25, 2015

### Common techniques in optimization

Optimization is a frequently encountered problem in real life.  We need to make a decision to achieve something within a set of constraints, and we need to maximize or minimize our objective based on some measurement.  For example, a restaurant may need to decide how many workers (of each position) to hire to serve the customers, with the constraint that workers cannot work overtime, and the objective is to minimize the cost.  A car manufacturer may need to decide how many units of each model to be produced, within the constraint of the storage capacity of its warehouse, and maximize the profit of its sales.

### Exhaustive Search

If there isn't a lot of choices, an exhaustive search (e.g. breath-first, depth-first, best-first, A* ... etc.) can be employed to evaluate each option, see if it meets all the constraints (ie: a feasible solution), and then compute its objective value.  Then we sort all feasible solutions based on its corresponding objective value and pick the solution that has the max (or min) objective value as our decision.  Unfortunately, real world problem usually involve a large number (exponentially large due to combinatorial explosion) of choices, making the exhaustive search in many cases impractical.

When this happens, two other solution approaches are commonly used.
1) Mathematical Programming
2) Local Greedy Search.

### Mathematical Programming

Mathematical programming is a classical way to solve optimization problem.  It is a family of approaches including linear programming, integer programming, quadratic programming and even non-linear programming.  The development process usually go through the following steps ...

From a problem description, the modeler will express the problem into a mathematical structure containing 3 parts.
• Variables:  there are two types of variables.  Data variables contains the current value of your business environment (e.g. cost of hiring a waiter, price of a car), and Decision variables hold the decision you make to optimize your objective (e.g. how many staff to hire, how many cars to make).
• Constraints:  it is a set of rules that you cannot break.  Effectively, constraints disallow certain combinations of your decision variables and is mainly used to filter out in-feasible solutions.  In a typical settings, constraints are expressed as a system of linear inequality equations where a linear expression of decision variables are specified on the left hand side and a value is specified on the right hand side after an inequality comparison.
• Objective function: it encapsulates the quantitative measure of how our goal has been achieved.  In a typical setting, objective function is expressed as a single linear (or quadratic) combination of decision variables
After the mathematical structure is defined, the modeler will submit it to a solver, which will output the best solution.  The process can be depicted as follows.

Expressing the problem in the mathematical structure is the key design of the solution.  There are many elements to be considered, which we described below.

The first consideration is how to represent your decision, specially whether the decision is a quantity (real number), a number of units (integer) or a binary decision (integer 0, 1).

The next step is to represent the constraints in terms of inequality equations of linear combination of decision variables.  You need to think whether the constraints is a hard of soft constraints.  Hard constraints should be expressed in the constraint part of the problem.  Notice the solver will not consider any solution once it violates any constraints.  Therefore, if the solver cannot find a feasible solution that fulfill all constraints, it will simply abort.  In other words, it won't return you a solution that violates the smallest number of constraints.  If you want the solve to tell you that because you have rooms to relax them, you should model these soft constraints in the objective function rather than in the constraint section.  Typically, you define an objective function that quantifies the degree of violation.  The solver will then give you the optimal solution (violating least number of constraints) rather than just telling you no solution is found.

Finally you define the objective function.  In most real-life problems, you have multiple goals in mind (e.g. you want to minimize your customer's wait time in the queue, but you also want to minimize your cost of hiring staffs).  First you express each goal as a linear expression of decision variables and then take the weighted average among different goals (which is also a linear combination of all decision variables) to form the final objective function.  How to choose the weights among different goals is a business question, based on how much you are willing to trade off between conflicting goals.

There are some objectives that cannot be expressed by a linear expression of decision variables.  One frequently encountered example is about minimizing the absolute deviation from a target value (ie. no matter the deviation is a positive and negative value).  A common way is to minimize the sum of square of the difference.  But after we square it, it is no longer a linear expression.  To address this requirement, there is a more powerful class of solver call "quadratic programming" which relax the objective function to allow a degree 2 polynomial expression.

After you expressed the problem in the mathematical structure.  You can pass it to a solver (there are many open source and commercial solver available) which you can treat it as a magical black box.  The solver will output an optimal solution (with a value assigned to each decision variable) that fulfill all constraints and maximize (or minimize) the objective function.

Once you received the solution from the solver, you need to evaluate how "reliable" is this optimal solution.  There may be fluctuations in the data variables due to collection error, or the data may have a volatile value that changes rapidly, or the data is an estimation of another unknown quantity.

Ideally, the optimal solution doesn't change much when we fluctuate the data values within its error bound.  In this case, our optimal solution is stable against error in our data variables and therefore is a good one.  However, if the optimal solution changes drastically when the data variable fluctuates, we say the optimal solution is unreliable and cannot use it.  In the case, we usually modify each data variables one at a time to figure out which data variable is causing a big swing of optimal solution and try to reduce our estimation error of that data variable.

The sensitivity analysis is an important step to evaluate the stability and hence the quality of our optimal solution.  It also provides guidance on which area we need to invest effort to make the estimation more accurate.

Mathematical Programming allows you to specify your optimization problem in a very declarative manner and also output an optimal solution if it exist.  It should be the first-to-go solution.  The downside of Mathematical programming is that it requires linear constraints and linear (or quadratic) objectives.  And it also has limits in terms of number of decision variables and constraints that it can store (and this limitation varies among different implementations).  Although there are non-linear solvers, the number of variables it can take is even smaller.

Usually the first thing to do is to test out if your problem is small enough to fit in the mathematically programming solver.  If not, you may need to use another approach.

### Greedy Local Search

The key words here are "local" and "greedy".  Greedy local search starts at a particular selection of decision variables, and then it look around surrounding neighborhood (hence the term "local") and within that find the best solution (hence the term "greedy").  Then it moves the current point to this best neighbor and then the process repeats until the new solution stays at the same place (ie. there is no better neighbor found).

If the initial point is selected to be a non-feasible solution, the greedy search will first try to find a feasible solution first by looking for neighbors that has less number of constraint violation.  After the feasible solution is found, the greedy search will only look for neighbors that fulfill all constraints and within which to find a neighbor with the best objective value.  Another good initialization strategy is to choose the initial point to be a feasible (of course not optimal) solution and then start the local search from there.

Because local search limits its search within a neighborhood, it can control the degree of complexity by simply limits the scope of neighbor.  Greedy local search can evaluate a large number of variables by walking across a chain of neighborhoods.  However, because local search only navigate towards the best neighbor within a limited scope, it loses the overall picture and rely on the path to be a convex shape.  If there are valleys along the path, the local search will stop there and never reach the global optimum.  This is called a local optimal trap because the search is trapped within a local maximum and not able to escape.  There are some techniques to escape from local optimum that I describe in my previous blog post.

When performing greedy local search, it is important to pick the right greedy function (also called heuristic function) as well as the right neighborhood.  Although it is common to choose the greedy function to be the same objective function itself, they don't have to be the same.  In many good practices, the greedy function is chosen to be the one that has high correlation with the objective function, but can be computed much faster.  On the other hand, evaluating objective function of every point in the neighborhood can also be a very expensive operation, especially when the data has a high dimension and the number of neighbors can be exponentially large.  A good neighbor function can be combined with a good greedy function such that we don't need to evaluate each neighbor to figure out which one has the best objective value.

### Combining the two approaches

In my experience, combining the two approaches can be a very powerful solution itself.  At the outer layer, we are using the greedy local search to navigate the direction towards better solution.  However, when we search for the best neighbor within the neighborhood, we use the mathematical programming by taking the greedy function as the objective function, and we use the constraints directly.  This way, we can use a very effective approach to search for best solution within the neighborhood and can use the flexible neighborhood scoping of local search to control the complexity of the mathematical programming solver.