Tuesday, September 24, 2013

Approaches to solving computational problems

Given a computational problem, we need to determine the most appropriate approach.

Can the problem be solved analytically?

Say we need to find whether a given list is pre-sorted. We can solve this problem analytically by walking through the list and check if each subsequent list item is larger than the previous list item.

Say we are given a knapsack problem. We can solve such a problem by enumerating through all choices and picking the choice with the most optimal result. To optimize the processing time, we can employ Dynamic Programming.

If not, take a random walk

Can we determine the location of an ant on a Cartesian plane? Since we cannot predict which way an ant will move, we will need to resort to simulation to come up with a likely answer. Simulation is useful for:
  • Problems that cannot be solved analytically
  • We cannot enumerate all possibilities
We can use simulation, commonly referred to as Monte Carlo Simulation.
  • Generate a sample representation of Scenarios
  • Run the experiment several times
  • Collate results
The results so obtained are Stochastic.
  • The results are descriptive, not prescriptive i.e. they explain how things are in the experiment and not necessarily how they are supposed to be
  • We can plot results to come with up an approximation of the result
  • Not a technique to come up with an optimal solution but the simulation results can be used to help with optimization
The simulation is based on Inferential Statistics - random sample tends to exhibit the same behavior as the general population. For example, if we flip a coin numerous time, we expect a 50-50 head-tail result as an aggregate. By making more runs the results become more predictable.

If simulation can explain current behavior, can it predict future behavior?
  • Statistical validity does not guarantee correctness of the result. For example, the assumptions behind the simulation may be incorrect to begin with.
  • Always do a sanity check by comparing simulation results with physical reality.

Deterministic Simulation

With deterministic simulation, the same result is obtained from each run.

Facets of Simulation

  • Static vs Dynamic (i.e. time plays role, for example, queing of requests in a network)
  • Discrete (behavior of single entity) vs Continuous (flow of traffic, for example)
  • Random walk (a Monte Carlo simulation) is dynamic, stochastic, and discreet

Estimate value of Pi using simulation

Even though Pi is not a random number, random walk simulation (stochastic) can be used to calculate value of Pi as first shown by Buffon, Laplace.

Π is a constant that is the ratio of circumference of a circle to its diameter.

Area of circle = Π r 2
Circumference = 2 Π r = Π d
  • Take a unit circle and inscribe quarter of a circle in it. The radius of this circle is 1.
  • Area under the arc is shaded.
  • Throw darts at the square.
  • Intuition suggests that the darts that land in the shaded area are in proportion to shaded area over the total area.
  • darts in shaded area / total darts = shaded area / total area of square
Hit ratio = h = (Π r 2 / 4) /  r 2
  • Divided shaded area by 4 because we are dealing with quarter of a circle
  • Numerator is the shaded area; denominator is the total area of the square
Simplifying,
Π = 4h
  • In a computer program, get random x, y coordinates such that 0 <= x <=1 and 0 <= y <= 1.
  • Get the random coordinates n number of times (simulating n darts thrown)
  • Determine whether the coordinate is in the shaded region by calculating the distance of the coordinate from the center. If  the distance is less or equal to the radius (1 in case of unit circle), the dart is in the shaded region.
Using Pythagorean theorem, distance is the length of the hypotenuse.
d = √  (x 2+ y 2)

if d <= 1, the dart is in the shaded area.

Calculate hit ratio,
h = (num darts in shaded region) / (total darts thrown)

To arrive at the value of Π,
Π= 4h

Run the simulation many times and average the results. When simulation yields a value that does not change much (for required floating point accuracy), we have a good estimate of Π.