Monday, February 18, 2013

Minimum manhattan distance

You have a grid with certain intersections marked. We'd like to find the intersection that is closest to all the marked intersections. That is, We need to find a point such that the sum of distances from all points is minimum



The cool thing about the Manhatan distance is that the distance itself comprises of two independent components: the distance on the x and y coordinate. Thus you can solve two simpler tasks and merge the results from them to obtain the desired results.

The task I speak of is: given are points on a line. Find the point on the line that minimizes the sum of the absolute distances to all the points. If there are many find all of them (btw they always turn to be a single segment which is easy to prove). The segment is determined by the (potentially two) points medians of the set. By median I mean the point that has equal number of points to the left and to the right of it. In case the number of points is odd there is no such point and you choose the points with difference 1 in both directions to form the segment.

Here I add examples of solutions of this simpler task:
In case the points on the line are like that:
-4 | | | 0 | 2 3 4
The solution is just a point and it is 2.In case the points on the line are like that:
-4 | | | 0 | 2 3
The whole segment [0, 2] is the solution of the problem.
You solve this task separately for the x and y coordinate and then merge the results to obtain the rectangle of minimum distanced points.

And now comes an example of the solution for the initial task.Imagine you want to find the points that are with minimum Manhattan distance to the set (0, 6), (1, 3), (3, 5), (3, 3), (4, 7), (2, 4)
You form the two simpler tasks:
For x:
0 1 2 3 3 4
And here the solution is the segment [2, 3] (note that here we have duplicated point 3, which I represented in probably not the most intuitive way).
For y:
3 3 4 5 6 7
Here the solution is the segment [4, 5].
Finally we get that the solution of the initial task is the rectangle with formula:
2 <= x <= 3; 4 <= y <= 5

Let's speak about complexity.

The complexity of the task is actually the same as the complexity of solving the simpler task (because as already discussed the solution actually consists of solving two simpler tasks). Many people will go and solve it via sorting and then choosing out the medians. However, this will cause O(nlog n) complexity, where n is the number of points in the input set.

This can be improved if a better algorithm for finding the kth element is used. This algorithm basically follows the same approach as qsort. The running time isO(n). Even in the case of two median points this will still remain linear (needing two runs of the same algorithm), and thus the total complexity of the algorithm becomes O(n). It is obvious that the task can not be solved any faster, as long as the input itself is of the mentioned complexity.

No comments: