Wednesday, August 15, 2012

C++ Program - Finding the kth element in a dynamic sorted list

A common programming problem goes something like -
To find the kth minimum(/maximum) element in a dynamic list, which is kept sorted.
The catch here is that, the list is dynamic, i.e. the list is increasing in size (due to insertion of element), and at any instant, you may be asked to report the kth minimum element.

There are in general 3 approaches (considering we have 'n' elements at any instant)-
  1. To have O(1) insertion, i.e insert at the end of the list. And O(n log n) search, i.e when asked to give kth element, sort the list and output the element at kth offset. As is evident, if the queries are frequent this approach is not going to be efficient.
  2. To have O(n) insertion, i.e insert the new element at its correct position in sorted list by shifting all elements from  kth to  nth offset by 1. And to have O(1) output. Again this is not very efficient as it is O(n2) overall.
  3. To have O(log n) insertion and O(1) output time. Now this seems rather efficient! This is overall O(n log n). This is the method I am discussing below.
The difference in second and third approach is that we use a array in former and 2 heaps in later.

Using 2 heaps-
If we need the  kth minimum element, we will need 2 heaps. One is a MIN heap other is a MAX heap. And at any time we need to ensure that MAX heap has exactly 'k' elements and MIN heap has the rest (N-K). Then the  kth element is at the top of MAX heap at any time. Let me illustrate with an example, followed by a C++ program.

Consider a stream of insertion and queries (?), for k = 4 - 
9 5 4 7 ? 6 2 ? 1 ? 2 ?

StreamMIN heapMAX heapOutput
9
9
5
9 5
4
9 5 4
7
9 7 5 4
?
9 7 5 49
697 6 5 4
27 96 5 4 2
?7 96 5 4 26
16 7 95 4 2 1
?6 7 95 4 2 15
25 6 7 94 2 2 1
?5 6 7 94 2 2 14

Now there are several points to be noted -
  • For first k insertions, we are not able to maintain the size of MAX heap as k. But we insert all of the first k elements in MAX heap.
  • After k elements are inserted, and we have new element "x", then we compare x with MAX heap's top "t". If x > t, we insert x in MIN heap, else we delete t from MAX heap, and put it in MIN heap, and then insert x in MAX heap.
  • Do we really need MIN heap? This is a question that should come in your mind. Answer is No! Just throw the numbers being inserted in MIN heap. This method is so general that it can be easily adapted for problems in which k is not a number but a fraction. Like to output the 1/3th element. In that case just maintain size of MAX heap to 1/3 of total. In this you may have to transfer elements from MIN heap to MAX heap as well.
  • C++ does not have Heap data structure, at least not with the name "heap". Either you can create your own, or use Priority Queue in STL. 
So here is the program-
#include <iostream>
#include <queue>

using namespace std;

class Rlong //Reverse Long
{
long m_l;
public:
Rlong(long l) : m_l(l)
{ }
operator long const &() const
{ return m_l; }
bool operator<(Rlong const &right) const
{ return m_l > right.m_l; }
};

int main()
{
long n, k, x, y, i;

priority_queue<Rlong> minpq;
priority_queue<long> maxpq;

cin >> n >> k;

for (i = 0; i < k; ++i)
{
cin >> x;
maxpq.push(x);
}
for (; i < n; ++i)
{
cin >> x ;
if (x != -1) // assuming -1 as '?'
{
if (x < maxpq.top())
{
y = maxpq.top();
maxpq.pop();
maxpq.push(x);
minpq.push(y);
}
else
{
minpq.push(x);
}
}
else
{
x = maxpq.top();
cout << x;
}
}
return 0;
}

No comments:

Post a Comment