The Range Minimum Query Problem
In this article, data structures that are used to solve the range minimum query problem is presented.
The Range Minimum Query problem (RMQ)
Given an array
and two indices , what is the smallest element out of Notation:
A trivial solution
- Just iterate across the elements between
and - Problem: If
is fixed in advance and we are going to make multiple queries on it
A different approach
There are only
possible RMQs in an array of length If we precompute all of them, we can answer
in time per query Building the table: we can precompute all subarrays in time
using dynamic programming
Some notation
- We will say that an RMQ data structure has time complexity
if - Preprocessing takes time at most
and - queries take time at most
- Preprocessing takes time at most
- We now have two RMQ data structures
with no preprocessing with full preprocessing
- Is there a "golden mean" between these extremes?
Another approach: Block decomposition
- Split the input into
blocks of some block size - Compute the minimum value in each block
Analyzing the approach
- Preprocessing time:
works on blocks to find minima - Total work:
- Time to evaluate
work to find block indices (divide by block size) work to scan inside and 's blocks work looking at block minima between and - Total work:
Intuiting
As
increases - The
term rises (more elements to scan within each block) - The
term drops (fewer blocks to look at)
- The
Is there an optimal choice of b given these constraints?
Starting by taking the derivative:
Which gives Then the runtime is
Then the block partition method has:
A second approach: Sparse tables
- Can we still get constant-time queries without preprocessing all possible ranges?
- Goal: precompute RMQ over a set of ranges such that
- There are
total ranges, but - There are enough ranges to support
query times
- There are
- The approach
- For each index
, compute RMQ for ranges starting at of size as long as they fit in the array - Gives both large and small ranges starting at any point in the array
- Only
ranges computed for each array element - Total number of ranges:
- For each index
- Claim: Any range in the array can be formed as the union of two of these ranges
- Doing a query
- Find the largest
such that - With the right preprocessing, this can be done in time
- With the right preprocessing, this can be done in time
- The range
can be formed as the overlap of the ranges and - Each range can be looked up in time
- Total time:
- Find the largest
- Precomputing the ranges
- Using dynamic programming, we can compute all of them in time
- Using dynamic programming, we can compute all of them in time
- This data structure is called a sparse table
- It gives an
solution to RMQ
- It gives an
A third approach: Hybrid strategies
The framework
- Split the input into blocks of size
- Form an array of the block minima
- Construct a "summary" RMQ structure over the block minima
- Construct "block" RMQ structures for each block
- Aggregate the results together
- Split the input into blocks of size
Analyzing efficiency
Suppose we use a
-time RMQ for the summary RMQ and a -time RMQ for each block, with block size What is the preprocessing time for this hybrid structure?
time to compute the minima of each block time to construct RMQ on the minima time to construct the block RMQs - Total construction time is
What is the query time for this hybrid structure?
time to query the summary RMQ time to query the block RMQs - Total query time:
Hybrid preprocessing time:
Hybrid query time:
One possible hybrid
Set the block size to
Use a sparse table for the summary RMQ
Use the "no preprocessing" structure for each block
Preprocessing time:
Query time:
An
solution!
Another hybrid
Let's suppose we use the
sparse table for both the summary and block RMQ structures with a block size of The preprocessing time is
The query time is
We have an
solution to RMQ
Optimal solution: Cartesian Trees
From this point forward, let's have
denote the index of the minimum value in the range rather than the value itself Some notation
Let
and be blocks of length We will say that
and have the same block type (denoted ) if the following holds: Intuitively, the RMQ answers for
are always the same as the RMQ answers for If we build an RMQ to answer queries on some block
, we can reuse that RMQ structure on some other block iff
Detecting block types
- If
and have the same permutation on their elements, then - Claim: If
, the minimum elements of and must occur at the same position - This property must hold recursively on the subarrays to the left and right of the minimum
- If
Cartesian trees
- The Cartesian trees for an array is a binary tree obeying the min-heaps property (each node's value is at least as large as its parent's) whose inorder traversal (from left to right) gives back the original array
- A Cartesian tree for an array is a binary tree built as follows:
- The root of the tree is the minimum element of the array
- Its left and right subtrees are formed by recursively building Cartesian trees for the subarrays to the left and right of the minimum
- Base case: if the array is empty, the Cartesian tree is empty
- Cartesian trees and RMQ
- Theorem: Let
and be blocks of length . Then iff and have isomorphic ("same shape") Cartesian trees
- Theorem: Let
- Building Cartesian trees
- High-level idea: Build a Cartesian tree for the first element, then the first two, then the first three, then the first four, etc
- We can implement this algorithm efficiently by maintaining a stack of the nodes in the right spine
- Pop the stack until the new value is no bigger than the stack top (or the stack is empty). Remember the last node popped this way
- Rewire the tree by
- making the stack top point to the new node, and
- making the most-recently-popped node the new node's left child
- This algorithm takes time
on an array of size - Idea: each element is pushed onto the stack at most once, when it's created. Each element can therefore be popped at most once
- Therefore, there are at most
pushes and pops, so the runtime is
Theorem: The number of distinct Cartesian tree shapes for arrays of length b is at most
- The actual number is
which is roughly equal to - Reference: Catalan numbers
- There are at most
stack operations during the execution of the algorithm: pushes and no more than pops - Represent the execution of the algorithm as a
-bit number, where 1 means "push" and 0 means "pop". We will pad the end with 0's. This number is the Cartesian tree number of a block
- The actual number is
Observation: If all we care about is finding blocks that can share RMQ structures, we never need to build Cartesian trees. Instead, we can just compute the Cartesian tree number for each block
How efficient is this?
We are using the hybrid approach, and all the types we're using have constant query times. Query time:
Our preprocessing time is:
The term is for computing block minima, compute Cartesian tree numbers of each block The term
is for building a sparse table on blocks of size The term
is for constructing at most block-level RMQ structures at a cost of each Suppose we pick
for some constant , then the preprocessing time is For , it is:
This data structure is called the Fischer-Heun structure. It uses a modified version of our hybrid RMQ framework:
- Set
- Split the input into blocks of size
. Compute an array of minimum values from each block - Build a sparse table on that array of minima
- Build per-block RMQ strctures for each block, using Cartesian tree numbers to avoid recomputing RMQ structures unnecessarily
- Make queries using the standard hybrid solution approach
- Set
This is an
solution to RMQ The technique emplyed here is an example of the Method of Four Russians or a Four Russians Speedup. Think of it as "divide, precompute, and conquer"
Break the problem of size
into subproblems of size , plus some top-level problem of size - This is called a macro/micro decomposition
Solve all posible subproblems of size
- Here, we only solved the subproblems that actually came up in the original array, but that's just an optimization
Solve the overall problem by combining solutions to the micro and macro problems