The Range Minimum Query Problem

In this article, we present data structures that are used to solve the range minimum query problem.

The Range Minimum Query problem (RMQ)

  • Given an array A and two indices i ≤ j, what is the smallest element out of A[i], A[i+1], …, A[j]?

  • Notation: we denote this query RMQ_A(i, j)

A trivial solution

  • Just iterate across the elements between positions i and j, tracking the minimum; this takes O(n) time in the worst case
  • Problem: if the array is fixed in advance and we are going to make multiple queries on it, we pay O(n) on every single query

A different approach

  • There are only Θ(n²) possible RMQs in an array of length n

  • If we precompute all of them, we can answer in O(1) time per query

  • Building the table: we can precompute the minima of all subarrays in O(n²) time using dynamic programming, since min(A[i..j]) = min(min(A[i..j−1]), A[j])
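As an illustrative sketch (the function name and table layout are my own, not from the notes), the O(n²) dynamic program might look like this:

```python
def precompute_all_rmq(A):
    """Fill table[i][j] = min(A[i..j]) for every pair i <= j.

    Dynamic programming: min(A[i..j]) = min(min(A[i..j-1]), A[j]),
    so each of the Theta(n^2) cells is filled in O(1) time.
    """
    n = len(A)
    table = [[None] * n for _ in range(n)]
    for i in range(n):
        table[i][i] = A[i]
        for j in range(i + 1, n):
            table[i][j] = min(table[i][j - 1], A[j])
    return table

A = [31, 41, 59, 26, 53, 58, 97]
table = precompute_all_rmq(A)
# table[1][4] == min(41, 59, 26, 53) == 26
```

After this quadratic preprocessing pass, every query is a single table lookup.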

Some notation

  • We will say that an RMQ data structure has time complexity ⟨p(n), q(n)⟩ if
    • Preprocessing takes time at most p(n), and
    • queries take time at most q(n)
  • We now have two RMQ data structures
    • ⟨O(1), O(n)⟩ with no preprocessing
    • ⟨O(n²), O(1)⟩ with full preprocessing
  • Is there a "golden mean" between these extremes?

Another approach: Block decomposition

  • Split the input into blocks of some block size b
  • Compute the minimum value in each of the O(n/b) blocks

Analyzing the approach

  • Preprocessing time:
    • O(b) work on each of the O(n/b) blocks to find its minimum
    • Total work: O(n)
  • Time to evaluate RMQ_A(i, j)
    • O(1) work to find the block indices (divide i and j by the block size)
    • O(b) work to scan inside i's and j's blocks
    • O(n/b) work looking at the block minima between i's and j's blocks
    • Total work: O(b + n/b)
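The analysis above can be sketched in Python as follows (the class name and the default choice of b are mine; the notes only describe the scheme):

```python
import math

class BlockRMQ:
    """Block decomposition: O(n) preprocessing, O(b + n/b) per query."""

    def __init__(self, A, b=None):
        self.A = A
        self.b = b if b is not None else max(1, math.isqrt(len(A)))
        # Minimum of each block; O(n) total work.
        self.block_min = [min(A[s:s + self.b])
                          for s in range(0, len(A), self.b)]

    def query(self, i, j):
        """Return min(A[i..j]), both endpoints inclusive."""
        b = self.b
        bi, bj = i // b, j // b              # O(1): locate the blocks
        if bi == bj:                         # same block: one O(b) scan
            return min(self.A[i:j + 1])
        # O(b) scans of the two partial blocks at the ends.
        best = min(min(self.A[i:(bi + 1) * b]),
                   min(self.A[bj * b:j + 1]))
        # O(n/b) pass over the precomputed minima of the blocks in between.
        for k in range(bi + 1, bj):
            best = min(best, self.block_min[k])
        return best

A = [31, 41, 59, 26, 53, 58, 97, 93, 23, 84, 62, 64, 33]
rmq = BlockRMQ(A)  # b defaults to floor(sqrt(13)) = 3
```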

Intuiting O(b + n/b)

  • As b increases

    • The O(b) term rises (more elements to scan within each block)
    • The O(n/b) term drops (fewer blocks to look at)
  • Is there an optimal choice of b given these constraints?

    • Start by taking the derivative of f(b) = b + n/b: setting f′(b) = 1 − n/b² to zero gives b = √n

    • Then the runtime is O(√n + n/√n) = O(√n)

  • Then the block partition method has time complexity ⟨O(n), O(√n)⟩

A second approach: Sparse tables

  • Can we still get constant-time queries without preprocessing all possible ranges?
  • Goal: precompute RMQ over a set of ranges such that
    • There are o(n²) total ranges, but
    • There are enough ranges to support O(1) query times
  • The approach
    • For each index i, compute RMQ for ranges starting at i of size 1, 2, 4, 8, 16, …, 2^k, as long as they fit in the array
    • Gives both large and small ranges starting at any point in the array
    • Only O(log n) ranges computed for each array element
    • Total number of ranges: O(n log n)
  • Claim: Any range in the array can be formed as the union of two of these ranges
  • Doing a query RMQ_A(i, j)
    • Find the largest k such that 2^k ≤ j − i + 1
      • With the right preprocessing (e.g., a table of floored logarithms), this can be done in O(1) time
    • The range [i, j] can be formed as the overlap of the ranges [i, i + 2^k − 1] and [j − 2^k + 1, j]
    • Each range can be looked up in O(1) time
    • Total time: O(1)
  • Precomputing the ranges
    • Using dynamic programming (a range of size 2^k is the union of two ranges of size 2^(k−1)), we can compute all of them in O(n log n) time
  • This data structure is called a sparse table
    • It gives an ⟨O(n log n), O(1)⟩ solution to RMQ
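A minimal sparse-table sketch (class and field names are illustrative, not from the notes):

```python
class SparseTable:
    """Sparse table: O(n log n) preprocessing, O(1) per query."""

    def __init__(self, A):
        n = len(A)
        # log_table[v] = floor(log2 v): the "right preprocessing" that
        # lets a query find the largest k with 2^k <= j - i + 1 in O(1).
        self.log_table = [0] * (n + 1)
        for v in range(2, n + 1):
            self.log_table[v] = self.log_table[v // 2] + 1
        # table[k][i] = min of the range starting at i of size 2^k.
        # DP: a range of size 2^k is the union of two of size 2^(k-1).
        self.table = [list(A)]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[k - 1], 1 << (k - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(n - (1 << k) + 1)])
            k += 1

    def query(self, i, j):
        """min(A[i..j]) as the min of two overlapping power-of-two ranges."""
        k = self.log_table[j - i + 1]
        return min(self.table[k][i], self.table[k][j - (1 << k) + 1])

A = [31, 41, 59, 26, 53, 58, 97, 93]
st = SparseTable(A)
```

Note that the two power-of-two ranges may overlap; that is harmless because taking a minimum twice over the same elements changes nothing (min is idempotent).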

A third approach: Hybrid strategies

  • The framework

    • Split the input into blocks of size b
    • Form an array of the n/b block minima
    • Construct a "summary" RMQ structure over the block minima
    • Construct "block" RMQ structures for each block
    • Aggregate the results together
  • Analyzing efficiency

    • Suppose we use a ⟨p₁(n), q₁(n)⟩-time RMQ for the summary RMQ and a ⟨p₂(n), q₂(n)⟩-time RMQ for each block, with block size b

    • What is the preprocessing time for this hybrid structure?

      • O(n) time to compute the minima of each block
      • p₁(n/b) time to construct the summary RMQ on the minima
      • O((n/b) · p₂(b)) time to construct the n/b block RMQs
      • Total construction time is O(n + p₁(n/b) + (n/b) · p₂(b))
    • What is the query time for this hybrid structure?

      • q₁(n/b) time to query the summary RMQ
      • O(q₂(b)) time to query the (at most two) block RMQs
      • Total query time: O(q₁(n/b) + q₂(b))
    • Hybrid preprocessing time: O(n + p₁(n/b) + (n/b) · p₂(b))

    • Hybrid query time: O(q₁(n/b) + q₂(b))

  • One possible hybrid

    • Set the block size to b = log n

    • Use a sparse table for the summary RMQ: ⟨O(n log n), O(1)⟩

    • Use the "no preprocessing" structure for each block: ⟨O(1), O(n)⟩

    • Preprocessing time: O(n + (n/log n) · log(n/log n)) = O(n)

    • Query time: O(1 + log n) = O(log n)

    • An ⟨O(n), O(log n)⟩ solution!

  • Another hybrid

    • Let's suppose we use the sparse table for both the summary and block RMQ structures, with a block size of log n

    • The preprocessing time is O(n + (n/log n) · log(n/log n) + (n/log n) · log n · log log n) = O(n log log n)

    • The query time is O(1)

    • We have an ⟨O(n log log n), O(1)⟩ solution to RMQ
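The first hybrid above (sparse table over block minima, linear scans inside blocks of size log n) can be sketched as follows; the class and helper names are illustrative, not from the notes:

```python
import math

def build_sparse_table(vals):
    """table[k][i] = min(vals[i .. i + 2^k - 1])."""
    table = [list(vals)]
    k = 1
    while (1 << k) <= len(vals):
        prev, half = table[k - 1], 1 << (k - 1)
        table.append([min(prev[i], prev[i + half])
                      for i in range(len(vals) - (1 << k) + 1)])
        k += 1
    return table

class HybridRMQ:
    """Sparse table over block minima + linear scans inside blocks of
    size ~log n: O(n) preprocessing, O(log n) per query."""

    def __init__(self, A):
        self.A = A
        n = len(A)
        self.b = max(1, int(math.log2(n))) if n > 1 else 1
        mins = [min(A[s:s + self.b]) for s in range(0, n, self.b)]
        self.summary = build_sparse_table(mins)

    def _summary_min(self, lo, hi):
        # O(1) sparse-table query over the block minima, inclusive.
        k = (hi - lo + 1).bit_length() - 1
        return min(self.summary[k][lo], self.summary[k][hi - (1 << k) + 1])

    def query(self, i, j):
        b = self.b
        bi, bj = i // b, j // b
        if bi == bj:
            return min(self.A[i:j + 1])          # O(log n) scan
        best = min(min(self.A[i:(bi + 1) * b]),  # O(log n) end scans
                   min(self.A[bj * b:j + 1]))
        if bi + 1 <= bj - 1:                     # O(1) summary query
            best = min(best, self._summary_min(bi + 1, bj - 1))
        return best

A = [5, 2, 9, 7, 1, 8, 6, 3, 4, 0, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
rmq = HybridRMQ(A)  # b = floor(log2 20) = 4
```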

Optimal solution: Cartesian Trees

  • From this point forward, let's have RMQ_A(i, j) denote the index of the minimum value in the range rather than the value itself

  • Some notation

    • Let B₁ and B₂ be blocks of length b

    • We will say that B₁ and B₂ have the same block type (denoted B₁ ~ B₂) if the following holds: for all 0 ≤ i ≤ j < b, RMQ_{B₁}(i, j) = RMQ_{B₂}(i, j)

    • Intuitively, the RMQ answers for B₁ are always the same as the RMQ answers for B₂

    • If we build an RMQ structure to answer queries on some block B₁, we can reuse that structure on some other block B₂ iff B₁ ~ B₂

  • Detecting block types

    • If B₁ and B₂ have the same permutation on their elements, then B₁ ~ B₂
    • Claim: If B₁ ~ B₂, the minimum elements of B₁ and B₂ must occur at the same position
      • This property must hold recursively on the subarrays to the left and right of the minimum
  • Cartesian trees

    • The Cartesian tree for an array is a binary tree obeying the min-heap property (each node's value is at least as large as its parent's) whose inorder traversal (from left to right) gives back the original array
    • A Cartesian tree for an array is a binary tree built as follows:
      • The root of the tree is the minimum element of the array
      • Its left and right subtrees are formed by recursively building Cartesian trees for the subarrays to the left and right of the minimum
      • Base case: if the array is empty, the Cartesian tree is empty
    • Cartesian trees and RMQ
      • Theorem: Let B₁ and B₂ be blocks of length b. Then B₁ ~ B₂ iff B₁ and B₂ have isomorphic ("same shape") Cartesian trees
    • Building Cartesian trees
      • High-level idea: Build a Cartesian tree for the first element, then the first two, then the first three, then the first four, etc.
      • We can implement this algorithm efficiently by maintaining a stack of the nodes on the right spine
      • To insert a new value, pop the stack until the stack top is no bigger than the new value (or the stack is empty). Remember the last node popped this way
      • Rewire the tree by
        • making the new node the stack top's right child (or the root, if the stack is empty), and
        • making the most-recently-popped node the new node's left child
      • This algorithm takes O(n) time on an array of size n
        • Idea: each element is pushed onto the stack at most once, when its node is created. Each element can therefore be popped at most once
        • Therefore, there are at most n pushes and n pops, so the runtime is O(n)
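The stack-based construction can be sketched in Python like so (class and function names are mine, not from the notes):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def build_cartesian_tree(A):
    """Min-Cartesian tree in O(n) via a stack of the right-spine nodes."""
    stack = []   # right spine; values increase from bottom to top
    root = None
    for x in A:
        node = Node(x)
        last = None
        # Pop spine nodes strictly larger than the new value.
        while stack and stack[-1].value > x:
            last = stack.pop()
        node.left = last            # last node popped becomes the left child
        if stack:
            stack[-1].right = node  # hang the new node off the spine
        else:
            root = node             # spine emptied: new node is the root
        stack.append(node)
    return root

def inorder(node):
    """Inorder traversal; should give back the original array."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

A = [93, 84, 33, 64, 62, 83]
root = build_cartesian_tree(A)
```

Each element is pushed exactly once and popped at most once, which is exactly the amortized O(n) argument from the bullet points above.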
  • Theorem: The number of distinct Cartesian tree shapes for arrays of length b is at most 4^b

    • The actual number is the b-th Catalan number C_b = (1/(b+1))·(2b choose b), which is roughly equal to 4^b / (b^(3/2) √π)
    • Reference: Catalan numbers
    • There are at most 2b stack operations during the execution of the algorithm: b pushes and no more than b pops
    • Represent the execution of the algorithm as a 2b-bit number, where 1 means "push" and 0 means "pop". We will pad the end with 0's. This number is the Cartesian tree number of a block
  • Observation: If all we care about is finding blocks that can share RMQ structures, we never need to build Cartesian trees. Instead, we can just compute the Cartesian tree number for each block
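The Cartesian tree number can be computed directly by replaying the stack algorithm on the raw values, without ever allocating tree nodes (the function name is illustrative):

```python
def cartesian_tree_number(block):
    """Encode the construction's stack operations as a 2b-bit number:
    1 = push, 0 = pop, padded with trailing 0s.  Two blocks receive the
    same number iff their Cartesian trees have the same shape."""
    number = 0
    bits = 0
    stack = []
    for x in block:
        while stack and stack[-1] > x:
            stack.pop()
            number <<= 1            # append a 0 for the pop
            bits += 1
        stack.append(x)
        number = (number << 1) | 1  # append a 1 for the push
        bits += 1
    return number << (2 * len(block) - bits)   # pad to 2b bits

# Blocks whose elements compare the same way get the same number,
# so they can share a single precomputed RMQ structure.
```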

  • How efficient is this?

    • We are using the hybrid approach, and all the structures we're using have constant query times. Query time: O(1)

    • Our preprocessing time is O(n + (n/b) log(n/b) + 4^b · b²): The O(n) term is for computing block minima and the Cartesian tree number of each block

      The O((n/b) log(n/b)) term is for building a sparse table on the n/b block minima

      The O(4^b · b²) term is for constructing at most 4^b block-level RMQ structures at a cost of O(b²) each

    • Suppose we pick b = k log₄ n for some constant k > 0; then 4^b = n^k, and the preprocessing time is O(n + n^k log² n). For k = ½, it is O(n + √n log² n) = O(n)
  • This data structure is called the Fischer-Heun structure. It uses a modified version of our hybrid RMQ framework:

    • Set b = ½ log₄ n = ¼ log₂ n
    • Split the input into blocks of size b. Compute an array of the minimum values from each block
    • Build a sparse table on that array of minima
    • Build per-block RMQ structures for each block, using Cartesian tree numbers to avoid recomputing RMQ structures unnecessarily
    • Make queries using the standard hybrid solution approach
  • This is an ⟨O(n), O(1)⟩ solution to RMQ

  • The technique employed here is an example of the Method of Four Russians, or a Four Russians speedup. Think of it as "divide, precompute, and conquer"

    • Break the problem of size n into subproblems of size b, plus some top-level problem of size n/b

      • This is called a macro/micro decomposition
    • Solve all possible subproblems of size b

      • Here, we only solved the subproblems that actually came up in the original array, but that's just an optimization
    • Solve the overall problem by combining solutions to the micro and macro problems
