SQL · Advanced · 15 minutes

Build a SQL Function to Calculate Running Median Over a Dynamic Window

In this challenge, you will create a SQL function that calculates the running median of a numeric column over a sliding window of rows ordered by timestamp, where the window size is supplied at call time. This requires advanced use of window functions, array manipulation, and median calculation within SQL.

Challenge prompt

Create a SQL function named `running_median` that accepts a table name, a numeric column name, a timestamp column name, and an integer window size (a number of rows). The function should return the original table with an additional column `median_val` that contains the median of the numeric column over the most recent `window_size` rows, counting the current row, when the rows are ordered by the timestamp column. Rows near the start of the table have fewer than `window_size` rows available, so their median is taken over the rows that exist. Implement this using standard SQL with window functions, arrays, or any relevant constructs available in your SQL dialect. Your solution must handle dynamic table and column names where possible; where it is not possible, state the assumptions you made.
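
One way the dynamic table and column names could be handled in PostgreSQL is sketched below. It is an illustration under stated assumptions, not the required solution: the name `running_median_dyn`, the use of `regclass` for the table argument, and the fixed three-column return shape (`ts`, `val`, `median_val`) are all hypothetical choices made here, because a PL/pgSQL function has to declare its result columns up front and cannot return "the original table" with an unknown column list.

```sql
-- Hypothetical sketch (PostgreSQL). The function name and the fixed
-- return shape are assumptions for illustration only.
CREATE FUNCTION running_median_dyn(
  tbl         regclass,  -- source table; regclass checks that it exists
  val_col     text,      -- name of the numeric column
  ts_col      text,      -- name of the timestamp column
  window_size integer    -- rows per window, including the current row
)
RETURNS TABLE(ts timestamp, val numeric, median_val numeric) AS $$
BEGIN
  IF window_size IS NULL OR window_size < 1 THEN
    RAISE EXCEPTION 'window_size must be at least 1';
  END IF;

  -- %I quotes column names as identifiers; the regclass value and the
  -- integer frame offset are inserted with %s.
  RETURN QUERY EXECUTE format(
    $q$
    SELECT t.ts,
           t.val,
           (SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
              FROM unnest(t.vals) AS v)::numeric
    FROM (
      SELECT %1$I::timestamp AS ts,
             %2$I::numeric   AS val,
             array_agg(%2$I::numeric) OVER (
               ORDER BY %1$I
               ROWS BETWEEN %3$s PRECEDING AND CURRENT ROW
             ) AS vals
      FROM %4$s
    ) AS t
    ORDER BY t.ts
    $q$,
    ts_col, val_col, window_size - 1, tbl
  );
END;
$$ LANGUAGE plpgsql;

-- Example call against the 'events' table used in the starter code:
-- SELECT * FROM running_median_dyn('events', 'value', 'event_time', 3);
```

Because the query text is built at run time, the output column names declared in RETURNS TABLE do not clash with the column names inside the query.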

Guidance

  • Use window functions like ROWS BETWEEN to define sliding windows based on row counts.
  • Not every SQL dialect has a built-in median aggregate, so consider collecting the window's values into an array and then applying percentile_cont to them, or computing the median manually with array slicing.
  • Think about performance optimizations for large datasets and how to minimize repeated calculations.
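
The row-count frame from the guidance above can be sketched as follows. This is a minimal PostgreSQL illustration that assumes the fixed `events` table from the starter code and hard-codes a window of 3 rows; the column aliases `rows_in_window` and `window_vals` are made up here. Defining the window once in a WINDOW clause lets several aggregates share the same frame definition.

```sql
-- Sketch (PostgreSQL): a 3-row sliding frame, defined once and reused.
SELECT
  e.id,
  e.event_time,
  e.value,
  -- Number of rows actually in the frame (1 or 2 for the first rows).
  count(*)           OVER w AS rows_in_window,
  -- The frame's values, collected for a later median calculation.
  array_agg(e.value) OVER w AS window_vals
FROM events AS e
WINDOW w AS (
  ORDER BY e.event_time
  -- Current row plus the 2 rows before it: 3 rows in total.
  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
ORDER BY e.event_time;
```

For a window of N rows the frame is `ROWS BETWEEN N - 1 PRECEDING AND CURRENT ROW`, since the current row counts toward the total.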

Hints

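  • percentile_cont(0.5) WITHIN GROUP (ORDER BY column) computes the median in the dialects that support it, interpolating between the two middle values when the number of rows is even. In PostgreSQL it is an ordered-set aggregate and cannot be combined with OVER, so apply it to the window's rows in a subquery rather than using it directly as a window function.
  • If your SQL environment cannot parameterize table and column names, write the function against fixed columns, or build the query text dynamically with safely quoted identifiers.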
  • You can aggregate values into arrays over the window, then select the middle element(s) to find the median.
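
The array-based hint can be illustrated with a small helper that takes the collected window values and returns their median. This is a sketch for PostgreSQL; `array_median` is a hypothetical name introduced here, and it leans on percentile_cont rather than manual slicing, so even-sized arrays yield the average of the two middle values.

```sql
-- Hypothetical helper (PostgreSQL): median of a numeric array.
CREATE FUNCTION array_median(vals numeric[])
RETURNS numeric AS $$
  -- unnest turns the array into rows so the ordered-set aggregate
  -- can sort them; an empty array yields NULL.
  SELECT (percentile_cont(0.5) WITHIN GROUP (ORDER BY v))::numeric
  FROM unnest(vals) AS v;
$$ LANGUAGE sql IMMUTABLE;

SELECT array_median(ARRAY[30, 20, 50]::numeric[]);  -- 30
SELECT array_median(ARRAY[10, 30]::numeric[]);      -- 20
```

A helper like this can wrap a windowed array_agg call directly, for example `array_median(array_agg(value) OVER (ORDER BY event_time ROWS BETWEEN 2 PRECEDING AND CURRENT ROW))`.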

Starter code

CREATE FUNCTION running_median(window_size INTEGER)
RETURNS TABLE(id INT, event_time TIMESTAMP, value NUMERIC, median_val NUMERIC) AS $$
BEGIN
  -- Implement your logic here, assuming a fixed table named 'events'
  -- with columns 'id', 'event_time', and 'value'.
  -- Columns are qualified with the alias 'e' because the RETURNS TABLE
  -- output names would otherwise make unqualified 'id', 'event_time',
  -- and 'value' ambiguous inside PL/pgSQL.
  RETURN QUERY
  SELECT
    e.id,
    e.event_time,
    e.value,
    -- placeholder for median calculation
    NULL::NUMERIC AS median_val
  FROM events AS e
  ORDER BY e.event_time;
END;
$$ LANGUAGE plpgsql;

Expected output

A result set from the 'events' table including a new column 'median_val', where each row's value is the median of the 'value' column over the last 'window_size' rows (including the current row) ordered by 'event_time'.
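
To make the expected values concrete, here is a small worked example for PostgreSQL. The four sample rows and the name `sample_events` are made up for illustration, the window size is fixed at 3, and even-sized windows are assumed to use the interpolated median that percentile_cont computes.

```sql
-- Worked example (PostgreSQL) on made-up rows, window size 3.
WITH sample_events(id, event_time, value) AS (
  VALUES
    (1, TIMESTAMP '2024-01-01 00:00:00', 10::numeric),
    (2, TIMESTAMP '2024-01-01 00:01:00', 30::numeric),
    (3, TIMESTAMP '2024-01-01 00:02:00', 20::numeric),
    (4, TIMESTAMP '2024-01-01 00:03:00', 50::numeric)
),
windowed AS (
  SELECT
    s.id,
    s.event_time,
    s.value,
    array_agg(s.value) OVER (
      ORDER BY s.event_time
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS window_vals
  FROM sample_events AS s
)
SELECT
  w.id,
  w.value,
  w.window_vals,
  (SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v)
     FROM unnest(w.window_vals) AS v) AS median_val
FROM windowed AS w
ORDER BY w.event_time;

--  id | value | window_vals | median_val
--   1 |    10 | {10}        | 10
--   2 |    30 | {10,30}     | 20
--   3 |    20 | {10,30,20}  | 20
--   4 |    50 | {30,20,50}  | 30
```

The first two rows show the shortened windows at the start of the table: row 1 sees only itself, and row 2 averages its two available values.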

Core concepts

  • Window Functions
  • Median Calculation
  • Aggregate Functions
  • Array Manipulation in SQL
