sql · advanced · 15 minutes

Create a SQL Function to Calculate Running Median by Group

Build a PostgreSQL function that calculates the running median of a numeric column, partitioned by a category column and ordered by a caller-specified column, with the table and column names supplied at call time.

Challenge prompt

Write a SQL function named `running_median_by_group` that takes a table name, a numeric column name, a grouping column name, and an ordering column name as parameters. The function should return a result set showing each row with an additional column containing the median of the numeric column calculated over all previous rows (including the current row) within the same group, ordered by the specified ordering column. The function should efficiently handle large datasets with multiple groups and dynamic input parameters.

Guidance

  • Use window functions to partition data by the grouping column and order by the ordering column.
  • Calculate the median dynamically by selecting the middle value(s) over the window frame for each row.
  • Ensure the function works with varying table and column names provided as parameters.
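As a starting point before adding dynamic names, the per-row "middle value" can be sketched with fixed names. This is a minimal sketch for PostgreSQL, not a reference solution: the `sales` table and its columns are hypothetical, and a LATERAL subquery stands in for a window frame because PostgreSQL does not accept OVER on ordered-set aggregates such as percentile_cont.

```sql
-- Hypothetical sample table; the table and column names are illustrative only.
CREATE TEMP TABLE sales (
    region  TEXT,
    sold_at DATE,
    amount  NUMERIC
);

INSERT INTO sales VALUES
    ('east', '2024-01-01', 10),
    ('east', '2024-01-02', 30),
    ('east', '2024-01-03', 20),
    ('west', '2024-01-01', 5),
    ('west', '2024-01-02', 15);

-- For each row, take the median of all rows in the same region whose
-- sold_at is at or before the current row's sold_at.
SELECT s.region, s.sold_at, s.amount, m.running_median
FROM sales AS s
CROSS JOIN LATERAL (
    SELECT percentile_cont(0.5)
               WITHIN GROUP (ORDER BY p.amount::double precision) AS running_median
    FROM sales AS p
    WHERE p.region = s.region
      AND p.sold_at <= s.sold_at
) AS m
ORDER BY s.region, s.sold_at;
-- east: 10, 20, 20
-- west: 5, 10
```

Each row rescans its group, so the cost grows quadratically with group size. An index on (region, sold_at) may help, but very large groups would likely need a different strategy.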

Hints

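  • To compute a median in PostgreSQL, consider percentile_cont(0.5) WITHIN GROUP (ORDER BY ...). It is an ordered-set aggregate and cannot take an OVER clause, so apply it in a correlated or LATERAL subquery restricted to the rows up to and including the current one.
  • Dynamic SQL (EXECUTE with format() and %I placeholders) is needed because table and column names cannot be passed as ordinary query parameters.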
  • Test the function on smaller datasets before scaling.
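To illustrate the dynamic SQL hint, the snippet below shows how format() with %I placeholders quotes identifiers before the string is handed to EXECUTE. The names passed in ('amount', 'sales', 'Sold At') are made up for the example.

```sql
-- %I quotes a value as an SQL identifier, adding double quotes only when needed.
-- Positional forms such as %1$I let one argument be reused several times.
SELECT format('SELECT %1$I FROM %2$I ORDER BY %3$I',
              'amount', 'sales', 'Sold At');
-- SELECT amount FROM sales ORDER BY "Sold At"

-- A dotted name is treated as ONE identifier, not as schema.table:
SELECT format('SELECT * FROM %I', 'public.sales');
-- SELECT * FROM "public.sales"
```

Because of the second behaviour, a function that receives the table name as plain TEXT and formats it with %I only works for unqualified names. Accepting separate schema and table arguments, or a regclass argument, are possible alternatives.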

Starter code

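CREATE OR REPLACE FUNCTION running_median_by_group(
    tbl_name TEXT,
    num_col TEXT,
    grp_col TEXT,
    order_col TEXT
) RETURNS SETOF record AS $$
-- RETURNS TABLE(*) is not valid syntax. With SETOF record the caller supplies
-- a column definition list: the table's columns plus running_median.
DECLARE
    sql_query TEXT;
BEGIN
    -- PostgreSQL rejects OVER on ordered-set aggregates such as percentile_cont,
    -- so the median is computed in a correlated subquery over the rows of the
    -- same group up to and including the current row. Rows that tie on the
    -- ordering column are all included.
    sql_query := FORMAT(
        'SELECT t.*,
                (SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY p.%1$I::double precision)
                   FROM %4$I AS p
                  WHERE p.%2$I IS NOT DISTINCT FROM t.%2$I
                    AND p.%3$I <= t.%3$I) AS running_median
           FROM %4$I AS t
          ORDER BY t.%2$I, t.%3$I', num_col, grp_col, order_col, tbl_name
    );
    RETURN QUERY EXECUTE sql_query;
END;
$$ LANGUAGE plpgsql;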

Expected output

Returns the original table rows with an additional 'running_median' column representing the median of the numeric column calculated cumulatively within each group, ordered by the specified column.
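As a concrete illustration of that shape, the sketch below calls the function on a small hypothetical `readings` table. It assumes a variant of running_median_by_group that is declared RETURNS SETOF record and already exists in the database, so the caller has to spell out the output columns; the table, its columns, and the alias `r` are invented for the example.

```sql
-- Hypothetical table used only for this example.
CREATE TEMP TABLE readings (
    sensor TEXT,
    ts     INTEGER,
    val    NUMERIC
);

INSERT INTO readings VALUES
    ('a', 1, 4),
    ('a', 2, 8),
    ('a', 3, 6),
    ('b', 1, 7);

-- Assumption: the function returns SETOF record, so the column definition
-- list must match the table's columns followed by running_median.
SELECT *
FROM running_median_by_group('readings', 'val', 'sensor', 'ts')
     AS r(sensor TEXT, ts INTEGER, val NUMERIC, running_median DOUBLE PRECISION)
ORDER BY sensor, ts;
-- Intended running_median values:
--   sensor a: 4, 6, 6
--   sensor b: 7
```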

Core concepts

window functions · dynamic SQL · median calculation · SQL functions
