Wednesday, March 12, 2008

Percentile calculation in Oracle

A percentile is the value of a variable below which a certain percent of observations fall. That is; the 80th percentile is the value below which 80% of the observations can be found. More informations about percentile and the mathematical calculations can be found on the wiki page.

I had to use percentile function in one of our applications where the only option was to use Oracle. I did some research on the subject and found the solution.

There are 2 analytic functions in oracle which can be used for the purpose finding the percentile.

1. Function: Percent_Rank
This function gives the percentile rank, between 0 and 1, in the set (table rows).
Format:
PERCENT_RANK(<value>) OVER (<partition_clause> <order_by_clause>)

E.g.:
SELECT department_id, last_name, salary, PERCENT_RANK()
OVER (PARTITION BY department_id ORDER BY salary DESC) AS pr
FROM employees
ORDER BY pr, salary;
The results of this query will present the department, employee last name, salary and the percentile rank of the salary within the department the employee belongs to, order by percentile rank and salary.

2. Function: Percentile_Cont
This assumes a continuous distribution; the function will retrieve the value for a given percentile.
Format:
PERCENTILE_CONT(<value>) WITHIN GROUP (ORDER BY <expression> [ASC | DESC]) OVER (<partition_clause>)

E.g.:
SELECT last_name, salary, department_id,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC)
OVER (PARTITION BY department_id) PCT_CONT,
PERCENT_RANK()
OVER (PARTITION BY department_id ORDER BY salary DESC) PCT_RANK
FROM employees
WHERE department_id IN (30, 60);
This query returns the employee last name, salary, department, 50th percentile value of the salaries within the departments, percentile rank of the employee salary for departments 30 and 50.

These analytics functions may take some time to response when queried against large tables.