There is a lot of information around regarding analytic functions however I haven’t been able to find many good examples.
Here I do not provide a lot of explanation but I do provide a number of (hopefully clear) examples.
Analytic functions enable us to manipulate a table in “parts” or partitions as well as allowing a sequential processing of rows making reference to earlier and later rows. They are used for grouping and summary reports.
Although many of the functions can be carried out with standard SQL, analytic functions allow us to simplify otherwise complex SQL as well as generally producing results more efficiently than the equivalent SQL.
It is easiest to demonstrate how these functions work with a number of examples.
Imagine we have a table of employees:
CREATE TABLE employees(name VARCHAR2 (4), dept# INTEGER, salary NUMBER, hiredate DATE, grade CHAR (1)); ....(INSERTS) select * from employees; NAME DEPT# SALARY HIREDATE GRADE ---- --------- -------- -------- ----- Fred 20 50000 12-SEP-09 B Tom 20 30000 11-SEP-07 C Bill 20 60000 01-JUN-04 A Jane 20 50000 23-MAR-00 C Mark 20 55000 05-NOV-08 B Ben 20 35000 10-AUG-10 C Ann 20 67000 12-AUG-05 A Hank 20 24000 21-MAR-08 C Sue 30 70000 13-JUN-07 A Lisa 30 23000 13-MAY-11 C Paul 30 20000 13-APR-08 C Bob 30 22000 10-SEP-08 B Lou 30 90000 23-MAR-00 A Kate 40 21000 11-SEP-07 B Zak 40 82000 20-APR-03 A 15 rows selected.
Partition: For each employee, report how many people are in their department:
select name, dept#, count(*) over (partition by dept#) people_count from employees; NAME DEPT# PEOPLE_COUNT ---- ---------- ------------ Fred 20 8 Tom 20 8 Hank 20 8 Ann 20 8 Ben 20 8 Mark 20 8 Jane 20 8 Bill 20 8 Lou 30 5 Bob 30 5 Paul 30 5 Lisa 30 5 Sue 30 5 Kate 40 2 Zak 40 2 15 rows selected.
The “partition by” splits the table up by dept# and treats each part separately.
Partition: For each employee, report how many people are in the same department and in the same grade:
select name, grade, dept#, count(*) over (partition by dept#, grade) people_count from employees; NAME GRADE DEPT# PEOPLE_COUNT ---- ----- ---------- ------------ Ann A 20 2 Bill A 20 2 Mark B 20 2 Fred B 20 2 Ben C 20 4 Jane C 20 4 Tom C 20 4 Hank C 20 4 Sue A 30 2 Lou A 30 2 Bob B 30 1 Lisa C 30 2 Paul C 30 2 Zak A 40 1 Kate B 40 1 15 rows selected.
Row_Number: We can put row numbers for each group (partition).
select name, row_number() over (partition by dept# order by dept#) rnum from employees; NAME RNUM ---- ---------- Fred 1 Tom 2 Hank 3 Ann 4 Ben 5 Mark 6 Jane 7 Bill 8 Lou 1 Bob 2 Paul 3 Lisa 4 Sue 5 Kate 1 Zak 2 15 rows selected.
Rank: We can put a ranking number, ordering by hiredate. (rank() puts a ranking on the “ordered by” column – here it is the hiredate). Number 1 is the first person hired.
select name, hiredate, rank() over (order by hiredate) hire_rank from employees; NAME HIREDATE HIRE_RANK ---- --------- ---------- Lou 23-MAR-00 1 Jane 23-MAR-00 1 Zak 20-APR-03 3 Bill 01-JUN-04 4 Ann 12-AUG-05 5 Sue 13-JUN-07 6 Tom 11-SEP-07 7 Kate 11-SEP-07 7 Hank 21-MAR-08 9 Paul 13-APR-08 10 Bob 10-SEP-08 11 Mark 05-NOV-08 12 Fred 12-SEP-09 13 Ben 10-AUG-10 14 Lisa 13-MAY-11 15 15 rows selected.
Dense_Rank: Note that in this listing there are 2 people in ranking 1 (hired on the same day) and then the next person has ranking of 3.
We can ensure that this gap does not occur by using “dense_rank”:
select name, hiredate, dense_rank() over (order by hiredate) hire_rank from employees; NAME HIREDATE HIRE_RANK ---- --------- ---------- Lou 23-MAR-00 1 Jane 23-MAR-00 1 Zak 20-APR-03 2 Bill 01-JUN-04 3 Ann 12-AUG-05 4 Sue 13-JUN-07 5 Tom 11-SEP-07 6 Kate 11-SEP-07 6 Hank 21-MAR-08 7 Paul 13-APR-08 8 Bob 10-SEP-08 9 Mark 05-NOV-08 10 Fred 12-SEP-09 11 Ben 10-AUG-10 12 Lisa 13-MAY-11 13 15 rows selected.
Lead/Lag: We can show for each employee what the next salary step and the previous salary step are within each grade:
select name, grade, salary, lead(salary, 1, 999999) over (partition by grade order by salary) next_step, lag(salary, 1, 0) over (partition by grade order by salary) previous_step from employees; NAME GRADE SALARY NEXT_STEP PREVIOUS_STEP ---- ----- ---------- ---------- ------------- Bill A 60000 67000 0 Ann A 67000 70000 60000 Sue A 70000 82000 67000 Zak A 82000 90000 70000 Lou A 90000 999999 82000 Kate B 21000 22000 0 Bob B 22000 50000 21000 Fred B 50000 55000 22000 Mark B 55000 999999 50000 Paul C 20000 23000 0 Lisa C 23000 24000 20000 Hank C 24000 30000 23000 Tom C 30000 35000 24000 Ben C 35000 50000 30000 Jane C 50000 999999 35000 15 rows selected.
Note that we use “lead” which means the row next to the current row being processed (here as we have ordered by salary the next row will be the next highest salary – within grade (partition by grade)). Similarly “lag” which means the row previous to the one being processed.
First_Value/Last_Value: Show the difference for each employee between their salary and the maximum and minimum in their grade:
select name, salary, grade, salary-first_value(salary) over (partition by grade order by salary) diff_min, last_value(salary) over (partition by grade order by salary rows between unbounded preceding and unbounded following)-salary diff_max from employees; NAME SALARY GRADE DIFF_MIN DIFF_MAX ---- ---------- ----- ---------- ---------- Bill 60000 A 0 30000 Ann 67000 A 7000 23000 Sue 70000 A 10000 20000 Zak 82000 A 22000 8000 Lou 90000 A 30000 0 Kate 21000 B 0 34000 Bob 22000 B 1000 33000 Fred 50000 B 29000 5000 Mark 55000 B 34000 0 Paul 20000 C 0 30000 Lisa 23000 C 3000 27000 Hank 24000 C 4000 26000 Tom 30000 C 10000 20000 Ben 35000 C 15000 15000 Jane 50000 C 30000 0 15 rows selected.
Note that you need to put “rows between unbounded preceding and unbounded following” in the last_value clause as otherwise the “last_value” is only from the first row processed up to the current row (ie the default is “rows between unbounded preceding and current row”).
Current Row (running total): To show how many employees there were already in the company when we hired each new employee:
select name, hiredate, count(*) over (order by hiredate rows between unbounded preceding and current row)-1 existing_emps from employees; NAME HIREDATE EXISTING_EMPS ---- --------- ------------- Lou 23-MAR-00 0 Jane 23-MAR-00 1 Zak 20-APR-03 2 Bill 01-JUN-04 3 Ann 12-AUG-05 4 Sue 13-JUN-07 5 Tom 11-SEP-07 6 Kate 11-SEP-07 7 Hank 21-MAR-08 8 Paul 13-APR-08 9 Bob 10-SEP-08 10 Mark 05-NOV-08 11 Fred 12-SEP-09 12 Ben 10-AUG-10 13 Lisa 13-MAY-11 14 15 rows selected.
This is a running total of the employees in the company.
Ratio_To_Report: Let’s look at the proportion that each person’s salary is contributing to the total salaries for each department (ie the relative cost of each employee):
SELECT name, dept#, salary, ROUND (ratio_to_report (salary) OVER (PARTITION BY dept#) * 100) Sal_percent FROM employees ORDER BY dept#, salary; NAME DEPT# SALARY SAL_PERCENT ---- ---------- ---------- ----------- Hank 20 24000 6 Tom 20 30000 8 Ben 20 35000 9 Jane 20 50000 13 Fred 20 50000 13 Mark 20 55000 15 Bill 20 60000 16 Ann 20 67000 18 Paul 30 20000 9 Bob 30 22000 10 Lisa 30 23000 10 Sue 30 70000 31 Lou 30 90000 40 Kate 40 21000 20 Zak 40 82000 80 15 rows selected.
AVG: Give me a list of all employees who are earning above the average salary in their department:
SELECT name, salary, dept# FROM (SELECT name, salary, dept#, salary - AVG (salary) OVER (PARTITION BY dept#) salary_diff FROM employees) salaries WHERE salaries.salary_diff > 0; NAME SALARY DEPT# ---- ---------- ---------- Fred 50000 20 Ann 67000 20 Mark 55000 20 Jane 50000 20 Bill 60000 20 Lou 90000 30 Sue 70000 30 Zak 82000 40 8 rows selected.
Grouping Sets: We would like to look at the total salaries by department and by grade. We could use a standard “group by”:
select dept#, grade, sum(salary) from employees group by dept#, grade;
Or use a more powerful “grouping sets” function:
select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade)); DEPT# GRADE SUM(SALARY) ---------- ----- ----------- 20 A 127000 20 B 105000 20 C 139000 30 A 160000 30 B 22000 30 C 43000 40 A 82000 40 B 21000 8 rows selected.
If we would like to also show the sum for each department (irrespective of grade):
select dept#, grade, sum(salary) from employees group by dept#, grade UNION
select dept#, null, sum(salary) from employees group by dept#;
This is the same as the more efficient version below:
select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#); DEPT# GRADE SUM(SALARY) ---------- ----- ----------- 20 A 127000 20 B 105000 20 C 139000 20 371000 30 A 160000 30 B 22000 30 C 43000 30 225000 40 A 82000 40 B 21000 40 103000 11 rows selected.
And to finally include the grand total of all salaries:
select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#, ()); DEPT# GRADE SUM(SALARY) ---------- ----- ----------- 20 A 127000 20 B 105000 20 C 139000 20 371000 30 A 160000 30 B 22000 30 C 43000 30 225000 40 A 82000 40 B 21000 40 103000 699000 12 rows selected.
Grouping: Now the problem is you can see that the Subtotals and Totals have null values in the columns that are being summed over (for example see the last 2 rows of the output in the previous example).
To solve this we can use the function ‘Grouping’, in this way:
select decode(grouping(dept#),1,'All Depts', dept#) dept, decode(grouping(grade),1, 'All Grades',grade) grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#, ()); DEPT GRADE SUM(SALARY) --------- ---------- ----------- 20 A 127000 20 B 105000 20 C 139000 20 All Grades 371000 30 A 160000 30 B 22000 30 C 43000 30 All Grades 225000 40 A 82000 40 B 21000 40 All Grades 103000 All Depts All Grades 699000 12 rows selected.
Cube: We can use the “Cube” function to group also:
select dept#, grade, sum(salary) from employees group by cube(dept#, grade); DEPT# GRADE SUM(SALARY) ---------- ---------- ----------- 699000 A 369000 B 148000 C 182000 20 371000 20 A 127000 20 B 105000 20 C 139000 30 225000 30 A 160000 30 B 22000 30 C 43000 40 103000 40 A 82000 40 B 21000 15 rows selected.
This gives us all possible groupings and is the same as:
select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#, grade, ());
Rollup: We can use “rollup” to perform aggregations like the cube but only considering the columns as a hierarchy, eg:
select dept#, grade, sum(salary) from employees group by rollup(dept#, grade); DEPT# GRADE SUM(SALARY) ---------- ---------- ----------- 20 A 127000 20 B 105000 20 C 139000 20 371000 30 A 160000 30 B 22000 30 C 43000 30 225000 40 A 82000 40 B 21000 40 103000 699000 12 rows selected.
This is the same as
select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#, ());
Grouping_ID: when we carry out a lot of grouping we may want to filter some of the rows to obtain just the totals at a certain level.
This can be done using the “Grouping_ID” function.
Let’s just get the department level summary from the previous query:
select dept#, grade, sum(salary) from employees group by cube(dept#, grade) having grouping_id(dept#, grade)=1; DEPT# GRADE SUM(SALARY) ---------- ---------- ----------- 20 371000 30 225000 40 103000
To get aggregations only across dept# use:
select dept#, grade, sum(salary) from employees group by cube(dept#, grade) having grouping_id(dept#, grade)=2; DEPT# GRADE SUM(SALARY) ---------- ---------- ----------- A 369000 B 148000 C 182000
To get just the total of all, use:
select dept#, grade, sum(salary) from employees group by cube(dept#, grade) having grouping_id(dept#, grade)=3; DEPT# GRADE SUM(SALARY) ---------- ---------- ----------- 699000
I hope these examples have helped you in some way to become more confident with analytic functions. They may not be functions you use every day, but when the situation calls for them (basically aggregates and group work) they are very handy.