Some Analytic Function and Grouping Examples

There is a lot of information around regarding analytic functions however I haven’t been able to find many good examples.
Here I do not provide a lot of explanation but I do provide a number of (hopefully clear) examples.

Analytic functions enable us to manipulate a table in “parts” or partitions as well as allowing a sequential processing of rows making reference to earlier and later rows. They are used for grouping and summary reports.
Although many of the functions can be carried out with standard SQL, analytic functions allow us to simplify otherwise complex SQL as well as generally producing results more efficiently than the equivalent SQL.

It is easiest to demonstrate how these functions work with a number of examples.
Imagine we have a table of employees:

CREATE TABLE employees(name VARCHAR2 (4), dept# INTEGER,
salary NUMBER, hiredate DATE, grade CHAR (1));
....(INSERTS)
select * from employees;
NAME      DEPT#     SALARY HIREDATE  GRADE
----  ---------   -------- --------  -----
Fred         20      50000 12-SEP-09 B
Tom          20      30000 11-SEP-07 C
Bill         20      60000 01-JUN-04 A
Jane         20      50000 23-MAR-00 C
Mark         20      55000 05-NOV-08 B
Ben          20      35000 10-AUG-10 C
Ann          20      67000 12-AUG-05 A
Hank         20      24000 21-MAR-08 C
Sue          30      70000 13-JUN-07 A
Lisa         30      23000 13-MAY-11 C
Paul         30      20000 13-APR-08 C
Bob          30      22000 10-SEP-08 B
Lou          30      90000 23-MAR-00 A
Kate         40      21000 11-SEP-07 B
Zak          40      82000 20-APR-03 A

15 rows selected.

Partition: For each employee, report how many people are in their department:

select name, dept#, count(*) over (partition by dept#) people_count from employees;

NAME      DEPT# PEOPLE_COUNT
---- ---------- ------------
Fred         20            8
Tom          20            8
Hank         20            8
Ann          20            8
Ben          20            8
Mark         20            8
Jane         20            8
Bill         20            8
Lou          30            5
Bob          30            5
Paul         30            5
Lisa         30            5
Sue          30            5
Kate         40            2
Zak          40            2

15 rows selected.

The “partition by” splits the table up by dept# and treats each part separately.

Partition: For each employee, report how many people are in the same department and in the same grade:

select name, grade, dept#, count(*) over (partition by dept#, grade) people_count from employees;

NAME GRADE      DEPT# PEOPLE_COUNT
---- ----- ---------- ------------
Ann  A             20            2
Bill A             20            2
Mark B             20            2
Fred B             20            2
Ben  C             20            4
Jane C             20            4
Tom  C             20            4
Hank C             20            4
Sue  A             30            2
Lou  A             30            2
Bob  B             30            1
Lisa C             30            2
Paul C             30            2
Zak  A             40            1
Kate B             40            1

15 rows selected.

Row_Number: We can put row numbers for each group (partition).

select name, row_number() over (partition by dept# order by dept#) rnum from employees;

NAME       RNUM
---- ----------
Fred          1
Tom           2
Hank          3
Ann           4
Ben           5
Mark          6
Jane          7
Bill          8
Lou           1
Bob           2
Paul          3
Lisa          4
Sue           5
Kate          1
Zak           2

15 rows selected.

Rank: We can put a ranking number, ordering by hiredate. (rank() puts a ranking on the “ordered by” column – here it is the hiredate). Number 1 is the first person hired.

select name, hiredate, rank() over (order by hiredate) hire_rank from employees;

NAME HIREDATE   HIRE_RANK
---- --------- ----------
Lou  23-MAR-00          1
Jane 23-MAR-00          1
Zak  20-APR-03          3
Bill 01-JUN-04          4
Ann  12-AUG-05          5
Sue  13-JUN-07          6
Tom  11-SEP-07          7
Kate 11-SEP-07          7
Hank 21-MAR-08          9
Paul 13-APR-08         10
Bob  10-SEP-08         11
Mark 05-NOV-08         12
Fred 12-SEP-09         13
Ben  10-AUG-10         14
Lisa 13-MAY-11         15

15 rows selected.

Dense_Rank: Note that in this listing there are 2 people in ranking 1 (hired on the same day) and then the next person has ranking of 3.
We can ensure that this gap does not occur by using “dense_rank”:

select name, hiredate, dense_rank() over (order by hiredate) hire_rank from employees;

NAME HIREDATE   HIRE_RANK
---- --------- ----------
Lou  23-MAR-00          1
Jane 23-MAR-00          1
Zak  20-APR-03          2
Bill 01-JUN-04          3
Ann  12-AUG-05          4
Sue  13-JUN-07          5
Tom  11-SEP-07          6
Kate 11-SEP-07          6
Hank 21-MAR-08          7
Paul 13-APR-08          8
Bob  10-SEP-08          9
Mark 05-NOV-08         10
Fred 12-SEP-09         11
Ben  10-AUG-10         12
Lisa 13-MAY-11         13

15 rows selected.

Lead/Lag: We can show for each employee what the next salary step and the previous salary step are within each grade:

select name, grade, salary, lead(salary, 1, 999999) over (partition by grade order by salary) next_step,
lag(salary, 1, 0) over (partition by grade order by salary) previous_step from employees;

NAME GRADE     SALARY  NEXT_STEP PREVIOUS_STEP
---- ----- ---------- ---------- -------------
Bill A          60000      67000             0
Ann  A          67000      70000         60000
Sue  A          70000      82000         67000
Zak  A          82000      90000         70000
Lou  A          90000     999999         82000
Kate B          21000      22000             0
Bob  B          22000      50000         21000
Fred B          50000      55000         22000
Mark B          55000     999999         50000
Paul C          20000      23000             0
Lisa C          23000      24000         20000
Hank C          24000      30000         23000
Tom  C          30000      35000         24000
Ben  C          35000      50000         30000
Jane C          50000     999999         35000

15 rows selected.

Note that we use “lead” which means the row next to the current row being processed (here as we have ordered by salary the next row will be the next highest salary – within grade (partition by grade)). Similarly “lag” which means the row previous to the one being processed.

First_Value/Last_Value: Show the difference for each employee between their salary and the maximum and minimum in their grade:

select name, salary, grade, salary-first_value(salary) over (partition by grade order by salary) diff_min,
last_value(salary) over (partition by grade order by salary rows between unbounded
preceding and unbounded following)-salary diff_max from employees;

NAME     SALARY GRADE   DIFF_MIN   DIFF_MAX
---- ---------- ----- ---------- ----------
Bill      60000 A              0      30000
Ann       67000 A           7000      23000
Sue       70000 A          10000      20000
Zak       82000 A          22000       8000
Lou       90000 A          30000          0
Kate      21000 B              0      34000
Bob       22000 B           1000      33000
Fred      50000 B          29000       5000
Mark      55000 B          34000          0
Paul      20000 C              0      30000
Lisa      23000 C           3000      27000
Hank      24000 C           4000      26000
Tom       30000 C          10000      20000
Ben       35000 C          15000      15000
Jane      50000 C          30000          0

15 rows selected.

Note that you need to put “rows between unbounded preceding and unbounded following” in the last_value clause as otherwise the “last_value” is only from the first row processed up to the current row (ie the default is “rows between unbounded preceding and current row”).

Current Row (running total): To show how many employees there were already in the company when we hired each new employee:

select name, hiredate, count(*) over (order by hiredate rows between unbounded preceding and current row)-1 existing_emps from employees;

NAME HIREDATE  EXISTING_EMPS
---- --------- -------------
Lou  23-MAR-00             0
Jane 23-MAR-00             1
Zak  20-APR-03             2
Bill 01-JUN-04             3
Ann  12-AUG-05             4
Sue  13-JUN-07             5
Tom  11-SEP-07             6
Kate 11-SEP-07             7
Hank 21-MAR-08             8
Paul 13-APR-08             9
Bob  10-SEP-08            10
Mark 05-NOV-08            11
Fred 12-SEP-09            12
Ben  10-AUG-10            13
Lisa 13-MAY-11            14

15 rows selected.

This is a running total of the employees in the company.

Ratio_To_Report: Let’s look at the proportion that each person’s salary is contributing to the total salaries for each department (ie the relative cost of each employee):

SELECT name, dept#, salary,
ROUND (ratio_to_report (salary) OVER (PARTITION BY dept#) * 100) Sal_percent
FROM employees
ORDER BY dept#, salary;

NAME      DEPT#     SALARY SAL_PERCENT
---- ---------- ---------- -----------
Hank         20      24000           6
Tom          20      30000           8
Ben          20      35000           9
Jane         20      50000          13
Fred         20      50000          13
Mark         20      55000          15
Bill         20      60000          16
Ann          20      67000          18
Paul         30      20000           9
Bob          30      22000          10
Lisa         30      23000          10
Sue          30      70000          31
Lou          30      90000          40
Kate         40      21000          20
Zak          40      82000          80

15 rows selected.

AVG: Give me a list of all employees who are earning above the average salary in their department:

SELECT name, salary, dept#
FROM (SELECT name, salary, dept#,
salary - AVG (salary) OVER (PARTITION BY dept#) salary_diff
FROM employees) salaries
WHERE salaries.salary_diff > 0;

NAME     SALARY      DEPT#
---- ---------- ----------
Fred      50000         20
Ann       67000         20
Mark      55000         20
Jane      50000         20
Bill      60000         20
Lou       90000         30
Sue       70000         30
Zak       82000         40

8 rows selected.

Grouping Sets: We would like to look at the total salaries by department and by grade. We could use a standard “group by”:
select dept#, grade, sum(salary) from employees group by dept#, grade;
Or use a more powerful “grouping sets” function:

select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade));

DEPT# GRADE SUM(SALARY)
---------- ----- -----------
20 A          127000
20 B          105000
20 C          139000
30 A          160000
30 B           22000
30 C           43000
40 A           82000
40 B           21000

8 rows selected.

If we would like to also show the sum for each department (irrespective of grade):
select dept#, grade, sum(salary) from employees group by dept#, grade UNION
select dept#, null, sum(salary) from employees group by dept#;
This is the same as the more efficient version below:

select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#);

DEPT# GRADE SUM(SALARY)
---------- ----- -----------
20 A          127000
20 B          105000
20 C          139000
20            371000
30 A          160000
30 B           22000
30 C           43000
30            225000
40 A           82000
40 B           21000
40            103000

11 rows selected.

And to finally include the grand total of all salaries:

select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#, ());

DEPT# GRADE SUM(SALARY)
---------- ----- -----------
20 A          127000
20 B          105000
20 C          139000
20            371000
30 A          160000
30 B           22000
30 C           43000
30            225000
40 A           82000
40 B           21000
40            103000
699000

12 rows selected.

Grouping: Now the problem is you can see that the Subtotals and Totals have null values in the columns that are being summed over (for example see the last 2 rows of the output in the previous example).
To solve this we can use the function ‘Grouping’, in this way:

select decode(grouping(dept#),1,'All Depts', dept#) dept,
decode(grouping(grade),1, 'All Grades',grade) grade, sum(salary)
from employees group by grouping sets((dept#, grade), dept#, ());

DEPT      GRADE      SUM(SALARY)
--------- ---------- -----------
20        A               127000
20        B               105000
20        C               139000
20        All Grades      371000
30        A               160000
30        B                22000
30        C                43000
30        All Grades      225000
40        A                82000
40        B                21000
40        All Grades      103000
All Depts All Grades      699000

12 rows selected.

Cube: We can use the “Cube” function to group also:

select dept#, grade, sum(salary) from employees group by cube(dept#, grade);

DEPT# GRADE      SUM(SALARY)
---------- ---------- -----------
699000
A               369000
B               148000
C               182000
20                 371000
20 A               127000
20 B               105000
20 C               139000
30                 225000
30 A               160000
30 B                22000
30 C                43000
40                 103000
40 A                82000
40 B                21000

15 rows selected.

This gives us all possible groupings and is the same as:
select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#, grade, ());

Rollup: We can use “rollup” to perform aggregations like the cube but only considering the columns as a hierarchy, eg:

select dept#, grade, sum(salary) from employees group by rollup(dept#, grade);

DEPT# GRADE      SUM(SALARY)
---------- ---------- -----------
20 A               127000
20 B               105000
20 C               139000
20                 371000
30 A               160000
30 B                22000
30 C                43000
30                 225000
40 A                82000
40 B                21000
40                 103000
699000

12 rows selected.

This is the same as
select dept#, grade, sum(salary) from employees group by grouping sets((dept#, grade), dept#, ());

Grouping_ID: when we carry out a lot of grouping we may want to filter some of the rows to obtain just the totals at a certain level.
This can be done using the “Grouping_ID” function.
Let’s just get the department level summary from the previous query:

select dept#, grade, sum(salary) from employees group by cube(dept#, grade) having grouping_id(dept#, grade)=1;

DEPT# GRADE      SUM(SALARY)
---------- ---------- -----------
20                 371000
30                 225000
40                 103000

To get aggregations only across dept# use:

select dept#, grade, sum(salary) from employees group by cube(dept#, grade) having grouping_id(dept#, grade)=2;

DEPT# GRADE      SUM(SALARY)
---------- ---------- -----------
A               369000
B               148000
C               182000

To get just the total of all, use:

select dept#, grade, sum(salary) from employees group by cube(dept#, grade) having grouping_id(dept#, grade)=3;

DEPT# GRADE      SUM(SALARY)
---------- ---------- -----------
699000

I hope these examples have helped you in some way to become more confident with analytic functions. They may not be functions you use every day, but when the situation calls for them (basically aggregates and group work) they are very handy.

About DBAdmin

Most of what I post here will be specific to the particular system I am currently working on. When I have time I will try to put some generic information too.
This entry was posted in How things work. Bookmark the permalink.

Leave a comment