Monday, July 20, 2009

How to use GROUP BY and HAVING clause to aggregate rows

The number 1 problem for newbies using SQL is the requirement to think in terms of sets. For someone who primarily uses programming languages, such as Java or C++, they are used to thinking in terms of individual operations.

Example: If you have a list of objects that you need to process you will do this by using a loop, and go through each object in the list.

However in SQL the way to do a loop is by using a cursor. This literally goes through all rows, one row at a time. Problem is SQL is not designed to be used like this, so it's totally inefficient!

In SQL whenever you need to aggregate rows and perform calculations on them as a whole you can use the GROUP BY and HAVING clauses.

So what are these?

GROUP BY: You group by all columns that are not aggregate list. For example, lets say you want to calculate the average age of all people with the same last name.
You would use this:
SELECT LastName, AVG(Age) as [Average Age]
FROM People
GROUP BY LastName

And the results would be like:
LastName Average Age
Smith 20
Obama 40
Bush 50

Here's a good guide for additional help: http://www.w3schools.com/sql/sql_groupby.asp

HAVING: This is used for when you want to only get groups that fit a certain criteria. For example, you want to get a list of all last name's for which the average age is > 20. You would use the following:

SELECT LastName, AVG(Age) as [Average Age] FROM People
GROUP BY LastName
HAVING AVG(Age) > 20

Using the previous example's query, the results would be:
LastName Average Age
Obama 40
Bush 50

*Notice how Smith 20 is missing? :)

No comments:

Post a Comment

There was an error in this gadget