Step by Step: WHERE, GROUP BY, and HAVING

People new to SQL often have trouble with the difference between WHERE, GROUP BY, and HAVING…three separate clauses in the SELECT statement. The summary of each of these is simple to tell:

  • WHERE lets you limit the rows returned.
  • GROUP BY lets you perform aggregations (like count, sum, average, minimum, and maximum).
  • HAVING lets you limit the rows returned, based on aggregated values.

But that doesn’t really sink in well, so we’ll walk through some examples.  We have a simple table (“Livestock”) that tracks livestock on different farms:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122
The Rocking J Cow Huberta 1620
Lazy River Sheep Murray 194
Lazy River Cow Zeke 1704
Lazy River Goat Bill 52

We can get different things out of it by using WHERE, GROUP BY, and/or HAVING.

IMPORTANT: You can play along at home with this script; it has the table definition, data, and all of the examples below (plus a few). For that matter, if you can’t stand long blog posts, just grab the script and go. I don’t mind.

WHERE

To limit the data I pull back – i.e., to limit what rows I get back – I use WHERE.  Maybe I only want data from The Rocking J. I can do that with WHERE:

SELECT  Farm, AnimalType, AnimalName, AnimalWeightLbs
FROM    dbo.Livestock
WHERE   Farm = 'The Rocking J';

Result:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122
The Rocking J Cow Huberta 1620

If I only want to see sheep, from all farms:

SELECT  Farm, AnimalType, AnimalName, AnimalWeightLbs
FROM    dbo.Livestock
WHERE  AnimalType = 'Sheep';

Result:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122
Lazy River Sheep Murray 194

And if I only want to see data on sheep from The Rocking J:

SELECT  Farm, AnimalType, AnimalName, AnimalWeightLbs
FROM    dbo.Livestock
WHERE   Farm = 'The Rocking J'
        AND AnimalType = 'Sheep';

Result:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122

 

Conclusion: WHERE lets you limit the rows returned.

GROUP BY

Group by is very different. It lets you divide your data up into groups. We can group our data and not bother to do any aggregations:

SELECT  Farm, AnimalType
FROM    dbo.Livestock
GROUP BY Farm, AnimalType;

All this does for us is to get a distinct list of Farms, and the animals that each farm happens to have:

Farm AnimalType
The Rocking J Sheep
The Rocking J Cow
Lazy River Sheep
Lazy River Cow
Lazy River Goat

(There are other ways to get this, too, but that’s not what we’re talking about right now. And this is a perfectly reasonable use for GROUP BY.)


 

But, what if we want a COUNT of each kind of animal at each farm? Same thing, but add in the built-in function COUNT():

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
FROM    dbo.Livestock
GROUP BY Farm, AnimalType;

Result:

Farm AnimalType AnimalCount
The Rocking J Sheep 3
The Rocking J Cow 1
Lazy River Sheep 1
Lazy River Cow 1
Lazy River Goat 1

 

There! We were able to tell SQL, “I want to make groups, defined by the farm and animal type. And then I want a count of each group’s members.”


 

We could do any aggregation, or any bunch of aggregations, on this group. In addition to the count, let’s get the average weight, maximum weight, and minimum weight of the animals in each group:

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
                , AVG(AnimalWeightLbs) AS AvgWeightLbs
                , MIN(AnimalWeightLbs) AS MinWeightLbs
                , MAX(AnimalWeightLbs) AS MaxWeightLbs
FROM    dbo.Livestock
GROUP BY Farm, AnimalType;

Results:

Farm AnimalType AnimalCount AvgWeightLbs MinWeightLbs MaxWeightLbs
The Rocking J Sheep 3 155 122 180
The Rocking J Cow 1 1620 1620 1620
Lazy River Sheep 1 194 194 194
Lazy River Cow 1 1704 1704 1704
Lazy River Goat 1 52 52 52

Side note: GROUP BY and WHERE

A quick example:

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
                , AVG(AnimalWeightLbs) AS AvgWeightLbs
                , MIN(AnimalWeightLbs) AS MinWeightLbs
                , MAX(AnimalWeightLbs) AS MaxWeightLbs
FROM    dbo.Livestock
WHERE Farm = 'The Rocking J'
GROUP BY Farm, AnimalType;

Don’t get freaked out by GROUP BY and WHERE. Take it step by step: if you had to do this by hand, the simplest thing is to narrow down your data first (WHERE Farm = The Rocking J), and THEN divide the remaining data into groups (GROUP BY Farm, AnimalType).

The resultset from our query above:

Farm AnimalType AnimalCount AvgWeightLbs MinWeightLbs MaxWeightLbs
The Rocking J Cow 1 1620 1620 1620
The Rocking J Sheep 3 155 122 180

HAVING

“Having” works on aggregations.

A query to pull back data on animals that weigh less than 200# would use a WHERE clause: “WHERE AnimalWeightLbs < 200”.

But if I want to pull back data on groups with an average weight less than 200#, I can’t use WHERE, because WHERE won’t work on aggregates. I have to use GROUP BY and HAVING:

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
                , AVG(AnimalWeightLbs) AS AvgWeightLbs
FROM    dbo.Livestock
GROUP BY Farm, AnimalType
HAVING AVG(AnimalWeightLbs) < 200;

Result:

Farm AnimalType AnimalCount AvgWeightLbs
The Rocking J Sheep 3 155
Lazy River Sheep 1 194
Lazy River Goat 1 52

Bonus and Conclusion

There are other things to know and play with, of course. (For example, you can GROUP BY and perform HAVING on columns that aren’t actually in your SELECT list.) But that’s info for another time.

So, like it says at the top:

  • WHERE lets you limit the rows returned.
  • GROUP BY lets you perform aggregations (like count, sum, average, minimum, and maximum).
  • HAVING lets you limit the rows returned, based on aggregated values.

I have a blog that expands on these topics – SELECT, Deconstructed – but we really need a targeted introduction here.

Remember to download the script with all the examples, and play with the code yourself. That’s the best way to learn!

Happy days,
Jen McCown
www.MidnightDBA.com/Jen

Practice!