Step by Step: WHERE, GROUP BY, and HAVING

People new to SQL often have trouble with the difference between WHERE, GROUP BY, and HAVING…three separate clauses in the SELECT statement. The summary of each of these is simple to tell:

  • WHERE lets you limit the rows returned.
  • GROUP BY lets you perform aggregations (like count, sum, average, minimum, and maximum).
  • HAVING lets you limit the rows returned, based on aggregated values.

But that doesn’t really sink in well, so we’ll walk through some examples.  We have a simple table (“Livestock”) that tracks livestock on different farms:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122
The Rocking J Cow Huberta 1620
Lazy River Sheep Murray 194
Lazy River Cow Zeke 1704
Lazy River Goat Bill 52

We can get different things out of it by using WHERE, GROUP BY, and/or HAVING.

IMPORTANT: You can play along at home with this script; it has the table definition, data, and all of the examples below (plus a few). For that matter, if you can’t stand long blog posts, just grab the script and go. I don’t mind.

WHERE

To limit the data I pull back – i.e., to limit what rows I get back – I use WHERE.  Maybe I only want data from The Rocking J. I can do that with WHERE:

SELECT  Farm, AnimalType, AnimalName, AnimalWeightLbs
FROM    dbo.Livestock
WHERE   Farm = 'The Rocking J';

Result:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122
The Rocking J Cow Huberta 1620

If I only want to see sheep, from all farms:

SELECT  Farm, AnimalType, AnimalName, AnimalWeightLbs
FROM    dbo.Livestock
WHERE  AnimalType = 'Sheep';

Result:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122
Lazy River Sheep Murray 194

And if I only want to see data on sheep from The Rocking J:

SELECT  Farm, AnimalType, AnimalName, AnimalWeightLbs
FROM    dbo.Livestock
WHERE   Farm = 'The Rocking J'
        AND AnimalType = 'Sheep';

Result:

Farm AnimalType AnimalName AnimalWeightLbs
The Rocking J Sheep Andrew 165
The Rocking J Sheep Bailey 180
The Rocking J Sheep Caroline 122

 

Conclusion: WHERE lets you limit the rows returned.

GROUP BY

Group by is very different. It lets you divide your data up into groups. We can group our data and not bother to do any aggregations:

SELECT  Farm, AnimalType
FROM    dbo.Livestock
GROUP BY Farm, AnimalType;

All this does for us is to get a distinct list of Farms, and the animals that each farm happens to have:

Farm AnimalType
The Rocking J Sheep
The Rocking J Cow
Lazy River Sheep
Lazy River Cow
Lazy River Goat

(There are other ways to get this, too, but that’s not what we’re talking about right now. And this is a perfectly reasonable use for GROUP BY.)


 

But, what if we want a COUNT of each kind of animal at each farm? Same thing, but add in the built-in function COUNT():

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
FROM    dbo.Livestock
GROUP BY Farm, AnimalType;

Result:

Farm AnimalType AnimalCount
The Rocking J Sheep 3
The Rocking J Cow 1
Lazy River Sheep 1
Lazy River Cow 1
Lazy River Goat 1

 

There! We were able to tell SQL, “I want to make groups, defined by the farm and animal type. And then I want a count of each group’s members.”


 

We could do any aggregation, or any bunch of aggregations, on this group. In addition to the count, let’s get the average weight, maximum weight, and minimum weight of the animals in each group:

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
                , AVG(AnimalWeightLbs) AS AvgWeightLbs
                , MIN(AnimalWeightLbs) AS MinWeightLbs
                , MAX(AnimalWeightLbs) AS MaxWeightLbs
FROM    dbo.Livestock
GROUP BY Farm, AnimalType;

Results:

Farm AnimalType AnimalCount AvgWeightLbs MinWeightLbs MaxWeightLbs
The Rocking J Sheep 3 155 122 180
The Rocking J Cow 1 1620 1620 1620
Lazy River Sheep 1 194 194 194
Lazy River Cow 1 1704 1704 1704
Lazy River Goat 1 52 52 52

Side note: GROUP BY and WHERE

A quick example:

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
                , AVG(AnimalWeightLbs) AS AvgWeightLbs
                , MIN(AnimalWeightLbs) AS MinWeightLbs
                , MAX(AnimalWeightLbs) AS MaxWeightLbs
FROM    dbo.Livestock
WHERE Farm = 'The Rocking J'
GROUP BY Farm, AnimalType;

Don’t get freaked out by GROUP BY and WHERE. Take it step by step: if you had to do this by hand, the simplest thing is to narrow down your data first (WHERE Farm = The Rocking J), and THEN divide the remaining data into groups (GROUP BY Farm, AnimalType).

The resultset from our query above:

Farm AnimalType AnimalCount AvgWeightLbs MinWeightLbs MaxWeightLbs
The Rocking J Cow 1 1620 1620 1620
The Rocking J Sheep 3 155 122 180

HAVING

“Having” works on aggregations.

A query to pull back data on animals that weigh less than 200# would use a WHERE clause: “WHERE AnimalWeightLbs < 200”.

But if I want to pull back data on groups with an average weight less than 200#, I can’t use WHERE, because WHERE won’t work on aggregates. I have to use GROUP BY and HAVING:

SELECT  Farm, AnimalType, COUNT(*) AS AnimalCount
                , AVG(AnimalWeightLbs) AS AvgWeightLbs
FROM    dbo.Livestock
GROUP BY Farm, AnimalType
HAVING AVG(AnimalWeightLbs) < 200;

Result:

Farm AnimalType AnimalCount AvgWeightLbs
The Rocking J Sheep 3 155
Lazy River Sheep 1 194
Lazy River Goat 1 52

Bonus and Conclusion

There are other things to know and play with, of course. (For example, you can GROUP BY and perform HAVING on columns that aren’t actually in your SELECT list.) But that’s info for another time.

So, like it says at the top:

  • WHERE lets you limit the rows returned.
  • GROUP BY lets you perform aggregations (like count, sum, average, minimum, and maximum).
  • HAVING lets you limit the rows returned, based on aggregated values.

I have a blog that expands on these topics – SELECT, Deconstructed – but we really need a targeted introduction here.

Remember to download the script with all the examples, and play with the code yourself. That’s the best way to learn!

Happy days,
Jen McCown
www.MidnightDBA.com/Jen

Practice!

PASS Summit 2016: Recommendations and Invitations (#sqlpass #sqlsummit)

PASS_16_Twibbon_Attending_200x200

The last week of October, we’ll be at the PASS Summit in Seattle, WA.  As some of you already know – and as I said last year and the year before – “this is the highlight of most SQL Server speakers’ years…a sort of educational, Christmassy nerdfest, with a feel of a class reunion mixed in.”  It’s gonna be a blast, again-again, and so here’s some of the stuff you should know about.

Edit for PSA: If you’re the type to get a flu shot, NOW would be a good time. #NerdFlu is a thing. Got to give the vaccination time to take effect!

Beforehand

You really ought to register, you know, and while you’re at it you might want to sign up for an all day pre-conference session (a “precon”).

Get on Twitter to connect with other attendees. #Summit16 is the official hashtag, but most of us tend to use the more general #sqlpass.

24 Hours of PASS – 24 hours of live, free webcasts – is September 7.

Taking part, if you can’t attend

The keynotes and luncheons are usually live broadcast, and you can order session recordings to cuddle up with at home. I suppose you can put on a kilt, go hit a karaoke bar with friends, drink too much, and contract the nerd flu, if it’ll make you feel more like you were at Summit. That’s what I’d do.

AND! PASS has announced that they will be live streaming keynotes, interviews, sessions, and lightning talks on PASSTV next week!

Some of the best stuff, if you’re there

I’ll update this section as new stuff is announced.

Monday

Right before the Summit starts:

  • There is no official #SQLLongRun this year that I know of, but I’m thinking of doing my own personal long run the Monday of the Summit. So, you know, that’s a thing.
  • Monday, October 24  (5-7pm), have a meetup with the MidnightDBAs at Top Pot Doughnuts on 5th Ave. Yes, it’s donuts for dinner, and we’re fine with that. Or consider it a sweet, sweet appetizer. Register now!
  • There’s also the Monday Night Networking Event that Steve Jones and Andy Warren put on. Sign up for a 15 minute spot and get in there!

Tuesday

During the Summit:

W-Th-F

There are always more events, so keep an eye on Twitter (on the official #sqlsummit hashtag!) and on the Summit website for karaoke, Kilt Day, and more.

Happy days,
Jen McCown
http://www.MidnightDBA.com/Jen

P.S. For those who missed something in particular from this announcement, email or DM or email me with your questions. Apologies to the rest of you for the cryptic postscript.

October precon in Orlando: “The Lifecycle Approach: Maintenance Problems & Solutions”

Florida!On October 7 we will be presenting a full day of maintenance and backup learnin’!

Register here: https://www.eventbrite.com/e/sql-saturday-564-the-lifecycle-approach-maintenance-problems-solutions-tickets-26867268724

And while you’re at it, register for the free SQL Saturday Orlando that follows! http://www.sqlsaturday.com/564/eventhome.aspx

Abstract for “The Lifecycle Approach”

Database maintenance isn’t a single event; it’s a lifecycle. Aside from the maintenance event itself, there’s also reporting, error management, specialized configs, HA/DR and more.  In this full day session, we’ll explore the many issues surrounding maintenance and how to solve them in a way that is cohesive and comprehensive.  If you’re managing more than one server, come to this session.

Learn how to:

  • Deploy maintenance routines to hundreds of servers, effortlessly
  • Standardize configuration throughout the enterprise
  • Set up custom alerts and reports.
  • Manage HA/DR failover scenarios and keep your maintenance intact.
  • Minimize the number of jobs
  • Tune backups and database integrity checks

For maintenance, you’ll learn how to:

  • Order index operations to maintain the most important tables first
  • Maximize your maintenance window by fine-tuning your routine.
  • Handle the most difficult CheckDB scenarios with dynamic custom snapshots, multi-threading and CheckDB offloading.

And for backups specifically, you’ll learn to:

  • Automate management of failed backups
  • Ensure that certificates and encrypted passwords are securely backed up
  • Automate configuration changes between HA/DR sites, in case of failover
  • Turn trace flags on or off automatically

Real news, real tech.