For this tutorial, let’s take a look at some tables which contain information about a group of 8 pupils at a school, and their respective test scores
| pupil_id | pupil_name | score |
| 1 | Alan | 95 |
| 2 | Billy | 64 |
| 3 | Craig | 77 |
| 4 | David | 54 |
| 5 | Edward | 88 |
| 6 | Fred | 59 |
| 7 | George | 44 |
| 8 | Harry | 59 |
Here you can see a simple table of pupil names and their respective scores in a test. To load up this data yourself, into a MySQL file, run the following SQL command in your management software. This will delete any existing table named “pupil_scores”
DROP TABLE IF EXISTS `pupil_scores`; CREATE TABLE IF NOT EXISTS `pupil_scores` ( `pupil_id` int(11) NOT NULL AUTO_INCREMENT, `score` int(11) NOT NULL, `pupil_name` varchar(64) NOT NULL, PRIMARY KEY (`pupil_id`), KEY `score` (`score`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ; INSERT INTO `pupil_scores` (`pupil_id`, `score`, `pupil_name`) VALUES (1, 95, 'Alan'), (2, 64, 'Billy'), (3, 77, 'Craig'), (4, 54, 'David'), (5, 88, 'Edward'), (6, 59, 'Fred'), (7, 44, 'George'), (8, 59, 'Harry');
Let’s take a look at a couple of SQL queries which will show us some interesting data about these pupils and their test scores
SELECT AVG(score) FROM pupil_scores;
This gives us
AVG(score) 67.5000
Which tells us the the average pupil score is, indeed, 67.5. Here, we are using the MySQL function AVG (average) to specify that we wish to return the column “score”, but instead of returining all values for “score”, we wish to return only the average value of “score”. A similar function is SUM()
SELECT SUM(score) FROM pupil_scores;
Producing the following, the SUM total of all the values held in the “score” column (although not of great value to us, since there’s no useful reason for knowing the total score over all the pupils)
SUM(score) 540
Notice how the column names “AVG(score)” or “SUM(score)” have been automatically generated based on what we are trying to select? It’s possible to generate custom column names when selecting using the AS keyword, as the following example shows
SELECT MIN(score) AS lowest, MAX(score) AS highest FROM pupil_scores;
Which gives us
lowest highest 44 95
This tells us that the lowest score of all the pupils is 44, and the highest is 95. See here how we’ve defined custom column names, “lowest” and “highest”.
Now, we’re going to take a look at some more complex data. What if we wanted to generate reports based on multiple scores per pupil, over several subjects? We can use the technique discussed in the previous tutorial, to JOIN tables together so that we can perform single queries over several tables to retrieve our desired values. We’ll also see how we can use the GROUP BY statement to organise our results to fit our needs
Pupils table
| pupil_id | pupil_name |
| 1 | Alan |
| 2 | Billy |
| 3 | Craig |
| 4 | David |
| 5 | Edward |
| 6 | Fred |
| 7 | George |
| 8 | Harry |
Test score information
This table contains references to pupil_id and subject_id so we can store multiple scores for each pupil. For example, pupil 1 (Alan) scored 78% in subject 2 (Science). This is known as a many-to-many relationship. One pupil “has many” subjects, and one subject “has many” pupiles
| pupil_id | subject_id | score |
| 1 | 1 | 94 |
| 1 | 2 | 78 |
| 1 | 3 | 81 |
| 2 | 2 | 74 |
| 2 | 4 | 76 |
| 3 | 1 | 63 |
| 3 | 3 | 57 |
| 4 | 2 | 72 |
| 5 | 1 | 92 |
| 5 | 2 | 91 |
| 5 | 4 | 82 |
| 6 | 1 | 84 |
| 6 | 2 | 39 |
| 6 | 4 | 66 |
| 7 | 2 | 43 |
| 7 | 4 | 35 |
| 8 | 3 | 55 |
Subjects table
| subject_id | subject_name |
| 1 | Maths |
| 2 | Science |
| 3 | French |
| 4 | Geography |
So that you can follow along with the examples in this tutorial, and experiment on the values for yourself, run the following MySQL code in your MySQL management software. This will erase any existing tables named “pupils”, “subjects” or “pupils_subjects”
DROP TABLE IF EXISTS `pupils`; CREATE TABLE IF NOT EXISTS `pupils` ( `pupil_id` int(11) NOT NULL AUTO_INCREMENT, `pupil_name` varchar(64) NOT NULL, PRIMARY KEY (`pupil_id`) ) TYPE=InnoDB AUTO_INCREMENT=9 ; INSERT INTO `pupils` (`pupil_id`, `pupil_name`) VALUES (1, 'Alan'), (2, 'Billy'), (3, 'Craig'), (4, 'David'), (5, 'Edward'), (6, 'Fred'), (7, 'George'), (8, 'Harry'); DROP TABLE IF EXISTS `pupils_subjects`; CREATE TABLE IF NOT EXISTS `pupils_subjects` ( `pupil_id` int(11) NOT NULL, `subject_id` int(11) NOT NULL, `score` smallint(6) NOT NULL, KEY `pupil_id` (`pupil_id`,`subject_id`,`score`) ) TYPE=InnoDB; INSERT INTO `pupils_subjects` (`pupil_id`, `subject_id`, `score`) VALUES (1, 1, 94), (1, 2, 78), (1, 3, 81), (2, 2, 74), (2, 4, 76), (3, 1, 63), (3, 3, 57), (4, 2, 72), (5, 1, 92), (5, 2, 91), (5, 4, 82), (6, 1, 84), (6, 2, 39), (6, 4, 66), (7, 2, 43), (7, 4, 35), (8, 3, 55); DROP TABLE IF EXISTS `subjects`; CREATE TABLE IF NOT EXISTS `subjects` ( `subject_id` int(11) NOT NULL AUTO_INCREMENT, `subject_name` varchar(64) NOT NULL, PRIMARY KEY (`subject_id`) ) TYPE=InnoDB AUTO_INCREMENT=5 ; INSERT INTO `subjects` (`subject_id`, `subject_name`) VALUES (1, 'Maths'), (2, 'Science'), (3, 'French'), (4, 'Geography');
first of all, let’s look at an overview of the test results, and see what the average score for each pupil is. We’re going to want to ask “Show me a list of all pupils, alongside their average test scores”
SELECT pupil_name, AVG(score) FROM pupils, pupils_subjects WHERE pupils.pupil_id = pupils_subjects.pupil_id GROUP BY pupil_name
This gives us the results
pupil_name AVG( score ) Alan 84.3333 Billy 75.0000 Craig 60.0000 David 72.0000 Edward 88.3333 Fred 63.0000 George 39.0000 Harry 55.0000
Let’s take a look at what’s going on in this SQL query
- SELECT pupil_name, AVG(score) - These are the two columns we want to return in the result set. “pupil_name” is straightforward enough. We want to return the value within the “pupil_name” column. But we have something different here, AVG(score). Instead of returning the “score” value from the “pupils_subjects” table, we want to return the average of the scores values. But a question might arise here - do we want to return one value, the average of all of them or several values, the averages of individual groups? Let’s take a look at our second new keyword, GROUP BY, which will let us define this further
- WHERE pupils.pupil_id = pupils_subjects.pupil_id - Since we’re performing our aggregate query over more than one table, we need to define a JOIN condition which defines how we wish to combine our tables. In this case, we want each value from the “pupils” table to be combined with each value from the “pupils_subject” table where pupil_id is equal in both records.
- GROUP BY pupil_name - The query above will not work if we don’t include a GROUP BY clause. The reason for this is, as mentioned previously, we must know what groups we’re calculating the average for. If we didn’t specify a GROUP BY condition, then MySQL would report an error, since it is trying to return a list of pupil names, while at the same time trying to calculate an average based on an unknown condition. In this case, we want to group by the pupil names. This means that for each pupil, we retrieve all of their “score” values, and then return the average value for that pupil’s test scores. This average value is then returned as a new field - AVG(score) - in the results set
To show how this works in a different case, here’s something slightly different, “Show me all the subjects, and the average test score for each”. Here, we’re looking for the average scores grouped by subject names, instead of pupil names
SELECT subject_name, AVG(score) FROM subjects, pupils_subjects WHERE subjects.subject_id = pupils_subjects.subject_id GROUP BY subject_name
subject_name AVG( score ) French 64.3333 Geography 64.7500 Maths 83.2500 Science 66.1667
Another useful keyword we can use when working with GROUP BY statements is WITH ROLLUP. This keyword allows us to calculate the overall average value of all the previously returned average values. For example
SELECT subject_name, AVG(score) FROM subjects, pupils_subjects WHERE subjects.subject_id = pupils_subjects.subject_id GROUP BY subject_name WITH ROLLUP
gives us
subject_name AVG( score ) French 64.3333 Geography 64.7500 Maths 83.2500 Science 66.1667 NULL 69.5294
The final column (with a subject_name of NULL, since no subject is associated with this value) returns an average score of 69.529% over all the subjects.
Let’s now try something different. How about if we want to know how each pupil has done throughout their tests, and see the minimum and maximum scores they achieved? Let’s now ask “I want to see a list of all pupils alongside the lowest and the highest scores they achieved”
SELECT pupil_name, MIN(score), MAX(score) FROM pupils, pupils_subjects WHERE pupils.pupil_id = pupils_subjects.pupil_id GROUP BY pupil_name
This SQL query gives us
pupil_name MIN(score) MAX(score) Alan 78 94 Billy 74 76 Craig 57 63 David 72 72 Edward 82 92 Fred 39 84 George 35 43 Harry 55 55
And how about if we’d like to know which pupil has the biggest deviation in their test scores? We could use this information to find pupils that may be struggling in a specific subject
SELECT pupil_name, MIN(score) AS lowest, MAX(score) AS highest, MAX( score ) - MIN( score ) AS Difference FROM pupils, pupils_subjects WHERE pupils.pupil_id = pupils_subjects.pupil_id GROUP BY pupil_name
pupil_name lowest highest Difference Alan 78 94 16 Billy 74 76 2 Craig 57 63 6 David 72 72 0 Edward 82 92 10 Fred 39 84 45 George 35 43 8 Harry 55 55 0
Here, the column “Difference” gives us the difference between the highest and the lowest scores in the table. We can specify a custom name for the columns by use of the AS keyword. See how we’re also renaming the MIN and MAX columns to “lowest” and “highest” respectively
We can put together all the MySQL methods we’ve used so far to create an overall report of the test scores, with a few useful columns
SELECT pupil_name, subject_name, score, IF (score>70,"A",IF(score>60,"B",IF(score>50,"C", "FAIL"))) AS grade FROM pupils JOIN pupils_subjects ON (pupils.pupil_id = pupils_subjects.pupil_id) JOIN subjects ON (subjects.subject_id = pupils_subjects.subject_id) GROUP BY pupil_name, subject_name
pupil_name subject_name score grade Alan French 81 A Alan Maths 94 A Alan Science 78 A Billy Geography 76 A Billy Science 74 A Craig French 57 C Craig Maths 63 B David Science 72 A Edward Geography 82 A Edward Maths 92 A Edward Science 91 A Fred Geography 66 B Fred Maths 84 A Fred Science 39 FAIL George Geography 35 FAIL George Science 43 FAIL Harry French 55 C
Here’s a brief rundown of some of the more complex things we’re doing here
- Multiple tables - Here, we’re making use of all 3 tables, via the JOIN statements, so that we can create one result set containing pupil names, subject names and the scores each pupil received for each subject
- IF (score>70,”A”,IF(score>60,”B”,IF(score>50,”C”, “FAIL”))) as grade - Here, we’re making use of the fact that MySQL can SELECT a wide range of things, including conditional statements. Here, we’re using the MySQL IF statement, which has the format IF (expr1, expr2, expr3) which means, “IF expr1 is true, return expr2, else return expr3″. Here, expr2 and expr3 can either be literal string/number values or further IF statements. Here, we’re saying “If a pupil scores more than 70, they get an A, if they get more than 60, a B. More than 50 gets a C. Anything else is a ‘fail’”
- GROUP BY pupil_name, subject_name - We are grouping using two columns instead of one. There’s a good reason for this. If we tried to group by pupil_name only, or subject_name only, MySQL would attempt to condense all the results down to 1 result for each pupil/subject. We don’t want that, instead we want to see multiple pupil names and multiple subject names, but group in a specific way. By using two GROUP parameters, we are saying “I want to see the results by pupil, and also by subject for each pupil
Hopefully this tutorial has given you a good idea of how aggregate functions in MySQL can be used to produce useful report data. You should be able to see how it can also be applied to other tables of data, like sales reports for a specific year for example, or analytical reports on various population data. Don’t worry if it all seems a bit much for you, MySQL has a wide range of tools available for performing calculations like this, and it will take a little time to get used to how it all works.
The best thing is to just take it slow and experiment with the tables I’ve provided you, see what other interesting reports you can generate




October 17th, 2008 at 8:16 pm
Thanks for giving such information to me.
I Wish U All The Best in each and every Field.
Mustafa
November 6th, 2008 at 1:53 am
Fantastic read, really interesting stuff, thanks!
December 21st, 2008 at 12:28 am
Hi, I am trying to list 14 committees, of which those in non-leadership role are initially sorted into 3 categories. I have been reading up on mysql, count, rollup, having etc. If there is only record in the Deacon category of a committee, I want the title to be singular (ie, Deacon). If there is more than one deacon, I want the title to be plural (ie, Deacons). For this, I reckon I need to be able to count how many are in that category per committee before I echo the title.
Found your article helpful as it helps me to structure my own thoughts.
Thanks very much,
Calum