Sponsored Links :
  • Generating simple reports from a table

    For this tutorial, let’s take a look at some tables which contain information about a group of 8 pupils at a school, and their respective test scores

    pupil_id pupil_name score
    1 Alan 95
    2 Billy 64
    3 Craig 77
    4 David 54
    5 Edward 88
    6 Fred 59
    7 George 44
    8 Harry 59

    Here you can see a simple table of pupil names and their respective scores in a test. To load up this data yourself, into a MySQL file, run the following SQL command in your management software. This will delete any existing table named “pupil_scores”

    DROP TABLE IF EXISTS `pupil_scores`;
    CREATE TABLE IF NOT EXISTS `pupil_scores` (
      `pupil_id` int(11) NOT NULL AUTO_INCREMENT,
      `score` int(11) NOT NULL,
      `pupil_name` varchar(64) NOT NULL,
      PRIMARY KEY  (`pupil_id`),
      KEY `score` (`score`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;
     
    INSERT INTO `pupil_scores` (`pupil_id`, `score`, `pupil_name`) VALUES
    (1, 95, 'Alan'), (2, 64, 'Billy'), (3, 77, 'Craig'), (4, 54, 'David'),
    (5, 88, 'Edward'), (6, 59, 'Fred'), (7, 44, 'George'), (8, 59, 'Harry');

    Let’s take a look at a couple of SQL queries which will show us some interesting data about these pupils and their test scores

    SELECT AVG(score) FROM pupil_scores;

    This gives us

    AVG(score)
    67.5000

    Which tells us the the average pupil score is, indeed, 67.5. Here, we are using the MySQL function AVG (average) to specify that we wish to return the column “score”, but instead of returining all values for “score”, we wish to return only the average value of “score”. A similar function is SUM()

    SELECT SUM(score) FROM pupil_scores;

    Producing the following, the SUM total of all the values held in the “score” column (although not of great value to us, since there’s no useful reason for knowing the total score over all the pupils)

    SUM(score)
    540

    Notice how the column names “AVG(score)” or “SUM(score)” have been automatically generated based on what we are trying to select? It’s possible to generate custom column names when selecting using the AS keyword, as the following example shows

    SELECT MIN(score) AS lowest, MAX(score) AS highest FROM pupil_scores;

    Which gives us

    lowest 	highest
    44 		95

    This tells us that the lowest score of all the pupils is 44, and the highest is 95. See here how we’ve defined custom column names, “lowest” and “highest”.

    Performing analysis on multiple tables

    Now, we’re going to take a look at some more complex data. What if we wanted to generate reports based on multiple scores per pupil, over several subjects? We can use the technique discussed in the previous tutorial, to JOIN tables together so that we can perform single queries over several tables to retrieve our desired values. We’ll also see how we can use the GROUP BY statement to organise our results to fit our needs

    Pupils table

    pupil_id pupil_name
    1 Alan
    2 Billy
    3 Craig
    4 David
    5 Edward
    6 Fred
    7 George
    8 Harry

    Test score information

    This table contains references to pupil_id and subject_id so we can store multiple scores for each pupil. For example, pupil 1 (Alan) scored 78% in subject 2 (Science). This is known as a many-to-many relationship. One pupil “has many” subjects, and one subject “has many” pupiles

    pupil_id subject_id score
    1 1 94
    1 2 78
    1 3 81
    2 2 74
    2 4 76
    3 1 63
    3 3 57
    4 2 72
    5 1 92
    5 2 91
    5 4 82
    6 1 84
    6 2 39
    6 4 66
    7 2 43
    7 4 35
    8 3 55

    Subjects table

    subject_id subject_name
    1 Maths
    2 Science
    3 French
    4 Geography

    So that you can follow along with the examples in this tutorial, and experiment on the values for yourself, run the following MySQL code in your MySQL management software. This will erase any existing tables named “pupils”, “subjects” or “pupils_subjects”

    DROP TABLE IF EXISTS `pupils`;
    CREATE TABLE IF NOT EXISTS `pupils` (
      `pupil_id` int(11) NOT NULL AUTO_INCREMENT,
      `pupil_name` varchar(64) NOT NULL,
      PRIMARY KEY  (`pupil_id`)
    ) TYPE=InnoDB  AUTO_INCREMENT=9 ;
     
    INSERT INTO `pupils` (`pupil_id`, `pupil_name`) VALUES
    (1, 'Alan'), (2, 'Billy'), (3, 'Craig'),
    (4, 'David'), (5, 'Edward'), (6, 'Fred'),
    (7, 'George'), (8, 'Harry');
     
    DROP TABLE IF EXISTS `pupils_subjects`;
    CREATE TABLE IF NOT EXISTS `pupils_subjects` (
      `pupil_id` int(11) NOT NULL,
      `subject_id` int(11) NOT NULL,
      `score` smallint(6) NOT NULL,
      KEY `pupil_id` (`pupil_id`,`subject_id`,`score`)
    ) TYPE=InnoDB;
     
    INSERT INTO `pupils_subjects` (`pupil_id`, `subject_id`, `score`) VALUES
    (1, 1, 94), (1, 2, 78), (1, 3, 81), (2, 2, 74), (2, 4, 76),
    (3, 1, 63), (3, 3, 57), (4, 2, 72), (5, 1, 92), (5, 2, 91),
    (5, 4, 82), (6, 1, 84), (6, 2, 39), (6, 4, 66), (7, 2, 43),
    (7, 4, 35), (8, 3, 55);
     
    DROP TABLE IF EXISTS `subjects`;
    CREATE TABLE IF NOT EXISTS `subjects` (
      `subject_id` int(11) NOT NULL AUTO_INCREMENT,
      `subject_name` varchar(64) NOT NULL,
      PRIMARY KEY  (`subject_id`)
    ) TYPE=InnoDB  AUTO_INCREMENT=5 ;
     
    INSERT INTO `subjects` (`subject_id`, `subject_name`) VALUES
    (1, 'Maths'), (2, 'Science'), (3, 'French'), (4, 'Geography');

    Finding Average Values

    first of all, let’s look at an overview of the test results, and see what the average score for each pupil is. We’re going to want to ask “Show me a list of all pupils, alongside their average test scores”

    SELECT pupil_name, AVG(score) FROM pupils, pupils_subjects
    WHERE pupils.pupil_id = pupils_subjects.pupil_id GROUP BY pupil_name

    This gives us the results

    pupil_name		AVG( score )
    Alan			84.3333
    Billy			75.0000
    Craig			60.0000
    David			72.0000
    Edward		88.3333
    Fred			63.0000
    George		39.0000
    Harry			55.0000

    Let’s take a look at what’s going on in this SQL query

    • SELECT pupil_name, AVG(score) - These are the two columns we want to return in the result set. “pupil_name” is straightforward enough. We want to return the value within the “pupil_name” column. But we have something different here, AVG(score). Instead of returning the “score” value from the “pupils_subjects” table, we want to return the average of the scores values. But a question might arise here - do we want to return one value, the average of all of them or several values, the averages of individual groups? Let’s take a look at our second new keyword, GROUP BY, which will let us define this further
    • WHERE pupils.pupil_id = pupils_subjects.pupil_id - Since we’re performing our aggregate query over more than one table, we need to define a JOIN condition which defines how we wish to combine our tables. In this case, we want each value from the “pupils” table to be combined with each value from the “pupils_subject” table where pupil_id is equal in both records.
    • GROUP BY pupil_name - The query above will not work if we don’t include a GROUP BY clause. The reason for this is, as mentioned previously, we must know what groups we’re calculating the average for. If we didn’t specify a GROUP BY condition, then MySQL would report an error, since it is trying to return a list of pupil names, while at the same time trying to calculate an average based on an unknown condition. In this case, we want to group by the pupil names. This means that for each pupil, we retrieve all of their “score” values, and then return the average value for that pupil’s test scores. This average value is then returned as a new field - AVG(score) - in the results set

    To show how this works in a different case, here’s something slightly different, “Show me all the subjects, and the average test score for each”. Here, we’re looking for the average scores grouped by subject names, instead of pupil names

    SELECT subject_name, AVG(score) FROM subjects, pupils_subjects
    WHERE subjects.subject_id = pupils_subjects.subject_id GROUP BY subject_name
    subject_name		AVG( score )
    French		64.3333
    Geography		64.7500
    Maths			83.2500
    Science		66.1667

    Rollup! Rollup!

    Another useful keyword we can use when working with GROUP BY statements is WITH ROLLUP. This keyword allows us to calculate the overall average value of all the previously returned average values. For example

    SELECT subject_name, AVG(score) FROM subjects, pupils_subjects
    WHERE subjects.subject_id = pupils_subjects.subject_id GROUP BY subject_name
    WITH ROLLUP

    gives us

    subject_name 	AVG( score )
    French 	64.3333
    Geography 	64.7500
    Maths 		83.2500
    Science 	66.1667
    NULL 	69.5294

    The final column (with a subject_name of NULL, since no subject is associated with this value) returns an average score of 69.529% over all the subjects.

    Finding Minimum and Maximum Values

    Let’s now try something different. How about if we want to know how each pupil has done throughout their tests, and see the minimum and maximum scores they achieved? Let’s now ask “I want to see a list of all pupils alongside the lowest and the highest scores they achieved”

    SELECT pupil_name, MIN(score), MAX(score) FROM pupils, pupils_subjects
      WHERE pupils.pupil_id = pupils_subjects.pupil_id GROUP BY pupil_name

    This SQL query gives us

    pupil_name		MIN(score)	MAX(score)
    Alan			78		94
    Billy			74		76
    Craig			57		63
    David			72		72
    Edward		82		92
    Fred			39		84
    George		35		43
    Harry			55		55

    And how about if we’d like to know which pupil has the biggest deviation in their test scores? We could use this information to find pupils that may be struggling in a specific subject

    SELECT pupil_name, MIN(score) AS lowest, MAX(score) AS highest,
    MAX( score ) - MIN( score ) AS Difference FROM pupils, pupils_subjects
    WHERE pupils.pupil_id = pupils_subjects.pupil_id GROUP BY pupil_name
    pupil_name 	lowest 	highest Difference
    Alan 		78 	94 	16
    Billy 	74 	76 	2
    Craig 	57 	63 	6
    David 	72 	72 	0
    Edward 	82 	92 	10
    Fred 		39 	84 	45
    George 	35 	43 	8
    Harry 	55 	55 	0

    Here, the column “Difference” gives us the difference between the highest and the lowest scores in the table. We can specify a custom name for the columns by use of the AS keyword. See how we’re also renaming the MIN and MAX columns to “lowest” and “highest” respectively

    Showing a complete report

    We can put together all the MySQL methods we’ve used so far to create an overall report of the test scores, with a few useful columns

    SELECT pupil_name, subject_name, score,
    IF (score>70,"A",IF(score>60,"B",IF(score>50,"C", "FAIL"))) AS grade
    FROM pupils JOIN pupils_subjects ON (pupils.pupil_id = pupils_subjects.pupil_id)
    JOIN subjects ON (subjects.subject_id = pupils_subjects.subject_id)
    GROUP BY pupil_name, subject_name
    pupil_name 	subject_name 	score 	grade
    Alan 		French 		81 	A
    Alan 		Maths 		94 	A
    Alan 		Science 	78 	A
    Billy 	Geography 	76 	A
    Billy 	Science 	74 	A
    Craig 	French 		57 	C
    Craig 	Maths 		63 	B
    David 	Science 	72 	A
    Edward 	Geography 	82 	A
    Edward 	Maths 		92 	A
    Edward 	Science 	91 	A
    Fred 		Geography 	66 	B
    Fred 		Maths 		84 	A
    Fred 		Science 	39 	FAIL
    George 	Geography 	35 	FAIL
    George 	Science 	43 	FAIL
    Harry 	French 		55 	C

    Here’s a brief rundown of some of the more complex things we’re doing here

    • Multiple tables - Here, we’re making use of all 3 tables, via the JOIN statements, so that we can create one result set containing pupil names, subject names and the scores each pupil received for each subject
    • IF (score>70,”A”,IF(score>60,”B”,IF(score>50,”C”, “FAIL”))) as grade - Here, we’re making use of the fact that MySQL can SELECT a wide range of things, including conditional statements. Here, we’re using the MySQL IF statement, which has the format IF (expr1, expr2, expr3) which means, “IF expr1 is true, return expr2, else return expr3″. Here, expr2 and expr3 can either be literal string/number values or further IF statements. Here, we’re saying “If a pupil scores more than 70, they get an A, if they get more than 60, a B. More than 50 gets a C. Anything else is a ‘fail’”
    • GROUP BY pupil_name, subject_name - We are grouping using two columns instead of one. There’s a good reason for this. If we tried to group by pupil_name only, or subject_name only, MySQL would attempt to condense all the results down to 1 result for each pupil/subject. We don’t want that, instead we want to see multiple pupil names and multiple subject names, but group in a specific way. By using two GROUP parameters, we are saying “I want to see the results by pupil, and also by subject for each pupil

    Conclusion

    Hopefully this tutorial has given you a good idea of how aggregate functions in MySQL can be used to produce useful report data. You should be able to see how it can also be applied to other tables of data, like sales reports for a specific year for example, or analytical reports on various population data. Don’t worry if it all seems a bit much for you, MySQL has a wide range of tools available for performing calculations like this, and it will take a little time to get used to how it all works.

    The best thing is to just take it slow and experiment with the tables I’ve provided you, see what other interesting reports you can generate

3 Responses

WP_Cloudy
  • mustafa Says:

    Thanks for giving such information to me.
    I Wish U All The Best in each and every Field.
    Mustafa

  • Michael Says:

    Fantastic read, really interesting stuff, thanks!

  • Calum MacLeod Says:

    Hi, I am trying to list 14 committees, of which those in non-leadership role are initially sorted into 3 categories. I have been reading up on mysql, count, rollup, having etc. If there is only record in the Deacon category of a committee, I want the title to be singular (ie, Deacon). If there is more than one deacon, I want the title to be plural (ie, Deacons). For this, I reckon I need to be able to count how many are in that category per committee before I echo the title.
    Found your article helpful as it helps me to structure my own thoughts.
    Thanks very much,
    Calum

Leave a Comment

Want to ask a question about anything in this tutorial? Have you spotted an inaccuracy, or noticed areas for improvement? Fancy just having a chat? Leave your comments below...

Recommended Reading from Amazon.com

Previous Tutorial
Using Multiple MySQL Tables with Unique IDs, Joins and Normalization


Next tutorial
Creating a Simple Comments and Login System using MySQL and PHP