Data Structures and Algorithms using Python
Copyright © 2023 by Rance Necaise
Table Of Contents
Data Structures and Algorithms using Python
Copyright © 2023
by Rance Necaise

2.10 Application: Histograms

Graphical displays or charts of tabulated frequencies are very common in statistics. These charts, known as ''histograms, are used to show the distribution of data across discrete categories. A histogram consists of a collection of categories and counters. The number and types of categories can vary depending on the problem. The counters are used to accumulate the number of occurrences of values within each category for a given data collection. Consider the example histogram from Figure 2.10.1. The five letter grades are the categories and the heights of the bars represent the value of the counters.

Figure 2.10.1: Sample histogram for a distribution of grades.

We can define an abstract data type for collecting and storing the frequency counts used in constructing a histogram. An ideal ADT would allow for building a general purpose histogram that can contain many different categories and be used with many different problems.

The Histogram ADT

A histogram is a container that can be used to collect and store discrete frequency counts across multiple categories representing a distribution of data. The category objects must be comparable.

  • Histogram(catSeq)

    Creates a new histogram containing the categories provided in the given sequence, catSeq. The frequency counts of the categories are initialized to zero.

  • getCount(category)

    Returns the frequency count for the given category, which must be valid.

  • incCount(category)

    Increments the count by 1 for the given category. The supplied category must be valid.

  • totalCount()

    Returns a sum of the frequency counts for all of the categories.

  • iterator()

    Creates and returns an iterator for traversing over the histogram categories.

Building a Histogram

The program in the source listing below produces a text based version of the histogram from Figure 2.10.1 and illustrates the use of the Histogram ADT. The program extracts a collection of numeric grades from a text file and assigns a letter grade to each value based on the common 10-point scale: A: 100 - 90, B: 89 - 80, C: 79 - 70, D: 69 - 60, F: 59 - 0. The frequency counts of the letter grades are tabulated and then used to produce a histogram.

Program Listing
Program: buildhist.py
  1. # Prints a histogram for a distribution of letter grades computed
  2. # from a collection of numeric grades extracted from a text file.  
  3. from maphist import Histogram
  4.  
  5. def main() :
  6.    # Create a Histogram instance for computing the frequencies.
  7.   gradeHist = Histogram("ABCDF")
  8.  
  9.    # Open the text file containing the grades.
  10.   gradeFile = open("cs101grades.txt", "r")
  11.  
  12.    # Extract the grades and increment the appropriate counter.
  13.   for line in gradeFile :
  14.     grade = int(line)
  15.     gradeHist.incCount(letterGrade(grade))
  16.  
  17.    # Print the histogram chart.
  18.   printChart( gradeHist )  
  19.        
  20. # Determine the letter grade for the given numeric value.
  21. def letterGrade(grade) :
  22.   if grade >= 90 :
  23.     return 'A'
  24.   elif grade >= 80 :
  25.     return 'B'
  26.   elif grade >= 70 :
  27.     return 'C'
  28.   elif grade >= 60 :
  29.     return 'D'
  30.   else :
  31.     return 'F'
  32.  
  33. # Print the histogram as a horizontal bar chart.  
  34. def printChart(gradeHist) :
  35.   print("           Grade Distribution")  
  36.    # Print the body of the chart.
  37.   letterGrades = ('A', 'B', 'C', 'D', 'F')
  38.   for letter in letterGrades :
  39.     print("  |")
  40.     print(letter + " +", end = "")
  41.     freq = gradeHist.getCount(letter)
  42.     print('*' * freq)
  43.  
  44.    # Print the x-axis.
  45.   print("  |")
  46.   print("  +----+----+----+----+----+----+----+----")
  47.   print("  0    5    10   15   20   25   30   35")
  48.    
  49. main()    

The buildhist.py program consists of three functions. The main function drives the program which extracts the numeric grades and builds an instance of the Histogram ADT. It initializes the histogram to contain the five letter grades as its categories. The letterGrade function is a helper function which simply returns the letter grade for the given numeric value. The printChart function prints the text based histogram using the frequency counts computed in the main routine. Assuming the following grades are extracted from the text file

77 89 53 95 68 86 91 89 60 70 80 77 73 73 93 85 83 67 75 71 94 64 
79 97 59 69 61 80 73 70 82 86 70 45 100

the program would produce the following text based histogram.

                     Grade Distribution
            |
          A +******
            |
          B +*********
            |
          C +***********
            |
          D +******
            |
          F +***
            |
            +----+----+----+----+----+----+----+----
            0    5    10   15   20   25   30   35

Implementation

To implement the Histogram ADT, we must select an appropriate data structure for storing the categories and corresponding frequency counts. There are several different structures and approaches that can be used, but the Map ADT provides an ideal solution since it already stores key/value mappings and allows for a full implementation of the Histogram ADT. To use a map, the categories can be stored in the key part of the key/value pairs and a counter (integer value) can be stored in the value part. When a category counter is incremented, the entry is located by its key and the corresponding value can be incremented and stored back into the entry.

The Constructor

The constructor has to create the Map structure that will be used to store the histogram categories and create map entries for each category specified in the catSeq list. The key for each entry will be the category and the value will be a counter set to 0. Here is the implementation of the constructor:

class Histogram :
  def __init__(self, catSeq) :
    self._freqCounts = Map()
    for cat in catSeq :
      self._freqCounts.add(cat, 0)

Retrieving the Count

Since the count for a given category is stored as the value part of a key/value entry in the map, the getCount method can simply obtain the value part of the key/value pair. Since this method has a precondition, we it must first be verified. The implementation of the getCount method follows:

def getCount(self, category) :
  assert category in self._freqCounts, "Invalid histogram category."
  return self._freqCounts.valueOf(category)

Incrementing the Count

When adding an entry to a category in the histogram, the value part in the entry containing the category key has to be incremented by one. This can be done by obtaining the current value for the category, incrementing it by one, and then adding the category/value pair back to the map. This operation also has a precondition that must be checked before searching for the given category:

def incCount(self, category) :
  assert category in self._freqCounts, "Invalid histogram category."
  value = self._freqCounts.valueOf(category)
  self._freqCounts.add(category, value + 1)

Iterator

Since the Map ADT already provides an iterator for traversing over the keys, we can have Python access and return that iterator as if we had created our own. This is done using the iter function as shown below:

def __iter__(self) :                        
  return iter(self._freqCounts)

Source Listing

A a partial implementation of the Histogram ADT is provided in the source listing below. Implementation of the totalCount method is left as an exercise.

Program Listing
Program: maphist.py
  1. # Implementation of the Histogram ADT using a Map.
  2. from linearmap import Map
  3.  
  4. class Histogram :
  5.    # Creates a histogram containing the given categories.
  6.   def __init__(self, catSeq) :
  7.     self._freqCounts = Map()
  8.     for cat in catSeq :
  9.       self._freqCounts.add(cat, 0)
  10.    
  11.    # Returns the frequency count for the given category.    
  12.   def getCount(self, category) :
  13.     assert category in self._freqCounts, "Invalid histogram category."
  14.     return self._freqCounts.valueOf(category)
  15.        
  16.    # Increments the counter for the given category.
  17.   def incCount(self, category) :
  18.     assert category in self._freqCounts, "Invalid histogram category."
  19.     value = self._freqCounts.valueOf(category)
  20.     self._freqCounts.add(category, value + 1)
  21.      
  22.    # Returns the sum of the frequency counts.
  23.   def totalCount(self) :
  24.     ......
  25.    
  26.    # Returns an iterator for traversing the categories.
  27.   def __iter__(self) :                        
  28.     return iter(self._freqCounts)
Page last modified on July 30, 2023, at 08:32 PM