2.10 Application: HistogramsGraphical displays or charts of tabulated frequencies are very common in statistics. These charts, known as ''histograms, are used to show the distribution of data across discrete categories. A histogram consists of a collection of categories and counters. The number and types of categories can vary depending on the problem. The counters are used to accumulate the number of occurrences of values within each category for a given data collection. Consider the example histogram from Figure 2.10.1. The five letter grades are the categories and the heights of the bars represent the value of the counters. We can define an abstract data type for collecting and storing the frequency counts used in constructing a histogram. An ideal ADT would allow for building a general purpose histogram that can contain many different categories and be used with many different problems. The Histogram ADT
Building a HistogramThe program in the source listing below produces a text based version of the histogram from Figure 2.10.1 and illustrates the use of the Histogram ADT. The program extracts a collection of numeric grades from a text file and assigns a letter grade to each value based on the common 10-point scale: A: 100 - 90, B: 89 - 80, C: 79 - 70, D: 69 - 60, F: 59 - 0. The frequency counts of the letter grades are tabulated and then used to produce a histogram. Program Listing
The 77 89 53 95 68 86 91 89 60 70 80 77 73 73 93 85 83 67 75 71 94 64 79 97 59 69 61 80 73 70 82 86 70 45 100 the program would produce the following text based histogram. Grade Distribution | A +****** | B +********* | C +*********** | D +****** | F +*** | +----+----+----+----+----+----+----+---- 0 5 10 15 20 25 30 35 ImplementationTo implement the Histogram ADT, we must select an appropriate data structure for storing the categories and corresponding frequency counts. There are several different structures and approaches that can be used, but the Map ADT provides an ideal solution since it already stores key/value mappings and allows for a full implementation of the Histogram ADT. To use a map, the categories can be stored in the key part of the key/value pairs and a counter (integer value) can be stored in the value part. When a category counter is incremented, the entry is located by its key and the corresponding value can be incremented and stored back into the entry. The ConstructorThe constructor has to create the class Histogram : def __init__(self, catSeq) : self._freqCounts = Map() for cat in catSeq : self._freqCounts.add(cat, 0) Retrieving the CountSince the count for a given category is stored as the value part of a key/value entry in the map, the def getCount(self, category) : assert category in self._freqCounts, "Invalid histogram category." return self._freqCounts.valueOf(category) Incrementing the CountWhen adding an entry to a category in the histogram, the value part in the entry containing the category key has to be incremented by one. This can be done by obtaining the current value for the category, incrementing it by one, and then adding the category/value pair back to the map. This operation also has a precondition that must be checked before searching for the given category: def incCount(self, category) : assert category in self._freqCounts, "Invalid histogram category." value = self._freqCounts.valueOf(category) self._freqCounts.add(category, value + 1) IteratorSince the Map ADT already provides an iterator for traversing over the keys, we can have Python access and return that iterator as if we had created our own. This is done using the def __iter__(self) : return iter(self._freqCounts) Source ListingA a partial implementation of the Histogram ADT is provided in the source listing below. Implementation of the Program Listing
|