The implementation of the Set ADT using a list was quick and rather simple, but several of the operations require quadratic time in the worst case. This inefficiency is due to the linear search used to find an element in the unsorted list that is required by several of the operations. We saw earlier in the chapter the efficiency of the search operation can be improved by using the binary search algorithm. We can use the binary search with the Set ADT to improve the efficiency of many of the set operations. To do this, the list of elements must be in sorted order and that order must be maintained. The definition of the Set ADT, however, indicates the elements have no particular ordering. While this is true, it does not preclude us from storing the elements in sorted order. It only means there is no requirement that the items must be stored in a particular order.
In using the binary search algorithm to improve the efficiency of the set operations, the list cannot be sorted each time a new element is added because it would increase the time-complexity of the add
operation. For example, suppose we used one of the sorting algorithms presented earlier in the chapter to sort the list after each element is added. Since those algorithms require time in the worst case, the add
operation would then also require quadratic time. Instead, the sorted order must be maintained when new elements are added by inserting each into its proper position.
In this section, we provide a new implementation of the Set ADT using a sorted list and the binary search algorithm. Only several methods of the original implementation of the set from Chapter 5 will need to be modified. Before we explore those changes, briefly review the original implementation provided below:
Original Implementation of the Set ADT |
# Implementation of the Set ADT container using a Python list.
class Set :
# Creates an empty set instance.
def __init__(self) :
self._theElements = list()
# Returns the number of items in the set.
def __len__(self) :
return len(self._theElements)
# Returns True if the set is empty and False otherwise.
def isEmpty(self) :
return len(self._theElements) == 0
# Determines if an element is in the set.
def __contains__(self, element) :
return element in self._theElements
# Adds a new unique element to the set.
def add(self, element) :
if element not in self :
self._theElements.append( element )
# Removes an element from the set.
def remove(self, element) :
assert element in self, "The element must be in the set."
self._theElements.remove( item )
# Removes all of the elements from the set.
def clear(self) :
......
# Determines if two sets are equal.
def __eq__(self, setB) :
if len(self) != len(setB) :
return False
else :
return self.isSubsetOf(setB)
# Determines if this set is a subset of setB.
def isSubsetOf(self, setB) :
for element in self._theElements :
if element not in setB :
return False
return True
# Creates a new set from the union of this set and setB.
def union(self, setB) :
newSet = Set()
newSet._theElements.extend(self._theElements)
for element in setB._theElements :
if element not in self :
newSet._theElements.append(element)
return newSet
# Creates a new set from the intersection: self set and setB.
def interset( self, setB ):
......
# Creates a new set from the difference: self set and setB.
def difference( self, setB ):
......
# Returns an iterator for traversing the list of items.
def __iter__( self ):
return iter(self._theElements)
|
Helper Method
Performing a binary search to locate an element in the sorted list or to find the position where an element belongs in the sorted list is needed in several methods. Instead of reimplementing the operation each time, we implement the modified version of the binary search algorithm from Section 6.8 in the _findPosition
helper method.
class Set :
# ...
# Finds the position of the element within the ordered list.
def _findPosition(self, target) :
low = 0
high = len(self) - 1
while low <= high :
mid = (high + low) // 2
if self._theElements[ mid ] == target :
return mid
elif target < self._theElements[ mid ] :
high = mid - 1
else :
low = mid + 1
return low
The helper method does not detect nor distinguish between unique and duplicate values. It only returns the index where the element is located within the list or where it should be placed if it were added to the list. Thus, care must be taken when implementing the various methods to check for the existence of an element when necessary.
Element Search
The __contains__
method is easily implemented using the _findPosition
helper method. The index value returned by the helper method indicates the location where the element should be within the sorted list, but it says nothing about the actual existence of the element.
class Set:
# ...
# Determines if an element is in the set.
def __contains__(self, element) :
ndx = self._findPosition(element)
return ndx < len(self) and self._theElements[ndx] == element
element in set
To determine if the element is in the set, we can compare the element at the ndx
position within the list to the target element. Note the inclusion of the condition ndx < len(self)
within the compound expression. This is needed since the value returned by _findPosition
can be one larger than the number of items in the list, which occurs when the target should be located at the end of the list. If this value were directly used in examining the contents of the list without making sure it was in range, an out-of-range exception could be raised. The __contains__
method has a worst case time of since it uses the binary search to locate the given element within the sorted list.
Adding Elements
To implement the add
method, we must first determine if the element is unique since a set cannot contain duplicate values. This is done with the use of the in
operator, which automatically calls the __contains__
operator method. If the element is not already a member of the set, the insert
method is called to insert the new element in its proper position within the ordered list. The new implementation follows
class Set:
# ...
# Adds a new unique element to the set.
def add(self, element) :
if element not in self :
ndx = self._findPosition(element)
self._theElements.insert(ndx, element)
add set element
Even though the __contains__
method has a better time-complexity when using a sorted list and the binary search, the add
operation still requires time, the proof of which is left as an exercise.
Removing Elements
The remove
method requires that the target element must be a member of the set. To verify this precondition, an assertion is made using the in
operator. After which, the _findPosition
helper method is called to obtain the location of the element, which can then be used with the pop
list method to remove the element from the underlying sorted list.
class Set:
# ...
# Removes an element from the set.
def remove(self, element) :
assert element in self, "The element must be in the set."
ndx = self._findPosition(element)
self._theElements.pop(ndx)
remove set element
Like the add operation, the remove
method still has a worst case time of , the proof of which is left as an exercise.
Subset and Set Equals
We can implement the isSubsetOf
method in the same fashion as was done in the original version that used the unsorted list as shown below:
class Set:
# ...
# Determines if this set is a subset of setB.
def isSubsetOf(self, setB ) :
for element in self._theELements :
if element not in setB :
return False
return True
subset operation
To evaluate the efficiency of the method, we again assume both sets contain elements. The isSubsetOf
method performs a traversal over the self
set during which the in
operator is applied to setB
. Since the in
operator requires time and it's called times, isSubsetOf
has a time-complexity of .
The implementation of the original equals method called the isSubsetOf
method after determining the two sets contained the same number of elements. Thus, no changes are needed when using a sorted list in place of the original unsorted list.
class Set:
# ...
# Determines if two sets are equal.
def __eq__(self, setB) :
if len(self) != len(setB) :
return False
else :
return self.isSubsetOf(setB)
set equals
Since the equals method calls the isSubsetOf
operation, the efficiency of the equals operation is the same as the subset operation, namely .
Set Union
In the original implementation of the set union, we were able to add all of the elements of the self
set to the new set and then iterate over setB
and
simply append unique elements from that set to the end of the new set's list. Now that we are using a sorted list to store the elements, we can no longer simply append the elements to the list. We must maintain a sorted list as new elements are added to the set. Instead of appending the elements to the underlying list, we can simply use the add
method of the Set ADT itself. This method not only adds unique elements to the set, but it also ensures the elements are placed into their proper sorted position.
class Set :
# ...
# Creates a new set from the union of this set and setB.
def union(self, setB) :
newSet = Set()
newSet._theElements.extend(self._theElements)
for element in setB._theElements :
if element not in self :
newSet.add(element)
return newSet
set union
Using the add
method to add the elements to the set comes at a price in terms of efficiency. As you learned above, the add operation for the Set ADT implemented using a sorted list requires time. Since this method would be called times in the worst case, the set union operation requires time in the worst case.
New Implementation
A partial implementation of the Set ADT using a sorted list and the binary search algorithm is provided in the listing below
Program Listing
# Implementation of the Set ADT using a sorted list.
class Set :
# Creates an empty set instance.
def __init__(self) :
self._theElements = list()
# Returns the number of items in the set.
def __len__(self) :
return len(self._theElements)
# Determines if an element is in the set.
def __contains__(self, element) :
ndx = self._findPosition(element)
return ndx < len(self) and self._theElements[ndx] == element
# Adds a new unique element to the set.
def add(self, element) :
if element not in self :
ndx = self._findPosition(element)
self._theElements.insert(ndx, element)
# Removes an element from the set.
def remove(self, element) :
assert element in self, "The element must be in the set."
ndx = self._findPosition(element)
self._theElements.pop(ndx)
# Determines if this set is a subset of setB.
def isSubsetOf(self, setB ) :
for element in self._theELements :
if element not in setB :
return False
return True
# Creates a new set from the union of this set and setB.
def union(self, setB) :
newSet = Set()
newSet._theElements.extend(self._theElements)
for element in setB._theElements :
if element not in self :
newSet.add(element)
return newSet
# -- The remaining methods go here --
# Returns an iterator for traversing the list of items.
def __iter__(self) :
return iter(self._theElements)
# Finds the position of the element within the ordered list.
def _findPosition(self, target) :
low = 0
high = len(self) - 1
while low <= high :
mid = (high + low) // 2
if self._theElements[ mid ] == target :
return mid
elif target < self._theElements[ mid ] :
high = mid - 1
else :
low = mid + 1
return low
|