6.10 The Set ADT Revisited

On this page...

Helper Method
Element Search
Adding Elements
Removing Elements
Subset and Set Equals
Set Union
New Implementation

The implementation of the Set ADT using a list was quick and rather simple, but several of the operations require quadratic time in the worst case. This inefficiency is due to the linear search used to find an element in the unsorted list that is required by several of the operations. We saw earlier in the chapter the efficiency of the search operation can be improved by using the binary search algorithm. We can use the binary search with the Set ADT to improve the efficiency of many of the set operations. To do this, the list of elements must be in sorted order and that order must be maintained. The definition of the Set ADT, however, indicates the elements have no particular ordering. While this is true, it does not preclude us from storing the elements in sorted order. It only means there is no requirement that the items must be stored in a particular order.

In using the binary search algorithm to improve the efficiency of the set operations, the list cannot be sorted each time a new element is added because it would increase the time-complexity of the add operation. For example, suppose we used one of the sorting algorithms presented earlier in the chapter to sort the list after each element is added. Since those algorithms require $O (n^{2})$ time in the worst case, the add operation would then also require quadratic time. Instead, the sorted order must be maintained when new elements are added by inserting each into its proper position.

In this section, we provide a new implementation of the Set ADT using a sorted list and the binary search algorithm. Only several methods of the original implementation of the set from Chapter 5 will need to be modified. Before we explore those changes, briefly review the original implementation provided below:

Original Implementation of the Set ADT

Program: linearset.py

# Implementation of the Set ADT container using a Python list.
class Set :
# Creates an empty set instance.
def __init__(self) :
self._theElements = list()

# Returns the number of items in the set.
def __len__(self) :
return len(self._theElements)

# Returns True if the set is empty and False otherwise.
def isEmpty(self) :
return len(self._theElements) == 0

# Determines if an element is in the set.
def __contains__(self, element) :
return element in self._theElements

# Adds a new unique element to the set.
def add(self, element) :
if element not in self :
self._theElements.append( element )

# Removes an element from the set.
def remove(self, element) :
assert element in self, "The element must be in the set."
self._theElements.remove( item )

# Removes all of the elements from the set.
def clear(self) :
......

# Determines if two sets are equal.
def __eq__(self, setB) :
if len(self) != len(setB) :
return False
else :
return self.isSubsetOf(setB)

# Determines if this set is a subset of setB.
def isSubsetOf(self, setB) :
for element in self._theElements :
if element not in setB :
return False
return True

# Creates a new set from the union of this set and setB.
def union(self, setB) :
newSet = Set()
newSet._theElements.extend(self._theElements)
for element in setB._theElements :
if element not in self :
newSet._theElements.append(element)
return newSet

# Creates a new set from the intersection: self set and setB.
def interset( self, setB ):
......

# Creates a new set from the difference: self set and setB.
def difference( self, setB ):
......

# Returns an iterator for traversing the list of items.
def __iter__( self ):
return iter(self._theElements)

Helper Method

Performing a binary search to locate an element in the sorted list or to find the position where an element belongs in the sorted list is needed in several methods. Instead of reimplementing the operation each time, we implement the modified version of the binary search algorithm from Section 6.8 in the _findPosition helper method.

class Set :

# ...

   # Finds the position of the element within the ordered list.

  def _findPosition(self, target) :

    low = 0

    high = len(self) - 1  

    while low <= high :

      mid = (high + low) // 2

      if self._theElements[ mid ] == target :

        return mid            

      elif target < self._theElements[ mid ] :

        high = mid - 1

      else :

        low = mid + 1        

    return low

The helper method does not detect nor distinguish between unique and duplicate values. It only returns the index where the element is located within the list or where it should be placed if it were added to the list. Thus, care must be taken when implementing the various methods to check for the existence of an element when necessary.

Element Search

The __contains__ method is easily implemented using the _findPosition helper method. The index value returned by the helper method indicates the location where the element should be within the sorted list, but it says nothing about the actual existence of the element.

class Set:

# ...

   # Determines if an element is in the set.

  def __contains__(self, element) :

    ndx = self._findPosition(element)

    return ndx < len(self) and self._theElements[ndx] == element

element in set

worst-case:

O (log n)

To determine if the element is in the set, we can compare the element at the ndx position within the list to the target element. Note the inclusion of the condition ndx < len(self) within the compound expression. This is needed since the value returned by _findPosition can be one larger than the number of items in the list, which occurs when the target should be located at the end of the list. If this value were directly used in examining the contents of the list without making sure it was in range, an out-of-range exception could be raised. The __contains__ method has a worst case time of $O (log n)$ since it uses the binary search to locate the given element within the sorted list.

Adding Elements

To implement the add method, we must first determine if the element is unique since a set cannot contain duplicate values. This is done with the use of the in operator, which automatically calls the __contains__ operator method. If the element is not already a member of the set, the insert method is called to insert the new element in its proper position within the ordered list. The new implementation follows

class Set:

# ...

   # Adds a new unique element to the set.

  def add(self, element) :

    if element not in self :

      ndx = self._findPosition(element)

      self._theElements.insert(ndx, element)

add set element

worst-case:

O (n)

Even though the __contains__ method has a better time-complexity when using a sorted list and the binary search, the add operation still requires $O (n)$ time, the proof of which is left as an exercise.

Removing Elements

The remove method requires that the target element must be a member of the set. To verify this precondition, an assertion is made using the in operator. After which, the _findPosition helper method is called to obtain the location of the element, which can then be used with the pop list method to remove the element from the underlying sorted list.

class Set:

# ...

   # Removes an element from the set.

  def remove(self, element) :

    assert element in self, "The element must be in the set."

    ndx = self._findPosition(element)  

    self._theElements.pop(ndx)

remove set element

worst-case:

O (n)

Like the add operation, the remove method still has a worst case time of $O (n)$ , the proof of which is left as an exercise.

Subset and Set Equals

We can implement the isSubsetOf method in the same fashion as was done in the original version that used the unsorted list as shown below:

class Set:

# ...

   # Determines if this set is a subset of setB.

  def isSubsetOf(self, setB ) :                

    for element in self._theELements :

      if element not in setB :

        return False

    return True

subset operation

worst-case:

O (n log n)

To evaluate the efficiency of the method, we again assume both sets contain $n$ elements. The isSubsetOf method performs a traversal over the self set during which the in operator is applied to setB. Since the in operator requires $O (log n)$ time and it's called $n$ times, isSubsetOf has a time-complexity of $O (n log n)$ .

The implementation of the original equals method called the isSubsetOf method after determining the two sets contained the same number of elements. Thus, no changes are needed when using a sorted list in place of the original unsorted list.

class Set:

# ...

   # Determines if two sets are equal.

  def __eq__(self, setB) :

    if len(self) != len(setB) :

      return False

    else :

      return self.isSubsetOf(setB)

set equals

worst-case:

O (n log n)

Since the equals method calls the isSubsetOf operation, the efficiency of the equals operation is the same as the subset operation, namely $Onlogn$ .

Set Union

In the original implementation of the set union, we were able to add all of the elements of the self set to the new set and then iterate over setB and simply append unique elements from that set to the end of the new set's list. Now that we are using a sorted list to store the elements, we can no longer simply append the elements to the list. We must maintain a sorted list as new elements are added to the set. Instead of appending the elements to the underlying list, we can simply use the add method of the Set ADT itself. This method not only adds unique elements to the set, but it also ensures the elements are placed into their proper sorted position.

class Set :

# ...

  # Creates a new set from the union of this set and setB.

  def union(self, setB) :                    

    newSet = Set()  

    newSet._theElements.extend(self._theElements)

    for element in setB._theElements :

      if element not in self :

        newSet.add(element)

    return newSet

set union

worst-case:

O (n^{2})

Using the add method to add the elements to the set comes at a price in terms of efficiency. As you learned above, the add operation for the Set ADT implemented using a sorted list requires $O (n)$ time. Since this method would be called $n$ times in the worst case, the set union operation requires $O (n^{2})$ time in the worst case.

New Implementation

A partial implementation of the Set ADT using a sorted list and the binary search algorithm is provided in the listing below

Program Listing

Program: binaryset.py

# Implementation of the Set ADT using a sorted list.
class Set :
# Creates an empty set instance.
def __init__(self) :
self._theElements = list()
# Returns the number of items in the set.
def __len__(self) :
return len(self._theElements)
# Determines if an element is in the set.
def __contains__(self, element) :
ndx = self._findPosition(element)
return ndx < len(self) and self._theElements[ndx] == element
# Adds a new unique element to the set.
def add(self, element) :
if element not in self :
ndx = self._findPosition(element)
self._theElements.insert(ndx, element)
# Removes an element from the set.
def remove(self, element) :
assert element in self, "The element must be in the set."
ndx = self._findPosition(element)
self._theElements.pop(ndx)
# Determines if this set is a subset of setB.
def isSubsetOf(self, setB ) :
for element in self._theELements :
if element not in setB :
return False
return True
# Creates a new set from the union of this set and setB.
def union(self, setB) :
newSet = Set()
newSet._theElements.extend(self._theElements)
for element in setB._theElements :
if element not in self :
newSet.add(element)
return newSet
# -- The remaining methods go here --
# Returns an iterator for traversing the list of items.
def __iter__(self) :
return iter(self._theElements)
# Finds the position of the element within the ordered list.
def _findPosition(self, target) :
low = 0
high = len(self) - 1
while low <= high :
mid = (high + low) // 2
if self._theElements[ mid ] == target :
return mid
elif target < self._theElements[ mid ] :
high = mid - 1
else :
low = mid + 1
return low

6.9 Merging Sorted Lists

Chapter 6. Searching and Sorting

6.11 Improved Set Implementation