Friday, September 25, 2009

Separating Java: ADTs

In Java, Abstract Data Types and Objects are rather intertwined, often to the detriment of programmers. In this post and the next, I'll try to separate the two concepts as best as I understand. Note that most of these ideas, and the Set example specifically, come to me from William Cook's recent essay. The ideas are also well-covered in TAPL, without the Java code.

Here, I will use a subset of Java's features to express an ADT, such as one would have in a language like ML or Ada. It's a (not very efficient) set. Notes follow.

final public class IntSet {
 
  private int[] members;
  private IntSet() {
    this.members = new int[0];  
  }
  public static IntSet empty() {
    return new IntSet();
  }
  public static IntSet add(IntSet set, int i) {
    if( Arrays.binarySearch(set.members, i) != -1 ) {
      return set;
    }
    else {
      IntSet result = new IntSet();
      int old_length = set.members.length;
      result.members = Arrays.copyOf(set.members, old_length + 1);
      result.members[old_length] = i;
      Arrays.sort(result.members);
      return result;
    }
  }
  public static boolean contains(IntSet set, int i) {
    return Arrays.binarySearch(set.members, i) != -1;
  }

  public static IntSet union(IntSet s1, IntSet s2) { /* Performs merge sort. */ }
}

Notes:
  • The class Set is the abstract data type. It has been declared final, which tells the programmer that the representation is fixed and known completely inside of the class.
  • The members field is private to hide the representation. 
  • The constructor is private so that new Sets must be created through one of the existing introduction means, namely empty().
  • All methods are static. No dynamic dispatch is being used here. We do not want to send messages to objects that can handle that message in an arbitrary way, rather we want to perform some operation on a hidden representation by functions inside of the abstraction that are allowed to view the hidden representation.
  • Benefits: Functions inside the Set class can operate on sets with complete knowledge of a set's representation. This can occasionally be beneficial, particularly in cases where performance is important. Knowing that all sets are implemented as sorted arrays of ints can sometimes allow me to write better performing methods. (Binary methods in particular, but other methods as well.) For instance, in the union method above which is incomplete, can perform its task using merge sort if that is more efficient. Still, the representation is hidden from the rest of the program, so we can change things internally without affecting code as long as we don't change the public methods.
  • Caveats: No one else can define a Set that is compatible with this Set! If a certain method expects an IntSet of the sort we've just defined, then the IntSet can only be created through this class, and not through any other means. The representation for all IntSets is fixed (but hidden).

No comments:

Post a Comment