A set is one of the most fundamental data structure. It allows to keep a unique, unordered enumeration of elements.
Ordered Lists
We could use an ordered list to implement a set data structure.
1 2 3 4 5 6 7 8 9 10 11 |
type Set a = [a] instance SetSpec [] where empty = [] singleton a = [a] size = length member _ [] = False member a (b:x) = case compare a b of LT -> False EQ -> True GT -> member a x |
The problem: these are still linear.
Binary Trees
Sets can be implemented using the list as the underlying data structure, but important operations such as member will be performed in linear time. We can attempt to improve complexity by using binary trees. As we don't have to make any assumptions regarding the ordering of data, we can use recursive binary division to find and alter the data structure in logarithmic time. In order to achieve the maximum benefit of binary trees, we need to make them balanced. This aim of this section is to get you started, so we will not go into binary tree variations like AVL or Red-Black. If you are not familiar with the principles behind binary trees, the principle is quite simple: there is a root element which contains a value, the values on the left hand are always smaller to that value and values on the right are always greater than that value.
Our data type Set
will be defined in terms of data type Tree
. The tree can either be empty (Null
) or have node (Node
). The node itself contains the value and two subtrees on the left and right (which are also of type Tree
, because we are dealing with recursion). If the node is the last one in the whole tree, then left (or right) will evaluate to Null
.
1 2 3 |
type Set a = Tree a data Tree a = Null | Node (Tree a) a (Tree a) |
With these data types, we can already start thinking about the instance of SetSpec
. An empty
set, will simply be Null
since there are no trees. If we want to create a single value (without worrying about left and right trees), we can use singleton x
. This function will create a node which contains only the value x
and both subtress would be Null
. Finally, we can also calculate the size of the tree, which simply recursively enumerates through through the subtrees on the left and on the right of the root element (+1 for the actual root element).
1 2 3 4 5 |
instance SetSpec Tree where empty = Null singleton a = Node Null a Null size Null = 0 size (Node x _ y) = size x + 1 + size y |
Before we go into the glory details of how to implement various operations, we need to consider how to change the list representation to a binary tree one. After all, we don't even have a way of combining two trees together (like ++
for lists). Let us, therefore, define a function cat
which takes two trees and puts them together with the values in the left subtree are lesser than the values in the right subtree. Joining two trees requires a parent node. What if we don't have one? The solution is to pick the extreme value from one of the trees - minimum from the right subtree (used in this example) or maximum from the left tree. Once we extracted the value, we also need to destroy this node. The function min
takes a tree and returns that minimum value. delMin
takes a tree and traverses through to left branches to find the minimum element. This is done, by looking for a Null
value. Once encountered, we simply replace it with the Tree
on the right-hand side.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
cat :: Ord a => Tree a -> Tree a -> Tree a cat Null x = x cat x Null = x cat x y = Node x (min y) (delMin y) -- finds the minimum value in the right subtree by traversing -- the left most branches min :: Tree a -> a min (Node Null a _) = a min (Node x _ _) = min x delMin :: Tree a -> Tree a delMin (Node Null _ x) = x delMin (Node x a y) = Node (delMin x) a y |
Now that we know how to merge two (ordered) trees, we can implement member
, add
and delete. The implementation of the member
is quite simple: look for the Node
which has that specific value by comparing value a
with the current Node
value b
.
1 2 3 4 5 |
member a Null = False member a (Node x b y) = case compare a b of LT -> member a x EQ -> True GT -> member a y |
Adding element to a tree is also quite simple. We traverse through the tree until we find appropriate node to add the node.
1 2 3 4 5 |
add a Null = singleton a add a x@(Node y b z) = case compare a b of LT -> Node (add a y) b z EQ -> x GT -> Node y b (add a z) |
Finally, the delete
function. This is where we make use of the previously defined cat
function. We traverse through the tree in the search for the node which contains a
value. When the value is found we apply the cat
function which simply joins the two trees by looking for an minimum value in the right hand tree.
1 2 3 4 5 |
delete _ Null = Null delete a x@(Node y b z) = case compare a b of LT -> Node (add a y) b z EQ -> cat x y GT -> Node y b (add a z) |
Now that the basic operations are pretty efficient, let's see how we can implement operations such as union
. Merging an ordered list is quite simple: just compare the head of each list, but in a binary tree the root doesn't really convey all the information we need. Instead what we need to do is take the root of one tree r
and split the other tree into two: a tree of values before r
(i.e. less then) and beyond (i.e. greater than) r
.
There are of course two base cases: if one of the trees is Null
, then the other value is returned. The inductive case creates a new tree Node y' b z'
with before
and beyond
functions.
Yey! So that's that... Well... Kind of... There is one small detail: we need to ensure that the trees are balanced, otherwise, this solution may not work out so well. There is a trade-off between the benefits of balance and the costs of rebalancing the trees.
In order to avoid computing the size of each tree, we will cache this value (this is what the Int
in the definition stands for. We will limit the ratio of sizes of sibling subtrees thereby limiting the maximum difference in height.
1 2 3 4 5 |
type Set a = BalTree a data BalTree a = Null | Node Int (BalTree a) a (BalTree a) ratio :: Int ratio = 5 |
We will create a smart constructor which will
A smart constructor (unlike the standard "dumb" one), carries out the extra computation.