Concrete Collections Core Java

Rather than getting into more detail about all the interfaces, we thought it would be helpful to first discuss the concrete data structures that the Java library supplies. Once we have thoroughly described the classes you might want to use, we will return to abstract considerations and see how the collections framework organizes these classes. The table below shows the collections in the Java library and briefly describes the purpose of each collection class. All classes in the table implement the Collection interface, with the exception of the classes with names ending in Map. Those classes implement the Map interface instead.

Concrete Collections in the Java Library

Linked Lists

We already used arrays and their dynamic cousin, the ArrayList class, for many examples in this book. However, arrays and array lists suffer from a major drawback. Removing an element from the middle of an array is expensive since all array elements beyond the removed one must be moved toward the beginning of the array. The same is true for inserting elements in the middle.

Removing an element from an array

Another well-known data structure, the linked list, solves this problem. Whereas an array stores object references in consecutive memory locations, a linked list stores each object in a separate link. Each link also stores a reference to the next link in the sequence. In the Java programming language, all linked lists are actually doubly linked; that is, each link also stores a reference to its predecessor.

A doubly linked list

Removing an element from the middle of a linked list is an inexpensive operation: only the links around the element to be removed need to be updated. Perhaps you once took a data structures course in which you learned how to implement linked lists. You may have bad memories of tangling up the links when removing or adding elements in the linked list. If so, you will be pleased to learn that the Java collections library supplies a class LinkedList ready for you to use.

The following code example adds three elements and then removes the second one:
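A minimal sketch of such an example follows. The list name staff matches the later examples in this section; the element values and the class name RemoveSecondDemo are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class RemoveSecondDemo {
    // Builds a three-element list and removes the second element through an iterator.
    static List<String> build() {
        List<String> staff = new LinkedList<>();
        staff.add("Amy");
        staff.add("Bob");
        staff.add("Carl");
        Iterator<String> iter = staff.iterator();
        iter.next();   // visit the first element
        iter.next();   // visit the second element
        iter.remove(); // remove the element that was visited last
        return staff;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints [Amy, Carl]
    }
}
```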

Removing an element from a linked list

There is, however, an important difference between linked lists and generic collections. A linked list is an ordered collection in which the position of the objects matters. The LinkedList.add method adds the object to the end of the list. But you often want to add objects somewhere in the middle of a list. This position-dependent add method is the responsibility of an iterator, since iterators describe positions in collections. Using iterators to add elements makes sense only for collections that have a natural ordering. For example, the set data type that we discuss in the next section does not impose any ordering on its elements. Therefore, there is no add method in the Iterator interface. Instead, the collections library supplies a subinterface ListIterator that contains an add method, declared as void add(E element).

Unlike Collection.add, this method does not return a boolean; it is assumed that the add operation always modifies the list. In addition, the ListIterator interface has two methods that you can use for traversing a list backwards:

E previous()
boolean hasPrevious()

Like the next method, the previous method returns the object that it skipped over. The listIterator method of the LinkedList class returns an iterator object that implements the ListIterator interface.

ListIterator<String> iter = staff.listIterator();

The add method adds the new element before the iterator position. For example, the following code skips past the first element in the linked list and adds "Juliet" before the second element:
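A short sketch of that insertion, assuming the staff list holds the three sample names used earlier (the names and the class name AddJulietDemo are illustrative):

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class AddJulietDemo {
    static List<String> build() {
        List<String> staff = new LinkedList<>();
        staff.add("Amy");
        staff.add("Bob");
        staff.add("Carl");
        ListIterator<String> iter = staff.listIterator();
        iter.next();        // skip past the first element
        iter.add("Juliet"); // inserted before the second element
        return staff;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints [Amy, Juliet, Bob, Carl]
    }
}
```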

Adding an element to a linked list

If you call the add method multiple times, the elements are simply added in the order in which you supplied them. They are all added in turn before the current iterator position. When you use the add operation with an iterator that was freshly returned from the listIterator method and that points to the beginning of the linked list, the newly added element becomes the new head of the list.

When the iterator has passed the last element of the list (that is, when hasNext returns false), the added element becomes the new tail of the list. If the linked list has n elements, there are n + 1 spots for adding a new element. These spots correspond to the n + 1 possible positions of the iterator. For example, if a linked list contains three elements, A, B, and C, then there are four possible positions (marked as |) for inserting a new element:
|ABC
A|BC
AB|C
ABC|

NOTE: You have to be careful with the “cursor” analogy. The remove operation does not quite work like the BACKSPACE key. Immediately after a call to next, the remove method indeed removes the element to the left of the iterator, just like the BACKSPACE key would. However, if you just called previous, the element to the right is removed. And you can’t call remove twice in a row. Unlike the add method, which depends only on the iterator position, the remove method depends on the iterator state.
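The state dependence described in this note can be observed with a small experiment. The sketch below uses a three-element list A, B, C; the class and method names are assumptions for illustration. Both of the first two methods call remove with the cursor at A|BC, yet a different element disappears.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class IteratorRemoveDemo {
    private static List<String> abc() {
        return new LinkedList<>(Arrays.asList("A", "B", "C"));
    }

    // Cursor A|BC reached by next: remove deletes the element to the left (A).
    static List<String> removeAfterNext() {
        List<String> list = abc();
        ListIterator<String> iter = list.listIterator();
        iter.next();   // returns A
        iter.remove(); // removes A
        return list;
    }

    // Cursor A|BC reached by previous: remove deletes the element to the right (B).
    static List<String> removeAfterPrevious() {
        List<String> list = abc();
        ListIterator<String> iter = list.listIterator();
        iter.next();     // A|BC
        iter.next();     // AB|C
        iter.previous(); // returns B, cursor is back at A|BC
        iter.remove();   // removes B
        return list;
    }

    // A second remove without an intervening next or previous is rejected.
    static boolean secondRemoveFails() {
        ListIterator<String> iter = abc().listIterator();
        iter.next();
        iter.remove();
        try {
            iter.remove();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeAfterNext());     // prints [B, C]
        System.out.println(removeAfterPrevious()); // prints [A, C]
        System.out.println(secondRemoveFails());   // prints true
    }
}
```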

Finally, a set method replaces the last element returned by a call to next or previous with a new element. For example, the following code replaces the first element of a list with a new value:

ListIterator<String> iter = list.listIterator();
String oldValue = iter.next(); // returns first element
iter.set(newValue); // sets first element to newValue

As you might imagine, if an iterator traverses a collection while another iterator is modifying it, confusing situations can occur. For example, suppose an iterator points before an element that another iterator has just removed. The iterator is now invalid and should no longer be used. The linked list iterators have been designed to detect such modifications. If an iterator finds that its collection has been modified by another iterator or by a method of the collection itself, then it throws a ConcurrentModificationException. For example, consider the following code:
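A sketch of such a situation, with two iterators named iter1 and iter2 attached to the same list (the list contents and the class name are assumptions):

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class ConcurrentModificationDemo {
    static boolean secondIteratorFails() {
        List<String> list = new LinkedList<>(Arrays.asList("A", "B", "C"));
        ListIterator<String> iter1 = list.listIterator();
        ListIterator<String> iter2 = list.listIterator();
        iter1.next();
        iter1.remove(); // structural modification made through iter1
        try {
            iter2.next(); // iter2 notices the change it did not make
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondIteratorFails()); // prints true
    }
}
```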

The call to iter2.next throws a ConcurrentModificationException since iter2 detects that the list was modified externally. To avoid concurrent modification exceptions, follow this simple rule: You can attach as many iterators to a collection as you like, provided that all of them are only readers. Alternatively, you can attach a single iterator that can both read and write.

Concurrent modification detection is achieved in a simple way. The collection keeps track of the number of mutating operations (such as adding and removing elements). Each iterator keeps a separate count of the number of mutating operations that it was responsible for. At the beginning of each iterator method, the iterator simply checks whether its own mutation count equals that of the collection. If not, it throws a ConcurrentModificationException.
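The counting scheme can be sketched with a toy collection. Everything in the following example (the class ModCountSketch, its Cursor, and the field names) is hypothetical and only illustrates the idea of comparing counts; it is not the library's source code.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class ModCountSketch {
    private final List<String> items = new ArrayList<>();
    private int modCount = 0; // incremented on every structural change

    void add(String s) {
        items.add(s);
        modCount++;
    }

    Cursor cursor() {
        return new Cursor();
    }

    class Cursor {
        private int position = 0;
        private int lastReturned = -1;
        private int expectedModCount = modCount; // snapshot taken at creation

        private void checkForComodification() {
            if (expectedModCount != modCount) throw new ConcurrentModificationException();
        }

        String next() {
            checkForComodification();
            lastReturned = position;
            return items.get(position++);
        }

        void remove() {
            checkForComodification();
            if (lastReturned < 0) throw new IllegalStateException();
            items.remove(lastReturned);
            position = lastReturned;
            lastReturned = -1;
            modCount++;
            expectedModCount = modCount; // this cursor's own change is expected
        }
    }

    static boolean staleCursorDetected() {
        ModCountSketch list = new ModCountSketch();
        list.add("A");
        list.add("B");
        Cursor first = list.cursor();
        Cursor second = list.cursor();
        first.next();
        first.remove(); // only 'first' stays in sync with the collection
        try {
            second.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(staleCursorDetected()); // prints true
    }
}
```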

NOTE: There is, however, a curious exception to the detection of concurrent modifications. The linked list only keeps track of structural modifications to the list, such as adding and removing links. The set method does not count as a structural modification. You can attach multiple iterators to a linked list, all of which call set to change the contents of existing links.

Now you have seen the fundamental methods of the LinkedList class. You use a ListIterator to traverse the elements of the linked list in either direction and to add and remove elements.

As you saw in the preceding section, many other useful methods for operating on linked lists are declared in the Collection interface. These are, for the most part, implemented in the AbstractCollection superclass of the LinkedList class. For example, the toString method invokes toString on all elements and produces one long string of the format [A, B, C]. This is handy for debugging. Use the contains method to check whether an element is present in a linked list. For example, the call staff.contains("Harry") returns true if the linked list already contains a string that is equal to the string "Harry".

The library also supplies a number of methods that are, from a theoretical perspective, somewhat dubious. Linked lists do not support fast random access. If you want to see the nth element of a linked list, you have to start at the beginning and skip past the first n – 1 elements first. There is no shortcut. For that reason, programmers don’t usually use linked lists in programming situations in which elements need to be accessed by an integer index.

Nevertheless, the LinkedList class supplies a get method that lets you access a particular element by its integer index, as in list.get(n).

Of course, this method is not very efficient. If you find yourself using it, you are probably using the wrong data structure for your problem. You should never use this illusory random access method to step through a linked list.

A loop that calls list.get(i) for each index i from 0 to list.size() - 1 is staggeringly inefficient. Each time you look up another element, the search starts again from the beginning of the list. The LinkedList object makes no effort to cache the position information.
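To make the contrast concrete, here is a sketch of the two traversal styles. The task (summing string lengths) and the method names are assumptions chosen for illustration; both methods compute the same result, but only the second visits each link once.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class TraversalDemo {
    // Inefficient on a linked list: every get(i) walks the links again.
    static int totalLengthByIndex(List<String> list) {
        int total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i).length();
        }
        return total;
    }

    // Efficient: the iterator behind the for-each loop remembers its position.
    static int totalLengthByIterator(List<String> list) {
        int total = 0;
        for (String s : list) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> staff = new LinkedList<>(Arrays.asList("Amy", "Bob", "Carl"));
        System.out.println(totalLengthByIndex(staff));    // prints 10
        System.out.println(totalLengthByIterator(staff)); // prints 10
    }
}
```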

NOTE: The get method has one slight optimization: If the index is at least size() / 2, then the search for the element starts at the end of the list.

The list iterator interface also has a method to tell you the index of the current position. In fact, because Java iterators conceptually point between elements, it has two of them: The nextIndex method returns the integer index of the element that would be returned by the next call to next; the previousIndex method returns the index of the element that would be returned by the next call to previous. Of course, that is simply one less than nextIndex. These methods are efficient because the iterators keep a count of the current position. Finally, if you have an integer index n, then list.listIterator(n) returns an iterator that points just before the element with index n. That is, calling next yields the same element as list.get(n). Obtaining that iterator is inefficient, however, because the list must be traversed to reach position n.
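The following sketch shows the three methods side by side on a four-element list; the list contents and the class name IndexDemo are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class IndexDemo {
    static String describe() {
        List<String> list = new LinkedList<>(Arrays.asList("A", "B", "C", "D"));
        ListIterator<String> iter = list.listIterator(2); // cursor at AB|CD
        int next = iter.nextIndex();         // index of the element next() would return
        int previous = iter.previousIndex(); // one less than nextIndex()
        String element = iter.next();        // same element as list.get(2)
        return next + " " + previous + " " + element;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints 2 1 C
    }
}
```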

If you have a linked list with only a handful of elements, then you don’t have to be overly paranoid about the cost of the get and set methods. But then why use a linked list in the first place? The only reason to use a linked list is to minimize the cost of insertion and removal in the middle of the list. If you have only a few elements, you can just use an ArrayList.

We recommend that you simply stay away from all methods that use an integer index to denote a position in a linked list. If you want random access into a collection, use an array or ArrayList, not a linked list.

The program in the listing below puts linked lists to work. It simply creates two lists, merges them, then removes every second element from the second list, and finally tests the removeAll method. We recommend that you trace the program flow and pay special attention to the iterators. You may find it helpful to draw diagrams of the iterator positions, like this:
|ACE |BDFG
A|CE |BDFG
AB|CE B|DFG
. . .
Note that the call

System.out.println(a);

prints all elements in the linked list a by invoking the toString method in AbstractCollection.

LinkedListTest.java
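A sketch of such a program, written to match the description above. The element names are assumptions chosen so that their initials agree with the diagrams (A, C, E and B, D, F, G), and the two helper methods are our own way of organizing the steps.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class LinkedListTest {
    // Merges the elements of b into a, alternating between the two lists.
    static void merge(List<String> a, List<String> b) {
        ListIterator<String> aIter = a.listIterator();
        Iterator<String> bIter = b.iterator();
        while (bIter.hasNext()) {
            if (aIter.hasNext()) aIter.next(); // skip one element of a
            aIter.add(bIter.next());           // insert the next element of b
        }
    }

    // Removes every second element from the list.
    static void removeEverySecond(List<String> list) {
        Iterator<String> iter = list.iterator();
        while (iter.hasNext()) {
            iter.next(); // keep this element
            if (iter.hasNext()) {
                iter.next();   // visit the following element
                iter.remove(); // and remove it
            }
        }
    }

    public static void main(String[] args) {
        List<String> a = new LinkedList<>();
        a.add("Amy");
        a.add("Carl");
        a.add("Erica");

        List<String> b = new LinkedList<>();
        b.add("Bob");
        b.add("Doug");
        b.add("Frances");
        b.add("Gloria");

        merge(a, b);
        System.out.println(a); // prints [Amy, Bob, Carl, Doug, Erica, Frances, Gloria]

        removeEverySecond(b);
        System.out.println(b); // prints [Bob, Frances]

        a.removeAll(b); // bulk operation: remove all elements of b from a
        System.out.println(a); // prints [Amy, Carl, Doug, Erica, Gloria]
    }
}
```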

java.util.List<E> 1.2

  • ListIterator<E> listIterator()
    returns a list iterator for visiting the elements of the list.
  • ListIterator<E> listIterator(int index)
    returns a list iterator for visiting the elements of the list whose first call to next will return the element with the given index.
  • void add(int i, E element)
    adds an element at the specified position.
  • boolean addAll(int i, Collection<? extends E> elements)
    adds all elements from a collection to the specified position.
  • E remove(int i)
    removes and returns the element at the specified position.
  • E get(int i)
    gets the element at the specified position.
  • E set(int i, E element)
    replaces the element at the specified position with a new element and returns the old element.
  • int indexOf(Object element)
    returns the position of the first occurrence of an element equal to the specified element, or –1 if no matching element is found.
  • int lastIndexOf(Object element)
    returns the position of the last occurrence of an element equal to the specified element, or –1 if no matching element is found.

java.util.ListIterator<E> 1.2

  • void add(E newElement)
    adds an element before the current position.
  • void set(E newElement)
    replaces the last element visited by next or previous with a new element. Throws an IllegalStateException if the list structure was modified since the last call to next or previous.
  • boolean hasPrevious()
    returns true if there is another element to visit when iterating backwards through the list.
  • E previous()
    returns the previous object. Throws a NoSuchElementException if the beginning of the list has been reached.
  • int nextIndex()
    returns the index of the element that would be returned by the next call to next.
  • int previousIndex()
    returns the index of the element that would be returned by the next call to previous.

java.util.LinkedList<E> 1.2

  • LinkedList()
    constructs an empty linked list.
  • LinkedList(Collection<? extends E> elements)
    constructs a linked list and adds all elements from a collection.
  • void addFirst(E element)
  • void addLast(E element)
    adds an element to the beginning or the end of the list.
  • E getFirst()
  • E getLast()
    returns the element at the beginning or the end of the list.
  • E removeFirst()
  • E removeLast()
    removes and returns the element at the beginning or the end of the list.

Array Lists

In the preceding section, you saw the List interface and the LinkedList class that implements it. The List interface describes an ordered collection in which the position of elements matters. There are two protocols for visiting the elements: through an iterator and by random access with methods get and set. The latter is not appropriate for linked lists, but of course get and set make a lot of sense for arrays. The collections library supplies the familiar ArrayList class that also implements the List interface. An ArrayList encapsulates a dynamically reallocated array of objects.

NOTE: If you are a veteran Java programmer, you may have used the Vector class whenever you needed a dynamic array. Why use an ArrayList instead of a Vector? For one simple reason: All methods of the Vector class are synchronized. It is safe to access a Vector object from two threads. But if you access a vector from only a single thread (by far the more common case), your code wastes quite a bit of time with synchronization. In contrast, the ArrayList methods are not synchronized. We recommend that you use an ArrayList instead of a Vector whenever you don’t need synchronization.

Hash Sets

Linked lists and arrays let you specify the order in which you want to arrange the elements. However, if you are looking for a particular element and you don’t remember its position, then you need to visit all elements until you find a match. That can be time consuming if the collection contains many elements. If you don’t care about the ordering of the elements, then there are data structures that let you find elements much faster. The drawback is that those data structures give you no control over the order in which the elements appear. The data structures organize the elements in an order that is convenient for their own purposes.

A well-known data structure for finding objects quickly is the hash table. A hash table computes an integer, called the hash code, for each object. A hash code is an integer that is somehow derived from the instance fields of an object, preferably such that objects with different data yield different codes. The table below lists a few examples of hash codes that result from the hashCode method of the String class.

Hash Codes Resulting from the hashCode Function
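As a quick check of how such values arise, the following sketch prints the hash codes of a few strings. The sample strings are our own choice for illustration; "Lee" is included because its hash code, 76268, is the value used in the bucket example later in this section.

```java
public class HashCodeDemo {
    // Formats a string together with the value of its hashCode method.
    static String describe(String s) {
        return s + ": " + s.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(describe("Lee")); // prints Lee: 76268
        System.out.println(describe("lee")); // prints lee: 107020
        System.out.println(describe("eel")); // prints eel: 100300
    }
}
```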

If you define your own classes, you are responsible for implementing your own hashCode method. Your implementation must be compatible with the equals method: If a.equals(b), then a and b must have the same hash code. What’s important for now is that hash codes can be computed quickly and that the computation depends only on the state of the object that needs to be hashed, and not on the other objects in the hash table.
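One way to keep the two methods compatible is to derive both from the same fields. The class Part below is hypothetical, and the use of Objects.hash and Objects.equals is one convenient option rather than a requirement.

```java
import java.util.Objects;

public class Part {
    private final String description;
    private final int partNumber;

    public Part(String description, int partNumber) {
        this.description = description;
        this.partNumber = partNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        Part p = (Part) other;
        return partNumber == p.partNumber && Objects.equals(description, p.description);
    }

    // Uses exactly the fields that equals compares, so equal objects get equal codes.
    @Override
    public int hashCode() {
        return Objects.hash(description, partNumber);
    }

    public static void main(String[] args) {
        Part a = new Part("Toaster", 1234);
        Part b = new Part("Toaster", 1234);
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true
    }
}
```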

In Java, hash tables are implemented as arrays of linked lists. Each list is called a bucket (see the figure below). To find the place of an object in the table, compute its hash code and reduce it modulo the total number of buckets. The resulting number is the index of the bucket that holds the element. For example, if an object has hash code 76268 and there are 128 buckets, then the object is placed in bucket 108 (because the remainder 76268 % 128 is 108). Perhaps you are lucky and there is no other element in that bucket. Then, you simply insert the element into that bucket. Of course, it is inevitable that you sometimes hit a bucket that is already filled. This is called a hash collision. Then, you compare the new object with all objects in that bucket to see if it is already present. Provided that the hash codes are reasonably randomly distributed and the number of buckets is large enough, only a few comparisons should be necessary.

A hash table
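The bucket arithmetic described above can be sketched in a few lines. The method name bucketIndex is hypothetical, and Math.floorMod is our own choice (not something the text prescribes) so that negative hash codes also yield a valid index. Library implementations may preprocess the hash code before reducing it; this example only illustrates the modulo step.

```java
public class BucketIndexDemo {
    // Reduces a hash code to a bucket index in the range 0 .. bucketCount - 1.
    static int bucketIndex(int hashCode, int bucketCount) {
        return Math.floorMod(hashCode, bucketCount);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(76268, 128)); // prints 108
        System.out.println(bucketIndex(-1, 128));    // prints 127
    }
}
```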

If you want more control over the performance of the hash table, you can specify the initial bucket count. The bucket count gives the number of buckets that are used to collect objects with identical hash values. If too many elements are inserted into a hash table, the number of collisions increases and retrieval performance suffers.

If you know approximately how many elements will eventually be in the table, then you can set the bucket count. Typically, you set it to somewhere between 75% and 150% of the expected element count. Some researchers believe that it is a good idea to make the bucket count a prime number to prevent a clustering of keys. The evidence for this isn’t conclusive, however. The standard library uses bucket counts that are a power of 2, with a default of 16. (Any value you supply for the table size is automatically rounded to the next power of 2.)

Of course, you do not always know how many elements you need to store, or your initial guess may be too low. If the hash table gets too full, it needs to be rehashed. To rehash the table, a table with more buckets is created, all elements are inserted into the new table, and the original table is discarded. The load factor determines when a hash table is rehashed. For example, if the load factor is 0.75 (which is the default) and the table is more than 75% full, then it is automatically rehashed, with twice as many buckets. For most applications, it is reasonable to leave the load factor at 0.75.

Hash tables can be used to implement several important data structures. The simplest among them is the set type. A set is a collection of elements without duplicates. The add method of a set first tries to find the object to be added, and adds it only if it is not yet present.

The Java collections library supplies a HashSet class that implements a set based on a hash table. You add elements with the add method. The contains method is redefined to make a fast lookup to find if an element is already present in the set. It checks only the elements in one bucket and not all elements in the collection.

The hash set iterator visits all buckets in turn. Because the hashing scatters the elements around in the table, they are visited in seemingly random order. You would only use a HashSet if you don’t care about the ordering of the elements in the collection.

The SetTest program (SetTest.java) reads all words from the input and adds them to the hash set. It then iterates through the unique words in the set and finally prints out a count. (Alice in Wonderland has 5,909 unique words, including the copyright notice at the beginning.) The words appear in seemingly random order.

CAUTION: Be careful when you mutate set elements. If the hash code of an element were to change, then the element would no longer be in the correct position in the data structure.
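The following sketch shows what can go wrong. It uses an ArrayList as the set element because its hash code depends on its contents; that choice and the class name are assumptions for illustration. The behavior of a set after such a mutation is not specified, but with the standard HashSet implementation the lookup typically fails because the element is stored under its old hash code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MutatedElementDemo {
    static boolean foundAfterMutation() {
        Set<List<String>> set = new HashSet<>();
        List<String> element = new ArrayList<>();
        element.add("a");
        set.add(element);  // stored under the hash code of [a]
        element.add("b");  // the hash code changes while the element sits in the set
        return set.contains(element); // the lookup uses the new hash code
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // typically prints false
    }
}
```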

SetTest.java
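A sketch of such a program follows. The helper method readWords and the limit of 20 printed words are assumptions; the program reads whitespace-separated words from standard input.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class SetTest {
    // Reads whitespace-separated words and collects the distinct ones.
    static Set<String> readWords(Scanner in) {
        Set<String> words = new HashSet<>();
        while (in.hasNext()) {
            words.add(in.next()); // duplicates are silently ignored
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readWords(new Scanner(System.in));
        int shown = 0;
        for (String word : words) { // visits the words in seemingly random order
            if (shown++ == 20) break;
            System.out.println(word);
        }
        System.out.println(". . .");
        System.out.println(words.size() + " distinct words.");
    }
}
```

You could run it with input redirected from a text file, for example java SetTest < alice30.txt, where the file name stands for whatever text you want to analyze.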

java.util.HashSet<E> 1.2

  • HashSet()
    constructs an empty hash set.
  • HashSet(Collection<? extends E> elements)
    constructs a hash set and adds all elements from a collection.
  • HashSet(int initialCapacity)
    constructs an empty hash set with the specified capacity (number of buckets).
  • HashSet(int initialCapacity, float loadFactor)
    constructs an empty hash set with the specified capacity and load factor (a number between 0.0 and 1.0 that determines at what percentage of fullness the hash table will be rehashed into a larger one).

java.lang.Object 1.0

  • int hashCode()
    returns a hash code for this object. A hash code can be any integer, positive or negative. The definitions of equals and hashCode must be compatible: If x.equals(y) is true, then x.hashCode() must be the same value as y.hashCode().

Tree Sets

The TreeSet class is similar to the hash set, with one added improvement. A tree set is a sorted collection. You insert elements into the collection in any order. When you iterate through the collection, the values are automatically presented in sorted order. For example, suppose you insert three strings and then visit all elements that you added.
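A minimal sketch of that experiment; the variable name sorter and the class name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class SortedNamesDemo {
    // Inserts three strings in arbitrary order and returns them in iteration order.
    static List<String> sortedNames() {
        SortedSet<String> sorter = new TreeSet<>();
        sorter.add("Bob");
        sorter.add("Amy");
        sorter.add("Carl");
        return new ArrayList<>(sorter);
    }

    public static void main(String[] args) {
        for (String s : sortedNames()) {
            System.out.println(s);
        }
    }
}
```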

Then, the values are printed in sorted order: Amy Bob Carl. As the name of the class suggests, the sorting is accomplished by a tree data structure. (The current implementation uses a red-black tree. For a detailed description of red-black trees, see, for example, Introduction to Algorithms by Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein [The MIT Press, 2001].) Every time an element is added to a tree, it is placed into its proper sorting position. Therefore, the iterator always visits the elements in sorted order.

Adding an element to a tree is slower than adding it to a hash table, but it is still much faster than adding it into the right place in an array or linked list. If the tree contains n elements, then an average of log2 n comparisons are required to find the correct position for the new element. For example, if the tree already contains 1,000 elements, then adding a new element requires about 10 comparisons. Thus, adding elements into a TreeSet is somewhat slower than adding into a HashSet (see the table below for a comparison), but the TreeSet automatically sorts the elements.

Adding Elements into Hash and Tree Sets

java.util.TreeSet<E> 1.2

  • TreeSet()
    constructs an empty tree set.
  • TreeSet(Collection<? extends E> elements)
    constructs a tree set and adds all elements from a collection.

Object Comparison

How does the TreeSet know how you want the elements sorted? By default, the tree set assumes that you insert elements that implement the Comparable interface. That interface defines a single method, int compareTo(T other).

The call a.compareTo(b) must return 0 if a and b are equal, a negative integer if a comes before b in the sort order, and a positive integer if a comes after b. The exact value does not matter; only its sign (> 0, 0, or < 0) matters. Several standard Java platform classes implement the Comparable interface. One example is the String class. Its compareTo method compares strings in dictionary order (sometimes called lexicographic order). If you insert your own objects, you must define a sort order yourself by implementing the Comparable interface. There is no default implementation of compareTo in the Object class.

For example, here is how you can sort Item objects by part number:
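A sketch of such a class; the field names and constructor are assumptions. The compareTo method returns the difference of the part numbers, which is adequate only when the part numbers are non-negative and not too large.

```java
public class Item implements Comparable<Item> {
    private final String description;
    private final int partNumber;

    public Item(String description, int partNumber) {
        this.description = description;
        this.partNumber = partNumber;
    }

    public String getDescription() {
        return description;
    }

    // Negative if this item comes first, zero if the part numbers match, positive otherwise.
    @Override
    public int compareTo(Item other) {
        return partNumber - other.partNumber;
    }

    public static void main(String[] args) {
        Item toaster = new Item("Toaster", 1234);
        Item widget = new Item("Widget", 4562);
        System.out.println(toaster.compareTo(widget) < 0); // prints true
    }
}
```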

If you compare two positive integers, such as part numbers in our example, then you can simply return their difference—it will be negative if the first item should come before the second item, zero if the part numbers are identical, and positive otherwise.

CAUTION: This trick only works if the integers are from a small enough range. If x is a large positive integer and y is a large negative integer, then the difference x − y can overflow.
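The overflow is easy to reproduce. In the sketch below, the method names are hypothetical; Integer.compare is a standard library method (not mentioned in the text above) that avoids the problem by never subtracting.

```java
public class CompareOverflowDemo {
    // The subtraction trick: correct only when x - y does not overflow.
    static int byDifference(int x, int y) {
        return x - y;
    }

    // An overflow-safe alternative.
    static int bySafeCompare(int x, int y) {
        return Integer.compare(x, y);
    }

    public static void main(String[] args) {
        int x = Integer.MAX_VALUE; // a large positive integer
        int y = -1;                // a negative integer
        System.out.println(byDifference(x, y) > 0);  // prints false, although x > y
        System.out.println(bySafeCompare(x, y) > 0); // prints true
    }
}
```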

However, using the Comparable interface for defining the sort order has obvious limitations. A given class can implement the interface only once. But what can you do if you need to sort a bunch of items by part number in one collection and by description in another? Furthermore, what can you do if you need to sort objects of a class whose creator didn’t bother to implement the Comparable interface?

In those situations, you tell the tree set to use a different comparison method, by passing a Comparator object into the TreeSet constructor. The Comparator interface declares a compare method with two explicit parameters, int compare(T a, T b).

Just like the compareTo method, the compare method returns a negative integer if a comes before b, zero if they are identical, or a positive integer otherwise.

To sort items by their description, simply define a class that implements the Comparator interface:

You then pass an object of this class to the tree set constructor:
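A sketch of both steps. The class name ItemComparator, the minimal nested Item class, and the sample data are assumptions made so that the example is self-contained.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class ItemComparatorDemo {
    // Minimal stand-in for the Item class; it does not implement Comparable.
    static class Item {
        private final String description;
        private final int partNumber;

        Item(String description, int partNumber) {
            this.description = description;
            this.partNumber = partNumber;
        }

        String getDescription() {
            return description;
        }
    }

    // A comparator that orders items by their description.
    static class ItemComparator implements Comparator<Item> {
        @Override
        public int compare(Item a, Item b) {
            return a.getDescription().compareTo(b.getDescription());
        }
    }

    static String firstDescription() {
        // Pass the comparator to the tree set constructor.
        SortedSet<Item> sortByDescription = new TreeSet<>(new ItemComparator());
        sortByDescription.add(new Item("Toaster", 1234));
        sortByDescription.add(new Item("Widget", 4562));
        sortByDescription.add(new Item("Modem", 9912));
        return sortByDescription.first().getDescription();
    }

    public static void main(String[] args) {
        System.out.println(firstDescription()); // prints Modem
    }
}
```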

If you construct a tree with a comparator, it uses this object whenever it needs to compare two elements. Note that this item comparator has no data. It is just a holder for the comparison method. Such an object is sometimes called a function object.

Function objects are commonly defined “on the fly,” as instances of anonymous inner classes:
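For instance, the following sketch defines a comparator on the fly that sorts strings in descending order. The choice of strings and of a reversed ordering is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class AnonymousComparatorDemo {
    static List<String> descendingNames() {
        // The comparator is an instance of an anonymous inner class.
        SortedSet<String> names = new TreeSet<>(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return b.compareTo(a); // reversed dictionary order
            }
        });
        names.add("Amy");
        names.add("Carl");
        names.add("Bob");
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        System.out.println(descendingNames()); // prints [Carl, Bob, Amy]
    }
}
```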

NOTE: Actually, the Comparator<T> interface is declared to have two methods: compare and equals. Of course, every class has an equals method; thus, there seems little benefit in adding the method to the interface declaration. The API documentation explains that you need not override the equals method but that doing so may yield improved performance in some cases. For example, the addAll method of the TreeSet class can work more effectively if you add elements from another set that uses the same comparator.

If you look back at the table above, you may well wonder if you should always use a tree set instead of a hash set. After all, adding elements does not seem to take much longer, and the elements are automatically sorted. The answer depends on the data that you are collecting. If you don’t need the data sorted, there is no reason to pay for the sorting overhead. More important, with some data it is much more difficult to come up with a sort order than a hash function. A hash function only needs to do a reasonably good job of scrambling the objects, whereas a comparison function must tell objects apart with complete precision.

To make this distinction more concrete, consider the task of collecting a set of rectangles. If you use a TreeSet, you need to supply a Comparator<Rectangle>. How do you compare two rectangles? By area? That doesn’t work. You can have two different rectangles with different coordinates but the same area. The sort order for a tree must be a total ordering. Any two elements must be comparable, and the comparison can only be zero if the elements are equal. There is such a sort order for rectangles (the lexicographic ordering on its coordinates), but it is unnatural and cumbersome to compute. In contrast, a hash function is already defined for the Rectangle class. It simply hashes the coordinates.

NOTE: As of Java SE 6, the TreeSet class implements the NavigableSet interface. That interface adds several convenient methods for locating elements, and for backward traversal. See the API notes for details.

The program in the listing below builds two tree sets of Item objects. The first one is sorted by part number, the default sort order of Item objects. The second set is sorted by description, by means of a custom comparator.

TreeSetTest.java
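A sketch of such a program, written to match the description above. The sample items, the nested Item class, and the helper methods are assumptions. Here compareTo uses Integer.compare rather than a difference so that it is safe for any part numbers.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class TreeSetTest {
    // An item whose default sort order is by part number.
    static class Item implements Comparable<Item> {
        private final String description;
        private final int partNumber;

        Item(String description, int partNumber) {
            this.description = description;
            this.partNumber = partNumber;
        }

        String getDescription() {
            return description;
        }

        @Override
        public int compareTo(Item other) {
            return Integer.compare(partNumber, other.partNumber);
        }

        @Override
        public String toString() {
            return "[description=" + description + ", partNumber=" + partNumber + "]";
        }
    }

    // First set: uses the default order defined by compareTo.
    static SortedSet<Item> byPartNumber() {
        SortedSet<Item> parts = new TreeSet<>();
        parts.add(new Item("Toaster", 1234));
        parts.add(new Item("Widget", 4562));
        parts.add(new Item("Modem", 9912));
        return parts;
    }

    // Second set: same elements, ordered by a custom comparator.
    static SortedSet<Item> byDescription(SortedSet<Item> parts) {
        SortedSet<Item> sorted = new TreeSet<>(new Comparator<Item>() {
            @Override
            public int compare(Item a, Item b) {
                return a.getDescription().compareTo(b.getDescription());
            }
        });
        sorted.addAll(parts);
        return sorted;
    }

    public static void main(String[] args) {
        SortedSet<Item> parts = byPartNumber();
        System.out.println(parts);                // Toaster, Widget, Modem
        System.out.println(byDescription(parts)); // Modem, Toaster, Widget
    }
}
```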

java.lang.Comparable<T> 1.2

  • int compareTo(T other)
    compares this object with another object and returns a negative value if this comes before other, zero if they are considered identical in the sort order, and a positive value if this comes after other.

java.util.Comparator<T> 1.2

  • int compare(T a, T b)
    compares two objects and returns a negative value if a comes before b, zero if they are considered identical in the sort order, and a positive value if a comes after b.

java.util.SortedSet<E> 1.2

  • Comparator<? super E> comparator()
    returns the comparator used for sorting the elements, or null if the elements are compared with the compareTo method of the Comparable interface.
  • E first()
  • E last()
    returns the smallest or largest element in the sorted set.

java.util.NavigableSet<E> 6

  • E higher(E value)
  • E lower(E value)
    returns the least element > value or the largest element < value, or null if there is no such element.
  • E ceiling(E value)
  • E floor(E value)
    returns the least element ≥ value or the largest element ≤ value, or null if there is no such element.
  • E pollFirst()
  • E pollLast()
    removes and returns the smallest or largest element in this set, or null if the set is empty.
  • Iterator<E> descendingIterator()
    returns an iterator that traverses this set in descending direction.
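
The navigation methods listed above can be tried out on a small set of integers. The sample values and the class name in this sketch are assumptions.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

public class NavigableSetDemo {
    static NavigableSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(10, 20, 30));
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = sample();
        System.out.println(set.higher(20));  // prints 30 (least element > 20)
        System.out.println(set.lower(20));   // prints 10 (largest element < 20)
        System.out.println(set.ceiling(20)); // prints 20 (least element >= 20)
        System.out.println(set.floor(25));   // prints 20 (largest element <= 25)
        System.out.println(set.higher(30));  // prints null (no such element)
        System.out.println(set.descendingIterator().next()); // prints 30
        System.out.println(set.pollFirst()); // prints 10 and removes it from the set
        System.out.println(set);             // prints [20, 30]
    }
}
```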

java.util.TreeSet<E> 1.2

  • TreeSet()
    constructs a tree set for storing Comparable objects.
  • TreeSet(Comparator<? super E> c)
    constructs a tree set and uses the specified comparator for sorting its elements.
  • TreeSet(SortedSet<? extends E> elements)
    constructs a tree set, adds all elements from a sorted set, and uses the same element comparator as the given sorted set.

