Java Collections

The article presents the classification of collections in Java.


While writing programs we frequently face a necessity of storing collections of some objects. Those may be numbers, strings, user class objects etc. This article will try to classify and describe the main Java collection classes in a plain language.

Some may argue “Why do we need some other collections when we do have arrays?”. Many people indeed use collections when it is unnecessary. But there are situations when we need, for example, to dynamically change the size of the data structure in use or to automatically order its elements upon their addition.

This article deals explicitly with Java Collections Framework. There exist numerous alternatives exist to it though:

So, as we can see, there are quite some options to choose from. But first things first we have to understand the basic Java collections that are most common in use. By the way, some third-party libraries implement Java Collections Framework interfaces (for example, Guava ). Therefore, basic collections classes hierarchy knowledge will allow you to master third-party libraries faster.

Basic interfaces

There exists two basic interfaces in Java collections library whose implementation constitute all the collection classes:

Even though the framework is called Java Collections Framework, the map interface and its realizations do belong to it as well! Collection and Map interfaces are basic but they are not the only ones. They are extended by other interfaces adding additional functionality. We will talk about those later.

Interface Collection

Let’s consider main interfaces that belong to Collection:

List interface

As you can see from the diagram, Collection is not a base interface (what a surprise :D).

Collection interface extends Iterable interface which has the only method iterator(). It means that any collection which inherits from Iterable has to return an iterator.

Iterator (wiki) is an object representing an abstraction for uniform access to elements of a collection. Iterator is a pattern which provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation.

Let’s move on. As you can see from the picture, Collection interface is extended by List, Set and Queue interfaces. Let’s consider the purpose of each of them.

List interface implementations

Let’s look at the following classes hierarchy.

List interface implementations

Interfaces are depicted in red, abstract classes - in green, completed implementations - in blue. It’s not the complete hierarchy, but only its main part.

As you can see in the picture, there are several abstract classes between interface and concrete implementations of the collection. It is made in order to separate out a common functionality in an abstract class for code re-usage.

ArrayList is perhaps the most commonly used collection. ArrayList encapsulates a simple array whose length is automatically increased when the new elements are inserted.

Since ArrayList uses array, the access time of an element by index is minimal (unlike in LinkedList). When the arbitrary element is deleted from the list, all elements that are located to the right of it are shifted one cell left, while the real size of an array (its capacity) remains unchanged. When a new element is inserted and the array is filled, the new array of the size (n*3)/2+1 will be created containing all the elements from the previous array as well as the newly inserted element.

LinkedList is a doubly-linked list. It is a data structure consisting from nodes, each of which contains the data itself as well as two links to the next and previous nodes of the list. Access to the arbitrary element is performed in linear time (but the access to the first and last elements of the list is always performed in constant time — list always stores links to its first and last elements, that is why insertion of the element in the end of the list does not mean that all the list should be iterated to find the last element). In general, ArrayList outperforms LinkedList in absolute terms by memory consumption and speed of operation.

Set interface implementations

Let’s consider and try to grasp the following diagram. :)

Set interface implementations

HashSet is a collection which does not allow storing the same objects (like any Set). HashSet encapsulates the HashMap object (that is, using the hash-table for storing elements).

As majority of readers do probably know, hash table stores information using the so-called hashing mechanism, in which the content of the key is used to determine a unique value called a hash. This hash is then used as an index, which is associated with the data available by this key. The transformation of the key into hash is done automatically — you will never see the hash value itself. The benefits of hashing is that it allows the following methods to be executed in a constant time: add(), contains(), remove() and size() even for big sets.

If you want to use HashSet for storing user-defined objects, you SHOULD override methods hashCode() and equals(), otherwise two logically equal objects will be considered different, since hashCode() method of the Object class will be called when element is inserted into a collection (which is likely to return different hash for your objects).

It is important to note that HashSet class doesn’t guarantee ordering of elements, since the hashing process itself typically doesn’t produce sorted sets. If you need sorted sets, the best choice could be a different type of collections, such as, for example, a TreeSet class. LinkedHashSet keeps a linked list of elements in their insertion order. It allows organizing an ordered insertion operation. That is, elements of the LinkedHashSet will be returned in the order of their insertion when iterating through it.

TreeSet is a collection which stores elements in the form of a tree ordered by values. TreeSet encapsulates TreeMap, which uses balanced binary Red-Black-Tree for storing elements. The advantage of TreeSet is guaranteed time of log(n) for add, remove and contains operations.

Queue interface implementations

The very simplified hierarchy is depicted below.

Queue interface implementations

PriorityQueue is the only direct implementation of the Queue interface (not considering a LinkedList, which is more of a list than of a queue).

This queue arranges elements either by their natural order (using Comparable interface) or using Comparator interface obtained in constructor.

Map interface implementations

Map interface matches unique keys with associated values. The key is an object which is used for the following data retrieval. You can put values inside a map object by specifying the key and the value. Once this value has been saved, you can get it by its key. Map is a generalized interface defined as shown below.

interface Мар<К, V>

K is a key type while V is a type of stored values.

The class hierarchy is very similar to the Set one:

Map interface implementations

Obsolete collections

The following collections are obsolete and their use is discouraged, but not prohibited.

All of Hashtable, Stack and Vector methods are synchronized, which makes them less efficient for usage in single-threaded applications.

Synchronized collections

Synchronized collection objects could be obtained by using static methods synchronizedMap and synchronizedList of a Collections class.

  Map m = Collections.synchronizedMap(new HashMap());
  List l = Collections.synchronizedList(new ArrayList());

Synchronized wrappers of synchronizedMap and synchronizedList collections are sometimes called conditionally thread-safe - all operations individually are thread-safe, but the sequences of operations, where the control thread depends on the results of the previous operations may be the cause of competition for data.

Relative thread safety provided by synchronizedList and synchronizedMap is a hidden threat - developers believe that if these collections are synchronized, then they are fully thread safe, and neglect the proper synchronization of composite operations. As a result, although these programs work under the moderately light load, they may start throwing NullPointerException or ConcurrentModificationException under the heavy one.

In addition there is always a possibility of a "classical" synchronization via synchronized block.

Putting it all together

Let’s see the final class diagram:

Map interface implementations

Big picture.

As you can see, the class diagram is quite massive. But such an architecture is considered standard in OOP.

Original text by vovanok

Concurrent collections collections lists sets maps iterators ArrayList Iterator

To check your knowledge the following tests are recommended:
Check your java java skills.