HashSet and LinkedList

7 min read

ArrayList and HashMap cover most of the territory; the two siblings worth knowing fill the remaining corners. HashSet stores unique values — adding the same value twice is silently ignored, and "is X in the set?" is fast. LinkedList is a doubly-linked list optimised for adding and removing at both ends, which makes it the natural fit for queues. Neither is your everyday default — you'll reach for ArrayList and HashMap 80% of the time — but recognising when one of these two is the right tool is the difference between elegant and over-complicated code.

HashSet — uniqueness for free

HashSet is a collection that enforces uniqueness. Add an element that's already there and add returns false and the set is unchanged:

import java.util.HashSet;
import java.util.Set;
 
public class UniqueErrors {
    public static void main(String[] args) {
        Set<String> seenErrors = new HashSet<>();
        seenErrors.add("Element not found");
        seenErrors.add("Timeout waiting for AJAX");
        boolean wasNew = seenErrors.add("Element not found");      // duplicate
        System.out.println("was new? " + wasNew);                  // false
 
        System.out.println("size: " + seenErrors.size());          // 2, not 3
 
        System.out.println("seen Timeout? " + seenErrors.contains("Timeout waiting for AJAX"));
        System.out.println("contents: " + seenErrors);
    }
}

Output:

was new? false
size: 2
seen Timeout? true
contents: [Element not found, Timeout waiting for AJAX]

Three things to internalise:

  • add returns booleantrue if the element was new, false if it was already there. Useful for "have I already seen this?" checks in one line.
  • contains is fast — average constant time, like HashMap.containsKey. Beats ArrayList.contains (which scans every element) once you have more than a handful.
  • The set is unordered, just like HashMap. If you need insertion order, use LinkedHashSet; for sorted, TreeSet.

When to reach for HashSet:

  • You want to deduplicate a list — new HashSet<>(list) strips duplicates in one line.
  • You're tracking whether you've seen something — flaky test fingerprints, executed test cases, error messages.
  • You need fast membership tests where a List.contains would be too slow.

A real QA snippet — collect unique error messages across a test run:

import java.util.HashSet;
import java.util.Set;
 
public class ErrorBucket {
    public static void main(String[] args) {
        String[] runErrors = {
            "Element not found: #submit",
            "Timeout waiting for AJAX",
            "Element not found: #submit",
            "Stale element reference",
            "Timeout waiting for AJAX",
            "Element not found: #submit"
        };
 
        Set<String> unique = new HashSet<>();
        for (String e : runErrors) {
            unique.add(e);
        }
 
        System.out.println("Raw errors:    " + runErrors.length);
        System.out.println("Unique errors: " + unique.size());
        System.out.println(unique);
    }
}

Output:

Raw errors:    6
Unique errors: 3
[Stale element reference, Element not found: #submit, Timeout waiting for AJAX]

Six raw failures collapse to three real fingerprints. That collapse is why test reporters often pipe error messages through a Set before showing them.

LinkedList — queues, not random access

LinkedList implements both the List interface (so you can use it like an ArrayList) and the Deque interface — a double-ended queue with cheap operations at both ends:

import java.util.LinkedList;
import java.util.Queue;
 
public class TestQueueDemo {
    public static void main(String[] args) {
        LinkedList<String> queue = new LinkedList<>();
        queue.addLast("smoke");
        queue.addLast("regression");
        queue.addLast("e2e");
        queue.addFirst("setup");
 
        System.out.println("Queue: " + queue);
 
        while (!queue.isEmpty()) {
            String next = queue.removeFirst();
            System.out.println("running: " + next);
        }
    }
}

Output:

Queue: [setup, smoke, regression, e2e]
running: setup
running: smoke
running: regression
running: e2e

addFirst, addLast, removeFirst, removeLast — that's the queue/deque API. They're constant-time on LinkedList because the structure is a chain of nodes; adding at the head or tail is just rewiring two pointers.

LinkedList also has get(i) and set(i, value), but they're slow — finding index i walks i nodes from one end. For random access, ArrayList is dramatically faster (constant time vs linear).

The decision rule:

  • Mostly random access (get(i), set(i, v)): use ArrayList. Faster, less memory.
  • Mostly insert/remove at either end: use LinkedList.
  • In doubt: ArrayList. It's faster more often than people expect.

In real QA codebases LinkedList shows up rarely — usually as a queue of pending tasks or a sliding window of recent events. ArrayList covers most of the job.

The Java collections hierarchy

Collections framework
  • – ArrayList — default
  • – LinkedList — queue/deque
  • – Vector — legacy, avoid
  • – HashSet — fast, unordered
  • – LinkedHashSet — preserves insertion order
  • – TreeSet — sorted
  • – HashMap — fast, unordered
  • – LinkedHashMap — preserves insertion order
  • – TreeMap — sorted by key
  • LinkedList — doubly-linked, also a List –
  • ArrayDeque — array-backed deque –
  • PriorityQueue — sorted by priority –

The four families: ordered with duplicates (List), unordered without (Set), keyed (Map), and FIFO/LIFO (Queue/Deque). Each family has a default class (the unordered hash one) and ordered/sorted variants. Memorise the four families; the specific classes are easy to look up.

Programming to the interfaces

Same pattern as List/ArrayList from lesson 1. Declare via the interface, build via the implementation:

Set<String> seenErrors = new HashSet<>();
Map<String, String> config = new HashMap<>();
Queue<String> pending = new LinkedList<>();

A method that takes Set<String> accepts any set type. A test runner that returns List<TestResult> can switch from ArrayList to a custom List implementation in the future without changing every caller.

A combined example — deduplicating a queue

import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;
 
public class FlakeQueue {
    public static void main(String[] args) {
        Queue<String> retryQueue = new LinkedList<>();
        Set<String> alreadyEnqueued = new HashSet<>();
 
        String[] flakyTests = {"login", "checkout", "login", "search", "checkout", "login"};
 
        for (String name : flakyTests) {
            if (alreadyEnqueued.add(name)) {     // add returns true ONLY if it was new
                retryQueue.add(name);
            }
        }
 
        System.out.println("Queue: " + retryQueue);
        while (!retryQueue.isEmpty()) {
            System.out.println("retrying: " + retryQueue.remove());
        }
    }
}

Output:

Queue: [login, checkout, search]
retrying: login
retrying: checkout
retrying: search

alreadyEnqueued.add(name) does the membership test and the insertion in one call: it returns true if it was newly added, false if it was already present. Using a Set as a "seen" guard alongside a Queue is a common pattern in retry logic and crawlers.

⚠️ Common mistakes

  • Reaching for LinkedList whenever you hear "list." ArrayList is faster more often than people expect — array-backed structures benefit from CPU caches in ways linked structures don't. Use LinkedList only when the access pattern is genuinely queue/deque.
  • Storing custom objects in a HashSet without overriding equals and hashCode. Two TestUser objects with identical fields will both fit in the set because Object's default equals is reference equality. If your set should treat "same fields" as duplicates, override both methods. (IntelliJ's Generate → equals() and hashCode() writes them for you.)
  • Confusing HashSet's unordered iteration for "random." It isn't random — it's deterministic given the keys' hash codes — but it isn't insertion order or sorted order either. If your output flickers between runs (which can happen across JVM versions), switch to LinkedHashSet.

🎯 Practice task

Build a deduplicating retry queue. 25 minutes.

  1. Create RetryDemo.java. import java.util.HashSet;, import java.util.LinkedList;, import java.util.Queue;, import java.util.Set;.
  2. Declare String[] failed = {"login", "search", "login", "checkout", "search", "login", "logout"}; — a list of failed test names with duplicates.
  3. Build Set<String> seen = new HashSet<>(); and Queue<String> retryQueue = new LinkedList<>();.
  4. Loop over failed. For each name, call seen.add(name) — if it returns true, also add to retryQueue. After the loop, the queue should hold every unique failure once, in first-seen order.
  5. Print retryQueue and confirm: [login, search, checkout, logout].
  6. Drain the queue with while (!retryQueue.isEmpty()) { retryQueue.remove() ... } and print each name.
  7. Try replacing HashSet with LinkedHashSet and Queue with ArrayDeque — same APIs, different classes. Confirm everything still works. That swappability is what programming to the interface buys you.
  8. Stretch: make a small class Failure { String name; long when; } with a constructor and equals / hashCode based only on name. Put Failure objects (with different when values but same name) into a HashSet<Failure> and confirm the set still treats same-name failures as duplicates. That's the hash contract working for custom objects.

Lesson 4 brings the chapter together with the iteration patterns — for-each, Iterator, lambdas, streams — that walk every collection you've now met.

// tip to track lessons you complete and pick up where you left off across devices.