Tuples, Sets, and When to Use Each

Lists and dicts cover most QA collection work, but two more built-in collections — tuple and set — fill specific niches. A tuple is an ordered, immutable record: data that shouldn't change once set. A set is an unordered bag of unique values, perfect for deduplication and membership checks. This lesson covers both, plus a side-by-side recap of all four built-in collection types so you know which to reach for next time.

Tuples — fixed-shape records

A tuple looks like a list with parentheses instead of brackets:

test_result = ("Login Test", True, 1250)   # name, passed, duration
print(test_result[0])    # "Login Test"
print(test_result[2])    # 1250

The big difference from a list: tuples are immutable. You can't change, add, or remove items after creation:

test_result[1] = False   # TypeError: 'tuple' object does not support item assignment

That immutability is the whole point. A tuple says "this is a record with this exact shape — don't mess with it."

The parentheses are optional in many places — 1, 2, 3 and (1, 2, 3) are the same tuple. The comma is what makes it a tuple, not the parens. (One quirk: a one-element tuple needs a trailing comma — ("only",). Without the comma, ("only") is just a parenthesised string.)

Unpacking — tuples shine here

The cleanest use of tuples is unpacking — splitting a tuple into named variables in a single statement:

test_result = ("Login Test", True, 1250)
name, passed, duration = test_result
 
print(name)        # "Login Test"
print(passed)      # True
print(duration)    # 1250

This is the same syntax we used for lists in the previous lesson — it works for any sequence. Tuples are the conventional choice when the values are heterogeneous (different types, fixed positions).

Returning multiple values

Python doesn't have a special "multiple return values" feature — it just returns a tuple. The caller unpacks it:

def run_test(name):
    # ... pretend we ran it ...
    return True, 1250        # actually returns the tuple (True, 1250)
 
passed, duration_ms = run_test("Login Test")
print(passed)        # True
print(duration_ms)   # 1250

That's the cleanest "return two things" syntax in any mainstream language. Java would need a Pair<Boolean, Integer> class; Python just packs and unpacks tuples.

For functions returning more than two or three values, use a dict or a dataclass instead — covered in chapter 5. Anything beyond a triple becomes hard to remember the order of.

When to use a tuple

A function returns multiple values (return passed, duration_ms).
You need a dict key that holds composite info — only immutable types can be dict keys, so {("staging", "chromium"): "fastest"} works but […] doesn't.
You want to signal "this record is fixed shape — (x, y) always, in that order."
You're storing many records and want a small, fast container without method overhead.

When you'd prefer a list: anything that grows, shrinks, or gets reordered. When you'd prefer a dict: anything where the names of fields matter more than positions.

Sets — unordered, unique

A set holds unique values. Curly braces, like dicts, but with values not key/value pairs:

tested_browsers = {"Chrome", "Firefox", "Chrome"}
print(tested_browsers)        # {'Chrome', 'Firefox'} — duplicate dropped
print(len(tested_browsers))   # 2

Adding a duplicate is silently ignored:

tested_browsers.add("Chrome")
tested_browsers.add("Safari")
print(tested_browsers)        # {'Chrome', 'Firefox', 'Safari'}

Two things to know up front:

Sets are unordered. Don't expect insertion order. Don't index with [0] — that raises TypeError. If order matters, use a list.
An empty set is set(), not {}. {} is an empty dict, as we saw in the last lesson.

Set operations — the killer feature

Sets shine when you need to compare two collections. Python supports the mathematical set operations directly:

suite_a = {"login", "checkout", "search", "logout"}
suite_b = {"login", "logout", "profile", "settings"}
 
# Tests in BOTH suites
common = suite_a & suite_b              # {'login', 'logout'}
 
# Tests in EITHER suite (no duplicates)
all_tests = suite_a | suite_b           # {'checkout', 'login', 'logout', 'profile', 'search', 'settings'}
 
# Tests in A but NOT in B
only_a = suite_a - suite_b              # {'checkout', 'search'}
 
# Tests in exactly ONE suite (symmetric difference)
divergent = suite_a ^ suite_b           # {'checkout', 'profile', 'search', 'settings'}

Operator	Method	Meaning
`a & b`	`a.intersection(b)`	In both
`a \| b`	`a.union(b)`	In either
`a - b`	`a.difference(b)`	In a, not in b
`a ^ b`	`a.symmetric_difference(b)`	In exactly one

Use the operators for short snippets, the named methods when you want the code to read aloud.

A QA case where sets are exactly right: comparing the tests that passed in two CI runs to find regressions:

passed_yesterday = {"login", "checkout", "search", "logout", "profile"}
passed_today     = {"login", "checkout", "logout", "settings"}
 
new_failures = passed_yesterday - passed_today   # {'profile', 'search'}
new_passes   = passed_today - passed_yesterday   # {'settings'}
 
print(f"Newly failing: {new_failures}")
print(f"Newly passing: {new_passes}")

Output:

Newly failing: {'profile', 'search'}
Newly passing: {'settings'}

Three lines of set arithmetic do what a manual loop comparison would take twenty.

Membership is fast

Checking if "Chrome" in browsers on a list is O(n) — Python scans every element. On a set it's O(1) — sets are hashed. For small lists you'll never notice; for thousands of values, sets are dramatically faster.

seen = set()
for name in test_names:
    if name in seen:                # constant time
        print(f"Duplicate: {name}")
    seen.add(name)

This "track what we've seen" pattern is one of the most common reasons to reach for a set.

The four built-in collections — a recap

list vs tuple vs set vs dict

list — [a, b, c]

Ordered
Mutable — change, add, remove freely
Duplicates allowed
Index by position: x[0], x[-1]
Use for: a growing collection of similar items
QA example: a run's accumulating test results

tuple — (a, b, c)

Ordered
Immutable — can't change after creation
Duplicates allowed
Index by position; usually unpacked into names
Use for: fixed-shape records, multi-return values
QA example: returning (passed, duration_ms) from a test

set — {a, b, c}

Unordered
Mutable — add, remove freely
No duplicates — added duplicates are silently dropped
Fast O(1) membership: x in s
Use for: deduplication and comparing two collections
QA example: tests that passed today minus those that passed yesterday

dict — {k: v}

Insertion-ordered (since Python 3.7)
Mutable — set/update keys freely
Keys unique; values can repeat
Look up by key: x['name']
Use for: name → value mappings, JSON, config
QA example: launch config, API headers, parsed JSON

The decision tree most QA engineers carry in their head:

Need a name → value lookup? Dict.
Need fast "is this value in there" or to compare two groups? Set.
Need a fixed-shape record or multi-return? Tuple.
Otherwise, default to list.

⚠️ Common mistakes

One-element tuple without the trailing comma. ("only") is a string in parentheses; ("only",) is a one-element tuple. The comma is what makes the tuple. Forget it and len(x) returns the string's length, not 1.
Treating {} as an empty set. {} is an empty dict. To create an empty set you have to write set(). The literal-set syntax {1, 2, 3} only works once there's at least one element.
Indexing or slicing a set. tested[0] and tested[:2] raise TypeError. Sets are unordered — there's no "first" element. Convert to a list (list(tested)[0]) if you absolutely must, but if order matters you probably want a list to begin with.

🎯 Practice task

Use tuples and sets to compare two runs. 20-25 minutes.

Create compare_runs.py.
Define two sets — yesterday = {"login", "checkout", "search", "logout", "profile"} and today = {"login", "checkout", "logout", "settings", "profile"}.
Print four lines using the set operators:
- tests in both runs (&)
- tests in either run (|)
- tests that ran yesterday but not today (-)
- tests that ran in exactly one run (^)
Build a regression-style report: regressed = passed_yesterday - passed_today (use the same sets but read them as "tests that passed"). Print the count.
Define a function def run_test(name: str) -> tuple: that returns the tuple (name, True, 1250). Call it three times with different names and unpack each call's result with name, passed, duration = run_test(...). Print each.
Define summary = (4, 1, 0) — a (passed, failed, skipped) triple. Unpack and print each piece. Try summary[0] = 99 and observe the TypeError.
Use a set to deduplicate a list with repeats: raw = ["P0", "P1", "P0", "P2", "P1"], then unique_priorities = set(raw). Print both lengths.
Stretch: write a function def diff_runs(a: set, b: set) -> dict: that returns {"both": …, "only_a": …, "only_b": …, "either": …} using all four set operators. Call it with the suites from step 2 and pretty-print the result.

You can now pick the right collection for any QA job. The next lesson stitches together everything we've learned about lists and dicts to read and write JSON — the data format every API and every fixture file speaks.