Language : en | de | fr | es

UUID vs GUID – Key Differences and When to Use Each
Back to blogs

Is UUID Unique? Understanding How and Why It Works

Universally Unique Identifiers (UUIDs) are widely used in distributed systems, databases, APIs, and microservices to identify entities without central coordination. Developers rely on them to avoid collisions across machines, regions, and even time periods. But does a UUID truly guarantee uniqueness in practice, or is that assumption overstated?

This article provides a technical, example-driven explanation of how UUIDs work, where their guarantees come from, and under which conditions problems such as a duplicate UUID may theoretically appear.

What Makes a UUID Unique in Theory and Practice

To understand whether a UUID can collide, we need to look at its structure and generation mechanisms.

A UUID is a 128-bit value. In canonical textual form, it looks like this:

550e8400-e29b-41d4-a716-446655440000

With 128 bits, the total number of possible combinations is:

2^128 ≈ 3.4 × 10^38

That is an astronomically large space. Even generating billions of identifiers per second for millions of years would not exhaust it.

Versions and Entropy Sources

There are several UUID versions defined in RFC 4122 (and updated drafts). The most commonly used are:

  • Version 1 (Time-based) – Uses timestamp + MAC address.
  • Version 4 (Random-based) – Uses cryptographically strong random numbers.
  • Version 3 and 5 (Namespace-based) – Deterministic hash of namespace + name.
  • Version 7 (Time-ordered, modern draft) – Combines timestamp with randomness for better indexing performance.

Each version achieves uniqueness differently:

Versión

Fuente de singularidad

Riesgo de colisión

Caso de uso típico

v1

Marca de tiempo + MAC

Muy bajo (basado en hardware)

Sistemas heredados

v4

122 bits aleatorios

Extremadamente bajo

Sistemas distribuidos

v5

Hash del espacio de nombres SHA-1

Determinista

Identificadores estables basados en nombres

v7

Tiempo + aleatorio

Extremadamente bajo

Bases de datos, inserciones ordenadas

In practical systems, version 4 and version 7 are most common.

UUID uniqueness guarantee: What It Actually Means

The term UUID uniqueness assurance is often misunderstood. UUIDs do not mathematically guarantee absolute uniqueness. Instead, they provide probabilistic uniqueness with such an enormous address space that collisions are practically negligible.

Probability of Collision (Birthday Paradox Perspective)

For random UUID v4:

  • 122 bits are random.
  • Collision probability after generating 1 billion UUIDs is approximately:

~1.47 × 10^-15

To reach a 50% collision probability, you would need to generate around:

2^61 ≈ 2.3 × 10^18 UUIDs

That scale exceeds realistic system requirements.

Practical Example

Imagine:

  • 1,000 servers
  • Each generates 10 million UUIDs per day
  • For 100 years

Even at that scale, collision probability remains negligible.

This is why engineers confidently use UUIDs in:

  • Globally distributed databases
  • Multi-region SaaS platforms
  • Event-driven architectures
  • Client-side ID generation in offline apps

Why UUID Is Unique in Distributed Architectures

The question of UUID uniqueness often arises in the context of distributed systems where no central authority assigns IDs.

Traditional approaches:

  • Auto-increment integers require a central database.
  • Snowflake IDs require coordination or timestamp management.

UUID advantages:

  1. No central coordination required.
  2. Safe offline generation.
  3. No dependency on clock synchronization (v4).
  4. Practically collision-resistant at internet scale.

Example: Microservices Environment

Suppose you have:

  • User service in US-East
  • Order service in EU-West
  • Inventory service in Asia

Each service independently generates identifiers. With UUID v4:

  • No communication required.
  • No ID registry.
  • No coordination latency.
  • No global lock.

This drastically reduces system coupling.

Is UUID Always Unique? Understanding Edge Cases

Now let’s address a critical concern: whether UUIDs remain unique in all cases.?

The short answer: No system can guarantee absolute uniqueness unless it tracks all generated values globally. UUIDs rely on probability and entropy quality.

Edge Case 1: Poor Random Number Generator

If your system:

  • Uses a weak PRNG
  • Has insufficient entropy at boot
  • Runs in virtualized environments with shared entropy pool

You may theoretically generate identical random sequences.

Real-world example:

  • Early Linux systems in VMs sometimes started with low entropy.
  • If applications generated v4 UUIDs immediately at startup, collision probability increased.

Mitigation:

  • Use cryptographically secure RNG.
  • Ensure entropy pool is properly seeded.
  • Avoid custom UUID implementations.

Is UUID Really Unique Under Heavy Load?

The phrase UUID uniqueness under heavy load often appears in high-throughput database discussions.

Let’s examine two concerns:

1. Storage Index Collisions vs Value Collisions

In databases:

  • B-tree fragmentation can occur.
  • Random UUIDs degrade insert locality.
  • This is not a collision problem.
  • It is a performance issue.

Solution:

  • Use UUID v7 or ULID for time-ordered inserts.
  • Or use binary(16) storage instead of string.

2. Deterministic UUID (v5) Reuse

If you use namespace-based generation:

UUIDv5(namespace, "user@example.com")

You will always get the same output.

This is expected behavior.

But if different systems use identical namespace + name combinations, collisions are intentional and deterministic.

How Does a UUID Maintain Uniqueness Across Billions of Devices?

The question how can a UUID be unique across global infrastructure comes down to scale math and entropy isolation.

Consider:

  • 122 bits random.
  • Each bit independent.
  • Uniform distribution.

This produces a search space so large that even if:

  • Every human on Earth generated 1 million UUIDs per second,
  • For 100 years,

The collision probability remains negligible.

Key reasons:

  • Massive bit space.
  • Cryptographic randomness.
  • No central reuse of sequences.
  • Independent generation.

When UUID Is Not Unique: Realistic Scenarios

Although extremely unlikely, situations involving UUID duplication may occur:

  1. Custom UUID generators with flawed implementations.
  2. Hardcoded UUID constants reused accidentally.
  3. Database restore from backup without resetting application state.
  4. Snapshot cloning of VM before RNG reseeding.
  5. Truncated UUID storage (e.g., storing first 8 bytes only).

Example: Truncation Mistake

Wrong:

CHAR(8)

Correct:

BINARY(16)

If you truncate UUIDs, collision probability skyrockets.

Detecting and Handling Duplicate UUID Events

Even though rare, you should design defensively against a UUID collision scenario.

Best practices:

  • Enforce unique constraints at database level.
  • Log collision attempts.
  • Retry ID generation if insert fails.
  • Use transactional integrity.

Example Pattern (Pseudo-code)

while True:

id=generate_uuid()

try:

insert(id)

break

except UniqueConstraintError:

continue

The retry will almost certainly succeed immediately.

Performance Considerations

UUID uniqueness is only one part of the story.

Storage Recommendations

  • Use BINARY(16) instead of VARCHAR(36).
  • Avoid storing with dashes.
  • Consider time-ordered variants for clustered indexes.

Database Index Impact

Random v4:

  • Causes index fragmentation.
  • Slower insert locality.

Time-ordered v7:

  • Improves page locality.
  • Better write performance.

Practical Recommendations

For most modern applications:

  • Use UUID v4 if randomness and independence matter.
  • Use UUID v7 for database-heavy systems.
  • Never implement your own algorithm.
  • Never truncate identifiers.
  • Always enforce database uniqueness constraints.

Avoid:

  • Weak random sources.
  • Reusing namespace seeds carelessly.
  • Storing UUIDs as plain text without need.

Final Verdict

UUIDs do not offer an absolute mathematical guarantee. However, under correct implementation with proper entropy sources, the probability of collision is so small that it is effectively zero for real-world systems.

They remain one of the safest and most scalable identification strategies for distributed environments.

The practical conclusion:

  • Collisions are theoretically possible.
  • In correctly implemented systems, they are astronomically improbable.
  • Architectural mistakes, not the standard itself, are the real risk factor.

Understanding how UUIDs work at the bit level, how randomness is sourced, and how databases handle them allows engineers to design systems that are both safe and performant at global scale.