Documentation

Estimating Total Copies Sold / Total Owners on Steam

One of the most frequently requested metrics is the total number of copies sold (or total owners) for a game on Steam.
Because Valve does not publicly provide ownership counts, this value must be estimated indirectly.
There are currently three primary methods used to approximate total ownership, each with different assumptions, strengths, and weaknesses.


1. Multiples Method

This is the most commonly used public estimate.
The idea is that the number of user reviews (or wishlists) is proportional to the total number of copies sold.

A commonly cited rule of thumb is:

Total copies ≈ 40 × number of reviews

The origin of the “40×” multiplier is unclear, but it has been widely repeated in developer discussions and industry forums.

In reality, the relationship between reviews and copies sold is not guaranteed to be constant.
It may vary depending on factors such as:

  • Genre
  • Price
  • Player engagement
  • Age of the game
  • Player demographics

A useful validation approach is to compare known ownership counts (from public companies, leaks, or developer disclosures) against review counts to test whether the relationship is linear and whether the multiplier changes across different ranges.

Summary

copies ≈ multiplier × reviews
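
As a minimal sketch of the formula and of the validation idea above, the snippet below applies a multiplier and fits one from titles with known ownership. The function names and the (reviews, copies) pairs are hypothetical, chosen only for illustration; the fit is an ordinary least-squares line through the origin, which is one reasonable choice rather than an established method.

```python
def estimate_copies(reviews: int, multiplier: float = 40.0) -> float:
    """Apply copies ≈ multiplier × reviews."""
    return multiplier * reviews


def fit_multiplier(known: list[tuple[int, int]]) -> float:
    """Least-squares multiplier through the origin from (reviews, copies) pairs."""
    numerator = sum(reviews * copies for reviews, copies in known)
    denominator = sum(reviews * reviews for reviews, _ in known)
    return numerator / denominator


# Hypothetical (reviews, known copies) pairs, for illustration only.
known_titles = [(1_000, 38_000), (5_000, 210_000), (20_000, 760_000)]

multiplier = fit_multiplier(known_titles)
# Per-title ratios show whether the multiplier drifts across review ranges.
per_title = [copies / reviews for reviews, copies in known_titles]

print(f"fitted multiplier: {multiplier:.1f}")
print("per-title multipliers:", per_title)
print("estimate for 2,500 reviews:", estimate_copies(2_500, multiplier))
```

If the per-title ratios differ widely between small and large games, a single multiplier is probably too coarse and a separate multiplier per range may fit better.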

Pros:

  • Simple
  • Works without access to user-level data
  • Useful for quick estimates

Cons:

  • Multiplier may not be constant
  • Sensitive to genre and player behavior
  • Cannot distinguish inactive owners from active players

2. Representative Sampling Method

This method assumes that if you know the size of the total Steam population, and you have a sufficiently large and representative sample of users, then ownership can be estimated statistically.

If:

Total Steam population = N, Sample size = S, Owners in sample = O

Then:

Estimated total owners ≈ (O / S) × N

The main difficulty is obtaining a sample that is truly representative of the entire Steam population.
Bias in the sample (for example, only active players, only public profiles, or only certain regions) will skew the results.

Accuracy improves as sample size increases and as sampling bias decreases.
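
The scaling step, and the way error shrinks with sample size, can be sketched as follows. This assumes a simple random sample and uses the normal approximation to a binomial proportion for the interval; the function name and the example figures for N, S, and O are hypothetical.

```python
import math


def estimate_owners(population: int, sample_size: int, owners_in_sample: int,
                    z: float = 1.96) -> tuple[float, float, float]:
    """Return (estimate, lower, upper) for total owners.

    The interval uses the normal approximation to the binomial proportion,
    which is unreliable when owners_in_sample is very small, and it does
    not account for sampling bias.
    """
    p = owners_in_sample / sample_size
    std_err = math.sqrt(p * (1 - p) / sample_size)
    lower = max(0.0, p - z * std_err)
    upper = min(1.0, p + z * std_err)
    return p * population, lower * population, upper * population


# Hypothetical figures: N = 100 million accounts, S = 50,000, O = 500.
estimate, low, high = estimate_owners(100_000_000, 50_000, 500)
print(f"estimated owners: {estimate:,.0f} (approx. 95% range {low:,.0f} to {high:,.0f})")
```

The interval only reflects random sampling error. A biased sample (for example, public profiles only) can be far off even when the interval is narrow.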

Typical workflow

Estimate total Steam population → Gather user sample → Count owners of game in sample → Scale to population → Repeat with larger samples to reduce error

Pros:

  • Statistically grounded
  • Can be very accurate with large samples
  • Works for any game, even with few reviews

Cons:

  • Requires population estimate
  • Requires large user dataset
  • Sensitive to sampling bias

3. Playtime Averaging Method

This method estimates ownership using total playtime instead of direct ownership counts.

If you can measure:

  • Total playtime across a sample
  • Average playtime per owner

Then ownership can be approximated as:

Estimated owners ≈ Total playtime / Average playtime per owner

This method works because total playtime scales with the number of players, but it has a known limitation:

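Owners who never launched the game add nothing to total playtime and are typically absent from the average, so the estimate excludes players who own the game but never played it.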

As a result, this method tends to underestimate total ownership but gives a strong estimate of the active player base.
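
A minimal sketch of this calculation, assuming a total playtime figure for the game is available and that the average is taken from a sample of owners. The function name and all numbers are hypothetical. The second print line shows how the result changes if zero-hour owners are included in the average.

```python
def estimate_owners_from_playtime(total_playtime_hours: float,
                                  sample_playtimes_hours: list[float]) -> float:
    """Owners ≈ total playtime / average playtime among owners who played."""
    played = [hours for hours in sample_playtimes_hours if hours > 0]
    if not played:
        raise ValueError("sample contains no owners with recorded playtime")
    average = sum(played) / len(played)
    return total_playtime_hours / average


sample = [0.0, 0.0, 2.0, 10.0, 48.0]  # hypothetical hours per sampled owner
total = 2_000_000.0                    # hypothetical total hours for the game

# Average over owners who played: approximates the active player base.
print(estimate_owners_from_playtime(total, sample))
# Average over all sampled owners, including those with zero hours.
print(total / (sum(sample) / len(sample)))
```

The second figure is larger because zero-hour owners lower the average, which illustrates the gap between active players and total owners. The single 48-hour entry also shows how a heavy-playtime outlier moves the average.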

This makes it especially useful for:

  • DLC attach-rate analysis
  • Engagement analysis
  • Active population estimates
  • Live-service game modeling

Typical workflow

Gather sample → Calculate total playtime → Calculate average playtime per owner → Estimate owners → Repeat with larger samples

Pros:

  • Good estimate of active players
  • Useful for engagement modeling
  • Works well with playtime datasets

Cons:

  • Undercounts inactive owners
  • Sensitive to heavy-playtime outliers
  • Requires playtime data

Notes

These methods are not mutually exclusive.

For example:

  • Multiples method gives a fast baseline
  • Sampling method gives a statistical estimate
  • Playtime method gives an active-player estimate

As a rough heuristic rather than a strict guarantee, each method can be read as a bound on the estimate:

  • Multiples method → upper bound
  • Sampling method → expected value
  • Playtime method → lower bound

Other Less-Rigorous Methods

Other methods do exist but are generally less reliable or require insider data:

  • A. Insider Information
  • B. Ranking / Top Sellers Method
  • C. Achievement Ownership Extrapolation
  • D. Title → Revenue conversion
  • E. MAU / DAU → Copies method

Steam Tag System

What is a Tag?

Every app on Steam is assigned Genres, Categories, and Tags, which together form a loose classification system used by the Steam store.

These can be understood as three different layers of abstraction.


Genres

Genres are the highest-level classification of an app.
They describe the broadest type of product.

Examples:

  • Action
  • Strategy
  • RPG
  • Simulation
  • Video Production
  • Movie

While Steam is now primarily known as a game platform, the system was originally designed to support a much wider range of software and media. Because of this, the Genre system is relatively small and very general.

There are currently 33 Genres.


Categories

Categories describe the technical features of an app rather than what the app is about.

Examples:

  • Singleplayer
  • Multiplayer
  • VR Supported
  • Steam Achievements
  • Demo Available
  • Controller Support

Categories can be thought of as the feature flags of a product — the kind of details you would use to describe software to someone familiar with computers, but not necessarily with games.

There are currently 63 Categories.


Tags

Tags are the most detailed classification layer, and for most games they function as what players would normally call the genre.

Examples:

  • Roguelike
  • 4X
  • MOBA
  • Souls-like
  • Voxel
  • 1980s
  • Pixel Graphics
  • Turn-Based
  • Open World

Tags are much more flexible than Genres or Categories and can describe:

  • Gameplay mechanics (4X, MOBA, Roguelike)
  • Perspective or structure (First-Person, Turn-Based)
  • Theme or setting (Sci-Fi, Medieval, Cyberpunk)
  • Visual style (Voxel, Pixel Graphics, Anime)
  • Tone or mood (Relaxing, Horror, Funny)
  • Audience or difficulty (Casual, Hardcore)

At the time of writing (March 2026):

  • 446 Tags
  • 63 Categories
  • 33 Genres

Tags are also dynamic — new tags are added over time.

Full taxonomy reference:

https://steamdb.info/tags/


Developer Tags vs User Tags

Tags on Steam can be assigned by both:

  • Developers
  • Users (community voting)

The exact weighting system used by Steam is not public, but tags appear to be influenced by:

  • Number of users applying the tag
  • Playtime of users applying the tag
  • Developer-assigned tags

Because of this, a game may have tags that are not universally agreed upon.

Example:

  • 90% of players consider a game Strategy
  • 10% consider it MOBA
  • Both tags may appear

This creates a problem for quantitative analysis.


Tag Attribution in Project Mimir

When analyzing playtime by tag, there is no reliable way to split playtime proportionally between tags.

Because of this, Project Mimir uses the following rule:

All playtime for a game is assigned to all of its tags.

This means:

  • A game with 3 tags contributes 100% of its playtime to each tag
  • Tag totals therefore overlap
  • Tag-level metrics are not additive: summing playtime across tags exceeds the real total

This approach is intentional because:

  • Tag weights are unknown
  • Tag vote counts are not publicly available
  • Any proportional split would be arbitrary

As a result, tag analysis should be interpreted as:

Tag-associated playtime, not exclusive playtime.
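
The attribution rule can be sketched in a few lines. The data layout (game name mapped to playtime and tag list) and the function name are assumptions made for illustration, not Project Mimir's actual schema.

```python
from collections import defaultdict


def playtime_by_tag(games: dict[str, tuple[float, list[str]]]) -> dict[str, float]:
    """Assign each game's full playtime to every one of its tags."""
    totals: dict[str, float] = defaultdict(float)
    for playtime, tags in games.values():
        for tag in tags:
            totals[tag] += playtime
    return dict(totals)


# Hypothetical games: name -> (playtime in hours, tags).
games = {
    "Game A": (100.0, ["Strategy", "MOBA", "Multiplayer"]),
    "Game B": (50.0, ["Strategy"]),
}

tag_totals = playtime_by_tag(games)
overall = sum(playtime for playtime, _ in games.values())

print(tag_totals)                # Strategy receives playtime from both games
print(sum(tag_totals.values()))  # larger than `overall` because tags overlap
print(overall)
```

Because the tag totals overlap, they should be compared with each other rather than summed.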


Tag Analysis

As of March 2026, there are approximately 250,000 apps on Steam.

We queried Steam for:

  • All apps
  • All tags per app
  • Total playtime per app
  • Historical playtime data

This allows several types of tag-level analysis.


1. Tag Pair Occurrence Analysis

For each tag A, we compare it against every other tag B.

For each pair, we calculate:

  • Number of games with both A and B
  • Number of games with A but not B
  • Number of games with B but not A
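
These three counts can be computed with set operations. The sketch below assumes tags are available as a mapping from game to a set of tag names; the function name and the sample games are hypothetical.

```python
from itertools import combinations


def tag_pair_counts(
    game_tags: dict[str, set[str]],
) -> dict[tuple[str, str], tuple[int, int, int]]:
    """Map each tag pair (A, B) to (both, only A, only B) game counts."""
    # Invert the mapping: tag -> set of games carrying that tag.
    games_by_tag: dict[str, set[str]] = {}
    for game, tags in game_tags.items():
        for tag in tags:
            games_by_tag.setdefault(tag, set()).add(game)

    counts = {}
    for a, b in combinations(sorted(games_by_tag), 2):
        games_a, games_b = games_by_tag[a], games_by_tag[b]
        both = len(games_a & games_b)
        counts[(a, b)] = (both, len(games_a) - both, len(games_b) - both)
    return counts


# Hypothetical sample of three games.
game_tags = {
    "G1": {"Roguelike", "Deckbuilder"},
    "G2": {"Roguelike"},
    "G3": {"Deckbuilder", "Roguelike"},
}
print(tag_pair_counts(game_tags))
```

Each unordered pair is visited once, with the two tags in alphabetical order. The "only A" and "only B" counts can be added together to give the number of games with exactly one of the two tags.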

These values can be plotted as:

  • X axis → Games with both tags (A ∩ B)
  • Y axis → Games with exactly one of the two tags (A ⊕ B)

This produces four regions.

Region 1 — High Y, Low X (Top Left)

Many games have A or B, but few have both.

Interpretation:

  • Tags rarely combined
  • Possible unexplored design space

Region 2 — High Y, High X (Top Right)

Many games have both tags, and many have only one of them.

Interpretation:

  • Very common combination
  • Likely saturated
  • Low novelty

Region 3 — Low Y, Low X (Bottom Left)

Few games exist with either tag, and few with both.

Interpretation:

  • Niche or uncommon tags
  • Small market
  • High uncertainty

Region 4 — Low Y, High X (Bottom Right)

Many games have both tags, but few have only one of them.

Interpretation:

  • Tags strongly linked
  • Combination defines a sub-genre

Example:

  • Roguelike + Deckbuilder

This analysis is useful for:

  • Market saturation detection
  • Genre discovery
  • Design space exploration
  • Identifying unusual combinations
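
A sketch of how a tag pair could be placed into one of the four regions. The cut-off values that separate "high" from "low" are not specified in this document, so x_threshold and y_threshold are hypothetical parameters the analyst has to choose (for example, the median of each axis).

```python
def classify_pair(both: int, only_one: int,
                  x_threshold: int, y_threshold: int) -> str:
    """Place a tag pair in one of the four regions of the plot.

    both      -> X axis: games carrying both tags
    only_one  -> Y axis: games carrying exactly one of the two tags
    """
    high_x = both >= x_threshold
    high_y = only_one >= y_threshold
    if high_y and not high_x:
        return "Region 1: rarely combined"
    if high_y and high_x:
        return "Region 2: common combination"
    if not high_y and not high_x:
        return "Region 3: niche tags"
    return "Region 4: strongly linked"


# Hypothetical counts and thresholds.
print(classify_pair(both=5, only_one=900, x_threshold=100, y_threshold=500))
print(classify_pair(both=400, only_one=60, x_threshold=100, y_threshold=500))
```

Results near a threshold are sensitive to where the cut-off is placed, so borderline pairs are better inspected on the plot itself.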

2. Tag Playtime Popularity Over Time

Using historical playtime data, we can calculate total playtime per tag for each time period.

Because tags overlap, this measures:

Total playtime associated with a tag, not exclusive playtime.

This allows:

  • Trend detection
  • Identifying rising genres
  • Identifying declining genres
  • Tracking long-term popularity shifts

Examples of possible insights:

  • Survival popularity spike
  • Battle Royale boom
  • Roguelike growth after 2018
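
One way to turn per-period totals into a trend signal is to compute the relative change between consecutive periods. The record layout, function names, and figures below are hypothetical; totals are tag-associated, so a game with several tags counts toward each of them.

```python
from collections import defaultdict

# Each record: (period, playtime in hours for that period, tags of the game).
Record = tuple[int, float, list[str]]


def tag_playtime_by_period(records: list[Record]) -> dict[str, dict[int, float]]:
    """Sum tag-associated playtime per tag and period."""
    totals: dict[str, dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for period, playtime, tags in records:
        for tag in tags:
            totals[tag][period] += playtime
    return {tag: dict(by_period) for tag, by_period in totals.items()}


def relative_change(by_period: dict[int, float]) -> dict[int, float]:
    """Change versus the previous available period, as a fraction."""
    periods = sorted(by_period)
    return {
        current: (by_period[current] - by_period[previous]) / by_period[previous]
        for previous, current in zip(periods, periods[1:])
        if by_period[previous] > 0
    }


# Hypothetical records for illustration only.
records = [
    (2018, 100.0, ["Roguelike"]),
    (2019, 150.0, ["Roguelike", "Survival"]),
    (2019, 50.0, ["Survival"]),
]
per_tag = tag_playtime_by_period(records)
for tag, by_period in per_tag.items():
    print(tag, by_period, relative_change(by_period))
```

A tag that appears in only one period has no change value, which keeps new tags from showing up as infinite growth.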

3. Top Game in Each Tag

For each tag, we find the game with the highest total playtime.

This identifies:

  • Dominant titles per genre
  • Tag monopolies
  • Tags defined by a single game

Examples of possible results:

  • MOBA → Dota 2
  • Battle Royale → PUBG / Apex Legends (Fortnite is not on Steam)
  • Factory → Factorio / Satisfactory

This analysis helps determine:

  • Whether a tag is broad or narrow
  • Whether success is concentrated or distributed
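
A sketch of the top-game lookup that also reports the top game's share of its tag's playtime, as one possible measure of concentration. The share metric, the function name, and the sample data are assumptions for illustration.

```python
def top_game_per_tag(
    games: dict[str, tuple[float, list[str]]],
) -> dict[str, tuple[str, float]]:
    """Map each tag to (top game, that game's share of the tag's playtime)."""
    best: dict[str, tuple[str, float]] = {}
    totals: dict[str, float] = {}
    for name, (playtime, tags) in games.items():
        for tag in tags:
            totals[tag] = totals.get(tag, 0.0) + playtime
            # Ties keep the game seen first.
            if tag not in best or playtime > best[tag][1]:
                best[tag] = (name, playtime)
    return {
        tag: (name, playtime / totals[tag] if totals[tag] else 0.0)
        for tag, (name, playtime) in best.items()
    }


# Hypothetical games: name -> (total playtime in hours, tags).
games = {
    "Game A": (900.0, ["MOBA"]),
    "Game B": (100.0, ["MOBA", "Strategy"]),
    "Game C": (100.0, ["Strategy"]),
}
print(top_game_per_tag(games))
```

A share close to 1 suggests the tag is effectively defined by a single game, while a low share suggests playtime is spread across many titles.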

4. Tag Playtime Breakdown by Year

For each year, total playtime is summed across all games and grouped by tag.

Results are visualized as:

  • Pie charts
  • Stacked charts
  • Percent share of total playtime

This shows:

  • Market composition changes
  • Genre dominance over time
  • Shifts in player preference

Example observations:

  • FPS dominance in early 2010s
  • Indie tags growing after 2015
  • Roguelike / Survival growth after 2018
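
The percent-share calculation can be sketched as follows. Because tag totals overlap, this sketch normalises each year by the sum of tag-associated playtime so that shares add up to 100%; that normalisation is an assumption, not a rule stated in this document, and the function name and figures are hypothetical.

```python
def tag_share_by_year(
    totals: dict[int, dict[str, float]],
) -> dict[int, dict[str, float]]:
    """Convert {year: {tag: playtime}} into {year: {tag: percent share}}."""
    shares: dict[int, dict[str, float]] = {}
    for year, by_tag in totals.items():
        # Denominator is the sum of tag-associated playtime for the year.
        year_total = sum(by_tag.values())
        shares[year] = {
            tag: 100.0 * playtime / year_total if year_total else 0.0
            for tag, playtime in by_tag.items()
        }
    return shares


# Hypothetical tag-associated playtime per year.
yearly_totals = {
    2015: {"FPS": 75.0, "Indie": 25.0},
    2019: {"FPS": 50.0, "Indie": 50.0},
}
print(tag_share_by_year(yearly_totals))
```

Dividing by actual total playtime instead would give shares that sum to more than 100%, which suits stacked comparisons between tags but not pie charts.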