Documentation
Estimating Total Copies Sold / Total Owners on Steam
One of the most frequently requested metrics is the total number of copies sold (or total owners) for a game on Steam.
Because Valve does not publicly provide ownership counts, this value must be estimated indirectly.
There are currently three primary methods used to approximate total ownership, each with different assumptions, strengths, and weaknesses.
1. Multiples Method
This is the most commonly used public estimate.
The idea is that the number of user reviews (or wishlists) is proportional to the total number of copies sold.
A commonly cited rule of thumb is:
Total copies ≈ 40 × number of reviews
The origin of the “40×” multiplier is unclear, but it has been widely repeated in developer discussions and industry forums.
In reality, the relationship between reviews and copies sold is not guaranteed to be constant.
It may vary depending on factors such as:
- Genre
- Price
- Player engagement
- Age of the game
- Player demographics
A useful validation approach is to compare known ownership counts (from public companies, leaks, or developer disclosures) against review counts to test whether the relationship is linear and whether the multiplier changes across different ranges.
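A minimal sketch of that validation, assuming a small list of (reviews, known copies) pairs. The figures below are made-up placeholders rather than real sales data, and the function names are illustrative. The sketch fits one multiplier through the origin by least squares, then computes the median ratio per review-count range to see whether the multiplier drifts.

```python
from statistics import median

# Placeholder (reviews, known_copies) pairs: NOT real data.
# Replace with disclosed figures before drawing any conclusions.
known_titles = [
    (100, 4_500),
    (500, 19_000),
    (2_000, 82_000),
    (10_000, 350_000),
    (50_000, 1_600_000),
]

def fit_multiplier(pairs):
    """Least-squares slope of copies = m * reviews (line through the origin)."""
    num = sum(r * c for r, c in pairs)
    den = sum(r * r for r, _ in pairs)
    return num / den

def multiplier_by_bucket(pairs, edges):
    """Median copies/reviews ratio for each review-count bucket [lo, hi)."""
    result = {}
    for lo, hi in zip(edges, edges[1:]):
        ratios = [c / r for r, c in pairs if lo <= r < hi]
        if ratios:
            result[(lo, hi)] = median(ratios)
    return result

overall = fit_multiplier(known_titles)
buckets = multiplier_by_bucket(known_titles, [0, 1_000, 10_000, 100_000])

print(f"overall multiplier: {overall:.1f}")
for (lo, hi), m in buckets.items():
    print(f"reviews in [{lo}, {hi}): median multiplier {m:.1f}")
```

If the per-bucket multipliers differ noticeably from the overall fit, a single constant such as 40 is probably too coarse for that range.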
Summary
copies ≈ multiplier × reviews
Pros:
- Simple
- Works without access to user-level data
- Useful for quick estimates
Cons:
- Multiplier may not be constant
- Sensitive to genre and player behavior
- Cannot distinguish inactive owners from active players
2. Representative Sampling Method
This method assumes that if you know the size of the total Steam population, and you have a sufficiently large and representative sample of users, then ownership can be estimated statistically.
If:
- Total Steam population = N
- Sample size = S
- Owners in sample = O
Then:
Estimated total owners ≈ (O / S) × N
The main difficulty is obtaining a sample that is truly representative of the entire Steam population.
Bias in the sample (for example, only active players, only public profiles, or only certain regions) will skew the results.
Accuracy improves as sample size increases and as sampling bias decreases.
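The scaling formula above can be sketched as follows. The interval uses a standard normal approximation for a sample proportion, which is an addition for illustration and not part of the method as described. It reflects sampling noise only and says nothing about sampling bias. The values of N, S and O are placeholders.

```python
import math

def estimate_owners(population, sample_size, owners_in_sample, z=1.96):
    """Scale the sample ownership rate to the population.

    Returns (estimate, low, high) using a normal-approximation interval.
    The interval covers sampling noise only, not sampling bias.
    """
    if sample_size <= 0 or not 0 <= owners_in_sample <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= owners_in_sample <= sample_size")
    p = owners_in_sample / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return p * population, low * population, high * population

# Hypothetical inputs: N, S and O are placeholders, not measured values.
N = 100_000_000
S = 50_000
O = 250

estimate, low, high = estimate_owners(N, S, O)
print(f"estimated owners: {estimate:,.0f} (95% interval {low:,.0f} to {high:,.0f})")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the sample roughly halves the width of the interval.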
Typical workflow
Estimate total Steam population → Gather user sample → Count owners of game in sample → Scale to population → Repeat with larger samples to reduce error
Pros:
- Statistically grounded
- Can be very accurate with large, unbiased samples
- Works for any game, even with few reviews
Cons:
- Requires population estimate
- Requires large user dataset
- Sensitive to sampling bias
3. Playtime Averaging Method
This method estimates ownership using total playtime instead of direct ownership counts.
If you can measure:
- Total playtime across a sample
- Average playtime per owner
Then ownership can be approximated as:
Estimated owners ≈ Total playtime / Average playtime per owner
This method works because total playtime scales with the number of players, but it has a known limitation: it excludes owners who have never played the game.
As a result, this method tends to underestimate total ownership but gives a strong estimate of the active player base.
This makes it especially useful for:
- DLC attach-rate analysis
- Engagement analysis
- Active population estimates
- Live-service game modeling
Typical workflow
Gather sample → Calculate total playtime → Calculate average playtime per owner → Estimate owners → Repeat with larger samples
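A minimal sketch of this calculation, assuming the total playtime figure comes from a broader aggregate than the sample used for the average (otherwise the division only returns the sample's own player count). All figures and names below are hypothetical. The second call shows how one heavy-playtime outlier in the sample shifts the estimate.

```python
from statistics import mean

def estimate_active_owners(total_playtime_hours, sample_playtimes_hours):
    """Total playtime divided by the sample's average playtime per player.

    Owners with zero playtime are dropped, so the result approximates
    players who have launched the game rather than all owners.
    """
    played = [h for h in sample_playtimes_hours if h > 0]
    if not played:
        raise ValueError("sample contains no players with playtime")
    return total_playtime_hours / mean(played)

# Hypothetical figures for illustration only.
total_hours = 1_200_000
sample = [0, 2, 5, 8, 10, 0, 15, 20, 40]

base = estimate_active_owners(total_hours, sample)
# One extreme player raises the average and lowers the estimate.
with_outlier = estimate_active_owners(total_hours, sample + [2_000])

print(f"estimate: {base:,.0f}")
print(f"estimate with one 2,000-hour player in the sample: {with_outlier:,.0f}")
```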
Pros:
- Good estimate of active players
- Useful for engagement modeling
- Works well with playtime datasets
Cons:
- Undercounts inactive owners
- Sensitive to heavy-playtime outliers
- Requires playtime data
Notes
These methods are not mutually exclusive.
For example:
- Multiples method gives a fast baseline
- Sampling method gives a statistical estimate
- Playtime method gives an active-player estimate
As a rough heuristic, each method can be read as a different bound on the estimate:
- Multiples method → upper bound
- Sampling method → expected value
- Playtime method → lower bound
Other Less-Rigorous Methods
Other methods do exist but are generally less reliable or require insider data:
- A. Insider Information
- B. Ranking / Top Sellers Method
- C. Achievement Ownership Extrapolation
- D. Title → Revenue conversion
- E. MAU / DAU → Copies method
Steam Tag System
What is a Tag?
Every app on Steam is assigned Genres, Categories, and Tags, which together form a loose classification system used by the Steam store.
These can be understood as three different layers of abstraction.
Genres
Genres are the highest-level classification of an app.
They describe the broadest type of product.
Examples:
- Action
- Strategy
- RPG
- Simulation
- Video Production
- Movie
While Steam is now primarily known as a game platform, the system was originally designed to support a much wider range of software and media. Because of this, the Genre system is relatively small and very general.
There are currently 33 Genres.
Categories
Categories describe the technical features of an app rather than what the app is about.
Examples:
- Singleplayer
- Multiplayer
- VR Supported
- Steam Achievements
- Demo Available
- Controller Support
Categories can be thought of as the feature flags of a product — the kind of details you would use to describe software to someone familiar with computers, but not necessarily with games.
There are currently 63 Categories.
Tags
Tags are the most detailed classification layer, and for most games they function as what players would normally call the genre.
Examples:
- Roguelike
- 4X
- MOBA
- Souls-like
- Voxel
- 1980s
- Pixel Graphics
- Turn-Based
- Open World
Tags are much more flexible than Genres or Categories and can describe:
- Gameplay mechanics (4X, MOBA, Roguelike)
- Perspective or structure (First-Person, Turn-Based)
- Theme or setting (Sci-Fi, Medieval, Cyberpunk)
- Visual style (Voxel, Pixel Graphics, Anime)
- Tone or mood (Relaxing, Horror, Funny)
- Audience or difficulty (Casual, Hardcore)
At the time of writing (March 2026), Steam has:
- 446 Tags
- 63 Categories
- 33 Genres
Tags are also dynamic — new tags are added over time.
Full taxonomy reference:
https://steamdb.info/tags/
Developer Tags vs User Tags
Tags on Steam can be assigned by both:
- Developers
- Users (community voting)
The exact weighting system used by Steam is not public, but tags appear to be influenced by:
- Number of users applying the tag
- Playtime of users applying the tag
- Developer-assigned tags
Because of this, a game may have tags that are not universally agreed upon.
Example:
- 90% of players consider a game Strategy
- 10% consider it MOBA
- Both tags may appear
This creates a problem for quantitative analysis.
Tag Attribution in Project Mimir
When analyzing playtime by tag, there is no reliable way to split playtime proportionally between tags.
Because of this, Project Mimir uses the following rule:
All playtime for a game is assigned to all of its tags.
This means:
- A game with 3 tags contributes 100% of its playtime to each tag
- Tag totals therefore overlap
- Tag-level metrics are non-conservative: summing them across tags gives more than the actual total playtime
This approach is intentional because:
- Tag weights are unknown
- Tag vote counts are not publicly available
- Any proportional split would be arbitrary
As a result, tag analysis should be interpreted as:
Tag-associated playtime, not exclusive playtime.
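The attribution rule can be illustrated with a short sketch. The game names, playtime figures, and the function name are hypothetical and are not taken from Project Mimir's code. The example shows that the sum over tags exceeds the actual total, which is the overlap described above.

```python
from collections import defaultdict

# Hypothetical games: name -> (total playtime in hours, tags).
games = {
    "Game A": (1_000, ["Strategy", "MOBA", "Sci-Fi"]),
    "Game B": (400, ["Strategy", "Turn-Based"]),
    "Game C": (250, ["Roguelike"]),
}

def playtime_by_tag(games):
    """Credit each game's full playtime to every one of its tags."""
    totals = defaultdict(int)
    for playtime, tags in games.values():
        for tag in set(tags):  # guard against a tag listed twice
            totals[tag] += playtime
    return dict(totals)

tag_totals = playtime_by_tag(games)
actual_total = sum(playtime for playtime, _ in games.values())
summed_over_tags = sum(tag_totals.values())

print(tag_totals)
print(f"actual total: {actual_total}, sum over tags: {summed_over_tags}")
```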
Tag Analysis
As of March 2026, there are approximately 250,000 apps on Steam.
We queried Steam for:
- All apps
- All tags per app
- Total playtime per app
- Historical playtime data
This allows several types of tag-level analysis.
1. Tag Pair Occurrence Analysis
For each tag A, we compare it against every other tag B.
For each pair, we calculate:
- Number of games with both A and B
- Number of games with A but not B
- Number of games with B but not A
These values can be plotted as:
- X axis → Games with both tags (A ∩ B)
- Y axis → Games with exactly one of the two tags (A ⊕ B)
This produces four regions.
Region 1 — High Y, Low X (Top Left)
Many games have A or B alone, but few have both.
Interpretation:
- Tags rarely combined
- Possible unexplored design space
Region 2 — High Y, High X (Top Right)
Many games have both tags, and many have only one of them.
Interpretation:
- Very common combination
- Likely saturated
- Low novelty
Region 3 — Low Y, Low X (Bottom Left)
Few games have only one of the tags, and few have both.
Interpretation:
- Niche or uncommon tags
- Small market
- High uncertainty
Region 4 — Low Y, High X (Bottom Right)
Many games have both tags, but few have one without the other.
Interpretation:
- Tags strongly linked
- Combination defines a sub-genre
Example:
- Roguelike + Deckbuilder
This analysis is useful for:
- Market saturation detection
- Genre discovery
- Design space exploration
- Identifying unusual combinations
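The pair counts used in this analysis can be computed along these lines. This is a sketch over a hypothetical app-to-tags mapping. The dictionary keys (both, only_a, only_b, either_only) are names chosen for this example.

```python
from itertools import combinations

# Hypothetical app -> tag sets.
app_tags = {
    "app1": {"Roguelike", "Deckbuilder", "Pixel Graphics"},
    "app2": {"Roguelike", "Deckbuilder"},
    "app3": {"Roguelike", "Open World"},
    "app4": {"Open World", "Pixel Graphics"},
}

def tag_pair_counts(app_tags):
    """For every unordered tag pair, count apps with both, A only, and B only."""
    apps_by_tag = {}
    for app, tags in app_tags.items():
        for tag in tags:
            apps_by_tag.setdefault(tag, set()).add(app)

    counts = {}
    for a, b in combinations(sorted(apps_by_tag), 2):
        apps_a, apps_b = apps_by_tag[a], apps_by_tag[b]
        only_a = len(apps_a - apps_b)
        only_b = len(apps_b - apps_a)
        counts[(a, b)] = {
            "both": len(apps_a & apps_b),     # X axis: A ∩ B
            "only_a": only_a,
            "only_b": only_b,
            "either_only": only_a + only_b,   # Y axis: A ⊕ B
        }
    return counts

pair_counts = tag_pair_counts(app_tags)
for pair, c in pair_counts.items():
    print(pair, c)
```

Assigning a pair to one of the four regions then only requires thresholds on "both" and "either_only". The source does not specify where those thresholds lie, so they are left out here.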
2. Tag Playtime Popularity Over Time
Using historical playtime data, we can calculate total playtime per tag for each time period.
Because tags overlap, this measures:
Total playtime associated with a tag, not exclusive playtime.
This allows:
- Trend detection
- Rising genres
- Declining genres
- Long-term popularity shifts
Examples of possible insights:
- Survival popularity spike
- Battle Royale boom
- Roguelike growth after 2018
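A sketch of the per-period aggregation, assuming playtime records of the form (app, period, hours). The record layout, figures, and function names are hypothetical. Each app's playtime is credited in full to every one of its tags, so the series measure tag-associated playtime.

```python
from collections import defaultdict

# Hypothetical records: (app, year, playtime hours in that year).
playtime_records = [
    ("app1", 2017, 100),
    ("app1", 2018, 300),
    ("app2", 2018, 50),
    ("app2", 2019, 200),
]
app_tags = {
    "app1": {"Roguelike", "Pixel Graphics"},
    "app2": {"Survival"},
}

def tag_playtime_by_period(records, app_tags):
    """Sum playtime per (tag, period), crediting every tag of each app in full."""
    series = defaultdict(lambda: defaultdict(int))
    for app, period, hours in records:
        for tag in app_tags.get(app, ()):
            series[tag][period] += hours
    return {tag: dict(sorted(p.items())) for tag, p in series.items()}

def period_over_period_change(series_for_tag):
    """Absolute change between consecutive periods for one tag."""
    periods = sorted(series_for_tag)
    return {
        later: series_for_tag[later] - series_for_tag[earlier]
        for earlier, later in zip(periods, periods[1:])
    }

trend = tag_playtime_by_period(playtime_records, app_tags)
print(trend)
print(period_over_period_change(trend["Roguelike"]))
```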
3. Top Game in Each Tag
For each tag, we find the game with the highest total playtime.
This identifies:
- Dominant titles per genre
- Tag monopolies
- Tags defined by a single game
Examples of possible results:
- MOBA → Dota 2
- Battle Royale → PUBG / Apex Legends (Fortnite is not on Steam)
- Factory → Factorio / Satisfactory
This analysis helps determine:
- Whether a tag is broad or narrow
- Whether success is concentrated or distributed
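One way to sketch this lookup is to find the leader per tag and also report its share of the tag's associated playtime, as a rough concentration signal. The share metric, game names, and figures are assumptions made for this example. Ties go to whichever game is seen first.

```python
# Hypothetical apps: name -> (total playtime hours, tags).
apps = {
    "Game A": (900, {"MOBA", "Strategy"}),
    "Game B": (100, {"MOBA"}),
    "Game C": (500, {"Strategy"}),
    "Game D": (400, {"Strategy"}),
}

def top_game_per_tag(apps):
    """Return tag -> (top game, its share of the tag's associated playtime)."""
    best = {}    # tag -> (playtime, name) of the current leader
    totals = {}  # tag -> total associated playtime
    for name, (playtime, tags) in apps.items():
        for tag in tags:
            totals[tag] = totals.get(tag, 0) + playtime
            if tag not in best or playtime > best[tag][0]:
                best[tag] = (playtime, name)
    return {
        tag: (name, playtime / totals[tag] if totals[tag] else 0.0)
        for tag, (playtime, name) in best.items()
    }

leaders = top_game_per_tag(apps)
for tag, (name, share) in sorted(leaders.items()):
    print(f"{tag}: {name} holds {share:.0%} of tag-associated playtime")
```

A share close to 100% suggests a tag defined by a single game, while a low share suggests playtime is spread across many titles.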
4. Tag Playtime Breakdown by Year
For each year, total playtime is summed across all games and grouped by tag.
Results are visualized as:
- Pie charts
- Stacked charts
- Percent share of total playtime
This shows:
- Market composition changes
- Genre dominance over time
- Shifts in player preference
Example observations:
- FPS dominance in early 2010s
- Indie tags growing after 2015
- Roguelike / Survival growth after 2018
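A sketch of the yearly share calculation over hypothetical data. Because tags overlap, this example divides by the sum of tag-associated playtime in each year so that the shares add up to 100%, as a pie or stacked chart requires. That denominator is a choice made for this sketch, not a documented part of the analysis.

```python
from collections import defaultdict

# Hypothetical yearly playtime per app (hours) and app tags.
yearly_playtime = {
    2018: {"app1": 600, "app2": 400},
    2019: {"app1": 500, "app2": 1_500},
}
app_tags = {
    "app1": {"FPS", "Shooter"},
    "app2": {"Roguelike"},
}

def tag_shares_by_year(yearly_playtime, app_tags):
    """Per year, each tag's share of the summed tag-associated playtime.

    Normalising by the sum over tags makes the shares in a year add up
    to 1. This denominator is larger than the real total playtime
    whenever an app has more than one tag.
    """
    shares = {}
    for year, per_app in yearly_playtime.items():
        totals = defaultdict(int)
        for app, hours in per_app.items():
            for tag in app_tags.get(app, ()):
                totals[tag] += hours
        denom = sum(totals.values())
        shares[year] = {t: h / denom for t, h in totals.items()} if denom else {}
    return shares

shares = tag_shares_by_year(yearly_playtime, app_tags)
for year in sorted(shares):
    print(year, {t: f"{s:.1%}" for t, s in sorted(shares[year].items())})
```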