What is Franchise DNA?
Franchise DNA maps the complete statistical identity of an MLB franchise across its entire history — not just wins and losses, but how they played. Every season is scored across six dimensions and normalized to league average so a 1927 Yankees number is directly comparable to a 2019 Dodgers number despite decades of rule changes and era shifts.
The result is a fingerprint: you can see at a glance whether a franchise has always been a power-hitting club or reinvented itself, when pitching was the foundation and when it collapsed, and which eras were genuine dynasties vs. paper contenders.
Data Sources
Lahman DB
Primary source for all seasons 1871–2020. The Lahman Baseball Database is the most comprehensive open historical baseball dataset, maintained by Sean Lahman and updated annually. Provides team-level R, HR, SB, ERA, fielding %, and salary data.
Baseball Ref
2021–2025 stats manually sourced from Baseball Reference season pages and converted to the same normalized scale using known league averages for each year.
Hardcoded
Era labels, manager names, milestone events, World Series years, and DNA badge assignments are manually curated and editorial in nature.
Scoring methodology
Every metric uses the same normalization formula: (team stat ÷ league average for that season) × 100. The result is an index where 100 is always exactly league average for that year. A score of 120 means 20% above average; 80 means 20% below. Defense, which is built from errors per game, is inverted against the league average (see Known limitations). Because every season is measured against its own league, a dead-ball era team and a juiced-ball era team can be compared on equal footing.
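The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the function name index_score and the sample numbers are hypothetical, chosen only to show the arithmetic.

```python
def index_score(team_stat: float, league_avg: float) -> float:
    """Return (team stat / league average for that season) * 100."""
    if league_avg <= 0:
        raise ValueError("league average must be positive")
    # Multiply before dividing so simple inputs give clean results.
    return 100.0 * team_stat / league_avg


# Hypothetical inputs, not real season totals.
power_index = index_score(team_stat=180, league_avg=150)  # 20% above average
speed_index = index_score(team_stat=60, league_avg=75)    # 20% below average
print(power_index, speed_index)
```

Because the league average is taken from the same season as the team stat, the same function works unchanged for any year.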
After the raw per-season values are computed, a ±4-year rolling average is applied to the Lahman portion (1871–2020). This smooths out single-season noise, such as an injury-ravaged year or a fluke outlier, without obscuring genuine multi-year trends. The 2021–2025 data is presented as-is, without smoothing.
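One plausible reading of a ±4-year rolling average is a centered window that averages each season with up to four seasons on either side. The sketch below assumes that reading, and also assumes the window is simply truncated at the start and end of the series; neither detail is confirmed by the text, and the function name centered_rolling_mean is hypothetical.

```python
from statistics import mean


def centered_rolling_mean(values, half_window=4):
    """Average each value with up to `half_window` neighbors on each side.

    Assumption: near the edges the window is truncated rather than padded.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Hypothetical index scores: one outlier season in an otherwise average run.
raw = [100.0, 100.0, 100.0, 100.0, 140.0, 100.0, 100.0, 100.0, 100.0]
smoothed = centered_rolling_mean(raw)
print(smoothed[4])  # the 140 spike is pulled most of the way back toward 100
```

Under these assumptions, only the 1871–2020 scores would be passed through the function; the 2021–2025 scores would be appended afterward unsmoothed.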
Known limitations
Defense
Uses errors per game, inverted against league average. Errors capture fielding mistakes but not range, arm strength, or positioning. A team with great range but occasional errors (e.g. an aggressive shortstop) will score lower than a conservative team that never attempts difficult plays. DRS and OAA are more accurate, but DRS exists only from 2003 onward and OAA only from 2016.
Efficiency
Salary data in Lahman covers 1985–2016 only. Before 1985 the Efficiency line is flat at 100 (neutral) and should be ignored. Efficiency numbers for 2017–2025 are estimated from publicly reported Opening Day payrolls.
Speed
Stolen base strategy has changed dramatically across eras. Stolen base attempts declined sharply across the league between roughly 1990 and 2022. A low Speed score during that window may reflect era-wide strategy rather than a team's lack of athleticism.
Pre-1920
Statistics from the dead-ball era (pre-1920) are real, but the context differs significantly. A Power score near 100 in 1910 means "average for 1910"; the underlying raw output is not comparable to 1960s or 1990s power numbers.
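The Defense entry says errors per game are inverted against league average but does not give the exact formula. One common way to invert a lower-is-better stat is to flip the ratio, so that fewer errors than the league yields a score above 100. The sketch below assumes that approach; the function name inverted_index and the sample rates are hypothetical.

```python
def inverted_index(team_rate: float, league_rate: float) -> float:
    """Index a lower-is-better stat so that 100 is average and higher is better.

    Assumption: inversion means (league rate / team rate) * 100.
    """
    if team_rate <= 0 or league_rate <= 0:
        raise ValueError("rates must be positive")
    return 100.0 * league_rate / team_rate


# Hypothetical errors-per-game rates, not real season data.
sure_handed = inverted_index(team_rate=0.5, league_rate=0.6)   # above 100
error_prone = inverted_index(team_rate=0.75, league_rate=0.6)  # below 100
print(round(sure_handed, 1), round(error_prone, 1))
```

Whatever inversion is used, the limitation described under Defense still applies: the index rewards avoiding errors, not covering ground.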