Groundhog Day 2026 by the Numbers

Which prognosticators actually get it right

Eric Green

April 2026

Ask most people who predicts the weather on February 2nd and you’ll get one name: Punxsutawney Phil. But Phil has a lot of company. In 2026, 117 prognosticators across 36 U.S. states and Canadian provinces were recorded making a Groundhog Day call. This note is a quick tour of that field: who they are, and which of them is actually any good at it.

What is Groundhog Day (and why do we care?)

I teach a data science course at Duke University, and I’ve been using Groundhog Day data for several years to teach students about relational data, APIs, and prediction models. I see a lot of confused faces when I explain Groundhog Day, especially among our international students. Despite its German roots, Groundhog Day is a distinctly North American tradition.

The holiday falls on February 2, the midpoint between the winter solstice and the spring equinox, and it inherits an old European weather superstition tied to the feast of Candlemas: a bright, sunny February 2 means winter will hold, while a cloudy one means spring is near. German-speaking immigrants, the Pennsylvania Dutch, carried that lore across the Atlantic in the 1800s and pinned it on an animal that was easy to find in the Pennsylvania hills: the groundhog. The first ceremony recognized as “official” was held at Gobbler’s Knob in Punxsutawney, Pennsylvania, on February 2, 1887.

The rule is simple. A groundhog emerges from its burrow; if it sees its shadow (a sunny day), expect six more weeks of winter, and if it doesn’t (a cloudy one), spring arrives early. In Punxsutawney the ritual is staged by the tuxedo-and-top-hat “Inner Circle,” who maintain that the forecast comes straight from Phil himself. For the official version of the story, see the Punxsutawney Groundhog Club.

The 2026 ceremony, live from Gobbler’s Knob. Phil saw his shadow and called for six more weeks of winter.

A bigger, stranger cast than you’d think

Phil has stiff competition. Of the 117 prognosticators, 70 are living creatures, 40 are inanimate (stuffed groundhogs, puppets, statues), and 7 are human mascots in costume.

Only 78 of the 117 are actually groundhogs. The other 39 are a small ark of stand-ins—marmots, opossums, prairie dogs, dogs, and a long tail of one-offs that includes an alligator, a capybara, and a flamingo.

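# Assumes p26 (the 2026 predictions) and the gb_* color constants come from a
# setup chunk that is not shown here.
library(tidyverse)

# Tally the non-groundhog species in the 2026 field.
others <- p26 |>
  filter(prognosticator_creature != "Groundhog") |>
  count(prognosticator_creature, sort = TRUE)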

singles  <- others |> filter(n == 1)
repeated <- others |> filter(n > 1)

plot_df <- repeated |>
  add_row(prognosticator_creature = paste0(nrow(singles), " one-off species"),
          n = nrow(singles)) |>
  mutate(prognosticator_creature = fct_reorder(prognosticator_creature, n))

ggplot(plot_df, aes(n, prognosticator_creature)) +
  geom_col(fill = gb_teal, width = 0.72) +
  geom_text(aes(label = n), hjust = -0.3, family = "mono", size = 3.8, color = gb_ink) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.12))) +
  labs(x = "Prognosticators (2026)", y = NULL) +
  theme(panel.grid.major.y = element_blank())

Beyond Punxsutawney: the non-groundhog prognosticators in the 2026 field. Singletons are grouped.

Where they are

Groundhog Day is a creature of the Rust Belt and the Northeast. Pennsylvania fields the most prognosticators, with clusters through Ohio, New York, New Jersey, and Wisconsin, thinning out as you move south and west. The map colors each one by its 2026 call.

# map_data() relies on the maps package being installed.
usa    <- map_data("state")
canada <- map_data("world", region = "Canada")

ggplot() +
  geom_polygon(data = usa, aes(long, lat, group = group),
               fill = gb_paper2, color = gb_border, linewidth = 0.3) +
  geom_polygon(data = canada, aes(long, lat, group = group),
               fill = gb_paper2, color = gb_border, linewidth = 0.3) +
  geom_point(data = p26,
             aes(prognosticator_long, prognosticator_lat, fill = prediction),
             shape = 21, size = 2.7, stroke = 0.3, color = "white", alpha = 0.92) +
  scale_fill_manual(values = c("Long Winter" = gb_pine, "Early Spring" = gb_teal)) +
  coord_quickmap(xlim = c(-125, -66), ylim = c(25, 50)) +
  labs(fill = NULL) +
  theme_void(base_size = 13) +
  theme(legend.position = "top")

The 2026 prognosticator field, colored by prediction. Each point is one forecaster, placed at its home town.

For all the folklore, the 2026 field was close to a coin flip: 62 called for a long winter and 55 for an early spring, a 53/47 split.

Who’s actually right?

To grade a prediction you need a yardstick. My feb2 R package supplies one: it scores each prediction against that location’s own weather, counting an early spring as a February or March that ran warmer than the town’s 15-year average high. With that, every prediction is either right or wrong, and we can ask the obvious question across the whole historical record.
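As a rough illustration of that rule, here is a minimal sketch in R. It is not the feb2 package's actual code: the function name score_call(), its arguments, and the temperatures are all hypothetical, and it reads "February or March" as "either month", which is an assumption about how the rule is applied.

```r
# Hypothetical sketch of the scoring rule described above; not feb2's API.
# feb_high, mar_high: that year's average daily high for each month.
# feb_base, mar_base: the town's 15-year average high for the same months.
score_call <- function(prediction, feb_high, mar_high, feb_base, mar_base) {
  # Assumption: spring counts as "early" if either month ran warmer than baseline.
  early_spring <- feb_high > feb_base || mar_high > mar_base
  outcome <- if (early_spring) "Early Spring" else "Long Winter"
  prediction == outcome
}

# Made-up temperatures (degrees F) for one town in one year.
score_call("Long Winter",  feb_high = 41, mar_high = 49, feb_base = 38, mar_base = 50)
score_call("Early Spring", feb_high = 41, mar_high = 49, feb_base = 38, mar_base = 50)
```

In this invented year February ran warmer than its baseline, so the "Early Spring" call is scored as right and the "Long Winter" call as wrong.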

The headline number makes you wonder whether these prognosticators know anything about the weather at all. Across every scored prediction, they are right about 52% of the time, essentially a coin flip.

2026 results are in

The 2026 field split almost evenly, and so did its luck. Of the 105 calls we can grade against local weather, 50 were right and 55 missed — 48%, a hair off a coin flip yet again.
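One way to check the "coin flip" reading is an exact binomial test on the 2026 tally. This sketch uses only the counts quoted above (50 right out of 105 graded) and base R's binom.test(). Treating the calls as independent trials is a simplifying assumption, since prognosticators in the same region share the same weather.

```r
# 2026 tally from the text: 50 correct out of 105 gradable calls.
n_right  <- 50
n_graded <- 105

# Exact two-sided binomial test against a fair coin (p = 0.5).
# Assumption: calls are independent, which neighbors sharing weather are not.
res <- binom.test(n_right, n_graded, p = 0.5)

round(n_right / n_graded, 3)   # observed share correct
res$conf.int                   # 95% interval for the underlying accuracy
res$p.value                    # a large p-value is consistent with a coin flip
```

With only 105 calls the 95% interval runs roughly ten points either side of 48%, so it comfortably includes 50%.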

The fun is in the streaks. Line every prognosticator up by their current run of consecutive correct (or incorrect) calls through 2026, and a few stand out at each end.
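A "current run" is just the length of the last block of identical results in a prognosticator's record. A toy example with base R's rle() shows the idea. The record below is invented for illustration and does not belong to any real prognosticator.

```r
# Invented record for one prognosticator, oldest year first.
correct <- c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# rle() collapses the vector into runs of identical values.
runs <- rle(correct)
runs$values    # TRUE FALSE TRUE
runs$lengths   # 1 2 3

# The current streak is the final run: its length, and whether it is hits or misses.
current_run     <- tail(runs$lengths, 1)   # 3
current_correct <- tail(runs$values, 1)    # TRUE
```

Note that this counts consecutive recorded calls. A prognosticator who skipped a year would still have the calls on either side treated as adjacent.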

# For each prognosticator, find the length of their final run of consecutive
# correct (or incorrect) calls, and keep those whose run is still live in 2026.
streaks <- scored |>
  arrange(prognosticator_slug, year) |>
  group_by(prognosticator_slug) |>
  summarise(
    last_year    = max(year),
    last_correct = correct[which.max(year)],
    run          = tail(rle(correct)$lengths, 1),   # length of the final streak
    n_correct    = sum(correct),
    n_total      = n(),
    .groups = "drop"
  ) |>
  filter(last_year == 2026) |>
  left_join(distinct(prognosticators, prognosticator_slug,
                     prognosticator_name, prognosticator_city),
            by = "prognosticator_slug")

tidy_tbl <- function(d) {
  d |>
    arrange(desc(run), desc(n_total)) |>
    head(6) |>
    transmute(
      Prognosticator = prognosticator_name,
      Where          = prognosticator_city,
      Streak         = run,
      Lifetime       = sprintf("%d / %d  (%d%%)", n_correct, n_total,
                               round(100 * n_correct / n_total))
    )
}

winners <- streaks |> filter(last_correct)  |> tidy_tbl()
losers  <- streaks |> filter(!last_correct) |> tidy_tbl()

# Superlatives for the prose, pulled live so they never drift.
hot  <- streaks |> filter(last_correct)  |> arrange(desc(run)) |> slice(1)
cold <- streaks |> filter(!last_correct) |> arrange(desc(run)) |> slice(1)

Table 1: On a hot streak — longest active runs of correct calls through 2026

Prognosticator	Where	Streak	Lifetime
Sand Mountain Sam	Albertville, AL	17	17 / 17 (100%)
Wiarton Willie	Wiarton, ON	8	28 / 44 (64%)
Lander Lil	Lander, WY	8	19 / 36 (53%)
Harleysville Hank	Harleysville, PA	8	9 / 11 (82%)
Beardsley Bart	Bridgeport, CT	7	12 / 15 (80%)
Stonewall Jackson	Wantage, NJ	7	10 / 15 (67%)

Table 2: Can’t catch a break — longest active runs of misses through 2026

Prognosticator	Where	Streak	Lifetime
Buffalo Bert	Buffalo, NY	9	0 / 9 (0%)
Schnogadahl Sammi	Kresgeville, PA	8	7 / 26 (27%)
Bowman Bill	Stephens City, VA	6	4 / 13 (31%)
Prairie Dog Pete	Lubbock, TX	5	10 / 30 (33%)
Concord Casimir	Concord, OH	4	1 / 6 (17%)
Kennebec Kenny	Augusta, ME	4	0 / 4 (0%)

Before you crown anyone, look at how those streaks are built. Sand Mountain Sam of Albertville, AL rides the longest hot hand — 17 straight correct — but the trick is geography, not genius: in warm northern Alabama a February that beats the local average is the usual outcome, and a near-yearly “early spring” call keeps cashing. At the other end, Buffalo Bert has missed 9 in a row doing the mirror image — calling “long winter” every year in a Buffalo that keeps running warm.

Accuracy hasn’t really improved

It looks like the groundhogs are getting better. They aren’t.

grp_cols <- c("Groundhog" = gb_pine, "Other prognosticators" = gb_beacon)
lab_pos  <- acc_decade_grp |> filter(decade == max(decade))

ggplot(acc_decade_grp, aes(decade, acc, color = grp)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = gb_faint) +
  geom_line(linewidth = 1) +
  geom_point(size = 2.6) +
  geom_text(data = lab_pos, aes(label = grp), hjust = 0, nudge_x = 1.2,
            fontface = "bold", size = 3.7) +
  scale_color_manual(values = grp_cols) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  scale_x_continuous(breaks = seq(1980, 2020, 10),
                     labels = c("1980s", "1990s", "2000s", "2010s", "2020s"),
                     expand = expansion(mult = c(0.02, 0.30))) +
  labs(x = NULL, y = "Share of predictions correct") +
  guides(color = "none") +
  theme(panel.grid.major.x = element_blank())

Accuracy by decade since 1980 — groundhogs vs. every other kind of prognosticator. Pre-1980 calls were too sparse to include; the non-groundhog line itself rests on only 20–30 calls per decade before 2000, so its early points are especially noisy. Dashed line = 50%.

Split the groundhogs from the rest of the menagerie and the “improvement” dissolves into noise. Neither line climbs steadily; both lurch from decade to decade, dip together in the 2000s, and recover together. They move in step, as you’d expect if what they track is the weather itself rather than any forecasting skill. Today’s groundhogs (60%) are no better than the groundhogs of the 1980s (60%). What nudges these lines up or down isn’t the animals getting smarter; it’s how often the year happened to break warm, and whether the crowd happened to call it that way.
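The figure caption's warning about sparse decades can be made concrete with the usual normal-approximation margin of error for a proportion. This is a back-of-envelope sketch, assuming independent calls and a true accuracy near 50%. The sample sizes are the 20 to 30 calls per decade mentioned in the caption, and margin_of_error() is a helper defined here for illustration.

```r
# Approximate 95% margin of error for an accuracy estimated from n calls,
# assuming independent calls and a true accuracy p (0.5 is the worst case).
margin_of_error <- function(n, p = 0.5) {
  1.96 * sqrt(p * (1 - p) / n)
}

n_calls <- c(20, 30)
round(margin_of_error(n_calls), 2)   # 0.22 0.18
```

A decade point built on 20 to 30 calls can sit roughly 20 points above or below its true value by chance alone, so swings of that size in the early non-groundhog line carry little information.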

A stuffed groundhog does as well as a live one

If any of this measured genuine weather-sense, you’d expect living prognosticators to beat puppets and statues. They don’t.

acc_status |>
  mutate(label = fct_reorder(label, acc)) |>
  ggplot(aes(acc, label)) +
  geom_col(fill = gb_teal, width = 0.62) +
  geom_vline(xintercept = 0.5, linetype = "dashed", color = gb_faint) +
  geom_text(aes(label = scales::percent(acc, accuracy = 1)), hjust = -0.25,
            family = "mono", size = 3.8, color = gb_ink) +
  scale_x_continuous(labels = scales::percent, limits = c(0, 1),
                     expand = expansion(mult = c(0, 0.12))) +
  labs(x = "Share of predictions correct", y = NULL) +
  theme(panel.grid.major.y = element_blank())

Accuracy by type of prognosticator, all years pooled. The differences are noise. Dashed line = 50%.

Live animals, human mascots, and inanimate objects all land within a few points of each other (and of 50%). Split it a different way and the story holds: actual groundhogs are right 52% of the time, every other kind of creature or object 53%. A stuffed groundhog on a stick is, statistically, as good a forecaster as Phil.

None of which is an indictment of the groundhog. It’s the lesson the groundhog can’t help teaching: when one outcome is far more common than the other, “accuracy” is mostly a story about the base rate and what you choose to predict—not about who, or what, is doing the predicting.
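The arithmetic behind that lesson is short. Suppose the outcome is "early spring" in a share base_rate of years, and a forecaster calls "early spring" in a share call_rate of years with no knowledge of the actual weather. Expected accuracy then depends only on those two numbers. The function name and the 80% base rate below are illustrative assumptions, not figures from the feb2 data.

```r
# Expected accuracy of a forecaster whose calls are independent of the outcome.
# base_rate: share of years that turn out "early spring" (assumed, illustrative).
# call_rate: share of years the forecaster calls "early spring".
expected_accuracy <- function(base_rate, call_rate) {
  base_rate * call_rate + (1 - base_rate) * (1 - call_rate)
}

base_rate <- 0.8                     # hypothetical warm-leaning town
expected_accuracy(base_rate, 1.0)    # always "early spring" -> 0.8
expected_accuracy(base_rate, 0.0)    # always "long winter"  -> 0.2
expected_accuracy(base_rate, 0.5)    # coin-flip caller      -> 0.5
```

The same skill-free strategy scores 80% or 20% depending only on which constant call is chosen, which is the pattern behind Sand Mountain Sam's streak and Buffalo Bert's slump.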

Source & code: every figure here is built from the feb2 R data package — predictions scraped from Countdown to Groundhog Day, prognosticator profiles and coordinates from the same project, weather from Open-Meteo and NOAA. The full scored analysis lives at groundhogday.app.