The Model Worked Until It Didn't
Accurately predicting college football against the market spread at a rate above a coin flip is difficult to impossible with sports analytics. Our model made 748 attempts during the 2025 regular season. The final record: 384-326-11. That is 54 percent overall. At standard -110 odds you need 52.4 percent just to break even, so 54 percent represents a slight edge.
But the top-line number hides the real story. When the model identified a large gap between its projected margin and the market's line, what we call high-edge picks, accuracy jumped to 60 percent. Picks where the model saw a 10+ point edge hit at 68 percent.
Then the postseason arrived. The model went 24-20-2. The edge nearly vanished.
If you want to understand how our model predicts college football games, don't start with the wins. Start with the 93 percent of outcome variance our model cannot explain, and watch what happens when the remaining 7 percent degrades.
You can review the predictions on our track record page.
The 2025 College Football Predictions: Where the Edge Lived
Across 748 predictions and 15 weeks, one pattern dominated: the bigger the gap between our number and the market, the more likely the prediction was to hit.
| Edge Size | Picks | Record | Accuracy |
|---|---|---|---|
| 10+ points | 28 | 17-8-1 | 68.0% |
| 7–9.5 points | 209 | 119-82-1 | 59.2% |
| 4–6.5 points | 259 | 128-119-4 | 51.8% |
| 2–3.5 points | 99 | 47-46-2 | 50.5% |
| Under 2 points | 153 | 73-71-3 | 50.7% |
High-edge predictions (7+ points) went 136-90-2 combined, 60.2 percent across 237 games. That clears the 52.4 percent break-even line by a wide margin. Below 4 points of edge, accuracy collapsed to coin-flip territory.
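That 52.4 percent break-even line falls straight out of the -110 pricing: you risk 110 to win 100. A minimal sketch of the arithmetic:

```python
def breakeven_prob(american_odds: int) -> float:
    """Win probability needed to break even at given American odds."""
    if american_odds < 0:
        risk = -american_odds  # stake required to win 100
        return risk / (risk + 100)
    return 100 / (american_odds + 100)

# At standard -110 juice you risk 110 to collect 100:
print(round(breakeven_prob(-110), 4))  # 0.5238 -> the 52.4% line
```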
The weekly swings reinforced the same lesson. Week 4 peaked at 63.8 percent. Week 14 hit 61.9 percent. But Week 3 cratered to 40.0 percent, and Week 9 managed just 46.8 percent. Same model, same pipeline, same features, producing 24-point accuracy swings week to week.
That volatility is not a bug. It is what R² = 0.069 looks like over a season. The model captures a real but small signal. Some weeks the noise cooperates. Some weeks it buries you.
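To see how much week-to-week swing pure chance produces, here is a small simulation assuming a constant 54 percent true hit rate and roughly 50 picks per week (748 picks over 15 weeks). Both numbers come from the season totals above; everything else is invented:

```python
import random

random.seed(7)
TRUE_SKILL = 0.54    # season-long hit rate, held constant
PICKS_PER_WEEK = 50  # roughly 748 picks / 15 weeks

# Simulate 15 weeks of a model whose true edge never changes.
weekly = [
    sum(random.random() < TRUE_SKILL for _ in range(PICKS_PER_WEEK)) / PICKS_PER_WEEK
    for _ in range(15)
]
print(f"best week: {max(weekly):.1%}, worst week: {min(weekly):.1%}")
```

Even with the model's skill frozen at 54 percent, binomial noise alone produces double-digit accuracy swings across weeks of this size.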
How Our AI Predicts College Football Games
Our system processes data across all 136 FBS teams in three stages every week.
Data ingestion and power ratings. We aggregate play-by-play efficiency metrics, per-snap success rates, explosiveness, defensive havoc rates, adjusted for opponent quality. These feed into our PPI Power Index, a proprietary 0–100 team strength rating that captures how well a team is actually playing, not just their record. For more on how these efficiency metrics work, see our SP+ ratings explainer.
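The PPI formula itself is proprietary, so the sketch below only illustrates the general shape: z-score each opponent-adjusted metric against FBS averages, weight, and rescale to 0-100. All weights and field names here are assumptions:

```python
# Illustrative only: the real PPI formula is proprietary. Weights,
# metric names, and the 50 +/- 15*z scaling are all invented.
def power_rating(team: dict, fbs_avg: dict) -> float:
    """Toy 0-100 team strength score from efficiency metrics."""
    components = {
        "success_rate": 0.40,   # per-snap success rate
        "explosiveness": 0.35,  # big-play rate
        "havoc": 0.25,          # defensive disruption rate
    }
    # z-score each metric against FBS averages, weight, and sum
    z = sum(
        w * (team[k] - fbs_avg[k]["mean"]) / fbs_avg[k]["sd"]
        for k, w in components.items()
    )
    # clamp to 0-100 with 50 as league average
    return max(0.0, min(100.0, 50 + 15 * z))
```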
Machine learning ensemble. A gradient-boosted ensemble and logistic regression baseline, trained on 2014–2024 regular season data, analyze the gap between our ratings and the betting market spread. Different models weigh different aspects of the same matchup.
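A toy version of the two-model idea, with hand-invented stand-ins for the trained learners. The real models are fit on 2014-2024 data; every coefficient below is made up for illustration:

```python
import math

# Invented stand-ins for the two learners; coefficients are illustrative.
def logit_baseline(rating_gap: float, form_trend: float) -> float:
    """Logistic-regression-style cover probability."""
    z = 0.25 * rating_gap + 0.10 * form_trend
    return 1 / (1 + math.exp(-z))

def gbm_like(rating_gap: float, form_trend: float) -> float:
    """Stand-in for boosted trees: piecewise bumps by edge size."""
    p = 0.50
    if rating_gap >= 4:
        p += 0.04
    if rating_gap >= 7:
        p += 0.05
    if rating_gap >= 10:
        p += 0.06
    if form_trend > 0:
        p += 0.02
    return min(p, 0.95)

def ensemble_cover_prob(rating_gap: float, form_trend: float) -> float:
    """Average the two views of the same matchup."""
    return (logit_baseline(rating_gap, form_trend) + gbm_like(rating_gap, form_trend)) / 2
```

The averaging step is the point: the smooth linear view and the threshold-based tree view disagree at the margins, and blending them damps each model's blind spots.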
Edge scoring. Each prediction receives an edge score: the difference between the model's projected margin and the market spread. As the 2025 data proved, that score is the single strongest predictor of whether a prediction will be accurate. A 10-point edge means the model sees value the market is missing. A 1.5-point edge means the model barely disagrees with the market, and in those cases, the market is usually right.
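The edge score and the tier boundaries from the results table above can be sketched directly:

```python
def edge_score(projected_margin: float, market_spread: float) -> float:
    """Gap between our projected margin and the market's number.
    Convention here: both values from the favorite's perspective."""
    return abs(projected_margin - market_spread)

def edge_tier(edge: float) -> str:
    """Tiers matching the 2025 results table boundaries."""
    if edge >= 10:
        return "10+ points"
    if edge >= 7:
        return "7-9.5 points"
    if edge >= 4:
        return "4-6.5 points"
    if edge >= 2:
        return "2-3.5 points"
    return "under 2 points"

# Model projects the favorite by 13.5; market has them at -6:
print(edge_tier(edge_score(-13.5, -6)))  # 7.5-point gap -> "7-9.5 points"
```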
See It in Action: The Prediction Plinko Board
Each football drops through five rows of pegs, one for each prediction feature. When a feature is on, it nudges the ball toward the Favorite side. Toggle features off to see how the distribution flattens. Even with all five active, upsets remain common; that is R² = 0.069 in action.
Insight
With all five features active, this is the full model. The shift toward Favorite is real but modest. R² = 0.069 means 93% of variance is still noise.
Why the Postseason Broke It
The model's most important features compare team power ratings against the market spread and track recent performance trends. During the regular season, these features update weekly with fresh game data. Momentum signals capture whether a team is improving or declining in real time.
In the postseason, that freshness dies. Teams sit for three to six weeks between their last regular-season game and their bowl. Trend features go stale. "Recent form" becomes data from early December applied to a game in January. The model cannot distinguish a five-week-old data point from a five-day-old one.
The postseason also introduces variables the model never trained on. Player opt-outs remove a team's best offensive weapon overnight, and the system has no feature for it. Motivation differentials between a championship contender and a team sleepwalking through a consolation bowl have no numerical representation.
The 2025-26 coaching carousel made this problem worse than in any recent season. A record 32 FBS jobs turned over, with several changes happening during or immediately after the regular season. Lane Kiffin left Ole Miss for LSU after an 11-1 season and CFP berth, meaning Ole Miss entered the playoff under an interim coach. Sherrone Moore was fired at Michigan in December. Mark Stoops was let go from Kentucky one day after the regular season ended. James Franklin was dismissed at Penn State mid-season at 3-3, with Matt Campbell taking over from Iowa State. The model had no mechanism to account for any of it. It treated every team's postseason roster and staff as identical to their Week 12 version, and for many programs, that was flatly wrong.
In short, the model lacked the features to model postseason games accurately.
Our R² of 0.069 means the model explains 6.9 percent of outcome variance. The remaining 93.1 percent lives outside its reach. Most published ATS models in college football operate in the 0.04 to 0.10 R² range. ESPN's FPI has historically hovered in the low-to-mid 50s ATS. The market is efficient enough to push over 90 percent of variance beyond any public model's reach.
In the postseason, when invisible variables multiply, the 6.9 percent the model does explain is diluted further.
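For readers unfamiliar with R², here is the standard variance-explained calculation on invented margins. A model that always predicts the average margin explains nothing and scores exactly zero:

```python
def r_squared(actual, predicted):
    """Share of variance in actual outcomes explained by predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

margins = [3, -7, 14, -2, 10, -21, 6, 1]      # invented final margins
naive = [sum(margins) / len(margins)] * 8     # always predict the mean
print(r_squared(margins, naive))              # 0.0 -> explains nothing
```

An R² of 0.069 sits just above that zero line: the predictions track the outcomes slightly better than the season-average guess, and no better.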
What We Are Building for 2026
The 2025 data can be a blueprint. The model's edge is real when conditions are right, and the edge score is the clearest signal of which conditions those are.
Three changes are underway:
- Edge-based tiering. The tiering system is being rebuilt around edge magnitude. A 7+ point edge pick is a fundamentally different proposition from a 2-point edge pick; the system should treat them differently, and so should the bettor.
- Postseason features. New inputs will account for layoff duration, declared opt-outs and transfer portal entries as roster availability signals, and coaching transition context.
- Decay weighting. The model will explicitly down-weight stale data rather than treating December stats as equally reliable in January.
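The decay idea reduces to an exponential half-life on game age. The 21-day half-life below is an assumption for illustration, not a tuned value:

```python
def recency_weight(days_since_game: int, half_life_days: float = 21.0) -> float:
    """Exponentially down-weight stale game data.
    half_life_days is an assumed value, not a tuned one."""
    return 0.5 ** (days_since_game / half_life_days)

# A game from five days ago vs. one from a five-week bowl layoff:
print(round(recency_weight(5), 2))   # 0.85
print(round(recency_weight(35), 2))  # 0.31
```

Under this scheme a December data point entering a January bowl carries roughly a third of the weight of fresh regular-season data, rather than being treated as equally reliable.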
The structural constraints will not change: two new teams bringing FBS to 138 in the 2026-27 season, massive roster turnover, 12-to-15-game samples per team, and a market that absorbs most public information before our model runs. We'll continue to be honest about when the edge is real and when conditions have shifted beyond what the data supports.
Review every prediction on our track record page.
Related Analysis
- What Coaching Changes Actually Do to Win Totals: the data behind how new hires affect program trajectories
- 2026 Transfer Portal Impact Rankings: how roster movement reshapes power ratings
- Understanding SP+ Ratings: the per-play efficiency metrics that feed our prediction pipeline
- The Portal Saturation Crisis: why roster churn is making every model's job harder
This content is for entertainment and educational purposes only. It is not intended as gambling or betting advice. Always gamble responsibly.