The Model Worked Until It Didn't
Accurately predicting college football against the market spread at a rate above a coin flip is difficult to impossible with sports analytics. Our model made 748 attempts during the 2025 regular season. The final record: 384-326-11. That is 54 percent overall. At standard -110 odds you need 52.4 percent just to break even, so 54 percent represents a slight edge.
But the top-line number hides the real story. When the model identified a large gap between its projected margin and the market's line, what we call high-edge picks, accuracy jumped to 60 percent. Picks where the model saw a 10+ point edge hit at 68 percent.
Then the postseason arrived. The model went 24-20-2. The edge nearly vanished.
If you want to understand how our model predicts college football games, don't start with the wins. Start with the 93 percent of outcome variance our model cannot explain, and watch what happens when the remaining 7 percent degrades.
You can review the predictions on our track record page.
The 2025 College Football Predictions: Where the Edge Lived
Across 748 predictions and 15 weeks, one pattern dominated: the bigger the gap between our number and the market, the more likely the prediction was to hit.
| Edge Size | Picks | Record | Accuracy |
|---|---|---|---|
| 10+ points | 28 | 17-8-1 | 68.0% |
| 7–9.5 points | 209 | 119-82-1 | 59.2% |
| 4–6.5 points | 259 | 128-119-4 | 51.8% |
| 2–3.5 points | 99 | 47-46-2 | 50.5% |
| Under 2 points | 153 | 73-71-3 | 50.7% |
High-edge predictions (7+ points) went 136-90-2 combined, 60.2 percent across 237 games. That clears the 52.4 percent break-even line by a wide margin. Below 4 points of edge, accuracy collapsed to coin-flip territory.
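That 52.4 percent break-even line falls straight out of the -110 pricing: you risk 110 to win 100. A minimal sketch of the arithmetic:

```python
def breakeven_prob(american_odds: int) -> float:
    """Win probability needed to break even at given American odds."""
    if american_odds < 0:
        risk = -american_odds  # stake required to win 100
        return risk / (risk + 100)
    return 100 / (american_odds + 100)

# At standard -110 juice you risk 110 to collect 100:
print(round(breakeven_prob(-110), 4))  # 0.5238 -> the 52.4% line
```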
The weekly swings reinforced the same lesson. Week 4 peaked at 63.8 percent. Week 14 hit 61.9 percent. But Week 3 cratered to 40.0 percent, and Week 9 managed just 46.8 percent. Same model, same pipeline, same features, producing 24-point accuracy swings week to week.
That volatility is not a bug. It is what R² = 0.069 looks like over a season. The model captures a real but small signal. Some weeks the noise cooperates. Some weeks it buries you.
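To see how much week-to-week swing pure chance produces, here is a small simulation assuming a constant 54 percent true hit rate and roughly 50 picks per week (748 picks over 15 weeks). Both numbers come from the season totals above; everything else is invented:

```python
import random

random.seed(7)
TRUE_SKILL = 0.54    # season-long hit rate, held constant
PICKS_PER_WEEK = 50  # roughly 748 picks / 15 weeks

# Simulate 15 weeks of a model whose true edge never changes.
weekly = [
    sum(random.random() < TRUE_SKILL for _ in range(PICKS_PER_WEEK)) / PICKS_PER_WEEK
    for _ in range(15)
]
print(f"best week: {max(weekly):.1%}, worst week: {min(weekly):.1%}")
```

Even with the model's skill frozen at 54 percent, binomial noise alone produces double-digit accuracy swings across weeks of this size.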
How Our AI Predicts College Football Games
Our system processes data across all 136 FBS teams in three stages every week.
Data ingestion and power ratings. We aggregate play-by-play efficiency metrics, per-snap success rates, explosiveness, defensive havoc rates, adjusted for opponent quality. These feed into our PPI Power Index, a proprietary 0–100 team strength rating that captures how well a team is actually playing, not just their record. For more on how these efficiency metrics work, see our SP+ ratings explainer.
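The PPI formula itself is proprietary, so the sketch below only illustrates the general shape: z-score each opponent-adjusted metric against FBS averages, weight, and rescale to 0-100. All weights and field names here are assumptions:

```python
# Illustrative only: the real PPI formula is proprietary. Weights,
# metric names, and the 50 +/- 15*z scaling are all invented.
def power_rating(team: dict, fbs_avg: dict) -> float:
    """Toy 0-100 team strength score from efficiency metrics."""
    components = {
        "success_rate": 0.40,   # per-snap success rate
        "explosiveness": 0.35,  # big-play rate
        "havoc": 0.25,          # defensive disruption rate
    }
    # z-score each metric against FBS averages, weight, and sum
    z = sum(
        w * (team[k] - fbs_avg[k]["mean"]) / fbs_avg[k]["sd"]
        for k, w in components.items()
    )
    # clamp to 0-100 with 50 as league average
    return max(0.0, min(100.0, 50 + 15 * z))
```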
Machine learning ensemble. A gradient-boosted ensemble and logistic regression baseline, trained on 2014–2024 regular season data, analyze the gap between our ratings and the betting market spread. Different models weigh different aspects of the same matchup.
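A toy version of the two-model idea, with hand-invented stand-ins for the trained learners. The real models are fit on 2014-2024 data; every coefficient below is made up for illustration:

```python
import math

# Invented stand-ins for the two learners; coefficients are illustrative.
def logit_baseline(rating_gap: float, form_trend: float) -> float:
    """Logistic-regression-style cover probability."""
    z = 0.25 * rating_gap + 0.10 * form_trend
    return 1 / (1 + math.exp(-z))

def gbm_like(rating_gap: float, form_trend: float) -> float:
    """Stand-in for boosted trees: piecewise bumps by edge size."""
    p = 0.50
    if rating_gap >= 4:
        p += 0.04
    if rating_gap >= 7:
        p += 0.05
    if rating_gap >= 10:
        p += 0.06
    if form_trend > 0:
        p += 0.02
    return min(p, 0.95)

def ensemble_cover_prob(rating_gap: float, form_trend: float) -> float:
    """Average the two views of the same matchup."""
    return (logit_baseline(rating_gap, form_trend) + gbm_like(rating_gap, form_trend)) / 2
```

The averaging step is the point: the smooth linear view and the threshold-based tree view disagree at the margins, and blending them damps each model's blind spots.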
Edge scoring. Each prediction receives an edge score: the difference between the model's projected margin and the market spread. As the 2025 data proved, that score is the single strongest predictor of whether a prediction will be accurate. A 10-point edge means the model sees value the market is missing. A 1.5-point edge means the model barely disagrees with the market, and in those cases, the market is usually right.
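The edge score and the tier boundaries from the results table above can be sketched directly:

```python
def edge_score(projected_margin: float, market_spread: float) -> float:
    """Gap between our projected margin and the market's number.
    Convention here: both values from the favorite's perspective."""
    return abs(projected_margin - market_spread)

def edge_tier(edge: float) -> str:
    """Tiers matching the 2025 results table boundaries."""
    if edge >= 10:
        return "10+ points"
    if edge >= 7:
        return "7-9.5 points"
    if edge >= 4:
        return "4-6.5 points"
    if edge >= 2:
        return "2-3.5 points"
    return "under 2 points"

# Model projects the favorite by 13.5; market has them at -6:
print(edge_tier(edge_score(-13.5, -6)))  # 7.5-point gap -> "7-9.5 points"
```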
See It in Action: The Prediction Plinko Board
Each football drops through five rows of pegs, one for each prediction feature. When a feature is on, it nudges the ball toward the Favorite side. Toggle features off to see how the distribution flattens. Even with all five active, upsets remain common; that is R² = 0.069 in action.
Insight
With all five features active, this is the full model. The shift toward Favorite is real but modest. R² = 0.069 means 93% of variance is still noise.
Why the Postseason Broke It
The model's most important features compare team power ratings against the market spread and track recent performance trends. During the regular season, these features update weekly with fresh game data. Momentum signals capture whether a team is improving or declining in real time.
In the postseason, that freshness dies. Teams sit for three to six weeks between their last regular-season game and their bowl. Trend features go stale. "Recent form" becomes data from early December applied to a game in January. The model cannot distinguish a five-week-old data point from a five-day-old one.
The postseason also introduces variables the model never trained on. Player opt-outs remove a team's best offensive weapon overnight, and the system has no feature for it. Motivation differentials between a championship contender and a team sleepwalking through a consolation bowl have no numerical representation.
The 2025-26 coaching carousel made this problem worse than in any recent season. A record 32 FBS jobs turned over, with several changes happening during or immediately after the regular season. Lane Kiffin left Ole Miss for LSU after an 11-1 season and CFP berth, meaning Ole Miss entered the playoff under an interim coach. Sherrone Moore was fired at Michigan in December. Mark Stoops was let go from Kentucky one day after the regular season ended. James Franklin was dismissed at Penn State mid-season at 3-3, with Matt Campbell taking over from Iowa State. The model had no mechanism to account for any of it. It treated every team's postseason roster and staff as identical to their Week 12 version, and for many programs, that was flatly wrong.
In short, the model lacked the features to model postseason games accurately.
Our R² of 0.069 means the model explains 6.9 percent of outcome variance. The remaining 93.1 percent lives outside its reach. Most published ATS models in college football operate in the 0.04 to 0.10 R² range. ESPN's FPI has historically hovered in the low-to-mid 50s ATS. The market is efficient enough to push over 90 percent of variance beyond any public model's reach.
In the postseason, when invisible variables multiply, the 6.9 percent the model does explain is diluted further.
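For readers unfamiliar with R², here is the standard variance-explained calculation on invented margins. A model that always predicts the average margin explains nothing and scores exactly zero:

```python
def r_squared(actual, predicted):
    """Share of variance in actual outcomes explained by predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

margins = [3, -7, 14, -2, 10, -21, 6, 1]      # invented final margins
naive = [sum(margins) / len(margins)] * 8     # always predict the mean
print(r_squared(margins, naive))              # 0.0 -> explains nothing
```

An R² of 0.069 sits just above that zero line: the predictions track the outcomes slightly better than the season-average guess, and no better.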
What We Are Building for 2026
The 2025 data can be a blueprint. The model's edge is real when conditions are right, and the edge score is the clearest signal of which conditions those are.
Three changes are underway:
- Edge-based tiering. The tiering system is being rebuilt around edge magnitude. A 7+ point edge pick is a fundamentally different proposition from a 2-point edge pick; the system should treat them differently, and so should the bettor.
- Postseason features. New inputs will account for layoff duration, declared opt-outs and transfer portal entries as roster availability signals, and coaching transition context.
- Decay weighting. The model will explicitly down-weight stale data rather than treating December stats as equally reliable in January.
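The decay idea reduces to an exponential half-life on game age. The 21-day half-life below is an assumption for illustration, not a tuned value:

```python
def recency_weight(days_since_game: int, half_life_days: float = 21.0) -> float:
    """Exponentially down-weight stale game data.
    half_life_days is an assumed value, not a tuned one."""
    return 0.5 ** (days_since_game / half_life_days)

# A game from five days ago vs. one from a five-week bowl layoff:
print(round(recency_weight(5), 2))   # 0.85
print(round(recency_weight(35), 2))  # 0.31
```

Under this scheme a December data point entering a January bowl carries roughly a third of the weight of fresh regular-season data, rather than being treated as equally reliable.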
The structural constraints will not change: two new teams bringing FBS to 138 in the 2026-27 season, massive roster turnover, 12-to-15-game samples per team, and a market that absorbs most public information before our model runs. We'll continue to be honest about when the edge is real and when conditions have shifted beyond what the data supports.
Review every prediction on our track record page.
Related Analysis
- What Coaching Changes Actually Do to Win Totals: the data behind how new hires affect program trajectories
- 2026 Transfer Portal Impact Rankings: how roster movement reshapes power ratings
- Understanding SP+ Ratings: the per-play efficiency metrics that feed our prediction pipeline
- The Portal Saturation Crisis: why roster churn is making every model's job harder
This content is for entertainment and educational purposes only. It is not intended as gambling or betting advice. Always gamble responsibly.