Using p-adic coefficients to predict taxonomy from tags
The umllr (Universal Machine Learning Linear Regression) model assigns p-adic integer coefficients to product tags and uses them to predict taxonomy encodings. Each taxonomy path is encoded as a p-adic integer (base 79), and the tag coefficients are fitted to minimize the p-adic distance between predicted and true encodings on the training data.
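To make the objective concrete, here is a minimal sketch of the base-79 encoding and distance. The encoding convention (root category as the lowest-order digit) and the function names are assumptions for illustration, not the benchmarked implementation:

```python
P = 79  # prime base used by the taxonomy encoding

def encode_taxonomy(path_ids):
    # Hypothetical convention: path [c0, c1, ...] -> c0 + c1*79 + c2*79^2 + ...
    # with the root category c0 in the lowest-order digit.
    return sum(c * P**i for i, c in enumerate(path_ids))

def padic_valuation(n):
    # Largest v such that 79^v divides n; conventionally infinite for n = 0.
    if n == 0:
        return float("inf")
    v = 0
    while n % P == 0:
        n //= P
        v += 1
    return v

def padic_distance(x, y):
    # |x - y|_79 = 79^(-v): the more leading path levels agree,
    # the smaller the distance.
    v = padic_valuation(x - y)
    return 0.0 if v == float("inf") else P ** (-v)
```

Under this convention, two paths that agree on their first k levels have distance at most 79^(-k), so a small mean p-adic loss rewards getting the top of the taxonomy right; this is also what a prefix-2 accuracy metric captures.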
The dedicated benchmark pages now live in the shared benchmark overview, the latest comparison page, and the paper comparison page.
| Fold | Accuracy | F1 | P-adic loss (mean) |
|---|---|---|---|
| 0 | 44.46% | 0.4716 | 0.32124954 |
| 1 | 45.80% | 0.4808 | 0.31267813 |
| 2 | 41.53% | 0.4450 | 0.35999207 |
| 3 | 45.67% | 0.4895 | 0.31556355 |
| 4 | 45.05% | 0.4748 | 0.30584226 |
These ablation runs keep the greedy p-adic regressor fixed and vary only the feature-ordering heuristic. They are loaded from the fixed paper snapshot, so the comparison stays stable even as the live catalog changes.
Random-order baseline across five fixed seeds: 0.31032917 ± 0.00674739 mean p-adic loss.
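For intuition about what "greedy" means here, the following is a hypothetical sketch of a greedy p-adic regressor: tags are visited in the order chosen by the heuristic, and each tag's coefficient is picked to minimize the p-adic loss on the training rows containing that tag. The candidate set, tie-breaking, and the correspondence to "scoring ops" are all assumptions, not the benchmarked implementation:

```python
P = 79  # prime base from the encoding

def padic_loss(pred, target):
    # 79-adic magnitude of the error: 79^(-v), v = valuation of (pred - target).
    diff = pred - target
    if diff == 0:
        return 0.0
    v = 0
    while diff % P == 0:
        diff //= P
        v += 1
    return P ** (-v)

def greedy_fit(rows, tag_order):
    """Greedily assign one integer coefficient per tag, in the given order.

    rows: list of (tags, target_code) pairs; a row's prediction is the sum of
    its tags' coefficients. Candidates for each tag are the residuals
    (target - current prediction) on its supporting rows, plus 0 (an assumed
    candidate set, chosen so the sketch stays small).
    """
    coeffs = {}
    predict = lambda tags: sum(coeffs.get(t, 0) for t in tags)
    for tag in tag_order:
        support = [(tags, y) for tags, y in rows if tag in tags]
        if not support:
            continue
        candidates = sorted({y - predict(tags) for tags, y in support} | {0})
        # Each candidate evaluation below is loosely one "scoring op"
        # in the sense of the table (again, an assumption).
        coeffs[tag] = min(candidates, key=lambda c: sum(
            padic_loss(predict(tags) + c, y) for tags, y in support))
    return coeffs
```

Because the regressor is fixed across runs, only the visit order in `tag_order` changes between the strategies in the table below, which is why the ordering heuristic alone can move the mean loss and the scoring-op count.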
| Strategy | Seed | Mean p-adic loss | Mean prefix-2 accuracy | Mean scoring ops |
|---|---|---|---|---|
| battle_elo | — | 0.31274437 | 61.01% | 1.41 |
| frequency | — | 0.31960312 | 60.93% | 1.94 |
| mean_title_position | — | 0.32789092 | 59.82% | 1.71 |
| random | 7 | 0.31555103 | 59.85% | 1.40 |
| random | 13 | 0.30411023 | 61.69% | 1.45 |
| random | 23 | 0.32103333 | 60.24% | 1.42 |
| random | 37 | 0.30546290 | 60.47% | 1.40 |
| random | 101 | 0.30548836 | 61.31% | 1.42 |
| taxonomy_association | — | 0.24969375 | 66.66% | 1.10 |
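The random-order baseline quoted above can be reproduced from the five per-seed rows in this table; the ± term is the population standard deviation (ddof = 0):

```python
import statistics

# Per-seed mean p-adic losses for the random ordering (from the table above).
random_losses = {7: 0.31555103, 13: 0.30411023, 23: 0.32103333,
                 37: 0.30546290, 101: 0.30548836}

mean = statistics.fmean(random_losses.values())
std = statistics.pstdev(random_losses.values())  # population std, ddof = 0
print(f"{mean:.8f} ± {std:.8f}")  # matches the quoted baseline
```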