Padjective Tag Hierarchy

Machine learning insights into Shopify product tag organization

Data sourced from cantbuymelove.industrial-linguistics.com, which powers Shopify taxonomy classification. Filtered to taxonomies with at least five products.

Last updated 2025-12-16 12:10 UTC

3,423 Products used
219 Taxonomies covered
6,784 Tags used
18,672 Total tags
1,548 Tag battles

Dataset coverage

Training data spans 3,423 products across 219 taxonomies. Of the 18,672 total tags in the dataset, 6,784 were used; tags appearing fewer than 5 times were filtered out. 2,752 products were discarded because their taxonomy labels were missing or sparse.
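The frequency filter described above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the `(taxonomy_path, tags)` record shape, the function name `filter_rare_tags`, and the choice to count a tag once per product are all assumptions.

```python
from collections import Counter

MIN_TAG_COUNT = 5  # threshold stated in the dataset coverage text


def filter_rare_tags(products, min_count=MIN_TAG_COUNT):
    """Drop tags that occur on fewer than `min_count` products.

    `products` is a hypothetical list of (taxonomy_path, tags) pairs.
    Assumption: a tag is counted once per product, not once per mention.
    """
    counts = Counter(tag for _, tags in products for tag in set(tags))
    kept = {tag for tag, n in counts.items() if n >= min_count}
    filtered = [(tax, [t for t in tags if t in kept]) for tax, tags in products]
    return filtered, kept


# Toy data: "VEGAN" appears on 5 products, "LIMITED" on only 2.
toy = [("9.2.3.2", ["VEGAN", "LIMITED"])] * 2 + [("9.2.3.2", ["VEGAN"])] * 3
filtered, kept_tags = filter_rare_tags(toy)
print(sorted(kept_tags))
```

On the toy data only `VEGAN` survives, because `LIMITED` falls below the threshold of 5.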

Dummy Baseline

Always predicts most common taxonomy (baseline for comparison)

0.8781 Avg p-adic loss
1 Parameter
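A majority-class baseline of this kind can be sketched in a few lines. The class name `MajorityBaseline` and the toy labels are hypothetical, and the sketch scores itself with plain exact-match accuracy because the p-adic loss is not defined in this section.

```python
from collections import Counter


class MajorityBaseline:
    """Predicts the most frequent training label for every input."""

    def fit(self, labels):
        # The single fitted "parameter" is the majority label itself.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n):
        return [self.majority_] * n


# Toy labels (hypothetical): taxonomy paths for six products.
train = ["9.2.3.2", "9.2.3.2", "9.2.3.2", "1.1.13.8", "1.1.4", "3.2.1"]
model = MajorityBaseline().fit(train)
preds = model.predict(len(train))
accuracy = sum(p == y for p, y in zip(preds, train)) / len(train)
print(model.majority_, accuracy)
```

Any learned model should beat this predictor, which is why it serves as the comparison point for the other cards.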

Importance-Optimised p-adic Linear Regression

P-adic coefficients assigned to tags to predict taxonomy

0.3405 Avg p-adic loss
425 Avg non-zero coefficients

Zubarev Polynomial Regression

Stochastic p-adic optimization with Mahler basis (arXiv:2503.23488)

0.3735 Avg p-adic loss
1,443 Non-zero coefficients

Unconstrained Logistic Regression

L1-regularized model using ALL tags

0.2187 Avg p-adic loss
1,630 Non-zero params

Decision Tree

Unconstrained tree using ALL tags

0.1650 Avg p-adic loss
12,083 Effective params

Unconstrained Neural Network

L1-regularized NN with weight pruning

0.1510 Avg p-adic loss
40,779 Non-zero params

Parameter Constrained Neural Network

Neural network predicting taxonomy from tags

0.5783 Avg p-adic loss
864 Avg input weights

Parameter Constrained Logistic Regression

Logistic regression model predicting Shopify taxonomy from tags

0.5646 Avg p-adic loss
6,976 Avg parameters

ELO-Inspired Rankings

Battle-tested tag hierarchy from product title positions

1,548 Tag battles
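For orientation, the standard Elo update that such rankings are usually modelled on looks like the sketch below. How the site decides the winner of a tag battle from title positions is not described here, so the `(winner, loser)` pairs, the starting rating of 1500, and K = 32 are all assumptions.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_winner, r_loser, k=32.0):
    """Return the updated (winner, loser) ratings after one battle."""
    delta = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + delta, r_loser - delta


def rank_tags(battles, start=1500.0, k=32.0):
    """Replay (winner, loser) battles in order and sort tags by rating."""
    ratings = {}
    for winner, loser in battles:
        rw = ratings.get(winner, start)
        rl = ratings.get(loser, start)
        ratings[winner], ratings[loser] = elo_update(rw, rl, k)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical battles between tags that appear elsewhere on this page.
battles = [("DRESSES", "SUMMER"), ("DRESSES", "GIFT"), ("SUMMER", "GIFT")]
ranking = rank_tags(battles)
print([tag for tag, _ in ranking])
```

Two equally rated tags exchange exactly K/2 = 16 points in their first battle. Later battles move fewer points when the favourite wins.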

Taxonomy distribution

Figure: Taxonomy class distribution. Distribution of products across the most common taxonomy classes.

Top 10 taxonomy classes

Taxonomy ID | Name | Path | Samples | Share
gid://shopify/TaxonomyCategory/fb-2-3-2 | Food, Beverages & Tobacco > Food Items > Candy & Chocolate > Chocolate | 9.2.3.2 | 249 | 7.3%
gid://shopify/TaxonomyCategory/aa-1-13-8 | Apparel & Accessories > Clothing > Clothing Tops > T-Shirts | 1.1.13.8 | 200 | 5.8%
gid://shopify/TaxonomyCategory/ae-2-1 | Arts & Entertainment > Hobbies & Creative Arts > Arts & Crafts | 3.2.1 | 117 | 3.4%
gid://shopify/TaxonomyCategory/aa-1-4 | Apparel & Accessories > Clothing > Dresses | 1.1.4 | 95 | 2.8%
gid://shopify/TaxonomyCategory/ha-6-2-5 | Hardware > Hardware Accessories > Cabinet Hardware > Cabinet Knobs & Handles | 12.6.2.5 | 87 | 2.5%
gid://shopify/TaxonomyCategory/ae-2-2 | Arts & Entertainment > Hobbies & Creative Arts > Collectibles | 3.2.2 | 79 | 2.3%
gid://shopify/TaxonomyCategory/lb | Luggage & Bags | 15 | 78 | 2.3%
gid://shopify/TaxonomyCategory/vp-1-3 | Vehicles & Parts > Vehicle Parts & Accessories > Motor Vehicle Electronics | 26.1.3 | 69 | 2.0%
gid://shopify/TaxonomyCategory/aa-6-6 | Apparel & Accessories > Jewelry > Earrings | 1.6.6 | 64 | 1.9%
gid://shopify/TaxonomyCategory/aa-6-8 | Apparel & Accessories > Jewelry > Necklaces | 1.6.8 | 51 | 1.5%
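The Share column is consistent with Samples divided by the 3,423 products used. The sketch below recomputes it from the table's sample counts; treating Share as `samples / products used` is an inference from the numbers, and the helper name `share` is illustrative.

```python
TOTAL_PRODUCTS = 3423  # "Products used" from the summary figures

# (taxonomy path, samples) for the ten rows of the table
top_classes = [
    ("9.2.3.2", 249), ("1.1.13.8", 200), ("3.2.1", 117), ("1.1.4", 95),
    ("12.6.2.5", 87), ("3.2.2", 79), ("15", 78), ("26.1.3", 69),
    ("1.6.6", 64), ("1.6.8", 51),
]


def share(samples, total=TOTAL_PRODUCTS):
    """Format a sample count as a percentage of all products."""
    return f"{100 * samples / total:.1f}%"


shares = [share(n) for _, n in top_classes]
covered = sum(n for _, n in top_classes)
print(shares)
print(f"top 10 classes cover {covered} products ({share(covered)})")
```

Summing the ten rows gives 1,089 products, so roughly a third of the dataset sits in the ten largest classes and the rest is spread over the remaining 209 taxonomies.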

Tags with strongest signal

Tag | Top taxonomy | Weight | Max abs. weight
WHISKEY/WHISKY | 9.1.1.5.9 | 6.2974 | 6.2974
SHOP | 2.2.1.6.3 | 5.2791 | 5.2791
HOME DECOR | 14.3.56 | 4.9866 | 4.9866
GIFT | 14.11.10 | 4.9590 | 4.9590
MEN | 1.5.5.7 | 4.9505 | 4.9505
ACCESSORIES | 15.3.3 | 4.9484 | 4.9484
DRESSES | 1.1.4 | 4.7396 | 4.7396
VEGAN | 13.3.10.13.3 | 4.5494 | 4.5494
SUMMER | 1.1.10.6 | 4.2894 | 4.2894
FRAMED ARTWORK | 3.2.2 | 4.2008 | 4.2008
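One plausible way to derive a table like this from a linear model is to take, for each tag, the taxonomy whose coefficient has the largest absolute value. The sketch below assumes a hypothetical `{tag: {taxonomy_path: coefficient}}` mapping. The page does not say which model the weights come from, and the secondary coefficients in the toy data are invented for illustration.

```python
def strongest_signal(weights):
    """Return (tag, top taxonomy, weight, |weight|) rows, strongest first.

    `weights` is a hypothetical {tag: {taxonomy_path: coefficient}} mapping.
    """
    rows = []
    for tag, per_taxonomy in weights.items():
        top_tax, top_w = max(per_taxonomy.items(), key=lambda kv: abs(kv[1]))
        rows.append((tag, top_tax, top_w, abs(top_w)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# The top weights match the table; the smaller ones are made up.
toy_weights = {
    "SUMMER": {"1.1.10.6": 4.2894, "1.1.4": 0.8},
    "DRESSES": {"1.1.4": 4.7396, "1.1.13.8": -1.2},
}
rows = strongest_signal(toy_weights)
for row in rows:
    print(row)
```

Keeping the signed weight and its absolute value separate matters when the strongest coefficient is negative, which none of the ten rows shown happens to be.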

Historical Performance Trends

Tracking model performance and dataset growth over time. Lower p-adic loss indicates better predictions.

Figure: Historical model performance trends

Model performance vs number of products

Model | Slope (per product) | Intercept | R² | p-value
Importance-Optimised p-adic LR | 0.000007 | 0.3104 | 0.0193 | 0.3928
PCLR | 0.000076 | 0.2787 | 0.3868 | 1.83e-05
PCNN | 0.000090 | 0.2144 | 0.6335 | 8.30e-10
ULR | 0.000069 | -0.0277 | 0.5175 | 0.0126
UNN | 0.000059 | -0.0339 | 0.2078 | 0.2175
Decision Tree | 0.000025 | 0.0744 | 0.1136 | 0.4142
Dummy Baseline | 0.000016 | 0.8185 | 0.5259 | 4.11e-05

Model performance vs number of distinct tags

Model | Slope (per tag) | Intercept | R² | p-value
Importance-Optimised p-adic LR | 0.000018 | 0.2178 | 0.0548 | 0.1460
PCLR | 0.000113 | -0.2235 | 0.3712 | 3.01e-05
PCNN | 0.000137 | -0.3962 | 0.6300 | 9.96e-10
ULR | 0.000066 | -0.2365 | 0.3720 | 0.0463
UNN | 0.000032 | -0.0538 | 0.0602 | 0.5246
Decision Tree | 0.000014 | 0.0661 | 0.0366 | 0.6499
Dummy Baseline | 0.000029 | 0.6815 | 0.6287 | 2.29e-06
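Each row of the trend tables describes a straight line, loss = slope × count + intercept. As a reading aid, the sketch below evaluates three of the per-product lines at the current dataset size of 3,423 products. The dictionary and function names are illustrative, and because the published slopes are rounded to six decimals the results are approximate.

```python
# (slope per product, intercept) copied from the per-product trend table
TREND_BY_PRODUCTS = {
    "PCLR": (0.000076, 0.2787),
    "PCNN": (0.000090, 0.2144),
    "Dummy Baseline": (0.000016, 0.8185),
}


def trend_loss(model, n_products):
    """Evaluate a model's fitted trend line at a given product count."""
    slope, intercept = TREND_BY_PRODUCTS[model]
    return slope * n_products + intercept


for name in TREND_BY_PRODUCTS:
    print(name, round(trend_loss(name, 3423), 4))

# Change in fitted loss per additional 1,000 products for PCNN
print(round(1000 * TREND_BY_PRODUCTS["PCNN"][0], 3))
```

A positive slope means the fitted loss rises as the dataset grows. For PCNN the line implies about 0.09 more loss per additional 1,000 products.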

Model complexity vs performance (parameter count vs p-adic loss)

Parameter count (log scale) vs p-adic loss. Sparse models use fewer non-zero parameters.

Regression: p-adic loss = slope × log₁₀(params) + intercept

Line | Slope | Intercept | R² | p-value | Significant? | n
With Dummy | -0.1454 | 0.8536 | 0.6548 | 0.0150 | Yes | 8
Without Dummy | -0.1166 | 0.7491 | 0.2127 | 0.2976 | No | 7
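The two fitted lines can be reproduced approximately from the summary figures on this page with ordinary least squares on log₁₀(params). The pairing of each model with the parameter figure on its card (for example, 864 "Avg input weights" for the parameter constrained neural network) is an assumption, and the helper names are illustrative.

```python
import math

# (parameter count, avg p-adic loss) as shown on the model cards
MODELS = {
    "Dummy Baseline": (1, 0.8781),
    "Importance-Optimised p-adic LR": (425, 0.3405),
    "Zubarev Polynomial Regression": (1443, 0.3735),
    "Unconstrained Logistic Regression": (1630, 0.2187),
    "Decision Tree": (12083, 0.1650),
    "Unconstrained Neural Network": (40779, 0.1510),
    "Parameter Constrained Neural Network": (864, 0.5783),
    "Parameter Constrained Logistic Regression": (6976, 0.5646),
}


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


def fit_models(names):
    """Fit p-adic loss against log10(params) for the named models."""
    xs = [math.log10(MODELS[m][0]) for m in names]
    ys = [MODELS[m][1] for m in names]
    return fit_line(xs, ys)


with_dummy = fit_models(list(MODELS))
without_dummy = fit_models([m for m in MODELS if m != "Dummy Baseline"])
print([round(v, 4) for v in with_dummy])
print([round(v, 4) for v in without_dummy])
```

Under these assumptions the fit agrees with the table to about three decimal places. The third value in each table row matches the computed R². The dummy baseline sits alone at log₁₀(1) = 0 with the highest loss, which is why dropping it weakens the fit so much.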

Unconstrained models: complexity vs performance (log-log scale)

Unconstrained models only (no PCLR/PCNN). Both axes on log scale.

Regression: log₁₀(loss) = slope × log₁₀(params) + intercept

Slope | Intercept | R² | p-value | Significant? | n
-0.1697 | -0.0356 | 0.9241 | 0.0022 | Yes | 6
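A straight line in log-log space is a power law in the original units: loss = 10^intercept × params^slope. The short sketch below uses the published coefficients to make that reading concrete. The function name is illustrative, and the predictions come from the fitted line, not from measured results.

```python
SLOPE, INTERCEPT = -0.1697, -0.0356  # published log-log fit


def predicted_loss(params):
    """Power-law form of log10(loss) = SLOPE * log10(params) + INTERCEPT."""
    return 10 ** INTERCEPT * params ** SLOPE


for params in (1, 1630, 40779):
    print(params, round(predicted_loss(params), 4))

# Factor by which the fitted loss changes for every tenfold increase in parameters
per_decade = 10 ** SLOPE
print(round(per_decade, 4))
```

On this fit each tenfold increase in parameter count multiplies the loss by roughly 0.68, a reduction of about a third.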