Padjective Tag Hierarchy

Machine learning insights into Shopify product tag organization

The data comes from cantbuymelove.industrial-linguistics.com, where it powers Shopify taxonomy classification, and is filtered to taxonomies with at least five products.

Last updated 2025-12-12 18:36 UTC

3,241 Products used
209 Taxonomies covered
6,551 Tags used
18,154 Total tags
1,452 Tag battles

Dataset coverage

Training data spans 3,241 products across 209 taxonomies. Of 18,154 total tags in the dataset, 6,551 tags were used (tags appearing fewer than 5 times were filtered out). 2,544 products were discarded due to missing or sparse taxonomy labels. Explore the full dataset → | View defective taxonomy labels →
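The tag filter described above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the function name, the toy products, and the choice to count a tag once per product are all assumptions. Only the threshold of 5 comes from the page.

```python
from collections import Counter

MIN_TAG_COUNT = 5  # threshold stated on the page


def filter_rare_tags(products, min_count=MIN_TAG_COUNT):
    """Drop tags that occur on fewer than min_count products.

    ASSUMPTION: a tag is counted once per product, even if repeated.
    """
    counts = Counter(tag for tags in products.values() for tag in set(tags))
    kept = {tag for tag, n in counts.items() if n >= min_count}
    filtered = {
        pid: [tag for tag in tags if tag in kept]
        for pid, tags in products.items()
    }
    return filtered, kept


# Toy data (made up): "GIFT" is on five products, "RARE" on one.
toy_products = {f"p{i}": ["GIFT"] for i in range(5)}
toy_products["p0"] = ["GIFT", "RARE"]

filtered, kept = filter_rare_tags(toy_products)
print(sorted(kept))
print(filtered["p0"])
```

On the real dataset, a filter of this kind is what would reduce the 18,154 total tags to the 6,551 tags used.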

Dummy Baseline

Always predicts the most common taxonomy (baseline for comparison)

0.8713 Avg p-adic loss
1 Parameter
View model →
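A majority-class baseline like this one can be sketched in a few lines. The page does not define p-adic loss, so the `prefix_distance` function below is a hypothetical stand-in: it scores two dotted taxonomy paths by how many leading components they share. Treat it as an illustration of a hierarchy-aware loss, not as the metric behind the 0.8713 figure. The toy labels are made up, although the path strings reuse values shown on this page.

```python
from collections import Counter


def parse_path(path):
    """Turn a dotted taxonomy path such as '9.2.3.2' into a tuple of ints."""
    return tuple(int(part) for part in path.split("."))


def prefix_distance(a, b, p=2):
    """ASSUMED stand-in loss: 0 for identical paths, else p**(-shared prefix length)."""
    if a == b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return float(p) ** (-shared)


def dummy_predict(train_paths):
    """Return the single most common taxonomy path in the training labels."""
    return Counter(train_paths).most_common(1)[0][0]


# Toy labels (made up for illustration).
labels = [parse_path(s) for s in ["9.2.3.2", "9.2.3.2", "1.1.4", "1.1.13.8", "3.2.1"]]

guess = dummy_predict(labels)
avg_loss = sum(prefix_distance(guess, y) for y in labels) / len(labels)
print(guess, avg_loss)
```

The only thing such a model stores is the predicted class, which is consistent with the card's single parameter.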

Importance-Optimised p-adic Linear Regression

P-adic coefficients assigned to tags to predict taxonomy

0.3352 Avg p-adic loss
402 Avg non-zero coefficients
View model →

Unconstrained Logistic Regression

L1-regularized model using ALL tags

0.2152 Avg p-adic loss
1,545 Non-zero params
View model →

Decision Tree

Unconstrained tree using ALL tags

0.1724 Avg p-adic loss
11,396 Effective params
View model →

Unconstrained Neural Network

L1-regularized NN with weight pruning

0.1812 Avg p-adic loss
60,800 Non-zero params
View model →

Parameter Constrained Neural Network

Neural network predicting taxonomy from tags

0.5697 Avg p-adic loss
864 Avg input weights
View model →

Parameter Constrained Logistic Regression

Logistic regression model predicting Shopify taxonomy from tags

0.6226 Avg p-adic loss
6,669 Avg parameters
View model →

ELO-Inspired Rankings

Battle-tested tag hierarchy from product title positions

1,452 Tag battles
View rankings →
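For readers unfamiliar with Elo-style ratings, here is a minimal sketch of how a tag battle could update two ratings. The update rule is the standard Elo formula. Everything else is an assumption made for illustration: that a battle pairs two tags found in the same product title, that the tag appearing earlier wins, and the K-factor of 32 and starting rating of 1500.

```python
from itertools import combinations


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_battle(ratings, winner, loser, k=32.0, start=1500.0):
    """Update both ratings in place after one battle."""
    ra = ratings.get(winner, start)
    rb = ratings.get(loser, start)
    gain = k * (1.0 - expected_score(ra, rb))
    ratings[winner] = ra + gain
    ratings[loser] = rb - gain


def battles_from_title(tags_in_title_order):
    """ASSUMPTION: a tag that appears earlier in the title beats every later tag."""
    return list(combinations(tags_in_title_order, 2))


ratings = {}
# Toy title (made up) in which "VEGAN" appears before "GIFT".
for winner, loser in battles_from_title(["VEGAN", "GIFT"]):
    play_battle(ratings, winner, loser)
print(ratings)
```

Two tags that start level each have an expected score of 0.5, so the first battle moves the winner up and the loser down by K/2 points.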

Taxonomy distribution

Taxonomy class distribution
Distribution of products across the most common taxonomy classes

Top 10 taxonomy classes

Taxonomy ID | Name | Path | Samples | Share
gid://shopify/TaxonomyCategory/fb-2-3-2 | Food, Beverages & Tobacco > Food Items > Candy & Chocolate > Chocolate | 9.2.3.2 | 249 | 7.7%
gid://shopify/TaxonomyCategory/aa-1-13-8 | Apparel & Accessories > Clothing > Clothing Tops > T-Shirts | 1.1.13.8 | 161 | 5.0%
gid://shopify/TaxonomyCategory/ae-2-1 | Arts & Entertainment > Hobbies & Creative Arts > Arts & Crafts | 3.2.1 | 117 | 3.6%
gid://shopify/TaxonomyCategory/aa-1-4 | Apparel & Accessories > Clothing > Dresses | 1.1.4 | 89 | 2.7%
gid://shopify/TaxonomyCategory/ha-6-2-5 | Hardware > Hardware Accessories > Cabinet Hardware > Cabinet Knobs & Handles | 12.6.2.5 | 87 | 2.7%
gid://shopify/TaxonomyCategory/ae-2-2 | Arts & Entertainment > Hobbies & Creative Arts > Collectibles | 3.2.2 | 79 | 2.4%
gid://shopify/TaxonomyCategory/lb | Luggage & Bags | 15 | 78 | 2.4%
gid://shopify/TaxonomyCategory/vp-1-3 | Vehicles & Parts > Vehicle Parts & Accessories > Motor Vehicle Electronics | 26.1.3 | 69 | 2.1%
gid://shopify/TaxonomyCategory/aa-6-6 | Apparel & Accessories > Jewelry > Earrings | 1.6.6 | 63 | 1.9%
gid://shopify/TaxonomyCategory/aa-6-8 | Apparel & Accessories > Jewelry > Necklaces | 1.6.8 | 50 | 1.5%
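The Share column in the table above is the sample count divided by the 3,241 products used. The short check below recomputes it from the paths and sample counts listed on the page. The variable and function names are illustrative only.

```python
TOTAL_PRODUCTS = 3241  # "Products used" figure from the page

# (path, samples, share as printed on the page)
TOP10 = [
    ("9.2.3.2", 249, "7.7%"),
    ("1.1.13.8", 161, "5.0%"),
    ("3.2.1", 117, "3.6%"),
    ("1.1.4", 89, "2.7%"),
    ("12.6.2.5", 87, "2.7%"),
    ("3.2.2", 79, "2.4%"),
    ("15", 78, "2.4%"),
    ("26.1.3", 69, "2.1%"),
    ("1.6.6", 63, "1.9%"),
    ("1.6.8", 50, "1.5%"),
]


def share(samples, total=TOTAL_PRODUCTS):
    """Format samples / total as a percentage with one decimal place."""
    return f"{100 * samples / total:.1f}%"


recomputed = [share(samples) for _, samples, _ in TOP10]
top10_samples = sum(samples for _, samples, _ in TOP10)
top10_coverage = top10_samples / TOTAL_PRODUCTS
print(recomputed)
print(top10_samples, round(top10_coverage, 3))
```

Together the ten classes hold 1,042 products, a little under a third of the dataset, so most products sit in the long tail of the remaining 199 taxonomies.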

Tags with strongest signal

Tag | Top taxonomy | Weight | Max |weight|
WHISKEY/WHISKY | 9.1.1.5.9 | 6.2361 | 6.2361
GIFT | 14.11.10 | 5.2538 | 5.2538
SHOP | 2.2.1.6.2 | 5.2201 | 5.2201
HOME DECOR | 14.3.57 | 5.0823 | 5.0823
ACCESSORIES | 15.3.3 | 5.0134 | 5.0134
MEN | 1.5.5.7 | 4.8928 | 4.8928
DRESSES | 1.1.4 | 4.6258 | 4.6258
VEGAN | 13.3.10.13.3 | 4.5306 | 4.5306
SUMMER | 1.1.10.6 | 4.2220 | 4.2220
FRAMED ARTWORK | 3.2.2 | 4.1683 | 4.1683
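A ranking like the one above could be derived from a per-tag table of taxonomy weights. The sketch below is one plausible reading rather than the site's implementation: it assumes the "top taxonomy" is the one whose weight has the largest absolute value, and it ranks tags by that value. The two primary weights are copied from the table, while the secondary weights are invented purely to make the example run.

```python
def strongest_signal(weights_by_tag, top_n=10):
    """Rank tags by their largest absolute taxonomy weight.

    ASSUMPTION: the "top taxonomy" is the one with the largest |weight|.
    Returns rows of (tag, top taxonomy, weight, max |weight|).
    """
    rows = []
    for tag, weights in weights_by_tag.items():
        taxonomy, weight = max(weights.items(), key=lambda kv: abs(kv[1]))
        rows.append((tag, taxonomy, weight, abs(weight)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top_n]


# Primary weights come from the table; the negative ones are made up.
toy_weights = {
    "DRESSES": {"1.1.4": 4.6258, "9.2.3.2": -1.2},
    "WHISKEY/WHISKY": {"9.1.1.5.9": 6.2361, "1.1.4": -0.4},
}

rows = strongest_signal(toy_weights)
for row in rows:
    print(row)
```

In every row of the table the Weight and Max |weight| columns agree, which is what this reading would produce whenever the strongest weight is positive.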

Historical Performance Trends

Tracking model performance and dataset growth over time. Lower p-adic loss indicates better predictions.

Historical model performance trends
Model performance vs number of products
Model | Slope (per product) | Intercept | R² | p-value
Importance-Optimised p-adic LR | 0.000006 | 0.3124 | 0.0106 | 0.5493
PCLR | 0.000050 | 0.3259 | 0.2221 | 0.0037
PCNN | 0.000073 | 0.2465 | 0.5126 | 9.14e-07
ULR | 0.000117 | -0.1728 | 0.7613 | 0.0104
Dummy Baseline | 0.000013 | 0.8253 | 0.3487 | 0.0048
Model performance vs number of distinct tags
Model | Slope (per tag) | Intercept | R² | p-value
Importance-Optimised p-adic LR | 0.000019 | 0.2108 | 0.0458 | 0.2099
PCLR | 0.000072 | 0.0096 | 0.1925 | 0.0074
PCNN | 0.000110 | -0.2454 | 0.4922 | 1.87e-06
ULR | 0.000162 | -0.8547 | 0.6778 | 0.0229
Dummy Baseline | 0.000028 | 0.6856 | 0.4552 | 7.95e-04
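To read the two trend tables, plug a dataset size into loss = slope × x + intercept. The sketch below evaluates each fitted line at the current dataset size (3,241 products, 6,551 tags) using the slopes and intercepts reported above. The dictionary and function names are illustrative.

```python
# (slope, intercept) pairs copied from the two trend tables.
TREND_BY_PRODUCTS = {
    "Importance-Optimised p-adic LR": (0.000006, 0.3124),
    "PCLR": (0.000050, 0.3259),
    "PCNN": (0.000073, 0.2465),
    "ULR": (0.000117, -0.1728),
    "Dummy Baseline": (0.000013, 0.8253),
}
TREND_BY_TAGS = {
    "Importance-Optimised p-adic LR": (0.000019, 0.2108),
    "PCLR": (0.000072, 0.0096),
    "PCNN": (0.000110, -0.2454),
    "ULR": (0.000162, -0.8547),
    "Dummy Baseline": (0.000028, 0.6856),
}


def trend_loss(slope, intercept, x):
    """Value of the fitted trend line at dataset size x."""
    return slope * x + intercept


at_products = {m: trend_loss(s, i, 3241) for m, (s, i) in TREND_BY_PRODUCTS.items()}
at_tags = {m: trend_loss(s, i, 6551) for m, (s, i) in TREND_BY_TAGS.items()}

for model in TREND_BY_PRODUCTS:
    print(f"{model}: {at_products[model]:.4f} (products), {at_tags[model]:.4f} (tags)")
```

Every slope in both tables is positive, so each fitted line rises as the dataset grows. Because the slopes are printed to six decimal places, the values computed this way are approximate.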
Model complexity vs performance (parameter count vs p-adic loss)
Parameter count (log scale) vs p-adic loss. Sparse models use fewer non-zero parameters.

Regression: p-adic loss = slope × log₁₀(params) + intercept

Line | Slope | Intercept | R² | p-value | Significant? | n
With Dummy | -0.1230 | 0.8213 | 0.5448 | 0.0939 | No | 6
Without Dummy | -0.0646 | 0.6088 | 0.0757 | 0.6542 | No | 5
Unconstrained models: complexity vs performance (log-log scale)
Unconstrained models only (no PCLR/PCNN). Both axes on log scale.

Regression: log₁₀(loss) = slope × log₁₀(params) + intercept

Slope | Intercept | R² | p-value | Significant? | n
-0.1550 | -0.0877 | 0.9469 | 0.0053 | Yes | 5
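The log-log fit can be checked against the model cards on this page. The sketch below assumes the five points are the cards that are neither PCLR nor PCNN (Dummy Baseline, Importance-Optimised p-adic LR, Unconstrained Logistic Regression, Decision Tree, Unconstrained Neural Network), taking each card's parameter count and average p-adic loss as printed. The ordinary least-squares helper is a plain textbook implementation and not necessarily the routine the site uses.

```python
import math

# (model, parameter count, avg p-adic loss) as printed on the model cards.
UNCONSTRAINED = [
    ("Dummy Baseline", 1, 0.8713),
    ("Importance-Optimised p-adic LR", 402, 0.3352),
    ("Unconstrained Logistic Regression", 1545, 0.2152),
    ("Decision Tree", 11396, 0.1724),
    ("Unconstrained Neural Network", 60800, 0.1812),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


xs = [math.log10(params) for _, params, _ in UNCONSTRAINED]
ys = [math.log10(loss) for _, _, loss in UNCONSTRAINED]
slope, intercept, r2 = fit_line(xs, ys)

# Power-law reading of the fit: loss ~ scale * params ** slope
scale = 10 ** intercept
factor_per_decade = 10 ** slope
print(f"slope={slope:.4f} intercept={intercept:.4f} R2={r2:.4f}")
print(f"scale={scale:.3f} factor per 10x params={factor_per_decade:.3f}")
```

With these inputs the fit agrees with the reported slope, intercept and R² to about three decimal places. Read as a power law, a slope near -0.155 means each tenfold increase in parameter count goes with roughly 30% lower p-adic loss among the unconstrained models.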