Machine learning insights into Shopify product tag organization
The data, sourced from cantbuymelove.industrial-linguistics.com, powers Shopify taxonomy classification and is filtered to taxonomies with at least five products.
Training data spans 3,241 products across 209 taxonomies. Of 18,154 total tags in the dataset, 6,551 tags were used (tags appearing fewer than 5 times were filtered out). 2,544 products were discarded due to missing or sparse taxonomy labels.
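The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual pipeline: the `(taxonomy_id, tags)` input shape, the function name `filter_dataset`, and the order of the two filters (taxonomies first, then tags, with each tag counted once per product) are all assumptions.

```python
from collections import Counter

MIN_TAG_COUNT = 5       # tags seen fewer times than this are dropped
MIN_TAXONOMY_SIZE = 5   # taxonomies with fewer products are dropped


def filter_dataset(products, min_tag_count=MIN_TAG_COUNT,
                   min_taxonomy_size=MIN_TAXONOMY_SIZE):
    """Restrict products to frequent tags and well-populated taxonomies.

    `products` is a hypothetical list of (taxonomy_id, tags) pairs.
    Returns the filtered products and the retained tag vocabulary.
    """
    # Drop products that have no taxonomy label at all.
    labelled = [(tax, tags) for tax, tags in products if tax]

    # Keep only taxonomies with enough products (assumed to happen first).
    tax_counts = Counter(tax for tax, _ in labelled)
    kept = [(tax, tags) for tax, tags in labelled
            if tax_counts[tax] >= min_taxonomy_size]

    # Count each tag once per product, then keep the frequent ones.
    tag_counts = Counter(tag for _, tags in kept for tag in set(tags))
    vocab = {tag for tag, n in tag_counts.items() if n >= min_tag_count}

    filtered = [(tax, sorted(set(tags) & vocab)) for tax, tags in kept]
    return filtered, vocab
```

Applying the tag threshold after the taxonomy threshold means a tag's count reflects only the products that survive; the opposite order would give a slightly different vocabulary.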
Always predicts most common taxonomy (baseline for comparison)
P-adic coefficients assigned to tags to predict taxonomy
L1-regularized model using ALL tags
Unconstrained tree using ALL tags
L1-regularized NN with weight pruning
Neural network predicting taxonomy from tags
Logistic regression model predicting Shopify taxonomy from tags
Battle-tested tag hierarchy from product title positions
| Taxonomy ID | Name | Path | Samples | Share |
|---|---|---|---|---|
| gid://shopify/TaxonomyCategory/fb-2-3-2 | Food, Beverages & Tobacco > Food Items > Candy & Chocolate > Chocolate | 9.2.3.2 | 249 | 7.7% |
| gid://shopify/TaxonomyCategory/aa-1-13-8 | Apparel & Accessories > Clothing > Clothing Tops > T-Shirts | 1.1.13.8 | 161 | 5.0% |
| gid://shopify/TaxonomyCategory/ae-2-1 | Arts & Entertainment > Hobbies & Creative Arts > Arts & Crafts | 3.2.1 | 117 | 3.6% |
| gid://shopify/TaxonomyCategory/aa-1-4 | Apparel & Accessories > Clothing > Dresses | 1.1.4 | 89 | 2.7% |
| gid://shopify/TaxonomyCategory/ha-6-2-5 | Hardware > Hardware Accessories > Cabinet Hardware > Cabinet Knobs & Handles | 12.6.2.5 | 87 | 2.7% |
| gid://shopify/TaxonomyCategory/ae-2-2 | Arts & Entertainment > Hobbies & Creative Arts > Collectibles | 3.2.2 | 79 | 2.4% |
| gid://shopify/TaxonomyCategory/lb | Luggage & Bags | 15 | 78 | 2.4% |
| gid://shopify/TaxonomyCategory/vp-1-3 | Vehicles & Parts > Vehicle Parts & Accessories > Motor Vehicle Electronics | 26.1.3 | 69 | 2.1% |
| gid://shopify/TaxonomyCategory/aa-6-6 | Apparel & Accessories > Jewelry > Earrings | 1.6.6 | 63 | 1.9% |
| gid://shopify/TaxonomyCategory/aa-6-8 | Apparel & Accessories > Jewelry > Necklaces | 1.6.8 | 50 | 1.5% |
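The Share column is each taxonomy's sample count divided by the 3,241 training products. The sketch below recomputes it for the first three rows and, assuming the table is sorted by sample count so that Chocolate is the most common taxonomy, derives the training-set accuracy a majority-class baseline would reach. The variable and function names are illustrative only.

```python
TOTAL_PRODUCTS = 3241  # training-set size stated earlier on the page

# Sample counts copied from the first three rows of the table.
top_taxonomies = {
    "9.2.3.2": 249,   # Chocolate
    "1.1.13.8": 161,  # T-Shirts
    "3.2.1": 117,     # Arts & Crafts
}


def share(samples, total=TOTAL_PRODUCTS):
    """Percentage of the training set, rounded to one decimal place."""
    return round(100 * samples / total, 1)


shares = {path: share(n) for path, n in top_taxonomies.items()}

# A majority-class baseline always predicts the largest taxonomy, so its
# training accuracy equals that taxonomy's share of the data.
majority_path = max(top_taxonomies, key=top_taxonomies.get)
baseline_accuracy = top_taxonomies[majority_path] / TOTAL_PRODUCTS

print(shares)
print(majority_path, f"{baseline_accuracy:.3f}")
```

With 209 taxonomies and the largest holding under 8% of the products, exact-match accuracy for such a baseline is necessarily low.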
| Tag | Top taxonomy | Weight | Max abs. weight |
|---|---|---|---|
| WHISKEY/ | 9.1.1.5.9 | 6.2361 | 6.2361 |
| GIFT | 14.11.10 | 5.2538 | 5.2538 |
| SHOP | 2.2.1.6.2 | 5.2201 | 5.2201 |
| HOME DECOR | 14.3.57 | 5.0823 | 5.0823 |
| ACCESSORIES | 15.3.3 | 5.0134 | 5.0134 |
| MEN | 1.5.5.7 | 4.8928 | 4.8928 |
| DRESSES | 1.1.4 | 4.6258 | 4.6258 |
| VEGAN | 13.3.10.13.3 | 4.5306 | 4.5306 |
| SUMMER | 1.1.10.6 | 4.2220 | 4.2220 |
| FRAMED ARTWORK | 3.2.2 | 4.1683 | 4.1683 |
Tracking model performance and dataset growth over time. Lower p-adic loss indicates better predictions.
| Model | Slope (per product) | Intercept | R² | p-value |
|---|---|---|---|---|
| Importance-Optimised p-adic LR | 0.000006 | 0.3124 | 0.0106 | 0.5493 |
| PCLR | 0.000050 | 0.3259 | 0.2221 | 0.0037 |
| PCNN | 0.000073 | 0.2465 | 0.5126 | 9.14e-07 |
| ULR | 0.000117 | -0.1728 | 0.7613 | 0.0104 |
| Dummy Baseline | 0.000013 | 0.8253 | 0.3487 | 0.0048 |
| Model | Slope (per tag) | Intercept | R² | p-value |
|---|---|---|---|---|
| Importance-Optimised p-adic LR | 0.000019 | 0.2108 | 0.0458 | 0.2099 |
| PCLR | 0.000072 | 0.0096 | 0.1925 | 0.0074 |
| PCNN | 0.000110 | -0.2454 | 0.4922 | 1.87e-06 |
| ULR | 0.000162 | -0.8547 | 0.6778 | 0.0229 |
| Dummy Baseline | 0.000028 | 0.6856 | 0.4552 | 7.95e-04 |
Regression: p-adic loss = slope × log₁₀(params) + intercept
| Line | Slope | Intercept | R² | p-value | Significant? | n |
|---|---|---|---|---|---|---|
| With Dummy | -0.1230 | 0.8213 | 0.5448 | 0.0939 | No | 6 |
| Without Dummy | -0.0646 | 0.6088 | 0.0757 | 0.6542 | No | 5 |
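To read these coefficients concretely, the sketch below evaluates both fitted lines at a hypothetical model size of 1,000 parameters. The coefficients are copied from the table; the parameter count and the names `FITS` and `predicted_loss` are illustrative assumptions. Because neither fit is statistically significant, the outputs illustrate the formula rather than forecast anything.

```python
import math

# (slope, intercept) pairs copied from the table above.
FITS = {
    "with_dummy": (-0.1230, 0.8213),
    "without_dummy": (-0.0646, 0.6088),
}


def predicted_loss(params, slope, intercept):
    """Loss implied by: loss = slope * log10(params) + intercept."""
    return slope * math.log10(params) + intercept


for name, (slope, intercept) in FITS.items():
    print(name, round(predicted_loss(1_000, slope, intercept), 4))
```

In this semi-log form the slope is the change in loss per tenfold increase in parameter count, so the "With Dummy" line implies a drop of about 0.12 per decade.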
Regression: log₁₀(loss) = slope × log₁₀(params) + intercept
| Slope | Intercept | R² | p-value | Significant? | n |
|---|---|---|---|---|---|
| -0.1550 | -0.0877 | 0.9469 | 0.0053 | Yes | 5 |
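Exponentiating both sides of the log-log fit turns it into a power law, which may be easier to interpret. Using the coefficients from the table (rounded constants are approximate):

```latex
\log_{10}(\text{loss}) = -0.1550\,\log_{10}(\text{params}) - 0.0877
\;\Longrightarrow\;
\text{loss} \approx 10^{-0.0877}\,\text{params}^{-0.1550}
\approx 0.817\,\text{params}^{-0.155}
```

Under this fit, a tenfold increase in parameter count multiplies the loss by about $10^{-0.155} \approx 0.70$, a reduction of roughly 30%. With only five points, this describes the fitted models rather than a general scaling law.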