Neural network with L1 regularization and weight pruning for sparse predictions
An unconstrained neural network classifier that applies L1 regularization during training, followed by post-training weight pruning, to achieve sparsity. Unlike the parameter-constrained models, this classifier uses ALL available tags as input features, relying on the combination of L1 regularization and pruning to eliminate unimportant connections.
The network uses a single hidden layer with 256 neurons.
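A minimal sketch of the architecture and the L1-regularized training objective, assuming PyTorch; `num_tags`, `num_classes`, and `l1_lambda` are illustrative placeholders, since only the 256-unit hidden layer is specified above.

```python
import torch
import torch.nn as nn

num_tags, num_classes = 4096, 10   # hypothetical dimensions, not from the text
l1_lambda = 1e-4                   # assumed L1 penalty strength

model = nn.Sequential(
    nn.Linear(num_tags, 256),      # single hidden layer with 256 neurons
    nn.ReLU(),
    nn.Linear(256, num_classes),
)
criterion = nn.CrossEntropyLoss()

def loss_fn(logits, targets):
    # Task loss plus an L1 penalty over all weights; the penalty drives
    # unimportant connections toward zero, which the later pruning step removes.
    l1 = sum(p.abs().sum() for p in model.parameters())
    return criterion(logits, targets) + l1_lambda * l1
```

Per-fold cross-validation results: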
| Fold | Accuracy | F1 | p-adic loss (mean) | Non-zero params | Sparsity |
|---|---|---|---|---|---|
| 0 | 64.19% | 0.6070 | 0.146505 | 37,289 | 92.1% |
| 1 | 63.20% | 0.6096 | 0.126863 | 38,105 | 91.9% |
| 2 | 60.77% | 0.5673 | 0.168868 | 51,346 | 89.1% |
| 3 | 62.91% | 0.6002 | 0.152178 | 37,979 | 91.9% |
| 4 | 61.62% | 0.5650 | 0.190346 | 57,070 | 87.9% |
The unconstrained neural network achieves the lowest mean p-adic loss among all models by retaining more non-zero parameters after pruning than the constrained models (roughly 37,000-57,000, at 88-92% sparsity), while L1 regularization and pruning ensure that only the most important connections are kept. This illustrates the tradeoff between model complexity and prediction quality: spending more parameters buys a lower p-adic loss.
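For concreteness, a magnitude-pruning sketch (again assuming PyTorch) that produces the kind of non-zero-parameter and sparsity figures reported in the table; the threshold is an assumed hyperparameter, not the value used for these runs.

```python
import torch

@torch.no_grad()
def prune_and_report(model: torch.nn.Module, threshold: float = 1e-3) -> None:
    # Zero out every weight whose magnitude falls below the threshold,
    # then report the non-zero parameter count and sparsity, matching
    # the last two columns of the results table.
    for p in model.parameters():
        p[p.abs() < threshold] = 0.0
    total = sum(p.numel() for p in model.parameters())
    nonzero = sum(int(p.count_nonzero()) for p in model.parameters())
    print(f"Non-zero params: {nonzero:,}  Sparsity: {100 * (1 - nonzero / total):.1f}%")
```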