DiSparse - Disentangled Sparsification for Multitask Model Compression
Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags:
Annotations
neural network compression techniques can be categorized [9] into pruning [20, 30, 32], quantization [4, 37, 56], low-rank factorization [10, 33, 58], and knowledge distillation [25, 26, 36] * show annotation
A few MTL works have explored the problem of entangled features and showed disentangling representation into shared and task-private spaces will improve the model performance [34, 55] * show annotation
key to properly compressing a multitask model is correctly identifying saliency scores for each task in the shared space, therefore Sparsifying in a Disentangled manner (DiSparse) * show annotation
unanimous selection decisions among all tasks, which means that a parameter is removed only if it's shown to be not critical for any task * show annotation
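A minimal sketch of this unanimous-vote rule (my own PyTorch illustration, not the paper's code; `unanimous_keep_mask` is a hypothetical helper): a shared parameter is kept as soon as any single task marks it as critical, and pruned only when every task agrees it is not.

```python
import torch

def unanimous_keep_mask(task_keep_masks):
    """A shared weight is pruned only if every task agrees it is non-critical,
    i.e. it is kept as soon as any single task marks it as critical."""
    keep = torch.zeros_like(task_keep_masks[0], dtype=torch.bool)
    for mask in task_keep_masks:
        keep |= mask.bool()          # logical OR across per-task votes
    return keep

# Toy example: three tasks voting on five shared weights (1 = keep).
votes = [torch.tensor([1, 0, 0, 1, 0]),
         torch.tensor([0, 0, 1, 1, 0]),
         torch.tensor([1, 0, 0, 0, 0])]
print(unanimous_keep_mask(votes))    # tensor([ True, False,  True,  True, False])
```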
observed strikingly similar sparse network architecture identified by each task even before the training starts. This offers a glimpse of the transferable subnetwork architecture across domains. * show annotation
show that DiSparse does not only provide the compression community with the first-of-its-kind multitask sparsification scheme but also a powerful tool to the multitask learning community * show annotation
pruning and sparse training scheme for multitask network by disentangling the importance measurements among tasks * show annotation
task relatedness and multitask model architecture design with DiSparse * show annotation
Unstructured pruning methods [20, 30] drop less significant weights, regardless of where they occur * show annotation
structured pruning methods [32, 36] operate under structural constraints, for example removing convolutional filters or attention heads [40], thus enjoy immediate performance improvement without specialized hardware or library support. * show annotation
score criteria include: 1. Magnitude-based [20, 32], 2. Gradient-based [43, 44], 3. Hessian-based [21, 30], 4. Learning-based [11, 36]. * show annotation
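For intuition, a hedged sketch of the first two criteria only (magnitude-based |w| and a SNIP-style gradient-based |w · ∂L/∂w|); the helper names and the toy linear layer are my own illustration, not code from the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def magnitude_scores(layer: nn.Linear) -> torch.Tensor:
    """Magnitude-based criterion: importance = |w|."""
    return layer.weight.detach().abs()

def gradient_scores(layer: nn.Linear, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Gradient-based (SNIP-style) criterion: importance = |w * dL/dw|."""
    layer.zero_grad()
    loss = F.cross_entropy(layer(x), y)
    loss.backward()
    return (layer.weight * layer.weight.grad).detach().abs()

# Toy usage on random data.
layer = nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
print(magnitude_scores(layer).shape, gradient_scores(layer, x, y).shape)
```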
Sparse training techniques * show annotation
Static Sparse Training * show annotation
Dynamic Sparse Training * show annotation
solving for $B_c$, the binary mask for the large number of commonly shared parameters, we can't simply apply typical methods which directly utilize $\mathcal{L}(\Theta, B; D)$ as guidance, because the shared parameters are entangled with multiple tasks. * show annotation
Figure 2. An overview of DiSparse. For task $T_k$, we feed weights $\Theta_c^k$ and their gradients w.r.t. the loss $\mathcal{L}_k(\cdot)$ into a saliency scoring function to get their importance scores. Later we generate an optimal binary mask $B_c^k$ for the model assuming that we're only training the network independently for task $T_k$. We directly assign $P(B_c^k)$, the task-private part, to $B^k$ used as the pruning or growing mask for the task-private parameters. For $C(B_c^k)$, the shared part, we feed all of $\{C(B_c^k), \forall k \in \{1, \ldots, T\}\}$ to an element-wise arbiter function $\mathcal{A}$ and take its output as $B_c$, the pruning or growing mask for the shared parameters. * show annotation
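The caption translates roughly into the sketch below, assuming per-task saliency scores are already computed (e.g. with a criterion like those above), a top-k selection per task, and a unanimous OR arbiter; the function names (`topk_mask`, `disparse_masks`), task names, and tensor shapes are illustrative placeholders, not the paper's actual implementation.

```python
import torch

def topk_mask(scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep the top `keep_ratio` fraction of entries by saliency score."""
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores.flatten(), k).values.min()
    return scores >= threshold

def disparse_masks(shared_scores, private_scores, keep_ratio, arbiter):
    """Hypothetical sketch of the disentangled mask generation.

    shared_scores  : dict task -> saliency scores of the *shared* weights,
                     computed as if training the network for that task alone.
    private_scores : dict task -> saliency scores of that task's private weights.
    arbiter        : element-wise function combining per-task shared masks.
    """
    # Per-task votes over the shared space (the C(B_c^k) part).
    shared_votes = {t: topk_mask(s, keep_ratio) for t, s in shared_scores.items()}
    # Task-private masks B^k come directly from each task's own scores.
    private_masks = {t: topk_mask(s, keep_ratio) for t, s in private_scores.items()}
    # The arbiter merges the per-task votes into one shared mask B_c.
    shared_mask = arbiter(list(shared_votes.values()))
    return shared_mask, private_masks

# Unanimous-vote arbiter: prune only weights that no task finds critical.
unanimous = lambda votes: torch.stack(votes).any(dim=0)

shared_scores  = {t: torch.rand(64, 64) for t in ("seg", "depth", "normal")}
private_scores = {t: torch.rand(32, 64) for t in ("seg", "depth", "normal")}
B_c, B_k = disparse_masks(shared_scores, private_scores, keep_ratio=0.3, arbiter=unanimous)
print(B_c.float().mean().item())   # kept fraction of shared weights (>= keep_ratio)
```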
We performed sparsification on the widely-used DeepLab-ResNet [3] with atrous convolutions as the backbone and the ASPP [3] architecture for task-specific heads. * show annotation
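One possible way to assemble such a shared-backbone, multi-head model from off-the-shelf torchvision parts (a dilated ResNet-50 trunk plus one DeepLab/ASPP head per task); the task names and channel counts are placeholders, and this is only an approximation of the paper's exact DeepLab-ResNet setup.

```python
import torch
from torchvision.models import resnet50
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

# Shared backbone: ResNet-50 with atrous (dilated) convolutions in the last two stages.
backbone = resnet50(replace_stride_with_dilation=[False, True, True])
backbone = torch.nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool / fc

# One ASPP-style head per task (output channels are placeholders, not the paper's).
heads = torch.nn.ModuleDict({
    "segmentation": DeepLabHead(2048, 40),
    "depth":        DeepLabHead(2048, 1),
})

x = torch.randn(1, 3, 224, 224)
features = backbone(x)                                # shared representation
outputs = {t: head(features) for t, head in heads.items()}
print({t: o.shape for t, o in outputs.items()})
```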
Surprisingly, we observed strikingly high IoU among tasks, even in the 5-task Tiny-Taskonomy dataset. * show annotation
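IoU here is measured between the binary masks that different tasks select over the same shared parameters; a small sketch with a hypothetical `mask_iou` helper and random masks for illustration:

```python
import torch

def mask_iou(mask_a: torch.Tensor, mask_b: torch.Tensor) -> float:
    """Intersection-over-Union of two binary keep-masks."""
    a, b = mask_a.bool(), mask_b.bool()
    inter = (a & b).sum().item()
    union = (a | b).sum().item()
    return inter / union if union > 0 else 1.0

# Two hypothetical per-task masks over the same shared layer.
m1 = torch.rand(256, 256) > 0.7
m2 = torch.rand(256, 256) > 0.7
print(f"IoU = {mask_iou(m1, m2):.3f}")
```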
implies that even before training starts, different tasks tend to select the same architecture in the shared parameter space to facilitate training, suggesting potential for domain-independent sparse architecture exploration * show annotation
Several methods [5, 23, 24] start with single-task networks and gradually merge them into a unified one, using feature sharing and similarity maximization. However, these schemes are inapplicable to pre-designed multitask models. * show annotation