Explore how Delta Activations cluster finetuned LLMs by domain. Click and drag to navigate, select models to see their nearest neighbors.
In the Delta Activation Embedding Space, finetuned models cluster by domain, enabling efficient retrieval of finetuned models by task or domain.
The success of powerful open-source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. This representation allows for effective clustering by domain and task, revealing structure in the model landscape. Delta Activations also demonstrate desirable properties: they are robust across finetuning settings and exhibit an additive property when finetuning datasets are mixed. In addition, we show that Delta Activations can embed tasks via few-shot finetuning, and we further explore their use for model selection and merging. We hope Delta Activations can facilitate the practice of reusing publicly available models.
The difference between a finetuned model's hidden state and the base model's hidden state on a shared input quantifies the effect of finetuning.
Delta Activations are computed by passing a fixed set of five generic Alpaca instruction templates through both the base model and the post-trained model. We extract the last-token embedding at the final layer and compute the difference between the two models' internal representations, averaged over the probe set $P$:

$$v_f = \frac{1}{|P|} \sum_{x \in P} \big[\, h_f(x) - h_{\text{base}}(x) \,\big]$$

where $h_f(x)$ and $h_{\text{base}}(x)$ denote the final-layer hidden states of the finetuned and base models on probe input $x$. The resulting vector $v_f \in \mathbb{R}^d$ serves as a standalone representation that captures how post-training has shifted the model's internal computations.
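A minimal sketch of this computation, assuming Hugging Face transformers; the probe prompts, model names, and helper names below are illustrative rather than the paper's exact code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Five generic Alpaca-style instruction templates (placeholders for the probe set P).
PROBE_PROMPTS = [
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n### Response:\n",
    # ... four more generic templates
]

def last_token_hidden(model, tokenizer, prompt):
    """Final-layer hidden state of the last input token."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1, :]  # shape: (d,)

def delta_activation(finetuned_name: str, base_name: str) -> torch.Tensor:
    """v_f = mean over probe prompts of h_f(x) - h_base(x)."""
    tok = AutoTokenizer.from_pretrained(base_name)
    base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.bfloat16)
    ft = AutoModelForCausalLM.from_pretrained(finetuned_name, torch_dtype=torch.bfloat16)
    deltas = [
        last_token_hidden(ft, tok, x) - last_token_hidden(base, tok, x)
        for x in PROBE_PROMPTS
    ]
    return torch.stack(deltas).mean(dim=0)  # vector in R^d (d = 4096 for LLaMA-3-8B)
```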
We evaluate Delta Activations by finetuning three base models on datasets from five domains: legal, mathematics, medical, commonsense reasoning, and coding. Each model pool contains 15 finetuned models with 3 models per domain.
t-SNE visualization showing Delta Activations form clean domain clusters while baseline methods fail to achieve clear separation.
Silhouette scores by embedding space for each base-model pool (higher is better):

| Embedding Space | Dimension | LLaMA | Gemma | Qwen | Average |
|---|---|---|---|---|---|
| Flattened weights | ~2·10⁷ | −.035 | −.060 | −.034 | −.043 |
| Salient Mask | ~8·10⁹ | .133 | .208 | .229 | .190 |
| Output sentence embeddings | 384 | .221 | −.053 | .096 | .087 |
| Delta Activations | 4096 | .645 | .545 | .653 | .614 |
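A sketch of how such a score could be computed from a pool of model embeddings, assuming scikit-learn and cosine distance (the variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import silhouette_score

# embeddings: (num_models, d) array, one Delta Activation per finetuned model
# domains:    length-num_models list of domain labels, e.g. ["legal", "math", ...]
def clustering_quality(embeddings: np.ndarray, domains: list) -> float:
    # Higher silhouette score means tighter, better-separated domain clusters.
    return silhouette_score(embeddings, domains, metric="cosine")
```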
When a model is finetuned on a mixture of two datasets $D_1$ and $D_2$, its Delta Activation approximates the sum of the individual domain activations:

$$v_{D_1 \cup D_2} \approx v_{D_1} + v_{D_2}$$

We test this by comparing cosine similarities:
| Domain 1 | Domain 2 | Mixed vs D1 | Mixed vs D2 | Mixed vs Sum |
|---|---|---|---|---|
| Math | Commonsense | 0.58 | 0.48 | 0.65 |
| Math | Code | 0.70 | 0.27 | 0.73 |
| Medical | Legal | 0.41 | 0.68 | 0.70 |
The mixed model's embedding is consistently closer to the sum of the single-domain embeddings than to either individual domain embedding.
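A sketch of how this additivity check could be run, given precomputed Delta Activations (the tensor names below are placeholders):

```python
import torch.nn.functional as F

def cosine(a, b):
    return F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()

# delta_d1, delta_d2: Delta Activations of models finetuned on single domains
# delta_mixed:        Delta Activation of a model finetuned on the mixed dataset
def additivity_report(delta_mixed, delta_d1, delta_d2):
    return {
        "mixed_vs_d1":  cosine(delta_mixed, delta_d1),
        "mixed_vs_d2":  cosine(delta_mixed, delta_d2),
        "mixed_vs_sum": cosine(delta_mixed, delta_d1 + delta_d2),
    }
```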
Delta Activations demonstrate stability across various training configurations. Models maintain domain-specific clustering even when trained with different hyperparameters:
| Training Setting | Avg. Silhouette Score |
|---|---|
| Different number of training examples | 0.62 |
| Different learning rates | 0.38 |
| Different training epochs | 0.57 |
| Identical training settings | 0.61 |
Models trained in varying settings still form tight domain-specific clusters, comparable to those trained identically.
"Some patients have had no ill effects from these medications..."
– Medical model response to a generic prompt
Using only 20 examples, Delta Activations embed tasks and locate relevant model clusters. Gemma achieves 100% retrieval accuracy:
Few-shot embeddings (circles) correctly locate full model clusters on Gemma
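In practice, such a retrieval step could look like the sketch below: briefly finetune the base model on the ~20 task examples, embed the result with the same delta_activation procedure shown earlier, and rank the pool by cosine similarity (the pool dictionary and paths are illustrative):

```python
import torch
import torch.nn.functional as F

def retrieve_models(task_embedding: torch.Tensor,
                    pool: dict,  # model name -> precomputed Delta Activation
                    k: int = 3) -> list:
    """Return the k pool models whose embeddings are closest to the task embedding."""
    sims = {
        name: F.cosine_similarity(task_embedding.unsqueeze(0), v.unsqueeze(0)).item()
        for name, v in pool.items()
    }
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Usage (hypothetical paths):
# task_embedding = delta_activation("path/to/base-fewshot-finetuned", "path/to/base-model")
# neighbors = retrieve_models(task_embedding, pool, k=3)
```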
Delta Activations extend beyond supervised finetuning to models post-trained with preference alignment methods.
Delta Activations successfully cluster models across different checkpoints (LLaMA-3-8B vs. LLaMA-3.1-8B), achieving a silhouette score of 0.39 and cleanly recovering five domain-specialization clusters.
Using Delta Meaning, an architecture-agnostic variant, models from different architectures (LLaMA-3.1-8B vs. LLaMA-3.2-1B) form four of the five domain clusters, with a silhouette score of 0.32.
Delta Activations work beyond domain-specific finetuning. On Tulu v2 instruction splits with diverse output formats, they still separate models by split (silhouette scores):

| Method | LLaMA | Gemma | Qwen |
|---|---|---|---|
| Output sentence embeddings | 0.02 | −0.03 | 0.10 |
| Delta Activations | 0.49 | 0.32 | 0.48 |
Models finetuned on: CoT, GPT4-Alpaca, ShareGPT, CodeAlpaca, Science splits
Validated on LoraHub with ~200 FLAN-T5 models on Big-Bench Hard (26 tasks):
| Method | Accuracy | Improvement |
|---|---|---|
| Random Selection | 34.3% | – |
| Delta Activations | 36.3% | +2.0% |
Strategy: select the single most similar model as an anchor, plus 19 random models, for merging. Interestingly, selecting the 20 most similar models instead yields only 30.3% due to model interference.
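A sketch of that selection strategy, assuming a task embedding from few-shot examples and a pool of candidate embeddings as above (the merging step itself, e.g. via LoraHub, is omitted):

```python
import random
import torch
import torch.nn.functional as F

def select_for_merging(task_embedding: torch.Tensor,
                       pool: dict,  # model name -> Delta Activation
                       total: int = 20,
                       seed: int = 0) -> list:
    """Pick the most similar model as an anchor, then fill the rest at random."""
    sims = {
        name: F.cosine_similarity(task_embedding.unsqueeze(0), v.unsqueeze(0)).item()
        for name, v in pool.items()
    }
    anchor = max(sims, key=sims.get)
    others = [name for name in pool if name != anchor]
    random.seed(seed)
    return [anchor] + random.sample(others, total - 1)
```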
Delta Activations provide a simple yet powerful way to represent finetuned LLMs by measuring shifts in their internal activations relative to a base LLM. Our experiments show that this representation consistently forms distinct clusters that reflect finetuning domains and offers the advantage of an additive property that mirrors multi-domain behavior. The stability of Delta Activations across varying finetuning settings makes them reliable for use cases such as model selection and merging in model hubs. We believe Delta Activations can serve as a cornerstone for navigating the expanding landscape of finetuned models by enabling more efficient model discovery and reuse.