Delta Activations: A Representation for Finetuned Large Language Models

1University of Pennsylvania      2University of Central Florida

Model Embedding Navigator

Interactive visualization of the Delta Activations embedding space for 66 finetuned models: models cluster by domain, and selecting a model shows its nearest neighbors in the space.

Embedding Finetuned Models Concept

In the Delta Activations embedding space, finetuned models cluster by domain, enabling efficient retrieval of models for a given task or domain.

Abstract

The success of powerful open-source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. This representation allows for effective clustering by domain and task, revealing structure in the model landscape. Delta Activations also demonstrate desirable properties: they are robust across finetuning settings and exhibit an additive property when finetuning datasets are mixed. In addition, we show that Delta Activations can embed tasks via few-shot finetuning, and we further explore their use for model selection and merging. We hope Delta Activations can facilitate the practice of reusing publicly available models.

Method Overview

Computing Delta Activations

The difference between a finetuned model's hidden state and the base model's hidden state on a shared input quantifies the effect of finetuning.

Approach

Delta Activations are computed by passing a fixed set of five generic Alpaca instruction templates through both the base model and the post-trained model. For each probe input, we extract the final-layer hidden state at the last token and average the difference between the two models' internal representations over the probe set:

v_f = (1/|X|) Σ_{x ∈ X} [ h_f(x) − h_base(x) ]

where X is the fixed probe set and h_f(x) and h_base(x) denote the final-layer last-token hidden states of the finetuned and base models, respectively. The resulting vector v_f ∈ ℝ^d serves as a standalone representation that captures how post-training has shifted the model's internal computations.
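
Below is a minimal sketch of how such an embedding could be computed with Hugging Face Transformers. The model identifiers, probe prompt, and helper names are illustrative assumptions, not the authors' exact templates or code.

```python
# Sketch: compute a Delta Activation for one finetuned model.
# Assumes the base and finetuned models share a tokenizer; the probe prompt
# and model identifiers below are placeholders, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def last_token_state(model, tokenizer, prompt):
    """Final-layer hidden state at the last input token."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1, :]          # shape: (d,)

def delta_activation(base, finetuned, tokenizer, probes):
    """Average final-layer last-token shift over the fixed probe set."""
    deltas = [last_token_state(finetuned, tokenizer, p)
              - last_token_state(base, tokenizer, p) for p in probes]
    return torch.stack(deltas).mean(dim=0)          # v_f in R^d

base_id = "meta-llama/Meta-Llama-3-8B"               # example base model
tuned_id = "some-org/llama3-8b-medical-sft"          # hypothetical finetuned model
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
tuned = AutoModelForCausalLM.from_pretrained(tuned_id, torch_dtype=torch.bfloat16, device_map="auto")

probes = [  # generic instruction-style probe (placeholder text)
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n",
]
v_f = delta_activation(base, tuned, tok, probes)
print(v_f.shape)
```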

Method Characteristics

  • Requires only a single forward pass with fixed probe prompts
  • No access to training datasets or evaluation metrics needed
  • Embeddings remain stable when new models are added
  • Works for both model characterization and task embedding
  • Extends to Delta-X family (logits, weighted activations, meaning representations)
  • Enables cross-architecture comparison with model-agnostic representations

Clustering Quality Evaluation

We evaluate Delta Activations by finetuning three base models on datasets from five domains: legal, mathematics, medical, commonsense reasoning, and coding. Each model pool contains 15 finetuned models with 3 models per domain.

t-SNE visualization comparison for Gemma models

t-SNE visualization showing Delta Activations form clean domain clusters while baseline methods fail to achieve clear separation.

Clustering Performance Across Methods

Average silhouette scores (higher is better) by embedding method and base model:

Embedding Space | Dimension | LLaMA | Gemma | Qwen | Average
Flattened weights | ~2·10⁷ | −.035 | −.060 | −.034 | −.043
Salient Mask | ~8·10⁹ | .133 | .208 | .229 | .190
Output sentence embeddings | 384 | .221 | −.053 | .096 | .087
Delta Activations | 4096 | .645 | .545 | .653 | .614
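
For illustration, clustering quality of this kind can be scored with scikit-learn's silhouette score. The file paths and the choice of cosine distance below are assumptions, not the authors' exact evaluation script.

```python
# Sketch: silhouette score of Delta Activations grouped by finetuning domain.
# The .npy paths are hypothetical; cosine distance is an assumed choice.
import numpy as np
from sklearn.metrics import silhouette_score

embeddings = np.load("delta_activations.npy")   # shape (n_models, d)
domains = np.load("domain_labels.npy")          # shape (n_models,), e.g. "math", "legal"

# Silhouette score lies in [-1, 1]; higher means tighter, better-separated clusters.
score = silhouette_score(embeddings, domains, metric="cosine")
print(f"average silhouette score: {score:.3f}")
```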

Properties and Applications

Additive Property

When a model is finetuned on mixed datasets, its Delta Activation approximates the sum of individual domain activations:

v(model trained on D1 ∪ D2) ≈ v(model trained on D1) + v(model trained on D2)

We test this by comparing cosine similarities:

Domains Mixed (D1 + D2) | Mixed vs. D1 | Mixed vs. D2 | Mixed vs. Sum
Math + Commonsense | 0.58 | 0.48 | 0.65
Math + Code | 0.70 | 0.27 | 0.73
Medical + Legal | 0.41 | 0.68 | 0.70

The mixed model's embedding is consistently closer to the sum than to either individual domain embedding.
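
To make the comparison concrete, a toy check of the additive property might look like the snippet below; the vectors are synthetic stand-ins rather than real Delta Activations.

```python
# Sketch: check the additive property by comparing cosine similarities.
# v_d1, v_d2, v_mixed are synthetic stand-ins; in practice they would be
# Delta Activations of models trained on D1, D2, and D1 ∪ D2.
import torch
import torch.nn.functional as F

def cos(a, b):
    return F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()

d = 4096
v_d1 = torch.randn(d)
v_d2 = torch.randn(d)
v_mixed = 0.5 * (v_d1 + v_d2) + 0.1 * torch.randn(d)   # toy mixed-data embedding

print("mixed vs D1: ", cos(v_mixed, v_d1))
print("mixed vs D2: ", cos(v_mixed, v_d2))
print("mixed vs sum:", cos(v_mixed, v_d1 + v_d2))       # expected to be largest
```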

Robustness

Delta Activations demonstrate stability across various training configurations. Models maintain domain-specific clustering even when trained with different hyperparameters:

Training Setting | Avg. Silhouette Score
Different number of training examples | 0.62
Different learning rates | 0.38
Different training epochs | 0.57
Identical training settings | 0.61

Models trained in varying settings still form tight domain-specific clusters, comparable to those trained identically.

Task Embedding via Few-Shot

"Some patients have had no ill effects from these medications..."
โ€” Medical model response to generic prompt

By finetuning on as few as 20 examples, Delta Activations can embed a task itself and locate the relevant cluster of finetuned models; on Gemma, this achieves 100% retrieval accuracy:

Task embedding visualization

Few-shot embeddings (circles) correctly locate full model clusters on Gemma
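
Once a task has been embedded this way, retrieval can be a plain cosine nearest-neighbor lookup over the stored model embeddings. The sketch below uses random placeholder data and hypothetical names to illustrate that step.

```python
# Sketch: retrieve finetuned models whose Delta Activations are closest to a
# few-shot task embedding. Names, shapes, and data are illustrative only.
import numpy as np

def retrieve_models(v_task, model_embs, model_names, k=3):
    """Top-k model names by cosine similarity to the task embedding."""
    def l2norm(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)
    sims = l2norm(model_embs) @ l2norm(v_task)
    top = np.argsort(-sims)[:k]
    return [(model_names[i], float(sims[i])) for i in top]

# Toy usage with random placeholders.
d, n = 4096, 66
model_embs = np.random.randn(n, d)
model_names = [f"model_{i}" for i in range(n)]
v_task = np.random.randn(d)
print(retrieve_models(v_task, model_embs, model_names))
```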

Preference Optimization

Delta Activations extend beyond supervised finetuning to preference alignment methods:

  • DPO clustering: 0.93 silhouette score
  • Clear separation by preference type
  • Works across different reward models

Cross-Checkpoint Clustering


Delta Activations successfully cluster models across different checkpoints (LLaMA-3-8B vs. LLaMA-3.1-8B), achieving a silhouette score of 0.39 and cleanly recovering five domain-specialization clusters.

Cross-Architecture Clustering


Using Delta Meaning, an architecture-agnostic variant from the Delta-X family, models from different architectures (LLaMA-3.1-8B vs. LLaMA-3.2-1B) successfully form four out of five domain clusters with a silhouette score of 0.32.

Beyond Domains: Tulu v2

Delta Activations work beyond domain-specific finetuning. On Tulu v2 instruction splits with diverse output formats:

Silhouette scores by base model:

Method | LLaMA | Gemma | Qwen
Output Emb. | 0.02 | −0.03 | 0.10
Delta Act. | 0.49 | 0.32 | 0.48

Models finetuned on the CoT, GPT4-Alpaca, ShareGPT, CodeAlpaca, and Science splits.

Model Selection: LoraHub

We validate Delta Activations for model selection on LoraHub, using ~200 finetuned FLAN-T5 models evaluated on Big-Bench Hard (26 tasks):

Method | Accuracy | Improvement
Random Selection | 34.3% | –
Delta Activations | 36.3% | +2.0%

Strategy: Select 1 most similar model as anchor + 19 random models for merging. Interestingly, selecting all 20 similar models yields only 30.3% due to model interference.
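
A rough sketch of this selection step is shown below, assuming a few-shot task embedding and candidate LoRA-module embeddings are already available; the LoraHub-style merging itself is out of scope here, and all names are hypothetical.

```python
# Sketch: pick the single most similar module as an anchor plus random fillers
# for merging. `task_emb`, `lora_embs`, and `lora_names` are assumed inputs.
import random
import numpy as np

def select_for_merging(task_emb, lora_embs, lora_names, n_total=20, seed=0):
    sims = lora_embs @ task_emb / (
        np.linalg.norm(lora_embs, axis=1) * np.linalg.norm(task_emb) + 1e-8
    )
    anchor = lora_names[int(np.argmax(sims))]               # most similar module
    others = [n for n in lora_names if n != anchor]
    rng = random.Random(seed)
    return [anchor] + rng.sample(others, n_total - 1)       # 1 anchor + 19 random
```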

Conclusion

Delta Activations provide a simple yet powerful way to represent finetuned LLMs by measuring shifts in their internal activations relative to a base LLM. Our experiments show that this representation consistently forms distinct clusters that reflect finetuning domains and exhibits an additive property that mirrors multi-domain behavior. The stability of Delta Activations across varying finetuning settings makes them reliable for model selection and merging in model hubs. We believe Delta Activations can serve as a cornerstone for navigating the expanding landscape of finetuned models by enabling more efficient model discovery and reuse.

BibTeX

@article{xu2025delta,
  title={Delta Activations: A Representation for Finetuned Large Language Models},
  author={Xu, Zhiqiu and Sethi, Amish and Naik, Mayur and Lim, Ser-Nam},
  journal={arXiv preprint arXiv:2509.04442},
  year={2025}
}