New Study Reveals Un-Learning Patterns in LoRA Fine-Tuning

Published on April 21, 2026

Machine learning researchers have long assumed that fine-tuning consistently improves model performance. However, a recent study challenges this assumption, documenting a phenomenon of “un-learning” on contested data points. This finding raises questions about the reliability of fine-tuning approaches, particularly on datasets with high annotator disagreement.

The study, published on arXiv, explores how annotation entropy correlates with training dynamics in LoRA fine-tuning. The researchers found that models exhibit increasing loss on examples characterized by high disagreement among annotators. This pattern was notably absent in traditional full fine-tuning and was consistent across multiple models, including both encoder and decoder-only architectures.
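Annotation entropy here refers to the Shannon entropy of the label distribution produced by multiple annotators for a single example. As a minimal sketch (the paper's exact formulation is not given in this article, so the function and labels below are illustrative):

```python
import math
from collections import Counter

def annotation_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution
    over one example's annotator votes."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Unanimous annotators give zero entropy; an even split maximizes it.
assert annotation_entropy(["pos", "pos", "pos"]) == 0.0
assert abs(annotation_entropy(["pos", "neg"]) - 1.0) < 1e-9
```

High-entropy examples are those where annotators split their votes, which is the "contested" subset on which the study reports rising loss under LoRA.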

In their analysis, the team found a positive correlation between annotation entropy and the per-example area under the loss curve (AULC) across 25 experimental conditions. Notably, the correlations were stronger in decoder-only models than in encoders. The findings also remained robust under various controls and were validated through a preliminary noise-injection experiment.

This research has significant implications for the field of machine learning. It underscores the importance of carefully considering data quality and annotator consensus in model training. As practitioners adopt more complex fine-tuning strategies, understanding these dynamics could enhance model performance and reliability.
