Adaptive Prototype Attention (APA): A Negative Adaptive Attention Model in Few-Shot Learning
Abstract
We present Adaptive Prototype Attention (APA), a task-aware, prototype-guided, multi-scale attention mechanism for few-shot learning with Transformer-style architectures. APA (i) modulates attention weights with task context, (ii) injects prototype-conditioned signals to strengthen within-class cohesion and between-class separation, and (iii) aggregates local and global dependencies across multiple scales. In controlled few-shot classification experiments (5-way, 5-shot, synthetic episodes), APA consistently underperforms strong baselines. Compared with standard attention, APA reduces accuracy from 0.425 to 0.208 and macro-F1 from 0.419 to 0.084; relative to prototype-only and multiscale-only variants, it incurs accuracy drops of 0.205 and 0.232, respectively. APA converges only after $\sim$921.7 epochs with a final loss of $\approx 0.0000$, indicating slow optimization, and its attention visualizations exhibit non-compact, task-agnostic patterns (all reported numbers are taken from the provided run logs). These findings suggest that coupling task-aware modulation with prototype guidance and multi-scale aggregation, as done in the current APA design, is ineffective in data-scarce regimes, and they serve as a cautionary result for attention-mechanism design in few-shot learning.
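To make the three components listed above concrete, the sketch below gives a minimal, hypothetical PyTorch rendering of an APA-style block: a task-context gate that rescales the queries, a prototype-conditioned bias added to the attention logits, and keys/values pooled at several scales whose outputs are concatenated. The module name `AdaptivePrototypeAttention`, the specific gating, bias, and fusion choices, and all dimensions are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch (not the authors' code) of an APA-style attention block.
# All design choices below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptivePrototypeAttention(nn.Module):
    def __init__(self, dim: int, scales=(1, 2)):
        super().__init__()
        self.dim = dim
        self.scales = scales
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # (i) task-aware gate: maps a task-context vector to per-dimension scales
        self.task_gate = nn.Linear(dim, dim)
        # (ii) prototype projection used to bias the attention logits
        self.proto_proj = nn.Linear(dim, dim)
        # (iii) fuse the per-scale outputs back to the model dimension
        self.out_proj = nn.Linear(len(scales) * dim, dim)

    def forward(self, x, task_ctx, prototypes):
        # x: (B, N, D) episode features; task_ctx: (B, D); prototypes: (B, C, D)
        q = self.q_proj(x) * torch.sigmoid(self.task_gate(task_ctx)).unsqueeze(1)  # (i)
        k, v = self.k_proj(x), self.v_proj(x)
        # (ii) bias each key by its similarity to the nearest class prototype
        proto_sim = torch.einsum('bnd,bcd->bnc', k, self.proto_proj(prototypes))
        bias = proto_sim.max(dim=-1).values.unsqueeze(1)  # (B, 1, N)
        outputs = []
        for s in self.scales:  # (iii) multi-scale aggregation via average pooling
            ks = F.avg_pool1d(k.transpose(1, 2), s, s).transpose(1, 2) if s > 1 else k
            vs = F.avg_pool1d(v.transpose(1, 2), s, s).transpose(1, 2) if s > 1 else v
            bs = F.avg_pool1d(bias, s, s) if s > 1 else bias
            logits = q @ ks.transpose(-2, -1) / self.dim ** 0.5 + bs
            outputs.append(F.softmax(logits, dim=-1) @ vs)
        return self.out_proj(torch.cat(outputs, dim=-1))


# Toy usage on a 5-way, 5-shot episode with 16-dimensional features.
apa = AdaptivePrototypeAttention(dim=16)
x = torch.randn(1, 25, 16)               # support-set features
task_ctx = x.mean(dim=1)                  # naive task context: episode mean
protos = x.view(1, 5, 5, 16).mean(dim=2)  # class prototypes: per-class means
out = apa(x, task_ctx, protos)            # (1, 25, 16)
```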
Keywords:
Few-shot learning; Attention mechanism; Prototype learning; Multi-scale attention; Negative baseline model; Task-aware attention
Copyright Notice & License:
This article is published in Silence under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author(s) and source are properly credited.
The author(s) retain copyright of the work. By publishing in the journal, the author(s) grant Silence a non-exclusive, worldwide, and irrevocable license to publish, archive, and disseminate this article in all forms and media.

This work is licensed under a Creative Commons Attribution 4.0 International License.
