Failed Experiments

SHIT: A Negative Adaptive Attention Model in Few Shot Learning Capability named APA

Yaolin Zhang (Corresponding Author)
Jinan University, Guangzhou, China
Pengrong Huang
South China Agricultural University, Guangzhou, China
Silence
Published: 2026-02-16

Abstract

We present Adaptive Prototype Attention (APA), a task-aware, prototype-guided, and multi-scale attention mechanism tailored for few-shot learning with Transformer-style architectures. APA (i) modulates attention weights with task context, (ii) injects prototype-conditioned signals to enhance within-class cohesion and between-class separation, and (iii) aggregates local and global dependencies across multiple scales. In controlled few-shot classification experiments (5-way, 5-shot, synthetic episodes), APA consistently underperforms strong baselines. Compared with standard attention, APA lowers accuracy from 0.425 to 0.208 and macro-F1 from 0.419 to 0.084; relative to the prototype-only and multi-scale-only variants, its accuracy is lower by 0.205 and 0.232, respectively. APA requires $\sim$921.7 epochs to converge to a final loss of $\approx 0.0000$, indicating slow optimization, and its attention visualizations exhibit non-compact, task-agnostic patterns (all reported results come from the provided run logs). These findings suggest that the coupling of task-aware modulation with prototype guidance and multi-scale aggregation in the current APA design is ineffective in data-scarce regimes, and they offer a practical cautionary example for attention-mechanism design in few-shot learning.
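The abstract describes APA only at a high level. The following minimal PyTorch sketch illustrates how the three stated ingredients (task-aware modulation of the attention logits, a prototype-conditioned bias, and a local/global two-scale mix) could be combined in a single attention layer. It is not the authors' implementation: the module name AdaptivePrototypeAttention and all parameters such as task_proj, proto_proj, mix, and window are illustrative assumptions.

```python
# Minimal, illustrative sketch of an APA-style attention layer (not the
# authors' implementation; all names and hyperparameters are assumptions).
import torch
import torch.nn as nn


class AdaptivePrototypeAttention(nn.Module):
    """Single-head attention combining (i) task-aware modulation of the
    attention logits, (ii) a prototype-conditioned bias, and (iii) a
    local/global two-scale mix, as sketched from the abstract."""

    def __init__(self, d_model: int, window: int = 3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.task_proj = nn.Linear(d_model, d_model)   # task-context gate (assumed)
        self.proto_proj = nn.Linear(d_model, d_model)  # prototype conditioning (assumed)
        self.mix = nn.Parameter(torch.zeros(1))        # learnable local/global balance
        self.window = window
        self.scale = d_model ** -0.5

    def forward(self, x, task_ctx, prototypes):
        # x: (B, N, D) episode tokens; task_ctx: (B, D); prototypes: (B, C, D)
        # (i) task-aware modulation: gate the queries with the task context
        q = self.q(x) * torch.sigmoid(self.task_proj(task_ctx)).unsqueeze(1)
        k, v = self.k(x), self.v(x)
        logits = q @ k.transpose(-2, -1) * self.scale              # (B, N, N)

        # (ii) prototype-conditioned bias: favour keys that match some class prototype
        proto_score = (self.proto_proj(x) @ prototypes.transpose(-2, -1)).max(dim=-1).values
        logits = logits + proto_score.unsqueeze(1)                 # broadcast over queries

        # (iii) two scales: global softmax vs. softmax restricted to a local window
        n = x.size(1)
        idx = torch.arange(n, device=x.device)
        local_mask = (idx[None, :] - idx[:, None]).abs() <= self.window
        global_attn = logits.softmax(dim=-1)
        local_attn = logits.masked_fill(~local_mask, float("-inf")).softmax(dim=-1)
        w = torch.sigmoid(self.mix)                                # keep the mix convex
        return (w * local_attn + (1 - w) * global_attn) @ v


# Toy 5-way, 5-shot episode with 16-dimensional features (shapes only).
B, N, C, D = 1, 25, 5, 16
layer = AdaptivePrototypeAttention(D)
out = layer(torch.randn(B, N, D), torch.randn(B, D), torch.randn(B, C, D))
print(out.shape)  # torch.Size([1, 25, 16])
```

The toy call at the end only checks tensor shapes for a 5-way, 5-shot episode; it does not reproduce any of the reported numbers.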

Keywords:

Few-shot learning; Attention mechanism; Prototype learning; Multi-scale attention; Negative baseline model; Task-aware attention

Journal Info

ISSN: 3054-4386
Publisher: Panorama Scholarly Group

How to Cite

Zhang, Y., & Huang, P. (2026). SHIT: A Negative Adaptive Attention Model in Few Shot Learning Capability named APA. Silence, 1(1), 3-12. https://doi.org/10.5281/zenodo.18666933
