ISSN: XXXX-XXXX

Enhancing Transformer-based Object Detection with Novel Encoders and Matching Strategies

Abstract

This paper aims to improve transformer-based object detectors by addressing three issues: the fusion of large-scale features, redundant tokens, and a scale bias toward large objects. It proposes three components: similarity-based deduplication encoding to remove redundant tokens, Hybrid Multi-object encoding for robust cross-size attention, and One-to-many Positive matching for stable generation. Detection accuracy, training convergence speed, and other performance metrics were evaluated quantitatively on benchmark datasets such as COCO and VOC2007. The results show significant improvements in accuracy, training efficiency, and overall performance, with training time reduced by 66% while detection accuracy is maintained or even improved. These contributions offer a balanced approach to optimizing Transformer-based object detection and form a basis for further advances in object detection technologies.


How to Cite

Leszek Ziora (2025-02-21). Enhancing Transformer-based Object Detection with Novel Encoders and Matching Strategies. Abhi International Journal of Artificial Intelligence Applications in Engineering, Volume zZUTWSDuBR78pc6zbKip, Issue 1.