Related Work on Image-Text Matching

Introduction

  1. What is image-text matching?
    Computer vision research is no longer satisfied with simple image classification or with assigning one or a few labels to an image. Increasingly, researchers aim to match images with text and generate rich textual descriptions for images, so as to better understand image semantics.
  2. A large body of prior work exists — how do these methods work?
  3. How does my work improve on prior work? What are its advantages and contributions?
  • Use a generative network to produce additional positive examples (positive facts), thereby enlarging the training set;
  • Design an efficient training algorithm that alternately optimizes the parameters of the generative network and the discriminative network, yielding a strong discriminator;
  • Achieve results comparable or superior to other methods on multiple datasets.
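The alternating generator-discriminator optimization in the second point can be sketched on a toy bilinear matching model. Everything here is an illustrative assumption, not the actual algorithm: the bilinear score, the pool of candidate texts, the hinge loss for the discriminator, and the REINFORCE-style generator update (reward = discriminator score, as in IRGAN-style training).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def sample_pair():
    """Toy matched (image, text) feature pair sharing a latent code."""
    z = rng.normal(size=dim)
    return z + 0.1 * rng.normal(size=dim), z + 0.1 * rng.normal(size=dim)

def score(M, v, t):
    """Bilinear matching score s(v, t) = v^T M t."""
    return float(v @ M @ t)

def train(steps=200, lr=0.05, margin=0.5, k_disc=5, pool_size=32):
    W = rng.normal(scale=0.1, size=(dim, dim))  # discriminator parameters
    G = rng.normal(scale=0.1, size=(dim, dim))  # generator parameters
    pool = [sample_pair()[1] for _ in range(pool_size)]  # candidate texts

    for _ in range(steps):
        # --- k discriminator updates: hinge loss on generator-proposed negatives
        for _ in range(k_disc):
            v, t_pos = sample_pair()
            g_scores = np.array([score(G, v, t) for t in pool])
            t_neg = pool[int(np.argmax(g_scores))]  # generator's hardest negative
            if margin - score(W, v, t_pos) + score(W, v, t_neg) > 0:
                # hinge gradient: raise the positive score, lower the negative
                W += lr * (np.outer(v, t_pos) - np.outer(v, t_neg))

        # --- one generator update (REINFORCE): reward = discriminator score,
        # so the generator learns to propose negatives that fool the discriminator
        v, _ = sample_pair()
        g_scores = np.array([score(G, v, t) for t in pool])
        p = np.exp(g_scores - g_scores.max())
        p /= p.sum()
        i = int(rng.choice(len(pool), p=p))
        reward = score(W, v, pool[i])
        # gradient of log-softmax probability of the sampled negative w.r.t. G
        grad_logp = np.outer(v, pool[i]) - sum(
            pj * np.outer(v, tj) for pj, tj in zip(p, pool))
        G += lr * reward * grad_logp
    return W, G
```

The `k_disc` inner loop is the usual "several discriminator steps per generator step" schedule; in practice both players would be deep networks trained with a framework's autograd rather than these hand-written gradients.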

1. CCA

  • Canonical Correlation Analysis (CCA)
  • Kernel Canonical Correlation Analysis (KCCA)
  • deep CCA
  • Sparse Kernel CCA
  • Randomized CCA
  • Nonparametric CCA
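Plain linear CCA, the starting point for all the variants above, finds projections of two views that are maximally correlated. A minimal NumPy sketch using the standard whitening-plus-SVD formulation (variable names and the toy two-view data are illustrative):

```python
import numpy as np

def cca(X, Y, reg=1e-6):
    """Linear CCA: canonical correlations are the singular values of
    Cxx^{-1/2} Cxy Cyy^{-1/2}. Returns projected views and correlations."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx, Wy = Kx @ U, Ky @ Vt.T          # projection matrices for each view
    return X @ Wx, Y @ Wy, s            # canonical variates + correlations

# Toy data: an "image" view and a "text" view sharing a 2-D latent code.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
Xc, Yc, corrs = cca(X, Y)  # leading correlations are close to 1
```

KCCA replaces the covariances with kernel matrices, and deep CCA replaces the linear projections with neural networks trained to maximize the same correlation objective.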

2. Ranking-based methods

  • Y. Verma and C. Jawahar, “Im2text and text2im: Associating images and texts for cross-modal retrieval,” in British Machine Vision Conference (BMVC), vol. 1, 2014, p. 2.
  • R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, “Grounded compositional semantics for finding and describing images with sentences,” Transactions of the Association for Computational Linguistics, vol. 2, pp. 207–218, 2014.
  • A. Karpathy, A. Joulin, and F. F. F. Li, “Deep fragment embeddings for bidirectional image sentence mapping,” in Neural Information Processing Systems (NIPS), 2014, pp. 1889–1897.
  • R. Kiros, R. Salakhutdinov, and R. S. Zemel, “Unifying visual-semantic embeddings with multimodal neural language models,” arXiv preprint arXiv:1411.2539, 2014.
  • L. Wang, Y. Li, and S. Lazebnik, “Learning deep structure-preserving image-text embeddings,” in Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5005–5013.
  • L. Wang, Y. Li, and S. Lazebnik, “Learning two-branch neural networks for image-text matching tasks,” arXiv preprint arXiv:1704.03470, 2017.
  • Y. Huang, W. Wang, and L. Wang, “Instance-aware image and sentence matching with selective multimodal LSTM,” in Computer Vision and Pattern Recognition (CVPR), 2017.
  • Y. Huang, Q. Wu, and L. Wang, “Learning semantic concepts and order for image and sentence matching,” in Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6163–6171.
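Most of these ranking-based methods train embeddings with a bidirectional margin (triplet) ranking loss over a batch of matched image-sentence pairs: the matched pair should score higher than any mismatched pair in both retrieval directions. A minimal NumPy sketch (function name and margin value are illustrative):

```python
import numpy as np

def bidirectional_ranking_loss(img, txt, margin=0.2):
    """Sum of hinge losses over both retrieval directions.

    img, txt: embeddings of N matched pairs, shape (N, d), assumed
    L2-normalized so that dot products are cosine similarities.
    The diagonal of the similarity matrix holds the positive-pair scores.
    """
    S = img @ txt.T                     # S[i, j] = sim(image_i, text_j)
    pos = np.diag(S)                    # positive scores s(i_k, t_k)
    # image -> text: each text t_j (j != k) is a negative for image i_k
    cost_i2t = np.maximum(0, margin - pos[:, None] + S)
    # text -> image: each image i_j (j != k) is a negative for text t_k
    cost_t2i = np.maximum(0, margin - pos[None, :] + S)
    np.fill_diagonal(cost_i2t, 0)       # exclude the positive pairs
    np.fill_diagonal(cost_t2i, 0)
    return cost_i2t.sum() + cost_t2i.sum()

# Perfectly matched orthonormal embeddings: every hinge term is inactive.
loss = bidirectional_ranking_loss(np.eye(4), np.eye(4))  # → 0.0
```

Variants in the papers above differ mainly in the encoders producing `img` and `txt` and in whether they sum over all negatives or take only the hardest one per anchor.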

3. Generative Adversarial Networks (GANs)

  • @inproceedings{goodfellow2014generative,
    title={Generative adversarial nets},
    author={Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua},
    booktitle={Advances in neural information processing systems},
    pages={2672--2680},
    year={2014}
    }
  • @article{mirza2014conditional,
    title={Conditional generative adversarial nets},
    author={Mirza, Mehdi and Osindero, Simon},
    journal={arXiv preprint arXiv:1411.1784},
    year={2014}
    }
  • @article{radford2015unsupervised,
    title={Unsupervised representation learning with deep convolutional generative adversarial networks},
    author={Radford, Alec and Metz, Luke and Chintala, Soumith},
    journal={arXiv preprint arXiv:1511.06434},
    year={2015}
    }
  • @article{reed2016generative,
    title={Generative adversarial text to image synthesis},
    author={Reed, Scott and Akata, Zeynep and Yan, Xinchen and Logeswaran, Lajanugen and Schiele, Bernt and Lee, Honglak},
    journal={arXiv preprint arXiv:1605.05396},
    year={2016}
    }
  • @inproceedings{wang2017irgan,
    title={{IRGAN}: A minimax game for unifying generative and discriminative information retrieval models},
    author={Wang, Jun and Yu, Lantao and Zhang, Weinan and Gong, Yu and Xu, Yinghui and Wang, Benyou and Zhang, Peng and Zhang, Dell},
    booktitle={Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval},
    pages={515--524},
    year={2017},
    organization={ACM}
    }
  • @article{cai2017kbgan,
    title={{KBGAN}: Adversarial learning for knowledge graph embeddings},
    author={Cai, Liwei and Wang, William Yang},
    journal={arXiv preprint arXiv:1711.04071},
    year={2017}
    }

Experiment

  1. Dataset augmentation:
    Each image is cropped at the four corners and the center, and each of these five crops is also horizontally flipped, so one image is expanded into ten crops of size $128 \times 128$.
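This five-crops-plus-flips scheme can be sketched as follows (function name is illustrative; the image is assumed to be an H × W × C array with both sides at least the crop size):

```python
import numpy as np

def ten_crop(img, size=128):
    """Five crops (4 corners + center) plus their horizontal flips.

    img: H x W x C array with H, W >= size.
    Returns an array of shape (10, size, size, C).
    """
    h, w = img.shape[:2]
    cy, cx = (h - size) // 2, (w - size) // 2   # top-left of the center crop
    offsets = [(0, 0), (0, w - size),            # top-left, top-right
               (h - size, 0), (h - size, w - size),  # bottom-left, bottom-right
               (cy, cx)]                         # center
    crops = [img[y:y + size, x:x + size] for y, x in offsets]
    crops += [c[:, ::-1] for c in crops]         # horizontal flips of all five
    return np.stack(crops)

batch = ten_crop(np.zeros((160, 200, 3)))        # shape (10, 128, 128, 3)
```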