Related Work on Image-Text Matching
introduction
- What is image-text matching?
Computer vision is no longer content with simple image classification, i.e. assigning one or a few labels to an image; a growing number of researchers aim to match images with text and generate rich textual descriptions for images, so as to better understand image semantics.
- A large body of prior work already exists; how do those methods work?
- How does my work improve on prior work? What are its advantages and contributions?
- A generative network produces additional positive examples (positive facts), augmenting the training set;
- An efficient training algorithm alternately optimizes the parameters of the generative network and the discriminative network, yielding a strong discriminator;
- Results on several datasets that are comparable to, or better than, those of other methods.
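The alternating optimization in the second point can be illustrated with a toy sketch: a 1-D logistic "discriminator" is trained to separate genuine positives from generated samples, while the generator's single parameter is nudged toward proposals the discriminator accepts. The distributions, learning rates, and the score-weighted generator update below are illustrative assumptions, not the actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    """Logistic score: estimated probability that x is a true positive."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

w, b = 0.0, 0.0      # discriminator parameters
mu = -2.0            # generator parameter (mean of its proposals)
lr_d, lr_g = 0.1, 0.05

for step in range(500):
    # Discriminator step: push real positives toward 1, generated toward 0.
    real = rng.normal(2.0, 1.0, 32)   # genuine positives score around 2.0
    fake = rng.normal(mu, 1.0, 32)    # generator's current proposals
    for x, label in ((real, 1.0), (fake, 0.0)):
        grad = discriminator(x, w, b) - label   # d(log-loss)/d(logit)
        w -= lr_d * np.mean(grad * x)
        b -= lr_d * np.mean(grad)
    # Generator step: nudge proposals toward samples the discriminator
    # currently accepts (a REINFORCE-style, score-weighted mean shift).
    fake = rng.normal(mu, 1.0, 32)
    score = discriminator(fake, w, b)
    mu += lr_g * np.mean((score - score.mean()) * (fake - mu))
```

In the real method the two players are full networks over image-text pairs, but the schedule is the same: one or more discriminator updates, then one generator update, repeated until the discriminator is strong.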
related work
1. CCA
- Canonical Correlation Analysis (CCA)
- Kernel Canonical Correlation Analysis (KCCA)
- Deep CCA
- Sparse Kernel CCA
- Randomized CCA
- Nonparametric CCA
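All of the variants above extend the same linear core: find projections of the two views that maximize correlation. A minimal NumPy sketch (the regularizer and the whitening-plus-SVD route are standard, but function and variable names here are illustrative):

```python
import numpy as np

def linear_cca(X, Y, k=2, reg=1e-6):
    """Minimal linear CCA: k pairs of projection directions maximizing
    correlation between views X (n x dx) and Y (n x dy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # view-1 covariance
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])  # view-2 covariance
    Cxy = X.T @ Y / n                             # cross-covariance

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    # Whiten each view, then SVD of the whitened cross-covariance;
    # the singular values are the canonical correlations.
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]    # projection matrix for view X
    B = Wy @ Vt[:k].T    # projection matrix for view Y
    return A, B, s[:k]

# Example: two views (5-D and 4-D) sharing a 2-D latent signal.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 2))
X = np.hstack([z, rng.normal(size=(500, 3))])
Y = np.hstack([z + 0.1 * rng.normal(size=(500, 2)), rng.normal(size=(500, 2))])
A, B, corrs = linear_cca(X, Y, k=2)
```

KCCA replaces the inner products with kernel evaluations, and Deep CCA replaces the linear projections `A`, `B` with networks trained to maximize the same correlation objective.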
2. ranking-based methods
- Y. Verma and C. Jawahar, “Im2text and text2im: Associating images and texts for cross-modal retrieval,” in British Machine Vision Conference (BMVC), vol. 1, 2014, p. 2.
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, “Grounded compositional semantics for finding and describing images with sentences,” Transactions of the Association for Computational Linguistics, vol. 2, pp. 207–218, 2014.
- A. Karpathy, A. Joulin, and L. Fei-Fei, “Deep fragment embeddings for bidirectional image sentence mapping,” in Neural Information Processing Systems (NIPS), 2014, pp. 1889–1897.
- R. Kiros, R. Salakhutdinov, and R. S. Zemel, “Unifying visual-semantic embeddings with multimodal neural language models,” arXiv preprint arXiv:1411.2539, 2014.
- L. Wang, Y. Li, and S. Lazebnik, “Learning deep structure-preserving image-text embeddings,” in Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5005–5013.
- L. Wang, Y. Li, and S. Lazebnik, “Learning two-branch neural networks for image-text matching tasks,” arXiv preprint arXiv:1704.03470, 2017.
- Y. Huang, W. Wang, and L. Wang, “Instance-aware image and sentence matching with selective multimodal LSTM,” in Computer Vision and Pattern Recognition (CVPR), 2017.
- Y. Huang, Q. Wu, and L. Wang, “Learning semantic concepts and order for image and sentence matching,” in Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6163–6171.
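A common thread in these ranking-based methods is a bidirectional margin-based ranking loss over image-sentence similarity scores. A minimal NumPy sketch over a precomputed similarity matrix (the margin value and the sum-over-violations normalization are illustrative choices, not any one paper's exact loss):

```python
import numpy as np

def bidirectional_ranking_loss(S, margin=0.2):
    """Hinge ranking loss over a similarity matrix S (n x n), where
    S[i, j] scores image i against sentence j and the diagonal holds
    the matched pairs. Any non-matching pair that comes within
    `margin` of its matched score is penalized, in both directions."""
    n = S.shape[0]
    pos = np.diag(S)
    # Image -> sentence: each row compared against its diagonal entry.
    cost_i2s = np.maximum(0.0, margin + S - pos[:, None])
    # Sentence -> image: each column compared against its diagonal entry.
    cost_s2i = np.maximum(0.0, margin + S - pos[None, :])
    mask = 1.0 - np.eye(n)  # exclude the matched pairs themselves
    return ((cost_i2s + cost_s2i) * mask).sum() / n

# Example: a perfectly separated score matrix incurs zero loss.
S = np.eye(4)
loss = bidirectional_ranking_loss(S)  # -> 0.0
```

The listed papers differ mainly in how `S` is computed (fragment alignments, two-branch embeddings, selective LSTM pooling, semantic concepts), not in this ranking objective.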
3. Generative Adversarial Networks (GANs)
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.
- M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
- A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
- S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” arXiv preprint arXiv:1605.05396, 2016.
- J. Wang, L. Yu, W. Zhang, Y. Gong, Y. Xu, B. Wang, P. Zhang, and D. Zhang, “IRGAN: A minimax game for unifying generative and discriminative information retrieval models,” in ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2017, pp. 515–524.
- L. Cai and W. Y. Wang, “KBGAN: Adversarial learning for knowledge graph embeddings,” arXiv preprint arXiv:1711.04071, 2017.
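All of the papers above build on the minimax objective of Goodfellow et al. (2014):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The conditional, convolutional, text-to-image, retrieval, and knowledge-graph variants listed here change $G$, $D$, or the conditioning input while keeping this two-player game.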
experiment
- Dataset augmentation:
Each image is cropped at its four corners and at the center, and each of these five crops is additionally flipped, so one image expands to 10 crops of size $128 \times 128$.
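The five-crop-plus-flip augmentation above can be sketched as follows, assuming images are NumPy `H x W x C` arrays and that the flip is horizontal (the function name is illustrative):

```python
import numpy as np

def five_crop_flip(image, size=128):
    """Augment one image into 10 views: crops at the four corners and
    the center, plus a horizontal flip of each of the five crops."""
    h, w = image.shape[:2]
    # Top-left coordinates (row, col) of the five crops.
    offsets = [
        (0, 0),                               # top-left corner
        (0, w - size),                        # top-right corner
        (h - size, 0),                        # bottom-left corner
        (h - size, w - size),                 # bottom-right corner
        ((h - size) // 2, (w - size) // 2),   # center
    ]
    crops = [image[r:r + size, c:c + size] for r, c in offsets]
    flips = [crop[:, ::-1] for crop in crops]  # mirror left-right
    return crops + flips

# Example: one 160x200 RGB image becomes 10 crops of 128x128.
img = (np.arange(160 * 200 * 3) % 256).astype(np.uint8).reshape(160, 200, 3)
augmented = five_crop_flip(img)
```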