Related Work on Image-Text Matching
introduction
- What is image-text matching?
Computer vision is no longer content with simple image classification, i.e. assigning one or a few labels to an image; a growing number of researchers aim to match images with text and generate rich textual descriptions for images, so as to better understand image semantics.
- A large body of prior work already exists; how do those methods work?
- How does my work improve on prior work? What are its advantages and contributions?
- A generative network produces additional positive examples (positive facts), augmenting the training set;
- An efficient training algorithm alternately optimizes the parameters of the generative network and the discriminative network, yielding a strong discriminator;
- Results on several datasets that are comparable to, or better than, those of other methods.
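The alternating optimization in the second point can be illustrated with a toy sketch: a 1-D logistic "discriminator" is trained to separate genuine positives from generated samples, while the generator's single parameter is nudged toward proposals the discriminator accepts. The distributions, learning rates, and the score-weighted generator update below are illustrative assumptions, not the actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    """Logistic score: estimated probability that x is a true positive."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

w, b = 0.0, 0.0      # discriminator parameters
mu = -2.0            # generator parameter (mean of its proposals)
lr_d, lr_g = 0.1, 0.05

for step in range(500):
    # Discriminator step: push real positives toward 1, generated toward 0.
    real = rng.normal(2.0, 1.0, 32)   # genuine positives score around 2.0
    fake = rng.normal(mu, 1.0, 32)    # generator's current proposals
    for x, label in ((real, 1.0), (fake, 0.0)):
        grad = discriminator(x, w, b) - label   # d(log-loss)/d(logit)
        w -= lr_d * np.mean(grad * x)
        b -= lr_d * np.mean(grad)
    # Generator step: nudge proposals toward samples the discriminator
    # currently accepts (a REINFORCE-style, score-weighted mean shift).
    fake = rng.normal(mu, 1.0, 32)
    score = discriminator(fake, w, b)
    mu += lr_g * np.mean((score - score.mean()) * (fake - mu))
```

In the real method the two players are full networks over image-text pairs, but the schedule is the same: one or more discriminator updates, then one generator update, repeated until the discriminator is strong.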
related work
1. CCA
- Canonical Correlation Analysis (CCA)
- Kernel Canonical Correlation Analysis (KCCA)
- Deep CCA
- Sparse Kernel CCA
- Randomized CCA
- Nonparametric CCA
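All of the variants above extend the same linear core: find projections of the two views that maximize correlation. A minimal NumPy sketch (the regularizer and the whitening-plus-SVD route are standard, but function and variable names here are illustrative):

```python
import numpy as np

def linear_cca(X, Y, k=2, reg=1e-6):
    """Minimal linear CCA: k pairs of projection directions maximizing
    correlation between views X (n x dx) and Y (n x dy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # view-1 covariance
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])  # view-2 covariance
    Cxy = X.T @ Y / n                             # cross-covariance

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    # Whiten each view, then SVD of the whitened cross-covariance;
    # the singular values are the canonical correlations.
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]    # projection matrix for view X
    B = Wy @ Vt[:k].T    # projection matrix for view Y
    return A, B, s[:k]

# Example: two views (5-D and 4-D) sharing a 2-D latent signal.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 2))
X = np.hstack([z, rng.normal(size=(500, 3))])
Y = np.hstack([z + 0.1 * rng.normal(size=(500, 2)), rng.normal(size=(500, 2))])
A, B, corrs = linear_cca(X, Y, k=2)
```

KCCA replaces the inner products with kernel evaluations, and Deep CCA replaces the linear projections `A`, `B` with networks trained to maximize the same correlation objective.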
2. ranking-based methods
- Y. Verma and C. Jawahar, “Im2text and text2im: Associating images and texts for cross-modal retrieval,” in British Machine Vision Conference (BMVC), vol. 1, 2014, p. 2.
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, “Grounded compositional semantics for finding and describing images with sentences,” Transactions of the Association for Computational Linguistics, vol. 2, pp. 207–218, 2014.
- A. Karpathy, A. Joulin, and L. Fei-Fei, “Deep fragment embeddings for bidirectional image sentence mapping,” in Neural Information Processing Systems (NIPS), 2014, pp. 1889–1897.
- R. Kiros, R. Salakhutdinov, and R. S. Zemel, “Unifying visual-semantic embeddings with multimodal neural language models,” arXiv preprint arXiv:1411.2539, 2014.
- L. Wang, Y. Li, and S. Lazebnik, “Learning deep structure-preserving image-text embeddings,” in Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5005–5013.
- L. Wang, Y. Li, and S. Lazebnik, “Learning two-branch neural networks for image-text matching tasks,” arXiv preprint arXiv:1704.03470, 2017.
- Y. Huang, W. Wang, and L. Wang, “Instance-aware image and sentence matching with selective multimodal LSTM,” in Computer Vision and Pattern Recognition (CVPR), 2017.
- Y. Huang, Q. Wu, and L. Wang, “Learning semantic concepts and order for image and sentence matching,” in Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6163–6171.
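A common thread in these ranking-based methods is a bidirectional margin-based ranking loss over image-sentence similarity scores. A minimal NumPy sketch over a precomputed similarity matrix (the margin value and the sum-over-violations normalization are illustrative choices, not any one paper's exact loss):

```python
import numpy as np

def bidirectional_ranking_loss(S, margin=0.2):
    """Hinge ranking loss over a similarity matrix S (n x n), where
    S[i, j] scores image i against sentence j and the diagonal holds
    the matched pairs. Any non-matching pair that comes within
    `margin` of its matched score is penalized, in both directions."""
    n = S.shape[0]
    pos = np.diag(S)
    # Image -> sentence: each row compared against its diagonal entry.
    cost_i2s = np.maximum(0.0, margin + S - pos[:, None])
    # Sentence -> image: each column compared against its diagonal entry.
    cost_s2i = np.maximum(0.0, margin + S - pos[None, :])
    mask = 1.0 - np.eye(n)  # exclude the matched pairs themselves
    return ((cost_i2s + cost_s2i) * mask).sum() / n

# Example: a perfectly separated score matrix incurs zero loss.
S = np.eye(4)
loss = bidirectional_ranking_loss(S)  # -> 0.0
```

The listed papers differ mainly in how `S` is computed (fragment alignments, two-branch embeddings, selective LSTM pooling, semantic concepts), not in this ranking objective.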
3. Generative Adversarial Networks (GANs)
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.
- M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
- A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
- S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” arXiv preprint arXiv:1605.05396, 2016.
- J. Wang, L. Yu, W. Zhang, Y. Gong, Y. Xu, B. Wang, P. Zhang, and D. Zhang, “IRGAN: A minimax game for unifying generative and discriminative information retrieval models,” in ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2017, pp. 515–524.
- L. Cai and W. Y. Wang, “KBGAN: Adversarial learning for knowledge graph embeddings,” arXiv preprint arXiv:1711.04071, 2017.
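All of the papers above build on the minimax objective of Goodfellow et al. (2014):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The conditional, convolutional, text-to-image, retrieval, and knowledge-graph variants listed here change $G$, $D$, or the conditioning input while keeping this two-player game.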
experiment
- Dataset augmentation:
Each image is cropped at its four corners and at the center, and each of these five crops is additionally flipped, so one image expands to 10 crops of size $128 \times 128$.
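The five-crop-plus-flip augmentation above can be sketched as follows, assuming images are NumPy `H x W x C` arrays and that the flip is horizontal (the function name is illustrative):

```python
import numpy as np

def five_crop_flip(image, size=128):
    """Augment one image into 10 views: crops at the four corners and
    the center, plus a horizontal flip of each of the five crops."""
    h, w = image.shape[:2]
    # Top-left coordinates (row, col) of the five crops.
    offsets = [
        (0, 0),                               # top-left corner
        (0, w - size),                        # top-right corner
        (h - size, 0),                        # bottom-left corner
        (h - size, w - size),                 # bottom-right corner
        ((h - size) // 2, (w - size) // 2),   # center
    ]
    crops = [image[r:r + size, c:c + size] for r, c in offsets]
    flips = [crop[:, ::-1] for crop in crops]  # mirror left-right
    return crops + flips

# Example: one 160x200 RGB image becomes 10 crops of 128x128.
img = (np.arange(160 * 200 * 3) % 256).astype(np.uint8).reshape(160, 200, 3)
augmented = five_crop_flip(img)
```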