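Sentences carry rich semantic information; annotating product images with sentences characterizes product attributes accurately and improves information retrieval accuracy. Existing sentence annotation methods for product images suffer from insufficient feature learning and a single form of feature representation. To address these problems, feature learning based on efficient match kernels (EMK) is proposed: shape kernel features with better discriminative power are extracted to describe product images, and shape, texture, gradient, and other image features are fused within a multiple kernel learning model into a multiple kernel feature (MKF), which enriches the feature representation and better captures the shape and texture properties of the images. Images are classified on the basis of MKF, and key text is then retrieved to annotate the product images. Experiments show that MKF achieves the best image classification accuracy, and that product images with distinctive texture or shape properties obtain better MAP (mean average precision) scores. BLEU (bilingual evaluation understudy) scores further indicate that the annotated sentences carry semantic information close to the content of the product images and are more coherent and readable, which gives the method high practical value.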
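As a reading aid for the MAP figure mentioned above, the following is a minimal sketch of how mean average precision is commonly computed from ranked retrieval results. It is not the evaluation code used in this work. The function names and the binary relevance encoding are illustrative assumptions, and the sketch assumes that every relevant item appears somewhere in the ranked list.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked list.

    relevance[i] is 1 if the item at rank i + 1 is relevant, else 0.
    Assumption: all relevant items appear in the list, so the sum is
    normalised by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)
```

For a query whose ranked results are relevant at ranks 1 and 3 only, the average precision under these assumptions is (1/1 + 2/3) / 2.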
To address overly simple image features and word noise interference in product image sentence annotation, a product image sentence annotation model focusing on image feature learning and key word summarization is described. Three kernel descriptors (gradient, shape, and color) are extracted. The multiple kernel learning model then performs late fusion of these features to obtain more discriminative image features. The absolute rank and relative rank of the tag-rank model are used to boost the weights of the key words. A new word integration algorithm, word sequence blocks building (WSBB), is designed to create N-gram word sequences. Sentences are generated from the N-gram word sequences and predefined templates. Experimental results show that both the BLEU-1 and BLEU-2 scores of the generated sentences are superior to those of the state-of-the-art baselines.
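The BLEU-1 and BLEU-2 scores reported above can be illustrated with a minimal sketch of the standard BLEU definition: clipped n-gram precision combined with a brevity penalty. This is not the scoring script used in the experiments. The function names are illustrative, and the sketch assumes that BLEU-2 denotes the cumulative score, that is, the geometric mean of the unigram and bigram precisions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference sentences."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalise candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Reference length closest to the candidate length (shorter one on ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1.0 - r / c)


def bleu(candidate, references, max_n):
    """Cumulative BLEU with uniform weights over n = 1..max_n (assumed form)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)
```

With the hypothetical candidate `"a red leather bag".split()` and the single reference `"a red leather handbag".split()`, three of four unigrams and two of three bigrams match, so `bleu(..., 1)` is 0.75 and `bleu(..., 2)` is the square root of 0.5.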