Evaluation of feature-embedding methods for word spotting in historical Arabic documents

Abir Fathallah; Mohamed Ibn Khedher; Mounim El Yacoubi; Najoua Essoukri Ben Amara

doi:10.1109/SSD49366.2020.9364134

Conference paper, Year: 2020

Abir Fathallah

Role: Author
PersonId: 1276322
IdHAL: abir-fathallah
ORCID: 0000-0003-0433-1029

Institut Supérieur d'informatique et des Techniques de Communication [Hammam Sousse]

Département Electronique et Physique

Institut Polytechnique de Paris

ARMEDIA

Mohamed Ibn Khedher

Role: Author
PersonId: 177487
IdHAL: mohamed-ibn-khedher

IRT SystemX

Mounim El Yacoubi

Role: Author
PersonId: 12279
IdHAL: ma-el-yacoubi
ORCID: 0000-0002-7383-0588
IdRef: 193493217

ARMEDIA

Najoua Essoukri Ben Amara

Role: Author
PersonId: 966987

Laboratory of Advanced Technology and Intelligent Systems

Abstract

Retrieving and indexing historical Arabic documents remains a significant challenge. This paper compares feature representation spaces for word spotting in historical Arabic documents. Embedding spaces are built with two families of machine learning methods: (i) linear methods, such as principal component analysis and linear discriminant analysis, and (ii) non-linear methods, including Siamese and triplet convolutional neural networks. Each word image is then represented by a dense vector, and feature representations are matched using the Euclidean distance. The resulting word embedding models are evaluated on the VML-HD dataset, and the experiments show that the non-linear methods are more effective than the linear ones.
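The matching pipeline described in the abstract (embed each word image as a dense vector, then compare vectors with the Euclidean distance) can be sketched for the linear PCA case as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the 8x8 image size, the number of components, and the synthetic data standing in for VML-HD word images are all assumptions made for the example.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def embed(X, mean, components):
    """Project flattened word images onto the principal directions."""
    return (X - mean) @ components.T

def rank_by_euclidean(query_vec, gallery_vecs):
    """Return gallery indices sorted from nearest to farthest, plus distances."""
    dists = np.linalg.norm(gallery_vecs - query_vec, axis=1)
    return np.argsort(dists), dists

# Synthetic stand-in for word images (assumption: 3 word classes,
# 10 samples each, 8x8 images flattened to 64 values).
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, 64))
labels = np.repeat(np.arange(3), 10)
gallery = prototypes[labels] + 0.05 * rng.normal(size=(30, 64))
query = prototypes[1] + 0.05 * rng.normal(size=64)  # a query of class 1

# Build the embedding space from the gallery, then embed gallery and query.
mean, components = fit_pca(gallery, n_components=2)
gallery_emb = embed(gallery, mean, components)
query_emb = embed(query[None, :], mean, components)[0]

# Word spotting: rank gallery words by Euclidean distance to the query.
order, dists = rank_by_euclidean(query_emb, gallery_emb)
top5_labels = labels[order[:5]]
print("labels of the 5 nearest gallery words:", top5_labels)
```

For the non-linear methods mentioned in the abstract, the `embed` step would presumably be replaced by the output of a trained Siamese or triplet network, while the Euclidean ranking step would stay the same.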

Keywords

Historical Arabic documents; Word spotting; Feature embedding

Domains

Artificial Intelligence [cs.AI]; Computer Science [cs]

Contributor: Mohamed Ibn Khedher

https://hal.science/hal-03094910

Submitted on: Monday, 4 January 2021, 14:40:54

Last modified on: Wednesday, 6 December 2023, 13:23:26

Dates and versions

hal-03094910, version 1 (04-01-2021)

Identifiers

HAL Id: hal-03094910, version 1
DOI: 10.1109/SSD49366.2020.9364134

Cite

Abir Fathallah, Mohamed Ibn Khedher, Mounim El Yacoubi, Najoua Essoukri Ben Amara. Evaluation of feature-embedding methods for word spotting in historical Arabic documents. SSD 2020: 17th International Multi-Conference on Systems, Signals and Devices, Jul 2020, Monastir (online), Tunisia. pp. 34-39, ⟨10.1109/SSD49366.2020.9364134⟩. ⟨hal-03094910⟩

Collections

INSTITUT-TELECOM TELECOM-SUDPARIS IRT-SYSTEMX IP_PARIS

