REDIT: A Tool and Dataset for Extraction of Personal Data in Documents of the Public Administration Domain
Teresa Paccosi; Alessio Palmero
2022-01-01
Abstract
New transparency regulations and recent privacy policies require public administrations (PAs) to make their documents available while limiting the dissemination of personal data. This work presents a first approach to extracting sensitive data from PA documents, in the form of named entities and the semantic relations among them. The approach speeds up the identification of personal data so that the items that need to be hidden can be selected easily. We also describe how the dataset was collected and annotated.

Files in this item:
File | Size | Format
---|---|---
paper58.pdf (open access; license: publisher's copyright) | 426.58 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.