

Poster in Affinity Workshop: Women in Machine Learning

Erased text retrieval from historical palimpsest manuscripts using deep autoregressive priors

Anna Starynska · David Messinger


Abstract:

Historical palimpsests are manuscripts that, at some point in time, were erased and overwritten with a newer text. There is currently significant interest in discovering and studying these erased texts, since they can contain previously unknown written works. The value of recovering such works can be compared to that of finding new archaeological sites, leading to new insights. In our work, we propose a Bayesian approach for reconstructing the erased text from a palimpsest manuscript using multispectral imaging and an autoregressive generative network. We formulate the task as a Blind Source Separation problem in which the erased ink, the foreground ink, and the parchment were mixed together by some unknown mixing process. An autoregressive network is used as a spatial prior for the undertext script, and multispectral imaging presents the signals in different modalities, which decreases ambiguity during the reconstruction. We assume that the script of the erased text can be identified from the unprocessed palimpsest, so that its counterpart can be found among other old but “clean” manuscripts and used to train the generative network that serves as the prior. The choice of an autoregressive network is motivated by the fact that it has a dynamic scope of view, which is more suitable for continuous signals, such as handwriting, than other generative models, such as GANs, diffusion models, or score networks. Since the optimization happens directly in pixel space rather than over hidden parameters, the usual deep learning algorithms, such as stochastic gradient descent, would get stuck in local minima before arriving at a meaningful solution. Therefore, for our problem, we apply annealed Langevin dynamics sampling, which has better convergence properties for non-convex problems.
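The sampling step described in the abstract can be illustrated with a minimal sketch of annealed Langevin dynamics on a toy problem. This is not the implementation used in this work, and every name and value in it is an assumption made for illustration: the autoregressive prior is replaced by a simple per-pixel two-mode Gaussian mixture (parchment near 0, ink near 1), the mixing coefficients `mix` of three hypothetical spectral bands are treated as known rather than blind, and the noise schedule `sigmas`, step size `eps`, and the smoothing of both prior and likelihood by the current noise level are arbitrary choices.

```python
import numpy as np

def prior_score(x, v):
    """Score of a per-pixel prior 0.5*N(0, v) + 0.5*N(1, v).

    Assumed stand-in for a learned prior: 0 = parchment, 1 = ink.
    """
    # Posterior weight of the "ink" component (a sigmoid written with tanh
    # so that it stays numerically stable for small v).
    r_ink = 0.5 * (1.0 + np.tanh((2.0 * x - 1.0) / (4.0 * v)))
    return (r_ink - x) / v

def likelihood_score(x, y, mix, v):
    """Gradient in x of sum_k log N(y_k; mix_k * x, v) over spectral bands k."""
    residual = y - mix[:, None] * x[None, :]
    return (mix[:, None] * residual).sum(axis=0) / v

def annealed_langevin(y, mix, sigmas, base_std, eps, steps, rng):
    """Sample pixel values x given observations y, from coarse to fine noise."""
    x = rng.uniform(0.0, 1.0, size=y.shape[1])  # random start in pixel space
    for sigma in sigmas:
        v = base_std**2 + sigma**2              # variance at this noise level
        alpha = eps * (sigma / sigmas[-1])**2   # step size shrinks with sigma
        for _ in range(steps):
            grad = prior_score(x, v) + likelihood_score(x, y, mix, v)
            noise = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * grad + np.sqrt(alpha) * noise
    return x

# Toy data: a 16-pixel binary "undertext" seen in three assumed spectral bands.
rng = np.random.default_rng(0)
x_true = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0])
mix = np.array([0.9, 0.7, 0.5])                 # hypothetical known mixing
noise_std = 0.05
y = mix[:, None] * x_true[None, :] + noise_std * rng.standard_normal((3, 16))

sigmas = np.geomspace(1.0, 0.01, 10)            # decreasing noise schedule
x_hat = annealed_langevin(y, mix, sigmas, base_std=noise_std,
                          eps=2e-5, steps=100, rng=rng)
recovered = (x_hat > 0.5).astype(int)           # threshold into ink / parchment
```

In this sketch the early, high-noise levels let the sample move freely across a smoothed landscape, and the later levels refine it near a mode. A learned prior and an unknown mixing process, as in the abstract, would make the problem substantially harder than this toy case.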
