How science is deciphering ancient languages with AI.
Science is deciphering ancient languages with AI, and what seemed like a science fiction dream has already begun to happen quietly, inside laboratories and digital archives.
Pieces of burnt clay, corroded bronze plates, fragments of papyrus barely able to hold together: for centuries these objects have preserved voices that no one could properly hear.
Suddenly, algorithms are able to suggest what's missing, connect what's scattered, and sometimes reveal meanings that have gone unnoticed by generations of experts.
It's not magic.
It is the result of a brutal convergence: tons of digitized data, neural models that have learned to "read" historical context, and, most importantly, researchers willing to share the stage with the machine.
Summary
- Why exactly now?
- How can these machines "understand" something that even humans didn't fully understand?
- Which writings are under scrutiny — and which ones still resist?
- Two cases that show what has truly changed.
- What did we gain, what did we lose, and how does this affect our understanding of the past?
- Questions people ask most often (in table)
Why exactly now?

It wasn't a big, isolated technological "eureka" moment. It was a perfect storm of conditions.
First, the collections. Museums and universities have spent the last two decades scanning, photographing in high resolution, and indexing everything they could.
There are tens of thousands of Greek, Latin, cuneiform, and hieroglyphic inscriptions—material that previously remained locked away in drawers or in publications with negligible print runs.
Second, the architecture of the models. Tools like Aeneas (DeepMind, 2025) no longer treat text as a sequence of characters.
They construct vector representations that blend handwriting, dating, place of discovery, formulaic style, and even stylistic traits of the scribe.
It's as if the machine has developed a kind of "historical intuition".
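To make the idea of vector representations concrete, here is a minimal sketch of how similar texts can be found once each one has been turned into a list of numbers. It is not how Aeneas itself works: the three-dimensional vectors, the document names, and the `nearest_parallels` helper are all invented for illustration, and a real system would learn far larger vectors from the data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_parallels(query, corpus, k=2):
    """Return the k corpus entries whose vectors are most similar to the query."""
    scored = [(cosine(query, vec), name) for name, vec in corpus.items()]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:k]]

# Hypothetical embeddings: the numbers are made up, not produced by any model.
corpus = {
    "diploma_A": [0.9, 0.1, 0.2],
    "epitaph_B": [0.1, 0.9, 0.3],
    "diploma_C": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.15]  # vector for a newly found fragment (also invented)

print(nearest_parallels(query, corpus))
```

With these toy numbers, the two diplomas come out as the closest matches and the epitaph is left behind, which is the kind of "this looks like that" judgment the paragraph above describes.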
Third — and perhaps most decisive — is the change in attitude among researchers.
Many have stopped seeing AI as a threat and have begun to treat it as a tireless (and somewhat obsessive) collaborator.
The cycle is self-reinforcing: more restored texts → better training → more restored texts.
And in the midst of all this, science deciphering ancient languages with AI stops being a futuristic headline and becomes routine news.
How do these machines manage to "understand"?
They don't understand. They recognize patterns on a superhuman scale.
The process begins with extensive training on already deciphered corpora.
The network learns not only words, but administrative rituals, legal formulas, regional dialectal variations, even systematic errors made by hasty scribes.
When a new fragment with gaps arrives, the model generates dozens of hypotheses ordered by probability — and visually highlights which parts of the context most influenced each suggestion.
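The ranking of hypotheses can be illustrated with a deliberately tiny stand-in for a neural model. The sketch below assumes a character bigram model with add-one smoothing, trained on three invented "known" phrases; the function names, the `?` gap marker, and the candidate list are hypothetical, and real systems use far richer context than pairs of letters.

```python
import math
from collections import Counter

def train_bigrams(texts):
    """Count character bigrams and their first characters in known texts."""
    pairs, singles = Counter(), Counter()
    for text in texts:
        padded = f"^{text}$"  # ^ and $ mark the start and end of a text
        singles.update(padded[:-1])
        pairs.update(zip(padded, padded[1:]))
    return pairs, singles

def log_prob(text, pairs, singles, vocab_size):
    """Add-one smoothed bigram log-probability of a complete string."""
    padded = f"^{text}$"
    return sum(
        math.log((pairs[(a, b)] + 1) / (singles[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

def rank_restorations(template, candidates, texts):
    """Fill the '?' gap with each candidate and rank by normalized probability."""
    pairs, singles = train_bigrams(texts)
    vocab_size = len({ch for t in texts for ch in f"^{t}$"})
    scores = {
        c: log_prob(template.replace("?", c), pairs, singles, vocab_size)
        for c in candidates
    }
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return sorted(((c, w / total) for c, w in weights.items()), key=lambda x: -x[1])

# Invented training phrases and a damaged word with a two-letter gap.
known = ["imperator caesar", "caesar augustus", "imperator augustus"]
ranked = rank_restorations("imp?ator", ["er", "ar", "xq"], known)
for candidate, prob in ranked:
    print(candidate, round(prob, 3))
```

The candidate that matches patterns seen in the known texts rises to the top, and the probabilities give the expert a sense of how far ahead it is rather than a single take-it-or-leave-it answer.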
Aeneas, for example, can cross-reference a Roman military diploma found in Sardinia with 1,800 other similar diplomas in seconds, pointing out parallels that a researcher would take months to find in their memory or in filing cabinets.
But the key to success lies in the interface: the expert remains in control.
They see the network's attention map, disagree, correct it, and explain why. The machine learns from the disagreement.
Think of it like an archaeologist who, after forty years of digging, finally gained an assistant who has read every excavation report on the planet and never sleeps.
The assistant makes serious mistakes regarding cultural nuances, but gets patterns right that escape the human eye.
Science is deciphering ancient languages with AI because we discovered that the combination of the two perspectives, the slow and profound human gaze plus the fast and exhaustive machine gaze, is more powerful than either one in isolation.
Which writings are under scrutiny — and which ones still resist?
Those with more "fuel" for training advance faster.
Classical Greek and Latin dominate because we have both volume and variety: thousands of public inscriptions, contracts, epitaphs, coins, military diplomas.
The Ithaca project (which later evolved into Aeneas) already achieves impressive accuracy in restoring short gaps.
Egyptian—hieroglyphs, hieratic, demotic—also took off.
Recent systems combine computer vision with linguistic modeling and are able to handle ostraca (pottery shards) that change style depending on the region and the scribe.
Result: administrative differences between Alexandria and Thebes that previously went unnoticed now become apparent within minutes.
Hittite, Sumerian, and Akkadian cuneiform advanced more slowly, but accelerated with databases such as the Thesaurus Linguarum Hethaeorum Digitalis.
Linear A, Rongorongo, the script of the Indus Valley Civilization... those remain largely undeciphered. Very few texts exist, there are no bilingual keys, and the cultural context is highly fragmented.
AI can detect statistical patterns, but without a semantic anchor, it falters.
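What "statistical patterns without a semantic anchor" means can be shown in a few lines. The sketch below assumes an undeciphered corpus reduced to sequences of arbitrary sign labels (`S1`, `S2`, and so on, all invented here); it measures how unevenly signs are used and which sign tends to end an inscription, without saying anything about what any sign means.

```python
import math
from collections import Counter

def sign_entropy(sequences):
    """Shannon entropy, in bits per sign, of the overall sign frequencies."""
    counts = Counter(sign for seq in sequences for sign in seq)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def common_final_signs(sequences, k=1):
    """The k signs that most often appear in final position."""
    counts = Counter(seq[-1] for seq in sequences if seq)
    return counts.most_common(k)

# Hypothetical inscriptions: labels stand in for signs nobody can read.
inscriptions = [
    ["S1", "S2", "S3"],
    ["S4", "S2", "S3"],
    ["S1", "S5", "S3"],
]

print(round(sign_entropy(inscriptions), 3))
print(common_final_signs(inscriptions))
```

A sign that always closes a text might be a grammatical ending, a title, or a unit of measure. The numbers alone cannot tell which, and that is exactly where a bilingual text or known context would be needed.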
And perhaps it's better this way: some mysteries still need to wait for people, not GPUs.
Two cases that show how much has already changed.
1. Bronze military diploma, Sardinia, 113–114 AD.
Fragmented, with partial text.
Traditionally, someone would spend weeks cross-referencing other known diplomas to try to understand who had been granted citizenship.
In this case, Aeneas found parallels with Trajan's documents in Germania and Pannonia, suggested coherent restorations, and revealed that the beneficiary belonged to an auxiliary unit that had fought in Dacia.
In short, what was just a list of names became a window into Roman integration policy in the provinces. A detail that changes the narrative.
2. Ptolemaic Ostraca (3rd–1st century BC)
A collection of shards containing hieroglyphs and demotic script, many of them damaged.
The system presented at SIGGRAPH 2025 processed everything together: it transliterated, translated, and mapped regional variations in the signs.
Thus, it was discovered that certain administrative formulas used in Thebes did not appear in Alexandria—an indication of greater decentralization than previously imagined in the Ptolemaic bureaucracy.
Today, students use the same tool to practice reading, something that previously required years of practice with printed facsimiles.
What did we gain, what did we lose, and how does this affect our understanding of the past?
We gained speed and scale. Texts that would have required an entire career can now be explored by small teams in just a few months.
We gained access: open tools allow researchers from Brazil, Turkey, and Egypt to work with the same material that previously required traveling to London or Berlin.
We lost some of the romanticism of solitary deciphering: that moment when the philologist, after ten years, finally understands the meaning of a sign.
And there is a real risk of bias: languages with more preserved texts dominate training, while marginalized languages become even more invisible.
But perhaps the greatest effect is epistemological. When the machine suggests an interpretation and the human validates (or refutes) it, we are redefining what counts as "evidence" in history.
The past ceases to be a museum of certainties and becomes a continuous dialogue between fragments, probabilities, and interpretation.
And that, deep down, is unsettling: it means that some of our most established narratives about Greece, Rome, and Egypt may be about to gain large asterisks.
Are we really prepared for a past that responds faster than we imagined?
Science is deciphering ancient languages with AI: Direct comparison
| Aspect | Before (humans only) | Now (humans + AI) | Real difference felt |
|---|---|---|---|
| Average time per fragment | Weeks to months | Minutes to a few hours | Brutal economies of scale |
| Accuracy in short gaps | ~20–30% | 62–78% (depending on the corpus) | More viable hypotheses |
| Ability to cross-reference texts | Limited to memory and bibliography | Thousands of documents in seconds | Context explodes |
| Access for researchers | Concentrated in large centers | Digital and free in most cases | Concrete democratization |
| Main focus of the specialist | Transcription and paleography | Historical and cultural interpretation | More space for thought |
Questions people ask most often
| Question | Direct answer |
|---|---|
| Has AI ever deciphered a completely unknown script? | No. You need a reasonable amount of text plus context or a known parallel. |
| Will it replace Egyptologists and Assyriologists? | No. It does the manual labor and suggests; the deeper judgment remains human. |
| Which writing system is expected to see the greatest advancement in 2026–2027? | Greek and Latin still lead; Egyptian and Hittite come next. |
| Are the results reliable enough for publication? | Only after rigorous human review. Accuracy rates vary widely depending on the case. |
| Where can I try out these tools? | DeepMind platforms (Aeneas/Ithaca), TLHdig, and SIGGRAPH 2025 repositories. |
Science deciphering ancient languages with AI is not just about recovering lost words.
It is restoring agency to voices that time has tried to erase and, incidentally, forcing us to rethink how much of the past we truly control.