The First Odia NLP Datasets Hit GitHub

@openodia
datasets, history

What happened

In 2019, the first open-source Odia language datasets and NLP utilities began appearing on GitHub. These early efforts included basic text corpora, word lists, and simple transliteration tools: the building blocks for Odia in modern AI.

Why it matters

Before 2019, Odia was largely absent from the open-source NLP landscape. There were no publicly available datasets, no pre-trained models, and no tooling for developers. These first repositories changed that, showing that community-driven open-source development could work for a language spoken by 50+ million people.

The beginning of a movement

This wasn't just about code; it was about representation. Every major language needs digital infrastructure to thrive in the age of AI. Odia's journey started here, with a few repositories and a belief that the community could build what big tech wouldn't.
