Dr Atul Kumar Ojha

PhD, M.Phil, M.A., B.A.

 
researcher
 

Biography

I am currently a Postdoctoral Researcher on the European Lexical Infrastructure (ELEXIS) project (European Union's Horizon 2020 programme, grant agreement 731015), led by Dr. John P. McCrae, in the Unit for Linguistic Data, Data Science Institute, National University of Ireland, Galway. Since 5 October 2020, I have also been working as an adjunct lecturer at the College of Science and Engineering, NUI Galway, Ireland.
Before joining NUIG, I worked as a Vědecký Pracovník (Postdoc Researcher) at UFAL, Charles University, Prague, from July 2019 to June 2020.
In 2017, I co-founded a startup named Panlingua Language Services LLP situated in New Delhi, India.
I received my PhD in March 2019 from Jawaharlal Nehru University, India, under the supervision of Prof. Girish Nath Jha. My PhD thesis was titled English-Bhojpuri SMT System: Insights from the Kāraka Model. I am actively working on the creation of linguistic resources for low-resource languages. During my PhD, I worked on various projects as Sr. NLP Research Engineer, Sr. Linguist cum Project Manager, and Linguist on the Machine Translation Evaluation Platform and the Indian Languages Corpora Initiative (ILCI) at Jawaharlal Nehru University, New Delhi. I have organised several workshops, conferences, and shared tasks as co-chair, organiser, or manager.
Github: shashwatup9k
Google Scholar: Atul Kr. Ojha
ORCID: 0000-0002-9800-9833
Linkedin: Atul Kr. Ojha
Email: atulkumar.ojha@nuigalway.ie
            atulkumar.ojha@insight-centre.org

Research Interests

  • Statistical and neural machine translation (especially for low-resource languages)
  • Corpus Mining
  • Lexical Induction
  • Dependency Parsing
  • Hate and Aggressive Speech
  • NLP and Machine Learning

Peer Reviewed Journals

  Year Publication
(2021) 'Aggressive and Offensive Language Identification in Hindi, Bangla, and English: A Comparative Study'
Kumar, Ritesh; Lahiri, Bornini; and Ojha, Atul Kr. (2021) 'Aggressive and Offensive Language Identification in Hindi, Bangla, and English: A Comparative Study'. SN Computer Science, 2 (1):1-20 [DOI] [Details]
(2012) 'NV, AV Complex Predicate Constructions in Hindi'
Pathak, Sanket and Ojha, Atul Kumar (2012) 'NV, AV Complex Predicate Constructions in Hindi'. Shodh Prerak, A Multidisciplinary Quarterly International Refereed Research Journal, 2 (III):93-100 [Details]
(2012) 'A Language Engineering Approach to Ameliorate Hindi Morph Analyzer'
Pathak, Sanket and Ahmad, Rashid and Ojha, Atul Kr. (2012) 'A Language Engineering Approach to Ameliorate Hindi Morph Analyzer'. Shodh Prerak, A Multidisciplinary Quarterly International Refereed Research Journal, ISSN 2231-413X, 4 [Details]
(2012) 'A Language Engineering Approach to Enhance the Accuracy of Machine Translation Systems'
Pathak, Sanket and Ahmad, Rashid and Ojha, Atul Kr. (2012) 'A Language Engineering Approach to Enhance the Accuracy of Machine Translation Systems'. Shodh Prerak, A Multidisciplinary Quarterly International Refereed Research Journal, ISSN 2231-413X, 1 [Details]
(2012) 'Syntactic Evidence of Mixed Transitivity in Hindi Complex Predicates'
Pathak, Sanket and Ojha, Atul Kr. (2012) 'Syntactic Evidence of Mixed Transitivity in Hindi Complex Predicates'. Shodh Prerak, A Multidisciplinary Quarterly International Refereed Research Journal, ISSN 2231-413X, 2 (II):217-223 [Details]

Book Chapters

  Year Publication
(2019) 'A corpus-based study of semantics of bare nominals in Magahi and Bhojpuri: The case of article-less languages'
Alok, Deepak; Ojha, Atul Kr. and Mishra, Sriniket (2019) 'A corpus-based study of semantics of bare nominals in Magahi and Bhojpuri: The case of article-less languages' In: Linguistic Ecology of Bihar. Germany: LINCOM Language Research. [Details]

Conference Publications

  Year Publication
(2021) Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Ojha, Atul Kr.; Rani, Priya; Goswami, Koustava; Chakravarthi, Bharathi Raja; and McCrae, John P. (2021) ULD-NUIG at Social Media Mining for Health Applications (#SMM4H) Shared Task 2021 Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task Online, pp.149-152 [Details]
(2021) The Seventh Biennial Conference on Electronic Lexicography, eLex 2021
McCrae, John P.; Ojha, Atul Kr.; Chakravarthi, Bharathi Raja; Kelly, Ian; Buffini, Patricia; Tang, Grace; Paquin, Eric and Locria, Manuel (2021) Enriching a terminology for under-resourced languages using knowledge graphs The Seventh Biennial Conference on Electronic Lexicography, eLex 2021 [Details]
(2020) Proceedings of the WILDRE5--5th Workshop on Indian Language Data: Resources and Evaluation
Ojha, Atul Kr. and Zeman, Daniel (2020) Universal Dependency Treebanks for Low-Resource Indian Languages: The Case of Bhojpuri Proceedings of the WILDRE5--5th Workshop on Indian Language Data: Resources and Evaluation Online, pp.33-38 [Details]
(2020) Working Notes of FIRE 2020-Forum for Information Retrieval Evaluation, Hyderabad, India
Kumar, Ritesh; Lahiri, Bornini; Ojha, Atul Kr. and Bansal, Akanksha (2020) ComMA@FIRE 2020: Exploring Multilingual Joint Training across different Classification Tasks Working Notes of FIRE 2020-Forum for Information Retrieval Evaluation, Hyderabad, India, 16-DEC-20 - 20-DEC-20 [Details]
(2020) Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
Bhattacharya, Shiladitya; Singh, Siddharth; Kumar, Ritesh; Bansal, Akanksha; Bhagat, Akash; Dawer, Yogesh; Lahiri, Bornini and Ojha, Atul Kr. (2020) Developing a Multilingual Annotated Corpus of Misogyny and Aggression Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying Marseille, France, pp.158-168 [Details]
(2020) Proceedings of the Fifth Conference on Machine Translation
Ojha, Atul Kr.; Rani, Priya; Bansal, Akanksha; Chakravarthi, Bharathi Raja; Kumar, Ritesh and McCrae, John P. (2020) NUIG-Panlingua-KMI Hindi-Marathi MT Systems for Similar Language Translation Task @ WMT 2020 Proceedings of the Fifth Conference on Machine Translation Online, pp.416-421 [ARAN Link] [Details]
(2020) Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Ojha, Atul Kr.; Malykh, Valentin; Karakanta, Alina and Liu, Chao-Hong (2020) Findings of the LoResMT 2020 Shared Task on Zero-Shot for Low-Resource languages Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages Suzhou, China, 04-DEC-20 - 04-DEC-20, pp.33-37 [ARAN Link] [Details]
(2020) Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
Kumar, Ritesh; Ojha, Atul Kr.; Malmasi, Shervin and Zampieri, Marcos (2020) Evaluating aggression identification in social media Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying Online, pp.1-5 [Details]
(2019) First International Sanskrit and Other Indian Languages Technology (SOIL-Tech) 2019
Ojha, Atul Kr.; Uniyal, Arushi and Jha, Girish Nath (2019) Issues & Challenges in Building SMT Systems for Lesser-known Languages (The Case of English-Bhojpuri & English-Garhwali) First International Sanskrit and Other Indian Languages Technology (SOIL-Tech) 2019 JNU, New Delhi, India, 15-FEB-19 - 17-FEB-19 [Details]
(2019) Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Ojha, Atul Kr.; Kumar, Ritesh; Bansal, Akanksha and Rani, Priya (2019) Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019 Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2) Florence, Italy, pp.213-218 [Details]
(2019) First International Sanskrit and Other Indian Languages Technology (SOIL-Tech) 2019
Kumar, Ritesh; Ojha, Atul Kr.; Lahiri, Bornini and Lungleng, Chingrimung (2019) Aggression in Hindi & English Speech: Acoustic Correlates & Automatic Identification First International Sanskrit and Other Indian Languages Technology (SOIL-Tech) 2019 JNU, New Delhi, India, 15-FEB-19 - 17-FEB-19 [Details]
(2019) Proceedings of the 13th International Workshop on Semantic Evaluation
Rani, Priya and Ojha, Atul Kr. (2019) KMI-coling at SemEval-2019 task 6: Exploring N-grams for offensive language detection Proceedings of the 13th International Workshop on Semantic Evaluation Minneapolis, Minnesota, USA, 06-JUN-19 - 07-JUN-19, pp.668-671 [Details]
(2018) Proceedings of the 4th Workshop on Indian Language Data: Resources and Evaluation (under the 11th LREC2018, May 07-12, 2018) by European Language Resources Association (ISBN: 979-10-95546-09-2 EAN: 9791095546092)
Pandey, Rajneesh and Ojha, Atul Kr. and Jha, Girish Nath (2018) Demo of Sanskrit-Hindi SMT System Proceedings of the 4th Workshop on Indian Language Data: Resources and Evaluation (under the 11th LREC2018, May 07-12, 2018) by European Language Resources Association (ISBN: 979-10-95546-09-2 EAN: 9791095546092), pp.34-35 [Details]
(2018) Proceedings of the 4th Workshop on Indian Language Data: Resources and Evaluation (under the 11th LREC2018, May 07-12, 2018) by European Language Resources Association (ISBN: 979-10-95546-09-2 EAN: 9791095546092)
Rani, Priya; Ojha, Atul Kr. and Jha, Girish Nath (2018) Automatic Language Identification System for Hindi and Magahi. In: Jha, Girish Nath; Bali, Kalika; L, Sobha and Ojha, Atul Kr eds. Proceedings of the 4th Workshop on Indian Language Data: Resources and Evaluation (under the 11th LREC2018, May 07-12, 2018) by European Language Resources Association (ISBN: 979-10-95546-09-2 EAN: 9791095546092) Miyazaki, Japan, 07-MAY-18 - 12-MAY-18, pp.23-28 [Details]
(2018) 3-Day International Conference on India & Southeast Asia: One Indic Belt, Shared Culture & Common Destiny, 26-28 April 2018
Ojha, Atul Kr. and Jha, Girish Nath (2018) Developing English-Javanese Statistical Machine Translation System 3-Day International Conference on India & Southeast Asia: One Indic Belt, Shared Culture & Common Destiny, 26-28 April 2018 [Details]
(2018) Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)
Kumar, Ritesh; Ojha, Atul Kr.; Malmasi, Shervin and Zampieri, Marcos (2018) Benchmarking aggression identification in social media Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), pp.1-11 [Details]
(2018) Proceedings of the 4th Workshop on Indian Language Data: Resources and Evaluation (under the 11th LREC2018, May 07-12, 2018)
Kumar, Ritesh and Lahiri, Bornini and Alok, Deepak and Ojha, Atul Kr. and Jain, Mayank and Basit, Abdul and Dawar, Yogesh (2018) Automatic Identification of Closely-related Indian Languages: Resources and Experiments Proceedings of the 4th Workshop on Indian Language Data: Resources and Evaluation (under the 11th LREC2018, May 07-12, 2018), pp.68-74 [Details]
(2018) WILDRE4--4th Workshop on Indian Language Data: Resources and Evaluation
Ojha, Atul Kr and Jha, Girish Nath (2018) Graphic-based Statistical Machine Translator. In: Girish Nath Jha and Kalika Bali and Sobha L and Atul Kr. Ojha eds. WILDRE4--4th Workshop on Indian Language Data: Resources and Evaluation Miyazaki, Japan, 07-MAY-18 - 12-MAY-18, pp.44-45 [Details]
(2018) Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation
Ojha, Atul Kr.; Chowdhury, Koel Dutta; Liu, Chao-Hong and Saxena, Karan (2018) The RGNLP Machine Translation Systems for WAT 2018 Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation Hong Kong, 01-DEC-18 - 03-DEC-18 [Details]
(2017) International Conference on Applied and Theoretical Computing and Communication Technology
Pandey, Anupama and Singh, Srishti and Ojha, Atul Kr. and Jha, Girish Nath (2017) Challenges in Annotation and Domain Adaptation in Hindi POS Tagger: with Reference to Cricket International Conference on Applied and Theoretical Computing and Communication Technology, pp.155-159 [Details]
(2016) Proceedings of the 3rd Workshop on Indian Language Data: Resources and Evaluation (under the 10th LREC2016, May 23-28, 2016)
Kumar, Ritesh and Ojha, Atul Kr. and Lahiri, Bornini (2016) Developing annotated multimodal corpus for automatic recognition of verbal aggression in Hindi Proceedings of the 3rd Workshop on Indian Language Data: Resources and Evaluation (under the 10th LREC2016, May 23-28, 2016), pp.73-78 [Details]
(2016) Proceedings of the 3rd Workshop on Indian Language Data: Resources and Evaluation (under the 10th LREC2016, May 23-28, 2016)
Ojha, Atul Kr and Singh, Srishti and Behera, Pitambar and Jha, Girish Nath (2016) A Hybrid Chunker for Hindi and Indian English Proceedings of the 3rd Workshop on Indian Language Data: Resources and Evaluation (under the 10th LREC2016, May 23-28, 2016), pp.93-99 [Details]
(2016) 4th International Endangered and Lesser-known Languages Conference (ELKL-4)
Ojha, Atul Kr. (2016) Developing a Machine Readable Multilingual Dictionary for Bhojpuri-Hindi-English 4th International Endangered and Lesser-known Languages Conference (ELKL-4) [Details]
(2016) 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP-2016) under the COLING2016
Behera, Pitambar and Muzaffar, Sharmin and Ojha, Atul Kr. and Jha, Girish (2016) The IMAGACT4ALL Ontology of Animated Images: Implications for Theoretical and Machine Translation of Action Verbs from English-Indian Languages 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP-2016) under the COLING2016, pp.64-73 [Details]
(2016) MODELACT Conference Action, Language and Cognition, CNR, Rome, 6-7 June 2016
Jha, Girish Nath and Ojha, Atul Kumar and Muzaffar, Sharmin and Behera, Pitambar (2016) Indo Aryan languages on IMAGACT MODELACT Conference Action, Language and Cognition, CNR, Rome, 6-7 June 2016 [Details]
(2016) Proceedings of the 3rd Workshop on Indian Language Data: Resources and Evaluation (under the 10th LREC2016, May 23-28, 2016)
Singh, Renu and Ojha, Atul Kr and Jha, Girish Nath (2016) Classification and Identification of Reduplicated Multi-Word Expressions in Hindi Proceedings of the 3rd Workshop on Indian Language Data: Resources and Evaluation (under the 10th LREC2016, May 23-28, 2016), pp.18-22 [Details]
(2016) Regional ICON (regICON) 2016, IIT-BHU, Varanasi, December 16, 2016
Kumar, Ritesh and Ojha, Atul Kr. and Lahiri, Bornini and Alok, Deepak (2016) Developing Resources and Tools for some Lesser-known Languages of India Regional ICON (regICON) 2016, IIT-BHU, Varanasi, December 16, 2016 [Details]
(2015) Proceedings of 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics
Ojha, Atul Kr. and Behera, Pitambar and Singh, Srishti and Jha, Girish Nath (2015) Training & Evaluation of POS Taggers in Indo-Aryan Languages: A Case of Hindi, Odia and Bhojpuri Proceedings of 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp.524-529 [Details]
(2015) Proceedings of 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics
Behera, Pitambar and Ojha, Atul Kr. and Jha, Girish Nath (2015) Issues and Challenges in Developing Statistical POS Taggers for Sambalpuri Proceedings of 7th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pp.349-354 [Details]
(2014) Proceedings of the 2nd Workshop on Indian Language Data: Resources and Evaluation (under the 9th LREC2014, May 26-31, 2014)
Moneglia, Massimo and Brown, Susan and Kar, Aniruddha and Kumar, Anand and Ojha, Atul Kr. and Mello, Heliana and Niharika and Jha, Girish Nath and Ray, Bhaskar and Sharma, Annu (2014) Mapping Indian Languages onto the IMAGACT Visual Ontology of Action Proceedings of the 2nd Workshop on Indian Language Data: Resources and Evaluation (under the 9th LREC2014, May 26-31, 2014), pp.51-55 [Details]
(2014) Proceedings of the 2nd Workshop on Indian Language Data: Resources and Evaluation (under the 9th LREC2014, May 26-31, 2014)
Ojha, Atul Kr. and Bansal, Akanksha and Hadke, Sumedh and Jha, Girish Nath (2014) Evaluation of Hindi-English MT Systems Proceedings of the 2nd Workshop on Indian Language Data: Resources and Evaluation (under the 9th LREC2014, May 26-31, 2014), pp.94-101 [Details]

Edited Books

  Year Publication
(2020)
Jha, Girish Nath; Bali, Kalika; Sobha, L; Agrawal, SS and Ojha, Atul Kr (Ed.). (2020) Proceedings of the WILDRE5--5th Workshop on Indian Language Data: Resources and Evaluation Paris: European Language Resources Association (ELRA). [Details]
(2020)
Kumar, Ritesh; Ojha, Atul Kr.; Lahiri, Bornini; Zampieri, Marcos; Malmasi, Shervin; Murdock, Vanessa and Kadar, Daniel (Ed.). (2020) Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying TRAC-2, 2020 Paris: European Language Resources Association (ELRA). [Details]
(2020)
Karakanta, Alina; Ojha, Atul Kr.; Liu, Chao-Hong; Abbott, Jade; Ortega, John; Washington, Jonathan; Oco, Nathaniel; Lakew, Surafel Melaku; Pirinen, Tommi A; Malykh, Valentin; Logacheva, Varvara and Zhao, Xiaobing (Ed.). (2020) Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages Association for Computational Linguistics: Association for Computational Linguistics. [Details]
(2019)
Karakanta, Alina; Ojha, Atul Kr.; Liu, Chao-Hong; Washington, Jonathan; Oco, Nathaniel; Lakew, Surafel Melaku; Malykh, Valentin and Zhao, Xiaobing (Ed.). (2019) Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages European Association for Machine Translation: European Association for Machine Translation. [Details]
(2019)
Jha, Girish Nath; Arya, Sudhir Kumar; Dixit, Abhijit and Ojha, Atul Kr (Ed.). (2019) Veda As Global Heritage Scientific Perspectives New Delhi: Vidyanidhi. [Details]
(2018)
Jha, Girish Nath and Bali, Kalika and L, Sobha and Ojha, Atul Kr (Ed.). (2018) Proceedings of the LREC 2018 Workshop “WILDRE4 – 4th Workshop on Indian Language Data: Resources and Evaluation” Paris: European Language Resources Association (ELRA). [Details]
(2018)
Kumar, Ritesh and Ojha, Atul Kr. and Zampieri, Marcos and Malmasi, Shervin (Ed.). (2018) Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018) Association for Computational Linguistics: Association for Computational Linguistics. [Details]
(2016)
Jha, Girish Nath and Bali, Kalika and L, Sobha and Ojha, Atul Kr (Ed.). (2016) Proceedings of the LREC 2016 Workshop “WILDRE3 – 3rd Workshop on Indian Language Data: Resources and Evaluation” Paris: European Language Resources Association (ELRA). [Details]

Thesis

  Year Publication
(2019) English-Bhojpuri SMT System: Insights from the Kāraka Model.
Ojha, Atul Kr. (2019) English-Bhojpuri SMT System: Insights from the Kāraka Model. Jawaharlal Nehru University, New Delhi, India: Thesis [Details]

Honours and Awards

  Year Title Awarding Body
2021 FLORES 101 Compute Grant Facebook
2018 Google scholarship for attending the Lisbon Machine Learning School 2018 Lisbon Machine Learning School 2018 by Google
2017 Heyning-Roelli fellowship to participate as an exchange student from Feb-July 2017 at University of Zürich, Switzerland Heyning-Roelli Foundation fellowship
2014 Travel grant from Microsoft Research India Microsoft Research India

Professional Associations

  Association Function From / To
Association for Computational Linguistics (ACL) Member /
Association for Computational Linguistics (ACL) Special Interest Group on Linguistic Typology (SIGTYP) Member /

Employment

  Employer Position From / To
College of Science and Engineering, NUI Galway Adjunct Lecturer 05-OCT-20 / 04-APR-23
European Lexical Infrastructure (ELEXIS) Project at DSI, NUI Galway Postdoc Researcher 29-JUN-20 /
ADAPT Center Postdoc Researcher /
Fidelity Investments Postdoc Researcher 01-MAR-21 /

Education

  Year Institution Qualification Subject
2019 Jawaharlal Nehru University PhD NLP

Languages

  Language
English
Hindi

Reviews

  Journal Role
Artificial Intelligence Reviewer
Transactions on Asian and Low-Resource Language Information Processing Reviewer
Machine Translation Reviewer
Language Resources And Evaluation Reviewer

Other Activities

  Description

Program Committee/Reviewing:
  • Conferences: ACL Rolling Review, ACL 2021, MT Summit 2021, EMNLP 2021, EACL 2021, AACL-IJCNLP 2020, ACL 2020, SOIL-Tech 2019, ICON 2018
  • Workshops: LoResMT 2019-2021, WILDRE 2020, TRAC-2020, TRAC-2018, CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies