About me

I am a researcher at Inria, Paris in the ALMAnaCH team, working on Natural Language Processing (NLP) and more specifically Machine Translation (MT). I was previously a research associate at the University of Edinburgh, working on MT for low-resource languages, having completed my PhD on the contextual MT of dialogue at the LIMSI laboratory (now the LISN) in 2018 under the supervision of Sophie Rosset and Thomas Lavergne.

Research Experience

Please see my CV for a complete list of my research and teaching experience.

Researcher (Chargée de recherches)

2020 - present
Inria, Paris, within the ALMAnaCH team
Natural Language Processing and Machine Translation.
Holder of a "springboard" chair position in the PRAIRIE research institute (2021-present)

Research Associate

2018 - 2020
ILCC, University of Edinburgh
Supervised by Alexandra Birch. Working on the MTStretch fellowship (held by Alexandra Birch) and the GoURMET EU project (leader of the work package on morphological modelling)

PhD in Computer Science

2015 - 2018
LIMSI, CNRS, Univ. Paris-Sud, Université Paris-Saclay, France
Going beyond the sentence: Contextual Machine Translation of Dialogue
Supervised by Sophie Rosset and Thomas Lavergne

Research visit

May - August 2017
ILCC, University of Edinburgh
Working with Alexandra Birch, Rico Sennrich and Barry Haddow

Master's research placement

February - August 2015
Alpage-Inria, Paris, France
Boosting for model selection in syntactic parsing
Supervised by Benoit Crabbé

Master's research placement

May 2014 - August 2014
LaTTiCe (CNRS/Paris III) and Télécom ParisTech, Paris, France
Modelling communicative acts for an embodied conversational agent
Supervised by Frédéric Landragin and Chloé Clavel

Proof-reading and revision

2013

Tesnière Lucien (2015), Elements of structural syntax, translated by Timothy Osborne and Sylvain Kahane, John Benjamins, Amsterdam, U. Paris X.

Master's research placement

March - August 2013
MoDyCo Laboratory (CNRS/Paris X), Paris, France
Annotation, development and semi-automatic correction of a treebank of spoken French for syntactic dependencies (Rhapsodie Treebank of Spoken French).
Supervised by Sylvain Kahane

Publications

PhD Thesis

Going beyond the sentence: Contextual Machine Translation of Dialogue
Rachel Bawden supervised by Sophie Rosset and Thomas Lavergne.
29th November 2018. LIMSI, CNRS, Université Paris-Sud, Université Paris-Saclay.
Committee members:

  • President: Nicolas Sabouret
  • Reviewers: Jörg Tiedemann and Loïc Barrault
  • Examiners: Lucia Specia and Andrei Popescu-Belis

Journal articles

Ancien ou moderne? Pistes computationnelles pour l’analyse graphématique des textes écrits au XVIIe siècle. Simon Gabay, Philippe Gambette, Rachel Bawden, Benoît Sagot (2022). In Linx. Revue des linguistes de l’université Paris X Nanterre. Département de Sciences du langage 85, Université Paris Ouest.
Survey of low-resource machine translation. Barry Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, Alexandra Birch (2022). Computational Linguistics. 48(3):673-732.
Le changement linguistique au XVIIe s.: nouvelles approches scriptométriques. Simon Gabay, Rachel Bawden, Philippe Gambette, Jonathan Poinhos, Eleni Kogkitsidou, Benoît Sagot (2022). SHS Web of Conferences 138, 02006.
DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation. Rachel Bawden, Sophie Rosset, Thomas Lavergne, Eric Bilinski (2020). Language Resources and Evaluation. DOI: 10.1007/s10579-020-09514-4 Dataset, interface and website
Towards the generation of dialogue acts in socio-affective ECAs.
Rachel Bawden, Chloé Clavel and Frédéric Landragin (2016). Language Resources and Evaluation. 4(2):821-838. Springer Netherlands. First online 31 July 2015. doi:10.1007/s10579-015-9312-9

Conference papers

À propos des difficultés à traduire automatiquement de longs documents. Ziqian Peng, Rachel Bawden and François Yvon (2024). In Proceedings of the 31st Conférence sur le Traitement Automatique des Langues Naturelles, volume 1: articles longs et prises de position. TALN'24. Pages 2-21. Toulouse, France.
Évaluer BLOOM en français. Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, Marco Naguib, François Yvon (2024). In Proceedings of EvalLLM2024 : Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot. Toulouse, France.
An extended version of the paper (technical report) can be found here.
Translate your Own: a Post-Editing Experiment in the NLP domain. Rachel Bawden, Ziqian Peng, Maud Bénard, Eric Villemonte de La Clergerie, Raphaël Esamotunu, Mathilde Huguin, Natalie Kübler, Alexandra Mestivier, Mona Michelot, Laurent Romary, Lichao Zhu and François Yvon (2024). In Proceedings of the 25th Annual Conference of the European Association for Machine Translation. EAMT'24. Pages 431–443. Sheffield, UK.
Exploring Inline Lexicon Injection for Cross-Domain Transfer in Neural Machine Translation. Jesujoba O. Alabi and Rachel Bawden (2024). In Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation. Pages 7–20. Sheffield, UK.
When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages. Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot and Rachel Bawden (2024). In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Pages 17544–17556. Torino, Italia.
Making Sentence Embeddings Robust to User-Generated Content. Lydia Nishimwe, Benoît Sagot and Rachel Bawden (2024). In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Pages 10984–10998. Torino, Italia.
Topic-guided Example Selection for Domain Adaptation in LLM-based Machine Translation. Seth Aycock and Rachel Bawden (2024). In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. Pages 175-195. St Julian's, Malta.
Reconnaissance des écritures dans les imprimés. Simon Gabay, Thibault Clérice, Pauline Jacsont, Elina Leblanc, Marie Jeannot-Tirole, Sonia Solfrini, Sophie Dolto, Floriane Goy, Carmen Carrasco Luján, Maddalena Zaglio, Myriam Perregaux, Juliette Janès, Benoît Sagot, Rachel Bawden, Rasul Dent, Oriane Nédey, Alix Chagué (2024). In Proceedings of Humanistica 2024-Colloque annuel de l'Association francophone des humanités numériques. Meknès, Morocco.
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation. Matthieu Futeral, Cordelia Schmid, Ivan Laptev, Benoît Sagot and Rachel Bawden (2023). In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM. Rachel Bawden and François Yvon (2023). In Proceedings of the 24th Annual Conference of the European Association for Machine Translation. EAMT'23. Pages 157–170. Tampere, Finland.
RoCS-MT: Robustness Challenge Set for Machine Translation. Rachel Bawden and Benoît Sagot (2023). In Proceedings of the Eighth Conference on Machine Translation. WMT'23. Pages 198–216. Singapore.
Findings of the 2023 conference on machine translation (WMT23): LLMs are here but not quite there yet. Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Philipp Koehn, Benjamin Marie, Christof Monz, Makoto Morishita, Kenton Murray, Masaaki Nagata, Toshiaki Nakazawa, Martin Popel, Maja Popović and Mariya Shmatova (2023). In Proceedings of the Eighth Conference on Machine Translation. WMT'23. Pages 1–42. Singapore.
Findings of the WMT 2023 biomedical translation shared task: Evaluation of ChatGPT 3.5 as a comparison system. Mariana Neves, Antonio Jimeno Yepes, Aurélie Névéol, Rachel Bawden, Giorgio Maria Di Nunzio, Roland Roller, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Lana Yeganova, Dina Wiemann and Cristian Grozea (2023). In Proceedings of the Eighth Conference on Machine Translation. WMT'23. Pages 43–54. Singapore.
Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages. Sonal Sannigrahi and Rachel Bawden (2023). In Proceedings of the 24th Annual Conference of the European Association for Machine Translation. EAMT'23. Pages 181–192. Tampere, Finland.
MaTOS: traduction automatique pour la science ouverte. Maud Bénard, Alexandra Mestivier, Natalie Kübler, Lichao Zhu, Rachel Bawden, Eric De La Clergerie, Laurent Romary, Mathilde Huguin, Jean-François Nominé, Ziqian Peng and François Yvon (2023). In Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles. TALN'23. Paris, France.
Cross-lingual Strategies for Low-resource Language Modeling: A Study on Five Indic Dialects. Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot and Rachel Bawden (2023). In Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles. TALN'23. Paris, France.
Gallic(orpor)a: Extraction, annotation et diffusion de l’information textuelle et visuelle en diachronie longue. Benoît Sagot, Laurent Romary, Rachel Bawden, Pedro Javier Ortiz Suárez, Kelly Christensen, Simon Gabay, Ariane Pinche, Jean-Baptiste Camps (2022). In Actes de DataLab de la BnF: Restitution des travaux 2022.
Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports. Mariana Neves, Antonio Jimeno Yepes, Amy Siu, Roland Roller, Philippe Thomas, Maika Vicente Navarro, Lana Yeganova, Dina Wiemann, Giorgio Maria Di Nunzio, Federica Vezzani, Christel Gérardin, Rachel Bawden, Darryl Johan Estrada, Salvador Lima-López, Eulàlia Farré-Maduell, Martin Krallinger, Cristian Grozea, Aurélie Névéol (2022). In Proceedings of the Seventh Conference on Machine Translation. WMT'22. Pages 694-723. Abu Dhabi, United Arab Emirates.
Inria-ALMAnaCH at WMT 2022: Does Transcription Help Cross-Script Machine Translation? Jesujoba Alabi, Lydia Nishimwe, Benjamin Muller, Camille Rey, Benoît Sagot, Rachel Bawden (2022). In Proceedings of the Seventh Conference on Machine Translation. WMT'22. Pages 233-243. Abu Dhabi, United Arab Emirates.
Findings of the 2022 conference on machine translation (WMT22). Tom Kocmi, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Thamme Gowda, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Rebecca Knowles, Philipp Koehn, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Michal Novák, Martin Popel, Maja Popović (2022). In Proceedings of the Seventh Conference on Machine Translation. WMT'22. Pages 1-45. Abu Dhabi, United Arab Emirates.
Vers l’étude linguistique sur données artificielles. Simon Gabay, Rachel Bawden, Benoît Sagot, Philippe Gambette (2022). In Proceedings of Variation(s) en français. Nancy, France.
Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France’s Court of Cassation Rulings. Thibault Charmet, Inès Cherichi, Matthieu Allain, Urszula Czerwinska, Amaury Fouret, Benoît Sagot and Rachel Bawden (2022). In Proceedings of the 13th Language Resources and Evaluation Conference. LREC'22. Pages 4754–4766. Marseille, France.
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French. Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot (2022). In Proceedings of the 13th Language Resources and Evaluation Conference. LREC'22. Pages 3367–3374. Marseille, France.
Automatic Normalisation of Early Modern French. Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot and Simon Gabay (2022). In Proceedings of the 13th Language Resources and Evaluation Conference. LREC'22. Pages 3354–3366. Marseille, France.
Le projet FREEM: ressources, outils et enjeux pour l’étude du français d’Ancien Régime (The FREEM project: Resources, tools and challenges for the study of Ancien Régime French). Simon Gabay, Pedro Ortiz Suarez, Rachel Bawden, Alexandre Bartz, Philippe Gambette, Benoît Sagot (2022). In Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1: conférence principale. TALN'22. Pages 154-165. Avignon, France.
Multitask Prompted Training Enables Zero-Shot Task Generalization. Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush (2022). In Proceedings of the 10th International Conference on Learning Representations. ICLR'22. Online.
Findings of the WMT 2021 biomedical translation shared task: Summaries of animal experiments as new test set. Lana Yeganova, Dina Wiemann, Mariana Neves, Federica Vezzani, Amy Siu, Iñigo Jauregi Unanue, Maite Oronoz, Nancy Mah, Aurélie Névéol, David Martinez, Rachel Bawden, Giorgio Maria Di Nunzio, Roland Roller, Philippe Thomas, Cristian Grozea, Olatz Perez de Viñaspre, Maika Vicente Navarro and Antonio Jimeno Yepes (2021). In Proceedings of the 6th Conference on Machine Translation. WMT'2021. Online.
Expanding the content model of annotation Block. Alexandre Bartz, Juliette Janes, Laurent Romary, Philippe Gambette, Rachel Bawden, Pedro Javier Ortiz Suárez, Benoît Sagot, Simon Gabay (2021). In Proceedings of Next Gen TEI, 2021-TEI Conference and Members’ Meeting.
Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? Clémentine Fourrier, Rachel Bawden and Benoît Sagot (2021). In ACL-IJCNLP 2021-Findings of the Association for Computational Linguistics. ACL-Findings'2021. Pages 846-861. Online.
Few-shot learning through contextual data augmentation. Farid Arthaud, Rachel Bawden and Alexandra Birch (2021). In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. EACL'2021. Online.
A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing. Rachel Bawden, Biao Zhang, Lisa Yankovskaya, Andre Tättar and Matt Post (2020). In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings. EMNLP'20. Pages 918–932. Online.
The University of Edinburgh-Uppsala University's Submission to the WMT 2020 Chat Translation Task. Nikita Moghe, Christian Hardmeier and Rachel Bawden (2020). In Proceedings of the 5th Conference on Machine Translation. WMT'20. Pages 473–478. Online.
ParBLEU: Augmenting Metrics with Automatic Paraphrases for the WMT'20 Metrics Shared Task. Rachel Bawden, Biao Zhang, Andre Tättar and Matt Post (2020). In Proceedings of the 5th Conference on Machine Translation. WMT'20. Pages 887–894. Online.
The University of Edinburgh's English-Tamil and English-Inuktitut Submissions to the WMT20 News Translation Task. Rachel Bawden, Alexandra Birch, Radina Dobreva, Arturo Oncevay, Antonio Valerio Miceli Barone and Philip Williams (2020). In Proceedings of the 5th Conference on Machine Translation. WMT'20. Pages 92–99. Online.
Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages. Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Iñigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez de Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann, Lana Yeganova (2020). In Proceedings of the 5th Conference on Machine Translation. WMT'20. Pages 660–687. Online.
Document-level Neural MT: A Systematic Comparison. António Lopes, M. Amin Farajian, Rachel Bawden, Michael Zhang and André T. Martins (2020). In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation. EAMT'20. Pages 225–234. Lisbon, Portugal. Dataset
Architecture of a Scalable, Secure and Resilient Translation Platform for Multilingual News Media. Susie Coleman, Andrew Secker, Rachel Bawden, Barry Haddow and Alexandra Birch (2020). In Proceedings of the 1st International Workshop on Language Technology Platforms. IWLTP'20. Marseille, France.
Document Sub-structure in Neural Machine Translation. Radina Dobreva, Jie Zhou, and Rachel Bawden (2020). In Proceedings of the 12th Language Resources and Evaluation Conference. LREC'20. Marseille, France. Datasets
The University of Edinburgh’s Submissions to the WMT19 News Translation Task (Updated version). Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio Valerio Miceli Barone, and Alexandra Birch (2019). In Proceedings of the Fourth Conference on Machine Translation. WMT'19. Florence, Italy. Gujarati models
Findings of the WMT 2019 Biomedical Translation Shared Task: Evaluation for MEDLINE Abstracts and Biomedical Terminologies. Rachel Bawden, Kevin Bretonnel Cohen, Cristian Grozea, Antonio Jimeno Yepes, Madeleine Kittner, Martin Krallinger, Nancy Mah, Aurelie Neveol, Mariana Neves, Felipe Soares, Amy Siu, Karin Verspoor, and Maika Vicente Navarro (2019). In Proceedings of the Fourth Conference on Machine Translation. WMT'19. Florence, Italy.
Global under-resourced media translation (GoURMET). Alexandra Birch, Barry Haddow, Ivan Titov, Antonio Valerio Miceli Barone, Rachel Bawden, Felipe Sánchez-Martínez, Mikel L. Forcada, Miquel Esplà-Gomis, Víctor Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Wilker Aziz, Andrew Secker, and Peggy van der Kreeft (2019). In Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks. Dublin, Ireland.
Evaluating Discourse Phenomena in Neural Machine Translation. Rachel Bawden, Rico Sennrich, Alexandra Birch and Barry Haddow (2018). In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. NAACL'18. New Orleans, USA. code and test set.
Detecting context-dependent sentences in parallel corpora. Rachel Bawden, Thomas Lavergne and Sophie Rosset (2018). In Proceedings of the 25th Conférence sur le Traitement Automatique des Langues Naturelles. TALN'18. Rennes, France.
Machine Translation, it's a question of style, innit? The case of English tag questions. Rachel Bawden (2017). In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. EMNLP'17. Copenhagen, Denmark. code
Machine Translation of Speech-Like Texts: Strategies for the Inclusion of Context. Rachel Bawden (2017). In Proceedings of the 19th REncontres jeunes Chercheurs en Informatique pour le TAL. RECITAL 2017. Orléans, France.
Boosting for Efficient Model Selection for Syntactic Parsing. Rachel Bawden and Benoit Crabbé (2016). In Proceedings of the 26th International Conference on Computational Linguistics. COLING'16. Osaka, Japan.
Investigating gender adaptation for speech translation. Rachel Bawden, Guillaume Wisniewski and Hélène Maynard (2016). In Proceedings of the 23rd Conférence sur le Traitement Automatique des Langues Naturelles. TALN'16. Paris, France.
Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie. Rachel Bawden, Marie-Amélie Botalla, Kim Gerdes and Sylvain Kahane (2014). In Proceedings of the 9th International Conference on Language Resources and Evaluation. LREC'14. Reykjavik, Iceland.

Preprints

Bloom: A 176b-parameter open-access multilingual language model. Le Scao et al. (Many authors, including Rachel Bawden) (2022). ArXiv.
Maskeval: Weighted MLM-based evaluation for text summarization and simplification. Yu Lu Liu, Rachel Bawden, Thomas Scialom, Benoît Sagot and Jackie Chi Kit Cheung (2022). ArXiv.

Book chapters

Chapter 4. Microsyntactic annotation. Sylvain Kahane, Kim Gerdes, and Rachel Bawden. In Rhapsodie – A Prosodic and Syntactic Treebank for Spoken French. Eds. Anne Lacheret-Dujour, Sylvain Kahane, and Paola Pietrandrea. John Benjamins, Amsterdam, 2019.
Chapter 7. Annotation tools for syntax. Kim Gerdes, Sylvain Kahane, Rachel Bawden, Julie Belião, Eric de la Clergerie, and Ilaine Wang. In Rhapsodie – A Prosodic and Syntactic Treebank for Spoken French. Eds. Anne Lacheret-Dujour, Sylvain Kahane, and Paola Pietrandrea. John Benjamins, Amsterdam, 2019.
Chapter 15. Exploration of the Rhapsodie corpus: Data structure, formats and query tools. Anne Lacheret-Dujour, Sylvain Kahane, Rachel Bawden, Serge Fleury and Ilaine Wang. In Rhapsodie – A Prosodic and Syntactic Treebank for Spoken French. Eds. Anne Lacheret-Dujour, Sylvain Kahane, and Paola Pietrandrea. John Benjamins, Amsterdam, 2019.

Technical report

Protocole de codage microsyntaxique. Sylvain Kahane, Kim Gerdes, Paola Pietrandrea, Christophe Benzitoun, Rachel Bawden, Marie-Amélie Botalla and Adèle Désoyer (2013). Translated into English by Rachel Bawden: Protocol for micro-syntactic coding. Link to the Rhapsodie project website.

Other

[Book Review] Understanding Dialogue: Language Use and Social Interaction. Rachel Bawden (2021). Computational Linguistics:703–705. MIT Press.
Boosting for Model Selection in Syntactic Parsing. Rachel Bawden (2015). Master's thesis. Alpage-Inria. Supervised by Benoit Crabbé.

Supervision

PhD students

Armel Zebaze

November 2023 - present
PhD funded by Inria
Analogy for multilingual NLP. Co-supervised with Benoît Sagot.

Nicolas Dahan

October 2023 - present
PhD funded by the MaTOS ANR project
Evaluation of the machine translation of scientific documents. Co-supervised with François Yvon (CNRS).

Ziqian Peng

October 2023 - present
PhD funded by the MaTOS ANR project (recruited at ISIR, CNRS)
Machine translation of scientific documents. Co-supervised with François Yvon (CNRS).

Lydia Nishimwe

October 2021 - present
PR[AI]RIE-funded PhD
Robust Neural Machine Translation. Co-supervised with Benoît Sagot.

Matthieu Futeral-Peter

October 2021 - present
PhD funded by Inria and PR[AI]RIE
Multimodal Machine Translation. Co-supervised with Ivan Laptev, Benoît Sagot and Cordelia Schmid.

Clémentine Fourrier

September 2020 - September 2022
Inria-funded PhD
Neural models of language evolution. Co-supervised with Benoît Sagot and Laurent Romary.

Interns and engineers

Malik Marmonier

May 2024 - present
Research Engineer
Translating with large language models without parallel data for low-resource languages (TraLaLaM project)

Oriane Nédey

December 2023 - present
Research Engineer
Data collection and translation models for a regional language of France (COLaF project)

Seth Aycock

August 2023 - October 2023
Research Engineer
Domain adaptation for neural machine translation in low-resource settings.

Niyati Bafna

October 2022 - June 2023
Research Engineer
Linguistically inspired language models for closely related languages.

Jesujoba Alabi

February 2022 - June 2022
Engineer for the DadaNMT project
Domain adaptation in NMT.

Camille Rey

September 2021 - June 2022
Intern, Inalco
Contrastive training for NMT models for lexical disambiguation. Co-supervised with Benoît Sagot.

Sonal Sannigrahi

End June 2021 - August 2021
Intern, École Polytechnique
Investigating the effect of input representations on language sharing in multilingual models.

Matthieu Futeral-Peter

May 2021 - October 2021
Master 2 intern, ENSAE and ENS Paris-Saclay
Exploration of multilingual and multimodal word embeddings. Co-supervised with Benoît Sagot, Cordelia Schmid and Ivan Laptev.

Thibault Charmet

February 2021 - January 2022
Research Engineer
Automatic tools for improving jurisprudence consistency. Co-supervised with Benoît Sagot and in collaboration with the Cour de Cassation.

Quentin Burthier

September 2020 - January 2021
Master 2 student, ENS Paris-Saclay
Machine Translation of Noisy Texts. Co-supervised with Djamé Seddah.

Ashwani Tanwar

April - August 2020
MSc thesis, University of Edinburgh
Improving Low-Resource Neural Machine Translation of Related Languages by Transfer Learning. Co-supervised with Alexandra Birch.

Farid Arthaud

February - June 2020
Master 1 (ENS, Paris), visiting student at the University of Edinburgh
Continuous learning for Neural Machine Translation from Human Post-edits. Co-supervised with Alexandra Birch.

Radina Dobreva

March - August 2019
MSc thesis, University of Edinburgh
Integrating document structure information into Neural Machine Translation using cache-based models. Co-supervised with Annie Louis and Bonnie Webber.

Jie Zhou

March - August 2019
MSc thesis, University of Edinburgh
Exploiting Predictable Document Substructure in Neural Machine Translation. Co-supervised with Annie Louis and Bonnie Webber.

Teaching Experience

Machine Translation

2023-2024
Master MVA (Algorithms for speech and natural language processing)

Lecture on Machine Translation in the context of the Algorithms for speech and natural language processing course.

Introduction to NLP

2016-17 and 2017-18
PolyTech Paris-Sud, France

4th year lectures, tutorials and practical classes (30hrs/year)

Introduction to algorithmics and C++

2017-18
PolyTech Paris-Sud, France

2nd year practical classes (10hrs)

Programming with C

2016-17
PolyTech Paris-Sud, France

3rd year tutorials and practical classes (27hrs)

Databases

2015-16 and 2016-17
PolyTech Paris-Sud, France

3rd year tutorials and practical classes (24hrs/year)

Algorithmics and C

2015-16 and 2016-17
PolyTech Paris-Sud, France

3rd year tutorials and practical classes (28hrs)

Tutoring in computer science

2015-16
PolyTech Paris-Sud, France

3rd year (12hrs)

English language assistant

2009-10
4 primary schools, Le Puy-en-Velay, France

Work placement as an English language assistant

2005
Collège St. Martin, Tours, France