Discontinuous Parsing
Discontinuous Phrase-Based Parser (LIPN)
Incremental Discontinuous Phrase Structure Parsing with the GAP Transition (LLF)
Large Scale Multilingual Parsing
Multilingual Lexicalized Constituency Parsing with Word-Level Auxiliary Tasks (LLF)
Neural based and hybrid approaches to parsing morphologically-rich languages (Almanach)
Resource-poor language parser with an emphasis on transfer learning (LIMSI)
Datasets
The datasets we produced will be made available upon publication. Please contact us directly with any inquiries.
- Spanish/English code-switched Twitter corpus (1000 tweets) annotated with part-of-speech tags following the Universal Part-of-Speech tagset (Petrov et al, 2012)
- The Parallel Cr#pank, 1500 sentences annotated following the PTB tagset and translated from the French Social Media Bank (Seddah et al, 2012)
- The Arabizi Treebank, 1800 sentences in a North-African Arabic dialect used in user-generated content, code-mixed with French, with manual translations to French, annotated following:
- (i) a rich morpho-syntactic tagset inspired from the French Social Media Bank,
- (ii) the Universal tagset (Petrov et al, 2012) from the Universal Dependencies project (Nivre et al, 2017; 2018),
- (iii) the UD annotation scheme, for 630 sentences so far (ongoing work, targeting 1000).