A strategic challenge for Europe in today's globalised economy is to overcome language barriers through technological means. In particular, Machine Translation (MT) systems are expected to have a significant impact on the management of multilingualism in Europe, making it possible to translate the huge quantity of textual data produced and thus covering the needs of hundreds of millions of citizens. PANACEA addressed a critical threat to this vision: the so-called language-resource bottleneck. Although MT technologies may consist of language-independent engines, they depend heavily on the availability of language-dependent data for their real-life implementation, i.e., they require Language Resources (LRs). In order to equip MT systems for every pair of European languages, for every domain, and for every text genre, appropriate LRs covering every language, domain and genre must be produced. Moreover, a Language Resource for a given language can never be considered complete or final: languages change, and new knowledge domains emerge at a rapid pace. A company wishing to cover the enlarged Union market needs to produce and maintain 500 bilingual glossaries, for instance.
Traditionally, LR production has been done by hand, and its high cost (highly skilled human work and long development time) has hindered full coverage. Automatic production lowers the cost and time required to produce basic LRs for languages that are currently not well covered. Such reductions are the only way to guarantee the continuous supply of LRs that MT and other Language Technologies may demand in a multilingual Europe.
PANACEA has helped to demonstrate that the “LR bottleneck” problem can be effectively addressed by automation. It has done so by producing (a) the PANACEA Platform, (b) the web services integrated within the platform, (c) the associated workflows that manage the sequencing of web services, (d) the tools for LR acquisition developed during the project, and (e) the LRs produced during the project by exploiting the platform, the web services, and the specialized workflows, mainly, but not only, for Machine Translation tools.
The PANACEA factory has been thoroughly evaluated in R&D and industrial settings. The platform and the LR production lines, based on advanced technological components, have proved the feasibility of the concept. PANACEA’s contribution and potential impact have been demonstrated in an industrial evaluation in which Machine Translation systems were adapted to specific and specialized domains. In terms of effort, producing a domain-adapted bilingual glossary of 1000 entries with PANACEA reduces the cost from 30 person-hours to 0.5 person-hours, a 60-fold reduction. In terms of quality, using automatically produced resources had no significant negative effect on the translation quality of the systems. A human evaluation showed that PANACEA domain-tuned SMT gained up to 6% in quality with respect to the untuned baseline, and that, for language pairs like Italian-German, the quality of SMT with automatically acquired LRs was not significantly worse than that achieved by other state-of-the-art systems such as Google Translator.
More details can be found at the following link: http://cordis.europa.eu/wire/index.cfm?fuseaction=article.Detail&rcn=35911