MT Technology Background

Any MT development combines two different views: (1) the linguistic view describes the linguistic structures of the two languages and the meaning-preserving transformations between them; (2) the computational view describes how linguistically motivated data structures and transformations are realized in a computer environment.

An Ad Hoc MT System: Linguistic and Computational Spaghetti

A simple machine translator disregards this distinction. A programmer might write the program code directly and scatter through it whatever linguistic data he or she decides to use in translation.

Such a simple approach may be fast, but it is not recommended. The translator would be difficult to update, and its translation "theory", if so noble a term can be applied, is not portable to other language pairs.

Sunda's MT Technology: Linguistic and Computational Views Clearly Distinct

Sunda's technology explicitly separates the linguistic view of translation from its computational view. A linguist "sees" only the linguistic facts and constraints of translation; the computational aspects of translation are invisible to him or her. The computational view is generic and fully implemented in the MT Engine. Developing a translator is therefore strictly linguistic work.

MT System: MT Engine + Set of Rule Files

Compiled linguistic rules are automatically associated with the generic MT Engine. When a designer activates the Engine, he or she immediately sees the effects of new rules on the given data. The MT Engine also gives a trace of the activated rules and operations.
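
The division of labour between a generic engine and separate rule files can be illustrated with a minimal sketch. It does not show Sunda's actual Engine or rule format: the JSON layout, the rule names, and the functions load_rules and translate are all assumptions made for illustration. The point is only that the engine code contains no linguistic data, and that it reports which rule fired for each word.

```python
import json

# Hypothetical rule-file content (assumed format, not Sunda's).
# In a real system this text would be read from a file on disk.
RULE_FILE = """
[
  {"name": "lex-house", "source": "house", "target": "talo"},
  {"name": "lex-car", "source": "car", "target": "auto"}
]
"""


def load_rules(text):
    """Parse rule-file text into a lookup table keyed by source word."""
    return {rule["source"]: rule for rule in json.loads(text)}


def translate(words, rules):
    """Generic engine: it knows nothing about either language,
    only how to look up and apply whatever rules it is given."""
    output, trace = [], []
    for word in words:
        rule = rules.get(word)
        if rule is None:
            # No rule matched: pass the word through and record that fact.
            output.append(word)
            trace.append(f"no rule: {word}")
        else:
            output.append(rule["target"])
            trace.append(f"{rule['name']}: {word} -> {rule['target']}")
    return output, trace


rules = load_rules(RULE_FILE)
output, trace = translate(["house", "car", "boat"], rules)
print(output)
for line in trace:
    print(line)
```

Adding a rule for "boat" in this sketch means editing only the rule text; the two functions stay untouched, which is the property the Engine and rule file split is meant to guarantee.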

Linguistic Rule Types

Linguistic rules come in three basic categories:

  • Default lexical rules
  • Context-sensitive lexical rules
  • Grammar rules

The English-to-Finnish translator currently has about 200 000 simple lexical rules, over 100 000 contextual lexical rules, and about 3 000 grammar rules.
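
How the three rule categories might interact can be sketched with a toy English-to-Finnish example. Everything here is an assumption for illustration: the table names, the idea that a contextual rule is keyed on the following word, and the order in which rules apply are not taken from Sunda's system. The sketch also ignores Finnish inflection entirely. The grammar rule used is article deletion, since Finnish has no articles.

```python
from typing import Dict, List, Tuple

# Default lexical rules: one translation per word, used when nothing else applies.
DEFAULT_LEXICON: Dict[str, str] = {"light": "valo", "bag": "laukku"}

# Context-sensitive lexical rules, keyed on (word, following word).
# Assumed shape: real contextual rules would be far richer.
CONTEXT_LEXICON: Dict[Tuple[str, str], str] = {("light", "bag"): "kevyt"}

ARTICLES = {"a", "an", "the"}


def drop_articles(words: List[str]) -> List[str]:
    """Toy grammar rule: remove English articles from the word sequence."""
    return [w for w in words if w not in ARTICLES]


def translate_word(words: List[str], i: int) -> str:
    """Prefer a contextual rule; otherwise fall back to the default rule."""
    word = words[i]
    following = words[i + 1] if i + 1 < len(words) else ""
    if (word, following) in CONTEXT_LEXICON:
        return CONTEXT_LEXICON[(word, following)]
    # Unknown words are passed through unchanged.
    return DEFAULT_LEXICON.get(word, word)


def translate(sentence: str) -> str:
    """Apply the grammar rule first, then the lexical rules word by word."""
    words = drop_articles(sentence.lower().split())
    return " ".join(translate_word(words, i) for i in range(len(words)))


print(translate("the light bag"))
print(translate("the light"))
```

In this sketch "light" becomes "kevyt" (not heavy) before "bag" and "valo" (illumination) otherwise, which shows why a contextual rule has to take precedence over the default rule for the same word.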

Linguistic Team

The minimum development team for a new machine translator is 2-3 linguistic developers and one part-time systems engineer.