

for lesser-studied languages like Persian. Parallel corpora possess several properties that should be taken into account during their development. The first is the structural distance between the two texts of a pair, which indicates whether the translation is literal or free. Literal and free translation are the two basic approaches of human translation. A literal translation (also known as word-for-word translation) is one that closely follows the form of the source language. It is generally accepted in the machine translation community that literal training data better suits statistical machine translation (SMT) systems at their current level of sophistication.

The second feature is the amount of noise in the text pair, where noise is defined as the number of omissions or the difference in segmentation between the two texts. Another important feature of a parallel corpus is its textual size: the value of a parallel corpus usually grows with its size and with the number of languages for which translations exist.

Other features include typological distance, error rate and the acceptable amount of manual checking.
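The noise feature (omissions and differences in segmentation) lends itself to a rough automatic estimate. The Python sketch below is an illustration only, not the procedure used for this corpus. It assumes each text has already been split into segments (for instance sentences) and that segments are paired by position. It reports the difference in segment counts and flags paired segments whose word-length ratio exceeds a threshold as possible omissions. The names `estimate_noise`, `NoiseReport` and `max_ratio`, and the default threshold of 2.0, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NoiseReport:
    source_segments: int
    target_segments: int
    segment_difference: int   # absolute difference in segment counts
    suspect_pairs: list[int]  # indices of pairs with an extreme length ratio


def estimate_noise(source: list[str], target: list[str],
                   max_ratio: float = 2.0) -> NoiseReport:
    """Rough noise estimate for a segmented text pair.

    Hypothetical heuristic: segment i of the source is assumed to be paired
    with segment i of the target, and max_ratio is an arbitrary threshold.
    """
    suspects = []
    # zip stops at the shorter text; unmatched segments are reflected
    # only in segment_difference.
    for i, (src, tgt) in enumerate(zip(source, target)):
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        if src_len == 0 or tgt_len == 0:
            suspects.append(i)  # one side is empty: likely an omission
            continue
        ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
        if ratio > max_ratio:
            suspects.append(i)  # lengths diverge strongly: possible omission
    return NoiseReport(
        source_segments=len(source),
        target_segments=len(target),
        segment_difference=abs(len(source) - len(target)),
        suspect_pairs=suspects,
    )


if __name__ == "__main__":
    src = ["The cat sat on the mat.", "It was warm.", "Then it slept."]
    tgt = ["Le chat était assis sur le tapis.",
           "Il faisait chaud et le soleil brillait fort sur la maison."]
    report = estimate_noise(src, tgt)
    print(report.segment_difference, report.suspect_pairs)  # prints: 1 [1]
```

Pairing by position is the weakest assumption in this sketch: a single omission shifts every later pair out of step. A length-based sentence aligner such as the Gale-Church algorithm would normally be run first, after which the same length-ratio check can be applied to the aligned pairs.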
