3) Combining the results of Part I and Part II
The results of the two translation approaches are combined by an algorithm that includes concept generalization. The purpose of the algorithm is to generalize the results of Parts I and II into a more acceptable Korean ConceptNet. OpenMind sentences contain words that are not found in dictionaries, which we call “out-of-vocabulary (OOV) words”; these are usually broken (misspelled) words or newly coined words. We handle them with our auto-correction word list (pairs of frequently occurring typos and their correct expressions) and a Web dictionary.
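The OOV handling described above can be pictured as a lookup cascade. The following is a minimal sketch under our own assumptions, not the authors' implementation: the function name `normalize_word`, the order of the checks (dictionary first, then the auto-correction list, then the Web dictionary), and the `web_lookup` callback standing in for the Web dictionary are all hypothetical.

```python
from typing import Callable, Optional


def normalize_word(
    word: str,
    vocabulary: set[str],
    autocorrect: dict[str, str],
    web_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return a dictionary-compatible form of `word`, or None if unresolved.

    Hypothetical cascade: dictionary -> auto-correction list -> Web dictionary.
    """
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered  # in-vocabulary word: nothing to fix
    if lowered in autocorrect:
        return autocorrect[lowered]  # frequent typo with a known correction
    if web_lookup is not None:
        return web_lookup(lowered)  # e.g. newly coined words (assumed callback)
    return None  # still OOV
```

Returning `None` for unresolved words leaves it to the caller to decide whether to keep the original token or discard the sentence.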

Because machine translation is still an active research area awaiting a breakthrough, the English-to-Korean translations of OpenMind sentences contain many errors. Our simple experiment reveals that the errors are mostly caused by complex sentences, such as sentences containing double quotation marks and long sentences. To alleviate these problems, the OpenMind sentences were preprocessed with the following schemes.
28 Y. Jung et al.
– If the length of a sentence (the number of words it contains) is greater than N (currently, N = 30), the sentence is removed.
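The length-based preprocessing rule amounts to a simple filter. In this sketch only the threshold N = 30 comes from the text; the assumption that words are whitespace-separated tokens and the names `MAX_WORDS`, `sentence_length`, and `filter_long_sentences` are our own, for illustration.

```python
MAX_WORDS = 30  # N in the text (currently 30)


def sentence_length(sentence: str) -> int:
    # Assumed tokenization: words are whitespace-separated tokens.
    return len(sentence.split())


def filter_long_sentences(sentences: list[str], max_words: int = MAX_WORDS) -> list[str]:
    # Keep sentences with at most N words; drop those longer than N.
    return [s for s in sentences if sentence_length(s) <= max_words]
```

A sentence of exactly N words is kept, matching the "greater than N" condition for removal.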

Abstract. This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing one in English, called ConceptNet. The English commonsense knowledge base is essentially a huge net consisting of concepts and relations. Triplets of the form Concept-Relation-Concept in the net were extracted from English sentences collected through a Web site from volunteers who were interested in entering commonsense knowledge.
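The Concept-Relation-Concept triplets can be viewed as labelled edges in a directed graph whose nodes are concepts. The following is a minimal illustrative sketch, not ConceptNet's actual storage format. The class names `Triplet` and `ConceptNetGraph` are hypothetical, and the relation names used in testing (`IsA`, `CapableOf`, `UsedFor`) serve only as examples.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triplet:
    """One Concept-Relation-Concept assertion."""
    head: str
    relation: str
    tail: str


class ConceptNetGraph:
    """Minimal net: concepts are nodes, triplets are relation-labelled edges."""

    def __init__(self) -> None:
        # Outgoing triplets indexed by their head concept.
        self._out: dict[str, list[Triplet]] = defaultdict(list)

    def add(self, triplet: Triplet) -> None:
        self._out[triplet.head].append(triplet)

    def related(self, concept: str, relation: Optional[str] = None) -> list[str]:
        # Tail concepts linked from `concept`, optionally filtered by relation.
        return [
            t.tail
            for t in self._out.get(concept, [])
            if relation is None or t.relation == relation
        ]
```

Indexing by head concept keeps lookups of a concept's outgoing relations cheap; a reverse index by tail concept could be added the same way if incoming relations were needed.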

