Publication Details
Issue: Vol 2, No 5 (2025)
ISSN: 2997-9439

Abstract

In natural language processing (NLP), and in machine translation in particular, accurately aligning parallel texts requires a deep understanding of the expressions, grammatical structures, and semantics of both the source and target languages. Uzbek and English are structurally similar in that phrases in both languages consist of a head word and a subordinate word, but English additionally features compound word combinations. In Uzbek, word combinations are classified according to the grammatical properties and structure of their components: specifically, the part of speech of the head (dominant) element and the syntactic role of the subordinate element. During alignment, differences arise from the grammatical characteristics of the words that make up the phrases in each language and from their syntactic positions within sentences.

Keywords
alignment, word combination, tokenization, source language, target language (TL), grammatical structure, semantics