Other Linguistic Tools

AppTek uses its own proprietary CASE tool for machine translation development, which allows rapid development of new language pairs. It has also developed the following modules:

Diacritizer: A software module that introduces vowels into Arabic text. It is used in text-to-speech but can also be decoupled and used on its own.
Morphogen: A morphological analyzer that provides both stemming (reducing words to their dictionary base form) and inflection (deriving grammatical variants from base forms). It is available for all of the languages that AppTek products translate.
Language Recognizer: A software component that identifies more than 38 different languages.
LexAPI™: An application programming interface (API) for AppTek’s lexicons.
Transliteration/Romanization Tool: A tool used to identify names represented in different writing systems.
Parallel Corpora: A large set of electronic texts, grammatically tagged and aligned across languages.
Telephony/Servers: Used to collect speech data from a variety of countries and dialects.
WordTag™: An automated tool for word tagging. Grammatical parameters, such as parts of speech, can be preset by users.

Diacritizer™

The AppTek Diacritizer™ (Vowelizer) is a software component of the Text-To-Speech (TTS) engine. It adds the appropriate diacritics (short vowels) to text. In particular, the Diacritizer offers the following services:
WordTag™

WordTag™ automatically identifies all words in a document and tags them with a set of linguistic information (annotations). During tokenization, each token is tested to determine whether it is a potential word or merely a data entity (a number, an alphanumeric text segment, etc.). Data entities are tagged as such; no further linguistic identification is done on the entities according to their class. Word tokens undergo the following steps:
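
The tokenization test described in this section, which separates potential words from data entities, can be illustrated with a minimal Python sketch. It is not AppTek's implementation: the function names `classify_token` and `tag_tokens`, the regular expressions, and the class labels are all assumptions made for illustration, and the whitespace-based tokenizer is deliberately simplistic.

```python
import re

# Hypothetical patterns (assumptions, not taken from WordTag itself).
# A number: optional sign, digits, optional groups such as ".5" or ",000".
NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)*")
# An alphanumeric segment: contains at least one digit and one letter.
ALPHANUMERIC = re.compile(r"(?=.*\d)(?=.*[^\W\d_])[\w-]+")
# A potential word: letters only, with optional internal apostrophe or hyphen.
WORD = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def classify_token(token: str) -> str:
    """Return an illustrative class label for a single token."""
    if NUMBER.fullmatch(token):
        return "number"
    if ALPHANUMERIC.fullmatch(token):
        return "alphanumeric"
    if WORD.fullmatch(token):
        return "word"
    return "other"


def tag_tokens(text: str) -> list[tuple[str, str]]:
    """Split text on whitespace, trim edge punctuation, and label each token."""
    tagged = []
    for raw in text.split():
        token = raw.strip(".,;:!?()[]\"'")
        if token:
            tagged.append((token, classify_token(token)))
    return tagged


if __name__ == "__main__":
    for token, label in tag_tokens("Order A113 shipped 42 units."):
        print(f"{token}\t{label}")
```

In this sketch only tokens labeled `word` would be passed on to further linguistic analysis; tokens labeled `number` or `alphanumeric` stand in for the data entities that receive a tag and nothing more.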
Language Recognizer

The Language Recognizer is a tool used to identify the language and code page of any electronic text. The system can recognize more than 54 different languages and code pages.

LexAPI™

AppTek has developed a special API for its morphology, lexicon, and linguistic analyzer (LexAPI™) to facilitate integration with third-party applications. LexAPI™ adds powerful linguistic capabilities to such applications, including morphology, query translation, thematic and domain search, and linguistic attributes of words.

Transliteration/Romanization Tool

This tool transliterates the phonetic representation of a given name, word, or text from foreign languages into English and vice versa. It uses statistical information and linguistic algorithms to perform this task. Different permutations of spellings can be produced. Additionally, different options for translation standards are available.

Running Text

AppTek has parallel corpora with millions of words of running text. This text is used for text language technology, statistical applications, and speech research and development. The following domains are covered:
Corpora Tagging

AppTek has worked on grammatical tagging of the words and phrases of corpora that have been diacritized. Studying the inherent grammatical features of each word, together with its behavior and function in the sentence in relation to other words and phrases, helped trace the changes that occur when a sentence, with all of its constituents, is transformed into a target language. As a result, a correspondence was created between the tagging parameters and their source counterparts to provide the following:


