Articles tagged ‘Computation and Language (cs.CL)’

Category theory, logic and formal linguistics: some connections, old and new

By • Jan 30, 2014 • Category: Criticism

We seize the opportunity of the publication of selected papers from the "Logic, categories, semantics" workshop in the Journal of Applied Logic to survey some current trends in logic, namely intuitionistic and linear type theories, that interweave categorical, geometrical and computational considerations. We thereafter present how these rich logical frameworks can model the way language conveys meaning.

Complexity measurement of natural and artificial languages

By • Dec 2, 2013 • Category: Environment

We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and specific word diversity. Code text written in artificial languages showed higher entropy than text of similar length expressed in natural languages. Spanish texts exhibit more symbolic diversity than English ones. Results showed that algorithms based on complexity measures differentiate artificial from natural languages, and that text analysis based on complexity measures reveals important aspects of their nature. We propose specific expressions to examine entropy-related aspects of texts and estimate the values of entropy, emergence, self-organization and complexity based on specific diversity and message length.
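To make the quantities in this abstract concrete, here is a minimal sketch that computes the standard Shannon entropy of a text's word-frequency distribution, together with a simple diversity ratio. It is not the authors' expression, which the abstract does not give. The function names, the whitespace tokenization, and the definition of specific diversity as distinct words divided by total words are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper may tokenize differently.
    return text.lower().split()


def word_entropy(text):
    """Shannon entropy, in bits per word, of the word-frequency distribution."""
    words = tokenize(text)
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def specific_diversity(text):
    """Distinct words divided by message length (an assumed definition)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return len(set(words)) / len(words)


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    print(word_entropy(sample), specific_diversity(sample))
```

A text that repeats a single word has zero entropy under this measure, while a text in which every word is distinct reaches the maximum of log2(N) bits for a message of N words.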

Exploiting Similarities among Languages for Machine Translation

By • Sep 27, 2013 • Category: Science and Technology

Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures from large monolingual data and a mapping between languages from small bilingual data. It uses distributed representations of words and learns a linear mapping between the vector spaces of the languages. Despite its simplicity, our method is surprisingly effective: we can achieve almost 90% precision@5 for translation of words between English and Spanish. This method makes few assumptions about the languages, so it can be used to extend and refine dictionaries and translation tables for any language pair.
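The idea of a linear mapping between word-vector spaces can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the two-dimensional vectors are invented, the helper names (`fit_linear_map`, `translate`) are hypothetical, and the fit uses plain gradient descent on the squared error between mapped source vectors and their known target vectors. Candidate translations are then ranked by cosine similarity to the mapped vector.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def fit_linear_map(pairs, dim_src, dim_tgt, lr=0.1, epochs=2000):
    """Fit W to minimize the mean of ||W x - z||^2 over (x, z) pairs."""
    W = [[0.0] * dim_src for _ in range(dim_tgt)]
    n = len(pairs)
    for _ in range(epochs):
        grad = [[0.0] * dim_src for _ in range(dim_tgt)]
        for x, z in pairs:
            pred = matvec(W, x)
            for i in range(dim_tgt):
                err = pred[i] - z[i]
                for j in range(dim_src):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(dim_tgt):
            for j in range(dim_src):
                W[i][j] -= lr * grad[i][j]
    return W


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(W, x, target_vocab, k=5):
    """Return the k target words whose vectors are closest to W x."""
    mapped = matvec(W, x)
    ranked = sorted(target_vocab,
                    key=lambda w: cosine(mapped, target_vocab[w]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Invented toy vectors: the Spanish space is the English one rotated by 90 degrees.
    english = {"one": [1.0, 0.0], "two": [0.0, 1.0], "three": [1.0, 1.0]}
    spanish = {"uno": [0.0, 1.0], "dos": [-1.0, 0.0], "tres": [-1.0, 1.0]}
    seed_pairs = [(english["one"], spanish["uno"]),
                  (english["two"], spanish["dos"])]
    W = fit_linear_map(seed_pairs, dim_src=2, dim_tgt=2)
    print(translate(W, english["three"], spanish, k=1))
```

Here the map is fitted on two seed pairs only, and the held-out word "three" is translated by nearest-neighbor lookup in the target space. Returning the top k candidates rather than a single word is what a precision@5 style evaluation would score.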

The most controversial topics in Wikipedia: A multilingual and geographical analysis

By • May 31, 2013 • Category: Sociology

We present, visualize and analyse the similarities and differences between the controversial topics related to “edit wars” identified in 10 different language versions of Wikipedia. After a brief review of the related work, we describe the methods developed to locate, measure, and categorize the controversial topics in the different languages. We present visualizations of the degree of overlap between the top-100 lists of most controversial articles in different languages and of the content-related geographical locations. We discuss what the presented analysis and visualizations can tell us about the multicultural aspects of Wikipedia and, in general, about cultures of peer production, with a focus on universal and, specifically, local features. We demonstrate that Wikipedia is more than just an encyclopaedia; it is also a window into divergent social-spatial priorities, interests and preferences.