28 January, 2016

Translating Alien Languages

Epistemic Status: Likely

Related: Three Worlds Collide

I'm learning Lojban sorta off and on, mostly off, and yesterday I was idly pondering whether the meanings of words in Lojban would be biased by the words used in their English descriptions (I'm not considering this seriously, since I doubt anyone would make a conlang without knowing a few languages). That eventually led me to think about languages in general, and about how dictionaries create endless circularity of meaning.

i.


Here's an example of this circularity I generated:

Real: something which exists in reality

Reality: the set of real things

So from that you'd get pathological cases like:

Real: something which exists in the set of things which exist in the set of things which ...

But another way to think about it is as a graph, where 'real' and 'reality' are two nodes, each with an arrow pointing to the other.

That exorcises the infinite recursion and makes the circularity obvious.
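A definition graph like this can be checked for circularity mechanically instead of being expanded forever. Here is a minimal Python sketch, assuming a hypothetical `DEFINITIONS` table that maps each word to a few hand-picked key words from its definition; a depth-first search reports the loop as soon as it walks back onto a word already on its current path.

```python
from __future__ import annotations

# Hypothetical table: each word maps to a few key words from its definition.
DEFINITIONS: dict[str, list[str]] = {
    "real": ["exist", "reality"],
    "reality": ["set", "real"],
}


def find_cycle(graph: dict[str, list[str]], start: str) -> list[str] | None:
    """Depth-first search returning the first definitional cycle reachable
    from `start`, or None. Words without an entry are treated as leaves."""
    path: list[str] = []       # words on the current DFS path, in order
    on_path: set[str] = set()  # same words, for O(1) membership checks
    done: set[str] = set()     # words whose definitions are fully explored

    def visit(word: str) -> list[str] | None:
        if word in on_path:
            # Back edge: the cycle runs from the repeated word to here.
            return path[path.index(word):] + [word]
        if word in done:
            return None
        path.append(word)
        on_path.add(word)
        for neighbor in graph.get(word, []):
            cycle = visit(neighbor)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.discard(word)
        done.add(word)
        return None

    return visit(start)


print(find_cycle(DEFINITIONS, "real"))  # ['real', 'reality', 'real']
```

The search terminates because every word is expanded at most once, which is exactly the "exorcism" the graph view buys you.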

Then I thought of the general case: the same kind of graph, scaled up to a whole vocabulary, with each word pointing to the words in its definition.

This is more of a proof of concept or prototype than anything, but it should give you a sense of the mental image behind my big idea(tm).

ii.


The idea is that, in the 'complete' map of a language, every word is connected to every 'relevant' word: the words in its definition, and maybe its synonyms, antonyms, etc.

This would probably be enough to uniquely identify a word by its connections alone, especially since there would be e.g. antonym connections, subset connections, etc.

Call these 'semantic skeletons'.
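One way to store a semantic skeleton is as a set of typed edges. The sketch below is a minimal Python version under that assumption; the class name `SemanticSkeleton` and the relation names (`made_of`, `is_a`, `has_property`) are hypothetical choices for illustration. Its `signature` method describes a word purely by the kinds of connections it has, ignoring its neighbors' labels, which is one simple way to check whether connections alone tell words apart.

```python
from __future__ import annotations

from collections import Counter


class SemanticSkeleton:
    """Hypothetical minimal skeleton: words are nodes, edges carry a relation type."""

    def __init__(self) -> None:
        self.edges: set[tuple[str, str, str]] = set()  # (source, relation, target)

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.add((source, relation, target))

    def words(self) -> set[str]:
        return {w for s, _, t in self.edges for w in (s, t)}

    def signature(self, word: str) -> tuple[tuple[str, str, int], ...]:
        """Label-free description of a word: sorted (direction, relation, count)
        triples for its edges. Neighbor labels are deliberately ignored."""
        counts: Counter[tuple[str, str]] = Counter()
        for s, rel, t in self.edges:
            if s == word:
                counts[("out", rel)] += 1
            if t == word:
                counts[("in", rel)] += 1
        return tuple(sorted((d, rel, n) for (d, rel), n in counts.items()))


sk = SemanticSkeleton()
sk.relate("tree", "made_of", "wood")
sk.relate("tree", "is_a", "plant")
sk.relate("wood", "has_property", "can burn")

for w in sorted(sk.words()):
    print(w, sk.signature(w))
```

In this toy skeleton every word already gets a distinct signature. In a real language many words would collide at one hop, which is where the wider context discussed below comes in.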

The cool part is that, so far, this talk has just been unmoored abstraction: all we've been using is intensional definitions. But a semantic skeleton would encode the ontological structure of the universe it describes. So a 'complete' network graph of a language should map nicely onto reality or, barring direct access to reality, onto another language.

If you had enough computational power, you might even be able to brute-force a match between two semantic skeletons without knowing either language.
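Here is what that brute force could look like in its simplest form: a Python sketch that tries every pairing of nodes between two skeletons and keeps the first one that carries one edge set exactly onto the other (an exact graph isomorphism). It assumes plain directed edges (word to a word in its definition) and nodes that appear in at least one edge; the "alien" words are made up for illustration.

```python
from __future__ import annotations

from itertools import permutations


def brute_force_match(
    edges_a: set[tuple[str, str]], edges_b: set[tuple[str, str]]
) -> dict[str, str] | None:
    """Try every bijection between the nodes of A and B; return the first one
    mapping A's edge set exactly onto B's, or None. O(n! * |E|) time."""
    nodes_a = sorted({w for e in edges_a for w in e})
    nodes_b = sorted({w for e in edges_b for w in e})
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return None  # different sizes can never match exactly
    target = set(edges_b)
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if {(mapping[s], mapping[t]) for s, t in edges_a} == target:
            return mapping
    return None


# Edges point from a word to a word used in its definition.
english_edges = {("tree", "wood"), ("tree", "plant"), ("wood", "burn"), ("fire", "burn")}
# The same structure with made-up alien words; no labels are shared.
alien_edges = {("zor", "vek"), ("zor", "mip"), ("vek", "tal"), ("quo", "tal")}

print(brute_force_match(english_edges, alien_edges))
```

This example recovers a unique translation only because every node has a different in/out degree pattern; a more symmetric skeleton could admit several equally valid matches.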

iii.


In reality, trees are made of wood.

So the semantic skeleton would have a 'tree' node connected to a 'wood' node, and maybe to a 'can burn' node. Without the context provided by the nice labels, you wouldn't know what those meant; it'd just be a node connected to two others. But with the billions of nodes and quadrillions of connections in a complete deconstruction of language, there might be enough context to make a few educated guesses.
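A standard way to squeeze identity out of unlabeled structure is color refinement (the 1-dimensional Weisfeiler-Lehman algorithm); it is swapped in here as an illustration, not something the post specifies. Every node starts with the same color, and each round recolors a node by its own color plus the colors of its neighbors, so more rounds mean more context. The extra `wood -> material` edge is a hypothetical bit of added context. Known limitation: color refinement cannot separate nodes in highly symmetric graphs, such as regular graphs.

```python
from __future__ import annotations


def refine_colors(edges: set[tuple[str, str]], rounds: int) -> dict[str, int]:
    """Color refinement on a directed graph. Node names are used only as
    dictionary keys; the coloring itself never looks at them."""
    nodes = sorted({w for e in edges for w in e})
    out_nb: dict[str, list[str]] = {n: [] for n in nodes}
    in_nb: dict[str, list[str]] = {n: [] for n in nodes}
    for s, t in edges:
        out_nb[s].append(t)
        in_nb[t].append(s)
    colors = {n: 0 for n in nodes}  # no labels: everything looks identical
    for _ in range(rounds):
        sigs = {
            n: (
                colors[n],
                tuple(sorted(colors[m] for m in out_nb[n])),
                tuple(sorted(colors[m] for m in in_nb[n])),
            )
            for n in nodes
        }
        # Compress each distinct signature to a small integer color.
        palette = {sig: i for i, sig in enumerate(sorted(set(sigs.values())))}
        colors = {n: palette[sigs[n]] for n in nodes}
    return colors


edges = {("tree", "wood"), ("tree", "can burn"), ("wood", "material")}
for r in range(3):
    print(r, len(set(refine_colors(edges, r).values())))  # distinct colors per round
```

After one round, 'can burn' and 'material' still look the same (each is just "something pointed at"); after a second round, the extra context separates all four nodes.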

But since a complete ontological map of reality is really nontrivial, probably impossible in general, and likely impossible even in the limit, any map you came across would only be a partial map.


But those are details.

Say you had two ontological maps and needed to find a mapping between them. Brute force would take exponential time (factorial, really: there are n! ways to pair up two sets of n nodes), but it should be possible.
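With partial maps an exact match usually won't exist, so the brute force becomes an optimization: find the mapping that agrees on the most connections. That is closely related to the maximum common subgraph problem, which is NP-hard. A minimal Python sketch, assuming map A is the smaller one and using made-up alien words:

```python
from __future__ import annotations

import math
from itertools import permutations


def best_partial_match(
    edges_a: set[tuple[str, str]], edges_b: set[tuple[str, str]]
) -> tuple[dict[str, str], int]:
    """Brute-force the injective map from A's nodes into B's nodes that
    preserves the most edges of A. Assumes A has no more nodes than B."""
    nodes_a = sorted({w for e in edges_a for w in e})
    nodes_b = sorted({w for e in edges_b for w in e})
    target = set(edges_b)
    best_map: dict[str, str] = {}
    best_score = -1
    # permutations(nodes_b, k) yields every injective map of k nodes into B.
    for image in permutations(nodes_b, len(nodes_a)):
        mapping = dict(zip(nodes_a, image))
        score = sum((mapping[s], mapping[t]) in target for s, t in edges_a)
        if score > best_score:
            best_map, best_score = mapping, score
    return best_map, best_score


partial_english = {("tree", "wood"), ("wood", "burn")}
alien = {("zor", "vek"), ("vek", "tal"), ("mip", "tal")}

mapping, score = best_partial_match(partial_english, alien)
print(mapping, score)
print(math.perm(4, 3))  # 24 candidate maps for this tiny example
```

The candidate count is what kills you: matching two 20-word maps this way means checking 20! (about 2.4 × 10^18) pairings.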


iv.


From Three Worlds Collide:


The Lord Programmer smiled, ever so slightly.  "You see, that enormous dump of data they sent us - I think that was their Local Archive, or equivalent.  A sizable part of their Net, anyway.  Their text, image, and holo formats are utterly straightforward - either they don't bother compressing anything, or they decompressed it all for us before they sent it.  And here's the thing: back in the Dawn era, when there were multiple human languages, there was this notion that people had of statistical language translation.  Now, the classic method used a known corpus of human-translated text.  But there were successor methods that tried to extend the translation further, by generating semantic skeletons and trying to map the skeletons themselves onto one another.  And there are also ways of automatically looking for similarity between images or holos.  Believe it or not, there was a program already in the Archive for trying to find points of linkage between an alien corpus and a human corpus, and then working out from there to map semantic skeletons... and it runs quickly, since it's designed to work on older computer systems.  So I ran the program, it finished, and it's claiming that it can translate the alien language with 70% confidence.  Could be a total bug, of course.  But the aliens sent a second message that followed their main data dump - short, looks like text-only.  Should I run the translator on that, and put the results on the main display?"

I'm embarrassed it took me until now to understand this paragraph.
