A linguistic Red Herring

I’m still trying to fathom the linguistics of the Climategate III text. The author(s) appear to be overly concerned with linguistics, and I quote the first few sentences in full:

It’s time to tie up loose ends and dispel some of the speculation surrounding the Climategate affair.

Indeed, it’s singular “I” this time.  After certain career developments I can no longer use the papal plural ;-)

If this email seems slightly disjointed it’s probably my linguistic background and the problem of trying to address both the wider audience (I expect this will be partially reproduced sooner or later) and the email recipients (whom I haven’t decided yet on).

They write to tie up loose ends, dispel speculation … then, rather than doing this, they focus on the linguistics of “I” vs. “WE” and go on to state explicitly that the linguistics are odd, attributing this to their “linguistic background”. There is certainly a strong emphasis on language right from the second paragraph.

This was probably what sent me off trying to locate the likely linguistic background from the turns of phrase used. At first I thought that would be a Latin-derived language, but there are also a number of noteworthy phrases from the Germanic language group (which may have been picked up as phrases). But nothing I found linked the phrases used to foreigners speaking English. If they had arisen because the author’s first language wasn’t English, the odd phrases should have been readily found on the internet being used by foreign speakers. They weren’t.

So, my next assumption was that this was some kind of computer-generated speech. The obvious example would have been to pass the text through a language translation program several times. This would tend to “standardise” the words used into the lingua franca of the world, “USScandoLatinoChienglish”.

However, having translated a few texts using online software, I’m familiar with the kind of oddities they produce and the kind of phrases that don’t translate well. And the author’s text is full of the oddest turns of phrase.

Next, as I explained in the WUWT post, I tried searching for the turns of phrase used in the text. These, as I said, suggest some kind of link to (or copying from) Republicans.

However, this morning, going back to read the text, I was hit by the very high profile given to “linguistics” in the introductory text. Clearly the author(s) had this on their mind. This has started me wondering whether there is a way to create this text using some kind of algorithm which, like the encryption of the emails, is certain to hide the linguistic origin of the author and which might explain the apparent connection with Republicans (and not Democrats).

So, how could this be achieved? Computers don’t understand intention, so any algorithm must deal with words and phrases and not concepts. So, the original text has to be written by a person. But English is a wonderful language in which many words share the same meaning. E.g. hole, pit, trench … could all be used in the phrase “when in a … stop digging”. So, an obvious means to obfuscate text is to search for replaceable words and replace them (or not) at random with similar words. Likewise, the grammar of the language has several variants.

E.g. “whom …. I haven’t decided yet on”. This phrase might have been originally written: “whom …. I haven’t yet decided on”. A computer could have recognised the linguistic elements and, knowing that this pattern can be replaced by another “acceptable” version, swapped one for the other, resulting in an entirely new phrase.
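To make the word-swapping idea concrete, here is a minimal sketch in Python. It is purely illustrative: the synonym table, the function name `obfuscate` and the `swap_probability` parameter are my own assumptions, and there is no suggestion that the author of the text used anything like it.

```python
import random

# Hypothetical, hand-made synonym table; a real tool would need a thesaurus.
SYNONYMS = {
    "hole": ["pit", "trench"],
    "cat": ["feline"],
    "mat": ["carpet"],
}

def obfuscate(text, rng, swap_probability=0.5):
    """Replace known words with a random synonym, or leave them alone, at random."""
    out = []
    for word in text.split():
        core = word.strip(".,;:!?")          # ignore trailing punctuation
        options = SYNONYMS.get(core.lower())
        if options and rng.random() < swap_probability:
            word = word.replace(core, rng.choice(options))
        out.append(word)
    return " ".join(out)

rng = random.Random(0)  # fixed seed only so that runs are repeatable
print(obfuscate("When in a hole, stop digging.", rng, swap_probability=1.0))
print(obfuscate("The cat sat on the mat.", rng, swap_probability=1.0))
```

Note that the sketch only swaps single words; it knows nothing about meaning or word order, which is exactly the limitation discussed next.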

But can this explain phrases such as “did little to garner my trust in the state of climate science”? Perhaps an alternative might have been “to improve my trust”, but that doesn’t carry the same subtle meaning as “garner my trust” (one of the phrases used predominantly by US Republicans). So, the specific language which appears to show a connection to Republicans contains subtleties in meaning which would have been lost by any automated substitution algorithm.

Which leads me to my next potential candidate: “manual replacement”. In concept, one picks each sentence, searches for ???? and finds equivalent text which one then uses to replace the original.

E.g. The cat sat on the mat.

Search cat and we find “feline”. Perhaps looking for sat we find no suitable change but mat is easily replaced by carpet. We then try to rearrange the sentence leading to:

On the carpet sat the feline.

However, my feeling is that this is very dangerous, because we are substituting the most common words (at least for us) with our second choices, which are a set of even less common words and possibly even more unique and illuminating about us. E.g. everyone uses the word cat, but “feline” in the above example may only be used by those with a scientific education. To see how much quicker that identifies the person: if each word we use is used by 99% of the population then, treating each word choice as independent, after 100 words about 37% of the population will commonly use all those words. In contrast, if we select words used by only 90% of the population, after 100 such words we find that only about 27 people in a million would use that set.

So, intentionally obfuscating the text can paradoxically make it easier to identify the individual because we replace common words with those more likely to identify us.
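The arithmetic behind this argument is easy to check. The sketch below is only that, a sketch: the function name is mine, and it assumes (unrealistically) that each word choice is independent of the others, so the fraction of people using all the words is just the per-word share raised to the power of the word count.

```python
def fraction_sharing_vocabulary(word_share, word_count):
    """Fraction of people expected to use every one of `word_count` words,
    assuming each word is used by `word_share` of the population and that
    word choices are independent (a strong simplification)."""
    return word_share ** word_count

common = fraction_sharing_vocabulary(0.99, 100)
rarer = fraction_sharing_vocabulary(0.90, 100)
print(f"words used by 99% of people: {common:.1%} of the population use all 100")
print(f"words used by 90% of people: {rarer * 1_000_000:.0f} people per million use all 100")
```

With common words roughly a third of the population remains after 100 words; with the slightly rarer words the pool shrinks to a few dozen people per million.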

Which still leaves me struggling to explain the linguistics in this text, which are decidedly “ODD”. And as Sherlock Holmes would say: “when we have excluded all other alternatives … ” I am left with the following:

  • That the author was a geek who used an extremely complex text authoring system
  • That the author was a geek who used a multitude of methods including computer-generated speech, adding favourite phrases and e.g. adding intentionally bad grammar.
  • That the author wasn’t a geek, didn’t understand information theory and how such a long piece of text was bound to identify them to a large degree.

So which is it? I am still struggling to understand how anyone with even a modicum of knowledge of coding (i.e. hacking) would have produced such a lengthy piece of text. Short messages are the hardest to decrypt. Likewise, short texts are the least identifiable and a text of 976 words is just crazy IF someone wanted to keep their identity secret.

So, could the text have been produced with the intention of fingering some group or individual? I’ve not checked through the text exhaustively, but as far as I can see there is no obvious “source” document on the internet. It is possible the author intended sceptics to compare the word count with someone. This could explain the odd sentences used, as the author had so many of each word to use. It may also explain the length, because a long text is needed to clearly link an individual. But it doesn’t explain why it seems to point back at the Republicans, because … who would want to lay the blame on the Republicans? Or perhaps the objective is to let the Republicans bask in the glory of outing Climategate?

The Bitcoin address.

I am not familiar with Bitcoin myself, but a quick Google search last night did not show any easy way to pay through this mechanism, and did suggest that it may not be technically secure. If they are a geek, they are confident of their ability to avoid the technical pitfalls of Bitcoin which could lead to them being discovered (or they don’t care). If they are not a geek, they may have added this purely to suggest they are a geek who feels they deserve some reward for their work. I would suggest this is a non-geek who asked a geek “how would someone get paid without being detected”. Indeed, a geek wanting to be paid would have given more instructions …. unless they are responding to someone who has already offered to pay?

The only contrary indicator is the suggestion that the Bitcoin address could uniquely identify “FOIA”.

The address can also serve as a digital signature to ward off those identity thefts which are part of climate scientists’ repertoire of tricks these days.

This suggests a concept known as a public/private key pair. This is a geek concept and I would not readily expect a non-geek to have come up with it.
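For anyone unfamiliar with the idea, the sketch below shows how a key pair lets someone prove that a later message comes from the same source: only the holder of the private key can produce a signature, but anyone with the public key can check it. This is a toy, assuming textbook RSA with tiny primes so the numbers stay readable. Bitcoin itself uses a different scheme (ECDSA), and all the names and the sample message here are my own inventions.

```python
import hashlib

# Toy RSA key pair built from the tiny textbook primes 61 and 53.
# Utterly insecure; it only illustrates the sign/verify asymmetry.
N = 61 * 53          # public modulus (3233)
PUBLIC_E = 17        # public exponent, published alongside N
PRIVATE_D = 2753     # private exponent, known only to the signer

def digest(message: bytes) -> int:
    """Reduce a SHA-256 hash to a number below N so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign_digest(h: int) -> int:
    """Only the holder of PRIVATE_D can compute this value."""
    return pow(h, PRIVATE_D, N)

def verify_digest(h: int, signature: int) -> bool:
    """Anyone who knows (N, PUBLIC_E) can check the signature."""
    return pow(signature, PUBLIC_E, N) == h

h = digest(b"a later message claiming to come from FOIA")
signature = sign_digest(h)
print(verify_digest(h, signature))            # signature matches this digest
print(verify_digest((h + 1) % N, signature))  # a different digest is rejected
```

The point for this discussion is simply that whoever wrote that sentence understood that a published address can double as proof of identity.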

\\\\\\\\\\\Slashes\\\\\\\\\\\\\\

The use of: “one-thing\another” was quite obscure. To quote Wikipedia on the history:

Bob Bemer introduced the “\” character into ASCII on September 18, 1961, as the result of character frequency studies. In particular the \ was introduced so that the ALGOL boolean operators ∧ (AND) and ∨ (OR) could be composed in ASCII as “/\” and “\/” respectively. Both these operators were included in early versions of the C programming language supplied with Unix V6, Unix V7 and more currently BSD 2.11.

In other words, there is no linguistic reason for using a backslash rather than the more common “one-thing/another”. I am however reminded that DOS and Linux/Unix use alternative versions of the slash for path separators (DOS and Windows use “\”, Linux/Unix use “/”). However, if someone were trying to suggest the author was a geek, they may have intentionally changed all the “/” to “\” to try to suggest they come from an environment that uses “\”. I can’t quite believe anyone is that stupid, which strongly suggests this is a joke. … but then again I can’t believe people accept the non-science of doomsday warming. So, on reflection, one shouldn’t discount people doing things just because I think they look stupid.
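The two path conventions are easy to demonstrate with Python’s standard `pathlib` module, which can build a path in either style regardless of the machine it runs on. The variable name `parts` is just for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("one-thing", "another")
print(PureWindowsPath(*parts))  # DOS/Windows style, joined with a backslash
print(PurePosixPath(*parts))    # Linux/Unix style, joined with a forward slash
```

So “one-thing\another” is what someone steeped in DOS or Windows paths might type out of habit, which is equally what someone wanting to look like one would type.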

A few final comments:

  • Producing any text in the manner of another person is very difficult and usually fails. So producing 1000 words of such text would be either the work of a linguistic genius or a madman.
  • Whether a geek or not, I’m certain the person is an active supporter of the Republican Party.
  • The text is very unlike the typical sceptic … all moralising and talking about “the poorest”.  And if anything, hackers tend to be on the extreme of sceptics: social isolates with an overwhelming passion for technology. The author has a passion for politics (or the text was carefully crafted to appear so … by someone who has a high social intellect … not the strongest facet of many sceptics or geeks).
  • I’m unnerved by the sheer stupidity of putting such a long message into the public realm … and the high-profile talk of linguistics and oddities like “\”: is this another red herring? I think this is either a geek or someone who is far more familiar with linguistics than computing, which would suggest that e.g. bad grammar has been introduced intentionally.
  • The last possibility is a “committee” of authors. Ask a dozen people to add a few lines … they all do so in their own style, which MIGHT explain the odd language.
  • Having searched, there appears to be no event triggering this release, so as far as I can see, the author has had many years to write this.
  • The odd language may add credence to the suggestion that it was/is a lone individual, because a committee tends to remove the “outlying” linguistics i.e. tone it down.

Summary

KISS … keep it simple, or the most obvious explanation. Following my analysis, I’ve ruled out the idea that the author is a UEA insider. A UEA insider may be the original source, but the author comes from a very different environment. In particular, I feel the author is not technical. I do not believe they could have carried out any hack. The phraseology strongly suggests a link to the US Republicans. The reference to “linguistics” appears to be a clumsy attempt to suggest the author is not a native English speaker, which is in sharp contrast to where the phraseology points, and therefore the clumsy use of language is probably intentional.

Moreover, the whole tone is one of an astute “politician” writing to a non-sceptic audience. There are whole phrases and concepts which, in years of reading sceptic blogs, I’ve never seen. Like “Those millions and billions already struggling with malnutrition, sickness, violence, illiteracy, etc.  don’t have that luxury.  The price of “climate protection” with its cumulative and collateral effects is bound to destroy and debilitate in great numbers, for decades and generations.”

So, the author is uniquely politically aware (for a sceptic), which reinforces my view that they were not involved in the technical IT side of events such as hiding the source of the original files using proxy servers (or such like). So this strongly suggests that there is someone else involved who is the technical wizard who could e.g. hack (un)realclimate. And … I can’t even find my own files on my own PC, so for an outsider to work out where the UEA stored all its material still seems unlikely. So, I still think the likeliest source is a disgruntled insider: someone who has likely had no involvement since they found someone to whom they could send the original files (or perhaps it was just some outline information which allowed a more detailed hack).

Could I have done it?

I’ve often wondered if I could, and yes, technically I have the right background to at least try, but I simply lack the experience, the hardware and … the interest. So, I don’t think someone just “does it” because if they did, unless by sheer dumb luck they got it right, they were bound to leave breadcrumbs which someone (probably not the Norfolk police) could follow.

The first problem was that despite having an active interest in the HadCRUT stats … I didn’t know it was the University of East Anglia that kept the data. So, if I had wanted to “hack the data”, I’d have tried the Met Office and prominent US scientists. So either they had expert guidance on who to hack … or an insider led them to the material.

How to hack? It sounds quite a dedicated occupation. I would certainly want a dedicated PC, because the kinds of sites visited by hackers are ones which would try to hack back and e.g. introduce viruses “for fun”. So, I’d want a complete physical barrier between my hacking and my normal PC. And it would have to be anonymised … which might mean stealing a Wi-Fi connection from a neighbour or using public cafes … which wouldn’t be easy.

As I said, finding the relevant emails would be a nightmare. I can’t find files on my own PC. I wouldn’t willingly go searching through a whole department or even University’s file server looking for something. Maybe if I already worked with such systems, it would be a lot easier.

Encrypting. I think I would go paranoid trying to work out how to bundle up everything in a way that didn’t leave an “Author=Mike Haseler. Address … Tel. Date they did this” littered around.

Could I have written this text? No … IT’S TOO LONG. It would have taken me months to produce this text, and it certainly wouldn’t be written like this. It might however contain the inference that it was the US Republicans “what did it”. So ….

I cannot see any way that any individual could both be a hacker and the author of this text.

So we are still not being told the truth by “FOIA” whoever (s\th)he are.

Addendum

Just been reading the original “readme” doc with Climategate II. The foreword, which is the main identifiable text (the rest quotes the emails), contains a mere 137 words. Not only that, but the sentence structure is simple and straightforward and therefore lacks unique features which could help identify the individual. So, it is largely anonymous. Interestingly, whilst some of the subjects are similar, the focus is quite different. Climategate II focuses on facts like “$2 a day”, whereas Climategate III omits the facts to emphasise the ethics: “what happens among the poorest?” In a nutshell, the Climategate II text (below) is far from media friendly and this is the BIG CHANGE to Climategate III. This adds to my conviction that this isn’t a lone geek as the content of the text would suggest.

/// FOIA 2011 — Background and Context ///
“Over 2.5 billion people live on less than $2 a day.”
“Every day nearly 16.000 children die from hunger and related causes.”
“One dollar can save a life” — the opposite must also be true.
“Poverty is a death sentence.”
“Nations must invest $37 trillion in energy technologies by 2030 to stabilize
greenhouse gas emissions at sustainable levels.”
Today’s decisions should be based on all the information we can get, not on
hiding the decline.
This archive contains some 5.000 emails picked from keyword searches.  A few
remarks and redactions are marked with triple brackets.
The rest, some 220.000, are encrypted for various reasons.  We are not planning
to publicly release the passphrase.
We could not read every one, but tried to cover the most relevant topics such
as…


6 Responses to A linguistic Red Herring

  1. Roy Hogue says:

    A very thorough analysis but unfortunately it doesn’t lead to my knowing more about FOIA than I did when I started. He/she has a conscience and may or may not be something else as well. You ask more questions than you answer. That may in the end be a good thing. But we don’t seem to be any closer to who FOIA is.

    • Roy, it is dangerous to speculate when individuals are involved.

      • Roy Hogue says:

        I agree. On the other hand, you seemed to be leading somewhere…

        • Mike Haseler says:

          I’ll be frank, and say I came up with a named individual. At which point I stopped because even if “following statistical linguistic breadcrumbs” had led me to someone it would be ridiculous (and libellous) to suggest they had any involvement.

          However, before even trying to “follow the breadcrumbs” the bigger question is how much are we following an inadvertently left trail, and how much is it a planned trail. How much are we being led up the garden path, perhaps completely away from the real truth?

          After a day of watching others, I’m now very dubious about e.g. the 500.000 or the “\” because they are so easy to fake. In contrast, the technique I used is clearly far from obvious even to most sceptics and therefore I feel it is much less susceptible to any intentional misdirection. But even if by a long shot, I had found “the person/institution”, what does it matter?

          Because the real test of whether it is a carefully planned operation (as I suspect) or a haphazard release, will only come in the way this story is fed to the press. If it falls flat (again) I think we are talking a lone individual. If however it suddenly gets legs, and the media take a lot of interest, then it suggests a more planned approach.

          So, let us assume this has been thoroughly thought through. My hunch is that any obviously juicy material would have been released before. So, I guess that they have found a more complex story, which although less obvious, must have the potential to get into the MSM. Give it a week … and my guess is we are going to see a rather complex but far from mundane story appearing, quickly followed by a lot of press activity.

          … or perhaps not.

          • Roy Hogue says:

            My suspicion is much like yours – that we have something well planned by probably one person. I suspect only one because the problem of maintaining necessary secrecy grows rapidly as a function of the number of people who know the secret. FOIA shows a firm grasp of how to stay hidden. And as you say, we only need to wait a while to see what happens.

  2. Mike Haseler says:

    The indications are that there are at least two people. One is the author of the latest text. The other is a “hacker” who was involved in hacking (un)realclimate and may have authored the Climategate II text or perhaps the current author dumbed down. However, simply on the basis that they chose their timing of the original climategate release carefully to match Jokenhagen, I cannot see them just randomly dumping this material without having planned to tie it into something else.
