How the Catalan government uses IPFS to sidestep Spain's legal block

Catalonia, with Barcelona as its capital, is currently one of Spain’s seventeen autonomous communities. It has a long history of pro-independence movements, and has even declared itself independent in the past. Nowadays, the political climate is very heated. A majority of Catalan people want to hold a referendum on independence, and a significant share of the population supports independence.

Long story short, Catalonia ended up unilaterally organizing a referendum on its independence, to be held on October 1st. The vote has been declared illegal by Spain’s courts, and the Spanish government is doing everything it can to stop it. One of Spain’s actions to try to stop the vote has been to block all websites supporting it. This includes national police forces raiding ISPs, seizing control of a number of websites offering information about the referendum, and even prosecuting people who cloned them.

Where to vote?

People are usually told where to vote through an official postal mail notification. However, the official postal carrier, Correos, is state-controlled, and hence it would immediately seize those notifications should the Catalan government send them.

With any referendum-related website being promptly shut down and no possibility of using postal mail, how is the Catalan government supposed to notify people of their assigned polling stations?

Catalonia’s answer

Catalonia’s solution involves IPFS, some crypto and some ingenuity. Here is the resulting website (as of Sep. 27): Referèndum 2017. Let’s see how it works!

A website stored in IPFS

The website is published through https://ipfs.io, which has a number of advantages for this purpose:

Using an international TLD makes it hard for Spain to mandate a redirection of the domain itself to its own servers (something it has been doing for .cat domains).
The domain owner is not related to the “independentist cause” in any way. This makes it harder to legally justify actions against the domain, more so when those actions would have to be carried out by the United Kingdom authorities (because the .io TLD operates from UK soil).
A bit obvious, but this is an https website. This makes it hard to tamper with the contents using MITM attacks through ISPs. The government may mandate ISPs to block all traffic to/from ipfs.io’s addresses, but it cannot force ISPs to show another website without triggering bad certificate warnings in browsers.
Even if Spain were to cut all connections to/from ipfs.io, the content could still be accessed (and cloned) because ipfs.io is just a proxy to the IPFS peer-to-peer distributed, content-addressed file system. Anyone can download the IPFS client and get instant access to all the content stored there.

The peer-to-peer distributed part takes care of distribution: it is nearly impossible for any actor to block access to this content because it is replicated around the network ~~automatically~~^*, using peer-to-peer encrypted connections that would be very hard to identify and block at the ISP level. Maybe China could do it, but Spain definitely cannot. [Issue: can users be easily identified?].

* Thanks to diggan from Hacker News for pointing out that content is only replicated by users explicitly requesting or pinning the content. That is, the replication is not automatic.

The content-addressed part solves any concerns regarding tampering. Catalan officials can just distribute the hash of the main page file, and everyone else can be sure that all content linked from that file has been published by the Catalan authorities.
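To make the content-addressing idea concrete, here is a minimal sketch in Node. It is a simplification and not how IPFS itself is implemented: real IPFS addresses are multihash-based identifiers computed over chunked data, whereas this example uses a plain SHA-256 hex digest. The helper names `addressOf` and `isAuthentic` are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: the "address" of a piece of content is simply its hash.
// Plain SHA-256 hex is an assumption made for illustration only.
function addressOf(content) {
  return crypto.createHash('sha256').update(content).digest('hex');
}

// A reader who obtained `trustedAddress` through a trusted channel can verify
// content fetched from any untrusted peer or proxy.
function isAuthentic(content, trustedAddress) {
  return addressOf(content) === trustedAddress;
}

const page = '<html>polling station lookup</html>';
const trustedAddress = addressOf(page);

console.log(isAuthentic(page, trustedAddress));                       // true
console.log(isAuthentic(page + '<!-- tampered -->', trustedAddress)); // false
```

Because links inside a content-addressed file are themselves hashes of the linked files, trusting the hash of the main page extends to everything reachable from it.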

There is one important challenge when using that scheme though: to be effective, all information used by the website must be public (because all content in IPFS may be accessed by anyone!). Otherwise you would need servers to hold the non-public information, and the adversaries could then attack those and render all your IPFS goodness ineffective.

Therefore, the Catalan government had to somehow compile a database that can be distributed within IPFS and easily queried using either direct URLs or at most some JavaScript. Let’s see how they did that.

A static database

In past elections, citizens were able to check their assigned voting station on a government website. To do so, they had to enter a limited set of their personal information (birth date, government ID number (roughly the equivalent of an SSN) and current zip code) and, if all of those were correct, they would get the voting station back.

This is not an ideal solution, because it can be used to “fill the gaps” of information about any citizen. If you know someone’s birth date and the area where they live, you can obtain their government ID by trial and error on that website. Of course, with a standard website this issue can be mitigated: the server may implement rate limiting, IP blocking and so on to make this kind of attack infeasible in practice.

However, Catalonia cannot use any servers in this case, because this would introduce an easy way for the Spanish authorities to render the website inoperative. The entire website had to be static, and the database distributed with it. To see how they did it, let’s first see how the website queries that information. On the “Where to vote?” page, you are prompted to enter some of your details:

DNI (national ID): this is an 8-digit + 1 control character ID. Every Spanish citizen has one of those. However, the website only uses the last 5 digits and the letter. Hence, this always has the form [0-9]{5}[A-Z].
Birth date: in YYYYMMDD format.
Zip code: a five-digit number.
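As a rough sketch of the three input formats listed above, the following validators check each field before it is used. The function and constant names are hypothetical and the patterns are derived from the descriptions in this post, not from the website's source. The sample DNI is made up and its control letter is not checked.

```javascript
// Hypothetical validators for the three form fields.
const DNI_SUFFIX = /^[0-9]{5}[A-Z]$/; // last 5 digits + control letter
const BIRTH_DATE = /^[0-9]{8}$/;      // YYYYMMDD (digits only, no calendar check)
const ZIP_CODE = /^[0-9]{5}$/;        // five-digit zip code

// Keep only the last 5 digits and the control letter of a full DNI.
function normalizeDni(fullDni) {
  return fullDni.trim().toUpperCase().slice(-6);
}

function isValidInput(dni, birth, zip) {
  return DNI_SUFFIX.test(dni) && BIRTH_DATE.test(birth) && ZIP_CODE.test(zip);
}

console.log(normalizeDni('12345678z'));                   // 45678Z
console.log(isValidInput('45678Z', '19800229', '08001')); // true
console.log(isValidInput('5678Z', '19800229', '08001'));  // false
```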

This information is combined to form a key that will be used to look up the corresponding polling station. This raises a huge potential privacy concern, because, if these keys were published in plain text, they would be exposing some personal details of all Catalan citizens! Well… how did they do it?

Here comes the crypto

Adapted for easier understanding, the code that looks up the polling station is the following:

Edit: I made a huge mistake here, swapping the lookup and key variables (the lookup is derived from the key, not the other way around!). Thanks geofft for pointing it out, it is corrected now!

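function lookup(dni, birth, zip, callback) {
  var found = false;
  var key = dni + birth + zip;
  // Decryption password: the key hashed 1714 times.
  var passkey = sha256_times(key, 1714);
  // Lookup key: one more hash, so it does not reveal the password.
  var search = sha256(passkey);

  // The first 4 hex characters select the directory and file to fetch.
  var dir = search.substring(0, 2);
  var file = search.substring(2, 4);
  var path = db_path + dir + "/" + file + ".db";
  var lines = readfile(path).split("\n");

  lines.forEach(function(line) {
    // Each line starts with the remaining 60 hex characters of the lookup key.
    if (line.substring(0,60) == search.substring(4)) {
      found = true;

      var plaintext = decrypt(line.substring(60), passkey);
      callback(plaintext.split('#'));
    }
  });

  if (!found) {
    callback("not-found");
  }
}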

There are three cryptographically interesting functions used here. Let’s check their guts to understand exactly what they are doing. Keep in mind that the crypto.* functions come straight from Node’s crypto package:

function sha256(text) {
  return crypto.createHash('sha256')
        .update(text)
        .digest('hex');
}

function sha256_times(text, times) {
  var result = text;
  for (var x=0; x < times; x++) {
    result = sha256(result);
  }
  return result;
}

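function decrypt(text, key) {
  // createDecipher derives the AES key and IV from the password itself;
  // it is deprecated in newer Node.js versions in favor of createDecipheriv.
  var decipher = crypto.createDecipher('aes-256-cbc', key);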
  var dec = decipher.update(text, 'hex', 'utf8');
  dec += decipher.final('utf8');
  return dec;
}

Not too complicated. Basically, the code iterates a sha256 computation 1714 times to get a password for decryption, and then hashes once more to get the lookup key. Then, this lookup key is used to locate one or more matching lines, and the password is used to decrypt that line’s content (which results in the voting station information).
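The derivation just described can be sketched on its own, without the file access or decryption. The helper `locate` below is hypothetical; it only shows which directory, file and line prefix a given set of inputs maps to. The sample inputs are made up.

```javascript
const crypto = require('crypto');

function sha256(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

function sha256Times(text, times) {
  let result = text;
  for (let i = 0; i < times; i++) result = sha256(result);
  return result;
}

// Hypothetical helper: derive everything the lookup needs from the form fields.
function locate(dni, birth, zip, iterations = 1714) {
  const passkey = sha256Times(dni + birth + zip, iterations); // decryption password
  const search = sha256(passkey);                              // public lookup key
  return {
    passkey,
    dir: search.substring(0, 2),          // 2 hex chars: 256 possible directories
    file: search.substring(2, 4) + '.db', // 2 hex chars: 256 files per directory
    linePrefix: search.substring(4),      // remaining 60 hex chars identify the line
  };
}

const where = locate('45678Z', '19800229', '08001');
console.log(where.dir + '/' + where.file, where.linePrefix.length);
```

Splitting the lookup key this way spreads the records over 256 * 256 = 65536 small files, so a browser only has to download one of them per query.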

I am not a crypto expert, and hence I cannot identify any glaring mistake here. It seems to me that the personal information can only be recovered by brute-forcing. But how hard would it be? Let’s make an educated estimate!

The DNI part has 10^5 * 23^1 possible combinations (there are only 23 possible letters).
The birth date part has approximately 365 * 100 (we assume nobody is more than 100 years old).
The zip code part can be reduced if we assume that people live in Catalonia. That is because Spanish zip codes are a 2-digit province code plus a 3-digit area code. There are only 4 provinces in Catalonia, and hence there are only 4 * 10^3 possible zip codes in Catalonia. We could further reduce this because not all zip codes are actually used and anyone can easily obtain a list of all valid codes, but we’ll ignore this to get an estimate.
Combining everything, we have approximately 10^10 * 23 * 365 * 4 = 33580 * 10^10 possible combinations. Therefore, the key space is about 48 bits.
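The arithmetic above can be double-checked with a few lines of Node. The variable names are only for illustration.

```javascript
const dniCombos = 1e5 * 23;    // last 5 digits + 1 of 23 control letters
const birthCombos = 365 * 100; // one century of birth dates, ignoring leap days
const zipCombos = 4 * 1e3;     // 4 provinces x 1000 area codes

const keySpace = dniCombos * birthCombos * zipCombos;
const bits = Math.log2(keySpace);

console.log(keySpace);        // 335800000000000
console.log(bits.toFixed(1)); // 48.3
```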

This sounds awfully low to me. I suspect that with modern hardware you would be able to transform the entire database into a plaintext database given some patience and not-that-much money. If we assume we have to compute 1715 sha256 hashes to check whether a generated key is in the database, checking the entire space would take 575897 * 10^12 hashes. Using the bitcoin wiki as a reference, we can come up with some numbers:

Hardware	Hashes/s	Approx. Time
AntMiner S9	28 * 10^9	238 card days
AMD 7970	825 * 10^3	22135 card years
Nvidia Tesla S2070	749 * 10^3	24381 card years
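The times in the table follow from dividing the total number of hashes by the hash rate. Here is a small sketch of that calculation, using rates from the table. It assumes the hardware could run this particular hash chain at its advertised rate, which may not hold for mining ASICs built specifically for Bitcoin.

```javascript
const KEY_SPACE = 1e5 * 23 * 365 * 100 * 4e3; // candidate keys
const HASHES_PER_KEY = 1715;                   // 1714 iterations + 1 lookup hash
const SECONDS_PER_DAY = 86400;

// Days needed by a single device to try every candidate key.
function daysToSearch(hashesPerSecond) {
  return (KEY_SPACE * HASHES_PER_KEY) / hashesPerSecond / SECONDS_PER_DAY;
}

console.log(Math.round(daysToSearch(28e9)));        // 238 (days)
console.log(Math.round(daysToSearch(749e3) / 365)); // 24381 (years)
```

Since every candidate key can be checked independently, the work splits evenly across devices: N devices would need roughly 1/N of the time.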

Well, it doesn’t seem that feasible for a lone hacker like myself, but it is definitely possible for a well-funded attacker. Also, consider that this is the effort it would take to build an entirely plain-text database. If you know someone’s DNI and where they live, you could most probably brute-force their birth date using just your laptop and without taking much time!
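To see why the targeted case is so much cheaper, note that with the DNI and zip code fixed only about 36500 dates remain, which means roughly 36500 * 1715 ≈ 63 million sha256 computations. The sketch below illustrates the idea against a toy, made-up record. The names `searchKey` and `findBirthDate` are hypothetical, the set of published keys stands in for the real per-file lines, and the iteration count is reduced so the example runs instantly.

```javascript
const crypto = require('crypto');

function sha256(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

function searchKey(dni, birth, zip, iterations) {
  let result = dni + birth + zip;
  for (let i = 0; i < iterations; i++) result = sha256(result);
  return sha256(result); // one extra hash, as in lookup()
}

// Try every calendar day in [fromYear, toYear] until the derived search key
// appears in the set of published keys.
function findBirthDate(dni, zip, publishedKeys, fromYear, toYear, iterations) {
  const day = new Date(Date.UTC(fromYear, 0, 1));
  while (day.getUTCFullYear() <= toYear) {
    const birth = day.toISOString().slice(0, 10).replace(/-/g, ''); // YYYYMMDD
    if (publishedKeys.has(searchKey(dni, birth, zip, iterations))) return birth;
    day.setUTCDate(day.getUTCDate() + 1);
  }
  return null;
}

// Toy "database" with a single made-up record and few iterations for speed.
const ITER = 10;
const published = new Set([searchKey('45678Z', '19800229', '08001', ITER)]);
console.log(findBirthDate('45678Z', '08001', published, 1980, 1981, ITER)); // 19800229
```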

Open issues

We have already seen that it is possible (albeit hard) to rebuild the entire database in plain text. This means that the Catalan government has potentially disclosed a database containing part of its citizens’ IDs, birth dates, postal codes and voting stations. None of this information is crucial (unlike SSNs, Spanish IDs are not supposed to be kept secret), but if these techniques become more common this could end badly (especially when people start cross-referencing different databases).

Also, there is something every hacker should know these days: never roll your own crypto. It is way too easy to shoot yourself in the foot… did this happen here? I hope someone more knowledgeable than myself can answer this question.

Finally, there is a potential “political” problem regarding the entire approach to circumvent Spain’s block. IPFS more or less guarantees the availability of the information, yet as far as I know it does not guarantee anonymity. Notice that Spain is already pressing charges against at least 10 people who cloned this website. Thus, they could just as easily add a rogue IPFS node that collects the IPs of citizens sharing it through IPFS and prosecute them too.

All in all… do you think Catalonia’s move was a good or bad one here? What could they have done differently? Is this approach a solid one for anyone striving to avoid state-level censorship of any kind?

Very exciting days for a Catalan hacker!

There is a lot of discussion going on at Hacker News; you may want to check it out.