This is a guest post by Ray Patterson
Read the first part here.
Crossmating: The Altcoin Boom
By the mid-summer of the year 2013, more than a hundred altcoins were up and running, with almost half of them appeared in the latest couple of months. Should we mention that all those 'newbies' were LItecoin forks and utilized scrypt? Another trend of the season was an upstart Proof-of-Stake from PPcoin, so scrypt+PoS combo could be called 'standard alt-coin-beginner package'.
Such (quantitative) popularity of scrypt and exponential growth of Bitcoin complexity led to a simple thought: scrypt-ASICs will appear as soon as they are profitable. Despite the fact that giant November bubble (when Bitcoin was rated up to $1200) was far from beginning to balloon, the search for the new PoW function started again.
How can we diversify a standard hash-function? Well, for instance, by using another standard hash-function! Sifcoin was the first to introduce an idea of subsequent hashing with a few popular hash-functions, using the finalists of SHA3 contest: Blake, BMW, Groestl, JH, Keccak, Skein. The idea is relatively simple: six different algorithms mean six different ASIC cards, i.e. (at a first glance) many and expensive. Moreover, if a backdoor is found for one algorithm (not necessarily a crack, just a faster solution to the problem 'many leading zeros') the complete solution will still hold.
Pioneers do not always get credit for their discoveries: this six-hash algorithm got popular (and acquired a name) in the first Sifcoin fork named Quark. It later gave birth to a couple dozens of altcoins, and one of them surpassed "daddy's" image. It was Darkcoin (called DASH now). The new PoW it used was named X11 and differed, clearly, by number of utilized hash-functions. It did not introduce any innovations (except rounds' sequence), so X11 popularity (second after scrypt) most probably comes from success of Darkcoin itself, which, indeed, had many other changes (some being smart, others not quite).
X11 appeared in the beginning of 2014, and was working only on CPU for a while, bringing joy to the users. However, Darkcoin complexity jumped up two times in April, and this led to a reasonable suspicion that a GPU-miner had been created. As no one came out, a contest for 'official' development of such software was announced, funds were gathered, and in a month hashrate grew ten times higher with a new GPU-miner. This miner turned out to be 5 times as efficient as CPU (10 times for scrypt).
Currently, there are many varieties of X11 and Quark, some of them even have unique names like X14, X15... Still, everyone understands that this algorithm is not ASIC-resistant in the real sense of the word. Calculating 11 different hashes instead of 1 means, roughly, making an assembly line 11 times longer. In other words, R&D breakeven point was simply moved away a few times.
Due to different marketing techniques some SHA-3 finalists became popular as solo-versions. For instance, Keccak itself, being SHA-3. It is quite clear: "SHA-2, improved!" That is, there was no specific concept for ASIC resistance, except the one that developing and starting production takes at least a year. On the other hand, we have a real, brand-new modern standard! A special Skeincoin appeared as well, using (according to its name) Skein algorithm (must have been Bruce Schneier's fans).
Variety: Cryptographic Algorithms
The spire of Evolution was completed, and cryptocurrency-users' thoughts turned back to the memory-bound algorithms. Take scrypt, for example: a good algo, but they used wrong parameters!
It seems like some thought was given to the subject, as there were no other currencies with changed static parameters of scrypt. However, the idea of dynamic memory usage was realized in no less than two ways:
Scrypt-N. Let us remind you of three scrypt parameters selected in Tenebrix and other Litecoins: N=1024, r=1, p=1. The last one is responsible for possibility of paralleling, hence we do not need it. N and r increase the required memory volume, with r increasing number of calling the mixing function Salsa20, i.e. it has a greater "CPU/memory cost" ratio than N. Due to this, developers decided to regularly increase N twofold, making the algorithm utilize more memory. For example, Vertcoin requires only 512 kB at the start (N=4096), in a year its needs would grow up to 1024 kB etc. Re-writing a GPU-miner, according to creators, is piece a cake, but no one is going to make a new ASIC every year.
Scrypt-jane. Basically, this is the same idea of increasing N, but it is not performed on a regular basis but governed by a pseudo-random formula with nonlinear relationship to current time. Although N is increased monotonically, ok, "not decreasing", the periods between increasing time look more like a diverging sequence (6,3,,3,9,3,24,12,36...). In addition, scrypt-jane uses a few internal mixing (Salsa20/8, ChaCha20/8 and Salsa6420/8) and hash (SHA2, BLAKE, Skein, Keccak) functions.
Another memory-bound PoW function based on different principles was Momentum, implemented in BitShares. It is very simple:
For example, we want to sign data D. First we get H = Hash(D), where Hash() - is some cryptographic hash-function.
Let us find such values A and B that BirthdayHash(A + H) = BirthdayHash(B + H), with BirthdayHash() being a memory-bound function, as scrypt.
Now, if Hash(H + A + B) < TargetDifficulty (read: begins with n zeros), than it is finished. Victory! Otherwise, go back to step 2.
As we see, most work is done on step 2, when searching for collisions of two hashes. Naturally, we do not need to discover 256 colliding bits but significantly less, however, it is a very complex task. For instance, we need to find collision of the first 64 bits, and it requires 2^64 hashing operations... or does it?
A superhero is here to help us: birthday problem. It comes down to the following: probability of finding a collision among some set increases quadratically with increasing number of elements (because the number of unique pairs inside the set grows). Practically, it provides us with the following evaluation: in order to find ANY 64-bit collision with 50% probability, we only need to generate 2^32 hashes (4 million instead of 18 trillion, a very nice saving).
Why this principle does not work for 'common' PoW, which, per se, means searching for collisions? The key word here is ANY collision. If we talk about Bitcoin, the collision should be found for a given sample: N zeros in the beginning, and in Momentum N any first bits should collide.
We have here an obvious middle ground between 'time' and 'memory'. As envisioned by its creator, the algorithm should use approximately 2 GBs of memory per stream, meaning that an average user will be able to perform a few simultaneous searches. In this case, all that ASICs (memory NOT being their strong point) can do is watch and be jealous (or be quadratically faster).
There is, however, a setback. All previous PoW algorithms operated 'instantly', meaning that they were performing an 'enumeration', and each attempt required fixed time with equal probability for success. Every checked hash was like throwing a dice with 2^256 edges, and you could take another dice (start working on another block) without affecting chances. What this means for a miner? It means that when he receives any new transaction he wants to include into block, he has to refresh the hashed structure ('take another dice'). It only takes a fraction of a second, so losses are negligibly small. With Momentum, it is not that simple. Every hashing iteration and adding new hash into global 2 GB table has increased probability of success than the previous one! And it is clear that refreshing block header and starting to build a table 'from scratch' is extremely unfavorable. If your chances of finding solution are increasing with every dice throw, it is better NOT to take a new dice. That is, eventually, accepting new transactions and 'resetting' the progress is unfavorable for Momentum miner, meaning that only the transactions that had arrived before the work on this block started. In case of Bitcoin, it would mean that average transaction confirmation time would have increased from 10 to 20 minutes.
Mineral resources: Useful cryptocurrencies
Standing apart is a single PoW function that appeared in Summer of 2013. Before switching to it, we need to recognize a problem with proof-of-work in general. I have to note that some people might not regard it as a problem (on the contrary!), but we are talking about something that many people consider to be a problem.
All this work is useless! All these hashes... no one needs them anywhere except cryptocurrencies. Well, you can, of course, print out and put on a wall the smallest discovered hashes, but they have no practical use. We are not going into disputes like 'should this work be useful or not', but we want to point out the fact that inventing a PoW function providing some benefit is an exotic task.
Still, such function was found. Developer named SunnyKing (he was the one to invent "or, at least, the first to implement" the Proof-of-stake scheme) presented the following idea: the proof of performed work is a discovered sequence of prime numbers conforming to some requirements. Or, rather, it should be Cunningham chain of the first or second kind (so called bi-twin primes).
Firstly, why it is useful. These prime numbers are, naturally, far from small, or there would be no need to search them: about hundreds of figures in decimal notation. Still, it is not enough for RSA encryption, and it also requires random numbers. Cunningham chains are more of a curious mathematical structure, which (theoretically) can lift another veil in the theory of numbers. After all, before open-key cryptography appeared, this complete field was regarded as 'beautiful, useless science', and any efforts, even of such 'competition'-type, are not vain.
Now we can talk about how prime numbers can be connected to cryptocurrency blocks. Suppose our chain starts with some P number. Block header hash (the one searching the nonce field) is interpreted as an integer. This number should be divisor for P-1 or P+1. Network complexity is length of required prime numbers chain, mostly varying from 7 to 11. That is all. It means that it is impossible (actually, very-very hard) to use a random chain (or another's chain, from another block) for proving the work on your header. There are two problems we can point out: firstly, we do not know if there is a simpler way of solving the Primecoin (this is cryptocurrency's name) task. Hashes are more understandable: attack for finding the cryptographic hash-function operand is absolutely hopeless (at least, this assumption is the basis for all cryptographic icons, and God forbid its fall!), so probability of finding a block for some given hash is zero. It is more difficult to say the same about Primecoin block (where we can use all P-1 and P+1 divisors).
Secondly, mining complexity, as we have already pointed out, is proportional to the chain length. However, length is an integer, and it does not cover changes of fractional part. This means that complexity change from 9.0 to 9.99 will be unnoticeable, but 9.99 to 10 will significantly influence total hash rate and frequency of discovering new blocks.
Moreover, this function seems to utilize only CPU-powers, without memory. For quite some time there was no GPU-miner, but as soon as it appeared it became obvious that Primecoin was not an anti-ASIC Grail. Maybe, this was the reason why it still occupies its own niche with a couple of forks, and its PoW function never became popular.
PoWer Sapience: Back to CPU-mining
Finally, let us take a look at a completely different branch of cryptocurrencies PoW phylogenetic tree: CryptoNight and CuckooCycle (GitHub), which was evolving parallel to others after scrypt failed as a CPU-only algorithm.
CryptoNight is a name of hash-function in the CryptoNote code, which differs from Bitcoin in many other ways (starting with the fact that it is not a fork at all). This function was developed in 2012 by Bytecoin team in cooperation with CryptoNote researchers-developers, and Bytecoin became the first implementation of the technology.
CryptoNight exploits the scrypt general idea of "big data sheet with random requests". However, the creators noted a significant problem with linear middle ground 'time' - 'memory'. The second (main) layer in scrypt builds every new data block based on the previous one. It means that if we store only every second block out of N, we will need to recalculate it in 50% cases. Total number of such random requests to the set is N as well, so by saving 1/2 of memory volume we will have to calculate N + 1/2N = 150% blocks, 1.5 times as much. Similarly, by storing every third block, we would not notice it in 33% cases, recalculate one extra block in 33% cases and two extra blocks in the rest 33% cases. It means that total amount of work is N + 1/3N + 1/3*2N = 2N = 200%. So, as per calculations in CryptoNote whitepaper, saving 1/s of all data increases total amount of work only in (s-1)/2 times. By the way, this might be the operating principle of scrypt-ASICs
Therefore, CryptoNight has three principal differences:
Every block is calculated on the basis of ALL previous ones. It means that, say, throwing away every second block will lead to the necessity of recalculating all the blocks instead of only one.
Every request to the data set is not only read, but write as well. It means that every element has a 'second dimension': time. Thus recalculating is getting even more complicated.
Total number of requests to the data set is much greater than number of elements (2^20 vs 2^15), meaning that resultant data set is transformed to the state when it is not recognizable by the end of the cycle.
Moreover, 64-bit operations (multiplying and totaling) and AES as mixing function are used internally. It is curtsey to modern CPUs with integrated corresponding functions (and a stone thrown into GPU's garden). Total memory volume required by CryptoNight is 2 MB, i.e. L3 cache size per core. We cannot say that ASICs cannot reach this point, but cost is considerably high.
Generally, we can say that CryptoNight is very efficient practical algorithm. Although a GPU-miner exists for it, but it only has significant advantage over old CPUs (32 bits or small cache). As far as we can judge, all details are aimed at practical efficiency of common PCs (of which there are billions).
CuckooCycle can be called a complete counterpart. Firstly, it possesses a theoretical basis. I do not mean that it only exists in theory (there are no cryptocurrencies using it), but I mean that its reliability is based on finding a very complicated solution to a theoretically informational task: finding cycles in bigraph. Basically, all modern open-key cryptography uses this principle.
General idea (without going head-deep into the graph theory) is the same as Momentum's, getting optimal algorithm for solving this task at the cost of using some memory. All computations are minimized, so it is request to the memory that consumes most time. Naturally, checking the solution is performed faster by far.
The main question that we can ask is this: is the described algorithm really an optimal solution? Can it happen that tomorrow another cunning CS-article appears, proposing a more efficient solution that does not need much memory? This, in general, is the difference between CuckooCycle and CryptoNight.