Minting Money with Monero ... and CPU vector intrinsics

I woke up on May 28th, 2014, on vacation with my family in the middle of the desert, to find a copy of my private source code plastered across the bitcointalk message board. Announced as a "new optimized version" of the Monero currency miner, it was enthusiastically adopted by cryptocurrency miners across the world. And in the process of doing so, my daily profit from the Monero Mining Project dropped by over five thousand dollars per day.

But let's start at the beginning, when I started getting in to a loose collaboration with three people I've never met---one whose name I'm not even confident I really know---with hundreds of thousands of dollars worth of a nebulous new cryptocurrency at stake.

It started with a cryptic note from someone I'd met online, with a link to a bitcointalk.org message board discussion for a new currency called "bitmonero". His note said only:

"this looks interesting."

From prior collaborations with him, I knew he had a good nose for opportunities in cryptocurrencies. Within an hour or two, he followed up noting:

"Regarding bitmonero it is profitable with aws with the wallet miner. Seem like some people have got optimised miner which is a few times faster as well so there obviously room for improvement..."

AWS, of course, was Amazon Web Services' spot market, which will let you cheaply rent mind-boggling amounts of compute power on short (1 hour) timescales. AWS profitability is a key ingredient for making money on cryptocurrency mining, because it scales: You can rent one machine or 10,000, and the only thing you have to worry about is if you start to use too many, the price will go up.

I was busy with the end of the semester and didn't respond quickly. The next day, I had an email from him offering 1 Bitcoin (about $600) to develop a 5x faster miner, and from chat, we discussed splitting the profits after that 50/50. I was still feeling overwhelmed with real work, but I took a brief look at the code, and something... well, something smelled funny. So funny that I got home, couldn't get it out of my head, and spent the next 6 hours playing around with it -- and got a 3x speedup.

The next day, I dropped my contact a note:

I'm giving it a little go at rewriting the entire thing in SSE for fun. I finally get it - they basically took scrypt as their model but make the whole thing a big long dependent chain. Hard to shave memory but at the cost of slow verification.

Followed a day later by raising that to a 5x speedup, and then an 8x speedup, and within a week, an 11x speedup. That was pretty remarkable, given that a developer had already started trickling optimizations into the codebase.

The more I looked at it, the more clear it became: The original developers deliberately crippled the miner. It wasn't just slow, and it wasn't just naive; it was deliberately obfuscated and made slow by the use of completely superfluous copies, function calls, use of 8 bit pointer types, and accompanied by the most ridiculously slow implementation of the AES encryption algorithm one could imagine.

Now, the history of "Bitmonero" (now called Monero) started to become relevant. In Feb 2014, an anonymous group of developers released a coin called "Bytecoin" based on an entirely new implementation of a bitcoin-like cryptocurrency called Cryptonote, but with much stronger anonymity built in through the use of a clever Diffie-Hellman like mechanism for encrypting destination addresses and the use of Rivest's ring signatures to provide a transaction-level mixnet without the need for realtime mixing. It's not perfect, but it's clever, and the most important thing is that it's new, and fundamentally different from Bitcoin. That attracted a lot of attention.

But when Bytecoin was released, it was presented as though it emerged from two years on the "dark web" (Tor onion sites and the like), during which time, 80% of the possible coins that could ever be minted, had been minted. The reception in the cryptocurrency community was heavily skeptical.

My strong belief is that the skepticism was warranted: Here's the original slow-hash from bytecoin as it was copied into Bitmonero. It has some doozies. For example, on line 100, you might note that for every iteration through an inner loop repeated tens of thousands of times, the AES key is re-imported into the library. The later loop, starting on line 113, is repeated half a million times, and is so abstracted through lots of memcpys and pointer manipulation it's hard to tell that all it really does is one round of AES encryption, a pointer dereference into a random scratchpad, a 64 bit multiplication, and another pointer dereference. Phew. This original code was roughly 50x slower than my final optimized code, and could have easily been used to fake two years of blockchain data on a single computer or a small cluster. I'm pretty sure that's what happened.

Bitmonero was a fork of Bytecoin designed to not have the 80% premine. But its initial developer either didn't know, didn't care, or wanted to profit from the de-optimized hashing. That initial developer was pretty quickly given the boot by the community, and in came an unrelated group of developers who took it over---who were, as far as I can tell, completely unaware of the deoptimization. So things sat there for a few weeks in the same state as Bytecoin.

By the time I got into it, developer "NoodleDoodle" (hey, this is crypto, people can pick whatever names they want -- Satoshi Nakamoto?) had already untwisted the first "de-optimization" with the AES encryption key. But the rest was ripe pickin's. Most importantly, the entire use of AES in the inner loop was one instruction on modern x86 CPUs.

This was a brilliantly designed proof-of-work function targeting the strengths of modern CPUs -- native AES encryption and fast 64 bit multipliers -- tuned to use a scratchpad exactly the size of the per-core L3 cache on Intel CPUs (about 2MB) that someone then wrapped in such a thick blanket of crap it was nearly unrecognizable until you started jumping in, tearing it apart, and putting it back together again.

Here's what it looks like without the crud (diagram shows one round): Some initial 128 bit values are determined by hashing the block state using a Keccak (sha3) variant - call them A and B. The big lookup table is also populated using that same state, mixed around using AES. Then, executed 500,000 times, are rounds of mixing as shown at the right: Use A to determine a pseudorandom location in the scratchpad, take that, mix it in, AES encrypt it (one round), use the result to determine a second location, use that in a 64 bit multiply, store it back, and repeat. Elegantly simple.

I didn't have the AWS account limits (or credit cards!) to really do it right, so I delivered some code to my contact and we went to town.

By the 14th of May, we were 45% of the total hashing power on the coin. Things started to get a little exciting:

I think i should make almost 6 btc in the last 24hr... will send you 3 if i manage to sell all the MRO.

$1500 per day was starting to seem like a very interesting ROI. I gave up on sleep for a while, and begged my family for permission to continue the project for a few hours per day while we were on vacation. Permission granted. I decided we'd made the right decision when on May 21st, we each netted 13 BTC (about $6500 USD), and 17BTC the next day. I think we exceeded 60% of the net hash of the coin at a few points. That's enough to play nasty games, but we weren't interested in that (and doing so would have required me to understand the rest of the code much more deeply than I had), so we just sold as fast as we could mine.

That was eye-opening. Because of the newness of the codebase and its actual, significant technical advances in anonymity (and a lot of growing pains and scalability challenges notwithstanding), a decent bit of Bitcoin money started flowing in to the coin. Some well-known and less well-known folks who'd made millions investing in Bitcoin started viewing Monero as an interesting high-risk hedge for a small percent of their Bitcoin holdings.

By May 16th, I'd squeezed about 90% of the optimizations out of it, and we were running 100x faster than the coin was at its initial release. That didn't stop me from obsessing about speed tweaks for the next week or so, but it gave me a little breathing room to start thinking about the next step.

The natural evolution of cryptocurrencies is that they start out being mined on CPUs, because that's easy. Then someone writes a GPU-optimized miner for them. The big ones jump to FPGA and then ASIC, though as far as the general community knows, only coins using SHA256d (Bitcoin and family) and scrypt (Litecoin, Dogecoin, etc.) have ASIC implementations.

So, why not take the next leap? We needed a GPU miner to keep ahead of the inevitable optimizations that others would release. In fact, the writing was already on the horizon -- NoodleDoodle had released another 2x speedup already, and was talking about getting an even bigger one out. It was only a matter of time until some of the dangerous experts in crypto started getting involved, and yvg1900 had been talking about developing a miner for Monero. The altcoin mining software world is small enough that there aren't too many people out there who can beat my optimizations without help, but yvg is one of them, and I knew that once he was in the game, before too long, I and everyone else would be running his code instead.

(And history showed that this was right - his final release beat my code by about 10%, and he was nice enough to share a few x86/Haswell micro-op optimization ideas with me.)

I've developed some GPU miner software as I blogged about earlier, but I'm not the best out there. Fortunately, in the process of doing so, I met some of the best. I dropped them a note, explained the opportunity, and they signed on. Pretty soon, we had in our arsenal not only a super-fast CPU miner, but also a great little GPU miner.

The beauty of having both was that it reduced our cost basis by a factor of two. While the GPU miner wasn't all that much faster than a CPU miner, due to the way pricing on the spot market often works out, you can rent a machine with a GPU for a similar price to one that only has a CPU. So within a few weeks, we were able to mine on both the GPU and CPU at the same time.

Which was fortunate, because that rock-in-my-guts May 28th incident rolled around sooner than I'd have ever guessed.

From what I can back-patch of it, my contact checked with another developer to see if there was room to further optimize my code on a contract basis. I don't know the details of that contact, so I won't speculate, but something somewhere went very wrong, and the other developer incorporated my optimizations into an open source release. Overnight, our edge started to disappear.

I gave up sleep for another night and eked out one last 5% performance boost to my code, but now we were really depending on having fast GPU code and on minimizing mining cost. We all spent some time benchmarking and running price/performance analyses about the types of instances available at Amazon, and reduced the cost a little through that. But then we got the GPU miner done, and, bogglingly, found ourselves back in business!

Things continued well for a week or two, until on June 4, another GPU developer, Claymore, released his own GPU miner -- but it targeted only AMD GPUs, and Amazon's were Nvidia. It was also more than 2x slower than ours -- three cheers for fast code. We held our breath waiting for an onslaught of new miners, but it never came. My best guess is that the newness of the Cryptonote software, as well as some difficulty getting it running because of a lot of dependencies on Boost and other libraries, scared a lot of people off and kept the technical barriers to entry high.

In the end, our game continued into July -- almost two months of mining at an advantage large enough to make a profit on Amazon. We spent over a quarter of a million dollars renting cloud compute time, to the point where I had a great phone call with the manager of the Spot instance program at Amazon who was trying to figure out how they could improve usability for weirdos like us.

It was a wild experience, and while I was, ahh, financially inconvenienced by the code leak, I'm also obliquely glad my optimizations made their way into the public (I'd have released them eventually, but hadn't planned on doing it quite so early). It's informative that throughout the many ways I've explored of getting paid to improve the state of the art in cryptocurrency software, the most profitable by over an order of magnitude was the most greedy approach of keeping the software entirely private. That's something the developers of future currencies should ponder -- it's probably worth up-front investment to pay someone to release an optimized CPU and GPU optimized miner for your coin, because if they're making a profit afterwords, it could be very hard to convince them to give up the margin.

25 lines mixing C, one instruction of inline assembly, and some AVX intrinsics -- one of the most fun, exciting, and profitable side projects I've ever been involved in. Like everything in crypto, it moved at the speed of light -- it was like launching and then folding up a small startup in the course of two months. And that's why I now have 6 high-end GPU mining rigs in my basement to heat the house through winter and some fun war stories about the importance of operational security and paying attention to low-level programming tricks in school.

(As a note: I make some positive statements about Monero in here, and I really do admire the technical advance it represents in anonymous cryptocurrencies. But please don't read anything I'm writing here as suggesting I think you should buy it -- I view all cryptocurrencies as exceptionally high risk, on a spectrum from "eek" to "stay the heck away!" There's an enormous amount of risk in all of the currencies, and while I talked about the technical positives of Monero here, it also has some very significant technical challenges. I don't personally own any, nor do I hold any Bitcoin - I mine and sell for the most part, to minimize my risk exposure.)

Search This Blog

Dave's Data

Minting Money with Monero ... and CPU vector intrinsics

Comments

Post a Comment

Popular posts from this blog

Reflecting on CS Graduate Admissions

Chili Crisp Showdown: Laoganma and Flybyjing

Finances for CS Ph.D. students