My Thoughts on Meltdown and Spectre

First off, I don’t really expect anyone to care what I think about all this. That’s the beauty of the Internet: I can just write what I’d like on my blog, and it’s easy to ignore this backwater HTML. :)

But I do have some fairly strong thoughts on Meltdown and Spectre, which I consider to be a rather large mess and frankly fairly concerning for the future of humanity. (Certainly computing is only one of our problems though.)

Computing power

I’m not an academic or a philosopher, but I’ve always had this strong feeling that humanity would continue to push computing to the farthest it can go, that we would build Matrioshka brains or other similarly massive computers, and maybe have to move them to cold parts of the universe to run them.

But the Meltdown and Spectre issues have set me back a bit on that thinking. I’m sure it’s just a bump in the road, but it is a bit depressing. Computer security is so difficult. Here we are moving everything into the public cloud, which is based on CPUs that were never meant to be multi-tenant. If we can’t understand the CPUs that we are building now, imagine the issues we will have in the future when we have to deal with even more complexity. Perhaps Meltdown/Spectre will bring about change.

Computer security

With the recent Wi-Fi hacks, SSL being in a poor state, and now massive CPU issues, things are not looking good (and in fact may never have been). I may have to do online banking on a Raspberry Pi.

This NYT article makes a good point:

As things stand, we suffer through hack after hack, security failure after security failure. If commercial airplanes fell out of the sky regularly, we wouldn’t just shrug. We would invest in understanding flight dynamics, hold companies accountable that did not use established safety procedures, and dissect and learn from new incidents that caught us by surprise.

Who was part of the embargo?

One thing that I hope comes out in the near future is a timeline of which people and companies knew of the issue, and when. I applaud Google for figuring this out and letting people know. I think there was a lot of unfairness regarding how this all came about.

Kernel devs working the Holidays

Greg Kroah-Hartman has a pretty good post on the status of kernel devs.

Right now, there are a lot of very overworked, grumpy, sleepless, and just generally pissed off kernel developers working as hard as they can to resolve these issues that they themselves did not cause at all. Please be considerate of their situation right now. They need all the love and support and free supply of their favorite beverage that we can provide them to ensure that we all end up with fixed systems as soon as possible.

Canonical is a small company

I’ve seen a bit of complaining about Ubuntu not having a patch yet (as of my writing this, they do not), but one thing people may not know is that Canonical is a very small company. The best stats I could find suggest ~550 staff; Red Hat has ~10,000. That’s a big difference. I have a lot of empathy right now for Canonical’s kernel team.

Google P0 Team

I’ve got to give Google props for having the Project Zero (P0) team. Google does provide some pretty useful things for the global community, and P0 is one of the more important. What other large companies or public clouds do the same? Not many.

Spectre is not fixed…

Something I think is getting a little lost is that Spectre is not fixed. It seems Meltdown is mitigated, with some workloads having considerable performance impact, but Spectre is not solved. Again I go back to Kroah-Hartman’s blog. Note the part about “claim.”

Again, if you are running a distro kernel, you might be covered as some of the distros have merged various patches into them that they claim mitigate most of the problems here. I suggest updating and testing for yourself to see if you are worried about this attack vector

For upstream, well, the status is there is no fixes merged into any upstream tree for these types of issues yet. There are numerous patches floating around on the different mailing lists that are proposing solutions for how to resolve them, but they are under heavy development, some of the patch series do not even build or apply to any known trees, the series conflict with each other, and it’s a general mess.
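On the "testing for yourself" point, here is a rough sketch of one way to ask a Linux kernel what it thinks its own status is. It assumes a kernel that exposes the `/sys/devices/system/cpu/vulnerabilities` directory, which only newer or patched kernels have, so on many systems it will simply report nothing. The function names are my own, and the output is just the kernel's self-report, which is exactly the kind of "claim" mentioned above, not proof that an attack fails.

```python
from pathlib import Path

# Assumption: kernels that carry the reporting patches expose one file per
# issue in this directory; older kernels have no such directory at all.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(directory=SYSFS_DIR):
    """Return {issue name: kernel-reported status}, or {} if unavailable."""
    directory = Path(directory)
    if not directory.is_dir():
        return {}
    status = {}
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


def is_unmitigated(status_line):
    """True when the kernel itself reports the issue as still open."""
    return status_line.startswith("Vulnerable")


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("This kernel does not report mitigation status.")
    for name, line in report.items():
        flag = "!!" if is_unmitigated(line) else "ok"
        print(f"[{flag}] {name}: {line}")
```

Run it before and after a distro kernel update and compare. If the directory is missing, that tells you only that the kernel predates the reporting interface, not whether it is patched.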

Google argues that public clouds are better equipped to deal with these issues. Perhaps it’s true. Though live migration is available in other IaaS systems, and, of course, if your app is “cloud native” you don’t have to live migrate anything, just drain (perhaps using Kubernetes).

In many respects, public cloud users are better-protected from security vulnerabilities than are users of traditional datacenter-hosted applications. Security best practices rely on discovering vulnerabilities early, and patching them promptly and completely. Each of these activities is aided by the scale and automation that top public cloud providers can offer — for example, few companies maintain a several-hundred-person security research team to find vulnerabilities and patch them before they’re discovered by others or disclosed. Having the ability to update millions of servers in days, without causing user disruption or requiring maintenance windows, is difficult technology to develop but it allows patches and updates to be deployed quickly after they become available, and without user disruption that can damage productivity.

Abstractions get all the hype, and yet we still need baremetal knowledge

Many technical people don’t have much knowledge of how CPUs work, or can’t compile a Linux kernel. We now have all these high-level abstractions (lambda), but we still need knowledge of low-level things. The people I admire on Twitter aren’t working on figuring out how to deploy to AWS better; instead they are figuring out gdb and writing baremetal code in Rust or Go. This is a curious state to be in.

Diversity in Technology

I believe that diversity in technology is a good thing, and by this I mean different kinds of CPUs and other hardware, as well as operating systems and network protocols, etc. Thus Intel’s effective monopoly is not positive. I don’t know if trading Intel for AWS is better. I think we need something more.

I know that many organizations are seeking to reduce cost by using commodity hardware, which usually just means virtualization on x86. I think that may be an overly simplistic way to look at the problem of reducing cost.

Perhaps investing in other CPUs is a good idea if only to achieve some diversity and avoid systemic risk. (Frankly it would be pretty cool to have a non-x86 CPU laptop, we should just do that anyways.)

Conclusion

This is a pretty rambling post, but I think it makes sense given what’s happening this week in technology.

Bonus: fun tweets

This is interesting. This vulnerability illustrates an interesting economic upside for multi-tenant cloud providers: REVENUE. The vulnerability which they patched quickly drives CPU utilization up and they get to charge more based on CPU use! Rename Meltdown to “Cache-ing” $$$ https://t.co/fzBrpFWY4q
— Hoff (@Beaker) January 7, 2018

It's impossible to reason about computer security in a meaningful manner anymore. The gap between "architectural behavior" and "micro-architectural implementation" is so great, so dark, and is basically, "Here be Dragons." We cannot build solid structures on faulty foundations.
— Bitweasil (@Bitweasil) January 5, 2018

This means the 6 month embargo of #meltdown and #Spectre cost those that weren't in on the club one full year of time responding to it. @intel and @Google decided who would get that advantage and who wouldn't.
— John-Mark Gurney (@encthenet) January 5, 2018

pic.twitter.com/KRKypYAEiw
— 防毒面を着ているサイバーテロ狼 (@wolfniya) January 4, 2018

5 lines of JavaScript broke every single Intel processor made in the past 15 years. pic.twitter.com/fyQcHk6haJ
— Mike Pan (@themikepan) January 4, 2018

“A CPU predicts you will walk into a bar, you do not. Your wallet has been stolen”
— The Internet
— Mike Skalnik (@skalnik) January 4, 2018

Don't panic y'all!

Step 1) Don't use Intel processors.
Step 2) Don't use AMD or anything ARM based.
Step 3) You know what? Just give up technology altogether.
Step 4) Retreat to the woods and build a cabin out of derelict silicon.
Step 5) You're now Ted Kaczynski, you psycho.
— Josh Cincinnati (@acityinohio) January 4, 2018

CERT brings the harsh truth. #Meltdown #Spectre pic.twitter.com/UFPiYA39hd
— Sciuridae Hero (@attritionorg) January 4, 2018

Update #7 - Due to the incomplete information provided by hardware manufacturers, we joined forces with other impacted cloud providers including @linode, @packethost and @ovh to share information and work all together. https://t.co/iVHi72nmFJ
— scaleway (@scaleway) January 4, 2018

8/ @Canonical engineers have been working on this since we were made aware under the embargoed disclosure (Nov 2017) and have worked through the holidays, testing and integrating an incredibly complex patch set into a broad set of @ubuntu kernels and CPU architectures.
— Dustin Kirkland (@dustinkirkland) January 4, 2018