April 28, 2016

The GPL Is Usually Overkill

Something that really bothers me about the GPL and free software crusaders in general is that they don't seem to understand the nuances behind the problem they are attempting to solve. I'm not entirely sure they are even trying to solve the right problem in the first place.

The core issue at play here is control. In a world run by software, we need control over what code is being executed on our hardware. This issue is of paramount importance as we move into an age of autonomous vehicles and wearable devices. Cory Doctorow's brilliant essay, Lockdown: The coming war on general-purpose computing, gives a detailed explanation of precisely why it is of such crucial importance that you have control over what software gets executed on your machine, and not some corporation. Are you really going to buy a car and then get into it when you have no way of controlling what it does? This isn't some desktop computer anymore, it's a machine that can decide whether you live or die.

It is completely true that, if everything were open source, we would have total control over everything that is being run on our machines. However, proponents of open source software don't seem to realize that this is the nuclear option. Yes, it solves your problem, but it does so via massive overkill. There are much less extreme ways to achieve exactly what you want that don't involve requiring literally everything to be open source.

Our goal is to ensure that only trusted code is run on our machine. In order to achieve this goal, the firmware and operating system must be open source. This is because both the operating system and the firmware act as gatekeepers - if we have something like Intel's Trusted Execution Technology, we absolutely must have access to the firmware's source code, because it is the first link in our trust chain. If we can make the TXT engine work for us, we can use it to ensure that only operating systems we approve of can be run on our hardware.

We now reach the second stage of our trust chain. By using a boot-time validation of a cryptographic signature, we can verify that we are running an operating system of our choice. Because the operating system is what implements the necessary program isolation and protection mechanisms, it too is a required part of our trust chain and therefore must be open source. We also want the operating system to implement some form of permission restriction - ideally it would be like Android, except that anything running on the OS must explicitly tell you what resources it needs to access, not just apps you download from the store.
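
To make this concrete, here's a minimal sketch in C of what that boot-time check amounts to. This is not real firmware code: it uses a pinned hash instead of full signature verification to stay short, the digest value is a made-up placeholder, and a real implementation would live in the firmware itself. The principle is the same, though - measure the OS image, and refuse to boot anything the owner hasn't approved.

    /* Toy sketch of the boot-time check described above -- not real
     * firmware code. Assumes the firmware stores a pinned SHA-256 digest
     * of the one OS image the owner has approved. Links against
     * OpenSSL's libcrypto for SHA256(). */
    #include <openssl/sha.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Placeholder digest; a real owner would burn in the hash of their
     * approved kernel image. */
    static const unsigned char approved_digest[SHA256_DIGEST_LENGTH] = {0};

    /* Returns 1 if the OS image matches the owner-approved digest. */
    static int os_image_is_trusted(const unsigned char *image, size_t len)
    {
        unsigned char digest[SHA256_DIGEST_LENGTH];
        SHA256(image, len, digest);
        return memcmp(digest, approved_digest, sizeof digest) == 0;
    }

    int main(void)
    {
        /* In real firmware the image would be read from the boot device;
         * this stand-in won't match the placeholder digest, so the
         * sketch demonstrates the refusal path. */
        const unsigned char fake_image[] = "vmlinuz contents would go here";
        if (!os_image_is_trusted(fake_image, sizeof fake_image)) {
            fprintf(stderr, "refusing to boot: OS image is not trusted\n");
            return EXIT_FAILURE;
        }
        puts("image verified, handing control to the OS");
        return EXIT_SUCCESS;
    }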

And... that's it. Most free software zealots already familiar with this chain of events would go on to say that you should only install open source software on top of it, but in reality this is unnecessary. You certainly want open source drivers, but once you have control over the firmware and the operating system, no program can be run without your permission. Any closed-source application can only do what you have given it permission to do. It could phone home without your knowledge or abuse the permissions you have granted it in other ways, but the hardware is still under your control. You can simply uninstall the application if you decide you don't like it. Applications cannot control your hardware unless you give them explicit permission to do so.

Is this enough? The FSF would argue that, no, it's not enough until your entire computer is running only open source code that can be verified and compiled by yourself, but I disagree. At some point, you have to trust someone, somewhere, as demonstrated by the Ken Thompson Hack. I'm fine with trusting a corporation, but I need to have total control over who I am trusting. If a corporation controls my firmware or my operating system, then they control who I trust. I can ask them politely to please trust or not trust some program, but I can't actually be sure of anything, because I do not control the root of the trust chain. Open source software is important for firmware and operating systems because it's the only way to establish a chain of trust that you control. However, once I have an existing chain of trust, I don't need open-source programs anymore. I now control what runs on my computer. I can tell my computer to never run anything written by Oracle, and be assured that it will never actually trust anything written by Oracle.
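
To illustrate, here's a toy sketch in C (with invented names) of what such an owner-controlled signer policy could look like. A real OS would extract the signer from a verified code signature chained back to the firmware, not take a string at face value:

    /* Toy sketch of an owner-controlled signer policy -- every name here
     * is invented for illustration. The point: once you control the
     * trust chain, *you* decide whose code runs, per signer. */
    #include <stdio.h>
    #include <string.h>

    static const char *banned_signers[] = { "Oracle", "SomeVendorIDontTrust" };

    static int signer_is_banned(const char *signer)
    {
        for (size_t i = 0; i < sizeof banned_signers / sizeof *banned_signers; i++)
            if (strcmp(signer, banned_signers[i]) == 0)
                return 1;
        return 0;
    }

    int main(void)
    {
        /* A real OS would get this from a verified code signature. */
        const char *signer = "Oracle";
        if (signer_is_banned(signer)) {
            fprintf(stderr, "policy: refusing to run code signed by %s\n", signer);
            return 1;
        }
        puts("launching application");
        return 0;
    }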

If I sell you a piece of software, you have a right to decompile it, modify it, hack it, or uninstall it. This does not, however, require me to explain how it works to you, nor how to assemble it. If you trust me, you can run this piece of software on your hardware, but if you don't trust me, then a verifiable chain of trust will ensure that my software never runs on your machine. This is what matters. This is what having control means. It means being able to say what does or doesn't run on your computer. It does not require that everything is open source. It only requires an open source, verifiable chain of trust, and faith in whatever company made the software you've chosen to run. If a company's software is proprietary and they do something you don't like, use something else.

Open source code is important for firmware and operating systems, not everything.

March 13, 2016

The Right To Ignore: The Difference Between Free Speech And Harassment

On one hand, America was built on the values of free speech, which are obviously important. We cannot control what people are saying or our democracy falls apart. On the other hand, allowing harassment has a stifling effect on free speech, because it allows people to be bullied into silence. Before the internet, this distinction was fairly simple: Someone with a megaphone screaming hate speech in a park is exercising their right to free speech. Someone with a megaphone following a guy down the street is harassing them.

The arrival of the internet has made this line much more vague. However, the line between free speech and harassment is not nearly as blurry as some people might think. Our concept of reasonable freedom of speech is guided primarily by our ability to ignore it. The idea is, someone can go to a public park and say whatever they want, because other people can simply go somewhere else. As long as people have the ability to ignore whatever you're saying, you can say pretty much whatever you want. We have some additional controls on this for safety reasons, so you can't go to a park and talk about how you're going to kill all the gay people, tomorrow, with a shotgun, because that is a death threat.

Unfortunately, in the midst of defending free speech, a disturbing number of people have gotten it into their heads that other people aren't allowed to ignore them. This is harassment. The moment you take away someone's right to ignore what you are saying, you have crossed the line from free speech into harassment. Freedom of speech means that you have the right to say whatever you want, and everyone else has the right to ignore you. I'm not entirely sure why people think that free speech somehow lets them bypass blocking on a social network. Blocking people on the internet is what gives us our ability to ignore them. Blocking someone on the internet is the equivalent of walking away from some guy in the park who was yelling obscenities at you. Bypassing the blocking mechanism is the equivalent of chasing them down as they run to their car, screaming your opinions at them through a megaphone. Most societies consider this completely unacceptable behavior in real life, and it should be just as unacceptable on the internet.

On the other hand, enforcing political correctness is censorship. Political correctness is obviously something that should be encouraged, but enforcing it when someone is saying things you don't like in a public place is a clear violation of free speech. This is not a blurry line. This is not vague. If a Nazi supporter is saying how much they hate Jews, and they are not targeting this message at any one individual, this is clearly protected free speech. Now, if the Nazi is actually saying we should murder all of the Jews, this turns into hate speech because it is inciting violence against a group of people, and is restricted for the same safety reasons that prevent you from shouting "FIRE!" in a crowded movie theater.

Now, what if the Nazi supporter tells a specific person they are a dirty Jew? This gets into fuzzy territory. On one hand, we can say it isn't harassment so long as the targeted person can block the Nazi and the Nazi never attempts to circumvent this, or we can start classifying certain speech as being harassment if it is ever targeted at a specific individual. When we are debating what counts as harassment, this is the kind of stuff we should be debating. This is what society needs to figure out. Arguing that mass-block lists on Twitter are hurting free speech is absurd. If someone wants to disregard everything you're saying because you happen to be walking past a group of Nazi supporters, they have a right to do that. If someone wants to ignore everyone else on the planet, they have a right to do that.

This kind of stuff becomes a problem when the vast majority of humanity uses a single private platform - namely, Facebook - to espouse their opinions. Facebook has recently been banning people for saying "faggot", and because it's a private company, this is completely legal. It is also harmful to free speech. What appears to be happening is that websites, in an attempt to crack down on harassment, have instead accidentally started cracking down on free speech by outright banning people who say hateful things, instead of focusing on people who say hateful things to specific individuals.

Both sides of the debate are to blame for this. Extreme harassment on the internet has caused a backlash, resulting in a politically correct movement that calls for tempering all speech that might be even slightly offensive to anyone. In response, free speech advocates overreact and start attacking them by bypassing blocking mechanisms and harassing their members. This causes SJWs to overcompensate and start clamping down on all hateful speech, even hateful speech that is clearly protected free speech and has nothing to do with harassment. This just pisses off the free speech movement even more, causing them to oppose any restrictions on free speech, even reasonable ones. This just encourages SJWs to censor even more stuff in an endless spiral of aggression and escalation, until on one side, everyone is screaming obscenities and racial slurs in an incoherent cacophony, and on the other side, no one is saying anything at all.

If society is going to make any progress on this at all, we need to hammer out precisely what constitutes harassment and what is protected free speech. We need to make it absolutely clear that you can only harass an individual, and that everyone has the right to a block button. If a Nazi supporter starts screaming at you on Twitter, you have the right to block him, and he has the right to block you. We cannot solve harassment with censorship. Instead of banning people we disagree with, we need to focus on building tools that let us ignore people we don't want to listen to. If you object to everyone blocking you because you use insulting racial epithets all the time, maybe you should stop doing that, and perhaps some of them might actually listen to you.

Of course, if you disagree with me, you are welcome to exercise your right to tell me exactly how much you hate my guts and want me to die in the comments section below, and I will exercise my right to completely ignore everything you say.

February 8, 2016

Standard Intermediate Assembly For CPUs

Over in GPU land, a magical thing is happening. All the graphics card vendors and big companies got together and came up with SPIR-V as the technological underpinning of Vulkan, the as-yet-unreleased new graphics API. SPIR-V is a cross-platform binary format for compiled shaders, which allows developers to write shaders in any language that can compile to SPIR-V, and to run those compiled shaders on any architecture that supports the format. This is big news, and if it works as well as everyone's hoping it does, it will set the stage for a major change in how shaders are compiled in graphics engine toolchains.

How is this possible? The SPIR-V specification tells us that it is essentially a cross-platform intermediate assembly language. It's higher level than conventional assembly, but lower level than an actual programming language. The specifics of the language are fine-tuned for modern graphics hardware, so that the instructions encode enough metadata about what they're doing to enable hardware optimizations, while still allowing those instructions to be efficiently decoded by the hardware and implemented by the chip's microcode.

While SPIR-V is specifically designed for GPUs, the specification bears some resemblance to another intermediate assembly language - LLVM IR. This move by GPU vendors towards an intermediate assembly representation mirrors how modern language design is moving towards a standardized intermediate representation that many different languages compile to, which can then itself be compiled to any CPU architecture required. LLVM IR is used for C, C++, Haskell, Rust, and many others. This intermediate representation decouples the underlying hardware from the high level languages, allowing any language that compiles down to LLVM IR to target any of its supported CPU architectures - even asm.js.
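
As a tiny illustration of that decoupling, here's a trivial C function and the commands that push it through LLVM's pipeline. The IR shown in the comment is roughly what clang emits for it; the exact output varies with clang version and optimization level.

    /* add.c -- compile to LLVM IR with:
     *     clang -S -emit-llvm add.c -o add.ll
     * then compile the IR for any supported target, e.g.:
     *     llc -march=arm add.ll
     * The emitted IR looks roughly like:
     *
     *     define i32 @add(i32 %a, i32 %b) {
     *       %sum = add nsw i32 %a, %b
     *       ret i32 %sum
     *     }
     */
    int add(int a, int b)
    {
        return a + b;
    }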

However, we have a serious problem looming over our heads in CPU land. Did you know that the x86 mov instruction is Turing-complete? In fact, even the page fault handler is Turing-complete, so you can run programs on x86 without actually executing any instructions! The x86 architecture is so convoluted and bloated that it no longer has any predictable running time for a given sequence of instructions. Inserting random useless mov instructions can increase speed by 5% or more, false dependencies destroy performance, and Intel keeps introducing more and more complex instruction set extensions that don't even work properly. As a result, it's extremely difficult for any compiler to produce properly optimized assembly code, even when it's targeting a specific architecture.

One way to attack this problem is to advocate for RISC - Reduced Instruction Set Computer. The argument is that fewer instructions will be easier to implement, reducing the chance of errors and making it easier for compilers to actually optimize the code in a meaningful way. Unfortunately, RISC has a serious problem: the laws of physics. A modern CPU is so fast that it can process an instruction faster than the electrical signal can get to the other side of the chip. Consequently, it spends the majority of its time just waiting for memory. Both pipelining and branch prediction were created to deal with the memory latency problem, and it turns out that having complex instructions gives you a distinct advantage: the more complex your instruction is, the more the CPU can do before it needs to fetch things from memory. This was the core idea behind the Itanium instruction set, which relies on the compiler to determine which instructions can be executed in parallel in an attempt to remove the need for pipelining. Unfortunately, it turns out that removing dependency calculations is not enough - which is why many of Intel's new instructions encapsulate complex behaviors into single instructions instead of simply adding more parallel operators.

Of course, creating hardware that supports increasingly complex operations is unsustainable, which is why modern CPUs don't execute assembly instructions directly. Instead, they use microcode - the raw machine code that actually implements the "low-level" x86 assembly. At this point, x86 is so far removed from the underlying hardware that it might as well be a (very crude) high level language all by itself. For example, the mov instruction usually doesn't actually move anything; it just renames the internal register being used. Because of this, the modern language stack looks something like this:

[Image: Modern Language Stack]

Even though we're talking about four CPU architectures, what we really have is four competing intermediate layers. x86, x86-64, ARM and Itanium are all just crude abstractions above the CPU itself, which has its own architecture-dependent microcode that actually figures out how to run things. Since our CPUs will inevitably have complex microcode no matter what we do, why not implement something else with it? What if the CPUs just executed LLVM IR directly? Then we would have this:

[Image: LLVM IR Microcode Stack]

Instead of implementing x86-64 with microcode, implement the LLVM intermediate assembly code with microcode. This would make writing platform-independent code trivial, and would allow for way more flexibility for hardware designers to experiment with their CPU architecture. The high-level nature of the instructions would allow the CPU to load large chunks of data into registers for complex operations and perform more efficient optimizations with the additional contextual information.
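
As a loose analogy - and it is only an analogy, since real microcode looks nothing like C - here's a toy machine whose visible instruction set is a handful of IR-like operations, each implemented by lower-level steps inside a dispatch loop. The layering is the point: the "architecture" a program sees is just an interface, implemented by something else underneath.

    /* A toy "microcoded" machine: the visible ISA is a few IR-like
     * operations; the switch statement plays the role of microcode. */
    #include <stdio.h>

    enum op { OP_LOAD_CONST, OP_ADD, OP_PRINT, OP_HALT };

    struct insn { enum op op; int dst, src1, src2, imm; };

    static void run(const struct insn *prog)
    {
        int regs[8] = {0};
        for (const struct insn *i = prog;; i++) {
            switch (i->op) {
            case OP_LOAD_CONST: regs[i->dst] = i->imm; break;
            case OP_ADD:        regs[i->dst] = regs[i->src1] + regs[i->src2]; break;
            case OP_PRINT:      printf("%d\n", regs[i->src1]); break;
            case OP_HALT:       return;
            }
        }
    }

    int main(void)
    {
        /* r0 = 2; r1 = 3; r2 = r0 + r1; print r2 -- prints 5. */
        const struct insn prog[] = {
            { OP_LOAD_CONST, 0, 0, 0, 2 },
            { OP_LOAD_CONST, 1, 0, 0, 3 },
            { OP_ADD,        2, 0, 1, 0 },
            { OP_PRINT,      0, 2, 0, 0 },
            { OP_HALT,       0, 0, 0, 0 },
        };
        run(prog);
        return 0;
    }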

Realistically, this will probably never happen. For one, directly executing LLVM IR is probably a bad idea, because it was never developed with this in mind. Instead, Intel, AMD and ARM would have to cooperate to create something like SPIR-V that could be efficiently decoded and implemented by the hardware. Getting these competitors to actually cooperate with each other is the biggest obstacle to implementing something like this, and I don't see it happening anytime soon. Even then, a new standard architecture wouldn't replace LLVM IR, so you'd still have to compile to it.

In addition, an entire new CPU architecture is extraordinarily unlikely to be widely adopted. One of the primary reasons x86-64 won out over Itanium was because it was capable of running x86 code at native speed, and Itanium's x86 emulation was notoriously bad. Even if we somehow moved to an industry-wide standard assembly language, the vast majority of the world's programs are still built for x86, so an efficient translation between x86 and our new intermediate representation would be paramount. That's without even considering that you'd have to recompile your OS to take advantage of the new assembly language, and modern OSes still have some platform-specific hand-written assembly in them.

Sadly, as much as I like this concept, it will probably remain nothing more than a thought experiment. Perhaps as we move past the age of silicon and look towards new materials, we might get some new CPU architectures out of it. Maybe if we keep things like this in mind, next time we can do a better job than x86.

September 28, 2015

There Will Never Be One True Programming Language

A disturbing number of otherwise intelligent programmers seem to believe that, one day in the distant future, everyone will use the same magical programming language that solves everyone's problems at the same time. Often, this involves garbage collection, with lots of hand-waving about computers built with Applied Phlebotinum.

For the sake of argument, let's assume it is the year 30XX, and we are a budding software developer on Mars. We have quantum computers that run on magic pixie dust and let us calculate almost anything we want as fast as we want so long as we don't break any laws of physics. Does everyone use the same programming language?

No. Obviously, even if we built quantum computers capable of simulating a classical computer a thousand times faster than normal for some arbitrary reason[1], we would still want to write software in a way that was easy to comprehend, and qubits are anything but. So, the standard programming language our Mars programmer would be using would not interact with the quantum computer at all. Perhaps it would be some form of functional language, which the cool kids seem to think will save the entire software industry and make everything faster, better, more reliable and probably cure cancer in the process[2].

But now, this same programming language that allows you to ignore the quantum computer must also be used to write its own compiler that runs on... a quantum computer. So it needs to simultaneously be an elegant language for writing quantum programs. It also needs to be the perfect language for writing games and business applications and hypernet sites and machinery and brain implants and rocket trajectories and robot AI and planetary weather simulators and life support systems and hologram emitters and—

... Wouldn't it be a lot easier to just have domain-specific languages?

If you want to get technical, you can do all that with Lisp. It's just lambda calculus. It's also a giant pain in the ass when you use it for things it was never meant for, like low-level drivers or game development. You could try extending Lisp to better handle those edge-cases, and that's exactly what many Lisp derivatives do. The problem is that we will always be able to find new edge-cases. So, you would have to continue bolting things onto your magical language of ultimate power, playing a game of context-free whack-a-grammar for the rest of eternity.

This is ultimately doomed to fail, because it ignores the underlying purpose of programming languages: a programming language is a method of communication. Given two grammars, one that can do everything versus one that does the one thing you're trying to do really well, which one are you going to pick for that particular project? Trying to get everyone to use the same language while they are trying to solve radically different problems is simply bad engineering. Do you really want to write your SQL queries in Lisp? You can only stuff so many tools into a Swiss Army knife before it becomes completely impractical.

Of course, a language that could do everything would also allow you to define any arbitrary new language within it, but this is equivalent to building an entire new language from scratch, because now everyone who wants to contribute to your project has to learn your unique grammar. However, having a common ground for different languages is a powerful tool, and we are headed in that direction with LLVM. You wouldn't want to write a program in LLVM, but it sure makes writing a compiler easier.

Instead of a future with one true programming language, perhaps we should be focused on a future with standard low-level assembly. A future with many different languages that can all talk to each other. A future where we don't need to argue about which programming language is the best, because every language could be used for whatever task it is best suited for, all in the same project. Different compilers could interpret the standard low-level assembly in their own ways, optimizing for different use-cases.

A universal standard for low-level assembly would solve everyth—

[Image: mandatory xkcd]

... Actually, nevermind[3].



[1] As of right now, this is not actually true for the vast majority of computations we know how to do with qubits. We know that certain classes of problems are exponentially faster with quantum computers, but many other functions would get a mere linear increase in speed, at most. Whether or not this will change in the future is an active area of research.
[2] They totally didn't say the same thing about object-oriented programming 30 years ago.
[3] </sarcasm>

August 5, 2015

Abortion Has No Moral High Ground

June 26, 2015 was a historic moment. That was the day the Supreme Court legalized gay marriage on a federal level in the United States. As far as I'm concerned, this victory was inevitable, because the opposing point of view relied entirely on logical fallacies and irrational hatred. Yet it was a battle fought for centuries, and it continues, as obstinate conservatives vow to keep fighting to the bitter end. The fact that gay rights was ever a political issue will be considered barbaric by future civilizations, because being gay is inherently harmless, and attempting to control who people are allowed to love is an egregious violation of individual rights.

It was this innate emotional connection that won the battle. Every human being on this planet can empathize with the desire to love freely. We were united in our desire for love to win, just like it does in all of our fairy tales.

Abortion, on the other hand, is likely to remain an issue for hundreds of years. I'm not sure if it will ever be put to rest until we invent technology that renders the entire debate irrelevant. The problem is that, with abortion, there is no moral high ground. This is because, under all circumstances, abortion leaves us with two choices: we can pick an arbitrary point in time where a fetus suddenly has rights, or we will be forced to violate someone's rights no matter what option we choose.

First, let me clarify why we cannot base the point at which a fetus suddenly gains rights on any meaningful science: babies do not have an on switch. Consciousness is an emergent phenomenon, and we have known for years that animals have some degree of awareness. They have a consciousness much like we do, but operating on a lower level (in some cases, it may simply be on a different level). Babies can't even pass the mirror test until they're about 18 months old. Because of this, we will be forced to pick some completely arbitrary point in time at which the baby suddenly gains rights, if we want to avoid violating those rights.

Now, if we don't do that (and I find it incredibly hard to believe that we will, given the number of people who think that a fertilized egg has human rights), we have two choices:
1. We forbid abortion, violating both the mother's right to her own body and her right to live (0.019% of pregnancies are fatal).
2. We allow abortion, violating the baby's right to live.

There is no way out of this. We cannot use logic to pick a point in time where the baby suddenly gains rights, because it will inherently be arbitrary. We'd basically have to say "Well, if you're 80% human, we'll give you rights, because, well, 80% sounds like a good number." No matter how much reasoning we have behind that number, it's still fundamentally arbitrary, and therefore it will always be a valid argument to say it is morally wrong. This means we are forced to violate someone's rights no matter what we do. Some people think the baby is more important. Some people think the woman is more important. Some people think who's more important depends on how far along the baby is. There is no nice, clean solution to this. We have to screw somebody over no matter what we do.

My personal view, which is that we should allow abortions only in the first trimester, is not born of some moral argument. It is simple engineering pragmatism: it is the least objectionable and most practical solution that I can come up with. I am forced to fall back on my engineering background for this problem because there is no valid moral solution. If someone asks me why I think it's okay that we're killing a fetus, I would say that it's not okay. However, it's also not okay to deny a woman the right to her own body. It's not okay to allow unwanted pregnancies to result in unwanted children that get left in orphanages. It's not fair that the hypothetical wanted child that could have been born later will now never be born. No part of this situation is ever okay, on either side of the debate.

If we want to resolve the debate, society as a whole simply has to try and figure out how to minimize the damage. That is all we can do. There is no right answer. There are only wrong answers, and we have to pick the least wrong one.