December 20, 2016

Everyone Does sRGB Wrong Because Everyone Else Does sRGB Wrong

Did you know that CSS3 does all its linear gradients and color interpolation completely wrong? All color values in CSS3 are in the sRGB color space, because that's the color space that gets displayed on our monitor. However, the problem is that the sRGB color space looks like this:

sRGB gamma curve

Trying to do a linear interpolation along a nonlinear curve doesn't work very well. Instead, you're supposed to linearize your color values, transforming the sRGB curve to the linear RGB curve before doing your operation, and then transforming it back into the sRGB curve. This is gamma correction. Here are comparisons between gradients, transitions, alpha blending, and image resizing done directly in sRGB space (assuming your browser complies with the W3C spec) versus in linear RGB:


At this point you've probably seen a bazillion posts about how you're doing color interpolation wrong, or gradients wrong, or alpha blending wrong, and the reason is because... you're doing it wrong. But one can hardly blame you, because everyone is doing it wrong. CSS does it wrong because SVG does it wrong because Photoshop does it wrong because everyone else does it wrong. It's all wrong.

The amazing thing here is that the W3C is entirely aware of how wrong CSS3 linear gradients are, but did it anyway to be consistent with everything else that does them wrong. It's interesting that while SVG is wrong by default, it does provide a way to fix this, via color-interpolation. Of course, CSS doesn't have this property yet, so literally all gradients and transitions on the web are wrong because there is no other choice. Even if CSS provided a way to fix this, it would still have to default to being wrong.

It seems we have reached a point where, after years of doing sRGB interpolation incorrectly, we continue to do it wrong not because we don't know better, but because everyone else is doing it wrong. So everyone's doing it wrong because everyone else is doing it wrong. A single bad choice done long ago has locked us into compatibility hell. We got it wrong the first time so now we have to keep getting it wrong because everyone expects the wrong result.

It doesn't help that we don't always necessarily want the correct interpolation. The reason direct interpolation in sRGB is wrong is because it changes the perceived luminosity. Notice that when going from red to green, the sRGB gradient gets darker in the middle of the transition, while the gamma-corrected one has constant perceived luminosity. However, an artist may prefer the sRGB curve to the linearized one because it puts more emphasis on red and green. This problem gets worse when artists use toolchains inside sRGB and unknowingly compensate for any gamma errors such that the result is roughly what one would expect. This is a particular issue in games, because games really do need gamma-correct lighting pipelines, but the GUIs were often built using incorrect sRGB interpolation, so doing gamma-correct alpha blending gives you the wrong result because the alpha values were already chosen to compensate for incorrect blending.

In short, this is quite a mess we've gotten ourselves into, but I think the most important thing we can do is give people the option of having a gamma correct pipeline. It is not difficult to selectively blend things with proper gamma correction. We need to have things like SVG's color-interpolation property in CSS, and other software needs to provide optional gamma correct pipelines (I'm looking at you, photoshop).

Maybe, eventually, we can dig ourselves out of this sRGB hell we've gotten ourselves into.

November 14, 2016

The Answer Is Not More Censorship, It's More Information

After being accused of censoring conservative news, Facebook fired all it's human editors, which was shortly followed by the algorithm being inundated with fake news. It now appears to be regretting that choice as it came under fire for potentially contributing to the rise of Trump.

This is consistent with a disturbing trend in liberal media, which is to outright censor any speech that is considered even remotely hurtful. Predictably, the definition of "hurtful" speech has gotten wider and wider to the point of absurdity, because that's what happens when you censor things! What kind perversion of progressivism is this? We're supposed to stand for freedom of information, intellectual free thought, and above all, free speech. There is a difference between free speech and harassment. You can ask people to be respectful when debating you, but you cannot ask them to censor their opinions, even if you find those opinions morally reprehensible. If someone wants to respectfully debate the morality of eating babies, you can't censor them for harassment because their only crime is holding an opinion that you disapprove of.

Fake news is yet another problem that is not going to be solved by more censorship. If a platform censors information that people want, no matter how wrong it is, they will simply find another platform that lets them share it. This is why Fox News exists. Instead of outright censoring fake stories, websites need to bring attention to falsehoods. A handy warning message prominently displayed on the article linking to an appropriate Snopes page would be a good step. This way, people are still free to share their fake news stories, but because they're doing it on your platform, you can use this opportunity to give anyone who sees it additional information that allows them to recognize the story as false. If you just censor the story, people will use another platform that you have no control over.

Example of flagged news story

The answer to filter bubbles is not to simply crush the filter bubble you don't agree with. The fact that this notion is even being entertained is frightening. Culture wars are not won by suffocation, they are won by changing minds, and the only way to do that is to respect free speech and be willing to debate people who have views that are wildly and totally incompatible with yours, so long as they are respectful. Otherwise, all we will do is scream at each other, and President Trump may only be the beginning of a catastrophic chasm in America.

You can't change minds by throwing stones. This applies to both sides.

August 18, 2016

Blaming Inslee for Washington's Early Prisoner Release Is Unconscionable

For a guy who keeps trying to convince me he's better than Jay Inslee, Bill Bryant seems determined to ensure I never vote for him in any election, ever.

As The Seattle Times so eloquently explains, the software bug leading to the early release of prisoners dates back to 2002, and wasn't discovered by the DOC until 2012, a year before Inslee took office! Even then, Inslee didn't learn of the problem until December 16th, after which he promptly held a news conference less than a week later. He fired multiple state workers and immediately instituted a rigorous double-checking procedure at the DOC to ensure prisoners were being released on time. Given Jay Inslee's position, I simply cannot imagine a better way to handle this, short of Jay Inslee miraculously developing mind reading powers that told him to audit the DOC for no apparent reason.

The fact that Bryant is blaming Inslee simply because Inslee was the one who discovered the problem is absolutely disgusting. I hear people learning about officials covering up mistakes all the time, and people always ask why they didn't just come forward about the problem and try to fix it. Well, that's exactly what Inslee did, and now Bryant is dragging him through the mud for it. I guess Bill Bryant is determined to make sure no good deed goes unpunished.

What kind of depraved human being would crucify Inslee for something that isn't his fault? An even better question is why they think I would vote for them. There are plenty of other real issues facing Jay Inslee that would be entirely valid criticisms of his administration, but I honestly don't care about them anymore. The only thing I know is that I don't want Bill Bryant elected to any position in any public office, ever, until he publicly apologizes for his morally reprehensible behavior towards Jay Inslee. Washington state does not need a leader that attacks people for doing the right thing just so they can get a few votes. No state in the entire country deserves a leader like that.

It's clear to me that Bill Bryant, and anyone who supports his shameless mudslinging, is the reason we can't have nice things.

July 30, 2016

Mathematical Notation Is Awful

Today, a friend asked me for help figuring out how to calculate the standard deviation over a discrete probability distribution. I pulled up my notes from college and was able to correctly calculate the standard deviation they had been unable to derive after hours upon hours of searching the internet and trying to piece together poor explanations from questionable sources. The crux of the problem was, as I had suspected, the astonishingly bad notation involved with this particular calculation. You see, the expected value of a given distribution $$X$$ is expressed as $$E[X]$$, which is calculated using the following formula:
\[ E[X] = \sum_{i=1}^{\infty} x_i p(x_i) \]
The standard deviation is the square root of the variance, and the variance is given in terms of the expected value.
\[ Var(X) = E[X^2] - (E[X])^2 \]
Except that $$E[X^2]$$ is of course completely different from $$(E[X])^2$$, but it gets worse, because $$E[X^2]$$ makes no notational sense whatsoever. In any other function, in math, doing $$f(x^2)$$ means going through and substitution $$x$$ with $$x^2$$. In this case, however, $$E[X]$$ actually doesn't have anything to do with the resulting equation, because $$X \neq x_i$$, and as a result, the equation for $$E[X^2]$$ is this:
\[ E[X^2] = \sum_i x_i^2 p(x_i) \]
Only the first $$x_i$$ is squared. $$p(x_i)$$ isn't, because it doesn't make any sense in the first place. It should really be just $$P_{Xi}$$ or something, because it's a discrete value, not a function! It would also explain why the $$x_i$$ inside $$p()$$ isn't squared - because it doesn't even exist, it's just a gross abuse of notation. This situation is so bloody confusing I even explicitely laid out the equation for $$E[X^2]$$ in my own notes, presumably to prevent me from trying to figure out what the hell was going on in the middle of my final.

That, however, was only the beginning. Another question required them to find the covariance between two seperate discrete distributions, $$X$$ and $$Y$$. I have never actually done covariance, so my notes were of no help here, and I was forced to return to wikipedia, which gives this helpful equation.
\[ cov(X,Y) = E[XY] - E[X]E[Y] \]
Oh shit. I've already established that $$E[X^2]$$ is impossible to determine because the notation doesn't rely on any obvious rules, which means that $$E[XY]$$ could evaluate to god knows what. Luckily, wikipedia has an alternative calculation method:
\[ cov(X,Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - E(X))(y_i - E(Y)) \]
This almost works, except for two problems. One, $$\frac{1}{n}$$ doesn't actually work because we have a nonuniform discrete probability distribution, so we have to substitute multiplying in the probability mass function $$p(x_i,y_i)$$ instead. Two, wikipedia refers to $$E(X)$$ and $$E(Y)$$ as the means, not the expected value. This gets even more confusing because, at the beginning of the Wikipedia article, it used brackets ($$E[X]$$), and now it's using parenthesis ($$E(X)$$). Is that the same value? Is it something completely different? Calling it the mean would be confusing because the average of a given data set isn't necessarily the same as finding what the average expected value of a probability distribution is, which is why we call it the expected value. But naturally, I quickly discovered that yes, the mean and the average and the expected value are all exactly the same thing! Also, I still don't know why Wikipedia suddenly switched to $$E(X)$$ instead of $$E[X]$$ because it stills means the exact same goddamn thing.

We're up to, what, five different ways of saying the same thing? At least, I'm assuming it's the same thing, but there could be some incredibly subtle distinction between the two that nobody ever explains anywhere except in some academic paper locked up behind a paywall that was published 30 years ago, because apparently mathematicians are okay with this.

Even then, this is just one instance where the ambiguity and redundancy in our mathematical notation has caused enormous confusion. I find it particularly telling that the most difficult part about figuring out any mathematical equation for me is usually to simply figure out what all the goddamn notation even means, because usually most of it isn't explained at all. Do you know how many ways we have of taking the derivative of something?

$$f'(x)$$ is the same as $$\frac{dy}{dx}$$ or $$\frac{df}{dx}$$ even $$\frac{d}{dx}f(x)$$ which is the same as $$\dot x$$ which is the same as $$Df$$ which is technically the same as $$D_xf(x)$$ and also $$D_xy$$ which is also the same as $$f_x(x)$$ provided x is the only variable, because taking the partial derivative of a function with only one variable is the exact same as taking the derivative in the first place, and I've actually seen math papers abuse this fact instead of use some other sane notation for the derivative. And that's just for the derivative!

Don't even get me started on multiplication, where we use $$2 \times 2$$ in elementary school, $$*$$ on computers, but use $$\cdot$$ or simply stick two things next to each other in traditional mathematics. Not only is using $$\times$$ confusing as a multiplicative operator when you have $$x$$ floating around, but it's a real operator! It means cross product in vector analysis. Of course, the $$\cdot$$ also doubles as meaning the Dot Product, which is at least nominally acceptable since a dot product does reduce to a simple multiplication of scalar values. The Outer Product is generally given as $$\otimes$$, unless you're in Geometric Algebra, in which case it's given by $$\wedge$$, which of course means AND in binary logic. Geometric Algebra then re-uses the cross product symbol $$\times$$ to instead mean commutator product, and also defines the regressive product as the dual of the outer product, which uses $$\nabla$$. This conflicts with the gradient operator in multivariable calculus, which uses the exact same symbol in a totally different context, and just for fun it also defined $$*$$ as the "scalar" product, just to make sure every possible operator has been violently hijacked to mean something completely unexpected.

This is just one area of mathematics - it is common for many different subfields of math to redefine operators into their own meaning and god forbid any of these fields actually come into contact with each other because then no one knows what the hell is going on. Math is a language that is about as consistent as English, and that's on a good day.

I am sick and tired of people complaining that nobody likes math when they refuse to admit that mathematical notation sucks, and is a major roadblock for many students. It is useful only for advanced mathematics that take place in university graduate programs and research laboratories. It's hard enough to teach people calculus, let alone expose them to something useful like statistical analysis or matrix algebra that is relevant in our modern world when the notation looks like Greek and makes about as much sense as the English pronunciation rules. We simply cannot introduce people to advanced math by writing a bunch of incoherent equations on a whiteboard. We need to find a way to separate the underlying mathematical concepts from the arcane scribbles we force students to deal with.

Personally, I understand most of higher math by reformulating it in terms of lambda calculus and type theory, because they map to real world programs I can write and investigate and explore. Interpreting mathematical concepts in terms of computer programs is just one way to make math more tangible. There must be other ways we can explain math without having to explain the extraordinarily dense, outdated notation that we use.

April 28, 2016

The GPL Is Usually Overkill

Something that really bothers me about the GPL and free software crusaders in general is that they don't seem to understand the nuances behind the problem they are attempting to solve. I'm not entirely sure they are even trying to solve the right problem in the first place.

The core issue at play here is control. In a world run by software, we need control over what code is being executed on our hardware. This issue is of paramount importance as we move into an age of autonomous vehicles and wearable devices. Cory Doctorow's brilliant essay, Lockdown: The coming war on general-purpose computing, gives a detailed explanation of precisely why it is of such crucial importance that you have control over what software gets executed on your machine, and not some corporation. Are you really going to buy a car and then get into it when you have no way of controlling what it does? This isn't some desktop computer anymore, it's a machine that can decide whether you live or die.

It is completely true that, if everything was open source, we would then have total control over everything that is being run on our machines. However, proponents of open source software don't seem to realize that this is the nuclear option. Yes, it solves your problem, but it does so via massive overkill. There are much less extreme ways to achieve exactly what you want that don't involve requiring literally everything to be open source.

Our goal is to ensure that only trusted code is run on our machine. In order to achieve this goal, the firmware and operating system must be open source. This is because both the operating system and the firmware act as gatekeepers - if we have something like Intel's Trusted Execution Technology, we absolutely must have access to the firmware's source code, because it is the first link in our trust chain. If we can make the TXT engine work for us, we can use it to ensure that only operating systems we approve of can be run on our hardware.

We now reach the second stage of our trust chain. By using a boot-time validation of a cryptographic signature, we can verify that we are running an operating system of our choice. Because the operating system is what implements the necessary program isolation and protection mechanisms, it too is a required part of our trust chain and therefore must be open source. We also want the operating system to implement some form of permission restriction - ideally it would be like Android, except that anything running on the OS must explicitly tell you what resources it needs to access, not just apps you download from the store.

And... that's it. Most free software zealots that are already familiar with this chain of events would go on to say that you should only install open source software and whatever, but in reality this is unnecessary. You certainly want open source drivers, but once you have control over the firmware and the operating system, no program can be run without your permission. Any closed-source application can only do what I have given it permission to do. It could phone home without my knowledge or abuse the permissions I have granted it in other ways, but the hardware is still under my control. I can simply uninstall the application if I decide I don't like it. Applications cannot control my hardware unless I give them explicit permission to do so.

Is this enough? The FSF would argue that, no, it's not enough until your entire computer is running only open source code that can be verified and compiled by yourself, but I disagree. At some point, you have to trust someone, somewhere, as demonstrated by the Ken Thompson Hack. I'm fine with trusting a corporation, but I need have total control over who I am trusting. If a corporation controls my firmware or my operating system, then they control who I trust. I can ask them politely to please trust or not trust some program, but I can't actually be sure of anything, because I do not control the root of the trust chain. Open source software is important for firmware and operating systems, because it's the only way to establish a chain of trust that you control. However, once I have an existing chain of trust, I don't need open-source programs anymore. I now control what runs on my computer. I can tell my computer to never run anything written by Oracle, and be assured that it will actually never trust anything written by Oracle.

If I sell you a piece of software, you have a right to decompile it, modify it, hack it, or uninstall it. This does not, however, require me to explain how it works to you, nor how to assemble it. If you trust me, you can run this piece of software on your hardware, but if you don't trust me, then a verifiable chain of trust will ensure that my software never runs on your machine. This is what matters. This is what having control means. It means being able to say what does or doesn't run on your computer. It does not require that everything is open source. It only requires an open source, verifiable chain of trust, and faith in whatever company made the software you've chosen to run. If a company's software is proprietary and they do something you don't like, use something else.

Open source code is important for firmware and operating systems, not everything.