Monday, December 22, 2014

Holiday Haze

[Image: Happy Holidays (5318408861).jpg]

Your dedicated blogger is about to vanish in the holiday haze, presumably returning early in the new year. I hope to see you at the Boston ASSA Penn party. (I promise to show up this time. Seriously.) Meanwhile, all best wishes for the holidays.

[Photo credit: Marcus Quigmire, from Florida, USA (Happy Holidays, uploaded by Princess Mérida) [CC-BY-SA-2.0], via Wikimedia Commons]

Monday, December 15, 2014

Causal Modeling Update

In an earlier post on predictive modeling and causal inference, I mentioned my summer "reading list" for causal modeling:
Re-read Pearl, and read the Heckman-Pinto critique.
Re-read White et al. on settable systems and testing conditional independence.
Read Angrist-Pischke.
Read Wolpin, and Rust's review.
Read Dawid 2007 and Dawid 2014.
Get and read Imbens-Rubin (available early 2015).
I'm sure I'll blog on some of the above in due course. Meanwhile I want to add something to the list: Pearl's response to Heckman-Pinto. (I just learned of it.)

One of its themes resonates with me: Econometrics needs to examine more thoroughly the statistical causal modeling literature vis-à-vis standard econometric approaches. (The irreplaceable Hal White started building some magnificent bridges, but alas, he was taken from us much too soon.) Reasonable people will have sharply different views as to what's of value there, and the discussion is far from over, but I'm grateful to the likes of Pearl, White, Heckman, and Pinto for starting it.

Monday, December 8, 2014

A Tennis Match Graphic

I know you're not thinking about tennis in December (at least those of you north of the equator). I'm generally not either. But this post is really about graphics, and I may have something that will interest you. And remember, the Australian Open and the 2015 season will soon be here.

Tennis scoring is trickier than that of other sports. A 2008 New York Times piece, "In Tennis, the Numbers Sometimes Don't Add Up," is apt:
If you were told that in a particular match, Player A won more points and more games and had a higher first-serve percentage, fewer unforced errors and a higher winning percentage at the net, you would deduce that Player A was the winner. But leaping to that conclusion would be a mistake. ... In tennis, it is not the events that constitute a match, but the timing of those events. In team sports like baseball, basketball and football, and even in boxing, the competitor who scores first or last may have little bearing on the outcome. In tennis, the player who scores last is always the winner.
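The quote's point is easy to verify with a little arithmetic. Here is a minimal sketch in Python, using a hypothetical best-of-three score line (not any real match), in which the winner takes fewer total games than the loser:

```python
# Hypothetical score line: (games for A, games for B) in each set.
# A loses the first set badly, then edges the next two.
sets = [(0, 6), (7, 5), (7, 5)]

games_a = sum(a for a, b in sets)     # total games won by A: 14
games_b = sum(b for a, b in sets)     # total games won by B: 16
sets_a = sum(a > b for a, b in sets)  # sets won by A: 2
sets_b = sum(b > a for a, b in sets)  # sets won by B: 1

# A wins the match 2 sets to 1 despite winning fewer games (14 vs. 16):
# it is the timing of the events, not their count, that decides.
winner = "A" if sets_a > sets_b else "B"
```

The same construction works at the level of points: pile the loser's points into lopsided sets, and the winner's into narrow ones.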
Tricky tennis scoring makes for tricky match summarization, whether graphical or otherwise. Not that people haven't tried, using all sorts of devices. See, for example, another good New York Times piece, from 2014, "How to Keep Score: However You Like," and the fascinating "A blog devoted to maps about tennis," which emphasizes spatial aspects but ranges farther on occasion.

Glenn Rudebusch and I have been working on a graphic for tennis match summarization. We have a great team of Penn undergraduate research assistants, including Bas Bergmans, Joonyup Park, Hong Teoh, and Han Tian. We don't want a graphic that keeps score per se, or a graphic that emphasizes spatial aspects. Rather, we simply want a graphic that summarizes a match's evolution, drama, and outcome. We want it to convey a wealth of information, instantaneously and intuitively, yet also to repay longer study. Hopefully we're getting close.

Here's an example, for the classic Federer-Monfils 2014 U.S. Open match. I'm not going to explain it, because it should be self-explanatory -- if it's not, we're off track. (But of course see the notes below the graph. Sorry if they're hard to read; we had to reduce the graphic to fit the blog layout.)

Does it resonate with you? How would you improve it? This is version 1; we hope to post version 2, already in the works, during the Australian Open in early 2015. Again, interim suggestions are most welcome.

Monday, December 1, 2014

Quantum Computing and Annealing

My head is spinning. The quantum computing thing is really happening. Or not. Or most likely it's happening in small but significant part and continuing to advance slowly but surely. The Slate piece from last May still seems about right (but read on). 

Optimization by simulated annealing cum quantum computing is amazing. It turns out that the large and important class of problems that map into global optimization by simulated annealing is marvelously well-suited to quantum computing, so much so that the D-Wave machine is explicitly and almost exclusively designed for solving "quantum annealing" problems. We're used to doing simulated annealing on deterministic "classical" computers, where the simulation is done in software, and the randomization is fake (done with deterministic pseudo-random deviates). In quantum annealing, the randomization is in the hardware, and it's real.
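For readers who haven't seen it, here is a minimal sketch of classical simulated annealing in Python. The energy function, cooling schedule, and seed are illustrative assumptions, not anything specific to D-Wave; the point is simply that all the randomness comes from a software pseudo-random generator:

```python
import math
import random

def simulated_anneal(energy, x0, n_steps=20000, t0=2.0, seed=0):
    """Classical simulated annealing: the randomness is software-generated
    pseudo-random deviates -- "fake," in the sense described above."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        t = t0 * 0.999 ** k              # geometric cooling schedule
        x_new = x + rng.gauss(0.0, 0.5)  # local random proposal
        e_new = energy(x_new)
        # Metropolis rule: always move downhill; sometimes move uphill,
        # with probability shrinking as the temperature cools
        if e_new < e or rng.random() < math.exp(-(e_new - e) / max(t, 1e-12)):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Toy landscape: a tilted double well with a local minimum near x = -2
# and the global minimum near x = +2.
def energy(x):
    return 0.1 * (x - 2) ** 2 * (x + 2) ** 2 - 0.5 * x

# Start in the wrong (local) well; annealing escapes via uphill moves.
x_star, e_star = simulated_anneal(energy, x0=-2.0)
```

The occasional uphill moves are what let the walker climb out of local minima; quantum annealing replaces that software mechanism with physical tunneling in hardware.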

From the D-Wave site:
Quantum computing uses an entirely different approach than classical computing. A useful analogy is to think of a landscape with mountains and valleys. Solving optimization problems can be thought of as trying to find the lowest point on this landscape. Every possible solution is mapped to coordinates on the landscape, and the altitude of the landscape is the "energy" or "cost" of the solution at that point. The aim is to find the lowest point on the map and read the coordinates, as this gives the lowest energy, or optimal solution to the problem. Classical computers running classical algorithms can only "walk over this landscape". Quantum computers can tunnel through the landscape making it faster to find the lowest point.
Remember the old days of "math co-processors"? Soon you may have a "quantum co-processor" for those really tough optimization problems! And you thought you were cool if you had a GPU or two.

Except that your quantum co-processor may not work. Or it may not work well. Or at any rate today's version (the D-Wave machine; never mind that it occupies a large room) may not work, or work well. And it's annoyingly hard to tell. In any event, even if it works, the workings are subtle and still poorly understood -- the D-Wave tunneling description above is not only simplistic, but also potentially incorrect.

Here's the latest, an abstract of a lecture to be given at Penn on 4 December 2014 by one of the world's leading quantum computing researchers, Umesh Vazirani of UC Berkeley, titled "How 'Quantum' is the D-Wave Machine?":
A special purpose "quantum computer" manufactured by the Canadian company D-Wave has led to intense excitement in the mainstream media (including a Time magazine cover dubbing it "the infinity machine") and the computer industry, and a lively debate in the academic community. Scientifically it leads to the interesting question of whether it is possible to obtain quantum effects on a large scale with qubits that are not individually well protected from decoherence.

We propose a simple and natural classical model for the D-Wave  machine - replacing their superconducting qubits with classical magnets, coupled with nearest neighbor interactions whose strength is taken from D-Wave's specifications. The behavior of this classical model agrees remarkably well with posted experimental data about the input-output behavior of the D-Wave machine.

Further investigation of our classical model shows that despite its simplicity, it exhibits novel algorithmic properties. Its behavior is fundamentally different from that of its close cousin, classical heuristic simulated annealing. In particular, the major motivation behind the D-Wave machine was the hope that it would tunnel through local minima in the energy landscape, minima that simulated annealing got stuck in. The reproduction of D-Wave's behavior by our classical model demonstrates that tunneling on a large scale may be a more subtle phenomenon than was previously understood...
Wow. I'm there.

All this raises the issue of how to test untrusted quantum devices, which brings us to the very latest, Vazirani's second lecture on 5 December, "Science in the Limit of Exponential Complexity." Here's the abstract:
One of the remarkable discoveries of the last quarter century is that quantum many-body systems violate the extended Church-Turing thesis and exhibit exponential complexity -- describing the state of such a system of even a few hundred particles would require a classical memory larger than the size of the Universe. This means that a test of quantum mechanics in the limit of high complexity would require scientists to experimentally study systems that are inherently exponentially more powerful than human thought itself! 
A little reflection shows that the standard scientific method of "predict and verify" is no longer viable in this regime, since a calculation of the theory's prediction is rendered computationally infeasible. Does this mean that it is impossible to do science in this regime? A remarkable connection with the theory of interactive proof systems (the crown jewel of computational complexity theory) suggests a potential way around this impasse: interactive experiments. Rather than carry out a single experiment, the experimentalist performs a sequence of experiments, and rather than predicting the outcome of each experiment, the experimentalist checks a posteriori that the outcomes of the experiments are consistent with each other and the theory to be tested. Whether this approach will formally work is intimately related to the power of a certain kind of interactive proof system; a question that is currently wide open. Two natural variants of this question have been recently answered in the affirmative, and have resulted in a breakthrough in the closely related area of testing untrusted quantum devices. 
Wow. Now my head is really spinning. I'm there too, for sure.