Mihai Budiu’s Blog

Computer Concepts

Monday, December 28, 2009

An interview with Kurt Akeley

Interview done in February 2009

Kurt Akeley is currently a principal researcher at Microsoft Research in Silicon Valley. He has played a significant role in the history of computing several times; perhaps his greatest claim to fame is his work on the specification of the OpenGL graphics system, done while working at Silicon Graphics. OpenGL is one of the two major graphics architectures still in wide use (the other being Direct3D). Kurt is endowed with a magic touch: he has worked on virtually all layers of the computer system stack, including hardware design, system software, CAD tools, display technology, programming languages, and many others. This interview was conducted in February 2009.

The contents of this interview are licensed under a Creative Commons Attribution License.


Left: Kurt Akeley seated next to an optics table that he is using to test the design of a 3D display (October 2009, Microsoft).

Right: Kurt’s badge showing employee #10 at Silicon Graphics. Courtesy of Kurt Akeley.

Q: When did you start using computers?

A: I started using computers in my last year of high school. We had one of those teletype machines hooked up to a computer that I believe was at the University of Delaware (although I never saw it). I remember not knowing what an array was – something mysterious and wonderful. We saved our work on little yellow tapes.

Q: Your work spans all levels of the computer system architecture: hardware design, CAD tools, system software, compilers, graphics libraries. What insights do you carry from one domain to another? Do you find it easy to move between areas?

A: I did find it easy to move between areas. The general direction of my motion — when I was at Silicon Graphics — was "up". When I started at Silicon Graphics I expected that I would do silicon; that was part of the name, and that was part of the excitement — to design custom integrated circuits. In fact, I started with hardware and integrated circuit design, moved to library design, and then higher level designs (but not really application design).


Left: Die photo of the SGI geometry engine in 1982. Courtesy of Kurt Akeley.

Right: Some of the founders of SGI. From the left: Marcia Allen, Mark Grossman, Diane Wilford, Rocky Rhodes, Kurt Akeley, Marc Hannah, Tom Davis. Several of the founders are missing, including Jim Clark, Vern Anderson, Abbey Silverstone and David Brown. Courtesy of Kurt Akeley.

SGI started with Jim Clark implementing a geometry engine (his name for the custom integrated circuit that he and his students at Stanford had designed). The company was built around that piece of technology. We would start with a highly complex integrated circuit, then we would build circuit boards, then interfaces, then a library, and finally make people use it. We did it this way because it took a very long time to design the custom integrated circuits; the people doing the ICs had to get started long before the people doing the interfaces and the system design. This was an implementation problem that cried out for a solution. Our solution, in the mid ’80s, was to stop building custom ICs and to use gate-array technology. That greatly reduced the technology cycle for the components.

A lot of my contribution at SGI in the first 10 years was roughly inverting that order, so that we started from the problem we were trying to solve, followed by the interface to solve it, then the drivers and hardware systems, and finally the components needed to implement that system. There’s no right answer here: neither purely bottom-up nor purely top-down is right; there’s some happy in-between. But it’s definitely wrong to go purely bottom-up the way we were: the systems we were building were not doing as well because we had designed the wrong components. This was true in the beginning at SGI and it was largely remedied. We went from being a nothing company to more or less dominating the space of 3D graphics for workstations in the late eighties and early nineties. It is more important to design the right system than to build the wrong system out of the very best and fastest components.

Q: Would you use today something like FPGAs (Field-Programmable Gate Arrays) for such systems?

A: If by “such systems” you mean "graphics accelerators for computers," probably not. That market has matured considerably. The handful of companies that still exist have developed a considerable expertise in building ASICs. AMD, Intel and NVIDIA do highly specialized and highly engineered components. You could not compete today. But in some new marketplace, that isn’t yet mature, and where the architectures still have to be defined to allow it to mature and to grow, the answer would be "yes".

Q: Who was SGI competing with in the ’80s?

A: It varied over the decade. [The company] Evans and Sutherland pioneered computer graphics in devices for professional applications (other than flight simulation). Jim Clark (the founder of SGI) worked there when he was in Utah. They built very high performance 3D graphics terminals. The "terminals" were not programmable, but instead hooked to a host computer that was programmed; the host downloaded data sets to the terminal to view them interactively. That was the state of the art for professional use. There were a few very domain-specific applications in the late seventies that used these terminals. SGI distinguished itself by fully integrating the computer graphics into the workstation, building the equivalent of today’s inexpensive personal computer: a programmable machine with an operating system and a what-you-see-is-what-you-get terminal, which could do high-performance 3D graphics. In the first 10 years SGI focused on building tightly integrated 3D graphics capabilities into a programmable computer. Apollo and a couple of other companies were also doing it, but SGI was clearly the most successful during that decade. At the beginning of the nineties we had a clear position to drive the market.

Q: So, what happened?

A: SGI went like a cannonball launched from a cannon at ground level. SGI decided early on to cater to professional customers and to help them solve their difficult problems, so we built a sales force with very deep competence in the field. The business model was to charge high premiums for the hardware and give away the consulting; the value of the expertise was part of the sales transaction. We sold at high margin but low volume. We were selling billions of dollars per year, but that is small stuff compared to personal computers. The cannonball followed its predictable path, and the customers abandoned us for less expensive, lower-margin technology.

Q: It sounds like the innovator’s dilemma (Clayton Christensen’s book).

A: There are aspects of that. Such a company has difficulty introducing products that compete with its higher-margin ones. SGI built graphics for personal computers earlier than anyone else, but what it didn’t do (and this is the classic innovator’s dilemma) was ship clearly inferior but less expensive products. We had defined a graphics architecture called OpenGL, we had defined a set of rigorous compatibility standards with OpenGL, and we weren’t willing to ship inexpensive products that weren’t compatible. Competitors like NVIDIA and ATI were willing, and did a good job. Now they are defining the extensions to the architecture that we created. In that sense, it is a classic innovator’s dilemma.

Q: Can you point to some innovations that everyone is using today and that originated at SGI?

A: The concept of the stencil buffer, from OpenGL (it was actually introduced in IRIS GL, our proprietary library / architecture). The stencil buffer allows you to perform some operations when a pixel is updated. It is not z-buffering (we didn’t invent that). The stencil buffer implemented a very simple state machine but provided a lot of useful capabilities in the end. One of its useful applications was solving "the capping problem" in solid design.

Let me explain: SGI did polygonal 3D graphics. The things rotating on the screen appeared to be solid, but in fact they were modeled with polygons; if you were to look inside you would see nothing. You could look inside by passing a clipping plane through the object, like those cut-away drawings that you see in magazines. Now, when you clip a polygonal model it looks really ugly, because you see the rear sides of the back-facing polygons, and they are displayed incorrectly or not at all. The designers wanted to clip objects and see them shaded with the correct color. This had been a long-standing problem, and we solved it using the stencil buffer. It’s a simple idea, obvious in retrospect, but people hadn’t done it before, and it endures to this day. In all modern graphics systems the final pixel operations in the frame buffer are still not generally programmable; they are specified as a set of modes, and one of the modes is a stencil buffer. The stencil buffer mechanism used today is virtually identical to the one we created.
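
[Editor’s note: to make the capping idea concrete, here is a hedged sketch of the classic stencil-buffer capping technique, written in legacy OpenGL rather than SGI’s original IRIS GL; draw_solid and draw_cap_quad are hypothetical helpers that issue the clipped object’s polygons and a large quad lying in the clip plane.]

    /* Sketch of stencil-buffer capping in legacy OpenGL; illustrative only. */
    #include <GL/gl.h>

    void draw_capped_solid(void (*draw_solid)(void), void (*draw_cap_quad)(void)) {
        /* 1. Clip the object and mark, in the stencil buffer, the pixels where
         *    the clip plane cuts through the interior. For a closed solid,
         *    inverting the stencil once per rendered surface leaves a 1 exactly
         *    where an odd number of surfaces was drawn. */
        glClearStencil(0);
        glClear(GL_STENCIL_BUFFER_BIT);
        glEnable(GL_STENCIL_TEST);
        glEnable(GL_CLIP_PLANE0);               /* clip plane set up elsewhere */
        glStencilFunc(GL_ALWAYS, 0, ~0u);
        glStencilOp(GL_KEEP, GL_INVERT, GL_INVERT);
        draw_solid();

        /* 2. Draw the cap: a large quad lying in the clip plane, visible only
         *    where the stencil value is 1 (where the interior was exposed). */
        glDisable(GL_CLIP_PLANE0);              /* so the cap itself is not clipped */
        glStencilFunc(GL_EQUAL, 1, ~0u);
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
        draw_cap_quad();

        glDisable(GL_STENCIL_TEST);
    }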

Q: The SGI system was much less programmable: you had in hardware all the features that now are done in software.

A: In fact, they were all software. The question was "who got to write the programs?" Did an SGI box from 1988 have programmable vertex shaders? Yes, it absolutely did. The geometry engine that I implemented used an off-the-shelf floating-point hardware microcode engine; we wrote the microcode. The microcoded engine looked to the external world like a hardwired mechanism: the applications could not change the vertex shaders; they could control the shading only through some mode bits. That was true at the vertex level for all Silicon Graphics machines (and, I think, for all competitors of Silicon Graphics). At the pixel fragment and rasterization levels some of these machines were programmable, and some were not. In fact, SGI built machines that were programmable at the pixel level too, all the way to the frame buffer. That level of programmability still hasn’t been revealed in the world of OpenGL and Direct3D.
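
[Editor’s note: for readers who never used a fixed-function API, here is a small illustrative sketch, in legacy OpenGL rather than IRIS GL, of what "controlling the shading through mode bits" looks like: the application flips modes that select among vendor-implemented behaviors instead of supplying its own vertex program.]

    /* "Mode bit" shading control in fixed-function OpenGL; illustrative only. */
    #include <GL/gl.h>

    void configure_fixed_function_shading(void) {
        glEnable(GL_LIGHTING);     /* turn on per-vertex lighting                 */
        glEnable(GL_LIGHT0);       /* use the first predefined light source       */
        glShadeModel(GL_SMOOTH);   /* Gouraud-interpolate colors across polygons  */
        glEnable(GL_NORMALIZE);    /* renormalize normals after modeling scales   */
    }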

Q: One of the most enduring things to come out of SGI was OpenGL. You were one of the main architects of OpenGL. How was OpenGL born?

A: The career path that I described previously, of inverting the order of system layers, gave rise to my interest in graphics interfaces. I realized that we needed to design the interface first, and then figure out the implementation (e.g., what integrated circuits to use).

There was a Silicon Graphics standard called IRIS GL. Around 1987 I specified a lot of the additions to IRIS GL that enabled capabilities of the new graphics system we were developing. In the next few years a lot of competition emerged in this area. In particular, PHIGS evolved from a relatively uninteresting retained-mode scene-graph library into a 3D extension to the X Window System called PEX (PHIGS for X), with an active standardization process.

A lot of the value SGI had delivered to customers was in specifying the correct interface; I had put a lot of energy into this for 10 years. It was a frightening thought to lose control and, more fundamentally, to have a bad interface become the industry standard. We had to choose between supporting IRIS GL and improving the X-based interface, knowing that unfortunately the latter was predicated on PHIGS, which we thought was quite bad.

Q: Who supported PHIGS?

A: Almost everybody at the time. SGI got a third-party company to implement PHIGS on IRIX. Our competitors were Sun and HP, and maybe Apollo. Probably all these companies had PHIGS implementations; people at Sun were particularly excited about it. These companies were actively moving from that specification to an X extension and making it into a standard architecture.

By the way: it’s important and very valuable for an industry to have a standard architecture. But it’s also important that it be a good architecture. I think we are in a better place with OpenGL than we would have been by building on PHIGS or PEX. That could be debated, though of course it’s a test that we will never do.

I realized that specifying the interface was the key design decision, so we cranked out an interface that we thought was better. Along the way we gathered several partners and made this work a collaborative exercise. In the 1.0 version of OpenGL, published on July 1st, 1992, a few of us at SGI made all the decisions, based on a lot of input [from our partners]. It wasn’t a design by committee, although Digital, Microsoft, and Compaq contributed considerably to it. We made all the hard decisions for the first revision. We also promised in advance that the future governance of the standard would be done in a less heavy-handed way: "trust us, give us your input, we’ll do a good job on the first revision, and then we’ll govern it going forward in a much more democratic way." We wanted to get it off to a good start. As they say, "a camel is a horse designed by a committee." There was a lot of fear and angst in the community because, remember, SGI was the dominant player. People sort of trusted us, but there still was a lot of concern. I think that in retrospect they made a good choice.


The cover of the OpenGL 1.0 specification (Kurt’s signature is in the top-right corner).

Q: You said that OpenGL is a better interface. How do you measure the quality of an interface?

A: Let me back up: something that’s not fully appreciated about OpenGL is that it is an architecture. I’ve been using that term consistently. It is an architecture, in the sense used by Fred Brooks for the IBM 360 ISA (Instruction Set Architecture) in the sixties. OpenGL is a library interface, but it should be evaluated against the same metrics as an ISA:

  • how suitable is it to its task
  • how effectively can it be applied to tasks not envisioned by its designers
  • how effectively can it be implemented consistently on different machines
  • how well can it be evolved into the future while still maintaining compatibility
  • how well does it encourage application compatibility

Fred points out a classic set of design mistakes; for example, if you leave things undefined in an architectural specification (and, by the way, there needs to be a written specification), then they become defined de facto by early implementations, and often these de facto definitions aren’t in the best interest of evolving the architecture. One way to measure the quality of an architecture is to see how much undefined behavior there is in it. You can get picky about the exact metric of "how much," but even with hand-wavy comparisons we can claim that before OpenGL no graphics architecture was very tightly defined, and many had all kinds of undefined behaviors. As a result, each implementation was different, and application compatibility failed. To be fair, SGI’s proprietary IRIS GL was quite bad in this respect. We built OpenGL based on the experience of our earlier effort at defining a flawed architecture.

Q: The effort to generalize IRIS GL into OpenGL made it better?

A: Absolutely. Because it wasn’t just an effort to generalize it. For example, we had licensed IRIS GL to IBM so they could use it on their workstations. This turned out to be an awful mess, because IRIS GL wasn’t very well specified. Having them implement it required constant communication: "what happens in this case?" Correctness was defined by "he who yells loudest": whichever implementation had the most applications would define correctness. That’s a very poor way to implement an architecture. We also had our own bad experience with IRIS GL that we were attempting to improve with OpenGL. This process was not about generalizing or popularizing; it was an attempt to build a library at the same time as creating an industry standard. Those weren’t inconsistent goals. Some of the wrong things in IRIS GL made it difficult for IRIS GL to become an industry standard.

Q: Do you think that this architecture is still a good fit, given the current evolution of hardware?

A: Software interfaces that gain a large following last a long time, even if they are not done very well. There’s a kind of "fitness by default." I would predict that people will be using OpenGL and Direct3D (which is more or less a different evolution of the OpenGL architecture) for some time to come. They are not going to go away.

On the other hand, for some really important application spaces, such as first-person-shooter interactive 3D games, or games that push the state of the art in computer graphics, I think that Direct3D and OpenGL have little life left. The way implementation technology is changing (I am thinking specifically of the Intel Larrabee graphics chip) will push the standard interface for game development to a higher level, like the Unreal game engine. People creating those game engines need a different kind of interface to the underlying hardware and microcode, and they will bypass Direct3D and OpenGL.

Q: So what will the new graphics architecture be?

A: The architecture has been generalized through programmability to become a high-level pipeline. You can think of it in terms of datatypes. Let’s ignore high-level surface specifications for a moment and focus just on polygons or triangles. The base type for these is the vertex. The graphics system converts vertexes to triangles, converts triangles to pixel fragments, and finally converts those to pixels and puts them into the framebuffer. It’s a three-stage pipeline with specialized stages that convert one datatype to another. That sequence is incredibly useful and powerful, but it isn’t the only reasonable sequence of transformations for computer graphics. For example, you can turn things around with ray tracing: instead of starting with vertexes and pushing them toward the framebuffer, you can start at the framebuffer and cast rays back into the scene, and figure out what in the scene each ray hits. You can think of this as interchanging the loops: iterating over pixels first instead of over geometry. The standard graphics architecture really can’t deal with ray tracing. This is just one example. In the near term the world will not switch from the pipeline sequence to ray tracing. There has already been a lot of research on mixing the strengths of both, and this research will turn into products soon. These products are not implemented naturally on top of an architecture like OpenGL or Direct3D.
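
[Editor’s note: to make the inverted loop order concrete, here is a toy, self-contained C sketch written for this article (not taken from any real renderer): it starts at the framebuffer, casts one ray per pixel, and prints an ASCII image of a single sphere.]

    /* Toy ray caster: the loop starts at the framebuffer and asks what each
     * pixel sees, instead of pushing geometry toward the framebuffer. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const int width = 60, height = 30;
        const double cz = 3.0, radius = 1.0;     /* sphere centered at (0, 0, 3) */

        for (int j = 0; j < height; j++) {       /* outer loop: framebuffer rows */
            for (int i = 0; i < width; i++) {    /* inner loop: pixels           */
                /* Ray from the eye at the origin through pixel (i, j). */
                double dx = (i - width / 2.0) / (width / 2.0);
                double dy = (j - height / 2.0) / (height / 2.0);
                double dz = 1.0;
                double len = sqrt(dx * dx + dy * dy + dz * dz);
                dx /= len; dy /= len; dz /= len;

                /* Ray-sphere intersection: solve |t*d - c|^2 = r^2 for t. */
                double b = -2.0 * dz * cz;
                double c = cz * cz - radius * radius;
                double disc = b * b - 4.0 * c;

                putchar(disc >= 0.0 ? '#' : '.');   /* hit or miss */
            }
            putchar('\n');
        }
        return 0;
    }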

Q: But the world still needs an architecture!

A: But it might not be a standard architecture. The Larrabee system will expose something that we can call an architecture: a sea of IA32-like processors; that will be well defined. It won’t be an industry standard, it will be an Intel standard. And it won’t be licensed; it won’t be available to NVIDIA. OpenGL and Direct3D will continue to be available, but you will also see the architecture exposed through something like the Unreal game engine. There won’t be any software layer between Larrabee and the Unreal game engine.

Q: The graphics engine itself is evolving in unexpected ways. You worked on Cg and there are other languages like CUDA (for parallel programming on graphics cards) that are becoming very popular. What do you think about this evolution?

A: I like it: it’s fun! At SGI we made some pretty weak attempts to move in this direction. In 1990 we shipped a machine that provided a programmable interface for general-purpose computation. Some of the early people at SGI were hired by Jim Clark specifically to expose the geometry engine for more general-purpose calculations in the graphics pipeline. It’s an old idea, probably older than that, but that’s when I became aware of it. It’s old, but it’s really getting traction now, because the technology and expertise have evolved far enough to make these architectures practically programmable. They were programmable by the people who built them, but they weren’t programmable by general-purpose application developers. You need to implement the right language architecture, and to build the right compilers, to make all this work. There really is a convergence between GPUs and CPUs. If you pry open a modern NVIDIA part and the Intel Larrabee you find that each of them has on the order of 10 cores. Each of these cores is a programmable engine, implemented with an instruction set augmented with SIMD capabilities. And the widths are on the order of 16 for both of these machines.

Q: The memory management seems to be very different.

A: Yes. The NVIDIA part still isn’t designed to run an operating system; its relationship with memory is rather different. There are other important differences: the Intel architecture exposes this implementation quite directly, while NVIDIA does not, even for CUDA. It hides that implementation behind an architecture that’s higher level, like OpenGL. It still implements the OpenGL pipeline and it time-shares the same underlying compute engines across all the important stages of this pipeline, but it hides this fact from the programmer. The other difference is that NVIDIA does not expose an individual core. Its architecture makes you think that you are programming "lanes": you are programming only one of the SIMD paths, and NVIDIA put in mechanisms to handle divergent code paths, when the SIMD hardware executes the same program on 16 datasets and a branch goes different ways for different data sets. Even CUDA, where the graphics pipeline abstraction goes away, still maintains a Single Program Multiple Data model. Intel will expose a lower level for Larrabee. I think we’ll converge to something more like Larrabee than NVIDIA, something that’s designed to do well with thread parallelism and with data parallelism. Then we can say that graphics has led the underlying implementation toward a multi-core computer architecture.
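
[Editor’s note: the lane model and branch divergence are easier to see in code. Here is a toy, self-contained C sketch, an illustration of the concept rather than NVIDIA’s or Intel’s actual mechanism: one program runs on 16 data items at once, and a data-dependent branch is handled by masking lanes and executing both sides.]

    /* Simulate a 16-wide SIMD machine handling a divergent branch with a mask. */
    #include <stdio.h>

    #define WIDTH 16   /* SIMD width, as in the discussion above */

    int main(void) {
        int data[WIDTH], result[WIDTH], mask[WIDTH];
        for (int lane = 0; lane < WIDTH; lane++) data[lane] = lane;

        /* The "program" is: if (x is even) x *= 2; else x += 100;
         * All lanes evaluate the condition, producing a per-lane mask. */
        for (int lane = 0; lane < WIDTH; lane++) mask[lane] = (data[lane] % 2 == 0);

        for (int lane = 0; lane < WIDTH; lane++)   /* "then" side, masked        */
            if (mask[lane]) result[lane] = data[lane] * 2;
        for (int lane = 0; lane < WIDTH; lane++)   /* "else" side, inverted mask */
            if (!mask[lane]) result[lane] = data[lane] + 100;

        for (int lane = 0; lane < WIDTH; lane++) printf("%d ", result[lane]);
        printf("\n");
        return 0;
    }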

Q: Can you tell us about your current research?

A: I am now working in two broad areas. One is a combination of human vision and computer graphics. My goal is to implement stereo display technology that provides better focus cues to the viewer. This work straddles psychology, perception and human vision, and computer graphics. I continue to publish in this area and to collaborate with people at Berkeley, where I did my dissertation work. The second area is computer architecture: evolving graphics architectures to provide a platform for data-parallel computation.


Kurt’s image reflected in a prototype 3D display with correct focus cues (the subject of his Ph.D. thesis), built using 3 semi-transparent mirrors.

Q: What’s wrong with current displays?

A: I did my dissertation on stereo displays. All existing ones have a bad property: they create a simulated object in your view whose apparent distance is determined by its vergence (disparity), meaning that it appears slightly different to your two eyes. Because of disparity you have a sense of how far away it is; that is one of the "depth cues" in a stereo image. But the brain uses other cues besides vergence, and in particular there is a cue based on where your eyes are focusing. If you are focusing at one distance but verging on (looking at) an object at a different simulated distance, you are getting conflicting cues. That’s bad, but not tragic: the focus cues are less important than display latency or hidden-surface occlusion (another fundamental depth cue). Occlusion and vergence are solved problems, but people were unaware that focus was a problem at all. It turns out that where you focus is important. Focus doesn’t matter so much for a non-stereo display, but it does matter in stereo displays.

Q: Or at the 3D cinema.

A: Same thing. You are focusing on the screen, but the displayed images simulate objects farther than the screen or, in many cases, much closer than the screen. We implemented a display that allows you to focus at approximately the correct distance (i.e., the same as the vergence distance). It does that without any moving parts and without tracking where you are actually looking. Using that system we’ve done some measurements to quantify the problems. Does decoupling focus from vergence cause discomfort? The answer is "yes"; there had never been a rigorous experiment to show it. We also postulated, based on the mathematics, that even with very low resolution in depth we can make people focus at the right distance, "in between" depth pixels. We are collecting data that shows you can build a good display with very low resolution in depth.

You can build a display with 10 transparent screens, but we have shown we can simulate correct cues at a finer granularity with many fewer than 10. It’s also interesting from an engineering standpoint because the obvious ways to solve this problem were just insanely expensive. To have thousands of pixels of depth resolution you need billions of pixels: a few million pixels per image times a thousand depth planes is already billions. There is no easy jump from the current million-pixel displays to a billion pixels, but conceivably engineering can jump from 1 million to 10 million. In this work fundamental science (understanding human vision) and engineering (building a display that can meet the requirements) converge.

Q: Do you see applications? For example, in flight simulators?

A: Focus matters most in the near field [of vision], within an arm’s reach. The flight simulators solved this long ago by setting the focus at infinity. In cinema it’s almost the same, except that sometimes the simulated distances are very close: the screen is effectively at infinity, but the movie may provide a disparity that makes you think you are looking two feet in front of you. In flight simulators, by the time you are two feet from the ground the game is over. Remote surgery, augmented reality, and telepresence are practical applications.

What’s computing used for these days? It’s mostly communication. Facebook. We are still doing only rudimentary visual communication. We need to provide comfortable, intuitive viewing systems. That will mean stereo, and that will mean near field as well as far field. I think that this work isn’t just for flight simulation. There is a place for stereo 3D graphics. If you think you know the details, it’s a good place to start a company.

Q: How about the architectural direction in your research?

A: I work with some Stanford students and a professor on making graphics architectures more general purpose, both for graphics and for high-performance computing. We have a paper in the January issue of Transactions on Graphics. We are thinking about the architecture at the level below Direct3D: how many cores there are, how they are managed, how they communicate.

Q: What ideas in your field of expertise surprised you? The converse is: what ideas were ahead of their time and still haven’t been accepted?

A: I am on record saying that I didn’t think that exposing the programmability of GPUs would be a successful approach, a wildly wrong prediction. This is a classic expert problem: which of the things that I know should I question? You can’t move forward if you doubt everything, but you move incorrectly if you don’t reevaluate things that have changed.

Something that the world still isn’t ready for? The kind of rigor that we brought to OpenGL, in particular rigor about (computational) state. We identified, named, and provided mechanisms to change, query, and utilize all of the state in OpenGL. The document specified that an implementation could do as it pleased, but at the architectural level it must appear that the implementation is managing state in exactly the way described. OpenGL’s forward evolution also took this very seriously. And it works: it allows people to use OpenGL effectively, to extend it effectively, and to reason about it effectively. And it is antithetical to modern software practice!
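
[Editor’s note: as a small illustration of what that rigor buys you (a sketch in legacy OpenGL, written for this article): every piece of context state has a name, and an application can set it and query it back explicitly.]

    /* Named, queryable state in OpenGL; assumes a current GL context. */
    #include <GL/gl.h>
    #include <stdio.h>

    void inspect_some_state(void) {
        glEnable(GL_DEPTH_TEST);                         /* change a named state bit */

        GLboolean depth_on = glIsEnabled(GL_DEPTH_TEST); /* ...and query it back     */
        GLint depth_func, stencil_bits;
        glGetIntegerv(GL_DEPTH_FUNC, &depth_func);       /* every mode is queryable  */
        glGetIntegerv(GL_STENCIL_BITS, &stencil_bits);   /* even framebuffer configuration */

        printf("depth test: %d, depth func: 0x%x, stencil bits: %d\n",
               (int)depth_on, depth_func, stencil_bits);
    }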

Q: Do you know any other API which satisfies these criteria?

A: We patterned OpenGL on the X Window System. The people who did the core of X took state very seriously (not the whole software stack on top of X; I am not making a case for Motif here). Not a perfect job, but an admirable one.

As far as I can tell, at least at the application level, pretty much nobody is rigorous about exposing state. How is the state modified, stored, queried? All of these things are very ill-defined at the higher levels of the software stack. It drives me crazy. I cannot think of a fundamental reason why we shouldn’t be as rigorous at the higher levels. My take is that we are just being sloppy. Perhaps most people just don’t think this way: if you rigorously exposed the state of the telephone, of Outlook, of Photoshop, it would be confusing and not helpful to the average person. I find that unimaginable, but it may be so. What seems clear to me to be the right way to do something may in fact not be. But I continue to have this hunch: that we’ll all be in a better place when we treat the notion of state, and the clear specification of how state interacts, much more rigorously than we do now.

posted by Mihai at 3:36 pm  

Sunday, January 20, 2008

A Planetary Operating System

Put a laptop side by side with a mainframe from 40 years ago and you will be amazed by the astonishing evolution.  Not only are the hardware resources many orders of magnitude larger, but the software is also immensely more powerful and sophisticated.  Operating systems appeared in the ’60s, for managing mainframe computers.  They handled tasks common to a variety of user programs, such as low-level device handling, scheduling the processor, allocating and naming storage, and enforcing security.  An operating system’s main role is to transform the raw machine into a set of high-level abstractions, which can be used and shared by applications.  For example, instead of deciding which bytes to write to or read from a disk, applications access named files.

It may seem that we are close to having reached the last word in operating systems, and that studying them will soon be a matter of history.  But a mostly invisible revolution is ongoing, and its importance and magnitude are enormous.  Hidden from our eyes, a set of enormous, planetary-scale operating systems is slowly evolving.  We are still quite far from having reached this goal; to make a historical analogy, my estimate is that we are still in the pre-Unix era (before 1971) with respect to the planetary OS.  If you look carefully you can see how every year a few more pieces of the puzzle are filled in.

So, what is the computer that is to be managed by the planetary OS?  In its current incarnation it is the datacenter (a collection of tens of thousands of processors and disks), but soon it is going to be a collection of datacenters linked together into a seamless whole.  The goal of the planetary OS is to make millions of processors and disks behave as a single machine, as easy and natural to use as a desktop.  Using a simple interface you will invoke the power of tens of thousands of processors, spread all over the world.  You will sift through petabytes of data with a simple mouse click.  You will manage millions of data streams from a single console.

You can read about some of the pieces of this enormous puzzle.  Here are some sample links: the Google File System, Amazon’s S3, and VMware’s VMFS offer a single filesystem namespace, spanning reliably and transparently up to tens of thousands of machines.  Google’s MapReduce and Microsoft’s Dryad allow anyone to execute a program on thousands of machines, sifting through hundreds of terabytes of data in a matter of minutes.  Amazon’s EC2 and VMware’s DRS offer transparent compute resource virtualization (extending the operating-system concept of a process to the cluster level).  Microsoft’s Autopilot is the cluster-level generalization of the BIOS and of software updates, automatically deploying and monitoring software.  Google’s BigTable transforms a cluster into a huge database.  User management is described in Google’s paper, but it is a ubiquitous piece in many other on-line services, from E-Bay to Yahoo IM.  Google’s Chubby provides cluster-level reliable interprocess communication.  Akamai’s network handles vast volumes of traffic for millions of clients.  And, of course, Web-based platforms are the new user-level APIs, which programmers can use to craft mash-ups (the extension of traditional applications).

And this list could continue.
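
To give a flavor of why pieces like MapReduce and Dryad are so powerful, here is a toy, single-process C sketch of the MapReduce programming model: the programmer writes a map function that emits key/value pairs, and the runtime (here a trivial in-memory table; in the real systems, thousands of machines) aggregates the values by key.

    /* Toy word count in the MapReduce style; a single-process illustration. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_KEYS 1024

    struct kv { char key[32]; int value; };
    static struct kv table[MAX_KEYS];
    static int table_size = 0;

    /* The runtime's emit(): here it also plays the role of reduce (summing). */
    static void emit(const char *key, int value) {
        for (int i = 0; i < table_size; i++)
            if (strcmp(table[i].key, key) == 0) { table[i].value += value; return; }
        snprintf(table[table_size].key, sizeof table[table_size].key, "%s", key);
        table[table_size++].value = value;
    }

    /* The user-supplied map function: one input record in, key/value pairs out. */
    static void map(const char *record) {
        char buf[256];
        snprintf(buf, sizeof buf, "%s", record);
        for (char *w = strtok(buf, " "); w != NULL; w = strtok(NULL, " "))
            emit(w, 1);                    /* word count: emit (word, 1) */
    }

    int main(void) {
        const char *records[] = { "the quick brown fox", "the lazy dog", "the fox" };
        for (int i = 0; i < 3; i++) map(records[i]);
        for (int i = 0; i < table_size; i++)
            printf("%s: %d\n", table[i].key, table[i].value);
        return 0;
    }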

Currently most of these pieces are still disjoint, not yet tied together into a coherent whole, and most of them do not scale up to planetary size (they are confined to the cluster or datacenter level).  But the trend is unmistakable: there is an ongoing race to build a unified, simple, single system that will manage, transparently, automatically, and largely effortlessly (not even a desktop operating system can work without supervision, so we can’t expect full autonomy at this scale), a gigantic computer spanning the whole globe.

It is certainly a very exciting time!

posted by Mihai at 11:36 pm  

Sunday, September 30, 2007

The plan behind Google

Updated: fixed a broken link.

I read a very interesting business book about how great companies are built.  (I am not the only one; the cover boasts "more than 1 million sold.")  The book is "Built to Last: Successful Habits of Visionary Companies."  I am providing the Amazon link, but you can also find a brief description at wikisummaries.  I liked the book because the authors (who are current or former academics from the Stanford business school) claim convincingly that their analysis is driven by data.  The methodology was as follows: the authors first identified 18 "great" companies using a survey.  These are companies that have thrived for a very long time, more than 100 years for most of them.  Then the authors set up a control group containing similar, but slightly less successful, companies.  Then they attempted to find a set of recipes that stand behind the great companies.  The data collection and analysis took 6 years.  Not surprising, considering that they sometimes had to dig into archives more than 150 years old.

The results are quite surprising, and I highly encourage you to read the book for details.  The one-line summary (which you won’t find in the book in this form) is that all such great companies are really organized a lot like religious organizations.  (This great observation is due to Martin Abadi.)

But one thing that struck me while reading the book is that I recognized a lot of the traits of these organizations in the way Google seems to be structured.  My personal hypothesis is that the founders of Google actually read this book carefully (it was published in 1996), and that they deliberately set out to create a company that would follow the recipe in the book, and thus would be destined to become "great."  The way their stock and revenues have been going, they seem to have been vindicated, but it is way too soon to give a verdict.

Here I want to point out some of these striking similarities.

  • A great company should have a core value, which it preserves no matter how many mutations it suffers.  Clearly, I am thinking about "Don’t be evil."
  • The company should be driven by "big, hairy, audacious goals."  Goals like the moon landing, which motivate and drive for a long time, but are reachable.  (These are not the same as the core value.)  What else but "organize the world’s information?"
  • The company ideology should be more than just profits.  In fact, short-term profit should be secondary to pursuing the long-term goals.  Doesn’t this fit with the lack of financial guidance offered by Google?
  • Create a cult-like organization, which draws a clear line between the insiders and the outsiders.  There is a lot to quote here, from the free meals and other perks, which are designed to bond the workers together, making them as independent as possible from the "outside world", to their almost paranoid secrecy (in some respects) — for example their extremely strict non-disclosure agreement.
  • Try a lot of stuff and keep what works.  I believe this one needs no explanation.  In fact, Marissa Mayer, Google’s VP of search and user experience, gave a talk at Stanford on Google’s "9 notions of innovation".  One of the points is that the numbers rule: an idea is judged better if it is supported by measurable evidence.
  • Good enough never is.  Constantly drive for improvement; you can never declare victory, because business is a continuous process.  Even if the search engine and AdSense bring in a truckload of money, Google still introduces a whole set of products which take (or will take) advantage of its strength in advertising, such as Gmail, Maps, YouTube, and the online office suite.

For some of the recipes in the book I could not find equivalents in Google’s structure, but it is still a very young organization.  For example, "home-grown management": the organization should prepare its own leaders, and should be able to survive unscathed the disappearance of its charismatic founders.  (Anecdotal reports seem to indicate that Google does not like the idea of management at all, preferring a very flat organization where a lot of the decision power is held by the engineers.)

It is not clear whether these recipes are either necessary or sufficient for creating a great company.  But I will certainly be following with interest the trajectory of Google.

posted by Mihai at 10:54 pm  

Monday, August 13, 2007

Research failures

R. W. Johnson Jr., the founder of Johnson and Johnson, liked to say about his company: “Failure is our most important product.” He meant that J&J learned how to create useful products by attempting to create many unsuccessful ones as well.

There is no way to predict the outcome of research: that is what makes it research in the first place (as opposed to just search); you are looking for something that you don’t know yet.

Given the hard constraints imposed by the “real world,” it shouldn’t be a surprise that most research results are actually failures, in the sense that their immediate results are not really useful or applicable.

The unfortunate fact is that, in order to get your results published, you have to make the research look good. Almost no conference committee really likes to publish negative results. To have a paper accepted, you have to beat everybody out there, by some metric. The consequence is that many authors spend a great deal of effort obfuscating the real results they have obtained, packaging them in such a way as to make them look good.

For example, this can involve a “careful” selection of benchmarks, which do not really represent reality, but just highlight the positive features of the ideas. There’s always a benchmark that will make your research look good. If there is no good benchmark, you can always tweak the baseline against which you are comparing: why not compare against unoptimized code, or some obsolete implementation, or perhaps use some favorable units of measure (e.g., plot everything in clock cycles, forgetting to mention that a cycle is 1 ns for the adversary and 30 ns for the evaluated system).

But good research provides more results than just an artifact. A very important result of research, which is often neglected, is the lesson that has been learned by doing the research. Even if the results are bad, the lesson can be extremely valuable.

Let me give you a concrete and dramatic example to illustrate: starting in the ’70s and tapering off in the mid ’90s, there was a substantial amount of research on dataflow machine architecture. One of the most active groups investigating this topic was at MIT, led by Jack Dennis and Arvind. To be more specific I will just focus on Arvind. Dataflow was trampled by superscalar microprocessors, and most of that research does not seem to have a lot of commercial applicability nowadays. However, the vast majority of Arvind’s students who worked on dataflow machines (designing them, building their compilers, and building their hardware) have become virtually a mini “who’s who” of computing personalities. You can see some of them at the web page about Arvind’s 60th birthday.

Here’s a sample: Bob Iannucci is head of Nokia’s research center, Greg Papadopoulos is Chief Technology Officer and Executive Vice President of Research and Development at Sun, Keshav Pingali and Derek Chiou are professors at the University of Texas at Austin, David Culler is a professor at the University of California at Berkeley, James Hoe is a professor at Carnegie Mellon, and there are many others. The point I want to make is: what you learn from your work may be more important than the work itself. What Arvind taught these people is much more than dataflow machine architecture; it is how to think critically about research, and how to explore new fields by asking the right questions and seeking a deep understanding of the solution. This has enabled them to be successful researchers themselves.

posted by Mihai at 12:11 am  

Sunday, August 5, 2007

More on Presentation Mistakes

More on Presentation Mistakes

A few years ago I gave a very short talk about giving effective talks; I still think it was a good summary, so I am providing the link above, to the PowerPoint slides.

BTW, I love PowerPoint. I think that it is a great tool, which can be extremely effective when used properly. Like any other tool, it can be used improperly, but this doesn’t make it bad. (Are hammers bad because they can cause injuries?) I think PowerPoint enables many people to present and organize their ideas in much better ways than before. Can you give a great speech without it? Sure. Can you have a terrible slide show? Certainly. But that’s not how you measure its effectiveness.

I also wrote some (extended) advice on how to give presentations some time ago, but I guess that web page is too long, and not terribly original.

I also know quite a few people who manage to contradict a very large number of pieces of advice from my write-up and still give amazingly good talks (one example I remember vividly is Amir Pnueli). The secret in these cases is almost always crystal-clear thinking and a very sharp flow of ideas. So these rules are not the only way to do things.

But here I thought I would mention a few (other) mistakes which I see frequently in job interviews (and many other presentations):

  • Not listening to the questions. It is amazing how many people answer a different question from the one that is being asked. Watching tapes of my talks I have realized that this has happened to me too. The reason is that I often already have in mind a list of questions that I expect, and I tend to match the words I hear to one of my mental models. (Some people start answering before the questioner has even finished, thinking they have heard everything!)

    The best way to avoid this symptom is to listen carefully, repeat or rephrase the question, and (if possible) ask whether this is the intended meaning.

  • Giving too many details. A good talk is a fine balance between advertising and teaching hard facts. In truth, most people will forget almost everything you tell them, so the point is to make them interested in the work, and to stick a few salient facts in their minds. To remember, (the vast majority of) people will have to rehearse, and for this they will have to go to a more persistent material than just a talk. A corollary of this observation is that one should rely on intuition when it is appropriate, and not try to explain everything in detail.
posted by Mihai at 11:25 pm  

Sunday, July 22, 2007

Some Common Job Interview Mistakes

You have worked hard to earn a Ph.D., and now you are looking for a job, either in an academic position (in a university), or in an industrial research lab, or perhaps in some other place where you have to use your recently earned qualifications. For this transition you have to go through a job interview. This is one of the most important “exams” in your life, since depending on its outcome your future may change in completely unexpected ways.

Having seen the job interview process from both sides, once when I was looking for jobs, and many times while interviewing candidates, I want to point out a few mistakes that I see many candidates making during their interviews. None of these mistakes is “fatal”, and strengths in other areas of the interview can overcome any of them. But why not avoid them if you can?

  • Argue about assumptions instead of substance. Most job interviews contain a presentation given by the candidate. This presentation gives the candidate a chance to focus on an important project which is representative of his or her past research. In many places this presentation is interactive: the members of the audience can interrupt and ask questions during the presentation.

    A good presentation starts with an introduction, which explains the setting of the research, and (some of) the assumptions under which the problem is solved.

    (Here are some examples: we assume that “the cost of communication is much larger than the cost of computation”, or “we assume that we have a universal public key infrastructure”, or “we assume that we can completely replace the internet protocol IP with the protocol we have designed”, or “we assume that we can recompile all software with our compiler”.)

    Many talks leave some of the assumptions unstated, and listeners have to deduce them during the talk. I have seen two things happen quite often:

    • The speaker and the listeners have different sets of assumptions in mind during the presentation, and thus they do not understand each other, and they argue constantly about the quality of the solution.
    • Even if the speaker and the listeners do agree on a set of assumptions, the listeners may not find the assumptions reasonable. For example, many listeners may object to most of the assumptions presented above. This can turn into a religious debate.

    Both these situations are highly counterproductive to the presentation.

    If you are giving a talk and you sense that the debate is steering towards assumptions, it is time to stop it immediately and re-focus it. The research has been done, and there is nothing you can do to change the assumptions you have used. You should not spend too much time defending them either: religious wars are endless, and as a job candidate you have much more to lose than the interviewers by wasting precious time during the talk.

    What you can say is the following: “These are the assumptions behind this piece of work, let’s make all of them clear. You may like them or not, but we cannot argue about them anymore. In this talk, I will show you what I have built on top of these assumptions. What you should judge me for is this construction — my research — and not the assumptions. The value is in the work, not in the premises. If you like the construction, perhaps I can do more constructions in the future, after you hire me, starting from a set of assumptions which is less controversial.”

  • Attempt to hide obvious weaknesses. Sometimes during a presentation an audience member discovers some weakness in the presented work. For example, a very old benchmark was used for measurements, one that does not sufficiently stress current machine capabilities. Or there is some related research that the candidate did not know about. Or a theorem was used whose assumptions were not all satisfied.

    One of the worst attitudes to take in this case is to defend your work, and to attempt to hide the mistake. “No, this benchmark is actually very good, because the machines in our lab are old.” “No, I didn’t read that work, but I know that that group does not use a sophisticated compiler like mine.” “I am sure that the theorem can be proved anyway.”

    You have to realize that during your interview there may well be experts in the field of your research in the audience. Attempting to trick them with subterfuges will backfire badly.

    The best thing to do when a mistake is discovered is first to understand it (this shows that you can think), second to acknowledge it, and then to present the rest of the work and show how it stands on its own. Hopefully, the mistake will only have a local impact, and is not the basis for everything you have done in the last 5 years.

  • Not care about the place where you are going. If you interview some place, you had better care about that place: the people who work there and who might become your colleagues, the work atmosphere, the reward system, how you would get money for your research (if you plan to do research), and, very importantly, the health and wealth of the parent organization that is hiring you. If you interview some place and you never display an interest in any of these things, it shows you are not really committed to going there.

    These are good subjects to discuss for some of the one-on-one meetings that usually occur during the job interview.

  • Show modesty. Some people don’t like to boast about their achievements. The interview is the worst place to hide your qualities.

    Some people prefer for their abilities to “speak for themselves”. Well, I have news for you: it doesn’t really work. People are too busy to guess your abilities; you have to use your mouth.

    In general, do not assume people will think and infer something about you. “They will see I published papers in conferences both in theory and systems, so they will realize that I am an interdisciplinary guy.” If you think that interdisciplinarity is one of your strengths, put it in writing in your statement of purpose, or cover letter, or even better, on a slide in the talk (or all of them).

    I am not saying you have to go around saying how good you are (that does not work), but one thing you have to be very careful about during the interview process is using negative labels for yourself, even when joking. This is a long post already, so I will write some more about this topic another time.

posted by Mihai at 11:30 pm  

Sunday, July 8, 2007

Computing Research and Monopolies

I would like to point out an interesting correlation between high quality research labs and monopolies:

  • AT&T had a monopoly on long distance telephone service for most of the 20th century in the United States. In 1925 AT&T created Bell Labs, one of the most famous research laboratories in history, the birthplace of transistors, lasers, information theory, Unix, the C language, and many other things.
  • IBM was convicted in 1973 of having created a monopoly in the digital computer market. IBM’s research centers have pioneered many fundamental concepts of computer science, engineering, and manufacturing, starting with IBM T.J. Watson, founded in 1945.
  • Xerox signed a consent decree in 1975 to settle an anti-trust suit with the Federal Trade Commission regarding its monopoly on xerography. At the time Xerox had just founded the legendary Palo Alto Research Center, or Xerox PARC. You can read about some of the early work done at PARC in my interview with Chuck Thacker.
  • In 2001 Microsoft settled a lawsuit with the Department of Justice regarding its monopoly power on operating systems. Currently Microsoft Research is one of the most respected research labs in industry. (History will judge whether its influence is comparable to that of the other three cited above.)

I am not saying that good research only happens at monopolies; there are plenty of other examples. But often monopolies use some of their money for good purposes.

History also teaches us that when the resources of the monopoly start to dwindle, the labs will suffer. Bell Labs was slowly dismantled, Xerox now only partly supports PARC (which has lost Xerox from its name), and many divisions from IBM Research do not enjoy the lavish resources they used to.

posted by Mihai at 11:48 pm  

Saturday, June 23, 2007

Academics Love Themselves

Higher (i.e., university) education in the United States is really good. In fact, it is so good that it is a significant “export” of the United States. Here is some data from an October 2006 Congressional report:

In FY2005, the Department of State issued 565,790 [student] visas, making up 10.5% of all nonimmigrant visas issued.

And also:

Data from the National Science Foundation (NSF) shows that in 2004, foreign students on nonimmigrant visas accounted for 28.4% of all the doctorates in the sciences and 57.2% of all the doctorates in engineering.

Personally, I came to the US for graduate studies: in 1996 at Cornell, and in 1997 I moved to Carnegie Mellon for a Ph.D. in computer science. I can testify that I found the system very good indeed.

One important thing I have learned here is the respect for hard facts, for supporting your statements with data, and for quantifying your assertions with numbers. (This is one reason that I have provided the quotations above.) Many European systems are built around respect for lofty “traditional ideas,” or even worse, for “personalities,” incarnated in the untouchable professor, who is always right.

In the US everything is open for debate, and in universities the scientific argument, based on facts, is a powerful weapon. This banishes rigidity, keeping the flow of ideas open.

However great it is, we should not believe that the system is perfect. In this text I want to point out a weakness of the US higher education system. I won’t be the first to do it: for a much more entertaining account, see this talk by Sir Ken Robinson.

When you talk about a system being “perfect,” you first have to define a measure by which you quantify its quality. While I can’t provide an objective metric, what I have in mind is the degree to which the system produces people who are prepared for the challenges of the “real world” that awaits them at graduation, and, in particular, for a job.

I have to agree with Sir Ken Robinson: the entire education system is optimized for producing professors. The higher you go in the educational hierarchy, the more likely you are to come out a professor.

One can find many explanations for this state of affairs. Let me try to propose one: a positive feedback loop, self-reinforcing across generations. The more you stay in school, the more your role models are professors. These are the people you see every day. I know it very well: I have been in school for almost 30 years (that’s a really long time!), and I became convinced that I had to be a professor myself. Why? I had never really seen any other profession in front of my eyes. In graduate school there was an implicit, not-very-clearly-stated assumption that the successful graduates go on to become professors, and the failed ones go to industry or some other “shady” place like that.

This is how the feedback loop starts. You see only professors around you, and your students also become professors, and they have only seen professors all their lives. A second problem is the information intake. Professors are really smart people. They generate a lot of ideas. They like to write and talk a lot about these ideas. That’s why they write papers and organize conferences where they meet other people like themselves. What they don’t like to acknowledge is that there are lots of other sources of ideas besides universities. Speaking in particular about computing, where I know the situation better, there are a lot of very good ideas generated in industry, both in mature companies and in start-ups, in open source, by independent consultants, or even just by random hackers. Well, academics very seldom acknowledge such ideas; they don’t know how to cite something which is not a formal paper. For this reason the feedback loop is somewhat closed. Not completely closed: some “traitors” do occasionally slip in at least new problems to work on from the “real world.”

This is where the title of this post comes from: I came to believe that in general academics do love themselves more than the real world.

Professors defend their status quo with two technical weapons:

  • Academic freedom: “don’t you interfere with my teaching and my work.”
  • The fundamental principles: “everything changes too quickly, so I am not going to teach the latest technological fad, I am teaching the principles.”

Both of these are great things, but can be used to just reinforce the feedback cycle I described above. And the problem is, not everybody should be a professor.

posted by Mihai at 11:04 pm  

Friday, June 15, 2007

Research in Academia vs. Industry

The meaning given to the word [computer-related] “research” is not the same in universities (i.e., academia) and in industrial research labs. (Research has yet other meanings that I won’t touch on here, for example “market research.”) The fact that “research” has two meanings is quite subtle, because the two meanings do overlap substantially.

The duality of meanings is most apparent when you see how people from the two environments regard the same piece of work. I have seen people from academia regard a paper as a masterpiece while, at the same time, people in industry declared it useless. And each party had perfectly valid arguments on their side. How could that be possible?

To reconcile these seemingly contradictory points of view, one has to understand that people in these two camps play two different games, and thus optimize for two different criteria. In academia the rewards are tenure and (perhaps surprisingly) the respect of peers, and the main measure of success seems to be the number of publications (and various awards). In industrial research labs the main reward is having your work translated into a product, and the measures of success are more varied, but include mainly technology transfer (hard to quantify) and patents. What blurs this picture is the fact that there are quite a few people who cross the lines: academics who create companies (i.e., start-ups) or consult for industry, and people in labs who publish papers. But in the end, there are two different games being played here, with the same name, “research,” but with different rewards and different rules.

There is this famous quote:

Life is a game. Money is how we keep score.
— Ted Turner

I believe that this is pushing the “game” framework a little too far, as I will explain in a minute. But the “game” framework has become really useful for me. Let me give you one more example.

Let’s take software developers. I have met some absolutely brilliant software developers who can craft amazing pieces of software and who have proven that they have a very deep understanding of computers, perhaps much deeper than many people who have spent far more time in school and gotten a Ph.D. precisely to study the behavior of the machine.

I can imagine how a guy with a Ph.D. can snicker about these developers being just “code monkeys” who can’t put a paper together and who don’t know the “related work.” I can also imagine how the coder can snicker about the Ph.D. guy never having produced a piece of code that doesn’t break any time the wind blows. So, who is better?

This is the wrong question. You have to understand that the developer and the graduate student are playing different games. One of them is optimizing for producing software, and the other is optimizing for producing papers. Both of these are really hard and intellectually deep activities. Both of them, when done well, can be very useful for society, and society can pay top dollar for them. You can’t ask one of them to play the other’s game. It would be like asking Tiger Woods to write equations and Einstein to play golf. These are just different games.

That’s why I think that Ted Turner’s quote is not always appropriate: not all people want to play together at the same table.

But I have drifted from my original subject.

Once you understand that academia and industry are different, you can adjust your career accordingly. This has important consequences for interview style (which should be different in academia and industry), for managing your career, and even for networking with people from these environments. But I hope to write about some of this in another post.

posted by Mihai at 10:13 pm  

Thursday, May 3, 2007

An interview with Leslie Lamport

Leslie Lamport
Leslie Lamport in his office, May 2007

Leslie Lamport is a legendary figure of computing. While he is probably best known for the open-source LaTeX typesetting macro package and book, arguably his most important contributions are in the domain of distributed systems; this is also the subject of this interview.

This interview was conducted in April 2007. Leslie Lamport typed the answers to my questions and reviewed the final editing. I thank my colleagues who have helped me think about these questions.

This interview is licensed under a Creative Commons Attribution License.

Q: You have always been a member of an industrial research lab. What is the difference between an industrial research lab and a university department?

A: Jean Renoir wrote in his autobiography that someone once asked his father, the painter Auguste, why he painted from nature. Renoir père answered that if he were to try painting a tree in the studio, he would be able to draw four or five different kinds of leaves, and the rest would all look like them. But nature creates millions [his count] of different kinds of trees. I like working in industry for the same reason Renoir liked to paint from nature.

Q: What was the first computer you used/programmed?

A: The IBM 705.

Q: Your paper Time, Clocks, and the Ordering of Events in a Distributed System (Lamport Clocks) (1978) taught programmers once and for all how to think about clocks. The key message had been known by physicists since Einstein: there exist events in a computer system which do not occur one before another (i.e., time is not a total order).

A: I hope it didn’t teach anyone once and for all how to think about clocks. For one thing, I’ve written papers describing other ways to think about them. The most important idea in that paper is thinking about distributed systems in terms of state machines — an idea that most people who read the paper seem not to notice.

Q: The clocks are really “stealing the show” in this paper, and I can understand why people might overlook the state machine. Could you explain the essence of the state-machine idea?

A: Almost any system can be described as a state machine. So an easy way to implement a system is to describe it as a state machine and use a general algorithm for implementing state machines.
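
(To make the state-machine idea concrete, here is a minimal Python sketch of my own; it is only an illustration, not code from Lamport. The key-value example, the command format, and the function names are all hypothetical.)

```python
# A minimal sketch of the state-machine idea (editorial illustration).
# A system is described by an initial state and a transition function;
# one generic driver can then run any such description.

def run_state_machine(initial_state, transition, commands):
    """Apply each command in order, using the system's transition function."""
    state = initial_state
    for command in commands:
        state = transition(state, command)
    return state

# Example system: a tiny key-value store, described purely as a state machine.
def kv_transition(state, command):
    op, key, value = command
    new_state = dict(state)  # keep transitions purely functional
    if op == "put":
        new_state[key] = value
    elif op == "delete":
        new_state.pop(key, None)
    return new_state

commands = [("put", "x", 1), ("put", "y", 2), ("delete", "x", None)]
print(run_state_machine({}, kv_transition, commands))  # {'y': 2}
```

Replicating such a service then amounts to making every copy apply the same sequence of commands in the same order, which is the problem that consensus protocols such as Paxos, discussed below, are meant to solve.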

Q: The Byzantine Generals Problem paper (1982) describes the first provably correct algorithm for making several computers agree when some of them may give deliberately wrong answers. What are its practical applications?

A: The only practical applications I know of are in real-time process control — in particular, for systems that fly airplanes.

Q: The Part-Time Parliament (Paxos) paper (1989 and 1998) shows how to make a (server) program more reliable by making several copies of it, which continue to operate as long as a majority of them are functioning correctly. Paxos is deployed in the Google Chubby lock server and in the Microsoft Autopilot cluster management service from Live Search. Where else is Paxos deployed?

A: The problem that Paxos solves is common to a wide range of distributed systems. The normal-case behavior of Paxos is what sensible programmers come up with when faced with the problem. If those programmers also succeeded in correctly handling the abnormal cases, then they would almost certainly have some variant of Paxos. Of course, they almost certainly won’t come up with Paxos unless they’ve already learned about it. I don’t know how many programmers have. As of a few years ago, I knew of two or three groups at Microsoft who were planning to use Paxos in their next releases. I don’t know if they did.

Q: If I understand the history right, Paxos was appreciated first by the system builders, and only later by the theoretical community, despite one of its main contributions being the formal correctness proof. Why did things happen this way?

A: If that is indeed the case, then I suspect Butler Lampson was responsible. He is the only person who understood the algorithm and its significance as soon as he saw it. He advocated essentially the state machine approach and the use of Paxos as the general state-machine implementation. His influence was greater among system builders than among theoreticians.

Q: I noticed that you worked on an operating system (1962–1965). What large programs have you implemented, besides LaTeX — A Document Preparation System (1985)?

A: I haven’t implemented any programs that my colleagues would consider large (LaTeX included). People seem not to appreciate the virtues of writing small useful programs. Years ago, Butler Lampson proposed a project for capturing every keystroke and mouse click and permitting you to restore your computer to its state at any instant. This would have been a large, multi-person project. I thought that was a neat idea. Since all machine state that I ever needed to restore was contained in files that I created with Emacs, I spent an hour writing some Emacs macros that called RCS to checkpoint files whenever I saved them. (Since I grew up in the days when computers crashed frequently and am in the habit of saving the file I’m editing every minute or so, that was often enough. Otherwise, it would have been easy to do the checkpointing on some other regular basis.) I can therefore recreate the state of any of the source files I’ve written since 1993 within about a minute of any given time. I told my colleagues about my macros, but no one was ever interested. Grand projects are much more exciting.

Q: Throughout your writings you emphasize the necessity not only of coming up with sound algorithms for solving problems, but also of proving their correctness formally and mathematically: not just through program testing, but through mathematical reasoning that covers all possible circumstances. However, the currently available formal methods are unable to prove the correctness of large software systems, such as real operating systems. What is your advice to software developers for bridging this gap between algorithms (which can be analyzed) and full software systems?

A: You seem to be implicitly asserting the usual argument that program verification cannot prove the correctness of a complete operating system, so it is useless for real systems. The same reasoning would hold that because you can’t implement a complete operating system in C (since you need a fair amount of assembly code), C is useless for building real systems. While this argument has obviously never been applied to C, it has in fact been used to dismiss any number of other programming languages.

People fiercely resist any effort to make them change what they do. Given how bad they are at writing programs, one might naively expect programmers to be eager to try new approaches. But human psychology doesn’t work that way, and instead programmers will find any excuse to dismiss an approach that would require them to learn something new. On the other hand, they are quick to embrace the latest fad (extreme programming, templates, etc.) that requires only superficial changes and allows them to continue doing things basically the same as before. In this context, it is only fair to mention that people working in the area of verification are no less human than programmers, and they also are very reluctant to change what they do just because it isn’t working.

The fundamental idea behind verification is that one should think about what a program is supposed to do before writing it. Thinking is a difficult process that requires a lot of effort. Write a book based on a selection of distorted anecdotes showing that instincts are superior to rational judgment and you get a best seller. Imagine how popular a book would be that urged people to engage in difficult study to develop their ability to think so they could rid themselves of the irrational and often destructive beliefs they now cherish. So, trying to get people to think is dangerous. Over the centuries, many have been killed in the attempt. Fortunately, when applied to programming rather than more sensitive subjects, preaching rational thought leads to polite indifference rather than violence. However, the small number of programmers who are willing to consider such a radical alternative to their current practice will find that thinking offers great benefits. Spending a few hours thinking before writing code can save days of debugging and rewriting.

The idea of doing something before coding is not so radical. Any number of methods, employing varying degrees of formalism, have been advocated. Many of them involve drawing pictures. The implicit message underlying them is that these methods save you from the difficult task of thinking. If you just use the right language or draw the right kind of pictures, everything will become easy. The best of these methods trick you into thinking. They offer some incentive in the way of tools or screen-flash that sugar coats the bitter pill of having to think about what you’re doing. The worst give you a happy sense of accomplishment and leave you with no more understanding of what your program is supposed to do than you started with. The more a method depends on pictures, the more likely it is to fall in the latter class.

At best, a method or language or formalism can help you to think in one particular way. And there is no single way of thinking that is best for all problems. I can offer only two general pieces of advice on how to think. The first is to write. As the cartoonist Guindon once wrote, “writing is nature’s way of showing you how fuzzy your thinking is.” Before writing a piece of code, write a description of exactly what that piece of code is supposed to accomplish. This applies whether the piece is an entire program, a procedure, or a few lines of code that are sufficiently non-obvious to require thought. The best place to write such a description is in a comment.

People have come up with lots of reasons for why comments are useless. This is to be expected. Writing is difficult, and people always find excuses to avoid doing difficult tasks. Writing is difficult for two reasons: (i) writing requires thought and thinking is difficult, and (ii) the physical act of putting thoughts into words is difficult. There’s not much you can do about (i), but there’s a straightforward solution to (ii) — namely, writing. The more you write, the easier the physical process becomes. I recommend that you start practicing with email. Instead of just dashing off an email, write it. Make sure that it expresses exactly what you mean to say, in a way that the reader will be sure to understand it.

Remember that I am not telling you to comment your code after you write it. You should comment code before you write it.
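
(As an editorial aside, here is a small hypothetical Python example of the practice described above: the comment stating exactly what the code is supposed to accomplish is written first, and the code is then written to match it. The function and its behavior are my own invention, chosen only for illustration.)

```python
# Written before the code, per the advice above:
# dedupe(xs) returns a new list containing the elements of xs in their
# original order, keeping only the first occurrence of each element.
# Elements must be hashable; xs itself is not modified.
def dedupe(xs):
    seen = set()
    result = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

print(dedupe([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Writing the comment first forces the decisions (order preservation, hashability, not mutating the input) to be made before any code exists, and the comment then doubles as documentation.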

Once you start writing what your program is supposed to do, you will find that words are often a very inconvenient way to express what you want to say. Try describing the mean of a set of numbers in words. If you succeed in describing it precisely and unambiguously, you’ll find that you’ve written a formula as a sentence. This is a silly thing to do. In a great many cases, mathematics is a much more convenient language than English for describing what a piece of code is supposed to do. However, it is only a convenient language if you are fluent in it. You are undoubtedly fluent in arithmetic. You have no trouble understanding

The mean of the numbers a1, … , an equals (a1 + … + an) / n.

The kinds of things you need to describe in programming require more than simple arithmetic. They also require simple concepts of sets, functions, and logic. You should be as familiar with these concepts as you are with arithmetic. The way to achieve familiarity with them is the same way you did with arithmetic: lots of practice.

As you get better at using math to describe things, you may discover that you need to be more precise than mathematicians usually are. When your code for computing the mean of n numbers crashes because it was executed with n = 0, you will realize that the description of the mean above was not precise enough because it didn’t define the mean of 0 numbers. At that point, you may want to learn some formal language for writing math. But if you’ve been doing this diligently, you will have the experience to decide which languages will actually help you solve your problem.
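
(Again as an editorial illustration, not part of the interview: the informal description of the mean above says nothing about n = 0, and code written directly from it will fail on an empty input. A minimal Python sketch with a more precise contract, stated before the code, makes that case explicit.)

```python
def mean(numbers):
    """Return (a1 + ... + an) / n for a non-empty sequence of numbers.

    The mean of zero numbers is left undefined by the informal description,
    so this function makes the choice explicit: it raises ValueError for an
    empty sequence instead of letting a ZeroDivisionError escape."""
    if len(numbers) == 0:
        raise ValueError("mean of an empty sequence is undefined")
    return sum(numbers) / len(numbers)

print(mean([2, 4, 9]))  # 5.0
```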

Q: You performed an interesting experiment analyzing quicksort implementations that you found on the web using a search engine.

A: I did a Web search on “quick sort” and tested the first 10 actual algorithms that came up. Half of them were wrong. Every one that was written in pseudo-code, and hence could not have been tested, was wrong. Not one Web page made any attempt to show that the algorithm worked, although several of those pages were notes from computer science courses [at universities].

Q: You have written a book (2002) and many papers about TLA+, which is a very high-level specification language. What is TLA+ good for, and what isn’t it good for?

A: TLA+ is good for writing formal, high-level descriptions of certain aspects of concurrent and distributed systems. The major part of TLA+ (in terms of the number of constructs and operators) consists of a language for ordinary math. Since ordinary math is wonderful for a great many things, TLA+ is potentially useful for any task that requires very rigorous mathematical statements. For which of those tasks it is actually good is something that can only be discovered by trying.

Q: Should programmers’ and computer science education emphasize distributed-system concepts more in the Internet era? What are some of the fundamental concepts one should internalize?

A: I’ve spent much of my career in search of fundamental concepts. I haven’t found any that compare with concepts such as energy and momentum in physics. There are a few results that seem to be of fundamental importance, such as the Fischer, Lynch, Paterson theorem. One important concept that is underappreciated in the world of distributed systems is that of invariance.

Q: Is invariance used in distributed algorithms in a different way from traditional algorithms?

A: It’s used in the same way as for any concurrent algorithm. The term invariant is used somewhat differently when talking about sequential algorithms — meaning an assertion that is true when control is at some point in the program rather than one that is true at all times.
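
(A small editorial sketch of the two senses, in Python and entirely my own: the sequential sense is an assertion checked when control reaches a particular point, a loop invariant, while the concurrent sense is an assertion about the state that should hold at all times, which is why the transfer below updates both balances inside a single critical section.)

```python
import threading

# Sequential sense: an assertion that holds whenever control reaches this point.
def sum_of(xs):
    total = 0
    for i, x in enumerate(xs):
        assert total == sum(xs[:i])   # loop invariant at this control point
        total += x
    return total

# Concurrent sense: "the two balances always add up to 100" should hold at all
# times as observed by other threads, so both updates happen under one lock;
# the invariant is only broken momentarily inside the critical section, where
# no other thread can see it.
balances = {"a": 60, "b": 40}
lock = threading.Lock()

def transfer(amount):
    with lock:
        balances["a"] -= amount
        balances["b"] += amount
        assert balances["a"] + balances["b"] == 100  # invariant restored
```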

Q: What software tools do you use in your day to day work?

A: Emacs, LaTeX, SANY (the TLA+ parser) and TLC (the TLA+ model checker).

Q: What hard open problems are there remaining in distributed systems?

A: On the theory side, recognizing a problem has generally been harder than solving it. Since you are asking about problems that are already recognized to be problems, the answer is probably none. On the practical side, I don’t think I have any more insight into that than most other people.

Leslie Lamport in 2003

Photo (c) Keith Marzullo, 2003

Q: A big part of your research effort seems to be building the appropriate abstraction of reality, one that is accurate enough to capture the essential details but also precise enough to be modeled mathematically. How does one learn the art of abstraction?

A: An engineer at Intel once told me that writing TLA+ specs taught them to think more abstractly about their systems, which improved their designs. But I don’t know if that made them any better at looking at a completely new situation and abstracting its essence. It’s probably a skill one is either born with or learns at an early age.

Q: Did you get lots of fan email because of LaTeX?

A: Perhaps one or two a week.

Q: What are some things you would do if you were without access to a computer for three months?

A: Think and try to find some pencils and paper.

posted by Mihai at 10:02 pm  