Music Typesetting on Linux: The People Behind LilyPond

By Chris Cannam

One of the best-known and most ambitious music programs for Linux is the LilyPond score engraving system. Unlike other typesetting software such as Finale or Sibelius, LilyPond is not a score editor, and it has no GUI; instead it aims to start from a simple textual description of the music and turn it into the highest possible quality output, automatically.

LilyPond is the result of several years of work by Han-Wen Nienhuys and Jan Nieuwenhuizen. In this extensive interview, Linux Musician's Chris Cannam talks to them about recent and future directions for the project.

Chris: I recently found a file of music examples I had printed out from LilyPond, probably in 1998. The LilyPond printouts looked less professional than they would today, but many of the capabilities of today's software were in place. What have you been doing for the last six years?

Han-Wen: About five years ago we were working up to release 1.0. Our target was to have a usable program that could produce basic music notation, where we defined “basic” as “whatever is in our set of simple test pieces”, and usable was “will not dump core, mostly.” We succeeded, but of course it didn't work very well for things that weren't in our test pieces.

By that time, we were also reaching the bounds of what was possible in our model of notation, an object-oriented model, hard-coded in C++. So we decided to integrate GNU's GUILE library, a Scheme interpreter which was specifically designed to extend programs. We spent the next two to three years refactoring our C++ code into Scheme functions. This resulted in a more flexible, more efficient and more maintainable program.
The second big change was catalyzed by an invitation to join a workshop in Firenze, Italy, organized by Nicola Bernardini of AGNULA fame, then director of Centro Tempo Reale. At the workshop we met Nicola, a few top-notch engravers, and an editor for Universal Edition, an Austrian publisher that does a lot of contemporary music. We had the chance to discuss LilyPond with several experts. On the one hand, we were thrilled that they took us seriously, but on the other hand they pointed to several inadequacies in our output.

We arrived back home a great deal wiser. We knew what “publication quality” engraving meant, and were determined to perfect Lily into producing that. Since we like hand-engraved music, we started reproducing simple pieces in LilyPond and comparing the output side-by-side. By doing close comparisons, we learned how music should really look, and we fixed all the deficiencies that we found.

In anything that you write, there will always be a neat, simple, small idea that is obscured by crufty implementation, bad design or suboptimal algorithms. In my opinion, the real art of programming is recognizing the neat idea, and being ruthless enough to redo all the other bad bits. Since we're writing new code all the time, we also have to continue refactoring everything, and this is how we have spent the last few years: coding new stuff, and refactoring old stuff.

We also did a lot with the documentation. Some of our users complain about the current documentation, and they're probably right, but what we have now is light-years ahead of the manual a few years ago.

Chris: Your website features an essay on music typesetting that is quite critical of other software, with an entertaining piece of bad typesetting from Finale.
You make an effort to explain that it isn't just an exceptional example, but surely if programs like Finale and Sibelius are so widely used by good musicians, they can't really be that bad?

The default output of Finale is indeed shockingly bad, which is why almost all other vendors routinely compare their packages to Finale. Of course, that's why we use it too. The default layout of Sibelius is not very elegant, but at least it's usable. A Sibelius sample would be a less entertaining and less convincing demonstration.

A lot of readers misinterpret the Finale example. Finale is a powerful package, and in the hands of a good engraver (which is something different from a good musician) it can produce very good scores. However, the friendly GUI is misleading: you need a lot of time and expertise to get decent scores from Finale. We've seen the engraver for the late Luciano Berio in action, and he uses Finale exclusively, but only as a glorified drawing program.

Chris: We recently interviewed the music engraver Mike Mack Smith, who uses a package called Amadeus. How do you feel LilyPond stands up against venerable professional packages like Amadeus or SCORE technically?

For development purposes, we look at hand-engraved scores. Those are our “gold standard”, so looking at SCORE or Amadeus output is of no use to us. I do have a big binder with all the sample printouts that I could lay my hands on, including Amadeus and SCORE. In terms of default graphical quality LilyPond handily beats both of them, but that's an unfair comparison.

Like Finale, Amadeus and SCORE are very powerful packages, but they do not mislead users with a friendly GUI. They are aimed at professionals that need to create perfect prints in a short time-span. LilyPond is also focused on making perfect prints, but at the same time, the software should do The Right Thing.
Sometimes we have to disappoint users, because implementing the Right Thing can take a long time.

Chris: Speaking of doing the right thing, your essay on typesetting contains an example from LilyPond that I'd find unacceptable because it has an arpeggio mark too close to a note. You'd expect that to be easy to fix in a GUI program, because the arpeggio is a graphical object: select it and move it. In LilyPond it's less obvious what to do. It looks as though you're making things harder for yourselves by concentrating on an impossible problem (knowing exactly what the user means) in preference to a simpler, widely applicable one (making it easier for the user to refine what they mean).

Concentrating on hard problems instead of simple ones is one of the things that makes life interesting for me.

In LilyPond an arpeggio is also a graphical object, and correcting this mistake is actually very similar to the GUI approach: select an object

    Arpeggio

and move it

    \override Arpeggio #'extra-offset = #'(0.5 . 0.0)

This moves the symbol half a staff space to the right, in LilyPond 2.1 syntax. The difference is not conceptual, but practical: without a GUI, you have to figure out the exact textual command, and where to insert it in the input file.

Chris: Is this an example of what you meant by disappointing users (by making simple things a bit harder to do) because implementing the Right Thing can take a long time?

I was going to say that it's not at all what I meant, but on second thought, it's actually not such a bad example. I wrote positioning code which assumes that the wiggle is left of the chord. The correct fix is to use a more general positioning mechanism, which I've now done.
So now you can also tune the amount of space between the chord and the arpeggio with

    \override Arpeggio #'padding = #<..a number..>

This is a specific illustration of the general situation: a user complains of something that isn't working well (arpeggio positioning), and wants to have a quick and dirty fix (moving objects about manually). From my perspective, hand-holding a user and teaching him to do obscure manual tweaks is a bad idea: it takes a lot of my time, and it doesn't help the next guy that runs into the same problem. In effect, it wastes my time.

The proper solution is to fix this problem once and for all, make the solution available to everyone else, and make sure that everyone can also find it. In this case, it amounts to a small change in the C++ sources. Often, the proper thing is writing or polishing documentation. From my perspective, a user asking a question is an indication that the documentation is still lacking.

Unfortunately some problems are either too specific for me to spend a lot of time on, or a good solution is too difficult. As an example of the latter, the formatting of slurs is still far from optimal in LilyPond. I expect that this module will have to be rewritten, but that is a big long-term project, so I have to disappoint users who complain about the shape of their slurs.

Chris: You place a lot of emphasis on matching existing hand-engraved scores. Isn't it possible that rather than creating a program that can do “correct” musical typesetting, you are in fact training a program to specialise in imitating Bärenreiter editions?

We like Bärenreiter especially, but we have worked from Breitkopf & Härtel and Ed. Peters samples as well. We wouldn't mind having a “Bärenreiter” score generator, but in practice it's not really feasible. There are compromises in the layout when it comes to selecting weights, distances and glyph shapes, and these compromises vary from score to score; I suppose it depends on who did the plate engraving.
In the end, we have to find our own compromise, such that LilyPond output is consistent with itself in all typographical aspects. Regarding the Solo Cello Suites (Bärenreiter BA 350, our engraving bible), I think we have a slightly heavier look, which is probably related both to personal preferences and our black note head, which is more elongated and therefore heavier.

Chris: How far do you feel able to judge quality of output separately from your own personal typographic preferences?

After several years of looking at scores from a typographical perspective, Jan and I definitely feel qualified. Our belief is reinforced when we talk to professionals: they take us seriously, and share many of our views. In addition, we have reached a point where we can sometimes spot subtle errors in their engraving work.

Users may get fuzzy feelings from this knowledge, but of course it doesn't really buy them much. That's why we try to document as much typographical knowledge as possible. This information is contained in comments to the program and font source code, and especially in our regression-test collection: we have a set of LilyPond source files that test every aspect of the typesetting engine. The comments to those files document our beliefs when it comes to proper typesetting.

Chris: Do you usually feel that when you've settled on something and encoded it as your preferred test output, that decision is sound and final?

Yes, I often feel like that, but after a while, users always pop up with obscure examples where our approach fails. If possible, I try to enhance Lily to deal with their complex cases too.

Chris: So how open is musical typesetting to personal tastes?

Jan: I think that lots of personal taste goes into tiny details. Professional engravers sometimes design their own font: just enough to add their “fingerprint” to an engraved score, but subtle and tasteful enough not to annoy the trained eye.
I think engravers agree on the “big things”, which is where the challenge still is for notation software.

Chris: You drew your own font for LilyPond, didn't you? Which symbols caused the most trouble?

Han-Wen: Yes, in a galaxy far away, and long ago, LilyPond used the MusiXTeX font. But we were unable to get a licence for the font that was as permissive as we needed for the rest of LilyPond, so we started writing our own font. We started out with the basics (note heads, accidentals), and gradually replaced all the symbols over time.

I find the most elegant symbols the most difficult to draw. In particular, I have put a lot of work into the flat symbol and the G clef. When I prepare slides for a presentation on Lily, I show some glyphs in magnification. Usually, I end up tweaking the parameters of those glyphs. As you can guess, I'm still not satisfied with them. For example, the G clef has a nice combination of poise and flourish, but the bottom crook is still out of balance.

I expressly say we started “writing” a font instead of “drawing”, since the font is also a program, written in METAFONT, Donald Knuth's system for designing fonts. The nice thing about METAFONT is that the font is parameterized, making it quite easy to have slightly different shapes depending on the design size. If you look closely at good music prints, then you will notice that smaller print (i.e. smaller staff sizes) uses heavier staff lines. Of course since the music font must match the staff size, smaller fonts should be comparatively chubbier. With METAFONT we have successfully implemented this in the LilyPond development series, and I think that LilyPond is the only engraving system that sports a music font in several design sizes.

Chris: Do you know of many people using LilyPond professionally?

Not many, but there are people who do the odd paid engraving job.
I estimate that it will take another few years before we will start to see LilyPond scores from major publishers. We do see that LilyPond use is rising everywhere, not only from our web and email statistics, but also because I bump into more and more users in Real Life, for example at orchestra rehearsals.

Chris: Your website says you're available to do paid work based on your LilyPond expertise. Have you had much interest?

No, but I haven't solicited it actively.

Chris: What sort of work can you imagine people needing?

We added the remark [about paid work] to the FAQ partly on impulse, and partly to see if it is possible to make a part-time job out of LilyPond consultancy. I have to admit that we haven't thought very deeply about what kind of services we could deliver.

One advantage is that it means we have a valid argument to ignore feature requests that aren't generally useful. In the past we implemented things because they seemed “cool to have”: some of our conversion utilities, and “easy notation” note-heads (they have the name of the note printed inside the head). After the work was done we might find that the person requesting it had disappeared and wasn't so interested after all. Nowadays we would request a fee for such things, and that would sift the pie-in-the-sky dreamers from users with serious needs.

Chris: If my hypothetical small typesetting company tried out LilyPond and needed some support or training, would you be available to do that?

Typesetting companies have an obligation to deliver, so you probably would not base a company on LilyPond unless you were sure that you yourself were capable of making LilyPond deliver. Some users do their engraving professionally in LilyPond only because they feel sufficiently comfortable with the low-level commands for tuning output. As Mike Mack Smith explained, the engraving business has tight margins and I suppose that individual engravers don't have much money to spare for consultancy.
From our point of view, it means that we have to figure out who else would want to fund work on LilyPond. Broadly speaking, LilyPond offers an open-source/free software solution to music document production, archival and analysis. Parties that have interests in these areas are potential sponsors.

Music technology research groups might be interested in a system for storing and producing musical documents; libraries might be interested in infrastructure to build digital on-line libraries. There are foundations to stimulate general participation in and production of art: I imagine some of them might find the effect of LilyPond on performing arts worthwhile. Big publishers can save money if typesetting is done more efficiently, which LilyPond could do for them. Since they have more money to invest in such long-term projects, they might also be an interested party. In some far away future, I hope that any or all of these institutions would fund further work on Lily.

Chris: Most music being written and listened to today is not distributed in written form, and a lot of (for example) hit songs make pervasive use of things like samples and effects that can't be recreated effectively from a written representation. And of course music can be very easily distributed electronically at any stage of completeness. Do you think traditional written music has any future at all, other than to communicate with the past?

Your question touches upon another issue: most people, especially non-musicians, see music as a product that takes the form of electrically generated sound waves and is stored on hard disks or shiny discs, while to you and me, music is a way of expressing ourselves. I think that the real question should be “Does live performance have a future?” I think it has: making music is deeply satisfying, all the more when done in front of an audience, and listening to live music is also much better than listening to recorded music, if only because it forces listeners to focus their attention.
Written music, i.e. sheet music, is crucial for all music that does not have a simple structure (basically anything besides light music), so I don't see written music going away.

In fact one of the motivations behind putting so much energy into LilyPond is giving written music a better future. Some day far away, LilyPond will be “done”, and then the Mutopia Project can really take off. I hope that the availability of good software for publishing music will lead to more music being accessible, and also to more newly written music.

There is still a lot of work that can be done in that area. No software caters for the needs of “classical” composing. Of course there are various sequencers, and packages like Finale and Sibelius (and of course, Rosegarden!), but they cater for the relatively simple forms of light music. Even in the age of computers, classical composers still write music by scribbling stacks of note-paper full of ideas and fragments, and piecing those bits together into a full score. It's a very laborious process, but computers cannot give them the same overview as a bunch of paper fragments spread out over a desk would do.

Chris: Are the needs of classical composing something that you want LilyPond ultimately to address?

No, I don't think LilyPond will ever address that. LilyPond is an exercise in reducing the musical input as far as possible, and we have reduced the links between different music fragments to their relationship in time: are two fragments played sequentially, in parallel, are they repeats, are they condensed like tuplets? This makes for a very concise and elegant format, but I don't think it reflects how composers think of music.

To a composer, one fragment of music may have many relations to another fragment. For example, a motive played by one instrument may be a continuation of a melodic line in another instrument.
At the same time, the same fragment might have a function in the harmony, and be thematically related to other motives. In a music composition system, motives would be entered separately, and connected with all these relations. A complete score is just an enormous bunch of motives, connected in many ways, and it could be visualized in many ways. A printable part is just one view of the score: one where all motives for one instrument are strung together.

Chris: I heard that you play a lot of modern classical music in ensemble yourself.

Yes, I play French Horn in the Utrechts Blazers Ensemble, which is a student wind ensemble dedicated to 20th-century music. Recently I also joined the VU-Orchestra, a very good amateur symphony orchestra in Amsterdam. They play late-romantic and modern repertoire.

Chris: Aside from the packages that manipulate it, how far is classical notation itself capable of catering to the needs of composers? Do you play much music that demands particular specialised notations?

The stuff in the UBE is fairly “normal”, at least when it comes to notation. Our conductor is a big fan of John Adams, Louis Andriessen and Stravinsky, so we play a lot of music from them and their students. These are eclectic composers: they blend many musical styles (ranging from medieval hoketus via French baroque to boogie-woogie) into new pieces. The pieces are largely based on traditional music so they look fairly normal except for the frequent time signature changes.

I think the weirdest thing I've ever played in the UBE is “De Volharding” (“Perseverance”), a piece written by Louis Andriessen as an inauguration piece for the ensemble “De Volharding”. It's an archetypical minimal music piece, where everyone in the ensemble plays ad libitum from a set of ostinato patterns. The patterns slowly change over the course of minutes in a group process. I had to return the parts, but the following LilyPond notation might give an idea of what was written.
It had fragments like

    f16[ g f g] \bar ":|"
    "repeat approx 200 times"
    "change gradually into"
    g16[ f g f]

The majority of the parts that we have to play from are rental material, and not performed very often. If they're not classics (like Stravinsky or Poulenc), the parts are written by hand.

Getting back to the general question of new notation: people like to point to funky, weird modern notation as a problem area for LilyPond, but people seem to forget that weird notation is only necessary for weird music. Most modern music has evolved from existing old music, and so has its notation.

In any event, it is my personal opinion that we should do only one step at a time. First we should have a good understanding of producing traditional notation with computers. Only then are we in the position to explore in which direction to improve or extend notation.

Chris: MusicXML recently made the news when Recordare announced the release of version 1.0 of the specification. What do you think of it?

It's nice that there is finally a format that is supported by more than one package, but I am not terribly impressed. In my opinion, any file format that claims to be universal should have two properties: it should have an expressive structure, so other formats can be expressed in it, and it should be as lean as possible, so that converting from other formats amounts to removing information. I think that MusicXML fits neither.

I have the utopian vision of a “universal” music format. That would be a format capable of expressing all kinds of written music while being suitable for machine manipulation. Such a format must not have redundant information, as that gets in the way of manipulation.
For example, in LilyPond you can define a music fragment,

    frag = \notes { c'4 d'8 e' f'2 }

and shift it by a beat, doing

    newfrag = \notes {
      s4 % shift by quarter note
      \frag
    }

Many other formats, including MusicXML, define a fragment of music as a list of measures, where each measure may contain notes. This structure gets in the way of manipulation: when you shift a fragment by a quarter note, the bar lines and beaming change completely.

Trading lean data structures for flexibility is an example of duality, and this concept is much more general. In object-oriented programming, base classes always have fewer data members than derived classes, and for that reason, one can perform more operations on them. A mathematical example of duality is C^∞, the space of infinitely smooth functions: it is smaller than C^0, the space of continuous functions, and therefore, more operations can be applied to C^∞.

Aside from theoretical aspects, a lean data structure is practically useful. Converting from MusicXML to LilyPond is rather easy: parse the XML, discard everything but pitches and durations, and dump those to a .ly file.

It does change the problem: the more you remove from a data format, the more advanced the software has to become to fill in the missing details. The big problem is not so much defining the format, but writing the software to recreate the notation for a piece of music.

Chris: For practical use as an interchange format, surely MusicXML only has to be easy to write and parse, and to be substantially more expressive than MIDI?

Perhaps. Maybe I just have trouble comprehending the concepts of “music format” and “easy to parse”. Music is a time-based thing, so a music format should support parallel and sequential composition. As a BNF grammar, you would have

    Music ::= NOTE | SEQ Music* | PAR Music*

This is basically what the LilyPond format is all about, and I can't see how you could make it much simpler than this. It's a context-free grammar. How much easier parsing do you need?
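As an editorial aside, the three-production grammar and the "shift by a beat" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Note`, `Seq` and `Par` and the helpers `length`, `shift` and `onsets` are hypothetical and are not taken from LilyPond's sources. The spacer rest `"s"` follows the `s4` convention used in the fragment above.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Note:
    pitch: str           # e.g. "c'"; "s" stands for a silent spacer rest
    duration: Fraction   # as a fraction of a whole note


@dataclass(frozen=True)
class Seq:
    parts: Tuple["Music", ...]   # played one after another


@dataclass(frozen=True)
class Par:
    parts: Tuple["Music", ...]   # played at the same time


Music = Union[Note, Seq, Par]


def length(m: Music) -> Fraction:
    """Total duration: sum for sequential music, maximum for parallel music."""
    if isinstance(m, Note):
        return m.duration
    lengths = [length(p) for p in m.parts]
    if isinstance(m, Seq):
        return sum(lengths, Fraction(0))
    return max(lengths, default=Fraction(0))


def shift(m: Music, by: Fraction) -> Music:
    """Delay a fragment by prepending a spacer rest; the fragment is untouched."""
    return Seq((Note("s", by), m))


def onsets(m: Music, start: Fraction = Fraction(0)) -> Iterator[Tuple[Fraction, str]]:
    """Yield (onset time, pitch) for every sounding note."""
    if isinstance(m, Note):
        if m.pitch != "s":
            yield (start, m.pitch)
    elif isinstance(m, Seq):
        for p in m.parts:
            yield from onsets(p, start)
            start += length(p)
    else:  # Par: every part starts at the same moment
        for p in m.parts:
            yield from onsets(p, start)


# The fragment c'4 d'8 e' f'2 from the interview (e' inherits the eighth duration).
frag = Seq((
    Note("c'", Fraction(1, 4)),
    Note("d'", Fraction(1, 8)),
    Note("e'", Fraction(1, 8)),
    Note("f'", Fraction(1, 2)),
))
newfrag = shift(frag, Fraction(1, 4))
```

Because the structure carries no measures, bar lines or beams, shifting is a single wrapping step; anything that depends on metric position would have to be recomputed by the typesetting software, which is the trade-off described above.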
Chris: You see defining the format as 10% of the effort and making good use of it as the other 90%, but I imagine from the point of view of MusicXML, defining a format that could work is 10% of the effort and getting a majority of software to agree to and use it is the other 90%. Does the relative success of MusicXML in that sort of environment, particularly compared to earlier attempts like NIFF, not make it seem that it's a thing people do actually want?

I think you should ask this question to “people”. As a developer of notation software, I prefer to deliver the features that make users happy. Then they will continue using LilyPond. By contrast, the main asset of having MusicXML-output is that users can migrate away from LilyPond more easily, and that doesn't give me warm fuzzies.

And, is MusicXML so successful? Sure, the diagram at www.recordare.com has a neat little box saying MusicXML in the center, and many neat little arrows going to neat little boxes listing other software, but are people using it all that much?

Jan: In my view, MusicXML is a job poorly done. You estimate an effort of 10% going into the design of the format itself, and that shows. Used as a notation interchange format, it is a step up from MIDI. MIDI has all the notes, a bit of tempo, a broken key signature. That's maybe 50% of the “music”; the other 50% is lost. MusicXML adds another 25%: most notably, articulations. Now we're already at 75%, yay! It also adds unnecessary stiffness and clumsy verbosity and buzzword compliance. Because of these facts, I'm afraid that MusicXML's lifespan may well be an order of magnitude shorter than that of MIDI, about three rather than thirty years.

After some user pressure we implemented MIDI import for LilyPond. In practice, re-entering a piece in LilyPond is often quicker than adding to and touching-up the MIDI import result. As a consequence I consider the MIDI import filter a mostly wasted exercise.
The question we have to ask ourselves is whether the result/effort equation for supporting MusicXML is significantly more favourable than it was for MIDI. If we can postpone worrying about and supporting MusicXML until its successor comes along or until it gets fixed, that may well be better for our users.

For LilyPond, MusicXML could work as a better way to interchange music notation with other programs. But what is it that people want from us: support for any exchange format that is a bit richer than MIDI and widely adopted? Or would they rather have LilyPond draw fret diagrams?

Chris: Have any of your decisions in the design and development of LilyPond turned out to be big mistakes?

Han-Wen: Like I said before, we are rewriting the internals of LilyPond on a continuous basis, so we see the results of design decisions very directly in the code. Since bad ideas lead to bad code, they are refactored out relatively quickly. Unfortunately that doesn't hold for syntax: for compatibility reasons we cannot change the format too often.

One big thing that took us a long time to repair was the syntax for chords. There used to be no distinction between a chord (a set of pitches) and simultaneous music (pieces of music playing together). At first this was rather neat, since having no chords made syntax slightly leaner. Unfortunately, it also resulted in many clumsy inconsistencies from the user's point of view. The main point of the 1.8/1.9/2.0 releases was to fix this problem in a graceful way.

Chris: I was certainly rather surprised to see such major changes to the syntax in 2.0. Is that likely to happen again in future major releases?

Frankly, I think that we are starting to hit rock-bottom when it comes to simplifications of the syntax.
Nevertheless, if we can think of improvements, we will surely implement them: we have to make future users happy too, and that's a much bigger group than the current users. Before anyone gets the wrong impression, we don't leave current users out in the cold. We have a conversion script that handles most of the syntax changes transparently.

Chris: How much time do you each manage to spend on LilyPond?

Han-Wen: It depends on my personal situation. It has ranged between 10 and 50 hours per week. When I started LilyPond, I had loads of free time (being a lazy MSc student), and for the last half year I've been unemployed, which allowed me to spend obscene amounts of time on Lily. This will change soon, though. I'm starting as an IT/logistics consultant in mid-April, so it will probably go down to a decent 10 to 15 hours a week.

Jan: If I'm lucky, about 12-20 hours a week.

Chris: Do you have many other contributors?

Han-Wen: The hard-core coding work is basically a two-man show; we do get support from others with porting, packaging, translating and proofreading. Also, some special features, like tablature and ancient notation, have been contributed by other people.

Chris: What sorts of music do you listen to, rather than play?

Jan: Anything interesting, really. I like medieval choirs but also modern stuff, especially minimal music: even what Han-Wen plays, when I get to listen.

Han-Wen: Live music usually boils down to going to concerts where friends play, so that tends to be classical. My CD collection is quite varied, including jazz, rock and lots of “classical” dating from between 1450 and 2001. I have to admit that I don't often listen to CDs any more. Background music is distracting when you're not listening, and when I have the time to listen, I'd rather be coding or, even better, playing music myself.

You can hear the VU Orchestra, with Han-Wen on French Horn, performing in the Concertgebouw in Amsterdam on Saturday June 26, 2004.
The programme includes Lutosławski's Concerto for Orchestra and Bartók's Concerto for Viola.