Monday, October 20, 2008

The Universal Design Pattern



  This idea that there is generality in the specific is of far-reaching importance.
Douglas Hofstadter, Gödel, Escher, Bach


Note: Today's entry is a technical article: it isn't funny. At least not intentionally.

Update, Oct 20th 2008: I've added an Updates section, where I'll try to track significant responses, at least for a week or so. There are three entries so far.

Contents



Introduction


Today I thought I'd talk about a neat design pattern that doesn't seem to get much love: the Properties Pattern. In its fullest form it's also sometimes called the Prototype Pattern.

People use this pattern all over the place, and I'll give you a nice set of real-life examples in a little bit. It's a design pattern that's useful in every programming language, and as we'll see shortly, it's also pretty darn useful as a general-purpose persistence strategy.

But even though this pattern is near-universal, people don't talk about it very often. I think this is because while it's remarkably flexible and adaptable, the Properties Pattern has a reputation for not being "real" design or "real" modeling. In fact it's often viewed as a something of a shameful cheat, particularly by overly-zealous proponents of object-oriented design (in language domains) or relational design (in database domains.) These well-meaning folks tend to write off the Properties Pattern as "just name/value pairs" – a quick hack that will just get you into trouble down the road.

I hope to offer a different and richer perspective here. With luck, this article might even help begin the process making the Properties Pattern somewhat fashionable again. Time will tell.

Three Great Schools of Software Modeling


Before I tell you anything else about the Properties Pattern, let's review some of the most popular techniques we programmers have for modeling problems.

I should point out that none of these techniques is tied to "static typing" or "dynamic typing" per se. Each of these modeling techniques can be used with or without static checking. The modeling problem is orthogonal to static typing, so regardless of your feelings about static checking, you should recognize the intrinsic value in each of these techniques.

Class Modeling


You know all about this one. Class-based OO design is the 800-pound gorilla of domain modeling these days. Its appeal is that it's a natural match for the way we already model things in everyday life. It can take a little practice at first, but for most people class modeling quickly becomes second nature.

Although the industry loves OO design, it's not especially well liked as an academic topic. This is because OO design has no real mathematical foundation to support it — at least, not until someone comes along and creates a formal model for side effects. The concepts of OOP stem not from mathematics but from fuzzy intuition.

This in some sense explains its popularity, and it also explains why OOP has so many subtly different flavors in practice: whether (and how) to support multiple inheritance, static members, method overloading vs. rich signatures, and so on. Industry folks can never quite agree on what OOP is, but we love it all the same.

Relational Modeling


Relational database modeling is a bit harder and takes more practice, because its strength stems from its mathematical foundation. Relational modeling can be intuitive, depending on the problem domain, but most people would agree that it is not necessarily so: it takes some real skill to learn how to model arbitrary problems as relational schemas.

Object modeling and relational modeling produce very different designs, each with its strengths and weaknesses, and one of the trickiest problems we face in our industry has always been the object-relational mapping (ORM) problem. It's a big deal. Some people may have let you to believe that it's simple, or that it's automatically handled by frameworks such as Rails or Hibernate. Those who know better know just how hard ORM is in real-world production schemas and systems.

XML Modeling


XML provides yet another technique for modeling problems. Usually XML is used to model data, but it can also be used to model code. For instance, XML-based frameworks such as Apache Ant and XSLT offer computational facilities: loops or recursion, conditional expressions and setting variables.

In many domains, programmers will decide on an XML representation before they've thought much about the class model, because for those domains XML actually offers the most convenient way of thinking about the problem.

The kinds of data that work well with XML modeling tend to be poorly suited for relational modeling, and vice-versa, with the practical result that XML/relational mapping is almost as infamously thorny as O/R mapping.

And as for XML/OO mapping, most of us tend to treat it as a more or less solved problem. However, in practice there are several competing ways of doing XML/OO mapping. The W3C DOM and SAX enjoy the broadest use, but they are both sufficiently cumbersome that alternatives such as JDom and REXML (among others) have gained significant followings.

I mention this not to start a fight, but only to illustrate that XML is a third modeling technique in its own right. It has both natural resonances and surfaces of friction with both relational design and OO design, as one might expect.

Other schools


I'm not claiming that these three modeling schools are the only schools out there – far from it! Two other obvious candidates are Functional modeling (in the sense of Functional Programming, with roots in the lambda calculus) and Prolog-style logical modeling. Both are mature problem-modeling strategies, each with its pros and cons, and each having varying degrees of overlap with other strategies. And there are still other schools, perhaps dozens of them.

The important takeaway is that none of these modeling schools is "better" than its peers. Each one can model essentially any problem.

There are tradeoffs involved with each school, by definition — otherwise all but one would have disappeared by now.

Finding the sweet spot


Sometimes it makes sense to use multiple modeling techniques in the same problem space. You might do a mixed XML/relational data design, or a class-based OO design with Functional aspects, or embed a rules engine in a larger system.

Choosing the right technique comes down to convenience. For any given real-world problem, one or two modeling schools are likely to be the most convenient approaches. Exactly which one or two depends entirely on the particulars of the problem.

By convenient, I mean something different from what you might be thinking. To me, a convenient design is one that is convenient for the users of the design. And it should also be convenient to express, in the sense of minimalism: all else being equal, a smaller design beats a big one. One way of looking at this is that the design should be convenient for itself!

Unfortunately, most programmers (myself included) tend to use exactly the wrong definition of convenience: they choose a modeling technique that is convenient for themselves. If they only have experience in one or two schools, guess which techniques they'll jump to for every problem they face?

This problem rears its head throughout computing. There's always a "best" tool for any job, but if programmers don't know how to use it, they'll choose an inferior tool because they think their schedule doesn't permit a learning curve. In the long run they're hurting their schedules, but it's hard to see that when you're down in the trenches.

Modeling schools are just like programming languages, web frameworks, editing environments and many other tools: you won't know how to pick the right one unless you have a reasonably good understanding of all of them, and preferably some practice with each.

The important thing to remember is that all modeling schools are "first class" in the sense of being able to represent any problem, and no modeling school is ideal for every situation. Just because you are most comfortable solving a problem using a particular strategy does not mean that it is the ideal solution to the problem. The best programmers aim to master all available techniques, giving them a better chance at making the right choices.

Property Modeling


With this context in mind, I claim that the Properties Pattern is yet another kind of domain modeling, with its own unique strengths and tradeoffs, distinct from all the other modeling schools I've mentioned. It is their first-class peer, inasmuch as it is capable of modeling the same broad set of problem domains.

After we've finished talking about the Properties Pattern in exhausting detail, I think I'll have convinced you of the pattern's status as a major school of modeling. Hopefully you'll also start to have a feel for the kinds of problems it's well-suited to solve – sometimes more so than other schools, even your current favorite.

But before we dive into technical details, let's take a brief peek at a fascinating comparison of Property-based modeling to class-based OO design. It's a non-technical argument that I think has some real force behind it.

Brains and Thoughts


Douglas Hofstadter has spent a lifetime thinking about the way we think. He's written about it perhaps more than anyone else in the past century. Even if someone out there has beaten him in sheer quantity of words on the subject, nobody has come close to rivaling his style or his impact on programmers everywhere.

All of his books are wonderfully imaginative and are loads of fun to read, but if you're a programmer and you haven't yet read Gödel, Escher, Bach: An Eternal Golden Braid (usually known as "GEB"), then I envy you: you're in for a real treat. Get yourself a copy and settle in for one of the most interesting, maddening, awe-inspiring and just plain fun books ever written. The Pulitzer Prize it won doesn't nearly do it justice. It's one of the greatest and most unique works of imagination of all time.

Hofstadter made a compelling argument in GEB (thirty years ago!) that property-based modeling is fundamental to the way our brains work. In Chapter XI ("Brains and Thoughts"), there are three little sections titled Classes and Instances, The Prototype Principle, and The Splitting-off of Instances from Classes that together form the conceptual underpinnings of the Properties Pattern. In these little discussions Hofstadter explains how the Prototype Principle relates to classic class-based modeling.

I wish I could reproduce his discussion in full here — it's only three pages — but I'll have to just encourage you to go read it instead. His thesis is this:
The most specific event can serve as a general example of a class of events.

Hofstadter offers several supporting examples for this thesis, but I'll paraphrase one of my all-time favorites. It goes more or less as follows.

Imagine you're listening to announcers commenting on an NFL (American football) game. They're talking about a new rookie player that you don't know anything about. At this point, the rookie – let's say his name is L.T. – is just an instance of the class "football player" with no differentiation.

The announcers mention that L.T. is a running back: a bit like Emmitt Smith in that he has great speed and balance, and he's great at finding holes in the defense.

At this point, L.T. is basically an "instance" of (or a clone of) Emmitt Smith: he just inherited all of Emmitt's properties, at least the ones that you're familiar with.

Then the announcers add that L.T. is also great at catching the ball, so he's sometimes used as a wide receiver. Oh, and he wears a visor. And he runs like Walter Payton. And so on.

As the announcers add distinguishing attributes, L.T. the Rookie gradually takes shape as a particular entity that relies less and less on the parent class of "football player". He's become a very, very specific football player.

But here's the rub: even though he's a specific instance, you can now use him as a class! If Joe the Rookie comes along next season, the announcers might say: "Joe's a lot like L.T.", and just like that, Joe has inherited all of L.T.'s properties, each of which can be overridden to turn Joe into his own specific, unique instance of a football player.

This is called prototype-based modeling: Emmitt Smith was a prototype for L.T., and L.T. became a prototype for Joe, who in turn can serve as the prototype for someone else. Hofstadter says of The Prototype Principle:

"This idea that there is generality in the specific is of far-reaching importance."


Again, it's a three-page discussion that I've just skimmed here. You should go read it for yourself. Heck, you should read the whole book: it's one of the greatest books ever written, period, and every programmer ought to be familiar with it.

Hopefully my little recap of Hofstadter's argument has convinced you that my calling it the "Universal Design Pattern" might just possibly be more than a marketing trick to get you to read my blog, and that the rest of this article is worth a look.

The Properties Pattern is unfortunately big enough to deserve a whole book. Calling an entire school of modeling a "design pattern" is actually selling it short by a large margin.

Hence, if this article seems excruciatingly long, it's because I've tried to cram a whole book into a blog, as I often do. But look on the bright side! I've saved you a bunch of time this way. This will go much faster than reading a whole book.

Even so, don't feel bad if it takes a few sittings to get through it all. It's still a lot of information. I considered splitting it into 3 articles, but instead I just cut about half of the material out. (Jeff and Joel: seriously. I cut 50%.)

Who uses the Properties Pattern?


I assume I've already convinced you that this pattern is worth learning about, or you'd have left by now. I'm showing you these use cases not to draw you in, but to show you some of the very different ways the pattern can be used.

We could probably find hundreds of examples, but I'll focus on just a handful of real-world uses that I hope will illustrate just how widespread and flexible it is.

Before we start, there are two things to keep in mind. The first is that people have different names for this pattern, because even though it's quite commonplace, there hasn't been much literature on it. One paper calls it Do-It-Yourself Reflection. Another article calls it Adaptive Object Modeling. See Further reading for the few links I could dig up.

Whatever name you use, once you know how to look for it, you'll start seeing this pattern everywhere, wearing various disguises. Now that we have a common name for it, it should be easier to spot in the wild.

The second thing to keep in mind is that the Properties Pattern scales up: you can choose how much to use it in your system. It can be anything from simple property lists attached to a few of your classes to make them user-annotatable, up through a full-fledged prototype-based framework that serves as the foundation for modeling everything in your system.

So our examples will range from small ones to very big ones.

Eclipse


One nice small-scale example of the pattern is the Eclipse Java Development Tools (JDT): a set of classes that model the Java programming language itself, including the abstract syntax tree, the symbol graph and other metadata. This is used by the Eclipse backend to do all the neat magic it does with your Java code, by treating Java source code as a set of data structures.

You can view the javadoc for this class hierarchy at help.eclipse.org. Click on JDT Plug-in Developer Guide, then Programmer's Guide, then JDT Core, then org.eclipse.jdt.core.dom. This package defines strongly-typed classes and interfaces that Eclipse uses for modeling the Java programming language itself.

If you click through any class inheriting from ASTNode you'll see that it has a property list. ASTNode's javadoc comment says:
"Each AST node is capable of carrying an open-ended collection of client-defined properties. Newly created nodes have none. getProperty and setProperty are used to access these properties."

I like this example for several reasons. First, it's a very simple use of the Properties pattern. It doesn't muck around with prototypes, serialization, metaprogramming or many of the other things I'll talk about in a little bit. So it's a good introduction.

Second, it's placed smack in the middle of a very, very strongly-typed system, showing that the Properties pattern and conventional statically-typed classed-based modeling are by no means mutually exclusive, and can complement one another nicely.

And third, their property system itself is fairly strongly typed: they define a set of support classes such as StructuralPropertyDescriptor, SimplePropertyDescriptor, and ChildListPropertyDescriptor to help place some constraints on client property values. I'm not a huge fan of this approach myself, since I feel it makes their API fairly heavyweight. But it's a perfectly valid stylistic choice, and it's useful for you to know that you can implement the pattern this way if you so choose.

JavaScript


At the other end of the "how far to go with it" spectrum we have the JavaScript programming language, which places the Prototype Principle and Properties Pattern at the very core of the language.

People love to lump dynamic languages together, and they'll often write off JavaScript as some sort of inferior version of Perl, Python or Ruby. I was guilty of this myself for over a decade.

But JavaScript is substantively different from most other dynamic languages (even Lisp), because it has made the Properties Pattern its central modeling mechanism. It borrowed this heritage largely from a language called Self, and some other modern languages (notably Io and one other language that I'll talk about below) have also chosen prototypes and properties over traditional classes.

In JavaScript, every user-interactible object in the system inherits from Object, which has a built-in property list. Prototype inheritance (think back to our example of the Emmitt Smith instance having been the prototype for the L.T. instance) is a first-class language mechanism, and JavaScript offers several kinds of syntactic support for accessing properties and declaring property lists (as "object literals").

JavaScript is often accurately described as the world's most misunderstood programming language. Armed with our newfound knowledge, we can start to see JavaScript in a new light. To use JavaScript effectively, you need to gain experience with a whole new School of Modeling. If you simply try to use JavaScript as a substitute for (say) Java or Python, you'll encounter tremendous friction.

Since most of us have precious little actual experience with property-based modeling, this is exactly what happens, and it's no wonder JavaScript gets a bad rap.

Pushing it even further


In addition to how centrally you want to use the Properties pattern in your system, you can also decide how recursive to make it: do your properties have explicit meta-properties? Do you have metaprogramming hooks? How much built-in reflection do you offer?

JavaScript offers a few metaprogramming hooks. One such hook is the recently-introduced __noSuchMethod__, which lets you intercept a failed attempt to invoke a nonexistent function-valued property on an object.

Unfortunately JavaScript does not offer as many hooks as I'd like. For instance, there is no corresponding __noSuchField__ hook, which limits the overall flexibility somewhat. And there are no standard mechanisms for property-change event notification, nor any reasonable way to provide such a mechanism. So JavaScript gets it mostly right, but it stops short, possibly for performance reasons, of offering a fully-extensible metaprogramming system such as those offered by SmallTalk and to some extent, Ruby.

The pattern takes shape...


Before we move on to other uses of the Property Pattern, let's put JavaScript (and its central use of the pattern) into perspective, by comparing it to another successful language.

First: JavaScript is not my favorite language. I've done a lot of JavaScript programming over the past 2 years or so, both client-side and server-side, so I'm as familiar with it as I am with any other language I've used.

JavaScript in its current incarnation is not the best tool for many tasks. For instance, it's not great for building APIs, and it's not great for Unix scripting the way Perl and Ruby are. It has no library or package system, no namespaces, and is missing many other modern conveniences. If you're looking for a general-purpose language, JavaScript leaves you wanting.

But JavaScript is the best tool for many other tasks. As just one example, JavaScript is an outstanding language for writing unit tests — both for itself, and also for testing code in other languages. Being able to use the Properties Pattern to treat every object (and class) as a bag of properties makes the creation of mock objects a dream come true. The syntactic support for object literals makes it even better. You don't need any of the silly frameworks you see coming from Java, C++ or even Python.

And JavaScript is one of the two best scripting languages on the planet, in the most correct sense of the term "scripting language": namely, languages that were designed specifically to be embedded in larger host systems and then used to manipulate or "script" objects in the host system. This is what JavaScript was designed to do. It's reasonably small with some optional extensions, it has a reasonably tight informal specification, and it has a carefully crafted interface for surfacing host-system objects transparently in JavaScript.

In contrast, Perl, Python and Ruby are huge sprawls, all trying (like C++ and Java) to be the best language for every task. The only other mainstream language out there that competes with JavaScript for scripting arbitrary host systems is Lua, famous for being the scripting language of choice for the game industry.

And wouldn't you know it, Lua is also a language that uses the Properties Pattern as its central design. Its central Table structure is remarkably similar to JavaScript's built-in Object, and Lua also uses prototypes rather than classes.

So the world's two most successful scripting languages are prototype-based systems. Is this just a cosmic coincidence? Or is it possible that a suitably designed class-based language could have been just as successful?

It's hard to say. I've used Jython as an embedded scripting language for a long time, and it's worked pretty well. But I've personally come to believe that the Properties Pattern is actually better suited for extensibility than class-based modeling, and that prototype-based languages make better extension languages than class-based languages. That's effectively what's happening with embedded scripting: the end-users are growing and extending the host system.

In fact I was convinced of it before I even knew JavaScript. Let's take a look at another interesting "Who uses it?" example: Wyvern.

Wyvern


My multiplayer game Wyvern takes the Properties Pattern quite far as well, although in some different directions than what we've discussed so far. I designed Wyvern long before I'd heard of Self or Lua, and before I'd learned anything about JavaScript. In retrospect it's amazing how similar my design was to theirs.

Wyvern is implemented in Java, but the root GameObject class has a property list, much like JavaScript's Object base class. Wyvern has prototype inheritance, but since I'd never heard of prototypes before, I called them archetypes. In Wyvern, any game object can be the archetype for any other game object, and property lookup and inheritance work more or less identically to the way they work in JavaScript.

I arrived at this design after scratching my head for months (in late 1996) over how to build the ultimate extensible game. I wanted all the game content to be created by players, and I came up with dozens upon dozens of detailed use cases, in all of which I wanted players to be able to extend the game functionality in surprising new ways. In the end I arrived at a set of interleaved design patterns, including a rich command system, a rich hooks/advice system, and several other subsystems I'd love to document someday.

But the core data model was the Properties Pattern.

In some ways, Wyvern's implementation is more full-featured than JavaScript's. Wyvern offers more metaprogramming facilities, such as vetoable property change notifications, which gives in-game objects tremendous flexibility in responding to their environment. Wyvern also supports both transient and persistent properties, a scheme I'll discuss below.

On other ways, Wyvern just made different decisions. One big one is that Wyvern's property values are statically typed. The property names are always strings, just like in JavaScript, but the values can be various leaf types (ints, longs, booleans, strings, etc.), or functions (a trick that wasn't easy in Java), or even archetypes.

But despite the differences, Wyvern's core property-list infrastructure is a lot like that of JavaScript, Self and Lua. And it's been a design I've been fundamentally happy with for over ten years. It's met or exceeded all my original expectations for enabling end-user extensibility, particularly in its ability to let people extend the in-game behavior on the fly, without needing to reboot. This has proven extraordinarily powerful (and popular with the players.)

Where Wyvern clearly got the pattern wrong was in its lack of support for syntax. As soon as I decided to use the Properties pattern centrally in my game, I should have decided to use a programming language better suited for implementing the pattern: ideally, one that supports it from the ground up.

I eventually wound up using Python (actually, Jython) for a ton of my code, and it was far more succinct and flexible than anything I wrote in Java. But I was foolishly worried about performance, and as a result I wound up writing at least half the high-level game logic in Java and piling on hundreds of thousands of lines of getProperty and setProperty code. And now the system is hard to optimize; it would have been much easier if I'd had a cleaner separation of game-engine infrastructure from "scripty" game code.

Even if I'd done the whole game in Python, I'd still have had to implement a prototype inheritance framework to enable any object to be able to serve as the prototype for any other object.

I realize I haven't really explained why prototype inheritance works so well, except for my brief mention of mock objects for unit testing. But to keep this article tractable, I had to delete several pages of detailed examples, such as "Chieftain Monsters" that could be programmatically constructed by adding a few new properties to any existing monster

When I told you this pattern was big enough for a book, I meant a big book. Without the examples handy, all I can do is say that using JavaScript/Rhino (or Lua, once it became available on the JVM) might have made my life easier. Or heck, writing my own language might have been the best choice for a system that large and ambitious.

In any case, live and learn. It's a lot of code, but Wyvern is still a properties-based, prototype-based system, and it has amazing open-ended flexibility as a result.

We've been through the two big examples now (Wyvern and JavaScript). I'll close this "Who Uses It" section with just a few more key examples.

Lisp


Lisp features a small-scale example of the Properties Pattern: it has property lists for symbols. Symbols are first-class entities in Lisp. They're effectively the names in your current namespace, like Java's fully-qualified class names.

If Java classes all had property lists, it would still be a small-scale instantiation of the Properties pattern, but it would open up an awful lot of new design possibilities to Java programmers. Similarly, Lisp stops short of making everything have a property list, but to the extent it offers property lists they're exceptionally useful design tools.

Emacs Lisp actually makes heavy use of the Properties Pattern, inasmuch as essentially every one of its thousands of configuration settings is a property in the global namespace. It supports the notion of transient vs. persistent properties, and it offers a limited form of property inheritance via its buffer-local variables.

Unfortunately Emacs doesn't support any notion of prototypes, and in fact it doesn't have any object-orientation facilities at all. Sometimes I want to model things in Emacs using objects with flexible property lists, and at such times I find myself wishing I were using JavaScript. But even without prototypes, Emacs gains significant extensibility from its use of properties for data representation.

Keep in mind that there are, of course, big tradeoffs to make when you're deciding how much to use the Properties pattern; I'll discuss them in a bit. I'm not criticizing any of the systems or languages here for the choices they've made; all of them have been improved by their use of this pattern, regardless of how far they decided to take it.

XML revisited


Earlier I described XML as a first-class modeling school. Now that we have more context, it's possible to view XML as being an instantiation of the Properties Pattern, inasmuch as it uses the pattern as part of its fundamental structure.

XML's view of the pattern is two-dimensional: properties can take the form of either attributes or elements, each kind having different syntactic and semantic restrictions. Many folks have criticized XML for the unnecessary complexity of this redundant pair of property subsystems, but in the end it doesn't really matter much, since two ways to model properties is still better than zero ways.

So far we've seen various policies around static checking: Eclipse (strong/mandatory), JavaScript/Lua (very little), and Wyvern (moderate).

XML offers what I think is the ideal policy, which is that it lets you decide for yourself. During your initial domain modeling (the "prototyping" phase — a term now loaded with Even More Delicious Meaning), you can go with very weak typing, opting for nothing more than simple well-formedness checks. And for many problems, this is as much static checking as you'll ever need, so it's nice that you have this option.

As some of your models become more complex, you can choose to use a DTD for extra validation. And if you need a really heavy-duty constraint system, you can migrate up to a full-fledged XML Schema or Relax NG schema, depending on your needs.

XML has proven to be a very popular modeling tool for Java programmers in particular — more so than for the dynamic language communities. This is because Java offers essentially zero support for the Properties Pattern. When Java programmers need access to the pattern, the easiest approach is currently to use XML.

The Java/XML combination has proven reasonably powerful, despite the lack of syntactic integration and numerous other impedance mismatches. Using XML is still often preferable to modeling things with Java classes. Even Eclipse's AST property lists might have been better modeled using XML and a DOM: it would have been less work, and the interface would have been more familiar. And as for Apache Ant: JSON-style JavaScript objects for build files would have been exactly what they needed, but by the time they'd realized they needed a plug-in system, the damage was done.

As Mozilla Rhino becomes better documented, and as more Java programmers begin to appreciate the usefulness of JSON as a lightweight alternative to XML, JavaScript may begin to close the gap. Rhino provides Java with the Properties Pattern much more seamlessly than any XML solution. I've already mentioned that it's superior (even to XML) for unit tests and representing mock test data.

But it goes deeper than unit testing. Every sufficiently large Java program, anything beyond medium-sized, needs a scripting engine, whether the authors realize it or not. Programs often have to grow to the size of browsers or spreadsheets or word processors before the authors finally realize they need to offer scripting facilities, but in practice, even small programs can immediately benefit from scripting. And XML doesn't fit the bill. It's yet another example of programmers choosing a School of Modeling because they know it, rather than learning how to use the right tool for the job.

So it goes.

Bigtable


Last example: Google's Bigtable, which provides a massively scalable, high performance data store for many Google applications (some of which are described in the paper – click the link if you're curious.) This particular instantiation of the Properties Pattern is a multidimensional table structure, where the keys are simple strings, and the leaf values are opaque blobs.

Hardcore relational data modelers will sometimes claim that large systems will completely degenerate in the absence of strong schema constraints, and that such systems will also fail to perform adequately. Bigtable provides a nice counterexample to these claims.

That said, explicit schemas are useful for many domains, and I'll talk more about how they relate to the Properties Pattern in a bit.

This would probably be a good time to mention Amazon's Simple Storage Service, but I don't know anything about it. I've heard they use name-value pairs.

In any case, I hope these examples (Eclipse AST classes, JavaScript, Wyvern game objects, Lisp symbols, XML and HTML, and Bigtable) have convinced you that the Properties pattern is ubiquitous, powerful, and multifaceted, and that it should be part of any programmer or designer's lineup.

Let's look in more depth at how it's implemented, its trade-offs, and other aspects of this flexible design strategy.

Properties Pattern high-level overview


At a high level, every implementation of the Properties Pattern has the same core API. It's the core API for any collection that maps names to values:
  • get(name)
  • put(name, value)
  • has(name)
  • remove(name)
There are typically also ways to iterate over the properties, optionally with a filter of some sort.

So the simplest implementation of the Properties Pattern is a Map of some sort. The objects in your system are Maps, and their elements are Properties.

The next step in expressive power is to reserve a special property name to represent the (optional) parent link. You can call it "parent", or "class", or "prototype", or "mommy", or anything you like. If present, it points to another Map.

Now that you have a parent link, you can enhance the semantics of get, put, has and remove to follow the parent pointer if the specified property isn't in the object's list. This is largely straightforward, with a few catches that we'll discuss below. But you should be able to envision how you'd do it without too much thought.

At this point you have a full-fledged Prototype Pattern implementation. All it took was a parent link!

From here the pattern can expand in many directions, and we'll cover a few of the interesting ones in the remainder of this article.

Representations


There are two main low-level implementation considerations: how to represent property keys, and what data structure to use for storing the key/value pairs.

Keys


The Properties pattern almost always uses String keys. It's possible to use arbitrary objects, but the pattern becomes more useful with string keys because it trivially enables prototype inheritance. (It's tricky to "inherit" a property whose key is some opaque blob - we usually think of inheritance as including a set of named fields from the parent.)

JavaScript permits you to use arbitrary objects as keys, but what's really happening under the covers is that they're being cast to strings, and they lose their unique identity. This means JavaScript Object property lists cannot be used as be a general-purpose hashtable with arbitrary unique objects for keys.

Some systems permit both strings and numbers as keys. If your keys are positive integers, then your Map starts looking an awful lot like an Array. If you think about it, Arrays and Maps share the same underlying formalism (a surjection, not to put too fine a point on it), and in some languages, notably PHP, there isn't a user-visible difference between them.

JavaScript permits numeric keys, and allows you to specify them as either strings or numbers. If your object is of type Array, you can access the numeric keys via array-indexing syntax.

Quoting


JavaScript syntax is especially nice (compared to Ruby and Python) because it allows you to use unquoted keys. For instance, you can say
var person = {
name: "Bob",
age: 20,
favorite_days: ['thursday', 'sunday']
}
and the symbols name, age and favorite_days are NOT treated as identifiers and resolved via the symbol table. They're treated exactly as if you'd written:
var person = {
"name": "Bob",
"age": 20,
"favorite_days": ['thursday', 'sunday']
}
You also have to decide whether to require quoting values. It can go either way. For instance, XML requires attribute values to be quoted, but HTML does not (assuming the value has no whitespace in it).

Missing keys


You will need to decide how to represent "property not present". In the simplest case, if the key isn't in the list, the property is not there (but see Inheritance further on).

If a property is frequently removed and re-added, it may make sense to leave the key in the list with a null value. In some systems, you may need null to be a valid property value, in which case you'd need to use some other distinguished (and reserved) value for this micro-optimization to work.

Data structures


The simplest property-list implementation is a linked list. You can either have the alternating elements be the keys and values (Lisp does this), or you can have each element be a struct containing pointers to the key and value.

The linked list implementation is appropriate when:
  • you're just using the pattern to allow user annotations on object instances
  • you don't expect many such annotations on any given instance
  • you're not incorporating inheritance, serialization or meta-properties into your use of the pattern
Logically a property list is an unordered set, not a sequential list, but when the set size is small enough a linked list can yield the best performance. The performance of a linked list is O(N), so for long property lists the performance can deteriorate rapidly.

The next most common implementation choice is a hashtable, which yields amortized constant-time find/insert/remove for a given list, albeit at the cost of more memory overhead and a higher fixed per-access cost (the cost of the hash function.)

In most systems, a hashtable imposes too much overhead when objects are expected to have only a handful of properties, up to perhaps two or three dozen. A common solution is to use a hybrid model, in which the property list begins life as a simple array or linked list, and when it crosses some predefined threshold (perhaps 40 to 50 items), the properties are moved into a hashtable.

Note that we'll often refer to property sets as "property lists" (or "plists" for short), because they're so often implemented as lists. But it's fairly unusual for the order to matter. In the rare cases when it matters, there are usually two possibilities: the names need to be kept in insertion order, or they need to be sorted.

If you need constant-time access and want to maintain the insertion order, you can't do better than a LinkedHashMap, a truly wonderful data structure. The only way it could possibly be more wonderful is if there were a concurrent version. But alas.

If you need to impose a sort order on property names, you'll want to use an ordered-map implementation, typically an ordered binary tree such as a splay tree or red/black tree. A splay tree can be a good choice because of the low fixed overhead for insertion, lookup and deletion, but with the tradeoff that its theoretical worst-case performance is that of a linked list. A splay tree can be especially useful when properties are not always accessed uniformly: if a small subset M of an object's N properties are accessed most often, the amortized performance becomes O(log M), making it a bit like an LRU cache.

Note that you can get a poor-man's splay tree (at least, the LRU trick of bubbling recent entries to the front of the list) using a linked list by simply moving any queried element to the front of the list, a constant-time operation. It's surprising that more implementations don't take this simple step: an essentially free speedup over the lifetime of most property lists.

Inheritance


With prototype inheritance, each property list can have a parent list. When you look for a property on an object, first you check the object's "local" property list, and then you look in its parent list, on up the chain.

As I described in the Overview, the simplest approach for implementing inheritance is to set aside a name for the property pointing to the parent property list: "prototype", "parent", "class" and "archetype" are all common choices.

It's unusual (but possible) to have a multiple-inheritance strategy in the Properties pattern. In this case the parent link is a list rather than a single value, and it's up to you to decide the rules for traversing the parent chains during lookups.

The algorithm for inherited property lookup is simple: look in my list, and if the property isn't there, look in my parent's list. If I have no parent, return null. This can be accomplished recursively with less code, and but it's usually wiser to do it iteratively, unless your language supports tail-recursion elimination. Property lookups can be the most expensive bottleneck in a Properties Pattern system, so thinking about their performance is (for once) almost never premature.

The deletion problem


If you delete a property from an object, you usually want subsequent checks for the property to return "not found". In non-inheritance versions of the pattern, to delete a property you simply remove its key and value from the data structure.

In the presence of inheritance the problem gets trickier, because a missing key does not mean "not found" – it means "look in my parent to see if I've inherited this property."

To make the problem clearer, assume you have a prototype list called Cat with a property named "friendly-to-dogs", whose value defaults to the boolean true. Let's say you have a specific cat instance named Morris, whose prototype is Cat:
var Cat = {
friendly_to_dogs: true
}

var Morris = {
prototype: Cat
}
Let's say Morris has a nasty run-in with a dog, and now he hates all dogs, so we want to make a runtime update to his friendly-to-dogs property. Our first idea might be to delete the property, since a missing key or null value are often interpreted as false in a boolean context. (This is true even in class-based languages like C++ or Java, in which a hasFooBar function will return true if the internal fooBar field is non-null.)

However, Morris does not have a copy of "friendly-to-dogs" in his local list: he inherits it from Cat. So if your deleteProperty method does nothing but delete the property from the local list, he will continue to inherit "friendly-to-dogs", which will irk him (and you) endlessly until you figure out where the bug is.

You can't delete "friendly-to-dogs" from the Cat property list, or all of your cats will suddenly become dog-haters, and you'll have outright war on your hands. (Note that in some settings this is exactly what you want to do, illustrating the inherent universal trade-off between flexibility and safety.)

The solution for Morris is to have a special "NOT_PRESENT" property value that deleteProperty sets when you delete a property that would otherwise be inherited. This object should be a flyweight value so that you can check it with a pointer comparison.

So to account for deletion of inherited properties, we have to modify our property-lookup algorithm to look in the local list for (a) a missing key, (b) a null value, or (c) the NOT_PRESENT tag. If any of these apply, the property is considered not present on the object. [Note: the semantics of null values are up to the system designer. You don't have to make null values mean "not there." Either way is fine.]

Read/write asymmetry


One logical consequence of prototype inheritance as we've defined it is that reads and writes work differently. In particular, if you read an inherited property, it gets the value from an ancestor in the prototype chain. But if you write an inherited property, it sets the value in the object's local list, not in the ancestor.

To illustrate, let's add a "night-vision" property to our Cat prototype. Its value is expected to be an integer representing how well the cat can see in the dark. Let's say that the default value is 5, but our hero Morris has been eating his carrots, so we want to set his "night-vision" property value to 7.

The setProperty code does not need to check the parent chain: it simply adds the key/value pair {"night-vision", 7} to Morris's local property list. If we set the property on Cat, then all cats would have Morris's super-vision, which isn't what we want.

This asymmetry is normal. Back in our L.T. / Emmitt Smith example, when we were adding properties to L.T., we didn't want to modify Emmitt! It's just how the pattern works: you override inherited values by adding local properties, even when the override is a deletion.

Read-only plists


Many implementations of the pattern offer "frozen" property lists. Sometimes (e.g. for debugging) it's useful to flag an entire property list as read-only. Ruby supports this via the "freeze" method on the built-in root Object class. In any sufficiently large, robust implementation of the Properties pattern, you should include the option to freeze your property lists.

If you offer a "freeze" function, you should think about whether you want to offer a "thaw" as well. The decision depends on whether you want to offer programmers additional protection, or you just want to lock them up and throw away the key.

My personal view is that Java programmers tend to overuse the "freeze" function when they start with Ruby. For that matter, they tend to overuse "final" in Java. I mentioned before the trade-off between flexibility and safety. When you use the Properties Pattern, you're consciously choosing flexibility over safety, and in many domains this is the right choice. In fact, safety can be viewed as a kind of optimization: something that should ideally be layered on, functioning behind the scenes rather than being interleaved with the user APIs and flexible data model.

A nice (and simple to implement) compromise on safety and flexibility is to offer a ReadOnly property attribute, as JavaScript does. There are certain properties (such as the parent pointer) that are less likely to need to change as the system evolves, so it's probably OK to lock them down early on. Doing this on a property-by-property basis is much less draconian. Even better, you should consider making the ReadOnly property attribute non-inheritable, so that subtypes can choose their own policies without compromising the integrity of the supertypes.

We're more or less done with inheritance: it's not very complicated. There are a few other inheritance-related design issues that I'll cover in upcoming sections.

Performance


Performance is one of the biggest trade-offs of using the Properties Pattern. Many engineers are so concerned with performance (and its attendant paradoxes and fallacies) that they refuse to consider using the Properties pattern, regardless of the situation.

As it happens, the pattern's performance can be improved and mitigated in several clever ways. I won't cover all of them here, but I'll touch on some of the classics and one or two new approaches.

Interning strings


Make sure your string keys are interned. Most languages provide some facility for interning strings, since it's such a huge performance win. Interning means replacing strings with a canonical copy of the string: a single, immutable shared instance. Then the lookup algorithm can use pointer equality rather than string contents comparison to check keys, so the fixed overhead is much lower.

The only downside of interning is that it doesn't help much when you're constructing a property name on the fly, since you still need to hash the string to intern it.

That's not much of a downside, so as a rule, you should always intern your keys. A large percentage of property names in any system are accessed as string literals from the source code (or are read from a configuration file and can be interned all at once when the file is read), and interning works in these common cases.

Corollary: don't use case-insensitive keys. It's performance suicide. Case-insensitive string comparison is really slow, especially in a Unicode environment.

Perfect hashing


If you know all the properties in a given plist at compile-time (or at runtime early on in the life of the process), then you might consider using a "perfect hash function generator" to create an ideal hash function just for that list. It's almost certainly more work than it's worth unless your profiler shows that the list is eating a significant percentage of your cycles. But such generators (e.g. gperf) do exist, and are tailor-made for this situation.

Perfect hashing doesn't conflict with the extensible-system nature of the Properties pattern. You may have a particular set of prototype objects (such as your built-in monsters, weapons, armor and so on) that are well-defined and that do not typically change during the course of a system session. Using a perfect hash function generator on them can speed up lookups, and then if any of them is modified at runtime, you just fall back to your normal hashing scheme for that property list.

Copy-on-read caching


If you have lots of memory, and your leaf objects are inheriting from prototype objects that are unlikely to change at runtime, you might try copy-on-read caching. In its simplest form, whenever you read a property from the parent prototype chain, you copy its value down to the object's local list.

The main downside to this approach is that if the prototype object from which you copied the property ever changes, your leaf objects will have the now-incorrect old value for the property.

Let's call copy-on-read caching "plundering" for this discussion, for brevity. If Morris caches his prototype Cat's copy of the "favorite-food" property (value: "9Lives"), then Morris is the "plunderer" and Cat is the plundered object.

The most common workaround to the stale-cache problem is to keep a separate data structure mapping plundered objects to their plunderers. It should use weak references so as not to impede garbage collection. (If you're writing this in C++, then may God have mercy on your soul.) Whenever a plundered object changes, you need to go through the plunderers and remove their cached copy of the property, assuming it hasn't since then changed from the original inherited value.

That's a lot of stuff to keep track of, so plundering is a strategy best used only in the direst of desperation. But if performance is your key issue, and nothing else works, then plundering may help.

Refactoring to fields


Another performance speedup technique is to take your N most commonly used properties and turn them into instance variables. Note that this technique is only available in languages that differentiate between instance variables and properties, so it would work in Java but not in JavaScript.

This optimization may sound amazingly attractive, especially during your first round of performance optimizations. Be warned: this approach is fraught with pitfalls, so (as with nearly all performance optimizations) you should only use it if your profiler proves that the benefits will outweigh the costs.

The first cost is API incompatibility: suddenly instead of accessing all properties uniformly through a single getProperty/setProperty interface, you now have specific fields that have their own getters and setters, which could potentially have massive impact in your system (since, after all, these are the most commonly-accessed properties). And unless you're using Emacs, your refactoring editor probably isn't smart enough to do rename-method constrained on argument value.

You can mitigate the API problem by continuing to go through the get/setProperty interface, and have them check the argument to see if it's one of the hardwired fields. This will result in an increasingly large switch-statement (or equivalent), so you're trading API code maintenance for API simplicity. It also slows down the field access considerably, which offsets the performance gain from using a field.

The next cost is system complexity: you have twice as many code paths through every part of your system that deals with the Properties pattern. Does inheritance still work the same? What about serialization? Transient properties? Metaprogramming? What about your user interface for accessing and modifying property lists? You face a huge code-bloat problem when you split property access into two classes: plist properties and instance variables.

The next cost is accidentally getting it wrong: how do you know what the most-accessed properties are? You may do some runtime profiling and see that it's one set, but over time the characteristics of your system might change in such a way that it's an entirely different set. Realistically you will have to instrument your system to keep track, on a regular basis, of which properties are accessed at a rate beyond your tolerance threshold, so you can convert them to fields as well. This isn't a one-off optimization.

But all these costs pale in comparison to the big one, which is extensibility: instance variables are set in stone. It will be a vast amount of work to try to give them parity with your plist properties: change-list notifications, the ability to override them or remove them, and so on. It's likely that you will wind up sacrificing at least some flexibility for these fields.

So use this optimization with extreme caution.

Refrigerator


The last performance optimization I'll mention is more about conserving memory than CPU. If you're worried about the overhead of a per-object property-list field, even if it's usually null, then you can implement property lists using a separate, external data structure.

I don't know what it's normally called, so I'm calling it the Refrigerator, since you're basically putting yellow stickies all over it. The idea is that you don't need to pay for the overhead of property lists when very few of the objects in your system will ever have one. Instead of using a field in each class with a property list, you maintain a global hashtable whose keys are object instances, and whose values are property lists.

To fetch the property list of an object, you go look to see if it's on the Refrigerator. The property list itself can follow any of the implementation schemes I discussed in Representations above.

I first heard this idea from Damien Conway in roughly 2001 at a talk he gave. He said he was considering using it for Perl 6, and I thought it was pretty clever. I don't remember what he called it, and I don't know if he wound up using it, but consider this idea to be his gift to you. Thanks, Damien!

REDACTED


Brendan Eich came up with astoundingly clever performance optimization for the Properties Pattern, which he told me about back in January. I was ready to publish this article, but I told him I'd hold off until he blogged about his optimization. Every once in a while he'd ping me and tell me "any day now."

Brendan, it's October, dammit!

Rolling your own


Needless to say, I've only scratched the surface on performance optimization of the Properties pattern. You can get arbitrarily fancy. The point I'm trying to get across is that you shouldn't despair when you discover your system is unacceptably slow after designing it to use the Properties pattern. If this happens, don't panic and throw out your flexibility – go optimize! The game of optimization can be fun and rewarding in its own right.

Just don't do it before you need it!

Transient properties


While implementing Wyvern, I discovered that making changes to a persistent property list is a wonderful recipe for creating catastrophes.

Let's say some player casts a Resist Magic spell, which boosts her "resist-magic" integer property value by, oh, 30 (thirty percent). Then, while the spell is active, the auto-saver kicks in (writing her enhanced "resist-magic" property value out to the data store along with the rest of her properties), and then the game crashes.

Voilà – the player now has permanent 30% magic resistance!

It doesn't have to be a game crash, either. Any random bug or exception condition (a database hiccup, a network glitch, cosmic rays) can induce permanence in what was intended to be a transient change to the plist. And when you're writing a game designed to be modified at runtime by dozens of programmers simultaneously, you learn quickly to expect random bugs and exception conditions.

The solution I came up with was transient properties. Each object has (logically speaking) two property lists: one for persistent properties and one for transients. The only difference is that transient properties aren't written out when serializing/saving the player (or monster, or what-have-you.)

Wyvern's property-list system has typed values. I haven't talked about Properties Pattern type systems yet, but in a nutshell my property values can be ints, longs, doubles, strings, booleans, archetypes (which is basically any other game object), or "bean" (JavaBean) properties.

My early experimentation yielded the interesting rule that non-numeric transient properties override the persistent value, but numeric properties combine with (add to) the persistent value.

A simple example should suffice. If you have a persistent boolean property "hates-trolls" (and who doesn't, really?), and you accidentally ingest a Potion of Troll Love, then the potion should set a transient value of {"hates-trolls", false} on your character. It overrides the persistent value. There's no combining going on; it just replaces the original.

However, for our "resist-magic" int property, if you put on a ring of 30% magic resistance, it should (by default) add to your current value, which may be a combination of innate resistance and resistances conferred from other magic items and spells.

This numbers-are-additive principle applied pretty uniformly across my entire code base and property corpus, so it's built into the lookup rules for Wyvern's property lists. getIntProperty("foo") must get both the transient and persistent (possibly inherited) values for "foo" and add them before returning the result.

I experimented with different approaches for representing transient properties. Originally I used a kind of hungarian-notation, prefixing transient property names with an @-character ("@foo") and keeping them in the same hashtable as the persistent properties. One advantage of "@" was that it was invalid character in XML attribute names, so it was originally impossible for me to accidentally serialize a transient property.

Eventually I migrated to keeping them in separate (lazily created) tables. This made it easier to deal with interning names (not having to prepend "@" all the time) and generally made bookkeeping easier. I don't remember all the trade-offs involved with the decision anymore (it was about 7 years ago), so you'll have to retread that road yourself if you decide to offer transient properties in your system.

The deletion problem (remix)


Transient properties introduce their own version of deletion problem. You can't just remove the property from the transient list, since the lookup algorithm will just look for it in the persistent list. And you don't want to remove it from the persistent list; that defeats the purpose of using transients.

The solution is similar to what I did for deleting inherited properties: you insert a placeholder into the transient list saying "NOT_PRESENT", and as long as that placeholder is in the list, it's as if the object doesn't have the property.

Note that this implies the existence of two similar API calls: removeTransientProperty for deleting a transient property from the transient list, and transientlyRemoveProperty for temporarily hiding a property from the persistent list.

Persistence


Persisting property lists is a huge topic; I'll just touch on the basics.

For starters, XML and JSON (and for that matter, s-expressions) are all perfectly valid choices for serialization format. You can probably imagine how this works, so I won't beat it to death.

Text-based formats have big wins in readability, maintainability, and bootstrapping (you don't need to create special tools to read and write them).

For performance – both for network overhead and disk storage – you might want to consider designing a compressed binary format. One easy way to test whether this will be a win for you is to take the intermediate approach of gzipping your data to see how well it compresses, and whether it produces a discernable blip in performance.

Wyvern initially used a filesystem trie-like structure for storing its data, but as the number of distinct objects grew to several hundred thousand, I had to switch to a database.

You can use an RDBMS, but you're in for a world of hurt if you try to map the Properties pattern onto a relational schema. It's a tricky problem, and probably isn't one that you want to solve yourself.

I wound up using an RDBMS and just shoving the XML-serialized property list into a text/clob column, and denormalizing the twenty or thirty fields I needed for queries into their own columns. This gets me by, but isn't a happy solution.

What you really want is a hierarchical data store optimized for loose tree structures: in a word, an XML database. At the time I was designing Wyvern's persistence strategy (1998-ish), XML databases were pure vaporware, and even after a few years they were still fairly exploratory and unstable.

Today things are different, and there are many interesting options for XML databases, ranging from 100% free (e.g. Berkeley DBs) through 100% expensive (e.g. Oracle XML).

You might also look into Object databases, but I've never heard of anyone coming through that experience with anything but battle scars to show for it.

Query strategies


Querying goes hand-in-hand with persistence: once you have a bunch of objects in a data store, you'll want to ask questions about them. Producing a High Score List is a good example: you want to compute a function of some set of properties across all the players in your database.

If you're just using the filesystem, you're stuck with grep or its moral equivalent on Windows, which is likely to be painfully slow. So don't do that.

If you're using an RDBMS, and you've serialized your property lists into a single row-per-object clob or text column, then you can use (My)SQL's LIKE and RLIKE operators, or their equivalents, to do free-text searches.

However, your property lists are likely to be hierarchical (e.g. player inventory is a bag, which has its own properties and collection of objects it's holding), and free-text search doesn't understand hierarchy. So this approach is really just a faster version of grep.

Querying is the biggest reason for using an XML database, since it gives you XPath and XQuery as expressive languages that work on XML data about as well (give or take) as SQL works on relational data.

Because you have the advantage of working in "these days" (2008+) as opposed to "those days" (1998), you now have the interesting option of using JavaScript/JSON and JQuery. I don't know much about it, but what little I do know seems promising.

One final approach, which may not scale very well unless you can find a way to parallelize it, is to simply load all the objects into an instance of your server, and use programmatic access to walk the objects and construct your query results manually. Although it requires some infrastructure to make it work (and to make it not crash your system, once you have enough objects), it has the major benefit of giving you a full programming language, which can be useful if you're doing a complex query and your XPath/XQuery skills aren't up to par.

Backfills


Data integrity, a.k.a. Safety, is one of the two biggest trade-offs (the other being performance) you make when you choose to use the Properties pattern. In the absence of a schema, your system is open to property-list corruption through bugs and user error (e.g. typos).

Interestingly, in big companies I've worked at that have strong schema constraints, they still always seem to run into data-integrity problems, so it's not clear how much the schema is really helping here. A schema can certainly help with navigability and performance, but no schema can completely avert data corruption problems.

As soon as you notice you've got bad data, you need to do what many people in the industry term a "backfill": you have to run through all the existing data and fix the problem. Sometimes this is as simple as running a SQL update on a single column. And sometimes it involves writing a complex program that painstakingly computes the inverse of whatever bogus operation created the bad data in the first place.

And sometimes backfills require just winging it, since the lost data may be irrecoverable and you need to use heuristics to minimize the damage. No matter how you store your data and how careful you are about replicating it and backing it up, this kind of thing can happen at pretty much any scale.

The Properties pattern doesn't really introduce anything new to the backfill landscape; all the usual options apply. You'll just need to be mindful that user error (especially mis-typed property names) can make backfills a bit more commonplace, so you should plan to spend a fair amount of time developing a convenient backfill infrastructure for your system.

I should mention, embarrassing as it is, one other option, which I call "lazy backfill", and I've used it extensively. Sometimes I'll notice a data-corruption issue that needs fixing but doesn't really justify a day of my time to fix all at once. So I have a small subsystem in place for player logins and map loads: I iterate through the property lists looking for properties that I've flagged (hardwired in the code) as "bad data", and I call helper backfill functions on the fly to fix just that property list.

This is obviously a hack, and it also imposes some minor performance overhead (probably not detectable via profiling) on logins and map loads, but I'll be honest: it's served me well, and I've fixed at least 20% of my data-corruption problems this way.

Type systems


I've already touched on this a little here and there. Eclipse's AST property lists use an interesting type system that provides a reasonable amount of metadata for each property, although (I think) it stops short of allowing properties to have their own property lists.

JavaScript properties have a small, fixed amount of metadata. Each property has a set of flags. The flags include ReadOnly (can't modify the value), Permanent (can modify the value but can't delete the key), DontEnum (key doesn't show up in iterators but can be read directly), and others depending on the implementation.

Wyvern has its own Java-like flavor of typed properties, largely because I implemented the system in Java long before the advent of auto-boxing, and I needed a convenient way of dealing with primitive types vs. object types. If I were to do it all over again, I probably wouldn't go that route. I would want some sort of scheme for metaproperties (aka "property attributes") — perhaps in a separate per-object metaproperty-list. But I'd simplify the interface and get rid of all the primitive-typed versions of all my has/get/set inherited/persistent/transient property calls.

I won't go into any further detail about type systems, except to say that (a) you can use them to any degree you desire; there's nothing intrinsic to the Properties Pattern that precludes them, and (b) Duck Typing becomes fairly crucial to systems that are designed fully around the Properties Pattern, so if your language has any structural-typing support it'll help.

Toolkits


Wyvern has a Map Editor that allows you to create and edit objects. Since all the game objects are property lists that use the prototype inheritance pattern, the conventional JavaBean approach doesn't work, since the JavaBeans API (which is more or less designed for this problem, except with instance fields) uses Java reflection, and my properties don't have individual getters and setters.

Wyvern wound up with something very similar to a JavaBeans property editor, except it knows how to read, write and display my GameObject property lists.

It wasn't a huge amount of work, but it's something you should keep in mind as you decide whether to use the Properties pattern in your system. If you need a GUI for object edits, you'll probably need to do some custom UI work.

Problems


I've talked about the main problems imposed by the Properties pattern: performance, data integrity, and navigability/queryability. They're all trade-offs; you're sacrificing in these areas in order to achieve big wins in flexibility and open-ended future extensibility for users you may never meet. (You also get huge wins in unit-testability and prototyping speed, but I assume these benefits are obvious enough that I don't need to dwell on them.)

One other problem is reversability: it's hard to back out of the Properties pattern once you commit to it. Once people start adding properties, especially if they're using programmatic construction of key names, you'll have trouble on your hands if you want to try to refactor the whole system to use instance fields and getters/setters. So before you use this pattern, you should put your system through a prototyping phase (ironically enough) to determine whether it will work out as it scales.

Further Reading


I wasn't able to find much, but here are some interesting articles and papers on the subject.

Dealing with Properties [Fowler 1997]

Prototype-based programming (Wikipedia)

Do-it-yourself Reflection [Peter Sommerlad, Marcel Rüedi] (PDF)

Prototype pattern (Wikipedia)

Self programming language (Wikipedia)

Refactoring to Adaptive Object Modeling: Property Pattern [Christopher G. Lasater]

New Updates



Oct 20th 2008: The comments on the article are outstanding. People have pointed out lots of further reading, as well as well-known systems (Atom, RDF) that make extensive use of the pattern at their core. Thanks, folks! This is great stuff.

Oct 20th 2008: Martin Fowler sent me a link to a 1997 paper he wrote on the topic. It's in the Further Reading section. Well worth a read. He mentions a few important considerations that I left out:
  • The (static) dependencies problem — your compiler can't generally help you with finding dependencies on properties, or even tell you what property names are used in your system. He suggests using a registry of permissible property names as one solution. I strongly considered that approach for Wyvern, but wound up relying on a mix of dynamic tracing and static analysis to get me the dependency graph, and it's been "accurate enough" for my needs. In particular, synthesizing property names on the fly happens about as (in)frequently as Reflection happens in Java. So for the most part the dependencies issue is as tractable as it is in Java: "tractable enough".

  • Substituting actions Fowler suggests that it's difficult to replace an existing property access with an action. This is only true in the Java implementation (hence, true for my game). In languages like Python, Ruby and JavaScript 1.5 that support property-access syntax for getters and setters this is a non-issue.
Overall, Martin's take on the pattern is "avoid it when possible", which is sound (if conservative) advice. My take is that everyone's doing it anyway, so we should formalize it. I've used it as the central data model for my 500,000-line multiplayer game for 10 years, and I assert that the benefits vastly outweigh the problems. I also witnessed the pattern's use in Amazon's Customer Service Tools database for some 5 years, and again, the benefits vastly outweighed the downsides.

You just have to know what you're getting into before you dive in, which is sort of the point of my article.

Oct 20th 2008: Fellow Googler Joe Beda mentioned that IE4 originally supported arbitrary attributes on HTML elements, which dramatically extended the flexibility for web developers. Today, no browsers support it, though John Resig claims HTML 5 will fix this. In the meantime, developers use fake css classes and hidden elements; it's a mess. I actually deleted a pretty large rant about this problem from the article. But yeah. It's a problem. When you don't provide the Properties Pattern to people, they find horrible workarounds, which is much worse than anything that can go wrong if you simply support it directly. (Joe mentioned that it posed serious technical problems with the cache, though, so I wouldn't assume it's trivial to add the support back in to browsers today.)

Final thoughts


I haven't covered the whole landscape for this pattern. There are concurrency issues, access-control issues (e.g. in Wyvern, some properties, such as email address, can only be read by very high-level Wizards), documentation issues, and a host of other considerations.

Let me summarize what I think are the key takeaways.

First: this is a critically important pattern. I call it the "Universal" design pattern because it is (by far) the best known solution to the problem of designing open-ended systems, which in turn translates to long-lived systems.

You might not think you're building that kind of system. But if you want your system to grow, and have lots of users, and spread like wildfire, then you are building exactly that kind of system. You just haven't realized it yet.

Second: even though people rarely talk much about this pattern, it's astoundingly widespread. It appears in strongly-typed systems like Eclipse, in programming and data-declarative languages, in end-user applications, in operating systems, and even in strongly typed network protocols, although I didn't talk about that use case today. (Nutshell: a team I know using CORBA got fed up and added an XML parameter to every CORBA API call, defeating the type system but permitting them to upgrade their interface without horking every existing client. Bravo!)

Third: it can perform well! Or at least, "well enough". The playing field for potential optimizations is nearly unbounded, and with enough effort you can reduce just about everything to constant time.

Finally, it's surprisingly versatile. You can use it on a very small scale to augment one teeny component of an existing system, or you can go the whole hog and use it for everything, or just about anything in between. You can start small and grow into it as you become more comfortable with the pattern.

The Properties Pattern is not "just" name/value pairs, although the name/value pair certainly lives at the heart of the pattern.

If you believe Hofstadter, the Properties Pattern (using the Prototype Principle) is an approach to modeling that complements class-based modeling: both are fundamental to the way our brains process symbolic information.

I suspect that if you've read this far, you'll start seeing the Properties Pattern everywhere you look. It's embedded in many other popular patterns and systems, from Dependency Injection to LDAP to DNS. As legacy systems struggle to evolve to be more user configurable and programmable, you'll see more and more of your favorite software systems (and, I hope, languages) incorporating this pattern to varying extents.

Again: I wasn't able to find much literature about this pattern. If you happen to know of books, papers or articles that expound on it in more detail, please link to them in the comments!

I hope you found this interesting and potentially useful. It was definitely more work than my typical posts, so if it doesn't go over well, I'm happy to go back to the comedy routine. We'll see.

122 Comments:

Blogger Daniel Gomes Silveira said...

Wow! A Blog entry with index... nice :)

4:40 AM, October 20, 2008  
Blogger Rik said...

Isn't this what David West talks about in his book "Object Thinking" when he discusses an objects responsibility to "Descibe itself"?

I'd like to see more people use this indeed, I like it but don't encounter it very often in other peoples code.

4:55 AM, October 20, 2008  
Blogger Adrian Kosmaczewski said...

I've published recently some experimental C++ code, with an implementation of the property pattern which I use later to store objects in a SQLite database - I have indeed some "battle scars" to show off after doing this :) but it was a nice experiment anyway. Feel free to play with the code! It's here: http://tinyurl.com/6lq33q

4:58 AM, October 20, 2008  
Blogger Rik said...

Here's a link to the book I mentioned: Object Thinking by David West

5:02 AM, October 20, 2008  
Blogger Gael vander Schelden said...

In this idea of key value, What do you think about couchdb:

http://incubator.apache.org/couchdb/

Thanks for the long articles :)

5:03 AM, October 20, 2008  
Blogger Chris said...

Don't the object systems of Perl and Python follow this pattern? Or are those "just name/value pairs"?

5:47 AM, October 20, 2008  
Blogger dmost said...

sounds a LOT like RDF

6:23 AM, October 20, 2008  
Blogger Unknown said...

You just described the extension model for Atom, namely how you're expect to handle foreign markup with a mustIgnore processing directive; this is also one of the reasons why Atom's XML is a flat structure.

For reference you might be interested in the old AI frames/slots approach to KR, especially handling 'inheritance' and 'exceptions', like the penguin-isa-bird, bird-can-fly modeling problem. I'm sure there's a bunch of people inside Google (like Norvig, I think AIMA has a chapter on slots/frames) that can give you pointers.

http://norvig.com/ijcai83.html

Great post btw.

6:28 AM, October 20, 2008  
Blogger J said...

Excellent article. I just want to point out that Lua merely allows you to create a prototype-based object system. It doesn't really come with one built-in. The power that comes with metatables lets you get really creative with an object system.

6:34 AM, October 20, 2008  
Blogger greenlove said...

I don't know if describing XML as a paradigm is useful. What does XML do that's unique? XML (XSD, XSLT etc) is a group of tools. What is the philosophy and how is it different that OO or functional?

There is a lot of resistance to using properties type strategies because of a lack of compile time checks for key names. On the one hand you have more flexibility, but on the other you don't get the checks. IMO checking can be done in unit tests.

Also performance: there are few applications and fewer places in those applications where the hash-table lookups for string properties lead to any meaningful performance problems.

6:39 AM, October 20, 2008  
Blogger Unknown said...

Thanks Steve. Lots of food for thought here! Seems to make as much / more sense than OOP in many respects!

Hopefully you're going to show us some examples in a later post? For the less abstract amongst us :-)

7:03 AM, October 20, 2008  
Blogger Unknown said...

I find that this pattern very frequently in so called "ERP" software.

One main problem with this pattern is that a naive implementations has a big memory overhead.
I've got some numbers at my old blog here .

It's good that you mention String interning, because without it this pattern is useless in Java.
This goes as far as that we will have soon special support in the Eclipse Memory Analyzer for finding duplicates for Strings.
See also my blog at about the Eclipse memory consumption, where I describe a way to find those duplicates using only the existing MAT features.

You mention LinkedHashMap and that a ConcurrentLinkedHashMap is not available with the JDK.

Note that LinkedHashMap has even a higher memory overhead than HashMap and that at least the existing ConcurrentHashMap has an overhead of around 2 Kbyte when you use the default constructor.

Regards,
Markus

7:22 AM, October 20, 2008  
Blogger Ola Bini said...

As someone else mentioned, a glaring hole in this article is that you don't mention RDF and SPARQL. If I needed to have this as external data instead of in the language, I would say RDF and SPARQL is the way to do it.

7:30 AM, October 20, 2008  
Blogger gregory said...

sufis and yogis and meditators beat you to it by a few thousand years, the universe in a grain of sand ..

nice post, thanks

7:39 AM, October 20, 2008  
Blogger Jacob O'Reilly said...

I am reminded of Pick databases (see http://en.wikipedia.org/wiki/Pick_operating_system). I haven't tried it out myself but I suspect that the impedence mismatch between properties pattern and hash-file database (as in Pick) wouldn't exist; it would be a much more natural fit.
Thanks for the time and effort, it was a good read.

7:41 AM, October 20, 2008  
Blogger Corey said...

This comment has been removed by the author.

7:56 AM, October 20, 2008  
Blogger Mitch said...

Brendan Eich came up with astoundingly clever performance optimization for the Properties Pattern, which he told me about back in January.

Is this the polymorphic property cache thing that Brendan mentioned? I've been waiting to hear about this.

v8's hidden classes are also a pretty neat solution to this.

I want to second the mention of CouchDB; in addition to addressing persistence, they also have a neat approach to the querying issue (map-reduce-combine, basically, which allows them to incrementally update their indexes).

7:58 AM, October 20, 2008  
Blogger Corey said...

Interning strings is well known to be suboptimal for multithreaded system due to the constant locking and unlocking (and what reasonable application isn't multithreaded these days?) I've only seen profiling data for this in C++, but I suspect it cuts across languages since concurrency isn't a C++ specific problem. std::string is usually implemented with interning, but performance sensitive applications often ditch it for a non-interned implementation for this exact reason. (Hint hint Steve... I wonder if you've seen any code that uses a std::string replacement...)

8:01 AM, October 20, 2008  
Blogger Kris Zyp said...

Have you looked at Persevere (http://sitepen.com/labs/persevere.php) for persisting Property model data? It is similar to CouchDB in it's JSON-based approach, but has much more integration with the JavaScript environment and orthogonal persistence. The class model also preserves prototype-based delegation (CouchDB does not). Plus it is based on Rhino.

8:20 AM, October 20, 2008  
Blogger Buddy Casino said...

The Java Content Repository API (JSR-173) sounds like a good fit as storage back-end for the property pattern. It is mainly used in content management systems, where unstructured data is common. A schema can be defined, but it is optional. Has anybody used it for that purpose?

8:27 AM, October 20, 2008  
Blogger DoronY said...

A nice example of the "Refrigerator" pattern is WPF's usage of dependency properties.

8:45 AM, October 20, 2008  
Blogger Carl said...

In honor of Hofstadter, it's a self-referential blog entry. Table of contents = Properties pattern.

8:47 AM, October 20, 2008  
Blogger Anonymous said...

Thanks, great article!

Regarding takeaways: I feel tha there is a flip side to the versatility argument, namely that once you start using the Property pattern even in a small corner of your system, the pattern tends to spread to other parts of the system in an insiduous, underhand sort of way, which is difficult to control ^^

8:51 AM, October 20, 2008  
Blogger Jeremy Fincher said...

> This is because OO design has no real mathematical foundation to support it — at least, not until someone comes along and creates a formal model for side effects.

Surely you haven't forgotten Hoare Triples and axiomatic/operational semantics from undergrad?

9:09 AM, October 20, 2008  
Blogger JacobM said...

First of all, yes, please do write more of these long technical posts. Very useful.

Second, if a property list is being used for one part of a system to communicate with another, does anyone have a recommendation for making that safer.

That is, say I'm going to pass an object that, by convention, will have certain named properties. Do I just hard-code the property names in the locations these are set and retrieved, and rely on myself to make sure they stay in sync? Do I have a single location where key shared property names are stored? Something else?

Or is this not a good use for a property list?

9:10 AM, October 20, 2008  
Blogger Rob Sayre said...

"JavaScript permits numeric keys, and allows you to specify them as either strings or numbers. If your object is of type Array, you can access the numeric keys via array-indexing syntax."

I think you mean JavaScript always uses string keys, but allows you to specify them as numbers. You can always access properties you using array indexing syntax, whether or not your object is an Array.

js> var foo = {"12":"baz"}
js> foo[12]
baz
js> var bar = new Array()
js> bar["12"] = "baz"
baz
js> bar.length
13
js> bar[12]
baz

9:21 AM, October 20, 2008  
Blogger J. Aaron Farr said...

Excellent article. I also concur with those who point to CouchDB, ATOM Pub, and RDF. In one sense, you can think of RDF as named hashmaps, a clear implementation of the "property design pattern."

9:25 AM, October 20, 2008  
Blogger sapphirepaw said...

Awesome. You have given a name to something I've struggled with in several guises. For instance, I tried to write a converter between XML and JSON once, but after a day of struggling, decided that XML's node types made it a fundamentally different beast. I could make it work, but I'd end up with the JSON side having most of the convenience of XML and none of the readability or toolset. Or, I could cripple XML.


@chris: Perl only has the seeds of an object system. Bring your own soil and grow your own! See Class::InsideOut and all the other related modules on CPAN.

9:59 AM, October 20, 2008  
Blogger Kevin Dangoor said...

Prototype-based inheritance in Python:
http://loveandtheft.org/2008/09/11/prototype-based-programming-in-python/

Also, the Zope Object Database (ZODB) is a 10 year old open source python ODB, that is essentially a persistent property bag store. It works great for many apps, and there is a querying mechanism available (Catalog) to gather up collections of objects.

Kris Zyp mentioned Persevere, which is an interesting implementation of persistent objects using JS. Kris didn't mention that Persevere also supports things like JSONSchema, which gives you the kind of support you get with XML schemas but you can use JSON rather than XML!

Fun post. Thanks!

10:10 AM, October 20, 2008  
Anonymous Anonymous said...

For further reading (prototypes) also: http://web.media.mit.edu/~lieber/Lieberary/OOP/Delegation/Delegation.html and http://lucacardelli.name/Slides/1996-05%20Class-based%20vs%20Object-based%20Languages%20(PLDI%20Tutorial).pdf

10:12 AM, October 20, 2008  
Blogger Andrey Fedorov said...

Unfortunately Emacs doesn't support any notion of prototypes.

Sure it does - just cons stuff onto assoc lists. PG mentioned this at the end of Arc lessons.

10:12 AM, October 20, 2008  
Blogger Edoc said...

With regard to scripting languages which offer prototyped-based objects, check out REBOL

10:24 AM, October 20, 2008  
Blogger chrisk said...

Hey Steve, just a quick quibble about your "poor man's splay tree" optimization, where you move elements to the front of a linked list when they're queried.

You suggest that this optimization is essentially free, but there are cases where it may not be. If your property list is being accessed by multiple threads, then this implementation will force many reads (any that aren't reading the first element) to acquire write locks on the list. If the number of threads is large, then the overhead of all this lock contention might well outweigh the benefit of finding your element closer to the front of the list. (At least for shortish lists that are read frequently but written much less frequently, which I suspect is a common case.)

10:36 AM, October 20, 2008  
Blogger Unknown said...

The GEB sections you refer to are nicely available starting at http://books.google.com/books?id=aFcsnUEewLkC&printsec=frontcover&dq=Godel,+escher+bach#PRA1-PA353,M1. In case anyone doesn't have the book handy & wants to follow up.

10:44 AM, October 20, 2008  
Blogger rtvaughan@mac.com said...

One important limitation of this technique is that the compiler cannot help you much. Consider what you could lose here: compile time type checking, variable and method name checking. When working with plists, for example with GTK's GObject system, all these things bite with depressing frequency. Automated help to spot with dumb typos is a big thing to lose.

11:42 AM, October 20, 2008  
Blogger Unknown said...

Words fail me to express my gratitude. Time and time again I hear that "reading other people's **code**" is one of the top ways to improve your own code.

This article made me feel as if I had spent time reading your code. This is remarkable! It is as if I read your code, without the hassle of downloading and compiling it and configuring a suitable environment in which to step through it. I was able to get a sense for how you approached your design and what thoughts guided your decisions as you laid down source code.

To echo others: yes, please do write more of these long technical posts. Very useful.

12:06 PM, October 20, 2008  
Blogger rdm said...

I would say that your "transient property" example was really an example of a data type which included a schedule. After searching for your property by name, you should also search within the schedule by time. You can delete historical garbage from your schedule, when its convenient (for example, when serializing to disk).

If you do things this way, you do not have to worry about corrupting your permanent properties with your transient properties because the transient aspect is permanent... well, temporarily at least.

Note that this also can be made to support non-overlapping durations (but, of course, each conflicting duration adds some fragmentation to your schedules so of course you probably should not go overboard).

12:08 PM, October 20, 2008  
Blogger Unknown said...

Aren't transient values a case of a specialisation of the original object? Eg. couldn't you just create a new object with the original object as its prototype? Of course, if the original object is an entity (And it seems to be the case in your example), you would have to separate the identity from the values. So you get:

// Initial object
willy_the_wizard = {
name: "Willy",
attributes: {
magic: 0
}
}

// temporary change
willy_the_wizard.attributes = {
prototype: willy_the_wizard.attributes,
magic: 30
}

12:13 PM, October 20, 2008  
Blogger Samuel A. Falvo II said...

I'm surprised that you list Javascript as the "epitome" of prototype-based languages. While influential, it's far from the most extreme implementation of the concept.

I think you'll find that Io language (http://www.iolanguage.com) overwhelmingly makes Javascript look positively rigid in comparison. It offers EVERYTHING that Lisp offers, PLUS prototyped objects.

Strongly recommended.

12:23 PM, October 20, 2008  
Blogger rdm said...

Yes, you could create a new object, but why would you? (I am not asking that rhetorically, I think that that question is crucial in any design.)

Also, your new example object does not include any implementation of the transient character of this property, and instead implements the "Refactoring to fields" pattern without any hints as to why that's good in this case.

I would call this object design bad (lossy, mechanically complicated and obfuscated). Having a separate "transient property list" was better, in my opinion (that approach is still lossy but is safely lossy and also reasonably transparent in the sense that the abstraction accomplishes something useful).

12:34 PM, October 20, 2008  
Blogger dlowe said...

Interestly, LambdaMoo (http://lambdamoo.org/) is also designed under similar lines. I'm surprised you didn't look at it when designing Wyvern.

12:59 PM, October 20, 2008  
Blogger paurullan said...

Thanks for the reading. I would be very pleased if you posted the other half you killed!

JFTR, it great to know this kind of approach. Recently I was trying to design a Magic:The Gathering game. The experience was painful: all I got thinking was how do I design this such I can put a spell over a creature but at the end of the turn it goes out? or things like how will I keep on the cards and their skills?. My excuse may perfectly be that I am only an CS undergrad but this is just a great idea.

1:09 PM, October 20, 2008  
Blogger David Slik said...

A similar property-list key-value based model is used in the emerging XAM object storage standard from SNIA.

http://www.snia.org/forums/xam/technology/specs/

In this storage model, arbitrary typed values can be attached to string names and persisted as collections. An integrated SQL subset provides searching, and you can create dynamic structures by referring to other objects by the returned object identifier (XUID), allowing replication of many pointer-based techniques.

This storage model offers many interesting opportunities for object-based storage, and I'd encourage you to take a look at the specification.

1:27 PM, October 20, 2008  
Blogger Ranga Blogger said...

Steve, what do you think about about Tcl?

The concepts behind the language are very easy to understand. It's got a precisely and concisely documented standard library. The Tcl book is very good also. And Tcl has a very efficient architecture to enable embedding and to expose your C/C+ objects as Tcl objects.

If only browsers allowed multiple scripting languages ...

"And JavaScript is one of the two best scripting languages on the planet, in the most correct sense of the term "scripting language": namely, languages that were designed specifically to be embedded in larger host systems and then used to manipulate or "script" objects in the host system. This is what JavaScript was designed to do. It's reasonably small with some optional extensions, it has a reasonably tight informal specification, and it has a carefully crafted interface for surfacing host-system objects transparently in JavaScript.
"

1:27 PM, October 20, 2008  
Blogger Unknown said...

Good post on a topic close to my heart.

I found lucene and its ilk to be a great fit for the query requirements of property based systems. The clob + key reside in the rdbms and all/most fields indexed in lucene. the databse only providing for load/store and lucene used for querying both cross field google style and field specific select style. fast reliable and fun.

1:51 PM, October 20, 2008  
Blogger Waquo said...

Another way to optimize property access would be hidden classes:
http://code.google.com/apis/v8/design.html#prop_access

Of course this is only possible when you're writing your own compiler.

"My early experimentation yielded the interesting rule that non-numeric transient properties override the persistent value, but numeric properties combine with (add to) the persistent value."
In a functional language, transient values could be expected to be functions that take the original value as a parameter and return a new one.

In Haskell, examples for such functions would be:
\str -> "I override the old string"
const "new string" -- same as above

\str -> "I am prefixed " ++ str

\num -> num + 1
(+1) -- same as above

\num -> 1 -- set to 1
const 1 -- same as above

2:39 PM, October 20, 2008  
Blogger StanD. said...

Lotus Notes follows this concept fairly closely, I believe. A Notes "database" contains Documents which are essentially plists. A Document can be based from a Form which is the equivalent to your object plist. You can arbitrarily add and remove properties to Documents, iterate the properties on a Documents, etc.

2:50 PM, October 20, 2008  
Blogger dmost said...

This unstructured un-schematised collection of properties, prototypes and metadata can very quickly result in an undocumented soup. Its great for a single programmer or a small team of programmers, but can you image trying to figure out the behavior of a legacy system where a property can be added to or removed from a thing from just about anywhere.

3:15 PM, October 20, 2008  
Blogger Unknown said...

Thanks for the article. It jives a lot with a similar system I was building (in Ruby).

There's been some work at creating a Property-based system sittong on top of a RDBMS database. Check out ThingDb. It has the concepts of attributes as database rows. It also adds versioning and author semantics. Definitely worth a check. ThingDb.

The hardest problem I had to do deal with has been 1) transient and 2) nested properties.

In your example, how do you handle the semantics of a "magic potion" that wears off after 2 turns, or transients that have different semantics than "adding if they are ints" or "replace if they are strings". I have tried defining these modifers as first class objects called "Effects" or "Modifiers". In addition to a "parent" link, they would have a "target" link that defines the object they are modifying. However, it still has some problems. How do you handle the case of the "delete"? And, when the property is deleted, we need to make sure the object still has it's previous setting.

4:07 PM, October 20, 2008  
Blogger dmost said...

For transient properties, wouldn't it be easiest to add function instead of a value. The function would be a closure of the form "getvalue() { return time() < expires ? value : undefined } or somesuch.

Theres a host of issues with this approach, in that the expiry time isnt accessible as metatdate, but it does illustrate some of the issues with the property/metadata approach. i.e. you can have all the metadata you want, but unless something is paying attention to it, it matters not.

4:38 PM, October 20, 2008  
Blogger Unknown said...

My comment on how this related to epistemology:

http://curi.us/blog/post/1339-steve-yegge-on-epistemology

5:35 PM, October 20, 2008  
Blogger John Cowan said...

I've been thinking now for some time about a variant of the Properties pattern in which the property names are not interned strings but are themselves property bags, so each property itself has an extensible set of properties, including its name. In that way we can distinguish between cases in which properties of different objects are really the same property, and when they just happen coincidentally to share a name. For example, a book has an author property and a title property. The author in turn has a name property, and the name has properties like firstName, lastName, and title (Mr., Ms., etc.). It's just a coincidence that the title of a book and the title of a person have the same name, and in this variant of the model they are represented by distinct property objects.

I also add to this the notion that the properties form a hierarchy. For example, an object named Fido (that is, whose name property is the string "Fido") might have the boolean property isDog. Because isDog is a subproperty of isAnimal (a fact recorded by the properties of the isDog and isAnimal property objects), it is also the case that it has the property isAnimal with the same value as isDog.

Finally, I thought I'd mention that Self, unlike Javascript, allows multiple prototype objects. Different versions of Self experimented with how to handle inheritance of the same property from more than one parent; the final decision was to disallow such conflicts. An attempt to get the value of a property with a conflicting definition is an error.

6:26 PM, October 20, 2008  
Blogger brongondwana said...

Grabbing on to a little side issue - backfills. Very interesting. My experience has been that you can't make systems perfect up front as well - at FastMail we use Cyrus for our email backend store, and we have a whole stack of subsystems which exist only to check for "bad data" and either let us know or silently fix it.

Even the "let us know" tends to be emails in a known format, which there are then tools which can log in to the mailbox, read the notifications and spit out some suggested fixes and the command lines to copy-and-paste to do them. A common fix-it-up pattern is "./fix-stuff.pl | less" followed by "./fix-stuff.pl | sh" if it all looks sane. You need that sort of automation once your dataset gets big - and stuff _will_ corrupt, in all sorts of interesting ways.

6:59 PM, October 20, 2008  
Blogger Sean said...

After reading the introduction (and not reading the T.O.C.) I thought the article was going to be about that even more universal-design-pattern: copy-and-paste.

I guess in a way it was.

8:16 PM, October 20, 2008  
Blogger David Rupp said...

Steve Yegge is a Strange Loop. In a Good Way.

And the next (read: "first") computer language I design will include first-class support for JSON.

9:17 PM, October 20, 2008  
Blogger Triplefox said...

Great post and very much along the lines of what I've discovered in designing my own complex (primarily game-related) systems.

One further step that I've taken - slightly tangential, but it seems to play nice with property-heavy systems - is to encapsulate as many finite states as possible within their own objects.

I too have dealt with lots of transients, and my best current solution for taming them is to keep them in the background, slightly bottlenecked, and sliced up into small classes that are easy to compose. This makes it easier to isolate all of those bugs, as I can just turn each class on and off and watch the effects. In theory, it also means better reusability for each form of state.

11:00 PM, October 20, 2008  
Blogger geir said...

Great article. I don't think it was long enough :) A lot of what you said resonates quite a bit with what I've been working with the last 7 months at 10gen, namely an app server that among other languages, supports server-side JavaScript as a first-class citizen, as well as a persistence store (called Mongo) that works with JSON-like documents.

We do serialize into a generalized binary format (we call it BSON), because the persistence store is intended for general use - we want it to be easy for languages other than JS to work with it.

I think that your suggestion of compression is interesting - it's a trade-off between storage and network vs query performance, and I'm not sure if I'd make that trade. Clearly there are field types (say a video) that you'd want to store compressed. Maybe the solution is to store compressed, and maintain a per-document set of de-compressed fields, lazily adding to the set when you see those fields used in a query, or be eager - maintain a list of fields that the DB has seen in queries on a per-collection basis, uncompressing those on each insert/update so query time is stable. Hard to say.

I think you're right about hierarchical queries - the documents are hierarchical and thus the query language has to be. Because I've been so non-XML document focused, I never thought of XQuery or XPath. I'm a Java Weenie, and tend to fall back to the kinds of QL you find in JPA - SQL-ish but with predicates that allow object hierarchy navigation ("... where a.b.c = 2"). I think that such approaches will contribute to whatever the end-solution is for us, but right now, we're structuring the query "selectors" for Mongo in terms of documents themselves. Eg:


db.mydocs.find({name:"geir"});

which on the positive side is a really natural way to work with the database - you can think in documents for querying. On the negative side, a moments thought will reveal the hornets nest of challenges that come with such an approach. Still, the model is evolving - we're learning quite a bit every day - but it's been effective so far so we think we're on the right track.

Anyway, thanks for the article.

geir

3:13 AM, October 21, 2008  
Blogger 12challenge said...

An important aspect of this that goes undiscussed is changes to the parent object. Emmitt Smith is a fast runner (speed: 10). L.T. is like Emmitt Smith, but also catches passes (parent: Emmitt Smith, passcatching: 10). Now what happens when Emmitt Smith gets old and we reduce his speed to 5?

The answer is that L.T. got slow, too. Unless you didn't want that sort of thing to happen, in which case you had to do a deep copy of everything out of Emmitt Smith when you created L.T., which makes the whole parent link pointless.

Or am I just completely misunderstanding how the parent relationship is supposed to be used (entirely possible)?

6:33 AM, October 21, 2008  
Blogger zby said...

For Perl implementations see: Class::Classless and Class::SelfMethods .

6:46 AM, October 21, 2008  
Blogger Unknown said...

Blogger 12challenge said...

An important aspect of this that goes undiscussed is changes to the parent object. Emmitt Smith is a fast runner (speed: 10). L.T. is like Emmitt Smith, but also catches passes (parent: Emmitt Smith, passcatching: 10). Now what happens when Emmitt Smith gets old and we reduce his speed to 5?

The answer is that L.T. got slow, too.
Or am I just completely misunderstanding how the parent relationship is supposed to be used (entirely possible)?


Your logic seems solid. Exception logic to deal with 'reduced' property values seems like a kludge to me. (If reducing, go look for inherited values? Yuk).

Define inheritance as being time bounded until the ancestor value changes? Then break the inheritance perhaps?

7:12 AM, October 21, 2008  
Blogger 12challenge said...

DaveP said...

Define inheritance as being time bounded until the ancestor value changes? Then break the inheritance perhaps?


Yeah, I was thinking about something like this, but then the parent object has to maintain a list of all objects that inherited from it, in order to update them. They, in turn, have to notify the parent when they are deallocated so it can remove them from its list.

Seems like a lot of overhead. But Stevey seems pretty sold on this stuff so I'd love to get more explanation from him.

8:06 AM, October 21, 2008  
Blogger Unknown said...

I believe Damien Conway called “Refrigerator” “Inside-out Objects”.

For him the main motivation for this solution was encapsulation (something you don’t get from more conventional Perl OOP), not performance optimization. Discussed in details in Conway’s book “Perl Best Practices”.

8:16 AM, October 21, 2008  
Blogger Thomas Lann said...

I actually like the fact that you write long in-depth articles. I feel like do a good job of covering the topic without being to verbose and to vague.

Jeff Atwood may complain about the length. But I feel he often writes articles to meet his quota of articles per week. A lot of the articles have very little to do with Computer Science.

I also get a little annoyed with his Windows advocacy. Sometimes it seems he bashes other OSes to drive traffic or increase the amount of comments he has.

Please keep up the great blogs/articles. If you ever quit Google maybe you should think of teaching or writing articles for magazines like Dr. Dobbs.

8:30 AM, October 21, 2008  
Blogger Jeremy Weir said...

Can you make sure your program doesn't allow L.T. to hurt his toe or grow old?

8:54 AM, October 21, 2008  
Blogger Chris Ryland said...

You've just re-explicated the primary reason that Lisp was originally used for "undefined" problems like AI: its lists (including property lists) could be used in a "big ball of mud" fashion to model nearly anything, with no formal data structures in sight.

Seems like this crucial insight has nearly been lost in recent years and in recent language designs.

9:14 AM, October 21, 2008  
Blogger Sony Mathew said...

Steve, I don't quite see how the Properties Structure or the Prototype Pattern simplifies Dependency Injection (DI) or unit-tests. We know DI simplifies unit-tests.

As I understand it - DI is about "not" constructing object-impls directly - thereby allowing unit-tests to swap out these impls. Are you saying all references between objects will be to a generic PObject (e.g GameObject) and so substitutability is always possible? - but without DI - objects would still be forced to directly construct the PObject they desire thereby defeating the purpose.

However, I do like the idea of fully-typed Java-Beans (with all the getters & setters) using an underlying Properties infrastucture - would be useful for instance-level inheritance (Prototype) which has always been a wish of mine. I also wish Java would support it directly somehow - perhaps with syntax like 'super = someObj'.

11:59 AM, October 21, 2008  
Blogger Unknown said...

Great post! I wrote about prototype inheritance in JavaScript and ActionScript recently, but I did not think to relate it to GEB!:

http://nodename.com/blog/2008/09/19/dynamic-programming-in-as3/

12:30 PM, October 21, 2008  
Blogger dmost said...

Sony Mathew says "However, I do like the idea of fully-typed Java-Beans (with all the getters & setters) using an underlying Properties infrastucture"

Windows Presentation Foundation has a property model underlying it, though its wrapped with strongly typed objects. In effect, for each property of a strongly typed object, there is a getPropertyValue and setPropertyValue call going on underneath.

12:41 PM, October 21, 2008  
Blogger Jiayao said...

Great post! Thanks! I hope the other half you deleted still exists in some form of buffer. Can you post that somewhere?

2:17 PM, October 21, 2008  
Blogger Robert said...

I think one of the earlier commenters sorta said this, but mightn't you want some extension to your API that lets the programmer declare a property name, optionally together with its value type? Then one's framework could at least raise an exception when assigning a value to an undeclared property (likely spelling error) or when assigning an ill-typed value (likely turning the object model into a mess)?

If one had a Lisp-style development environment (where one is entering code in the context of a running system), I suppose one could even get some correctness checking at compile time.

3:04 PM, October 21, 2008  
Blogger Tartley said...

Python also allows unquoted keys in dictionary construction, although you have to use the explicit 'dict' constructor, rather than the usuaal '{}' syntax:

person = dict(
name="Bob",
age=20,
favorite_days=['thursday', 'sunday']
)

4:10 PM, October 21, 2008  
Blogger Unknown said...

> Interestingly, in big companies I've worked at that have strong schema constraints, they still always seem to run into data-integrity problems, so it's not clear how much the schema is really helping here.

When the data integrity problems come from code changes, those problems typically come from a lack of normalization, ie, the system is such that some facts are expressed in multiple places.

In fact, a huge fraction of all bugs are due to a lack of normalization. Count them some time.

An interesting fraction of the features of programming tools (broadly defined) are aimed at the consequences of a lack of normalization.

5:01 PM, October 21, 2008  
Blogger con sultan said...

The "Refrigerator" pattern is used in Java and .NET to implement the Monitor pattern - not every object needs to create synchronization primitives such as a mutex and a condition variable just in case, if it's about to be waited on, or notified/pulsed, so these are taken from a pool and associated with the objects on a as needed basis.

12:17 AM, October 22, 2008  
Blogger Jacob O'Reilly said...

An important aspect of this that goes undiscussed is changes to the parent object. Emmitt Smith is a fast runner (speed: 10). L.T. is like Emmitt Smith, but also catches passes (parent: Emmitt Smith, passcatching: 10). Now what happens when Emmitt Smith gets old and we reduce his speed to 5?

The answer is that L.T. got slow, too. Unless you didn't want that sort of thing to happen, in which case you had to do a deep copy of everything out of Emmitt Smith when you created L.T., which makes the whole parent link pointless.


I think that you just have to model your specific needs differently. Suppose that we model Emmitt Smith so that we're capturing something that is fairly static, it would change in only small ways, and broad changes (such as an unfortunate accident in which his leg falls off) are modeled as descendents of E.S. You might record "mobility challenged Emmitt Smith" as a diff against plain old "Emmitt Smith".
Taken further, since inheriting from E.S. is fairly quick, you might say that all changes to E.S. produce a new generation of him.
I think I'll leave the issue of updating references (or not) until after I've had my morning coffee. :-)

3:32 AM, October 22, 2008  
Blogger Barry Kelly said...

Re the CORBA folks using XML for interface resilience: note that they still need to deal with all of the same issues that the type system was forcing them to face, except that now those issues are implicit in the actual real-world use of the protocol between client and server, rather than embedded in the type system.

3:56 AM, October 22, 2008  
Blogger Eckhart said...

Great post, and many ideas reminded me strongly of NewtonScript.

It has nowadays also a cross-platform port in case you don't have a Newton lying around :)

6:42 AM, October 22, 2008  
Blogger 12challenge said...

Jacob O'Reilly,

I see how you could make that work, but it implies that as soon as you derive L.T. from Emmitt Smith, you need to freeze Emmitt Smith and use only copies of him if you need to modify him.

Alternately, instead of deriving L.T. from Emmitt Smith, you could have a JustLikeMe() function. LT=EmmittSmith.JustLikeMe() would create a new object with the same parent as Emmitt Smith (Running Back) and assign it all the same specific values as Emmitt Smith (speed:10). Then you're free to modify Emmitt Smith or L.T. without either running over the other.

But either way you seem to be creating a strict separation between prototypes (classes) and specific examples (instances). This seems to negate the whole "generality in the specific" principal that Steve's basing this whole thing on.

11:04 AM, October 22, 2008  
Blogger Unknown said...

Another variation on the theme. Naked Objects!

They look like nested property lists, but with the added behavior that each property has it's own kind of editor / UI.

It extends the Property-model and adds the tool set right on top of it.

http://www.theserverside.com/tt/articles/article.tss?l=NakedObjectSeries_1

(Wondering if Blogger will flag the naked in my URL, hehe).

11:56 AM, October 22, 2008  
Blogger Unknown said...

I love your post, and the prototype pattern. Actually I have been using it for some design recently. Interesting thing I came across is that the cross-referencing (I called it this anyway, I am sure it's not a terminology). Two prototypes could cross reference each other in their properties, which officially breaks the parent-child relationship. We could certainly extract another common property set out, but it doesnt always look intuitive. This pattern seems to be more suitable for loose-coupled systems, which might explain the flexibility to extend by end user.

Another example I consider as a close match to this pattern is web url itself. Consider every url is a key, which could be used to retrieve the value - web content blob. A website contains a number of these properties, and the way the cross reference is done between url and content might give a different idea of implementing this pattern - reference to key(s) in values.

12:38 PM, October 22, 2008  
Blogger Jacob O'Reilly said...

12challenge,

I think perhaps that what you're representing in your prototype model might not be Emmitt but something more abstract to begin with, such as "great football player", something that generally doesn't change a huge amount. When someone refers to this other player and compares them with Emmitt creating that map, you might set their prototype to be Emmitt, or you might instead link them against the more general "great football player" instead of Emmitt.
I was thinking of a more hybrid approach, whereby you would link L.T. to Emmitt but as your map evolves and you develop a more specific view of Emmitt and a more general view of a great player, you might have cause to reevaluate L.T.s prototype to the more general "great football player".
I think this example is a little difficult to put in concrete terms because I doubt many of us have a need to handle such abstract concepts in code. However, the data structure required is simple so long as you can adjust your prototype dynamically. (And you don't assume the algorithm for when/how to remap prototypes is instrinsic in the structure!)

1:37 PM, October 22, 2008  
Blogger 7 said...

i'm probably missing something, but what is so terribly difficult about modelling the property pattern using the relational model? it seems like a natural fit to me. Example model:

PLISTS
plist_id property_name property_id

INTEGER_PROPERTIES
property_id integer_value

STRING_PROPERTIES
property_id string_value

PLIST_PROPERTIES
property_id plist_id

PARENTS
plist_id parent_plist_id

if your DBMS allows union types, then you can replace INTEGER_PROPERTIES, STRING_PROPERTIES, and PLIST_PROPERTIES with a single relation. Otherwise, you use a UNION clause in your query.


Admittedly, this sort of schema requires rather difficult or impossible SQL queries to deal with, but this is really a weakness in SQL, not with relational modelling.

But again, I'm probably missing something obvious that means this is more difficult than I think.

9:10 PM, October 22, 2008  
Anonymous Anonymous said...

"to keep this article tractable, I had to delete several pages of detailed examples, such as "Chieftain Monsters" that could be programmatically constructed by adding a few new properties to any existing monster"

Nooooo!

Screw tractable, I want those examples and the extra content! How about an "extended version"? Pretty pleeeease? :-D

10:38 AM, October 23, 2008  
Blogger Naveen said...

I want to second Jiayao's comment - please post the part you deleted!!

I am especially interested in the design issues you had in using this pattern in your Wyvern game. I'm designing a python based engine for Magic the Gathering, a collectible card game where the game pieces can override the properties/rules of other pieces and the game, and I've found that my approach is similar to what you've described here, although not so formalized. For example, how do you track multiple transient changes that have different expirations? Obviously, if a later change expires before an earlier one, the earlier value should then come into effect, so you need to stack them in a way where you access the most recently added one.

11:12 AM, October 23, 2008  
Blogger Unknown said...

+1

Please give us the Director's Cut (the missing 50%).

Very interesting read (so far).

6:12 PM, October 23, 2008  
Blogger Unknown said...

+1 for "Object Thinking" by David West.

7:48 PM, October 23, 2008  
Blogger Chris said...

This might be a good idea, but you need to learn to tell it in fewer words.

8:24 PM, October 23, 2008  
Blogger Action Jackson said...

This comment has been removed by the author.

10:30 PM, October 23, 2008  
Blogger gdanov said...

Really great post. I wish I could write like you.

Do you, by chance, know good reading on the plist persistence aspect? Often blobs are not an option and we are forced to put all our data into tuples containing primitive types.

2:18 AM, October 24, 2008  
Blogger Antagonist Prime said...

Great blog entry, the bits on Wyvern were especially interesting. Considered writing an entry just concerning game development?

4:39 AM, October 24, 2008  
Blogger Drew @ Cook Like Your Grandmother said...

Since no one else has commented on it, I'll be the pedant to point out that carrots do not, in fact, improve your vision.

10:40 AM, October 24, 2008  
Blogger Rob Sayre said...

Modern browsers do support arbitrary attributes. Maybe you mean something specific that I'm missing. The following document does work in Firefox and Safari, alerting "abc".

<html>
<head>
<title>Test</title>
<script>
var d = document;
var test = function() {
var div = d.getElementById("demo");
div.bar = "b";
div.setAttribute("baz", "c");
alert(div.getAttribute("foo") +
div.bar +
div.getAttribute("baz"));
}
</script>
</head>
<body onload="test()">
<div id=demo foo=a></div>
</body>
</html>

1:33 PM, October 24, 2008  
Blogger Dave Johnson said...

As well as Douglas Hofstadter, Steven Pinker is also very much worth a read.

2:58 PM, October 24, 2008  
Blogger Lyndon said...

Interesting synchronicity!

I recently had a conversation with a friend who works with the non relational database M204 which is for very large scale datasets.

It make use of Entity Attribute Values.

http://en.wikipedia.org/wiki/Entity-attribute-value_model

7:46 PM, October 24, 2008  
Blogger Aristotle said...

Editorial note: the man’s name is Damian (two As), not Damien (with an E).

Also, the implementation technique you call “refrigerator” is known as “inside-out objects” in the Perl world.

11:36 PM, October 25, 2008  
Blogger Rakesh Pai said...

Chris Ryland said... You've just re-explicated the primary reason that Lisp was originally used for "undefined" problems like AI: its lists (including property lists) could be used in a "big ball of mud" fashion to model nearly anything, with no formal data structures in sight.

And since JavaScript shares this trait with Lisp, I'm wondering why no one has thought about AI programming in JavaScript.

9:14 AM, October 27, 2008  
Blogger Chris Ryland said...

And since JavaScript shares this trait with Lisp, I'm wondering why no one has thought about AI programming in JavaScript.

Well, there was no Javascript back in the 60's through the 70's at the peak of the AI craze, but even if there were, Lisp also had the killer feature that no other language has to date: programs are data. This one fact enables things like macros, like building code to be executed dynamically, etc.

Plus, only in the past few months has Javascript performance started to actually rival serious programming languages. Lisp has had optimizing compilers for decades.

Just some points.

12:22 PM, October 27, 2008  
Blogger rdm said...

Chris, I think you need to work on the accuracy of your dates.

Also, I can find examples of people using or talking about using javascript for ai (using google).

12:55 PM, October 27, 2008  
Blogger Unknown said...

I knew I was in for a long read when I saw the index and I was not disappointed. Every minute I spent reading this was worth it.

I'm actually working on developing a game of my own. Partly for fun, partly to learn, partly to get my own ideas of an awesome game out there, whether it's been done or not. I will be using properties for just about everything now.

Thank you for setting me on the right track.

12:59 PM, October 27, 2008  
Blogger Parveen Kaler said...

Hi Steve,

Great article. This actually describes the Property Bag system and Attribute Editor we had at Relic Entertainment very well.

The canonical reference for this subject in the game development world is a presentation called "A Data-Driven Object System" by Scott Bilas of Gas Powered Games. He gave the presentation at GDC in 2002.
http://www.drizzle.com/~scottb/gdc/game-objects_files/frame.htm

Transient Properties
---
Most games solve the transient property problem by having a deterministic simulation. The key is to have your "Cast Spell" action update the state of the game in the exact same manner each and every time.

Persist the casting of the action rather than supporting transient properties.

3:13 PM, October 27, 2008  
Blogger Unknown said...

Hmmm, nicely written article.. but sorry, not convinced!

Firstly, WRT inheritance: its fiendishly difficult to keep a mental image of a prototype based system where properties are being modified/read/inherited all over the shop. Especially objects that reference prototypes which mutate their properties over time. Emmitt Smith suddenly loses his "great speed and balance" - does L.T?

Secondly, WRT multiple inheritance: The situation is even more confusing.. What if the announcer has said: "L.T's catching skills are like Emmitt Smith and Walter Payton combined!" now what...

No, give me:

+ interface
+ composition
+ delegation (to composites)
- inheritance

any day of the week.

Sam

6:28 AM, October 28, 2008  
Blogger Tartley said...

@Sam: Hey there! Your objections are reasonable, but I wonder how much they stem from being 'stuck' in an object-modelling mindset - seeing every design problem as a nail to which our familiar tool should be applied.

We've all done lots to explore and familiarise ourselves with the situations where object modelling works well. And we've probably grown to subconsciously avoid the areas of design where it doesn't work well.

Maybe the 'universal design pattern' (bad nondescript name!) idea needs some time to bed down in our conciousness before we can truly evaluate the areas in which it excels and should be applied, versus the areas in which it doesn't work well and should be avoided.

6:57 AM, October 28, 2008  
Blogger Tartley said...

It crosses my mind that any dynamic language, in which classes are amenable to change at runtime, then the inheritance tree of the classes themselves become an example of this design pattern in action. Base classes are the parent container. Class-level attributes are the key-value mappings. Inherited classes may add new class level attributes of their own, but otherwise inherit those of the parent class.

In Python, access to class attributes defined on a parent class are looked up dynamically at runtime, first by searching the class itself, then its parent classes, so changes to the parent class would be inherited by all inheriting classes.

7:02 AM, October 28, 2008  
Blogger rdm said...

I also had some objections to some of this blog rant, but i now realize that I was objecting to something that might solve this inheritance issue.

Stevey's design had a careful barrier between permanent features of an object and temporary features. Object based property inheritance presumably only worked on the permanent features.

This binary treatment of time has limitations (which was the basis of my original objection) but does seem to address some of these other concernss.

7:06 AM, October 28, 2008  
Blogger Jess said...

In what sense is it helpful to think of a map or array as a surjection? What codomain does a map imply, and how does a map span the whole thing? Does the codomain change whenever we put() or remove()?

It might be more apt to consider a map to be just a function, without mentioning jectivity, but I don't really see what that gets us pedagogically. As others observe, the difference between the prototype pattern and OO or indeed any style with side effects is one of perspective. Namespaces, they're honking great!

8:45 AM, October 28, 2008  
Blogger Unknown said...

Tartley,

The minus sign in my previous comment should hopefully show that I am far from being in a typical "object-modelling mindset".

I believe inheritance should be discarded. All behaviour should be implemented though composition of reusable components, with interfaces and delegation to provide polymorphism.

But I digress.. my immediate argument here is that prototype based modelling will end up a mind-numbing mess of inherited properties very quickly. Let's see.. a donkey is a bit like a horse, but maybe the size of a cow, although its ears are sort of rabbit-like.. hang on who just made the horse able to jump that high, all my donkeys are escaping!

Sam

9:43 AM, October 28, 2008  
Blogger ]{s said...

This pattern is not just useful in game programming systems. It's incredibly useful in any place where you want enormous run time flexiblity and uncertainty of what will constitute your object at the time of creation.
An example, in financial applications, where every system is adding a bit of its own enrichment to some order a client has placed, such a system is incredibly useful.

Also, I find this pattern incredibly useful in databases too. Whenever I have to model base->derived classes, instead of a table per derived class, i usually end up with a instance-property table mechanism. i.e. I have a table where the instance of the object is captured, and all of its properties are captured as name value pairs in a child table with the instance id as foreign key. With this mechanism, the schema is stable but you have enormous flexiblity in what you model. For all the nutty people out there who get hung up on type safety, i find it useful to make views per derived class. This keeps them happy, and me too.

Thanks for the article.

11:38 AM, October 28, 2008  
Blogger Unknown said...

"If you think about it, Arrays and Maps share the same underlying formalism (a surjection, not to put too fine a point on it), and in some languages, notably PHP, there isn't a user-visible difference between them."

I think Lua is the most notable example:

"Unlike other scripting languages, Lua does not offer an array type. Instead, Lua programmers use regular tables with integer indices to implement arrays." (The Implementation of Lua 5.0)

Until 5.0, arrays were stored in regular hashes.

5:51 AM, October 29, 2008  
Blogger Brooks said...

There's an interesting example of the property pattern that you didn't mention, but which is quite familiar: Lexical scoping. Any given scope has a set of symbols and their meanings, some of which are local and some of which are inherited from higher-level scopes.

I recall thinking about this at one point when considering TeX-like langauges. These have a couple of interesting variations on the usual property pattern: There is generally only one scope active at a given execution point, and so the dependency chain is a straight line, but it tends to be painfully deep when metaprogramming is involved (much deeper than would be typical with a C program, for instance). Values can be changed at either the current level or the base ("global") level, and thus some optimizations to avoid having to backtrack values all the way up the chain seem quite important. I did a bit of thinking about how to do that effectively, and tried to follow how Knuth did it in TeX, but never really got very far with it.

11:33 AM, October 29, 2008  
Blogger Malte said...

Hey Steve,

very very nice post. For an interesting use of the prototype pattern in conjunction with class based programming you might want to check out the JS meta system Joose http://code.google.com/p/joose-js/

It allows going back and forth between classes and prototypes using the detach method that is present in all objects.

1:41 PM, October 29, 2008  
Blogger Albert said...

One interesting choice for storing the data might be something like the "network database" Neo.

It stores objects with properties and their relations to each other. It also allows querying the network by properties and relationships.

5:03 AM, November 01, 2008  
Blogger Unknown said...

Thank you for this post. I found it very inspiring and well worth a read. I'd like to see more posts like this!

5:32 AM, November 03, 2008  
Blogger Nate Cull said...

@12challenge:

"Yeah, I was thinking about something like this, but then the parent object has to maintain a list of all objects that inherited from it, in order to update them. They, in turn, have to notify the parent when they are deallocated so it can remove them from its list."


That idea is at the heart of a toy system I'm working on which I think has a lot of potential. Yes, it's a lot of state to maintain, but when it comes to distributed Internet-based systems, you end up reinventing such publish-subscribe tracking manually anyway (in things like web proxy caches and RSS subscriptions and mailing list servers in the large, or signal/slot GUI widget systems in the small).

I'm thinking about systems like Cells, FrTime and Flapjax which are coming from the 'functional reactive programming' paradigm. If we mix this with property bags, I think we can get a truly universal, fully pure-functional, compute/store/distribute framework.

6:56 PM, November 06, 2008  
Blogger Unknown said...

not sure your use of the term “modeling” is correct in some technical sense. It is useful perhaps in conveying your enthusiasm for prototype based lang.

For example, if you look at Wikipedia, here's some standard meaning of modeling:

http://en.wikipedia.org/wiki/Model_theory
http://en.wikipedia.org/wiki/Mathematical_model
http://en.wikipedia.org/wiki/Computer_model

for much more, see:
http://en.wikipedia.org/wiki/Modeling
under the section “Abstractions, concepts, and theories”.

none of them are exactly as you described by a list of specific “models” you speak of.

Further, looking at your list of models:

• Class Modeling
• Relational Modeling
• XML Modeling
• Other Schools (functional, logic based)

...

now, Relational modeling really covers class modeling, becausec classes, in java, is a a hierachy (i.e. tree), and a tree is just a specific type of relation. (another word for “relation” in math is a graph, or network, of which tree is just a specific type) Now, consider other OOP langs that has multiple inherence, then it's basically a graph more complex than tree, but still a relation.

Then, XML is really just a tree. So, XML and Java is pretty much the same with respect to their “modeling” structure, namely a tree. It's kinda odd you mix them in the same context and call them modeling. OOP in general is a programing paradigm, while XML is mostly a inert data structure. Both are trees. In their different ways, you could call each modeling. But to list them together as different types of modeling is odd, because in one sense both of them are really just trees, and in another perspective one is a programing paradigm, while xml is just a data representational structure.

in summary, i find your list of different school of modeling is rather a mismash of elements. The Class Modeling, XML modeling, are really just different aspects of Relational Modeling in sa far you call them “modeling” ...

Am not trying to pick bones about terminologies, but the term modeling, in general describes methods to emulate problem or system. The only proper use of the term modeling in a strict sense, are those under Wikipedia's description of “Mathematical Modeling”. They can be classified in several ways,... e.g. statistical, neural networks, cellular automata, probabalistic, determinsitic, algebra based, ... there is no one classification system ... rather, like all the branches of math, how one classifies them depends...

You say that OOP is not popular with academecians due to it not having a mathematical basis... although i do see what you are saying, but to say that is kinda odd because strictly speaking nor any functional programing lang or program written in functional programing paradigm has any mathematical basis (except those langs or software that are meant to be proof systems, e.g. coq). Functional lang or software has math basis only in a rough sense, and so does OOP has mathematical basis (e.g. as state machines).

Then, in your section “Finding the sweet spot”... i find it quite fuzzy, similar to as discussed above about the use of the term modeling. For instance, how about a school of modeling using cellular automata? statistics? genetic programing? Broadly, these are important modeling methods. They are far more so considered as a modeling method than XML or OOP would be considered as a modeling method. And any of these modeling method could be implemented in most langs, in Java (OOP) or Haskell (functional), Perl (procedural), for examples.

Since you list XML and OOP as modeling, then what about pattern matching in functional langs? if XML and OOP are considered as a schools of modeling, then pattern matching, term rewriting languages/systems, should also be one of the “school of modeling” in the context of your article, as well as quite a lot others... (say, list/array processing or progarming langs (lisp, apl, matlab).

----

In your subsection “BRANINS AND THOUGHTS”, you wrote:
“Douglas Hofstadter has spent a lifetime thinking about the way we think. ...”. I think you gave too much credit to Doughlas and his rather pop book “ Gödel, Escher, Bach”.

In the same subsection, you went on with 10 or so paragraphs describing “prototype modeling” using football player as analogy. I feel that you went too far and fuzzy. I mean, the same football player scenario can be worded so that it fits in some functional modeling paradim or OOP paradigm or logic paradigm or patter matching paradigm.

For example, the announce introduces a football player “L.T.”, oh then i thought it's a pattern that matches the general pattern of football player. As the announcer give more detail, i realized it's a more specific pattern. Then when the announcer introduced another player, i realized it's a pattern that matches “L.T.”. So, pattern matching is the all important modeling technique in human thought!!!

So, in summary, i feel that this is too fuzzy to be meaningful. It's almost like word play. One can paraphrase it to fit into any “school of modeling”.

As another argument... prototype lang like javascript is merely a slightly different way to do OOP. I can't see a precise formulation in the context of your blog to say that one is different than the other as a modeling system. Not quite sure how broad or specific you mean by prototype modeling neither. In functional lang, you can just have functions with local persistant vars that are var/value pairs, or as you say in lisp with its “properties” facilities to symbols.

The above is my first impression in reading your article up to and include the section “Eclipse”.

--------------

As i read the section “Javascript”, “Wyvern”, “Lisp”, “XML Revisited”, the context of your discussion of “model” and property based model becomes clear. Effectively, in my view, the talking about different school of modeling is merely a rough intro. The meat is really just about the propotype design pattern as a programing paradigm, in particular, as in Javascript or a function/item that contain a list of var/value pairs that can be inherited. You think that this paradigm is very useful.

By now i spent maybe 1 hour to read the above parts of your article and wrote this post so far. I just scanned the rest of your article, which go over some concrete details and issues about this paradigm.

----------------

In summary, i like this blog more than your other blog entries, because in other you often ramble in many directions with fancy made-up stories and liberal use of informal writing style (like: haha gocha! doh, bah! don't apologize!! Tell Fred i still like him.). (i particularly find the informal touch distracting) This blog is rather concrete and direct, apart from a relatively short and fuzzy intro about “schools of modeling”.

I think, if the title of the blog is changed to “how i love prototype based paradigm” or “the usefulness and issues of prototyped paradigm”, or “prototyped paradigm as a programing pattern”, would be more fitting, as opposed to Software Modeling. I just noted that your actual title is “The Universal Design Pattern”, ok, that's fitting too. Personally i find the term “Design Pattern” despicable (and i find the classic book Design Patterns by the “gang of four” as pure garbage (see this rant: “Why Software Suck” at http://xahlee.org/UnixResource_dir/writ/why_software_suck.html )). To use “universal” as a adjective for the prototype paradigm/“design pattern” i think is a bit zealous.

overall, interesting article. Thanks. (i wasn't gonna read it but scanning the modeling intro got me started bitching) This post is just my rambling of impressions.

Xah
∑ http://xahlee.org/

7:21 AM, November 12, 2008  
Blogger rdm said...

Xah Lee, I have not fully studied your response yet, and might have overlooked some gems of thought. However, I think that you are talking about a very different level of abstraction than Stevey was talking about.

Rather than go into details, I will suggest that you are thinking of modeling different kinds of systems than Stevey was thinking of.

8:24 AM, November 12, 2008  
Blogger Fedor Soreks said...

this link:

http://steve-yegge.blogspot.com/2008/10/Missing

from Missing keys in ToC is broken

5:02 AM, November 14, 2008  
Blogger Unknown said...

This is an amazing article; I just got around to reading it. For the people who complain about "typos" showing up in key names, causing a new key to be created instead of reassigning an old key, there's a simple psychological solution.

Instead of just having the method "setProperty(key, val)", also have "bindAndSetProperty(key, val)". Only this later method can be used to create new keys, and the former method will raise an error if the key is unknown. The longer name will discourage it from being used as much. This api mirrors Scheme's usage of "define" and "set!", which also helps accidental typos.

6:31 PM, November 18, 2008  
Blogger Nuno Job said...

Quite surprised you didn't reference the obvious. CouchDB is an apache project and you only refer BDB. DB2 is the industry leader in XML databases an I know people that left oracle and sybase cause it couldn't handle the amount of data they needed.

So why the Oracle fixation? At least you could have referred both alternatives.

12:57 AM, November 29, 2008  
Blogger Eric Kemp-Benedict said...

This was a nice article, cleanly written and well argued. And the pattern seems to have wide applicability.

I saw that Ranga Blogger had mentioned Tcl. As this is my language of choice for many purposes, I went ahead and posted an implementation of the Prototype Pattern on the Tcler's Wiki. By the next morning a much tighter version had been posted (using Tcl 8.6). To get an idea of how compact Tcl can be and its flexibility (you can define new control structures easily), here's the implementation (the tighter one, by a better Tcl programmer than me):

oo::class create prototype {
variable state parent
constructor {{parentObject {}}} {
array set state {}
set parent $parentObject
}
method get name {
if {[info exists state($name)] || $parent eq ""} {
return $state($name)
} else {
return [$parent get $name]
}
}
method put {name value} {
set state($name) $value
return
}
method has name {
if {[info exists state($name)]} {
return 1
} elseif {$parent eq ""} {
return 0
} else {
return [$parent has $name]
}
}
method remove name {
unset state($name)
}
}

6:19 PM, December 08, 2008  
Blogger yehnan said...

It was definitely more work than my typical posts, so if it doesn't go over well, I'm happy to go back to the comedy routine.

Please don't. I love this post very much. And I'm translating it to Mandarin. The first round is over. But still needs more rounds to polish.

Thanks for the essay. :)

6:26 AM, January 03, 2009  
Blogger Unknown said...

>The concepts of OOP stem not from mathematics but from fuzzy intuition.

Didn't a major portion of OOP stem from AI? Marvin Minsky's frames and Alan Kay's doctoral thesis and such?

12:08 PM, September 15, 2009  
Blogger hyponiq said...

Indeed this was a good read. It's incredibly amazing how involved this system structure is with many other languages' framework libraries. And now that someone has put a face on the practice I personally have almost (unknowingly) always used, I'll likely recognize it more for what it is than for generalizing it as another pattern that doesn't quite fit the description.

1:28 PM, March 25, 2011  
Blogger Ola Bini said...

As it happens, Java will finally get properties attached to arbitrary classes, through java.lang.invoke.ClassValue. (You can find the JavaDoc for it here: http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/java/lang/ClassValue.html).

6:21 AM, April 12, 2011  

Post a Comment

<< Home