Quack, you B*!#*rd!

Now, don’t get me wrong… I like dynamically typed languages (especially Python, not so much Ruby ;^), but I have to say that the whole “duck typing” thing is a bit overrated!

Take the following simple example:-

def pretty_print_person(person):
    """ Pretty print a person instance to stdout. """

    print("Name:", person.name, "Age:", person.age)

    return

Although no types were harmed in the making of the above function, I can’t just pass any old object in as the “person” argument. If I try to pass in an integer, say 42, I get:-


AttributeError: 'int' object has no attribute 'name'

In other words, there is an implied protocol (Python terminology) or interface (Java et al!) that is required of the “person” argument, namely that it has attributes called “name” and “age” (note that in Python that’s *all* it says; it says nothing about the types of those attributes).
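To make the implied protocol concrete, here is a minimal sketch (the “Robot” type and its values are made up for illustration). Anything with “name” and “age” attributes gets through, whatever the types of those attributes, and anything without them fails only at the point of use:

```python
from collections import namedtuple


def pretty_print_person(person):
    """Pretty print a person instance to stdout."""
    print("Name:", person.name, "Age:", person.age)


# Hypothetical stand-in: not a "person" class at all, but it quacks.
Robot = namedtuple("Robot", ["name", "age"])

# Accepted, even though "age" is a string rather than a number.
pretty_print_person(Robot(name="Marvin", age="very old"))

# Rejected, but only when the function touches the missing attribute.
try:
    pretty_print_person(42)
except AttributeError as exc:
    error_message = str(exc)
```

Nothing in the function signature tells the caller any of this; it lives only in the body (and in the author’s head).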

Now if I am a single developer, working alone in my bedroom, on a small piece of code that nobody else in the world will ever need (or want!) to see, and I name all my arguments nicely so that the implied type is kinda obvious, and I have an amazing memory then this is probably fine, but what if I am working in a team of n developers (where n > 1 ;^)?

In the team situation, the implied type information that was in the head of the developer who created the function is thrown away. Now, I can hear the quacks of protest already… “Why not just add a comment?” Fair cop. Here is some code taken from the “ActionController” module in Ruby on Rails.

  # Holds a hash of all the GET, POST, and Url parameters passed to
  # the action. Accessed like params["post_id"] to get the
  # post_id. No type casts are made, so all values are returned as
  # strings.
  attr_internal :params

  # Holds the response object that's primarily used to set additional
  # HTTP headers through access like
  # response.headers["Cache-Control"] = "no-cache". Can also be used
  # to access the final body HTML after a template has been rendered
  # through response.body -- useful for after_filters that wants to
  # manipulate the output, such as a OutputCompressionFilter.
  attr_internal :response

  # Holds a hash of objects in the session. Accessed like
  # session[:person] to get the object tied to the "person"
  # key. The session will hold any type of object as values, but the key
  # should be a string or symbol.
  attr_internal :session

  # Holds a hash of header names and values. Accessed like
  # headers["Cache-Control"] to get the value of the Cache-Control
  # directive. Values should always be specified as strings.
  attr_internal :headers

Here the developer has, very nicely, commented the attributes so that I can see, for example, that the “headers” attribute contains a hash of header names and values, and that the values should always be specified as strings. I can also see that the “response” attribute holds a response object, which also places a requirement on the protocol/interface supported by any values assigned to it.

All of this is very handy information indeed (assuming that the comments are correct and up-to-date of course), but why not provide the information in the code so that it is accessible beyond just the API documentation system? You never know, it might come in handy for:-

a) validation of values assigned to attributes/passed in as arguments
b) simple GUI generation
c) O/R (object-relational) database mapping
d) component frameworks
e) web frameworks
f) insert your favourite tool here ;^)
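As a rough illustration of item (a), here is a sketch that uses the standard library’s “dataclasses” module purely as a stand-in for a real type-declaration system. The “Person” class and its check are assumptions made up for this example, not anybody’s actual API:

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Hypothetical Person whose attribute types live in the code itself."""

    name: str
    age: int

    def __post_init__(self):
        # Check each declared field against its annotated type.
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    "%s must be %s, got %s"
                    % (field.name, field.type.__name__, type(value).__name__)
                )


fred = Person(name="Fred", age=42)  # accepted

try:
    Person(name="Fred", age="forty-two")
except TypeError as exc:
    validation_error = str(exc)  # "age must be int, got str"
```

This only checks values at construction time (later assignments slip through), so treat it as a sketch. The point is that the same “fields()” metadata that drives the check is available to any other tool that wants to read it.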

Now, the hard-hearted amongst you might, at this point, just tell me to bugger-off back to Java or C++ or whatever statically-typed hell-hole I came from. Well, truth be told, I don’t want to. I like the terseness of most dynamically-typed languages (so, maybe Ruby overdid it there a tad ;^), I like the meta-programming/introspection capabilities, I like how I can get closer to being able to express *what* it is the code is intended to do as opposed to *how* it does it. I especially like being able to prototype without types and gradually “harden” them as I understand more about what is going on (which IIRC was a feature of Dylan, a programming language from Apple).

Now, I am obviously neither the first nor the only one to want this combination of static and dynamic types, and if you cast an eye around the Python community you will see that every person (and their dog!) involved in team development has, at some point, written or adopted a system for specifying type information, and there are at least 2 stable and mature projects that have achieved much wider adoption:-

1) Traits
2) Zope Interfaces

Disclaimer: I used to work for Enthought Inc., the company behind Traits, but I didn’t write it, I have no vested interest in it, and it is free, open-source, and BSD licensed!

IMHO, using dynamically-typed languages in conjunction with optional static-type systems combines expressive power, readability and incredible tool potential, and offers a viable alternative to statically-typed (and usually compiled) languages for non-bedroom based development teams ;^)

Q. When is a Boolean not a Boolean?

A. In the hands of dodgy Python developers ;^)

I just came across an example of my least-favourite(?) anti-pattern in Python – using “implicit” boolean values in conditional expressions. This particular occurrence was found in sample code in the “Google App Engine”, but it could have come from lots of places ;^)

...
user = users.get_current_user()
if user:
    # Do something...
else:
    # Re-direct to login page
...

The “get_current_user” method returns None if there is no user currently logged in, hence the poorly written “if user” test.

Now this is (obviously) perfectly valid code because in Python, empty lists, dicts, 0 (zero), None etc. all evaluate to False, whereas a list or dict with at least one item, a non-zero integer, a non-None reference to an object etc. all evaluate to True…

Well, almost… and therein lies the problem! If, for example, an object instance implements the special method “__len__” and happens to return zero then it too would evaluate to False. Maybe what you wanted, and maybe not, but in my experience this has caused some weird, wonderful and subtle bugs (the best kind ;^). IMHO it is much better to use explicit boolean expressions where, errr, booleans are expected, and hence the above example should be:-

user = users.get_current_user()
if user is not None:
    # Do something...
else:
    # Re-direct to login page

I’m not sure why some Python developers insist on using the implicit “if user” pattern – do they really think that the typing it saved them reduced the overall development time? If so, maybe they also think of themselves as typists, not developers ;^)
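For the doubters, here is a minimal sketch of the “__len__” trap. The “Basket” class is made up for the example, and it assumes the class defines no truth-value method of its own, in which case Python falls back to “__len__”:

```python
class Basket:
    """Hypothetical container whose truthiness falls back to __len__."""

    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)


basket = Basket()

# The basket exists, but it is empty, so the implicit test says "no".
implicit = bool(basket)  # False

# The explicit test asks the question that was actually meant.
explicit = basket is not None  # True
```

An “if basket:” test would quietly take the else branch here, even though there is a perfectly good basket sitting there.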

Good Test, Bad Test…

Developers shouldn’t think of writing tests as like writing code – they should think of it as *exactly* the same as writing code. IMHO, the “code” has 2 parts, the implementation and the tests, and each needs as much care and attention as the other… Anyhoo…

A test remarkably similar to the following cropped up recently:-

def a_test(self):
    ...
    self.assertRaises(SomeError, self.foo(x, y, z).blargle)
    ...

Now, to paraphrase the legendary Mr Clough – “It’s not the worst test I’ve ever seen, but it’s in the top 1” ;^)

How can such a small test be so bad? Well, it manages to pack a couple of critical errors into a single line (and that’s some going I have to say ;^):-

1) It is not clear whether it is the method call or the attribute access (or both) that is expected to raise the exception.

2) The implied API is clunky at best. If the reason for calling the method is to get hold of the “blargle” then just return the “blargle”! The only thing that this test makes clear is that the API is not clear!

If I was a betting man, I would bet that this test was written after the implementation, which might excuse the API weirdness (it could be the start of refactoring a legacy API), but not the lack of clarity…
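For what it’s worth, here is a sketch of how the intent could be spelled out. I am *assuming* it is the attribute access that should raise, and “Thing”, “Result” and the argument values are hypothetical stand-ins, since the original implementation isn’t shown:

```python
import unittest


class SomeError(Exception):
    """Hypothetical error type standing in for the original SomeError."""


class Result:
    """Hypothetical return value of foo()."""

    @property
    def blargle(self):
        raise SomeError("no blargle available")


class Thing:
    """Hypothetical object under test."""

    def foo(self, x, y, z):
        return Result()


class ThingTest(unittest.TestCase):
    def test_blargle_access_raises(self):
        # Step 1: the method call itself is expected to succeed.
        result = Thing().foo(1, 2, 3)

        # Step 2: only the attribute access is expected to raise.
        with self.assertRaises(SomeError):
            result.blargle
```

Splitting the call from the attribute access says exactly which step is supposed to fail. It also sidesteps a trap in the one-liner: the arguments to “assertRaises” are evaluated before it is called, so any exception from the call or the attribute access escapes before “assertRaises” gets a chance to catch it.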

Views on Django views…

I’ve come across Django on a couple of sizeable projects now, and I’ve noticed that the teams involved put all of their views in ‘views.py’ which can end up being as long as 1000 lines+ (and then some ;^). Now, I know that is the Django way, but my gut instinct is that:-

a) it cuts down on potential re-use
b) on multi-developer teams it increases the chances of merge conflicts

Not to mention that any Python module over a couple of hundred lines long sets off my nervous tic ;^)