Benefits of Dynamic Typing
This is not a post about static typing vs. dynamic typing. Over the past several years I have worked with both, and I see advantages and disadvantages in each. That comparison might be an interesting topic on its own, but it is better left for a future post.
Whatever you prefer, some languages are built from the core to be either statically or dynamically typed. Perl, like Python, Ruby, JavaScript, Lisp, Lua, …, was built as a dynamically typed language. But instead of embracing dynamic typing and using its advantages, many programmers work with a dynamically typed language as if it were statically typed, adding type checks either through libraries or through obsessive use of classes.
Perl is no exception. In this post I will show you what happens when you use a dynamically typed language as a dynamically typed language, and not as a statically typed language that lacks type checking.
Creating a Point
For this discussion we pick a simple example. We just want to create a Point that contains an X and a Y value, sometimes also called a Vector2.
One way, and probably the most common way in many languages today, is to create a class for it. In Perl this looks like this:
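A minimal sketch of such a class could look like the following. The named-argument constructor is an assumption, and the code uses the package-block syntax available since Perl 5.14. Each accessor is a combined getter/setter.

```perl
use strict;
use warnings;

# Plain Perl class: a blessed hash plus combined getter/setter methods.
package Point {
    # Assumed named-argument constructor: Point->new(x => 1, y => 2)
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }

    # Getter when called without an argument, setter when called with one.
    sub x {
        my $self = shift;
        $self->{x} = shift if @_;
        return $self->{x};
    }

    sub y {
        my $self = shift;
        $self->{y} = shift if @_;
        return $self->{y};
    }
}

my $p = Point->new(x => 1, y => 2);
$p->x(10);
print join(" ", $p->x, $p->y), "\n";    # prints "10 2"
```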
Here we define a class Point with methods x and y that act as getter and setter at the same time. If you have written as many classes like this as I have, you may go crazy over how stupid that code is and how much boilerplate it takes to write those getters/setters every time. One answer to that in Perl was Moose, so we can also write the class in Moose like this:
Here is how both classes are used.
So what is wrong with these two examples, you may ask? Well, a lot. Let's begin with what a truly dynamically typed version looks like. Here is the code you would usually write when you embrace dynamic typing:
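As a sketch of that style, a point is simply an anonymous hash with x and y keys, and every read or write is plain hash access:

```perl
use strict;
use warnings;

# No class at all: a point is just a hash reference with x and y keys.
my $point = { x => 1, y => 2 };

# Reading and writing fields is ordinary hash access.
$point->{x} = 10;
print "$point->{x} $point->{y}\n";    # prints "10 2"
```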
Do you see the problem? All that class machinery usually just creates a lot of boilerplate that isn't really needed. Consider the following.
All those classes do is wrap a Hash. You write those getters/setters, and what do they do? They just set a field in a Hash or get a field from a Hash. So what is wrong with just using a Hash instead?
Let's look at how to get a field. Here is the class version compared to the direct version using a Hash:
and here is setting a value.
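Side by side, getting and setting a field in both styles looks roughly like this. The tiny class here exists only to make the snippet runnable and is an assumed stand-in, not the full Point class.

```perl
use strict;
use warnings;

# Minimal stand-in class with one combined getter/setter.
package Point {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }
    sub x { my $self = shift; $self->{x} = shift if @_; return $self->{x} }
}

my $obj  = Point->new(x => 1, y => 2);
my $hash = { x => 1, y => 2 };

# Getting a field
my $x1 = $obj->x();     # class version
my $x2 = $hash->{x};    # hash version

# Setting a field
$obj->x(5);             # class version
$hash->{x} = 5;         # hash version
```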
So all you get from a 10-20 line class implementation is that you replace the curly braces {} with parentheses ()! And when setting you save typing one character: =. Whoosh! Awesome!
Now for the next problem. Have you ever benchmarked how much performance you lose by using a class instead of using a Hash directly? Here is a benchmark for initialization:
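A benchmark of this kind can be sketched with `cmpthese` from Perl's core Benchmark module. This sketch compares only the plain Perl class (with an assumed named-argument constructor) against a literal hash, so it runs without Moose installed; your absolute rates will differ.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed minimal plain Perl class.
package Point {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    perl => sub { my $p = Point->new(x => 1, y => 2) },
    hash => sub { my $p = { x => 1, y => 2 } },
});
```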
Here are the results.
So the pure Perl class is roughly 5 times faster than Moose, and using a hash directly gives about 2.5 times the performance of the Perl class. How about getting and setting a field?
Here is the Benchmark code.
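As a sketch, assuming the same kind of combined accessor as before, a get/set comparison between the plain Perl class and a hash could look like this (Moose is again left out so it runs with core modules only):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed plain Perl class with a combined getter/setter for x.
package Point {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }
    sub x { my $self = shift; $self->{x} = shift if @_; return $self->{x} }
}

my $obj  = Point->new(x => 1, y => 2);
my $hash = { x => 1, y => 2 };

print "get\n";
cmpthese(-1, {
    perl => sub { my $v = $obj->x },
    hash => sub { my $v = $hash->{x} },
});

print "set\n";
cmpthese(-1, {
    perl => sub { $obj->x(3) },
    hash => sub { $hash->{x} = 3 },
});
```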
and here are the results.
get
Rate perl moose hash
perl 6981/s -- -19% -75%
moose 8650/s 24% -- -69%
hash 28183/s 304% 226% --
set
Rate moose perl hash
moose 2739/s -- -61% -92%
perl 7045/s 157% -- -79%
hash 34133/s 1146% 385% --
Setting in Moose is much worse than in the pure Perl version because Moose additionally checks that the value is a number. And of course that check happens at runtime; that is what dynamic typing actually means. But compared to getting/setting a value in a hash directly, both versions are just extraordinarily bad.
So at this point I would say these solutions already have two problems:
- Too much boilerplate code.
- Too much of a performance hit.
Can we improve?
There are reasons why people start using classes, and I will go into them further below. But first let's see how creating a point is typically solved in Scheme, for example.
If you are unfamiliar with Scheme, here is how that code translates to Perl:
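A sketch of that translation, assuming an array reference as the representation: make_point, point_x and point_y are the names used in the text, while the separate setters set_point_x and set_point_y are hypothetical names.

```perl
use strict;
use warnings;

# A point is a plain array reference: [x, y]. Nothing tags it as a "point".
sub make_point { my ($x, $y) = @_; return [ $x, $y ] }

# Getters: they assume an array ref and return the first or second element.
sub point_x { return $_[0][0] }
sub point_y { return $_[0][1] }

# Hypothetical separate setters, so no branching getter/setter logic.
sub set_point_x { $_[0][0] = $_[1]; return }
sub set_point_y { $_[0][1] = $_[1]; return }

my $p = make_point(1, 2);
set_point_x($p, 10);
print join(" ", point_x($p), point_y($p)), "\n";    # prints "10 2"
```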
In some sense this is very similar to the class-based, object-oriented versions, with the difference that it is far less code. Instead of so-called methods you just create namespaced functions. What is interesting is the following.
A function like make_point() creates a Point, but the structure it creates isn't tagged or flagged as a point. Its structure is just an array/list with two fields, and that's it. You as the programmer have to remember whether that is a point or not, and whether the value is valid. The functions point_x and point_y are also special in some sense: they don't check the structure they receive or whether it is a valid point. They just expect an array and return either the first or the second value. And that is what dynamic typing is actually about.
Lastly, let's check how fast this version is.
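A benchmark along these lines can be sketched with the core Benchmark module; set_point_x is a hypothetical setter name, and the rates on another machine will not match the table exactly.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Array-based point functions (set_point_x is a hypothetical name).
sub make_point  { return [ $_[0], $_[1] ] }
sub point_x     { return $_[0][0] }
sub set_point_x { $_[0][0] = $_[1]; return }

my $p = make_point(1, 2);

cmpthese(-1, {
    init => sub { my $q = make_point(1, 2) },
    get  => sub { my $v = point_x($p) },
    set  => sub { set_point_x($p, 3) },
});
```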
Rate init set get
init 8296/s -- -20% -43%
set 10383/s 25% -- -28%
get 14491/s 75% 40% --
Initialization is a little faster than initializing a pure Perl object or a Moose object, mainly because it uses an Array, and arrays can be created faster. But at 8296 calls per second it is still not as fast as initializing a hash directly, which is around 34000 calls per second. Here you basically see the overhead of a function call versus initializing something directly. Function calls aren't cheap!
I then changed the code to use a Hash instead of an Array, so it is similar to all the objects we had so far. I omit the code here, as the change to a hash should be obvious. The results are now:
Rate init set get
init 5493/s -- -45% -59%
set 9998/s 82% -- -25%
get 13398/s 144% 34% --
With ~5500 calls per second (cps) we are slower than initializing a hash directly (~7800 cps), but faster than using a Perl class (~3100 cps).
Apart from that, getting or setting a field through a function is around twice as fast as in the object-oriented versions. It is faster because calling a function is faster than calling a method on an object, and because splitting the combined getter/setter into two functions removes the branching logic that decides what to do (or the need to always do both).
So here is the question: why not stick to such a function-based solution? It is far less code, and it is also faster.
But let's go a little deeper to see some more differences.
Adding behaviour
Let's assume we want to add a function add that takes two points and adds them together. I like immutable programming, so even though the data structure (Array, Hash) is mutable, instead of mutating one of the inputs we return the result as something new.
Now it starts to become interesting, because there are two ways to implement it. Here is the first solution for the Scheme-like version:
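A sketch of this first variant, assuming the hash-based representation: point_add reads every field through the getter functions and builds a new point from the result.

```perl
use strict;
use warnings;

# Hash-based point functions.
sub make_point { my ($x, $y) = @_; return { x => $x, y => $y } }
sub point_x    { return $_[0]{x} }
sub point_y    { return $_[0]{y} }

# add, first variant: read every field through the getter functions
# and return a new point instead of mutating either input.
sub point_add {
    my ($p1, $p2) = @_;
    return make_point(point_x($p1) + point_x($p2),
                      point_y($p1) + point_y($p2));
}

my $sum = point_add(make_point(1, 2), make_point(3, 4));
print "$sum->{x} $sum->{y}\n";    # prints "4 6"
```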
This solution is similar to the following solution in the object-oriented code:
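The object-oriented counterpart can be sketched as an add method on an assumed plain Perl Point class, reading all four fields through accessor methods:

```perl
use strict;
use warnings;

package Point {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }
    sub x { my $self = shift; $self->{x} = shift if @_; return $self->{x} }
    sub y { my $self = shift; $self->{y} = shift if @_; return $self->{y} }

    # add, first variant: every field is read through an accessor method,
    # and a new Point is returned instead of mutating either input.
    sub add {
        my ($self, $other) = @_;
        return Point->new(
            x => $self->x + $other->x,
            y => $self->y + $other->y,
        );
    }
}

my $sum = Point->new(x => 1, y => 2)->add(Point->new(x => 3, y => 4));
```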
So in both cases we do not access the hash directly to extract x and y; instead we use functions or methods to access those fields, and those aren't that fast to begin with. So we now call four getters just to read four fields.
Here is another way to implement add:
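A sketch of the second variant, for both the function-based and the class-based style: the fields are read straight out of the hash instead of going through getters.

```perl
use strict;
use warnings;

# add, second variant as a plain function: read the hash fields directly.
sub point_add {
    my ($p1, $p2) = @_;
    return { x => $p1->{x} + $p2->{x}, y => $p1->{y} + $p2->{y} };
}

# The same idea as a method on a blessed-hash class.
package Point {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }
    sub add {
        my ($self, $other) = @_;
        return Point->new(x => $self->{x} + $other->{x},
                          y => $self->{y} + $other->{y});
    }
}

my $h = point_add({ x => 1, y => 2 }, { x => 3, y => 4 });
my $o = Point->new(x => 1, y => 2)->add(Point->new(x => 3, y => 4));
```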
So instead of using functions/getters we use the hash directly. Let's compare how fast these different implementations are:
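One way to set up such a four-way comparison is sketched below. The labels match the result table; the method and function names add1, add2, point_add1 and point_add2 are hypothetical, chosen to keep both variants in one file.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

package Point {
    sub new { my ($c, %a) = @_; return bless { x => $a{x}, y => $a{y} }, $c }
    sub x { my $s = shift; $s->{x} = shift if @_; return $s->{x} }
    sub y { my $s = shift; $s->{y} = shift if @_; return $s->{y} }
    # class_1: fields through accessor methods
    sub add1 { my ($s, $o) = @_; return Point->new(x => $s->x + $o->x, y => $s->y + $o->y) }
    # class_2: fields read directly from the hash
    sub add2 { my ($s, $o) = @_; return Point->new(x => $s->{x} + $o->{x}, y => $s->{y} + $o->{y}) }
}

sub make_point { return { x => $_[0], y => $_[1] } }
sub point_x    { return $_[0]{x} }
sub point_y    { return $_[0]{y} }
# hash_1: fields through getter functions
sub point_add1 { my ($p, $q) = @_; return make_point(point_x($p) + point_x($q), point_y($p) + point_y($q)) }
# hash_2: fields read directly
sub point_add2 { my ($p, $q) = @_; return make_point($p->{x} + $q->{x}, $p->{y} + $q->{y}) }

my ($c1, $c2) = (Point->new(x => 1, y => 2), Point->new(x => 3, y => 4));
my ($h1, $h2) = (make_point(1, 2), make_point(3, 4));

cmpthese(-1, {
    class_1 => sub { my $r = $c1->add1($c2) },
    class_2 => sub { my $r = $c1->add2($c2) },
    hash_1  => sub { my $r = point_add1($h1, $h2) },
    hash_2  => sub { my $r = point_add2($h1, $h2) },
});
```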
Rate class_1 hash_1 class_2 hash_2
class_1 1047/s -- -34% -49% -63%
hash_1 1585/s 51% -- -23% -44%
class_2 2055/s 96% 30% -- -28%
hash_2 2844/s 172% 79% 38% --
As expected, the implementations that use a function/method to access x/y are the slowest. Still, it is amazing that the function-based solution is already 50% faster than the object-oriented one.
Accessing the hash fields directly makes the code around two times faster than its counterpart: class_2 is around two times faster than class_1, and the same goes for hash_1 to hash_2 (about 1.8 times). From the slowest to the fastest version we get a 2.7x performance improvement.
Here, by the way, is the Moose solution:
and the benchmark.
this prints
So with roughly 460 calls per second we get a bit less than half the speed of the slowest version, class_1, while hash_2 is around 6 times faster.
How about a faster Moose version? Well, in Moose it is usually discouraged to access the fields directly. I guess proper object orientation is the reason for that, and of course you want the checks and behaviour that Moose typically adds. So no fast version for you!
Some call it proper object orientation. I just call it bullshit.
Advantages and Disadvantages
It's something I cannot stress often enough: everything has its advantages and disadvantages, and the same goes for whichever solution you pick.
When you use functions/methods to access the x and y fields, there is one advantage: you can change the internal structure of how that object is represented. For example, you could change the hash version to an Array, because Array access is faster. After that change the add1 solutions would continue to work, while the add2 solutions would break.
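A small sketch of that scenario, with hypothetical names point_add1 and point_add2 for the two variants: after switching the representation to an array, the getter-based variant keeps working, while the variant written against the old hash layout dies at runtime.

```perl
use strict;
use warnings;

# Representation switched from a hash to an array: [x, y].
sub make_point { return [ $_[0], $_[1] ] }
sub point_x    { return $_[0][0] }
sub point_y    { return $_[0][1] }

# add1 style: goes through the getters, so it keeps working unchanged.
sub point_add1 {
    my ($p, $q) = @_;
    return make_point(point_x($p) + point_x($q), point_y($p) + point_y($q));
}

# add2 style: still written against the old hash layout.
sub point_add2 {
    my ($p, $q) = @_;
    return make_point($p->{x} + $q->{x}, $p->{y} + $q->{y});
}

my ($p1, $p2) = (make_point(1, 2), make_point(3, 4));
my $ok  = point_add1($p1, $p2);                              # [4, 6]
my $bad = eval { point_add2($p1, $p2); 1 } ? 'worked' : "died: $@";
```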
So while the add1 solutions have an advantage when you change the internal representation, they come with a big performance hit. The add2 solutions are faster, but when you change the internal representation you must also change those functions to reflect the latest changes.
But here comes the big Buuuuut: how often do you fucking change the internal representation? Are you stupid or something?
You usually pick one representation and stick with it. I guess once you have settled on a representation, you will probably stick with it for the rest of your life. So there is no point in optimizing for a case that will usually never happen while it causes big performance hits in your code.
And even if you change the internal representation maybe every 10 years, yes, you must also change add. That's not a big deal.
So while both approaches have advantages and disadvantages, the advantages you get by accessing every field through a function are practically not worth it.
Extending Point
Object orientation is often taught with the idea that you can use crappy inheritance to extend your objects. So let's do that: we create a Point3D that extends PointP.
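A sketch of that inheritance, assuming PointP is a blessed-hash class like the plain Perl Point above; it uses the core `parent` pragma with `-norequire` because both packages live in the same file.

```perl
use strict;
use warnings;

# Assumed 2D base class: a blessed hash with accessors.
package PointP {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }
    sub x { my $s = shift; $s->{x} = shift if @_; return $s->{x} }
    sub y { my $s = shift; $s->{y} = shift if @_; return $s->{y} }
}

# Point3D inherits x and y from PointP and adds z.
package Point3D {
    use parent -norequire, 'PointP';
    sub new {
        my ($class, %args) = @_;
        my $self = $class->SUPER::new(%args);   # blessed into Point3D
        $self->{z} = $args{z};
        return $self;
    }
    sub z { my $s = shift; $s->{z} = shift if @_; return $s->{z} }
    sub add {
        my ($s, $o) = @_;
        return Point3D->new(x => $s->{x} + $o->{x},
                            y => $s->{y} + $o->{y},
                            z => $s->{z} + $o->{z});
    }
}

my $p3 = Point3D->new(x => 1, y => 2, z => 3)->add(Point3D->new(x => 1, y => 1, z => 1));
print join(" ", $p3->x, $p3->y, $p3->z), "\n";    # prints "2 3 4"
```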
and now we can do:
Okay, that works nicely. But what does our Scheme-inspired code look like with the same behaviour?
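A sketch of the Scheme-inspired version: make_point, make_point3, point_x, point_y and point_add are the names used in the text, while point_z is a hypothetical addition. point_add always sums x and y and adds z only when both inputs provide it.

```perl
use strict;
use warnings;

sub make_point  { my ($x, $y) = @_;     return { x => $x, y => $y } }
sub make_point3 { my ($x, $y, $z) = @_; return { x => $x, y => $y, z => $z } }
sub point_x { return $_[0]{x} }
sub point_y { return $_[0]{y} }
sub point_z { return $_[0]{z} }    # hypothetical

# Works on any hashes with x and y; adds z only when both inputs have one.
sub point_add {
    my ($p, $q) = @_;
    my %sum = (x => $p->{x} + $q->{x}, y => $p->{y} + $q->{y});
    $sum{z} = $p->{z} + $q->{z} if exists $p->{z} && exists $q->{z};
    return \%sum;
}

my $sum3 = point_add(make_point3(1, 2, 3), make_point3(1, 1, 1));  # x=2, y=3, z=4
my $sum2 = point_add(make_point(1, 2), make_point(3, 4));          # x=4, y=6, no z
```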
Now let's look at it. Obviously point_x and point_y don't need to change, because all they do is return the x (or y) field from a hash. Those functions work with any hash that has an x field. Very reusable, and obviously not really needed at all: $hash->{x} does the same.
We could create a point3d_add function and differentiate between point_add and point3d_add, but why? Part of the whole purpose of dynamic typing is to check the structure you pass in and decide on different behaviour depending on that structure. Here point_add works with both: it calculates the x/y fields, and when both hashes provide a z field, it also calculates the z field.
You can pass it 2D points and 3D points. You can even mix them, with one being a 2D point and the other a 3D point. More than that, it works with any hash that has those fields.
Here are the ways you can use these functions:
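A usage sketch covering those cases. PointP and Point3D here are minimal assumed stand-ins that just bless the given fields, enough to show that blessed objects pass through point_add like any other hash.

```perl
use strict;
use warnings;

sub make_point  { return { x => $_[0], y => $_[1] } }
sub make_point3 { return { x => $_[0], y => $_[1], z => $_[2] } }

sub point_add {
    my ($p, $q) = @_;
    my %sum = (x => $p->{x} + $q->{x}, y => $p->{y} + $q->{y});
    $sum{z} = $p->{z} + $q->{z} if exists $p->{z} && exists $q->{z};
    return \%sum;
}

# Minimal stand-ins for the PointP / Point3D classes: blessed hashes.
package PointP  { sub new { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class } }
package Point3D { use parent -norequire, 'PointP' }

my $r1 = point_add(make_point(1, 2), make_point(3, 4));          # 2D + 2D
my $r2 = point_add(make_point3(1, 2, 3), make_point3(1, 1, 1));  # 3D + 3D
my $r3 = point_add(make_point(1, 2), make_point3(1, 1, 1));      # mixed: z is skipped
my $r4 = point_add({ x => 1, y => 1 }, { x => 2, y => 2 });      # literal hashes
my $r5 = point_add(PointP->new(x => 1, y => 2),
                   Point3D->new(x => 1, y => 1, z => 5));        # objects work too
```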
Isn't it kind of awesome that we can also pass it PointP and Point3D objects, create the hashes directly instead of calling make_point or make_point3, and mix 2D and 3D points? And on top of all that, we even write less code.
How about performance?
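A comparison like this can be sketched as follows, assuming the inheritance-based Point3D with a direct-hash add method on one side and the Scheme-style point_add on the other; the labels match the result table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

package PointP {
    sub new { my ($c, %a) = @_; return bless { x => $a{x}, y => $a{y} }, $c }
}
package Point3D {
    use parent -norequire, 'PointP';
    sub new { my ($c, %a) = @_; my $s = $c->SUPER::new(%a); $s->{z} = $a{z}; return $s }
    sub add {
        my ($s, $o) = @_;
        return Point3D->new(x => $s->{x} + $o->{x}, y => $s->{y} + $o->{y}, z => $s->{z} + $o->{z});
    }
}

sub make_point3 { return { x => $_[0], y => $_[1], z => $_[2] } }
sub point_add {
    my ($p, $q) = @_;
    my %sum = (x => $p->{x} + $q->{x}, y => $p->{y} + $q->{y});
    $sum{z} = $p->{z} + $q->{z} if exists $p->{z} && exists $q->{z};
    return \%sum;
}

my ($o1, $o2) = (Point3D->new(x => 1, y => 2, z => 3), Point3D->new(x => 1, y => 1, z => 1));
my ($h1, $h2) = (make_point3(1, 2, 3), make_point3(1, 1, 1));

cmpthese(-1, {
    point3d => sub { my $r = $o1->add($o2) },
    scheme  => sub { my $r = point_add($h1, $h2) },
});
```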
Here are the results:
Rate point3d scheme
point3d 1778/s -- -28%
scheme 2461/s 38% --
Roughly 40% is maybe not the greatest jump, but also not the worst. Considering that writing code this way offers you:
- Less code
- More flexibility
- Higher performance
only one question remains open:
Why would you ever want to write object-oriented code with classes and pseudo type checking, when using a dynamically typed language as a dynamically typed language has so many more benefits?