Sunday, March 25, 2007

Compression and Communication in Software Development

The process of refactoring software code is something akin to a compression algorithm. As an example let us take a simple lossless image algorithm that takes runs of the same colour and stores the colour and the number of pixels in the run. We can think of this as finding two functions that are the same, but with different names and creating 1 function and pointing the original two callers to the new function. This process is about abstraction.

Fair enough you might say, but we can take this a bit further. Thinking about compression triggered some thoughts about Information theory I learnt at University. Namely, Shannon's Law- so I looked it up on Wikipedia, and here is what it had to say:-

  • it is a felony offense to discharge firearms randomly into the air, resulting in injury or death.

Err, ok maybe not that Shannon's Law (in Arizona). A little further digging on the right pages reveals Coding theory (no it not about programming!!!!!). Ah, and now I have found the bit of information I am after (and its a 1 - sorry poor joke -it's a Sunday evening!!!)

  1. Entropy of a source is the measure of information. Basically source codes try to reduce the redundancy present in the source, and represent the source with a fewer bits that carry more information.

SOURCE ENCODING

Translated to software development - there is a minimum number of lines of code required for each program, where there is absolutely no redundant code. This can commonly be seen in some coding competitions, where the winner of a challenge solves the problem in the fewest lines of code.

Putting aside readability, quality, comments, maintainability, the meaning of your code or communicating your code with other developers etc, we can surmise the following. When your code is represented in the most concise manner, no two lines or pieces of code should look the same, do the same thing or repeat any other lines of code. This is because any lines of code with the same data are redundant and do not transfer any new information to the compiler.

So here is the problem, in many languages it is IMPOSSIBLE to make code concise and remove redundancy - thereby reducing the entropy of the code. Take for example design patterns, they are seriously REDUNDANT!!!! The mere fact that they exist, they provide the same construct in multiple scenarios really means they should have been written once, not millions of times by millions of developers!!! (Mind you I have now used generics to write a Singleton - ONCE)

C# has just created some "syntactic suger" to reduce the amount of code required to write a property, which meets one of the key tenets of Information Theory. Namely, use short words for the most frequently used words!

So, on the flip side, is:

CHANNEL ENCODING

When communicating information we need to add redundant information to ensure our message is understood - think listening to a fuzzy radio station - sometimes it is possible to understand without hearing every word!

So, in programming terms, we have two key audiences/recipients for our code:-

  1. The compiler
  2. You, the developer, and other developers

We add large amounts of redundancy for comments and in terms of code structure in order to effectively communicate the meaning of our code. At this point, due to tiredness I am starting to forget what I really wanted to say! A quick wrap up and I'll think about it some more!!!

1. We can apply some of the theories of Information theory to Software development

2. We apply Source Encoding as we refactor/abstract our code

3. We apply Channel Encoding to improve the communication of our code

4. Finally, we are slaves to our compilers when it comes to removing redundant lines of code

Friday, March 16, 2007

Currying, Closures ad how to screw your mind in 1 line of code

Ok, so we can pass functions as data using delegates. I recently started using delegates as return values for functions, and this can lead to some odd syntax!

So, the 1 line of code:

int answer = Add(5)(3);

Note the second set of brackets - where the hell did they come from!

So here's the add function

public static AddNumbers Add(int x)
{
return delegate(int y) { return x + y;};
}

where AddNumber is a delegate (function pointer) defined as:

public delegate int AddNumbers(int y);

What the hell does this mean?

When we call Add(5) we are doing two things:

1. Setting the variable x within the anonymous function's return statement
2. returning a function, which takes a parameter y

As the return value of Add(5) is a function we can then call the function Add(5)(3); which will return the value 8!

This leads to two important functional concepts:

1. Currying - a function which takes multiple parameters can now be split into multiple functions each taking one parameter i.e. Add(int x, int y) becomes Add(int x) : return AddNumber(int y)
using the stack to store the x variable

2. Closures - the anonymous method can "see" the variables within the Add(int x) method of which it is a part - just like the contents of an if statement would. The complexity here is when the anonymous function is returned from the Add function

We can use currying to intialize parameters. So for example lets say we wanted to do the following sums:

answer = 3 + 5
answer = 3 + 7
answer = 3 + 9

rather than do:
int answer1 = Add(3, 5);
int answer2 = Add(3, 7);
int answer3 = Add(3, 9);

we can initialise the 3 parameter using the Add function, and rewrite our code as:

AddNumbers adder = Add(3);
int answer1 = adder(5);
int answer2 = adder(7);
int answer3 = adder(9);

This becomes really useful when out initialization occurs in a different part of our code to the actual "running" of the code. It could also allow us to create an "immutable" function, where we initialize all of the functions parameters at one point in our code and return a parameterless function.

Applying Control Theory to Software Development

Many moons ago I learnt control theory - possibly the hardest final year engineering paper at Cambridge. And it had the following to say:

Every system can be treated as a black box with inputs and outputs - ok no revolution here, but I need a starting point!

It also said:

Observability is a measure for how well internal states of a system can be inferred by knowledge of its external outputs - thank you Wikipedia.

and,

Controllability denotes the ability to move a system around in its entire configuration space using only certain admissible manipulations - Wikipedia's entry is not so helpful a translation would read how much of the output can we control using the inputs

So, Observability is the ability to deduce whats going on inside by whats coming out - a bit like the stool analysis and Controllability is the ability to configure the system via inputs - the equivalent of the atkins diet.

Now in software systems, we can apply the same ideas. Namely, when I write a chunk of code it is a black box system - stuff goes in and stuff comes out!!!!!

In fact several programming techniques are based entierly on this principal, namely, Inversion of Control Design pattern, Test Driven Development and large chunks of Functional Programming concepts.

In fact, my normal gripes with any framework (Java or .Net) typically occur when the code in the framework is not Observable or Controllable enough. In fact code can typically be improved by increasing ones ability to configure it and by providing clear outputs for given inputs!

Saturday, March 03, 2007

delegates and interfaces

In trying to explain some functional concepts in C# to an OO programmer, it became apparent that there is a quite clear correspondance between an interface and a delegate (function pointer).

The following is a list of comparisons:
  1. An interface provides a level of indirection/abstraction for a class - a delegate provides a level of indirection/abstraction for a function
  2. An interface provides signatures for a bunch of functions/properties - a delegate provides signatures for a function
  3. both can be passed as variables
  4. An interface will only work with an object, whereas a delegate can work with static and instance methods
  5. Finally, delegates can be "invoked" or called asynchronously
  6. An interface quite clearly relates to a class
  7. An class can only implement an interface once, but can have many functions, which match a delegate

In many cases a delegate can provide simpler implementation than an interface, especially when the function is stateless and can therefore be made static therefore reducing side-effects. Its food for thought anyway!