Cook Computing

 

« June 2002 »

Going Offline

Saturday 29 June

I'm going offline for a while so there may be a delay in fixing any XML-RPC.NET problems (and publishing the first version of the MC++ FAQ).

Posted by at 07:33 AM. Permalink.

MC++ FAQ(s)

Wednesday 26 June

Thanks for the offer, Tomas, but I think I'd prefer to stick with what I'm working on. An important personal reason for working on a FAQ, or entries here for that matter, is to clarify my thinking on topics I encounter on a day-to-day basis (hence the often eclectic collection of trivia found here). As such I'd like to document the issues in MC++ that have interested me or have been significant in my MC++ development work. I'm sure you and Sam, with your greater experience of MC++, can produce a very comprehensive FAQ which would more than complement what I'm doing.

Posted by at 08:29 PM. Permalink.

XML-RPC.NET 0.5.3

Tuesday 25 June

I've just uploaded the latest version of the XML-RPC.NET library. It fixes a deserialization bug which was independently reported by two users within a couple of days of each other last week.

Posted by at 07:42 AM. Permalink.

Weak References

Thursday 20 June

Funny how you miss things sometimes. It was only when reading Richard Grimes' book last night that I came across weak references. His example is:

class C
{
 private WeakReference b = null;

 public C (B bref)
 {
  b = new WeakReference(bref);
 }

 public void UseB()
 {
  if (b.IsAlive)
  {
   B bref = (B)b.Target;
   // use bref
  }
 }
}

Because an instance of C holds only a weak reference to an instance of B in the b member, the garbage collector is free to destroy the B object. UseB checks whether the object still exists by reading the IsAlive property of the WeakReference and then reads the Target property to obtain a normal strong reference to the object. Alternatively it can simply read Target and check whether the result is null before using it, which is the safer pattern because the object could be collected between the IsAlive check and the Target read.

Weak references are used for memory usage optimizations. For example, if a large object may be needed again but it's OK to destroy it when memory runs short and recreate it when required, it can be held via a weak reference. If the GC runs while no strong references to the object exist, the object is destroyed; otherwise it remains available for further use.
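A minimal sketch of that caching pattern follows. The class and member names (ReportCache, GetReport, BuildReport, BuildCount) are hypothetical and the byte array merely stands in for a large object; the point is that Target is read once into a strong reference and rebuilt only when it comes back null.

```csharp
using System;

// Hypothetical cache for an expensive-to-build object (all names are illustrative).
public sealed class ReportCache
{
    private WeakReference cached;   // does not keep the report alive
    public int BuildCount;          // how many times the report has been rebuilt

    public byte[] GetReport()
    {
        // Read Target once into a strong reference. If it is non-null, the
        // object cannot be collected while 'report' is in use.
        byte[] report = cached != null ? (byte[])cached.Target : null;
        if (report == null)
        {
            report = BuildReport();
            cached = new WeakReference(report);
        }
        return report;
    }

    private byte[] BuildReport()
    {
        BuildCount++;
        return new byte[1024 * 1024]; // stand-in for real, costly work
    }
}
```

A caller just calls GetReport() each time it needs the data; whether a rebuild happens depends on whether a collection has occurred in between.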

Richter describes weak references in this article.

Posted by at 05:56 PM. Permalink.

MC++ Optimizations

Thursday 20 June

Tomas commented on my previous entry to the effect that the MC++ compiler does indeed perform extra optimizations but this is one of the reasons why it does not generate verifiable code. My gut instinct is that verifiability is not something to be given up lightly: I'd prefer to leave the bad old days of inadvertent (or even deliberate) memory corruption such as buffer overflows well behind.

Posted by at 05:25 PM. Permalink.

MC++ Performance

Thursday 20 June

In the interview with Herb Sutter at DevX, Sutter claims "In the .NET world, C++ is still the best-performing language for most development work." When I read this I took it as PR, but last night I noticed a similar opinion in Richard Grimes' book Developing Applications with Visual Studio.NET: "...the C++ compiler performs some optimization on the IL it produces, so the code generated from the managed C++ compiler will perform better than code generated from C# or VB.NET."

I've not seen any mention of this elsewhere. Maybe it's just a rumour put about by the VC++ team :-)

[BTW Grimes' book is a good introduction to .NET development and using VS.NET. He covers a lot of material but in a very readable way.]

Posted by at 07:01 AM. Permalink.

XML-RPC.NET Bug

Wednesday 19 June

If you use XML-RPC.NET please note the following bug.

If an array occurs in a position in an XML-RPC request or response where the type of its elements is not known AND one or more of its elements is a string specified without the <string> element, an exception is incorrectly thrown by the deserializer. The response below is an example of where the problem occurs.

The two string instances are in an array member of a struct and no type information is available to the deserializer. (In this case the array should be parsed into an array of type Object[].)

<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
 <params>
  <param>
   <value>
    <struct>
     <member>
      <name>key3</name>
      <value>
       <array>
        <data>
         <value>New Milk</value>
         <value>Old Milk</value>
        </data>
       </array>
      </value>
     </member>
    </struct>
   </value>
  </param>
 </params>
</methodResponse>
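The intended handling can be sketched as below. This is not XML-RPC.NET's actual deserializer: it is a small illustration built on System.Xml.Linq (which postdates this post), the names UntypedValueSketch and ParseValue are made up, and only a few XML-RPC types are covered. It relies on the XML-RPC rule that a value with no type element is a string.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class UntypedValueSketch
{
    // Converts a <value> element to a CLR object for the handful of cases shown.
    public static object ParseValue(XElement value)
    {
        XElement type = value.Elements().FirstOrDefault();
        if (type == null)
            return value.Value;              // no type element: treat the text as a string

        switch (type.Name.LocalName)
        {
            case "string":
                return type.Value;
            case "int":
            case "i4":
                return int.Parse(type.Value);
            case "array":
                // With no element type known, fall back to object[].
                return type.Element("data")
                           .Elements("value")
                           .Select(v => ParseValue(v))
                           .ToArray();
            default:
                throw new NotSupportedException(type.Name.LocalName);
        }
    }
}
```

Applied to the array value in the response above, this would yield an object[] holding the two strings.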

Posted by at 05:58 PM. Permalink.

The .NET Cost : Who Pays?

Tuesday 18 June

Software Development Magazine has the final installment of Bertrand Meyer's series on multi-language development under .NET. This article focuses on CLS compliance.

Meyer discusses some CLS rules which apparently have caused problems when making Eiffel CLS-compliant: method overloading, prohibition of static methods in interfaces, requiring constructors to call base class constructors, and prohibiting a constructor being called more than once (to reinitialize an object). I expect different languages will have problems with a different subset of CLS rules.

I'm still not clear in my mind as to how feasible it is to implement any other language on top of the CLR (ignoring CLS issues). Support for a particular object model seems to be baked into the CLR (for example MI is not supported, covariance is not supported, and overloading is supported by the CLR). Of course you can implement anything you want using IL, but are there costs to this, such as losing verifiability, which I believe to be very important for producing reliable code? Meyer suggests that untyped languages can never be made verifiable. I need to read up some more on these topics (for example, could the CLR have been designed to support a wider range of languages; what does verifiability really mean and is it only important for downloadable code; etc, etc).

Posted by at 07:34 AM. Permalink.

.NET Profiler

Sunday 16 June

I needed to profile some code last night and was surprised to discover that profiling is not supported by VS.NET. A quick search on Google brought up the Compuware profiler. The "Community Edition" is a free download and seems to work well. It integrates with VS.NET and is easy to use (screenshot).

Posted by at 07:58 AM. Permalink.

Adding Ref-Counting to Rotor

Thursday 13 June

After reading Chris Sells' announcement about Adding Ref-Counting to Rotor and the subsequent discussion on the dotnet list, I've been thinking about how it might work. I might be totally wrong because I don't know much about the CLR but the following seems plausible.

Each reference type object will have an associated refcount. This will be stored alongside the object in the garbage collected heap.

No changes are planned to IL. Instead the JITter will be enhanced so that IL instructions which potentially change the effective ref count on an object will generate extra code which modifies the ref count stored alongside the object.

If this code detects that the ref-count has gone to zero, the object's finalizer is called (if it has one).

The following code illustrates what might happen:

class A { }

class _
{
  static A CreateA()
  {
    A m = new A();
    A n = m;
    n = null; 
    return m;
  }
  static void Main(string[] args)
  {
    CreateA();
  }
}

The IL for method CreateA is:

.method private hidebysig static class A 
        CreateA() cil managed
{
 // Code size       16 (0x10)
 .maxstack  1
 .locals init ([0] class A m,
          [1] class A n,
          [2] class A CS$00000003$00000000)
 IL_0000: newobj     instance void A::.ctor()   // green
 IL_0005: stloc.0
 IL_0006: ldloc.0  // green
 IL_0007: stloc.1
 IL_0008: ldnull
 IL_0009: stloc.1  // red
 IL_000a: ldloc.0  // green
 IL_000b: stloc.2
 IL_000c: br.s       IL_000e
 IL_000e: ldloc.2  // green
 IL_000f: ret  // red
} // end of method _::CreateA

The instructions marked in green are where the ref count of the instance of class A is incremented and red where it is decremented. So in this example there is a net increase of 3 refs before the end of the method. Two of these are held in the locals so the ret instruction must trigger two decrements as the locals go out of scope. The remaining reference is OK because it is the return value and is passed back on the stack.

The relevant IL in Main is:

 IL_0000:  call      class A _::CreateA()
 IL_0005:  pop  // red

The return value is not used and so pop is used to remove the object from the stack. The object reference is not stored anywhere so this results in the object's ref-count going to zero and the JITted code calls the finalizer of the object (if it has one).
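The bookkeeping described above can be tallied with a small sketch. It only replays the increments and decrements marked in the IL listings, in execution order; it is not how Rotor or any JIT compiler is implemented, and the name RefCountTally is made up.

```csharp
using System;

public static class RefCountTally
{
    // +1 for each "green" instruction, -1 for each "red" one, in execution order.
    public static int[] RunningTotals()
    {
        int[] deltas =
        {
            +1, // IL_0000 newobj
            +1, // IL_0006 ldloc.0
            -1, // IL_0009 stloc.1 (storing null over n drops a reference)
            +1, // IL_000a ldloc.0
            +1, // IL_000e ldloc.2
            -2, // IL_000f ret (local m and the compiler temporary go out of scope)
            -1, // Main IL_0005 pop (the unused return value is discarded)
        };

        int[] totals = new int[deltas.Length];
        int count = 0;
        for (int i = 0; i < deltas.Length; i++)
        {
            count += deltas[i];
            totals[i] = count;
        }
        return totals;
    }
}
```

The running totals are 1, 2, 1, 2, 3, 1, 0: a count of 3 just before ret, 1 for the returned reference, and 0 after the pop in Main, which is the point at which the finalizer would be called.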

Of course there is a lot more to it than this sample, for example handling member object references, exceptions, etc; and there are many more IL instructions which affect ref-counts than the handful mentioned above. But this could be how it is intended to work in principle. It remains to be seen what the effect on performance will be but this can be minimized by optimizing out the ref-counting where the JITter can determine it is not required.

Lack of deterministic finalization is the most common complaint I hear against .NET. It would be great to see this fixed in a future version of .NET.

Posted by at 08:12 PM. Permalink.

XML-RPC.NET, VB, and Arrays

Wednesday 12 June

Quite coincidentally a problem with VB and arrays was discussed on the XMLRPCNET Yahoo group today. A VB implementation of a proxy method included the following line:

Dim param(1) As Object

This unexpectedly (for a C# coder at least) defines an array with two elements so when you write:

param(0) = New String(name)

you end up with null in the second element. This was causing the Invoke method of XmlRpcClientProtocol to throw an exception. The array in this case should be defined as:

Dim param(0) As Object

Dr GUI.NET's article has a section on this difference between VB.NET and C# which warns:

This difference causes problems: it's especially nasty when the dimension is a variable, particularly if that variable is passed between routines written in different languages. The result is that your arrays can be sized differently than you expect, leading to nasty off-by-one errors and null-pointer reference exceptions.
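Seen from the C# side, the difference amounts to the sketch below. The helper name ArrayBounds.FromVbUpperBound is hypothetical; it simply allocates what VB's Dim param(upperBound) As Object would allocate, whereas the number in a C# array creation expression is the element count.

```csharp
using System;

public static class ArrayBounds
{
    // VB's "Dim param(upperBound) As Object" declares indexes 0..upperBound,
    // so the equivalent C# allocation needs one more element than the number written.
    public static object[] FromVbUpperBound(int upperBound)
    {
        return new object[upperBound + 1];
    }
}
```

So ArrayBounds.FromVbUpperBound(1) has a Length of 2, while new object[1] in C# has a Length of 1.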

Posted by at 08:29 PM. Permalink.

.NET Arrays

Wednesday 12 June

An excellent article by Dr GUI.NET on Arrays in the .NET Framework (via Sam Gentile).

One thing not mentioned is the fact that arrays of arrays ("jagged arrays" or "ragged-row arrays") are not CLS compliant. They were in the .NET betas (at least the compiler passed them as CLS) and were ideal as the basis for the .NET return type for the XML-RPC Introspection API method system.methodSignature. This returns an array of method signatures. Each signature is an array of strings, the first string being the return type, the remaining string(s) being the parameter types.

As a result the signature of the corresponding built-in proxy method in class XmlRpcClientProtocol had to be changed from:

public string[][] SystemMethodSignature(string MethodName)

to:

public object[] SystemMethodSignature(string MethodName)

where each object in the returned array is an array of strings, which is not so convenient to use. I've no idea why jagged-arrays were not included in the CLS when they are supported by the CLR.
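A caller who wants the more convenient shape back could convert the result with something like the following sketch. SignatureHelper and ToJagged are hypothetical names, not part of XML-RPC.NET, and the code assumes each element is an array whose items are strings.

```csharp
using System;

public static class SignatureHelper
{
    // Copies the loosely typed result into a string[][].
    public static string[][] ToJagged(object[] signatures)
    {
        string[][] result = new string[signatures.Length][];
        for (int i = 0; i < signatures.Length; i++)
        {
            // Array covariance lets this cast succeed for string[] as well as object[].
            object[] parts = (object[])signatures[i];
            result[i] = Array.ConvertAll(parts, p => (string)p);
        }
        return result;
    }
}
```

After conversion, result[n][0] is the return type of the nth signature and the remaining entries are its parameter types.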

Posted by at 07:07 PM. Permalink.

More Collection Trivia

Tuesday 11 June

While glancing through the decompiled SortedList code I noticed there is a serious bug waiting to happen. When a key/value pair is added, a reference to the key object, not a copy or clone of it, is stored in the key array. This means that if a reference to the key object is also held outside the SortedList (or obtained via the GetKey method), the object can be modified, thereby corrupting the index. This code illustrates the problem:

using System;
using System.Collections;

class Str : IComparable
{
  public Str(string s)
  {
    str = s;
  }
  // Orders Str instances by their current string value.
  public int CompareTo(object x)
  {
    Str sx = (Str)x;
    int ret = String.Compare(str, sx.str);
    return ret;
  }
  public string str; // public and mutable: this is what makes the key unsafe
}

class _
{
  static void Main(string[] args)
  {
    Str s1 = new Str("aaa");
    Str s2 = new Str("bbb");
    SortedList srtdlst = new SortedList();
    srtdlst.Add(s1, "111");
    srtdlst.Add(s2, "222");
    s1.str = "ccc"; // mutate the stored key in place: the key array is no longer sorted
    string ss = (string)srtdlst[s2]; // Item lookup for s2 now fails (returns null)
  }
}

Hashtable suffers from the same problem. In this case the documentation states: "Key objects must be immutable as long as they are used as keys in the Hashtable."

An example of an immutable object would be an instance of String. Alternatively if a value type is passed in as the first parameter of Add, an implicitly boxed value is created which cannot be modified outside the SortedList instance.
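One way to avoid the problem with a custom key type is to make it immutable. The sketch below is an assumption about how that might look (ImmutableKey is a made-up name): the value is fixed at construction and exposed read-only, so holding a reference to the key outside the collection is harmless.

```csharp
using System;
using System.Collections;

public sealed class ImmutableKey : IComparable
{
    private readonly string text;   // assigned once, in the constructor

    public ImmutableKey(string text)
    {
        this.text = text;
    }

    public string Text
    {
        get { return text; }        // no setter, so the sort order cannot change
    }

    public int CompareTo(object other)
    {
        return String.CompareOrdinal(text, ((ImmutableKey)other).text);
    }
}
```

Because comparison depends only on the stored text, a lookup with a fresh new ImmutableKey("bbb") finds an entry added under an equal key.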

Posted by at 08:18 PM. Permalink.

SortedList vs Hashtable: Insertion

Monday 10 June

In a completely unscientific test, when adding the same collection of randomized key/value pairs to instances of Hashtable and SortedList (using the default constructors for both classes), the former wins for large collections and the latter for small collections. The break-even point in my test program is of the order of 3000 pairs. However the performance of SortedList degrades in a non-linear fashion going above this number. Ensuring the initial capacity of the SortedList is sufficient for the whole collection does not make a difference to this.
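A test of this kind might look like the sketch below. It is not the original test program: the names InsertionTiming, TimeInserts and ShuffledKeys are made up, integer keys are an assumption, and Stopwatch comes from a later version of the framework than the one used here.

```csharp
using System;
using System.Collections;
using System.Diagnostics;

public static class InsertionTiming
{
    // Adds every key to the dictionary and returns the elapsed time.
    public static TimeSpan TimeInserts(IDictionary target, int[] keys)
    {
        Stopwatch watch = Stopwatch.StartNew();
        foreach (int key in keys)
            target.Add(key, key);
        watch.Stop();
        return watch.Elapsed;
    }

    // Returns the keys 0..count-1 in a repeatable random order.
    public static int[] ShuffledKeys(int count, int seed)
    {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++)
            keys[i] = i;

        Random random = new Random(seed);
        for (int i = count - 1; i > 0; i--)   // Fisher-Yates shuffle
        {
            int j = random.Next(i + 1);
            int tmp = keys[i];
            keys[i] = keys[j];
            keys[j] = tmp;
        }
        return keys;
    }
}
```

Calling TimeInserts once with a new Hashtable() and once with a new SortedList(), using the same ShuffledKeys array at several sizes, gives a rough comparison; a single run like this is still subject to JIT warm-up and GC noise.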

Posted by at 07:21 PM. Permalink.

Class SortedList

Monday 10 June

Following on from the previous entry, the System.Collections.SortedList class is available if you want a collection you can access by key but which is also ordered when enumerated. However I examined this class in mscorlib.dll using Anakrino and it is implemented using two arrays, one to hold the keys, the other to hold the values. Insertion and deletion are achieved by brute force copying of the array elements, so this is not going to perform very well for large collections (well, that's what I'd guess but I really should verify this with some sample code).

There is a useful brief overview of .NET collections in this article by Jeffrey Richter.

[I wonder how many developers recognized the origin of the name anakrino?]

Posted by at 06:03 PM. Permalink.

Btrees and Algorithms

Saturday 8 June

Brad Wilson describes a fundamental difference between hash tables and C++ maps: the latter are implemented as a balanced tree and so maintain ordering of the items by key. This caught my attention because I've recently been reading David McCusker's notes on a design for btree-based blobs (the challenge of understanding these notes makes for pleasant occasional reading). I have to admit that before this I didn't know anything about btrees. Not having studied Computer Science I've not had much exposure to algorithms; I don't even own a copy of Knuth's seminal Art of Computer Programming. Similarly, as I mentioned in an earlier post, I've not learnt how to program in Lisp until now. However, I don't think this matters all that much when it comes to producing working software: after working with a large number of developers I've not noticed any correlation between ownership of a Computer Science degree and competence.

Posted by at 12:30 PM. Permalink.

Learning Lisp

Thursday 6 June

After reading some articles by Paul Graham I decided to do something about acquiring some proficiency in Lisp. I'll probably never use it at work but Lisp does seem such a fundamental part of the history of computing that I feel I'm lacking as a software engineer without it.

Searching via Google I found some recommendations common to several sources. Successful Lisp by David Lamkins is an online book which I started reading last night. So far I've found it an excellent introductory text. I've also ordered the following books from Amazon: ANSI Common LISP by Paul Graham, and Common Lispcraft by Robert Wilensky.

[Successful Lisp is only available online but I used the excellent wget to mirror it onto my laptop. The Windows version worked fine once I had remembered to configure the proxy server setting.]

Posted by at 07:30 AM. Permalink.

XML-RPC Documentation

Tuesday 4 June

Taking advantage of the extra holiday because of the Queen's Jubilee, I'm spending some time working on documentation for XML-RPC.NET.

I'm writing it as a single document to avoid the extra work of maintaining individual pages. It would be better to use something like DocBook but I'm not sure the benefits outweigh the cost of learning how to mark up a document and install/configure the necessary software.

The documentation will define and describe the final feature set of XML-RPC.NET, indicating where features are not implemented yet. In fact there is not much more functionality to add, mainly work to complete existing features. Then it will be time to move onto something else.

Posted by at 02:44 PM. Permalink.

Premature Optimization

Tuesday 4 June

Premature Optimization by Justin Rodd (via Almost Perfect) brings to mind the famous quote originating from Tony Hoare and restated by Donald Knuth: "Premature optimization is the root of all evil". I've always thought this quote has all too often led software designers into serious mistakes because it has been applied to a different problem domain from the one intended.

The full version of the quote is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." and I agree with this. It's usually not worth spending a lot of time micro-optimizing code before it's obvious where the performance bottlenecks are. But, conversely, when designing software at a system level, performance issues should always be considered from the beginning. A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems.

I've worked on systems where the architects adhered to "Hoare's Dictum". All goes well until realistic large-scale testing is performed and it becomes painfully obvious that the system is never going to scale upwards. Unfortunately by then it can be very difficult to fix the problem without a large amount of re-design and re-coding.

On a smaller scale I'm reminded of a configuration client I use regularly. It communicates with the server via DCOM and makes so many DCOM calls that it is very irritating to use over a slow network connection. I've looked at the code and I suspect that simplicity of design was thought to be the better approach instead of a more complicated design which would have resulted in far fewer DCOM calls. Again, optimizing this code would require a lot of re-working: optimization after a design has been implemented nearly always involves much more work than incorporating it into the original design.

Posted by at 02:05 PM. Permalink.

RSS and Bandwidth

Monday 3 June

Jason Kottke makes a couple of points about the new usage of the <link> element for automatic discovery of RSS feeds.

(1) I just tested with IE6 and Mozilla and it appears that use of the <link> element does not result in the linked document being downloaded when the main document is displayed in the browser. However this may be a quirk of the browser configurations on my machine so does not prove anything definitively.

(2) Once the aggregator has extracted the RSS link from the web page I would imagine it would only access the RSS file from then on. Displaying the web page would only happen if the user wanted to look at this after reading the text contained in the RSS file. But in a wider sense the point about minimising bandwidth usage is a valid one. Many people host their web sites on a provider's server and they have to pay for this. In most cases there is a limit on monthly bandwidth usage and if you have a popular blog this could soon be reached, which results in extra cost.

RSS at first sight seems to offer a way round this. The RSS file contains a summary of each item in the <description> element and an associated URL for the content of the item in the <link> element. The content is only downloaded when the user is interested by the description. So you have small RSS files pointing to possibly large files containing content, i.e. low bandwidth usage for accessing RSS files.

But in reality the <description> element is increasingly being used to contain the complete content of each item, so that people can read a blog in their aggregator in preference to reading the content in the blog's web pages. Therefore the bandwidth usage is pretty much the same as if the original web pages were viewed, i.e. high bandwidth usage.

It would be preferable if the RSS did not contain the content but people seem to like reading blogs via an aggregator. Therefore if each RSS item could be identified uniquely and contained a URL to the content in a format suitable for viewing in the aggregator (not the same as what the <link> item is currently used for), the aggregator could cache the content for each item and we could return to small RSS files and lower bandwidth usage: the content would still be downloaded by the aggregator but only once. The <description> element could also revert to its proper usage.

(And, regardless of the above, all aggregators should be using ETags and the If-Modified-Since HTTP header to reduce the number of times an RSS file is downloaded.)
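As a rough sketch of what such a conditional request involves: the aggregator remembers the ETag and Last-Modified values from its previous download and sends them back in the If-None-Match and If-Modified-Since headers, and a server that supports this replies 304 Not Modified with no body when the feed is unchanged. The code below uses System.Net.Http, which is much newer than this post, and FeedRequests.Conditional is a hypothetical helper name.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class FeedRequests
{
    // Builds a conditional GET for a feed. The etag argument must be the value
    // exactly as the server sent it, including its surrounding double quotes.
    public static HttpRequestMessage Conditional(
        string feedUrl, string etag, DateTimeOffset? lastModified)
    {
        HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, feedUrl);
        if (etag != null)
            request.Headers.IfNoneMatch.Add(EntityTagHeaderValue.Parse(etag));
        if (lastModified.HasValue)
            request.Headers.IfModifiedSince = lastModified;
        return request;
    }
}
```

The aggregator would send this request, reuse its cached copy on a 304 response, and otherwise store the new body along with the new ETag and Last-Modified values.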

Posted by at 03:38 PM. Permalink.

Contravariance

Sunday 2 June

At last, the final installment of my investigation into covariance. The converse of covariant method arguments is contravariant or conformant arguments. In this case the argument of the overridden method in the derived class is less derived than the argument in the parent class, i.e. its derivation goes in a "different" way and hence "contra". Modifying the previous example:

class Child : Parent
{
  // following redefinition of argument is not valid C#
  public override void DoSomething(ArgGrandParent x) { x.A(); }
}

Obviously this code works:

Child child = new Child();
child.DoSomething(new ArgGrandParent());

and calling DoSomething of class Child virtually via a Parent reference is now also statically correct:

Parent parent = new Child();
parent.DoSomething(new ArgParent());

The virtual call to DoSomething in class Child is passed an instance of ArgParent which is fine because ArgParent supports method A which this version of DoSomething calls.

The slightly confusing aspect of contravariance is that at first glance it might seem that the Parent class could get called with an instance of ArgGrandParent, but some code shows that this is statically incorrect and will not compile:

Parent parent1 = new Child();
parent1.DoSomething(new ArgGrandParent()); // won't compile
Parent parent2 = new Parent();
parent2.DoSomething(new ArgGrandParent()); // won't compile

DoSomething in class Child can only take an argument of type ArgGrandParent when a reference to an instance of Child is being used. Whenever a reference to an instance of Parent is used, the argument must be of type ArgParent or a class derived from ArgParent.
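C# cannot express a contravariant override, but later versions of the language (C# 4 onwards) allow the same idea for delegate parameters, which gives a runnable illustration of why it is statically safe. The sketch below is an analogy under that assumption, not the pseudo-C# override above; the methods return strings only so the result can be observed.

```csharp
using System;

public class ArgGrandParent
{
    public virtual string A() { return "A"; }
}

public class ArgParent : ArgGrandParent
{
    public virtual string B() { return "B"; }
}

public static class ContravarianceSketch
{
    public static string Run()
    {
        // A handler that only needs the members of ArgGrandParent...
        Func<ArgGrandParent, string> general = x => x.A();

        // ...can stand in wherever a handler for ArgParent is expected,
        // because every ArgParent is also an ArgGrandParent.
        Func<ArgParent, string> specific = general;

        return specific(new ArgParent());
    }
}
```

The assignment compiles without a cast and no runtime check is needed; the opposite assignment, from a Func taking ArgParent to one taking ArgGrandParent, is rejected by the compiler.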

Unlike covariant arguments, contravariant arguments are statically type-safe, and for this reason the contravariant approach is used in some languages, for example Sather.

Posted by at 09:08 AM. Permalink.

Covariant Method Arguments

Saturday 1 June

As I mentioned in a previous post, covariant return types are uncontentious: their usage can be statically verified by the compiler and they incur no runtime cost. Covariant method arguments are a different matter: in certain situations their usage cannot be statically verified and the compiler must insert runtime checks. A covariant method argument type is one which is more specialized than the corresponding argument in the base class version of the method. Again two things are varying in the same direction: as the class becomes more derived, an argument in an overridden method becomes more derived.

Several example classes are needed to illustrate covariant arguments. NB this code is pseudo-C# because neither the CLR nor C# supports covariant arguments.

class ArgGrandParent 
{
  public virtual void A() {}
}

class ArgParent : ArgGrandParent
{
  public override void A() {}
  public virtual void B() {}
}

class ArgChild : ArgParent
{
  public override void A() {}
  public override void B() {}
  public virtual void C() {}
}

class Parent 
{
  public virtual void DoSomething(ArgParent x) { x.B(); }
}

class Child : Parent
{
  // following redefinition of argument is not valid C#
  public override void DoSomething(ArgChild x) { x.C(); }
}

Straightforward usage of covariant arguments is where the derived class is called with derived arguments:

Child child = new Child();
child.DoSomething(new ArgChild());

The call to DoSomething is statically typesafe. The DoSomething in class Child calls method C of the ArgChild instance. However it is very easy to make a call which results in an exception being thrown at runtime:

Parent parent = new Child();
parent.DoSomething(new ArgParent());

DoSomething is virtual so DoSomething in class Child will be called. But this version of DoSomething calls method C of its argument and an exception is thrown: the argument is of type ArgParent which doesn't support method C. This illustrates why a runtime check is necessary in some cases and why covariant argument types are not necessarily considered a good thing.
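The runtime check can be simulated in valid C# by keeping the base signature and narrowing the argument with a cast, which is roughly what a compiler supporting covariant arguments would have to generate. This is a sketch under that assumption; the methods return strings here (unlike the pseudo-C# above) only so the behaviour can be observed.

```csharp
using System;

public class ArgParent
{
    public virtual string B() { return "B"; }
}

public class ArgChild : ArgParent
{
    public virtual string C() { return "C"; }
}

public class Parent
{
    public virtual string DoSomething(ArgParent x) { return x.B(); }
}

public class Child : Parent
{
    // The signature must match the base method. The cast plays the role of
    // the runtime check needed for a covariant argument.
    public override string DoSomething(ArgParent x)
    {
        ArgChild narrowed = (ArgChild)x;   // throws InvalidCastException if x is not an ArgChild
        return narrowed.C();
    }
}
```

Calling new Child().DoSomething(new ArgChild()) succeeds, while calling DoSomething(new ArgParent()) through a Parent reference to a Child compiles but throws InvalidCastException at run time.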

Posted by at 11:13 AM. Permalink.