Cook Computing

 

November 2002

Archiving Weblog Posts

Tuesday 26 November

John Robb points to some work in progress at UserLand on archiving weblog posts in RSS files. The original requirement seems to be backing up weblog posts to a server. However, the solution seems to confuse the description of a post with its content. A lot of people put the whole content of a post into the RSS description element, but it's not necessary or always desirable. For example, the weblog you're reading puts only a brief introductory extract in the description element, and if readers are interested they can click through in their browser to the full post. One advantage of this is that every time I update the RSS file and aggregators detect the change, they download only a relatively small file rather than one containing the full content of recent entries, so the full content of multiple posts is not downloaded again on every update.
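
A minimal sketch of that approach in Python: the description element carries only an introductory extract and the link points at the full post. The function names, the 200-character limit and the "first paragraph, trimmed at a word boundary" rule are assumptions for illustration, not how this weblog's RSS file is actually generated.

```python
import xml.etree.ElementTree as ET


def make_extract(body: str, max_chars: int = 200) -> str:
    """Return the first paragraph of a post, trimmed at a word boundary."""
    first_para = body.strip().split("\n\n")[0]
    if len(first_para) <= max_chars:
        return first_para
    # Cut at the last space before the limit so a word isn't split.
    trimmed = first_para[:max_chars].rsplit(" ", 1)[0]
    return trimmed + "..."


def make_item(title: str, link: str, body: str, max_chars: int = 200) -> ET.Element:
    """Build an RSS <item> whose description is an extract, not the full post."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link  # readers click through for the full post
    ET.SubElement(item, "description").text = make_extract(body, max_chars)
    return item
```

`ET.tostring(make_item(...), encoding="unicode")` gives the serialised item; however long the post, its description stays within roughly `max_chars` characters.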

Archiving weblog posts is important. I often read a post which I would like to archive locally and retrieve later, either by looking at the archived posts of a particular weblog or by searching the whole archive. Conversely, when working on something I often think of a post that I'd like to refer to but can't remember where I read it. If I had access to the content of posts I could write an application to do this, but at the moment that isn't possible. Unless the RSS description element contains the full content of a post, and we can't rely on that, the only access to the content is via the web page the RSS item points to. That puts you in a screen-scraping scenario, which I don't want to bother with.
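
A rough sketch of the kind of local archive I have in mind, in Python using SQLite from the standard library. It stores whatever each RSS item offers and supports the two kinds of retrieval mentioned above: all archived posts from one weblog, and a search across the whole archive. The table layout and function names are hypothetical. Note that the body column holds only the contents of the description element, which is exactly the limitation: if a feed carries only an extract, the extract is all the archive gets.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    link  TEXT PRIMARY KEY,  -- the permalink identifies a post
    feed  TEXT NOT NULL,     -- which weblog it came from
    title TEXT,
    body  TEXT               -- only what the description element carried
)"""


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def archive_feed(db: sqlite3.Connection, feed_url: str, rss_xml: str) -> None:
    """Store every <item>; re-fetching a feed replaces rather than duplicates posts."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        db.execute(
            "INSERT OR REPLACE INTO posts (link, feed, title, body) VALUES (?, ?, ?, ?)",
            (item.findtext("link"), feed_url,
             item.findtext("title", ""), item.findtext("description", "")),
        )
    db.commit()


def posts_from(db: sqlite3.Connection, feed_url: str) -> List[str]:
    """Titles of all archived posts from one weblog."""
    rows = db.execute("SELECT title FROM posts WHERE feed = ? ORDER BY title", (feed_url,))
    return [title for (title,) in rows]


def search(db: sqlite3.Connection, term: str) -> List[str]:
    """Links of posts whose title or body contains term (LIKE ignores ASCII case)."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT link FROM posts WHERE title LIKE ? OR body LIKE ? ORDER BY link",
        (pattern, pattern),
    )
    return [link for (link,) in rows]
```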

Google does provide some of this functionality, but you get a lot of noise in a Google search and it is not as snappy as searching a local archive, which is guaranteed to contain items of interest. So let's start making content available so that clients can manipulate it in any way they want. That way we can start to develop much richer aggregators.

Speaking of new clients, Dave Winer also comments that with new software to view weblogs "We will have routed around the Microsoft browser monopoly." Two points here. First, I wasn't aware that Microsoft has a browser monopoly. I frequently use Mozilla, even on Microsoft sites, and when I look around the office at work I see people using a variety of browsers. Second, I may be mistaken, but the screenshot of Brent Simmons's BlogBrowser appears to show an embedded browser, which is unsurprising given the amount of HTML markup in some RSS files. This means a Windows version of the new generation of BlogBrowsers would have to either implement its own HTML renderer or embed a browser. The first is unlikely, and in the second case the browser would be either IE or Mozilla. If IE, then we haven't routed around Microsoft very well; if Mozilla, then there can't really be a browser monopoly.

Posted by at 08:35 AM. Permalink.

IndexedBtree Class

Sunday 24 November

David McCusker has a piece on dicts as arrays. This has spurred me to make available an IndexedBtree class I worked on a while ago after reading David's writing on btree-based blobs. At the time I mentioned that I didn't know much about traditional computer science algorithms. After some testing showed that the .NET SortedList class does not scale very well (it keeps its keys and values in sorted arrays, so each insertion has to shift existing entries along), I was inspired to implement IndexedBtree, both as an exercise in learning how btrees work and to provide the functionality of SortedList in a class that could scale to tens of thousands of members.

I originally implemented the class as a traditional btree but realised that, as David suggested, adding the ability to access the collection as a numerically indexed list was fairly trivial. As well as storing the first key of each sub-node, each non-leaf node stores the cumulative count of the items held in its sub-nodes. So when retrieving a value by position, instead of doing a binary search on the keys stored in each node, you do a binary search on the counts. This means it is just as fast to access the collection by numerical index as by key (probably slightly faster by index, because comparing integers is usually cheaper than comparing keys).
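
A minimal Python sketch of the idea, assuming a static tree built bottom-up from sorted pairs. It is not the IndexedBtree code: insertion and node splitting are left out, and the class and function names are made up for the example. Each internal node keeps the first key of each child (for lookup by key) and a running total of the items in its children (for lookup by position), and both descents are binary searches with `bisect`.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys        # sorted keys
        self.values = values    # values parallel to keys

    def count(self):
        return len(self.keys)


def first_key(node):
    return node.keys[0] if isinstance(node, Leaf) else node.first_keys[0]


class Internal:
    def __init__(self, children):
        self.children = children
        # First key of each child: binary-searched when looking up by key.
        self.first_keys = [first_key(c) for c in children]
        # cum[i] = number of items in children[0..i]: binary-searched by position.
        self.cum = list(accumulate(c.count() for c in children))

    def count(self):
        return self.cum[-1]


def lookup(node, key):
    """Return the value stored under key, descending by first keys."""
    while isinstance(node, Internal):
        # Last child whose first key is <= key (or the first child).
        c = max(bisect_right(node.first_keys, key) - 1, 0)
        node = node.children[c]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    raise KeyError(key)


def item_at(node, index):
    """Return the (key, value) pair at a numerical index, descending by counts."""
    if not 0 <= index < node.count():
        raise IndexError(index)
    while isinstance(node, Internal):
        # First child whose cumulative count exceeds index.
        c = bisect_right(node.cum, index)
        if c > 0:
            index -= node.cum[c - 1]  # skip the items held in earlier children
        node = node.children[c]
    return node.keys[index], node.values[index]


def build(pairs, leaf_size=4, fanout=4):
    """Build bottom-up from a non-empty list of (key, value) pairs sorted by key."""
    chunks = (pairs[i:i + leaf_size] for i in range(0, len(pairs), leaf_size))
    level = [Leaf([k for k, _ in ch], [v for _, v in ch]) for ch in chunks]
    while len(level) > 1:
        level = [Internal(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]
```

With `pairs = [(k, k * 10) for k in range(0, 100, 2)]` and `root = build(pairs, 3, 3)`, `item_at(root, 21)` returns `(42, 420)` and `lookup(root, 42)` returns `420`: the same descent shape, searching counts in one case and keys in the other.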

I've spent some time today tidying up the code and adding some unit tests to the test harness. It should be ready in a day or two.

Posted by at 08:48 PM. Permalink.

Document Management

Friday 22 November

Chris Sells recently discussed the limitations of hierarchical forms of data organisation. This was interesting because several years ago I worked on a document management system which attempted to address this problem. In this system a document existed in its own right, without having to belong to a container in a hierarchy. Obviously some form of organisation was required, and this could be achieved in two ways. First, a document could have one or more parents, each parent being a folder or another document, the latter specifically to allow for composite documents built up via OLE linking. So although a hierarchy could still be implemented, you could develop a much more network-oriented organisation of your data. This was useful for creating different views of your documents, getting away from the often too limiting strict hierarchy.

Second, and more significant, was the concept of search folders, where the contents of a folder were based on search criteria which were evaluated each time the folder was opened or refreshed. The search criteria were of various types such as ownership, date created/modified, title, document type, size, description, keywords, and even content, which is where I came in: I owned the separate UNIX-based document store which performed full-text indexing of the documents. I seem to remember that the system also allowed you to request notifications whenever the contents of a search folder changed, but this may be wishful thinking.
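
A small Python sketch of the search-folder idea, under assumptions of my own: the document fields, the criteria helpers and the in-memory list standing in for the document store are all hypothetical, and full-text search is reduced to a substring test. The folder stores only its criteria; its contents are recomputed each time it is opened, so a document added to the store later appears without anyone filing it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Iterable, List

Criterion = Callable[["Document"], bool]


@dataclass
class Document:
    title: str
    owner: str
    created: date
    doc_type: str
    keywords: set = field(default_factory=set)
    content: str = ""


def owned_by(owner: str) -> Criterion:
    return lambda d: d.owner == owner


def created_since(day: date) -> Criterion:
    return lambda d: d.created >= day


def has_keyword(word: str) -> Criterion:
    return lambda d: word in d.keywords


def content_contains(text: str) -> Criterion:
    # Stand-in for a query against a real full-text index.
    return lambda d: text.lower() in d.content.lower()


@dataclass
class SearchFolder:
    name: str
    criteria: List[Criterion]

    def open(self, store: Iterable[Document]) -> List[Document]:
        """Evaluate every criterion afresh against the whole store."""
        return [d for d in store if all(c(d) for c in self.criteria)]
```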

Was it successful? Yes, I think it was. If you were lazy, or had better things to do than spend all day tidily organising your folders, you could just chuck documents into the DMS and rely on the search folders to do the work for you. I wish I could do that now with the thousands of documents I have archived. Windows does provide searching for documents based on various criteria, but the searches are transitory and the UI is not integrated into the standard Windows Explorer application, so it pales in comparison with a more thoroughly implemented document management client. I've seen reports that a future version of Windows is going to have a file store based on a database, so maybe things will get a lot more interesting in this area.

The key to all this is the user interface. Clever interfaces which appeal to developer types simply don't work for the vast majority of users, yet how do you present concepts like these without a complex UI? The client of the DMS I worked on also had to handle both versions and revisions of versions of documents (well worth it, because this is so much more powerful than a single level of versioning), so you can imagine how difficult it was to convey all these complexities in an understandable UI.

I could do with a screenshot to refer to because my memory of the UI is a bit hazy now, but the main problem that comes to mind is that the UI did not show the parents of a document very satisfactorily, so browsing in that direction was difficult. That's speaking as a developer, of course. Maybe the concept of a document being accessible in two or more folders would be completely alien to the average user, and any attempt to represent it would not have worked.

When I scan over the thousands of documents I now have stored in the archive hierarchy on my hard disk, it's obvious that I need a much better way of managing this data. Maybe I could do a lot better using the built-in indexing facilities of Windows and the ability to create links in the file system, but I would still be missing the ability to add metadata to files, and anyway it all seems like too much work.

Posted by at 08:24 AM. Permalink.

CoSetProxyBlanket and IUnknown

Wednesday 20 November

While testing the custom SSP for DCOM authentication, I remembered that it is necessary to call CoSetProxyBlanket after the client has called CoCreateInstanceEx, because the SSP requested in CoCreateInstanceEx applies only to acquiring the interface and not to subsequent calls on it. What I didn't remember was that it is also necessary to call CoSetProxyBlanket on the proxy's controlling IUnknown. So I was initially puzzled when I saw an exception reported in the debug window as my interface smart pointer went out of scope, and when the COM local server kept running even though I assumed all references to it had been released.

The solution is to QI the proxy for IUnknown and call CoSetProxyBlanket on it at some point before releasing it and the other interfaces acquired on the object. If you don't do this, the object is eventually released via the COM pinging mechanism, but that can take up to six minutes, which might cause problems in some circumstances.

Posted by at 07:11 AM. Permalink.

SSP and Windows 98

Tuesday 19 November

The custom SSP I'm implementing needs to work with Windows 98 clients, so I did some testing yesterday. I installed Windows 98 in a VMware session but this didn't work very well: the OS kept hanging. Someone mentioned they had also had problems running Win98 under VMware, so I had to find a real machine to install it on (note: BootDisk.com was useful for getting hold of a boot floppy).

Once I had implemented the ANSI versions of the SSP entry point and of those SSP functions which take string parameters, the SSP worked fine.

Posted by at 08:10 AM. Permalink.

Developing WMI Solutions

Tuesday 19 November

Friend and colleague Gwyn Cole has co-authored Developing WMI Solutions with Craig Tunstall, published by Addison-Wesley last week. WMI has not had much publicity, but if you administer or implement applications it is definitely worth knowing about.

A couple of years ago I did some exploratory work on instrumenting a server application and found there was hardly any material available on how to do this, particularly regarding schema design. Gwyn's book would have been very useful, with its comprehensive coverage of schema design, implementing clients and providers, and using WMI from an administrator's point of view. The last point is important because Microsoft have already instrumented much of their operating systems and server apps using WMI, which means you can do a lot of administration via scripting. There is even a chapter on implementing the MMC snap-ins you'll want for your management UI (please don't knock up your own management app using a tree control; it can look so amateurish in comparison).

Posted by at 07:13 AM. Permalink.

Intensive SSP

Sunday 17 November

I've been working on a custom Security Support Provider (SSP) package for authenticating DCOM calls from clients which do not belong to a Windows domain. I've not done any work in this area before so this has been an intensive introduction to SSP and also to some of the more complex details of Windows security (Keith Brown's book Programming Windows Security has been very helpful).

My starting point in the learning process was the SAMPSSP sample provider from Microsoft which is mentioned in the MSDN docs. I couldn't find it on the MSDN website but instead found a copy here.

The biggest showstopper with this sample code when used with DCOM is that its implementation of ImpersonateSecurityContext simply returns a success code without doing any impersonation. This results in CoCreateInstanceEx returning the ubiquitous E_OUTOFMEMORY. I'm afraid to say I wasted an appreciable amount of time on this, and matters were made more confusing by the fact that co-creating an object using NTLM authentication followed by a call to CoSetProxyBlanket using the custom SSP worked fine.

For learning about SSP in a practical way, there is Keith Brown's SSPI Workbench, and Tomas Restrepo has a useful article, Authentication the SSPI Way, complete with source code.

One thing that surprised me is that even when RPC_C_AUTHN_LEVEL_CONNECT is specified at both ends, which I understood to mean that only the initial verification of credentials is signed to ensure integrity, there are still calls to MakeSignature and VerifySignature for both the request and the response of every call on an interface.

Some other points worth mentioning are:

  • The client and server must be on different machines. If not, the custom SSP is ignored in the CoCreateInstanceEx call and CoSetProxyBlanket fails.
  • Microsoft.com has an informative Word document on SSPI.
  • To install your SSP, append the name of its DLL to the SecurityProviders value of this key: HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders, for example:
    msapsspc.dll, schannel.dll, digest.dll, msnsspc.dll, mysample.dll
  • To install your SSP for DCOM, add a new value to this key: HKLM\SOFTWARE\Microsoft\Rpc\SecurityService. The name of the value is the RPC ID of your SSP (the ID used in CoCreateInstanceEx and CoSetProxyBlanket) and its data is the name of your DLL, e.g. 123: REG_SZ "mysample.dll".
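
The first registry change amounts to appending an entry to a comma-separated list without disturbing what is already there. Below is a sketch in Python using the standard winreg module; add_security_provider and install_ssp are hypothetical helpers, the DLL name and RPC ID are the placeholder values from the list above, and install_ssp only works on Windows with administrator rights.

```python
SSP_KEY = r"SYSTEM\CurrentControlSet\Control\SecurityProviders"
RPC_KEY = r"SOFTWARE\Microsoft\Rpc\SecurityService"


def add_security_provider(current: str, dll: str) -> str:
    """Append dll to a comma-separated SecurityProviders string, once only."""
    providers = [p.strip() for p in current.split(",") if p.strip()]
    if dll.lower() not in (p.lower() for p in providers):
        providers.append(dll)
    return ", ".join(providers)


def install_ssp(dll: str, rpc_id: int) -> None:
    """Register the SSP for SSPI and for DCOM. Windows only; needs admin rights."""
    import winreg  # imported here so the string helper above works on any OS

    access = winreg.KEY_READ | winreg.KEY_WRITE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SSP_KEY, 0, access) as key:
        current, value_type = winreg.QueryValueEx(key, "SecurityProviders")
        winreg.SetValueEx(key, "SecurityProviders", 0, value_type,
                          add_security_provider(current, dll))
    # The value name is the RPC ID passed to CoCreateInstanceEx/CoSetProxyBlanket.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, RPC_KEY, 0, winreg.KEY_WRITE) as key:
        winreg.SetValueEx(key, str(rpc_id), 0, winreg.REG_SZ, dll)
```

Applied to "msapsspc.dll, schannel.dll, digest.dll, msnsspc.dll", add_security_provider yields the example value shown above, and running it a second time leaves the list unchanged.
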
Posted by at 03:54 PM. Permalink.

GAZM supports Blogger API

Monday 11 November

XML-RPC.NET has mostly been used to implement XML-RPC clients. Adam Sills, on the other hand, is using the service classes to implement the Blogger API on the GAZM.org website. Great stuff; it is spurring me on to issue the next release so that the automatically generated documentation looks like this sample instead of this.

Posted by at 07:13 PM. Permalink.

Class Augmentation in C#

Sunday 10 November

Alexis Smirnov illustrates some of the future features of C# described in a presentation by Anders Hejlsberg. It's interesting to note that class augmentation is one of them: not as earth-shattering as generics or anonymous methods, but very useful when you are dealing with large classes, especially when nested classes are involved.

Posted by at 11:20 AM. Permalink.

Essential .NET, Volume 1

Wednesday 6 November

Essential .NET, Volume 1 has been released. According to the Addison-Wesley website the contents include:

  • CLR's evolution
  • Assemblies in the .NET Framework
  • The CLR type system
  • Programming with type
  • Objects and values
  • Methods
  • Explicit method invocation
  • Application domains
  • Security
  • Interoperability

It will be interesting to see whether Box can bring anything new to the .NET book pile, given the overwhelming abundance of .NET books over the last year or so. Of his previous books, Essential COM was a classic and Effective COM an interesting collection of tips. I liked Essential XML a lot, apart from the padding at the back of the book, but most people I've spoken to were not very enthusiastic about it. He has a distinctive style of presenting material, particularly in front of an audience, which makes even the most technical subjects seem interesting. Let's hope he has found a typically Box-like perspective on .NET that will add to the material already out there.

Posted by at 06:02 PM. Permalink.

The TabletPC Market

Friday 1 November

Scoble describes the TabletPC debate and suggests that the main market for TabletPCs will be people who want to use them standing up. Well, I need to find out what a TabletPC really is, but if it is anything like what I want it to be (something the size of an A4 notepad with a hi-res screen, designed to be held with the long side upwards, with built-in Wi-Fi and no keyboard), then I will want to use it lying down, either in bed or on the floor, semi-reclining on the sofa, and once daily sitting on the bog. I'd mainly be browsing emails, web pages and RSS feeds, and possibly entering small amounts of text such as short emails, but often no more than URLs or Google searches. Once the price drops I suspect this will be the real market.

Posted by at 06:50 AM. Permalink.