## Tuesday, September 8, 2009

### How many objects do we need?

One of the interesting and potentially invaluable concepts that the Chandler project brought to light (i.e. to my attention) was the idea of "stamping". Stamping means defining incoming data by assigning it a type. When someone is simply typing in some information quickly, it goes into a note by default, and the user can then stamp it as an event, a task, or whatever else is needed.

A typical (though narrow) implementation of this concept is the ability to take an e-mail that is an invitation and have the information put into the calendar at the appropriate time. Microsoft Outlook allows e-mails to be scheduled this way.

As far as I can tell, the Chandler project does this by displaying different facets of information in different contexts. This implies that the information is stored and "stamped" with several different types.

For example: someone may send you an evite to an event. You respond to the evite but keep the e-mail around as a reminder. If you liked the restaurant that the event occurred at you may decide to keep a note about the fact that you like that restaurant and you may also want to put the address of that restaurant into your contact list. In the Chandler world this is done by stamping the same piece of information twice. The first time it is stamped with the type "note" and the second time with the type "contact".

My apologies in advance if it turns out that I am misinterpreting the design notes for Chandler.

I think the same thing could be achieved by creating new objects from the initial object and then maintaining associations between them. This would allow someone to create an event from an e-mail, and the event would have a reference back to the original e-mail.
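
To make the association idea concrete, here is a rough sketch in Java. It is not how Chandler does it, and the `Item` class and its `deriveAs` method are names I made up purely for illustration. Each derived object keeps a reference back to the object it came from, and the original keeps a list of everything derived from it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical PIM item; every name here is an assumption made for illustration. */
final class Item {
    private final String type;   // e.g. "email", "event", "contact"
    private final String text;
    private final Item source;   // the item this one was created from, or null
    private final List<Item> derived = new ArrayList<>();

    Item(String type, String text) {
        this(type, text, null);
    }

    private Item(String type, String text, Item source) {
        this.type = type;
        this.text = text;
        this.source = source;
    }

    /** Creates a new item of another type and links the two in both directions. */
    Item deriveAs(String newType, String newText) {
        Item child = new Item(newType, newText, this);
        derived.add(child);
        return child;
    }

    String type() { return type; }
    String text() { return text; }
    Item source() { return source; }
    List<Item> derived() { return Collections.unmodifiableList(derived); }
}
```

With this, `email.deriveAs("event", "Dinner on Friday")` would give an event whose `source()` is the original e-mail, so nothing has to carry more than one type.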

### Incomplete thoughts on displaying PIM data

Modeling calendars, agendas, projects and so on as views into the data.

Examples:

A calendar could be considered a view into a timestream populated with events, tasks and anything else that can be located in time. It allows organizing chronological events in terms of days, weeks, months, years, etc.

A timeline could be considered a view into a timestream populated with the events, tasks and anything else that can be located in time.

A timetable could be considered a view into a timestream populated with the events, tasks and anything else that can be located in time in a tabular form.

The daily agenda could be considered a view into a timestream populated with events and tasks and anything else that can be located in time.

A Gantt chart could be considered a view into a set of actions that can be tracked and have a due date. A to do list could be considered another view into a set of actions that can be tracked and have a due date.

A contact list could be considered a view into a set of information about people, organizations, companies and so on.

So far, so good.

Taking on that each of these is a view into an otherwise undifferentiated set of data has the potential to give great results. That is effectively what ECCO allowed you to do on a limited scale.

And it seems to me that there are two parts to this. There are the constraints or criteria that give the list of items to be displayed, such as "show me all the items that can be located in time and whose date occurs somewhere within the next month". And some of the criteria could be tags and/or "all items referenced by the following taxonomy..."

So that gives us some common views that may be of use:
1. For sets of items that can be located in time there are calendars and schedules.
2. For sets of items that can be tracked and/or have a due date there are projects, to do lists, agendas, checklists.
3. For sets of items that are contact information of various types there are contact lists, mailing lists, and directories.
4. For communications and messages there are threaded conversations.
5. For events that have happened in the past there are journals and audit logs.

So if we now look at how those views could be created we start having to pull together all the previous notes and discussions.

Say for example we needed to create a mailing list for a specific community such
as the extended family. Of course the criteria would be something like: please
display contacts referenced by the "family" tag/taxonomy.
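
As a rough sketch of what "a view is just criteria applied to the pool of items" might look like in Java: the `Entry` and `View` classes below are hypothetical names, not anything from an existing PIM, and the tags are plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical item in the undifferentiated pool of data. */
final class Entry {
    final String kind;        // e.g. "contact", "event"
    final String name;
    final Set<String> tags;

    Entry(String kind, String name, String... tags) {
        this.kind = kind;
        this.name = name;
        this.tags = new HashSet<>(Arrays.asList(tags));
    }
}

/** A view holds nothing but the criteria that select what it displays. */
final class View {
    private final Predicate<Entry> criteria;

    View(Predicate<Entry> criteria) {
        this.criteria = criteria;
    }

    /** Returns the entries from the pool that satisfy this view's criteria. */
    List<Entry> select(List<Entry> pool) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : pool) {
            if (criteria.test(e)) {
                result.add(e);
            }
        }
        return result;
    }

    /** The mailing list example: contacts referenced by the "family" tag. */
    static View familyMailingList() {
        Predicate<Entry> isContact = e -> e.kind.equals("contact");
        Predicate<Entry> isFamily = e -> e.tags.contains("family");
        return new View(isContact.and(isFamily));
    }
}
```

The nice part of this shape is that a calendar, a to do list and a mailing list differ only in the predicate they are built with.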

### Context is decisive

Context is decisive. Whatever context you have for a piece of information governs
your understanding of that piece of information. So context is decisive.

for days now. And since they are getting in the way of my other work I am
putting them into the blog in the hopes that they will leave me alone.

One of the most common requests I have bumped into is for the PIM to take into
account the context in which you are running the application. So, in other
words, if you are at work it would only show you by default the tasks and events
pertinent to you being at work. If you are on the road it would only show you
those items that are pertinent to you being on the road.

The GTD methodology recommends setting up task lists with tags such as @phone
and @office so that when you are on the road you can simply list those tasks
that you have the resources to perform. Is this something that makes sense to
model in a more sophisticated manner?

In other words, what does knowing the context make available in terms of work?
Should it be modeled and if so how?

## Saturday, September 5, 2009

### It was Col. Mustard in the library with the candlestick

One of the big questions in any enterprise development project is who gets to muck with the database or data repository.

This is significant even in the case of the PIM.

When looking at how to deal with pushing data out to other repositories such as Google, polling data from other repositories such as Google or RSS feeds, and just the general headaches of synchronizing to and from other devices, it is clear that who gets to touch the database and how is a critical question.

For now, I am assuming that all of the tools that feed to and from other data sinks/sources will do so by operating against the database rather than having the PIM up and running and managing those operations.

### Kitchen sink modeling

The more I work with user stories and scenarios for PIMs the more it is becoming clear that there is simply a sea of information that only the user has any sense of. Most of the things that we are modeling have specific meanings to the individual (e.g. must a promise have a due date?). For some people the answer is yes; for others the answer is no.

It is clear that for most people there are clear distinctions between the different types of data they have. In other words most people have a consistent way they mentally model meetings versus tasks versus holidays. And the way they model it is quite closely tied to the way they work. So people work and live in a world of pieces of data that have clear types and clear behaviors.

The collections (e.g. projects, agendas, to do lists, checklists, etc.) people use to manage those different types of data (e.g. to do list items, tasks, promises, calls, errands, chores, etc.) are just as individualized as the data items themselves.

For example: for some people a task counts as committed once somebody has said "I will do that sometime next week". They put a sticky note in their calendar so that when they are looking at that week they know what things they promised to do that they haven't put into "space and time" yet.

Other people maintain tasks on lists that don't count them as real until they are scheduled on a calendar. Until that point they are on the "Not Doing Now" or "Unscheduled" list.

So clearly, the majority of everything to be modeled needs to be customizable. So, after walking through user story after user story, this is what I see:

1) Everything to be tracked in a PIM has one or more behaviors.

The behaviors are

• Locatable in time (LIT)
• Can occupy time (OT)
• Locatable in space (LIS)
• Has a lifespan (SPN)
• Has trackable progress (TRK)
• Has a due date (DUE)
• Requires resources (RR)

The different types of items can have the behavior or not have the behavior. If they do have the behavior then they can either have the state of having a fixed value or not having a fixed value.

For example, a promise can have the behavior of being locatable in time, but it may not yet have been fixed in time.

2) Everything is versionable.

There are many areas where tracking the changes to a scheduled item (who generated them and when) is critical to fixing something. This is especially true when you're dealing with synchronizing schedules with multiple calendars.

3) All items tracked in the PIM are associated with a specific type and that type has associated properties and each type has default values for those properties and behaviors.

For example, an event can have the behavior of being locatable in time and may have the behavior of occupying time, and the default time it occupies is 15 minutes.

4) Context is decisive: contexts such as home -- online, home -- off-line, or work -- online each have resources available (phone, computer, Internet, e-mail, etc.).

Many methodologies such as GTD take into account the context in which you are working. For example, there are some things that should only be done from home, there are some things that should only be done from work, and there are some things that can only be done when you have a phone available. The user's context governs both what they should be doing and what they are capable of doing given the resources available.

5) Taxonomies (hierarchies) go from wider towards narrower (e.g. Extended Family => Immediate Family).

In looking at all the different ways people navigate through hierarchies and the way they create their own private taxonomies (using file folders, categories, tags, etc.) it is clear that most people navigate from wider to narrower.
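
Points 1 and 4 above could be sketched roughly as follows. This is only one possible shape, written in Java. The `Behavior` constants are the abbreviations from the list in point 1, while `Resource`, `TrackedItem` and all of the method names are hypothetical. An item either has a behavior or does not, and a behavior it has may or may not have a fixed value yet.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** The behaviors from point 1, using the abbreviations given there. */
enum Behavior { LIT, OT, LIS, SPN, TRK, DUE, RR }

/** Hypothetical resource names for the contexts in point 4. */
enum Resource { PHONE, COMPUTER, INTERNET, EMAIL }

final class TrackedItem {
    // A key being present means the item has the behavior;
    // an empty Optional means its value has not been fixed yet.
    private final Map<Behavior, Optional<Object>> behaviors = new EnumMap<>(Behavior.class);
    private final Set<Resource> required = EnumSet.noneOf(Resource.class);

    /** Gives the item a behavior without fixing its value. */
    void allow(Behavior b) {
        behaviors.putIfAbsent(b, Optional.empty());
    }

    /** Fixes the value of a behavior the item already has. */
    void fix(Behavior b, Object value) {
        if (!has(b)) {
            throw new IllegalStateException("item does not have behavior " + b);
        }
        behaviors.put(b, Optional.of(value));
    }

    boolean has(Behavior b) { return behaviors.containsKey(b); }

    boolean isFixed(Behavior b) { return has(b) && behaviors.get(b).isPresent(); }

    /** Records a needed resource; this implies the "requires resources" behavior. */
    void require(Resource r) {
        allow(Behavior.RR);
        required.add(r);
    }

    /** True when the current context offers every resource this item needs. */
    boolean doableWith(Set<Resource> available) {
        return available.containsAll(required);
    }
}
```

A promise that is locatable in time but not yet scheduled would be `allow(Behavior.LIT)` with no matching `fix` call, and a "call the plumber" task would `require(Resource.PHONE)` so that it drops out of any context without a phone.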

## Wednesday, August 12, 2009

### PIM type tools.

On a slightly tangential note, one of the things that I constantly review and revise is my set of tools for managing tasks and scheduling. I figured it was worthwhile to direct you to my other blog and to my two latest posts there regarding the state of the three big open-source contenders: Evolution, Chandler, and Thunderbird/Lightning.

- Jim

### Upgrading my existing PIMs and Systems: Take 2

As I said last week, Chandler appeared to be a bust.

Over the weekend, I tried out using Thunderbird 2.0 and Lightning and a few plug-ins.

It is definitely far more stable than Chandler. Installation was a breeze. The Thunderbird functionality was rock solid. The Lightning integration was a little less so. Many of my Google calendars did not display all of the events.

I then went on to try Thunderbird 3.0 beta and the corresponding experimental Lightning version. Integration is a little cleaner, some Google events still did not display (though not the same ones), and the application did crash under the 64-bit version of Ubuntu (Jaunty Jackalope) I am running.

So I ended up back using Evolution as my primary PIM. But I did make a few changes in how I used it.

I changed the system so that my two Gmail accounts are accessed using IMAP. My Comcast account is accessed using POP, and I am experimenting with using rules tailored for each of the three accounts.

In addition, I have switched over to using "Remember The Milk" (aka RTM) as my primary task manager. Google Tasks appears to be of no real use for me since I can't sync my tasks to anything useful.

I use Tasque (Linux only) to edit and modify my task entries and I use Evolution's read-only access to RTM to see the tasks on the fly. It is far from ideal. Far, far, far from ideal. But it is a step up. When Evolution has the ability to access Google Tasks then I may be able to switch to one online provider. Of course, for that to happen Google needs to publish an API that allows us to access the tasks.

- Jim

## Thursday, August 6, 2009

### Upgrading my existing PIMs and Systems

Periodically, I review the existing PIM solutions that are out there in the hopes of finding something that will handle my interim needs while I am writing something to handle all of my needs.

I am currently using Evolution in connection with Google Calendar and syncing Evolution to a Palm Pilot. In my review, I discovered that Chandler is now at a greater than 1.0 version. Yay. Sort of.

I have often commented in this blog about how effectively the Chandler Project has modeled the domain of task management and scheduling. They have some brilliant concepts and have nailed many of the basic issues needed to take things to another level.

Unfortunately, stability is not one of them. I tried pulling down and installing Chandler on four different machines/OSes: Windows XP, Vista, Ubuntu 8 and Ubuntu 9.

Every single one of them crashed multiple times during normal operations. Whether it was importing Google ics files, creating a new collection, or creating a new calendar, crashes were the rule rather than the exception.

This was also my experience over a year ago.

I don't know what else to say but "Damn".

Over the weekend I will look at both Thunderbird 2.0 and its corresponding Lightning plug-in as well as the Thunderbird 3.0 beta and its corresponding Lightning plug-in.

- Jim

## Sunday, June 21, 2009

### Model Mismatch

Interoperability is a stone cold *expletive deleted*. When writing a PIM or any other calendar app, you will pick wrong. Just get used to it.

Here is what I mean:

In the calendar world there is only one viable standard for the exchanged info: the iCalendar file format. It can be exchanged in multiple ways, though the CalDAV protocol (which sits on top of WebDAV, which sits on HTTP) is becoming the exchange method of choice.

But whose should we use? Lots of applications support it. But Microsoft's iCalendar format is not always in sync with Google's and so on, AND it is not even anyone's fault. The areas where things fall down are often where the spec itself has issues, such as recurring events and events that last all day.

When does an all day event start? 00:00:01 or 00:00:00?

When does an all day event end? 23:59:59 or 24:00:00 (oh wait, that's 00:00:00) oops.

Those questions just begin to touch on the issues.

Whichever way you handle all day events you will have to do accurate conversions to the other way or risk all sorts of "round trip translation errors". If you have synchronized your PIM with another and back again you have probably already run into this....
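
One way to sidestep the 23:59:59 versus 24:00:00 question is to store all day events as whole dates with an exclusive end, which is also how iCalendar treats DTEND for date-valued events. Below is a minimal sketch in Java, assuming the `java.time` API is available. The `AllDayEvent` class and its method names are hypothetical and not part of any PIM mentioned here.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical all day event stored as a half-open range of dates. */
final class AllDayEvent {
    private final LocalDate start;         // first day of the event
    private final LocalDate endExclusive;  // the day after the last day

    AllDayEvent(LocalDate start, LocalDate endExclusive) {
        if (!endExclusive.isAfter(start)) {
            throw new IllegalArgumentException("end must be after start");
        }
        this.start = start;
        this.endExclusive = endExclusive;
    }

    /** Converts from the "first day, last day" convention some tools use. */
    static AllDayEvent fromInclusive(LocalDate firstDay, LocalDate lastDay) {
        return new AllDayEvent(firstDay, lastDay.plusDays(1));
    }

    /** Converts back to the inclusive convention without losing a day. */
    LocalDate lastDayInclusive() {
        return endExclusive.minusDays(1);
    }

    long days() {
        return ChronoUnit.DAYS.between(start, endExclusive);
    }

    boolean covers(LocalDate day) {
        return !day.isBefore(start) && day.isBefore(endExclusive);
    }
}
```

Because no clock time is stored at all, converting to the inclusive form and back cannot drift by a second or a day, which is the kind of round trip error described above.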

There are conferences and technical groups that spend days just battling with how to specify and manage recurring events. Look for "IIOP Recurring Events"

Most PIMs solve this by closely adhering to the iCalendar specification for their domain model. That's great where the spec really works, but there are some key things that don't appear to be covered or are incompletely specified, e.g. tagging and taxonomies, the same events in multiple calendars, hierarchical tasks and events, and so on. I am not yet an expert on iCalendar but I will become solidly familiar with it over the next few weeks. Luckily iCalendar provides an extension mechanism so that the application-specific information can be captured. Of course, other applications won't use the info and some of them won't even preserve it (Google's iCalendar support seems to remove tasks from the calendar).

My project for next weekend is to try and do a full round trip between Google calendar and the Saltation domain model using Google's Calendar API and maybe using the icalendar file format.

6.21.09 I wrote this post on 6.19.09 and then posted it today. As I go back and look at what Google actually supports I am stunned. To try and push through the Google API looks like a decent amount of work for little initial return. I think I am going to work first on FULL icalendar file format support (import and export) next weekend and then look at publishing and reading from Google directly. At least debugging the process will be simpler.

### Progress

I just realized that naming the PIM and the blog the same thing may have been less than optimal. Ah well.

I did a quick clean up of my build scripts for saltation.

I am still wrestling with how to present the information on SourceForge in a way that works. The gods know that I have gotten annoyed often enough with OSS projects and navigating around trying to figure out what is where. I will devote some
serious time and thought to laying things out in a way that works.

## Wednesday, June 17, 2009

### Taking things public

About two weeks ago I realized that I was creating this project as an open source project but not taking it out into the open source community.

Anybody see a small disconnect there?

So, I have started a project (called saltation) on SourceForge and I will be checking in code within 24 hours.

And, I have already run into my first roadblock (or at least a speed bump).

Specifically, it is the build system. I use a common build framework for all of my projects which I call, appropriately enough, common-build. It uses Ant and Ivy for the build and dependency management. For a little more information on that see this posting in my other blog: Ant versus Maven.

Both Maven and Ivy use an external repository for storing dependencies such as Java libraries. SourceForge still hasn't worked out all the ways to interact with this. So the bottom line is that one of my first tasks is going to be making the build system require a minimum of setup (i.e. you should be able to build right out of the box).

That will probably take me a day or two counting testing.

Till then,

- Jim

## Friday, June 12, 2009

### More in the world of unused code detection : Klocwork

As noted in a previous posting, I have a major refactoring task ahead of me with the code base that I am now the owner of.

Recently we had an intermittent problem that may have been the result of a resource leak. Because we were unable to reproduce it, we put some processes in place for the next time and I did what I always do: look to see if there is some tool out there that will allow me to detect resource leaks in the current branch of the code base.

The two that most people seem to recommend are Coverity and Klocwork. A number of my acquaintances have said that Klocwork was better at detecting resource leaks so I decided to try it.

Here is the good news: the tool mostly does the job.

The bad news: the company doesn't quite have it together.

In one key way it does: they are courteous, efficient and smart. The sheer efficiency and overall effectiveness highlights even more their marketing/sales deficiencies.

I think the key thing that I kept bumping into is that there were limitations and conditions that were not clearly spelled out during the purchasing process. Klocwork had a salesperson and a technician on the phone with me at their request to evaluate my needs. The decision was made for me to go with the Klocwork Solo product rather than the honking big enterprise licensed behemoth that is their main line product. When I pulled down the demo version of Klocwork Solo I discovered that the temporary license covered more than a month but that the demo version could only handle 99 files at a pop and the Solo product could only run under Windows even though it is a Java application (it appears to spawn a Windows executable as part of its analysis process).

Neither of these facts came to the surface in the initial call.

Personally, I would recommend that when you are making available a trial application of a code analysis tool, that you make it a short period of time AND allow it to handle an enterprise class number of files. After all, you know people are going to use it on their code base, and will need to do so in order to demonstrate its worth to the powers that be.

The 99 file limitation was a pain, but I evaluated the tool enough to determine that it was probably worth the $100 it costs to get it in and try it out on a larger code base. I purchased the $100 Solo product and discovered that it was limited to 1000 files (but apparently I can call Klocwork and get that expanded). Luckily, I was able to limit the code base I was interested in to 1000 files.

But the other thing I discovered is that the licensing tool appears to talk with the Klocwork mothership every time I start up Eclipse and the license is only valid for a year. That was another detail that was not presented upfront.

I don't want to leave you with the idea that I think that Klocwork is intentionally misleading people. I don't think that. But I do think that the management of licenses and sales are oriented toward a different scale of user than a small shop of 3 to 5 developers.

I don't think I will be recommending their tools until they rapidly revise their licensing and license management.

- Jim

## Tuesday, June 2, 2009

### Technology choices revisited

In one of my earlier posts Technology Choices I had looked at what user interface toolkit I should be using. What I've discovered in the intervening time is that this application is going to put a premium on flexibility in presentation. As a result, I am seriously reconsidering the user interface toolkit. The Eclipse RCP is a very powerful platform upon which to build an application. Unfortunately it does have a very strong set of metaphors upon which it is built (Views and Workspaces) which seem too rigid for what I am attempting.

So I am seriously looking at using Qt with the Java bindings.

Any thoughts?

### Going to and fro

The other part of the last few months has been looking at importing and exporting and synchronizing.

I spent a great deal of time looking at the iCalendar specification and all of the recurrence rules. And then I read all of the use cases for recurrence rules interoperability that have been put together by some incredibly diligent groups of people. The people that work on those specs definitely come under my heading of unsung heroes.

As often appears to be the case in the area of calendaring, there are really no clean answers. And going back and forth between the iCalendar format and a more flexible internal model is going to require a great deal of detailed work. But it is very clear that iCalendar is the de facto standard for now and I see nothing on the horizon yet that will replace it.

And personally, I think it gives the biggest bang for the buck. If I can support CalDAV and iCalendar to any significant degree I will be able to publish calendars to and from Google Calendar, as well as many other applications.

So my immediate focus is to support import/export for iCalendar, and then support using CalDAV to publish.

- Jim

### When things don't fit into a neatly modeled world

I am back after a long hiatus. A combination of work plus a course I'm teaching has kept me very busy.

But always in the background I am thinking about the PIM. That's how we refer to it in my household: "The PIM". I would wonder if I'm a little obsessed except I'm having so much fun.

I've been coding some command-line utilities to test the data model and, as I expected, I have run into the limitations of my original model. I had originally started with calendars, agendas and so on being first-class objects. What I mean by that is that they have some distinct existence in the real world, and that is why I am modeling them. What I have discovered is that calendars, address books, task lists, journals and so on are all simply semantically themed collections of stuff we care about.

There are few assertions I can make about what a calendar is that are universal enough to be easily recognizable to everyone.

So I stepped back and thought about it for a bit and this is what I have come up with: we have an ocean of first-class objects that we would like to track (events, tasks, promises, contact information, little notes and so on). And then we spend our time trying to organize them into multiple collections that give us easy access to what we want to do when we want to do it.

So I am experimenting with having any of these collections simply be taxonomies, just as described in an earlier post. This would mean that many calendars have a relationship to each other. An example would be the calendar I maintain for my children's schedule all together, which could have two related calendars (one for Aidan and one for Trent). Both Trent's and Aidan's calendars would refer to events in the Boy Scout calendar.

The same would be true of address books. I could have an address book that references my friends and a related address book that only has the subset of my college friends.

The same appears to work well for journals, agendas, sets of conversations, and the collection of resources and notes that I refer to as Data Mines or just Mines.
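
Here is a rough Java sketch of collections as taxonomy nodes that refer to one another. The `Taxon` class and its methods are hypothetical names, and items are plain strings to keep it short. The one wrinkle worth showing is that a shared calendar (the Boy Scout one) is reachable by two paths, so gathering everything has to avoid counting it twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical taxonomy node: a themed collection that can refer to related collections. */
final class Taxon {
    private final String name;
    private final List<Taxon> related = new ArrayList<>();
    private final List<String> items = new ArrayList<>();

    Taxon(String name) {
        this.name = name;
    }

    String name() { return name; }

    /** Adds an item directly to this collection. */
    Taxon add(String item) {
        items.add(item);
        return this;
    }

    /** Links a related (narrower or shared) collection to this one. */
    Taxon refersTo(Taxon other) {
        related.add(other);
        return this;
    }

    /** Collects this node's items plus those of every related node, each item once. */
    Set<String> allItems() {
        Set<String> out = new LinkedHashSet<>();
        collect(out, new LinkedHashSet<Taxon>());
        return out;
    }

    private void collect(Set<String> out, Set<Taxon> visited) {
        if (!visited.add(this)) {
            return; // already reached through another path
        }
        out.addAll(items);
        for (Taxon t : related) {
            t.collect(out, visited);
        }
    }
}
```

The children's calendar would then be a node that refers to Aidan's and Trent's nodes, each of which refers to the Boy Scout node, and `allItems()` on the top node lists each event once.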

## Tuesday, May 26, 2009

### New packaging equals new tools

Sometimes a new package gives new life to existing functionality.

I am working to track down a resource leak in our project at work that only occurs in rare instances. Of course, I am using NeoLoad to stress the system, but since it appears to be a resource leak, I was planning to use the built-in JVM monitoring tools for doing what I need to do.

Those tools are generally commandline tools and I always have to refresh my memory about how they work.

But lo and behold, Sun had the brilliant idea to package a front-end interface called "Java VisualVM" that combines the profiling, monitoring, and heap dump capabilities all in one.

Go ahead and try it out. It should probably become a habit to routinely monitor applications you're working on when running unit tests and such. And since it is built into the 6.0 JDK, Sun has just lowered the bar to doing that kind of monitoring.

Very useful and built into your JVM.

## Sunday, May 17, 2009

### Short Tour Testing

This is an integration or system-level testing technique that scales well and works at the unit testing level as well.

I originally discovered it in an article by Tom Cargill in C++ Report many moons ago (see below). I have not found any electronic descriptions of the technique so I figured I would revive it for those who would find it useful.

Cargill appears to have originally derived the process from a text on validating computer protocols by Holzmann (see below). Having read the book thoroughly, I can see where he derived it from, though it would not have occurred to me to do so.

The basic design and concept is very simple: for any mixture of states and transitions that can be walked through in some complex sequence to produce a bug there is a short tour (three to five steps) through the same set of states and transitions that will give you the same bug.

I found this to be invaluable for pounding on APIs in order to validate that they have the correct mixture of correct error handling and correct functionality.

I typically implement this in Java, and the pseudocode looks something like this:

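```java
int tour = 1001;                      // number of steps in the tour
int numberOfMethods = 13;
int numberOfParamSetsPerMethod = 20;

// Count down through the tour; the step number alone selects the method and
// the parameter set, so the walk is identical on every run.
for (int step = tour; step > 0; step--)
{
    int methodIndex = step % numberOfMethods;
    int paramSetIndex = step % numberOfParamSetsPerMethod;

    // invoke() stands in for a dispatcher that calls the chosen method
    // with the chosen parameter set.
    invoke(methodIndex, paramSetIndex);
}
```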

The end result of this is a predictable "drunken walk" through the combinations of methods and parameters.

Of course, the code can be made even simpler using the reflection API in Java.
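
A reflection-based version might look roughly like the sketch below. The `ShortTour` class and both of its methods are hypothetical names, and the sketch assumes every method under test takes the same single parameter type, which is a simplification. Each step is logged so the tour can be saved and compared against later runs.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical short tour driver built on java.lang.reflect. */
final class ShortTour {

    /** Looks up public methods by name; all are assumed to take one argument of paramType. */
    static Method[] methodsNamed(Class<?> type, Class<?> paramType, String... names) {
        Method[] found = new Method[names.length];
        for (int i = 0; i < names.length; i++) {
            try {
                found[i] = type.getMethod(names[i], paramType);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException("no public method " + names[i], e);
            }
        }
        return found;
    }

    /** Walks the tour and returns one log line per step, in the order the steps ran. */
    static List<String> run(Object target, Method[] methods, Object[][] paramSets, int tourLength) {
        List<String> log = new ArrayList<>();
        for (int step = tourLength; step > 0; step--) {
            Method m = methods[step % methods.length];
            Object[] args = paramSets[step % paramSets.length];
            String outcome;
            try {
                outcome = String.valueOf(m.invoke(target, args));
            } catch (InvocationTargetException e) {
                // The method under test threw; record what it threw and keep touring.
                outcome = "threw " + e.getCause().getClass().getSimpleName();
            } catch (IllegalAccessException | IllegalArgumentException e) {
                outcome = "not invocable: " + e.getClass().getSimpleName();
            }
            log.add(step + " " + m.getName() + " -> " + outcome);
        }
        return log;
    }
}
```

For example, touring `add`, `remove` and `contains` on a `java.util.ArrayList` with two parameter sets visits all six combinations in six steps. When the number of methods and the number of parameter sets share no common factor (as with 13 and 20 above), a long enough tour reaches every pairing.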

Once the tour code has been designed, the test is invoked and the results of the tour are validated by eye. Typically at that point I save the results of the log of the tour so that it can be programmatically compared against the test results each time.

If the parameter sets are chosen well, this form of testing will go a long way towards discovering interaction issues in the system. I have had great success using scripting languages such as Lua to call C language APIs to do this kind of testing. I have also used it to test service architectures to expose session management and exception handling issues.

1. Cargill, Tom, "Short Tour Testing", C++ Report, vol 7, no. 2, February 1995, pp 60-62.
2. Holzmann, Gerard, "Design and Validation of Computer Protocols", Prentice Hall (c) 1991

## Wednesday, April 8, 2009

### Text processing without the pain

Don't get me wrong, I love sed and awk. I have whole libraries of sed and awk scripts for doing all sorts of things. But some of them took a lot longer to write than they should've.

Last night I was faced with the task of translating a whole bunch of documentation from TeX and LaTeX to DocBook 5.0 XML. That meant doing multiline matches with sed and some preprocessing with awk, and my spirit rebelled.

I went berserk with online searches for "alternatives to sed awk", "text processing commandline utilities" and so on. The problem is that the standard text processing utilities need so much explanation that the tutorials churned out on how to do things with sed and awk outnumber the alternatives by at least an order of magnitude.

I finally did what I should have done in the first place. I went to sourceforge.net and searched for "text processing". And I found Gema (pause for heavenly choir music).

It is not perfect by any means. The documentation in particular is just as cryptic as the original sed man pages. But in less than a half hour I had a script up and running that cleanly handled the multiline matches I needed to do.

As an example:

### Discovering unused code in Java

When I set out to track down unused code in Java I came across a large number of static analysis tools that all seemed to do the job fairly well.

I used several of them including a fairly good Eclipse plug-in called UCDetector. It was not fast, but it was very thorough.

By using those tools we were able to remove the obviously unused code. That resulted in a nontrivial shrinkage of about 30%. Unfortunately, because much of the code gets called via Java's reflection API, there is a large amount of code that is not so obviously unused.

Since we have a UI test suite I thought that we would run the test suite against the front end and then log or track the methods that are actually called in the back end. Then we could eliminate the methods we found that were unused.

I first tried using the JDI interface of the JVM to remotely log each entrance into a method (I didn't care about the exit). Unfortunately that slowed the back-end server system to a crawl. It would have taken weeks to get the data we needed.

I then tried both AspectJ and JBoss AOP to produce a logging overlay and ran into significant problems deploying those in the older JBoss 4.0.5GA environment. This was not helped by the relatively nonstandard nature of our deployables.

Finally, we struck a gold mine. By using the YourKit profiler, which had minimal performance overhead, we were able to get the list of method calls that had been made. What made it especially easy was the profiler had a feature that would allow me to generate a dump of the call tree once an hour.

Here is the address of the profiler people: http://www.yourkit.com/

I just want to note that we were able to do all this using the evaluation version and that it easily passed a five minute test (in other words, we were able to get it up and doing real work in five minutes). We have already ordered a copy.

Of course, taking 48 hours of those dumps and manually exporting them to CSV was a royal pain. To the YourKit guys: that is a hint.

### What kind of jungle am I in?

Currently I am working for a company that is doing a rapid re-factoring of an existing Java code base. The application uses JBoss for a J2EE EJB back end serviced by a Tomcat web application as the front end. The wonderful thing is that the code does work. It is, unfortunately, an incredible bear to maintain.

This is, of course, no different than the life of many other developers. But since I am not interested in either myself or my fellow developers suffering, we are slowly but surely re-factoring the code base so that it is easier to maintain and we are faster at turning around new features.

One of the key elements is that the original 3 developers appear to have been stuck in a room and let loose for nine months to a year. As a result, we have a code base that has a phenomenal amount of unused code. One engineer seems to have written a lot of code based on the idea that "it would be neat if the code did...". Another engineer wrote a lot of overly clever code rather than trying a brute-force method to see if it would suffice. And the last engineer seemed to suffer from NIH ("Not Invented Here") syndrome and rewrote portions of the Java standard library as well as the Hibernate toolkit.

So my task has been to track down unused code.

My next post will talk about the tools we used to do that.