RDFiCal: iCalendar in RDF

Libby Miller

Institute for Learning and Research
University of Bristol
8-10 Berkeley Square, Bristol, BS8 1HH, UK

Dan Connolly

W3C

Abstract

This paper describes the development of RDFiCal, an RDF vocabulary (ontology) for describing calendar events based on the iCalendar standard (RFC 2445). It describes the reasons for creating the RDFiCal vocabulary and the iterative community process used to create it. It explains why we decided on a conversion process which was both highly automated and highly social, and describes why we think that modelling iCalendar in RDF is both feasible and useful.

Keywords

RDF, RDF vocabulary, ontology, calendar, events, scheduling

Introduction

Recent work on the Mozilla calendar project and Apple's iCal application has stimulated interest in creating, displaying and exchanging calendars. These products make it simple for users to create and publish calendars represented in the iCalendar standard (RFC 2445)[1], and to subscribe to calendars created by other people. Because people are able to export their calendar information, there is a great deal of data available in the iCalendar format, making calendars a potential source of substantial quantities of Semantic Web data.

Calendaring is an interesting Semantic Web application because it is such an important part of people's day-to-day lives, and because it also encompasses difficult philosophical and modelling problems, and social, trust and privacy questions. It straddles all levels of the Semantic Web from simple queries to find out who is at a conference, to complex scheduling problems involving logic.

This paper focuses on a small part of this space, the apparently straightforward problem of how to convert an established, fairly well-understood, well-used vocabulary for calendaring into a Semantic Web vocabulary (ontology). It describes the reasons for creating the RDFiCal vocabulary and the iterative community process used to create it. It explains why we decided on a conversion process which was both highly automated and highly social, and describes why we think that modelling iCalendar in RDF is both feasible and useful.

History

Since early 2001, the RDF Interest Group has had an informal taskforce examining calendaring and scheduling issues and RDF. This arose from an initial exercise by Tim Berners-Lee[2] which described how one might approach the problem of converting iCalendar to RDF, and later, from discussions at the W3C Technical plenary in 2001.

There have been several attempts to model calendaring formats in RDF, including the 'hybrid' model[3], another iCalendar model[4] and a Palm model[5]. ICalendar is the standard format used by most devices and software for calendaring and scheduling, but is also lengthy, complex and difficult to implement in its entirety.

A SWAD-Europe workshop held in October 2002 in Bristol, UK on calendaring and the Semantic Web[6] produced many usecases for describing calendar events using an RDF vocabulary, and also a plan of action as to how the process of creating such a vocabulary might proceed.

Why an RDF vocabulary for calendaring is useful

"The bane of my existence is doing things I know the computer could do for me." Dan Connolly, The XML Revolution, 1998[7]

There are two broad classes of usecase for RDFiCal. The first concerns the need to have all your calendar information in one format, in order to be able to schedule meetings effectively. The second concerns connecting events with other types of thing: people, documents, locations.

The Personal Information Disaster

Despite the existence and wide uptake of RFC 2445 iCalendar, many people still have scattered schedules described in email text, web pages of conferences and workshops, output formats of various calendars, including iCalendar, and other (sometimes proprietary, sometimes binary) formats. This information is held in public and private places. It is not easy to combine this information or search it. This leads to time-consuming manual copying and pasting from webpages to PDAs and other devices and calendar applications. It can result in errors in scheduling leading to missed meetings and missed opportunities, and a general inertia about scheduling, since simply finding out what you are supposed to be doing can be very difficult.

As an example, suppose that your organisation uses an iCalendar-compliant calendaring system to enable scheduling within the organisation. So, how do you schedule a meeting with someone from outside that organisation? How do you tell your family and friends about work commitments?

Or, suppose you are going to a conference. How do you get the information from the web page of the conference into your own calendar? How does your airline let you know when your flights are? How do you schedule impromptu meetings with interesting people from outside your organisation while at the conference?

There are two intermingled issues here. One concerns privacy and data protection: your family are not allowed to see the schedules for all your work colleagues; your work colleagues should not need to know about all your non-work events, and so on. This paper will not examine this issue, except to note that it is a non-trivial problem for any part of the Semantic Web where private and public data are merged together (and perhaps retransmitted in some form).

The other issue is interoperability. It is simply the need to gather all the information relevant to your schedule and be able to examine it all in one place. Scheduling is a very complex social activity. It concerns priorities which may vary over time and change as events get nearer. It is something that requires your input and that of the other people you are scheduling a meeting with. It is something that requires you to have all the relevant information at your fingertips. Even if an agent (or secretary) does your scheduling for you, it or he requires all the relevant information in order to be able to make that decision. It simply makes no sense for scheduling information to be spread over many disparate sources.

Simply putting the calendar information in RDF doesn't help with this latter problem, except insofar as it represents a common format for the information. ICalendar also represents a common format for calendar information, so:

Why not just use iCalendar?

We almost never want to talk about an event without talking about people, documents or places. Where is this conference? What else is close by? Who else is attending? What documents are relevant? Events are like glue between people, places, documents, pictures. The reasons for having an RDF vocabulary for calendaring become compelling when we consider that (a) descriptions of events tend to spill out into other areas, and (b) creating vocabularies is difficult.

Most of the usecases which came from the Semantic Web Calendaring workshop in Bristol concerned mixing non-calendar with calendar information. For example:

When attending an event, be able to download into a Palm or similar all the information needed about the event, for example, maps, travel itinerary, information about the participants, agenda.

I haven't met most of the people coming to this meeting - what do they look like? Also, what papers have they written?

Who else is attending this conference? Who could I try to meet?

What pictures were taken at this conference?

What restaurants are close to this event? Are any of them recommended by people I know?

ICalendar does not (and, we argue below, should not) try to encompass all these different aspects of people, pictures, locations and documents. Events are one way of making connections between people, for example, but that does not mean that an event vocabulary should encompass all the different things we might want to know about people (their address, their papers, who they know), as well as all the aspects of events that we care about, such as date, time, attendee.

One reason is that a single vocabulary for events is unlikely to be detailed enough to convey all the information we would like to have about events, people, places, documents. For example, in iCalendar, CATEGORIES are free-text keywords, but it would be interesting to use controlled vocabularies or dictionaries such as Wordnet to describe the category of meeting. Or of an ATTENDEE, it would be useful to know more than their name and email address, for example what they have written, who they work for, where they are based and who they know.

Vocabularies which describe many different fields of interest do exist but they are usually developed by a small group. Attempting to develop such a vocabulary with input from the relevant communities would be very difficult and time-consuming. In a process which involves communities who have an interest, it is much more efficient to devolve the creation of specific vocabularies to specific communities. Therefore being able to use these different vocabularies together in predictable ways is very important.

The advantage of encoding data in an RDF vocabulary for calendaring rather than iCalendar is that many different vocabularies can then be combined with event data, connecting people (or documents) without placing the burden of vocabulary design for many intersecting domains on one group.

The 'hybrid' model: dealing with judgement calls

Previous conversions of iCalendar to RDF had been created manually by one or two people and so, because of the length of RFC 2445, are somewhat inconsistent and uncertain.

As an example, Michael Arick and Libby Miller created a model of iCalendar in RDF in June 2001, termed 'hybrid' because it represented a merge of two separate attempts at conversion. As the vocabulary was created there were many judgement calls to be made, and in the absence of some process for deciding between different techniques, these decisions were somewhat arbitrary. Broadly, the decisions that had to be made came under the following headings:

Resolving ambiguities, clarifying and understanding the iCalendar specification

Example: how is a VCALENDAR to be interpreted?

A VCALENDAR is simply a group of VEVENTs or other calendar components such as VTODOs. What are the semantics of this grouping?

Identifying best practice in the RDF part of the translation

Example: should we use a URI, a literal, or a class?

The STATUS of a VEVENT (an event) in iCalendar can be one of TENTATIVE, CONFIRMED, or CANCELLED. These terms could be represented as a class with three possible values, or as URIs or as literals. It was not clear how best to encode this information within an RDF schema.

Example: should we change from the iCalendar date format to the XML schema format?

ICalendar uses the ISO 8601 profile YYYYMMDDThhmmss; W3C's XML Schema uses a slightly different profile, both syntactically (using W3C Date and Time format, YYYY-MM-DDThh:mm:ss) and in the specification of the timezone, which in iCalendar uses an identifier for a timezone component and in W3C Date and Time format[8] uses an offset from UTC. Should we use the iCalendar profile to be consistent with established usage, or should we use the XML Schema datatypes profile, to be consistent with XML usage?
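To make the syntactic difference concrete, the following is a minimal Python sketch that rewrites an iCalendar DATE-TIME value into the extended form used by XML Schema. The function name ical_to_xsd_datetime is our own illustrative choice, and the sketch handles only floating and UTC ('Z') values; values that refer to a timezone component by identifier would need timezone data to compute an offset and are deliberately left out.

```python
from datetime import datetime


def ical_to_xsd_datetime(value: str) -> str:
    """Rewrite YYYYMMDDThhmmss[Z] as YYYY-MM-DDThh:mm:ss[Z].

    Illustrative only: TZID-based values are not handled here.
    """
    is_utc = value.endswith("Z")
    core = value[:-1] if is_utc else value
    # strptime validates the basic-format string and rejects malformed input.
    parsed = datetime.strptime(core, "%Y%m%dT%H%M%S")
    # isoformat() yields the extended form when microseconds are zero.
    return parsed.isoformat() + ("Z" if is_utc else "")


print(ical_to_xsd_datetime("20021024T093000Z"))
```

The conversion in this direction loses nothing, which is one reason the choice between the two profiles is a judgement call rather than a technical constraint.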

Example: which subclass relationships are consistent with iCalendar?

Certain iCalendar objects appear to have a subclass relationship to each other, or to some unnamed superclass, for example VEVENT, VTODO and VJOURNAL. Should we attempt to describe this relationship?

Deciding whether to depart from the letter of the iCalendar specification in order to do something useful in RDF

Example: how do we map a CAL-ADDRESS to a person?

In iCalendar, a CAL-ADDRESS is a URI, usually a mailto: URI. Implicitly there is a calendar user behind that address. Should we try to model that user in order to make it easier to work with other vocabularies that can talk about people, for example?
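The two options can be sketched as triples in Python. This is an illustration of the modelling choice only, not the mapping adopted in the schema: the prefixed name ical:attendee and the blank node label are assumptions of the sketch, while foaf:mbox is the FOAF property for a person's mailbox.

```python
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"


def attendee_as_address(event: str, cal_address: str) -> list:
    """Option 1: stay close to iCalendar; the attendee is just the URI."""
    return [(event, "ical:attendee", cal_address)]


def attendee_as_person(event: str, cal_address: str, node: str = "_:person1") -> list:
    """Option 2: introduce a node for the calendar user behind the address.

    The node can then be described further with other vocabularies.
    """
    return [
        (event, "ical:attendee", node),   # hypothetical property name
        (node, FOAF_MBOX, cal_address),
    ]


for triple in attendee_as_person("_:event1", "mailto:someone@example.org"):
    print(triple)
```

The second form costs an extra node per attendee but gives other vocabularies something to attach to.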

Example: should event recurrence rules be described using a rule language?

Recurrence rules in iCalendar are complex and contain defaults. How can we map this to a set of rules? What language should we use for that mapping? Or should we just encode the rules in RDF, leaving their semantics undefined?
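The last option, carrying the rule as structure without interpreting it, can be sketched in a few lines of Python. The function name parse_rrule and the lower-cased keys are our own choices for illustration; the sketch does not fill in defaults or expand the rule into dates, which is exactly the part whose semantics would remain undefined.

```python
def parse_rrule(value: str) -> dict:
    """Split an RRULE value such as 'FREQ=WEEKLY;COUNT=10;BYDAY=MO,WE'.

    BY* parts are kept as lists; everything else is kept as a string.
    No defaults are applied and no occurrences are computed.
    """
    parts = {}
    for item in value.split(";"):
        if not item:
            continue
        name, _, raw = item.partition("=")
        name = name.upper()
        parts[name.lower()] = raw.split(",") if name.startswith("BY") else raw
    return parts


print(parse_rrule("FREQ=WEEKLY;COUNT=10;BYDAY=MO,WE"))
```

Each key could then become a property on a node representing the rule, leaving interpretation to whichever application consumes it.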

How can we make sure we have used consistent patterns throughout the schema?

Example: when we make any of the above decisions, how can we ensure that we have applied them consistently? In general, which are the best RDF patterns to use, and how does this decision depend on how the vocabulary is used?

The problem is that there are many different ways of interpreting the iCalendar specification as RDF. Choosing between these can be difficult in the absence of precedents showing that a particular approach has been useful for applications. Consistently applying any approach is problematic in cases where it is difficult for humans to recognise repeating patterns because of the length or complexity of a vocabulary.

Requirements for RDFiCal

Experiences like these and the usecases developed at the Bristol workshop led to a number of quite specific requirements for an RDF conversion of iCalendar, which have governed the recent work on RDFiCal, and were designed to minimise inconsistencies in the schema and diversions from best practice.

Roundtripping to and from iCalendar

The ability to roundtrip unambiguously to and from iCalendar emerged as an essential requirement for the RDFiCal vocabulary. This is not just a question of converting 'legacy' data; many tools use iCalendar and will continue to do so, and to create data which cannot be used with these tools would only increase the degree of the Personal Information Disaster.

Creating a schema using real data

Since there is already a great deal of existing data in iCalendar, the RDFiCal vocabulary should reflect its established use in tools. Similarly, data describing usecases results in a robust vocabulary.

Community interaction

There is a significant community of interest in the RDF and calendaring area, and they need to be informed and be able to comment on and reject proposed parts of the vocabulary. An active and varied community can identify inconsistencies within the specification and with respect to best practice for RDF.

Paper trail

Decisions should be made in public and the reasons documented, to resolve any future ambiguities and mapping problems.

Mapping to other vocabularies

For RDFiCal to be a Semantic Web vocabulary, and to be useful for linking different types of information, the ability to map between other commonly used vocabularies and RDFiCal, and to link RDFiCal data to other data, is essential. Example vocabularies include FOAF[9] (people), Dublin Core (documents), Wordnet (a dictionary), and CYC (a large, comprehensive and detailed ontology, including such concepts as time and place).

The current process

At the end of the Bristol RDF Calendaring workshop in October 2002, we had a number of guidelines to drive the development of a new RDF Calendar vocabulary, and overcome some of the need for judgement calls.

Roundtripping to and from iCalendar

The first of these was to automate where possible. Many of the problems with the 'hybrid' version had arisen because of the length of the iCalendar specification, and the consequent problem of maintaining consistency within a manual conversion, making roundtripping difficult. This and the requirement for consistency with available data, gave us the following guideline:

Create an RDF schema for iCalendar which is mechanically derived from the iCalendar data

We have created two software tools that convert from iCalendar to RDF and one that converts from RDFiCal to iCalendar. The iCalendar to RDF tools create a syntactic conversion. As we have seen, this is not a straightforward process because of the judgement calls noted above. However, once these decisions have been made and the community has been consulted, a syntactic transformation of iCalendar data files is all that is required to create RDF versions. We have also created a repository of testcases in iCalendar and RDF, generated by existing tools (Mozilla calendar, Apple iCal, Evolution) and usecases (restaurant opening hours, bus timetables, conferences).
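As an illustration of what a purely syntactic transformation involves, here is a minimal Python sketch. It is not one of the tools described above. It assumes unfolded content lines, discards property parameters, and invents its own conventions: the namespace string, the capitalised class names and the component property are assumptions made for the example.

```python
ICAL_NS = "http://www.w3.org/2002/12/cal/ical#"  # assumed namespace form


def ical_to_triples(text: str) -> list:
    """Turn BEGIN/END blocks and NAME:VALUE lines into (s, p, o) triples."""
    triples = []
    stack = []      # open components, innermost last
    counter = 0
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        head, _, value = line.partition(":")
        name = head.split(";", 1)[0].upper()   # drop parameters such as TZID
        if name == "BEGIN":
            counter += 1
            node = f"_:b{counter}"
            triples.append((node, "rdf:type", ICAL_NS + value.capitalize()))
            if stack:
                # Nested component, e.g. a VEVENT inside a VCALENDAR.
                triples.append((stack[-1], ICAL_NS + "component", node))
            stack.append(node)
        elif name == "END":
            if stack:
                stack.pop()
        elif stack:
            triples.append((stack[-1], ICAL_NS + name.lower(), value))
    return triples


sample = "\n".join([
    "BEGIN:VCALENDAR",
    "BEGIN:VEVENT",
    "SUMMARY:Workshop",
    "DTSTART:20021009T090000Z",
    "END:VEVENT",
    "END:VCALENDAR",
])
for triple in ical_to_triples(sample):
    print(triple)
```

Every judgement call discussed earlier (dates, status values, people) shows up here as a place where a plain string is emitted and a decision has been deferred.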

Mapping to other vocabularies

Among the testcase repository are a number of usecase-driven examples. We have also used a wiki to write up links with vocabularies such as FOAF and GEO[10].

Creating a schema using real data

The schema we use is generated from accepted testcases using CWM (Closed World Machine), a forward-chaining inference engine. This meant that the schema reflected real data use, rather than the schema driving the use of the data. This bottom-up approach ensured consistency with the substantial established usage of iCalendar. It also effectively ensured that the use of RDFiCal and the schema were consistent.
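The bottom-up idea can be sketched without CWM. The following Python stand-in (plain Python substituted for the rule engine, with hypothetical prefixed names) collects the classes that appear in instance data and, for each property, the classes of the subjects it was observed on.

```python
from collections import defaultdict


def derive_schema(triples):
    """Return (classes, properties) observed in instance data.

    properties maps each property to the sorted classes of its subjects.
    """
    type_of = {s: o for s, p, o in triples if p == "rdf:type"}
    classes = sorted(set(type_of.values()))
    domains = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        # Record the property even when the subject is untyped.
        domains[p].update([type_of[s]] if s in type_of else [])
    return classes, {p: sorted(c) for p, c in domains.items()}


data = [
    ("_:e1", "rdf:type", "ical:Vevent"),
    ("_:e1", "ical:summary", "Workshop"),
    ("_:e1", "ical:dtstart", "20021009T090000Z"),
    ("_:t1", "rdf:type", "ical:Vtodo"),
    ("_:t1", "ical:summary", "Write report"),
]
classes, properties = derive_schema(data)
print(classes)
print(properties)
```

A term that never occurs in an accepted testcase never reaches the schema, which is the property the paragraph above relies on.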

Community Interaction

Other guidelines are social, to do with maintaining links with the community, and taking the advice of the community on difficult issues, and being open with the community, for example:

Create a W3C namespace for a schema, http://www.w3.org/2002/12/cal/

This means that we have a place to put the schema and test data which has a good persistence policy.

Announce all changes to the schema to the www-rdf-calendar mailing list. If anyone objects, within 7 days, we'll back out the changes (for further discussion).

Status says 'If the CVS date below haven't changed in the last two months, active developments have likely ceased'

Another decision was to use IRC to discuss development and implementations. This resulted in discussion with Chandler, Apple iCal, Mozilla calendar and RDF calendar developers.

Paper trail

We announced virtual meetings to the RDF calendar mailing list, and also the logs of the meetings, any changes to the schema and any actions.

The result is an automatically generated schema which is consistent with the established use of iCalendar, which was created with community input; which has documented reasons for decisions, and which is underpinned by testcases and examples.

Mapping RDFiCal to other RDF vocabularies

ICalendar refers to people indirectly (through 'calendar user-agent') and locations directly (via LOCATION and GEO). Our approach has been to model these within the RDFiCal vocabulary in order to enable roundtripping, but also to map these particular classes and properties to overlapping vocabularies. For describing people we have mapped calendar user agent to the FOAF vocabulary, and for location to the GEO vocabulary. These initial points of contact between the vocabularies provide connectives to the rest of the FOAF vocabulary for describing people, and to other data described using the FOAF and GEO vocabularies.

Applications and tools

As part of the process of creating the iCalendar vocabulary, two tools have been written to convert iCalendar to RDFiCal, and one to create iCalendar from RDFiCal[11]. A web-based tool has been created that enables the direct creation of RDFiCal data[12].

Several tools have been created which use RDFiCal or its antecedents. For example, the Retsina[13] Semantic Web Calendar Agent uses the 'hybrid' model together with the Dublin Core and FOAF vocabularies, and enables importing events into Microsoft Outlook. Sherpa[14] Calendar enables creation, publishing and search of RDFiCal events. An IRC-based robot has been created that allows simple querying of public events harvested from the web as RDFiCal and rebroadcast using a RESTful RDF query service[15].

Tools to help users create RDFiCal data in combination with other vocabularies are an interesting next step that will allow the integration of RDFiCal with other vocabularies.

Unresolved Issues

Event identification

Deciding when one event is the same event as another is in the general case a notoriously difficult problem. Proximity in space and time is generally not sufficient. In practical terms, two meetings may be in close proximity in time and space but not be the same meeting; teleconferences do not have a common location. The specific case of deciding when two descriptions of an event refer to the same event is also difficult. A conference may be described as a single event lasting three days, a single day event repeated three times, or a series of related events running sequentially and in parallel. These different ways of describing the same event make reference by description, for example using a query, inaccurate in some cases.

To make interesting connections between events, some way of definitely or approximately inferring that two descriptions refer to the same event is essential. Where event descriptions are generated by many different people and tools, using an explicit unique identifier for the event is impractical. One proposed solution is to declare the homepage of an event to be an owl:InverseFunctionalProperty, so that OWL reasoners can deduce that two descriptions refer to the same event. This is obviously not useful where the event in question does not have a homepage (such as a flight), and may be damaging in cases where people make cataloguing mistakes, for example mixing up the homepage of a conference with the homepage of a particular paper within the conference. This also applies where event data is automatically generated, but behind the automation is a human who must understand these kinds of distinctions.
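The effect of treating a homepage as inverse functional can be sketched in Python as a simple merge of nodes. This is a simplification rather than an OWL reasoner: it assumes each description carries at most one homepage value, and the property names are placeholders.

```python
def smush(triples, ifp):
    """Merge subjects that share a value for the inverse functional property."""
    canonical = {}   # ifp value -> first subject seen with that value
    alias = {}       # subject -> the subject it is merged into
    for s, p, o in triples:
        if p == ifp:
            alias[s] = canonical.setdefault(o, s)
    # Rewrite every triple onto the surviving node and drop duplicates.
    return sorted({(alias.get(s, s), p, alias.get(o, o)) for s, p, o in triples})


descriptions = [
    ("_:a", "homepage", "http://example.org/conf"),
    ("_:a", "summary", "Conf"),
    ("_:b", "homepage", "http://example.org/conf"),
    ("_:b", "location", "Bristol"),
]
for triple in smush(descriptions, "homepage"):
    print(triple)
```

The same mechanism shows why a cataloguing mistake is damaging: a single wrong homepage value silently merges two unrelated events, and nothing in the merged data records that it happened.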

A different approach may be to use software which can approximately match events from their descriptions, i.e. identification occurs when the data is used rather than when it is created.
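A sketch of what such matching might look like follows, in Python. Everything about the scoring is an assumption made for illustration: the equal weights, the 0.6 threshold, and the use of word overlap in the summary are arbitrary choices, not a tested method.

```python
from datetime import datetime


def time_overlap(e1: dict, e2: dict) -> float:
    """Overlap as a fraction of the shorter event's duration (0.0 to 1.0)."""
    overlap = (min(e1["end"], e2["end"]) - max(e1["start"], e2["start"])).total_seconds()
    shorter = min(e1["end"] - e1["start"], e2["end"] - e2["start"]).total_seconds()
    return max(0.0, overlap) / shorter if shorter > 0 else 0.0


def title_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lower-cased words."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def probably_same_event(e1: dict, e2: dict, threshold: float = 0.6) -> bool:
    """Time overlap alone scores at most 0.5, so it can never decide a match."""
    score = 0.5 * time_overlap(e1, e2) + 0.5 * title_similarity(e1["summary"], e2["summary"])
    return score >= threshold


start, end = datetime(2002, 10, 9, 9), datetime(2002, 10, 9, 17)
first = {"start": start, "end": end, "summary": "Semantic Web Calendaring Workshop"}
second = {"start": start, "end": end, "summary": "SWAD-Europe Calendaring Workshop"}
third = {"start": start, "end": end, "summary": "Board meeting"}
print(probably_same_event(first, second), probably_same_event(first, third))
```

Because the decision is made at query time, a consumer can tune the threshold to its own tolerance for false merges instead of inheriting one fixed at creation.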

Helping humans create machine-processible data

In general, when humans create data for machine-processing, many errors and omissions creep in: mistakes of time and place, inaccuracies of identification, and omissions because the information is obvious to them and so does not need to be described for them. When the information is shared and processed, these errors and omissions become a significant problem for the usecases we have described. Tools to enhance the creation of event data, enabling the reuse of the means of identification of events, people and things, are an important part of minimising these errors; documentation and clear examples of best practice are also vital means of improving data creation.

Finding, combining and searching public and private event data

Current models of searching distributed RDF data include the 'Google' model and the 'Web Services' model. For event data, the 'Google' model assumes that it will be useful to be able to search a centralised, harvested repository of structured event data. This would indeed be a useful thing to be able to do for public events, but is inappropriate for some event data which can contain personally and commercially sensitive information. A distributed system which can enable the location and merging of private and public information would be a more useful solution, since only then can individuals schedule events accurately, and avoid the personal information disaster described above. This will be true of many kinds of information on the Semantic Web, for example, personal addressbooks and public contact information; public and private documents. Managing this public/private split, degrees of privacy, and groups of trusted individuals is also one of the major challenges for personal information management tools at the current time. Consistent interfaces to distributed, discoverable data sources with comprehensive trust policies go some way towards describing what is required for this type of information. Working towards a common format for public and private event data gets us some way towards determining the requirements for a Personal Information Management tool for events.

Conclusions

This paper describes the development of an RDF vocabulary for calendar information [16], [17] which we believe is robust, tested, and consistent both with the way that the iCalendar developer community uses the specification, and with the way the RDF community requires calendar information to be described. We are still a long way away from solving the personal information disaster and from effectively searching and visualizing the complexity of relationships between events and people, images, documents and locations; however the existence of applications which use the RDFiCal vocabulary goes some way towards indicating that modelling iCalendar in RDF is both feasible and useful.

References

[1] F. Dawson and D. Stenerson: Internet Calendaring and Scheduling Core Object Specification (iCalendar) , 1998-11, Internet Engineering Task Force, Standards Track http://www.ietf.org/rfc/rfc2445.txt

[2] Tim Berners-Lee: A quick look at iCalendar, 2000-10-02, v 1.43 http://www.w3.org/2000/01/foo

[3] Michael Arick and Libby Miller: 'Hybrid' RDF iCalendar schema, June 2001 http://www.ilrt.bris.ac.uk/discovery/2001/06/schemas/ical-full/hybrid.rdf

[4] Dan Connolly: iCalendar RDF Model, http://www.w3.org/2000/10/swap/pim/ical.rdf

[5] Dan Connolly: Palm datebook RDF model http://www.w3.org/2000/08/palm56/datebook

[6] Libby Miller: SWAD-Europe Deliverable 3.7: Developer Workshop Report 2 - Semantic Web calendaring, 2002-10-24 http://www.w3.org/2001/sw/Europe/reports/dev_workshop_report_2/

[7] Dan Connolly: The XML Revolution, Nature, 1998 http://www.nature.com/nature/webmatters/xml/xml.html

[8] Misha Wolf, Charles Wicksteed: W3C Note: Date and Time Formats, 15th September 1997, http://www.w3.org/TR/NOTE-datetime

[9] Dan Brickley, Libby Miller: FOAF Vocabulary Specification 2003-10-15, http://xmlns.com/foaf/0.1/

[10] RDF Geo vocab workspace 2003-01, http://www.w3.org/2003/01/geo/

[11] Morten Frederiksen : Round-tripping RDF/iCal, 2003-08-20 http://www.wasab.dk/morten/2003/06/cal/action.html

[12] Masahide Kanzaki : An introduction to RDF version of iCalendar, 2003-11-07 http://kanzaki.com/docs/sw/rdf-calendar.html

[13] RETSINA Semantic Web Calendar Agent http://www.daml.ri.cmu.edu/Cal/

[14] Sherpa Calendar http://www.sherpasuite.com/

[15] Libby Miller: WhoWhatWhenWhere October 2003, http://swordfish.rdfweb.org/discovery/2003/10/whwhwhwh/

[16] Dan Connolly and Libby Miller: RDF Calendar Workspace, 2003-04-09, W3C RDF Interest Group http://www.w3.org/2002/12/cal/

[17] iCalendar RDF Schema, http://www.w3.org/2002/12/cal/ical.rdf

[18] Mozilla Calendar http://www.mozilla.org/projects/calendar/

[19] Apple iCal http://www.apple.com/ical/

[20] RDF Interest Group http://www.w3.org/RDF/Interest/

[21] Leigh Dodds: The RDF Calendar Task Force, 2001-07-25, xml.com http://www.xml.com/pub/a/2001/07/25/rdfcalendar.html