This draft: 2003-11-12, libby.miller@bristol.ac.uk. See notes
Recent work on the Mozilla calendar project and the iCal Apple application has stimulated interest in the subject of creating, displaying and exchanging calendars. These products make it simple for users to create and make available calendars represented in the iCalendar standard (RFC 2445), and to subscribe to the calendars they are interested in which have been created by other people. Because people are able to export their calendar information, there is a great deal of data available in the iCalendar format, making calendars a potential source of substantial quantities of Semantic Web data.
Calendaring is an interesting Semantic Web application because it is such an important part of people's day-to-day lives, and because it also encompasses difficult philosophical and modelling problems, and social, trust and privacy questions. It straddles all levels of the Semantic Web from simple queries to find out who is at a conference, to complex scheduling problems involving logic.
This paper focuses on a small part of this space, the apparently straightforward problem of how to convert an established, fairly well-understood, well-used vocabulary for calendaring into a Semantic Web vocabulary (ontology). It describes the reasons for creating the RDFiCal vocabulary and the iterative community process used to create it. It explains why we decided on a conversion process which was both highly automated and highly social, and describes why we think that modelling iCalendar in RDF is both feasible and useful.
Since early 2001, the RDF Interest Group has had an informal taskforce examining calendaring and scheduling issues and RDF. This arose from an initial exercise by Tim Berners-Lee which described how one might approach the problem of converting iCalendar to RDF, and later, from discussions at the W3C Technical plenary in 2001.
There have been several attempts to model calendaring formats in RDF, including the 'hybrid' model, another iCalendar model and a Palm model. ICalendar is the standard format used by most devices and software for calendaring and scheduling, but is also lengthy, complex and difficult to implement in its entirety.
A SWAD-Europe workshop on calendaring and the Semantic Web, held in October 2002 in Bristol, UK, produced many usecases for describing calendar events using an RDF vocabulary, and also a plan of action as to how the process of creating such a vocabulary might proceed.
"The bane of my existence is doing things I know the computer could do for me." Dan Connolly, The XML Revolution, 1998
There are two broad classes of usecase for RDFiCal. The first concerns the need to have all your calendar information in one format, in order to be able to schedule meetings effectively. The second concerns connecting events with other types of thing: people, documents, locations.
Despite the existence and wide uptake of RFC 2445 iCalendar, many people still have scattered schedules described in email text, web pages of conferences and workshops, output formats of various calendars, including iCalendar, and other (sometimes proprietary, sometimes binary) formats. This information is held in public and private places. It is not easy to combine this information or search it. This leads to time-consuming manual copying and pasting from webpages to PDAs and other devices and calendar applications. It can result in errors in scheduling leading to missed meetings and missed opportunities, and a general inertia about scheduling, since simply finding out what you are supposed to be doing can be very difficult.
As an example, suppose that your organisation uses an iCalendar-compliant calendaring system to enable scheduling within the organisation. So, how do you schedule a meeting with someone from outside that organisation? How do you tell your family and friends about work commitments?
Or, suppose you are going to a conference. How do you get the information from the web page of the conference into your own calendar? How does your airline let you know when your flights are? How do you schedule impromptu meetings with interesting people from outside your organisation while at the conference?
There are two intermingled issues here. One concerns privacy and data protection: your family are not allowed to see the schedules for all your work colleagues; your work colleagues should not need to know about all your non-work events, and so on. This paper will not examine this issue, except to note that it is a non-trivial problem for any part of the Semantic Web where private and public data is merged together (and perhaps retransmitted in some form).
The other issue is interoperability. It is simply the need to gather all the information relevant to your schedule and be able to examine it all in one place. Scheduling is a very complex social activity. It concerns priorities which may vary over time and change as events get nearer. It is something that requires your input and that of the other people you are scheduling a meeting with. It is something that requires you to have all the relevant information at your fingertips. Even if an agent (or secretary) does your scheduling for you, it or he requires all the relevant information in order to be able to make that decision. It simply makes no sense for scheduling information to be spread over many disparate sources.
Simply putting the calendar information in RDF doesn't help with this latter problem, except insofar as it represents a common format for the information. ICalendar also represents a common format for calendar information, so:
Events are the glue between people, places, documents, pictures. The reasons for having an RDF vocabulary for calendaring become compelling when we consider that (a) descriptions of events tend to spill out into other areas, and (b) creating vocabularies is difficult.
Most of the usecases which came from the Semantic Web Calendaring workshop in Bristol concerned mixing non-calendar with calendar information. For example:
When attending an event, be able to download into a Palm or similar all the information needed about the event, for example, maps, travel itinerary, information about the participants, agenda.
I haven't met most of the people coming to this meeting - what do they look like? Also, what papers have they written?
Who else is attending this conference? Who could I try to meet?
What pictures were taken at this conference?
What restaurants are close to this event? Are any of them recommended by people I know?
ICalendar does not (and, we argue below, should not) try to encompass all these different aspects of people, pictures, locations and documents. Events are one way of making connections between people, for example, but that does not mean that an event vocabulary should encompass all the different things we might want to know about people (their address, their papers, who they know), as well as all the aspects of events that we care about, such as date, time, attendee.
The advantage of encoding data in an RDF vocabulary for calendaring rather than iCalendar is that many different vocabularies can then be combined with event data, connecting people (or documents) without placing the burden of vocabulary design for many intersecting domains on one group.
Previous conversions of iCalendar to RDF had been created manually by one or two people and so, because of the length of RFC 2445, are somewhat inconsistent and uncertain.
For example, Michael Arick and Libby Miller created a model of iCalendar in RDF in June 2001, termed 'hybrid' because it represented the merging of two separate attempts at conversion. As the vocabulary was created there were many judgement calls to be made, for example:
The STATUS of a VEVENT (an event) in iCalendar can be one of TENTATIVE, CONFIRMED, or CANCELLED. These could be represented as a class with three possible values, or as URIs, or as literals. It was not clear how best to encode this information within an RDF schema.
ICalendar uses the ISO 8601 profile YYYYMMDDThhmmss; W3C's XML Schema uses a slightly different profile, both syntactically (using W3C Date Time format, YYYY-MM-DDThh:mm:ss) and in terms of specification of the timezone, which in iCalendar uses an identifier for a timezone component and in W3C Date Time format uses an offset from UTC.
A CAL-ADDRESS is a URI, usually a mailto: URI. Implicitly there is a calendar user behind that address. Should we try to model that user in order to make it easier to work with other vocabularies that can talk about people, for example?
Recurrence rules in iCalendar are complex and contain defaults. How can we map this to a set of rules? What language should we use for that mapping? Or should we just encode the rules in RDF, leaving their semantics undefined?
A VCALENDAR is simply a group of VEVENTs or other calendar components such as VTODOs. What are the semantics of this grouping?
Certain iCalendar objects appear to have a subclass relationship to each other, or to some unnamed superclass, for example VEVENT, VTODO and VJOURNAL. Should we attempt to describe this relationship?
When we make any of the above decisions, how can we ensure that we have applied them consistently? and in general,
Decisions like these, made in this translation, represented somewhat arbitrary judgement calls, making roundtripping to iCalendar difficult to automate. Broadly, the decisions we had to make came under the following headings:
Experiences like these and the usecases developed at the Bristol workshop led to a number of quite specific requirements for an RDF conversion of iCalendar, which have governed the recent work on RDFiCal.
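One of the judgement calls described earlier, the mismatch between the iCalendar and W3C date-time profiles, is mechanical enough to sketch in code. The following is a minimal Python illustration, not the conversion used by the RDFiCal tools: the function names are hypothetical, and it handles only the syntactic difference for floating and UTC (trailing `Z`) values. The harder part of the problem, mapping an iCalendar timezone identifier to a UTC offset, is deliberately left out.

```python
from datetime import datetime

ICAL_FORMAT = "%Y%m%dT%H%M%S"       # iCalendar profile: YYYYMMDDThhmmss
W3C_FORMAT = "%Y-%m-%dT%H:%M:%S"    # W3C Date Time profile: YYYY-MM-DDThh:mm:ss


def ical_to_w3c(value: str) -> str:
    """Convert an iCalendar date-time string to the W3C Date Time syntax.

    Assumption: the value is either floating (no suffix) or UTC (trailing Z).
    Values that rely on a TZID parameter are not handled here.
    """
    is_utc = value.endswith("Z")
    parsed = datetime.strptime(value.rstrip("Z"), ICAL_FORMAT)
    return parsed.strftime(W3C_FORMAT) + ("Z" if is_utc else "")


def w3c_to_ical(value: str) -> str:
    """Inverse of ical_to_w3c, under the same assumptions."""
    is_utc = value.endswith("Z")
    parsed = datetime.strptime(value.rstrip("Z"), W3C_FORMAT)
    return parsed.strftime(ICAL_FORMAT) + ("Z" if is_utc else "")
```

Because each function is the inverse of the other for the values it accepts, a date-time converted to RDF and back is unchanged, which is the property the roundtripping requirement asks for.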
The ability to roundtrip unambiguously to and from iCalendar emerged as an essential requirement for the RDFiCal vocabulary. This is not just a question of converting 'legacy' data; many tools use iCalendar and will continue to do so, and to create data which cannot be used with these tools would only increase the degree of the Personal Information Disaster.
For RDFiCal to be a Semantic Web vocabulary, and to be useful for linking different types of information, the ability to map between other commonly used vocabularies and RDFiCal, and to link RDFiCal data to other data, is essential. Example vocabularies include FOAF (people), Dublin Core (documents), WordNet (a dictionary), CYC (a large, comprehensive and detailed ontology, including such concepts as time and place).
'Data that isn't used rots'@@.
Since there is already a great deal of existing data in iCalendar, the RDFiCal vocabulary should reflect its established use in tools. Similarly, data describing usecases results in a robust vocabulary.
There is a significant community of interest in the RDF and calendaring area, and they need to be informed and be able to comment on and reject proposed parts of the vocabulary.
Decisions should be made in public and the reasons documented, to resolve any future ambiguities and mapping problems.
At the end of the Bristol RDF Calendaring workshop in October 2002, we had a number of guidelines to drive the development of a new RDF Calendar vocabulary, and overcome some of the need for judgement calls.
The first of these was to automate where possible. Many of the problems with the 'hybrid' version had arisen because of the length of the iCalendar specification, and the consequent problem of maintaining consistency within a manual conversion, making roundtripping difficult. This, and the requirement for consistency with available data, gave us the following guideline:
Create an RDF schema for iCalendar which is mechanically derived from the iCalendar data
We have created two software tools that convert from iCalendar to RDF and one that converts from RDFiCal to iCalendar. The iCalendar to RDF tools create a syntactic conversion. As we have seen, this is not a straightforward process because of the judgement calls noted above. However, once these decisions have been made and the community has been consulted, a syntactic transformation of iCalendar data files is all that is required to create RDF versions. We have also created a repository of testcases in iCalendar and RDF, generated by existing tools (Mozilla calendar, Apple iCal, Evolution) and usecases (restaurant opening hours, bus timetables, conferences).
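To make the idea of a syntactic conversion concrete, here is a minimal Python sketch. It is not one of the tools described above, and several things in it are assumptions made for illustration: the `ical#` fragment namespace under the W3C base URI, the `component` property linking a calendar to its components, the capitalised class names, and the use of blank-node labels as subjects. It also ignores property parameters, line folding, value escaping and typed values, all of which a real converter has to deal with.

```python
# Assumed namespace: the W3C base URI from this paper plus an "ical#" fragment.
ICAL_NS = "http://www.w3.org/2002/12/cal/ical#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def ical_to_triples(text):
    """Turn simple iCalendar text into (subject, predicate, object) triples.

    Each BEGIN:X line opens a new blank node typed as a class named after X;
    each NAME:VALUE line inside it becomes a property of that node.
    """
    triples = []
    stack = []      # currently open components, innermost last
    counter = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        name = name.split(";", 1)[0].upper()    # drop any parameters
        if name == "BEGIN":
            counter += 1
            node = f"_:c{counter}"
            triples.append((node, RDF_TYPE, ICAL_NS + value.capitalize()))
            if stack:
                # Hypothetical linking property between nested components.
                triples.append((stack[-1], ICAL_NS + "component", node))
            stack.append(node)
        elif name == "END":
            if stack:
                stack.pop()
        elif stack:
            triples.append((stack[-1], ICAL_NS + name.lower(), value))
    return triples
```

The point of the sketch is that no per-property knowledge is needed: every iCalendar property name is mapped to an RDF property by the same rule, which is what makes the conversion repeatable and the reverse direction tractable.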
Among the testcase repository are a number of usecase-driven examples. We have also used a wiki to write up links with vocabularies such as FOAF and GEO.
The schema we use is generated from accepted testcases using CWM (Closed World Machine), a forward-chaining inference engine. This means that the schema reflects real data use, rather than the schema driving the use of the data. This bottom-up approach ensures consistency with the substantial established usage of iCalendar. It also effectively ensures that the use of RDFiCal and the schema are consistent.
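The bottom-up derivation can be illustrated with a short Python sketch. This is not the CWM rule set used for RDFiCal, which is not reproduced in this paper; it is a plain-Python stand-in that assumes the testcases are available as (subject, predicate, object) triples. The function name `derive_schema` is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"


def derive_schema(triples):
    """Derive schema statements from instance data.

    Anything used as the object of rdf:type is declared a class;
    any other predicate that appears is declared a property.
    """
    schema = set()
    for _subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            schema.add((obj, RDF_TYPE, RDFS_CLASS))
        else:
            schema.add((predicate, RDF_TYPE, RDF_PROPERTY))
    return sorted(schema)
```

A schema produced this way contains only terms that occur in accepted testcases, so it cannot drift away from how the data is actually written.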
Other guidelines are social, to do with maintaining links with the community, and taking the advice of the community on difficult issues, and being open with the community, for example:
Create a W3C namespace for a schema, http://www.w3.org/2002/12/cal/
This means that we have a place to put the schema and test data which has a good persistence policy.
Announce all changes to the schema to the www-rdf-calendar mailing list. If anyone objects within 7 days, we'll back out the changes (for further discussion).
Status says 'If the CVS date below haven't changed in the last two months, active developments have likely ceased'
Another decision was to try to use IRC to discuss development and implementations. This resulted in discussion with Chandler, Apple iCal, Mozilla calendar and RDF calendar developers.
We announced virtual meetings to the RDF calendar mailing list, and also the logs of the meetings, any changes to the schema and any actions.
The result is an automatically generated schema which is consistent with the established use of iCalendar, which was created with community input, which has documented reasons for decisions, and which is underpinned by testcases and examples.
We almost never want to talk about an event without talking about people, documents or places. Where is this conference? What else is close by? Who else is attending? What documents are relevant?
A single vocabulary for events is unlikely to be detailed enough to convey all the information we would like to have about events, people, places, documents. For example, in iCalendar, CATEGORIES are free-text keywords, but it would be interesting to use controlled vocabularies or dictionaries such as WordNet to describe the category of meeting. Or, of an ATTENDEE, it would be useful to know more than their name and email address, for example what they have written, who they work for, where they are based and who they know.
Vocabularies which describe many different fields of interest do exist but they are usually developed by a small group. Attempting to develop such a vocabulary with input from the relevant communities would be very difficult and time-consuming. In a process which involves communities who have an interest, it is much more efficient to devolve the creation of specific vocabularies to specific communities. Therefore being able to use these different vocabularies together in predictable ways is very important.
ICalendar refers to people indirectly (through 'calendar user-agent') and locations directly (via 'location' and 'geo'). Our approach has been to model these within the RDFiCal vocabulary in order to enable roundtripping, but also to map these particular classes and properties to overlapping vocabularies. For describing people we have mapped calendar user agent to the FOAF vocabulary, and for geo to the GEO vocabulary. These initial points of contact between the vocabularies provide connectives to the rest of the FOAF vocabulary for describing people, and to other data described using the FOAF and GEO vocabularies.
Resina? semaview? Masaka's tools? ical2rdf. @@hm...data will drive applications@@
@@others? tools?@@
When people create descriptions of events (and other 'metadata') they make mistakes and omissions. This is relevant to how we identify events in RDFiCal. Tools and best practice examples to help people are important.
Deciding when one event is the same event as another is, in the general case, a notoriously difficult problem. Proximity in space and time is generally not sufficient. @@more, ref@@ In practical terms, two meetings may be in close proximity in time and space but not be the same meeting; teleconferences do not have a common location. The specific case of deciding when two descriptions of an event refer to the same event is also difficult. A conference may be described as a single event lasting three days, a single day event repeated three times, or a series of related events running sequentially and in parallel. These different ways of describing the same event make reference by description, for example using a query, inaccurate in some cases.
To make interesting connections between events, some way of definitely or approximately inferring that two descriptions refer to the same event is essential. Where event descriptions are generated by many different people and tools, using an explicit unique identifier for the event is impractical. One proposed solution is to treat the homepage of an event as an owl:InverseFunctionalProperty, so that OWL reasoners can deduce that two descriptions refer to the same event. This is obviously not useful where the event in question does not have a homepage (such as a flight), and may be damaging in cases where people make cataloging mistakes, for example mixing up the homepage of a conference with the homepage of a particular paper within the conference.
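The effect of treating a homepage as inverse functional can be sketched without an OWL reasoner. The Python below is an illustration only: it assumes each event description is a flat dictionary of property names to values, with a hypothetical `homepage` key, and it merges descriptions that share a homepage value. A reasoner would work over RDF graphs rather than dictionaries.

```python
def smush_by_homepage(descriptions):
    """Merge event descriptions that share the same 'homepage' value.

    Each input is a dict of property -> value. Each output is a dict of
    property -> set of values. Descriptions without a homepage are never
    merged with anything.
    """
    by_homepage = {}
    result = []
    for description in descriptions:
        homepage = description.get("homepage")
        if homepage is None:
            merged = {}
            result.append(merged)
        else:
            if homepage not in by_homepage:
                by_homepage[homepage] = {}
                result.append(by_homepage[homepage])
            merged = by_homepage[homepage]
        for prop, value in description.items():
            merged.setdefault(prop, set()).add(value)
    return result
```

The sketch also shows why mistakes are costly: if one description carries the wrong homepage, all of its properties are silently attached to an unrelated event, and nothing in the merged result records where each value came from.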
In general, when humans create data for machine-processing, many errors and omissions creep in: mistakes of time and place, inaccuracies of identification, and omissions because the information is obvious to them and so does not need to be described for them. When the information is shared and processed, these errors and omissions become a significant problem for the usecases we have described. Tools to enhance the creation of event data, enabling the reuse of the means of identification of events, people and things, are an important part of minimising these errors; documentation and clear examples of best practice are also vital means of improving data creation. When data is created in a distributed way for sharing, data creation tools and techniques - the practice of 'cataloging' - become a universal problem, not one confined to information professionals.
@@Some events are generated automatically...still an issue?@@
A different approach may be to use software which can approximately match events from their descriptions, i.e. identification occurs when the data is used rather than when it is created.
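As a rough illustration of matching at the time of use, the following Python sketch scores how likely two descriptions are to refer to the same event. Everything about it is an assumption made for the example: the dictionary keys (`summary`, `start`, `end`), the use of string similarity on summaries, and the equal weighting of title and time. As noted above, proximity in time is not sufficient on its own, so a score like this could only rank candidates for a person or a further process to confirm.

```python
from datetime import datetime
from difflib import SequenceMatcher


def match_score(a, b):
    """Return a score between 0 and 1 for two event descriptions.

    Each description is a dict with 'summary' (str) and 'start' / 'end'
    (datetime). The score averages title similarity and the fraction of
    the combined time span that the two events share.
    """
    title = SequenceMatcher(
        None, a["summary"].lower(), b["summary"].lower()
    ).ratio()
    overlap = (min(a["end"], b["end"]) - max(a["start"], b["start"])).total_seconds()
    span = (max(a["end"], b["end"]) - min(a["start"], b["start"])).total_seconds()
    time = max(overlap, 0.0) / span if span > 0 else 1.0
    return 0.5 * title + 0.5 * time
```

A conference described once as a three-day event and once as three single-day events would score poorly on the time component here, which is one reason a usable matcher would need more evidence than this sketch considers.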
In no order yet
[1]
[2]
[3] iCalendar RDF Schema, http://www.w3.org/2002/12/cal/ical.rdf
[4]
[5]
[6]
[7]
[8]
[9] RDFIG Geo vocab workspace, 2003-01 http://www.w3.org/2003/01/geo/
[10]
[11] Mozilla Calendar http://www.mozilla.org/projects/calendar/
[12] Apple iCal http://www.apple.com/ical/
[13] RDF Interest Group http://www.w3.org/RDF/Interest/
[14] W3C technical plenary, March 2001 http://www.w3.org/2001/02/allgroupoverview.html
[15]
[16]
[17]
[18]
2003-11-12 todo:
- pull out links and create references
- abstract and conclusion
- apps
- replace 'vocabulary' with 'ontology'?
2003-11-10
- tried to summarise each section, in tags; gradually filling out the sections.
DanC's initial email to rdf-interest: http://lists.w3.org/Archives/Public/www-rdf-interest/1999Nov/0010.html
The point of the paper could be: yes, modelling iCalendar in RDF (a) is feasible and (b) is useful.
Libby's talk: http://www.w3.org/2001/sw/Europe/talks/200311-rdfical/all.html
Questions I was asked included:
- how to identify events - using inverseFunctionalProperty?
- what is the killer app for calendaring?
- were there any tools for creating/displaying RDFiCal (especially combined with other namespaces)
People liked the discussion about devolving vocabulary creation to different groups of experts, with not too much coordination, and so got the importance of extensibility.
http://www2004.org/cfp/refereed.html
submission date: November 14, 2003
final papers due: February 28, 2004
Libby is travelling 11-15 Nov; intermittent network access 11, 12; attending a workshop 13, 14
[[ We encourage authors to submit concise papers with up to 8 pages; however, papers with up to 10 pages may be submitted, and an additional 2 pages (for a maximum of 12) may be purchased at a cost of $100/page. Over-length or incorrectly formatted submissions may be rejected without reviews. Final copies of accepted papers will be required in both PDF and XHTML formats. ]]
style to be used: http://www2004.org/cfp/www2004-submission.htm
Introduction: Modelling iCalendar in RDF (a) is feasible and (b) is useful.
- much of this text was adapted from the Bristol calendaring workshop report.
History:
- W3C RDF Interest Group calendaring taskforce created March 2001, following previous work by Tim Berners-Lee
- 'hybrid' RDF iCalendar vocabulary created June 2001 by Michael Arick and Libby Miller
- Other schemas based on Palm tools by Dan Connolly and others
@@http://lists.w3.org/Archives/Public/www-rdf-calendar/2001Jul/0001.html useful email@@
Previous versions: Some issues were:
* resolving ambiguities/clarifying/understanding RFC 2445
* identifying best practice in the RDF part of the translation
* identifying how usecases affect the RDF patterns used
* identifying which parts of the specification were used by applications
* deciding whether to divert from the letter of RFC 2445 in order to do something useful in RDF
"the hybrid schema makes a number of judgement calls to make clear the model behind the iCalendar mime-directory RFC 2445. This means that it's difficult to automate the conversion between iCalendar and RDF, which seems to be essential in the current circumstances." (@@http://www.w3.org/2001/sw/Europe/reports/dev_workshop_report_2/@@)
Combining vocabularies is important. The RDFiCal approach is to model overlapping properties and classes within the RDFiCal vocabulary, and then map these to other vocabularies. This is to preserve roundtripping. @@not actually done all of this! but perhaps a way forward@@
@@note: something about the importance of devolving vocabulary creation to expert groups with little or no interaction with each other? this gives a strong reason to use RDF for the latter usecase@@
[[ The search services do know which part of your page is the title, because the tag in the HTML markup tells them. Why not just add and and and such tags to HTML? Because
# technically, it would produce a mess: HTML is hard enough to process now, and if we make it harder, we reduce the chance that new tools will come along and make the Web smarter.
# socially, it wouldn't work: the HTML specification is maintained by a small group of experts who are trusted to Do The Right Thing on behalf of the community; that small group doesn't have expertise in all subjects that may be covered by Web pages, and if we added that expertise to the group, it would be too large to function. It is much better to give everyone a tool that they can easily adapt for their own particular needs. ]]
http://www.nature.com/nature/webmatters/xml/xml.html seems very relevant here