Back in the day
updated about 6 years ago; latest suggestion about 6 years ago
Dates are easy, in the abstract. Then I started working on a project where I had to parse dates like 'mid 1930s' from large chunks of prose.
Once I'd stopped gibbering, I realised that there's nothing wrong with a date like 'mid 1930s': people talk about dates with varying degrees of precision all the time. The question is how to parse them meaningfully.
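As a taste of the problem, here is a minimal Ruby sketch that turns a phrase like 'mid 1930s' into a range of years. It is not the matcher from the talk. The regex, the `parse_decade_phrase` name, and the year boundaries chosen for 'early', 'mid' and 'late' are all assumptions made for illustration.

```ruby
# Hypothetical year offsets within a decade. Where "early" ends and
# "mid" begins is a judgement call, not a standard.
QUALIFIER_YEARS = {
  "early" => 0..3,
  "mid"   => 3..6,
  "late"  => 6..9,
  nil     => 0..9 # bare decade, e.g. "1930s"
}.freeze

# Optional qualifier, then a four-digit year ending in 0, then "s".
DECADE_PATTERN = /\b(?:(early|mid|late)[\s-]+)?(\d{3}0)s\b/i

# Returns an inclusive Range of years, or nil if no decade phrase is found.
def parse_decade_phrase(text)
  match = DECADE_PATTERN.match(text)
  return nil unless match

  qualifier = match[1]&.downcase
  decade_start = match[2].to_i
  offsets = QUALIFIER_YEARS.fetch(qualifier)
  (decade_start + offsets.first)..(decade_start + offsets.last)
end

parse_decade_phrase("built in the mid 1930s by local craftsmen") # => 1933..1936
parse_decade_phrase("Late-1920s")                                # => 1926..1929
parse_decade_phrase("sometime in the 1980s")                     # => 1980..1989
```

Returning a range instead of a single year keeps the vagueness in the data. The parser does not have to guess a point.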
These kinds of dates, from almost-complete dates like 'January 2013' to very vague decade dates like 'circa 2000s', have several kinds of precision. One is the certainty of the date (definitely January 2013, or maybe January 2013). Another is the width of its possible range (January 2003 is narrower than 2003, which is narrower than the 2000s). A third is whether it represents a point in time (a single event that happened some time in the 2000s) or a span of time (manufactured during the 2000s).
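One way to hold these kinds of precision as data is sketched below in Ruby. The `FuzzyDate` struct and its field names are hypothetical, not the representation used in the talk. It records the earliest and latest possible dates, a certainty flag, and whether the date is a point or a span.

```ruby
require "date"

# Hypothetical representation of a wooly date; field names are illustrative.
FuzzyDate = Struct.new(:earliest, :latest, :certain, :kind, keyword_init: true) do
  # Width of the window the date could fall in, in days (inclusive).
  def width_in_days
    (latest - earliest).to_i + 1
  end

  # :point means one event somewhere in the window;
  # :span means something that lasted across the window.
  def point?
    kind == :point
  end
end

# "January 2003": a definite event within one month.
january_2003 = FuzzyDate.new(earliest: Date.new(2003, 1, 1), latest: Date.new(2003, 1, 31),
                             certain: true, kind: :point)
# "possibly 2003": an uncertain event within one year.
year_2003 = FuzzyDate.new(earliest: Date.new(2003, 1, 1), latest: Date.new(2003, 12, 31),
                          certain: false, kind: :point)
# "manufactured during the 2000s": a span covering the decade.
the_2000s = FuzzyDate.new(earliest: Date.new(2000, 1, 1), latest: Date.new(2009, 12, 31),
                          certain: true, kind: :span)

[january_2003, year_2003, the_2000s].map(&:width_in_days) # => [31, 365, 3653]
```

Range width, certainty and point-versus-span are stored as independent fields. A parser can fill in each one separately.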
In this talk I'll take you through what I discovered about how people write dates, how I went about parsing them, and what to do with them once you have them represented as data. We'll have particular fun with questions like 'Which is earlier, spring 1930 or mid 1930?' and 'Where does winter come?'. We'll also look at the concrete things I was able to do with a bunch of wooly dates.
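The question 'Which is earlier?' needs a rule, not a lookup. Here is one possible rule, sketched in Ruby. The following are all assumptions rather than the answers the talk gives: the month boundaries (Northern Hemisphere meteorological seasons), 'mid' meaning May to August, winter running into the following year, and ordering by midpoint.

```ruby
require "date"

# Assumed month ranges for each phrase word. Months above 12 roll into the
# next year, so "winter 1930" means December 1930 to February 1931.
PERIOD_MONTHS = {
  "spring" => [3, 5],
  "summer" => [6, 8],
  "autumn" => [9, 11],
  "winter" => [12, 14],
  "mid"    => [5, 8]
}.freeze

# Returns [earliest, latest] Dates for a phrase such as "spring 1930".
def fuzzy_window(phrase)
  word, year = phrase.downcase.split
  first_month, last_month = PERIOD_MONTHS.fetch(word)
  start_of_year = Date.new(year.to_i, 1, 1)
  earliest = start_of_year >> (first_month - 1)
  # First day of the month after last_month, minus one day.
  latest = (start_of_year >> last_month) - 1
  [earliest, latest]
end

# Order by the midpoint of the window, breaking ties by the earliest date.
def sort_key(phrase)
  earliest, latest = fuzzy_window(phrase)
  [earliest + (latest - earliest).to_i / 2, earliest]
end

phrases = ["winter 1930", "mid 1930", "spring 1930"]
phrases.sort_by { |p| sort_key(p) } # => ["spring 1930", "mid 1930", "winter 1930"]
fuzzy_window("winter 1930").map(&:to_s) # => ["1930-12-01", "1931-02-28"]
```

For these three phrases, sorting by the earliest possible date gives the same order. The two rules disagree when one window sits inside a much wider one. For example, '1930' starts before 'mid 1930', but its midpoint falls slightly later.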
This is an interesting topic and I'd like to hear more about it. I've never understood how software should handle ambiguous durations like "a month" (or even "a year", depending on how much you care about precision), so I'd be curious to know whether that sort of thing presented a problem.
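Ruby's standard `Date` class makes the ambiguity raised in this comment easy to see. This illustrates the question; it is not something taken from the proposal.

```ruby
require "date"

# Date#>> adds calendar months and clamps to the last day of the target
# month, so "one month later" does not always add the same number of days.
jan_31 = Date.new(2013, 1, 31)
feb_end = jan_31 >> 1
feb_end.to_s            # => "2013-02-28"
(feb_end - jan_31).to_i # => 28

# Counting the days in each month of 2013 shows "a month" spans 28 to 31 days.
month_lengths = (1..12).map do |month|
  first = Date.new(2013, month, 1)
  ((first >> 1) - first).to_i
end
month_lengths.minmax # => [28, 31]
```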
I've clarified this in the proposal: I'm specifically talking about processing dates in text, not dates that users enter into date fields. I'm not planning to cover the kinds of fuzzy input fields Leo and Murray mentioned. I will be covering how I developed the matcher, and how I proved it was good enough for my data.
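One common way to check whether a matcher is good enough is to run it over a hand-labelled sample and measure precision and recall. This is an assumption here and not necessarily how the talk proves it. `LabelledExample`, `evaluate` and the sample data below are hypothetical.

```ruby
# Hypothetical harness; the names and sample data are illustrative.
LabelledExample = Struct.new(:text, :expected)

# Scores a matcher (any callable taking text and returning the matched
# date phrase or nil) against hand-labelled examples.
def evaluate(matcher, examples)
  true_positives = false_positives = false_negatives = 0

  examples.each do |example|
    found = matcher.call(example.text)
    if found && found == example.expected
      true_positives += 1
    elsif found
      # A wrong match is a false positive, and also a miss if a date was there.
      false_positives += 1
      false_negatives += 1 if example.expected
    elsif example.expected
      false_negatives += 1
    end
  end

  {
    precision: true_positives.fdiv([true_positives + false_positives, 1].max),
    recall: true_positives.fdiv([true_positives + false_negatives, 1].max)
  }
end

examples = [
  LabelledExample.new("made in the 1930s", "1930s"),
  LabelledExample.new("made in the mid 1930s", "mid 1930s"),
  LabelledExample.new("no date here", nil)
]

# A deliberately naive matcher that only recognises bare decades.
naive_matcher = ->(text) { text[/\b\d{3}0s\b/] }
scores = evaluate(naive_matcher, examples)
```

The naive matcher scores 0.5 on both measures because it extracts only '1930s' from 'mid 1930s'. Re-running the same labelled sample after each change gives one repeatable way to decide when a matcher is ready. Each newly encountered phrasing can then be added to the sample.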
I would be very interested if your talk mentioned (and ideally tied into) HCI research around how users perceive "fuzzy" input fields and whether they help or hinder the process of entering data.
Ideally I would walk away from the talk either convinced that my next app will be improved by fuzzy date input fields (and armed with the technical chops to do it), or that it would be a waste of time as they don't provide any appreciable improvements over standard day-month-year selects.
On the web I'm used to filling in dates via the standard three selects. Mostly they're enhanced with a date picker, and in a few cases with something that I guess is backed by Chronic. I'd be interested to hear how fuzzy dates were used in your application, and why you had to parse them in the first place! Is it just to make things easier for the users entering data, and if so, did it actually help?
For completeness' sake, I'd love to hear about the approach you took when developing your fuzzy matcher. How did you decide when it was ready to ship, and how often did you have to go back and enhance it after encountering another form of fuzzy date?
I think this sounds interesting. I'd hope to learn not just about dates, but also how you dealt with this inherent fuzziness in Ruby.
BTW, should 'definitely January 2013, maybe January 2013' be more like 'definitely January 2003, maybe January 2013'?
That's a good point. I edited the proposal, which hopefully makes more sense to you now :-)
I think I'd have a better grasp on what you mean by wooly dates if you added a few more examples to the proposal. What's the most-specific-yet-still-wooly date?