Supporting user-defined multiple granularities for the management and querying of temporal clinical information

Starting date: November 30, 2004
Duration (months): 24
Departments: Computer Science
Managers or local contacts: Posenato Roberto
URL: 2004094558_003
Keywords: TEMPORAL GRANULARITY, TEMPORAL DATABASES, CLINICAL DATABASES, SEMISTRUCTURED DATA

TEMPORAL DATA AND MEDICINE
Time is important in clinical medicine for defining the diagnosis, identifying the therapy, and defining the prognosis [Combi98]: these decision-making activities take place over time and rely on information that is temporally characterized. To assess a diagnosis, the physician reconstructs the clinical history of the patient, usually composed of previous pathologies, therapies, and symptoms as narrated by the patient; this information complements data collected directly from the patient, such as blood pressure and heart rate, each identified by its temporal location [Combi95]. To support the computer-based storage and retrieval of this information, we need to represent the temporal dimension of clinical data. The temporal dimension considered is usually the valid time, i.e., the time during which the information is true in the modeled reality [Jensen98].

GRANULARITY
The granularity of a given piece of temporal information is the level of abstraction at which the information is expressed. Different units of measure allow one to represent different granularities [Snodgrass95, Bettini98a].
Granularity arises in several application domains, such as geographic information systems, planning and scheduling, medical information systems, office information systems, real-time systems, and natural language processing [Brusoni99, Chittaro00, Combi97, Koubarakis99, Maiocchi92, Montanari96, Snodgrass95, Staab99]. In general, the temporal dimension of information can be expressed with different time granularities, e.g., "from 12:30, October 12, 1997 to December 25, 1997", "from August 1996 to January 1997", "in 1996 for three days", "from June 24, 1995 for 6 months". Moreover, queries over the stored temporal information often use granularities unrelated to the ones used when the data were stored. Supporting different granularities involves several research topics, such as the representation of multiple granularities and the modeling and querying of temporal data given at different granularities.

REPRESENTING GRANULARITIES
Several research directions deal with the issue of time granularity. Some works propose frameworks for the formal definition of multiple granularities and of the relationships among them [Bettini98a, Clifford88, Goralwalla01, Montanari92, Montanari96]. Other research efforts focus on granularity and calendars [Snodgrass95]. In most of these proposals, a granularity is represented as a partitioning of the basic time line.
Clifford and Rao [Clifford88] introduce a general structure for time domains called "temporal universe" which consists of a totally ordered set of granularities (e.g., years, months, and days). Operations are defined on a temporal universe, which basically convert different anchored times to a (common) finer granularity before carrying out the operation. Wang et al. [Wang93] extend this work by providing semantics for moving up and down a granularity lattice. Montanari et al. [Montanari92] examined the issue of representing multiple granularities in temporal reasoning systems. In [Goralwalla01], granularity is modeled as a special kind of unanchored temporal primitive that can be used as a unit of time. Granularities are accommodated within the context of calendars and granularity conversions are presented and discussed in terms of unanchored durations of time.
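As an illustration of converting anchored times to a common finer granularity before operating on them, the following minimal sketch maps times given in years, months, or days to intervals of days and then compares them. The function names `to_day_interval` and `overlaps` are hypothetical, and the code is not the set of operations defined in [Clifford88]; it only assumes the Gregorian calendar as implemented by the Python standard library.

```python
import calendar
from datetime import date


def to_day_interval(year, month=None, day=None):
    """Map a time given at year, month, or day granularity
    to the (first_day, last_day) interval it covers."""
    if month is None:
        # Year granularity: the whole calendar year.
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        # Month granularity: monthrange gives the number of days in the month.
        last = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last)
    # Day granularity: a degenerate one-day interval.
    d = date(year, month, day)
    return d, d


def overlaps(a, b):
    """True if two day intervals share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


therapy = to_day_interval(1996, 8)      # "August 1996"
visit = to_day_interval(1996, 8, 15)    # "August 15, 1996"
print(overlaps(therapy, visit))
```

Once both values are expressed in days, the comparison no longer depends on the granularity at which each value was originally recorded.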
Bettini et al. [Bettini98b] propose a general framework for defining time granularity systems and analyze different kinds of relationships between granularities. In [Bettini99], Bettini and De Sibi propose formal definitions and a mathematical characterization of finite and periodical time granularities.
An algebraic framework for time granularities has been proposed by Ning et al. [Ning02]. In this framework, a bottom granularity is assumed, and a finite set of calendar operators is used to create new granularities by suitably manipulating existing ones. A granularity is hence identified by an algebraic expression.
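The idea of building granularities from a bottom granularity through operators can be sketched as follows. This is only an illustration with a single, hypothetical grouping operator (`group`); it does not reproduce the operator set of [Ning02]. It assumes that granules are contiguous runs of bottom-granularity indices and that indices start at 0.

```python
from typing import Callable

# A granularity maps a granule index to the run of bottom-granularity
# indices that the granule covers (assumption: granules are contiguous).
Granularity = Callable[[int], range]


def bottom(i: int) -> range:
    """Bottom granularity: each granule is a single index."""
    return range(i, i + 1)


def group(g: Granularity, m: int) -> Granularity:
    """Hypothetical operator: merge every m consecutive granules of g."""
    def grouped(i: int) -> range:
        first = g(i * m)           # first granule of the group
        last = g(i * m + m - 1)    # last granule of the group
        return range(first.start, last.stop)
    return grouped


# Each granularity is identified by an expression over the bottom one.
day = bottom
week = group(day, 7)
fortnight = group(week, 2)

print(list(week(1)))
print(fortnight(1))
```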
A recent approach for the definition of temporal granularities is described in [Combi04] and is based on the adoption of a propositional temporal logic to express sets of granularities.

MODELING AND QUERYING TEMPORAL DATA WITH DIFFERENT GRANULARITIES
TSQL2 [Snodgrass95] is a temporal extension to the SQL-92 language standard. Granularity and indeterminacy are considered in TSQL2 both in the data model and in the query language. Conversions of data between different granularities are allowed. A probabilistic approach is taken in managing the indeterminacy that arises when converting from coarser granularities to finer ones.
Combi et al. [Combi97] propose an object-oriented data model and a related query language to deal with temporal information given at different granularities and/or with indeterminacy: they propose the adoption of a three-valued logic for managing uncertainty coming from relationships between intervals and/or instants given at different granularities.
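To make the role of a three-valued logic concrete, the sketch below evaluates a "before" relation between two instants whose exact position is known only up to an interval of days, as happens when one of them is given at a coarser granularity. The representation, the function names `before` and `and3`, and the use of `None` for the undefined truth value are assumptions made for illustration (a Kleene-style logic); they are not the definitions given in [Combi97].

```python
from typing import Optional, Tuple

# An indeterminate instant: it lies somewhere in [earliest, latest],
# both expressed as day indices (assumed representation).
Instant = Tuple[int, int]


def before(a: Instant, b: Instant) -> Optional[bool]:
    """Three-valued 'a before b': True, False, or None (undefined)."""
    if a[1] < b[0]:
        return True      # every possible position of a precedes b
    if a[0] >= b[1]:
        return False     # no possible position of a precedes b
    return None          # the granularities do not allow a decision


def and3(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    """Kleene-style conjunction over True, False, and None."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


symptom_onset = (60, 90)   # known only at month granularity
admission = (74, 74)       # known at day granularity
print(before(symptom_onset, admission))
```

Here the answer is undefined because the admission day falls inside the month in which the symptom onset is located.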
In [Snodgrass99] different semantics of temporal queries are introduced, but with only limited support for different granularities.

REPRESENTING AND QUERYING TEMPORAL DATA ON THE WEB
In recent years, the increasing amount of information accessible through the Web has posed new challenges to academic and industrial database research.
In this context, data are either structured, when coming from relational or object-oriented databases, or semistructured or completely unstructured, when they consist of simple collections of text or image files.
A number of research projects have addressed the problem of accessing semistructured data in a uniform way [Abiteboul97].
Among these, we cite LOREL [McHugh97], UnQL [Buneman96], WebSQL [Mendelzon96], WebOQL [Arocena98], StruQL [Fernandez97], and G-Log [Paredaens95].
It is a common approach to represent semistructured data by using data models based on labeled graphs [Buneman97,Paredaens95,Consens90,Ceri99].
As in the classical database field, in the context of semistructured data it is also interesting to take into account the dynamic aspects of data, i.e., their evolution through time and possibly through consecutive updates, in order to query and constrain how data change over time.
The problem of representing and querying changes in semistructured data is addressed in [Chawathe98, Chawathe99].
The authors introduce a model, DOEM, and a language, Chorel, for querying over data and changes. The model uses annotations on both the nodes and edges of OEM graphs (see [Papakonstantinou95]) to represent changes. Intuitively, annotations represent the history of nodes and edges. This method increases the number of nodes in the final graph over which queries are evaluated.
A data model based on labeled graphs, which represents both static and dynamic aspects of semistructured data while reducing the amount of annotations needed, is presented in [Oliboni01] together with an SQL-like language for querying it.
In this model, in order to represent dynamic aspects of semistructured databases and to allow querying about their evolution through time, the information about the valid time of objects and relations is added to the document graph.
This approach allows one to store the minimal amount of information, and to avoid duplication of nodes and edges. The mentioned SQL-like query language TS-QL, aimed at querying semistructured data modeled by means of semistructured temporal graphs, uses the concept of path expression, which identifies a path on the information graph.
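The following sketch illustrates the general idea of attaching valid time directly to a labeled graph and evaluating a path expression at a given time point. It is not the data model of [Oliboni01] nor TS-QL syntax: the `Edge` class, the `follow` function, the integer time points, and the sample clinical data are all hypothetical, and for brevity only edges carry a valid time here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Edge:
    source: str
    label: str
    target: str
    valid_from: int   # first time point at which the relation holds
    valid_to: int     # last time point at which the relation holds (inclusive)


def follow(edges: Iterable[Edge], start: str, path: List[str], at: int) -> Set[str]:
    """Follow a sequence of edge labels from start, using only edges valid at time 'at'."""
    edges = list(edges)
    current = {start}
    for label in path:
        current = {
            e.target
            for e in edges
            if e.source in current
            and e.label == label
            and e.valid_from <= at <= e.valid_to
        }
    return current


# Invented sample data: two therapies with disjoint valid times.
graph = [
    Edge("patient", "therapy", "t1", 10, 20),
    Edge("t1", "drug", "aspirin", 10, 20),
    Edge("patient", "therapy", "t2", 21, 30),
    Edge("t2", "drug", "heparin", 21, 30),
]

print(follow(graph, "patient", ["therapy", "drug"], at=15))
```

Because each edge is stored once with its validity interval, the history is represented without duplicating nodes or edges for each state of the graph.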
Recently, temporal aspects of semistructured data have been considered in [Combi04a, Combi03, Combi03a]: the focus is on the definition of constraints for valid and transaction times in semistructured and XML data and also on specific features of semistructured clinical data.

Sponsors:

Ministero dell'Istruzione dell'Università e della Ricerca
Funds: assigned and managed by the department
Syllabus: COFIN - Progetti di Ricerca di Interesse Nazionale

Project participants

Alberto Belussi, Associate Professor
Carlo Combi, Full Professor
Roberto Posenato, Associate Professor
Research areas involved in the project
Information systems and data analysis
Data management systems
