Evolution and Quality Management in Dynamic Data Integration Systems

 

STIC-AmSud scientific-technological cooperation program: 2008-2010

PRiSM Laboratory; Centro de Informática da UFPE; Laboratoire LSIS, Université Paul Cézanne; LIA - UFC; InCo - FIng - UdeLaR


Abstract

Data integration is a problem faced by large enterprises and organizations that need integrated access to distributed data sources. In recent years, several architectures have been proposed to solve this problem, including federated databases, the mediator architecture and peer database management systems (PDBMS). The general principle behind these solutions is that, in order to offer integrated access to distributed data, they must provide semantic mappings between the data sources.

The development of data integration systems, regardless of the architecture they are based on (e.g. mediation systems, P2P architectures or GRID infrastructures), poses two major problems: evaluating the quality of the information offered by these systems, and maintaining the semantic mappings that connect the data sources. Without techniques for quality evaluation and for mapping maintenance, data integration systems can become inoperative and obsolete.

Indeed, without data quality evaluation, the data offered by these systems cannot reliably support decision making; information about the freshness and precision of the data is crucial for that task. Moreover, the environment over which a data integration system is built is not static and may evolve frequently. Consequently, to keep the data integration system operational, it is necessary to dynamically reconsider the semantic links and adapt them to the changes. Otherwise, the data integration system becomes progressively useless. The additional cost of this dynamic maintenance of semantic links may increase dramatically with the volume and frequency of change events. Data quality evaluation and evolution management are not independent processes: the evolution of the data sources frequently causes changes in data quality. Therefore, in this project these two problems will be considered together and integrated solutions will be proposed.

The overall objective of the project is the development of techniques, algorithms and tools to support evolution and quality management in data integration systems. Different types of data integration systems will be considered, including well-structured ones, such as mediation systems, and less structured ones, such as peer data management systems. Besides the technical and scientific results, this project will be of fundamental importance for strengthening the collaboration among the partners and for fostering new partnerships.

Participants