Invited Talks

Ahmed K. Elmagarmid (Qatar Computing Research Institute)

  • Title: Insight into Data Cleaning and Linkage
  • Abstract:
    In this talk, we will start by discussing some of the obstacles to creating effective and efficient data cleaning solutions. We will then discuss some open challenges and critique current conservative approaches to the critical problems of data repair, cleaning and de-duplication. Finally, we will discuss some of our work at QCRI and Purdue in this area, such as Guided Data Repair, record linkage, Web data cleaning and user-assisted data cleaning.
  • Bio:
    Ahmed K. Elmagarmid is the Qatar Foundation's executive director of the Qatar Computing Research Institute. Before joining Qatar Foundation, he was director of both the Indiana Center for Database Systems and the Cyber Center at Purdue University's Discovery Park, and a professor of Computer Science at Purdue, where he taught and conducted research for 25 years. Elmagarmid's research interests span a broad spectrum of foundational and application-oriented database research. He serves on several editorial boards and program committees, and he is the editor-in-chief of the journal Distributed and Parallel Databases and of the book series Advances in Database Systems. Early in his career, in 1988, he received the National Science Foundation's Presidential Young Investigator award from President Ronald Reagan. The Ohio State University (1993) and the University of Dayton (1995) have both named him among their distinguished alumni. He is a Fellow of the IEEE and an ACM Distinguished Scientist.

Wenfei Fan (University of Edinburgh)

  • Title: Data Quality: Theory and Practice
  • Abstract:
    Real-life data are often dirty: inconsistent, inaccurate, incomplete, stale and/or duplicated. The prevalent use of the Internet has increased, on an unprecedented scale, the risk of creating and propagating dirty data. Dirty data are estimated to cost US industry alone billions of dollars each year. There is no reason to believe that the scale of the problem is any different in any other society that depends on information technology. This highlights the need for the study of data quality. This talk provides an overview of recent advances in the area of data quality, from theory to practical techniques. We present a conditional dependency theory for capturing data inconsistencies, matching dependencies for data deduplication, relative information completeness for characterizing incomplete data, and a data currency model for querying possibly stale data. We also briefly discuss techniques for automatically discovering data quality rules, detecting errors in real-life data, and correcting those errors with performance guarantees.
  • Bio:
    Wenfei Fan is Chair of Web Data Management in the School of Informatics, University of Edinburgh, UK. He is a Fellow of the Royal Society of Edinburgh, UK, a National Professor of the Thousand-Talent Program and a Yangtze River Scholar, China. He received his PhD from the University of Pennsylvania, USA, and his MS and BS from Peking University, China. He is a recipient of the Alberto O. Mendelzon Test-of-Time Award of ACM PODS 2010, the Best Paper Award for VLDB 2010, the Roger Needham Award in 2008 (UK), the Best Paper Award for IEEE ICDE 2007, the Outstanding Overseas Young Scholar Award in 2003, the Best Paper of the Year Award for Computer Networks in 2002, and the Career Award in 2001 (USA). His current research interests include database theory and systems, in particular data quality, data integration, database security, distributed query processing, query languages, social networks, Web services and XML.