The Nineteenth Annual Applied Statistics Workshop


An Overview and Examination of Data Mining

Richard D. De Veaux
Department of Mathematics
Williams College


Data mining is the exploration and analysis of large data sets, by automatic or semiautomatic means, with the purpose of discovering meaningful patterns. These patterns, or rules, are then used for decision making via a process known as knowledge discovery. Much of exploratory data analysis and inferential statistics concerns the same problems. What's different about data mining? What's similar? We will attempt to answer these questions by providing a broad survey of the problems that motivate data mining and the approaches that are used to solve them.

Data mining comprises techniques from Computer Science, Machine Learning and Statistics. Like Statistics, it concerns itself with learning, or generalization from data, but typically only retrospectively. In typical business applications, data mining is the obvious next step after data warehousing. It has the potential to rank among the most strategic applications in organizations because of the enormous payoffs it can bring. Some of the more notable applications include fraud detection, identifying good (and bad) credit risks, product warranty management, evaluating the effectiveness of retail promotions, and customer life cycle management. Recent innovative applications of data mining have included clinical trials and relating biological activity to chemical structure.

We will start the discussion by defining data mining and the knowledge discovery process and by describing the typical applications that motivate them. We will next survey the data mining techniques most commonly found in commercial data mining tools, including neural networks, decision trees, K-nearest neighbor methods and MARS (multivariate adaptive regression splines). We will discuss the role of Statistics and statistical thinking in data mining and what the field of Statistics can bring to the data mining effort. We will conclude with applications and case studies of data mining in a variety of fields, including Marketing, Telecommunications, E-commerce and Bioinformatics.


Schedule of the Day

Registration Form

Campus Map to University Student Union


For further information contact

Connie Vadheim at vadheim@humc.edu or [310] 222-3842
or
Nancy Berman at berman@gcrc.humc.edu or [310] 222-1874