Monday, December 29, 2008

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition - Ian H. Witten and Eibe Frank

The second edition of this book, published in 2005 and written by Ian H. Witten and Eibe Frank, professors at the University of Waikato in New Zealand, is, in my opinion, the most complete and comprehensive book about data mining. The authors describe data mining as solving problems by analyzing data already present in databases, and their book is about looking for patterns in data.

They divide the book into two parts: the first is called Machine Learning Tools and Techniques, and the second The Weka Machine Learning Workbench. In the first part, which consists of 8 chapters, they start by describing what data mining and machine learning are, along with fielded applications. Next they describe the concepts, instances and attributes involved in data mining, which they call the input. They treat knowledge representation as the output. Knowledge representation is a key topic in classical artificial intelligence, and the word knowledge is used simply because they need some word to refer to the structures that learning methods produce. One chapter describes the types of knowledge representation, from the simplest decision tables to decision trees (which they call a "divide-and-conquer" approach), classification rules, association rules, rules with exceptions, rules involving relations, trees for numeric prediction, instance-based representation and clusters.

Another chapter describes the basic algorithms in detail, with a discussion section at the end of each method. The authors explain how to evaluate what has been learned, and how to implement real machine learning schemes that build on the types of knowledge representation. They describe the transformations applied to the input and output, and finish the first part by showing the main applications of data mining.

The last part is about Weka, an open source collection of machine learning algorithms and data preprocessing tools for data mining tasks. It includes all the algorithms described in the book. The Weka development project started at the University of Waikato in New Zealand, where the authors are professors and part of the academic staff that defined the project. Weka is an interesting data mining tool, and since 2006 it has formed the data mining and predictive analytics component of the Pentaho business intelligence suite. Pentaho has become a major sponsor of Weka development.

This is a nice book. The authors explain the subject using practical applications, and it is a good source for anyone interested in learning about data mining and machine learning.
