Showing posts with label Data Mining. Show all posts

Monday, December 29, 2008

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


Data Mining: Practical Machine Learning Tools and Techniques, Second Edition - Ian H. Witten and Eibe Frank

The second edition of this book, published in 2005 and written by Ian H. Witten and Eibe Frank, professors at the University of Waikato in New Zealand, is, in my opinion, the most complete and comprehensive book about data mining. The authors define data mining as solving problems by analyzing data already present in databases, and their book is about looking for patterns in data.

They divide the book into two parts: the first is called Machine Learning Tools and Techniques and the second is called The Weka Machine Learning Workbench. In the first part, which consists of 8 chapters, they start by describing what data mining and machine learning are, along with their fielded applications. Next they describe the concepts, instances and attributes involved in data mining, which they call the input. They consider the knowledge representation to be the output. Knowledge representation is a key topic in classical artificial intelligence, and the word knowledge is used simply because they need some word to refer to the structures that learning methods produce. One chapter describes the types of knowledge representation: from the simplest, decision tables, through decision trees, which they call a "divide-and-conquer" approach, to classification rules, association rules, rules with exceptions, rules involving relations, trees for numeric prediction, instance-based representation and clusters.

Another chapter, which is very detailed, describes the basic algorithms, with a discussion section at the end of each method. They explain how to evaluate what has been learned, and also how to implement real machine learning schemes for the types of knowledge representation. They describe the transformations applied to the input and output, and finish the first part by showing the main applications of data mining.

The last part is about Weka, an open source collection of machine learning algorithms and data preprocessing tools for data mining tasks. It includes all the algorithms described in the book. The Weka development project started at the University of Waikato in New Zealand, where the authors are professors and part of the academic staff that defined the project. Weka is an interesting data mining tool and, since 2006, it has formed the data mining and predictive analytics component of the Pentaho business intelligence suite; Pentaho has become a major sponsor of Weka development.

This is a nice book: the authors explain the subject using practical applications, and it is a good source for anyone interested in learning about data mining and machine learning.

Friday, September 12, 2008

Einstein was a Business Intelligence and Data Mining early adopter


The day before yesterday, Calumo, a BI/PM company, published on its blog an interesting post called CALUMO cites Einstein as Business Intelligence and Data Mining early adopter, in which they show how to solve a logic puzzle using the concepts of Business Intelligence.

According to Calumo's post:
"The logic puzzle below is attributed to Albert Einstein and it is said that only 2% of us can solve it.

The Puzzle
There are 5 houses in 5 different colours.
In each house lives a person with a different nationality.
These 5 owners drink a certain type of beverage, smoke a certain brand of cigar, and keep a certain pet.
No owners have the same pet, smoke the same brand of cigar or drink the same drink.

Hints
1. The Brit lives in a red house.
2. The Swede keeps dogs as pets.
3. The Dane drinks tea.
4. The green house is on the left of the white house.
5. The green house owner drinks coffee.
6. The person who smokes Pall Mall rears birds.
7. The owner of the yellow house smokes Dunhill.
8. The man living in the house right in the middle drinks milk.
9. The Norwegian lives in the first house.
10. The man who smokes Blend lives next door to the one who keeps cats.
11. The man who keeps horses lives next door to the man who smokes Dunhill.
12. The owner who smokes Blue Master drinks beer.
13. The German smokes Prince.
14. The Norwegian lives next to the blue house.
15. The man who smokes Blend has a neighbor who drinks water.

The Question: Who owns the fish?"

They broke the problem down into a matrix, or cube, comprising six dimensions with each dimension comprising five elements.

You can see the explanation on Calumo's blog.
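
For comparison, here is a minimal Python sketch that solves the same puzzle by brute force. It is not Calumo's cube approach: it simply tries every assignment of colours, nationalities, drinks, cigars and pets to the five house positions and discards those that break a hint. Two assumptions are mine, not the puzzle's: the houses stand in a single row numbered 0 to 4 from the left, and hint 4 means the green house is immediately to the left of the white one. The names `POSITIONS`, `next_to` and `solve` are made up for this illustration.

```python
from itertools import permutations

POSITIONS = range(5)  # house positions 0..4, counted from the left (assumption)


def next_to(a, b):
    """True when two house positions are adjacent."""
    return abs(a - b) == 1


def solve():
    """Return the nationality of the fish owner for every valid assignment.

    Each variable holds the position of the house with that attribute.
    """
    owners = []
    for red, green, white, yellow, blue in permutations(POSITIONS):
        if green + 1 != white:  # hint 4 (assumed: immediately to the left)
            continue
        for brit, swede, dane, norwegian, german in permutations(POSITIONS):
            if brit != red or norwegian != 0:  # hints 1 and 9
                continue
            if not next_to(norwegian, blue):  # hint 14
                continue
            for tea, coffee, milk, beer, water in permutations(POSITIONS):
                if dane != tea or green != coffee or milk != 2:  # hints 3, 5, 8
                    continue
                for pall_mall, dunhill, blend, blue_master, prince in permutations(POSITIONS):
                    if yellow != dunhill or blue_master != beer:  # hints 7, 12
                        continue
                    if german != prince or not next_to(blend, water):  # hints 13, 15
                        continue
                    for dogs, birds, cats, horses, fish in permutations(POSITIONS):
                        if swede != dogs or pall_mall != birds:  # hints 2, 6
                            continue
                        if not next_to(blend, cats):  # hint 10
                            continue
                        if not next_to(horses, dunhill):  # hint 11
                            continue
                        nationality_at = {
                            brit: "Brit", swede: "Swede", dane: "Dane",
                            norwegian: "Norwegian", german: "German",
                        }
                        owners.append(nationality_at[fish])
    return owners


if __name__ == "__main__":
    print(solve())
```

Checking the cheap hints in the outer loops prunes most combinations early, so the search finishes quickly even though it is plain enumeration. Under the assumptions above, it should report a single answer: the German owns the fish.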

BI/PM companies are becoming increasingly innovative in how they advertise their products and services. After iDashboards, a BI company, published a dashboard showing the results of the Olympics (I wrote a post about that), this is another interesting way to attract attention.

Saturday, April 12, 2008

The Ten Emerging Technologies in 2008


Technology Review magazine, published by MIT, ran a special report in its cover story and in its online edition about the 10 Emerging Technologies of 2008, which it defined as the 10 most exciting, world-changing technologies of the year.


All of the technologies are very interesting, but for those of us in IT, and especially in BI/PM, the concept of Modeling Surprise is amazing.

The definition of Modeling Surprise: it combines data mining and machine learning to create computer models that predict surprising events.

For those interested in data mining and machine learning, there is a good book:




- Data Mining: Practical Machine Learning Tools and Techniques, Second Edition - Ian H. Witten, Eibe Frank