It has been a while since I published a book review on this web site. I have read a lot of books in the meantime, but none of them seemed worth recommending. Programming Collective Intelligence by Toby Segaran not only captured my attention, it also captured my imagination.
The subtitle of the book probably describes the content a lot better: “building smart web 2.0 applications”. This book gives an overview of the algorithms that power many of the most popular web 2.0 sites today, providing features such as search engine page ranking, product recommendations, spam prevention, match-making, identification of related topics and collaborative filtering. Solutions to those problems involve crunching huge data sets and using smart and elegant algorithms. As someone coming from a Math background, I really enjoyed reading a book about smart algorithms for a change. If you like that sort of stuff, this book has plenty to offer: clustering, genetic algorithms, non-deterministic optimisation, Bayesian filtering, support-vector machines and even programs that automatically create other programs to facilitate learning in game AI. If you are not a Math geek, don’t worry. This book is written from a developer's perspective, and all the advanced Math used in the algorithms is explained in enough detail for an average programmer to follow.
All the examples in this book are written in Python. I am an occasional Python user, so an additional bonus for me was learning about several open-source Python extensions that Toby used to complete the tasks, including libraries for graph plotting, XML parsing and matrix operations. I guess that knowing Python is a prerequisite for reading the book effectively, but understanding just the basics should be good enough, as the focus is more on the algorithms and how to apply them effectively than on the actual code.
The book is about 350 pages from cover to cover, so it is fairly easy to read on a bus or train. I strongly recommend it to Math geeks and data crunchers, but it should also appeal to the broader programming community, especially with its focus on the forces behind popular web 2.0 sites.