必发365手机网页版|官网

                      
                      

                                  Showing posts with label recommendation systems. Show all posts
                                  Showing posts with label recommendation systems. Show all posts

                                  Monday, May 21, 2012

                                  Data Is More Important Than Algorithms


                                  Netflix Similarity Map

                                  In 2006 Netflix offered to pay a million dollar, popularly known as the Netflix Prize, to whoever could help Netflix improve their recommendation system by at least 10%. A year later Korbel team won the Progress Prize by improving Netflix's recommendation system by 8.43%. They also gave the source code to Netflix of their 107 algorithms and 2000 hours of work. Netflix looked at these algorithms and decided to implement two main algorithms out of it to improve their recommendation system. Netflix did face some challenges but they managed to deploy these algorithms into their production system.

                                  Two years later Netflix awarded the grand prize of $1 million to the work that involved hundreds of predictive models and algorithms. They evaluated these new methods and decided not to implement them. This is what they had to say:
                                  "We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment. Also, our focus on improving Netflix personalization had shifted to the next level by then."
                                  This appears to be strange on the surface but when you examine the details it totally makes sense.

                                  The cost to implement algorithms to achieve incremental improvement isn't simply justifiable. While the researchers worked hard on innovating the algorithms Netflix's business as well as their customers' behavior changed. Netflix saw more and more devices being used by their users to stream movies as opposed to get a DVD in mail. The main intent behind the million dollar prize for Netflix was to perfect their recommendation system for their DVD subscription plan since those subscribers carefully picked the DVDs recommended to them as it would take some time to receive those titles in mail. Customers wanted to make sure that they don't end up with lousy movies. Netflix didn't get any feedback regarding those titles until after their customers had viewed them and decided to share their ratings.

                                  This customer behavior changed drastically when customers started following recommendations in realtime for their streaming subscription. They could instantaneously try out the recommended movies and if they didn't like them they tried something else. The barrier to get to the next movie that the customers might like significantly went down. Netflix also started to receive feedback in realtime while customers watched the movies. This was a big shift in user behavior and hence in recommendation system as customers moved from DVD to streaming.

                                  What does this mean to the companies venturing into Big Data?

                                  Algorithms are certainly important but they only provide incremental value on your existing business model. They are very difficult to innovate and way more expensive to implement. Netflix had a million dollar prize to attract the best talent, your organization probably doesn't. Your organization is also less likely to open up your private data into the public domain to discover new algorithms. I do encourage to be absolutely data-driven and do everything that you can to have data as your corporate strategy including hiring a data a scientist. But, most importantly, you should focus on your changing business — disruption and rapidly changing customer behavior — and data and not on algorithms. One of the promises of Big Data is to leave no data source behind. Your data is your business and your business is your data. Don't lose sight of it. Invest in technology and more importantly in people who have skills to stay on top of changing business models and unearth insights from data to strengthen and grow business. Algorithms are cool but the data is much cooler.

                                  Monday, December 1, 2008

                                  Does Cloud Computing Help Create Network Effect To Support Crowdsourcing And Collaborative Filtering?

                                  Nick has a long post about Tim O'Reilly not getting the cloud. He questions Tim's assumptions on Web 2.0, network effects, power laws, and cloud computing. Both of them have good points.

                                  O'Reilly comments on the cloud in the context of network effects:

                                  "Cloud computing, at least in the sense that Hugh seems to be using the term, as a synonym for the infrastructure level of the cloud as best exemplified by Amazon S3 and EC2, doesn't have this kind of dynamic."

                                  Nick argues:

                                  "The network effect is indeed an important force shaping business online, and O'Reilly is right to remind us of that fact. But he's wrong to suggest that the network effect is the only or the most powerful means of achieving superior market share or profitability online or that it will be the defining formative factor for cloud computing."

                                  Both of them also argue about applying power laws to the cloud computing. I am with Nick on the power laws but strongly disagree with him on his view of cloud computing and network effects. The cloud at the infrastructure level will still follow the power laws due to the inherent capital intensive requirements of a data center and the tools on the cloud would help create network effects. Let's make sure we all understand what the powers laws are:

                                  "In systems where many people are free to choose between many options, a small subset of the whole will get a disproportionate amount of traffic (or attention, or income), even if no members of the system actively work towards such an outcome. This has nothing to do with moral weakness, selling out, or any other psychological explanation. The very act of choosing, spread widely enough and freely enough, creates a power law distribution."

                                  Any network effect starts with a small set of something and it eventually grows bigger and bigger - users, content etc. The cloud makes it a great platform for such systems that demand this kind of growth. The adoption barrier is close to zero for the companies whose business model actually depends upon creating these effects. They can provision their users, applications, and content on the cloud and be up and running in minutes and can grow as the user base and the content grows. This actually shifts the power to the smaller players and help them compete with the big cloud players and yet allow them to create network effects.

                                  The big cloud players, that are currently on the supply side of this utility mode, have few options on the table. They either can keep themselves to the infrastructure business and I would wear my skeptic hat and agree with a lot of people on the poor viability of this capital intensive business model that has very high operational cost. This option alone does not make sense and the big companies have to have a strategic intent behind such large investment.

                                  The strategic intent could be to SaaS up their tools and applications on the cloud. The investment and control over the infrastructure would provide a head start. They can also bring in partner ecosystem and crowdsource large user community to create a network effect of social innovation that is based on collective intelligence which in turn would make the tools better. One of the challenges with the recommendation systems that uses collaborative filtering is to be able to mine massive information that includes users' data and behavior and compute the correlation by linking it with massive information from other sources. The cloud makes a good platform for such requirements due to its inherent ability to store vast amount of information and perform massive parallel processing across heterogeneous sources. There are obvious privacy and security issues with this kind of approach but they are not impossible to resolve.

                                  Google, Amazon, and Microsoft are the supply side cloud infrastructure players that are already moving in the demand side of the tools business though I would not call them the equal players exploring all the opportunities.

                                  And last but not the least, there is a sustainability angle around the cloud providers. They can help consolidate thousands of data centers into few hundreds based on the geographical coverage, availability of water, energy, and dark fiber etc. This is similar to consolidating hundreds of dirty coal plants into few non-coal green power plants that can produce clean energy with efficient transmission and distribution system.

                                                      
                                                      

                                                                  entertainment

                                                                  Technology

                                                                  entertainment

                                                                  aviation

                                                                  search for

                                                                  education

                                                                  car

                                                                  Mobile phone

                                                                  constellation