Zestimate, Zillow’s “best guess” home-valuation algorithm, is getting an upgrade. It is more accurate today, with a median error rate of 5.6% compared to 14% when it launched 11 years ago, but Zillow, whose site draws 166 million monthly users, 90% of whom consult Zestimates, wants to drive that error rate down “a lot.”
To that end, Zillow is running a contest on the Kaggle platform for data teams anywhere to improve the Zestimate model. With a $1 million prize for the winning team, some 200 data teams are working right now to create an algorithm that better predicts sale prices for the approximately 110 million homes listed on Zillow. Kaggle will select 100 of those teams as finalists in January 2018, and winners will be announced in January 2019.
Much is at stake here for consumers as well as for service providers (real estate data networks such as Redfin and home.com, banks, real estate agents, etc.) that can spit out valuation quotes nearly instantaneously. For consumers, the current 5.6% median error rate means that half of the homes sold will sell within 5.6% of their Zestimate, and half will sell further from it. Such an error rate can be expensive. Just ask Zillow’s CEO, Spencer Rascoff: last year he sold a home for 40% less than its Zestimate and, on the other end, bought a home for $1.6M more than its Zestimate. Ouch! Zillow still admits that real estate agents are slightly better at predicting sale prices than its computations are. No surprise there!
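For readers who want the arithmetic spelled out, here is a minimal sketch of that “median error” statistic, assuming it is the median absolute percentage error (the definition implied by the 50/50 split above); the prices below are invented for illustration, not Zillow data:

```python
import numpy as np

def median_abs_pct_error(estimates, sale_prices):
    """Median absolute percentage error: half of all estimates land
    within this percentage of the actual sale price, half land outside."""
    estimates = np.asarray(estimates, dtype=float)
    sale_prices = np.asarray(sale_prices, dtype=float)
    return np.median(np.abs(estimates - sale_prices) / sale_prices)

# Hypothetical estimates vs. actual sale prices:
estimates = [310_000, 195_000, 520_000, 880_000]
sale_prices = [300_000, 210_000, 500_000, 900_000]
print(f"{median_abs_pct_error(estimates, sale_prices):.1%}")  # 3.7%
```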
Stan Humphries, Zillow’s chief analytics officer and the creator of the original Zestimate algorithm, now has a team of 15 data scientists and machine-learning engineers responsible for running 7.5 million statistical models nightly to update Zestimates. He believes the way to improve Zestimates’ error rate is to use hyperlocal data, since error rates vary by location. Cut the error rate by combining hyperlocal data with the hundreds of other data points already in use (comparable home sales, local assessed values, analyses of home pictures, etc.) and “…we’ll be able to drive the error rate a lot lower…(though)…neither humans nor computers will get to zero…we think that having more options (from hyperlocal data) will make consumers more comfortable.”
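Zillow has not published its pipeline, so the following is only a toy sketch of the hyperlocal idea: rather than one national model, fit a separate model per region (here, per ZIP code) so local pricing patterns drive each estimate. The ZIP codes, features, and price levels are all made up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def make_region(n_homes, price_per_sqft):
    """Generate synthetic sales for one region: price scales with
    square footage at a region-specific rate, plus noise."""
    sqft = rng.uniform(800, 3500, n_homes)
    price = price_per_sqft * sqft + rng.normal(0, 25_000, n_homes)
    return sqft.reshape(-1, 1), price

# Hypothetical ZIP codes with different local $/sqft levels.
regions = {"98101": 450, "98052": 320}

# One model per region instead of a single national model.
models = {}
for zip_code, ppsf in regions.items():
    X, y = make_region(200, ppsf)
    models[zip_code] = LinearRegression().fit(X, y)

# The same 2,000 sqft home gets a different, locally grounded
# estimate in each ZIP:
for zip_code, model in models.items():
    print(zip_code, f"${model.predict([[2000.0]])[0]:,.0f}")
```

In a real system, each regional model would also be fed the hundreds of signals Humphries mentions (comparable sales, assessed values, photo analysis), but the structural point is the same: the error rate drops when each model only has to explain its own neighborhood.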