Wed Oct 24, 2012, 12:41 PM
groovedaddy (6,184 posts)
Mining Truth From Data Babel - "Signal and the Noise"
A friend who was a pioneer in the computer games business used to marvel at how her company handled its projections of costs and revenue. “We performed exhaustive calculations, analyses and revisions,” she would tell me. “And we somehow always ended with numbers that justified our hiring the people and producing the games we had wanted to all along.” Those forecasts rarely proved accurate, but as long as the games were reasonably profitable, she said, you’d keep your job and get to create more unfounded projections for the next endeavor.
This doesn’t seem like any way to run a business — or a country. Yet, as Nate Silver, a blogger for The New York Times, points out in his book, “The Signal and the Noise,” studies show that from the stock pickers on Wall Street to the political pundits on our news channels, predictions offered with great certainty and voluminous justification prove, when evaluated later, to have had no predictive power at all. They are the equivalent of monkeys tossing darts.
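One common way to test whether confident forecasts beat "monkeys tossing darts" is to score them against what actually happened. The sketch below uses the Brier score for that. It is not taken from Silver's book, and the pundit's forecasts and outcomes are made-up illustrative numbers. A forecaster who always says 50% scores exactly 0.25, so a confident pundit who scores worse than that has added nothing.

```python
# A minimal sketch: score probabilistic forecasts with the Brier score and
# compare them to a "dart-throwing" baseline that always forecasts 50%.
# All forecasts and outcomes here are hypothetical, for illustration only.

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.
    Lower is better; always forecasting 0.5 scores exactly 0.25."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical pundit: very confident, frequently wrong.
pundit = [0.95, 0.90, 0.05, 0.90, 0.10, 0.95]
outcomes = [1, 0, 1, 0, 0, 1]          # what actually happened (1 = event occurred)
baseline = [0.5] * len(outcomes)       # the coin-flip "dart thrower"

print("pundit  :", round(brier_score(pundit, outcomes), 4))
print("baseline:", round(brier_score(baseline, outcomes), 4))
```

The squared error punishes confident misses heavily, which is exactly the failure mode described above: certainty without accuracy.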
As one who has both taught and written about such phenomena, I have long felt like leaning out my window to shout, “Network”-style, “I’m as mad as hell and I’m not going to take this anymore!” Judging by Mr. Silver’s lively prose — from energetic to outraged — I think he feels the same way.
The book’s title comes from electrical engineering, where a signal is something that conveys information, while noise is an unwanted, meaningless or random addition to the signal. Problems arise when the noise is as strong as, or stronger than, the signal. How do you recognize which is which?
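In engineering terms, "noise as strong as the signal" is usually expressed as a signal-to-noise ratio (SNR), the ratio of signal power to noise power, often quoted in decibels. Around 0 dB the two are equally strong, and below 0 dB the noise dominates. The sketch below assumes a hypothetical sine-wave signal and Gaussian noise at a few arbitrary strengths. It only illustrates the definition and is not anything from the book.

```python
import math
import random

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise),
    where each power is the mean of the squared samples."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

random.seed(0)
n = 1000
# Hypothetical signal: five full cycles of a unit sine wave (power 0.5).
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]

# Noise standard deviations chosen for illustration: weak, roughly equal, dominant.
for sigma in (0.1, 0.7, 2.0):
    noise = [random.gauss(0.0, sigma) for _ in range(n)]
    print(f"noise std {sigma}: SNR = {snr_db(signal, noise):.1f} dB")
```

With a standard deviation near 0.7 the noise power roughly matches the signal's, so the SNR lands near 0 dB. That is the regime where telling signal from noise gets hard.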
Response to groovedaddy (Original post)
Wed Oct 24, 2012, 12:56 PM
phantom power (25,966 posts)
1. Having been in that business, I'd say the biggest problem is usually getting the right data
And most of the problems with getting the right data are human/social, not technical.
Example: customers/clients who refuse to release data because it is 'proprietary', which prevents the data-mining guy (me) from building a model.
Example: the huge amount of really juicy and useful data you can mine from the web, which almost immediately runs up against citizen privacy issues.
Example: social rating sites (e.g. Angie's List) that could get far more ratings than they do, but consumers don't really spend much time filling out ratings.
Example: the bias in rating data toward 'strong negative' ratings, and toward 'strong' ratings in general. People feel the most motivation to rate something when they are pissed off with it, or at least when they feel very strongly about it in some way.
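The rating-bias point can be illustrated with a small simulation of self-selection. It assumes a hypothetical population whose true satisfaction is mostly positive, plus assumed probabilities that a customer bothers to post a rating. Angry customers are the most likely to post, and lukewarm ones the least. None of these numbers come from a real site. They only show how the ratings you get to mine can over-represent 1-star and 5-star opinions.

```python
import random
from collections import Counter

random.seed(42)

STARS = [1, 2, 3, 4, 5]
# Hypothetical population: relative frequency of each true satisfaction level.
TRUE_WEIGHTS = [5, 10, 25, 35, 25]
# Assumed chance that a customer posts a rating, given how they feel:
# the angriest are most motivated, the lukewarm least.
P_RATE = {1: 0.60, 2: 0.20, 3: 0.05, 4: 0.10, 5: 0.30}

def shares(scores):
    """Fraction of scores at each star level."""
    counts = Counter(scores)
    return {s: counts[s] / len(scores) for s in STARS}

def extreme_share(share):
    """Combined fraction of 1-star and 5-star scores."""
    return share[1] + share[5]

population = random.choices(STARS, weights=TRUE_WEIGHTS, k=10_000)
# Only self-selected raters show up in the data that can be mined.
posted = [s for s in population if random.random() < P_RATE[s]]

true_share = shares(population)
seen_share = shares(posted)
for s in STARS:
    print(f"{s} star(s): population {true_share[s]:6.1%}   posted {seen_share[s]:6.1%}")
print(f"1- or 5-star: population {extreme_share(true_share):.1%}, "
      f"posted {extreme_share(seen_share):.1%}")
```

Under these assumptions, the posted ratings contain a much larger share of 1-star reviews than the population actually holds. A model trained only on posted ratings would inherit that skew.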
Response to phantom power (Reply #1)
Wed Oct 24, 2012, 02:01 PM
bemildred (90,061 posts)
3. "Garbage in, garbage out", as the very old saying goes.
Verifying that your stuff actually works is very expensive and tedious too, and is often replaced with hand-waving and spot checks. I was often considered a wizard, when what I really was was a drudge (small 'd') when it came to testing.