Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveal about ourselves and our world, provided we ask the right questions. By the end of an average day in the early twenty-first century, human bei...

Title: Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Author: Seth Stephens-Davidowitz
Edition Language: English

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are Reviews

  • Atila Iamarino

    I hit the bullseye with this read! Seth Stephens-Davidowitz presents an analysis of how people behave, in the same vein as The Signal and the Noise and Dataclysm. But while The Signal and the Noise is about trends in data and Dataclysm is about how people behave on OkCupid, Everybody Lies is about how people behave in general.

    The author uses a range of data in quite innovative ways, such as Google search trends (where he works), searches on PornHub, Facebook, and other big data sources, to do what he calls "real sociology," or evidence-based sociology. The data he shows on prejudice (searches on prejudiced topics), insecurity about self-image, insecurities about one's children, and the like paint a much rawer and uglier picture of society than the one we paint with our Facebook and Instagram posts.

    Other data reveal information that is interesting, to say the least: the difference that graduating from Harvard makes (none; the point seems to lie in who graduates), where to raise your children, how to improve the odds of a successful date... The book is very reminiscent of a newer and, in my opinion, more curious version of the innovative approach of Freakonomics.

    If you are not interested in the revolution that the recording and availability of data is causing in the world, or in the damage that companies and governments can do with the control they have over information, you will at least enjoy the book for the curious and morbid facts it draws from the data. Knowing, for example, that the number of men who search for how to perform oral sex well on women is the same as the number who search for how to perform oral sex on themselves says a lot about how people think. A book for all tastes.

  • Lori

    When sociologists ask people if they waste food, people give the only correct answer: it's wrong to waste food.

    When sociologists survey the contents of the same people's garbage, they get a more accurate answer.

    Just imagine how much more information is available from trolling through internet searches.

  • Jim

    writes the author early on & he shows why. (Google Trends is available to all.) He also checked other big data sets including Wikipedia, Facebook, Pornhub, & even Stormfront, the largest racist site. What he found was really interesting & it will help harden the soft, social sciences. It's a new frontier.

    He points out problems with traditional reporting. In the section about child abuse & abortions, Google searches suggest that child abuse does increase during economic downturns, while gov't figures incorrectly show little change. Closing abortion clinics doesn't stop abortions; it simply leads to more self-induced ones. Both happen off the books, but there is now convincing supporting data to show us what we need to address, so we can make more informed decisions about resources.

    Big data has an advantage over every other type of survey because few realize it is being collected, so we don't lie to make ourselves look better. It's also anonymous & aggregate, so caution needs to be used when forming conclusions. For instance, based on Pornhub searches, the author concludes that about 5% of men are gay because they searched for gay porn. That seemed a reasonable conclusion until he pointed out that 15% of women search for rape porn. Does that mean they want to be raped? The author says of course not & makes a big deal out of the difference between fantasy & reality. That makes me question his first conclusion, although it seems about right.

    Gut reactions are often wrong & he provides several examples where they're wrong due to cognitive biases. He also points out "The Curse of Dimensionality": given large enough sets of data, there will be correlations just through chance. For instance, there are graphs that show how closely autism diagnoses track with organic food sales or Jenny McCarthy's popularity. Separating real relationships from chance ones is a whole other problem.

    Big Data only gives us trends that we need to examine. We can't use it on the individual level. While 1000 people searched for how to kill their girlfriend, only 1 girl was killed in his example. That's horrific & might have been stopped if someone had looked at his search history, but do we give up everyone's privacy for a 1 in 1000 chance that we might prevent a murder? Some might be willing, but I'm not, so we also have new questions to address.

    The audio book was well narrated & I didn't miss the graphs too much. They're provided in the extra material, but weren't handy when I was listening & the book took that into account for the most part. Highly recommended in either format.
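
The point in the review above about chance correlations in large data sets can be illustrated with a minimal Python sketch. This is not from the book: the function names, the series length, and the candidate counts are assumptions chosen for illustration, and every series is pure random noise, so any correlation found is spurious by construction.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_chance_correlation(n_points, n_candidates, seed=0):
    """Strongest |correlation| between one noise series and many others.

    All series are independent Gaussian noise (a hypothetical stand-in
    for unrelated real-world variables), so a high value here reflects
    chance alone.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # The more unrelated variables we compare against, the stronger
    # the best "match" looks.
    for k in (10, 100, 2000):
        print(k, round(best_chance_correlation(20, k), 2))
```

With only 20 data points per series, searching through a couple of thousand unrelated variables will reliably turn up at least one that appears strongly correlated with the target, which is why such matches need independent confirmation.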

  • Will Byrnes

    There’s lies, damned lies and then there are statistics. One must wonder. Do the lies get bigger as the datasets grow? Seth Stephens-Davidowitz posits that the availability of vast sums of new data not only allows researchers to make better predictions, but offers them never-before-available tools that can offer insight that direct questioning never could.

    We have seen steps up of this type before. Malcolm Gladwell has made a career of such, with

    ,

    , and

    .

    is the one I would expect most folks would know. Nate Silver put his data expertise into

    . All these looks at data and how we interpret it rely on the analyst, regardless, pretty much, of the data. While the same might be true of Stephens-Davidowitz’s approach, he focuses on the availability of materials that have not been there in the past. The smarts that must be applied to get the most interesting results can now be applied to new oceans of data. It is more possible than it has ever been to draw inferences and actually test them out.

    In addition to the

    of data that is now available, there is the

    . The author looks at Google and FB data for evidence of underlying realities. Surveys can sometimes offer inaccurate outcomes, when the people being queried do not provide honest answers.

    . But one can look at what people enter into Google to get a sense of possible racism by geographic area.

    Looking for queries on jokes involving the N-Word, for example, turns out to yield a telling portrait of anti-black sentiment, which also correlates with lower black life expectancy. (And pro-Trump vote totals)

    We are treated to looks into a variety of research subjects, from picking the ponies, to seeing what interests/concerns people sexually, looking for patterns of child abuse, selecting the best wine, using the texts of a vast number of books and movie scripts to come up with six simple plot structures.

    I thought the most interesting piece was on the use of associations, and provoking curiosity, rather than relying on overt statements to influence how people feel about a different group of people. Another was on using a data comparison of one’s (anonymous) medical information to others who share many characteristics to improve medical diagnoses.

    There are some areas in which it was not entirely persuasive that the methodology in question was tracking what was claimed. SS-D sees in searches of Pornhub, for example,

    Really? I expect that what people check out on-line does not necessarily track with what might be of interest in real life. It would be like someone with an interest in mysteries being thought to have homicidal tendencies after searching for a variety of homicide related titles. Should a writer doing research into a dark subject like child pornography, human trafficking or cannibalism expect the heavy knock of the police on his/her door? Where is the line between an academic or titillation search and one made for planning?

    SS-D makes a point about there being a significant difference between searches that offer projections for groups or areas, and their inapplicability for predicting individual behavior, although that will not necessarily remain the case. In baseball, for example, the explosion of available information may very well be applied to specific players to diagnose and even correct flaws in technique, or recognize patterns that might expose underlying medical issues, or predict their arrival. The Big Data related here is much more macro, looking at group proclivities. Useful for spotting trends, measuring public sentiment, but in more detail than has been heretofore possible.

    And of course there is the impact of dark players. Those with the resources and motivation could manipulate the Big Data produced by Google and Facebook. Such players would not necessarily be limited to Russian cyber-spies and pranksters, but corporate and ideological players as well, like

    . There could have been a bit more in here on those concerns.

    The book offers plenty of anecdotal bits that could have been lifted from any of the other data books noted at the top of this review. What one needs, ultimately is smart, insightful analysis. Having all the data in the world (that means you, NSA) is merely a burden unless there is someone insightful enough to figure out the right questions to ask, and how to ask them.

    SS-D notes several Google (Trends, Ngrams, Correlate) services that might be familiar to folks doing actual research, but which were news to me. It might be useful to check out some of these, maybe even come up with meaningful queries to shed light on pressing, or even completely frivolous questions.

    Not all problems can be solved, or even examined, by the addition of ever more data. Sometimes, many times, the information that is available is perfectly sufficient to the task, but other factors prevent the joining together of its various pieces to create a meaningful whole. The now classic example is from 9/11, when an absence of coordination between the CIA and FBI resulted in suicide bombers who could have been foiled succeeding in their mission. Politics and the culture of nations and organizations figure into how data is used.

    So if everybody lies, is Seth Stephens-Davidowitz telling us the truth? I am sure there is a query one could construct that would look at diverse data sources, pull them all together and give us a fuller picture, but for now, we will have to make do with reading his book and articles, checking out his videos, applying the analytical tools already incorporated into our brains, and seeing if there is enough information there with which to come to a well-grounded conclusion. And that’s no lie.

    Review posted – May 5, 2017

    Publication date – May 9, 2017


  • David

    This is an engaging book about how big data can be used to improve our understanding of human behavior, thinking, emotions, and preference. The basic idea is that if you ask people about their behavior or their preferences in surveys, even anonymous surveys, they will often lie. People do not like to admit to low-brow preferences; racists do not want to admit to their prejudices, most people who watch pornography do not want to admit to it, and even voting is often misrepresented; some people who voted for Trump would not admit to it.

    But, by analyzing immense datasets from Google, public archives, social media, and the like, Seth Stephens-Davidowitz has been able to unearth a lot of fascinating answers to puzzling questions. For example, he is able to predict, through Google searches for various symptoms, who is likely to have early stages of pancreatic cancer. He can predict epidemic breakouts of some contagious diseases well before they are announced by the CDC (Centers for Disease Control and Prevention). He shows that the single factor that correlates with voting for Trump is racism.

    Then there are the fun factoids, about the sorts of things that people search for most often on Google. Most commonly, the search "Is my son ..." is followed by "gifted", while the search "Is my daughter ..." is followed by "overweight". That tells us something about the stereotypes in the way people think about their children. Interestingly, the release of a new violent movie in a city is correlated with a decrease in violent crime in that city. Perhaps the reason is that violent people who are watching the movie are not out on the streets, committing crimes.

    And here we get to the main problem with this sort of analysis. Undoubtedly, the research and analysis of big datasets is done correctly. However, once a surprising result is found, interpreting the motivations behind the online activity is often subjective. While this book is very careful about its underlying assumptions, it is a slippery road to the correct interpretations and explanations.

    This is an easy, well-paced book that should appeal to anybody who enjoys books like

    .

  • Richard Derus

    I have nothing unique to add to the conversation about this book. I think those most in need of reading it won't, and that's frustrating.

    If you've ever seen a number adduced to explain a trend, read this book. If you've ever asserted that a certain percentage of something was something/something else, read this book. If you've ever seen a politician quote a study and your innate bullshit filter clogged up, read this book.

    Really simple, high-level terms:

  • aPriL does feral sometimes

    I was annoyed by the author’s writing style in ‘Everybody Lies’. I have no doubts author Seth Stephens-Davidowitz was trying to write to a general audience, including that assumed class of American non-science reader who hates math and binge watches ‘Keeping Up with the Kardashians’. Good for him, and maybe you, right? But I became more and more annoyed as I read. Ah, well. It is an interesting and informative read, in spite of trying hard to be fun, imho.

    What is the book about? I am glad to report it has genuine information about the science of statistics and ‘big data’ collecting, and how the erroneous selection of study parameters or assumptions about what is relevant data to study affects conclusions (as far as I know - I am a dunce at scientific math, despite that I passed a statistics class). The author used what seemed to me genuinely interesting new methods to formulate statistical studies, primarily using Google’s forensic tools, along with other sources.

    I was shocked by what people type into Google Search (which Google compiles into anonymous data). For example, President Obama’s race appears to have truly ignited racists into coming out of their closets. Comparing the share of people who state in survey interviews that they are racist (a low percentage) with the percentage of those who Googled “n***** jokes” state by state turns out to show some truly hidden pockets of unexpected racism - and the total percentage of racist searches on Google was WAY higher than the racism that typical surveys show. In addition, those places that adore Trump also searched most for “n***** jokes”. Correlation? Idk, no one does know for the record, but I think yes.

    Also of interest to me (please don’t bust my balls because of my prurient interests - and maybe there is a pun in this sentence, hehheh - read on) men really truly do Google a lot about penis sizes. Come on, fellas, give it a rest! (Yes, I am trying to be snarky since the too much ‘at rest’ position is part of what men appear to be most anxious about!) Men prowl porn sites in humongous numbers - shocking, right? - which is good for statisticians looking for Truth about sexuality for their inputs into their mathematical equations. Based on Google porn searches, the author estimates 5% of the population is gay. (Btw, conservatives mostly use the word ‘homosexual’ while liberals use the phrase ‘same-sex’, statistically, in Google searches.)

    Not to neglect what Google says about what the ladies’ biggest sexual worry is, all I can say is, Oh. My. God. Vagina odor. Really? Really!!

    All statisticians should take note - interrogative surveys often show different results from those statistics revealed in Google searches about the percentages of who is thinking/feeling what where and when, especially in those morally-weighted or personally embarrassing areas of society. Of course, interpretation is always fraught with possible erroneous judgements whatever the source of sampling.

    I have always trusted those insurance actuarial tables FAR more than political or media spins or even university data studies - so now I am adding Google statistics to my ‘trusted info’ list. Of course, gentle reader, I know any compilations of data can be erroneously or purposely manipulated or massaged. ‘Garbage in, garbage out’ still applies...which is the case ‘Everybody Lies’ makes as well. The book seemed on top of the science, as far as I know. I am not a science-brain, but an amateur wannabe.

    My one irritation with this book is all about the manner in which the information is explained. Gentle reader, my complaint is subjective as hell. Honestly, I can’t put my finger on it, though. The writer seemed to be trying to fill out his actual 200-page book to 300 pages by having personal emotional filler similar to the gaspy asides many shows use to increase the viewers’ emotional high about what is being discussed. Are you familiar with those TV shows that, after each commercial break, recap everything shown in the minutes before the break in a breathless montage manner? And they often have a shocked-gasp teaser of what will be shown before the next commercial break? Anyway, I felt there was a lot of that style of emotional manipulation (and extending of the material) going on in this book, somehow. I simply did not appreciate the personal ‘fun’ filler so much. Maybe there wasn’t enough snark. I prefer snarky humor, if there is humor. Bite me. Maybe a more tightly edited book would have worked better for me to enjoy reading it. Anyway, I realize I am floundering about here. None of this may be true at all for you.

    Ultimately, this is a book worth reading for the general reader (for the record, I definitely have a lit/history brain, so yes, I am a general science reader!), both for the explanatory information about how statistical studies are done (statistics being the only math-involved college class which engaged me) and for what people are really feeling and thinking (if Google searches are to be believed, and I think they are).

    Included are extensive Notes and Index sections.

  • Trish

    Maybe everyone does lie. But they don’t lie all the time. Stephens-Davidowitz makes the good point that asking people directly doesn’t always, in fact may not often, yield true answers. People have their own reasons for answering pollsters untruthfully, but it is clear that this is a documented fact. People sometimes lie to pollsters.

    Stephens-Davidowitz was told by mentors and advisors not to consider Google searches worthwhile data, but the more he looked at it, the more he was convinced that Google searches contained the best data for determining what people are concerned about. He has uncovered some interesting trends that are not apparent through direct questioning because people are sometimes ashamed of their fears, feelings, prejudices, and predilections.

    I didn’t really like this book. Part of the reason is that I listened to it, and Stephens-Davidowitz gives charts, graphs, and data points that obviously cannot be represented in the audio version. These usually help me to grasp things easily and maybe bypass pages of material that is not as interesting to me. It wasn’t that his material was hard, it was that I oftentimes did not like what he was talking about. He had a tendency to focus on deviant behavior, e.g., sexual predators, abuse, porn, etc. One might make the argument that these behaviors are important to understand and therefore worth looking at. Possibly. However, if ‘everybody lies,’ one might make the argument that we do not have to look at deviance to find untruthfulness.

    What we discover is that to test Stephens-Davidowitz’s thesis that ‘everybody lies,’ we have to spend quite a lot of time with statistics and creating studies, or as he is wont to do, studying big data. Big data probably irons out discrepancies in the reasons for our Google searches, e.g., that it is not me that is interested in the herpes virus, it is my brother, because in the end it doesn’t matter why we did the search; what matters is that we did the search. Besides, maybe I’m lying about my brother having the virus, but my interest in the topic is not a lie.

    Stephens-Davidowitz has made a career so far out of the study of big data, showing us ways to slice and dice it so that it is useful to our view of the world. Only thing is, I am not as interested in what big data tells us as he is. He trained as an economist, and towards the end of the book he hit a couple of areas I did find more interesting, like the notion of regression discontinuity, a term used to describe a statistical tool created to measure the outcomes of people very close to some arbitrary cut-off.** S-D talks about using this tool on federal inmates, discovering that criminals treated more harshly committed more crimes upon their release. But S-D also studied students on either side of the admissions cut-off for the prestigious Stuyvesant High School: those who attended Stuyvesant did not perform significantly differently in later life from students who did not.
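
The cut-off comparison described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not the method used in the book: it simply compares average outcomes just above and just below a threshold, whereas a real regression discontinuity analysis would fit a regression on each side. The function name `rd_estimate`, the bandwidth, and all the numbers are made up for the example.

```python
from statistics import fmean


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Naive discontinuity estimate: mean outcome just above the cutoff
    minus mean outcome just below it, using only observations whose
    score lies within `bandwidth` of the cutoff."""
    below = [y for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [y for s, y in zip(scores, outcomes)
             if cutoff <= s < cutoff + bandwidth]
    if not below or not above:
        raise ValueError("need observations on both sides of the cutoff")
    return fmean(above) - fmean(below)


if __name__ == "__main__":
    # Hypothetical entrance-exam scores with an admissions cutoff of 60.
    scores = [55, 57, 59, 61, 63, 65]

    # Made-up later-life outcomes with no jump at the cutoff.
    no_effect = [70, 72, 71, 71, 70, 72]
    print(rd_estimate(scores, no_effect, cutoff=60, bandwidth=6))  # 0.0

    # Made-up outcomes where crossing the cutoff adds 5 points.
    with_jump = [70, 72, 71, 76, 75, 77]
    print(rd_estimate(scores, with_jump, cutoff=60, bandwidth=6))  # 5.0
```

The idea is that people just above and just below an arbitrary line are otherwise very similar, so a jump in outcomes at the line can be attributed to what the line decides, and the absence of a jump suggests it made little difference.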

    Apparently Stephens-Davidowitz went into data science because of Freakonomics, the bestselling book by Steven D. Levitt. He believes that many of the next generation of scientists in every field will be data scientists. I did finish the audiobook, which is relevant to another study he notes in the last pages: apparently few readers finish ‘treatises’ by economists. He believes this is his big contribution to our knowledge base, and there is no doubt his contrariness did highlight ways big data can be used effectively.

    If I may be so bold, I might be able to suggest a reason why many female readers may not be as interested in the material presented, or in Stephens-Davidowitz himself (he was/is apparently looking for a girlfriend). Stay away from the deviant sex stuff, Seth. It may interest you but I can guarantee that fewer women are going to find that appealing or reassuring conversation or reading material.

    An interesting corollary to this economists’ data view is the question of whether the truth matters, which is how I came to pick up this book. Recently on PBS’ The Third Rail with Ozy, Carlos Watson asked whether the truth matters. At first blush the answer seems obvious, and two sides debated this question. One side said of course truth matters…but most of us know one man’s truth to be another man’s lie. The other side said ‘everybody lies.’ It got me to thinking…I do think the two ways of coming to the notion of lying dovetail at some point, and one has to conclude that truth may not matter as much as we think. What matters is what we believe to be true.

    Finally, it appears Stephens-Davidowitz agrees to some degree with Cathy O'Neil, author of Weapons of Math Destruction, in that you had best not let algorithms run without human tweaking and interference. The best outcomes are delivered when humans apply their particular observations and knowledge and expertise to big data.

    ** S-D describes it this way:

  • Jessica

    This book tries too hard to be Freakonomics. The first two parts are full of random examples of interesting but mostly pointless things that can be learned via Google search trends. However, a whole lot of assumptions are made off these bits of data that don't seem to have much basis in factual scientific methods of research. Unprofessional jokes are thrown in randomly. If you need a footnote to explain why a joke was not homophobic, maybe you should have just skipped the joke. And any book of less than 300 pages of text should not need to use the same example three times, especially when it's about how the author can't believe women are concerned about the smell of their vagina.

    The last section of the book explains the limitations big data holds and is really the most grounded section, the rest being almost hagiography. It would have done a lot to work the third section into the examples of the first two sections. It would have balanced out the praise and also would have done much to explain the flaws present in some of the examples included.

    Some cool facts buried in a lot of murky oddness.

    Disclaimer: I was given this book in a Goodreads giveaway.
