Category Archives: Links

Latte Art, Experimentally Investigated

Back in March, Aaron wrote a popular philosophy essay about latte art, which incorporates perspectives from philosophy, psychology, and practice. He concluded on a speculative note:

It seems that latte art can make a difference to us. Neither taste nor even flavour experiences are all we care about when it comes to a cup of coffee. […] Latte art surely affects the visual experience of sighted drinkers of coffee […] it seems possible—although I know of no research directly on the question—that latte art could affect how we taste the coffee beneath it. […] If this is right, then it is possible that latte art itself could have a subtle effect on perceived coffee flavour. So if latte art is an art, it could be an art that really matters!

An article in the August issue of the Journal of Sensory Studies investigates exactly this topic. A group of researchers, including psychologist Charles Spence and Colonna & Small’s Maxwell Colonna-Dashwood, empirically examined the effect of latte art on how people perceive and value coffee. What’s the practical upshot?

The results reported here suggest that the addition of latte art influences how much people expect, and are willing to pay for milk-based coffees. As such, for the cafe owner thinking about how to increase profits, the experiments reported here suggest that people are willing to pay between 11–13% more for coffee with latte art than for those without it.

Very interesting finding!

The State of Reproducibility in Experimental Philosophy

Over at Feminist Philosophers, Dan Hicks casually mentions in a comment:

I haven’t had too many interactions with folks interested in experimental philosophy, but based on those interactions it seems like (1) x-phi folks are modeling their methods explicitly on these kinds of studies in psychology, while (2) not being aware of or responsive to the epistemic crisis surrounding these methods in psychology. So I would encourage x-phi folks to check out the links above.

Despite some initial defensiveness, I’ve come to think that Hicks’s remarks are actually quite reasonable coming from philosophers who haven’t kept up with the nitty-gritty of experimental philosophy. Having heard about the replicability crisis in psychology, it is easy to assume that experimental philosophy is susceptible to the same problems. And having not followed the field closely, it is easy to assume that experimental philosophers are unaware of or unresponsive to these issues.

The point of this post is to dispel those assumptions.

To be clear, my aim in writing this post isn’t really to pick on a casual comment, but to give a more general overview of what experimental philosophy, as a community, is doing and has been doing about methodological and epistemological issues related to reproducible science.

* * *

To start thinking about issues with reproducibility, we can go back to the context of Hicks’s comment: a “sexy” result from a low-powered social psychology study that says you can reduce implicit bias in your sleep. The replicability crisis in psychology has, in large part, to do with failures to replicate findings of this sort. As such, having learned about the replicability crisis, it is prima facie reasonable for a philosopher to cast a skeptical eye toward experimental philosophy findings.

The skeptical eye, I take it, has to do with a companions-in-guilt charge. Experimental philosophy often borrows methods from (social) psychology; given the crisis in psychology, would it be such a surprise if experimental philosophy suffered from the same methodological and epistemological issues?

There is something to the companions-in-guilt charge, but we should clarify who the companions really are. It’s not just psychology that suffers from a replicability crisis. It’s also cancer research, pharmaceutical research, political science, behavioral economics, etc. In other words, there is a replicability crisis in science. So, philosophers should worry about experimental philosophy… insofar as and as much as they are worried about science.

* * *

Is there reason to be more worried about experimental philosophy than about psychology and other sciences? Not only do I think the answer to that question is ‘no’, but I think we in fact have reasons to be relatively optimistic about experimental philosophy.

The experimental philosophy community was aware of the issues underlying the replicability crises even before their recent publicity. For a long time (in a young research program), Joshua Knobe (and later Christian Mott) has maintained the experimental philosophy replication page, which documents experimental philosophy findings that have (and have not) replicated well. Looking at the replication page, one can see that the majority of effects in experimental philosophy have, in fact, replicated well. (The big exception is the cluster of demographic effects.) So, not only are experimental philosophers aware of methodological and epistemological issues with replicability, they have already taken some steps to address them.

The replication page is one exemplar of a broader pattern. The community, in general, is very welcoming of cumulative research practices, which encourage internal corrections and challenges. For example, on this very blog there was an extended discussion about the replicability of an age effect. For example, Chandra Sripada and Sara Konrath shared their data so that David Rose, Jonathan Livengood, Justin Sytsma, and Edouard Machery could re-analyze it and challenge Sripada and Konrath’s interpretation. For example, Adam Feltz and Florian Cova have systematically examined the moral responsibility and free will literature via a meta-analysis.

I’ll let you in on a little secret. In the earlier days of experimental philosophy (and to a lesser extent even today), many people ran their online studies using Knobe’s Qualtrics account. So everyone can, in principle, see what everyone else is working on, and even download the data if they want. In my view, this general culture of transparency discourages outright fraud, such as manipulating or fabricating data.

In addition to a general culture of transparent and cumulative research, experimental philosophers may also be in a better position than, say, psychologists because, well, they’re philosophers. And philosophy has a long tradition of thinking about the relationships between experiments, statistical inference, theory confirmation, etc. Some experimental philosophers are also philosophers of science who are attentive to such issues. For example, Edouard Machery has written about the interpretation of null results. But even the experimental philosophers who don’t also specialize in philosophy of science will have some exposure to these issues. And, judging from the attendance at the Preconference Workshop on Replication in the Sciences at SPP, experimental philosophers are very keen to learn even more about these issues, both from philosophers of statistics like Deborah Mayo and from methods experts in cognate fields.

* * *

None of this is to say that current practices in experimental philosophy are anywhere near perfect. There is much more we can all do to do better: post data in repositories, run better-powered studies, consult statistical and methodological experts, keep up with the latest best practices, etc. [I also have some thoughts about how to make the experimental philosophy replication page even more useful, but I’ll save those for another occasion.] But we can acknowledge the need for improvements in the future while also acknowledging the efforts at improvement in the past. We can recognize how far experimental philosophy still has to go, with respect to the methodological and epistemological issues associated with the replicability crises, while also recognizing how far it has come.
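On the “better-powered studies” point, here is a minimal sketch of a prospective power analysis in Python using statsmodels; the effect size, alpha level, and sample sizes are hypothetical placeholders for illustration, not recommendations for any particular study.

```python
# Minimal sketch of a prospective power analysis for a two-condition
# between-subjects design (all numbers are hypothetical placeholders).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Suppose we expect a small-to-medium effect (Cohen's d = 0.4) and want
# 80% power at alpha = .05 with a two-sided test.
n_per_condition = analysis.solve_power(effect_size=0.4, alpha=0.05,
                                       power=0.80, alternative='two-sided')
print(f"Participants needed per condition: {n_per_condition:.0f}")  # roughly 100

# Conversely: with only 25 participants per condition, how much power
# would we actually have to detect d = 0.4?
achieved_power = analysis.power(effect_size=0.4, nobs1=25, alpha=0.05,
                                ratio=1.0, alternative='two-sided')
print(f"Power with n = 25 per condition: {achieved_power:.2f}")  # roughly 0.3
```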

[Much of this post is based on a talk at the 2014 Experimental Philosophy UK Workshop. I thank the organizers for giving me the occasion to think more systematically about these issues.]

[x-posted at The Experimental Philosophy Blog; please comment there!]

The State of Amazon Mechanical Turk

Like many other areas of the social sciences, experimental philosophy is nowadays heavily fueled by Amazon Mechanical Turk. (The Experimental Philosophy Blog certainly helped with that too!) So I thought it might be helpful to consider the state of Amazon Mechanical Turk, for both experimental philosophers and critics of experimental philosophy. (By the way, the Experimental Turk blog / website remains an invaluable resource!)

In my view, the best academic paper that gives a thorough overview of Amazon Mechanical Turk’s strengths and weaknesses is Gabriele Paolacci and Jesse Chandler’s “Inside the Turk: Understanding Mechanical Turk as a Participant Pool” (2014). My main takeaways from the paper are:

  1. In general, MTurk data quality is at least as good as university lab data quality.
  2. However, there is serious concern about MTurk participants’ non-naivety. So the data quality for common experimental paradigms (such as the cognitive reflection test, the ultimatum game, and, in my view, trolley problems) is relatively poor.
  3. MTurk participants are much more demographically representative than university lab participants.
  4. There is not much point in using attention checks, given the implausible theoretical assumptions they rest on (such as constancy of attention throughout all study tasks) and given participant non-naivety.

There is also a somewhat recent PBS profile of Amazon Mechanical Turk workers that makes similar points, but in much more accessible terms.

One tidbit that’s new to me in the PBS profile is this:

Early results by the team suggests another potentially interesting finding. Turkers seem more likely to provide false negatives – failing to observe a phenomenon that exists — than false positives — falsely observing something that doesn’t exist. (An example of a false positive would be a study that shows a relationship between vaccines and autism that doesn’t really exist. A test that fails to show the effectiveness of a successful drug would be a false negative.)

In other words, if anything, we should be a little more skeptical of null results from an MTurk sample (e.g. a negative replication) than of positive results from an MTurk sample. Though it’s still helpful to remember that, keeping the non-naivety caveat in mind, the overall data quality of MTurk samples is pretty good.

On the point about non-naivety, in addition to trying not to use experimental paradigms that are well known to Turkers, I exclude repeat workers across a series of studies on the same topic using Unique Turker. And sometimes I monitor reddit and other discussion boards to make sure there are no inappropriate discussions. Though, again, it’s worth stressing that, as the PBS profile mentions, most Turkers take great pride in their work and the community self-polices as well: “No disclosure or discussion of attention memory checks. No discussion of survey content, period. That can affect the results.”
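Unique Turker handles this automatically, but for those who prefer to do the exclusion by hand, here is a minimal sketch of filtering out repeat workers across a series of related studies. It assumes MTurk batch result CSVs that contain a WorkerId column; the file names are hypothetical.

```python
# Minimal sketch: exclude workers who already took part in earlier studies
# in the same series. Assumes MTurk batch result CSVs with a "WorkerId"
# column; file names below are hypothetical.
import pandas as pd

previous_batches = ["study1_batch_results.csv", "study2_batch_results.csv"]
current_batch = "study3_batch_results.csv"

# Collect the IDs of everyone who participated in an earlier study.
seen_workers = set()
for path in previous_batches:
    seen_workers.update(pd.read_csv(path)["WorkerId"])

# Keep only first-time participants in the current batch.
current = pd.read_csv(current_batch)
fresh = current[~current["WorkerId"].isin(seen_workers)]

print(f"{len(current) - len(fresh)} repeat workers excluded, "
      f"{len(fresh)} naive participants retained")
```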

Any other thoughts, experiences, and tools you’ve gathered from using Amazon Mechanical Turk in your research?

[x-posted at The Experimental Philosophy Blog; please comment there!]

Workshop: Authenticity and Art

In general, we seem to have a preference for “the real thing”. We tend to like people who we find genuine. We tend to find authentic food more delicious. However, nowhere is this preference more apparent than in the domain of artworks. We look down upon copies, replicas, and forgeries because they lack the aesthetic virtue of authenticity. But why is this? This workshop explores recent advances in the cognitive science of art, and their philosophical implications.

Wednesday, November 12th, 2014

03:30pm-04:30pm George Newman
04:30pm-05:30pm Gregory Currie
05:30pm-06:00pm (General Discussion)

Leeds Humanities Research Institute
29-31 Clarendon Place
Leeds, West Yorkshire LS2 9JT

The workshop is free and open to all. Please register using this form to help our event planning. There is limited space for dinner. Please make a note in the registration form if you would like to come.


Gregory Currie (philosophy, University of York)
“Authenticity and the Traces of Making”
Authentic Rembrandts are Rembrandts, and vice versa. What does “authentic” add? I argue that its role is metalinguistic in the way that “not” sometimes is. But there is a substantive issue raised by authenticity: why do people care about an object’s history? Newman and Bloom consider two hypotheses: Contagion and the Quality of Making. I suggest that there is a way of taking contagion which brings these two hypotheses close together.

George Newman (management, Yale University)
“The Valuation of Authentic Goods”
Why do people value original artworks more than identical duplicates? What explains consumer demand for celebrity memorabilia or luxury products? This talk explores the psychological mechanisms underlying people’s preferences for authentic objects. I will discuss the results of several empirical studies aimed at uncovering the key psychological factors, as well as broader questions surrounding the origins of this phenomenon.


The issue of authenticity and art has ramifications beyond the philosophy and psychology of art. Nina Simon, the Executive Director of the Santa Cruz Museum of Art & History, recently highlighted some implications of this research for museum professionals in a blog post:

“In museums, we care about both perceived authenticity and real authenticity. We want the power of the story–and the facts to back it up. This can come off as contradictory. We want visitors to come experience “the real thing” or “the real site,” appealing to the spiritual notion that the personhood in the original artifact connotes a special value. At the same time, we don’t always tell folks that what they are looking at is a replica, a simulation, or a similar object to the thing they think they are seeing.”

This workshop should therefore also be of interest to museum professionals, art historians, and others in cognate professional and academic fields. All are welcome!


This event is a part of the Experimental Philosophical Aesthetics and Human Nature project workshops, supported by Marie Skłodowska-Curie Action Grant PIIF-GA-2012-328977. It is also part of the Ethics / Aesthetics Seminar Series at the University of Leeds.

Why Do Experiments?

On psychologist Simine Vazire’s always-excellent blog, sometimes i’m wrong, there is an excerpt from John Doris’s forthcoming book that reacts to #repligate. Doris makes many important points about how philosophers should respond to this episode in psychology, such as not relying too much on any single study, including any single replication.

However, I want to take issue with one parenthetical remark. Doris writes “(Less cynically: if scientific findings weren’t surprising, why would we need experiments and publications?)”. Although this may just be a throwaway remark for Doris, I think it expresses a somewhat common thought: that experiments get some of their value from surprisingness, that is, from disconfirming some intuitive belief. Or, to put it in the reverse direction, if people were able to reliably predict whether an experiment would confirm or disconfirm a commonsense belief, then there would be less reason for us to run that experiment.

I don’t think that’s the right view about experiments. Duncan Watts’s book Everything is Obvious gives an easy way to see why. As Watts points out, many of our commonsense beliefs appear in tension with one another. Absence makes the heart grow fonder. But out of sight is out of mind. So you might imagine that, if the result of an experiment went one way, it’d be framed as “Counter-intuitive finding! Absence does not make the heart grow fonder!” for publication, but if the result of the same experiment went the other way, it’d be framed as “Counter-intuitive finding! Out of sight is not out of mind!” for publication.

Given that many of our commonsense beliefs appear in tension with one another in this way, whether an experiment is counter-intuitive or not has hardly any connection to its value. Instead, as Watts points out, what is valuable about scientific experiments is that they can delineate the scope of such commonsense beliefs. And, I might add, they can estimate the magnitude of the highlighted causal relationships. From a scientific perspective, it’s not that interesting to find out whether “out of sight is out of mind” is true or not, but it is interesting to find out in which cases it holds and in which cases it doesn’t, and to what extent it holds.

I think this view of why we do experiments actually coheres with some important methodological lessons that came out of #repligate. There should be less emphasis on the yes/no question of whether p falls above or below 0.05. Instead, there should be more emphasis on effect sizes, which measure the magnitude of a causal relationship, including reporting confidence intervals for effect sizes. Moreover, there should be more emphasis on “scope”, covering everything from moderating factors to the conditions under which an effect is expected to replicate. For example, as I’ve mentioned before on The Experimental Philosophy Blog, psychologist Dan Simons suggests that published studies should come with an explicit method section on “Limits in scope and generalizability”. Meanwhile, Many Labs 2 and Many Labs 3 are now investigating, respectively, replication across samples and settings, and replication across the timing of data collection.
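To make the effect-size point concrete, here is a minimal sketch of reporting a standardized mean difference with a 95% confidence interval rather than only a p-value; the ratings are invented for illustration, and the interval uses the standard large-sample approximation for the variance of Cohen’s d.

```python
# Minimal sketch: report an effect size with a confidence interval,
# rather than only whether p < .05. The ratings below are made up.
import numpy as np

# Hypothetical 7-point ratings from two vignette conditions.
group_a = np.array([5, 6, 4, 7, 5, 6, 5, 4, 6, 5])
group_b = np.array([4, 3, 5, 4, 4, 2, 5, 3, 4, 4])

n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) +
                     (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))
d = (group_a.mean() - group_b.mean()) / pooled_sd

# Large-sample standard error of Cohen's d (Hedges & Olkin approximation).
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci_low, ci_high = d - 1.96 * se_d, d + 1.96 * se_d

print(f"Cohen's d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```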

Why do experiments? The answer has nothing to do with intuitiveness, and everything to do with the magnitude and scope of the causal relationships that matter to our aims.

[x-posted at The Experimental Philosophy Blog]

Meta-Analysis of the Day

Adam Feltz and Florian Cova have recently made publicly available their excellent meta-analysis of the influence of affect on determinism intuitions. Their paper, entitled “Moral Responsibility and Free Will: A Meta-Analysis”, is forthcoming in Consciousness and Cognition.

Here is the abstract:

Fundamental beliefs about free will and moral responsibility are often thought to shape our ability to have healthy relationships with others and ourselves. Emotional reactions have also been shown to have an important and pervasive impact on judgments and behaviors. Recent research suggests that emotional reactions play a prominent role in judgments about free will, influencing judgments about determinism’s relation to free will and moral responsibility. However, the extent to which affect influences these judgments is unclear. We conducted a meta-analysis to estimate the impact of affect. Our meta-analysis indicates that beliefs in free will are largely robust to emotional reactions.

As far as I know, this is the first published meta-analysis in experimental philosophy. Substantively, it provides convincing evidence that the original explanation for the abstract/concrete determinism asymmetry, given in Shaun Nichols and Joshua Knobe’s “Moral Responsibility and Determinism: The Cognitive Science of Folk Intuitions”, cannot be correct as it stands. Methodologically, it represents another encouraging step toward an open and collaborative field.
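For readers unfamiliar with the mechanics, a meta-analysis like this pools effect sizes across studies, weighting each by its precision. Below is a generic sketch of inverse-variance pooling with a DerSimonian–Laird random-effects adjustment; the effect sizes and variances are invented placeholders, not Feltz and Cova’s data, and this is one common approach rather than necessarily the one they used.

```python
# Generic sketch of a random-effects meta-analysis (DerSimonian-Laird).
# Effect sizes (Cohen's d) and variances are invented placeholders.
import numpy as np

d = np.array([0.45, 0.10, 0.30, -0.05, 0.20])   # per-study effect sizes
v = np.array([0.04, 0.02, 0.05, 0.03, 0.06])    # per-study sampling variances

# Fixed-effect (inverse-variance) pooling.
w = 1 / v
d_fixed = np.sum(w * d) / np.sum(w)

# DerSimonian-Laird estimate of between-study variance (tau^2).
k = len(d)
Q = np.sum(w * (d - d_fixed) ** 2)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooling: add tau^2 to each study's sampling variance.
w_re = 1 / (v + tau2)
d_re = np.sum(w_re * d) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

print(f"Pooled d = {d_re:.2f}, 95% CI "
      f"[{d_re - 1.96 * se_re:.2f}, {d_re + 1.96 * se_re:.2f}]")
```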

And, for those readers in the UK, you can catch Florian Cova on September 19th in Oxford at the British Society of Aesthetics annual meeting, and on September 23rd in Leeds at our workshop on values and concepts of art!

Psychology Paper of the Day

… in Frontiers in Psychology. And it comes with an incredible title:

Weight lifting can facilitate appreciative comprehension for museum exhibits

Given the recent perceived intellectual crisis in psychology, my pessimistic self expected the paper to describe yet another study with counterintuitive conclusions that cannot be replicated. However, to my surprise, behind the seemingly outrageous title is a neat little idea about better structuring visitors’ museum experiences.

In its most basic form, the neat little idea is that exhibit surrogates, even extremely simple ones, can allow visitors to have richer, multimodal experiences of exhibits. In this case, the researchers used simple weights to approximate the haptic dimension of the animal skeletons on display. The animal skeletons are, of course, locked behind glass and otherwise untouchable. The simple weights give museum visitors a way to “touch” them. Through this richer, multimodal experience, visitors found greater enjoyment in the exhibit.

Open Science Project of the Day

Everything about this “crowdstorming” research project is absolutely lovable:

In a standard scientific analysis, one analyst or team presents a single analysis of a data set. However, there are often a variety of defensible analytic strategies that could be used on the same data. Variation in those strategies could produce very different results.

In this project, we introduce the novel approach of “crowdstorming a dataset.” We hope to recruit multiple independent analysts to investigate the same research question on the same data set in whatever manner they see as best. This approach should be especially useful for complex data sets in which a variety of analytic approaches could be used, and when dealing with controversial issues about which researchers and others have very different priors. If everyone comes up with the same results, then scientists can speak with one voice. If not, the subjectivity and conditionality on analysis strategy is made transparent.

This first project establishes a protocol for independent simultaneous analysis of a single dataset by multiple teams, and resolution of the variation in analytic strategies and effect estimates among them. The research question for this first attempt at crowdsourcing is as follows:

Are soccer referees more likely to give red cards to dark skin toned players than light skin toned players?

Op-ed of the Day

Gary Marcus and Ernest Davis on the limitations of big data, in the New York Times:

Fourth, even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem. Consider Google Flu Trends, once the poster child for big data. In 2009, Google reported — to considerable fanfare — that by analyzing flu-related search queries, it had been able to detect the spread of the flu as accurately and more quickly than the Centers for Disease Control and Prevention. A few years later, though, Google Flu Trends began to falter; for the last two years it has made more bad predictions than good ones.

See also the Language Log commentary. This quote stuck out to me:

Posts here on Language Log (especially those by Mark Liberman) have shown that over and over again, as any regular reader will know. 21st-century linguists would be deeply foolish to stick to typical 20th-century methodology: largely ignoring what occurs, and basing everything on personal intuitions of what sounds acceptable.