In today’s empirical age, the magic words in a disagreement aren’t “Simon Says” but “Studies Show.” Unfortunately, studies now show that the conventional test of statistical significance is easily gamed.

In a paper in Psychological Science, Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn walk readers through four ways to create the illusion of a significant finding, even when the data doesn’t back you up.

For example, if you wanted to be able to publish a paper saying that something had a significant link to future income, you could just examine an enormous number of possible factors. You might collect data on race, gender, parents’ income, size of school, SAT scores, astrological sign, etc. The more comparisons you made, the greater the odds that at least one of them would register as significant purely by chance.
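To put rough numbers on that, here is a minimal Python sketch. It rests on two assumptions that are mine, not the paper's: the usual 0.05 significance cutoff, and comparisons that are independent of one another (real factors like parents’ income and SAT scores are correlated, so treat the output as a ballpark). The function names are made up for illustration.

```python
import random

def chance_of_fluke(num_comparisons, alpha=0.05):
    """Chance that at least one of several independent tests comes up
    'significant' when none of the factors has any real effect."""
    return 1 - (1 - alpha) ** num_comparisons

def simulate_fluke_rate(num_comparisons, alpha=0.05, trials=20000, seed=0):
    """Check the formula by brute force. When there is no real effect,
    a p-value is equally likely to land anywhere between 0 and 1, so
    each test is modeled as one uniform random draw."""
    rng = random.Random(seed)
    flukes = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_comparisons)):
            flukes += 1
    return flukes / trials

if __name__ == "__main__":
    for k in (1, 6, 20):
        print(k, round(chance_of_fluke(k), 3), round(simulate_fluke_rate(k), 3))
```

Under those assumptions, one comparison gives a 5% chance of a fluke, six comparisons (the list above) give about 26%, and twenty give about 64%.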

Simmons, Nelson, and Simonsohn crunch the numbers and determine that, if you used all four of their tricks in one study, you’d be able to get meaningless data to register as “significant” over 60% of the time. They even run a fake study of their own (“proving” that listening to music about aging causes subjects to become younger) to demonstrate their methods in action.

The researchers conclude their paper with advice for authors and journal reviewers, but none for laypeople. Scientific research shapes the policies we recommend and the choices we make in our day-to-day lives, but how much credence should we give new findings, when we know that they may be statistical flukes (or worse, designed to deceive)?

Ultimately, the peer-reviewed journal system is, to paraphrase Churchill, the worst approach to understanding the world, except for all the others that have been tried. When we make an idol of empiricism, any flawed result or pervasive bias leaves us feeling betrayed and defiant.

Instead of thinking and talking about science as the purest form of inquiry, we might be better off thinking of it as a somewhat finicky old car. It usually gets us where we need to go, but it’s a good idea to check out the engine and be prepared to swap out or repair parts. The reforms proposed by Simmons, Nelson, and Simonsohn will keep the kludge running well enough until the next element breaks and it’s time to work out another fix.

In our day-to-day lives, that means instead of accepting scientific results on faith, or gleefully nitpicking the methodology of inconvenient results, we should look for opportunities to replicate or test the latest results.

One of the simplest solutions is trying out new ideas with pilot programs. On a governmental level, that might mean setting up randomized controlled trials (e.g., applying the Obamacare mandate only to people whose Social Security numbers are odd). On a personal level, it might mean designing quick and dirty experiments (e.g., seeing whether you can successfully guess which days your spouse gave you decaffeinated coffee instead of caffeinated, and how much of a difference it really makes to your productivity).
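For the coffee experiment, the question to ask is how often blind guessing would do as well as you did. The sketch below is one way to score it, assuming your spouse picks decaf or regular by coin flip each morning, so a pure guesser is right half the time. The function name and the 14-day example are hypothetical.

```python
from math import comb

def chance_of_guessing_this_well(correct, days, guess_rate=0.5):
    """Probability of getting at least `correct` days right out of `days`
    by luck alone, when each guess is right with chance `guess_rate`."""
    return sum(
        comb(days, k) * guess_rate**k * (1 - guess_rate) ** (days - k)
        for k in range(correct, days + 1)
    )

if __name__ == "__main__":
    # Hypothetical two-week trial: 11 correct calls out of 14 mornings.
    print(round(chance_of_guessing_this_well(11, 14), 3))
```

In that made-up example, luck alone would produce 11 or more correct calls about 3% of the time. Settle on the number of days before you start: stopping as soon as the tally looks good is exactly the kind of flexibility that manufactures flukes.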

When we’re active auditors of scientific results, instead of spectators, we force ourselves to think about the implications of our assumptions, because we have to make our models of the world specific enough to test. Ideally, we can make statistical research not just resilient in the face of problems, but antifragile: actively strengthened by perturbations. We won’t be afraid of discovering errors, because the adjustments we’ll have to make to avoid them will make the whole edifice stronger.