By Joern Fischer
“Simply put, the era of data-intensive science is here. Those who step up to address major environmental challenges will leverage their expertise by leveraging their data. Those who do not run the risk of becoming scientifically irrelevant.”
I just read Hampton et al.’s new paper in Frontiers in Ecology and the Environment, entitled “Big data and the future of ecology”. In a nutshell, the paper encourages ecologists to more routinely share their data. The underlying premise is that data sharing will lead to bigger and better (or at least additional) insights, because there are large numbers of small datasets that – if widely shared – could be combined for more effective quantitative analyses. Other disciplines, according to Hampton et al., are ahead of ecology in sharing their data – among ecologists, only geneticists share their data widely (partly because they have to), while many others don’t.
Several journals have now made it a requirement to share data (unless there are strong reasons why you can’t), e.g. Proceedings of the Royal Society London B and Journal of Applied Ecology. What’s going on here? Is this an obvious case – so much more could be gained if only we all had access to more data?
That, it seems, is what Hampton et al. genuinely believe. They suggest there are four things we ought to do:
- Organise and preserve our data for posterity, no matter how small the dataset, including appropriate meta-data.
- Share data through publicly accessible databases.
- Collaborate in networks where data are shared, e.g. to combine the insights of multiple case studies.
- Address issues of data management with students and junior researchers in your labs.
I immediately agree with points 3 and 4. One of my recent posts in this blog was about the PECS network, for example – which is exactly the kind of thing raised in point 3. It is a network of people who each do local-scale studies, but would like to see their findings synthesized in a useful way.
I kind of don’t have much of a problem with point 1, but I’m not terribly convinced about point 2. I see the following issues with a generic “you ought to share your data”:
- I think there is a misunderstanding that “big data” is what is needed to solve today’s problems. From data, we need to get to information that is usable; from there to analysis and insight; and from there to wise societal decisions. I would argue that if there is one problem we DON’T have in our modern world, it’s a lack of data! I would argue the opposite in fact: that the ever-increasing availability of data is blinding us to the real problems. It looks as if additional data would somehow help – it’s an enticing prospect to have all this data! Wow! But as I argued in “Human behavior and sustainability” (also in Frontiers), a lack of data, information, or knowledge is not the problem for sustainability. We know well enough what we ought to be doing; we lack the means of putting our knowledge (based on information, based on data) into action.
- I think there is a serious risk that data is misinterpreted if used by others who are NOT explicitly chosen collaborators in a network. This is not a matter of meta-data. It’s a matter of ecological field data coming from places, and being appropriately understood only if one understands the place. That is why Discussion sections of journal articles aren’t auto-generated once you have written the Results, but require (subjective!) expertise. Meta-analyses channel our focus towards questions that can be asked, not towards questions that must be asked. There is a real risk that we search for universal truths across study systems, at the price of glossing over local details that are fundamentally important. A simple example is what constitutes a “patch”. This is assessed differently in different parts of the world. Just using people’s data on “patches” could lead to serious misinterpretations about many things, including patch-size-effects (for example). I am critical of many existing meta-analyses for this reason already – having all data available to everyone, to my mind, will simply increase this trend away from deep, locally based ecological knowledge.
- Following on from the previous point, what happened to the argument by Lindenmayer and Likens on losing the culture of ecology? Ecology is about places, just like geography and anthropology are. Good ecologists go in the field and learn about life there; they develop an ecological intuition, which is the only way to stop them from writing nonsense in their Discussion sections. I am deeply concerned that a trend towards yet more data will even further erode the field-based culture of ecology. Yet more PhD students will make their careers out of modeling, rather than going in the field.
- Finally, this raises important ethical issues. Modeling experts will then “own” top journals like Ecology Letters, and (I hope not) Frontiers in Ecology and the Environment. But none of those will be field ecologists!… those, in the meantime, have to publish their work in “regional journals” (i.e. not widely read ones) because their stuff is less relevant. Basically, they had to spend months in the field for someone else to get a free ride out of it in a more esteemed journal.
I’m all for addressing big questions. I’m all for synthesis, though I believe much of the societally relevant stuff will be qualitative, not quantitative. I’m all for sharing data with the right people for the right reason – but I do not believe that universal sharing is a safe recipe towards a better science of ecology, nor do I believe that a lack of data is in fact the primary problem we face today. And universal sharing does have risks of data being used wrongly by others, and some taking a free ride on the backs of field ecologists.
Big data? Sure, it can be a part of what ecology does, too. But I found that Hampton et al. were far too one-sided about this issue, essentially seeing no downsides or limitations.
Finally … (deep breath), this is an issue I may yet change my mind on. For now, I don’t buy the arguments put forward, but undoubtedly I will be confronted with this again and again over the next few years (say because I want to publish in one of the journals requiring data sharing!) … so who knows. It’s worth putting the issue on the table, and Hampton et al. have done that nicely. As I said, some of their points and conclusions I agree with – but some I don’t, and so overall, I’m a lot less enthusiastic about big data than they are. According to the quote above, my skepticism towards big data will render me scientifically irrelevant in the near future… I can’t wait.
I’d be really interested in other people’s comments on this!