Let me tell you a little about the dataset I'm currently working with, to open some discussion on the subject of data management and analysis.

Now that I've sent off another cricket manuscript, the project I'm turning towards is one where we gave crickets diets containing different amounts of protein and carbohydrate, and then measured how much they ate, how long they lived, and how many eggs they laid.

Sounds simple in principle, right? Well, here are some additional logistical considerations. To get a meaningful sample size, I had to use over a hundred female crickets (closer to 150), and I needed to get each one started at Day 0 of adulthood, which meant catching them right at the adult molt. I had a limited amount of space for this project, both for rearing the crickets to adulthood and for actually running the experiment once they got there. The incubators look something like this when full of experiment crickets:

Cricket rearing chamber

What this all means is that I had to stagger the crickets' start dates: start 3 crickets on the first day, 5 on the next, and so forth, over a period of about a month and a half.

This causes compounding problems when it comes to repeated measures on individuals over the course of an experiment. The worst of them was that I initially scheduled things to switch out the crickets' food dishes every five days, while mating them overnight and collecting egg cups every seven days. Because 5 and 7 share no common factor, a given cricket's food change and mating night only fell on the same day once every 35 days, and with staggered start dates the two schedules landed on a different mix of calendar days for every cricket. I have several calendars full of color-coded numbers that document this schedule. I actually ran two separate cohorts/generations, so the second time around I spaced out the feeding to line up with the mating, and also scheduled things so I wouldn't have to work on weekends quite so much.
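To see why mismatched intervals compound once start dates are staggered, here is a minimal Python sketch that tallies, for each day on a shared calendar, how many food changes and matings fall due. Only the 5-day and 7-day intervals come from the schedule described above; the start days, the fixed 70-day follow-up, and the function names `task_calendar` and `busy_days` are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from math import lcm  # math.lcm requires Python 3.9+


def task_calendar(start_days, feed_every, mate_every, lifespan):
    """Tally scheduled tasks per shared calendar day.

    start_days: each cricket's Day 0 of adulthood on the shared calendar
    (hypothetical values for illustration).
    lifespan: days each cricket is followed (a hypothetical fixed value;
    real lifespans varied from cricket to cricket).
    """
    tasks = Counter()
    for start in start_days:
        last_day = start + lifespan
        # Food dish swapped every `feed_every` days after Day 0.
        for day in range(start + feed_every, last_day + 1, feed_every):
            tasks[day] += 1
        # Overnight mating / egg-cup collection every `mate_every` days.
        for day in range(start + mate_every, last_day + 1, mate_every):
            tasks[day] += 1
    return tasks


def busy_days(start_days, feed_every, mate_every, lifespan):
    """Number of distinct calendar days on which at least one task is due."""
    return len(task_calendar(start_days, feed_every, mate_every, lifespan))


# For a single cricket, feeding and mating coincide only every lcm(5, 7) days.
print(lcm(5, 7))  # 35

starts = [0] * 3 + [1] * 5  # hypothetical: 3 crickets start on day 0, 5 on day 1
print(busy_days(starts, 5, 7, 70))  # 40 days with at least one task
print(busy_days(starts, 7, 7, 70))  # 20 once feeding lines up with mating
```

With these toy numbers, moving food changes onto the same seven-day cycle as mating halves the number of days with chores; real counts would depend on the actual start dates and lifespans.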

Then, there's annual timing. When I started the experiment, I didn't know exactly how long the crickets would wind up living. I started in August, and as it turns out, they lived for between 70 and 90 days. As I like to put it, I've discovered the secret to immortality: sign up for an experiment that measures lifespan. That meant that the first set of crickets didn't wrap up until around or after Thanksgiving. The second set ran straight through Christmas and New Year's. And in the meantime, I kind of wanted to have a life now and again: ride my bike, go to a conference, maybe see scrottie and my family once in a blue moon. Plus, sifting out and counting cricket eggs is incredibly labor-intensive (as documented starting here). I could get a day's worth of eggs done on my own, if that was all I did and I focused really hard on it and worked a ten-hour day every day for months on end. I did manage to set that part up so that I didn't have to count eggs on weekends, but still. I was also supposed to be writing papers.

So, I got awesome undergraduates to help me with the project. What this also meant is that multiple people had to write down data and do things in a consistent fashion over a long timeframe.

This is a setup for making mistakes. One must budget for the mistakes, accept that they will happen, and make sure they don't compromise the integrity of the work.

So now, data analysis. When I design and conduct an experiment, I go in with a hypothesis. If the only thing I accomplished in the experiment was the confirmation of the hypothesis, however, I'd have a pretty boring experiment. Instead, experiments are a discovery process, and the papers written about them reflect a revisionist history.

Accounting for mistakes, plus the discovery process, means that data analysis is rarely a straightforward project. It's still a good idea to go in with a pretty clear sense of how to analyze the data, but questions and contingencies will inevitably crop up.

Even with careful data entry and carefully documented scripts, it all still takes a long time. I had to hurry up and conduct some initial analyses of the data for a botched conference presentation in November (the botching wasn't my fault), but my standards for publication are high: all of the fiddly little decisions need to be fully justified, and sample sizes need to be clear and consistent.

And what do you do, for instance, with the fact that a number of the crickets supplemented their diets by turning their male partner into a snack? These things have to be pondered.
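One way to keep sample sizes clear and consistent while questions like this are still being pondered is to write each exclusion rule into a script and report n per diet both with and without it, after checking for records entered twice. The sketch below assumes a hypothetical record layout: the `id`, `diet`, and `ate_mate` fields, the diet labels, and the helper names are made up for illustration and do not describe the actual data sheets.

```python
from collections import Counter


def duplicate_ids(records):
    """Return cricket IDs that appear in more than one record (e.g. entered twice)."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


def sample_sizes(records, exclude=lambda r: False):
    """Count individuals per diet, skipping records for which `exclude` returns True."""
    return dict(Counter(r["diet"] for r in records if not exclude(r)))


# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"id": 1, "diet": "A", "ate_mate": False},
    {"id": 2, "diet": "A", "ate_mate": True},
    {"id": 3, "diet": "B", "ate_mate": False},
    {"id": 3, "diet": "B", "ate_mate": False},  # same cricket entered twice
    {"id": 4, "diet": "B", "ate_mate": True},
]

print(duplicate_ids(records))  # [3]

# Keep one record per ID before counting (the last entry for each ID wins).
unique = list({r["id"]: r for r in records}.values())

print(sample_sizes(unique))  # {'A': 2, 'B': 2}
print(sample_sizes(unique, exclude=lambda r: r["ate_mate"]))  # {'A': 1, 'B': 1}
```

Printing both counts side by side shows exactly how many individuals a given decision removes from each treatment, whichever way the question is eventually settled.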

When I'm trying to think through these things, the worst thing that can happen is having someone breathing down my neck wanting immediate answers, or lots of interruptions. I'm grateful to my Ph.D. advisor for recognizing this. I had to go back and do something similar with the current leafcutter manuscript not too long ago, and I think in the long run I save time by making the time to do a proper job of it all.

It all makes me sympathize with folks like this.

Powered by LiveJournal.com
Designed by Naoto Kishi