
Storytelling is the most powerful way to put ideas into the world - Robert McKee

Beginner data analysis

1/16/2022

by Natasha Barlow

Starting my master’s at the University of Waterloo was terrifying. Imposter syndrome hits hard, you’re surrounded by people who will question your beliefs (rightly so; intellectual debates are useful), and you’re a fledgling when it comes to understanding the vast world that is ~statistics~. I remember frantically searching online for a ‘how-to’ guide that would help me even wrap my brain around starting an analysis of my data. Luckily, I was in Dr. Brad Fedy’s lab and was blessed to be surrounded by incredibly intelligent people who were willing to help little old me.
 
In part one of this blog series, I am going to give a brief, high-level summary of how I was taught to start analyzing my data. The next post will be dedicated to the format I use for writing manuscripts. This is not to say there aren’t other ways to accomplish the same thing, but this format worked for me, and I hope it will be useful to you, too.

For the purposes of this post, I will be using my paper as an example, but I have written it so that you can follow along without access to the article. I am, however, assuming that the reader has a basic knowledge of statistics.
 
So…you’ve gone out for a few field seasons (or just one!) and have collected a dataset for your project. Now what? I strongly believe that the data we collect as scientists should be published and peer-reviewed to maintain credibility and honesty, and that the knowledge we gain should be shared. Once you decide that you want to analyze your data, and potentially publish your research in a peer-reviewed journal, what is your first step?
Field technicians collecting habitat vegetation variables along a transect in northeastern Wyoming, USA, in 2017. Photo: Natasha Barlow.

Your Eight Steps for Success


1. Determine your question. As mentioned in my previous ELB blog post, my advisor, Dr. Fedy, was a strong advocate of the approach “you can only answer a question as well as you ask it.” Ideally, you will have a good sense of the specific questions you want to answer before gathering your data, although these may change depending on what data you were actually able to collect.

2. Research which analysis is most appropriate for your question, your study design, your data collection methods, and your data. I really like using Google Scholar, but dedicated databases (e.g., Web of Science) work, too. By searching the literature, you will likely come across studies that collected similar data and answered similar questions. Determine which analyses they performed and look into what each one involves. Just because someone published a paper using a specific analysis does not mean it is the most appropriate way to analyze your data. New and improved analyses are being developed all the time, and the better you understand the analyses, the closer your answer is likely to be to the truth. I eventually came across this McFarland et al. (2017) paper, which collected data similar to what I had for my manuscript, and the analysis they used looked like it might work for my study design. I went with it!

BONUS TIP: If you email the authors directly, they may be able to provide you with their code. Some authors also post their code on GitHub.

3. Organize your data. Our lab uses the programming language R and the RStudio development environment because they’re free, they arguably give you better control over your analysis, and R is used throughout many disciplines. We generally do not manipulate the raw data files themselves; any transformation of the raw data into a usable format is done within R. How you organize your data will likely depend on what the analysis requires and which R package(s) you will be using. Short documents like this one (or longer vignettes) can help you determine what format your data need to be in. Know that this also takes some trial and error. Save your data organization script as a separate R file and save the transformed data as a .csv file (note: a .csv file saves only ONE sheet, so if you convert an Excel workbook to .csv, only the active tab is kept. For R, you will want all your data on one sheet anyway).

4. View your data. Open a new, separate R file and use it for your analysis. Import your .csv file from step 3, and you’re ready to go. It is generally a good idea to do another quality check on your data, using histograms, boxplots, and other visual checks, to ensure that nothing was lost during the transformation and that there are no mistakes in your data. This is also where you can check assumptions such as normality, using Q-Q plots and other tools.

5. Prepare your data. Next, prepare your data for analysis. For example, I standardized some of my continuous variables so that variables measured on different scales could be compared (e.g., shrub cover measured as a percentage and grass height measured in cm). You may also need to log-transform your data or make other preparations before running your analysis.

6. Check for correlations. If your statistical model includes multiple predictor variables, it is generally a good idea to make sure that no two highly correlated variables appear in the same model. Correlated predictors (known as multicollinearity) can make things very messy, for a variety of reasons that I will not explain here (and would not explain well).

7. Determine which variables to use. This step is generally needed if you have a suite of predictor variables and suspect that some may not actually influence your response variable. In my manuscript, we used univariate model selection and 85% confidence intervals, while also considering each variable’s biological relevance to the study species. That said, there are a variety of ways to determine which variables to use (e.g., AIC-based model selection). We started with over 20 habitat variables and narrowed our primary focus to 7 that seemed to inform our response variable.

8. Run your model. I used conditional logistic regression in my manuscript to estimate the probability of nest-site selection by Brewer’s Sparrows based on a suite of variables I collected (the 7 from step 7 above). Reading the documentation for your R package(s) will help you set up the model formula correctly. Just remember, many models are essentially extensions of y = mx + b.
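To make step 3 concrete, here is a minimal sketch of the “never edit the raw file, transform it in a script, save a clean .csv” workflow. Natasha’s lab does this in R; the sketch below uses Python’s standard library instead and is not the script used for the manuscript. The column names (plot_id, shrub_cover_pct, grass_height_cm) and the choice to average transect points within each plot are hypothetical, picked only to show the pattern.

```python
import csv
import io

# Hypothetical raw field sheet: one row per transect point, several points per plot.
# In a real project this would be read from the raw file on disk, which is never edited.
RAW_CSV = """plot_id,point,shrub_cover_pct,grass_height_cm
A,1,30,12
A,2,40,18
B,1,10,25
B,2,20,35
"""

FIELDS = ["plot_id", "shrub_cover_pct", "grass_height_cm"]


def summarize_by_plot(raw_text):
    """Collapse point-level rows into one row per plot by averaging each variable."""
    totals = {}
    for row in csv.DictReader(io.StringIO(raw_text)):
        plot = totals.setdefault(row["plot_id"], {"n": 0, "shrub": 0.0, "grass": 0.0})
        plot["n"] += 1
        plot["shrub"] += float(row["shrub_cover_pct"])
        plot["grass"] += float(row["grass_height_cm"])
    return [
        {
            "plot_id": plot_id,
            "shrub_cover_pct": t["shrub"] / t["n"],
            "grass_height_cm": t["grass"] / t["n"],
        }
        for plot_id, t in totals.items()
    ]


def write_clean_csv(rows, path):
    """Save the transformed data as a new .csv file (the raw file stays untouched)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# One row per plot: A -> 35.0 % cover, 15.0 cm grass; B -> 15.0 % cover, 30.0 cm grass.
clean = summarize_by_plot(RAW_CSV)
```

In practice you would call something like write_clean_csv(clean, "clean_data.csv") once (the file name is hypothetical) and import that clean file at the start of your separate analysis script, as in step 4.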
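The visual checks in step 4 (in R, base functions such as hist(), boxplot(), and qqnorm()) pair well with a few quick numeric checks. The Python sketch below is only an illustration, not part of the manuscript’s workflow: it assumes a hypothetical shrub-cover column that must lie between 0 and 100%, flags missing and impossible values, and summarizes the rest. It complements the plots rather than replacing them.

```python
import statistics


def quality_report(values, lower, upper):
    """Flag missing and out-of-range entries, then summarize the values that are present."""
    missing_rows = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    out_of_range_rows = [
        i for i, v in enumerate(values) if v is not None and not lower <= v <= upper
    ]
    return {
        "n": len(values),
        "missing_rows": missing_rows,
        "out_of_range_rows": out_of_range_rows,
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


# Hypothetical shrub cover values (%); 250 is impossible and probably a typo for 25.
shrub_cover = [35, 15, None, 250, 40]
report = quality_report(shrub_cover, lower=0, upper=100)
print(report)
# {'n': 5, 'missing_rows': [2], 'out_of_range_rows': [3], 'min': 15, 'median': 37.5, 'max': 250}
```

A flagged row is a prompt to go back to the field sheets, not something to silently delete.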
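Standardizing in step 5 usually means converting each value to a z-score, z = (x - mean) / SD, so that a one-unit change means “one standard deviation” for every variable, whether it was measured in % or in cm. Here is a minimal Python sketch with made-up numbers (R’s scale() function does the same centering and scaling). The +1 offset in the log transform is an assumption for data that contain zeros, not a universal rule.

```python
import math
import statistics


def standardize(values):
    """Convert values to z-scores: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def log_transform(values, offset=1.0):
    """Natural log of (x + offset); the offset (assumed here) keeps zeros from breaking the log."""
    return [math.log(v + offset) for v in values]


# Hypothetical measurements recorded on different scales.
shrub_cover_pct = [10.0, 20.0, 30.0]
grass_height_cm = [5.0, 15.0, 25.0]

z_shrub = standardize(shrub_cover_pct)  # [-1.0, 0.0, 1.0]
z_grass = standardize(grass_height_cm)  # [-1.0, 0.0, 1.0]
logged = log_transform([0.0, 9.0, 99.0])  # [ln 1, ln 10, ln 100]
```

After standardizing, both variables sit on the same scale, so their model coefficients can be compared directly.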
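For step 6, one common approach is to compute pairwise Pearson correlations among the candidate predictors and avoid putting any pair above a chosen cutoff into the same model. In R, cor() on a data frame returns the full correlation matrix; the Python sketch below does the pairwise version by hand. The |r| ≥ 0.6 cutoff is an assumed rule of thumb (0.6 and 0.7 are both frequently used), and the predictor values are invented.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlated_pairs(predictors, threshold=0.6):
    """List every pair of predictors with |r| at or above the (assumed) cutoff."""
    flagged = []
    for (name_a, xa), (name_b, xb) in itertools.combinations(predictors.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical plot-level predictors.
predictors = {
    "shrub_cover_pct": [10, 20, 30, 40, 50],
    "shrub_height_cm": [22, 41, 58, 83, 99],
    "grass_height_cm": [15, 9, 20, 11, 16],
}

# Only the shrub cover / shrub height pair is flagged, so those two should not share a model.
flagged = correlated_pairs(predictors)
```

When a pair is flagged, biological reasoning (or the screening in step 7) can decide which of the two to keep.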
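Step 7 mentions screening variables with univariate models and 85% confidence intervals. The post doesn’t spell out the exact rule, so the following is an assumption: a common version fits one model per variable and keeps the variable if the 85% interval for its coefficient excludes zero. A Wald interval is estimate ± z × SE, with z ≈ 1.44 at 85% versus 1.96 at 95%, so the 85% rule keeps more candidates. This Python sketch uses invented coefficients and standard errors.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.85):
    """Wald interval: estimate +/- z * SE, with z from the standard normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.44 at 85%, 1.96 at 95%
    return estimate - z * se, estimate + z * se


def keep_variable(estimate, se, level=0.85):
    """Screening rule assumed here: keep the variable if its interval excludes zero."""
    lower, upper = wald_ci(estimate, se, level)
    return lower > 0 or upper < 0


# Hypothetical univariate results: (coefficient, standard error) from one model per variable.
univariate = {
    "shrub_cover_pct": (0.30, 0.18),
    "grass_height_cm": (0.10, 0.15),
}

kept = [name for name, (b, se) in univariate.items() if keep_variable(b, se)]
# shrub_cover_pct is kept at 85%, but it would be dropped under a 95% interval.
```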
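To unpack the “extension of y = mx + b” idea in step 8, here is one way to write the chain from a straight line to conditional logistic regression. It assumes a used/available design in which each stratum s contains one used nest site and its matched available points, described by k predictors x_1, …, x_k (such as the 7 from step 7). The notation is illustrative and not taken from the manuscript.

```latex
\begin{aligned}
y &= b + m x && \text{(a straight line)} \\
y &= \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k && \text{(several predictors)} \\
\ln\!\left(\frac{p}{1-p}\right) &= \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k && \text{(logistic regression)} \\
\Pr(\text{point } i \text{ is the used site} \mid \text{stratum } s)
  &= \frac{\exp(\beta_1 x_{i1} + \cdots + \beta_k x_{ik})}
          {\sum_{j \in s} \exp(\beta_1 x_{j1} + \cdots + \beta_k x_{jk})} && \text{(conditional logistic regression)}
\end{aligned}
```

In the last line the intercept drops out: exp(β_0) multiplies every term in both the numerator and the denominator, so it cancels, and each nest is compared only with its own matched available points.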
Hatch day for Brewer’s Sparrow nestlings in northeastern Wyoming, USA, in 2017. Photo: Natasha Barlow.

***
Next, work on making sure you understand the results of your analysis. You can always get answers out of data, but they may not be the right answers. It is our job as scientists to work hard to minimize the risk of interpreting something incorrectly. There are also great resources, like the free R for Data Science website or books, that can help you navigate this difficult realm of academia! I recognize that this may seem like a huge hurdle to overcome, but I am absolutely not a guru, and I feel I should have a much more in-depth understanding of this process and the analysis than I currently do. So if I can do it, you can definitely accomplish this, too. It can be a challenge, but you can persevere. In the next post, I will briefly discuss my tips for writing manuscripts, so stay tuned!
About the Author

Natasha L. Barlow (B.Sc., M.E.S.) is a Boreal Conservation Project Specialist at Birds Canada. Her master’s research focused on evaluating the greater sage-grouse as an umbrella species for nesting Brewer’s Sparrows (Spizella breweri) at fine spatial scales, as well as the influence of landscape-scale habitat reclamation on songbird abundance. She is interested in evidence-based approaches to the complex realm of human-natural resource interactions that also work to rectify the disconnect between the public and scientists in regard to wildlife conservation. Connect with her on Instagram (@natashalbarlow) or Twitter (@nlynnbarlow) to chat about research or to go birding with her.

    ELB Members

    Blogs are written by ELB members who want to share their stories about Ontario's biodiversity.

    Interested in sharing your story?
    Contact el4biodiversity@gmail.com.
