Wrangling and Pivoting Data

class: center, middle, inverse, title-slide

.title[
# Wrangling and Pivoting Data
]
.subtitle[
## IBS 519 - Week 4 - TA session
]
.author[
### Ashlyn Johnson
]
.date[
### 9/14/22
]

---

## Agenda
### First Half of Class
-   Review of **`dplyr`** package
-   **Tidy data** and **pivoting** data frames

### Second Half of Class
-   Homework Questions
-   Additional practice (if time allows)

---
class: inverse, center, middle
# `dplyr`
### a grammar of data manipulation providing a consistent set of verbs

### last week, we learned...
---
.pull-left[
### `filter()`
]
.pull-right[
subset a data frame for **rows** that satisfy your conditions
]
--
.pull-left[
### `mutate()`
]
.pull-right[
adds new variables (columns) or changes existing ones
]
--
.pull-left[
### `select()`
]
.pull-right[
subset a data frame for desired **columns**
]
--
.pull-left[
### `arrange()`
]
.pull-right[
**orders** the rows of a data frame by the values of selected columns; 
default is ascending order, but can be paired with `desc()` to obtain descending order
]
--
.pull-left[
### `summarise()`
]
.pull-right[
collapses a data frame to one row per group, with one column for each grouping variable and one column for each summary statistic
]
---
class: center, middle, inverse
### No one programmer can remember everything! 
it is OK to google questions    
read help pages with `?function_name` (for example, `?filter`)    
ask for help    
RStudio even makes [cheatsheets](https://www.rstudio.com/resources/cheatsheets/) for their packages (use them!! I do!)    
---
### Let's practice some data wrangling! 
To manipulate data, we'll need the tidyverse! Mainly `dplyr`, but I like to load everything in just in case.

```r
library(tidyverse)
```

```
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
```

---
### We also need data to wrangle

Let's borrow a dataset from the Tidy Tuesday repository. 
.center[
<img src="data:image/png;base64,#images/tidy_tuesday_screengrab.png" width="60%" />
]
---
We'll use a dataset from the American Kennel Club that includes various information about different dog breeds.

```r
breed_traits <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_traits.csv')
```

```r
glimpse(breed_traits)
```

```
## Rows: 195
## Columns: 17
## $ Breed                        <chr> "Retrievers (Labrador)", "French Bulldogs…
## $ `Affectionate With Family`   <dbl> 5, 5, 5, 5, 4, 5, 3, 5, 5, 5, 5, 3, 5, 4,…
## $ `Good With Young Children`   <dbl> 5, 5, 5, 5, 3, 5, 5, 3, 5, 3, 3, 5, 5, 5,…
## $ `Good With Other Dogs`       <dbl> 5, 4, 3, 5, 3, 3, 5, 3, 4, 4, 4, 3, 3, 3,…
## $ `Shedding Level`             <dbl> 4, 3, 4, 4, 3, 1, 3, 3, 3, 2, 4, 3, 1, 2,…
## $ `Coat Grooming Frequency`    <dbl> 2, 1, 2, 2, 3, 4, 2, 1, 2, 2, 2, 2, 5, 2,…
## $ `Drooling Level`             <dbl> 2, 3, 2, 2, 3, 1, 1, 3, 2, 2, 1, 1, 1, 3,…
## $ `Coat Type`                  <chr> "Double", "Smooth", "Double", "Double", "…
## $ `Coat Length`                <chr> "Short", "Short", "Medium", "Medium", "Sh…
## $ `Openness To Strangers`      <dbl> 5, 5, 3, 5, 4, 5, 3, 3, 4, 4, 4, 3, 5, 4,…
## $ `Playfulness Level`          <dbl> 5, 5, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ `Watchdog/Protective Nature` <dbl> 3, 3, 5, 3, 3, 5, 2, 5, 4, 4, 5, 3, 5, 4,…
## $ `Adaptability Level`         <dbl> 5, 5, 5, 5, 3, 4, 4, 4, 4, 4, 4, 3, 5, 3,…
## $ `Trainability Level`         <dbl> 5, 4, 5, 5, 4, 5, 3, 5, 5, 4, 4, 5, 4, 4,…
## $ `Energy Level`               <dbl> 5, 3, 5, 3, 3, 4, 4, 3, 5, 3, 4, 5, 4, 4,…
## $ `Barking Level`              <dbl> 3, 1, 3, 1, 2, 4, 4, 1, 3, 5, 4, 3, 4, 3,…
## $ `Mental Stimulation Needs`   <dbl> 4, 3, 5, 4, 3, 5, 4, 5, 5, 3, 4, 5, 4, 4,…
```

???
But what are these traits exactly?
---
<div id="htmlwidget-d0ed0d69322c2d468c7a" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-d0ed0d69322c2d468c7a">{"x":{"filter":"none","vertical":false,"fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16"],["Affectionate With Family","Good With Young Children","Good With Other Dogs","Shedding Level","Coat Grooming Frequency","Drooling Level","Coat Type","Coat Length","Openness To Strangers","Playfulness Level","Watchdog/Protective Nature","Adaptability Level","Trainability Level","Energy Level","Barking Level","Mental Stimulation Needs"],["Independent","Not Recommended","Not Recommended","No Shedding","Monthly","Less Likely to Drool","-","-","Reserved","Only When You Want To Play","What's Mine Is Yours","Lives For Routine","Self-Willed","Couch Potato","Only To Alert","Happy to Lounge"],["Lovey-Dovey","Good With Children","Good With Other Dogs","Hair Everywhere","Daily","Always Have a Towel","-","-","Everyone Is My Best Friend","Non-Stop","Vigilant","Highly Adaptable","Eager to Please","High Energy","Very Vocal","Needs a Job or Activity"],["How affectionate a breed is likely to be with family members, or other people he knows well. Some breeds can be aloof with everyone but their owner, while other breeds treat everyone they know like their best friend.","A breed's level of tolerance and patience with childrens' behavior, and overall family-friendly nature. Dogs should always be supervised around young children, or children of any age who have little exposure to dogs.","How generally friendly a breed is towards other dogs. Dogs should always be supervised for interactions and introductions with other dogs, but some breeds are innately more likely to get along with other dogs, both at home and in public.","How much fur and hair you can expect the breed to leave behind. 
Breeds with high shedding will need to be brushed more frequently, are more likely to trigger certain types of allergies, and are more likely to require more consistent vacuuming and lint-rolling.","How frequently a breed requires bathing, brushing, trimming, or other kinds of coat maintenance. Consider how much time, patience, and budget you have for this type of care when looking at the grooming effort needed. All breeds require regular nail trimming.","How drool-prone a breed tends to be. If you're a neat freak, dogs that can leave ropes of slobber on your arm or big wet spots on your clothes may not be the right choice for you.","Canine coats come in many different types, depending on the breed's purpose. Each coat type comes with different grooming needs, allergen potential, and shedding level. You may also just prefer the look or feel of certain coat types over others when choosing a family pet.","How long the breed's coat is expected to be. Some long-haired breeds can be trimmed short, but this will require additional upkeep to maintain.","How welcoming a breed is likely to be towards strangers. Some breeds will be reserved or cautious around all strangers, regardless of the location, while other breeds will be happy to meet a new human whenever one is around!","How enthusiastic about play a breed is likely to be, even past the age of puppyhood. Some breeds will continue wanting to play tug-of-war or fetch well into their adult years, while others will be happy to just relax on the couch with you most of the time.","A breed's tendency to alert you that strangers are around. These breeds are more likely to react to any potential threat, whether it's the mailman or a squirrel outside the window. These breeds are likely to warm to strangers who enter the house and are accepted by their family.","How easily a breed handles change. 
This can include changes in living conditions, noise, weather, daily schedule, and other variations in day-to-day life.","How easy it will be to train your dog, and how willing your dog will be to learn new things. Some breeds just want to make their owner proud, while others prefer to do what they want, when they want to, wherever they want!","The amount of exercise and mental stimulation a breed needs. High energy breeds are ready to go and eager for their next adventure. They'll spend their time running, jumping, and playing throughout the day. Low energy breeds are like couch potatoes - they're happy to simply lay around and snooze.","How often this breed vocalizes, whether it's with barks or howls. While some breeds will bark at every passer-by or bird in the window, others will only bark in particular situations. Some barkless breeds can still be vocal, using other sounds to express themselves.","How much mental stimulation a breed needs to stay happy and healthy. Purpose-bred dogs can have jobs that require decision-making, problem-solving, concentration, or other qualities, and without the brain exercise they need, they'll create their own projects to keep their minds busy -- and they probably won't be the kind of projects you'd like."]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Trait<\/th>\n      <th>Trait_1<\/th>\n      <th>Trait_5<\/th>\n      <th>Description<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":3,"columnDefs":[{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[3,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
---
### My dog Marvel is some sort of Chihuahua mix. Can I use his traits to narrow down which other breeds may contribute to his genetic makeup? 
.pull-left[
<img src="data:image/png;base64,#images/marvel_at_rest.JPEG" width="70%" />
]
--
.pull-right[
### Notable Traits: 
-   Coat Length = Short
-   Trainability >= 3
-   Energy >= 3
-   Barking = 5 
]
---
Which breeds have short hair, a trainability level greater than or equal to 3, an energy level greater than or equal to 3, and a barking level of 5? Let's use `filter()` for this.

```r
marvels_traits <- breed_traits %>% 
  filter(`Coat Length` == 'Short') %>% 
  filter(`Trainability Level` >= 3) %>% 
  filter(`Energy Level` >= 3) %>% 
  filter(`Barking Level` == 5)
```
--
<div id="htmlwidget-2ff49e4804adfa4310bd" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-2ff49e4804adfa4310bd">{"x":{"filter":"none","vertical":false,"fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12"],["Dachshunds","Chihuahuas","Collies","Bloodhounds","Miniature Pinschers","Miniature Bull Terriers","Fox Terriers (Smooth)","Australian Terriers","Canaan Dogs","Harriers","American Foxhounds","English Foxhounds"],[5,4,4,4,5,5,5,3,3,5,3,5],[3,1,5,3,3,3,3,5,3,5,5,5],[4,3,3,3,4,3,3,3,3,5,5,5],[2,2,3,3,3,2,3,1,4,3,3,3],[2,1,3,2,1,1,2,2,2,1,1,1],[2,1,2,5,1,2,1,1,1,2,1,2],["Smooth","Smooth","Smooth","Smooth","Smooth","Smooth","Smooth","Wiry","Smooth","Double","Smooth","Double"],["Short","Short","Short","Short","Short","Short","Short","Short","Short","Short","Short","Short"],[4,2,3,3,3,3,3,3,3,4,3,4],[4,4,4,3,4,4,4,4,3,4,3,4],[4,4,3,2,5,4,5,4,4,3,3,3],[4,4,4,3,4,4,4,3,3,4,3,4],[4,3,4,4,3,3,3,4,4,4,3,4],[3,4,3,3,5,4,4,4,3,4,4,4],[5,5,5,5,5,5,5,5,5,5,5,5],[3,3,3,3,5,4,3,4,3,4,3,4]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Breed<\/th>\n      <th>Affectionate With Family<\/th>\n      <th>Good With Young Children<\/th>\n      <th>Good With Other Dogs<\/th>\n      <th>Shedding Level<\/th>\n      <th>Coat Grooming Frequency<\/th>\n      <th>Drooling Level<\/th>\n      <th>Coat Type<\/th>\n      <th>Coat Length<\/th>\n      <th>Openness To Strangers<\/th>\n      <th>Playfulness Level<\/th>\n      <th>Watchdog/Protective Nature<\/th>\n      <th>Adaptability Level<\/th>\n      <th>Trainability Level<\/th>\n      <th>Energy Level<\/th>\n      <th>Barking Level<\/th>\n      <th>Mental Stimulation Needs<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":3,"scrollX":true,"columnDefs":[{"className":"dt-right","targets":[2,3,4,5,6,7,10,11,12,13,14,15,16,17]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[3,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
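---
### One `filter()` call, several conditions
A minimal sketch, assuming a made-up `toy_traits` table (these rows are not from the AKC data): conditions separated by commas inside a single `filter()` are combined with AND, so this should keep the same rows as a chain of separate `filter()` calls.

```r
library(dplyr)

# Toy stand-in for breed_traits; every value here is invented for illustration.
toy_traits <- tibble::tibble(
  Breed = c("Breed A", "Breed B", "Breed C"),
  `Coat Length` = c("Short", "Long", "Short"),
  `Trainability Level` = c(4, 5, 2),
  `Energy Level` = c(3, 4, 5),
  `Barking Level` = c(5, 5, 5)
)

# Comma-separated conditions must all be TRUE for a row to be kept.
one_call <- toy_traits %>%
  filter(
    `Coat Length` == "Short",
    `Trainability Level` >= 3,
    `Energy Level` >= 3,
    `Barking Level` == 5
  )

# The chained style, for comparison.
chained <- toy_traits %>%
  filter(`Coat Length` == "Short") %>%
  filter(`Trainability Level` >= 3) %>%
  filter(`Energy Level` >= 3) %>%
  filter(`Barking Level` == 5)
```

Which style to use is mostly a matter of taste; the chained version can be easier to comment out one condition at a time.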
---
Since I know that Marvel is good with other dogs and affectionate with family, why don't I sort the data frame by those qualities to see which dog breeds rise to the top? Let's use `arrange()` and `desc()` for this.

```r
marvels_traits_2 <- marvels_traits %>% 
  arrange(desc(`Good With Other Dogs`), desc(`Affectionate With Family`)) %>% 
  select(Breed, `Good With Other Dogs`, `Affectionate With Family`)
```
--
<div id="htmlwidget-c6dcad251b11bf13cb0d" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-c6dcad251b11bf13cb0d">{"x":{"filter":"none","vertical":false,"fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12"],["Harriers","English Foxhounds","American Foxhounds","Dachshunds","Miniature Pinschers","Miniature Bull Terriers","Fox Terriers (Smooth)","Chihuahuas","Collies","Bloodhounds","Australian Terriers","Canaan Dogs"],[5,5,5,4,4,3,3,3,3,3,3,3],[5,5,3,5,5,5,5,4,4,4,3,3]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Breed<\/th>\n      <th>Good With Other Dogs<\/th>\n      <th>Affectionate With Family<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":6,"scrollX":true,"columnDefs":[{"className":"dt-right","targets":[2,3]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[6,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
---
In our dataset, there are two variables that deal with different aspects of a dog's coat. What if we wanted just one? We can use `mutate()` to create a new column.

```r
breed_traits_2 <- breed_traits %>% 
  mutate(Coat = paste(`Coat Type`, `Coat Length`, sep = "_")) %>% 
  select(Breed, Coat)
```
--
<div id="htmlwidget-dc173b3f7d27cf2562ad" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-dc173b3f7d27cf2562ad">{"x":{"filter":"none","vertical":false,"fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109","110","111","112","113","114","115","116","117","118","119","120","121","122","123","124","125","126","127","128","129","130","131","132","133","134","135","136","137","138","139","140","141","142","143","144","145","146","147","148","149","150","151","152","153","154","155","156","157","158","159","160","161","162","163","164","165","166","167","168","169","170","171","172","173","174","175","176","177","178","179","180","181","182","183","184","185","186","187","188","189","190","191","192","193","194","195"],["Retrievers (Labrador)","French Bulldogs","German Shepherd Dogs","Retrievers (Golden)","Bulldogs","Poodles","Beagles","Rottweilers","Pointers (German Shorthaired)","Dachshunds","Pembroke Welsh Corgis","Australian Shepherds","Yorkshire Terriers","Boxers","Great Danes","Siberian Huskies","Cavalier King Charles Spaniels","Doberman Pinschers","Miniature Schnauzers","Shih Tzu","Boston Terriers","Bernese Mountain Dogs","Pomeranians","Havanese","Cane Corso","Spaniels (English Springer)","Shetland Sheepdogs","Brittanys","Pugs","Spaniels (Cocker)","Miniature American Shepherds","Border Collies","Mastiffs","Chihuahuas","Vizslas","Basset Hounds","Belgian Malinois","Maltese","Weimaraners","Collies","Newfoundlands","Rhodesian Ridgebacks","Shiba Inu","West Highland White Terriers","Bichons 
Frises","Bloodhounds","Spaniels (English Cocker)","Akitas","Portuguese Water Dogs","Retrievers (Chesapeake Bay)","Dalmatians","St. Bernards","Papillons","Australian Cattle Dogs","Bullmastiffs","Samoyeds","Scottish Terriers","Soft Coated Wheaten Terriers","Whippets","Pointers (German Wirehaired)","Chinese Shar-Pei","Airedale Terriers","Wirehaired Pointing Griffons","Bull Terriers","Alaskan Malamutes","Cardigan Welsh Corgis","Giant Schnauzers","Old English Sheepdogs","Italian Greyhounds","Great Pyrenees","Dogues de Bordeaux","Russell Terriers","Cairn Terriers","Irish Wolfhounds","Setters (Irish)","Greater Swiss Mountain Dogs","Miniature Pinschers","Lhasa Apsos","Chinese Crested","Coton de Tulear","Staffordshire Bull Terriers","American Staffordshire Terriers","Rat Terriers","Chow Chows","Anatolian Shepherd Dogs","Basenjis","Spaniels (Boykin)","Lagotti Romagnoli","Brussels Griffons","Retrievers (Nova Scotia Duck Tolling)","Norwegian Elkhounds","Standard Schnauzers","Dogo Argentinos","Bouviers des Flandres","Pekingese","Keeshonden","Border Terriers","Leonbergers","Tibetan Terriers","Neapolitan Mastiffs","Setters (English)","Retrievers (Flat-Coated)","Borzois","Fox Terriers (Wire)","Miniature Bull Terriers","Belgian Tervuren","Setters (Gordon)","Silky Terriers","Norwich Terriers","Spinoni Italiani","Japanese Chin","Welsh Terriers","Toy Fox Terriers","Schipperkes","Parson Russell Terriers","Pointers","Belgian Sheepdogs","Tibetan Spaniels","American Eskimo Dogs","Irish Terriers","Beaucerons","Afghan Hounds","Boerboels","Fox Terriers (Smooth)","Bearded Collies","Black Russian Terriers","Black and Tan Coonhounds","Spaniels (Welsh Springer)","American Hairless Terriers","Norfolk Terriers","Xoloitzcuintli","Manchester Terriers","Kerry Blue Terriers","Australian Terriers","Spaniels (Clumber)","Lakeland Terriers","Bluetick Coonhounds","English Toy Spaniels","German Pinschers","Tibetan Mastiffs","Bedlington Terriers","Greyhounds","Pulik","Salukis","Barbets","Redbone 
Coonhounds","Swedish Vallhunds","Sealyham Terriers","Spanish Water Dogs","Briards","Berger Picards","Entlebucher Mountain Dogs","Treeing Walker Coonhounds","Icelandic Sheepdogs","Wirehaired Vizslas","Pumik","Portuguese Podengo Pequenos","Spaniels (American Water)","Retrievers (Curly-Coated)","Spaniels (Field)","Lowchen","Nederlandse Kooikerhondjes","Affenpinschers","Petits Bassets Griffons Vendeens","Finnish Lapphunds","Scottish Deerhounds","Plott Hounds","Norwegian Buhunds","Glen of Imaal Terriers","Setters (Irish Red and White)","Ibizan Hounds","Spaniels (Sussex)","Bergamasco Sheepdogs","Spaniels (Irish Water)","Polish Lowland Sheepdogs","Otterhounds","Kuvaszok","Komondorok","Cirnechi dell’Etna","Pharaoh Hounds","Dandie Dinmont Terriers","Pyrenean Shepherds","Skye Terriers","Canaan Dogs","American English Coonhounds","Chinooks","Finnish Spitz","Grand Basset Griffon Vendeens","Sloughis","Harriers","Cesky Terriers","American Foxhounds","Azawakhs","English Foxhounds","Norwegian Lundehunds"],["Double_Short","Smooth_Short","Double_Medium","Double_Medium","Smooth_Short","Curly_Long","Smooth_Short","Smooth_Short","Smooth_Short","Smooth_Short","Double_Short","Double_Medium","Silky_Long","Smooth_Short","Smooth_Short","Double_Medium","Wavy_Medium","Smooth_Short","Wiry_Medium","Double_Long","Smooth_Short","Double_Medium","Double_Long","Double_Long","Smooth_Short","Double_Medium","Double_Long","Double_Short","Smooth_Short","Double_Long","Double_Medium","Double_Medium","Double_Short","Smooth_Short","Smooth_Short","Smooth_Short","Smooth_Short","Silky_Long","Smooth_Short","Smooth_Short","Double_Medium","Smooth_Short","Double_Short","Double_Medium","Double_Long","Smooth_Short","Double_Medium","Double_Medium","Curly_Long","Wiry_Medium","Smooth_Short","Smooth_Short","Silky_Medium","Smooth_Short","Smooth_Short","Double_Long","Wiry_Medium","Wavy_Medium","Smooth_Short","Wiry_Medium","Smooth_Short","Wiry_Short","Wiry_Medium","Smooth_Short","Double_Medium","Double_Medium","Wiry_Medium",
"Double_Long","Smooth_Short","Double_Medium","Smooth_Short","Wiry_Short","Wiry_Medium","Wiry_Medium","Silky_Medium","Smooth_Short","Smooth_Short","Silky_Long","Hairless_Short","Double_Long","Smooth_Short","Smooth_Short","Smooth_Short","Smooth_Medium","Smooth_Short","Smooth_Short","Double_Medium","Double_Medium","Wiry_Short","Double_Medium","Double_Medium","Wiry_Medium","Smooth_Short","Rough_Medium","Double_Long","Double_Long","Wiry_Short","Double_Long","Double_Long","Smooth_Short","Double_Medium","Smooth_Medium","Curly_Medium","Wiry_Medium","Smooth_Short","Double_Medium","Double_Medium","Silky_Long","Wiry_Short","Wiry_Medium","Silky_Medium","Wiry_Medium","Smooth_Short","Double_Short","Smooth_Short","Smooth_Short","Double_Medium","Double_Medium","Double_Medium","Wiry_Medium","Smooth_Short","Silky_Long","Smooth_Short","Smooth_Short","Silky_Long","Double_Medium","Smooth_Short","Double_Medium","Hairless_Short","Wiry_Short","Hairless_Short","Smooth_Short","Wavy_Medium","Wiry_Short","Wavy_Medium","Wiry_Short","Smooth_Short","Double_Medium","Smooth_Short","Double_Medium","Curly_Medium","Smooth_Short","Corded_Long","Smooth_Short","Curly_Medium","Smooth_Short","Double_Short","Wiry_Medium","Corded_Medium","Double_Long","Wiry_Medium","Smooth_Short","Smooth_Short","Double_Medium","Wiry_Short","Curly_Medium","Wiry_Medium","Double_Medium","Double_Short","Smooth_Medium","Wavy_Long","Double_Medium","Wiry_Short","Rough_Long","Double_Medium","Wiry_Medium","Plott Hounds_Plott Hounds","Double_Medium","Wiry_Medium","Double_Medium","Wiry_Short","Double_Medium","Corded_Long","Curly_Medium","Double_Long","Rough_Medium","Double_Medium","Corded_Long","Smooth_Short","Smooth_Short","Double_Medium","Smooth_Medium","Double_Long","Smooth_Short","Smooth_Short","Smooth_Medium","Double_Medium","Wiry_Medium","Smooth_Short","Double_Short","Wavy_Medium","Smooth_Short","Smooth_Short","Double_Short","Double_Short"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      
<th>Breed<\/th>\n      <th>Coat<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":6,"scrollX":true,"columnDefs":[{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[6,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
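---
### Another option: `unite()`
A hedged alternative to `mutate()` plus `paste()`: `tidyr::unite()` pastes columns together and, by default, drops the original columns. The `toy_coats` table below is made up for illustration and is not the AKC data.

```r
library(dplyr)
library(tidyr)

# Invented rows standing in for breed_traits.
toy_coats <- tibble::tibble(
  Breed = c("Breed A", "Breed B"),
  `Coat Type` = c("Double", "Smooth"),
  `Coat Length` = c("Short", "Medium")
)

# Combine the two coat columns into one column named Coat.
united <- toy_coats %>%
  unite("Coat", `Coat Type`, `Coat Length`, sep = "_")
```

With `unite()` there is no need for a follow-up `select()` just to drop the two source columns; set `remove = FALSE` if you would rather keep them.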
---
.pull-left[
Finally, what if we wanted to know the mean trainability level of dogs with different types of coats? For this, we can use `group_by()` and `summarize()`.

```r
breed_traits %>% 
  mutate(Coat = paste(`Coat Type`, 
                      `Coat Length`, 
                      sep = "_")) %>%
  group_by(Coat) %>% 
  summarize(mean_trainability = mean(`Trainability Level`, 
                                     na.rm = TRUE))
```
]
--
.pull-right[

```
## # A tibble: 19 × 2
##    Coat                      mean_trainability
##    <chr>                                 <dbl>
##  1 Corded_Long                            4   
##  2 Corded_Medium                          4   
##  3 Curly_Long                             5   
##  4 Curly_Medium                           3.8 
##  5 Double_Long                            3.88
##  6 Double_Medium                          4.18
##  7 Double_Short                           3.82
##  8 Hairless_Short                         4.33
##  9 Plott Hounds_Plott Hounds              0   
## 10 Rough_Long                             3   
## 11 Rough_Medium                           4   
## 12 Silky_Long                             3   
## 13 Silky_Medium                           4   
## 14 Smooth_Medium                          4.4 
## 15 Smooth_Short                           3.75
## 16 Wavy_Long                              4   
## 17 Wavy_Medium                            3.4 
## 18 Wiry_Medium                            3.89
## 19 Wiry_Short                             3.45
```

]
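---
### Sanity-check your groups with `n()`
In the summary above, the `Plott Hounds_Plott Hounds` group with a mean of 0 looks more like a data-entry quirk than a real coat type. A small sketch, using an invented `toy_traits` table, of how counting the rows in each group with `n()` can make such one-off groups easier to spot:

```r
library(dplyr)

# Invented rows; "Odd_Entry" plays the role of a suspicious group.
toy_traits <- tibble::tibble(
  Coat = c("Smooth_Short", "Smooth_Short", "Double_Long", "Odd_Entry"),
  `Trainability Level` = c(4, 3, 5, 0)
)

# Report the group size next to the mean, smallest groups first.
toy_summary <- toy_traits %>%
  group_by(Coat) %>%
  summarize(
    n_breeds = n(),
    mean_trainability = mean(`Trainability Level`, na.rm = TRUE)
  ) %>%
  arrange(n_breeds)
```

A mean based on a single breed deserves a second look before you read much into it.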
---
class: inverse, center, middle

# Tidy data and pivoting data frames 
---
# Tidy data
*"Tidy datasets are all alike, but every dataset is messy in its own way."* - **Hadley Wickham**

Tidy data have 3 rules: 
  1. Each variable must have its own column. 
  2. Each observation must have its own row. 
  3. Each value must have its own cell. 
  
---
### What does it mean for data to not be tidy?

```r
messy_data <- data.frame(independent_replicate = c(1:3), 
           control_1 = rnorm(3, mean = 5, sd = 1), 
           control_2 = rnorm(3, mean = 5, sd = 1), 
           control_3 = rnorm(3, mean = 5, sd = 1), 
           treatment_1 = rnorm(3, mean = 7, sd = 1), 
           treatment_2 = rnorm(3, mean = 7, sd = 1), 
           treatment_3 = rnorm(3, mean = 7, sd = 1))
messy_data
```

```
##   independent_replicate control_1 control_2 control_3 treatment_1 treatment_2
## 1                     1  5.807225  6.097076  3.756127    5.733711    6.115535
## 2                     2  3.946806  4.946443  5.456057    7.694489    7.861499
## 3                     3  4.084722  5.516974  4.908542    7.828350    8.072564
##   treatment_3
## 1    5.969955
## 2    6.220383
## 3    7.672997
```
--
In this format (also known as wide format): 
  - It is difficult to compute summary statistics for each treatment group 
  - The column names store values (treatment group and technical replicate) instead of each column being one variable 
  - Each row holds multiple observations
---
### How do we make this data tidy?
### `pivot_longer()`
The `pivot_longer()` function can be used to transform data from wide to long format.  
.pull-left[

```r
pivot_longer(data = ,    
    cols = ,     
    names_to = ,    
    values_to = ) 
```

```r
tidier_data <- messy_data %>% 
  pivot_longer(cols = control_1:treatment_3, 
               names_to = "treatment_group", 
               values_to = "response")
head(tidier_data)
```
]
--
.pull-right[

```
## # A tibble: 6 × 3
##   independent_replicate treatment_group response
##                   <int> <chr>              <dbl>
## 1                     1 control_1           5.81
## 2                     1 control_2           6.10
## 3                     1 control_3           3.76
## 4                     1 treatment_1         5.73
## 5                     1 treatment_2         6.12
## 6                     1 treatment_3         5.97
```
]
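---
### Splitting names while pivoting
A sketch under assumptions: when column names follow a `group_number` pattern, the `names_sep` argument of `pivot_longer()` can split them into two columns in the same step. The `toy_wide` table uses fixed, made-up values rather than the random draws on the earlier slide, and `technical_replicate` is a column name chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Fixed toy values so the result does not depend on random numbers.
toy_wide <- data.frame(
  independent_replicate = 1:2,
  control_1 = c(5.1, 4.9),
  control_2 = c(5.3, 5.0),
  treatment_1 = c(7.2, 6.8),
  treatment_2 = c(7.0, 7.1)
)

# Pivot every column except independent_replicate, splitting each
# old column name at the underscore into group and replicate number.
toy_long <- toy_wide %>%
  pivot_longer(
    cols = -independent_replicate,
    names_to = c("treatment_group", "technical_replicate"),
    names_sep = "_",
    values_to = "response"
  )
```

Here `cols = -independent_replicate` matters: that column name also contains an underscore, so it has to stay out of the pivot.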
---
Our data frame is certainly longer now. Let's remove the underscores and replicate numbers from our `treatment_group` column with `mutate()`, then calculate some quick summary statistics using `group_by()` and `summarise()`.

```r
tidier_data %>% 
  mutate(treatment_group = str_remove_all(treatment_group, "_\\d"))
```

```
## # A tibble: 18 × 3
##    independent_replicate treatment_group response
##                    <int> <chr>              <dbl>
##  1                     1 control             5.81
##  2                     1 control             6.10
##  3                     1 control             3.76
##  4                     1 treatment           5.73
##  5                     1 treatment           6.12
##  6                     1 treatment           5.97
##  7                     2 control             3.95
##  8                     2 control             4.95
##  9                     2 control             5.46
## 10                     2 treatment           7.69
## 11                     2 treatment           7.86
## 12                     2 treatment           6.22
## 13                     3 control             4.08
## 14                     3 control             5.52
## 15                     3 control             4.91
## 16                     3 treatment           7.83
## 17                     3 treatment           8.07
## 18                     3 treatment           7.67
```
---

```r
tidy_data_summarized_technicalrep <- tidier_data %>% 
  mutate(treatment_group = str_remove_all(treatment_group, "_\\d")) %>% 
  group_by(independent_replicate, treatment_group) %>%
  summarise(response = mean(response))
```

```
## `summarise()` has grouped output by 'independent_replicate'. You can override
## using the `.groups` argument.
```

```r
knitr::kable(tidy_data_summarized_technicalrep)
```

| independent_replicate|treatment_group | response|
|---------------------:|:---------------|--------:|
|                     1|control         | 5.220143|
|                     1|treatment       | 5.939734|
|                     2|control         | 4.783102|
|                     2|treatment       | 7.258790|
|                     3|control         | 4.836746|
|                     3|treatment       | 7.857970|
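---
### Going the other way: `pivot_wider()`
`pivot_wider()` is the counterpart of `pivot_longer()`. A minimal sketch, assuming an invented `toy_summary` table shaped like the summary above, that puts each treatment group in its own column so the two can be compared row by row:

```r
library(dplyr)
library(tidyr)

# Invented values in the same long shape as the summarized data.
toy_summary <- tibble::tibble(
  independent_replicate = c(1, 1, 2, 2),
  treatment_group = c("control", "treatment", "control", "treatment"),
  response = c(5.2, 5.9, 4.8, 7.3)
)

# One row per replicate, one column per treatment group,
# plus the treatment-minus-control difference.
toy_side_by_side <- toy_summary %>%
  pivot_wider(names_from = treatment_group, values_from = response) %>%
  mutate(difference = treatment - control)
```

Long format is usually easier for summarising and plotting; a wide layout like this can be handy for a quick side-by-side comparison or a table in a report.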
---
class: inverse, center, middle
# Ask questions and practice!
---
### Ask questions about the homework if you have them!

### In RStudio Cloud, open the project for this session. Your assignment is to pick at least 3 of the traits in the AKC dog breeds dataset that you would use to pick out your ideal dog. Determine what your conditions would be (e.g., `Affectionate With Family` > 3, `Shedding Level` == 1) and use `filter()` to identify which dog breeds meet those conditions.

### If any of you discover any particularly cute dog breeds, please feel free to post pictures in the #random channel on Slack.