The Shark Tank Effect

After watching Shark Tank the other evening, I was particularly inspired by one of the products. Even before the pitch ended, I was on my phone searching for a way to buy it. Much to my dismay, the product was already backordered for months. Watching the episode on Hulu two days later had clearly put me at a disadvantage. But it got me thinking about the impact the show has on the companies that pitch:

  • How large is the initial impact on a company from airing on Shark Tank?
  • What is the sustained, long-term impact of airing on Shark Tank?

Well, the best indicator of impact on a company is probably its profits, but I’m pretty sure we can’t get access to that data. Another good indicator is web traffic, but again, we probably won’t be gaining access to all of their Google Analytics accounts. So maybe we’re out of luck here.

There is actually one other data source we can use. Google Trends can tell us about the volume of web searches on Google, and that might just be exactly what we need to measure the impact. We can look at search volume immediately before/after the original air dates to see the short-term impact. Then we can use search volume over the next few months to measure the sustained impact after the initial hype dies down.

We might also expect the effect to change depending on whether the company’s pitch was successful.

We’ll need to do some munging of data from disparate sources for this analysis, in addition to leveraging the Google Trends API.

Alright, let’s get started!

Part 1: Preparing the data

Let’s start by loading the data into R.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(DT)

# Load Air Dates ---
# copy data from "http://epguides.com/common/exportToCSVmaze.asp?maze=329" into a local csv file
x <- readr::read_csv(file.path(getwd(), "datasets", "shark_tank.csv"))
## Parsed with column specification:
## cols(
##   number = col_character(),
##   season = col_integer(),
##   episode = col_integer(),
##   airdate = col_character(),
##   title = col_character(),
##   `tvmaze link` = col_character()
## )

# clean the 'airdate' column
x <- dplyr::mutate(x, airdate = as.Date(airdate, "%d %B %y"))
x
## # A tibble: 205 x 6
##    number season episode airdate    title     `tvmaze link`               
##    <chr>   <int>   <int> <date>     <chr>     <chr>                       
##  1 1           1       1 2009-08-09 Pilot     https://www.tvmaze.com/epis…
##  2 2           1       2 2009-08-16 Classroo… https://www.tvmaze.com/epis…
##  3 3           1       3 2009-08-23 Turbobas… https://www.tvmaze.com/epis…
##  4 4           1       4 2009-08-30 Gift Car… https://www.tvmaze.com/epis…
##  5 5           1       5 2009-09-06 Body Jac  https://www.tvmaze.com/epis…
##  6 6           1       6 2009-09-13 Element … https://www.tvmaze.com/epis…
##  7 7           1       7 2009-09-29 Grill Ch… https://www.tvmaze.com/epis…
##  8 8           1       8 2009-10-06 Notehall  https://www.tvmaze.com/epis…
##  9 9           1       9 2009-10-13 Chill So… https://www.tvmaze.com/epis…
## 10 10          1      10 2009-10-20 The Bobb… https://www.tvmaze.com/epis…
## # ... with 195 more rows

Awesome!

The next step is to get a list of the companies pitched in each Shark Tank episode. We can extract this from the episode titles using the unnest function (thank you StackOverflow!).

A quick look through the data shows some additional information in the title column that we should ignore when building this list of companies. We’ll drop anything before the colon in the episode title (i.e., the episode details), along with any episode that doesn’t list companies.

company_df <- x %>% 
  mutate(
    new_title = gsub(".*:", "", title),                    # ignore anything before ':'
    company = stringr::str_split(new_title, ",")
    ) %>% 
  unnest(company) %>%
  filter(
    !stringr::str_detect(company, "Pilot|Episode [0-9]"),   # ignore episodes w/out a company list
    nchar(company) > 0,                                      # ignore empty companies
    episode != 0                                             # ignore non-pitch episodes
  )

Great! Now that we have a list of companies and their episode air-date, we can pass this information on to Google Trends to determine how impactful their pitch was.

Part 2: Obtaining search-volume data from Google Trends

What can we say with this data?

  • Was the day after airing on Shark Tank the most popular day within this period?
  • What was the magnitude of the increase in search volume immediately after airing? (before vs. day_1)
  • What was the sustained % increase in search volume N months after airing?
  • How buzz-worthy was the company? (day_1 vs. week_1)
  • How many companies went from 0 –> 100 –> 0?

Company Search Volume By Period

library(gtrendsR)

# pass each company through the GoogleTrends API
search_df <- 
  purrr::map_df(
    #1:nrow(company_df),
    #sample(nrow(company_df), 30),
    1:20,
    function(i) {
      message(paste("Company", i, "out of", nrow(company_df)))
      max_date <- min(c(Sys.Date(), company_df$airdate[i] + 90))
      gtrends(
        keyword = company_df$company[i], 
        geo = "US", 
        time = paste(company_df$airdate[i] - 30, max_date)
      )$interest_over_time
    }
  )
## Company 1 out of 564
## Company 2 out of 564
## Company 3 out of 564
## Company 4 out of 564
## Company 5 out of 564
## Company 6 out of 564
## Company 7 out of 564
## Company 8 out of 564
## Company 9 out of 564
## Company 10 out of 564
## Company 11 out of 564
## Company 12 out of 564
## Company 13 out of 564
## Company 14 out of 564
## Company 15 out of 564
## Company 16 out of 564
## Company 17 out of 564
## Company 18 out of 564
## Company 19 out of 564
## Company 20 out of 564

# summarize search-volume results
results <- search_df %>%
  dplyr::left_join(
    dplyr::select(company_df, keyword = company, air_date = airdate, season, episode),
    by = "keyword"
  ) %>%
  #tidyr::unite(episode, season, episode, sep = "-")
  dplyr::group_by(keyword, season, episode) %>%
  dplyr::summarize(
    before = mean(hits[date < air_date]),
    day_0 = hits[date == air_date],
    day_1 = hits[date == air_date + 1],
    week_1 = mean(hits[date >= air_date & date < air_date + 7]),
    month_1 = mean(hits[date >= air_date & date < air_date + 30]),
    month_2 = mean(hits[date >= air_date + 30 & date < air_date + 60]),
    month_3 = mean(hits[date >= air_date + 60 & date < air_date + 90])
  ) %>%
  ungroup()

results %>%
  mutate_if(is.numeric, round, digits = 1) %>%
  DT::datatable(rownames = FALSE)

Google Trends data is always scaled to [0, 100] within each query, with 100 marking the highest-volume day in the window. The fact that most companies hit 100 on the first day after airing on Shark Tank tells us that day had the largest search volume within the period. It also means we cannot compare these values directly across companies.

# search volume on day after airing on Shark Tank
table(results$day_1)
## 
##   0  95 100 
##   1   1  18
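To see why raw values can’t be compared across companies, here’s a minimal base-R sketch (with made-up search counts) of the kind of per-series [0, 100] rescaling Google Trends applies:

```r
# Two hypothetical companies with very different absolute search volumes
raw_big   <- c(10, 20, 500)
raw_small <- c(1, 2, 50)

# Google Trends-style scaling: each series is rescaled so its max is 100
rescale <- function(x) round(100 * x / max(x))

rescale(raw_big)    # 2 4 100
rescale(raw_small)  # 2 4 100
```

After scaling, the two series are indistinguishable even though one company gets ten times the traffic of the other.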

Estimating the Shark Tank Effect

We can identify companies that experienced the largest impact from airing on Shark Tank by comparing values to the period before they aired. We’ll do this for each period to see both short- and long-term impacts.

Some special cases:

  • If volume was 0 before and non-zero after, set it to a 10,000-fold increase.
  • If volume was 0 both before and after, set it to a 0-fold increase (i.e., no change).

# Convert search volume to the N-fold change (vs. before airing)
change <- results %>%
  dplyr::mutate_at(
    c("day_1", "week_1", "month_1", "month_2", "month_3"),
    funs(round((. - before) / before, 1))
  ) %>%
  dplyr::mutate_at(
    c("day_1", "week_1", "month_1", "month_2", "month_3"),
    funs(dplyr::case_when(is.infinite(.) ~ 10000, is.nan(.) ~ 0, TRUE ~ .))
  ) %>%
  dplyr::select(keyword, season, episode, day_1:month_3) %>%
  arrange(desc(day_1))

# find a good candidate to describe
id <- which(change$day_1 != 10000)[1]

DT::datatable(change, rownames = FALSE)

For example, we can say that Send a Ball experienced an immediate 749-fold increase in search volume, and over the next 3 months they saw sustained increases of 41-fold, 6.2-fold, and 2.8-fold, respectively.
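As a sanity check on the arithmetic, the fold-change above is simply (period − before) / before. A quick sketch with hypothetical values chosen to reproduce a 749-fold jump:

```r
# Hypothetical values: mean hits before airing vs. hits the day after
before <- 0.5
day_1  <- 375

fold_change <- round((day_1 - before) / before, 1)
fold_change  # 749
```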

Part 3: Modeling the Shark Tank Effect

We can use our data to estimate an overall Shark Tank effect.

# Overall
summarize_if(change, is.double, median)
## # A tibble: 1 x 5
##   day_1 week_1 month_1 month_2 month_3
##   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
## 1  353.   87.8    29.4     3.2    2.95

Additionally, it’s conceivable that the effect varies by season.

# By Season
change %>%
  group_by(season) %>%
  summarize_if(is.double, median) %>%
  arrange(season)
## # A tibble: 2 x 6
##   season day_1 week_1 month_1 month_2 month_3
##    <int> <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
## 1      1  332.   82.2    28.4    2.85    2.10
## 2      2  417.   94.5    29.8    4.45   15.6