31  Drawing samples

Author: Vladimir Buskin

Affiliation: Catholic University of Eichstätt-Ingolstadt

Warning

This page is still under construction. More content will be added soon!

31.1 Preparation

Load libraries:

library(tidyverse)
library(quanteda)
library(sampling)
library(data.table)

Perform query:

# Load corpus
ICE_GB <- readRDS("ICE_GB.RDS")

# Perform query
kwic_think <- kwic(ICE_GB, "think")

# Count number of observations
nrow(kwic_think)
[1] 2648
# Show the first few matches
head(kwic_think)
Keyword-in-context with 6 matches.
   [ICE_GB/S1A-001.txt, 55]          1: B > I | think | the main things that I
  [ICE_GB/S1A-001.txt, 218]          1: B > I | think | the m <, >
  [ICE_GB/S1A-001.txt, 534]          1: B > I | think | that the <,,
  [ICE_GB/S1A-001.txt, 588] difference <, > I | think | the main difference that I
  [ICE_GB/S1A-001.txt, 675]        <, > and I | think | one of the things that
 [ICE_GB/S1A-001.txt, 1049]     B > Uhm and I | think | f for for myself <
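
Each match is labelled with a document name such as ICE_GB/S1A-001.txt, which combines a text category (here S1A) with a file number. The following is a minimal base R sketch of how such names could be split into a grouping variable for sampling. The data frame `hits` and its values are made up for illustration and merely stand in for the real query results; the column names `Text_category` and `File_number` are chosen to resemble typical output, not taken from any particular function's source.

```r
# Toy stand-in for the query results; the rows are invented for illustration
hits <- data.frame(
  docname = c("ICE_GB/S1A-001.txt", "ICE_GB/S1A-002.txt", "ICE_GB/W2B-010.txt"),
  keyword = "think",
  stringsAsFactors = FALSE
)

# Split each name at its last hyphen:
# everything before it is the text category, everything after it the file number
hits$Text_category <- sub("-[^-]*$", "", hits$docname)
hits$File_number   <- sub("^.*-", "", hits$docname)

# Count matches per text category
table(hits$Text_category)
```

The resulting category column can then serve as the stratum when drawing a sample.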

31.2 Stratified sample

# Source function from GitHub
source("https://raw.githubusercontent.com/VBuskin/Stats_with_R/refs/heads/main/Custom_functions.R")

# Apply the function to the output of kwic() to draw a weighted sample of about 500 matches
stratified_sample_ICE(kwic_think, 500)
# A tibble: 501 × 8
   Text_category File_number  from    to pre               keyword post  pattern
   <chr>         <chr>       <int> <int> <chr>             <chr>   <chr> <fct>  
 1 ICE_GB/S1A    001.txt       588   588 difference < , >… think   the … think  
 2 ICE_GB/S1A    001.txt      1866  1866 : B > And I       think   that… think  
 3 ICE_GB/S1A    001.txt      1974  1974 initial difficul… think   that… think  
 4 ICE_GB/S1A    002.txt       386   386 lazy really for … think   < , … think  
 5 ICE_GB/S1A    002.txt       649   649 : C > And I       think   when… think  
 6 ICE_GB/S1A    002.txt      1578  1578 think there ' s I think   ther… think  
 7 ICE_GB/S1A    002.txt      3621  3621 : B > Therefore I think   that… think  
 8 ICE_GB/S1A    002.txt      3710  3710 : B > But I       think   < , … think  
 9 ICE_GB/S1A    003.txt       427   427 , > why do you    think   phys… think  
10 ICE_GB/S1A    003.txt      2613  2613 > Well I Well I   think   it '… think  
# ℹ 491 more rows
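
To illustrate the general idea behind a stratified sample, here is a minimal base R sketch using proportional allocation: each text category contributes to the sample in proportion to its share of all matches. This is not the implementation of stratified_sample_ICE(), whose internals are not shown on this page. The toy data frame `hits`, its category labels and the variable names are assumptions made for the example.

```r
set.seed(42)  # make the random draw reproducible

# Toy stand-in for the query results: 60, 30 and 10 matches in three categories
hits <- data.frame(
  Text_category = rep(c("S1A", "S1B", "W2B"), times = c(60, 30, 10)),
  hit_id = 1:100
)

n_total <- 20  # desired overall sample size

# Row indices grouped by stratum, and the number of matches in each stratum
rows_by_stratum <- split(seq_len(nrow(hits)), hits$Text_category)
strata_sizes <- lengths(rows_by_stratum)

# Proportional allocation: sample size per stratum, rounded to whole rows
n_per_stratum <- round(n_total * strata_sizes / sum(strata_sizes))

# Draw the allotted number of rows within each stratum, without replacement
sampled_rows <- unlist(lapply(names(rows_by_stratum), function(s) {
  idx <- rows_by_stratum[[s]]
  idx[sample.int(length(idx), n_per_stratum[[s]])]
}))

stratified <- hits[sampled_rows, ]

# Check how many matches each category contributes
table(stratified$Text_category)
```

Because the per-stratum sizes are rounded, the total can differ slightly from the requested size. This may be why the tibble above has 501 rather than 500 rows, although that is only a guess.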