4.3 Reading the survey and direct estimates

The code below reads Jamaica’s labor survey from a file in RDS format into the encuesta object and prepares the variables needed for estimation. Here’s the breakdown:

  1. Data Reading: The code reads the Jamaican labor survey from the RDS file ‘Recursos/05_Empleo/01_data_JAM.rds’ and stores it in the encuesta object.

  2. Data Transformation: Through a sequence of operations using the %>% pipe and the transmute() function from the dplyr package, the following transformations are performed on the survey:

    • The columns dam2, RFACT, PAR_COD, CONST_NUMBER, ED_NUMBER, STRATA (with strata as a fallback), and EMPSTATUS are used. Because transmute() keeps only the variables it creates, every other column is dropped.
    • dam2 is kept as is, and the rest are renamed or recoded: fep is the survey weight RFACT; upm concatenates PAR_COD, CONST_NUMBER, and ED_NUMBER into a single PSU identifier; estrato takes the value of STRATA, falling back to the strata column when STRATA is missing; empleo_label and empleo are factors built from EMPSTATUS, the first using its value labels as levels and the second its numeric codes.

In summary, the code reads a labor survey in Jamaica and performs a series of transformations on selected columns, renaming and reorganizing them for future analyses or processing.

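# Packages used in this section
library(dplyr)   # %>%, transmute(), group_by_at(), summarise()
library(haven)   # as_factor() for labelled variables
library(survey)  # survey.lonely.psu option and variance estimation
library(srvyr)   # as_survey_design(), survey_mean(), unweighted()

encuesta <- readRDS('Recursos/05_Empleo/01_data_JAM.rds')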
id_dominio <- "dam2"

encuesta <-
  encuesta %>%
  transmute(
    dam2,
    fep = RFACT,
    upm = paste0(PAR_COD, CONST_NUMBER, ED_NUMBER),
    estrato = ifelse(is.na(STRATA), strata, STRATA),
    empleo_label = as_factor(EMPSTATUS, levels = "labels"),
    empleo = as_factor(EMPSTATUS, levels = "values")
  )
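
To see what the two as_factor() calls produce, here is a minimal sketch on a toy labelled vector. The codes and the labels Employed, Unemployed, and Inactive are assumptions made for illustration; they are not taken from the survey’s codebook.

```r
library(haven)

# Hypothetical labelled vector standing in for EMPSTATUS
# (codes and labels are assumptions for illustration only)
emp_toy <- labelled(
  c(3, 4, 5, 3),
  labels = c(Employed = 3, Unemployed = 4, Inactive = 5)
)

# Factor whose levels are the value labels
emp_label <- as_factor(emp_toy, levels = "labels")

# Factor whose levels are the underlying codes, stored as text
emp_code <- as_factor(emp_toy, levels = "values")

# Comparing a factor with a number compares against the level text,
# so a test such as `empleo == 3` still selects the rows coded 3
es_tres <- emp_code == 3
```

This is why later conditions such as empleo == 3 behave as expected even though empleo is a factor rather than a numeric column.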

The next block defines the sampling design used to analyze the survey in R. The first line sets the “survey.lonely.psu” option to 'adjust', which controls how strata containing a single PSU (primary sampling unit) are handled in standard error calculations: instead of raising an error, the data for such a stratum are centered at the overall sample mean. The second statement calls the “as_survey_design” function from the “srvyr” package, which builds on the “survey” library. The function takes “encuesta” as its data argument and the following parameters:

  • strata: The variable defining the strata in the survey, in this case, the “estrato” variable.

  • ids: The variable identifying the PSUs in the survey, here, the “upm” variable.

  • weights: The variable indicating the survey weights of each observation, in this case, the “fep” variable.

  • nest: A logical parameter indicating whether PSU identifiers should be treated as nested within strata. Setting it to “TRUE” relabels the PSUs so that the same identifier appearing in two different strata is counted as two distinct PSUs.

Together, these settings define a design object that carries the stratification, the clustering, and the weight of each observation. Estimates computed from it use the weights for the point estimates and the strata and PSUs for the standard errors, which is what makes the results valid for the population rather than only for the sample.

options(survey.lonely.psu = 'adjust')
diseno <- encuesta %>%
  as_survey_design(
    strata = estrato,
    ids = upm,
    weights = fep,
    nest = TRUE
  )
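
The effect of the nest argument can be sketched on a small made-up data set. Everything in the block below (the toy data frame, its values, and the object names diseno_toy, sin_nest, and media_y) is assumed for illustration and does not come from the Jamaican survey.

```r
library(dplyr)
library(survey)
library(srvyr)

# Toy data (assumed): PSU labels 1 and 2 are reused in both strata,
# as happens when PSUs are numbered within each stratum
toy <- data.frame(
  estrato = c("A", "A", "A", "A", "B", "B", "B", "B"),
  upm     = c(1, 1, 2, 2, 1, 1, 2, 2),
  fep     = c(10, 10, 12, 12, 8, 8, 9, 9),
  y       = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# With nest = TRUE, PSU labels only need to be unique within a stratum
diseno_toy <- as_survey_design(
  toy,
  strata = estrato, ids = upm, weights = fep, nest = TRUE
)

# Without nest = TRUE, the reused labels are rejected as clusters
# that are not nested in strata, so the call fails
sin_nest <- tryCatch(
  as_survey_design(toy, strata = estrato, ids = upm, weights = fep),
  error = function(e) "error"
)

# Design-based mean of y with its standard error
media_y <- summarise(diseno_toy, media = survey_mean(y, vartype = "se"))
```

When the PSU identifier is already unique across the whole sample, as a concatenated code is likely to be, nest = TRUE leaves the design unchanged and acts only as a safeguard.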

The following code conducts a descriptive analysis based on a survey design represented by the object diseno.

  1. Grouping and Filtering: It chains operations with the %>% operator. It first groups the data by the domain identifier (id_dominio) using group_by_at() and then keeps only the observations whose empleo code is 3, 4, or 5 (employed, unemployed, and inactive, respectively).

  2. Variable Summary: With the summarise() function, it computes two kinds of summaries for each domain. The first are the unweighted sample counts of employed, unemployed, and inactive individuals (n_ocupado, n_desocupado, n_inactivo), obtained with unweighted(). The second are the weighted proportions in each category of empleo, obtained by applying survey_mean() to a logical condition; vartype = c("se", "var") requests the standard error and the variance of each estimate, and deff = TRUE requests its design effect.

indicador_dam <-
  diseno %>%
  group_by_at(id_dominio) %>%
  filter(empleo %in% 3:5) %>%
  summarise(
    # Unweighted sample counts per category
    n_ocupado = unweighted(sum(empleo == 3)),
    n_desocupado = unweighted(sum(empleo == 4)),
    n_inactivo = unweighted(sum(empleo == 5)),

    # Weighted proportions with standard error, variance, and design effect
    Ocupado = survey_mean(empleo == 3,
                          vartype = c("se", "var"),
                          deff = TRUE
    ),
    Desocupado = survey_mean(empleo == 4,
                             vartype = c("se", "var"),
                             deff = TRUE
    ),
    Inactivo = survey_mean(empleo == 5,
                           vartype = c("se", "var"),
                           deff = TRUE
    )
  )
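
The difference between the unweighted counts and the weighted proportions can be checked on a small made-up sample. The toy data frame and its values are assumptions for illustration; only the function calls mirror the ones used in this section.

```r
library(dplyr)
library(srvyr)

# Toy sample (assumed): two domains, two strata per domain,
# two PSUs per stratum, two respondents per PSU
toy <- data.frame(
  dam2    = rep(c("01", "02"), each = 8),
  estrato = rep(c("A", "B", "C", "D"), each = 4),
  upm     = rep(1:8, each = 2),
  fep     = c(10, 10, 12, 12, 8, 8, 9, 9, 5, 5, 6, 6, 7, 7, 4, 4),
  empleo  = c(3, 4, 3, 3, 5, 3, 4, 3, 3, 5, 4, 3, 3, 3, 5, 4)
)

diseno_toy <- as_survey_design(
  toy,
  strata = estrato, ids = upm, weights = fep
)

resumen <- diseno_toy %>%
  group_by(dam2) %>%
  summarise(
    # Number of sampled people coded 3, ignoring the weights
    n_ocupado = unweighted(sum(empleo == 3)),
    # Weighted share coded 3, with se, variance, and design effect
    Ocupado = survey_mean(empleo == 3,
                          vartype = c("se", "var"),
                          deff = TRUE)
  )

# The point estimate is the weighted share computed by hand
manual <- toy %>%
  group_by(dam2) %>%
  summarise(p = sum(fep * (empleo == 3)) / sum(fep))
```

The summary contains one row per domain with the columns n_ocupado, Ocupado, Ocupado_se, Ocupado_var, and Ocupado_deff, and the Ocupado column matches the hand-computed p.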

PSU counts by domain: The following code works on the survey data. It selects the domain column (named in id_dominio) and upm, removes duplicate rows, and counts the number of distinct upm values in each domain. It then inner-joins these counts with the existing indicador_dam object on the domain column, so that each domain’s estimates are accompanied by the number of PSUs (n_upm) they are based on.

indicador_dam <- encuesta %>%
  # all_of() selects the column whose name is stored in id_dominio
  select(all_of(id_dominio), upm) %>%
  distinct() %>%
  group_by_at(id_dominio) %>%
  tally(name = "n_upm") %>%
  inner_join(indicador_dam, by = id_dominio)

# Save data ----------------------------
saveRDS(indicador_dam, 'Recursos/05_Empleo/indicador_dam.Rds')
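
As a cross-check of the PSU count, the same quantity can be obtained with n_distinct(). This is an alternative formulation shown on made-up data, not the approach used in the text; the toy data frame and the name n_upm_toy are assumptions.

```r
library(dplyr)

# Domain column name held in a string, as in the text
id_dominio <- "dam2"

# Toy data (assumed): domain 01 has PSUs a and b, domain 02 has PSU c
toy <- data.frame(
  dam2 = c("01", "01", "01", "02", "02"),
  upm  = c("a", "a", "b", "c", "c")
)

# Count distinct PSUs per domain in a single summarise() step
n_upm_toy <- toy %>%
  group_by(across(all_of(id_dominio))) %>%
  summarise(n_upm = n_distinct(upm), .groups = "drop")
```

Domains with very few PSUs are worth flagging at this stage, since their variance estimates rest on little information.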