Chapter 4 Ego-network composition

4.1 Overview

Ego-network composition refers to the distribution of attributes of alters (for example, age and race/ethnicity) or attributes of ties between alters and the ego (for example, closeness or frequency of contact). For brevity, the following text often refers to both simply as alter attributes. Recall that, in typical egocentric network data, alters and ego-alter ties are at the same level: for each alter there is one and only one ego-alter tie, and vice versa. As in other chapters, we consider composition by first analyzing just one ego-network, then replicating the same type of analysis on many ego-networks at once.

This chapter covers the following topics:

  • Calculating measures of composition for one ego-network.
  • Representing data from multiple ego-networks as data frames.
  • Running the same operation on many ego-networks and combining results back together (split-apply-combine).
  • Split-apply-combine on data frames with dplyr to analyze the composition of many ego-networks at once.

4.2 Measures of ego-network composition

  • Many different measures can be calculated in R to describe ego-network composition.
    • The result is typically an ego-level summary variable: one that assigns each ego a number describing a characteristic of that ego’s network composition.
  • To calculate compositional measures we don’t need any network or relational data (data about alter-alter ties); we only need the alter-level attribute dataset (for example, with alter age, gender, ethnicity, frequency of contact, etc.). In other words, we only need, for each ego, a list of alters with their characteristics, and no information about ties between alters.
    • So in this section, ego-networks are represented simply by alter-level data frames. We don’t need to work with the igraph network objects until we want to analyze ego-network structure (alter-alter ties).
  • There are at least three types of compositional measures that we may want to calculate on ego-networks. All these measures are calculated for each ego.
    • Measures based on one attribute of alters. For example, average alter age in the network.
    • Measures based on multiple attributes of alters. For example, average frequency of contact (attribute 1) between ego and alters who are family members (attribute 2).
    • Measures of ego-alter homophily. These are summary measures of the extent to which alters are similar to the ego who nominated them, with respect to one or more attributes. For example, the proportion of alters who are of the same gender (ethnicity, age bracket) as the ego who nominated them.
  • What we do in the following code:
    • Look at the alter attribute data frame for one ego.
    • Using this data frame, calculate compositional measures based on one alter attribute.
    • Calculate compositional measures based on two alter attributes.
    • Do the level-1 join: join ego attributes into alter-level data.
    • Using the joined data, calculate compositional measures of homophily between ego and alters.
    • Calculate multiple compositional measures and put them together into one ego-level data frame.
# Load packages.
library(tidyverse)
library(skimr)
library(janitor)

# Load data.
load("./Data/data.rda")

# For compositional measures all we need is the alter attribute data frame.
# The data.rda file loaded above includes the alter attribute data frame for
# ego ID 28.
alter.attr.28
## # A tibble: 45 × 12
##    alter_ID ego_ID alter_num alter.sex alter.age.cat alter.rel    alter.nat
##       <dbl>  <dbl>     <dbl> <fct>     <fct>         <fct>        <fct>    
##  1     2801     28         1 Female    51-60         Close family Sri Lanka
##  2     2802     28         2 Male      51-60         Other family Sri Lanka
##  3     2803     28         3 Male      51-60         Close family Sri Lanka
##  4     2804     28         4 Male      60+           Close family Sri Lanka
##  5     2805     28         5 Female    41-50         Close family Sri Lanka
##  6     2806     28         6 Female    60+           Close family Sri Lanka
##  7     2807     28         7 Male      41-50         Other family Sri Lanka
##  8     2808     28         8 Female    36-40         Other family Sri Lanka
##  9     2809     28         9 Female    51-60         Other family Sri Lanka
## 10     2810     28        10 Male      60+           Other family Sri Lanka
## # ℹ 35 more rows
## # ℹ 5 more variables: alter.res <fct>, alter.clo <dbl>, alter.loan <fct>,
## #   alter.fam <fct>, alter.age <dbl>
# Compositional measures based on a single alter attribute                  ----
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# Summary of continuous variable: Average alter closeness.

# Check out the relevant variable
alter.attr.28$alter.clo
##  [1] NA  3 NA NA NA NA  5  4  5  5  5  4  3  5  3  3  1  5  4  5  5  5  5  5  2
## [26]  3  4  5  5  5  5  3  5  3  4  5  5  5  5  3  3  3  3  3  5
# Or, in tidyverse syntax
alter.attr.28 |> 
  pull(alter.clo)
##  [1] NA  3 NA NA NA NA  5  4  5  5  5  4  3  5  3  3  1  5  4  5  5  5  5  5  2
## [26]  3  4  5  5  5  5  3  5  3  4  5  5  5  5  3  3  3  3  3  5
# Battery of descriptive stats.
alter.attr.28 |>
  skim_tee(alter.clo) 
## ── Data Summary ────────────────────────
##                            Values
## Name                       data  
## Number of rows             45    
## Number of columns          12    
## _______________________          
## Column type frequency:           
##   numeric                  1     
## ________________________         
## Group variables            None  
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate mean   sd p0 p25 p50 p75 p100 hist 
## 1 alter.clo             5         0.889  4.1 1.08  1   3   5   5    5 ▁▁▅▂▇
# Get a single summary measure (useful when writing functions)
mean(alter.attr.28$alter.clo, na.rm = TRUE)
## [1] 4.1
# Summary of categorical variable: Proportion female alters.

# Check out the relevant vector
alter.attr.28$alter.sex
##  [1] Female Male   Male   Male   Female Female Male   Female Female Male  
## [11] Male   Male   Male   Male   Male   Male   Male   Male   Male   Male  
## [21] Male   Male   Male   Male   Male   Male   Male   Female Male   Male  
## [31] Male   Female Male   Male   Male   Male   Female Male   Male   Male  
## [41] Male   Male   Male   Male   Male  
## Levels: Female Male
# Get frequencies
alter.attr.28 |>
  tabyl(alter.sex)
##  alter.sex  n   percent
##     Female  8 0.1777778
##       Male 37 0.8222222
# Same for nationalities
alter.attr.28 |>
  tabyl(alter.nat)
##  alter.nat  n    percent
##      Italy  0 0.00000000
##      Other  2 0.04444444
##  Sri Lanka 43 0.95555556
# Another way to get the proportion of a specific category (this is useful when
# writing functions).
mean(alter.attr.28$alter.sex == "Female")
## [1] 0.1777778
mean(alter.attr.28$alter.nat == "Sri Lanka")
## [1] 0.9555556
# The function dplyr::summarise() allows us to calculate multiple measures, name
# them, and put them together in a data frame.
alter.attr.28 |>
  summarise(
    mean.clo = mean(alter.clo, na.rm=TRUE), 
    prop.fem = mean(alter.sex=="Female"), 
    count.nat.slk = sum(alter.nat=="Sri Lanka"), 
    count.nat.ita = sum(alter.nat=="Italy"), 
    count.nat.oth = sum(alter.nat=="Other")
  )
## # A tibble: 1 × 5
##   mean.clo prop.fem count.nat.slk count.nat.ita count.nat.oth
##      <dbl>    <dbl>         <int>         <int>         <int>
## 1      4.1    0.178            43             0             2
# What if we want to calculate the same measures for all ego-networks in the data?
# We'll have to use the data frame with all alter attributes from all egos.
alter.attr.all
## # A tibble: 4,590 × 12
##    alter_ID ego_ID alter_num alter.sex alter.age.cat alter.rel    alter.nat
##       <dbl>  <dbl>     <dbl> <fct>     <fct>         <fct>        <fct>    
##  1     2801     28         1 Female    51-60         Close family Sri Lanka
##  2     2802     28         2 Male      51-60         Other family Sri Lanka
##  3     2803     28         3 Male      51-60         Close family Sri Lanka
##  4     2804     28         4 Male      60+           Close family Sri Lanka
##  5     2805     28         5 Female    41-50         Close family Sri Lanka
##  6     2806     28         6 Female    60+           Close family Sri Lanka
##  7     2807     28         7 Male      41-50         Other family Sri Lanka
##  8     2808     28         8 Female    36-40         Other family Sri Lanka
##  9     2809     28         9 Female    51-60         Other family Sri Lanka
## 10     2810     28        10 Male      60+           Other family Sri Lanka
## # ℹ 4,580 more rows
## # ℹ 5 more variables: alter.res <fct>, alter.clo <dbl>, alter.loan <fct>,
## #   alter.fam <fct>, alter.age <dbl>
# dplyr allows us to "group" a data frame by a factor (here, ego IDs) so all
# measures we calculate on that data frame via summarise (means, proportions,
# etc.) are calculated by the groups given by that factor (here, for each ego
# ID).
alter.attr.all |> 
  group_by(ego_ID) |> 
  summarise(
    mean.clo = mean(alter.clo, na.rm=TRUE), 
    prop.fem = mean(alter.sex=="Female"), 
    count.nat.slk = sum(alter.nat=="Sri Lanka"), 
    count.nat.ita = sum(alter.nat=="Italy"), 
    count.nat.oth = sum(alter.nat=="Other")
  )
## # A tibble: 102 × 6
##    ego_ID mean.clo prop.fem count.nat.slk count.nat.ita count.nat.oth
##     <dbl>    <dbl>    <dbl>         <int>         <int>         <int>
##  1     28     4.1    0.178             43             0             2
##  2     29     4.03   0.0889            44             1             0
##  3     33     3.62   0.378             32             2            11
##  4     35     3.78   0.289             33             4             8
##  5     39     3.73   0.244             39             5             1
##  6     40     3.32   0.356             34             1            10
##  7     45     4.02   0.244             14            19            12
##  8     46     3.48   0.4               33             7             5
##  9     47     4.05   0.267             45             0             0
## 10     48     4.07   0.311             39             4             2
## # ℹ 92 more rows
# We'll talk more about this and show more examples in the next sections.


# Compositional measures based on multiple alter attributes                 ----
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# Using indexing we can combine multiple alter attribute variables.

# Mean closeness of alters who are "Friends".

# Check out the relevant vector.
alter.attr.28 |>
  filter(alter.rel=="Friends") |>
  pull(alter.clo)
##  [1] 5 4 3 5 3 3 5 5 5 5 5 5 4 5 5 5 5 5 4 5 5 5 5 3 3 3 5
# Get its mean.
alter.attr.28 |>
  filter(alter.rel=="Friends") |>
  pull(alter.clo) |> 
  mean()
## [1] 4.444444
# Mean closeness of alters who are "Acquaintances".
alter.attr.28 |>
  filter(alter.rel=="Acquaintances") |>
  pull(alter.clo) |> 
  mean()
## [1] 2.75
# Equivalently (useful for writing functions)
mean(alter.attr.28$alter.clo[alter.attr.28$alter.rel=="Acquaintances"])
## [1] 2.75
# Count of close family members who live in Sri Lanka vs those who live in Italy.

# In Sri Lanka.
alter.attr.28 |>
  filter(alter.rel == "Close family", alter.res == "Sri Lanka") |> 
  count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1     5
# Equivalently (useful for writing functions)
sum(alter.attr.28$alter.rel == "Close family" & alter.attr.28$alter.res == "Sri Lanka")
## [1] 5
# In Italy.
sum(alter.attr.28$alter.rel == "Close family" & alter.attr.28$alter.res == "Italy")
## [1] 0
# Again, we can put these measures together into a data frame row with dplyr.
alter.attr.28 |>
  summarise(
    mean.clo.fr = mean(alter.clo[alter.rel=="Friends"]), 
    mean.clo.acq = mean(alter.clo[alter.rel=="Acquaintances"]),
    count.fam.slk = sum(alter.rel=="Close family" & alter.res=="Sri Lanka"),
    count.fam.ita = sum(alter.rel=="Close family" & alter.res=="Italy")
  )
## # A tibble: 1 × 4
##   mean.clo.fr mean.clo.acq count.fam.slk count.fam.ita
##         <dbl>        <dbl>         <int>         <int>
## 1        4.44         2.75             5             0
# Compositional measures of homophily between ego and alters                ----
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# Level-1 join: Bring ego-level data into alter-level data frame for ego 28.
(data.28 <- left_join(alter.attr.28, ego.df, by= "ego_ID"))
## # A tibble: 45 × 20
##    alter_ID ego_ID alter_num alter.sex alter.age.cat alter.rel    alter.nat
##       <dbl>  <dbl>     <dbl> <fct>     <fct>         <fct>        <fct>    
##  1     2801     28         1 Female    51-60         Close family Sri Lanka
##  2     2802     28         2 Male      51-60         Other family Sri Lanka
##  3     2803     28         3 Male      51-60         Close family Sri Lanka
##  4     2804     28         4 Male      60+           Close family Sri Lanka
##  5     2805     28         5 Female    41-50         Close family Sri Lanka
##  6     2806     28         6 Female    60+           Close family Sri Lanka
##  7     2807     28         7 Male      41-50         Other family Sri Lanka
##  8     2808     28         8 Female    36-40         Other family Sri Lanka
##  9     2809     28         9 Female    51-60         Other family Sri Lanka
## 10     2810     28        10 Male      60+           Other family Sri Lanka
## # ℹ 35 more rows
## # ℹ 13 more variables: alter.res <fct>, alter.clo <dbl>, alter.loan <fct>,
## #   alter.fam <fct>, alter.age <dbl>, ego.sex <fct>, ego.age <dbl>,
## #   ego.arr <dbl>, ego.edu <fct>, ego.inc <dbl>, empl <dbl>,
## #   ego.empl.bin <fct>, ego.age.cat <fct>
# Note the left join: we retain every row of the left data frame (i.e., the
# alters of ego 28) and add the ego-level variables from the matching row of
# ego.df. Egos in the right data frame that match no row on the left are
# discarded.

# Example: Proportion of alters of the same gender as ego.

# View the relevant data
data.28 |> 
  dplyr::select(alter_ID, ego_ID, alter.sex, ego.sex)
## # A tibble: 45 × 4
##    alter_ID ego_ID alter.sex ego.sex
##       <dbl>  <dbl> <fct>     <fct>  
##  1     2801     28 Female    Male   
##  2     2802     28 Male      Male   
##  3     2803     28 Male      Male   
##  4     2804     28 Male      Male   
##  5     2805     28 Female    Male   
##  6     2806     28 Female    Male   
##  7     2807     28 Male      Male   
##  8     2808     28 Female    Male   
##  9     2809     28 Female    Male   
## 10     2810     28 Male      Male   
## # ℹ 35 more rows
# First create a vector that is TRUE whenever alter has the same sex as ego in 
# data.28 (the joined data frame for ego 28).
data.28$alter.sex == data.28$ego.sex
##  [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [25]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
## [37] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# The proportion we're looking for is simply the proportion of TRUE's in this
# vector.
mean(data.28$alter.sex == data.28$ego.sex)
## [1] 0.8222222
# Similarly: count of alters who are in the same age bracket as the ego.
sum(data.28$alter.age.cat == data.28$ego.age.cat)
## [1] 9
# Again, we can put these measures together into a data frame row with dplyr.
data.28 |>
  summarise(
    prop.same.gender = mean(alter.sex == ego.sex), 
    count.same.age = sum(alter.age.cat == ego.age.cat)
  )
## # A tibble: 1 × 2
##   prop.same.gender count.same.age
##              <dbl>          <int>
## 1            0.822              9
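
Several comments above flag the vector-based forms (for example, mean(x == "Female")) as useful when writing functions. The following is a minimal sketch of what such functions might look like. The function names (prop_category, mean_within, prop_same) and the toy data frame are made up for illustration; they are not part of the chapter's data or of any package. The na.rm = TRUE defaults are an assumption about how missing values should be treated: without them, a single NA in the attribute makes the whole measure NA.

```r
library(dplyr)

# Toy alter-level data for one ego (hypothetical values, for illustration only).
toy.alters <- tibble(
  alter.sex = c("Female", "Male", "Male", NA, "Female"),
  alter.rel = c("Friends", "Friends", "Close family", "Friends", "Close family"),
  alter.clo = c(5, 3, NA, 4, 2),
  ego.sex   = "Male"
)

# Measure based on one attribute: proportion of alters in a given category.
prop_category <- function(x, category, na.rm = TRUE) {
  mean(x == category, na.rm = na.rm)
}

# Measure based on two attributes: mean of a numeric attribute (x) among
# alters who fall in a given category of another attribute (by).
# which() drops alters whose `by` value is missing.
mean_within <- function(x, by, category, na.rm = TRUE) {
  mean(x[which(by == category)], na.rm = na.rm)
}

# Homophily: proportion of alters with the same value as ego on an attribute.
prop_same <- function(alter, ego, na.rm = TRUE) {
  mean(alter == ego, na.rm = na.rm)
}

# The functions can be used inside summarise() like any other summary function.
toy.measures <- toy.alters |>
  summarise(
    prop.fem      = prop_category(alter.sex, "Female"),
    mean.clo.fr   = mean_within(alter.clo, alter.rel, "Friends"),
    prop.same.sex = prop_same(alter.sex, ego.sex)
  )
toy.measures
```

With these toy values, prop.fem is 0.5 (2 of the 4 alters with non-missing sex). Note that mean_within() returns NaN when no alter falls in the requested category, because the mean of an empty vector is undefined; this can happen for some egos when the same code is run by group.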

4.3 Analyzing the composition of many ego-networks

4.3.1 Split-apply-combine in egocentric network analysis

  • Now that we’ve learned how to represent and analyze data on one ego-network, we’re ready to scale these operations up to a collection of many ego-networks.
  • Often in social science data analysis (or any data analysis), our data are in a single file, dataset or object, and we need to:
    1. Split the object into pieces based on one or multiple (combinations of) categorical variables or factors.
    2. Apply exactly the same type of calculation on each piece, identically and independently.
    3. Combine all results back together, for example into a new dataset.
  • This has been called the split-apply-combine strategy (Wickham, 2011) and is essential in egocentric network analysis. With ego-networks, we are constantly (1) splitting the data into pieces, each piece typically corresponding to one ego; (2) performing identical and independent analyses on each piece (each ego-network); (3) combining the results back together, typically into a single ego-level dataset, to then associate them with other ego-level variables.
  • In base and traditional R, common tools to perform split-apply-combine operations include for loops, the apply family of functions, and aggregate.
  • The tidyverse packages provide new ways of conducting split-apply-combine operations with more efficient and readable code:
    • Grouping and summarizing data frames with the dplyr package. This is particularly relevant to ego-network composition (next section).
    • Applying the same function to all elements in a list with the map family of functions in the purrr package. This is more relevant to ego-network structure (see Section 5.4).
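
To make the three steps concrete, the sketch below computes the same ego-level measure (mean alter closeness) on a small made-up data frame in three ways: with base R's split() and sapply(), with purrr::map_dbl(), and with dplyr's group_by() and summarise(). The data frame toy.alters and its values are invented for illustration; only the function calls are standard.

```r
library(dplyr)
library(purrr)

# Toy alter-level data: 2 egos with 3 alters each (hypothetical values).
toy.alters <- tibble(
  ego_ID    = c(1, 1, 1, 2, 2, 2),
  alter.clo = c(5, 4, 3, 2, NA, 4)
)

# Base R ----
# Split: a list with one vector of closeness values per ego.
clo.by.ego <- split(toy.alters$alter.clo, toy.alters$ego_ID)
# Apply mean() to each piece and combine the results into a named vector.
base.res <- sapply(clo.by.ego, mean, na.rm = TRUE)

# purrr ----
# Same split list; map_dbl() applies the function to each element and
# always returns a numeric vector.
purrr.res <- map_dbl(clo.by.ego, \(x) mean(x, na.rm = TRUE))

# dplyr ----
# group_by() defines the split; summarise() applies and combines, returning
# a data frame with one row per ego.
dplyr.res <- toy.alters |>
  group_by(ego_ID) |>
  summarise(mean.clo = mean(alter.clo, na.rm = TRUE))

base.res
purrr.res
dplyr.res
```

All three approaches give a mean closeness of 4 for ego 1 and 3 for ego 2. The dplyr version keeps the ego ID as a column, so its result is ready to be joined with other ego-level data.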

4.3.2 Grouping and summarizing with dplyr

  • Whenever we have a dataset in which rows (level 1) are clustered or grouped by values of a given factor (level 2), the package dplyr makes level-2 summarizations very easy.
    • In egocentric analysis, we typically have an alter attribute data frame whose rows (alters, level 1) are clustered by egos (level 2).
  • In general, the dplyr::summarise function allows us to calculate summary statistics on a single variable or on multiple variables in a data frame.
    • To calculate the same summary statistic on multiple variables, we select them with across().
  • If we run summarise after grouping the data frame by a factor variable with group_by, then the data frame will be “split” by levels (categories) of that factor, and the summary statistics will be calculated on each piece: that is, for each unique level of the grouping factor.
    • So if the grouping factor is the ego ID, we can immediately obtain summary statistics for each of hundreds or thousands of egos in one line of code. The code below provides examples.
  • What we do in the following code:
    • Use summarise to calculate summary variables on network composition for all of the 102 egos at once.
    • Join the results with other ego-level data (level-2 join).
# The summarise function offers a concise syntax to calculate summary 
# statistics on a data frame's variables. 

# Let's see what happens if we apply this function to the alter attribute data
# frame including all alters, without grouping it by ego ID.

# * Mean alter closeness:
alter.attr.all |>
  summarise(mean.clo = mean(alter.clo, na.rm = TRUE))
## # A tibble: 1 × 1
##   mean.clo
##      <dbl>
## 1     3.87
# * N of distinct values in the alter nationality variable (i.e., number of 
# distinct nationalities of alters):
alter.attr.all |>
  summarise(N.nat = n_distinct(alter.nat))
## # A tibble: 1 × 1
##   N.nat
##   <int>
## 1     3
# * N of distinct values in the alter nationality, country of residence, and
# age bracket variables. In this case, we apply the same summarizing function
# to multiple variables (not just one), to be selected via across().
alter.attr.all |>
  summarise(
    across(c(alter.nat, alter.res, alter.age.cat), 
           n_distinct)
    )
## # A tibble: 1 × 3
##   alter.nat alter.res alter.age.cat
##       <int>     <int>         <int>
## 1         3         3             8
# Because we ran this without previously grouping the data frame by ego ID, each
# function is calculated on all alters from all egos pooled (all rows of the
# alter attribute data frame), not on the set of alters of each ego.

# If we group the data frame by ego_ID, each of those summary statistics is
# calculated for each ego:

# * Mean alter closeness:
alter.attr.all |>
  # Group by ego ID
  group_by(ego_ID) |>
  # Calculate summary measure
  summarise(mean.clo = mean(alter.clo, na.rm = TRUE))
## # A tibble: 102 × 2
##    ego_ID mean.clo
##     <dbl>    <dbl>
##  1     28     4.1 
##  2     29     4.03
##  3     33     3.62
##  4     35     3.78
##  5     39     3.73
##  6     40     3.32
##  7     45     4.02
##  8     46     3.48
##  9     47     4.05
## 10     48     4.07
## # ℹ 92 more rows
# * N of distinct values in the alter nationality variable (i.e., number of 
# distinct nationalities of alters):
alter.attr.all |>
  group_by(ego_ID) |>
  summarise(N.nat = n_distinct(alter.nat))
## # A tibble: 102 × 2
##    ego_ID N.nat
##     <dbl> <int>
##  1     28     2
##  2     29     2
##  3     33     3
##  4     35     3
##  5     39     3
##  6     40     3
##  7     45     3
##  8     46     3
##  9     47     1
## 10     48     3
## # ℹ 92 more rows
# We can also "permanently" group the data frame by ego_ID and then calculate all
# our summary measures by ego ID.
alter.attr.all <- alter.attr.all |> 
  group_by(ego_ID)

# * N of distinct values in the alter nationality, country of residence, and
# age bracket variables:
alter.attr.all |>
  summarise(
    across(c(alter.nat, alter.res, alter.age.cat), 
           n_distinct)
    )
## # A tibble: 102 × 4
##    ego_ID alter.nat alter.res alter.age.cat
##     <dbl>     <int>     <int>         <int>
##  1     28         2         3             7
##  2     29         2         3             7
##  3     33         3         2             6
##  4     35         3         3             7
##  5     39         3         3             7
##  6     40         3         3             6
##  7     45         3         3             8
##  8     46         3         3             7
##  9     47         1         3             6
## 10     48         3         3             7
## # ℹ 92 more rows
# We can also use summarise to run more complex functions on alter attributes
# by ego.

# Imagine we want to count the number of alters who are "Close family", "Other
# family", and "Friends" in an ego-network.

# Let's consider the ego-network of ego ID 28 as an example.
alter.attr.28
## # A tibble: 45 × 12
##    alter_ID ego_ID alter_num alter.sex alter.age.cat alter.rel    alter.nat
##       <dbl>  <dbl>     <dbl> <fct>     <fct>         <fct>        <fct>    
##  1     2801     28         1 Female    51-60         Close family Sri Lanka
##  2     2802     28         2 Male      51-60         Other family Sri Lanka
##  3     2803     28         3 Male      51-60         Close family Sri Lanka
##  4     2804     28         4 Male      60+           Close family Sri Lanka
##  5     2805     28         5 Female    41-50         Close family Sri Lanka
##  6     2806     28         6 Female    60+           Close family Sri Lanka
##  7     2807     28         7 Male      41-50         Other family Sri Lanka
##  8     2808     28         8 Female    36-40         Other family Sri Lanka
##  9     2809     28         9 Female    51-60         Other family Sri Lanka
## 10     2810     28        10 Male      60+           Other family Sri Lanka
## # ℹ 35 more rows
## # ℹ 5 more variables: alter.res <fct>, alter.clo <dbl>, alter.loan <fct>,
## #   alter.fam <fct>, alter.age <dbl>
# Calculate the number of alters in each relationship type in this ego-network.

# Vector of alter relationship attribute.
alter.attr.28$alter.rel
##  [1] Close family  Other family  Close family  Close family  Close family 
##  [6] Close family  Other family  Other family  Other family  Other family 
## [11] Friends       Friends       Friends       Friends       Friends      
## [16] Friends       Acquaintances Friends       Acquaintances Friends      
## [21] Friends       Friends       Friends       Friends       Acquaintances
## [26] Acquaintances Friends       Friends       Friends       Friends      
## [31] Friends       Acquaintances Friends       Acquaintances Friends      
## [36] Friends       Friends       Friends       Friends       Friends      
## [41] Friends       Acquaintances Acquaintances Friends       Friends      
## Levels: Acquaintances Close family Friends Other family
# Flag with TRUE whenever alter is "Close family"
alter.attr.28$alter.rel=="Close family"
##  [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Count the number of TRUE's
sum(alter.attr.28$alter.rel=="Close family")
## [1] 5
# The same can be done for "Other family" and "Friends"
sum(alter.attr.28$alter.rel=="Other family")
## [1] 5
sum(alter.attr.28$alter.rel=="Friends")
## [1] 27
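# With dplyr we can run the same operations for every ego. The results are
# calculated by ego because alter.attr.all is still grouped by ego_ID (see
# above).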
N.rel <- alter.attr.all |>
  summarise(N.clo.fam = sum(alter.rel=="Close family"),
            N.oth.fam = sum(alter.rel=="Other family"),
            N.fri = sum(alter.rel=="Friends"))
N.rel
## # A tibble: 102 × 4
##    ego_ID N.clo.fam N.oth.fam N.fri
##     <dbl>     <int>     <int> <int>
##  1     28         5         5    27
##  2     29         5         6    30
##  3     33         3        16    13
##  4     35         4         1    38
##  5     39         4         9    27
##  6     40         5         7    17
##  7     45         3         6    30
##  8     46         3        14    12
##  9     47         1        20    18
## 10     48         4         6    26
## # ℹ 92 more rows
# After getting compositional summary variables for each ego, we might want to
# join them with other ego-level data (level-2 join).

# Merge with summary variables.
ego.df |>
  left_join(N.rel, by= "ego_ID")
## # A tibble: 102 × 12
##    ego_ID ego.sex ego.age ego.arr ego.edu ego.inc  empl ego.empl.bin ego.age.cat
##     <dbl> <fct>     <dbl>   <dbl> <fct>     <dbl> <dbl> <fct>        <fct>      
##  1     28 Male         61    2008 Second…     350     3 Yes          60+        
##  2     29 Male         38    2000 Primary     900     4 Yes          36-40      
##  3     33 Male         30    2010 Primary     200     3 Yes          26-30      
##  4     35 Male         25    2009 Second…    1000     3 Yes          18-25      
##  5     39 Male         29    2007 Primary       0     1 No           26-30      
##  6     40 Male         56    2008 Second…     950     4 Yes          51-60      
##  7     45 Male         52    1975 Primary    1600     3 Yes          51-60      
##  8     46 Male         35    2002 Second…    1200     4 Yes          31-35      
##  9     47 Male         22    2010 Second…     700     4 Yes          18-25      
## 10     48 Male         51    2007 Primary     950     4 Yes          51-60      
## # ℹ 92 more rows
## # ℹ 3 more variables: N.clo.fam <int>, N.oth.fam <int>, N.fri <int>
# We can then ungroup alter.attr.all to remove the grouping information.
alter.attr.all <- ungroup(alter.attr.all)

# To get the size of each personal network, we can simply use the count() function:
# it counts the number of rows for each unique value of a variable. The
# number of rows for each unique value of ego_ID in alter.attr.all is the number
# of alters for each ego (personal network size).
alter.attr.all |> 
  dplyr::count(ego_ID)
## # A tibble: 102 × 2
##    ego_ID     n
##     <dbl> <int>
##  1     28    45
##  2     29    45
##  3     33    45
##  4     35    45
##  5     39    45
##  6     40    45
##  7     45    45
##  8     46    45
##  9     47    45
## 10     48    45
## # ℹ 92 more rows
# ***** EXERCISES 
#
# (1) Extract the values of alter.attr.all$alter.sex corresponding to ego_ID 28
# (dplyr::filter). Calculate the proportion of "Female" values. Based on this
# code, use summarise() to calculate the proportion of women in every ego's
# personal network. Hint: mean(alter.sex=="Female").
#
# (2) Subset alter.attr.all to the rows corresponding to ego_ID 53 (dplyr::filter).
# Using the resulting data frame, calculate the average closeness ($alter.clo)
# of Italian alters (i.e. $alter.nat=="Italy"). Based on this code, run
# summarise() to calculate the average closeness of Italian alters for all egos.
#
# *****
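
Section 4.2 calculated homophily measures for ego 28 only. Combining the level-1 join with group_by() and summarise() extends them to every ego, and a level-2 join brings the results back into the ego-level data. The sketch below uses two small made-up data frames (toy.alters and toy.egos, standing in for alter.attr.all and ego.df). It assumes character columns rather than factors and excludes missing comparisons with na.rm = TRUE, which may or may not suit a given analysis.

```r
library(dplyr)

# Toy alter-level data: ego 1 has 3 alters, ego 2 has 2 (hypothetical values).
toy.alters <- tibble(
  ego_ID    = c(1, 1, 1, 2, 2),
  alter.sex = c("Male", "Female", "Male", "Female", "Female")
)

# Toy ego-level data. Ego 3 nominated no alters.
toy.egos <- tibble(
  ego_ID  = c(1, 2, 3),
  ego.sex = c("Male", "Female", "Male"),
  ego.inc = c(350, 900, 200)
)

# Level-1 join: bring ego attributes into the alter-level data frame.
# Ego 3 has no alter rows, so it does not appear in the result.
toy.joined <- left_join(toy.alters, toy.egos, by = "ego_ID")

# Split by ego, calculate network size and homophily, combine into one
# row per ego.
toy.homophily <- toy.joined |>
  group_by(ego_ID) |>
  summarise(
    net.size      = n(),
    prop.same.sex = mean(alter.sex == ego.sex, na.rm = TRUE)
  )

# Level-2 join: bring the summary variables into the ego-level data frame.
# Egos with no alters (here, ego 3) get NA on the summary variables.
toy.ego.level <- left_join(toy.egos, toy.homophily, by = "ego_ID")
toy.ego.level
```

Starting the level-2 join from the ego-level data frame keeps all egos in the result, including any who have no rows in the alter-level data.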

References

Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1–29. http://www.jstatsoft.org/v40/i01