
Introduction

chinadevfin2 is primarily a data package that makes it efficient to work with AidData’s Global Chinese Development Finance Dataset, Version 2.0 (GCDF 2.0) in R.

chinadevfin2 is designed to eliminate two time-consuming pain points when working with datasets:

  1. Data import & cleaning: The chinadevfin2 package loads the dataset, and all variables are already formatted with the correct data types. Users can begin doing meaningful analysis with just a few lines of code.
  2. Country name standardization: By default, the chinadevfin2 package loads the dataset with standardized country names and iso3c codes. This makes it easy to join data sourced elsewhere, for example on income groups, custom country groupings, or macroeconomic indicators, without worrying about whether each dataset uses a different version of a country’s name (e.g. Philippines or The Philippines).
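To see why this matters, here is a minimal base-R sketch using two small, made-up data frames (the values are hypothetical, not GCDF 2.0 or World Bank figures). Joining on country names silently matches nothing when the spellings differ, while joining on iso3c codes matches every row.

```r
# Hypothetical toy data: NOT real GCDF 2.0 or World Bank figures
toy_commitments <- data.frame(
  country_name   = c("Philippines", "Laos"),
  iso3c          = c("PHL", "LAO"),
  commitments_bn = c(1, 2)
)
toy_other_source <- data.frame(
  country_name = c("The Philippines", "Lao PDR"),
  iso3c        = c("PHL", "LAO"),
  population_m = c(100, 7)
)

# Joining on names: the spellings differ, so no rows match
by_name <- merge(toy_commitments, toy_other_source, by = "country_name")
nrow(by_name)
#> [1] 0

# Joining on iso3c codes: every row matches
by_code <- merge(toy_commitments,
                 toy_other_source[, c("iso3c", "population_m")],
                 by = "iso3c")
nrow(by_code)
#> [1] 2
```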

Finding insights quickly

get_gcdf2_dataset() loads the GCDF 2.0 dataset with standardized country names. Since the data is already cleaned, we can start doing interesting analysis right away.

Which countries have been the largest recipients of Chinese development finance? We can answer this by finding the sum of the commitments (in constant 2017 USD).

library(chinadevfin2) # load the package
library(dplyr) # for data analysis 

# get the dataset, with standardized country names
committments_by_country <- get_gcdf2_dataset() |> 
  # See `recommended_for_aggregates` in the gcdf2_data_dictionary to learn more about this. 
  filter(recommended_for_aggregates == "Yes") |>
  # filter out regional projects 
  filter(country_or_regional == "country") |>
  # for each country
  group_by(country_name, iso3c) |> 
  # Find the sum in constant 2017 USD
  summarize(total_commitments_bn = sum(amount_constant_usd2017, na.rm = TRUE)/10^9) |> 
  # ungroup to avoid strange side effects of grouped tibbles
  ungroup() |> 
  # arrange by descending order
  arrange(total_commitments_bn |> desc())

committments_by_country
#> # A tibble: 142 × 3
#>    country_name iso3c total_commitments_bn
#>    <chr>        <chr>                <dbl>
#>  1 Russia       RUS                  125. 
#>  2 Venezuela    VEN                   91.1
#>  3 Angola       AGO                   52.7
#>  4 Kazakhstan   KAZ                   42.3
#>  5 Brazil       BRA                   41.5
#>  6 Indonesia    IDN                   34.9
#>  7 Pakistan     PAK                   34.6
#>  8 Vietnam      VNM                   18.5
#>  9 Iran         IRN                   17.1
#> 10 Ecuador      ECU                   16.9
#> # ℹ 132 more rows

Standardized country names

Standardized country names make it easy to add data from other sources.

Here’s a simple example. Above, we calculated the total commitments per country during the 2000-2017 period. But that doesn’t give us much perspective on how large those commitments are in the context of a country’s economy.

One way to add that perspective is to compare the commitments to a relevant denominator, like population or the size of the economy. Here, we’ll get data from the World Bank’s API using the wbstats R package. Because AidData’s commitments are reported in constant 2017 USD, we’ll get each country’s 2017 GDP in USD. A handful of countries, such as Venezuela, have no 2017 GDP figure, so for those we’ll use the most recent non-empty value instead.
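Written out, the ratio computed in the rest of this section is roughly the following, where C_c stands for country c’s total 2000-2017 commitments (constant 2017 USD, billions) and GDP_c is in billions of USD. The symbols are introduced here for illustration only.

```latex
\text{GDP}_c =
\begin{cases}
  \text{GDP}_{c,\,2017}  & \text{if 2017 GDP is available} \\
  \text{GDP}_{c,\,\text{mrnev}} & \text{otherwise (most recent non-empty value)}
\end{cases}
\qquad
\text{commitments\_pct\_gdp}_c = 100 \times \frac{C_c}{\text{GDP}_c}
```

Note that the numerator is a cumulative 18-year total while the denominator is a single year of GDP, so the ratio can exceed 100%.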

Get outside data

# load the wbstats R package to access World Bank data
library(wbstats)

# get USD GDP from 2017, to align with GCDF 2.0's reporting figures in 2017 USD.
wb_gdp_usd_2017 <- wb_data(indicator = c("gdp_usd_2017" = "NY.GDP.MKTP.CD"),
                           start_date = 2017,
                           end_date = 2017) |> 
  select(iso3c, gdp_usd_2017)

# Venezuela (a large recipient of Chinese lending) only reported GDP data through 2014.
# A small handful of other countries, such as Eritrea and South Sudan, also lack 2017 data.
# For those, we will get the most recent non-empty value (`mrnev`).
wb_gdp_usd_mrnev <- wb_data(indicator = c("gdp_usd_mrnev" = "NY.GDP.MKTP.CD"),
                           mrnev = 1) |> 
  select(iso3c, gdp_usd_mrnev)

# combine the datasets to get 2017 GDP where available, falling back to the most recent non-empty (`mrnev`) value where 2017 data is missing.
wb_gdp_usd_2017_or_mrnev <- wb_gdp_usd_2017 |> 
  left_join(wb_gdp_usd_mrnev, by = "iso3c") |> 
  mutate(gdp_usd_2017_or_mrnev = if_else(is.na(gdp_usd_2017), 
                                true = gdp_usd_mrnev,
                                false = gdp_usd_2017),
         # change the scale to billions of USD, in line with commitments
         gdp_usd_2017_or_mrnev_bn = gdp_usd_2017_or_mrnev/10^9) |> 
  select(iso3c, gdp_usd_2017_or_mrnev_bn)

wb_gdp_usd_2017_or_mrnev
#> # A tibble: 217 × 2
#>    iso3c gdp_usd_2017_or_mrnev_bn
#>    <chr>                    <dbl>
#>  1 ABW                      3.09 
#>  2 AFG                     18.9  
#>  3 AGO                     69.0  
#>  4 ALB                     13.0  
#>  5 AND                      3.00 
#>  6 ARE                    391.   
#>  7 ARG                    644.   
#>  8 ARM                     11.5  
#>  9 ASM                      0.612
#> 10 ATG                      1.47 
#> # ℹ 207 more rows

Attach the data and do cool things

Now we’ll add the GDP data to our committments_by_country tibble we created above, and then calculate commitments as a percent of 2017 (or most recently available) GDP.
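Here is a minimal base-R sketch of the same join-then-divide pattern on hypothetical toy data (not real commitments or GDP). It shows one behavior worth keeping in mind: a left join keeps every country in the commitments table, so any country without a GDP match ends up with NA for its percentage. The sort below places those NA rows last, which mirrors how dplyr’s arrange() handles missing values.

```r
# Hypothetical toy data (billions of USD): NOT real GCDF 2.0 or World Bank figures
toy_commitments <- data.frame(iso3c = c("AAA", "BBB", "CCC"),
                              total_commitments_bn = c(5, 2, 1))
toy_gdp <- data.frame(iso3c = c("AAA", "BBB"),
                      gdp_bn = c(50, 4))

# all.x = TRUE keeps every commitments row, like dplyr::left_join();
# "CCC" has no GDP match, so its gdp_bn is NA
joined <- merge(toy_commitments, toy_gdp, by = "iso3c", all.x = TRUE)

# commitments as a percentage of GDP (NA wherever GDP is missing)
joined$commitments_pct_gdp <- joined$total_commitments_bn / joined$gdp_bn * 100

# sort in descending order, with missing values last
joined <- joined[order(joined$commitments_pct_gdp, decreasing = TRUE, na.last = TRUE), ]
joined
```

In the real data, any country whose iso3c code is absent from the World Bank table would appear at the bottom of the output with an NA percentage.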

committments_by_country |> 
  # join the two datasets by iso3c codes
  left_join(wb_gdp_usd_2017_or_mrnev, by = "iso3c") |> 
  # calculate the value of total commitments as a percentage of GDP
  mutate(commitments_pct_gdp = total_commitments_bn/gdp_usd_2017_or_mrnev_bn * 100) |> 
  # remove the iso3c column so the rest of the columns will be visible
  select(-iso3c) |> 
  # arrange the output in descending order by commitments as percent of GDP
  arrange(commitments_pct_gdp |> desc()) 
#> # A tibble: 142 × 4
#>    country_name  total_commitments_bn gdp_usd_2017_or_mrne…¹ commitments_pct_gdp
#>    <chr>                        <dbl>                  <dbl>               <dbl>
#>  1 Marshall Isl…                3.21                   0.213              1508. 
#>  2 Laos                        14.5                   17.1                  85.0
#>  3 Angola                      52.7                   69.0                  76.5
#>  4 Tonga                        0.326                  0.460                70.9
#>  5 Sierra Leone                 2.40                   3.72                 64.5
#>  6 Djibouti                     1.77                   2.76                 64.2
#>  7 Congo - Braz…                6.61                  11.1                  59.6
#>  8 Eritrea                      1.00                   2.07                 48.6
#>  9 Samoa                        0.406                  0.885                45.9
#> 10 Cambodia                    10.0                   22.2                  45.3
#> # ℹ 132 more rows
#> # ℹ abbreviated name: ¹​gdp_usd_2017_or_mrnev_bn

This is interesting. We should be extremely humble about GDP measurements from small and lower-income countries. Nevertheless, this gives us useful context on how large Chinese development finance commitments between 2000 and 2017 have been relative to the economies of many smaller and poorer countries.

About the GCDF 2.0 dataset:

AidData’s summary of the dataset:

AidData’s Global Chinese Development Finance Dataset, Version 2.0 records the known universe of projects (with development, commercial, or representational intent) supported by official financial and in-kind commitments (or pledges) from China from 2000-2017, with implementation details covering a 22-year period (2000-2021). The dataset captures 13,427 projects worth $843 billion financed by more than 300 Chinese government institutions and state-owned entities across 165 countries in every major region of the world. AidData systematically collected and quality-assured all projects in the dataset using the 2.0 version of our Tracking Underreported Financial Flows (TUFF) methodology.

Please see AidData’s dataset website for full citation details.

If you are unfamiliar with the dataset, the following resources are a great place to start:

Exploring the GCDF 2.0 data dictionary

The dataset’s data dictionary, with definitions of all 70 variables, is available in the object gcdf2_data_dictionary. Let’s use the reactable package to make a table for exploring the data definitions.

Here’s what you’ll find in gcdf2_data_dictionary:

  • column_name: the name of the column in the GCDF 2.0 dataset. It is the snake_case version of the variable name given by AidData (shown in field_name), so that it is easier to work with in R.
  • column_class: the data type of the variable, such as numeric, character, or Date.
  • field_name: the original name of the variable given by AidData.
  • description: the detailed definition of the variable.
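For intuition about how a field_name maps to a column_name, here is a rough sketch of one possible snake_case conversion. The helper to_snake_case_sketch() is hypothetical and is not necessarily how chinadevfin2 derived its column names. "Amount (Constant USD2017)" is an assumed AidData field name, chosen because it plausibly corresponds to the amount_constant_usd2017 column used earlier.

```r
# Hypothetical helper: one possible snake_case conversion,
# not necessarily the rule chinadevfin2 actually used
to_snake_case_sketch <- function(x) {
  x <- tolower(x)                  # lower-case everything
  x <- gsub("[^a-z0-9]+", "_", x)  # collapse runs of spaces/punctuation into "_"
  gsub("^_+|_+$", "", x)           # trim leading/trailing underscores
}

to_snake_case_sketch(c("Recommended For Aggregates",
                       "Umbrella",
                       "Amount (Constant USD2017)"))
```

This returns "recommended_for_aggregates", "umbrella", and "amount_constant_usd2017". To check real mappings, compare the field_name and column_name columns of gcdf2_data_dictionary directly.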

Take your time. There’s a lot there.

# load reactable library to make pretty tables
library(reactable)

gcdf2_data_dictionary |> 
  reactable(searchable = TRUE,
            sortable = TRUE,
            filterable = TRUE,
            bordered = TRUE,
            defaultPageSize = 3,
            columns = list(
              column_class = colDef(minWidth = 65),
              description = colDef(minWidth = 250)
            )
            )

Creating Aggregates

In the examples above, we drew insights from the dataset by creating aggregates: summing commitments by country.

The GCDF 2.0 dataset’s recommended_for_aggregates variable identifies the projects that AidData recommends be used for creating data aggregates. As in the example above, use filter(recommended_for_aggregates == "Yes") as part of your dplyr pipeline if you are creating aggregates.

Here is AidData’s full explanation from the data dictionary:

This field identifies projects that AidData recommends including in analysis that requires the aggregation of projects supported by official financial (or in-kind) commitments from China, including analysis of monetary amounts and project counts. It is useful for identifying formally approved, active, and completed Chinese government-financed projects – and excluding all cancelled projects, suspended projects, and projects that never reached the formal approval (official commitment) stage. The field is set to “Yes” for all projects with a status designation of Pipeline: Commitment, Implementation, and Completion that have not also been designated as umbrella agreements. It is set to “No” for all cancelled projects, suspended projects, and projects that never reached the official commitment stage (i.e. those projects with a status designation of Pipeline: Pledge, Suspended, and Cancelled). Additionally, to avoid double-counting, the field is set to “No” for all umbrella agreements. For more information on umbrella agreements, see the description of the “Umbrella” field in this file. Also, note that not all projects with a “Recommended for Aggregates” value of “True” identify a financial transaction value (since some transactions are difficult to monetize, such as in-kind donations, technical assistance, scholarships, and training activities).
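To make the rule above concrete, here is a minimal sketch that restates it in code. The helper recommended_for_aggregates_sketch() is hypothetical. It assumes status uses the labels quoted in the data dictionary and that umbrella is a logical vector (TRUE for umbrella agreements); it does not handle missing values. In practice, simply filter on the recommended_for_aggregates column that ships with the dataset.

```r
# Hypothetical helper: restates AidData's documented rule, for illustration only.
# Assumes `status` uses the labels quoted above and `umbrella` is logical.
recommended_for_aggregates_sketch <- function(status, umbrella) {
  # only projects that reached the official commitment stage are eligible
  committed <- status %in% c("Pipeline: Commitment", "Implementation", "Completion")
  # umbrella agreements are excluded to avoid double-counting
  ifelse(committed & !umbrella, "Yes", "No")
}

flags <- recommended_for_aggregates_sketch(
  status   = c("Completion", "Pipeline: Pledge", "Implementation", "Cancelled"),
  umbrella = c(FALSE, FALSE, TRUE, FALSE)
)
flags
```

Only the first project is flagged "Yes": the pledge and the cancelled project never reached the commitment stage, and the third project is an umbrella agreement.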