@@ -0,0 +1,76 @@
# WHO Tuberculosis Estimated Incidence Rate Import

## Overview
This dataset contains the WHO Estimated Tuberculosis (TB) Incidence Rate (per 100,000 population) for countries, territories, and regional aggregates.
Specifically, it provides incidence rates for two categories:
- Overall TB incidence
- HIV-positive TB incidence

The generated statistical variables capture the incidence rate for these conditions.
Examples of the statvars generated:
- `dcid:Count_MedicalConditionIncident_ConditionTuberculosis_AsAFractionOf_Count_Person`
- `dcid:Count_MedicalConditionIncident_ConditionTuberculosisAndHIV_AsAFractionOf_Count_Person`
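Per the provenance description in the manifest, the rate is the estimated number of new TB episodes arising in a given year per 100,000 population. A minimal sketch of that arithmetic follows. The helper names `rate_per_100k` and `cases_from_rate` are hypothetical and are not part of the import.

```python
def rate_per_100k(new_cases: float, population: float) -> float:
    """Estimated new TB episodes per 100,000 population (illustrative helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to avoid an extra floating-point rounding step
    return new_cases * 100_000 / population


def cases_from_rate(rate: float, population: float) -> float:
    """Approximate number of new episodes implied by a per-100,000 rate."""
    return rate * population / 100_000


print(rate_per_100k(50, 1_000_000))  # 5.0
```

For example, the 2024 value for France in the test input (7.8) would correspond to about 78 new episodes in a hypothetical population of 1,000,000.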

- **type of place:** Country, Region (M49 codes), WHO Regions, Overseas Territory, Special Administrative Regions
- **years:** 2000 to 2024
- **place_resolution:** Resolved to DCIDs (e.g., `dcid:country/FRA`, `dcid:country/IND`)

## Data Source
**Source URL:**
https://data.who.int/indicators/i/EB68992/2674B39

**Provenance Description:**
The indicator values come from the World Health Organization (WHO) public xMart API, which tracks the estimated TB incidence rate globally (Indicator ID: `EB689922674B39`). ISO3 country codes are taken from the WHO TB master database (notifications dataset).
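The `$filter` and `$format` parameters are OData query options. To see the exact request that `tb_data_download.py` sends, the URL can be built with `urllib.parse.urlencode`, which is also what `requests` uses to encode a `params` dict. This sketch makes no network call, and `build_api_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint and indicator ID copied from tb_data_download.py
API_URL = "https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA"
INDICATOR_ID = "EB689922674B39"


def build_api_url(indicator_id: str = INDICATOR_ID, fmt: str = "csv") -> str:
    """Build the WHO xMart OData query URL for a single indicator."""
    params = {
        "$filter": f"IND_ID eq '{indicator_id}'",  # keep only rows for this indicator
        "$format": fmt,                           # ask the API for CSV output
    }
    # urlencode percent-encodes '$' and quotes, and turns spaces into '+'
    return f"{API_URL}?{urlencode(params)}"


print(build_api_url())
# https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA?%24filter=IND_ID+eq+%27EB689922674B39%27&%24format=csv
```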

## Refresh Type
Automatic Refresh

To refresh the data, the import includes a Python script (`tb_data_download.py`) that fetches the indicator from the WHO API, joins it with ISO3 country codes from the WHO master database, and saves the result as a formatted CSV.

## Data Publish Frequency
Release Frequency = Annual

## How To Download Input Data
To download the data, run the provided script:
```bash
python3 tb_data_download.py
```
This fetches the latest full dataset, adds the ISO3 country codes, and saves it locally as `input_files/Estimated_incidence_rate_per_100_000_population.csv`, making it available for statvar processing.
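Before running the processor, it can help to sanity-check the downloaded file. A minimal sketch using only the standard library: the expected header is the column order written by `tb_data_download.py`, and `missing_iso3` is a hypothetical helper that lists country names left without an ISO3 code by the name-based join.

```python
import csv
from typing import Iterable, List

# Column order written by tb_data_download.py
EXPECTED_HEADER = ["IND_ID", "INDICATOR_NAME", "YEAR", "COUNTRY", "iso3", "DISAGGR_1", "VALUE"]


def missing_iso3(lines: Iterable[str]) -> List[str]:
    """Check the header and return sorted COUNTRY names whose iso3 cell is empty."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"Unexpected header: {reader.fieldnames}")
    # pandas writes a missing iso3 (NaN) as an empty cell; short rows yield None
    missing = {row["COUNTRY"] for row in reader if not (row["iso3"] or "").strip()}
    return sorted(missing)
```

Usage: `with open("input_files/Estimated_incidence_rate_per_100_000_population.csv", newline="", encoding="utf-8") as f: print(missing_iso3(f))`.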

## Processing Instructions
To process the WHO Tuberculosis Incidence Rate data and generate statistical variables, use the following command from the import directory:

**For Data Run**
```bash
python3 ../../../tools/statvar_importer/stat_var_processor.py \
--input_data=input_files/* \
--pv_map=tuberculosis_estimated_incidence_rate_pvmap.csv \
--output_path=tuberculosis_estimated_incidence_rate_output \
--config_file=tuberculosis_estimated_incidence_rate_metadata.csv
```

This generates the following output files:
- tuberculosis_estimated_incidence_rate_output.csv
- tuberculosis_estimated_incidence_rate_output_stat_vars_schema.mcf
- tuberculosis_estimated_incidence_rate_output_stat_vars.mcf
- tuberculosis_estimated_incidence_rate_output.tmcf
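The four file names follow a fixed pattern: each is the `--output_path` prefix plus a suffix. A small sketch that derives those names from the prefix and reports any that are missing after a run; `expected_outputs` and `missing_outputs` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

# Suffixes appended to --output_path, as listed above
OUTPUT_SUFFIXES = [".csv", "_stat_vars_schema.mcf", "_stat_vars.mcf", ".tmcf"]


def expected_outputs(output_path: str) -> List[Path]:
    """Files the processor is expected to write for a given --output_path prefix."""
    return [Path(output_path + suffix) for suffix in OUTPUT_SUFFIXES]


def missing_outputs(output_path: str) -> List[str]:
    """Expected output files that do not exist (empty list means all were written)."""
    return [str(p) for p in expected_outputs(output_path) if not p.is_file()]
```

For example, `missing_outputs("tuberculosis_estimated_incidence_rate_output")` run from the import directory should return an empty list after a successful run.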

**For Data Quality Checks and Validation**

The generated output is validated with the `lint` command of the Data Commons import tool.

```bash
java -jar datacommons-import-tool-0.1-jar-with-dependencies.jar lint tuberculosis_estimated_incidence_rate_output_stat_vars_schema.mcf tuberculosis_estimated_incidence_rate_output.csv tuberculosis_estimated_incidence_rate_output.tmcf tuberculosis_estimated_incidence_rate_output_stat_vars.mcf
```

This generates the following output files:
- report.json
- summary_report.csv
- summary_report.html

Review these report files for errors and warnings.
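To fail an automated run on lint errors, the JSON report can be checked programmatically. This sketch assumes that `report.json` has a top-level `levelSummary` object whose `LEVEL_ERROR` entry holds a `counters` map, with counts possibly serialized as strings; verify this layout against the report your version of the tool produces. `error_counters` is a hypothetical helper.

```python
import json
from typing import Dict


def error_counters(report: dict) -> Dict[str, int]:
    """Return {counter_name: count} for error-level counters in a parsed lint report.

    ASSUMPTION: layout is {"levelSummary": {"LEVEL_ERROR": {"counters": {...}}}}.
    A report with no LEVEL_ERROR entry yields an empty dict.
    """
    level = report.get("levelSummary", {}).get("LEVEL_ERROR", {})
    # int() accepts both numeric values and numeric strings
    return {name: int(count) for name, count in level.get("counters", {}).items()}


def load_report(path: str = "report.json") -> dict:
    """Parse the lint report written by the import tool."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A CI step could then exit non-zero when `error_counters(load_report(path))` is non-empty.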

## Testing
Testing is performed using the provided `test_data` directory:
- Input: `test_data/tuberculosis_estimated_incidence_rate_input.csv`
- Output (expected): `test_data/tuberculosis_estimated_incidence_rate_output.csv`
- Template MCF (expected): `test_data/tuberculosis_estimated_incidence_rate_output.tmcf`
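A minimal sketch for comparing a fresh processor output against the expected test output. It treats rows as an unordered multiset, so a change in row order alone does not fail the comparison; `same_rows` is a hypothetical helper, not the repository's test harness.

```python
import csv
from collections import Counter
from typing import Iterable


def same_rows(actual_lines: Iterable[str], expected_lines: Iterable[str]) -> bool:
    """Compare two CSVs as multisets of rows (header included), ignoring row order."""
    actual = Counter(tuple(row) for row in csv.reader(actual_lines))
    expected = Counter(tuple(row) for row in csv.reader(expected_lines))
    return actual == expected
```

For example, run the processor with `--input_data=test_data/tuberculosis_estimated_incidence_rate_input.csv` and a scratch `--output_path`, then open both `<output_path>.csv` and `test_data/tuberculosis_estimated_incidence_rate_output.csv` and pass the file objects to `same_rows`.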
@@ -0,0 +1,22 @@
{
  "import_specifications": [
    {
      "import_name": "WHO_TuberculosisEstimatedIncidenceRate",
      "curator_emails": ["support@datacommons.org"],
      "provenance_url": "https://data.who.int/indicators/i/EB68992/2674B39",
      "provenance_description": "Estimated number of new episodes of TB cases arising in a given year per 100 000 population.",
      "scripts": [
        "tb_data_download.py",
        "../../tools/statvar_importer/stat_var_processor.py --input_data=gs://unresolved_mcf/who/TB_Estimated_Incidence_Rate/input_files/* --pv_map=gs://unresolved_mcf/who/TB_Estimated_Incidence_Rate/tuberculosis_estimated_incidence_rate_pvmap.csv --config_file=gs://unresolved_mcf/who/TB_Estimated_Incidence_Rate/tuberculosis_estimated_incidence_rate_metadata.csv --output_path=gs://unresolved_mcf/who/TB_Estimated_Incidence_Rate/tuberculosis_estimated_incidence_rate_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
      ],
      "import_inputs": [
        {
          "template_mcf": "tuberculosis_estimated_incidence_rate_output.tmcf",
          "cleaned_csv": "tuberculosis_estimated_incidence_rate_output.csv"
        }
      ],
      "source_files": ["input_files/*.csv"],
      "cron_schedule": "15 22 15 12 *"
    }
  ]
}
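The `cron_schedule` uses the standard five-field cron syntax (minute, hour, day of month, month, day of week), so `"15 22 15 12 *"` fires once a year at 22:15 on 15 December, matching the annual release frequency; the time zone depends on the scheduler. A tiny sketch that labels the fields (`parse_cron` is a hypothetical helper and does not validate value ranges):

```python
from typing import Dict

CRON_FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]


def parse_cron(expr: str) -> Dict[str, str]:
    """Split a five-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


print(parse_cron("15 22 15 12 *"))
# {'minute': '15', 'hour': '22', 'day_of_month': '15', 'month': '12', 'day_of_week': '*'}
```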

@@ -0,0 +1,68 @@
import io
import os
import sys

import pandas as pd
import requests


def download_who_data():
    # 1. Fetch the indicator values from the WHO public API (indicator EB689922674B39)
    api_url = "https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA"
    params = {
        "$filter": "IND_ID eq 'EB689922674B39'",
        "$format": "csv"
    }

    print("1. Fetching TB incidence rate data from WHO API...")
    # Timeout: 5s to connect, 300s to read the response
    api_response = requests.get(api_url, params=params, timeout=(5, 300))
    # Raise on HTTP errors so an automated refresh fails instead of silently skipping
    api_response.raise_for_status()

    # Load the API data into a pandas DataFrame
    api_df = pd.read_csv(io.StringIO(api_response.text))

    # 2. Fetch only the country name and iso3 code from the master database
    print("2. Fetching country iso3 codes from WHO master database...")
    master_url = "https://extranet.who.int/tme/generateCSV.asp?ds=notifications"

    # Download with an explicit timeout, then parse the response text
    master_response = requests.get(master_url, timeout=(5, 300))
    master_response.raise_for_status()

    geo_columns = ['country', 'iso3']
    master_df = pd.read_csv(
        io.StringIO(master_response.text), usecols=geo_columns
    ).drop_duplicates()

    # 3. Merge the two datasets on the country name
    print("3. Merging data and formatting...")
    # The API uses uppercase 'COUNTRY'; the master database uses lowercase 'country'.
    # validate='many_to_one' raises pandas.errors.MergeError if a country name
    # maps to more than one iso3 code, instead of silently duplicating rows.
    merged_df = pd.merge(
        api_df, master_df,
        left_on='COUNTRY', right_on='country',
        how='left', validate='many_to_one'
    )

    # Drop the lowercase 'country' column that was only needed for the join
    merged_df = merged_df.drop(columns=['country'])

    # Reorder columns so the iso3 code sits next to the country name
    final_columns = [
        'IND_ID', 'INDICATOR_NAME', 'YEAR', 'COUNTRY', 'iso3', 'DISAGGR_1', 'VALUE'
    ]
    merged_df = merged_df[final_columns]

    # 4. Save to CSV in the input_files folder (created if missing)
    output_dir = "input_files"
    filename = os.path.join(output_dir, "Estimated_incidence_rate_per_100_000_population.csv")
    os.makedirs(output_dir, exist_ok=True)

    # Save without the pandas index column
    merged_df.to_csv(filename, index=False)
    print(f"Success! Data saved locally as '{filename}'")


if __name__ == "__main__":
    try:
        download_who_data()
    except requests.exceptions.Timeout:
        print("Error: The request timed out. The server may be slow or offline.")
        sys.exit(1)
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        sys.exit(1)
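Because the script joins on country names, any `COUNTRY` value in the API data that has no exact match in the master database ends up with an empty `iso3` after the left join. The sketch below reproduces that left-join behaviour with plain dictionaries and also returns the unmatched names, so they can be reviewed. `left_join_iso3` is a hypothetical helper; in pandas, a similar check can be done with `merge(..., indicator=True)`.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def left_join_iso3(api_rows: List[Row], master_rows: List[Row]) -> Tuple[List[Row], List[str]]:
    """Left-join iso3 codes onto API rows by exact country name.

    Returns the joined rows and the sorted list of API country names with no iso3 match.
    """
    # Build a name -> iso3 lookup, refusing names that map to two different codes
    lookup: Dict[str, str] = {}
    for row in master_rows:
        name, iso3 = row["country"], row["iso3"]
        if lookup.get(name, iso3) != iso3:
            raise ValueError(f"{name!r} maps to more than one iso3 code")
        lookup[name] = iso3

    joined: List[Row] = []
    unmatched = set()
    for row in api_rows:
        iso3 = lookup.get(row["COUNTRY"], "")  # empty string when there is no match
        if not iso3:
            unmatched.add(row["COUNTRY"])
        joined.append({**row, "iso3": iso3})
    return joined, sorted(unmatched)
```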

@@ -0,0 +1,100 @@
IND_ID,INDICATOR_NAME,YEAR,COUNTRY,iso3,DISAGGR_1,VALUE
EB689922674B39,Estimated TB incidence rate per 100 000 population,2003,France,FRA,TB incidence,12
EB689922674B39,Estimated TB incidence rate per 100 000 population,2004,France,FRA,TB incidence,10
EB689922674B39,Estimated TB incidence rate per 100 000 population,2005,France,FRA,TB incidence,9.8
EB689922674B39,Estimated TB incidence rate per 100 000 population,2006,France,FRA,TB incidence,9.7
EB689922674B39,Estimated TB incidence rate per 100 000 population,2007,France,FRA,TB incidence,11
EB689922674B39,Estimated TB incidence rate per 100 000 population,2008,France,FRA,TB incidence,10
EB689922674B39,Estimated TB incidence rate per 100 000 population,2009,France,FRA,TB incidence,9.4
EB689922674B39,Estimated TB incidence rate per 100 000 population,2010,France,FRA,TB incidence,9.2
EB689922674B39,Estimated TB incidence rate per 100 000 population,2011,France,FRA,TB incidence,9
EB689922674B39,Estimated TB incidence rate per 100 000 population,2012,France,FRA,TB incidence,8.9
EB689922674B39,Estimated TB incidence rate per 100 000 population,2013,France,FRA,TB incidence,8.8
EB689922674B39,Estimated TB incidence rate per 100 000 population,2014,France,FRA,TB incidence,8.5
EB689922674B39,Estimated TB incidence rate per 100 000 population,2015,France,FRA,TB incidence,8.4
EB689922674B39,Estimated TB incidence rate per 100 000 population,2016,France,FRA,TB incidence,8.7
EB689922674B39,Estimated TB incidence rate per 100 000 population,2017,France,FRA,TB incidence,9
EB689922674B39,Estimated TB incidence rate per 100 000 population,2018,France,FRA,TB incidence,8.8
EB689922674B39,Estimated TB incidence rate per 100 000 population,2019,France,FRA,TB incidence,9
EB689922674B39,Estimated TB incidence rate per 100 000 population,2020,France,FRA,TB incidence,8.1
EB689922674B39,Estimated TB incidence rate per 100 000 population,2021,France,FRA,TB incidence,7.4
EB689922674B39,Estimated TB incidence rate per 100 000 population,2022,France,FRA,TB incidence,7
EB689922674B39,Estimated TB incidence rate per 100 000 population,2023,France,FRA,TB incidence,8.3
EB689922674B39,Estimated TB incidence rate per 100 000 population,2024,France,FRA,TB incidence,7.8
EB689922674B39,Estimated TB incidence rate per 100 000 population,2000,French Polynesia,PYF,TB incidence,62
EB689922674B39,Estimated TB incidence rate per 100 000 population,2001,French Polynesia,PYF,TB incidence,60
EB689922674B39,Estimated TB incidence rate per 100 000 population,2002,French Polynesia,PYF,TB incidence,61
EB689922674B39,Estimated TB incidence rate per 100 000 population,2003,French Polynesia,PYF,TB incidence,45
EB689922674B39,Estimated TB incidence rate per 100 000 population,2004,French Polynesia,PYF,TB incidence,53
EB689922674B39,Estimated TB incidence rate per 100 000 population,2005,French Polynesia,PYF,TB incidence,52
EB689922674B39,Estimated TB incidence rate per 100 000 population,2006,French Polynesia,PYF,TB incidence,54
EB689922674B39,Estimated TB incidence rate per 100 000 population,2007,French Polynesia,PYF,TB incidence,47
EB689922674B39,Estimated TB incidence rate per 100 000 population,2008,French Polynesia,PYF,TB incidence,35
EB689922674B39,Estimated TB incidence rate per 100 000 population,2009,French Polynesia,PYF,TB incidence,36
EB689922674B39,Estimated TB incidence rate per 100 000 population,2010,French Polynesia,PYF,TB incidence,27
EB689922674B39,Estimated TB incidence rate per 100 000 population,2011,French Polynesia,PYF,TB incidence,41
EB689922674B39,Estimated TB incidence rate per 100 000 population,2012,French Polynesia,PYF,TB incidence,31
EB689922674B39,Estimated TB incidence rate per 100 000 population,2013,French Polynesia,PYF,TB incidence,32
EB689922674B39,Estimated TB incidence rate per 100 000 population,2014,French Polynesia,PYF,TB incidence,33
EB689922674B39,Estimated TB incidence rate per 100 000 population,2015,French Polynesia,PYF,TB incidence,27
EB689922674B39,Estimated TB incidence rate per 100 000 population,2016,French Polynesia,PYF,TB incidence,31
EB689922674B39,Estimated TB incidence rate per 100 000 population,2017,French Polynesia,PYF,TB incidence,29
EB689922674B39,Estimated TB incidence rate per 100 000 population,2018,French Polynesia,PYF,TB incidence,30
EB689922674B39,Estimated TB incidence rate per 100 000 population,2019,French Polynesia,PYF,TB incidence,27
EB689922674B39,Estimated TB incidence rate per 100 000 population,2020,French Polynesia,PYF,TB incidence,35
EB689922674B39,Estimated TB incidence rate per 100 000 population,2021,French Polynesia,PYF,TB incidence,19
EB689922674B39,Estimated TB incidence rate per 100 000 population,2022,French Polynesia,PYF,TB incidence,30
EB689922674B39,Estimated TB incidence rate per 100 000 population,2023,French Polynesia,PYF,TB incidence,19
EB689922674B39,Estimated TB incidence rate per 100 000 population,2024,French Polynesia,PYF,TB incidence,24
EB689922674B39,Estimated TB incidence rate per 100 000 population,2000,Gabon,GAB,TB incidence,332
EB689922674B39,Estimated TB incidence rate per 100 000 population,2001,Gabon,GAB,TB incidence,336
EB689922674B39,Estimated TB incidence rate per 100 000 population,2002,Gabon,GAB,TB incidence,342
EB689922674B39,Estimated TB incidence rate per 100 000 population,2003,Gabon,GAB,TB incidence,350
EB689922674B39,Estimated TB incidence rate per 100 000 population,2004,Gabon,GAB,TB incidence,361
EB689922674B39,Estimated TB incidence rate per 100 000 population,2005,Gabon,GAB,TB incidence,373
EB689922674B39,Estimated TB incidence rate per 100 000 population,2006,Gabon,GAB,TB incidence,387
EB689922674B39,Estimated TB incidence rate per 100 000 population,2007,Gabon,GAB,TB incidence,405
EB689922674B39,Estimated TB incidence rate per 100 000 population,2008,Gabon,GAB,TB incidence,426
EB689922674B39,Estimated TB incidence rate per 100 000 population,2009,Gabon,GAB,TB incidence,445
EB689922674B39,Estimated TB incidence rate per 100 000 population,2010,Gabon,GAB,TB incidence,461
EB689922674B39,Estimated TB incidence rate per 100 000 population,2011,Gabon,GAB,TB incidence,475
EB689922674B39,Estimated TB incidence rate per 100 000 population,2012,Gabon,GAB,TB incidence,483
EB689922674B39,Estimated TB incidence rate per 100 000 population,2013,Gabon,GAB,TB incidence,488
EB689922674B39,Estimated TB incidence rate per 100 000 population,2014,Gabon,GAB,TB incidence,491
EB689922674B39,Estimated TB incidence rate per 100 000 population,2015,Gabon,GAB,TB incidence,488
EB689922674B39,Estimated TB incidence rate per 100 000 population,2016,Gabon,GAB,TB incidence,475
EB689922674B39,Estimated TB incidence rate per 100 000 population,2017,Gabon,GAB,TB incidence,456
EB689922674B39,Estimated TB incidence rate per 100 000 population,2018,Gabon,GAB,TB incidence,440
EB689922674B39,Estimated TB incidence rate per 100 000 population,2019,Gabon,GAB,TB incidence,427
EB689922674B39,Estimated TB incidence rate per 100 000 population,2020,Gabon,GAB,TB incidence,413
EB689922674B39,Estimated TB incidence rate per 100 000 population,2021,Gabon,GAB,TB incidence,401
EB689922674B39,Estimated TB incidence rate per 100 000 population,2022,Gabon,GAB,TB incidence,390
EB689922674B39,Estimated TB incidence rate per 100 000 population,2023,Gabon,GAB,TB incidence,380
EB689922674B39,Estimated TB incidence rate per 100 000 population,2024,Gabon,GAB,TB incidence,371
EB689922674B39,Estimated TB incidence rate per 100 000 population,2000,Gambia,GMB,TB incidence,187
EB689922674B39,Estimated TB incidence rate per 100 000 population,2001,Gambia,GMB,TB incidence,189
EB689922674B39,Estimated TB incidence rate per 100 000 population,2002,Gambia,GMB,TB incidence,190
EB689922674B39,Estimated TB incidence rate per 100 000 population,2003,Gambia,GMB,TB incidence,191
EB689922674B39,Estimated TB incidence rate per 100 000 population,2004,Gambia,GMB,TB incidence,191
EB689922674B39,Estimated TB incidence rate per 100 000 population,2005,Gambia,GMB,TB incidence,190
EB689922674B39,Estimated TB incidence rate per 100 000 population,2006,Gambia,GMB,TB incidence,188
EB689922674B39,Estimated TB incidence rate per 100 000 population,2007,Gambia,GMB,TB incidence,186
EB689922674B39,Estimated TB incidence rate per 100 000 population,2008,Gambia,GMB,TB incidence,183
EB689922674B39,Estimated TB incidence rate per 100 000 population,2009,Gambia,GMB,TB incidence,180
EB689922674B39,Estimated TB incidence rate per 100 000 population,2010,Gambia,GMB,TB incidence,179
EB689922674B39,Estimated TB incidence rate per 100 000 population,2011,Gambia,GMB,TB incidence,178
EB689922674B39,Estimated TB incidence rate per 100 000 population,2012,Gambia,GMB,TB incidence,177
EB689922674B39,Estimated TB incidence rate per 100 000 population,2013,Gambia,GMB,TB incidence,176
EB689922674B39,Estimated TB incidence rate per 100 000 population,2014,Gambia,GMB,TB incidence,175
EB689922674B39,Estimated TB incidence rate per 100 000 population,2015,Gambia,GMB,TB incidence,173
EB689922674B39,Estimated TB incidence rate per 100 000 population,2016,Gambia,GMB,TB incidence,170
EB689922674B39,Estimated TB incidence rate per 100 000 population,2017,Gambia,GMB,TB incidence,166
EB689922674B39,Estimated TB incidence rate per 100 000 population,2018,Gambia,GMB,TB incidence,162
EB689922674B39,Estimated TB incidence rate per 100 000 population,2019,Gambia,GMB,TB incidence,158
EB689922674B39,Estimated TB incidence rate per 100 000 population,2020,Gambia,GMB,TB incidence,153
EB689922674B39,Estimated TB incidence rate per 100 000 population,2021,Gambia,GMB,TB incidence,149
EB689922674B39,Estimated TB incidence rate per 100 000 population,2022,Gambia,GMB,TB incidence,145
EB689922674B39,Estimated TB incidence rate per 100 000 population,2023,Gambia,GMB,TB incidence,142
EB689922674B39,Estimated TB incidence rate per 100 000 population,2024,Gambia,GMB,TB incidence,138
EB689922674B39,Estimated TB incidence rate per 100 000 population,2000,Georgia,GEO,TB incidence,229
EB689922674B39,Estimated TB incidence rate per 100 000 population,2001,Georgia,GEO,TB incidence,222
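A minimal sketch for checking an input file like the test input above: it flags years outside the 2000 to 2024 range stated in the README and duplicate (`COUNTRY`, `YEAR`, `DISAGGR_1`) keys. Treating that triple as the key of one observation is an assumption, and `validate_rows` is a hypothetical helper.

```python
import csv
from typing import Iterable, List, Set, Tuple

# Year range stated in the README ("years: 2000 to 2024"); update when new years are released
MIN_YEAR, MAX_YEAR = 2000, 2024


def validate_rows(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in the CSV lines (empty list if none)."""
    problems: List[str] = []
    seen: Set[Tuple[str, int, str]] = set()
    # Data rows are numbered from 1 (the header is not counted)
    for row_no, row in enumerate(csv.DictReader(lines), start=1):
        year = int(row["YEAR"])
        if not MIN_YEAR <= year <= MAX_YEAR:
            problems.append(f"row {row_no}: year {year} outside {MIN_YEAR}-{MAX_YEAR}")
        key = (row["COUNTRY"], year, row["DISAGGR_1"])
        if key in seen:
            problems.append(f"row {row_no}: duplicate key {key}")
        seen.add(key)
    return problems
```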