To handle large datasets, our API supports pagination. Pagination allows you to retrieve subsets of data, making it easier to manage and process. The API also offers filtering and sorting capabilities for searching the endpoints.

Pagination is deactivated by default and must be specified in each API request. There are four query parameters that you may use, namely `offset`, `limit`, `next` and `previous`.
- `limit` (optional):
  - Description: Determines the number of records that will be retrieved.
  - Data type: Integer
  - Example: `limit=10`
  - Note: The default value is 50. The maximum value is 1000.
- `offset` (optional):
  - Description: Determines how many records are skipped before records are included in the returned results; it specifies the starting point from which the data is fetched.
  - Data type: Integer
  - Example: `offset=50`
  - Note: If not provided, the default is 0.

For example, if there are 100 records in total, setting `offset=50` and `limit=20` returns 20 records starting from the 51st record, i.e. records 51 to 70.
- `next` (optional):
  - Description: Provides the URL to fetch the next page of the data.
  - Data type: String (URL)
  - Example: `https://goadmin.ifrc.org/api/v2/personnel/?limit=50&offset=150`
  - Note: The parameter is present only if a next page is available.
- `previous` (optional):
  - Description: Provides the URL to fetch the previous page of the data.
  - Data type: String (URL)
  - Example: `https://goadmin.ifrc.org/api/v2/personnel/?limit=50&offset=50`
  - Note: The parameter is present only if a previous page is available.
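As a minimal sketch of how the `next` parameter can drive pagination, the Python snippet below keeps requesting pages until `next` is `null`. It uses the public `event` endpoint from the examples later in this section and assumes no authentication is required; adjust headers and filters to your use case.

import requests

url = "https://goadmin.ifrc.org/api/v2/event/?limit=50"
all_results = []

# Keep following the "next" URL until the API reports no further pages
while url is not None:
    payload = requests.get(url, headers={"accept": "application/json"}).json()
    all_results.extend(payload["results"])
    url = payload["next"]  # None (JSON null) once the last page is reached

print(f"Retrieved {len(all_results)} of {payload['count']} records")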
It is very important to verify that the correct number of records has been retrieved from the database. Once the pagination query parameters have returned all the records, you no longer need to make additional API requests.

To determine whether additional API requests are needed, compare the value of the `limit` parameter with the number of records obtained. If the number of records is less than the `limit` value, no additional requests are required. If the number of records equals the `limit` value, you may need to send additional API requests. It is also important to check the value of the `offset` parameter: an `offset` of 0 indicates that the data is fetched from the first page. If you see an empty array `[]` in the body of the response, there are no records for that request and you do not need to send additional API requests. The sketch below applies these rules.
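A minimal offset-based sketch of this stopping logic (same `event` endpoint, default page size of 50 assumed):

import requests

BASE_URL = "https://goadmin.ifrc.org/api/v2/event/"
LIMIT = 50  # the documented default page size

offset = 0
all_results = []
while True:
    page = requests.get(BASE_URL, params={"limit": LIMIT, "offset": offset}).json()["results"]
    all_results.extend(page)
    # A short or empty page means there are no further records
    if len(page) < LIMIT:
        break
    offset += LIMIT

print(f"Retrieved {len(all_results)} records")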
Request
curl -X 'GET' \
'https://goadmin.ifrc.org/api/v2/event/?limit=100&offset=0&regions__in=2&countries__in=84' \
-H 'accept: application/json'
Response
{
  "count": 78,
  "next": null,
  "previous": null,
  "results": [
    {
      "dtype": {
        "id": 4,
        "name": "Cyclone",
        "summary": "",
        "translation_module_original_language": "en"
      },
      "countries": [
        {
          "iso": "IN",
          "iso3": "IND",
          "id": 84,
          "record_type": 1,
          "record_type_display": "Country",
          "region": 2,
          "independent": true,
          "is_deprecated": false,
          "fdrs": "DIN001",
          "average_household_size": null,
          "society_name": "Indian Red Cross Society",
          "name": "India",
          "translation_module_original_language": "en"
        }
(...)
In this example we fetch data from the `Event` API endpoint. The `countries__in` and `regions__in` parameters have been used to filter the data. `limit=100` means that up to 100 records are returned per request, and `offset=0` means that the data is fetched starting from the first record. The response contains only 78 records because the results have been filtered down to India using the `countries__in` and `regions__in` parameters; since all 78 records fit within the limit of 100, the `next` and `previous` parameters are both `null`.
When you use Power BI or any other service, by default it will access and show only the data from the first page of the API endpoint, which contains 50 records by default. If you want to access all the data stored on a particular API endpoint, this section shows how you can do that in Power BI / Power Query.

Using the steps below you can retrieve the data from the API using pagination, along with an authorization token and parameters. In this example we will access the Events data and use the `regions__in` and `countries__in` parameters.
Create a table named Parameters containing your authorization Token and the `regions__in` and `countries__in` parameters that will be used in querying the data from the Events API endpoint. If you do not want to filter the data, you can leave the `regions__in` and `countries__in` parameters blank.

let
    // Get the parameter values from the Parameters table
    ParametersTable = Parameters,
    Token = Table.SelectRows(ParametersTable, each [Column1] = "Token"){0}[Column2],
    region = Table.SelectRows(ParametersTable, each [Column1] = "regions__in"){0}[Column2],
    country = Table.SelectRows(ParametersTable, each [Column1] = "countries__in"){0}[Column2],
    BaseUrl = "https://goadmin.ifrc.org",
    // Named EndpointPath (not RelativePath) so the Web.Contents options record below can reference it without a cyclic reference
    EndpointPath = "api/v2/event/",
    PageSize = 50, // Number of items per page
    // Construct the query based on the parameter values
    queryOptions =
        [
            countries__in = if country <> null then Text.From(country) else "",
            regions__in = if region <> null then Text.From(region) else "",
            limit = Text.From(PageSize),
            offset = "0" // Set the initial offset to 0 as a string
        ],
    // Fetch a single page of results at the given offset
    GetPage = (offset) =>
        let
            Url = BaseUrl,
            ReqHeaders = if Token <> null then [#"Accept-Language"="en", #"Authorization"="Token " & Token] else [#"Accept-Language"="en"],
            Response = Json.Document(Web.Contents(Url, [Headers=ReqHeaders, RelativePath=EndpointPath, Query=queryOptions & [offset = Text.From(offset)]])),
            Results = Response[results]
        in
            Results,
    // Keep requesting pages, advancing the offset, until an empty page is returned
    CombineAllPages = List.Generate(
        () => [Offset = 0],
        each let page = GetPage([Offset]) in page <> null and List.Count(page) > 0,
        each [Offset = [Offset] + PageSize],
        each GetPage([Offset])
    ),
    #"Converted to Table" = Table.FromList(CombineAllPages, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1")
in
    #"Expanded Column1"
The full list of available query parameters for each endpoint can be found in the API documentation at https://goadmin.ifrc.org/api-docs/swagger-ui/.
- The `queryOptions` record in this code constructs the query parameters that are appended to the API URL to filter the results. It checks whether the `region` and `country` parameter values are null and constructs different filter values based on that, so the API query can be customized through the input parameters.
- The `GetPage` function is responsible for constructing the full API URL for a specific page, making the API request with the given parameters, parsing the response, and returning just the array of results for that page.
- The `CombineAllPages` step loops through all available pages of results from the API and combines them into a single list. It advances through the pages by incrementing the offset by the page size, calls `GetPage` to fetch each API page, stops when `GetPage` returns an empty list, and accumulates all the results into a single list.

Note: We can also specify the values of the `regions__in` and `countries__in` parameters in the Parameters table. This will further filter the data from all the API pages. For example, if we specify `regions__in = 2` and `countries__in = 84`, we obtain 77 records.
Here we show how you can retrieve data from the GO API using Python. We will retrieve the data from all the pages of the `Event` API endpoint and will also filter the data using the `regions__in` and `countries__in` parameters. First we import the necessary Python packages:
import requests
import os
import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from functools import partial
import pandas as pd
The following packages have been imported:

- `requests`: Used for sending HTTP requests and handling responses; commonly used for web scraping, API interaction, and downloading web content.
- `multiprocessing`: Used here to determine the number of available CPU cores.
- `concurrent.futures.ThreadPoolExecutor`: Part of the `concurrent.futures` module; manages a pool of worker threads to execute functions concurrently.
- `functools.partial`: Allows you to create partial functions by fixing some of a function's arguments and generating a new function.
- `pandas`: Provides data structures and data analysis tools for working with structured data.

Next we specify the API URL and authorization token. The `page_limit` is set to 50 because each page contains 50 records.
api_url = "https://goadmin.ifrc.org/api/v2/event/"
auth_token = "Your Authorization token"
page_limit = 50
Next we define a function called `fetch_page` which retrieves the data from the API pages. In this function we also specify our filtering parameters, namely `regions__in` and `countries__in`, with the help of which we filter the data. We also specify a headers dictionary in which we provide the authorization token.
def fetch_page(page_num, limit=50, offset=0, regions__in=None, countries__in=None):
    # Translate the 1-based page number into an offset for the API
    params = {"limit": limit, "offset": offset + limit * (page_num - 1)}
    if regions__in or countries__in:
        # requests omits parameters whose value is None
        params["regions__in"] = regions__in
        params["countries__in"] = countries__in
    headers = {"Authorization": f"Bearer {auth_token}"}
    response = requests.get(f"{api_url}", params=params, headers=headers)
    return response.json()
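As a quick sanity check before parallelizing, you can call the function for a single page (assuming `auth_token` holds a valid token); the filter values below are the region and country ids used in the earlier example:

first_page = fetch_page(1, regions__in="2", countries__in="84")
print(first_page["count"])         # total number of matching records
print(len(first_page["results"]))  # records on this page (at most 50)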
Note: The data from the GO API can be requested in 4 supported languages, namely English, French, Arabic and Spanish. If you want the API response in any of these languages, you can specify an additional header in the `fetch_page` function as shown below:
headers = { "Authorization": f"Bearer {auth_token}", "Accept-Language": "two letter language code" # Add the Accept-Language header with the desired language }
Since the API endpoint contains over 100 pages, we will parallelize the task. We first calculate the total number of API pages and then use the `concurrent.futures` module to parallelize the execution of the `fetch_page` function across multiple threads.
max_workers = multiprocessing.cpu_count()

# Determine the total number of records so we know how many pages to request
response = requests.get(api_url)
if response.status_code == 200:
    total_records = response.json()["count"]
else:
    raise SystemExit(f"Failed to fetch total records count. Status code: {response.status_code}")

total_pages = (total_records + page_limit - 1) // page_limit
print("total_pages: ", total_pages, "\ntotal cpu count: ", max_workers)

with ThreadPoolExecutor(max_workers=max_workers * 4) as executor:
    page_nums = range(1, total_pages + 1)
    partial_fetch_page = partial(fetch_page, limit=50, offset=0)
    # To filter, also fix regions__in="2", countries__in="84" in the partial
    results = executor.map(partial_fetch_page, page_nums)

final_results = []
for result in results:
    final_results.extend(result["results"])
The `partial` function is used to fix some arguments of `fetch_page` while leaving others to be filled in later. In this case it fixes `limit` and `offset`, while `regions__in` and `countries__in` keep their default value of `None` unless they are passed explicitly. A small standalone illustration of `partial` follows.
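A minimal, self-contained illustration of `functools.partial` (the `power` function here is made up for the example):

from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)  # exponent is fixed, base is still free
print(square(5))  # prints 25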
The `executor.map` method applies the `partial_fetch_page` function to each page number in `page_nums`, which means that `fetch_page` is executed concurrently for each page number using the thread pool. The `map` method returns an iterator of results. In summary, the code uses a thread pool to fetch data from multiple pages concurrently and collects the results into a list called `final_results`.
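Following the earlier advice on verifying the number of retrieved records, a quick check (valid when no filters were passed, so the unfiltered count applies) could be:

assert len(final_results) == total_records, (
    f"Expected {total_records} records, got {len(final_results)}"
)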
The code below converts the retrieved data into a pandas DataFrame. `expand` is a helper function used to flatten the data in the `dtype`, `countries` and `appeals` JSON objects. The same function can also be used to expand/flatten other JSON objects in the data, after which you can filter the columns and prepare the data for visualization. Lastly, the data is exported to CSV format.
df = pd.DataFrame(final_results)

def expand(data_frame, column):
    # Explode a list-valued column into one row per list element
    result = data_frame.explode(column)
    result = result.reset_index(drop=True)
    return result

# expanding the countries and appeals columns as
# the data is stored as lists
result = expand(df, "countries")
result = expand(result, "appeals")
# Get the index of the columns "countries", "dtype" and "appeals"
countries_index = result.columns.get_loc("countries")
appeals_index = result.columns.get_loc("appeals")
dtype_index = result.columns.get_loc("dtype")

# Concatenate and expand columns
df_expanded = pd.concat(
    [
        pd.json_normalize(result["dtype"], sep="_").add_prefix("dtype."),
        pd.json_normalize(result["countries"], sep="_").add_prefix("countries."),
        # columns between "countries" and "appeals"
        result.iloc[:, countries_index + 1 : appeals_index],
        pd.json_normalize(result["appeals"], sep="_").add_prefix("appeals."),
        # columns after "appeals"
        result.iloc[:, appeals_index + 1 :],
    ],
    axis=1,
)

# Store the data in CSV format
df_expanded.to_csv("your/path/to/save/the/file/df_expanded.csv")
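As an optional final check (using the same placeholder path as above), you can read the file back to confirm the export:

df_check = pd.read_csv("your/path/to/save/the/file/df_expanded.csv")
# Rows may exceed len(final_results) because list columns were exploded into one row per element
print(df_check.shape)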