Redefining Collaboration between Civil Society Organisations and the UN: Introducing the Partner Portal
Bolstering Success in Development and Humanitarian Endeavours: Navigate, Connect, and Flourish with the UN Partner Portal
As a consultancy firm devoted to backing development and humanitarian initiatives, we are thrilled to bring to your attention a pioneering tool set to transform your engagement with the United Nations (UN): The UN Partner Portal.
This comprehensive digital interface, jointly crafted by prestigious UN agencies including the UN Secretariat, UN Women, FAO, UNDP Crisis Bureau, UNFPA, UNHCR, UNICEF, and WFP, with operational support from UNICC, offers an unparalleled opportunity to streamline cooperation between civil society organisations (CSOs) and the UN. Drawing from decades of fruitful collaborations between the UN and civil society, the UN Partner Portal is not just a platform – it has the potential to be a revolution in partnership management.
Discover the Benefits of the UN Partner Portal:
- Expand Your Understanding: Delve into the nuances of UN partnership processes and broaden your strategic perspective for successful collaborations.
- One-Stop Registration: Set up your online profile just once, and it’s accessible to multiple UN agencies. Embrace efficiency and say goodbye to repetitive data entry.
- Plethora of Opportunities: Consolidate your exploration for partnership opportunities from an array of UN agencies. Streamline your search, diversify your connections, and accelerate your progress.
- Idea Exchange: Submit your concept notes (both solicited and unsolicited) directly to UN agencies. Transform your innovative ideas into impactful realities.
Distinctive Features:
- Boosted Visibility: Your CSO profile alerts UN agencies of your field presence. Garner the recognition you merit and build potent synergies.
- Streamlined Declarations: A unified partner declaration system accepted across all UN agencies offers you the best of seamless uniformity.
- Accelerated Processing: Experience a substantial reduction in timelines for partnership selection and processing. Embark on your UN collaboration journey swifter than ever.
- Risk Profiling: Benefit from faster verification and risk profiling of prospective partners.
- Enhanced Analysis: Facilitate key partner profile data extraction for UN Agency analysis towards more data-driven decisions for impactful collaborations.
As trusted consultants to various CSOs and UN agencies, we understand the value of streamlined collaboration. The UN Partner Portal stands testament to the efforts of organisations such as UNHCR, UNICEF, and WFP in delivering on the Grand Bargain commitments of the 2016 World Humanitarian Summit. This groundbreaking tool aims to minimise redundancy, lower management costs, and enhance partnerships with local and national actors.
Join this new era of global collaboration. Seize this opportunity to learn, connect, and grow with the UN Partner Portal. Here, unity meets opportunity, opening unparalleled avenues for progress.
Keywords: UN Partner Portal, Civil Society Organisations, UN agencies, partnership opportunities, concept notes, CSO partner profiles, partnership management, UNHCR, UNICEF, WFP, World Humanitarian Summit, Grand Bargain commitments, global collaboration.

Research on chlorine tablets for water treatment in emergencies
A multi-country analysis commissioned by UNICEF on the distribution, challenges, and alternative water-treatment methods
Research implemented by movimentar GmbH (March 2022)
This post presents the results of a multi-country study on the use of chlorine tablets in emergencies which movimentar GmbH conducted for UNICEF’s Global Supply Division. The research brief summarises the results of the study, which covered a reproducible random sample of 40 sites (villages, camps, and towns) in Bangladesh, Ethiopia, South Sudan, Sudan and Yemen. These countries have been targeted by UNICEF programmes, and all sites had received chlorine tablets at least six months prior to the study.
Diarrhoeal diseases accounted for the deaths of over 500,000 children under five globally in 2017 (1) and are the third leading cause of mortality in this age group (2). Approximately 88% of those deaths result from unsafe water, inadequate sanitation, and insufficient hygiene (3), making effective WASH interventions crucial for the health and survival not only of children but also of youths and adults. Until centrally treated, piped water can be delivered to every family, the critical initial need is the provision of microbiologically safe drinking water to reduce the incidence of diarrhoea and other waterborne diseases. UNICEF’s global chlorine tablet programme distributes tablets and trains recipients on how to use them, with a focus on emergency or humanitarian situations.
Purposes of the study:
- To document the actual use of chlorine tablets by end-users and primary stakeholders in contexts of emergencies;
- To assess the correct use of chlorine tablets by recipients and compare their use with other HWTS (household water treatment and safe storage) technologies for supporting public health in humanitarian settings;
- To assess UNICEF’s global chlorine tablet programme, specifically in terms of areas for improvement related to distribution, monitoring, and effectiveness from a user perspective.
Methodology:
Data were collected between September and December 2021 and included a range of variables, such as measurement of water parameters (pH, turbidity, and free chlorine residual – FCR), from the following sources:
- Document review.
- Observations via transect walks and measurement of water-source parameters in a sample of 40 sites.
- Household survey and measurements of water parameters from 493 households in the observed sites.
- 163 key-informant individual and group interviews.
- Online surveys with 34 staff from governments, WASH partners, and UNICEF.
The study provided forward-looking recommendations aimed at improving UNICEF’s global chlorine tablet programme, as well as actions by WASH actors at all levels.
The research brief is available for download below. Please contact UNICEF Supply Division (supply@unicef.org) for further questions or to access the full report.
- (1) UNICEF (2021): Diarrhoea. Available at: https://data.unicef.org.
- (2) Our World in Data: Causes of death in children under five years old. Available at: https://ourworldindata.org.
- (3) CDC: Global Diarrhea Burden. Available at: https://www.cdc.gov.
ImmoBot – Artificial intelligence to forecast house prices
Young families can have a hard time navigating through the complexity and risks of real estate markets.
Is the price of a house in line with the market?
What is the amount that I should offer for a specific house?
Those planning to sell a house typically need to engage one or more real estate brokers to estimate its price; brokers, however, have interests of their own in the process. Sellers still need to know whether a broker’s estimate is in line with market prices.
Artificial intelligence can help to shed light on some of these questions. To demonstrate that, we used two machine learning algorithms to forecast house prices based on their specific characteristics. For that we used publicly-available web-scraped data from real-estate websites, which contain house offers in Bremen (where we are based).
Since the process behind the curtains can be quite complex, we packed everything into a web application (in short, ‘web-app’). This application, which we called “ImmoBot”, simply requires users to specify the house characteristics and then provides the estimates in no time. ImmoBot also presents the underlying dataset and plots, which we will expand from time to time.
Price forecasts are based on regression analysis with a gradient boosting algorithm and a Random Forest algorithm, two common machine learning tools. Since the forecasts from the two algorithms differ, we added the average of the forecast estimates as an additional piece of information.
Gradient boosting and Random Forest are decision tree-based ensemble models. In gradient boosting, a shallow and weak tree is first trained, and then the next tree is trained based on the errors of the first one. The process continues with a new tree being sequentially added to the ensemble and the new successive tree correcting the errors of the ensemble of preceding trees. In contrast, random forest is an ensemble of deep independent trees.
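As an illustration of this two-model approach, the sketch below fits both models in R and averages their forecasts. The data frame, column names, and tuning parameters are invented placeholders, and the randomForest and gbm packages serve only as stand-ins; ImmoBot’s actual features, data, and settings are not shown here.

# Illustrative sketch only: hypothetical data and parameters, not ImmoBot's actual code.
library(randomForest)
library(gbm)

# Invented training data: price plus a few house characteristics.
set.seed(1)
houses <- data.frame(
  price       = runif(500, 150000, 600000),
  living_area = runif(500, 60, 250),
  plot_area   = runif(500, 100, 1000),
  rooms       = sample(2:8, 500, replace = TRUE),
  year_built  = sample(1900:2020, 500, replace = TRUE)
)

rf_fit  <- randomForest(price ~ ., data = houses, ntree = 500)
gbm_fit <- gbm(price ~ ., data = houses, distribution = "gaussian",
               n.trees = 1000, interaction.depth = 3, shrinkage = 0.05)

# Forecast for one hypothetical house and the average of the two model forecasts.
new_house <- data.frame(living_area = 120, plot_area = 400,
                        rooms = 4L, year_built = 1985L)
pred_rf  <- predict(rf_fit, newdata = new_house)
pred_gbm <- predict(gbm_fit, newdata = new_house, n.trees = 1000)
mean(c(pred_rf, pred_gbm))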
We hope that this serves as a simple example of how artificial intelligence can be applied while helping our Bremen users navigate the complex real estate market.
Please share this web-app and let us know in case you have any suggestions or questions.
Disclaimer: Please note that the price forecasts are merely illustrative. Neither movimentar GmbH nor any person acting on its behalf may be held responsible for the use that may be made of the information presented here.
Monitoring mountain glacier extents
Introduction
The retreat of sea ice and glaciers, sea-level rise, and extreme climate events are predicted to cause immense economic and social disruption. Around 10% of the world’s population lives in coastal areas that are less than 10 metres above sea level. Furthermore, farming requires the availability of sufficient fresh water, the largest source of which is rivers, which are largely fed by rain and mountain glaciers in the melt season. Therefore, monitoring mountain glacier extents is essential in the era of global warming.
A Geographic Information System (GIS) is a computer-based tool for storing, managing, analyzing, and visualizing many different types of data. GIS technology allows for deeper insights into data and supports decision-making in various fields. Geo-information is largely based on raster and numerical data collected from satellite sensors and plays a critical role in environmental data science. Satellite image processing is an important tool for any environmental scientist to research the climate and to support their research based on in-situ data. GIS also plays a key role in the evaluation of public policies, programmes, and projects related to natural resource management and other fields whenever datasets are georeferenced (include at least latitudes and longitudes of each observation).
In this study, we have analysed satellite images of the Bolivian mountain range of Cordillera Quimsa Cruz (Figure 1) to determine the extent of glacier retreat. Such retreats can be measured through temporal analysis of multispectral satellite images. Spectrum-based supervised classification allowed us to distinguish snow-covered surfaces from other surface types and helped us to create a temporal glacier inventory. This post shows the step-by-step procedure to reproduce and monitor glacier changes anywhere. The aim is to help anyone interested in evaluating the impacts of global warming and, hopefully, to motivate action through national and international policies, programmes, and projects.
According to Oxfam, Bolivia is particularly vulnerable to the impacts of climate change for six main reasons:
- It is one of the poorest countries in Latin America and suffers from one of the worst patterns of inequality. Low-income groups in developing countries are the most exposed to climate change impacts.
- It is the country in South America with the highest percentage of indigenous people and hence a concentration of poverty and inequality.
- It is one of the most bio-diverse countries in the world, with a wide variety of ecosystems that are vulnerable to different impacts from climate change.
- More than half of the country is Amazonian, with high levels of deforestation, which adds to the vulnerability to flooding.
- Located in a climatically volatile region, it is one of the countries in the world most affected by ‘natural’ disasters in recent years.
- It is home to about twenty per cent of the world’s tropical glaciers, which are retreating more quickly than predicted by many experts.
The map below shows Bolivia and the area analysed here. We analysed glacier retreat over a time span of 30 years, from 1985 to 2015, for the month of August at 10-year intervals. We observed one individual glacier, which showed a reduction of ice extent by 40.47% from 1985 to 2015. This indicates that the consequences of global warming in Bolivia will affect the ecosystems and lives of its people, particularly those ill-equipped to adapt to present and future impacts. Cities such as La Paz and El Alto are vulnerable to the accelerated retreat of glaciers, which are the sources of a significant amount of their drinking water (Oxfam 2009). In addition, thousands of poor Andean farmers depend on glacial melt for part of their water supply to irrigate their crops. Glacier retreat is also associated with less predictable rainfall, more extreme weather events, and higher temperatures, with negative impacts on livelihoods and ecosystems.
Preparative steps (data retrieval): Exploring satellite images
For creating this glacier inventory, that is, its attribute tables, we used multispectral satellite images and a Digital Elevation Model (DEM). The attribute tables included latitude, longitude, area, and elevation for each polygon representing one specific glacier. In this study, we used images from the Landsat 5 and Landsat 8 satellites, at 10-year intervals over the time span from 1985 to 2015. For a glacier retreat study, the end of the ablation period is the relevant time window (in this study: August).
Images had to be selected that showed no visible cloud cover over the region of interest (ROI), with shadows as small as possible. The satellite images (Level 1 – Geo TIFF) and the DEM (ASTER GLOBAL DEM tiles) can be downloaded from the website of the U.S. Geological Survey.
In this study, images were retrieved from Landsat 5 (Thematic Mapper) for the years 1985, 1995, and 2005; from Landsat 8, we retrieved the combined Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) product for the year 2015. Images from Landsat 7 (Enhanced Thematic Mapper Plus, ETM+) could not be used due to the Scan Line Corrector failure.
Creating a false-colour RGB image
We created a project file in QGIS, loaded the desired bands (SWIR, NIR, and Red) of one of the downloaded satellite images taken on 5 August 1985, and created a false-colour image using the Build virtual raster tool.
Since our ROI is the mountain range, we cropped the remaining parts of the region using the Clip raster by extent tool. The extent of the first ROI image layer was later used to define the ROI extents of the rest of the satellite images layers in order to keep all layers of the same extent. We loaded the images of the bands SWIR, NIR, Red and clipped them using the extent of ROI. Similarly, the DEM was also clipped by using the extent of ROI.
Creation of ratio image
Snow-covered surfaces reflect strongly in the Red band but very weakly in the SWIR band, which can be used to separate snow from other surface types. Common approaches are band ratio images, such as Red by SWIR and NIR by SWIR, and the Normalized Difference Snow Index (NDSI). In this study, a simple Red by SWIR band ratio is used. The band ratio image can be created using the Raster Calculator tool.
Finding a suitable Threshold Value (THV) for supervised classification of glaciers
The THV can be determined by using the Identify Features tool on the Attributes toolbar, zooming in to the edge of the glacier extent and clicking on the least visible pixel that can still be identified as snow. Viewing the ratio image and the false-colour image side by side, together with a Google base map as shown in the figure below, helped to identify the lowest plausible THV.
The lowest THV identified from the ratio image was 2.2, so all pixels with a ratio value greater than 2.2 were classified as snow.
Creating a reclassified image based on THV
We created a constant layer with a pixel size of 30 m (the same as the ratio-image resolution) and the same extent as the ratio image, with every pixel value set to 1, and saved it as a TIFF file. We then used the Raster Calculator to create a reclassified image with the expression (band-ratio image > THV) * constant layer and saved it as a TIFF file. The resulting layer is a reclassified raster image in which pixel value 1 (white) represents snow surfaces and pixel value 0 (black) represents non-snow surfaces.
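The two raster steps above (band ratio and threshold reclassification) can also be scripted outside QGIS. The following R sketch uses the terra package with placeholder file names and the 1985 threshold of 2.2; it illustrates the logic rather than reproducing the exact QGIS workflow.

# Illustrative R version of the ratio and reclassification steps (file names are placeholders).
library(terra)

red  <- rast("landsat_1985_red_clipped.tif")    # hypothetical clipped Red band
swir <- rast("landsat_1985_swir_clipped.tif")   # hypothetical clipped SWIR band

ratio <- red / swir          # band ratio image (Red / SWIR)
thv   <- 2.2                 # threshold value identified visually for 1985

snow <- ratio > thv          # 1 (TRUE) = snow surface, 0 (FALSE) = non-snow surface
writeRaster(snow, "snow_1985.tif", overwrite = TRUE)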
Converting to vector layer and separating the catchment areas
We converted the reclassified image (Figure 7) to a vector layer using the Polygonize tool and then used the Digitizing tools to remove small polygons far from the mountain ridges, which represent leftover snow in catchment areas. We also had to fill gaps between glaciers (areas not classified as snow because of shadow cover) and merge small polygons adjoining relatively larger ones.
We needed to fix the polygons’ geometry using the Fix geometries tool before we divided glaciers (polygons) based on their location relative to mountain ridges. We could then draw boundary lines (a new vector layer) over mountain ridges with the help of the DEM (using Aspect) in order to divide the glacier regions.
Creating attribute table
The attribute table contains important information such as latitude, longitude, elevation, and area size. We extracted elevation information for the polygons from the DEM using the Zonal statistics tool. We calculated the area of the polygons using the Field calculator tool, after fixing the geometry with the Fix geometries tool. For the coordinates, we created a centroid layer using the Centroids tool, obtained the centroid coordinates of the glacier polygons with the Field calculator tool, and joined the attribute table of the centroid layer to the vector layer via the Joins tab in the vector layer properties.
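For reference, the same attribute-table fields (area, centroid coordinates, and elevation statistics per glacier polygon) can also be computed in R with the sf and terra packages. The file names below are placeholders, and the area calculation assumes a projected coordinate system in metres.

# Illustrative sketch: building the glacier attribute table outside QGIS (placeholder file names).
library(sf)
library(terra)

glaciers <- st_read("glaciers_1985.shp")   # hypothetical glacier polygons
dem      <- rast("dem_clipped.tif")        # hypothetical clipped DEM

glaciers$area_sqkm <- as.numeric(st_area(glaciers)) / 1e6   # area in sq km (projected CRS in metres)

cent <- st_coordinates(st_centroid(st_geometry(glaciers)))  # centroid coordinates per polygon
glaciers$longitude <- cent[, "X"]
glaciers$latitude  <- cent[, "Y"]

# Zonal elevation statistics (mean, min, max) from the DEM for each polygon.
glaciers$elev_mean <- terra::extract(dem, vect(glaciers), fun = mean, na.rm = TRUE)[, 2]
glaciers$elev_min  <- terra::extract(dem, vect(glaciers), fun = min,  na.rm = TRUE)[, 2]
glaciers$elev_max  <- terra::extract(dem, vect(glaciers), fun = max,  na.rm = TRUE)[, 2]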
Repeating the procedure with satellite images of 1995, 2005 and 2015
The above procedure was repeated with the other temporal images, using the same ROI extent and the 1985 boundary-line layer throughout the process. The layout shows the retreating glaciers’ extent. The change in extent for one sub-divided glacier was calculated using the attribute table below.
Figure 12 : Selected glacier to measure retreat extent 1985 (left) and 2015 (right).
| Year | Latitude | Longitude | Area (sq km) | Elev_mean | Elev_min | Elev_max |
|------|----------|-----------|--------------|-----------|----------|----------|
| 1985 | -1874765 | 673362 | 2.52 | 5329 | 4945 | 5711 |
| 1995 | -1874638 | 673415 | 2.15 | 5349 | 4995 | 5711 |
| 2005 | -1874663 | 673364 | 1.83 | 5322 | 4951 | 5704 |
| 2015 | -1874520 | 673402 | 1.50 | 5339 | 5034 | 5639 |
Table 1: Attribute table of the selected glacier in Figure 12.
The retreat of the selected glacier’s extent from 1985 to 2015 equals 40.47%.
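As a quick check, the figure can be roughly reproduced from the rounded area values in Table 1 (the small difference from the reported 40.47% presumably comes from using the unrounded polygon areas):

(2.52 - 1.50) / 2.52 * 100   # roughly 40.5% reduction in glacier area between 1985 and 2015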
The consolidated map
The consolidated map was created using all the end-result glacier extent layers of the years 1985, 1995, 2005, and 2015 using the New Print Layout tool located at the Project toolbar.
Final remarks
Bolivia is particularly vulnerable to the impacts of climate change and is home to about twenty percent of the world’s tropical glaciers, which are retreating more quickly than predicted by many experts. This post has shown the step-by-step procedure to reproduce and monitor glacier changes anywhere. The aim is to help anyone interested in evaluating the impacts of global warming and, hopefully, to motivate action through national and international policies, programmes, and projects.
Using QGIS (version 3.10), we analysed glacier retreat over a time span of 30 years, from 1985 to 2015, for the month of August at 10-year intervals. The consolidated map shows a clearly detectable temporal retreat of the extent of glaciers throughout the mountain range. The retreat is more prominent for the eastern glaciers of the mountain range. We observed that one individual glacier, the glacier in Figure 12, reduced its ice extent by 40.47% from 1985 to 2015. We can also see in the consolidated map that the overall glacier extents are undergoing temporal retreat even though a lower THV was used for later years. The THVs used were 2.2, 2.0, 2.0, and 1.4 for the years 1985, 1995, 2005, and 2015, respectively. This indicates that the consequences of global warming in Bolivia will affect the ecosystems and lives of its people, particularly those ill-equipped to adapt to present and future impacts. Cities such as La Paz and El Alto are vulnerable to the accelerated retreat of glaciers, which are the sources of a significant amount of their drinking water. In addition, thousands of poor Andean farmers depend on glacial melt for part of their water supply to irrigate their crops. Glacier retreat is also associated with less predictable rainfall, more extreme weather events, and higher temperatures, with negative impacts on livelihoods and ecosystems.
In the future, a similar study could be applied to Himalayan glaciers (up to 2000m above sea level) in South and East Asia, which feed the five river basins of the Indus, Ganges, Yellow, Brahmaputra, and Yangtze rivers, supplying water to 1.4 billion people (over 20% of the global population).
Written by Usman Ahmed (MSc) and Eduardo W. Ferreira (PhD)
Acknowledgement
We would like to express our gratitude to Annalena Oeffner Ferreira and Valerie Kateb for reviewing and providing suggestions for this post. Many thanks to Dr. Marco Möller (University of Bremen) for supervising the project which made this post possible.
© Headline photo by WaSZI
Useful Links
Data and images
https://movimentar.co/glacier_data
Used GIS application: QGIS version 3.10
https://download.qgis.org
Satellite images:
https://earthexplorer.usgs.gov/
https://www.usgs.gov/faqs/what-are-band-designations-landsat-satellites?qt-news_science_products=0#qt-news_science_products
Starting an ECHO or DIPECHO application under time pressure
In one of my previous assignments I had to cope with a 15-day deadline to deliver distance-facilitation services for the design of a new disaster-preparedness project in Asia for one of my clients (an international non-governmental organisation). The team was composed of staff members in different locations (in Europe and at national and sub-national levels). So, I had to minimise the working time of the client’s team as much as I could. I then asked myself: What are the main sections of the e-Single Form that one should start from when preparing an ECHO/DIPECHO application with a short submission deadline?
In this post, I present and explain my attempt at answering this question. Why start in the order suggested here? In short: these sections are interconnected in a way that makes it easier to design their contents in the order presented here. Although this post focuses on the design process of an ECHO or DIPECHO action, I believe that some of the ideas can be helpful when writing proposals or offers for other competitive procedures as well.
ECHO and DIPECHO proposals need to be prepared in a special PDF file, the e-Single Form, which only works if opened with an Adobe Acrobat product (Linux and Mac users need to have at least the free version of Adobe Reader installed). Once the e-Single Form has been generated using APPEL (the application system for electronic exchange of information between DG ECHO and its partners), you will see its various sections (or chapters, if you prefer), such as: title of the action (“action” is how a project or a programme is usually called in competitive procedures of the European Union, incl. ECHO/DIPECHO), narrative summary, area of intervention, start date and duration, presence in the area, problem, needs, risk and response analyses, previous evaluations or lessons-learned exercises, among others.
All these sections are very important but what would be the most important ones to begin with when you are about to start working on a new proposal?
In my opinion, the four most important points in sequential order are:
- Problem, needs, risk and response analyses
- Logframe + monitoring and evaluation design
- Budget
- Estimation of direct beneficiaries
Problem, needs, risk and response analyses
This should be the first step in the proposal development process. Depending on the information you already have, your team may still need to do some fieldwork after the publication of the humanitarian implementation plan (equivalent to a call for proposals for ECHO/DIPECHO actions). ECHO partners’ site explains this section in detail. In the e-Single Form, the indicators for the specific objective and results have fields for baseline and target values. So, ideally, you should already have such updated analyses ready by the time of the application, since they should inform the logframe design.
Depending on their quality, important data sources are also endline studies or final external evaluations from past projects.
For this section, you will need to provide information about:
a) the date when the assessment took place,
b) the methodology employed in the assessment (e.g., sampling processes and analytical frameworks such as the food consumption score, which will depend on the focus of your proposal);
c) problem, needs and risk analysis, and
d) response analysis, which explains how you plan to respond to the aspects mentioned in the previous point (problem, needs and risk analysis).
The length of the text you can add to the form is limited (usually a maximum of 2,000 characters), but one may add an annex. Although optional, such an annex can help ECHO to assess the quality of your analysis and response strategy in more detail.
Logframe + monitoring and evaluation design
The logical framework (or logframe) is a table that summarises your project in a standard format. It should help anyone to understand the proposed action and its context in a brief but effective way. It is therefore good to avoid abbreviations, at least in the logframe, or to define them clearly. You will find additional information on the logframe on ECHO’s helpdesk website and in the EU Project Cycle Management Guidelines. The logframe needs to be firmly linked to the monitoring and evaluation design. This will make your life easier later when assessing, learning and reporting on the implementation. It will also support the accountability of the implementation.
Let us now first focus on the logframe.
You should not change the basic purpose of the action without consulting the donor first (or contractor, if you prefer – please see ECHO’s information on changes to actions). The basic purpose of the action is defined by the intervention logic (the first column of the logframe), which includes the principal and specific objectives, results and activities. Funders, donors and taxpayers assume that one did careful planning before submitting a proposal and clearly highlighted any important pre-conditions, risks and assumptions that can affect the action’s implementation. That said, the logframe should be seen as a “live” document, which needs to be updated throughout the implementation based on context changes and new synergy opportunities.
When designing the logframe, one critical aspect is to develop quantified and clear activities. Together they need to deliver their respective result and should be presented in sequential or chronological order. Sub-activities are usually not presented in the intervention logic, to keep it clear and simple. Optionally, one can include sub-activities in the work plan, which you will need to submit as well. Activities, however, need to be defined in detail in order to ease the estimation of direct beneficiaries, indicators and the budget. In the example below, for a two-year action, one could use the information between parentheses for internal purposes only (e.g., budget estimation and direct-beneficiary estimation):
To conduct 24 trainings on disaster-safe buildings to 480 workers in districts W, X, Y, and Z (3 yearly trainings of 180 hours per location in the first two years for 20 participants each).
Please check ECHO’s Key Result Indicators, or KRIs. KRIs are proposed for the five sectors that jointly cover 80% of ECHO funding: food, nutrition, health, WASH and shelter. KRIs enable ECHO to aggregate data on the results of the actions it has funded. In addition to highlighting some of ECHO’s priority development changes, KRIs can also inspire the design of any custom indicators you may need to use. The logframe is thus where you plan a viable and solid monitoring and evaluation system for your action.
Technical care about transparent and statistically sound data (from sampling to collection, cleaning and analysis) is key here, not only for one’s proposal but also for the improvement of the humanitarian and development policy sector in general. Powerful statistical computing (e.g., RStudio) and geographic information system (GIS) applications (e.g., QGIS) are available for download free of cost to anyone. Wonderful online courses on websites such as Coursera (I particularly like the one on Data Science, but there are many others, including on questionnaire design for social science) and videos can help one develop one’s understanding of such tools. In short, although you will be under time pressure (remembering our starting question), donors expect you to clearly explain and demonstrate the technical qualities of your M&E plans. Some external support (contact us) here can make your life easier later while ensuring the validity and reliability of your data and results. A good yardstick for deciding how much to invest in the action’s M&E: go for a monitoring and evaluation design whose documented procedures and results you would not mind publishing.
The increased rigour comes to some extent from improved access to high-quality data analysis and sharing tools (e.g., cloud databank services), combined with pressure for improved transparency in national and international public service delivery, including the delivery of national and international humanitarian action.
Documenting all your steps from sampling to analysis in a freely accessible markdown language (e.g., RMarkdown) is becoming more and more important. In line with today’s scientific and good-governance standards, I see five crucial characteristics of a high-quality, modern M&E design. Such a design:
- Tries to account for bias through reproducible sampling and source triangulation, clearly explaining limitations and margins of error,
- Ensures (statistically) true beneficiary participation for valid and reliable results (e.g., beneficiary satisfaction surveys with trained, independent interviewers) while maximising efficiency in the data collection process,
- Employs a good blend of quantitative and qualitative data collection methods,
- Ensures transparent implementation by documenting well all procedures (e.g., using the freely accessible RMarkdown or syntax of other paid statistical packages) and publicly sharing them together with monitoring and evaluation data (e.g., publishing online while ensuring beneficiaries’ anonymity), and
- Counts on some form of validation by an independent evaluator comparable to what financial auditors do, although in a more supportive and learning-oriented manner. I tend to think that evaluators nowadays need to work more as facilitators of a multi-stakeholder dialogue, providing constructive feedback for improvement, as a good research supervisor in an institute or university would do. This is particularly important for projects testing innovative approaches with potential for replication in other geographical areas, since scaling them up requires robust analysis.
Budget
The financial statement annex is a must except for urgent actions (e.g., a sudden natural or man-made disaster). There is no compulsory format you need to use for ECHO. According to ECHO’s helpdesk: “… the partner can use its own internal financial reporting formats provided that the requested information” is included. The sample financial statement shared by ECHO is helpful for understanding the type of information and level of detail required as a minimum. It includes a description, the total budget in euro, and the percentages attributed to each result in the logframe. However, in order to estimate the figures to fill out the optional template for a financial statement, one often needs to prepare a separate full budget, which includes each item with units, unit costs and total costs. For that, the standard budget format used in other EU external actions can come in handy. I would recommend using that standard budget format to estimate costs for the entire action duration and for the first year (12 months), indicating in the description of the activity-related budget lines the respective activity that each line refers to. Once you have the budget and financial-sources sheets ready, you can add a “financial statement” sheet to the file and indicate the percentages by result (a small sketch of this calculation follows the list below).
The three main advantages of the standard budget format are that:
- it fits well to the logic of the financial information you will need to provide in the e-Single Form,
- it allows you and ECHO to review the unit costs and year 1 forecast with more information, and
- it contains additional details to help your finance team plan costs in a more structured and complete manner. This can be particularly useful when you are short of time before the submission deadline and need to delegate the drafting part of this task.
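Once the full budget exists as a structured table, the percentages by result for the financial statement can be derived mechanically rather than by hand. The R sketch below uses invented budget lines purely to show the idea; it is not an ECHO template.

# Illustrative sketch: percentages attributed to each result from a full budget (values invented).
budget <- data.frame(
  line      = c("Trainer fees", "Training materials", "Water kits", "Monitoring visits"),
  result    = c("Result 1", "Result 1", "Result 2", "Result 2"),
  total_eur = c(40000, 10000, 120000, 30000)
)

by_result <- aggregate(total_eur ~ result, data = budget, FUN = sum)
by_result$share_pct <- round(100 * by_result$total_eur / sum(by_result$total_eur), 1)
by_result   # total and percentage share per result, as needed for the financial statement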
Estimation of direct beneficiaries
In the e-Single Form you will need to provide figures of direct beneficiaries. Those are organisations and/or individuals who will receive the assistance or who will benefit directly from the action, within its timeframe. ECHO partner’s site explains this section in detail. You will need to provide estimated percentages of female and male beneficiaries in different age groups.
According to ECHO, “partners are free to select the options which correspond best to the nature of their action, provided that the same beneficiaries are not counted twice”. One way to estimate such figures is to build a table with activities in its rows and columns estimating male and female direct beneficiaries by activity (this is much easier with well-quantified activities; see the point above). Based on that, one can add further columns for each of the age subcategories from the e-Single Form. Such a table is likely to suffer from multiple counting. One possibility to account for that is to apply a correction factor.
For example, let us assume that your organisation plans to deliver 2 trainings to the very same participants under different activities, each with 20 participants. Correcting the simple sum of participants (2 trainings * 20 participants = 40 participants) by 50% (40 * 0.5 = 20) will be closer to the real number of participants who directly benefited from the action. One can also increase this correction to 60% (40 * 0.4 = 16) or 70% (40 * 0.3 = 12) to account for any potential risk that may lead to a lower-than-expected number of participants. In this way, one can avoid overestimating direct beneficiaries.
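The same estimate can be kept in a small, transparent table. The R sketch below reproduces the worked example above; the activity names and the 50% correction factor are illustrative.

# Illustrative sketch of the correction-factor approach (numbers from the example above).
activities <- data.frame(
  activity     = c("Training A", "Training B"),   # hypothetical activity names
  participants = c(20, 20)                        # the same 20 people attend both trainings
)

simple_sum        <- sum(activities$participants)       # 40, counts the same people twice
correction_factor <- 0.5                                # 50% deduction for double counting
direct_beneficiaries <- simple_sum * correction_factor  # 20, closer to the real figure
direct_beneficiaries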
Those are just some initial ideas for starting to design and write up your proposal. You and your team will still have a lot of work ahead of you to fill out the entire e-Single Form once you are finished with the first drafts for the four sections above. However, I hope that these ideas can help you prepare a more effective and competitive proposal despite short deadlines. I believe that this can contribute to improved operations and delivery. I also hope that this post can motivate further discussion and knowledge exchange among those who, like me, are interested in this sort of question. So, please share your views and experiences. They will be most welcome!
More tips about logframe and M&E design:
- New EuropeAid logframe template
- Designing survey forms with evaluative scales
- Reproducible random-sample generator in Shinyapps for improved data and learning
Written by: Eduardo W. Ferreira, PhD – Consultant, trainer and facilitator in designing, managing and evaluating projects and programmes in Africa, Asia, Europe, Central and South America for governments, consultancy firms, research institutions, international and non-governmental organisations.
Digitalisation and big data – A cost-effective and reliable way for decision-making in the international development context
movimentar GmbH is a social enterprise providing advisory, training, and software development services for organisations and individuals working in sustainable development and humanitarian actions. Its founder Eduardo works as a project-management and data-science consultant, trainer, and facilitator in designing, managing, and evaluating projects and programmes across the globe. In the following interview, he outlines his work, his motivation, and why he thinks that using technology and data science in international development projects can speed up humanity’s efforts towards sustainable progress.
Eduardo, why do you do the job you do today?
Digitalisation and big data are gigantic challenges for agencies working in the international development and humanitarian sector, including governments, international organisations, civil society, and the private sector. Our job is to help these actors, using technology and data science (statistical computing), get the most out of their data in a cost-effective and reliable way, in time for decision-making. Science and technology can speed up humanity’s efforts towards sustainable progress by helping actions become more evidence-based and results-driven.
Why are you working in this industry?
I do not believe that there are single industries anymore. Everything is connected, and data science can help us to see that better with increased precision and quality. My goal is to bring science and technology closer to project and programme management in different thematic areas. Independently from the topic, professionally designed digital data-collection processes together with big-data techniques (e.g., text mining) and artificial-intelligence algorithms such as machine learning methods can work just like glasses. They help one see better. They are not there to replace one’s eyes (and brains), but they allow for a more realistic picture of the world with reduced error and bias. Hans Rosling is a great inspiration to me, and I try to contribute to his legacy of promoting more factfulness in public policies and international development actions.
What do you enjoy most about your job?
Discovering new ways to contribute to sustainable development for thousands of people. Every assignment is full of discoveries. It always amazes me to see new and counter-intuitive relationships between variables and problems. Like anything else, they are also continuously changing in space (geographic area) and time. Analysing the interrelations between variables and problems is the first step towards addressing the world’s problems.
What was the best decision in your career and why?
Starting to learn programming languages, especially R and Python. They are always evolving, and one is always learning something new. I have always worked in international cooperation for development and humanitarian aid. This is a sector that has seen many impressive achievements, such as the halving of extreme poverty in the last 20 years. The more progress humanity achieves, the harder it gets to make further improvements. R and Python are the key languages when it comes to data science and artificial intelligence. They are free software and therefore extremely powerful through their worldwide community. Every day one can see further developments by thousands of developers around the world. Data-science skills have increased my potential contribution to more evidence-driven decisions and transparency in the international development and humanitarian sectors.
What makes you excited about Mondays?
The possibility of taking part in the improvement of the lives of more people around the world.
What has been your greatest career disappointment? What did you learn from it?
Once I worked on a large action with environmental groups working on development education. I designed a system to provide them with real-time information on participant satisfaction with their activities, based on digital data collection. I was invited to present the system at a large initial partner conference, and I did. Up to that point I had only exchanged ideas with the lead organisation and did not know the others. During my presentation, some of the older and more influential members were determined not to let me even present the tools. We had to interrupt the presentation and exclude the data-collection system from our contract. My lesson from that: never underestimate the fear of technology, nor the change-management effort required, particularly among older and less technology-savvy decision-makers.
Describe the environments in which your leadership style is most effective. Where have you been frustrated and less successful?
A quote often attributed to Charles Darwin goes: “In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed.” Participation and collaboration are of crucial importance. Multiple minds can produce better results than one. Technology is helping people to collaborate more effectively. Every now and then we come across cases where people still prefer a more top-down approach. Depending on the organisational culture of the client, some staff are not willing to make direct contributions to the text of a proposal, for example, and prefer to make general or subjective comments without taking ownership of the product. I understand that this is also a way to reduce risk exposure, but it can be inefficient and slows down processes. Luckily, these cases are becoming less and less frequent.
What does success look like for you?
Increased capacity of our clients to deliver humanitarian and development actions faster, at scale and with real-time information on their impact and beneficiary satisfaction.
We are on LinkedIn
We are pleased to announce that we are now officially on LinkedIn to engage and connect with our clients and people interested in digitalisation, data science and technology for the management and evaluation of international development and humanitarian actions. Make sure you check our company page regularly, as we will update you on current projects and any potential job opportunities. Connect with us and join the exchange! #projectmanagement #datascience #internationaldevelopment
Reproducible random-sample generator in Shinyapps for improved data and learning
Reproducible samples and analyses are critical for data quality, particularly in the monitoring and evaluation of project activities. Okay, some may say, but does it really matter at all? Yes, it does. It helps set a seed for the future.
Evolution in project monitoring and evaluation
Let us take some capacity-development activity such as a training as an example. In the past, it might have been easy to simply write a “qualitative” description of the contents of a training with some technical jargon / acronyms and mention the number of participants that were there (hopefully also being able to provide or “estimate” a percentage of women among participants).
Monitoring and reporting project implementation was largely seen as a bureaucratic requirement. There was no structured way to learn from participants. There was virtually no systematic data collection for knowing how to improve training contents, methods, materials and results based on participants’ views and suggestions (feedback).
This fitted well with the traditional top-down approach to capacity development. In a way, it also made sense. Digital data-collection tools were complicated to use and considered expensive to maintain. They also represented more pressure on project managers and staff.
Project managers could be confident (and I am afraid some still are) that treating reports merely as regular “literary” exercises while focusing efforts on financial compliance would be enough. After all: “in the end we will write something nice in the implementation report”. Learning from project implementation and evolving from experiences were suffocated by a binary logic of “success” or “failure”. In such a context, it is easy to miss the fact that experimentation “failures” are important steps towards learning for impact success.
Paradigm change
This context has been changing fast in the era of data abundance and analytics. Many still see terms such as “automation” and “machine learning” as threatening. Personally I think that improvised, unstructured and scientifically-weak monitoring, evaluation, accountability and learning systems have done enough harm in terms of loss of resources and opportunities. This is particularly so in the public and international development sector. It is great to see that things are finally evolving from discourse to practice.
Learning from experience is gradually becoming easier and cheaper. The powerful and open source computational tools that are freely available such as R and Python can make it easier to reduce sample-selection bias but require at least some basic knowledge of their syntax. Many organisations are still adapting to the paradigm shift from top-down, specialised expertise to a more collaborative and multidisciplinary data-driven approach to monitoring, evaluation and learning. This process requires data-science skills that blend computing and statistics following professional monitoring and evaluation standards. Investment in human resources and targeted recruitment / contracting are key. Data management and analysis using traditional spreadsheet software such as MS-Excel and conventional, proprietary statistical packages (e.g., SPSS and STATA) are not enough anymore for a world with complex, unstructured data.
Sampling in a scientifically-robust (but simple) way
A common question that clients have asked me is how best to select participants for feedback surveys in activities such as trainings and events. With this question and the context above in mind, I developed a very simple app using Shinyapps. The app generates a reproducible random sample list of numbers. Samples are reproducible thanks to the function set.seed(44343) in R.
You can access and use the app at https://movimentar.shinyapps.io/randomsample/. You will simply need to input the total number of participants in the event/activity, and a sample percentage. This will depend on the size of your activity. After that, you can visualise the result in a reactive table and download the output in XLSX format (MS Excel).
The “magic” here is that if anyone executes set.seed() with the same number specified between the parentheses, one will always see the same randomised sample. This makes it reproducible while avoiding the problem of sample-selection bias. So, people in the future can also learn from your experience with assurance that you put some thought into data quality.
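As a minimal illustration of the idea (not the app’s actual source code), the lines below always return the same 10 numbers out of a hypothetical list of 50 participants whenever the whole snippet is re-run:

set.seed(44343)                   # the seed used as reference in the randomsample app
participants <- 1:50              # hypothetical list of 50 participant numbers
sample(participants, size = 10)   # a 20% sample; identical on every re-run of this snippet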
It is also possible to draw reproducible samples in many other statistical computing languages. In Python, for example, you can import numpy and call numpy.random.seed() to set your seed number and then numpy.random.choice() to draw your sample. However, be aware that the seed number (44343) used as reference in the randomsample app will generate a different sample in Python, as the app is built in R.
The app’s source code is publicly available for download on Github. I hope that this helps others to learn more about these tools. Code contributions will be very welcome too.
Let us learn for real. It is time to set.seed() for the future.
Written by: Eduardo W. Ferreira, PhD / Consultant, data scientist, trainer and facilitator. Eduardo supports designing, managing and evaluating projects and programmes for consultancy firms, non-governmental organisations, governments, research institutions and international organisations (Additional information).
Designing survey forms with evaluative scales
Project management has some things in common with playing with a kite. One needs to adapt well and quickly to changes in external conditions following observation of performance. Otherwise, one runs the risk of blindly hitting the ground.
Differently from kites, though, a project or a programme requires more than open eyes. It requires sound data collection and analysis. Beneficiary/client feedback in opinion surveys can help track performance, user satisfaction and improvement needs. This is particularly so if one uses reproducible, computer-based random samples in line with professional statistical and data-science standards. This is the way to learn from project implementation based on modern scientific and computational methods.
Lesson learned
Some time ago, I had a consulting assignment with a youth-violence reduction project in Brazil. The project needed baseline data for its logframe indicators. So, we designed a system for collecting, storing, processing and reporting data using Open Data Kit in Android devices. We also used R (statistical-computing language) for programming a reproducible sample, as well as all data processing and analytical reporting.
Before collecting data, I had to train over 25 people including interviewers and partner staff, who also suggested changes to the data collection form. This post is about one of these suggestions, which was a particularly good lesson learned.
Jane Davidson’s article “Breaking out of the Likert scale trap” inspired me to propose the inclusion of direct evaluative questions instead of the traditional Likert scales. It is a very good post claiming that by using evaluative terms right in the questionnaire, participant ratings become a lot easier to interpret in terms of quality or value. I also think so.
The Likert scale using “strongly agree” to “strongly disagree” is great for assessing opinions and knowledge from respondents. However, the scale makes it difficult to draw evaluative conclusions on quality or value of a training workshop, project or programme, for example. So, the scale suggested by Davidson was as follows:
- “poor / inadequate”;
- “barely adequate”;
- “good”
- “very good”
- “excellent”
The draft data-collection form used the same label categories as those above, but translated into Portuguese. During the interviewer training workshop, one of the participants spotted a potential problem that I had not noticed before: the label categories were not well balanced.
The problem was that in the scale above there are three positive and two negative scale categories or levels. Hence, the likelihood / probability of a positive result tends to be higher. Those unbalanced options are a potential source of bias.
For preventing such bias, we changed the labels proposed by Davidson to:
- “very poor” or “very low”
- “poor” or “low”
- “regular” or “average”
- “good” or “high”
- “very good” or “very high”.
Additional answer categories
I would recommend including the categories “Not sure, I don’t know” and “Not applicable”, in order to allow for more complete respondent feedback. The numeric scale can integrate these new categories depending on the question (e.g., answering “not applicable” or reporting not to know the action under evaluation can also indicate the quality of its outreach and impact).
Sometimes, it can also be interesting to have the answer option “I do not want to answer” for sensitive questions about income or abuse, for example. This option, of course, should not be part of the numeric evaluative scale. Otherwise, one will mix up different types of result.
Numeric analysis
The corresponding numeric intervals must also be balanced.
For a scale from 1 to 5 (with one being the worst case, as in Davidson’s article, or the other way round, as in Germany, where a score of one is the best), the intervals from the function cut() in R (a statistical computing language) are:
> cut(1:5, breaks = 5)
[1] (0.996,1.8] (1.8,2.6] (2.6,3.4] (3.4,4.2] (4.2,5]
This would be equivalent to:
- from 1 to 1.80: “very poor/very low”
- from 1.81 to 2.60: “poor/low”
- from 2.61 to 3.40: “regular”
- from 3.41 to 4.20: “good”
- from 4.21 to 5.00: “very good”
The same can be done for a scale based on the interval from 1 to 7, if one includes the categories “Not sure, I don’t know” and “Not applicable”. The R output from the cut function for a scale with seven categories is as follows:
> cut(1:7, breaks = 7)
[1] (0.994,1.86] (1.86,2.71] (2.71,3.57] (3.57,4.43] (4.43,5.29]
[6] (5.29,6.14] (6.14,7.01]
Preventing response bias
To further prevent bias, the survey introduction can make participants aware of the risk of providing biased answers. An introduction along the lines of the paragraph below can help:
Respondents in such questionnaires sometimes repeat the same answers for different questions, mark extreme answers trying to be polite or as a form of calling attention to a specific aspect, or even rate items in the middle categories in order to keep neutrality when they are actually thinking something else. Please avoid this as much as you can, as it prevents us from understanding the real situation.
If you are asking for real feedback from clients/beneficiaries and stakeholders, interviewers must be external to your project team. Ideally, they should be outsourced, receive training on interviewing methods, and not be associated with the implementing organisations or related to their staff members. This helps prevent interviewer bias (when results differ depending on who collects the data). This can be the case, for example, when humanitarian-aid beneficiaries have suggestions for improving support but fear losing future support after having provided critical feedback.
Final remarks
I benefited from Davidson’s contribution and thought it would be good to try to contribute as well. Monitoring and evaluation with robust scientific standards can be powerful for learning and for improving policies, programmes, projects and products.
Evaluative scales can be very helpful, but this does not mean that Likert scales should be avoided at all costs. I also use Likert scales in my forms, particularly in those aiming to test participants’ subject knowledge in capacity-development actions such as projects that include training workshops or a course module.
Also, it is worth including an open question about problems (e.g., What are the three main problems in your village?) as well as an open question about suggestions for improvement or additional comments. Text data can be analysed with word clouds and dendrograms, for example. This can complement scoring data well in monitoring and evaluation. It is also an opportunity for projects and programmes to track opportunities while making sure that they are addressing the issues that their beneficiaries or clients consider most important.
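As a pointer, a basic word cloud of open-ended answers can be produced in R with the tm and wordcloud packages. The answer texts below are invented for illustration.

# Illustrative word cloud from open-ended survey answers (texts are invented).
library(tm)
library(wordcloud)

answers <- c("water supply is irregular", "no access to clean water",
             "roads are in poor condition", "clean water and better roads needed")

corpus <- VCorpus(VectorSource(answers))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

tdm   <- TermDocumentMatrix(corpus)
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(names(freqs), freqs, min.freq = 1)   # word size reflects frequency across answers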
I hope you enjoyed this post and would be happy to receive any suggestion or comment.
Good monitoring and evaluation!
How much are charity, fundraising, NGO and non-profits currently paying their new staff? Web scraping CharityJobs
In this post I try to explore this and some other questions using the open-source statistical computing language R and public recruitment data from CharityJob’s website. According to CharityJob, the site is the United Kingdom’s busiest for charity, fundraising, NGO and not-for-profit jobs.
In addition to presenting these powerful open-source tools and data-exploration techniques, I hope that this post can help the public, especially applicants and workers, to get an update on salaries and trends in the sector. The jobs analysed here are mostly UK-based and published by UK-based organisations. Therefore, the results below are not meant to represent the entire sector worldwide. I still hope, though, that this post can provide a positive contribution to the evolution of the sector in both the southern and the northern hemispheres.
For those of you who are only interested in the end analysis, please jump to the results section. However, I encourage you to explore how these tools work. I believe that they can help speed up and improve the quality of the much-needed charity, social-enterprise, development-aid and humanitarian work being done globally.
I used some basic techniques of web scraping (also known as web harvesting or web data extraction), a software technique for extracting information from websites. The source code in RMarkdown is available for download and use under the GNU General Public License at this link: Rmarkdown code. Everything was prepared with the open-source, freely accessible and powerful statistical computing language R (version 3.2.0) and the development interface RStudio (version 0.99.441).
This post is based on public data. It is my sole responsibility and can in no way be taken to reflect the views of CharityJobs’ staff.
Downloading data from CharityJobs
Using RStudio, the first step is to download the website data. CharityJobs’ search engine contains over 140 webpages, each listing 18 jobs in most cases. Hence, I expected to get information on around 2,500 job announcements. I first downloaded the data and stripped out what I did not want (e.g. CSS and HTML code). The code chunk below, with explanatory comments indicated by hashtags (‘#’), describes how I did it. I am sure that many of you could write this code in a more elegant and efficient way, and I would be very thankful to receive your comments and suggestions!
# Loading the necessary packages. It assumes that they are installed.
# Please type '?install.packages()' on the R console for additional information.
suppressWarnings(suppressPackageStartupMessages(require(rvest))) # Credits to Hadley Wickham (2016)
suppressPackageStartupMessages(require(stringr)) # Credits to Hadley Wickham (2015)
suppressPackageStartupMessages(require(lubridate)) # Credits to Garrett Grolemund, Hadley Wickham (2011)
suppressPackageStartupMessages(require(dplyr)) # Credits to Hadley Wickham and Romain Francois (2015)
suppressPackageStartupMessages(require(xml2)) # Credits to Hadley Wickham (2015)
suppressPackageStartupMessages(require(pander)) # Credits to Gergely Daróczi and Roman Tsegelskyi (2015)
suppressPackageStartupMessages(require(ggplot2)) # Credits to Hadley Wickham (2009)
## Creating list of URLs (webpages)
urls <- paste("https://www.charityjob.co.uk/jobs?page=", seq(1:140), sep = "")
## Downloading website information into a list called `charityjobs` and closing connections
charityjobs <- lapply(urls, . %>% read_html(.))
Tidying up and parsing the data
The next step is to parse and clean up the text string of each of the roughly 140 webpages. I decided to build a custom function for this, which I could use to loop through the content of each element of the charityjobs list. The function also saves the parsed data into a data frame with information on recruiters, position titles, salary ranges and application deadlines. The code chunk below presents this function, which I called salarydata.
## Creating a function for parsing data which uses the read_html output (list 'charityjobs')
salarydata <- function(list) {
  # Creating auxiliary variables and databases
  list_size <- length(list)
  salaries <- data.frame(deadline=character(),
                         recruiter=character(),
                         position=character(),
                         salary_range=character())
  for (i in seq_along(1:list_size)){
    size <- list[[i]] %>% html_nodes(".salary") %>% html_text() %>% length()
    # Intermediary dataframe
    sal <- data.frame(deadline=rep(NA, size),
                      recruiter=rep(NA, size),
                      position=rep(NA, size),
                      salary_range=rep(NA, size))
    ## Filling out intermediary data for deadlines for application
    sal$deadline[1:size] <- list[[i]] %>%
      html_nodes(".closing:nth-child(4) span") %>% html_text() %>%
      .[!grepl("^[Closing:](.*)", .)] %>% rbind()
    ## Filling out intermediary data for recruiters
    sal$recruiter[1:size] <- (list[[i]] %>%
      html_nodes(".recruiter") %>% html_text() %>%
      gsub("\r\n \\s+", "", .) %>%
      gsub("\r\n", " ", .) %>%
      gsub("^\\s+|\\s+$", "", .)) %>%
      rbind()
    ## Filling out intermediary data for positions
    sal$position[1:size] <- list[[i]] %>%
      html_nodes(".title") %>% html_text() %>%
      gsub("\r\n \\s+", "", .) %>%
      gsub("\r\n", " ", .) %>%
      gsub("^\\s+|\\s+$", "", .) %>%
      rbind()
    ## Filling out intermediary data for salary ranges
    sal$salary_range[1:size] <- list[[i]] %>%
      html_nodes(".salary") %>% html_text() %>%
      gsub("(£..)\\.", "\\1", .) %>% gsub("\\.(.)k(.+) |\\.(.)K(.+)", "\\100 \\2", .) %>%
      gsub("(.*)k(.+) |(.*)K(.+)", "\\1000 \\2", .) %>%
      gsub("k", "000 ", .) %>% # Substituting remaining ks
      gsub("^(£..)\\-", "\\1000; ", .) %>% # Adding thousands for figures without "k"
      gsub("- £", "; ", .) %>% gsub("-£", "; ", .) %>% gsub("£", "", .) %>% # Removing pound signs
      gsub("-", ";", .) %>% gsub("–", ";", .) #%>% # Removing dashes
    ## Excluding per-hour and per-day jobs
    sal <- sal %>% filter(!grepl("hours", sal$salary_range))
    sal <- sal %>% filter(!grepl("hour", sal$salary_range))
    sal <- sal %>% filter(!grepl("p/h", sal$salary_range))
    sal <- sal %>% filter(!grepl("week", sal$salary_range))
    sal <- sal %>% filter(!grepl("ph", sal$salary_range))
    sal <- sal %>% filter(!grepl("day", sal$salary_range))
    sal <- sal %>% filter(!grepl("daily", sal$salary_range))
    sal <- sal %>% filter(!grepl("plus", sal$salary_range))
    sal <- sal %>% filter(!grepl("\\+", sal$salary_range))
    salaries <- rbind(salaries, sal)
  }
  return(salaries)
}
Creating the full dataframe and other adjustments
The last step before exploring the data was to run the salarydata function to create the full dataframe. After that, I parsed lower and upper salaries into separate columns and deleted data that may have been incorrectly parsed, as well as data concerning daily-rate and hourly-rate jobs and consulting assignments. Only yearly salaries between GBP 4,000 and GBP 150,000 have been considered. All salary data is in British pounds (GBP) and refers to annual salaries, which sometimes do not include benefits such as pension.
Cleaning the salary-range variable was a tricky step, as the website allows users to type in both salary amounts and additional text (e.g. 30,000, 30K, or 25-30k). Therefore, I had to iterate a few times until the output was good enough. I am quite sure that the code chunk below can be written in a more elegant way. Again, please let me know in case you have any suggestions.
# Creating a full and clean dataframe
salaries <- salarydata(charityjobs)
# Parsing salary-range variable
salaries$salary_range <- gsub(", ", ",", salaries$salary_range) %>%
  gsub(" ; ", ";", .) %>% gsub("; ", ";", .) %>%
  gsub(",[:A-z:]", " ", .) %>%
  gsub("\\(*", "", .) %>%
  gsub("\\:", "", .) %>%
  gsub("[:A-z:],[:A-z:]", " ", .) %>%
  gsub("(..),00\\...", "\\1,000", .) %>%
  gsub("(..),0\\...", "\\1,000", .) %>%
  gsub("[A-z]", "", .) %>% gsub(",", "", .) %>%
  gsub("\\.", "", .) %>% gsub("^\\s+", "", .) %>%
  gsub("\\s([[:digit:]])", ";\\1", .) %>%
  gsub("\\s+", "", .) %>% gsub("^[[:digit:]]{1};", "", .) %>%
  gsub("\\(", "", .) %>% gsub("\\)", "", .) %>% # Deleting "(" and ")"
  gsub("\\/", "", .) %>% gsub("000000", "0000", .) %>% # Deleting "/" and correcting digits
  gsub("([[:digit:]]{2})00000", "\\1000", .) %>% # Correcting number of digits
  gsub("([[:digit:]]{5})00", "\\1", .) # Correcting number of digits
# Adjusting data and computing lower and upper salaries using ";" as separator
salaries <- suppressWarnings(salaries %>%
  mutate(upper_salary=gsub("^.*;", "", salaries$salary_range)) %>%
  mutate(lower_salary=gsub(";.*", "", salaries$salary_range)) %>%
  mutate(upper_salary=as.numeric(upper_salary)) %>%
  mutate(lower_salary=as.numeric(lower_salary)) %>%
  filter(upper_salary<150000) %>% filter(upper_salary>4000) %>%
  filter(lower_salary<150000) %>% filter(lower_salary>4000) %>%
  mutate(lower_salary=ifelse(lower_salary>=upper_salary, NA, lower_salary)) %>%
  filter(is.na(upper_salary)!=TRUE) %>% tbl_df() %>%
  select(deadline, recruiter, position,
         lower_salary, upper_salary, salary_range) %>%
  mutate(deadline=dmy(deadline)))
The output below presents the full dataframe (first 10 observations shown).
## Source: local data frame [1,704 x 6]
##
## deadline recruiter
## (time) (chr)
## 1 2016-09-04 ZSL
## 2 2016-09-12 Alliance Publishing Trust
## 3 2016-08-31 Save the Children
## 4 2016-08-30 Blind Veterans UK
## 5 2016-09-08 Headway SELNWK
## 6 2016-08-30 Saferworld
## 7 2016-09-22 Pro-Finance
## 8 2016-09-06 TPP Recruitment
## 9 2016-09-06 Harris Hill
## 10 2016-09-20 Hays London Ebury Gate
## .. … …
## Variables not shown: position (chr), lower_salary (dbl), upper_salary
## (dbl), salary_range (chr)
Results
The final dataset contains information on 1,704 jobs of various types, all with yearly-salary figures. It excludes consultancy assignments and other jobs paid by the hour or by the day, as well as jobs that did not provide salary information.
The table below presents standard descriptive statistics for lower and upper salaries. For job announcements providing a single value (rather than a salary range), that amount was assigned to the variable upper_salary, while lower_salary was set to NA (not available). That is why the number of observations (N) is 785 for lower salaries and 1,704 for upper salaries: about 54% of the job announcements provided a single salary amount rather than a range.
Summary statistics of salaries (in British pounds / GBP)
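As a rough illustration, descriptive statistics along these lines can be computed directly from the salaries dataframe built above; the published table may have been produced slightly differently, so treat this as a sketch only.
# Sketch of the descriptive statistics behind the table above
summary_stats <- salaries %>%
  summarise(N_lower = sum(!is.na(lower_salary)),
            mean_lower = mean(lower_salary, na.rm = TRUE),
            median_lower = median(lower_salary, na.rm = TRUE),
            sd_lower = sd(lower_salary, na.rm = TRUE),
            N_upper = sum(!is.na(upper_salary)),
            mean_upper = mean(upper_salary, na.rm = TRUE),
            median_upper = median(upper_salary, na.rm = TRUE),
            sd_upper = sd(upper_salary, na.rm = TRUE))
pander(summary_stats) # pander renders the table in the RMarkdown report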
In a more in-depth analysis for some future post, it could be interesting to look into pay for jobs remunerated by the hour or by the day, as well as into more specific job categories. One way to approach specific job categories would be to define tags for job titles using standard words from the titles (e.g., director, management, assistant) and to group them by tag in a new factor variable, as sketched below.
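A minimal sketch of that tagging idea, using base R and purely illustrative tags, could look like this:
# Sketch: tagging job titles with standard words (the tags below are illustrative only)
tag_title <- function(title) {
  if (grepl("director|head of", title, ignore.case = TRUE)) return("director")
  if (grepl("manager|management", title, ignore.case = TRUE)) return("management")
  if (grepl("officer", title, ignore.case = TRUE)) return("officer")
  if (grepl("assistant", title, ignore.case = TRUE)) return("assistant")
  "other"
}
salaries$job_tag <- factor(sapply(as.character(salaries$position), tag_title))
table(salaries$job_tag) # Number of announcements per tag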
Histogram with distribution of lower salaries (GBP)
Histogram with distribution of upper salaries (GBP)
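For reference, histograms like the ones above can be produced with ggplot2 along the following lines; the bin width here is an arbitrary choice and not necessarily the one used for the plots above.
# Sketch of the histogram of upper salaries (bin width is arbitrary)
ggplot(salaries, aes(x = upper_salary)) +
  geom_histogram(binwidth = 2500) +
  labs(x = "Upper salary (GBP)", y = "Number of job announcements")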
The 10 most frequent recruiters
The table below presents the ranking of the 10 most frequent recruiters in the dataset. Column “N” shows the total number of announcements for each recruiter, while column “Freq” shows each recruiter’s percentage of total announcements. Recruitment agencies also feature among the most frequent recruiters.
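A sketch of how such a ranking can be computed from the salaries dataframe:
# Sketch: the 10 most frequent recruiters, with counts (N) and shares (Freq, in %)
top_recruiters <- salaries %>%
  group_by(recruiter) %>%
  summarise(N = n()) %>%
  mutate(Freq = round(100 * N / sum(N), 1)) %>%
  arrange(desc(N)) %>%
  head(10)
pander(top_recruiters)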
The tables below show the ranking of the jobs with the 10 lowest and 10 highest upper salaries.
The jobs with the 10 lowest upper salaries (GBP)
The jobs with the 10 highest upper salaries (GBP)
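These rankings can be obtained by sorting the dataframe on upper_salary, roughly as follows (the object names are just for illustration):
# Sketch: jobs with the 10 lowest and the 10 highest upper salaries
lowest_10 <- salaries %>% arrange(upper_salary) %>%
  select(recruiter, position, upper_salary) %>% head(10)
highest_10 <- salaries %>% arrange(desc(upper_salary)) %>%
  select(recruiter, position, upper_salary) %>% head(10)
pander(lowest_10)
pander(highest_10)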
I also wanted to quickly explore possible relationships between deadline dates and salary levels, just for fun. It could be, for example, that some periods have lower average salary offers than others.
Despite the large number of job announcements in the dataset (N=1704), all observations refer to jobs with application deadlines between 24 August 2016 and 23 September 2016. This is a short time span for such analysis, but I explored it anyway just as an example of what these tools and techniques can do.
The plot below shows the mean (average) upper salary for each day throughout the period. The variation in the mean salary, as well as the salary levels, seems higher for jobs with deadlines from September onwards. The dashed line represents the result of the linear regression. The linear model, however, fails to detect any statistically significant relationship between mean salary and application deadline (R² = 0.006; p = 0.69).
Average upper salary by date (GBP)
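A minimal sketch of this analysis, aggregating by deadline date and fitting a simple linear model, could look like this; the exact plot settings used above may differ.
# Sketch: mean upper salary per deadline date, plotted with a linear trend
daily <- salaries %>%
  group_by(deadline) %>%
  summarise(mean_upper = mean(upper_salary, na.rm = TRUE))
ggplot(daily, aes(x = deadline, y = mean_upper)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  labs(x = "Application deadline", y = "Mean upper salary (GBP)")
summary(lm(mean_upper ~ as.numeric(deadline), data = daily)) # R-squared and p-value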
Next, I use word clouds to explore job titles. The larger a word appears in the cloud, the higher its frequency in the dataset. Only words mentioned in at least 10 job announcements are shown. The plot indicates that management positions are the most frequent, followed by coordination jobs, as well as officer, recruitment and fundraising jobs.
Word cloud of job titles
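A sketch of how such a cloud can be produced with the tm and wordcloud packages, keeping only words that appear in at least 10 announcements:
# Sketch: word cloud of job titles (words appearing in at least 10 announcements)
suppressPackageStartupMessages(require(tm))
suppressPackageStartupMessages(require(wordcloud))
titles <- VCorpus(VectorSource(as.character(salaries$position)))
titles <- tm_map(titles, content_transformer(tolower))
titles <- tm_map(titles, removePunctuation)
titles <- tm_map(titles, removeWords, stopwords("english"))
title_freqs <- sort(rowSums(as.matrix(TermDocumentMatrix(titles))), decreasing = TRUE)
wordcloud(names(title_freqs), title_freqs, min.freq = 10)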
The cloud below shows the most frequent words in the names of the recruiting institutions. I assumed that its results could provide hints about the most active thematic areas in terms of job announcements. The words in the plot are, again, those mentioned in at least 10 job announcements. The word cloud suggests that recruitment agencies are among the leading recruiters, as expected (see section “The 10 most frequent recruiters”). Organisations working with children, cancer patients and Alzheimer’s patients also seem to stand out.
Word cloud of recruiters
Moving forward
The charity, development-aid, not-for-profit and social-enterprise sector is evolving rapidly. This process is powered both by increasingly critical global challenges and, of course, by capable and motivated entrepreneurs, staff and service suppliers. The sector is sometimes overly romanticised. As a consultant and entrepreneur in the sector, I am often asked how I manage to deal with all the daydreamers I come across along the way. No judgement there, but it shows how much the sector is still unknown to the public. In reality, it has become increasingly professional and results-oriented. I believe that computing for data analysis can help the sector, particularly in monitoring and evaluating performance, which should include staff and beneficiary/client satisfaction.
I hope you enjoyed this tour and would be happy to receive your suggestions for additional analysis and improvements. You can access this post with more up-to-date data at: https://rpubs.com/EduardoWF/charityjobs.
Keep coding and take care!
Written by: Eduardo W. Ferreira, PhD / Consultant, data scientist, trainer and facilitator. Eduardo supports designing, managing and evaluating projects and programmes for consultancy firms, non-governmental organisations, governments, research institutions and international organisations (Additional information).