Case study YoungCapital

YoungCapital and Relevant Online utilise the power of AI.
Driving recruitment efficiency: predictive matchmaking model.

About YoungCapital

YoungCapital is a recruitment agency with over 20,000 candidates at work every week. YoungCapital connects the new generation of employees with employers by taking care of the recruitment and selection process as well as contracts and salary payments. It has the largest database of young people in Europe and its growth has been unstoppable in recent years.

About Relevant Online
Relevant Online is a Data Tracking, Data Engineering and Data Science agency that helps its customers make the most of their data. In an ever-evolving industry, Relevant Online develops and optimises (data) architecture, tracking and dashboarding, and builds (predictive) models to fit its clients’ needs.


Challenges in online recruitment

YoungCapital invests heavily in online advertising efforts with two main goals: to find new talent as well as employers looking to fill vacancies. Matching the demand and supply of the right candidates with open vacancies is a huge challenge as it is very labour-intensive.

Allocating marketing budgets manually and efficiently across online channels, regions and ad campaigns is extremely hard for marketing specialists. The factors that drive the demand for vacancies up or down are very hard to guess. Furthermore, because of the sheer number of different vacancies, the number of campaigns is always a compromise, limited by the capacity and hours the marketing team has available.

Intuition alone is not sufficient to navigate the complexities of marketing platforms and audience behaviour. Advertising networks try to solve this challenge with their own tools and models, for example Google with “Performance Max”. Such tools all share the same limitations: they are black boxes, and they are built to optimise only for their own network.

Other programmatic SaaS solutions also don’t offer enough depth in matching based on the number of candidates sought at any given time. Most run only on the budget put in, not on the number of candidates the employers actually seek.

YoungCapital and Relevant Online therefore defined the goal to develop a model which should predict the best match between candidates’ and clients’ (the employers) needs. The Predictive Automated Matchmaking Model intends to find the most cost-efficient channel to match candidates with employers at the lowest possible “cost per hire”.


“By combining YoungCapital’s knowledge of recruitment with Relevant Online’s AI expertise, the cooperation revolutionises the procurement of candidates, resulting in increased efficiency, a significantly lower cost per hire and enhanced customer and candidate satisfaction.”
Peter Segerius, Senior Online Marketer at YoungCapital


The idea is to gain a granular understanding of the data and of statistical impact patterns by leveraging ML algorithms. This includes analysing the statistical impact of marketing spend grouped by campaign, job type (job function) and location, across performance marketing channels such as Google Ads, Meta Ads Manager, e-mail marketing and so on.
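The grouping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the channel names and every number below are invented, and the function `cost_per_hire_by_group` is a hypothetical helper, not part of the actual pipeline.

```python
from collections import defaultdict

# Hypothetical weekly records: (channel, job_function, region, spend_eur, hires).
# All values are invented for illustration only.
records = [
    ("google_ads", "warehouse", "Amsterdam", 400.0, 8),
    ("google_ads", "warehouse", "Amsterdam", 200.0, 4),
    ("meta_ads", "warehouse", "Amsterdam", 300.0, 10),
    ("email", "hospitality", "Rotterdam", 50.0, 5),
    ("meta_ads", "hospitality", "Rotterdam", 120.0, 0),
]


def cost_per_hire_by_group(rows):
    """Sum spend and hires per (channel, job function, region), then divide."""
    totals = defaultdict(lambda: [0.0, 0])
    for channel, job_function, region, spend, hires in rows:
        key = (channel, job_function, region)
        totals[key][0] += spend
        totals[key][1] += hires
    # A group without hires has no defined cost per hire, so report None.
    return {
        key: (spend / hires if hires else None)
        for key, (spend, hires) in totals.items()
    }


cost_per_hire = cost_per_hire_by_group(records)
```

Comparing the resulting values per group shows which channel reaches a given job function and region most cheaply, which is the kind of pattern the model is meant to learn at scale.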


The model is trained on data on job openings and job applications, together with Google Analytics 4 data.

There is a general misconception that the more data is used, the better the outcome of the model will be. This is not always the case: ingesting more and more data does not necessarily improve the model’s accuracy. We managed to find a sweet spot in the number of months of data to ingest, from both a business and a machine learning perspective.

The model is then used to predict the number of job applications for the job openings of the current week (i.e. the week of the prediction). The model is retrained every week using the new data of the previous week.
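The weekly retraining cycle can be pictured as a training window that slides forward by one week. The sketch below is a minimal illustration, assuming weeks run from Monday to Sunday; the function names and the window length `n_weeks` are hypothetical, and the actual number of months used is not stated here.

```python
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def training_window(prediction_day, n_weeks):
    """Return (first_day, last_day) of the n_weeks full weeks
    that precede the week of the prediction."""
    current_monday = week_start(prediction_day)
    first_day = current_monday - timedelta(weeks=n_weeks)
    last_day = current_monday - timedelta(days=1)
    return first_day, last_day


# Predicting for the week of Wednesday 15 May 2024 with a two-week window.
window = training_window(date(2024, 5, 15), n_weeks=2)
```

Calling the function one week later moves both ends of the window forward by seven days, so the newest week enters the training data and the oldest one drops out.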

When building any model, data engineering is a fundamental aspect. For this project, we utilise a dataset stored in the data warehouse which was built together with Relevant Online over the past few years. The dataset consists of job openings, categories, locations, applications and marketing-related information from, among others, Google Analytics 4. It enables tracking of candidates through YoungCapital’s systems, from the initial acquisition channel up until the moment the candidate signs their contract.

Training the model

To train the model, we don’t use all of the features in the dataset. Instead, we extract the features that are most relevant for predicting the number of job applications. This so-called “feature extraction” means adjusting the data in a way that yields better prediction results.
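One simple way to judge which features are relevant is to rank them by how strongly they correlate with the target. The sketch below uses the absolute Pearson correlation as a stand-in; it is an assumption for illustration and not necessarily the method used in this project. The feature names and values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(columns, target):
    """Order feature names from most to least correlated with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical features for five job openings and their application counts.
columns = {
    "ad_spend_eur": [10.0, 20.0, 30.0, 40.0, 50.0],
    "days_open": [3.0, 1.0, 4.0, 1.0, 5.0],
    "constant_flag": [1.0, 1.0, 1.0, 1.0, 1.0],
}
applications = [2.0, 4.0, 6.0, 8.0, 10.0]

ranking = rank_features(columns, applications)
```

A correlation filter only captures linear relationships, so in practice it would be one input among several when deciding which features to keep.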

For the sake of transparency and explainability, Relevant Online and YoungCapital prefer a method that makes it possible to trace which part of the data affected the outcome of the machine learning product. Unlike some well-known neural networks, which are essentially black-box solutions, such a method keeps the ML traceable and transparent. These are key factors that continuously guide Relevant Online’s development.

If you want to learn in more detail about the (open source) model, the metric and the algorithm which are used, you can read further after the results and conclusions of this case study.

Business implications and costs

Building and running a predictive model like the Predictive Automated Matchmaking Model has business implications and accompanying costs because it requires:

  • cloud infrastructure on Google Cloud Platform (GCP);
  • space for storing data in BigQuery and storing `metadata` of the pipeline process in Google Cloud Storage (GCS);
  • hosting for the `Dataproc` cluster for data preprocessing and training the model;
  • time spent for maintenance of the pipeline, testing and adding new features.

Results and conclusion

Business value

The business value of the model stretches further than just as a candidate acquisition channel. The model also spots new chances in the market regarding scarcity of candidate profiles in the database, availability of matching job openings, difficulty of finding such profiles for the available jobs and the final revenue gained.

Besides the marketing department, many other parts of the company benefit from the model. The allocation of ad spend, and how channels and candidates match, is also useful to the sales and account teams, who can share information about the scarcity of candidate profiles with existing and potential customers.

In short, it offers the right candidates at the right time for the recruitment teams, improved allocation and understanding of marketing investments, insights for sales and national account teams about opportunities in the market, more information about pricing and the difficulty of finding profiles to take to customer negotiations, and a better understanding of financial drivers for marketing expenses for the finance department.

The accuracy of the model’s results, the transparency it offers and the ability to optimise it across multiple advertising channels make it a worthwhile investment. The model will continue to help achieve the best possible ad spend and cost per hire, now and in the future.


Machine Learning (ML) configuration

For now, we utilise an open source algorithm called LightGBM (Light Gradient Boosting Machine) [1], in its regression form. It was developed by Microsoft, and it stands out for the accuracy it achieves relative to the computation it needs. Compared with potentially heavy neural networks (NN), LightGBM can provide similar results but is more efficient and therefore less costly.

As the metric, we use the Tweedie loss function. It is in line with the business goals and is commonly used when forecasting figures. It is known for its ability to handle zero-inflated data. Since many combinations of `job opening`, `job function`, and `job region` have zero applicants, this metric is a good fit for our case.
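To make the metric more tangible, the sketch below computes the Tweedie unit deviance for a variance power between 1 and 2, the range in which the distribution allows exact zeros. The value `power=1.5` and the function names are assumptions for illustration; the configuration used in this project is not stated here. In LightGBM, the corresponding settings are the `tweedie` objective and the `tweedie_variance_power` parameter.

```python
def tweedie_deviance(y, mu, power=1.5):
    """Tweedie unit deviance of prediction `mu` for observation `y`.

    Valid for 1 < power < 2, where y may be exactly zero
    and the prediction mu must be strictly positive.
    """
    if not 1.0 < power < 2.0:
        raise ValueError("power must lie strictly between 1 and 2")
    if y < 0 or mu <= 0:
        raise ValueError("y must be >= 0 and mu must be > 0")
    a = y ** (2 - power) / ((1 - power) * (2 - power))
    b = y * mu ** (1 - power) / (1 - power)
    c = mu ** (2 - power) / (2 - power)
    return 2.0 * (a - b + c)


def mean_tweedie_deviance(actuals, predictions, power=1.5):
    """Average unit deviance over paired observations and predictions."""
    pairs = list(zip(actuals, predictions))
    return sum(tweedie_deviance(y, mu, power) for y, mu in pairs) / len(pairs)


# A job opening with zero applicants: a small prediction is penalised
# far less than a large one, and the deviance stays finite.
low = tweedie_deviance(0.0, 0.5)
high = tweedie_deviance(0.0, 4.0)
```

The deviance is zero when the prediction equals the observation and remains well defined for zero counts, which is why this family of losses suits data with many empty combinations.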

To find an equilibrium between the expected number of applicants and the advertising expenses, we use the Dual Annealing algorithm [2], which was chosen for its technical characteristics.
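The trade-off between applicants and spend can be illustrated with a small optimisation sketch. Note the substitution: the code below uses a plain simulated annealing loop written with the standard library, a simplified relative of Dual Annealing, not the algorithm itself (SciPy offers an implementation as `scipy.optimize.dual_annealing`). The response curves, the target, the penalty and all numbers are hypothetical; in the real setting, the expected applicants would come from the trained model.

```python
import math
import random

# Hypothetical response curves per channel:
# (scale, saturation_eur, max_spend_eur). Invented numbers.
CHANNELS = {
    "google_ads": (40.0, 300.0, 1000.0),
    "meta_ads": (30.0, 200.0, 1000.0),
    "email": (10.0, 50.0, 200.0),
}
TARGET_APPLICANTS = 60.0       # assumed weekly demand
SHORTFALL_PENALTY_EUR = 100.0  # assumed cost of each missing applicant


def expected_applicants(spend):
    """Toy stand-in for the model: diminishing returns on spend."""
    return sum(
        scale * math.log1p(spend[name] / saturation)
        for name, (scale, saturation, _) in CHANNELS.items()
    )


def objective(spend):
    """Total spend plus a penalty for falling short of the target."""
    shortfall = max(0.0, TARGET_APPLICANTS - expected_applicants(spend))
    return sum(spend.values()) + SHORTFALL_PENALTY_EUR * shortfall


def anneal(n_steps=20000, start_temperature=200.0, seed=0):
    """Simulated annealing over per-channel spend within its bounds."""
    rng = random.Random(seed)
    names = list(CHANNELS)
    current = {name: 0.0 for name in names}
    current_cost = objective(current)
    best, best_cost = dict(current), current_cost
    for step in range(n_steps):
        # Linear cooling; the small offset avoids division by zero.
        temperature = start_temperature * (1.0 - step / n_steps) + 1e-9
        name = rng.choice(names)
        max_spend = CHANNELS[name][2]
        candidate = dict(current)
        proposal = current[name] + rng.gauss(0.0, 0.05 * max_spend)
        candidate[name] = min(max_spend, max(0.0, proposal))
        candidate_cost = objective(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
    return best, best_cost


best_spend, best_cost = anneal()
```

Accepting occasional worse moves early on lets the search escape poor local allocations, which is the property that makes annealing-style methods attractive for this kind of budget problem.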


[1] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., … & Liu, T. Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.
[2] Dual Annealing




How do we develop?

  • We work with the most popular cloud providers: Azure, AWS and Google Cloud Platform
  • We build scalable cloud infrastructures with automated CI/CD pipelines
  • We help with setting up your cloud architecture, data pipelines or automation projects
  • We do our development in-house


We love to learn new things and we love to share this knowledge with you.