Methodology

01. How do your machine learning models work?


Our natural-language processing models analyse 120 million peer-reviewed academic articles, seeking instances where a sustainability concept is mentioned in relation to one of the 12,000 products in our universe. Each such sentence is then interpreted to determine whether the product has a positive or negative impact on the sustainability concept.


For example, our models may pick up the sentence: “Electric vehicles contribute to the reduction of greenhouse gas emissions.” Here, “electric vehicle” is the product, and “greenhouse gas” is the sustainability concept. Since the former is said to contribute to the “reduction” of the latter, we conclude that electric vehicles decrease greenhouse gases.
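To illustrate the idea only, the example above can be sketched with simple keyword matching. This is not how our models are implemented: the vocabularies, cue words and the function name extract_relationship are hypothetical stand-ins for the real product universe, concept list and language model.

```python
# Illustrative only: tiny hand-written vocabularies stand in for the real
# product universe, concept list and language model (all assumptions).
PRODUCTS = {"electric vehicle"}
CONCEPTS = {"greenhouse gas"}
DECREASE_CUES = {"reduce", "reduction", "decrease", "lower"}
INCREASE_CUES = {"increase", "raise", "worsen"}


def extract_relationship(sentence):
    """Return (product, direction, concept), or None if nothing is found."""
    text = sentence.lower()
    # Find the first known product and concept mentioned in the sentence.
    product = next((p for p in PRODUCTS if p in text), None)
    concept = next((c for c in CONCEPTS if c in text), None)
    if product is None or concept is None:
        return None
    # Decide the direction of the relationship from simple cue words.
    if any(cue in text for cue in DECREASE_CUES):
        direction = "decreases"
    elif any(cue in text for cue in INCREASE_CUES):
        direction = "increases"
    else:
        return None
    return (product, direction, concept)


sentence = "Electric vehicles contribute to the reduction of greenhouse gas emissions."
print(extract_relationship(sentence))
```

A keyword matcher like this would misread negations and hedged statements, which is one reason a sentence has to be interpreted rather than just scanned for terms.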


02. How do you come up with your sustainability framework and concepts?


We use the SDGs as our sustainability framework. There are 17 SDGs, which are subdivided into a total of 169 targets. We have further divided these into more granular sustainability concepts that underpin all goals and targets.
For example, one of the targets of SDG 15 (Life on Land) is the protection of biodiversity and natural habitats. To this SDG and target we have mapped concepts such as “biodiversity”, “natural habitat” and “habitat loss”, each of which either contributes to or negatively affects the SDG and target.
We have applied this approach to all 17 SDGs, giving us a universe of over 2,000 sustainability concepts that make up our sustainability framework.
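One way to picture this mapping is as a flat list of concept records, each tied to an SDG and a target. The sketch below is an assumption about shape only: the class name ConceptMapping, the polarity field and its values are hypothetical, and the target is given by description rather than by an official target number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptMapping:
    concept: str
    sdg: int
    target: str    # target description, not an official target number
    polarity: int  # assumed: +1 contributes to the target, -1 negatively affects it


BIODIVERSITY_TARGET = "protection of biodiversity and natural habitats"

# Hypothetical slice of the framework, covering only the SDG 15 example.
FRAMEWORK = [
    ConceptMapping("biodiversity", 15, BIODIVERSITY_TARGET, +1),
    ConceptMapping("natural habitat", 15, BIODIVERSITY_TARGET, +1),
    ConceptMapping("habitat loss", 15, BIODIVERSITY_TARGET, -1),
]


def concepts_for_sdg(sdg):
    """List the concepts mapped to a given SDG number."""
    return [m.concept for m in FRAMEWORK if m.sdg == sdg]


print(concepts_for_sdg(15))
```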


03. Does historical data have the same quality?


Our data quality is consistent across all the years we assess. While the products and services offered by a company may change year-on-year, we capture that information and apply the same methodology to the company for each year.


04. What is your rationale behind mapping all the SDGs?


We do not exclude any SDG from our analysis, for two reasons.
First, covering every SDG provides a more holistic perspective of a company’s impact. Other providers classify some SDGs as non-investable and therefore yield no data for them, but in our view that approach fails to capture a company’s real-world impact in its entirety.
Second, the SDGs overlap, so including all of them helps us capture all concepts and their relationships. Underpinning our sustainability framework are 2,000+ sustainability concepts associated with the SDGs, which can include some connections that may not be immediately obvious. For example, we map “Global Warming” to SDG 1 (No Poverty), as global warming and extreme weather increase poverty levels by displacing or destabilising communities.
That said, we only positively or negatively map a product to an SDG if there is academic evidence connecting them.


05. What type of academic data do you include? From what date?


Our corpus of 120 million peer-reviewed academic articles is sourced from journals covering disciplines including medicine, science, social science, engineering and the environment, amongst others. The articles date from the 1950s onwards.


06. How do you approach the conflicting evidence around specific topics?

 

For a given product and SDG, if we were to find 10 positive relationships and 10 negative relationships, all of equal quality, they would net out. The final impact score for the product would be 0 (no impact), since the evidence conflicts and we cannot make a judgement.
However, our models factor in the quality of each relationship, which has a bearing on the outcome. If the positive relationships were of higher quality than the negative ones, for instance, the final impact score would be positive.
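The netting logic can be sketched as a quality-weighted sum. This is a simplified illustration, not our scoring formula: the function names, the 0-to-1 quality scale and the plain summation are all assumptions.

```python
def net_impact_score(relationships):
    """Sum direction * quality over (direction, quality) pairs.

    direction is +1 (positive) or -1 (negative); quality is a
    non-negative weight on a hypothetical scale.
    """
    return sum(direction * quality for direction, quality in relationships)


def impact_label(score):
    """Translate a net score into a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "no impact"


# 10 positive and 10 negative relationships of equal quality net out.
equal_quality = [(+1, 1.0)] * 10 + [(-1, 1.0)] * 10
print(impact_label(net_impact_score(equal_quality)))

# Higher-quality positive evidence tips the balance.
better_positives = [(+1, 0.9)] * 10 + [(-1, 0.5)] * 10
print(impact_label(net_impact_score(better_positives)))
```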


07. How do you account for new evidence as it emerges?


Our models include a ‘time decay’ element. More recent evidence connecting a product and a sustainability concept is given more weight than older evidence.
The weighting is assigned relatively, as opposed to absolutely. If no new evidence has emerged for, say, 40 years (as is the case for tobacco’s connection with lung cancer), the 40-year-old evidence is still considered the most recent evidence and is weighted accordingly, with no time decay applied. If no new academic research has been published on a topic, this indicates minimal opposition to the established academic consensus.
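A minimal sketch of relative weighting, assuming an exponential decay measured from the most recent piece of evidence rather than from today. The decay shape, the half_life parameter and the function name are hypothetical and chosen only to show the relative behaviour.

```python
def relative_time_weights(years, half_life=10.0):
    """Weight each publication year relative to the most recent one.

    The newest evidence always gets weight 1.0, however old it is in
    absolute terms; each half_life years of age before that halves the
    weight. Exponential decay and half_life are assumptions.
    """
    if not years:
        return []
    latest = max(years)
    return [0.5 ** ((latest - year) / half_life) for year in years]


# A single 40-year-old study is still the most recent evidence,
# so it keeps full weight.
print(relative_time_weights([1984]))

# With newer evidence available, the older study is discounted.
print(relative_time_weights([2000, 2010], half_life=10.0))
```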


08. How transparent is the information, and what is the data driven by?


We have identified thousands of relationships. Due to the sheer volume of our dataset, it is not
easily shared. Nevertheless, we can provide case studies of individual companies, analysis of
specific products and examples of the evidence we find in the academic text on request.