
From website content to knowledge graph

Recent years have seen major advances in search technology, both in the amount and type of intelligence applied to analyzing searchable content and in the understanding of user queries.

In the resulting evolution from “matching text” to “matching concepts”, representations of domain semantics play a central role.

Despite their importance, creating such representations from scratch for previously “unknown” domains requires considerable input from domain experts and is often prohibitively expensive.

Our demo at this year’s SwissText shows how to derive an initial, approximate representation of a domain’s semantics directly from the searchable corpus, which can then be used to identify properties of, and relationships between, domain concepts and entities.

The analysis chain involves identifying the pertinent elements of the text using syntactic analysis and NLP; enriching these elements with existing, general-purpose thesauri for the language, such as Wikidata; applying embeddings to capture each element’s context; and combining all of the above into a concept-based knowledge graph.
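To make the chain concrete, below is a minimal sketch of what such a pipeline could look like. It assumes spaCy for syntactic analysis, sentence-transformers for context embeddings, and networkx for the resulting graph; the Wikidata lookup is a hypothetical placeholder, not the linker used in the demo.

```python
# Sketch of the analysis chain: extract elements, enrich them, embed context,
# and combine everything into a concept graph. Library choices are illustrative.
import spacy
import networkx as nx
from sentence_transformers import SentenceTransformer

nlp = spacy.load("en_core_web_sm")
embedder = SentenceTransformer("all-MiniLM-L6-v2")


def link_to_wikidata(term: str):
    """Placeholder: return a candidate Wikidata QID for a term.

    A real pipeline would query the Wikidata API or a local index here.
    """
    return None


def build_graph(documents):
    graph = nx.Graph()
    for text in documents:
        doc = nlp(text)
        for sent in doc.sents:
            # 1. Identify pertinent elements via syntactic analysis (noun chunks).
            terms = sorted({chunk.text.lower() for chunk in sent.noun_chunks})
            # 2. Enrich elements with a general-purpose thesaurus such as Wikidata.
            for term in terms:
                graph.add_node(term, wikidata_id=link_to_wikidata(term))
            # 3. Embed the sentence to capture the context the elements occur in.
            context_vec = embedder.encode(sent.text)
            # 4. Combine: co-occurrence within a sentence becomes an edge,
            #    carrying the context embedding for later relation typing.
            for i, a in enumerate(terms):
                for b in terms[i + 1:]:
                    graph.add_edge(a, b, context=context_vec)
    return graph


if __name__ == "__main__":
    g = build_graph(["Search engines match concepts, not just text."])
    print(g.nodes(data=True))
```

In this sketch, sentence-level co-occurrence is used as a cheap first approximation of relatedness; the stored context embeddings can later support clustering terms into concepts and typing the relationships between them.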

The demonstrated approach is applicable across application domains to support query understanding, relevance scoring, and user interaction with the content in question.