By Amrapali Zaveri

Published on November 29, 2017

The World Wide Web has transformed the way we share information by publishing documents as part of a global information space. This Web of Documents contains hypertext links to other web pages that enable users to retrieve this information using Web browsers. Even though this has brought huge benefits and access to vast amounts of information, the underlying data itself is stored without much structure or semantics (meaning).

Additionally, the data is stored in disparate silos and in different formats (CSV, XML, HTML etc.), which are not expressive enough to link individual facts or entities in one document to related facts or entities in another document.

In recent years, the Linked Data paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration, where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, which is what makes this the Semantic Web. Linked Data rests on four principles, which essentially amount to a set of best practices for publishing and connecting structured data on the Web. This allows information to be published and exchanged in an interoperable and reusable fashion.

These Linked Data principles have already been adopted across many different communities on the Internet, such as geographic, media, life sciences and government. Currently more than 50 billion facts are represented in a standard, machine-readable and interoperable format.

Linked Data has, in turn, enabled many interesting use cases that would not have been possible before, particularly in the life science domain, which has a huge impact on the healthcare and pharmaceutical industries. For example, in a study titled "Using the Web of Data for Evaluating the Research-Disease Disparity", Linked Data was used to evaluate the disparity between the amount of research and the burden of disease. It is widely accepted that there is a large imbalance between the prevalence of diseases in the world, especially in developing countries, and the availability of treatment options and research investments to address these conditions.

A classic example is tuberculosis, a disease that is widespread in most developing nations, such as India, but for which treatment schemes have barely been updated for half a century. Among the factors contributing to this disparity is the difficulty policy makers face in accessing and processing timely information about these diseases in order to choose where to invest their resources. With the ever-increasing data on diseases and health care research available as Linked Data, this problem was tackled by combining relevant information from three datasets, (i) World Bank, (ii) World Health Organisation and (iii) Clinical Trials, to analyse this disparity empirically.
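To make the idea of combining datasets concrete, here is a minimal sketch in Python, not the method used in the study. It assumes each dataset exposes facts as (subject, predicate, object) triples that share the same identifier for a disease. The `example.org` namespace, the property names `burdenScore` and `trialCount`, the helper functions and all values are hypothetical and invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical namespace and property names, invented for illustration only.
EX = "http://example.org/"

# Each fact is a (subject, predicate, object) triple. All values are made up.
burden_dataset = [
    (EX + "DiseaseA", EX + "burdenScore", 10),
    (EX + "DiseaseB", EX + "burdenScore", 2),
]
trials_dataset = [
    (EX + "DiseaseA", EX + "trialCount", 1),
    (EX + "DiseaseB", EX + "trialCount", 4),
]


def merge_by_subject(*datasets):
    """Group facts from several datasets under their shared subject URI."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            merged[subject][predicate] = obj
    return dict(merged)


def trials_per_burden(merged):
    """Toy metric: number of trials per unit of burden.

    Subjects missing either fact, or with a burden of zero, are skipped.
    """
    return {
        subject: facts[EX + "trialCount"] / facts[EX + "burdenScore"]
        for subject, facts in merged.items()
        if EX + "trialCount" in facts and facts.get(EX + "burdenScore")
    }


merged = merge_by_subject(burden_dataset, trials_dataset)
ratios = trials_per_burden(merged)

# List subjects from the lowest ratio (least research relative to burden) upward.
for subject, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(subject, ratio)
```

Because both toy datasets use the same identifier for each disease, the merge needs no format conversion or name matching, which is the practical benefit that shared identifiers are meant to provide.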

This study contributed metrics and analyses of the available data and provided insights into where resources were either over-invested or under-invested. This, in turn, helped reduce the information gap, allowing for better policy in response to these disparities for particular diseases in specific countries.

However, this is only a first stepping stone towards a more robust, timely and continuous monitoring system that would capture, store and provide accurate information using the Linked Data principles, and that could be used to tackle life science problems. The ultimate goal of Linked Data is to provide meaningful, structured and interoperable data that can be consumed intelligently to empower data-driven scientific discoveries.

(Amrapali Zaveri is a researcher at the Institute of Data Science, Maastricht University, Netherlands. Her research interests include data quality, knowledge interlinking and fusion, and biomedical and health care research.)