ONTOLOGY-DRIVEN KNOWLEDGE-BASED HEALTH-CARE SYSTEM AN EMERGING AREA - CHALLENGES AND OPPORTUNITIES – INDIAN SCENARIO

Recent decision-making research in public healthcare systems has been strongly influenced by the introduction of ontologies. Ontology-driven systems help policy makers implement healthcare strategies effectively. The central source of knowledge is the ontology, which contains all the relevant domain concepts, such as locations, diseases and environments, together with their domain-sensitive inter-relationships; this is the prime objective and motivation of this paper. The paper focuses on the development of a semantic knowledge base for a public healthcare system. It describes the approach and methodologies used to establish a firm linkage between three different ontologies, related to diseases, places and environments, in one integrated platform. This platform correlates the real-time mechanisms prevailing within the semantic knowledge base and establishes their inter-relationships for the first time in India. It is hoped that this will form a strong foundation for a much-awaited, meaningful healthcare decision-making system in the country. Introducing the approach through a wide range of best practices should facilitate its adoption and support better understanding and long-term outcomes in the area. The methods illustrated in the paper relate to health mapping, reusability of health applications, and interoperability, based on mapping data attributes to ontology concepts to generate semantically integrated data that drives an inference engine for semantic queries issued through a user interface.


INTRODUCTION
Health plays a crucial role in the life of every individual and is a major concern all over the world. Health care systems have gained significant attention and growth in recent years. Health is a complex entity that depends on various factors such as environment, lifestyle, and social and economic status. A complex analysis and a multidisciplinary approach to knowledge are essential to understand the impact of these factors on public health. Environment is an important parameter in public health. The challenges in understanding and addressing health care system issues are the use of big data, non-conformance to standards, and heterogeneous sources (heterogeneous documents and formats), all of which call for immediate attention to multidisciplinary complex data analytics and knowledge bases. Recently, the Ministry of Health and Family Welfare approved the recommendations made by the Medical Council of India on the standards to be followed by health care organisations in the country (SHS, 2013a). These standards are emphasized to be dynamic, active documents that undergo constant and regular changes in line with international standards. The regulations and periodic releases are divided into various categories for ease of adoption of these standards, which are briefly listed as follows:
The primary needs in bringing these standards together are to:
• Promote interoperability for specific content exchange and vocabulary standards to establish a method for semantic interoperability
• Support the evolution, timely maintenance and upgradation of adopted standards
• Promote technical innovation using adopted standards
• Encourage active participation and adoption by all vendors and stakeholders to evolve better healthcare systems
• Keep implementation costs affordable and reasonable for wider usage
• Formulate best practices, experiences, policies and frameworks in a document for wide publicity and exposure
• Adopt and adapt to the standards thus generated, which are modular and not interdependent
Health care data is generated by various sources in diverse formats using different terminologies. Due to the heterogeneous formats and the lack of a common vocabulary, the accessibility and utility of healthcare big data for health data analytics and decision support systems is minimal. Vocabulary standards are used to describe clinical problems and procedures, medications, and allergies. Medical vocabulary standards include, to name a few (Colin et al., 2011; 3MHDD, 2014): Logical Observation Identifiers Names and Codes (LOINC®), International Classification of Diseases (ICD10), Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), Current Procedural Terminology, 4th Edition (CPT 4), Anatomical Therapeutic Chemical (ATC) Classification of Drugs, Gene Ontology (GO), RxNorm, General Equivalence Mappings (GEMs), and Provider Taxonomy.
Effective health care systems can be built only when the large volume of disease data is brought into uniform formats with a common vocabulary that both humans and machines can recognise. Uniform formats, established through standards, need to be made understandable at the man-machine interface through ontology tools and schemes. Ontology therefore plays a vital role in public healthcare systems, increasing awareness of, and the applications for, better and affordable public health schemes. Presently, the following organisations and agencies are using ontologies in health care systems:
The current research work focuses on the development of an ontology-driven semantic knowledge base for a public healthcare system. This paper describes the approach and methodologies used to establish a firm linkage between three different ontologies, related to diseases, places and environments, in one integrated common platform that correlates the true realities existing in the semantic knowledge base and establishes their inter-relationships scientifically.
The rest of the paper is organized as follows. Section 2 gives a brief description of related work, Section 3 describes the current approach to building the knowledge base, Section 4 describes the implementation, Section 5 shows the results, and Section 6 concludes.

RELATED RESEARCH WORK
Development of healthcare systems is very complex, and information technology techniques and applications have brought a new dimension to them. Health care systems have to consider various data sources such as patient, disease, drug and environment data. A centralised and complete knowledge base for a public health care system is essential for data analysis and decision making. The major obstacles to building a centralised knowledge base are semantic and syntactic heterogeneity in health data. Present-day research has produced various methods and techniques, which are summarised briefly below.
Bekkum (2013) illustrates the steps and standards to be followed to develop health care ontologies, and describes various ontologies of the health care domain. Aidarus et al. (2013) described and implemented an ontology-based information retrieval system for a health care information system. Nigel et al. (2010) described a system that identifies health-related events in text documents using text mining techniques and a multilingual ontology. Abdullahi et al. (2010) illustrate the implementation of a web-based GIS system for public healthcare decision support, and describe reusability of health applications and interoperability issues. Stefan and Catalina (2013) discuss interoperability improvement in the health care domain using domain ontologies. Colin et al. (2011) describe issues of ontology alignment, mapping and motivation in the health care domain to support patient data integration and analysis. Gao et al. (2012) illustrate an architecture for semantic health information retrieval that integrates non-spatial and geospatial semantics. Craig and Kuziemsky (2010) described a four-stage approach for an ontology-based health information system; the four stages are 1) specification and conceptualization, 2) formalization, 3) implementation and 4) evaluation and maintenance. David et al. (2012) illustrate an ontology-based decision support system for chronically ill patients that relates a personal ontology to a patient ontology. Furkh and Radziah (2012) discussed the development of medical ontology in the dynamic health care environment. Castellón et al. (Nurefsan et al., 2012) developed an ontology-based GIS system for public health. Medical ontology mapping methods are described in (Lambrix and Tan, 2006; Jean-Mary et al., 2009; Hanif and Aono, 2009).
The study of the various methods and techniques for public health care systems indicates that the following open challenges must be addressed in order to build an effective knowledge base and develop a better health care system:
• Randomly maintained health records generated by various health centres with varying vocabulary structure
• Urgent need to establish semantic relationships between health and environment data
• Non-availability of a centralised knowledge base that includes health, place and environment data
The current research work is structured around the need for a robust methodology that builds a centralised knowledge base by addressing the above challenges in a methodical, well-structured scientific approach.

APPROACH
The objective of the approach is to build an efficient and effective semantic knowledge base for a public health care system. The resulting centralised knowledge base provides semantic associations between disease data and environment and place information. The approach is organized into the following areas:

Data Collection and Extraction
The disease data were assimilated from records collected from hospitals, clinics and medical centres. MOSDAC is the major source of satellite data on environmental factors. The chosen study area is Krishnagiri district, Tamil Nadu, India. Figure 1 shows the study area. No specific standard formats are followed, on par with either national or international health record standards. Recently, the Medical Council of India released "recommendations on EMR and EHR standards", which have been approved by the Ministry of Health and Family Welfare, Government of India (SHS, 2013). As per these approved recommendations, the Minimum Data Set (MDS) required for an EHR to be used in India is given in Table 1. An EHR is an evolving concept defined as a systematic collection of electronic health information about individual patients or a population (Gunter and Terry, 2005). An EMR is a repository of information regarding the health of a subject of care in computer-processable form that can be stored and transmitted securely, and is accessible by multiple authorized users (SHS, 2013a).
Health record data therefore needs to be brought from heterogeneous formats into homogeneous formats. The homogeneous EHR and EMR formats can be derived from the EMR and EHR standards laid down in the regulations of the Medical Council of India. Data from various health organizations also needs to be integrated to build a centralized knowledge base. Resource Description Framework (RDF) is a powerful notation that gives data a common structure and associates semantics with it. Representing health data in RDF is a critical phase in building the central knowledge base for a public health care system: RDF triples are constructed from the health records, so the data is held in a machine-understandable format that supports semantic queries.
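The construction of RDF triples from health records can be illustrated with a minimal sketch. It assumes flat CSV records and serialises each non-empty cell as one N-Triples statement. The namespace `http://example.org/health/`, the column names and the two sample rows are hypothetical placeholders, not values from the study; the implementation reported in this paper uses the Jena API rather than this code.

```python
import csv
import io

# Hypothetical namespace; the paper does not state the URIs it uses.
BASE = "http://example.org/health/"


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def record_to_triples(record, id_field="patient_id"):
    """Turn one flat health record (a dict) into N-Triples lines."""
    subject = f"<{BASE}record/{record[id_field]}>"
    triples = []
    for field, value in record.items():
        if field == id_field or value == "":
            continue  # the id forms the subject; empty cells yield no triple
        predicate = f"<{BASE}property/{field}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def csv_to_triples(text):
    """Read CSV text and return the triples of all rows, in row order."""
    rows = csv.DictReader(io.StringIO(text))
    return [t for row in rows for t in record_to_triples(row)]


# Illustrative rows only; these are not records from the study.
SAMPLE_CSV = """patient_id,disease,place,visit_date
P001,Dengue,Hosur,2013-08-14
P002,Malaria,Krishnagiri,
"""

if __name__ == "__main__":
    for line in csv_to_triples(SAMPLE_CSV):
        print(line)
```

Every value is emitted as a plain literal here; a fuller treatment would replace literals such as the disease or place name with URIs of the corresponding ontology concepts, which is the mapping step described later in the paper.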

Figure 3. Extracting place and environment data from HDF5 files

Domain Ontologies
An ontology is a formal and explicit specification of a shared conceptualization (Gruber, 1993). Domain ontologies provide vocabularies of concepts and their relations within a particular domain. Spatial and non-spatial data play a vital role in health care systems. It is well known that the environment has a high impact on diseases, and there is a strong potential link between place, environment and disease. Hence, a detailed study is needed to establish a firm link and correlation between the concepts of these domains, and ontology plays a vital role in establishing these semantic relations. The ontologies studied for the current research work are disease, place and environment ontologies. The sources of the ontologies are (BBO, 2014; SWEET, 2014; Protégé Ontology, 2014; DAML, 2014; Ontologies and Vocabularies, 2014).

Knowledge Base Formulation
A complete health knowledge base helps policy makers and health professionals to derive conclusions and take the right decisions, and so becomes a very effective decision-making source. Ontology-based semantic techniques provide strong support for building an effective health knowledge base. An effective knowledge base makes it possible (1) to provide a detailed and sound description of the relations between disease, place and environment, (2) to automatically obtain an intervention plan referenced to the health-care requirements, (3) to help physicians in the processes of disease prevention and detection, (4) to facilitate the task of finding the most accurate intervention at each particular moment under the prevailing relationships between disease, place and environment, and (5) to help policy makers evolve national guidelines from the results obtained through these inter-relationships.
The knowledge base formulation is a two-stage process. The integrated ontology represents complete knowledge about the domain, which can be shared among groups. Multiple ontologies of a domain can be merged (Sunitha and Suresh Babu, 2013), and the merge process involves mapping the concepts of the different ontologies of the domain. Figure 4 shows the multiple ontology mapping and merging process. Conceptually, the merge process establishes the mapping between the concepts.
AgreementMaker (Daniel et al., 2013) is an open source ontology mapping tool. It identifies and derives relationships between concepts of different large-scale ontologies. The current research work adopts AgreementMaker to map concepts of different ontologies of the same domain, which can be derived using equation (1).
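To make the idea of concept mapping followed by merging concrete, the following minimal sketch pairs concepts from two ontologies by the string similarity of their labels and then forms the union of the two concept sets. This is a deliberately simple stand-in and not AgreementMaker's matching algorithm; the 0.8 threshold, the function names and the concept labels are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_concepts(source, target, threshold=0.8):
    """Pair each source concept with its best-scoring target concept.

    Returns (source_label, target_label, score) tuples for the pairs
    whose score reaches the threshold.
    """
    mappings = []
    for s in source:
        best = max(target, key=lambda t: label_similarity(s, t))
        score = label_similarity(s, best)
        if score >= threshold:
            mappings.append((s, best, score))
    return mappings


def merge_concepts(source, target, mappings):
    """Union of both concept sets, keeping the source label of mapped pairs."""
    mapped_targets = {t for _, t, _ in mappings}
    return sorted(set(source) | (set(target) - mapped_targets))


# Illustrative labels only, not taken from the ontologies used in the paper.
onto_a = ["Malaria", "Dengue Fever", "Cholera"]
onto_b = ["malaria", "Dengue fever", "Typhoid"]

pairs = map_concepts(onto_a, onto_b)
merged = merge_concepts(onto_a, onto_b, pairs)

if __name__ == "__main__":
    print(pairs)
    print(merged)
```

Real ontology matchers combine lexical, structural and instance-based evidence; label similarity alone is shown only because it keeps the mapping-then-merging sequence easy to follow.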
The final complete merged ontology is obtained by merging the ontologies and the derived properties, denoted as equation (2).

3.3.2.2 Statistical Data Analysis with Mining Techniques Applied on Real Data: Voluminous disease data is available all over the world. Historical data is a major source for correlating the disease and environment ontologies. Effective analysis and mining techniques lead to new derivations and practical conclusions. Applying data analytics and mining techniques yields the correlation coefficient between concepts of diseases and environment. Figure 5 depicts the linkage of ontologies of heterogeneous domains. The correlation coefficient between concepts $c_i^e$ and $c_j^d$ is denoted by $r_{c_i^e, c_j^d}$ and is given by equation (4) (Gupta and Kapoor, 2000). The analysis of the correlation between the concepts of different domains results in new conclusions about the impact of the environment and the causes of diseases. The derived conclusion is denoted as shown in equation (6). The linkage between environment, place and disease is represented as shown in equation (7). Figure 6 shows the association of disease, place and environment data with ontologies.
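As an illustration of the correlation step, the sketch below computes a correlation coefficient between two equal-length series, for example periodic observations of an environment concept and case counts of a disease concept. It assumes that the coefficient cited from Gupta and Kapoor (2000) is the standard Pearson product-moment coefficient, $r = \sum_k (x_k - \bar{x})(y_k - \bar{y}) / \sqrt{\sum_k (x_k - \bar{x})^2 \sum_k (y_k - \bar{y})^2}$; the function name `pearson_r` is introduced here for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value near +1 or -1 indicates a strong linear association between the two concepts and a value near 0 a weak one. The coefficient on its own does not establish that the environment concept causes the disease, which is one reason to combine it with the expert confidence values.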

IMPLEMENTATION MECHANISM
The proposed approach is implemented using Java, the Jena API (Apache Jena, 2014), the Oracle semantic store (Chuck, 2014), Joseki (Joseki, 2013) and the Sgvizler JavaScript API (Martin, 2012). Figures 7 and 8 show the home page and the semantic query interface of the component. To evaluate the proposed approach, health records were collected from various hospitals in Hosur, Tamil Nadu, India. A data set in CSV format was created from the collected data as per the standards recommended by the EMR Standards Committee. The proposed approach takes the sample EHR data set as input, extracts the data, represents it in RDF, and builds the knowledge base by mapping the RDF triples to the appropriate concepts of the domain ontology.

RESULTS
The results from the implementation of this approach are very promising. Figure 9 shows the onto-graph notation of the linked ontologies. Figure 10 depicts the RDF representation of sample EHR data.

CONCLUSION
The ontology-centric approach provides an innovative, effective and efficient means of capturing and organizing knowledge to represent the medical healthcare domain for a healthcare management system, aiding national policy makers and the general public, which is presently the need of the hour. This paper presents a robust methodology to develop and implement a centralised semantic knowledge base for the healthcare system in India. The knowledge base includes a mapping of diseases to place and environment ontologies. This is achieved by extracting data from structurally and semantically heterogeneous health records, representing the extracted data in RDF triple format, and associating the RDF triples with the appropriate domain ontology concepts. New data is then inferred from the domain ontologies and RDF triples by Oracle's native inference engine using OWLPrime rules. The knowledge base developed for public healthcare using the present approach supports spatial and non-spatial semantic queries, enabling public health care stakeholders to take effective and efficient decisions. The approach is evaluated using data collected from hospitals at Krishnagiri, Tamil Nadu, India.

Figure 1. Study area
3.1.1 Health and Disease Data Assimilation: Data is collected from various primary health centres and Government hospitals. In practice, documents and data are scattered across different formats. Most hospitals in India maintain large amounts of data in manual records and in tabular or Excel sheets.
Need for Standards for EMR and EHR: Reliable data is the major source for an effective health care management system. Large volumes of disease data are available in various hospitals in different formats. Organising this data with effective data exchange, reuse and analysis techniques helps in deriving new and effective conclusions. Data exchange and reuse are possible when the data is stored in standard formats that achieve interoperability. The current research work produces a technical solution that incorporates standards for interoperability of disease records. The knowledge base is generated from the available voluminous health data, which is then maintained for records. This is achieved by machine learning and data extraction techniques. Disease data present in various formats is brought into the prescribed EMR and EHR formats by the data extraction and machine learning algorithms shown in Figure 2. A training algorithm, supported by a training data set, extracts the data and outputs it in the standard EMR and EHR formats.
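The effect of bringing differently labelled records into one prescribed format can be sketched as follows. The paper uses machine learning and a training data set for this step; the sketch substitutes a hand-written synonym table purely to show the target behaviour. The canonical field names and synonyms are hypothetical and are not the Minimum Data Set of Table 1.

```python
# Hypothetical canonical fields and header synonyms, for illustration only.
SYNONYMS = {
    "patient_id": {"patient id", "pid", "uhid"},
    "disease": {"disease", "diagnosis", "illness"},
    "place": {"place", "village", "town", "location"},
}


def canonical_field(header):
    """Return the canonical field for a raw column header, or None."""
    # Normalise case, underscores and repeated whitespace before lookup.
    key = " ".join(header.strip().lower().replace("_", " ").split())
    for field, names in SYNONYMS.items():
        if key == field.replace("_", " ") or key in names:
            return field
    return None


def normalise_record(raw):
    """Split a raw record into (standardised fields, unmapped fields)."""
    standard, unmapped = {}, {}
    for header, value in raw.items():
        field = canonical_field(header)
        if field is None:
            unmapped[header] = value  # kept aside for manual review
        else:
            standard[field] = value.strip()
    return standard, unmapped
```

For example, `normalise_record({"PID": " P001 ", "Diagnosis": "Dengue"})` yields the standard fields `patient_id` and `disease`. Headers that match no synonym are returned separately rather than dropped, so that no source data is lost silently.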

Figure 2. Data extraction from health records into EMR and EHR formats


• Merging and unification of multiple ontologies of a domain
• Finding the correlation between disease, place and environment
An ontology is a collection of concepts and properties that describe relationships. The various ontological elements can be formally defined as:
concept of the $i$-th ontology of domain $z$
$p_{ij}^{z}$: $j$-th property of the $i$-th ontology of domain $z$
$P'_{ij}$: $j$-th derived property of the $i$-th ontology
$p_{ij}^{\prime H}$: derived property between the $i$-th concept of one ontology and the $j$-th concept of another ontology in heterogeneous domains
$I_{ijk}^{z}$: $k$-th individual of the $j$-th concept of the $i$-th ontology of domain $z$
3.3.1 Merging and Unification of Multiple Ontologies of a Domain: Many researchers have developed multiple ontologies to represent the knowledge of various domains. There is a strong need to integrate the ontologies of a specific domain, enhancing the reusability of existing ontologies and reducing duplicated knowledge representation within the same domain.

Figure 4. Multiple ontology mapping and merging
3.3.2 Correlation between Disease, Place and Environment: Mapping the disease ontology to the place and environment ontologies relates health data to the environment and to places. Ontology mapping helps build an effective health knowledge base that supports spatial and non-spatial semantic queries and efficient decision making. A firm correlation can be derived in two stages:
• Domain expert knowledge
• Statistical data analysis with mining techniques applied on real data
3.3.2.1 Domain Expert Knowledge: Because the domain is highly technical and significant, building the knowledge base requires strong collaboration among multiple domain experts. The expert knowledge can be framed in the form of a confidence matrix. The matrix identifies the strength of correlation between the concepts of different domain ontologies, and is formally defined in Table 2.
$c_i^e$ and $j$-th concept of disease: $c_j^d$, with $0 \le f_{ij}^{ed} \le 1$, $1 \le i \le n$ and $1 \le j \le m$. The relation between the concepts can be derived using equation (3).
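The confidence matrix can be pictured as an $n \times m$ array whose entry in row $i$ and column $j$ is the expert confidence $f_{ij}^{ed}$ between the $i$-th environment concept and the $j$-th disease concept. The sketch below checks the stated bounds and lists the concept pairs whose confidence reaches a cut-off. The 0.5 cut-off, the function names and the numeric values are assumptions for illustration and are not expert assessments.

```python
def validate_confidence_matrix(matrix, n, m):
    """Check that the matrix is n x m and every entry lies in [0, 1]."""
    if len(matrix) != n or any(len(row) != m for row in matrix):
        raise ValueError(f"expected a {n} x {m} matrix")
    for row in matrix:
        for f in row:
            if not 0.0 <= f <= 1.0:
                raise ValueError(f"confidence {f} is outside [0, 1]")
    return True


def strong_links(matrix, env_concepts, disease_concepts, threshold=0.5):
    """List (environment, disease, confidence) for entries >= threshold."""
    return [
        (env_concepts[i], disease_concepts[j], f)
        for i, row in enumerate(matrix)
        for j, f in enumerate(row)
        if f >= threshold
    ]


# Placeholder concepts and values; not elicited from domain experts.
env = ["E1", "E2", "E3"]
dis = ["D1", "D2"]
f_ed = [
    [0.9, 0.1],
    [0.4, 0.7],
    [0.0, 0.5],
]

if __name__ == "__main__":
    validate_confidence_matrix(f_ed, n=len(env), m=len(dis))
    print(strong_links(f_ed, env, dis))
```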

Figure 5. Linkage of ontologies of heterogeneous domains
For instance, "cause" is the significant relation between environment and disease, and the correlation coefficient of the cause relationship can be defined as shown in equation (5).

Figure 6. Association of disease, place and environment data with ontologies

Figure 9. Mapped ontology: maps disease, place and environment ontologies

Figure 11. Results of Q1

Table 1. Standards of EHR approved by the Ministry of Health and Family Welfare.

3 Data Formulation from Spatial and Environmental Parameters:
The Meteorological and Oceanographic Satellite Data Archival Centre (MOSDAC) (MOSDAC, 2013) is a major source of environment and place data. The data is received and made available to the user at regular intervals in HDF5 format. The current work develops an algorithm using the Java HDF API, a service provided by the HDF Group (JHDF5, 2014). The algorithm extracts data from HDF5 files, which can then be stored in formats such as CSV and RDF, as shown in Figure 3.

Table 2. Concept correlation table
$f_{ij}^{ed}$ denotes the correlation between the $i$-th concept of the environment: