MULTINOMIAL LOGISTIC REGRESSION PREDICTED PROBABILITY MAP TO VISUALIZE THE INFLUENCE OF SOCIO-ECONOMIC FACTORS ON BREAST CANCER OCCURRENCE IN SOUTHERN KARNATAKA

Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) The urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology was visualized spatially. ii) Socio-economic risk factors describing the breast cancer occurrences were compiled for each case. These data were then analysed using multinomial logistic regression in SPSS statistical software; relations between the occurrence of breast cancer across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed. iii) The model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.


INTRODUCTION

1.1 Global situation of Breast Cancer
Globally, breast cancer is the commonest cancer in women, representing 23% of all female cancers. Incidence rates show marked geographical variation, from 27.3 per 100,000 women in less developed countries to 66.6 per 100,000 women in more developed countries. However, breast cancer mortality rates are high in low income countries, whereas survival rates are better in high income countries.

Breast cancer in India
Recent National Cancer Registry Program (NCRP) reports show that there has been a rising incidence of breast cancer in India. It is essential to understand how the disease burden is shared among women from different socio-economic backgrounds and how risk factors associated with breast cancer can be predicted. [...] and Chamrajnagar districts were evaluated. Independent variables associated with the occurrence of breast cancer included age, education, occupation, residential location, socioeconomic status, marital status, religion, parity and availability of health insurance facilities. The models were developed as follows: i) Rural-urban distribution of breast cancer cases was mapped using a Geographic Information System. ii) Rural-urban differences across socio-economic status were analysed using chi-square analysis. To study the extent of their influence across socio-economic status, the various factors were added as input to a multinomial logistic regression analysis. iii) Predicted probability mapping of the results of the multinomial logistic regression analysis.
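The chi-square analysis in step ii) can be illustrated with a short Python sketch that computes the Pearson chi-square statistic for a contingency table of residence by socio-economic status. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data, and the study itself reports using SPSS.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts (NOT from the study): rows = urban, rural;
# columns = high, middle, low socio-economic status
counts = [[30, 50, 20],
          [20, 40, 40]]
statistic, df = pearson_chi_square(counts)
print(f"chi-square = {statistic:.3f}, df = {df}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom would suggest that the rural-urban distribution of cases differs across socio-economic groups.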

Logistic regression Analysis:
Logistic regression (Kleinbaum, 1994; Hosmer and Lemeshow, 2000; Helsel and Hirsch, 2002) is a statistical method that predicts the probability of an event occurring, in this case the occurrence of breast cancer. Logistic regression is conceptually similar to multiple linear regression, because relations between one dependent variable and several independent variables are evaluated. Whereas multiple linear regression returns a continuous value for the dependent variable, logistic regression returns the probability of a positive binomial outcome (in this case, occurrence of breast cancer) in the form:

\[ P = \frac{e^{x}}{1 + e^{x}} \]

where P is the probability of breast cancer occurrence (multiplied by 100 when expressed in percent); x is β0 + β1x1 + β2x2 + … + βixi; βi are the logistic regression coefficients; xi are the values of the independent variables, such as age, education, occupation, marital status, socioeconomic status, parity, religion and health insurance facility; and i is the number of variables.
Logistic regression calculates several statistical parameters that determine the predictive success of the model (Kleinbaum, 1994; Hosmer and Lemeshow, 2000; Garson G D, 2011). The p-value calculated for each independent variable indicates the statistical significance of that variable in the overall logistic regression model.
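The logistic form P = e^x / (1 + e^x), with x = β0 + β1x1 + … + βixi, can be sketched in Python as follows. The coefficient values and the function name are hypothetical and serve only to show the arithmetic; they are not estimates from this study.

```python
import math

def logistic_probability(intercept, coefficients, values):
    """Return P = e^x / (1 + e^x) for x = b0 + sum(b_i * x_i)."""
    x = intercept + sum(b * v for b, v in zip(coefficients, values))
    # Numerically stable evaluation of e^x / (1 + e^x)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

# Hypothetical coefficients for two indicator variables (NOT the paper's estimates)
b0 = -1.0
betas = [0.8, 0.5]
case = [1, 0]  # first indicator present, second absent
p = logistic_probability(b0, betas, case)
print(f"P = {p:.4f} ({100 * p:.1f} percent)")
```

Here x = -1.0 + 0.8 = -0.2, so P is a little below one half; a positive coefficient raises the probability when its variable is present.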

Multinomial logistic regression analysis:
The multinomial (polytomous) logistic regression model is an extension of the binomial logistic regression model (Schwab J A, 2002; Friedman J, 2010). It is used when the dependent variable has more than two nominal or unordered categories. Like binary logistic regression, multinomial logistic regression uses maximum likelihood estimation to evaluate the probability of categorical membership.
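In symbols, the standard multinomial logit model can be written as below. The notation is introduced here for illustration and is not taken from the original text: J is the number of outcome categories, category J is the reference, and η_j is the linear predictor of category j, built from coefficients β_jk and independent variables x_k in the same way as x in the binary case.

```latex
\[
\log\frac{P(Y=j)}{P(Y=J)} = \eta_j
  = \beta_{j0} + \beta_{j1}x_1 + \dots + \beta_{ji}x_i,
  \qquad j = 1, \dots, J-1
\]
\[
P(Y=j) = \frac{e^{\eta_j}}{1 + \sum_{m=1}^{J-1} e^{\eta_m}},
\qquad
P(Y=J) = \frac{1}{1 + \sum_{m=1}^{J-1} e^{\eta_m}}
\]
```

With J = 2 the second line reduces to the binary logistic form, which is why the multinomial model is described as an extension of the binomial one.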

Advantages of multinomial regression analysis over other multivariate analysis:
Most multivariate analyses require the basic assumptions of normality and continuous data for the independent and/or dependent variables. Tabanick et al (2001) argued that the multinomial logistic regression (MLR) technique has a number of advantages: i) it is more robust to violations of the assumptions of multivariate normality and equal variance-covariance matrices across groups; ii) its diagnostic statistics are easily interpretable; iii) most importantly, MLR does not assume a linear relationship between the dependent and independent variables; iv) independent variables need not be interval; v) MLR does not require that the independent variables be unbounded; and lastly vi) normally distributed error terms are not assumed.
With the above advantages, MLR (Vittinghoff E, 2005) is widely used as a problem-solving tool, particularly in psychology, mathematical finance, engineering and medicine, especially for risk analysis and for identifying risk factors for a given condition, event or disease. Data analysis was carried out with the aid of both descriptive and inferential analysis.

Results of Multinomial Regression Analysis
Null Hypothesis (H0) for the MLR = There is no difference between the model without independent variables and the model with independent variables.

Alternate Hypothesis (Ha) for the MLR = There is a difference between the model without independent variables and the model with independent variables.

First, the overall test of relationship was considered. Second, the strength of the MLR relationship was tested. Lastly, the usefulness of the logistic model and the relationship between the dependent and independent variables were evaluated.

Overall test of relationship
The first step in MLR for any risk analysis is to describe the overall test of relationship between the dependent and independent variables. The model fitting information in Table 3.3 describes this relationship and shows that the probability of the model chi-square (455.235) was 0.000, which is less than the level of significance of 0.05 (i.e. p < 0.05). The null hypothesis of no difference between the models with and without independent variables was therefore rejected.
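The model chi-square is the difference in -2 log-likelihood between the intercept-only model and the fitted model, and its p-value is the upper tail of a chi-square distribution. The sketch below shows how a statistic of 455.235 leads to a p-value that prints as 0.000. The degrees of freedom are not stated in the text, so the value 16 is purely hypothetical, and the helper function uses a closed form that holds only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(statistic, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, which is exact
    only when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports even, positive df only")
    half = statistic / 2.0
    term = 1.0   # the k = 0 term, (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

model_chi_square = 455.235  # reported in the text
df = 16                     # HYPOTHETICAL: not given in the text
p_value = chi2_sf_even_df(model_chi_square, df)
reject_null = p_value < 0.05
print(f"p = {p_value:.3e}, reject H0: {reject_null}")
```

For a statistic this large the p-value is far below 0.05 for any plausible number of degrees of freedom, which is consistent with the reported 0.000.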

Urban and Rural Mapping of breast cancer in Southern Karnataka

Logit Model of breast cancer
Because the response variable, occurrence of breast cancer in high, middle and low income families, has three categories, two logit models are computed: one comparing occurrence of breast cancer in middle income families with the reference category, and one comparing occurrence of breast cancer in low income families with the reference category (occurrence of breast cancer in high income families). The occurrence of breast cancer across socio-economic status can therefore be represented using two (i.e., j - 1) logit models (log P).
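As a minimal sketch of how two logits against a reference category yield three category probabilities, the Python function below treats the high income category as the reference (logit fixed at 0). The function name and the logit values are hypothetical; they are not the fitted values of this study.

```python
import math

def category_probabilities(logit_middle, logit_low):
    """Convert two logits (each relative to the high income reference)
    into probabilities for the three categories."""
    # The reference category has a logit of 0 by construction
    logits = {"high": 0.0, "middle": logit_middle, "low": logit_low}
    # Subtract the maximum before exponentiating for numerical stability
    largest = max(logits.values())
    exps = {name: math.exp(value - largest) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Hypothetical logits for one case record (NOT estimates from the study)
probs = category_probabilities(logit_middle=0.7, logit_low=-0.3)
for name, value in probs.items():
    print(f"{name}: {value:.3f}")
```

The three probabilities sum to one, and the log of the ratio of any non-reference probability to the reference probability recovers the corresponding logit.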

Logistic regression equation
log Pr [Y = Breast cancer in middle class] [...]
[...] employed) + 1.848(nulliparous) + 2.43(age < 40 years)

The probabilities saved for each case record by the multinomial logistic regression were linked to the base map. Geostatistical modelling such as kriging (Croner 2013; C Childs, 2004) then generated maps showing areas of high and low probability of breast cancer.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-8, 2014. ISPRS Technical Commission VIII Symposium, 09-12 December 2014, Hyderabad, India. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-8-193-2014