APPLICATION FOR 3D SCENE UNDERSTANDING IN DETECTING DISCHARGE OF DOMESTIC WASTE ALONG COMPLEX URBAN RIVERS

In our study we use 3D scene understanding to detect the discharge of domestic solid waste along an urban river. Solid waste found along the Ciliwung River in the neighbourhoods of Bukit Duri and Kampung Melayu may be attributed to households. This is in part due to inadequate municipal waste infrastructure and services, which have caused those living along the river to rely upon it for waste disposal. However, there has been little research to understand the prevalence of household waste along the river. Our aim is to develop a methodology that deploys a low cost sensor to identify point source discharge of solid waste using image classification methods. To demonstrate this we describe the following five-step method: 1) a strip of GoPro images is captured photogrammetrically and processed for dense point cloud generation; 2) depth for each image is generated through a backward projection of the point clouds; 3) a supervised image classification method based on the Random Forest classifier is applied to the view dependent red, green, blue and depth (RGB-D) data; 4) point discharge locations of solid waste are then mapped by projecting the classified images to the 3D point clouds; 5) the landscape elements are classified into five types: vegetation, human settlement, soil, water and solid waste. While this work is still ongoing, the initial results demonstrate that it is possible to perform quantitative studies that may help reveal and estimate the amount of waste present along the river bank.


INTRODUCTION
Due to inadequate infrastructure and service support, residents along river banks in developing cities rely upon rivers for multiple environmental services (Steinberg, 2007; Vollmer and Gret-Regamey, 2013). These services include harvesting of plants, direct sanitary use, sewage disposal, recreation, solid waste disposal and groundwater use. This paper focuses on the detection of visible solid waste along the lower reaches of the Ciliwung River in Jakarta, Indonesia. This waste contributes to environmental degradation, including poor water quality, and may be linked to ill health (Marschiavelli, 2008). Furthermore, field campaigns and model simulations have demonstrated how measures proposed to solve the flood problem might further deplete water quality if the pollution load is not reduced (Costa et al., 2016). Therefore, the ability to identify and monitor point sources of solid waste disposal may aid in the development of strategies to reduce the pollution load.
The lack of spatially explicit information on urban conditions, for example the prevalence of waste disposal along the river banks, hinders feasible impact assessments (Padawangi et al., 2016). To date, limited targeted empirical analysis has been undertaken on the interactions between urban informal settlements and the environmental services they demand (Vollmer and Gret-Regamey, 2013). We acknowledge the complex reality of the urban river system, which is characterised by a heterogeneous distribution of both environmental attitudes and built spaces. Studies have shown that both sociocultural and spatial information can be useful in improving analysis and involving participatory collaboration (Prescott and Ninsalam, 2016; Padawangi et al., 2016; Curtis et al., 2013).
Often such research has relied on spatial extrapolation from surveys and interviews to locate problem areas (Vollmer et al., 2015). Furthermore, the research may be limited by the researchers' capability to gather information on the ground, and is consequently constrained to the neighbourhood scale (e.g. Padawangi et al., 2016). In response to these challenges, researchers are turning to other methods of capturing spatial data. For example, video was used to capture street and building scale spatial information (trash accumulation, standing water etc.) and to map the health risks of urban environments in Haiti (Curtis et al., 2013). We are motivated to contribute to this approach by analysing images acquired along a river (river based images) to study the prevalence of waste along the river banks.
In this work, we study the application of 3D scene understanding in detecting discharge of domestic waste along complex urban rivers. In our methodology section we elaborate on how we utilised a strip of GoPro images, captured photogrammetrically, and the derived dense point cloud for our purpose. Depth for each image is generated through a backward projection of the point clouds. A supervised image classification method based on the Random Forest (RF) classifier is then applied to the view dependent red, green, blue and depth (RGB-D) data. Point discharge locations can then be mapped by projecting the classified images to the 3D point clouds. Subsequently, the landscape elements are classified into five types: vegetation, human settlement, soil, water and solid waste. In the experimental results section, we present the statistics for the solid waste detected within the scene. We conclude by highlighting how this novel application contributes to producing spatially explicit detection of solid waste along rivers. We speculate on the potential monitoring efforts that may be derived through the application of 3D scene detection. In so doing, this method can improve analysis approaches for policy and design action.

RELATED WORK
The use of remotely sensed data to conduct hydromorphological assessment and monitoring of rivers is increasingly prevalent, especially at the national scale (Bizzi et al., 2016). However, the application of photogrammetry and structure from motion has made the acquisition of fluvial topographic data more accessible, and smaller data sets are now being established for site-specific investigation. The spatial variability of the rivers under study (including depth, presence of vegetation cover, and length) has resulted in the deployment of a number of acquisition methods: from ground based hand-held helikite (Fonstad et al., 2013) or unipod-mounted (Bird et al., 2010), ground and airborne survey (Bangen et al., 2014), to boat based (Alho et al., 2009) and unmanned aerial vehicle acquisition (Woodget et al., 2015; Bagheri et al., 2015). These approaches are best suited to smaller areas based on the time, effort, and relative cost required. Choosing to stay within the low cost framework established by our work (Rekittke et al., 2014), we deployed a low cost acquisition solution to acquire data on the neighbourhood under study.
The wide availability of acquisition approaches has resulted in a greater prevalence of spatial data, which has in turn brought about a demand for scene interpretation (Weinmann, 2016). Although the methods for scene understanding have a wide range of uses, most applications focus on the semantic labelling of cities at street level (Weinmann et al., 2014). Limited work has been done on the application of scene analysis along rivers. Among these, Brodu and Lague (2012) have provided an example of the application of classification techniques in which a multi-scale dimensionality criterion was used to separate riparian vegetation from ground. Furthermore, a notable application, similar to the aim of this research, involved the detection of debris in the city of Kamaishi, Japan (Sakurada et al., 2016). Both aerial and street view images were used to evaluate city-scale damage post-tsunami. This work contributes by demonstrating a novel application to reveal and potentially estimate the amount of waste present along the river bank.

METHODOLOGY
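Our framework utilises the acquired river based images for 3D scene understanding. In this section, we explain the details of the proposed method within the context of domestic solid waste detection. Although the explanation focuses on solid waste as an example, the method can be applied to other visible elements. The proposed method (Figure 1) consists of the following five steps: 1) a strip of GoPro images is captured photogrammetrically and processed for dense point cloud generation; 2) depth for each image is generated through a backward projection of the point clouds; 3) a supervised image classification method based on the Random Forest classifier is applied to the view dependent RGB-D data; 4) point discharge locations of solid waste are mapped by projecting the classified images to the 3D point clouds; 5) the landscape elements are classified into five types: vegetation, human settlement, soil, water and solid waste. For our experiment, we adopt VisualSfM (Wu, 2014) for structure from motion and a hierarchical semi-global matching method (Hirschmüller, 2008; Rothermel et al., 2012) to generate dense point clouds. Due to our data acquisition mode (the images are acquired facing the investigated planes), objects in the view dependent orthographic images are less distorted and occluded, which is an advantage for feature description. Images are used for classification instead of the point clouds. Their ease of implementation and continuous nature mean that they are more efficiently computed.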
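The backward projection used to obtain a depth value for each image pixel can be illustrated with a minimal sketch. It assumes a simple pinhole camera with the points already expressed in the camera frame; the projection model, the function name `render_depth`, the focal length and the image size are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def render_depth(points, focal, width, height):
    """Project camera-frame 3D points into a depth image using a z-buffer.

    points: iterable of (x, y, z) in the camera frame, z pointing forward.
    focal:  focal length in pixels (assumed pinhole model with the
            principal point at the image centre).
    Returns a height x width list of lists; pixels hit by no point hold inf.
    """
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0
    for x, y, z in points:
        if z <= 0:  # behind the camera, not visible
            continue
        u = int(math.floor(focal * x / z + cx))
        v = int(math.floor(focal * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # keep the nearest surface when several points share a pixel
            if z < depth[v][u]:
                depth[v][u] = z
    return depth

# Illustrative toy cloud: two points fall on the same pixel, one is behind the camera.
cloud = [(0.0, 0.0, 5.0), (0.0, 0.0, 2.0), (2.0, 0.0, 4.0), (0.0, 0.0, -1.0)]
depth_image = render_depth(cloud, focal=2.0, width=4, height=4)
```

The z-buffer test keeps only the closest point per pixel, so occluded parts of the point cloud do not overwrite the visible surface. Pixels that receive no point stay at infinity and would need to be filled or masked before classification.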

Segmentation
To efficiently address the classification problem, we perform an object-based analysis. We first apply a segmentation approach to cluster pixels with similar colour and depth. The synergic mean shift segmentation method is adapted to utilise the colour and depth information together (Christoudias et al., 2002; Comaniciu and Meer, 2002). This has been shown to give better performance (Qin et al., 2015). We use the Canny edges of the depth images to constrain the mean shift direction and prevent it from extending beyond the depth discontinuities.
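The idea of clustering pixels jointly on colour and depth can be sketched with a plain flat-kernel mean shift. This is not the synergic variant cited above, and the Canny edge constraint is not implemented; the function names, the bandwidth and the scaling of the depth channel are assumptions made only for illustration.

```python
def mean_shift(features, bandwidth, iterations=30):
    """Flat-kernel mean shift: move each vector to the mean of its neighbours."""
    modes = [list(f) for f in features]
    for _ in range(iterations):
        shifted = []
        for m in modes:
            # all original samples within one bandwidth of the current estimate
            near = [f for f in features
                    if sum((a - b) ** 2 for a, b in zip(m, f)) <= bandwidth ** 2]
            shifted.append([sum(col) / len(near) for col in zip(*near)])
        modes = shifted
    return modes

def label_modes(modes, tol):
    """Give the same segment label to samples whose modes lie within tol."""
    labels, centres = [], []
    for m in modes:
        for k, c in enumerate(centres):
            if sum((a - b) ** 2 for a, b in zip(m, c)) <= tol ** 2:
                labels.append(k)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels

# Hypothetical pixels as (r, g, b, depth): similar colour, two depth levels.
pixels = [(0.90, 0.90, 0.90, 1.00), (0.88, 0.90, 0.92, 1.05),
          (0.90, 0.90, 0.90, 3.00), (0.91, 0.89, 0.90, 3.10)]
segments = label_modes(mean_shift(pixels, bandwidth=0.5), tol=0.25)
```

In this toy input the four pixels have nearly identical colour, so a colour-only clustering would merge them. Adding the depth channel separates the near surface from the far one, which is the motivation for using RGB-D data in the segmentation step.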

Feature extraction
We use both the colour and the depth information for feature extraction. Principal component analysis (PCA) is used to extract the spectral information and de-correlate the RGB bands. A simplified version of the dual morphological top-hat profile (DMTHP) is used to extract the spatial features. In total, three image layers are processed: the brightness image, the darkness image and the DSM (in our case this is a depth image). The brightness and darkness images are essentially the first band of the PCA and its inverse image, since the first band contains the highest inter-band variance. Figure 2 shows the feature extraction process. The sequences of spatial features are concatenated to the spectral features (PCA) and normalised in each dimension for classification. In our method, we use a fixed structuring element, as the image contents are smooth and only small rocks and solid waste create blob shapes. We use a diameter of 20 pixels for the disk-shaped structuring element, as determined by empirical selection.
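The top-hat operation underlying the spatial features can be sketched as follows. This is a generic white top-hat (image minus its morphological opening) with a disk-shaped structuring element, applied to a brightness image and to its inverse. It is not the authors' DMTHP implementation; the function names are hypothetical and a diameter of 3 pixels is used so the toy example stays small, whereas the text above uses 20 pixels.

```python
def disk_offsets(diameter):
    """Pixel offsets covered by a disk-shaped structuring element."""
    r = diameter // 2
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]

def _morph(img, offsets, pick):
    """Apply min (erosion) or max (dilation) over the in-bounds neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[y + dy][x + dx] for dy, dx in offsets
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = pick(vals)
    return out

def white_top_hat(img, offsets):
    """Image minus its opening: keeps bright blobs smaller than the element."""
    opened = _morph(_morph(img, offsets, min), offsets, max)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(img, opened)]

# Toy brightness image: flat background with one small bright blob.
brightness = [[1.0] * 7 for _ in range(7)]
brightness[3][3] = 5.0
darkness = [[-v for v in row] for row in brightness]  # inverse image

se = disk_offsets(3)
bright_blobs = white_top_hat(brightness, se)
dark_blobs = white_top_hat(darkness, se)
```

The bright pixel survives in `bright_blobs` because it is smaller than the structuring element, while the flat background is removed. Applying the same operator to the inverse image responds to dark blobs instead, which is why both a brightness and a darkness layer are processed.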

Image Classification
We employ the Random Forest (RF) classifier (Breiman, 2001) to perform the classification. As a hierarchical classifier, RF performs well in separating features that are derived from different sources. In our case the spatial features are derived from both depth and radiance. We train the RF classifier with 500 trees. The reference data are manually sketched for comparison, and 5% of the reference data are used for training. Figure 4 shows the test data and results for this experiment.

EXPERIMENTAL RESULTS
We evaluated the application of our proposed method using the case study of Kampung Melayu and Bukit Duri in East Jakarta, two districts situated on either side of the Ciliwung River's lower reaches. This meandering river runs 119 km through the cities of Bogor, Depok, and Jakarta. Approximately 5 million people reside within the 384 km² catchment area.
Image information was captured using a river based camera setup on board a lightweight Challenger 4 inflatable boat. For the image capture, we assembled three GoPro HD Hero2 Outdoor Edition cameras fitted in waterproof housings. We positioned the cameras in a three-way pivot and mounted them onto a telescopic mast (Figure 3). We programmed the cameras to capture high definition (1920 x 1080 pixels) video at a rate of 30 frames per second. We used a 2.8 mm fixed focal length lens with a 127-degree (medium) field of view. A handheld Garmin GPSMAP 60CSx receiver was used for tracking purposes. In this study, we processed 7 out of the 543 images extracted from the video sequence that documented a 2.86 km segment of the river starting from the coordinates 6.224583° S, 106.863715° E.
In our experiment, although the oblique scene is quite complex, good results were obtained for water and soil detection (Figure 4). Part of the sky in the background is classified as human settlement, since we did not define a sky class explicitly. As our focus is on the detection of solid waste, we manually sketched all the identifiable solid waste for the assessment. The associated statistics from the confusion matrix are summarised in Table 1. In our experiment, 90.22% of the waste was identified. We expect other data sets to perform similarly. However, some parts of the trees and soil were identified as solid waste as a result of matching uncertainties, which arise in such complex environments.
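Detection statistics of this kind can be derived from a confusion matrix as sketched below. The sketch assumes that the reported percentage is the share of reference waste pixels that were labelled as waste (recall); the labels are toy values invented for illustration and are not the data behind Table 1, and the function names are hypothetical.

```python
from collections import Counter

def confusion_counts(reference, predicted):
    """Count (reference label, predicted label) pairs."""
    return Counter(zip(reference, predicted))

def detection_rates(reference, predicted, target):
    """Return (recall, precision) for one target class."""
    counts = confusion_counts(reference, predicted)
    tp = counts[(target, target)]
    fn = sum(n for (ref, pred), n in counts.items()
             if ref == target and pred != target)
    fp = sum(n for (ref, pred), n in counts.items()
             if ref != target and pred == target)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Toy labels only: 4 waste, 3 soil and 3 vegetation reference pixels.
reference = ["waste"] * 4 + ["soil"] * 3 + ["vegetation"] * 3
predicted = ["waste", "waste", "waste", "soil",
             "soil", "soil", "waste",
             "vegetation", "vegetation", "waste"]
recall, precision = detection_rates(reference, predicted, "waste")
```

Recall measures how much of the sketched waste was found, while precision captures the false alarms, such as tree or soil pixels labelled as waste. Reporting both gives a fuller picture than the detection rate alone.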

DISCUSSION
The advantages of the proposed method for the environmental monitoring of urban rivers are exemplified by the functional acquisition of river based images and application of 3D scene understanding. The application of known 3D scene understanding techniques allows us to: 1) objectively identify, 2) statistically detect, 3) illustrate the qualitative condition, and 4) visualise the relationship of the landscape elements in the scene (Figure 4). Our method has been demonstrated to improve our ability to distinguish the characteristics of complex sites.
Beyond the detection of solid waste, the class-specific classification results allow for the monitoring of other essential landscape elements such as vegetation and human settlement. This application may be extended to use in post-flood inventory of buildings and vegetation, as mentioned earlier. We adopted a view dependent 2D approach for our initial investigation. Based on this, further studies that exploit the possibility of wide-area classification are expected. The current 2D feature extraction on view dependent RGB-D data is straightforward to implement and works reasonably well for strips of side-look images. In our next steps, we will implement a voting strategy to resolve the labels in regions of the 3D scene where different views overlap.
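One possible form of such a voting strategy is a per-point majority vote over the labels received from each view. The paper does not specify the scheme, so the sketch below is an assumption; the function name `vote_labels` and the observation format are hypothetical.

```python
from collections import Counter, defaultdict

def vote_labels(observations):
    """Majority vote per 3D point.

    observations: iterable of (point_id, label) pairs, one pair for each
    view in which the point is visible and classified.
    Ties go to the label that was seen first for that point.
    """
    ballots = defaultdict(Counter)
    for point_id, label in observations:
        ballots[point_id][label] += 1
    return {pid: votes.most_common(1)[0][0] for pid, votes in ballots.items()}

# Toy observations: point 0 is seen in three views, point 1 in one view.
observations = [(0, "waste"), (0, "waste"), (0, "soil"),
                (1, "water"),
                (2, "vegetation"), (2, "soil"), (2, "soil")]
fused = vote_labels(observations)
```

A weighted vote, for example by viewing angle or classifier confidence, would be a natural refinement, but that goes beyond what the text describes.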

CONCLUSION
In this paper, the application of 3D scene analysis has been demonstrated using river based images acquired by a low-cost solution to identify solid waste disposal discharge points. This work firstly contributes to producing spatially explicit information along a river reach in order to support context-appropriate and targeted planning and design interventions. Secondly, this paper provides insights into the methods that may be deployed to support longer-term river rehabilitation programs in such communities. We speculate that the low-cost imaging solution may be deployed for river bank monitoring purposes. While our research is still in progress, this paper shows the preliminary results of scene understanding using side-look images. Our intention in this paper is to introduce the concept of using 3D scene understanding to solve environmental monitoring problems in complex scenarios. In the next step, we hope to process more data and refine the feature extraction framework, as well as the handling of voting across multiple view dependent orthophotos.

Figure 2: The feature extraction diagram using PCA and DMTHP.

Figure 3: Three GoPro HD Hero2 cameras, mounted on a telescopic mast. Balancing the camera set-up in the middle of the boat. (Photos: Rekittke et al., 2014)

Figure 4: (a) View dependent orthophoto, (b) depth image, (c) segmentation boundaries, (d) classification results and (e) colour point cloud where solid waste is labelled red.

Table 1: Statistics of solid waste detection in percentage.