AUTOMATED CARTOGRAPHIC GENERALIZATION IN ITALY

Automated cartographic generalization has been a challenging research field for over 30 years; thanks to continuous research, it is now becoming a concrete opportunity to speed up the production and maintenance of map sets. While some national mapping agencies are already working in this direction, in Italy automated cartographic generalization is still a research field, although recent research shows its applicability to the Italian data models for the production of maps. This paper illustrates how the production of cartography in Italy is organized and the state of research on generalization in the country. It reports the most interesting results of this research, especially those of the on-going CARGEN project, which show that automated cartographic generalization could play a key role in the Italian cartographic processes as well.


INTRODUCTION
Despite the development of techniques and tools, the production of vector maps still relies heavily on human work, resulting in high costs and a long time between the acquisition of images and the availability of map data. The same holds for the update of existing maps, which makes it very expensive to create and maintain a set of maps at different scales, as required by a wide range of applications for the planning and management of the territory and the environment.
Yet the results of more than 30 years of research on automated cartographic generalization now offer the opportunity to produce and update a set of maps at reduced cost and time, while increasing the overall quality of the data. At a time when the demand for free or cheap up-to-date map data is increasing and funds are shrinking, automated cartographic generalization may be the only viable way to keep the availability of updated maps at the required level.
While in other countries automated generalization plays a role in the production of maps, or is planned to do so in the near future, in Italy this has not yet happened, although some steps taken in this direction and some research projects on the topic show an interest in it. This paper describes the status of automated cartographic generalization in Italy, outlining the present situation of cartography in the country and the major experiences in the field of generalization with their results; the focus is on the outcomes of an on-going research project, called CARGEN, which investigates the generalization of Italian geographic data at medium-large scale.

CARTOGRAPHY IN ITALY
The Italian official National Mapping Agency is the Istituto Geografico Militare Italiano (IGMI), based in Florence and established in 1861, soon after the unification of Italy.
Among other products and services, the IGMI produces topographic maps at small-medium scale, covering the scales 1:25000, 1:50000 and 1:100000. The 1:100K scale was used for the first ever national map produced by IGMI: started in 1878, the "Nuova Carta d'Italia" took more than 30 years to complete, with data manually acquired at the scale 1:50000 (1:25000 for densely populated areas) and then generalized to the target scale. Nowadays the 1:100K scale is used in the "Serie 100" maps, comprising 278 sheets, each covering 30' by 20' (longitude-latitude), derived manually from the maps at the 1:25K scale.
The 1:50000 scale is used for the "Serie 50" maps, comprising 636 sheets, each covering 20' by 12' (longitude-latitude). Each sheet is derived manually from the maps at the 1:25K scale. The "Serie 50", started in 1966, is still being completed and at the moment covers 80% of Italy.
The 1:25000 scale is the base scale for the IGMI topographic maps. The first collection, the "Serie 25", was started during the works for the "Nuova Carta d'Italia" and comprised 3545 elements, each covering 5' by 7'30'' (longitude-latitude), drawn manually from direct surveys or, after 1929, using the first photogrammetric techniques. The coverage of Italy was completed in 1965. In 1986 a revision of the "Serie 25" resized the elements to 6' by 10' (longitude-latitude), reducing their total number to 2298, and adopted the UTM projection. As a great novelty, the production of this collection relied not only on direct surveying and stereo photogrammetry, but also on the manual derivation of the regional data at scale 1:5000 (see below). In 2000 the IGMI took the first steps toward the development of a single data warehouse in which to store, and from which to retrieve, all the data needed to produce its topographic maps: the data model DB25 was designed and set as the core for the production of the new "Serie 25DB". Although the use of derived data would suggest a faster production time, the production of the new "Serie 25DB", which replaced the "Serie 25", is progressing very slowly and, at present, only a very small part of Italy is covered by the newly updated maps at this scale.
For more than one century, IGMI was the main mapping agency of Italy. Things changed in 1977, when twenty new actors came into play.
With a bill of 1977 the production of large-scale cartography was passed to each of the twenty administrative regions (Regione) into which the Italian national territory is divided. Each Regione produced its own "Carta Tecnica Regionale" (CTR, Regional Technical Map), at a scale of 1:5000 (in some cases 1:10000 for uninhabited areas and 1:2000 for urban centres). The production of this cartography relies on external contractors, who compile it using aerial stereo photogrammetry. The cartography is divided into sheets, each covering 2'30'' by 1'30'' (longitude-latitude): a group of four-by-four sheets covers one of the IGM 25K sheets. Initially the CTR was intended to be only a paper map; later it was converted to its digital counterpart, the "Carta Tecnica Regionale Numerica" (CTRN), to allow its use as vector data in GIS applications. Only recently has the idea of producing a geographic database gained some favour, and CTRNs are slowly being replaced by the new concept of "Data Base Topografico" (DBT).
Since there were no central guidelines, the development of the regional cartography followed different paths in different regions; this led to a fragmented situation where, for instance, spatial data could not be shared easily across regions. To overcome this situation, a technical committee was formed in 1996, with the goal of drawing up the specifications for the creation and maintenance of the large-scale regional DBTs and of defining a common data model. The adoption of a standard model for all twenty regional DBTs will, among other advantages, make the creation of a DBT for the whole country much easier. The outcome of this project, known as IntesaGIS, has since been examined by other committees, and the final version should shortly be converted into a bill that, once approved, would legally bind each Regione to produce cartographic data consistent with the standard model. In the meantime, while the model is waiting to pass as a bill, the situation is confused: some Regione keep using and updating the old CTRNs, while others are producing new DBTs using a data model that might not be perfectly compliant with what will eventually become the legal standard.
The existence of a single data model used across the whole national territory would also be a great opportunity for generalization. In 2010 the IGMI formed a working group, including both IGMI and regional representatives, to investigate how this standard model could be used to derive the IGM DB25 data model. Although this initiative is merely speculative and is progressing very slowly, it is a sign that cartographic generalization is being taken into consideration. However, as the next section will show, the only research activities on automated cartographic generalization so far have come from projects started by a few individual Regione.

AUTOMATED CARTOGRAPHIC GENERALIZATION
The interest in automated cartographic generalization in Italy is attested by two research projects started in the past nine years. Despite the great advantages that generalization would bring to cartography in the whole of Italy, neither of these projects originated from a national proposal; both were started by a few individual Regione.
Aside from these two research projects, some Regione have gained further experience in producing cartography at medium scales (e.g. 1:25000, 1:50000) through semi-automatic generalization processes, contracting private companies and software vendors for this purpose.
Nevertheless, in this paper we describe only those research activities that involved universities or research centres and produced publicly available results.
In 2002 the Regione Sardinia funded a research project called "La generalizzazione cartografica automatica in ambiente GIS" (automatic cartographic generalization in a GIS environment). The research was carried out at the University of Cagliari and lasted two years, with the following objectives:
− study of a methodology for the generalization of geographical data from 1:10000 to 1:25000, 1:50000, 1:100000 and 1:250000;
− development of the methodology for automatic generalization;
− test of the methodology on digital cartography;
− validation of the results achieved in terms of accuracy, comprehensibility and truthfulness.
The generalization process relied on automatic or semi-automatic procedures developed with the ESRI ArcGIS software platform (at the time version 8.1). The procedures used built-in functionalities or, in some cases, were developed by the research group in Visual Basic. As input data, the project worked on some sheets of the CTRN of Regione Sardinia.
According to the reports of the project, the results were only partially good. The process was tested by generalizing the input data at scale 1:10000 to the scales 1:50000, 1:100000 and 1:250000. Of these three trials, the authors state that only the first can be deemed good: at the 1:50000 scale the data retain good accuracy, from both the geometric and the semantic point of view, and the relative density of the generalized data is preserved, resulting in good legibility of the map. On the contrary, the results at smaller scales are not very good and cannot be compared at all with a manually generalized map; in particular, the authors note that the selection needed to reduce the scale further is extremely hard from a semantic point of view: it is difficult to identify and retain the most important data while discarding the less relevant details, because the input data model (the CTRN) does not provide enough information.
In 2006 the Regione Veneto funded a research project called "CARGEN" (CARtographic GENeralization). The research is carried out at the Department of Information Engineering of the University of Padua, with the cooperation of both the regional mapping agency and the IGMI. The project is still on-going and is now in the second of two consecutive phases. In the first phase the research focused on:
− the conversion of the CTRN of the Regione Veneto into a DB, modelled as the regional DBT;
− the generalization of the IGM DB25 from the DBT of the Regione Veneto.
Input data were some hundreds of sheets of the CTRN representing part of the Alps in the Regione Veneto.
In 2009 the project was extended and re-funded: in the present second phase the focus shifted to another scale, 1:50000. The objectives of this second part of the project, which is still on-going, are the following:
− the definition of the data model for a DB at the scale 1:50000 (DB50) representing the content of the IGM "Serie 50" maps;
− the generalization of the DB50 from the DBT of the Regione Veneto and the production of the corresponding map sheets.
The input test data were extended by adding another large area of the Regione Veneto, covering most of the inland area on the lagoon of Venice.
When the CARGEN project started, research on cartographic generalization had already produced many results. Software vendors (e.g. Intergraph, ESRI, LaserScan) were offering generalization tools within their software, and research groups around the world had produced solutions to specific generalization problems. After evaluating vendor software, the choice was to develop custom software in house: the idea was to create algorithms that specifically targeted the Italian data models, exploiting the best available solutions and improving them, or developing new ones, when necessary. The software was developed in Java; while in the first phase of the project the algorithms also relied on Oracle Spatial, especially for spatial functionalities, in the second phase complete independence from any specific geospatial DBMS was achieved and the use of Java-based GIS libraries was increased. The most interesting results of the project are discussed in detail in the next section.

SOME RESULTS OF THE CARGEN PROJECT
This section describes the most important results obtained by the research on automated cartographic generalization in Italy: which problems had to be faced, how they were dealt with and which solutions have been proposed. The focus is mainly on the results of the CARGEN project, since these integrate and extend those of the previous project; furthermore, being the most recent research on generalization in Italy, CARGEN makes use of the latest data models and relies on the latest achievements of research. Hence the achievements of this project give a good idea of the results and of the challenges to be expected when applying automated cartographic generalization to Italian geographic data.
Cartographic generalization has to deal with two aspects: the generalization of semantic data and the generalization of geometric data. The former ensures that the required information stored in the source data according to the source data model is translated to the generalized data, expressed according to the target data model. The latter transforms the representation of the source data in a way that is suitable for the target scale; this aspect is more relevant when deriving a paper map. Following this division, this section describes the results achieved by CARGEN in both aspects.
Most of this section presents results related to the generalization of the DB25, while little is said specifically about the DB50. There are two main reasons for this: first, the work on the DB25, which started earlier, is at a more advanced stage and has already produced significant results; second, the DB50 is very similar to the DB25, so the remarks made on the generalization of the DB25 also apply to that of the DB50.

Semantic generalization
According to the final report by [Deruda et al, 2005], one of the biggest difficulties in semantic generalization came from the fact that the input data model was not designed to be used in a multi-scale environment. Indeed the CTRN and DB25 data models were designed at different times, by different people and for different purposes. On the one hand, the IGMI data model serves a descriptive purpose, as needed for the compilation of a topographic map at a scale (1:25000) initially designed for military purposes and to comply with international obligations, and meant to be used as a base for the compilation of maps at smaller scales. On the other hand, the CTRN model reflects the need for a map whose purpose is to provide a basic informative layer to be used by technicians, mainly for planning activities. The same holds for the DBTs which, being a revision of the CTRNs, do not change the quantity of information stored; the differences between the two data models emerged clearly during their analysis in the CARGEN project.
Analysing the two data models to design the mapping function between the DBT and DB25 feature classes, it was found that only a few of the latter could be directly derived, while most of them either required some sort of processing or did not have any corresponding class in the DBT, the latter being the bigger problem.
As the project policy was to derive the data only from the DBT, data integration from other sources was not a viable strategy to solve this problem; this left only two possible solutions open: retrieving the missing information from the data itself (data enrichment) or modifying the data models to improve their compatibility.
In the case of derivable feature classes needing some transformation, it was usually sufficient to apply some simple geometric or spatial functions; for instance the geometry could be collapsed: an area or a line could be reduced to a point (e.g. symbolization), or an area to a line. In the latter case the operation should return a line that lies inside the original area's boundary, in order to avoid conflicts with the neighbouring features. The algorithm developed for this purpose is based on the work of [Bader and Weibel, 1997]. In the case of symbolization the algorithms needed to compute the coordinates of the point at which to place the symbol and its bearing (the angle of rotation of the symbol).
In the case of feature classes, or attributes, that are not directly derivable, the first choice was to try to gather the data needed for the derivation from other semantic data, or from the geometric data, through a process of data enrichment.
In some cases the algorithm was very simple: for instance the feature class "contour line on glacier", required by the DB25 but not existing in the DBT, was derived as the sections of the DBT contour lines passing over a glacier (see Figure 1). In other cases the algorithms turned out to be much more complex.
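The simple case of the contour lines on glaciers can be illustrated with a short Python sketch that cuts a contour polyline against a glacier polygon and keeps only the parts lying inside it. This is not the CARGEN implementation (which was written in Java): the function names, the plain coordinate-list representation and the sample data are assumptions made for the example, and degenerate cases (e.g. a contour segment running exactly along the glacier boundary) are not handled.

```python
def point_in_polygon(p, poly):
    """Ray-casting test: True if point p lies inside the simple polygon poly."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def crossing_param(a, b, c, d):
    """Parameter t in [0, 1] where segment a-b crosses segment c-d, else None."""
    rx, ry = b[0] - a[0], b[1] - a[1]
    sx, sy = d[0] - c[0], d[1] - c[1]
    denom = rx * sy - ry * sx
    if denom == 0:  # parallel segments: ignored in this sketch
        return None
    t = ((c[0] - a[0]) * sy - (c[1] - a[1]) * sx) / denom
    u = ((c[0] - a[0]) * ry - (c[1] - a[1]) * rx) / denom
    return t if 0 <= t <= 1 and 0 <= u <= 1 else None


def contour_on_glacier(contour, glacier):
    """Return the pieces (start, end) of the contour that lie inside the glacier."""
    pieces = []
    for a, b in zip(contour, contour[1:]):
        # Split the segment at every crossing with the glacier boundary.
        cuts = {0.0, 1.0}
        for i in range(len(glacier)):
            t = crossing_param(a, b, glacier[i], glacier[(i + 1) % len(glacier)])
            if t is not None:
                cuts.add(t)
        cuts = sorted(cuts)
        # Keep the sub-segments whose midpoint falls inside the glacier.
        for t0, t1 in zip(cuts, cuts[1:]):
            tm = (t0 + t1) / 2
            mid = (a[0] + tm * (b[0] - a[0]), a[1] + tm * (b[1] - a[1]))
            if point_in_polygon(mid, glacier):
                start = (a[0] + t0 * (b[0] - a[0]), a[1] + t0 * (b[1] - a[1]))
                end = (a[0] + t1 * (b[0] - a[0]), a[1] + t1 * (b[1] - a[1]))
                pieces.append((start, end))
    return pieces


glacier = [(0, 0), (1, 0), (1, 1), (0, 1)]  # hypothetical glacier outline
contour = [(-1, 0.5), (2, 0.5)]             # hypothetical contour line
pieces = contour_on_glacier(contour, glacier)
```

In this sketch the resulting pieces are returned segment by segment; a production version would also join consecutive pieces back into polylines and copy the elevation attribute of the source contour.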
The model representing highways in the DBT, for instance, does not provide all the information needed by the DB25. It was therefore necessary to implement an algorithm to recognize, in the highway network, the carriageways, the slip roads, the toll stations and the rest areas or gas stations (see [Savino et al, 2010]). The same happened with rivers: the DBT data model does not store the flow direction of rivers, hence an algorithm deriving it from the Z coordinate of the geometries was implemented (see [Savino et al, 2011b]).
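The idea of deriving the flow direction from the Z coordinate can be sketched as follows. The example is only a minimal illustration under simplifying assumptions, not the algorithm of [Savino et al, 2011b]: it compares the elevations of the two end vertices of a single reach, and the function name and the tolerance parameter are hypothetical. A real river network would also require consistency checks between connected reaches.

```python
def orient_downstream(coords, tolerance=0.0):
    """Order a river polyline of (x, y, z) vertices from its higher to its lower end.

    Returns None when the elevation difference between the two ends does not
    exceed the tolerance, i.e. when the direction cannot be decided from Z alone.
    """
    drop = coords[0][2] - coords[-1][2]
    if abs(drop) <= tolerance:
        return None
    return list(coords) if drop > 0 else list(reversed(coords))


# Hypothetical reach digitized against the flow (elevations increase along it).
reach = [(0.0, 0.0, 95.0), (10.0, 2.0, 101.5), (20.0, 3.0, 110.0)]
downstream = orient_downstream(reach, tolerance=0.5)
```

Returning None for nearly flat reaches, instead of guessing, leaves the decision to a later step (for instance propagation from neighbouring reaches), which seems a safer choice in lowland areas.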
As mentioned above, in the worst case, when it was not possible to derive the data at all, the only option was to modify the data models. This option was used as a last resort, as it meant forcing one of the models to add or drop some feature classes (or attributes). This was necessary for some feature classes (such as "aero motor" or "trough") present in the IGMI data model but not acquired in the DBT: as they were no longer deemed useful, they were dropped from the IGMI data model.
The most relevant revision of the data models was required by the derivation of the roads. The IGMI data model was much more detailed than the DBT, modelling roads on the basis of many attributes (e.g. road surface type, road width) that are not acquired during the compilation of the regional data sets. This made it impossible to derive these data unless one of the two models was changed. It was decided to change the IGMI model and to re-align it to the official Italian road classification, set by law: although this choice also had consequences for other IGMI classes related to roads (e.g. bridges, tunnels), this classification is the one in use in the DBT (and other road datasets) and adopting it was the only way to guarantee the derivability of this information.
In other cases, some attributes not present in the original DBT data model were added to make it compatible with the needs of the DB25 data model (e.g. the attribute of the river type). This data model revision was carried out in agreement with both the IGMI and the Regione Veneto, leading to the definition of data models that make the IGMI model completely derivable from the DBT.
For the generalization to the 1:50000 scale it was necessary to design a data model suitable for the content of the IGMI "Serie 50" map. The design started from the analysis of the map legend, which showed good compatibility with the DB25: this was no surprise, as both the DB25 and the "Serie 50" maps are produced by the same mapping agency, serve the same purposes, and the map based on the former is also used for the derivation of the latter. As a result, the DB50 data model that has been designed is completely derivable from the DB25 data model, and the same remarks about the generalization from the DBT apply to both IGMI models.

Geometric generalization
Geometric generalization is a very hard task: as generalization is usually a manual process, relying on the ability of the cartographers, there is no defined set of rules specifying how to perform it; the very concept of "good representation" is difficult to formalize. Because of this, the development of a generalization algorithm requires both good programming skills and a highly developed cartographic sensibility.
The development of the generalization algorithms in the CARGEN project could take advantage of the advice of the cartographers of both IGMI and Regione Veneto, and also of some IGMI documentation.
For the 1:25000 scale some guidelines, drafted by IGMI for the derivation of the DB25 from regional data, were used; for the 1:50000 scale the IGMI specifications for the compilation of the "Serie 50" maps were used. These documents concern the data acquisition and the drawing of the map. They contain both constraints (e.g. two houses closer than 3 meters should be merged together), which helped to formalize the steps of the generalization process, and directions on how to perform some operations (e.g. when two roads run parallel and too close to each other, remove the less important one, or draw a line in the middle if they have the same importance), which helped to design the algorithms.
The algorithms enforcing the constraints listed in the guidelines (or inherent in the IGMI data models) were the least difficult to develop: once the constraint is formalized as a threshold and an algorithm that evaluates its value is found, it is relatively easy to develop a generalization algorithm enforcing the constraint.
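As an illustration of how a constraint becomes a threshold plus an evaluation function, the following Python sketch groups the buildings that the 3-meter rule quoted above would merge. It is a simplified sketch, not the CARGEN code: buildings are approximated by axis-aligned bounding boxes, and the function names and sample coordinates are assumptions made for the example.

```python
from math import hypot


def box_distance(a, b):
    """Distance between two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def groups_to_merge(boxes, threshold=3.0):
    """Group the indices of boxes that are chained by gaps below the threshold."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_distance(boxes[i], boxes[j]) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical buildings, coordinates in meters.
buildings = [(0, 0, 10, 10), (12, 0, 20, 10), (40, 0, 50, 10), (21.5, 0, 30, 10)]
groups = groups_to_merge(buildings)
```

Here buildings 0, 1 and 3 end up in one group because each is closer than 3 meters to the next, while building 2 stays alone. The pairwise comparison is quadratic; a spatial index would be needed on real data sets.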
The most relevant algorithms of this kind are those that generalize buildings and large irregular areas such as woods or crop fields. These algorithms run quite quickly and produce good results in a robust way.
The generalization of buildings is performed by applying the operators of aggregation, simplification and squaring (see Figure 2). The algorithms have been derived from existing ones: in particular the simplification is performed using a custom version of the [Sester, 2000] algorithm, modified to extend the applicability of the original algorithm to buildings whose walls are not oriented at right angles. The squaring algorithm is based on the identification of the main direction of each building (see also [Duchene et al, 2003]); a peculiarity is the square-ability test that is performed before the algorithm is applied and that prevents it from running if a main direction cannot be found.
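One possible way to estimate a main direction and to test square-ability is sketched below in Python. The method shown (a length-weighted circular mean of the wall orientations taken modulo 90 degrees) is an assumption chosen for illustration and is not necessarily the one used in CARGEN or in [Duchene et al, 2003]; the function names and the 0.8 threshold are likewise hypothetical.

```python
from math import atan2, cos, degrees, hypot, radians, sin


def main_direction(ring):
    """Return (direction in degrees within [0, 90), strength within [0, 1]).

    Multiplying each wall angle by 4 maps walls that are parallel or
    perpendicular to each other onto the same direction, so a perfectly
    rectangular building yields a strength of 1.
    """
    c = s = total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        length = hypot(x2 - x1, y2 - y1)
        theta = atan2(y2 - y1, x2 - x1)
        c += length * cos(4 * theta)
        s += length * sin(4 * theta)
        total += length
    direction = degrees(atan2(s, c) / 4) % 90
    strength = hypot(c, s) / total
    return direction, strength


def is_squarable(ring, min_strength=0.8):
    """Square-ability test: a main direction must clearly dominate."""
    return main_direction(ring)[1] >= min_strength


def rotated(points, angle_deg):
    """Helper for the example: rotate points around the origin."""
    a = radians(angle_deg)
    return [(x * cos(a) - y * sin(a), x * sin(a) + y * cos(a)) for x, y in points]


rectangle = rotated([(0, 0), (10, 0), (10, 4), (0, 4)], 30)  # hypothetical building
triangle = [(0, 0), (1, 0), (0.5, 3 ** 0.5 / 2)]             # no dominant direction
direction, strength = main_direction(rectangle)
```

For the rotated rectangle the sketch recovers a main direction of about 30 degrees, whereas the equilateral triangle fails the test, so a squaring step would be skipped for it.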
The generalization of large irregular areas is performed using a technique derived from the work of [Mackaness et al, 2008] on forests. The algorithm performs the aggregation, selection and simplification of large areas such as crop fields or woods. Another algorithm deals with the IGMI requirement to extend these features to any linear boundary closer than a threshold (e.g. a road passing by a forest limit). This is achieved using an algorithm that relies on triangulation (see e.g. [Revell, 2005]) to fill the spaces between the boundary of the area feature and the linear feature.
More difficult was the task of designing the algorithms that had to follow the generalization directions contained in the guidelines which are not based on any constraint (e.g. typification) or, even more so, those inferred through the analysis of the manual generalization processes (e.g. selection in networks).
The most relevant algorithms of the former kind are those developed for typification (of ditches, parallel roads, buildings), while of the latter kind the most difficult to implement were those that prune the river and road networks.
The typification operators work on groups of objects showing a particular pattern and are used to reduce the number of objects necessary to represent the group while retaining their spatial distribution.
In general the typification algorithms developed work through the following sequence of steps:
− detecting groups of elements showing a pattern in their spatial distribution (e.g. parallelism, alignment, grid layout);
− characterizing the pattern by its direction (or directions) and extension (i.e. a sort of envelope);
− creating a model-object, that is an object representative of those in the group (e.g. picking the object in the group that best resembles the others);
− replacing the original objects with copies of the model-object, placed according to the original pattern's characteristics (usually evenly spaced over the whole pattern extension and following the pattern directions, see Figure 3).
These steps are at the base of the algorithms developed for ditches (see [Savino et al, 2011a]), buildings and road bends.
The selection operators are among the most difficult to implement in a generalization process: a selection algorithm should be able to discern which features are to be discarded and which are to be retained while generalizing. At the scales we have worked at, the selection of buildings did not pose any particular difficulty because, with the exception of some relevant buildings (e.g. hospitals) whose identification could rely on semantic data, the selection could be based on geometric constraints (mostly their area and distance to other buildings). The same does not hold for networks like rivers or roads, where geometric constraints are not directly applicable (e.g. a road, although short, could be important for the connectivity of the network and should not be deleted) and the importance of a feature is not characterized sufficiently by semantic data (i.e. the taxonomy is not rich enough) to perform a class-based selection at the treated scales.
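The typification steps described above (characterizing the pattern, creating a model-object, placing evenly spaced copies) can be sketched in Python as follows. The example assumes that the group has already been detected and that it is a simple linear alignment; objects are reduced to a centre point and a size, and the model-object is simply given the median size of the group. All these choices, and the function name, are assumptions made for illustration rather than the method of [Savino et al, 2011a].

```python
from itertools import combinations
from math import dist
from statistics import median


def typify(objects, n_out):
    """Replace aligned objects (x, y, size) with n_out evenly spaced model copies."""
    if n_out < 2 or len(objects) < 2:
        raise ValueError("need at least two input objects and two output objects")
    # Pattern direction and extension: the segment joining the two farthest objects.
    start, end = max(
        combinations(objects, 2),
        key=lambda pair: dist(pair[0][:2], pair[1][:2]),
    )
    # Model-object: here just the median size of the group.
    model_size = median(obj[2] for obj in objects)
    # Evenly spaced copies over the whole extension of the pattern.
    return [
        (
            start[0] + k / (n_out - 1) * (end[0] - start[0]),
            start[1] + k / (n_out - 1) * (end[1] - start[1]),
            model_size,
        )
        for k in range(n_out)
    ]


# Hypothetical group of five parallel ditches, described by centre and length.
ditches = [(0, 0, 10), (1, 0, 12), (2, 0, 11), (3, 0, 10), (4, 0, 30)]
typified = typify(ditches, 3)
```

The five input ditches are replaced by three copies placed at the two ends and in the middle of the original alignment, which preserves the extension of the pattern while reducing the number of objects.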
The algorithms designed to perform the selection on networks rely on data enrichment to build a refined taxonomy that makes it possible to distinguish important from less important features. On rivers this is based on various metrics, such as the Strahler order [Strahler, 1952], and the pruning is performed on length and density [Savino et al, 2011b]. On roads other metrics are used, derived from the work of [Jiang and Claramunt, 2005] (see Figure 4). The selection of roads also required devising techniques to recognize particular structures in the road network that had to be generalized in special ways: these are complex road junctions (e.g. interchanges, roundabouts) whose representation had to be simplified (see [Savino et al, 2009]).
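The Strahler order mentioned above can be computed with a short recursive function, sketched here in Python. The network representation (a dictionary mapping each reach to the reaches flowing into it), the function names and the pruning by order alone are assumptions made for the example; the pruning described in [Savino et al, 2011b] also takes length and density into account.

```python
def strahler_orders(upstream, outlet):
    """Compute the Strahler order of every reach draining to the outlet.

    upstream maps a reach to the list of reaches flowing into it; reaches
    missing from the mapping are treated as headwaters (order 1).
    """
    orders = {}

    def visit(reach):
        tributaries = [visit(t) for t in upstream.get(reach, [])]
        if not tributaries:
            order = 1
        else:
            top = max(tributaries)
            # The order increases only where two or more tributaries
            # of the same highest order meet.
            order = top + 1 if tributaries.count(top) >= 2 else top
        orders[reach] = order
        return order

    visit(outlet)
    return orders


def prune(orders, min_order):
    """Keep only the reaches whose Strahler order reaches the given minimum."""
    return sorted(reach for reach, order in orders.items() if order >= min_order)


# Hypothetical network: c and d join into a; a and b join into the mouth.
network = {"mouth": ["a", "b"], "a": ["c", "d"]}
orders = strahler_orders(network, "mouth")
kept = prune(orders, min_order=2)
```

In this small network the headwaters c, d and b have order 1, while a and the mouth have order 2, so pruning at order 2 keeps only the main stem. The recursion is adequate for a sketch; very deep networks would call for an iterative traversal.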

CONCLUSIONS
Automated cartographic generalization is one of the most important challenges for cartography in the 21st century: in an age when digital data are needed by the market and therefore must be produced at a very fast pace, it is the key element to allow for fast and cheap production and update of map sets.
In this paper automated generalization has been contextualized in Italy, where distinct bodies share the task of producing the country's official cartography: the IGMI and the Regional Mapping Agencies. Despite the hurdles posed by this fragmented context, some initial steps have been taken in the direction of promoting and investigating the use of cartographic generalization.
Among these initial steps, the CARGEN project has been illustrated, highlighting some of the problems encountered, especially in the data models, and some of the results obtained so far. It is our opinion that, on the basis of what has been achieved in the field of automated generalization, the processes of producing and updating the map sets in Italy should be reconsidered and should rely more on automated processes of cartographic generalization.
For this purpose, the data model of the DBT must be designed with a multi-scale approach in mind, considering for instance attributes useful to characterize the importance of the feature classes at different scales.
As for what can be expected from automated generalization, the state of the art shows that, even with a suitable input data model, good generalization algorithms still require an uncommon mix of programming skills and cartographic sensibility.
Our aim of a completely automatic generalization process may not yet be fully achieved, as some minor tasks still need human intervention, but including the techniques already developed in the current, manual generalization processes could turn them into semi-automatic processes, with great benefits in terms of time and cost in the production and maintenance of geographic data sets. Fully automatic generalization is nevertheless, in our specific case, not impossible to achieve, provided that a shift in the visual quality expected from a cartographic product is accepted.
Even if the visual quality of an automatically generalized map may not match that obtained by an expert cartographer, one must consider, on the other hand, that many other characteristics (e.g. accuracy, correctness, repeatability) will certainly improve greatly.
From the experience gathered by the research in Italy, we are convinced of the following:
− the research in the field of automated cartographic generalization has already brought very interesting results and more will be made available in the future;
− the results obtained so far can already be used with great benefits for the production and maintenance of the Italian cartographic data sets;
− it is of paramount importance in Italy to increase the investments in research and to encourage cooperation among universities, mapping agencies and map producers;
− cooperation between the Italian National MA and the Regional MAs should be reinforced and directed towards cartographic generalization.

Figure 1. To derive the DB25 feature class "Contour line on glacier" it was necessary to perform a spatial query between the contour lines and the glacier feature classes.

Figure 2. Generalization of buildings: top to bottom, left to right, original situation, aggregation and selection, simplification, squaring

Figure 4. Data enrichment of the classification of roads to perform the selection.