# Using Maps in NJSHAD

NJSHAD query results and indicator reports that provide data by county or municipality display a map. These maps are
"choropleth" maps. This page describes choropleth maps and the grouping options available for them on NJSHAD.

Visit the NJSHAD Help Page for Downloading Map Layers for use in other programs.

Choropleth maps display data for predefined geographic areas. The areas on a choropleth
map are shaded or patterned to reflect values of a variable such as population density or
birth rate. Choropleth maps are an easy way to visualize differences and patterns across
geographic areas.

One challenge presented by choropleth maps is that forcing the data into discrete geographic zones can obscure or misrepresent the underlying data distribution, whether purposefully or accidentally. It helps to understand the methods used to group the map data. Data classification largely involves two basic issues: 1) choosing the number of groups and 2) deciding how to assign geographic areas to each group. If too few groups are used, a choropleth map may obscure subtle gradations in a spatial distribution. Too many groups are also unlikely to reveal existing spatial patterns, because the viewer can be visually overwhelmed; most map readers have difficulty distinguishing among more than seven classes (Kraak and Ormeling 2003).

Different types of classification can be used to assign geographic areas to groups. Some grouping methods are better suited than others for different data types. When selecting a grouping method, the underlying data distribution should first be explored. Common classification types that are available on NJSHAD include Jenks natural breaks, mean standard deviation, equal intervals, equal groups (quantiles), geometric progression, and arithmetic progression. These classification methods and their applications are further described below.

Choropleth maps have an inherent weakness: they require the aggregation of data into geographic areas (e.g., counties) that do not necessarily correspond to the data's underlying spatial distribution. To maximize the effectiveness of such a map, the data grouping method should balance several goals. After classification, each group should contain an appropriately apportioned share of the observed data values. The resulting map should faithfully represent spatial patterns without excluding extreme high or low values. It should also approach the data's statistical surface (a three-dimensional data representation in which the z-coordinate is proportional to the data value) as closely as possible (Kraak and Ormeling 2003).

Finally, choropleth maps have other limitations. Small geographic areas that contain a large number of cases (e.g., cities) tend to have less visual impact and attract less of the viewer's attention than large (e.g., rural) geographic areas, which may be sparsely populated. Another common error is mapping raw data counts, which represent magnitude, when a choropleth map is better suited to normalized values such as a rate, density, or concentration by geographic unit (Monmonier 1991:22-23).
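The difference between raw counts and normalized values can be illustrated with a minimal sketch. The area names, case counts, and populations below are hypothetical, and the per-100,000 scaling is only one common convention, not necessarily the one used for any particular NJSHAD measure.

```python
def rate_per_100k(count, population):
    """Convert a raw case count into a rate per 100,000 residents."""
    return count * 100_000 / population


# Hypothetical areas: (case count, population)
areas = {
    "Urban area": (500, 250_000),
    "Rural area": (60, 20_000),
}

# Normalized values suitable for shading a choropleth map
rates = {name: rate_per_100k(cases, pop) for name, (cases, pop) in areas.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 100,000")
```

In this made-up example the urban area has far more cases, but the rural area has the higher rate, so a map of counts and a map of rates would rank the two areas in opposite order.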

See also: http://en.wikipedia.org/wiki/Choropleth_map

## Data Grouping Methods

"An analysis of maps prepared by authors in various academic disciplines fails to
show any rational or standardized procedures for the selection of class intervals.
Evidently intuition, inspiration, revelation, mystical hunches, prejudices, legerdemain,
and predetermined ideas of what the class intervals should be have characterized the
work of most map-makers… Apparently many authors believe that maps are an art-form
which allow liberties not admissible in verbal or tabular presentation." (Jenks and
Coulson 1963:120).

The Jenks Natural Breaks method, also referred to as the Jenks Optimization method or the
goodness of variance fit (GVF), is a data-classification method designed to determine the
best way to classify features using natural breaks in data values. The method was developed
with the intention of dividing data into relatively few data classes (seven or fewer) for
mapping purposes. Jenks Natural Breaks iteratively compares the sums of the squared differences
between the observed values within each class and the class means. The best resulting classification
identifies breaks in the ordered distribution of values that minimize the variance within
classes and maximize the variance between classes (Jenks 1967).

The Jenks Natural Breaks method is well suited to the creation of choropleth maps because it identifies real classes within the data, resulting in maps that can accurately portray data trends. It is a good choice for datasets that are multimodal, but it is not recommended for data that have a low variance. Also, because the resulting classification is specific to the dataset, it is not useful for comparing multiple maps built from different datasets.
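The idea of minimizing within-class squared deviations can be sketched as follows. This is an illustrative exact search using dynamic programming rather than the iterative comparison described above, and it is not NJSHAD's implementation; the function name `jenks_breaks` and the sample values are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for the partition of the sorted
    values that minimizes the total within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums allow the squared deviation of any slice to be computed quickly.
    sums, squares = [0.0], [0.0]
    for v in data:
        sums.append(sums[-1] + v)
        squares.append(squares[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its own mean."""
        total = sums[j] - sums[i]
        return (squares[j] - squares[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest total deviation when the first j values form k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the recorded split points to recover the class bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


# Hypothetical values with three visible clusters
print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3))
```

The triple loop is quadratic in the number of areas for each class, which is acceptable for county-level or municipality-level data but would be slow for very large datasets.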

See also: http://wiki.gis.com/wiki/index.php/Jenks_Natural_Breaks_Classification

The standard deviation classification method forms classes by adding and subtracting a defined
portion of the standard deviation from the mean of the dataset. This method is best
suited to data that conform to a normal (bell-shaped) distribution in a histogram, but
it can provide valuable visual breaks even when used to map highly skewed data. Note
that the use of standard deviation classification is not appropriate for data ranges defined by
percentages, unless weighted averaging can be implemented (which is not presently available in
NJSHAD).

As implemented on NJSHAD, the proportion of the standard deviation that is captured in each class is dependent upon the number of classes that are selected.

- If two classes are displayed, the breakpoint between the classes is the mean. The low and high classes include all values below and above the mean, respectively, out to the limits of the data values.
- If three classes are displayed, the middle class contains the mean and extends to +/- 0.5 standard deviations. The highest and lowest classes include everything beyond +/- 0.5 standard deviations, out to the limits of the data values.
- If four classes are displayed, the breakpoint between the two middle classes is again the mean. The two middle classes extend from the mean to +/- 1 standard deviation, and the highest and lowest classes include everything beyond +/- 1 standard deviation, out to the limits of the data values.
- If five classes are displayed, the middle class again contains the mean and extends to +/- 0.5 standard deviations. The next highest and lowest classes extend from +/- 0.5 to +/- 1.5 standard deviations, and the highest and lowest classes include everything beyond +/- 1.5 standard deviations, out to the limits of the data values.
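The rules in the list above can be sketched as a small lookup of breakpoint offsets, expressed in standard deviation units from the mean. This is only an illustration: the names `SD_OFFSETS` and `std_dev_breaks` are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the page does not say whether NJSHAD uses the population or sample formula.

```python
from statistics import mean, pstdev

# Breakpoint offsets (in standard deviations from the mean) for each
# supported number of classes, following the rules listed above.
SD_OFFSETS = {
    2: [0.0],
    3: [-0.5, 0.5],
    4: [-1.0, 0.0, 1.0],
    5: [-1.5, -0.5, 0.5, 1.5],
}


def std_dev_breaks(values, n_classes):
    """Return the interior breakpoints; the outermost classes extend
    to the minimum and maximum of the data."""
    if n_classes not in SD_OFFSETS:
        raise ValueError("this sketch supports 2 to 5 classes only")
    center = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    return [center + offset * spread for offset in SD_OFFSETS[n_classes]]


# Hypothetical values with mean 5 and population standard deviation 2
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for k in (2, 3, 4, 5):
    print(k, std_dev_breaks(sample, k))
```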

See also: http://wiki.gis.com/wiki/index.php/Probability_distribution, http://en.wikipedia.org/wiki/Standard_deviation

The equal interval classification method splits the entire data span (from lowest to highest value) into
intervals that are the same size, each covering the same proportion of the range of values.
Data that are evenly distributed (i.e., showing a rectangular or flat shape in a histogram)
are well suited to equal interval classification. Choropleth maps created with this classification
are good for revealing value ranges that are either over- or under-represented, but when one interval
is heavily overrepresented, most of the map is shaded the same color.
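As a minimal sketch, the interior breakpoints for equal intervals follow directly from the minimum, the maximum, and the number of classes. The function name `equal_interval_breaks` and the sample values are hypothetical.

```python
def equal_interval_breaks(values, n_classes):
    """Return the interior breakpoints that split the data range
    into n_classes intervals of equal width."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_classes
    return [lowest + width * i for i in range(1, n_classes)]


# Hypothetical values spanning 0 to 20, split into four classes of width 5
print(equal_interval_breaks([0, 3, 7, 10, 20], 4))
```

Only the extremes influence the breakpoints, so a single outlier can stretch the intervals and leave most areas in one class.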

See also: http://wiki.gis.com/wiki/index.php/Equal_Interval_classification

The equal groups (quantile) method distributes all the values into some number of groups, with each group
having the same number of observations. Data that are evenly distributed (i.e., showing a
rectangular or flat shape in a histogram) are well suited to quantile classification. Also,
"quantiles seem to be one of the best methods for facilitating comparison [among a series of
maps] as well as aiding general map reading" (Brewer and Pickle 2002:679),
and this method is also useful for conducting experimental data analysis. One possible disadvantage
of quantile classification arises when large gaps occur between attribute values; such gaps
may lead to an over-weighting of an outlier in its class. A two-class quantile split identifies the
median, while three-class quantiles are called tertiles or terciles, four-class quantiles are
called quartiles, and five-class quantiles are called quintiles.
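One simple way to form equal groups is to rank the observations and cut the ranking into equal-sized runs. The sketch below takes that approach; it is an assumption about how groups could be formed, not a description of NJSHAD's implementation, and the name `quantile_groups` is hypothetical.

```python
from collections import Counter


def quantile_groups(values, n_classes):
    """Assign each observation a class index (0 = lowest) so that the
    classes hold as close to the same number of observations as possible."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, index in enumerate(order):
        groups[index] = rank * n_classes // len(values)
    return groups


# Hypothetical values for eight areas, grouped into quartiles
sample = [5, 1, 9, 3, 7, 11, 2, 8]
assignment = quantile_groups(sample, 4)
print(assignment)
print(Counter(assignment))
```

Because this version cuts strictly by rank, tied values can land in different groups; a production implementation would need a rule for ties.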

See also: http://wiki.gis.com/wiki/index.php/Quantile

For data with heavy-tailed (skewed) distributions, classes generally cannot be imposed in a
linear manner (e.g., as equal steps); a nonlinear method can be used instead. With the geometric
progression method, the widths of the class intervals increase at a geometric (i.e., multiplicative)
rate. Starting from the lowest value, each following class breakpoint is derived from the previous
one by multiplying by a constant C, the ratio of the series. The logarithm of C is found by taking the
difference of the logarithms of the highest and lowest values and dividing by the number of classes
(Kraak and Ormeling 2003).
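A minimal sketch of this calculation follows. It assumes that the constant C is the antilogarithm of the quotient described above, so that multiplying the lowest value by C once per class arrives at the highest value, and it requires strictly positive data because logarithms are involved. The function name `geometric_breaks` and the sample values are hypothetical.

```python
import math


def geometric_breaks(values, n_classes):
    """Return the interior breakpoints of a geometric progression that
    runs from the lowest value to the highest value."""
    lowest, highest = min(values), max(values)
    if lowest <= 0:
        raise ValueError("geometric progression requires positive values")
    # log10(C) = (log10(highest) - log10(lowest)) / number of classes
    log_ratio = (math.log10(highest) - math.log10(lowest)) / n_classes
    ratio = 10 ** log_ratio
    return [lowest * ratio ** i for i in range(1, n_classes)]


# Hypothetical right-skewed values from 1 to 1000: C works out to 10,
# so the three classes are roughly 1-10, 10-100, and 100-1000.
print(geometric_breaks([1, 5, 40, 300, 1000], 3))
```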

This method is best applied to NJSHAD data that are positively (right) skewed (producing a J-shaped distribution curve with a peak at the low end of a histogram), particularly when there is a long "stretch" between low and high values. For datasets that are normally distributed or that are rectangular or flat, geometric progression may not produce classes that discriminate usefully; in fact, the resulting classes may resemble an equal interval or arithmetic progression classification instead. Further, even when appropriately applied to a skewed dataset, the class intervals imposed by geometric progression may not capture the underlying data hierarchy (Jiang 2013). Finally, for data that are heavily skewed toward the left, an inverse geometric progression could be implemented, but this functionality is not presently available in NJSHAD.

See also: http://en.wikipedia.org/wiki/Geometric_progression

Like geometric progression, the arithmetic progression method is applicable to skewed distributions, but it
increases the widths of the class intervals at an arithmetic (i.e., additive) rate. As
implemented on NJSHAD, if the first category is one unit wide, for example, each following category is
one unit wider than the one before it, resulting in a second category that is two units wide,
a third category three units wide, and so forth to the end of the distribution. This method has many
of the same strengths and shortcomings as the geometric method, but can provide a nonlinear
classification at a different scale, which may be appropriate to different data sets.
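The widening intervals can be sketched as follows. The sketch assumes that the unit width is chosen so that class widths of 1, 2, 3, ... units exactly span the range from the lowest to the highest value; the page does not state how NJSHAD picks the unit, and the name `arithmetic_breaks` is hypothetical.

```python
def arithmetic_breaks(values, n_classes):
    """Return interior breakpoints for classes whose widths grow as
    1, 2, 3, ... units, scaled to span the full data range."""
    lowest, highest = min(values), max(values)
    # Widths of 1 + 2 + ... + n units must add up to the data range.
    unit = (highest - lowest) / (n_classes * (n_classes + 1) / 2)
    return [lowest + unit * k * (k + 1) / 2 for k in range(1, n_classes)]


# Hypothetical values from 0 to 60 in three classes: the unit is 10,
# so the class widths are 10, 20, and 30.
print(arithmetic_breaks([0, 12, 25, 41, 60], 3))
```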

See also: http://en.wikipedia.org/wiki/Arithmetic_progression

## References

- Jenks, George, and Michael Coulson. 1963. Class Intervals for Statistical Maps. International Yearbook of Cartography, 3:119-134.
- Kraak, Menno-Jan, and Ferjan Ormeling. 2003. Cartography: Visualization of Geospatial Data. Longman Group, United Kingdom.
- Monmonier, Mark. 1991. How to Lie with Maps. University of Chicago Press.
- Brewer, Cynthia A., and Linda Pickle. 2002. Evaluation of Methods for Classifying Epidemiological Data on Choropleth Maps in Series. Annals of the Association of American Geographers, 92(4):662-681.
- Jenks, George F. 1967. The Data Model Concept in Statistical Mapping. International Yearbook of Cartography, 7:186-190.
- Jiang, Bin. 2013. Head/tail Breaks: A New Classification Scheme for Data with a Heavy-tailed Distribution. The Professional Geographer, 65(3):482-494.