Greetings everyone,
Data classification methods are the name of the game this week. This is part of our next couple of modules, which investigate different types of thematic maps. This week, though, we aren't focusing as much on the thematic type as on the data classification method used to display the data set. We also continue to build on the design principles of the previous modules. The key learning objective is demonstrating the differences between the four methods presented below.
The subject area is Miami-Dade County in southern Florida, and the subject matter is the percentage of the population above age 65 per census tract. The raw number of residents above age 65, normalized by square mile, is also presented. Before we look at those, a quick touch on the four classification methods.
Equal Interval - This
classification method takes the range of data values (Max – Min Value) and
divides it by the desired number of classes to get the width of each class.
For example, values of 0 to 100 with 5 desired classes would give each class
an increment of 20. One thing this doesn’t take into account is how the data
is distributed along a number line. It can result in classes with no values in
them and classes with far more values than the others. On the plus side,
there are no gaps in the legend, although there may be visual gaps on the map
where a class has no values.
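To make the arithmetic concrete, here is a minimal Python sketch of equal interval breaks. The function names and the sample values are made up for illustration (they are not the Miami-Dade data), and it assumes the values are not all identical.

```python
def equal_interval_breaks(values, num_classes):
    """Return class boundaries that split the data range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_classes  # (Max - Min) / number of classes
    return [lo + width * i for i in range(num_classes + 1)]


def count_per_class(values, num_classes):
    """Count how many values land in each equal interval class."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for v in values:
        # The maximum value belongs to the last class, not a class past it.
        index = min(int((v - lo) / width), num_classes - 1)
        counts[index] += 1
    return counts


# Hypothetical sample values, chosen to show empty classes.
sample = [0, 12, 35, 36, 100]
breaks = equal_interval_breaks(sample, 5)
counts = count_per_class(sample, 5)
print(breaks)  # [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
print(counts)  # [2, 2, 0, 0, 1]
```

The two zero counts are the empty classes described above: the legend still shows 40 to 60 and 60 to 80, but no tract would be drawn in those colors.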
Quantiles - This
classification method separates your data into classes that each hold an equal
number of observations. First, the data is rank ordered from lowest to highest
(or vice versa), and then observations are dealt into the classes in that
order until every class holds the same number of observations. One potential
problem is that this method does not take into account data clustering or the
natural break points you would see when placing the data on a number line, so
very different values can end up in the same class. A positive is that there
will be no empty classes within your map.
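A short Python sketch of the quantile idea follows. The function name and the sample values are my own for illustration, and the slicing rule is just one simple way to split the ranked list when the count does not divide evenly.

```python
def quantile_classes(values, num_classes):
    """Rank the values, then split them into classes of (nearly) equal size."""
    ranked = sorted(values)
    n = len(ranked)
    classes = []
    for i in range(num_classes):
        start = i * n // num_classes
        end = (i + 1) * n // num_classes
        classes.append(ranked[start:end])
    return classes


# Hypothetical sample values with three obvious clusters and one outlier.
sample = [1, 2, 3, 4, 50, 51, 52, 900]
classes = quantile_classes(sample, 4)
print(classes)  # [[1, 2], [3, 4], [50, 51], [52, 900]]
```

Every class holds two observations, so none are empty, but 52 and 900 share a class while 51 and 52 are split apart. That is the clustering problem mentioned above.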
Standard Deviation - This
classification method builds its classes from how far each value sits from the
mean, measured in standard deviations. It is best for data which is
approximately normally distributed along the number line. Provided the data
follows the bell curve model, or at least has roughly equal amounts of data on
both sides of the mean, it will represent well. If the data is not distributed
symmetrically around the mean, you’re likely to have a skewed presentation
with empty or misrepresented color classes.
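As a rough sketch, the class boundaries can be computed with Python's standard library. I am assuming here that the breaks fall at whole standard deviation steps from the mean and that the population standard deviation is used; GIS software may use other step sizes (half a standard deviation, for example). The function name and sample values are hypothetical.

```python
import statistics


def std_dev_breaks(values, num_sd=2):
    """Return class boundaries at whole standard deviation steps from the mean.

    Assumption: population standard deviation, steps of 1 SD.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + k * sd for k in range(-num_sd, num_sd + 1)]


# Hypothetical sample values with mean 5 and standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
breaks = std_dev_breaks(sample)
print(breaks)
```

For this sample the boundaries land at 1, 3, 5, 7 and 9, so the middle boundary is the mean and each class is one standard deviation wide.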
Jenks Natural Breaks - This classification method also takes into account where the data falls
along the number line, and it tries to group data items based on where they
cluster. It attempts to put like values together and unlike values in separate
classes through a best fit algorithm.
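The sketch below illustrates the goal of natural breaks rather than the algorithm a GIS package actually runs. It assumes the "best fit" is the grouping with the smallest total squared deviation from each class mean, and it finds that grouping by brute force, trying every possible set of cut points. That is only practical for a handful of values. The function names and sample values are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(group):
    """Sum of squared deviations of a group's values from the group mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)


def natural_breaks_bruteforce(values, num_classes):
    """Try every way to cut the ranked values and keep the tightest grouping."""
    ranked = sorted(values)
    n = len(ranked)
    best_cost, best_groups = None, None
    # Each combination is a set of positions where the ranked list is cut.
    for cuts in combinations(range(1, n), num_classes - 1):
        bounds = (0, *cuts, n)
        groups = [ranked[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


# Hypothetical sample values with three obvious clusters.
sample = [1, 2, 3, 4, 50, 51, 52, 900]
groups = natural_breaks_bruteforce(sample, 3)
print(groups)  # [[1, 2, 3, 4], [50, 51, 52], [900]]
```

Unlike the quantile split of the same kind of data, the classes here follow the clusters, and the single very high value gets a class of its own.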
Presentation 1.
Presentation 2.
Comparing the two approaches above, I came to the following conclusion. Using the population above age 65, normalized by square mile, provides
a more accurate picture of this population. An argument similar to the one for
question 8 can be made for this normalized view of the data. The Jenks Natural
Breaks method is still the most desirable because of how it captures the
highest data class. What is more useful here is that the map gives you an
actual accounting of how many citizens could be reached per tract. In this
case, the central area has tracts that mostly contain 3500 – 7120 people. This
could then also be used to judge return on investment from discussions with
this demographic.
This method also removes some of the skew seen in the surrounding tracts.
With the number of individuals presented, you can better account for the
actual size of the desired demographic in each tract. With the percentage
method, a tract could have a much smaller total population and still show
senior citizens as a larger percentage. For example, a tract with 100 people
in it, 60 of whom are senior citizens, would show in the highest percentage
class for the percent method. However, a tract with 1000 people and only 300
seniors would show a lower percentage but, by the number method, would be
more valuable to target.
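The two example tracts above can be checked with a few lines of Python. The tract names are placeholders, and the figures are the ones from the example, not real census data.

```python
# Hypothetical tracts using the numbers from the example above.
tracts = [
    {"name": "Tract A", "population": 100, "seniors": 60},
    {"name": "Tract B", "population": 1000, "seniors": 300},
]

for tract in tracts:
    tract["pct_seniors"] = 100 * tract["seniors"] / tract["population"]

# Which tract ranks highest depends on the measure being mapped.
top_by_percent = max(tracts, key=lambda t: t["pct_seniors"])["name"]
top_by_count = max(tracts, key=lambda t: t["seniors"])["name"]

print(top_by_percent)  # Tract A (60% seniors)
print(top_by_count)    # Tract B (300 seniors)
```

Tract A wins on percentage while Tract B holds five times as many seniors, which is why the count-based view is the better guide for outreach.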