INFOSYS 247 Fall 2000 Assignment 4
Monica Maria Fernandes
1 | Dataset |
Authorization: public domain
Reference: Berndt, E.R. The Practice of Econometrics. 1991. NY: Addison-Wesley.
The data set contains 534 observations on 11 variables sampled from the Current Population Survey of 1985. This data set presents the following variables (Dimensions):
1. Education (Edu): number of years of education
2. South: indicator variable for Southern Region (1=Person lives in the South, 0=Person lives elsewhere)
3. Sex: indicator variable for sex (1=Female, 0=Male)
4. Experience (Exp): number of years of work experience
5. Union: indicator variable for union membership (1=Union member, 0=Not union member)
6. Wage: dollars per hour
7. Age: years
8. Race: (1=Other, 2=Hispanic, 3=White)
9. Occupation (Occ): (0=Other, 1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional)
10. Sector: (0=Other, 1=Manufacturing, 2=Construction)
11. Marr: marital status (0=Unmarried, 1=Married)
Nominal data: South, Sex, Union, Race, Occupation, Sector, Marr
Quantitative data: Education, Experience, Wage, Age
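The coded variables are easier to read once the numeric codes are replaced by their labels. The sketch below is one possible way to do that in Python. The code tables are copied from the variable list, while the `decode` helper and the example record are hypothetical and invented only for illustration.

```python
# Code tables copied from the variable list of the data set.
CODES = {
    "South": {0: "Elsewhere", 1: "South"},
    "Sex": {0: "Male", 1: "Female"},
    "Union": {0: "Not union member", 1: "Union member"},
    "Race": {1: "Other", 2: "Hispanic", 3: "White"},
    "Occupation": {0: "Other", 1: "Management", 2: "Sales",
                   3: "Clerical", 4: "Service", 5: "Professional"},
    "Sector": {0: "Other", 1: "Manufacturing", 2: "Construction"},
    "Marr": {0: "Unmarried", 1: "Married"},
}


def decode(record):
    """Return a copy of record with nominal codes replaced by labels.

    Variables without a code table (Education, Experience, Wage, Age)
    are passed through unchanged.
    """
    decoded = {}
    for name, value in record.items():
        table = CODES.get(name)
        decoded[name] = table.get(value, value) if table else value
    return decoded


# Hypothetical record, invented for illustration (not a row of the data set).
example = {"Education": 12, "South": 0, "Sex": 1, "Experience": 10,
           "Union": 0, "Wage": 7.5, "Age": 28, "Race": 3,
           "Occupation": 3, "Sector": 0, "Marr": 1}
print(decode(example))
```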
2 | Kinds of relationships and hypotheses |
A. Correlations
1. Age & [work] Experience: the older the person, the more years of work experience (obvious).
2. Age & Wages: do older people tend to have higher salaries within the same Occupation?
3. Experience & Wages: the more years of Experience, the higher the Wage. But does it keep growing?
4. Wages: might vary with Occupational status, Experience, Education, Sex, and Region [of residence].
GENDER: Are there any differences between the Sexes regarding Wages and Occupation?
1. Women get lower Wages than men in management Occupations, given the same years of Experience. [This means we would need to control this relation by a third variable: Experience.]
2. Do women get the same Wages as men in lower-paid Occupations? For example, do women in clerical Occupations get the same Wages as men?
3. Do Wages grow with Age, or do they tend to grow only up to a certain limit? If so, are there any differences between the Sexes?
MARITAL: Does Marriage make a difference regarding Wages and Occupation?
1. Men take on more responsibilities with Marriage, and probably get better Wages than unmarried men. [The traditional family division of work]
2. Women take on more responsibilities with Marriage, and probably get lower Wages than unmarried women. [The traditional family division of work]
3. Unmarried women might get better-paid Occupations than married women.
All these Marital hypotheses might not be explained by the variables presented in this case. Other, more subjective variables, such as motivation, responsibilities, cultural aspects, family division of work, and pregnancy/children, might control the results.
RACE: Are there any differences between Races regarding Wages and Occupation?
1. Whites get higher Wages than Hispanics and Others.
3 | Evidence to verify or refute any of these hypotheses |
Strong and Weak Correlations
Figure 1 [Eureka tool] shows a high correlation between Experience and Age, which is obvious, but no correlation with Wages or other variables. Another view, in the Xmdv tool [Figure 2], shows a strong correlation between Age and Experience in the scatterplot, and an interesting tendency for Wages to grow up to a certain point and then decrease with further years of Experience and Age.
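The visual impressions above could be checked numerically with a correlation coefficient. The following is a minimal sketch of the Pearson correlation in plain Python. It is not the computation performed by Eureka or Xmdv, and the three small lists are invented values, not taken from the data set. They are chosen so that Experience rises linearly with Age while Wages rise and then fall.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Invented values for illustration only.
ages = [22, 30, 41, 55, 63]
experience = [2, 10, 21, 35, 43]       # rises linearly with age
wages = [5.0, 9.0, 12.0, 10.0, 6.0]    # rises, then falls

print(pearson(ages, experience))  # close to 1: strong linear relation
print(pearson(ages, wages))       # close to 0: weak linear relation
```

A rise-and-fall pattern like the one seen for Wages gives a low linear coefficient even though the variables are clearly related, which is one reason the scatterplot view is worth keeping alongside any single number.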
By focusing on the highest Wages [see Figure 3], we find that the best salaries belong to people with the following characteristics:
§ They have a median of 17 years of Education, which is almost the highest level [18].
§ They do not live [0] in the South [1].
§ Men have the highest Wages.
§ The median age is 35.
§ Most of them are white.
§ The highest Wages are from Professionals [5], followed by Management [1].
§ The sector with the highest Wages is Other [0], with a few in Manufacturing [1] and none in Construction.
§ Most of them are married [1].
To understand more about the impact of years of Education [independent variable], we decided to focus on the highest number of accumulated years, 18 years, which presented some expected and unexpected results [see Figure 4], such as:
§ Most of them do not live [0] in the Southern Region [1].
§ Few women are represented [the data is from 1985, and today this might be quite different].
§ Education does not necessarily mean Experience, even within almost the same Occupation, which is predominantly professional [5].
§ Wages did not correlate with the relation between Education and Experience. Using a filter, we found that the average Wage at the highest level of Education [27 subjects] was 14, and the median 15, while the average and median of the highest Wages [27 subjects] in the sample were 22.
§ Most of them are White [only 1 Hispanic and 1 Other].
§ They are concentrated in the Other Sector, with only 1 in Manufacturing.
§ Most of them are married.
§ Few are members of a Union [most are 0].
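The filter step described in the list above (select a subgroup, then read its average and median Wage) can be reproduced outside the tool. This is a minimal sketch using Python's `statistics` module. The `summarize` helper is hypothetical, and the five records are invented for illustration, so the numbers it prints are not the ones reported above.

```python
from statistics import mean, median


def summarize(records, predicate, field="Wage"):
    """Count, mean and median of `field` over the records matching predicate."""
    values = [r[field] for r in records if predicate(r)]
    if not values:
        raise ValueError("no records match the filter")
    return {"n": len(values), "mean": mean(values), "median": median(values)}


# Invented records for illustration only (not rows of the data set).
records = [
    {"Education": 18, "Wage": 12.0},
    {"Education": 18, "Wage": 15.0},
    {"Education": 18, "Wage": 18.0},
    {"Education": 12, "Wage": 6.0},
    {"Education": 12, "Wage": 8.0},
]

# Filter on the highest level of Education found in the records.
top_education = max(r["Education"] for r in records)
result = summarize(records, lambda r: r["Education"] == top_education)
print(result)
```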
In Figure 5, we used the tool's categorization feature to look for patterns. Having categorized the Wages into five ranges, we found different behaviors according to the kind of Occupation:
§ First, the highest Wages are concentrated in professional, management and other Occupations. Clerical and sales do not appear in the two top ranges of salaries.
§ More years of Education are concentrated in professional and management Occupations.
§ In management Occupations, Experience & Education have more impact on achieving the highest Wages, but not so much impact in clerical ones.
§ In other Occupations, the best Wages are achieved around 35 years of age, and Wages decrease at the highest years of Experience, probably because in this case these correspond to the lowest levels of Education. But the highest level of Education did not result in the highest Wages either.
§ For professionals, years of Education are more important than Experience for achieving the best Wages, since we found some concentration of high Wages at low levels of Experience.
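The categorization of Wages into five ranges can be sketched as a simple binning step followed by a count per Occupation. The version below assumes equal-width ranges between the lowest and highest Wage, which is an assumption: the report does not say how Eureka chose its range boundaries. The helper names and the records are hypothetical.

```python
from collections import Counter


def wage_range(wage, low, high, bins=5):
    """0-based index of the equal-width range that contains wage."""
    if high <= low:
        raise ValueError("high must be greater than low")
    width = (high - low) / bins
    index = int((wage - low) / width)
    # Clamp so that wage == high falls in the top range, not past it.
    return min(max(index, 0), bins - 1)


def crosstab(records, bins=5):
    """Count records per (Occupation, wage range) pair."""
    wages = [r["Wage"] for r in records]
    low, high = min(wages), max(wages)
    return Counter(
        (r["Occupation"], wage_range(r["Wage"], low, high, bins))
        for r in records
    )


# Invented records for illustration only (not rows of the data set).
records = [
    {"Occupation": "Professional", "Wage": 25.0},
    {"Occupation": "Management", "Wage": 21.0},
    {"Occupation": "Clerical", "Wage": 7.0},
    {"Occupation": "Clerical", "Wage": 9.0},
    {"Occupation": "Sales", "Wage": 5.0},
]
table = crosstab(records)
for (occupation, wage_bin), count in sorted(table.items()):
    print(occupation, "range", wage_bin + 1, "count", count)
```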
4 | Exploring the dataset, and unexpected kinds of relations |
GENDER: Are there any differences between the Sexes regarding Wages and Occupation?
As we present in Figure 6, in general men have on average better Wages than women. Looking at the details by Occupation, in management positions men on average tend to have higher salaries than women, although one outlier woman had the biggest salary [she is unmarried, works in the sector Other, is 21 years old and has one year of Experience; her Wage is almost twice the next-highest Wage, which is unusual, unless she is the owner of some company].
As we see in Figure 7, in sales Occupations the differences between men's and women's Wages are much more accentuated. But in clerical Occupations, where women are dominant, this tendency is inverted: women on average have the highest salaries. In professional Occupations, the Wage difference between men and women decreases. In service Occupations, we note that women on average have better Wages than men [after taking out one of the outliers], while women have low Wages in sales.
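Comparing the Sexes overall and then again within each Occupation amounts to computing group averages with and without a controlling variable. The sketch below shows one way to do this in plain Python. The `mean_wage_by` helper is hypothetical, and the six records are invented so that the overall comparison and the per-Occupation comparison point in different directions for one Occupation. They are not rows of the data set.

```python
from collections import defaultdict
from statistics import mean


def mean_wage_by(records, *keys):
    """Mean Wage for each combination of the given grouping keys."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r["Wage"])
    return {group: mean(values) for group, values in groups.items()}


# Invented records for illustration only.
records = [
    {"Occupation": "Management", "Sex": "Male", "Wage": 14.0},
    {"Occupation": "Management", "Sex": "Male", "Wage": 16.0},
    {"Occupation": "Management", "Sex": "Female", "Wage": 12.0},
    {"Occupation": "Clerical", "Sex": "Male", "Wage": 7.0},
    {"Occupation": "Clerical", "Sex": "Female", "Wage": 8.0},
    {"Occupation": "Clerical", "Sex": "Female", "Wage": 9.0},
]

overall = mean_wage_by(records, "Sex")
by_occupation = mean_wage_by(records, "Occupation", "Sex")
print(overall)
print(by_occupation)
```

Grouping by a tuple of keys keeps the helper general: the same call with `"Marr"` added would give the comparison controlled by Marital status.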
Figure 7 also shows how women are distributed across Occupations: women are the majority in clerical, share the sales and professional worlds with men, and are a minority in other Occupations.
Differences between the Sexes can also be perceived when controlling for Marital status, with outliers more dominant among the unmarried [see Figure 8].
MARITAL: Does Marriage make a difference regarding Wages and Occupation?
In management Occupations, both married women and married men have the highest Wages, as seen in Figures 9 and 10. In a more detailed analysis in Figure 11, we found that in this sample there are more outliers among the unmarried, and also that married women tend to have better salaries in every Occupation except the professional one, which in general refuted our hypothesis. We should note, however, that our sample has more married subjects than unmarried ones. To explore the relation between Occupation and Sex, we used the tool to focus on the women in each Occupation and located the highest Wages. We then did the same for men, and the same tendency showed up: married men got the highest Wages, except in the service Occupation.
RACE: Are there any differences between Races regarding Wages and Occupation?
In Figure 4, we found that among the 27 highest Wages, only 2 were not White. But we should also consider the percentage of non-Whites in this population: for example, 26 out of 534 are Hispanic.
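The base-rate point can be made concrete with a little arithmetic on the figures quoted above (26 Hispanics out of 534 subjects, and a top group of 27). The sketch below only restates those numbers. The "expected count" assumes the top group were drawn in proportion to the whole sample, which is an illustrative assumption and not a claim about how Wages are actually distributed.

```python
def share(count, total):
    """Percentage that count represents of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


SAMPLE_SIZE = 534   # observations in the data set
HISPANIC = 26       # Hispanic subjects, as reported in the text
TOP_GROUP = 27      # size of the highest-Wage group

hispanic_share = share(HISPANIC, SAMPLE_SIZE)
# Count expected in the top group under proportional representation.
expected_in_top = TOP_GROUP * HISPANIC / SAMPLE_SIZE

print(f"Hispanic share of sample: {hispanic_share:.1f}%")
print(f"Expected in a group of {TOP_GROUP}: {expected_in_top:.1f}")
```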
Characteristics of Others:
§ Education: median 12, average 12.6 [the sample median is 12, the average is 13]
§ Wages: median 7.5, average 8 [the sample median is 7.7]
§ Occupation: most of them are concentrated in other, service and clerical.
Characteristics of Hispanics:
§ Education: median 12, average 11.4 [under the sample average]
§ Wages: median 5.2, average 6.8 [under the sample median and average]
§ Occupation: most of them are concentrated in other, and only 4 in service.
Characteristics of Whites:
§ Education: median 12, average 12.64
§ Wages: median 7.5, average 8.5
§ Occupation: distributed among Occupations
In Figure 12, we see another representation of the Races, which shows that the white Race has on average better Wages, and that the outliers among Hispanics and Others are pulling up the averages of each category, especially in the professional Occupation. In the management Occupation, the difference between Whites and the others is more significant. Also, in the lower-paid Occupations, such as service and clerical, the difference between Others and Whites is not as big as the difference between Hispanics and Whites.
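The effect of a single outlier on a small category's average can be illustrated by comparing the mean and the median with and without it. The Wage values below are invented for illustration and do not come from the data set.

```python
from statistics import mean, median

# Invented Wages for a small category; the last value plays the outlier.
wages = [5.0, 5.5, 6.0, 6.5, 30.0]
without_outlier = wages[:-1]

print("with outlier:   ", mean(wages), median(wages))
print("without outlier:", mean(without_outlier), median(without_outlier))
```

The mean moves a lot while the median barely changes, which is why the medians reported for each Race are a useful companion to the averages.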
Other insights by looking at the Xmdv tool
Looking at the parallel coordinates in Figure 13, we can immediately perceive that women do not have the lowest level of Education. Also, comparing only Education and Wages, we can see very clearly that more years of Education do not necessarily mean higher Wages, as shown in Figure 14, and also that the lowest Wages correspond to several different levels of Education. It is also interesting to observe the woman outlier.
Figure 13: Relation between highest level of Education and Sex (Xmdv tool)
The relation between Wage and Occupation, controlled by Education, is very clear: the highest Wages are more correlated with Occupation than with Education, although a certain number of years of Education is relevant to succeed [see Figure 16]. Figure 17 shows again the relevance of management, professional and other Occupations at the highest level of Wages. In Figure 15, we brushed the lowest levels of Wages to see how they are distributed across Occupations and Education. The relation between Race and Wage shown in Figure 18 gives a better visualization of what was found before with Eureka. Looking at the high levels of Wages, we can see more clearly the differences in Wages among the three Races, and even the proportion inside each Race, with special attention to Hispanics, who were mostly outside the high level of Wages, except
for one outlier.
5 | Tools Evaluation |
Most of this analysis was made using the Inxight Eureka tool, since the XmdvTool would not work well on some lab computers or would crash during the process. The Inxight Eureka tool was also perceived as easier to use, and its graph concepts as easier to understand. Our overview of Xmdv is limited by our use of it, since we did not explore it as much as we would have liked.
Visualization Tools: XmdvTool
XmdvTool Release 4.2 beta [Matthew Ward & Allen Martin]
The Tool Purpose
Provide analysis of multivariate data [multiple dimensions and parameters]. It presents the data on a 2-D screen using the following methods: scatterplots, glyphs, parallel coordinates, and dimensional stacking.
Useful Features
§ Appearance: colors can be customized easily by choosing Preferences / Color Requestor.
§ Scatterplots: very useful to understand relations and to focus on some data to discover patterns.
Intuitive to Use
§ Switch between interaction modes: the move from one mode to another is clearly perceived, although the computer would crash most of the time.
Hard to Understand
§ Importing data: the process of importing data is not as simple as in the Eureka and Spotfire tools. It was not clear what the role of cardinality is, or which number would be more appropriate for this data. The number of dimensions and records has to be included at the top of the data, besides the variable names and the minimum and maximum values for each variable. The file has to be saved with an .okc extension and not .txt as for the other tools.
§ Why would the software just disappear when clicking on the header?
§ Font Name and Size are confusing, since they do not apply to the presentation layout but to the software interface, which does not make sense. The ideal situation would be to change the appearance of the presentation and not of the software.
§ Why did the minimum and maximum values presented in the parallel coordinates not show as they were supposed to? Was it my fault, or is it supposed to be like that?
§ Glyphs: the problem is not with manipulation but with understanding how useful this visual presentation would be.
§ Stack: at first impression the concept is not clear.
Missing Functionality
§ Undo: I did not find any function that allows me to go back on a selection.
§ Save color preference: every time I returned to the software I had to set the colors up again.
§ Save image: the popular image tools did not recognize the saved format.
§ Open file at the same location: every time you open a file you have to go through the directory tree, which wastes time.
§ Navigation: the techniques are display
§ Open at the right folder: every time we opened a folder, the software would not show the last folder opened. Considering the crashing problem, this would disturb anyone.
§ Progressive refinement: how to organize the data into categories might be missing or hard to understand.
Visualization Tools: Eureka
Eureka Version 1.1 © Inxight Software
Eureka seems to be much more intuitive than the other tools, maybe because it uses some standardized Windows application procedures. Also, the software did not crash at any time, which made it easier to work with and to explore the tool and the data.
Useful Features
§ Restore Row Order: especially good.
§ Back and Forward buttons: help with experimenting and with the data analysis.
§ Hide columns: good for comparing a few variables more easily.
§ Moving columns: manipulating a column from one place to another.
§ Focus: the ability to focus on the data allows us to analyze its behavior better.
§ Categorization: once you understand how to manipulate it, it can provide a specific focus of analysis.
Intuitive to Use
§ Reversible interactions: back and forward between view and specifications.
§ Sorting ascending: with a click of the mouse, following the standard.
§ Importing data: the process is simple and only requires pasting an Excel file into the Eureka screen.
Hard to Understand
§ The red line crossing the icon gives, at first impression, the idea that the button is not available.
§ Categorization: the possibility of changing colors in the first step is not clear. Once you have categorized, the Categorize option does not appear anymore; instead there is the option Order and Color. It is also not clear why 1~2 and 2~3 are displayed when you have only two categories.
§ Filter: the ideal situation would be to highlight a specific area and have the average, mean and median of the corresponding data in the same window, instead of moving to another window. Sometimes the Filter tool did not work so smoothly.
Missing Functionality
§ Highlights with lines and colors [draw features]: as Word and Excel have; it would facilitate presentation.
§ Save as Image: for the same presentation reasons.
§ Save parameters
§ Cross intersection of variables: having two variables to be analyzed in the same column.
§ Present Average, Mean and Median at the same time, instead of separately.