INFOSYS 247        Fall 2000               Assignment 4

Monica Maria Fernandes

 

 

1

Dataset

 

Authorization: public domain

Reference: Berndt, ER. The Practice of Econometrics. 1991. NY: Addison-Wesley.

 

The data set contains 534 observations on 11 variables sampled from the Current Population Survey of 1985. This data set presents the following variables (Dimensions):

 

1.       Education: Edu: number of years of education

2.       South: indicator variable for Southern Region (1=Person lives in South, 0=Person lives elsewhere)

3.       Sex: indicator variable for sex (1=Female, 0=Male)

4.       Experience: Exp: number of years of work experience

5.       Union: Indicator variable for union (1=Union member, 0=Not union member)

6.       Wage: dollars per hour

7.       Age: years

8.       Race: (1=Other, 2=Hispanic, 3=White)

9.       Occupation: Occ (0=Other, 1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional)

10.   Sector: (0=Other, 1=Manufacturing, 2=Construction)

11.   Marr: marital status (0=Unmarried, 1=Married)

 

     

Nominal data: South, Sex, Union, Race, Occupation, Sector, Marr

Quantitative data:  Education, Experience, Wage, Age
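
The codes listed above can be kept in a small lookup table, so that summaries show labels instead of numbers. The Python sketch below is only an illustration: the names CODEBOOK and decode are our own assumptions, and the labels are copied from the variable list in this section.

```python
# Value labels copied from the variable list above. The names CODEBOOK and
# decode are illustrative assumptions, not part of the data set or any tool.
CODEBOOK = {
    "South": {0: "Elsewhere", 1: "South"},
    "Sex": {0: "Male", 1: "Female"},
    "Union": {0: "Not union member", 1: "Union member"},
    "Race": {1: "Other", 2: "Hispanic", 3: "White"},
    "Occupation": {0: "Other", 1: "Management", 2: "Sales",
                   3: "Clerical", 4: "Service", 5: "Professional"},
    "Sector": {0: "Other", 1: "Manufacturing", 2: "Construction"},
    "Marr": {0: "Unmarried", 1: "Married"},
}


def decode(variable, code):
    """Return the label for a coded value; uncoded variables pass through."""
    labels = CODEBOOK.get(variable)
    return code if labels is None else labels[code]
```

For example, decode("Race", 3) gives "White", while a value of an uncoded variable such as Wage is returned unchanged.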

 

2

Kinds of relationships and hypotheses

 

A. Correlations

1.       Age & [work] Experience: the older the person, the more years of work experience (obvious)

2.       Age & Wages: do older people tend to have higher salaries within the same Occupation?

3.       Experience & Wages: the more years of experience, the higher the Wage. But does it keep growing?

4.       Wages: might vary with Occupational status, Experience, Education, Sex, Region [of residence]

 

B. Hypotheses

GENDER: Are there any differences between the Sexes, regarding Wages and Occupation?

1.      Women in management Occupations get lower Wages than men with the same amount of Experience. [This means we would need to control this relation for a third variable: Experience.]

2.      Do women get the same Wages as men in lower-paid Occupations? For example, do women in clerical Occupations get the same Wages as men?

3.      Do Wages keep growing with Age, or do they grow only up to a certain limit? If so, are there any differences between the Sexes?

 

MARITAL: Does Marriage make a difference, regarding Wages and Occupation?

1.      Men take on more responsibilities with Marriage, and probably get better Wages than unmarried men. [The traditional family division of labor]

2.      Women take on more responsibilities with Marriage, and probably get lower Wages compared with unmarried women. [The traditional family division of labor]

3.      Unmarried women might hold better-paid Occupations than married women.

 

These Marital hypotheses might not be explainable with the variables available in this data set. Other, more subjective variables such as motivation, responsibilities, cultural aspects, the family division of labor, and pregnancy/children might influence the results.

 

RACE: Are there any differences between Races, regarding Wages and Occupation?

1.      Whites get higher Wages than Hispanics and Others

 

3

Evidence to verify or refute these hypotheses.


Strong and Weak Correlations

Age, Experience and Wages

Figure 1 [Eureka tool] shows a high correlation between Experience and Age, which is obvious, but no correlation between either of them and Wages or the other variables. Another view, from the Xmdv tool [Figure 2], shows the strong correlation between Age and Experience in the scatterplot, and an interesting tendency for Wages to grow up to a certain point and then decrease with further years of Experience and Age.
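
To put a number on what the figures show, one could compute the Pearson correlation coefficient for each pair of variables. The sketch below is a minimal illustration in Python; the function name pearson and the toy values are made up for the example and are not taken from the data set. It also suggests why a rise-and-fall pattern, like the one seen for Wages, can give a linear correlation near zero even though a relation exists.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    ss_x = sum(d * d for d in dx)
    ss_y = sum(d * d for d in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(ss_x * ss_y)


# Toy values, invented for illustration only (NOT from the CPS sample).
age = [20, 30, 40, 50]
experience = [2, 12, 22, 32]   # grows in step with age
wage = [5.0, 9.0, 9.0, 5.0]    # rises, then falls

r_age_exp = pearson(age, experience)   # close to 1: strong linear relation
r_age_wage = pearson(age, wage)        # close to 0: no linear relation
```

A coefficient near zero therefore does not rule out a curved relation, which is why the scatterplot view remains useful alongside the number.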

 

 

Figure 1: Correlation between Age and Experience and Wages. (Eureka tool). Red highlight is from the user.

Figure 2: Correlation between Age & Experience, Experience & Wages and Age & Wages. (Xmdv tool). Red highlight is from the user.

Focusing on the highest Wages [see Figure 3], we find that the people with the best salaries share the following characteristics:

•  They have a median of 17 years of Education, which is almost the highest level [18].

•  They do not live [0] in the South [1].

•  Men have the highest Wages.

•  Their median Age is 35.

•  Most of them are White.

•  The highest Wages belong to Professionals [5], followed by Management [1].

•  The Sector with the highest Wages is Other [0], with a few in Manufacturing [1] and none in Construction.

•  Most of them are married [1].

  

 

Figure 3: Focusing on the highest Wages, and analyzing across variables. (Eureka tool).

 


 
 

 

Figure 4: Focusing on the highest Education, and analyzing across variables. (Eureka tool).

 

 

 

To understand more about the impact of years of Education [dependent variable], we focused on the highest number of accumulated years, 18, which produced both expected and unexpected results [see Figure 4]:

 

•  Most of them do not live [0] in the Southern Region [1].

•  Few women are represented [the data is from 1985, and today this might be quite different].

•  Education does not necessarily mean Experience, even within almost the same Occupation, which is predominantly professional [5].

•  Wages did not follow the combination of Education and Experience. Using a filter, we found that the average Wage at the highest level of Education [27 subjects] was 14 and the median 15, while the average and the median of the 27 highest Wages in the sample were both 22.

•  Most of them are White [only 1 Hispanic and 1 Other].

•  They are concentrated in the Other Sector, with only 1 in Manufacturing.

•  Most of them are married.

•  Few are Union members [most are 0].
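
The filter comparison in the list above (the subjects with the most Education against the subjects with the highest Wages) can be sketched outside the tool as follows. This is a minimal Python illustration, not the Eureka filter itself; the function name top_k_summary and the toy rows are our own assumptions, and the numbers are invented rather than taken from the sample.

```python
from statistics import mean, median


def top_k_summary(rows, sort_key, value_key, k):
    """Keep the k rows with the largest sort_key and summarize value_key."""
    top = sorted(rows, key=lambda row: row[sort_key], reverse=True)[:k]
    values = [row[value_key] for row in top]
    return {"n": len(values), "mean": mean(values), "median": median(values)}


# Toy rows, invented for illustration only (NOT from the CPS sample).
rows = [
    {"Education": 18, "Wage": 14.0},
    {"Education": 18, "Wage": 16.0},
    {"Education": 14, "Wage": 22.0},
    {"Education": 12, "Wage": 7.5},
    {"Education": 10, "Wage": 5.0},
]
by_education = top_k_summary(rows, "Education", "Wage", k=2)
by_wage = top_k_summary(rows, "Wage", "Wage", k=2)
```

With the real data, k would be 27 in both calls. Ties at the cutoff are broken by input order here, which is a simplification.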


 

 

 

Figure 5: Age, Occupation, Wages, Experience and Education (Eureka tool) [Red lines and titles of occupation from the user]. Legend: Occupation (others, professional, service, clerical, sales, management); Wages (categorized in 5 ranges); Sex (women, men).

 


 
 
 
 

 

 

In Figure 5, we used the tool's categorization feature to look for patterns. With Wages categorized into five ranges, we found different behaviors depending on the Occupation:

•  First, the highest Wages are concentrated in the professional, management and others Occupations. Clerical and sales do not appear in the two top salary ranges.

•  More years of Education are concentrated in the professional and management Occupations.

•  In management, Experience & Education have more impact on achieving the highest Wages, but not so much in clerical.

•  In the others Occupation, the best Wages are achieved at around 35 years of age and decrease at the highest years of Experience, probably because those years correspond to the lowest levels of Education. However, the highest level of Education did not result in the highest Wages either.

•  Among professionals, years of Education matter more than Experience for achieving the best Wages, since we found a concentration of high Wages at low levels of Experience.
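
The categorization of Wages into five ranges can be imitated with a few lines of Python. We do not know how Eureka forms its categories, so the equal-width ranges used below are an assumption, as are the function name equal_width_bins and the toy wages.

```python
def equal_width_bins(values, n_bins=5):
    """Assign each value a range index from 0 (lowest) to n_bins - 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls into one range.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the top range.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Toy wages, invented for illustration only (NOT from the CPS sample).
wages = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
ranges = equal_width_bins(wages)
```

Equal-width ranges are sensitive to outliers: one very high Wage stretches the scale and pushes most subjects into the lowest ranges, so ranges with equal numbers of subjects might be a reasonable alternative.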

 

4

Exploring the dataset, and unexpected kinds of relations.

 

 

GENDER: Are there any differences between the Sexes, regarding Wages and Occupation?

 

As Figure 6 shows, men in general have better Wages on average than women. Looking at the details by Occupation, in management men tend on average to have higher salaries than women, although one outlier woman had the biggest salary [she is unmarried, works in the Other Sector, is 21 years old and has one year of Experience; her Wage is almost twice the next-highest Wage, which is unusual, unless she is the owner of some company…].

As Figure 7 shows, in the sales Occupation the difference between men's and women's Wages is much more pronounced. In the clerical Occupation, where women are dominant, the tendency is inverted: women on average have the higher salaries. In the professional Occupation, the Wage difference between men and women narrows. In the service Occupation, women on average have better Wages than men [after removing one of the outliers], whereas in sales they have low Wages.
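
The comparison of average Wages by Sex within each Occupation can also be written as a grouped mean. The Python sketch below is only an illustration of that idea: the function name mean_by_group is hypothetical, and the three rows are invented, not taken from the sample.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_keys, value_key):
    """Average value_key within each combination of group_keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in group_keys)].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Toy rows, invented for illustration only (NOT from the CPS sample).
# Codes follow the variable list: Sex 1=Female, 0=Male; Occupation 3=Clerical.
rows = [
    {"Sex": 1, "Occupation": 3, "Wage": 8.0},
    {"Sex": 1, "Occupation": 3, "Wage": 6.0},
    {"Sex": 0, "Occupation": 3, "Wage": 6.5},
]
averages = mean_by_group(rows, ["Sex", "Occupation"], "Wage")
```

Adding "Marr" to the list of grouping keys would give the same comparison controlled by Marital status.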

 

Figure 6: Relation between Sex, Wage (Eureka tool). Legend: Sex (women, men).

Figure 7: Relation between Sex, Wage & Occupation (Eureka tool). Legend: Sex (women, men); Occupation, from top to bottom (others, professional, service, clerical, sales, management).

Figure 8: Relation between Sex, Wage & Marital (Eureka tool). Legend: Marital (unmarried, married).

 

 


 
 

Figure 7 also shows how women are distributed across Occupations: women are the majority in clerical, share the sales and professional Occupations with men, and are a minority in the other Occupations.

Differences between the Sexes can also be seen when controlling for Marital status, with outliers more prominent among the unmarried [see Figure 8].

 

MARITAL: Does Marriage make a difference, regarding Wages and Occupation?

 

In the management Occupation, married women and married men both have the highest Wages, as seen in Figures 9 and 10. In the more detailed analysis in Figure 11, we found that in this sample there are more outliers among the unmarried, and that married women tend to have better salaries in every Occupation except the professional one, which in general refuted our hypothesis. We should note, however, that our sample has more married subjects than unmarried ones. To explore the relation between Occupation and Sex, we used the tool to focus on the women in each Occupation and located the highest Wages. We did the same for men, and the same tendency appeared: married men got the highest Wages, except in the service Occupation.

 

Figure 9: Women at management Occupation: Wage & Marital status (Eureka tool). Red highlights from the user. Legend: Marital (unmarried, married).

Figure 10: Men at management Occupation: Wage & Marital status (Eureka tool). Legend: Marital (unmarried, married).

Figure 11: Marital status, Occupation, Wages and Sex (Eureka tool). Legend: Marital (married, unmarried); Occupation (others, professional, service, clerical, sales, management); Wages; Sex (women, men).
   
   

 

 

 

RACE: Are there any differences between Races, regarding Wages and Occupation?

In Figure 4, we found that among the 27 highest Wages only 2 were not White. However, we should also consider the proportion of non-Whites in this population; for example, only 26 out of 534 subjects are Hispanic.

 

Characteristics of Others:

•  Education: median 12, average 12.6 [the sample median is 12, the average is 13]

•  Wages: median 7.5, average 8 [the sample median is 7.7]

•  Occupation: most of them are concentrated in other, service and clerical.

Characteristics of Hispanics:

•  Education: median 12, average 11.4 [under the sample average]

•  Wages: median 5.2, average 6.8 [under the sample median and average]

•  Occupation: most of them are concentrated in other, and only 4 in service

Characteristics of Whites:

•  Education: median 12, average 12.64

•  Wages: median 7.5, average 8.5

•  Occupation: distributed among Occupations

 

Figure 12 gives another representation of the Races. It shows that Whites have better Wages on average, and that outliers among Hispanics and Others are pulling up the averages of those categories, especially in the professional Occupation. In the management Occupation, the difference between Whites and the other groups is larger. In the lower-paid Occupations, such as service and clerical, the difference between Others and Whites is not as big as the difference between Hispanics and Whites.
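
The effect of a single outlier on the average of a small category can be checked by comparing the mean and the median with and without the largest value. The sketch below is a hedged Python illustration; the function name with_and_without_max is hypothetical, and the four wages are invented, not taken from the sample.

```python
from statistics import mean, median


def with_and_without_max(values):
    """Mean and median of values, with and without the largest value."""
    trimmed = sorted(values)[:-1]
    return {
        "all": (mean(values), median(values)),
        "without_max": (mean(trimmed), median(trimmed)),
    }


# Toy wages for one small group, invented for illustration (NOT from the sample).
summary = with_and_without_max([5.0, 6.0, 7.0, 22.0])
```

In this toy group the mean drops from 10.0 to 6.0 when the largest value is removed, while the median barely moves (6.5 to 6.0). This is the reason the medians reported above are a safer basis for comparing small categories than the averages.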


 
 

 

Figure 12: Race and its relation with Wages and Occupation (Eureka tool). Legend: Occupation (others, professional, service, clerical, sales, management); Race (Other, Hispanic, White); Wages; Sex (women, men).

 

 

 

 

Other insights from the Xmdv tool

 

Looking at the parallel coordinates in Figure 13, we can immediately see that women do not have the lowest level of Education. Comparing only Education and Wages, we can see very clearly that more years of Education do not necessarily mean the highest Wages, as shown in Figure 14, and that the lowest Wages occur at several levels of Education. It is also interesting to observe the woman outlier.

 

Figure 13: Relation between highest level of Education and Sex (Xmdv tool)


 
 
 
 

 

 

 

 


Figure 14: Relation between highest level of Education and Wage (Xmdv tool)


 
 
 
 
 

 

Figure 15: Relation between lowest Wages and levels of Education (Xmdv tool)

 

 

 

 

The relation between Wage and Occupation, controlled for Education, is very clear: the highest Wages are more correlated with Occupation than with Education, although a certain number of years of Education is needed to succeed [see Figure 16]. Figure 17 again shows the relevance of management, professionals and other in the highest level of Wages. In Figure 15, we brushed the lowest levels of Wages to see how they are distributed across Occupations and Education.

 

 

Figure 16: Relation between highest Wages, Occupation and levels of Education (Xmdv tool)

 

 


 
   

 

Figure 17: Relation between lowest Wages, Occupation and levels of Education (Xmdv tool)


 
 
 
 

 

The relation between Race and Wage shown in Figure 18 is visualized better than it was with Eureka. Looking at the high levels of Wages, we can see more clearly the differences in Wages among the three Races, and even the proportion inside each Race. Hispanics deserve special attention: they were mostly outside the high level of Wages, except for one outlier.

 
 
 

 

Figure 18: Relation between high level of Wages and Race (Xmdv tool). Race codes: 1 = Other, 2 = Hispanic, 3 = White.

 

 

 

 

5

Tools Evaluation

 

Most of this analysis was done with the Inxight Eureka tool, since the Xmdv tool did not work well on some lab computers or crashed during the process. The Inxight Eureka tool was also easier to use, and its graph concepts were easier to understand. Our evaluation of Xmdv is limited by how little we used it, since we did not explore it as much as we would have liked.

 

XmdvTool Release 4.2 beta [Matthew Ward & Allen Martin]

The Tool Purpose


To provide analysis of multivariate data [multiple dimensions and parameters]. It presents the data on a 2-D screen using the following methods: scatterplots, glyphs, parallel coordinates, and dimensional stacking.

 

Visualization Tools: XmdvTool

Useful Features

Appearance: colors can be customized easily by choosing Preferences /Color Requestor.

Scatterplots: very useful for understanding relations and for focusing on some data to discover patterns.

Intuitive to Use

Switching between interaction modes: moving from one mode to another is clearly perceived, although the computer would crash most of the time.

Hard to Understand

Importing data: The process of importing data is not as simple as in the Eureka and Spotfire tools. It was not clear what the role of cardinality is, or which number would be most appropriate for this data. The number of dimensions and records has to be included at the top of the data, along with the variable names and the minimum and maximum values for each variable. The file has to be saved with an .okc extension and not .txt as with the other tools.

Why would the software just disappear when clicking on the header?

Font Name and Size are confusing since they apply not to the presentation layout but to the software interface, which does not make sense. The ideal situation would be to change the appearance of the presentation and not of the software.

Why did the minimum and maximum values on the parallel coordinates not show as they were supposed to? Was it my fault, or is it supposed to be like that?

Glyphs: the problem is not with manipulation but with understanding how useful this visual presentation would be.

Stack: at first impression the concept is not clear.
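
The import layout described under "Importing data" above could be generated by a short script instead of being typed by hand. The Python sketch below is only a guess at that layout (counts first, then the variable names, then minimum, maximum and cardinality per variable, then the records); the exact format and a sensible cardinality should be checked against the XmdvTool documentation. The function name build_xmdv_input and the default cardinality of 4 are our own placeholders.

```python
def build_xmdv_input(names, records, cardinality=4):
    """Build file text: counts, names, per-variable min/max/cardinality, data."""
    if any(len(record) != len(names) for record in records):
        raise ValueError("every record needs one value per variable")
    # Header line: number of dimensions and number of records.
    lines = [f"{len(names)} {len(records)}"]
    lines.extend(names)
    # One line per variable: minimum, maximum, assumed cardinality.
    for column in zip(*records):
        lines.append(f"{min(column)} {max(column)} {cardinality}")
    # One line per record, values separated by spaces.
    for record in records:
        lines.append(" ".join(str(value) for value in record))
    return "\n".join(lines) + "\n"


# Toy records, invented for illustration only (NOT from the CPS sample).
text = build_xmdv_input(["Age", "Wage"], [[35, 7.5], [21, 9.0]])
```

Computing the minimum and maximum from the data avoids typing them by hand, which was one of the steps we found error-prone.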

Missing Functionality

Undo: I did not find any function that allows me to go back to the previous selection.

Save color preference: every time I returned to the software, I had to set the colors up again.

Save image: popular image tools did not recognize the saved format.

Open file at the same location: every time you open a file, you have to go through the directory tree, which wastes time.

Navigation: the techniques are display

Open at the right folder: every time we opened a folder, the software did not show the last folder opened. Considering the crashing problem, this would annoy anyone…

Progressive refinement: a way to group the data into categories might be missing or hard to understand.

 

Eureka Version 1.1 © Inxight Software

 

Eureka seems to be much more intuitive than the other tools, maybe because it follows some standard Windows application conventions.
Also, the software never crashed, which made it easier to work with and to explore the tool and the data.

 

Visualization Tools: Eureka

Useful Features

Restore Row Order: Especially good.

Back and Forward buttons: help with experimenting and data analysis.

Hide columns: good for comparing a few variables more easily.

Moving columns: manipulating a column from one place to another

Focus: the ability to focus on part of the data allows us to analyze its behavior better.

Categorization: once you understand how to manipulate it, it can provide a specific focus of analysis.

 

Intuitive to Use

Reversible interactions: back and forward between view and specifications

Sorting ascending: with a click of the mouse, following the standard.
Moving columns.

Importing data: The process is simple: you only need to paste an Excel file into the Eureka screen.

 

Hard to Understand

The red line crossing the icon gives the first impression that the button is not available.

Categorization: the possibility of changing colors in the first step is not clear. Once you have categorized, the Categorize option no longer appears; instead there is the option Order and Color. It is also not clear why the tool displays 1~2 and 2~3 when you have only two categories.

Filter: the ideal situation would be to highlight a specific area and see the average, mean and median of the corresponding data in the same window, instead of moving to another window. Sometimes the Filter tool did not work so smoothly.

Missing Functionality

Highlights with lines and colors [drawing features]: as Word and Excel have; these would facilitate presentation

Save as Image: for the same presentation reasons.

Save parameters:

Cross intersection of variables: having two variables in the same column to be analyzed

Present Average, Mean and Median at the same time, instead of separately.