A. WHAT DOES RESEARCH MEAN
According to BRUCE W. TUCKMAN in his book
entitled "CONDUCTING EDUCATIONAL RESEARCH", research is a systematic attempt to provide
answers to questions. Such answers may be abstract and general as is often the
case in basic research, or they may be highly concrete and specific as is often
the case in demonstration or applied research. In both kinds of research, the
investigator uncovers facts and then formulates a generalization based on the
interpretation of those facts.
Basic research is concerned with the
relationship between two or more variables. It is carried out by identifying a
problem, examining selected relevant variables through a literature review,
constructing a hypothesis where possible, creating a research design to
investigate the problem, collecting and analyzing appropriate data, and then
drawing conclusions about the relationship of the variables. Basic research
does not often provide immediately usable information for altering the
environment. Its purpose, rather, is to develop a model, or theory, that
identifies all the relevant variables in a particular environment and
hypothesizes about their relationship. Then, using the findings of basic
research, it is possible to develop a product ("product" here being used to
include, for example, a given curriculum, a particular teacher-training
program, a textbook, or an audio-visual aid).
A further step is to test the product, the
province of applied research, often called demonstration. In effect, applied
research is a test or tryout that includes systematic evaluation.
B. SCIENTIFIC THEORY VS COMMON SENSE
Science (from Latin scientia, meaning
"knowledge") is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about nature
and the universe.
Science "was a discovery that nature generally acts regularly enough to be
described by laws and even by mathematics; and required invention
to devise the techniques, abstractions, apparatus, and organization for
exhibiting the regularities and securing their law-like descriptions"
(Heilbron, 2003, p. vii).
A scientific theory is a well-substantiated explanation of some aspect of the
natural world that is acquired through the scientific method and repeatedly tested and
confirmed through observation and experimentation. A
scientific theory summarizes a hypothesis or group of hypotheses that have been
supported with repeated testing. A scientific theory may be rejected or modified if it does not
fit new empirical findings; in such circumstances, a more accurate theory
is then sought. In certain cases, the less accurate, unmodified scientific
theory can still be treated as a theory if it is useful (due to its sheer
simplicity) as an approximation under specific conditions. A theory in this context is a well-substantiated explanation for a series of facts and observations that is testable and can be used to predict future observations. When used in a non-scientific context, the word "theory"
implies that something is unproven or speculative, which is closer to what is called common sense.
Scientific theories constitute the enterprise of science. A theory is an
abstraction that applies to a variety of circumstances, explaining relationships
and phenomena on the basis of objective evidence. Science uses conceptual schemes and theoretical structures
built for internal consistency and tested empirically. The distinction between this structure of thought and common
sense should be, well, common sense. Common sense has no structure to it, is
explicitly subjective, and is subject to all manner of cognitive biases. It
involves no testing, replication, or verification.
C. CONCEPTS,
CONSTRUCTS AND VARIABLES
1.
CONCEPTS
Concepts are abstractions from particulars. Concepts have constitutive definitions, so
concepts are rich in meaning but cannot be measured directly. Many things we
want to study in behavioral research are concepts, for instance quality,
satisfaction, attitude, leadership, etc. In research methodology, concepts are
used in the step of problem and hypothesis formulation.
For example: Scientific: weight, mass, energy, force, etc.
Emotions: happiness, sadness, fear, anxiety, etc.
2.
CONSTRUCTS
Constructs are concepts that are measurable. Constructs are measurable because they have
additional definitions, namely operational definitions. Operationalization of concepts
into constructs is concerned with validity and reliability. After
operationalization, each concept becomes a construct. In the measurement
instrument (for instance, a questionnaire), each construct becomes a measurable
scale. A measurable scale can be a single-item or multiple-item scale. In
research methodology, constructs are used in the step of designing the
measurement instrument (operationalization of concepts).
For example: intelligence. "A concept is theoretical, whereas a construct
defines and specifies it so that it can be measured and observed" (Kerlinger).
3.
VARIABLES
A variable is a symbol to which numerals or values are assigned. A term often
requires an operational definition. By use, variables can be divided into
independent and dependent variables; by type, they include attribute,
active, intervening, continuous, and categorical variables.
For example: weight, energy, intelligence, driver reaction time, stopping distance, age range, etc.
D. CATEGORY OF VARIABLE AND SCALE OF VARIABLE
1.
CATEGORY OF VARIABLES
a.
The Independent
Variable
The independent variable, a stimulus or input variable, operates either
within a person or within his or her environment to affect behaviour. It is
the factor that is measured, manipulated, or selected by the experimenter to
determine its relationship to an observed phenomenon. If an experimenter
studying the relationship between two variables, X and Y, asks himself "What will happen to Y if I
make X greater or smaller?", he is thinking of X as the independent variable. It is a
variable that is manipulated or changed to cause a change in some other
variable; the interest is in how it affects another variable, not in what affects it.
b.
The Dependent Variable
The dependent variable
is a response variable or output. It is an observed aspect of the behavior of an
organism that has been stimulated. The dependent variable is the factor which
is observed and measured to determine the effect of the independent variable;
that is, the factor that appears, disappears, or varies as the experimenter
introduces, removes, or varies the independent variable. In the study of the
relationship between the two variables X and Y, when the experimenter asks
"What will happen to Y if I make X greater or smaller?" he is thinking of Y as the
dependent variable. It is considered dependent because its value depends upon
the value of the independent variable.
Some Examples of Independent and Dependent Variables
The independent and dependent terms are used when researchers are trying to determine if there is a probable causal
relationship between variables. These terms
distinguish the variable that the researcher expects to have influence and the variable that he or she expects to be
influenced. The independent variable is the variable expected to change or
influence the dependent variable.
Thus, the dependent variable is expected to change or be influenced by the variation in the independent
variable.
A number of hypotheses drawn from studies undertaken in a research methods course are listed below:
·
Hypothesis 1. Under intangible reinforcement conditions,
middle-class children will learn significantly better than lower-class
children.
Independent Variable: middle-class versus lower-class
Dependent Variable: ease or speed of learning
·
Hypothesis 2. Girls who plan to pursue careers in science
are more aggressive, less conforming, more independent, and have a greater need
for achievement than girls who do not plan such careers.
Independent Variable: girls who plan to pursue careers in science versus girls who do not
Dependent Variable: aggressiveness, conformity, independence, need for achievement
c.
The Moderator Variable
The term moderator variable describes a special type of independent
variable: a secondary independent variable selected for study to determine whether it
affects the relationship between the primary independent variable and the dependent
variable. The moderator variable is defined as that factor which is measured,
manipulated, or selected by the experimenter to discover whether it modifies
the relationship of the independent variable to an observed phenomenon. The
word moderator simply acknowledges the
reason that this secondary independent variable has been singled out for study.
If the experimenter is interested in studying the effect of independent
variable X on dependent variable Y but suspects that the nature of the relationship
between X and Y is altered by the level of a third factor Z, then Z can be
included in the analysis as a moderator variable.
Listed below are a number of hypotheses drawn from various sources, including students'
research reports; the moderator variable (along with the independent and dependent
variables) has been identified for each one.
·
Hypothesis 1. Male experimenters get more effective
performances from both male and female subjects than do female experimenters,
but they are singularly most effective with male subjects.
Independent variable : the sex of experimenter
Moderator variable : the sex of the subject
Dependent variable :
effectiveness of performance of subject
·
Hypothesis 2. Situational pressures of morality cause
nondogmatic school superintendents to innovate, while situational pressures of
expediency cause dogmatic school superintendents to innovate.
Independent variable: type of situational pressure (morality versus expediency)
Moderator variable: level of dogmatism
Dependent variable: degree to which the superintendent innovates
d.
Control Variables
All the variables in a situation (situational variables)
or in a person (dispositional variables) cannot be studied at the same time;
some must be neutralized to guarantee that they will not have a differential or
moderating effect on the relationship between the independent variable and the
dependent variable. These variables, whose effects must be neutralized or
controlled, are called control variables. They are defined as those factors which
are controlled by the experimenter to cancel out or neutralize any effect they
might otherwise have on the observed phenomenon. While the effects of control
variables are neutralized, the effects of moderator variables are studied.
Control variables are not necessarily specified in the hypothesis. It is often
necessary to read the method section of a study to discover which variables
have been treated as control variables. The examples below, however, specifically
list at least one control variable in the hypothesis.
·
Hypothesis 1. Among boys there is a correlation between
physical size and social maturity, but for girls in the same age group there is
no correlation between these two variables.
Control variable: age
·
Hypothesis 2. Task performance by high need achievers
will exceed that of low need achievers in tasks with a 50 percent subjective
probability of success.
Control variable :
subjective probability of task success
e.
Intervening Variables
All of the variables
described thus far (independent, dependent, moderator, and control) are
concrete. Each independent, moderator, and control variable can be
manipulated by the experimenter, and each variation can be observed as it
affects the dependent variable. What the experimenter is trying to find out by
manipulating these concrete variables is often not concrete, however, but
hypothetical: the relationship between a hypothetical underlying or
intervening variable and a dependent variable. An intervening variable is that
factor which theoretically affects the observed phenomenon but cannot be seen,
measured, or manipulated; its effect must be
inferred from the effects of the independent and moderator variables on
the observed phenomenon.
In writing about their experiments, researchers do not always identify their
intervening variables and are even less likely to label them as such.
Consider the role of the
intervening variable in the following hypotheses.
·
Hypothesis 1. Children who are blocked from reaching
their goals exhibit more aggressive acts than children not so blocked.
Independent variable: being or not being blocked from a goal
Intervening variable: frustration
Dependent variable: number of aggressive acts
·
Hypothesis 2. Teachers given more positive feedback
experiences will have more positive attitudes toward children than teachers
given fewer positive feedback experiences.
Independent variable: number of positive feedback experiences for the teacher
Intervening variable: teacher's self-esteem
Dependent variable: positiveness of the teacher's attitude towards students
2.
SCALE OF VARIABLE
a.
Nominal variables
Nominal variables are variables that have two
or more categories, but which do not have an intrinsic order. For example, a
real estate agent could classify their types of property into distinct
categories such as houses, condos, co-ops or bungalows. So "type of
property" is a nominal variable with 4 categories called houses, condos,
co-ops and bungalows. Of note, the different categories of a nominal variable
can also be referred to as groups or levels of the nominal variable. Another
example of a nominal variable would be classifying where people live in the USA
by state. In this case there will be many more levels of the nominal variable
(50 in fact).
b.
Ordinal variables
Ordinal variables are
variables that have two or more categories just like nominal variables only the
categories can also be ordered or ranked. So if you asked someone if they liked
the policies of the Democratic Party and they could answer either "Not
very much", "They are OK" or "Yes, a lot" then you
have an ordinal variable. Why? Because you have 3 categories, namely "Not
very much", "They are OK" and "Yes, a lot" and you can
rank them from the most positive (Yes, a lot), to the middle response (They are
OK), to the least positive (Not very much). However, whilst we can rank the
levels, we cannot place a "value" to them; we cannot say that
"They are OK" is twice as positive as "Not very much" for
example.
c.
Interval variables
Interval variables are
variables whose central characteristic is that they can be measured
along a continuum and have a numerical value (for example, temperature
measured in degrees Celsius or Fahrenheit). So the difference between 20C and
30C is the same as the difference between 30C and 40C. However, temperature
measured in degrees Celsius or Fahrenheit is NOT a ratio variable.
d. Ratio variables
Ratio variables are
interval variables, but with the added condition that 0 (zero) of the
measurement indicates that there is none of that variable. So, temperature
measured in degrees Celsius or Fahrenheit is not a ratio variable because 0C
does not mean there is no temperature. However, temperature measured in Kelvin
is a ratio variable as 0 Kelvin (often called absolute zero) indicates that
there is no temperature whatsoever. Other examples of ratio variables include
height, mass, distance and many more. The name "ratio" reflects the
fact that you can use the ratio of measurements. So, for example, a distance of
ten metres is twice the distance of 5 metres.
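The interval-versus-ratio distinction above can be sketched in code. The snippet below is a minimal illustration, assuming a simple Celsius-to-Kelvin helper; the function name and the example temperatures are hypothetical, not from the text:

```python
def to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to Kelvin (0 K is absolute zero)."""
    return celsius + 273.15

# Interval property: equal differences are meaningful on the Celsius scale.
assert (30 - 20) == (40 - 30)

# Ratio property fails for Celsius: 20C is not "twice" 10C, because 0C is
# not the absence of temperature. On the Kelvin scale the ratio is meaningful:
print(to_kelvin(20) / to_kelvin(10))  # about 1.035, not 2.0
```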
E. TYPES OF RESEARCH
1.
Descriptive Research
a.
Definition
Descriptive research is the most widely used research design, as
indicated by the theses, dissertations, and research reports of institutions.
Its common means of obtaining information include the use of the questionnaire,
personal interviews with the aid of a study guide or interview schedule, and
observation, whether participatory or not.
Descriptive research includes studies that purport to present facts concerning the nature and
status of anything. This means that descriptive research gives meaning to
the quality and standing of the facts at hand: for instance,
information about a group of persons, a number of objects, a set of conditions,
a class of events, a system of thought, or any other kind of phenomenon or
experience which one may wish to study.
The fact-finding of descriptive research must be accompanied by adequate
interpretation. The descriptive method is something more than, and beyond, mere
data-gathering; the latter is neither reflective thinking nor research. The true meaning of the data collected
should be reported from the point of view of the objectives and the basic
assumptions of the project under way. Facts obtained may be accurate expressions
of central tendency, or deviation, or correlation; but the report is not
research unless discussion of those data is carried up to the level of
adequate interpretation. Data must be subjected to the thinking process in
terms of ordered reasoning.
ð
The Nature of
Descriptive Research
v
Descriptive research
is designed for the investigator to gather information about presently existing
conditions.
v
Descriptive research
involves collection of data in order to test the hypothesis or to answer
questions concerning the current status of the subject of the study.
v
A descriptive study
determines and reports the way things are. It has no control over what is,
and it can only measure what already exists.
v
Descriptive research
has been criticized for its inability to control variables, for being a
post-hoc study, and for more frequently yielding only descriptive, rather than
predictive, findings.
ð
The Aim of Descriptive
Research
v
The principal aims in
employing descriptive research are to describe the nature of a situation as it
exists at the time of the study and to explore the causes of particular
phenomena. (Travers, 1978)
v
Descriptive research
seeks to tell "what exists" or "what is" about a certain educational phenomenon. Accurate observations and assessments arise
from data that ascertain the nature and incidence of prevailing conditions,
practices, or descriptions of the objects, processes, and persons that are the
subjects of the study.
v
contribute to the formation of principles and generalizations in the behavioural sciences;
v
contribute to the establishment of standard norms of conduct, behaviour, or performance;
v
reveal problems or
abnormal conditions;
v
make possible prediction of the future on the basis of findings on prevailing
conditions and correlations, and on the basis of the reactions of people toward
certain issues;
v
give a better and deeper understanding of a phenomenon on the basis of an
in-depth study of that phenomenon;
v
provide a basis for decision-making.
Bickman
and Rog (1998) suggest that descriptive studies can answer questions such as
“what is” or “what was.” Experiments can typically answer “why” or “how.”
2.
Correlational Research
a.
Definition
Correlational research refers to the systematic investigation or statistical study of relationships
among two or more variables, without necessarily determining cause and effect.
ð
The Nature of
Correlational Research
v
Correlational research is also known as
associational research.
v
Relationships among two or more variables are
studied without any attempt to influence them.
v
Investigates the possibility of relationships
between two variables.
v
There is no manipulation of variables in correlational research.
v Correlational
studies describe the variable relationship via a correlation coefficient.
ð
The Aim of Correlational
Research
v
Correlational studies are carried out to
explain important human behavior or to predict likely outcomes (identify
relationships among variables).
v
If a relationship of sufficient magnitude
exists between two variables, it becomes possible to predict a score on either
variable if a score on the other variable is known (Prediction Studies).
v The
variable that is used to make the prediction is called the predictor
variable.
v
The variable about which the prediction is
made is called the criterion variable.
v
Both scatterplots and regression
lines are used in correlational studies to predict a score on a
criterion variable
v A
predicted score is never exact. Through a prediction equation, researchers use
a predicted score together with an index of prediction error (the standard error
of estimate) to indicate how far off the prediction is likely to be.
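The ideas of a predictor variable, a criterion variable, and a regression line can be made concrete with a short sketch. The data below are hypothetical (hours studied predicting exam scores), and the helper functions are illustrative, not taken from any named study:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def regression_line(xs, ys):
    """Least-squares line y = a + b*x for predicting the criterion y from the predictor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical data: hours studied (predictor) vs. exam score (criterion).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 76]

r = pearson_r(hours, scores)
a, b = regression_line(hours, scores)
print(f"r = {r:.3f}; predicted score for 6 hours = {a + b * 6:.1f}")
```

Because the correlation here is very strong, the prediction is tight; with a weaker correlation the standard error of estimate would be correspondingly larger.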
3.
Experimental Research
a.
Definition
Experimental research is defined as "observation under controlled
conditions". Experimental research design is concerned with examining the
effect of an independent variable on a dependent variable, where the independent
variable is manipulated through treatment or intervention(s), and the effect of
those interventions is observed on the dependent variable.
b.
Experimental Designs:
v
Pre and Post Test Only
Design
In this design, subjects are
randomly assigned to either the experimental or the control group. The dependent
variable is observed in both groups before the treatment (pre-test). Following
this, the treatment is carried out on the experimental group only. After the
treatment, the dependent variable is observed in both groups to examine the
effect of the manipulation of the independent variable on the dependent variable.
v.
Solomon Four Group
Design
There are two experimental groups (Exp Group I & II) and two control groups
(Control Group I & II). Initially the researcher randomly assigns subjects
to the four groups. Of the four groups, only Exp Group I and Control Group I
receive the pre-test; the treatment is then given to Exp Groups I & II.
Finally, all four groups receive the post-test, where the dependent variable
is observed and a comparison of the four groups is made to assess the effect
of the independent (experimental) variable on the dependent variable.
Exp Group II is observed on only one occasion (the post-test). To estimate the
amount of change in Exp and Control Group II, the average test scores
of Exp and Control Group I are used as a baseline. The Solomon four-group
design is considered the most prestigious experimental research design because
it minimizes threats to internal and external validity. The design effectively
reveals the reactive effects of the pre-test, so any difference
between the experimental and control groups can be more confidently attributed
to the experimental treatment.
F. DESCRIPTIVE STATISTICS
Descriptive statistics is the discipline of quantitatively describing the
main features of a collection of information, or the quantitative description itself. Descriptive
statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aim to summarize a sample,
rather than use the data to learn about the population that the sample of data is thought to represent. This
generally means that descriptive statistics, unlike inferential statistics, are
not developed on the basis of probability theory. Even when a data analysis draws its main conclusions
using inferential statistics, descriptive statistics are generally also
presented. For example, in a paper reporting on a study involving human
subjects, there typically appears a table giving the overall sample size,
sample sizes in important subgroups (e.g., for each treatment or exposure
group), and demographic or clinical characteristics such as the average age,
the proportion of subjects of each sex, and the proportion of subjects with
related comorbidities.
Descriptive statistics are very important because if we simply presented our
raw data it would be hard to visualize what the data were showing, especially
if there were a lot of them. Descriptive statistics therefore enable us to
present the data in a more meaningful way, which allows simpler interpretation
of the data. For example, if we had the results of 100 pieces of students'
coursework, we may be interested in the overall performance of those students.
We would also be interested in the distribution or spread of the marks.
Descriptive statistics allow us to do this.
Typically, there are two general types of statistic that are used to describe
data:
- Measures of central tendency: these
are ways of describing the central position of a frequency distribution
for a group of data. In this case, the frequency distribution is simply
the distribution and pattern of marks scored by the 100 students from the
lowest to the highest. We can describe this central position using a
number of statistics, including the mode, median, and mean.
- Measures of spread: these
are ways of summarizing a group of data by describing how spread out the
scores are. For example, the mean score of our 100 students may be 65 out
of 100. However, not all students will have scored 65 marks. Rather, their
scores will be spread out. Some will be lower and others higher. Measures
of spread help us to summarize how spread out these scores are. To
describe this spread, a number of statistics are available to us,
including the range, quartiles, absolute deviation, variance and standard
deviation.
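The two families of measures just listed map directly onto Python's standard `statistics` module. A small sketch, with made-up marks (the numbers are illustrative only, not the 100 coursework results mentioned above):

```python
import statistics

# Hypothetical marks for ten students (illustrative, not real coursework data).
marks = [45, 55, 55, 60, 65, 65, 65, 70, 80, 90]

# Measures of central tendency
print("mean:", statistics.mean(marks))     # 65
print("median:", statistics.median(marks)) # 65
print("mode:", statistics.mode(marks))     # 65

# Measures of spread
print("range:", max(marks) - min(marks))          # 90 - 45 = 45
print("variance:", statistics.pvariance(marks))   # population variance
print("std dev:", round(statistics.pstdev(marks), 2))
```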
When
we use descriptive statistics it is useful to summarize our group of data using
a combination of tabulated description (i.e., tables), graphical description
(i.e., graphs and charts) and statistical commentary (i.e., a discussion of the
results).
Descriptive
statistics provides simple summaries about the sample and about the
observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. These
summaries may either form the basis of the initial description of the data as
part of a more extensive statistical analysis, or they may be sufficient in and
of themselves for a particular investigation.
For example, the shooting
percentage in
basketball is
a descriptive statistic that summarizes the performance of a player or a team. This
number is the number of shots made divided by the number of shots taken. For
example, a player who shoots 33% is making approximately one shot in every
three. The percentage summarizes or describes multiple discrete events.
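As a sketch, the shooting percentage described above is just a made-to-taken ratio; the function below is illustrative:

```python
def shooting_percentage(shots_made: int, shots_taken: int) -> float:
    """Shots made divided by shots taken, expressed as a percentage."""
    return 100 * shots_made / shots_taken

# A player making one shot in every three shoots about 33%.
print(round(shooting_percentage(1, 3), 1))  # 33.3
```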
Consider also the grade point average. This single number describes the general performance
of a student across the range of their course experiences. The use of
descriptive and summary statistics has an extensive history and, indeed, the
simple tabulation of populations and of economic data was the first way the
topic of statistics appeared. More recently, a collection of summarisation
techniques has been formulated under the heading of exploratory data analysis: an example of such a technique is the box plot.
1. Univariate analysis
Univariate analysis describes the distribution of a single variable, through
its central tendency and spread.
2. Bivariate analysis
When a sample consists of more than one variable, descriptive statistics may
be used to describe the relationship between pairs of variables.
The main reason for differentiating univariate and bivariate analysis is that
bivariate analysis is not simply descriptive analysis of each variable on its
own; it describes the relationship between two different variables.
G. READING STATISTICS
1.
Measures of Central Tendency
To help readers get a
feel for the data that have been collected, researchers almost always say
something about the typical or representative score in the group. They do this
by computing and reporting one or more measures of central tendency. There are three
such measures that are frequently seen in the published literature, each of
which provides a numerical index of the average score in the
distribution.
a.
The Mode, Median, and Mean
The mode is simply the most
frequently occurring score. For example, given the nine scores 6, 2, 5, 1, 2,
9, 3, 6, and 2, the mode is equal to 2. The median is the number that lies at the midpoint of the
distribution of earned scores; it divides the distribution into two equally
large parts. For the set of nine scores just presented, the median is equal to
3. Four of the nine scores are smaller than 3; four are larger. The mean is the point that minimizes the collective distances of
scores from that point. It is found by dividing the sum of the scores by the
number of scores in the data set. Thus, for the group of nine scores presented
here, the mean is equal to 4.
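These three values can be checked with Python's `statistics` module, as a quick verification of the worked example above:

```python
import statistics

scores = [6, 2, 5, 1, 2, 9, 3, 6, 2]

print("mode:", statistics.mode(scores))     # 2 occurs most often
print("median:", statistics.median(scores)) # midpoint of the sorted scores
print("mean:", statistics.mean(scores))     # 36 / 9 = 4
```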
In journal articles, authors sometimes use abbreviations
or symbols when referring to their measure(s) of central tendency. The
abbreviations Mo and Mdn, of course, correspond to the mode and median,
respectively. The letter M always stands for the mean, even though all three
measures of central tendency begin with this letter.
The Mean
Example:
Four test results: 15, 18, 22, 20
The sum is: 75
Divide 75 by 4: 18.75
The 'Mean' (Average) is 18.75
(Often rounded to 19)
The Median
The Median is the 'middle value' in your list. When the number of values in the
list is odd, the median is the middle entry in the list after sorting the list
into increasing order. When the number of values in the list is even, the
median is equal to the sum of the two middle numbers (after sorting the list
into increasing order) divided by two. Thus, remember to line up your values;
the middle number is the median! Be sure to remember the odd and even rule.
Examples:
Find the Median of: 9, 3, 44, 17, 15
(Odd amount of numbers)
Line up your numbers: 3, 9, 15, 17, 44 (smallest to largest)
The Median is: 15 (The number in the middle)
Find the Median of: 8, 3, 44, 17, 12, 6 (Even amount of numbers)
Line up your numbers: 3, 6, 8, 12, 17, 44
Add the 2 middle numbers and divide by 2: 8 + 12 = 20; 20 ÷ 2 = 10
The Median is 10.
The Mode
The mode in a list of numbers is the number that occurs most
frequently. A trick to remember this one is that mode starts with
the same first two letters that most does. Most frequently - Mode. You'll never
forget that one!
Examples:
Find the mode of:
9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8
Put the numbers in order for ease:
3, 3, 8, 9, 15, 15, 15, 17, 17, 27, 40, 44, 44
The Mode is 15 (15 occurs the most at 3 times)
*It is important to note that there
can be more than one mode and if no number occurs more than once in the set,
then there is no mode for that set of numbers.
Occasionally, in statistics, you'll be asked for the 'range'
in a set of numbers. The range is simply the smallest number subtracted
from the largest number in your set. Thus, if your set is 9, 3, 44, 15, 6, the
range would be 44 - 3 = 41. Your range is 41. Further explanation can be found
in the topics below.
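The range computation just described is a one-liner; a minimal sketch:

```python
def value_range(values):
    """The range: the smallest number subtracted from the largest."""
    return max(values) - min(values)

print(value_range([9, 3, 44, 15, 6]))  # 44 - 3 = 41
```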
2.
Standard Deviation
a.
Standard Deviation and Variance
Deviation just means how far from the normal. The Standard Deviation is a
measure of how spread out numbers are. Its symbol is σ (the Greek letter
sigma). The formula is easy: it is the square root of the Variance. So
now you ask, "What is the Variance?"
Variance
The Variance is defined as: The
average of the squared differences from the Mean.
To calculate the variance follow
these steps:
- Work out the Mean (the simple average of the numbers)
- Then for each number:
subtract the Mean and square the result (the squared difference).
- Then work out the average of
those squared differences.
Example
You and your friends have just
measured the heights of your dogs (in millimeters):
The heights (at the shoulders)
are: 600mm, 470mm, 170mm, 430mm and 300mm.
Find out the Mean, the Variance, and the Standard Deviation!
Your first step is to find the
Mean:
Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
so the mean (average) height is 394 mm.
Now we calculate each dog's difference from the Mean: 206, 76, -224, 36 and -94.
To calculate the Variance, take each difference, square it, and then average the result:
Variance: σ² = (206² + 76² + (-224)² + 36² + (-94)²) / 5 = 108,520 / 5 = 21,704
So, the Variance is 21,704.
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation: σ = √21,704 = 147.32... ≈ 147 (to the nearest mm)
And the good thing about the Standard
Deviation is that it is useful. Now we can show which heights are within one
Standard Deviation (147mm) of the Mean:
So, using the Standard Deviation
we have a "standard" way of knowing what is normal, and what is extra
large or extra small.
Rottweilers are tall dogs.
And Dachshunds are a bit short ... but don't tell them!
But ... there is a small change
with Sample Data!
Our example was for a Population (the 5 dogs were the only
dogs we were interested in).
But if the data is a Sample (a selection taken from a
bigger Population), then the calculation changes!
When you have "N" data
values that are:
- The Population: divide by N when calculating Variance (like we did)
- A Sample: divide by N-1 when calculating Variance
All other calculations stay the same, including how we calculated the mean.
Example: if our 5 dogs were just a sample of a bigger population of dogs, we would divide by 4 instead of 5, like this:
Sample Variance = 108,520 / 4 = 27,130
Sample Standard Deviation = √27,130 = 164.7... = 165 (to the nearest mm)
Think of it as a "correction" when your data is only a sample.
FORMULA
The "Population Standard Deviation":
σ = √[ (1/N) × Σ (xi − μ)² ]
The "Sample Standard Deviation":
s = √[ (1/(N − 1)) × Σ (xi − x̄)² ]
Looks complicated, but the important change is to divide by N − 1 (instead of N) when calculating a Sample Variance.
*Footnote:
Why square the differences?
If we just added up the differences from the mean ... the negatives would cancel the positives: (4 + 4 − 4 − 4) / 4 = 0. So instead, take the absolute value of each difference:
(|4| + |4| + |−4| + |−4|) / 4 = (4 + 4 + 4 + 4) / 4 = 4
That looks good (and is the Mean Deviation), but what about this case:
(|7| + |1| + |−6| + |−2|) / 4 = (7 + 1 + 6 + 2) / 4 = 4
It also gives a value of 4, even though the differences are more spread out! So let us try squaring each difference (and taking the square root at the end):
√[ (4² + 4² + 4² + 4²) / 4 ] = √(64 / 4) = √16 = 4
√[ (7² + 1² + 6² + 2²) / 4 ] = √(90 / 4) = √22.5 = 4.74...
The Standard Deviation is bigger when the differences are more spread out
... just what we want! In fact this method is a similar idea to distance between points, just applied in a different
way. And it is easier to use algebra on
squares and square roots than absolute values, which makes the standard
deviation easy to use in other areas of mathematics.
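A small sketch in Python makes the footnote's point concrete: the two sets of differences above have the same Mean Deviation, but squaring tells them apart (function names are illustrative):

```python
import math

# Two sets of differences from the mean, from the footnote above.
a = [4, 4, -4, -4]
b = [7, 1, -6, -2]

def mean_deviation(diffs):
    # Average of the absolute differences.
    return sum(abs(d) for d in diffs) / len(diffs)

def std_deviation(diffs):
    # Square root of the average squared difference.
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

print(mean_deviation(a), mean_deviation(b))  # 4.0 4.0 (cannot tell them apart)
print(std_deviation(a))                      # 4.0
print(round(std_deviation(b), 2))            # 4.74 (larger: b is more spread out)
```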
Here are further explanations
about Standard Deviation!
Here we explain the formulas.
The symbol for Standard Deviation
is σ (the Greek letter sigma).
This is the formula for Standard Deviation:
σ = √[ (1/N) × Σ (xi − μ)² ]
Let us explain it step by step.
Say we have a bunch of numbers
like 9, 2, 5, 4, 12, 7, 8, 11.
To calculate the standard
deviation of those numbers:
- 1. Work out the Mean (the
simple average of the numbers)
- 2. Then for each number:
subtract the Mean and square the result
- 3. Then work out the mean of
those squared differences.
- 4. Take the square root of
that and we are done!
First, let
us have some example values to work on:
Example: Sam has 20 Rose Bushes. The number of flowers on each bush is:
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Work out the Standard Deviation!
Step 1. Work out the mean
In the formula above, μ (the Greek letter "mu") is the mean of all our values ...
Example: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
The mean is:
(9+2+5+4+12+7+8+11+9+3+7+4+12+5+4+10+9+6+9+4) / 20 = 140 / 20 = 7
So: μ = 7
Step 2. Then for each
number: subtract the Mean and square the result
This is the part of the formula that says: (xi − μ)²
So what is xi ?
They are the individual x values 9, 2, 5, 4, 12, 7, etc...
In other words x1
= 9, x2 = 2, x3 = 5, etc.
So it says "for each value,
subtract the mean and square the result", like this
Example (continued):
(9 − 7)² = (2)² = 4
(2 − 7)² = (−5)² = 25
(5 − 7)² = (−2)² = 4
(4 − 7)² = (−3)² = 9
(12 − 7)² = (5)² = 25
(7 − 7)² = (0)² = 0
(8 − 7)² = (1)² = 1
... and so on for the remaining values.
Step 3. Then work out the mean of those squared differences.
To work out the mean, add up
all the values then divide by how many.
First add up all the values from
the previous step.
But how do we say "add them
all up" in mathematics? We use "Sigma": Σ
Sigma Notation
We want to add up all the values
from 1 to N, where N=20 in our case because there are 20 values:
Example (continued):
Which means:
Sum all values from (x1 − 7)² to (xN − 7)²
We already calculated (x1 − 7)² = 4 etc. in the previous step, so just sum them up:
=
4+25+4+9+25+0+1+16+4+16+0+9+25+4+9+9+4+1+4+9 = 178
But that
isn't the mean yet, we need to divide by how many, which is simply done
by multiplying by "1/N":
Example (continued):
Mean of squared differences = (1/20) × 178 = 8.9
(Note: this
value is called the "Variance")
Step 4. Take the
square root of that:
Example (concluded):
σ = √(8.9) = 2.983...
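The four steps above can be sketched in Python for Sam's 20 rose bushes (names are illustrative):

```python
import math

flowers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3,
           7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

N = len(flowers)                             # 20 values
mu = sum(flowers) / N                        # Step 1: mean = 140 / 20 = 7
sq_diffs = [(x - mu) ** 2 for x in flowers]  # Step 2: squared differences
variance = sum(sq_diffs) / N                 # Step 3: 178 / 20 = 8.9
sigma = math.sqrt(variance)                  # Step 4: about 2.983
print(mu, variance, sigma)
```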
Sample Standard Deviation
But wait, there is more
...sometimes our data is only a sample of the whole population.
Example: Sam has 20 rose bushes, but what if Sam only counted the flowers
on 6 of them?
The "population" is all
20 rose bushes,
and the "sample" is the
6 he counted. Let us say they are:
9, 2, 5, 4, 12, 7
We can still estimate the
Standard Deviation.
But when we use the sample as an estimate
of the whole population, the Standard Deviation formula changes to this:
The formula for Sample
Standard Deviation:
The important change is "N-1"
instead of "N" (which is called "Bessel's correction").
The symbols also change to
reflect that we are working on a sample instead of the whole population:
- The mean is now x̄ (called "x-bar", for sample mean) instead of μ (the population mean),
- And the answer is s (for Sample Standard Deviation) instead of σ.
But that does not affect the
calculations. Only N-1 instead of N changes the calculations.
OK, let us now calculate the Sample
Standard Deviation:
Step 1. Work out the mean
Example 2: Using sampled values 9, 2, 5, 4, 12, 7
The mean is (9+2+5+4+12+7) / 6 = 39/6 = 6.5
So: x̄ = 6.5
Step 2. Then for each
number: subtract the Mean and square the result
Example 2 (continued):
(9 − 6.5)² = (2.5)² = 6.25
(2 − 6.5)² = (−4.5)² = 20.25
(5 − 6.5)² = (−1.5)² = 2.25
(4 − 6.5)² = (−2.5)² = 6.25
(12 − 6.5)² = (5.5)² = 30.25
(7 − 6.5)² = (0.5)² = 0.25
Step 3. Then work out
the mean of those squared differences.
To work out the mean, add up
all the values then divide by how many. But hang on ... we are calculating the Sample
Standard Deviation, so instead of dividing by how many (N), we will divide by N-1
Example 2 (continued):
Sum = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5
Divide by N − 1:
(1/5) × 65.5 = 13.1
(This value
is called the "Sample Variance")
Step
4. Take the square root of that:
Example 2 (concluded):
s = √(13.1) = 3.619...
Comparing
When we used the whole population we got: Mean = 7, Standard Deviation = 2.983...
When we used the sample we got: Sample Mean = 6.5, Sample Standard Deviation = 3.619...
Our Sample Mean was wrong by 7%, and our Sample Standard Deviation was wrong by 21%.
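Python's standard statistics module implements both formulas directly: pstdev divides by N (population) and stdev divides by N − 1 (sample). A sketch using the rose-bush data:

```python
from statistics import mean, pstdev, stdev

population = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3,
              7, 4, 12, 5, 4, 10, 9, 6, 9, 4]  # all 20 bushes
sample = [9, 2, 5, 4, 12, 7]                   # the 6 bushes Sam counted

# Population: divide by N.
print(mean(population), round(pstdev(population), 3))  # 7 2.983

# Sample: divide by N-1 (Bessel's correction).
print(mean(sample), round(stdev(sample), 3))           # 6.5 3.619
```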
Why Would We Take a Sample?
Mostly because it is easier and
cheaper.
Imagine you want to know what the
whole country thinks ... you can't ask millions of people, so instead you ask
maybe 1,000 people.
There is a nice quote (supposed
to be by Samuel Johnson):
"You don't have to eat the whole ox to know that the meat is
tough."
This is the essential idea of sampling. To find out
information about the population (such as mean and standard deviation), we do
not need to look at all members of the population; we only need a
sample. But when we take a sample, we lose some accuracy.
Summary
The Population Standard Deviation:
σ = √[ (1/N) × Σ (xi − μ)² ]
The Sample Standard Deviation:
s = √[ (1/(N − 1)) × Σ (xi − x̄)² ]
Many experiments require
measurement of uncertainty. Standard deviation is the best way to accomplish
this. Standard deviation tells us about how the data is distributed about the
mean value.
For example, the data points 50, 51, 52, 55, 56, 57, 59 and 60 have a mean of 55. Another data set, 12, 32, 43, 48, 64, 71, 83 and 87, also has a mean of 55.
However, it can clearly be seen that the properties of
these two sets are different. The first set is much more closely packed than
the second one. Through standard deviation, we can measure this distribution of
data about the mean.
The above example should make it clear that if the
data points are values of the same parameter in various experiments, then the
first data set is a good fit, but the second one is too uncertain. Therefore, standard deviation is important in the measurement of uncertainty: the smaller the standard deviation, the smaller the uncertainty, and hence the greater the confidence in the experiment and the higher its reliability.
Usage
The measurement of uncertainty through standard
deviation is used in many experiments of social sciences and finances. For
example, the more risky and volatile ventures have a higher standard deviation.
Also, a very high standard deviation of the results for the same survey, for example, should make one rethink the sample size and the survey as a whole.
In physical experiments, it is important to have a measurement of
uncertainty. Standard deviation provides a way to check the results. Very large
values of standard deviation can mean the experiment is faulty - either there
is too much noise from outside or there could be a fault in the measuring
instrument.
3. Range
In statistics, range is defined simply as the difference between the
maximum and minimum observations. It is intuitively obvious why we define range
in statistics this way - range should suggest how diversely spread out the
values are, and by computing the difference between the maximum and minimum
values, we can get an estimate of the spread of the data.
For example, suppose an experiment involves finding out the weight of lab
rats and the values in grams are 320, 367, 423, 471 and 480. In this case, the
range is simply computed as 480-320 = 160 grams.
The Range is the difference
between the lowest and highest values.
Example: In {4, 6, 9, 3, 7}
the lowest value is 3, and the highest is 9.
So the range is 9-3 = 6.
It is that simple! But perhaps too simple ...
The Range Can Be Misleading
The range can sometimes be misleading when there are extremely high or low
values.
Example: In {8, 11, 5, 9, 7, 6, 3616}:
- the lowest value is 5,
- and the highest is 3616,
So the range
is 3616-5 = 3611.
Some Limitations of Range
Range is quite a useful indication of how spread out the data is, but it has
some serious limitations. This is because sometimes data can have outliers that lie far away from the other data points. In these cases, the range might not give a true indication of the spread of data.
For example, in our previous case, consider a small
baby rat added to the data set that weighs only 50 grams. Now the range is
computed as 480-50 = 430 grams, which looks like a false indication of the
dispersion of data.
This limitation of range is to be expected primarily
because range is computed taking only two data points into consideration. Thus
it cannot give a very good estimate of how the overall data behaves.
Practical Utility of Range
In a lot of cases, however, data is closely clustered
and if the number of observations is very large, then it can give a good sense
of data distribution. For example, consider a huge survey of the IQ levels of
university students consisting of 10,000 students from different backgrounds.
In this case, the range can be a useful tool to measure the dispersion of IQ
values among university students.
Sometimes, we define range in such a way so as to
eliminate the outliers and extreme points in the data set. For example, the
inter-quartile range in statistics is defined as the difference between the
third and first quartiles. You can immediately see how this new definition of
range is more robust than the previous one. Here the outliers will not matter
and this definition takes the whole distribution of data into consideration and
not just the maximum and minimum values.
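A sketch of this comparison in Python, using the outlier example above. Note that quartiles can be computed by several conventions, so the exact values depend on the method chosen (here the "inclusive" method of the standard statistics module):

```python
from statistics import quantiles

data = [8, 11, 5, 9, 7, 6, 3616]  # the outlier example from above

# The plain range is dominated by the outlier 3616.
full_range = max(data) - min(data)  # 3611

# The inter-quartile range ignores the extremes: it is the
# difference between the third and first quartiles.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(full_range, iqr)  # the IQR is far smaller and more robust
```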
4.
Probability
In general, probability is the extent to which something is probable: the likelihood of something happening or being the case. In research, however, the term takes a more specific form, the probability level or P-value, which lets us conclude whether a result is significant or not with respect to the research hypothesis. Here are further explanations.
a.
P Values
The P value, or calculated probability, is the probability of obtaining a result at least as extreme as the one observed in a study, assuming the null hypothesis (H0) of the study question is true. The null hypothesis is usually a hypothesis of "no difference", e.g. no difference between blood pressures in group A and group B. Define a null hypothesis for each study question clearly before the start of your study.
The only situation in which you should use a one
sided P value is when a large change in an unexpected direction would have
absolutely no relevance to your study. This situation is unusual; if you are in
any doubt then use a two sided P value.
The term significance level (alpha) is used to
refer to a pre-chosen probability and the term "P value" is used to
indicate a probability that you calculate after a given study.
The alternative hypothesis (H1) is
the opposite of the null hypothesis; in plain language terms this is usually
the hypothesis you set out to investigate. For example, if the question is "Is there a significant (not due to chance) difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill?", then the alternative hypothesis is "There is a difference in blood pressures between groups A and B if we give group A the test drug and group B a sugar pill".
If your P value is less than the chosen significance
level then you reject the null hypothesis i.e. accept that your sample gives
reasonable evidence to support the alternative hypothesis. It does NOT imply a
"meaningful" or "important" difference; that is for you to
decide when considering the real-world relevance of your result.
The choice of significance level at which you reject H0
is arbitrary. Conventionally the 5% (less than 1 in 20 chance of being wrong),
1% and 0.1% (P < 0.05, 0.01 and 0.001) levels have been used. These numbers
can give a false sense of security.
In the ideal world, we would be able to define a
"perfectly" random sample, the most appropriate test and one
definitive conclusion. We simply cannot. What we can do is try to optimise all
stages of our research to minimise sources of uncertainty.
When presenting P values some groups find it helpful
to use the asterisk rating system as well as quoting the P value:
P < 0.05 *
P < 0.01 **
P < 0.001 ***
Most authors refer to statistically significant as
P < 0.05 and statistically highly significant as P < 0.001 (less
than one in a thousand chance of being wrong).
The asterisk system avoids the woolly term "significant". Please
note, however, that many statisticians do not like the asterisk rating system
when it is used without showing P values. As a rule of thumb, if you can quote
an exact P value then do. You might also want to refer to a quoted exact P
value as an asterisk in text narrative or tables of contrasts elsewhere in a
report.
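The asterisk rating can be sketched as a small Python helper (a hypothetical function for illustration, not a standard API):

```python
def p_value_stars(p):
    """Asterisk rating for a P value, as described above.

    Hypothetical helper for illustration; always quote the exact
    P value alongside the stars when you can.
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not statistically significant at the 5% level

print(p_value_stars(0.03))    # *
print(p_value_stars(0.0004))  # ***
```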
At this point, a word about error. Type I error is the false
rejection of the null hypothesis and type II error is the false
acceptance of the null hypothesis. As an aide-memoire: think that our cynical society rejects before it accepts.
The significance level (alpha) is the probability of type I error. The
power of a test is one minus the probability of type II error (beta). Power
should be maximised when selecting statistical methods. If you want to estimate
sample sizes then you must understand all of the terms mentioned here.
The following table shows the relationship between power and error in
hypothesis testing:
TRUTH            | DECISION: Accept H0      | DECISION: Reject H0
H0 is true:      | correct decision         | type I error
                 | P = 1 − alpha            | P = alpha (significance)
H0 is false:     | type II error            | correct decision
                 | P = beta                 | P = 1 − beta (power)

H0 = null hypothesis
P = probability
You must understand confidence intervals if you
intend to quote P values in reports and papers. Statistical referees of
scientific journals expect authors to quote confidence intervals with greater prominence than P
values.
Notes about Type I error:
- is the incorrect rejection
of the null hypothesis
- maximum probability is set
in advance as alpha
- is not affected by sample
size as it is set in advance
- increases with the number of tests or end points (i.e. run 20 tests of H0 and one is likely to be wrongly significant for alpha = 0.05)
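The last note can be quantified: across k independent tests each run at level alpha, the chance of at least one type I error is 1 − (1 − alpha)^k. A sketch:

```python
alpha = 0.05  # pre-chosen significance level

# Chance of at least one false rejection of H0 across k
# independent tests, each run at level alpha.
for k in (1, 5, 20):
    family_wise = 1 - (1 - alpha) ** k
    print(k, round(family_wise, 3))  # roughly 0.64 for k = 20
```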
Notes about Type II
error:
- is the incorrect acceptance
of the null hypothesis
- beta depends upon sample
size and alpha
- can't be estimated except as
a function of the true population effect
- beta gets smaller as the
sample size gets larger
- beta gets smaller as the
number of tests or end points increases
b.
Significance Level
In hypothesis testing, the significance level is the criterion used for
rejecting the null hypothesis. The significance level is used in hypothesis testing as follows:
First, the difference between the results of the experiment and the null
hypothesis is determined. Then, assuming the null hypothesis is true, the
probability of a difference that large or larger is computed. Finally, this
probability is compared to the significance level. If the probability is less
than or equal to the significance level, then the null hypothesis is rejected
and the outcome is said to be statistically significant. Traditionally, experimenters have used
either the 0.05 level (sometimes called the 5% level) or the 0.01 level (1%
level), although the choice of levels is largely subjective. The lower the
significance level, the more the data must diverge from the null hypothesis to
be significant. Therefore, the 0.01 level is more conservative than the 0.05
level. The Greek letter alpha (α) is sometimes used to indicate the
significance level. See also: Type I error and significance test
5.
Correlation Coefficient
a.
Definition
Also called the coefficient of correlation, it is a measure of the interdependence of two random variables that ranges in value from −1 to +1, indicating perfect negative correlation at −1, absence of correlation at zero, and perfect positive correlation at +1.
Correlation Coefficient, r:
·
The quantity r, called the linear correlation coefficient,
measures the strength and the direction of a linear relationship between two
variables. The linear correlation coefficient is sometimes referred to as
the Pearson product moment correlation coefficient in honor of its
developer Karl Pearson.
·
The value of r is such that −1 ≤ r ≤ +1. The + and − signs are used for positive linear correlations and negative linear correlations, respectively.
·
Positive correlation: If x and y
have a strong positive linear correlation, r is close
to +1. An r value of exactly
+1 indicates a perfect positive fit. Positive values
indicate a relationship between x and y variables such that as values for x increase, values for y also increase.
·
Negative correlation: If x and y
have a strong negative linear correlation, r is close
to -1. An r value of exactly -1
indicates a perfect negative fit. Negative values
indicate a relationship between x and y such
that as values for x increase, values
for y decrease.
·
No correlation: If there is no linear correlation or a weak
linear correlation, r is
close to 0. A value near zero means that there is little or no linear relationship between the two variables (they may still be related in a nonlinear way).
·
Note that r is a dimensionless quantity; that is, it does not
depend on the units
employed.
·
A perfect correlation of ± 1 occurs only when the data
points all lie exactly on a
straight line. If r = +1, the slope of
this line is positive. If r = -1, the slope of this
line is negative.
·
A correlation greater than 0.8 is generally described as strong,
whereas a correlation less than 0.5 is generally described as weak. These values can vary
based upon the "type" of data being examined. A study utilizing
scientific data may require a stronger correlation than a study using social
science data.
In statistics, the correlation coefficient is computed by dividing the covariance of the two variables by the square root of the product of their variances. The closer the correlation coefficient is to +1 or −1, the stronger the correlation; if the variables are unrelated, the coefficient is near zero.
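The definition above (covariance divided by the square root of the product of the variances) can be sketched directly in Python (pearson_r is a hypothetical helper name):

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient: covariance divided by the
    square root of the product of the variances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive fit)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative fit)
```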
G. CROSS TABULATION
Cross tabulation is a statistical technique that establishes an interdependent relationship between two tables of values but does not identify a causal relationship between the values; it is also called two-way tabulation. For
example, a cross tabulation
might show that cars built on Monday have more service problems than cars built
on Wednesday. Cross tabulation can be used to analyze the results of a consumer
survey that, for example, indicates a preference for certain advertisements
based on which part of the country the consumer resides in. Cross tabulation
is a statistical tool that is used to analyze categorical data. Cross-tabulation is about taking two variables
and tabulating the results of one variable against the other variable. An
example would be the cross-tabulation of course performance against mode of
study:
              | HD | D  | C  | P  | NN
FT - Internal | 10 | 15 | 18 | 33 | 8
PT Internal   | 3  | 4  | 8  | 15 | 10
External      | 4  | 3  | 12 | 15 | 6
Each individual would
have had a recorded mode of study (the rows of the table) and performance on
the course (the columns of the table). For each individual, those pairs of
values have been entered into the appropriate cell of the table.
A cross-tabulation
gives you a basic picture of how two variables inter-relate. It helps you
search for patterns of interaction. Obviously, if certain cells contain
disproportionately large (or small) numbers of cases, then this suggests that
there might be a pattern of interaction.
In the table above,
the basic pattern is what you would expect as a teacher but, at a general
level, it says that the bulk of students get a P rating independent of mode of
study. What we normally do is to calculate the Chi-square statistic to see if this pattern has any
substantial relevance.
In statistics, a contingency table (also referred to as cross
tabulation or crosstab) is a type of table in a matrix format that displays the (multivariate) frequency
distribution of the
variables. They are heavily used in survey research, business intelligence,
engineering and scientific research. They provide a basic picture of the
interrelation between two variables and can help find interactions between
them. The term contingency table
was first used by Karl Pearson in "On the Theory of Contingency and
Its Relation to Association and Normal Correlation", part of the Drapers' Company Research Memoirs
Biometric Series I
published in 1904.
A crucial problem of multivariate
statistics is finding
(direct-)dependence structure underlying the variables contained in
high-dimensional contingency tables. If some of the conditional
independences are
revealed, then even the storage of the data can be done in a smarter way (see
Lauritzen (2002)). In order to do this one can use information
theory concepts, which
gain the information only from the distribution of probability, which can be
expressed easily from the contingency table by the relative frequencies.
Example
Suppose that we have two variables, sex (male
or female) and handedness (right- or left-handed). Further suppose
that 100 individuals are randomly sampled from a very large population as part
of a study of sex differences in handedness. A contingency table can be created
to display the numbers of individuals who are male and right-handed, male and
left-handed, female and right-handed, and female and left-handed. Such a
contingency table is shown below.
        | Right-handed | Left-handed | Total
Males   | 43           | 9           | 52
Females | 44           | 4           | 48
Totals  | 87           | 13          | 100
The numbers of the males, females, and right-
and left-handed individuals are called marginal totals. The grand total, i.e., the total number of
individuals represented in the contingency table, is the number in the bottom
right corner.
The table allows us to see at a glance that
the proportion of men who are right-handed is about the same as the proportion
of women who are right-handed although the proportions are not identical. The significance of the difference between the two
proportions can be assessed with a variety of statistical tests including Pearson's
chi-squared test, the G-test, Fisher's
exact test, and Barnard's test, provided the entries in the table represent
individuals randomly sampled from the population about which we want to draw a
conclusion. If the proportions of individuals in the different columns vary
significantly between rows (or vice versa), we say that there is a contingency between the two
variables. In other words, the two variables are not independent. If there is no contingency, we say that the two
variables are independent.
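As a sketch, Pearson's chi-squared statistic for the handedness table above can be computed by hand from the observed counts and marginal totals (expected count = row total × column total / grand total):

```python
# Observed counts from the handedness contingency table above.
observed = {("Males", "Right"): 43, ("Males", "Left"): 9,
            ("Females", "Right"): 44, ("Females", "Left"): 4}

row_totals = {"Males": 52, "Females": 48}
col_totals = {"Right": 87, "Left": 13}
grand_total = 100

# Pearson's chi-squared: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi2 = 0.0
for (row, col), obs in observed.items():
    expected = row_totals[row] * col_totals[col] / grand_total
    chi2 += (obs - expected) ** 2 / expected

print(round(chi2, 2))  # about 1.78, with 1 degree of freedom
```

The statistic would then be compared against a chi-squared table with (rows − 1) × (columns − 1) = 1 degree of freedom to judge significance.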
The example above is the simplest kind of
contingency table, a table in which each variable has only two levels; this is
called a 2 × 2 contingency table. In principle, any number of rows
and columns may be used. There may also be more than two variables, but higher
order contingency tables are difficult to represent on paper. The relation
between ordinal
variables, or between
ordinal and categorical variables, may also be represented in contingency
tables, although such a practice is rare.
3. INSTRUMENT
Instrument is the generic term that researchers use for a
measurement device (survey, test, questionnaire, etc). To help distinguish
between instrument and instrumentation, consider that the instrument is the device and instrumentation is the course of
action (the process of developing, testing, and using the device).
Instruments fall into two broad categories,
researcher-completed and subject-completed, distinguished by those instruments
that researchers administer versus those that are completed by participants.
Researchers choose which type of instrument, or instruments, to use based on the
research question. Examples are listed below:
Researcher-completed Instruments | Subject-completed Instruments
Rating scales                    | Questionnaires
Interview schedules/guides       | Self-checklists
Tally sheets                     | Attitude scales
Flowcharts                       | Personality inventories
Performance checklists           | Achievement/aptitude tests
Time-and-motion logs             | Projective devices
Observation forms                | Sociometric devices
4. RELIABILITY
Reliability is the
degree to which an assessment tool produces stable and consistent results.
Types of Reliability:
·
Test-retest
reliability is a measure of reliability obtained by administering
the same test twice over a period of time to a group of individuals. The
scores from Time 1 and Time 2 can then be correlated in order to evaluate the
test for stability over time.
Example: A test designed to
assess student learning in psychology could be given to a group of students
twice, with the second administration perhaps coming a week after the
first. The obtained correlation coefficient would indicate the stability
of the scores.
·
Parallel forms
reliability is a measure of reliability obtained by administering
different versions of an assessment tool (both versions must contain items that
probe the same construct, skill, knowledge base, etc.) to the same group of
individuals. The scores from the two versions can then be correlated in
order to evaluate the consistency of results across alternate versions.
Example: If you wanted to
evaluate the reliability of a critical thinking assessment, you might create a
large set of items that all pertain to critical thinking and then randomly
split the questions up into two sets, which would represent the parallel forms.
·
Inter-rater
reliability is a measure of reliability used to assess the degree
to which different judges or raters agree in their assessment decisions.
Inter-rater reliability is useful because human observers will not necessarily
interpret answers the same way; raters may disagree as to how well certain
responses or material demonstrate knowledge of the construct or skill being assessed.
Example: Inter-rater
reliability might be employed when different judges are evaluating the degree
to which art portfolios meet certain standards. Inter-rater reliability
is especially useful when judgments can be considered relatively subjective.
Thus, the use of this type of reliability would probably be more likely when
evaluating artwork as opposed to math problems.
·
Internal consistency
reliability is a measure of reliability used to evaluate the
degree to which different test items that probe the same construct produce
similar results.
Average inter-item
correlation is a subtype of internal consistency reliability. It is
obtained by taking all of the items on a test that probe the same construct
(e.g., reading comprehension), determining the correlation coefficient for each
pair of items, and finally taking the average of all of these
correlation coefficients. This final step yields the average inter-item
correlation.
·
Split-half
reliability is another subtype of internal consistency
reliability. The process of obtaining split-half reliability is begun by
“splitting in half” all items of a test that are intended to probe the same
area of knowledge (e.g., World War II) in order to form two “sets” of
items. The entire test is administered to a group of individuals,
the total score for each “set” is computed, and finally the split-half
reliability is obtained by determining the correlation between the two total
“set” scores.
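The split-half procedure can be sketched in Python with made-up "set" totals for five individuals (the scores are hypothetical, purely for illustration); the reliability is the correlation between the two totals:

```python
import math

# Hypothetical total scores on the two half-test "sets" for the
# same five individuals (made-up numbers, for illustration only).
set_a = [12, 9, 15, 7, 11]
set_b = [11, 10, 14, 6, 12]

# Pearson correlation between the two sets of totals.
n = len(set_a)
ma, mb = sum(set_a) / n, sum(set_b) / n
cov = sum((a - ma) * (b - mb) for a, b in zip(set_a, set_b)) / n
va = sum((a - ma) ** 2 for a in set_a) / n
vb = sum((b - mb) ** 2 for b in set_b) / n

split_half_r = cov / math.sqrt(va * vb)
print(round(split_half_r, 2))  # close to 1 means consistent halves
```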
5.
VALIDITY
Validity refers to how well a test
measures what it is purported to measure.
Types of Validity :
·
Face Validity ascertains that the measure
appears to be assessing the intended construct under study. The stakeholders
can easily assess face validity. Although this is not a very “scientific” type
of validity, it may be an essential component in enlisting motivation of
stakeholders. If the stakeholders do not believe the measure is an accurate
assessment of the ability, they may become disengaged with the task.
Example: If a measure of art
appreciation is created, all of the items should be related to the different
components and types of art. If the questions are regarding historical
time periods, with no reference to any artistic movement, stakeholders may not
be motivated to give their best effort or invest in this measure because they
do not believe it is a true assessment of art appreciation.
·
Construct Validity is used to ensure
that the measure actually measures what it is intended to measure (i.e. the
construct), and not other variables. Using a panel of “experts” familiar with
the construct is a way in which this type of validity can be assessed. The
experts can examine the items and decide what that specific item is intended to
measure. Students can be involved in this process to obtain their
feedback.
Example: A women’s studies
program may design a cumulative assessment of learning throughout the
major. The questions are written with complicated wording and
phrasing. This can cause the test to inadvertently become a test of
reading comprehension, rather than a test of women’s studies. It is
important that the measure is actually assessing the intended construct, rather
than an extraneous factor.
·
Criterion-Related
Validity is used to predict future or current performance - it correlates test
results with another criterion of interest.
Example: Suppose a physics program designed a measure to assess cumulative student learning throughout the major. The new measure could be correlated with a standardized measure of
ability in this discipline, such as an ETS field test or the GRE subject test.
The higher the correlation between the established measure and new measure, the
more faith stakeholders can have in the new assessment tool.
· Formative Validity, when applied to outcomes assessment, is used to assess how well a measure is able to provide information to help improve the program under study.
Example: When designing a rubric for history, one could assess students’ knowledge across the discipline. If the measure can show that students are lacking knowledge in a certain area, for instance the Civil Rights Movement, then that assessment tool is providing meaningful information that can be used to improve the course or program requirements.
· Sampling Validity (similar to content validity) ensures that the measure covers the broad range of areas within the concept under study. Not everything can be covered, so items need to be sampled from all of the domains. This may need to be completed using a panel of “experts” to ensure that the content area is adequately sampled. Additionally, a panel can help limit “expert” bias (i.e., a test reflecting what an individual personally feels are the most important or relevant areas).
Example: When designing an assessment of learning in the theatre department, it would not be sufficient to cover only issues related to acting. Other areas of theatre, such as lighting, sound, and the functions of stage managers, should all be included. The assessment should reflect the content area in its entirety.
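Criterion-related validity, as described above, rests on correlating results from the new measure with an established criterion. A minimal Python sketch of that correlation; all scores below are invented purely for illustration:

```python
# Hypothetical scores for six students: a program's new measure and an
# established standardized test (all values invented for illustration).
new_measure = [55, 62, 70, 48, 81, 66]
established = [150, 160, 175, 140, 190, 168]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(new_measure, established)
print(f"criterion-related validity estimate: r = {r:.3f}")
```

An r near 1 indicates strong agreement between the two instruments; values near 0 would give stakeholders little reason to trust the new measure.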
I. RESEARCH
QUESTION AND HYPOTHESIS
1. RESEARCH QUESTION
In a research proposal, the function of the research questions is to explain specifically what your study will attempt to learn or understand. In a research design, the research questions serve two other vital functions: to help the researcher focus the study and to give guidance on how to conduct it. Research questions are fundamental to starting a thesis.
The research question serves two
purposes:
· it determines where and what kind of
research the writer will be looking for and
· it identifies the specific objectives the
study or paper will address.
Therefore, the writer must first identify
the type of study (qualitative, quantitative, or mixed) before the research
question is developed.
A qualitative study seeks to learn why or
how, so the writer’s research must be directed at determining the what, why and
how of the research topic. Therefore, when crafting a research question for a
qualitative study, the writer will need to ask a why or how question about the
topic. For example: How did the company successfully market its new product?
The sources needed for qualitative research typically include print and
internet texts (written words), audio and visual media.
A
quantitative study seeks to learn where, or when, so the writer’s research must
be directed at determining the where, or when of the research topic. Therefore,
when crafting a research question for a quantitative study, the writer will
need to ask a where, or when question about the topic. For example: Where
should the company market its new product? Unlike a qualitative study, a quantitative study is a mathematical analysis of the research topic, so the writer’s research will consist of numbers and statistics.
Quantitative studies also fall into two
categories:
· Correlational studies: A correlational
study is non-experimental, requiring the writer to research relationships
without manipulating or randomly selecting the subjects of the research. The
research question for a correlational study may look like this: What is the
relationship between long distance commuters and eating disorders?
· Experimental studies: An experimental study requires the writer to manipulate and randomly select the subjects of the research. The research question for an experimental study may look like this: Does the consumption of fast food lead to eating disorders?
A mixed
study integrates both qualitative and quantitative studies, so the writer’s
research must be directed at determining the why or how and the what, where, or
when of the research topic. Therefore, the writer will need to craft a research
question for each study required for the assignment. Note: A typical study may be expected to have between one and six research questions.
Once the writer has determined the type of study to be used and the
specific objectives the paper will address, the writer must also consider
whether the research question passes the ‘so what’ test. The ‘so what’ test
means that the writer must construct evidence to convince the audience why the
research is expected to add new or useful knowledge to the literature.
Once you have decided on your research topic, the group needs to agree on a specific research question. You could repeat the activity described in the previous section to help them generate ideas and then agree on the final research question.
One note of caution:
it is common to want to choose a very broad research question. Help your team
to resist this temptation by refining any broad question into a series of
smaller, manageable ones. You may find it helpful to discuss these questions
with the young researchers:
· What is the key thing you want to find out?
· Can you answer the question within the time and resources available?
· Will you be able to collect the data needed to answer the question? Can you access the people you need to collect data from? Will people be willing to talk to you about your chosen research topic (for example, if it is controversial or sensitive)?
· Has the question already been answered by other researchers? Reading around the literature will help you to find this out.
· Will the answer to the question be genuinely useful? Does it have the potential to have an impact and effect change?
It is also worth
thinking about what the answers to the question might be – will they be useful
and have an impact or could there be negative consequences to investigating a
particular issue?
It is best to define
any key terms in your research project or question upfront, so that everyone
has a shared understanding. You will be able to find ideas for definitions by
reading around the topic. You can find helpful literature on almost any subject
imaginable by consulting a library catalogue or internet searching.
2.
HYPOTHESES
A research hypothesis is the statement created by researchers when they speculate upon the outcome of a study or experiment. A hypothesis is a formal statement that presents the expected relationship between an independent and a dependent variable. A hypothesis, a suggested answer to the problem, has the following characteristics:
· It should conjecture upon a relationship between two or more variables.
· It should be stated clearly and unambiguously in the form of a declarative sentence.
· It should be testable; that is, it should be possible to restate it in an operational form that can then be evaluated on data.
TYPES OF HYPOTHESES
a. Null
Hypotheses
A null
hypothesis is a hypothesis that proposes no relationship or difference
between two variables. "In the standard hypothesis-testing approach to
science one attempts to demonstrate the falsity of the null hypothesis, leaving
one with the implication that the alternative, mutually exclusive, hypothesis
is the acceptable one." (Reber, 1985, p. 337). A null hypothesis is
"the hypothesis that there is no relationship between two or more
variables, symbolized as H0" (Rosenthal & Rosnow, 1991, p.
624).
For example, we may want to
investigate the claim that despite what convention has told us, the mean adult
body temperature is not the accepted value of 98.6 degrees Fahrenheit. The null
hypothesis for an experiment to investigate this is “The mean adult body
temperature is 98.6 degrees Fahrenheit.” If we fail to reject the null
hypothesis, then our working hypothesis remains that the average adult has
temperature of 98.6 degrees.
If we are studying a new
treatment, the null hypothesis is that our treatment will not change our
subjects in any meaningful way.
b. Alternative
Hypotheses
The alternative or experimental hypothesis reflects that there will be an observed effect in our experiment. In a mathematical formulation of the alternative hypothesis there will typically be an inequality, or not-equal-to, symbol. This hypothesis is denoted by either Ha or H1.
The alternative hypothesis is what we are attempting to demonstrate in
an indirect way by the use of our hypothesis test. If the null hypothesis is
rejected, then we accept the alternative hypothesis. If the null hypothesis is
not rejected, then we do not accept the alternative hypothesis. Going back to
the above example of mean human body temperature, the alternative hypothesis is
“The average adult human body temperature is not 98.6 degrees Fahrenheit.”
If we are studying a new treatment, then the alternative hypothesis is that our treatment does in fact change our subjects in a meaningful and measurable way.
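The body-temperature example above can be sketched as a one-sample t-test. The ten readings below are hypothetical, and the critical value is taken from a standard t table; this is a sketch of the decision logic, not a full statistical analysis:

```python
import statistics

# Hypothetical sample of adult body temperatures (degrees Fahrenheit).
sample = [98.2, 97.9, 98.6, 98.0, 98.4, 97.8, 98.1, 98.3, 98.5, 97.9]
mu0 = 98.6  # null hypothesis H0: the population mean is 98.6 °F

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)          # sample standard deviation
t = (mean - mu0) / (sd / n ** 0.5)     # one-sample t statistic

# Critical value for a two-tailed test at alpha = 0.05 with
# n - 1 = 9 degrees of freedom, taken from a t table.
t_crit = 2.262
if abs(t) > t_crit:
    print(f"t = {t:.2f}: reject H0 in favor of Ha (mean is not 98.6)")
else:
    print(f"t = {t:.2f}: fail to reject H0")
```

Rejecting H0 here supports the alternative hypothesis that the mean adult body temperature differs from 98.6 degrees; failing to reject it leaves H0 as the working hypothesis.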
J.
SAMPLING
A population is any
entire collection of people, animals, plants or things from which we may
collect data. It is the entire group we are interested in, which we wish to
describe or draw conclusions about.
In order to make any generalizations about a population, a sample that is meant to be representative of the population is often studied. For each population there
are many possible samples. A sample statistic gives information about a
corresponding population parameter. For example, the sample mean for a set of
data would give information about the overall population mean.
It is important that
the investigator carefully and completely defines the population before
collecting the sample, including a description of the members to be included.
Example
The population for a study of infant health might be all children born in the
UK in the 1980's. The sample might be all babies born on 7th May in any of the
years.
A sample is a group of
units selected from a larger group (the population). By studying the sample it
is hoped to draw valid conclusions about the larger group.
A sample is generally
selected for study because the population is too large to study in its
entirety. The sample should be representative of the general population. This
is often best achieved by random sampling.
K.
SAMPLING PROCEDURE
Sampling is a procedure by which we can infer the characteristics of a large body of people (called a population) by talking with only a few (a sample).
[Diagram: sampling procedures are divided into non-probabilistic and probabilistic sampling procedures.]
There are several different sampling techniques/procedures available, as shown above.
1. Simple Random Sampling
In
this case each individual is chosen entirely by chance and each member of the
population has an equal chance, or probability, of being selected. One way of
obtaining a random sample is to give each individual in a population a number,
and then use a table of random numbers to decide which individuals to include.
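The numbering-and-drawing procedure above can be sketched in a few lines of Python; the 100-member population and the fixed seed are arbitrary choices made for reproducibility:

```python
import random

# A toy "population": 100 numbered individuals.
population = list(range(1, 101))

random.seed(42)  # fixed seed so the sketch is reproducible
# random.sample draws without replacement; each member is equally likely.
sample = random.sample(population, k=10)
print(sample)
```

Here the random-number generator plays the role of the table of random numbers.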
2. Systematic Sampling
Individuals
are selected at regular intervals from a list of the whole population. The
intervals are chosen to ensure an adequate sample size. For example, every 10th
member of the population is included. This is often convenient and easy to use,
although it may also lead to bias for reasons outlined below.
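The every-kth-member rule can be sketched as list slicing; the population size and interval below are illustrative:

```python
import random

# Systematic sampling: take every k-th member from an ordered list,
# starting from a random position within the first interval.
population = list(range(1, 101))  # 100 members
k = 10                            # sampling interval -> sample of 10

random.seed(0)
start = random.randrange(k)       # random start between 0 and k-1
sample = population[start::k]
print(sample)                     # one member from each block of 10
```

Bias can arise if the ordering of the list itself has a cycle that matches the interval.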
3. Stratified Sampling
In this method, the population is first divided into sub-groups (or strata) that all share a similar characteristic. It is used when we might reasonably expect
the measurement of interest to vary between the different sub-groups. Gender or
smoking habits would be examples of strata. The study sample is then obtained
by taking samples from each stratum.
In
a stratified sample, the probability of an individual being included varies
according to known characteristics, such as gender, and the aim is to ensure
that all sub-groups of the population that might be of relevance to the study
are adequately represented.
The
fact that the sample was stratified should be taken into account at the
analysis stage.
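A sketch of proportional stratified sampling, assuming a hypothetical population with a gender field; the same fraction is drawn from each stratum:

```python
import random

# Hypothetical population records with a stratifying characteristic.
population = [{"id": i, "gender": "F" if i % 2 else "M"} for i in range(1, 101)]

def stratified_sample(pop, key, frac):
    """Draw the same fraction from each stratum defined by `key`."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * frac)
        sample.extend(random.sample(members, k))
    return sample

random.seed(1)
sample = stratified_sample(population, "gender", frac=0.1)
print(len(sample))  # 5 from each 50-member stratum
```

Because every stratum is sampled at the same rate, each sub-group is guaranteed representation in proportion to its size.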
4. Clustered Sampling
In
a clustered sample, sub-groups of the population are used as the sampling unit,
rather than individuals. The population is divided into sub-groups, known as
clusters, and a selection of these are randomly selected to be included in the
study. All members of the cluster are then included in the study. Clustering
should be taken into account in the analysis.
The
General Household survey, which is undertaken annually in England, is a good
example of a cluster sample. All members of the selected households/clusters are included in the survey.
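Cluster sampling can be sketched as follows; households stand in for clusters (all values invented), a few clusters are drawn at random, and every member of each chosen cluster is included:

```python
import random

# Hypothetical population of 20 households (clusters), 4 members each.
households = {h: [f"h{h}-p{p}" for p in range(1, 5)] for h in range(1, 21)}

random.seed(7)
chosen = random.sample(list(households), k=5)  # randomly select 5 clusters
# Every member of each selected household enters the study.
sample = [person for h in chosen for person in households[h]]
print(len(sample))  # 5 households x 4 members = 20 participants
```

Note that the sampling unit is the household, not the individual, which is why clustering must be accounted for in the analysis.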
5. Quota or Proportional Sampling
This
method of sampling is often used by market researchers. Interviewers are given
a quota of subjects of a specified type to attempt to recruit. For example, an
interviewer might be told to go out and select 20 adult men and 20 adult women,
10 teenage girls and 10 teenage boys so that they could interview them about their
television viewing. There are several flaws with this method, but most
importantly it is not truly random.
6. Convenience Sampling
Convenience
sampling is perhaps the easiest method of sampling, because participants are
selected in the most convenient way, and are often allowed to choose or
volunteer to take part. Good results can be obtained, but the data set may be
seriously biased, because those who volunteer to take part may be different
from those who choose not to.
7. Snowball Sampling
This
method is commonly used in social sciences when investigating hard to reach
groups. Existing subjects are asked to nominate further subjects known to them,
so the sample increases in size like a rolling snowball. For example, when
carrying out a survey of risk behaviors amongst intravenous drug users,
participants may be asked to nominate other users to be interviewed.
8.
Judgment or Purposive Sampling
Purposive sampling represents a group of different non-probability
sampling techniques. Also
known as judgmental, selective or subjective sampling, purposive sampling relies on the judgment of the researcher when it comes to selecting the units (e.g., people, cases/organizations, events, pieces of data)
that are to be studied. Usually, the sample being investigated is quite small,
especially when compared with probability
sampling techniques.
The main goal of purposive sampling is to focus on particular
characteristics of a population that are of interest, which will best enable
you to answer your research questions. The sample being studied is not
representative of the population, but for researchers pursuing qualitative or mixed
methods research designs, this
is not considered to be a weakness. Rather, it is a choice, the purpose of
which varies depending on the type of purposive sampling technique that is used.
L. HOMOGENEITY OF VARIANCE
Homogeneity of variance is a major assumption underlying the validity of many parametric tests. More importantly, it serves as the null hypothesis in substantive studies that focus on cross- or within-group dispersion. The statistical validity of many commonly used tests, such as the t-test and ANOVA, depends on the extent to which the data conform to the assumption of homogeneity of variance (HOV): the condition in which all the variables in a sequence have the same finite, or limited, variance. When homogeneity of variance is determined to hold true for a statistical model, a simpler statistical or computational approach to analyzing the data may be used due to a low level of uncertainty in the data. When a research design involves groups that have very different variances, the p value accompanying the test statistics, such as t and F, may be too lenient or too harsh. Furthermore, substantive research often requires investigation of cross- or within-group fluctuation in dispersion. For example, in quality control research, HOV tests are often “a useful endpoint in an analysis” (Conover, Johnson & Johnson, 1981, p. 351).
M.
CONSTRUCTING A QUESTIONNAIRE
A questionnaire is a survey instrument containing the questions in a self-administered survey. Survey questions are answered as part of a questionnaire (or interview schedule, as it is sometimes called in interview-based studies). The context created by the
questionnaire has a major impact on how individual questions are interpreted
and answered. As a result, survey researchers must carefully design the
questionnaire as well as individual questions. There is no precise formula for
a well-designed questionnaire. Nonetheless, some key principles should guide
the design of any questionnaire, and some systematic procedures should be
considered for refining it.
1. Maintain
Consistent Focus
A
survey should be guided by a clear conception of the research problem under
investigation and the population to be sampled. Throughout the process of
questionnaire design, the research objective should be the primary basis for
making decisions about what to include and exclude and what to emphasize or
treat in a cursory fashion. The questionnaire should be viewed as an integrated
whole, in which each section and every question serve a clear purpose related to the study’s objective and each section complements the other sections.
2. Build
on Existing Instruments
Surveys
often include irrelevant questions and fail to include questions that, the
researchers realize later, are crucial. One way to ensure that possibly relevant
questions are asked is to use questions suggested by prior research, theory,
experience, or experts (including participants) who are knowledgeable about the
setting under investigation. If another researcher already has designed a set
of questions to measure a key concept, and evidence from previous surveys
indicates that this measure is reliable and valid, then, by all means, use that
instrument. Resources such as the Handbook of Research Design and Social
Measurement (Miller & Salkind, 2002) can give you many ideas about existing
instruments; your literature review at the start of a research project should
be an even better source. But there is a trade-off here. Questions used
previously may not concern quite the right concept or may not be appropriate in
some ways to your population. So even though using a previously designed and
well-regarded instrument may reassure other researchers, it may not really be
appropriate for your own specific survey. A good rule of thumb is to use a
previously designed instrument if it measures the concept of concern to you and
if you have no clear reason for thinking it is inappropriate with your survey
population.
3. Refine and Test Questions
The only good question is a pretested question. Before you
rely on a question in your research, you need evidence that your respondents
will understand what it means. So try it out on a few people. One important
form of pretesting is discussing the questionnaire with colleagues. You can
also review prior research in which your key questions have been used. Forming
a panel of experts to review the questions can also help. For a student
research project, “experts” might include a practitioner who works in a setting
like the one to be surveyed, a methodologist, and a person experienced in questionnaire
design. Another increasingly popular form of pretesting comes from guided
discussions among potential respondents. Such “focus groups” let you check for
consistent understanding of terms and to identify the range of events or
experiences about which people will be asked to report. By listening to and
observing the focus group discussions, researchers can validate their
assumptions about what level of vocabulary is appropriate and what people are
going to be reporting (Nassar-McMillan & Borders, 2002). Professional
survey researchers also use a technique for improving questions called the
cognitive interview (Dillman, 2007). Although the specifics vary, the basic
approach is to ask people, ideally individuals who reflect the proposed survey
population, to “think aloud” as they answer questions. The researcher asks a
test question, then probes with follow-up questions about how the respondent
understood the question, how confusing it was, and so forth. This method can
identify many problems with proposed questions. Conducting a pilot study is the
final stage of questionnaire preparation. Complete the questionnaire yourself
and then revise it. Next, try it out on some colleagues or other friends, and
revise it again. For the actual pretest, draw a small sample of individuals
from the population you are studying, or one very similar to it, and try out
the survey procedures with them, including mailings if you plan to mail your
questionnaire and actual interviews if you plan to conduct in-person interviews.
Which pretesting method is best? Each has unique advantages and disadvantages.
Simple pretesting is the least reliable but may be the easiest to undertake.
Focus groups or cognitive interviews are better for understanding the bases of
problems with particular questions. Review of questions by an expert panel
identifies the greatest number of problems with questions (Presser & Blair,
1994).
4. Order the Questions
The sequence of questions on a survey matters. As a first step, the individual questions should be sorted into broad thematic categories, which then become separate sections in the questionnaire. For example, the 2000 National Survey Mathematics Questionnaire contained five sections: Teacher Opinions, Teacher Background, Your Mathematics Teaching in a Particular Class, Your Most Recent Mathematics Lesson in This Class, and Demographic Information. Both the
sections and the questions within the sections must be organized in a logical
order that would make sense in a conversation. The first question deserves
special attention, particularly if the questionnaire is to be
self-administered. This question signals to the respondent what the survey is
about, whether it will be interesting, and how easy it will be to complete
(“Overall, would you say that your current teaching situation is excellent,
good, fair, or poor?”). The first question should be connected to the primary
purpose of the survey; it should be interesting, it should be easy, and it
should apply to everyone in the sample (Dillman, 2007). Don’t try to jump right
into sensitive issues (“In general, what level of discipline problems do you
have in your classes?”); respondents have to “warm up” before they will be
ready for such questions. Question order can lead to context effects when one
or more questions influence how subsequent questions are interpreted (Schober,
1999). Prior questions can influence how questions are comprehended, what
beliefs shape responses, and whether comparative judgments are made
(Tourangeau, 1999). The potential for context effects is greatest when two or
more questions concern the same issue or closely related issues. Often,
respondents will try to be consistent with their responses, even if they really
do not mean the response. Whichever type of information a question is designed
to obtain, be sure it is asked of only the respondents who may have that
information. If you include a question about job satisfaction in a survey of
the general population, first ask respondents whether they have a job. These
filter questions create skip patterns. For example, respondents who answer no
to one question are directed to skip ahead to another question, but respondents
who answer yes go on to the contingent question. Skip patterns should be
indicated clearly with arrows or other marks in the questionnaire, as
demonstrated in Exhibit 8.2. Some questions may be presented in a “matrix”
format. Matrix questions are a series of questions that concern a common theme
and that have the same response choices. The questions are written so that a
common initial phrase applies to each one (see Exhibit 8.4). This format
shortens the questionnaire by reducing the number of words that must be used
for each question. It also emphasizes the common theme among the questions and
so invites answering each question in relation to other questions in the
matrix. It is very important to provide an explicit instruction to “Check one
response on each line” in a matrix question because some respondents will think
that they have completed the entire matrix after they have responded to just a
few of the specific questions.
6. Make
the Questionnaire Attractive
An
attractive questionnaire—neat, clear, clean, and spacious—is more likely to be
completed and less likely to confuse either the respondent or, in an interview,
the interviewer. An attractive questionnaire does not look cramped; plenty of
“white space”—more between questions than within question components—makes the
questionnaire appear easy to complete. Response choices are listed vertically
and are distinguished clearly and consistently, perhaps by formatting them in
all capital letters and keeping them in the middle of the page. Skip patterns
are indicated with arrows or other graphics. Some distinctive type of
formatting should also be used to identify instructions. Printing a multipage
questionnaire in booklet form usually results in the most attractive and
simple-to-use questionnaire (Dillman, 2000, pp. 80–86).
7. Write
Clear Questions
All
hope for achieving measurement validity is lost unless the questions in a
survey are clear and convey the intended meaning to respondents. You may be
thinking that you ask people questions all the time and have no trouble
understanding the answers you receive, but you may also remember
misunderstanding or being confused by some questions. Consider just a few of the differences between everyday conversations and standardized surveys:
• Survey questions must be asked of many people, not just one person.
• The same survey question must be used with each person, not tailored to the specifics of a given conversation.
• Survey questions must be understood in the same way by people who differ in many ways.
• You will not be able to rephrase a survey question if someone doesn’t understand it because that would result in a different question for that person.
• Survey respondents don’t know you and so can’t be expected to share the nuances of expression that help you and your friends and family to communicate.
Question writing for a particular survey might begin
with a brainstorming session or a review of previous surveys. Then, whatever
questions are being considered must be systematically evaluated and refined.
Every question that is considered for inclusion must be reviewed carefully for
its clarity and ability to convey the intended meaning. Questions that were
clear and meaningful to one population may not be so to another. Nor can you
simply assume that a question used in a previously published study was
carefully evaluated. Adherence to a few basic principles will go a long way
toward developing clear and meaningful questions.
8. Avoid
Confusing Phrasing
In
most cases, a simple direct approach to asking a question minimizes confusion.
Use shorter rather than longer words: brave rather than courageous; job
concerns rather than work-related employment issues (Dillman, 2000). Use
shorter sentences when you can. A lengthy question often forces respondents to
“work hard,” that is, to have to read and reread the entire question. Lengthy
questions can go unanswered or can be given only a cursory reading without much
thought.
9. Avoid
Vagueness
Questions
should not be abbreviated in a way that results in confusion. The simple
statement
Residential
location _____________________
does
not provide sufficient focus; rather, it is a general question when a specific
kind of answer is desired. There are many reasonable answers to this question,
such as Silver Lake (a neighborhood), Los Angeles (a city), or Forbes Avenue (a
street). Asking, “In what neighborhood of Los Angeles do you live?” provides
specificity so that respondents understand that the intent of the question is
about their neighborhood. It is particularly important to avoid vague language;
there are words whose meaning may differ from respondent to respondent. The
question “Do you usually or occasionally attend our school’s monthly professional development workshops?” will not
provide useful information, for the meaning of usually or occasionally can
differ for each respondent. A better alternative is to define the two terms
such as usually (6 to 12 times a year) and occasionally (2 to 5 times a year).
A second option is to ask respondents how many times they attended professional
development sessions in the past year; the researcher can then classify the responses
into categories.
10. Provide a Frame of Reference
Questions often require a frame of reference that provides specificity about how respondents should answer the question. The question
Overall, the performance of this principal is
____ Excellent
____ Good
____ Average
____ Poor
lacks
a frame of reference. In this case, the researcher does not know the basis of
comparison the respondent is using. Some respondents may compare the principal
to other principals, whereas some respondents may use a personal “absolute
scale” about a principal’s performance. To avoid this kind of confusion, the
basis of comparison should be specifically stated in the question: “Compared
with other principals you are familiar with, the performance of this principal
is. . . .”
11. Avoid
Negative Words and Double Negatives
Try
answering, “Do you disagree that mathematics teachers should not be required to
be observed by their supervisor if they have a master’s degree?” Respondents
have a hard time figuring out which response matches their sentiments because
the statement is written as a double negative. Such errors can easily be
avoided with minor wording changes: “Should mathematics teachers with a
master’s degree still be observed by their supervisor?” To be safe, it’s best
just to avoid using negative words such as don’t and not in questions.
12. Avoid Double-Barreled Questions
Double-barreled questions produce uninterpretable results because they actually ask two questions but allow only one answer. For
example, the question “Do you support increased spending on schools and social
services?” is really asking two questions—one about support for schools and one
about support for social services. It is perfectly reasonable for someone to
support increased spending on schools but not on social services. A similar
problem can also show up in response categories.
13. Minimize the Risk of Bias
Specific words in survey questions should not trigger biases, unless that is the researcher’s conscious intent. Such questions are referred
to as leading questions because they lead the respondent to a particular
answer. Biased or loaded words and phrases tend to produce misleading answers.
Some polls ask obviously loaded questions, such as “Isn’t it time for Americans
to stand up for morality and stop the shameless degradation of the airwaves?”
Especially when describing abstract ideas (e.g., “freedom,” “justice,” “fairness”), your choice of words dramatically affects how respondents answer.
Take the difference between “welfare” and “assistance for the poor.” On
average, surveys have found that public support for “more assistance for the poor” is about 39
points higher than for “welfare” (Smith, 1987). Most people favor helping the
poor; most people oppose welfare. So the terminology a survey uses to describe
public assistance can bias survey results quite heavily. Responses can also be
biased when response alternatives do not reflect the full range of possible
sentiment on an issue. When people pick a response choice, they seem to be
influenced by where they are placing themselves relative to the other response
choices. A similar bias occurs when some but not all possible responses are
included in the question. “What do you like about your community, such as the
parks and the schools?” focuses respondents on those categories, and other
answers may be ignored. It is best left to the respondent to answer the
question without such response cues.
14. Social
Desirability
Social
desirability is the tendency for individuals to respond in ways that make them
appear in the best light to the interviewer. When an illegal or socially
disapproved behavior or attitude is the focus, we have to be concerned that
some respondents will be reluctant to agree that they have ever done or thought
such a thing. In this situation, the goal is to write a question and response
choices that make agreement seem more acceptable. For example, it would
probably be better to ask, “Have you ever been suspended for a violation of
school rules?” rather than “Have you ever been identified as a troublemaker by
your principal?” Asking about a variety of behaviors or attitudes that range
from socially acceptable to socially unacceptable will also soften the impact
of agreeing with those that are socially unacceptable.
15. Use Likert-Type Response Categories
Likert-type responses generally ask respondents to indicate the extent to which
they agree or disagree with statements. The response categories list choices
for respondents to select their level of agreement with a statement from
strongly agree to strongly disagree. The questions in Exhibit 8.4 have
Likert-type response categories.
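Scoring Likert-type responses is typically a simple mapping from agreement labels to numbers. The five-point scale and responses below are illustrative, not taken from Exhibit 8.4:

```python
# A minimal sketch of scoring Likert-type responses numerically
# (labels and weights are illustrative, not from any specific instrument).
SCALE = {
    "strongly agree": 5,
    "agree": 4,
    "neutral": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

responses = ["agree", "strongly agree", "neutral", "agree", "disagree"]
scores = [SCALE[r] for r in responses]
mean_score = sum(scores) / len(scores)
print(f"item mean = {mean_score:.1f} on a 1-5 scale")
```

Averaging assumes the categories are evenly spaced, which is a common simplification when summarizing Likert-type data.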