What is the most important validity for classroom teachers?
What can you NOT tell from correlation between two variables?
Cause and effect (a correlation can only be used to predict)
What are you looking for when interpreting a band chart?
seeing which bars overlap and which do not
What is test-retest reliability?
Two scores from the same person on the same test at different times; indicates stability
What are the 3 characteristics of a good obj?
specific, observable, measurable
What is standard deviation?
The most accurate estimate of variability; it includes all scores in a distribution
How many items per objective does a criterion-referenced test usually have?
3 or more per objective; the emphasis is on figuring out where the response is lacking, and it shows mastery
What is curvilinearity?
Two variables move in the same direction at first, up to a point, then show a negative relationship. Ex: drinking and feeling good, up to a point, then you feel sick. So (+) alcohol = (+) feelings, then (+) alcohol = (-) feelings
What validity is it when you compare test items to objectives?
What are the 3 different ways to grade a performance assessment?
Holistic (don’t need to know), checklist, rating scales
What is “obtained score”?
Obtained score = true score plus or minus error score; the difference between the obtained and true scores is error
The less variability in a test, the lower the…
Reliability. Shorter tests also have lower reliability because there are not enough items to measure accurately
How do you get away from multiple choice tests?
What is criterion-related validity?
How well test scores correlate with scores on other tests of the same thing; requires a correlation coefficient to be computed
What validity is it when SAT scores are correlated with GPA?
What is negative correlation?
Two variables move in opposite directions: as A increases, B decreases, or vice versa. Think of negative as a BAD relationship leading to a breakup
What can you tell if you calculate less error on one test than another?
less error = more accurate measure
Is there a “one size fits all” test in regards to validity and reliability?
No. Validity and reliability may hold only for a specific population, and only when the test is administered by a competent user
What is positive correlation?
Both variables move in the same direction: as A increases, B increases. Think of positive as a good relationship, since they move together
In band interpretation, how do you find if a difference in scores is related to chance?
when there is overlap
If my watch is stuck on 2 o'clock because it is broken, then the watch is...
Reliable but not valid (you always know it will read 2 o'clock, but it is only correct 2 times each day)
Jared has obtained score of 86, we can be 99% sure his true score lies between ___ and ___
77 and 95. 3 SD x 3 = 9; 86 + 9 = 95; 86 - 9 = 77
Which is narrower? Norm or crit referencing
Criterion-referenced. It covers few objectives because you want to know how well students mastered each one
If you give a condition in an objective, where else must you give that condition?
In the test: same condition, same materials, same accuracy
What are some disadvantages of performance assessments?
Unreliable, subjective due to outside influences, and time consuming
What is the rationale behind performance assessments?
It provides a direct measure of abilities.
Is “with 100% accuracy” a condition?
No, it does not provide a context like “with a map” or something.
What type of referencing is most appropriate for broad objectives?
If a student has a raw score of 80 and the SEM is 3, and you want to be 99% confident of the true score range, what would that range be?
71-89 (3 SDs times the SEM of 3 is 9, then add and subtract 9 from 80)
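The band arithmetic on this card can be sketched in a few lines of Python. This is only an illustration: the function name `true_score_band` and the default of 3 standard errors are assumptions made here, and the cards treat plus or minus 3 SEM as the 99% band.

```python
def true_score_band(obtained, sem, n_sem=3):
    """Return (low, high): the obtained score plus or minus n_sem standard errors.

    n_sem=3 follows the cards' rule of thumb for a 99% band (an assumption
    of this sketch, not a general statistical guarantee).
    """
    margin = n_sem * sem
    return obtained - margin, obtained + margin


# Raw score of 80 with an SEM of 3, as on the card above
low, high = true_score_band(80, 3)
print(low, high)

# Jared's obtained score of 86, assuming the same SEM of 3
print(true_score_band(86, 3))
```

The same call with an obtained score of 86 reproduces the 77 and 95 given for Jared, which suggests that card also assumes an SEM of 3.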
When is it okay to put opinion in a True/False question?
If you attribute it to a source: “according to so and so…”
What should you avoid when writing a T/F question?
What is the best way to test organizational thinking skills?
What kind of test item is best for assessing high level thinking skills?
What kind of test item are students most likely to guess on?
True/False and Multiple choice
Which test item is the easiest to score?
What level is appropriate for restricted essay items?
Anything lower than application-because they do not have free range on the topic, they are restricted
How can you more objectively grade essays?
Grade one criterion at a time, and make the essays anonymous
What is the mean of 81, 83, 82?
What is the median of 10, 12, 8, 9, 7?
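The two calculation cards above can be checked with Python's standard `statistics` module. This is a small sketch; the variable names are chosen here for illustration. The mean adds the scores and divides by how many there are, and the median is the middle value once the scores are sorted.

```python
from statistics import mean, median

scores_a = [81, 83, 82]
scores_b = [10, 12, 8, 9, 7]

# Mean: (81 + 83 + 82) / 3
mean_a = mean(scores_a)

# Median: the middle value of the sorted scores [7, 8, 9, 10, 12]
median_b = median(scores_b)

print(mean_a, median_b)
```

Here the mean works out to 82 and the median to 9.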
What are the measures of central tendency?
Mean, Median, Mode
Which measure of central tendency is the score that is repeated the most?
Which measure of CT divides a distribution in half?
Which measure of CT is the average?
Which measure of CT is the 50th percentile?
Which measure of Central Tendency is the most stable?
Mean, because it takes every score into account
A student scores 49 on a vocab test, the mean for the class is 40, and the SD is 3. What is the z-score, and what percentage of the class scored higher?
z-score = 3; less than 1% scored higher. z = (X - M) / SD = (49 - 40) / 3 = 3, then look at the bell curve
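As a rough check of this card, the z-score and the share of the class scoring higher can be computed with the standard library. The sketch assumes the class scores are normally distributed, which is what "look at the bell curve" implies; the helper name `z_score` is made up for this example.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Distance of score x from the mean, in standard deviation units."""
    return (x - mean) / sd


z = z_score(49, 40, 3)

# Share of a standard normal distribution lying above z, as a percentage
pct_higher = (1 - NormalDist().cdf(z)) * 100

print(z, round(pct_higher, 2))
```

The percentage comes out well under 1%, in line with the answer on the card.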
What is variability?
How spread out the scores are in a distribution.
Which measure of variability is the most dependable?
Standard deviation; it takes all scores into consideration
If a set of scores has a variance of 0, what can you conclude?
Everyone in the distribution has the same score
What happens to SEM when you decrease SD?
SEM decreases too, so you have a more accurate test. SD is the variability between scores
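The link between SD and SEM can be illustrated with the usual classical test theory formula, SEM = SD x sqrt(1 - reliability). That formula is not stated on the cards, and the SD and reliability values below are hypothetical, so treat this as a sketch rather than the text's own method.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), the usual classical test theory formula."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical values: same reliability, two different SDs
sem_wide = standard_error_of_measurement(10, 0.91)
sem_narrow = standard_error_of_measurement(5, 0.91)

print(round(sem_wide, 2), round(sem_narrow, 2))
```

With reliability held constant, halving the SD halves the SEM.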
How can you change the level of a multiple choice item?
Change the distractors to make them more plausible; in the stem, have students choose the “best” answer
Should you grade one whole essay at a time (on the same test, by the same person)?
No, grade by criteria for all essays
What does the “scoring criteria” for short answer and essay items tell you?
how many points it is worth, and what you will accept as correct
In a normal distribution, approximately what percent of scores lies between T-scores of 40 and 80?
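No answer for this card survives in the notes, but the percentage can be estimated. The sketch below assumes the standard T-score convention (mean 50, SD 10), so a T-score of 40 sits 1 SD below the mean and 80 sits 3 SDs above it.

```python
from statistics import NormalDist

# T-scores are assumed here to follow the usual convention: mean 50, SD 10
t_scores = NormalDist(mu=50, sigma=10)

# Share of scores between T = 40 and T = 80, as a percentage
pct_between = (t_scores.cdf(80) - t_scores.cdf(40)) * 100

print(round(pct_between))
```

Under that assumption the result is roughly 84%, about 34% from the band 1 SD below the mean plus nearly all of the 50% above it.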
NCLB requires assessment at what grades?
What does NCLB require?
that students be tested annually
According to the text, what is the real argument against High Stakes Testing?
Using one score to make high-stakes decisions. That score is only one snapshot in time; it is also biased, produces narrow scores, and causes teachers to “teach to the test”
What kind of discussions should teachers have with students about High Stakes Testing?
Simple and positive
What are the 12 conditions for a HST program according to the American Educational Research Association?
1. Don't use a single score for high-stakes decisions
2. Everyone should have the same resources and learning opportunities
3. Validation for each separate intended use (don't use the same score to decide graduation, promotion, financing, etc.)
4. Tell users the possible negative consequences of HST programs
5. Test and curriculum are aligned
6. Validity of passing scores and achievement levels (what the scores mean)
7. Remediation available to those who fail the HST
8. Attention to language differences
9. Attention to disabilities
10. Stick to the rules about who will and won't take the test (don't tell low-performing students not to come to school that day)
11. Sufficient reliability researched for each intended use
12. Ongoing evaluation of intended and unintended effects of HST
What is Reconstitution?
Moving teachers around or not renewing contracts because of test results
What are some test-taking strategies you can teach students for HST?
-sleep, breakfast, study
-follow directions carefully
-read each item, passage, and piece of information carefully
-manage test-taking time
-easier items first
-eliminate options before answering
-check answers after completing the test
What does a positively skewed distribution tell you about the scores?
The majority of scores fall below the middle of the score distribution; there are many low scores but few high scores
[image] Positively or negatively skewed?
[image] Positively skewed: many low scores, few high scores; most scores fall below the middle; the tail is toward the positive end of the curve
[image] Positively or negatively skewed?
[image] Negatively skewed: many high scores, few low scores; scores lump above the middle; the tail is toward the negative end of the curve
[image] What distribution is this? Label the mean, median, and mode at points 1, 2, and 3.
[image] Positively skewed: 1. mode, 2. median, 3. mean
[image] What distribution is this? Label the mean, median, and mode at points 1, 2, and 3.
[image] Negatively skewed: 1. mean, 2. median, 3. mode
Which measure of CT is most frequently used?
Which is not affected by extreme scores, the median or the mean?
Median; it represents the middle better when scores are skewed
What are 2 modes in a distribution called?
what are 3 or more modes in a distribution called?
If each score in a dist. occurs with equal frequency, what is the mode?
no mode (not 0)
What is the least stable measure of CT?
Mode (a few scores can influence it significantly)
In a normal distribution, which measure of CT has the most value?
none, they are all the same value (all reach the same highest points on the bell curve)
If the mean is 47, the median is 54, and the mode is 59, what is the shape of the distribution?
Negatively skewed; from low to high, the measures fall in the order listed (mean, median, mode)
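The ordering rule on this card (mean, then median, then mode for a negative skew, and the reverse for a positive skew) can be written as a small helper. The function name `skew_direction` and its return labels are made up for this sketch, and it only applies the rule of thumb from these cards.

```python
def skew_direction(mean, median, mode):
    """Classify a distribution's shape from the order of its three averages."""
    if mean < median < mode:
        return "negatively skewed"   # tail pulls the mean toward the low end
    if mean > median > mode:
        return "positively skewed"   # tail pulls the mean toward the high end
    if mean == median == mode:
        return "symmetric"           # e.g. a normal distribution
    return "unclear"


print(skew_direction(47, 54, 59))
```

For the card's values of 47, 54, and 59, the helper returns "negatively skewed".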
What does the semi-interquartile range do?
Prevents extreme scores from distorting the range (such as when everyone scores in the 40s but one person scores a 90). Only the middle 50% is used; the top and bottom 25% are left out
What does each quartile in the SIQR mean?
Q1 is the point below which 25% of scores lie; Q2 is the median, the point below which 50% of scores lie; Q3 is the point above which the top 25% of scores lie
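A short sketch of the idea, using the "everyone in the 40s, one person at 90" example from the card above. The scores are hypothetical, and the formula used, SIQR = (Q3 - Q1) / 2, is the usual definition rather than something spelled out on the cards. Quartiles come from `statistics.quantiles`, whose default interpolation method may differ slightly from a textbook's hand method.

```python
from statistics import quantiles


def semi_interquartile_range(scores):
    """Half the distance between Q1 and Q3; ignores the top and bottom 25%."""
    q1, _q2, q3 = quantiles(scores, n=4)  # three cut points: Q1, Q2 (median), Q3
    return (q3 - q1) / 2


# Hypothetical class: everyone in the 40s except one score of 90
scores = [41, 42, 43, 44, 45, 46, 47, 48, 90]

full_range = max(scores) - min(scores)
siqr = semi_interquartile_range(scores)

print(full_range, siqr)
```

The single score of 90 inflates the full range, while the SIQR stays small because it only looks at the middle half of the scores.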
What is the most commonly used estimate of variability?
What is the most accurate measure of variability?
Standard Deviation-includes all scores in a distribution
What does SD tell you?
How well a single score (the mean) describes all the scores in the distribution, that is, how far scores typically fall from it
What do large and small score values represent in SD?
A large SD means more variability; a smaller SD means less
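To make the large-versus-small comparison concrete, here is a minimal sketch with three hypothetical score sets that share the same middle but differ in spread. It uses the population standard deviation (`statistics.pstdev`); a textbook may divide by N - 1 instead, which changes the values slightly but not the comparison.

```python
from statistics import pstdev

# Hypothetical score sets
tight = [48, 49, 50, 51, 52]      # scores bunched near the mean
spread = [30, 40, 50, 60, 70]     # scores far from the mean
identical = [70, 70, 70, 70]      # everyone has the same score

sd_tight = pstdev(tight)
sd_spread = pstdev(spread)
sd_identical = pstdev(identical)

print(round(sd_tight, 2), round(sd_spread, 2), sd_identical)
```

The widely spread set has the larger SD, and the set where everyone has the same score has an SD (and variance) of 0.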
How can you tell the strength of a correlation?
By how close the coefficient is to -1.0 or +1.0
What does the sign (-/+) tell us about a correlation?
whether it is negative or positive
What does a correlation of .00 mean?
There is no relationship at all: high and low scores on one variable are not systematically associated with high or low scores on the other
How do you tell pos or neg correlation from scatterplots?
By the direction of the slope, reading from left to right: if the points slope downward, the correlation is negative; if they slope upward, it is positive
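The sign and strength ideas on the correlation cards can be seen in a small worked example. The sketch below implements the standard Pearson correlation coefficient by hand; the data sets are hypothetical, and the last one is an inverted-U pattern like the curvilinear example earlier in these notes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]

r_pos = pearson_r(xs, [52, 60, 71, 80, 88])   # y rises as x rises
r_neg = pearson_r(xs, [90, 81, 70, 62, 50])   # y falls as x rises
r_curve = pearson_r(xs, [1, 4, 5, 4, 1])      # rises, then falls (curvilinear)

print(round(r_pos, 2), round(r_neg, 2), round(r_curve, 2))
```

The first two coefficients land close to +1.0 and -1.0, while the curvilinear data gives a coefficient of 0 even though the two variables are clearly related, which is one reason to look at the scatterplot as well as the number.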
“Does the test measure what it is supposed to test?” is asking about…
“Does the test yield the same or similar scores consistently?” is asking about…
“Does the test score closely approximate an individual's true level of ability, skill or aptitude?” is asking about…
Why must a test have validity?
To show that it measures what it says it measures (for example, to rule out a 3rd grade math test actually testing 5th grade math, or a math test actually testing reading skills)
If someone inspects test questions to see if they correspond to what should be covered by the test, they are looking at what kind of validity?
What is the problem with content validity?
It is hard to tell whether tests of constructs are valid because constructs are abstract. It only tells you whether the test LOOKS valid, but the test might be measuring something else (such as guessing ability, reading skills, etc.)