Validity and Reliability

Content validity involves gathering evidence to demonstrate that the assessment content fairly and adequately represents a defined domain of knowledge or performance. The purpose of this document is to provide guidance for collecting evidence to document the technical quality of rubrics used to evaluate candidates in the Glenville State University Department of Teacher Education. Thanks to the Cato College of Education at UNC Charlotte for their assistance.

Validity Evidence Needed for Rubric Use and Interpretation

To establish content validity for internally developed assessments/rubrics, a panel of experts will be used. While content validity studies using expert panels have some limitations (e.g., bias), this approach is accepted by CAEP. As noted by Rubio, Berg-Weger, Tebb, Lee, and Rauch (2003),

Using a panel of experts provides constructive feedback about the quality of the measure and objective criteria with which to evaluate each item …. A content validity study can provide information on the representativeness and clarity of each item and a preliminary analysis of factorial validity. In addition, the expert panel offers concrete suggestions for improving the measure. (p. 95)

Establishing Content Validity for Internally-Developed Assessments/Rubrics

GSU is committed to ensuring the validity and reliability of all EPP-created assessments used within the Education program.

The chart below identifies the EPP-created assessments and the review cycle. The chart will be updated as assessments are revised or additional assessments are created.

EPP Created Performance Assessments Review for Reliability & Validity Timeline

| Assessment | Fall 2020 | Spring 2021 | Fall 2021 | Spring 2022 | Fall 2022 | Spring 2023 | Fall 2023 | Spring 2024 | Fall 2024 | Spring 2025 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Partner Teacher Assessment | X |  |  |  |  | X |  |  |  |  |
| Dispositions | X |  |  |  |  | X |  |  |  |  |
| Professional Semester Evaluation |  | X |  |  |  |  | X |  |  |  |
| Early Education Special Subjects |  | X |  |  |  |  | X |  |  |  |
| Elementary Special Subjects |  |  | X |  |  |  |  | X |  |  |
| Music Special Subjects |  |  | X |  |  |  |  | X |  |  |
| Technology Standards & Performance Indicators Evaluation |  |  | X |  |  |  |  | X |  |  |
| English Special Subjects |  |  |  | X |  |  |  |  | X |  |
| Social Studies Special Subjects |  |  |  | X |  |  |  |  | X |  |
| Science Special Subjects |  |  |  |  | X |  |  |  |  | X |
| Math Special Subjects |  |  |  |  | X |  |  |  |  | X |
| PE/Health Special Subjects |  |  |  |  | X |  |  |  |  | X |

Protocol 

1.  Complete the Initial Assessment/Rubric Review for each rubric or assessment used to officially evaluate candidate performance in the program. Make sure that the “overarching constructs” measured in the assessment are identified.

2.  Identify a panel of experts and the credentials required for their selection. The review panel should include a mixture of GSU faculty (i.e., content experts) and K-12 school or community practitioners. Minimum credentials for each expert should be established by consensus among program faculty; credentials should bear up to reasonable external scrutiny (Davis, 1992).

The panel of experts should include:

  1. At least 3 content experts from the program/department in the Department of Education at GSU;
  2. At least 1 external content expert from outside the program/department. This person could be from GSU or from another IHE, as long as the requisite content expertise is established; and
  3. At least 3 practitioner experts from the field.

TOTAL NUMBER OF EXPERTS: At least seven (7)

3.  Create the response form. For each internally developed assessment/rubric, there should be an accompanying response form that panel members use to rate the items that appear on the rubric. Program faculty should work collaboratively to develop the response form for each rubric used in the program to officially evaluate candidate performance. (A sample response-form entry is sketched after the list below.)

  1. For each item, the overarching construct that the item purports to measure should be identified and operationally defined.
  2. The item should be written as it appears on the assessment.
  3. Experts should rate the item’s level of representativeness in measuring the aligned overarching construct on a scale of 1-4, with 4 being the most representative. Space should be provided for experts to comment on the item or suggest revisions.
  4. Experts should rate the importance of the item in measuring the aligned overarching construct, on a scale of 1-4, with 4 being the most essential. Space should be provided for experts to comment on the item or suggest revisions.
  5. Experts should rate the item’s level of clarity on a scale of 1-4, with 4 being the most clear. Space should be provided for experts to comment on the item or suggest revisions.
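For illustration only, one entry on a response form might capture the fields below (the construct name, item text, and field names are hypothetical; a paper form or spreadsheet would work just as well):

```python
# Illustrative structure for a single response-form entry; each expert completes
# one such entry per rubric item, using the 1-4 scales described above.
response_form_entry = {
    "construct": "Instructional Planning",  # overarching construct, operationally defined on the form
    "item_text": "Candidate aligns lesson objectives with state standards.",
    "representativeness": None,  # 1-4, with 4 = most representative
    "importance": None,          # 1-4, with 4 = most essential
    "clarity": None,             # 1-4, with 4 = most clear
    "comments": "",              # space for suggested revisions or other feedback
}
```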

4.  Create an assessment packet for each member of the panel. The packet should include:

  1. A letter explaining the purpose of the study, the reason the expert was selected, a description of the measure and its scoring, and an explanation of the response form.
  2. A copy of the assessment instructions provided to candidates.
  3. A copy of the rubric used to evaluate the assessment.
  4. The response form aligned with the assessment/rubric for the panel member to rate each item. 

5.  Initiate the study. Set a deadline for panel members to return the response forms or to complete the response form online.

6.  Collect the data. Once response data for each internally developed rubric have been collected from the panel participants, the information should be submitted to the Education Department Office. Copies of all forms and/or an Excel file of submitted scores (if collected electronically) should be submitted.

7.  Once content validity results have been submitted, the Education Department will generate a Content Validity Index (CVI). This index will be calculated for each item based on recommendations by Rubio et al. (2003), Davis (1992), and Lynn (1986):

CVI = (number of experts who rated the item as 3 or 4) / (total number of experts)

For example, if 6 of 7 experts rate an item as 3 or 4, the CVI for that item is 6/7 ≈ 0.86. A CVI score of 0.78 or higher will be considered acceptable.

8. If the CVI score is lower than 0.78, faculty will meet to revise the assessment; the revisions will be put through the validation protocol, and the process will be repeated until the assessment meets a CVI of 0.78 or higher.
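To illustrate the calculation in step 7, a minimal sketch in Python is shown below (the item labels and ratings are hypothetical; a spreadsheet formula would work equally well):

```python
# Hypothetical 1-4 panel ratings for two rubric items, one score per expert (7 experts).
ratings = {
    "Item 1 - representativeness": [4, 3, 4, 2, 4, 3, 4],
    "Item 2 - representativeness": [2, 3, 2, 4, 3, 2, 3],
}

def item_cvi(scores):
    """Item-level CVI: the proportion of experts who rated the item 3 or 4."""
    return sum(1 for s in scores if s >= 3) / len(scores)

for item, scores in ratings.items():
    cvi = item_cvi(scores)
    status = "acceptable" if cvi >= 0.78 else "revise (step 8)"
    print(f"{item}: CVI = {cvi:.2f} -> {status}")
```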


References:

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

Davis, L. (1992). Instrument review: Getting the most from your panel of experts. Applied Nursing Research, 5, 194-197.

Lawshe, C. H. (1975). A qualitative approach to content validity. Personnel Psychology, 28, 563-575.

Lynn, M. (1986). Determination and quantification of content validity. Nursing Research, 35, 382-385.

Rubio, D. M., Berg-Weger, M., Tebb, S. S., Lee, E. S., & Rauch, S. (2003). Objectifying content validity: Conducting a content validity study in social work research. Social Work Research, 27(2), 94-104.


Glenville State University Inter-Rater Reliability Protocol

What is Inter-rater Reliability?

Inter-rater reliability is the level of agreement between raters/judges. If everyone agrees, IRR is 1 (or 100%); if everyone disagrees, IRR is 0 (0%). The Glenville State University Teacher Education Department uses Cohen’s Kappa because our validity protocol requires seven raters to review each EPP-created assessment.

What is Cohen’s Kappa?

Cohen’s Kappa measures inter-rater reliability. In this method, raters are chosen deliberately. It is generally considered a more robust measure than a simple percent-agreement calculation, since Kappa takes into account the possibility of agreement occurring by chance.

The value of Kappa is defined as

κ = (Po - Pe) / (1 - Pe)

where Po is the observed proportion of agreement between the raters and Pe is the proportion of agreement expected by chance.

Source: http://www.pmean.com/definitions/images/kappa06.gif
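For illustration, a minimal sketch of a two-rater Cohen’s Kappa calculation in Python is shown below (the rater labels and scores are hypothetical; an established statistics package could also be used):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters: (Po - Pe) / (1 - Pe)."""
    n = len(rater_a)
    # Observed agreement: proportion of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on each rater's marginal category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "Met"/"Not Met" scores from two reviewers on ten rubric items.
rater_1 = ["Met", "Met", "Not Met", "Met", "Met", "Not Met", "Met", "Met", "Met", "Not Met"]
rater_2 = ["Met", "Met", "Not Met", "Met", "Not Met", "Not Met", "Met", "Met", "Met", "Met"]

print(f"Kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # Kappa = 0.52
```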
 

 

Crano, W., & Brewer, M. (2009). Principles and Methods of Social Research. Psychology Press.