Questionnaire Construction and Classroom Research

Writer(s): 
Dale T. Griffee, Seigakuin University

Many teachers are becoming interested in classroom research (Griffee & Nunan, 1997), and one popular way of doing research is to use data generated from questionnaires. There are many advantages to using questionnaires: (1) you can collect a large amount of data in a fairly short time (Brown, 1988, p. 3), (2) they are easier and less expensive than other forms of data collection (Seliger & Shohamy, 1989, p. 172), (3) they can be used to research almost any aspect of teaching or learning (Nunan, 1989, p. 62), and (4) they can be easily used in field settings such as classrooms (Nunan, 1992, p. 142).

Nunan (1992, p. 143) raised the issue that the creation of valid and reliable questionnaires is a specialized business. A teacher cannot simply make a questionnaire, administer it, and report the results. Before a questionnaire can be used for research purposes, the researcher must report how the questionnaire was constructed, how it was piloted, what the results of the pilot were, and what revisions, if any, were made on the basis of those results. The purpose of this article is to provide basic procedures for making a questionnaire instrument that has some claim to being valid and reliable.

Key Terms

Validity is usually taken to mean that the questionnaire is in fact measuring what it claims to measure (Brown, 1988, p. 101; 1996, p. 231). Reliability is information on whether the instrument is collecting data in a consistent and accurate way (Seliger & Shohamy, 1989, p. 185) and is usually reported as a coefficient ranging from zero to one. Of the various types of reliability, I will deal with the type known as internal consistency, and I will discuss coefficient alpha (also known as Cronbach's alpha) because it has the advantage of handling both items whose scores fall on a scale (e.g., a Likert scale) and items that are dichotomous (Pedhazur & Schmelkin, 1991, p. 97). Alpha reliability is a function of the number of items and of the correlations among the items. The instrument is the test or questionnaire, and item refers to a question on an instrument (not all items, however, are questions). More specifically, an item examines a mental attribute, and the answer to it is taken as an indication of the degree of some psychological construct (Osterlind, 1990). Construct (following Pedhazur & Schmelkin, 1991, p. 52) refers to a theoretical abstraction that organizes and makes sense of the world. Constructs familiar to language teachers are proficiency, motivation, listening, confidence, and anxiety. A Likert scale is a way of answering a questionnaire item by marking or circling one of a range of possible responses, such as strongly agree, agree, undecided, disagree, and strongly disagree.
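
The point that alpha reflects both the number of items and the correlations among them can be made concrete. Here is a minimal sketch in Python of the standardized form of coefficient alpha; the figures are invented purely for illustration.

    # Standardized coefficient alpha: a function of the number of items (k)
    # and the average correlation among the items (r_bar).
    def standardized_alpha(k, r_bar):
        return (k * r_bar) / (1 + (k - 1) * r_bar)

    # Invented figures: with an average inter-item correlation of .30,
    # adding more items of the same quality raises the reliability estimate.
    print(standardized_alpha(5, 0.30))   # about .68
    print(standardized_alpha(10, 0.30))  # about .81

Notice that, other things being equal, a longer questionnaire section tends to show a higher alpha; this is one reason the number of items should be reported along with the coefficient.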

Step One: Writing the Items

I will discuss four parts to step one: stating the construct, brainstorming items, asking a panel of experts to review the items, and asking students to review the items. Stating the construct involves writing out what you plan to measure (i.e., the purpose of the questionnaire). This sounds deceptively simple, but it is often hard to do. Take, for example, a questionnaire that I wanted to create to measure student confidence in speaking English as a foreign language (Griffee, 1997). Stating the construct does not mean writing a sentence that says the goal is to measure confidence in speaking English as a foreign language. Rather, it means stating what you mean by the construct of confidence.

Many teachers use questionnaires to determine to what extent students approve of their course. In that case, the problem is not so much defining the construct of approval as stating the goals and objectives of the course, because it is from the course objectives that questionnaire items are constructed.

Brainstorming the items could involve writing the items by yourself, writing with the help of colleagues, or basing them on other questionnaires that measure the same construct.

The next stage is to ask several colleagues to look at your items to see if they make sense. An expert is a colleague who has enough training and experience to offer a reasonable opinion, not necessarily a person who wrote a doctoral dissertation on your subject. Since I wanted to balance nationality and gender on my panel, I asked three male and three female English-speaking teachers, and three male and three female Japanese-speaking teachers. I gave each colleague the definition of my construct followed by a list of the brainstormed items and asked them to rate how well each item measured the construct. Next, I talked to them to find out why they objected to certain items, and in some cases I was able to understand their objections more clearly and revise the items accordingly. This is one kind of validation evidence: my items were examined by colleagues and judged to be adequate measures of the theoretical construct. It is, however, weak evidence because it gives us information about the instrument rather than about the data obtained from it (Angoff, 1988, p. 27). I then asked some students of the type for whom the questionnaire was intended to look at each item and circle any word they did not understand. After making adjustments based on their feedback, I was ready to pilot my questionnaire.
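
One simple way of handling the panel's ratings, sketched below in Python, is to average the ratings for each item and single out any item that falls below a cut-off for discussion and possible revision. The items, ratings, and cut-off here are all hypothetical.

    # Hypothetical panel ratings (1 = does not measure the construct,
    # 5 = clearly measures the construct), one list of six ratings per item.
    panel_ratings = {
        "item_1": [5, 4, 5, 4, 5, 4],
        "item_2": [2, 3, 2, 1, 3, 2],   # a likely candidate for revision
        "item_3": [4, 4, 5, 5, 4, 4],
    }

    CUTOFF = 3.5  # an arbitrary threshold chosen for this illustration

    for item, ratings in panel_ratings.items():
        mean = sum(ratings) / len(ratings)
        note = "  <- discuss with the panel" if mean < CUTOFF else ""
        print(f"{item}: mean rating {mean:.2f}{note}")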

Step Two: Piloting the Instrument

The underlying strategy of step two is to create more items than you will eventually use, and to pilot the questionnaire to determine which items to keep and which to revise or eliminate. I will discuss two parts to this step: piloting the questionnaire and analyzing the results. Keeping in mind that you cannot ask the same students to do both the pilot study and the main study, select a group of students for the pilot. Piloting is not an optional step: it produces the results you need to analyze in order to decide which items to keep and which to cut.

I will discuss four possible ways of analyzing the results. The first is simple correlation (see Reid, 1990, p. 325). If you wrote two items that you intended to measure your construct, and they had a high correlation, you could argue that the students understood the items in the same way. If they did not correlate, you would assume that at least one of the items was not understood by the students in the way you intended, and was therefore not measuring the construct. You might take these items back to your panel and ask them why they think your students did not interpret the items as intended. Alternatively, you could pilot your questionnaire with near-native speakers (NNS) as well as students. Assuming that the NNS understand the items, items that do not correlate highly can be cut.
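
As an illustration of the first approach, here is a minimal sketch in Python using invented pilot responses for two items written to measure the same construct; the 0.3 cut-off is only a rule of thumb for the example, not a recommendation.

    import numpy as np

    # Invented pilot data: Likert responses (1-5) from ten students
    # to two items intended to measure the same construct.
    item_a = np.array([4, 5, 3, 4, 5, 2, 4, 3, 5, 4])
    item_b = np.array([4, 4, 3, 5, 5, 2, 3, 3, 4, 4])

    r = np.corrcoef(item_a, item_b)[0, 1]
    print(f"correlation between the two items: {r:.2f}")

    # A low correlation suggests at least one item is not being read as intended.
    if r < 0.3:
        print("Take these items back to the panel.")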

A second way to analyze the results of the pilot is to calculate the alpha reliability for each separate section of the questionnaire. If one of the sections shows low reliability, take its items back to your panel for revision. A third way to analyze your results is to give the questionnaire to a student: ask the student each question (or let them read it), and then ask them to tell you what they think the item means. Items which are not clear to the student, or which are understood in ways you did not intend, are candidates to be cut. A fourth way to analyze the results of the pilot is factor analysis (FA), which is a form of multivariate correlation. FA is decidedly superior to simple correlation, but it requires advanced knowledge of statistics and a larger number of students in your pilot study; some researchers (Boyle, Stankov, & Cattell, 1995) call for at least 10 participants per item. Revised items should be piloted again until you are satisfied that all items address the construct.
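
Factor analysis itself is beyond the scope of a short example, but a rough preliminary check on whether the items hang together can be made from the eigenvalues of the inter-item correlation matrix (counting eigenvalues greater than one, the so-called Kaiser criterion). The sketch below uses invented pilot data and is not a substitute for a full factor analysis or for the sample sizes mentioned above.

    import numpy as np

    # Invented pilot data: rows are students, columns are items (Likert 1-5).
    responses = np.array([
        [4, 5, 4, 2, 1],
        [3, 4, 3, 2, 2],
        [5, 5, 4, 1, 1],
        [2, 2, 3, 4, 5],
        [4, 4, 5, 2, 2],
        [1, 2, 2, 5, 4],
        [3, 3, 3, 3, 3],
        [5, 4, 5, 1, 2],
    ])

    corr = np.corrcoef(responses, rowvar=False)   # item-by-item correlations
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # largest first

    print("eigenvalues:", np.round(eigenvalues, 2))
    print("factors suggested (eigenvalue > 1):", int(np.sum(eigenvalues > 1)))

If more than one factor is suggested where you intended to measure a single construct, some of the items probably need another look.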

Step Three: Reporting the Final Validation Evidence

You are now ready to administer your instrument. The underlying strategy of step three is to find out whether the revised items are functioning well. If they are not, you have to decide whether to use them as they are or to revise them and pilot again. If you are satisfied, you can report the results of your questionnaire. (For a discussion of adequate reliability, see Griffee, 1996, p. 283, and also Pedhazur & Schmelkin, 1991, p. 109.)

The above three steps should be reported in summary form in the materials section of your paper to show to what extent your questionnaire is valid. The mean and standard deviation for each item should also be included. If you use a Likert scale, you may wish to report the responses as percentages.
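
As an illustration of this kind of summary, here is a short Python sketch with invented Likert responses to a single item; the same summary would be repeated for each item on the questionnaire.

    import numpy as np

    labels = ["strongly agree", "agree", "undecided", "disagree", "strongly disagree"]

    # Invented responses to one item, coded 5 (strongly agree) to 1 (strongly disagree).
    responses = np.array([5, 4, 4, 3, 5, 4, 2, 4, 5, 3, 4, 4])

    print(f"mean = {responses.mean():.2f}, SD = {responses.std(ddof=1):.2f}")

    # Percentage of students choosing each point on the scale.
    for code, label in zip(range(5, 0, -1), labels):
        pct = 100 * np.mean(responses == code)
        print(f"{label:>17}: {pct:.0f}%")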

This article assumes that you have access to a computer and to a statistical program or spreadsheet program that includes descriptive statistics, correlation, factor analysis, and alpha reliability. As my statistical program does not include alpha reliability, I adapted a formula (Brown, 1996, p. 196) to a simple spreadsheet.
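
The usual variance form of coefficient alpha (which may or may not be the exact form Brown, 1996, p. 196, presents) is easy to set up in a spreadsheet or, as in this sketch, in a few lines of Python with invented pilot data.

    import numpy as np

    # Invented pilot data: rows are students, columns are items (Likert 1-5).
    responses = np.array([
        [4, 5, 4, 4],
        [3, 4, 3, 3],
        [5, 5, 4, 5],
        [2, 2, 3, 2],
        [4, 4, 5, 4],
        [1, 2, 2, 1],
    ])

    k = responses.shape[1]                               # number of items
    item_variances = responses.var(axis=0, ddof=1)       # variance of each item
    total_variance = responses.sum(axis=1).var(ddof=1)   # variance of total scores

    alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
    print(f"coefficient alpha = {alpha:.2f}")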

Conclusion

Validation is the process of item creation, piloting, and item testing to determine whether the items are measuring what you claim they are measuring. No single test or observation constitutes validation; rather, validation is a series of checks, each of which must be reported. Validation should be built into the foundation of the questionnaire, not added on as an afterthought. Validation is a never-ending process, and one never finally validates a questionnaire. You can expect to spend months, if not years, validating your instrument before you administer it for research results. This is a sobering realization, and this article can only suggest the time and steps necessary. As Nunan said, making a questionnaire is a specialized business and should not be undertaken lightly.


References

Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. Braun (Eds.), Test validity (pp. 19-32). Hillsdale, NJ: Lawrence Erlbaum.

Boyle, G. J., Stankov, L., & Cattell, R. B. (1995). Measurement and statistical models in the study of personality and intelligence. In D. H. Saklofske & M. Zeidner (Eds.), International handbook of personality and intelligence (pp. 417-446). New York: Plenum Press.

Brown, J. D. (1988). Understanding research in second language learning. Cambridge: Cambridge University Press.

Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall Regents.

Griffee, D. T. (1996). Reliability and a learner style questionnaire. In G. van Troyer, S. Cornwell, & H. Morikawa (Eds.), Proceedings of the JALT 1995 international conference on language teaching and learning (pp. 283-292). Tokyo: The Japan Association for Language Teaching.

Griffee, D. T. (1997). Validating a questionnaire on confidence in speaking English as a foreign language. JALT Journal, 19(2), 177-197.

Griffee, D. T., & Nunan, D. (1997). Classroom teachers and classroom research. Tokyo: The Japan Association for Language Teaching.

Nunan, D. (1989). Understanding language classrooms. New York: Prentice Hall.

Nunan, D. (1992). Research methods in language learning. Cambridge: Cambridge University Press.