Updated 4/14

(c) 2005. EDUCATION, 126(1), 26-36.

Language Course Taught with Online Supplement Material: Is It Effective?
Y. M. Shimazu
San Jose State University

The effectiveness of a 5-unit college-level Japanese language course taught with online supplemental material was examined against the same course taught without online material.  Participants were 86 students in 8 sections of a first-semester elementary Japanese language course during a 4-year period.  Complete data (quiz, midterm, and final exam scores) were available for 39 students in the comparison group and 47 students in the experimental group.  The instructor used a conventional textbook and a Kana workbook based on an eclectic approach.  The results showed no statistically significant differences in test scores between the comparison group and the experimental group on 9 quizzes and the final exam but showed statistically significant differences on one quiz and the midterm exam (without-online students scored higher than with-online students).  Dropout rates were significantly lower in the with-online course.

Computer technology continues to develop.  As William Massy (1997) noted, "the faculty role will change from being mainly a content expert, professor's job is to profess, to a combination of content expert, learning process design expert, and process implementation manager" (p. 31).  In recent years, the Internet has become the leading edge in delivering instruction at a distance by virtue of its ability to incorporate learning in innovative ways.  By using the Internet, students can maintain control over the rate and timing of instruction and homework.  In 1986, Kulik and Kulik found that (a) computer-based instruction has a small but significant positive effect on achievement and (b) computer-based instruction substantially reduced instruction time, to as little as a third of that required by traditional instruction.  More recent research by Aberson et al. (2000), comparing computer-based instruction with traditional instruction, found no statistically significant differences between the test scores of the online tutorial group and the lecture group.  Their findings were, however, encouraging because online instruction can be implemented as an effective supplement to the traditional classroom.  The Pew Internet and American Life Project (2002) reported that 86% of the students attending universities in the United States have accessed the Internet, compared with 59% of the general population.  In addition, 79% of college students stated that the use of the Internet has had a positive impact on their overall learning.  According to Ryan and associates (1999), "Higher education is moving with deliberate speed to an electronic classroom.  Much has been published on faculty experiences with WWW course delivery.  However little research exists on the evaluation of these methods" (p. 272).
A recent study by Al-Jarf (2004) indicated that the use of Web-based instruction as a supplement to traditional in-class ESL writing instruction was significantly more effective than writing instruction that depended on the textbook alone.
     The use of the Internet is expected in foreign-language classrooms.  The benefits of Internet use in second-language learning are, however, often questioned.  Does the use of the Internet show positive effects on language learning?  It is not yet known whether Internet use yields positive results.  This study analyzed the effects of Internet use on second-language learners' test scores in an elementary Japanese class.  The results have pedagogical implications for the application of the Internet in second-language curricula.

Purpose of the Study
The present study investigated and evaluated the effectiveness of a language course supplemented with online material for college-level Japanese learners.  The study compared the exam scores of students who received traditional in-class instruction (50 minutes a day, 5 consecutive days a week) "without online supplement" material with the exam scores of students who received the same in-class instruction "with online supplement" material used at home.

The sample consisted of full-time students from first-semester Japanese language classes during the Fall 1997, Fall 1999, Fall 2000, Spring 2002, Fall 2002, Spring 2003, Fall 2003, and Spring 2004 semesters at San Jose State University.  The comparison group received traditional in-class instruction, and the experimental group received traditional in-class instruction plus online supplement material used at home on their personal computers.
     Ten quizzes, one midterm exam, and one final exam were given.  The quizzes, the midterm exam, and the final exam were identical across semesters.  The data from the first 4 classes, which were taught without online supplement material (Fall 1997, Fall 1999, Fall 2001, and Spring 2002), were compared with the data from the remaining classes, which were taught with online supplement material (Fall 2002, Spring 2003, Fall 2003, and Spring 2004).  A series of two-tailed independent t tests was used, and effect sizes are reported.
      Because replication studies provide additional support for generalizability (Robinson & Levin, 1997), an external replication was based on an independent group of students for 4 semesters.  Comparison-group data were obtained from students enrolled in the first 4 semesters.  Data were collected from regular classes, and students were not aware of the treatment they were receiving, so the "novelty effect" was under control.  Because comparison-group scores were collected during the first 4 semesters, the "John Henry effect" (in which control groups or their teachers feel threatened or challenged by competition with a new program or approach and, as a result, outdo themselves and perform well beyond what normally would be expected) should have had little or no impact on the test scores.  All students responded that they had access to the Internet and knew how to use it.

The individuals in this study were 86 students who had no prior knowledge of Japanese, that is, who had never studied Japanese before.  A greater proportion of students in the experimental group than in the comparison group were male (see Table 1).  The average ages of the 2 groups are comparable (comparison group M = 21.05 years, SD = 2.62 vs. experimental group M = 21.40 years, SD = 2.35).  The average number of units each student carried per semester was 14.26 (range 5 to 23), and the averages are comparable between the 2 groups.  The majority of students had an English language background.

All students received instruction 5 days per week, 50 minutes per day, for one semester.  Students in the comparison group and the experimental group used the same textbooks and were taught with the same methods by the same instructor for the same length of time (15 weeks) during the semester.  Both groups followed the same course outline, which assured uniformity in the grammar topics treated.  As homework, both groups were assigned the Review Exercises at the end of each textbook lesson.
     The students in the experimental group engaged on their own time in various additional exercises on the Web.  The online assignments consisted of a variety of exercises: (a) number exercises, (b) Hiragana and Katakana character sound-letter association exercises, (c) vocabulary exercises, and (d) grammar review.  Some exercises required students' active participation in interactive language activities such as multiple-choice quizzes, repeating after a model, and choosing the correct answer after listening to a short conversation.  Some exercises involved true-false items, locating grammar errors, particle exercises, answering simple questions, and translations.  Some exercises were sentence-completion and fill-in-the-blank types that would help develop students' morphology and syntax skills.  Some exercises were reading comprehension and short writing compositions.  Each online exercise was related to material covered in class.  The online assignments were given at an average rate of 3 exercises per week.  The instructor searched Websites, chose the exercises most closely related to each lesson, and linked them from the homework pages.  To avoid gibberish on the student's computer screen or the fear of an exercise not downloading (because of the nature of foreign fonts), the instructor, when designing the homework pages, carefully chose Web-based exercises that were downloadable with old browsers to meet the needs of students who were still using old browser versions and dial-up connections (see http://users.aol.com/ymshimazu1/04spj001a-homework.html).  Modifications and additions to the Website resulted in no downtime.  Each assignment was 1 to 2 hours in length, and students were given 2 to 3 days to complete it.  The exercises were available 24 hours a day, 7 days a week, to anyone during the course.
Each online homework sheet required the student to indicate  (a) "how much the student liked the assignment" on a 7-point scale and  (b) "the time the student spent to do the assignment" on the bottom of the assignment page.  The instructor reminded and encouraged the students to stay engaged in the online assignments on a daily basis.  Upon completion of each textbook lesson, a quiz was given approximately once a week and its scores were used for data analyses.
      In the experimental group, students received prompt feedback from the instructor after they submitted the homework assignment.  If the assignment required only interactive exercises with the computer, which used active-learning techniques, the instructor collected the homework sheet, which indicated the time the student spent (a time-on-task sheet) and how much the student liked the assignment.  Students also were encouraged to contact the instructor via email if they had any questions about the online task.  Almost no students, however, contacted the instructor via email to ask questions.  In the beginning of the 2001 spring semester, one student emailed the instructor and asked how to write Japanese characters on his computer.
     The exact number of hours each student studied at home was not controlled, but the number of hours each student spent on the Internet was indicated on every online assignment sheet and reported.  The hours the students studied Japanese per week outside class were seen as comparable (comparison group M = 3.7 hours, SD = 1.2 vs. experimental group M = 3.5 hours, SD = 1.3).
     The instructor is a native of Japan, with 25 years of experience teaching Japanese in the United States.  The textbooks used were Learn Japanese Vol. 1 (University of Hawaii Press) and Handy Katakana Workbook (Pearson Custom).  Because the instructor and the researcher were the same person, the instructor tried to avoid researcher expectancy (the problem of the researcher's expectations skewing the outcome) by using student ID numbers to identify examinees, true-false (T/F) and multiple-choice quizzes, and other techniques to maximize test objectivity.  The course required some technology skills of the students beyond Web browsing.  After downloading RealPlayer, Media Player, or QuickTime onto their computers, all students were able to listen to Japanese sentences via streaming.

A quiz or an exam was given at the end of each textbook lesson.  The scores on the quizzes, the midterm exam, and the final exam were collected and used for the data analyses.  The formats of the quizzes and exams were true-false (multiple-choice), dictation, and matching, in the language domain areas of (a) describing what one does or will do and (b) asking for information, clarifying, and so on (i.e., language functions) in the 4 language-skill areas.  For quiz 2 (Q2), there were 47 T/F oral-production and listening-comprehension items and 2 dictation items (i.e., answering simple questions); for the midterm, there were 30 T/F oral-production and listening-comprehension items, 10 matching items (translation between Japanese and English), a dictation of a short story, and 20 T/F listening-comprehension items.  The number of items and the format of the rest of the quizzes and the final exam were similar to those of the midterm exam.  All quizzes and exams were designed to cover the 4 language-skill areas: reading, writing, speaking, and listening.  One point was given for each correct test item on the quizzes and the exams.

     Validity.  It is rare to find a teacher-made test in the research literature that is based on persuasive evidence of content or construct validity.  For this study, complete information on the construct validity and reliability of the quizzes, the midterm, and the final exam was not available.  The content validity of the midterm exam, however, was assessed by another language expert who has been a Japanese language instructor at a university in the United States for over 20 years.  The correlation coefficients between the oral production section of the midterm and an oral interview for 2 classes were .81 and .83; sample sizes ranged from 8 to 15.  So there is a strong relationship between midterm scores and performance on an oral interview, which adds to the validity evidence.

     To decide whether classes could be combined within the comparison group and within the experimental group, t tests were conducted between each pair of the 4 classes in the comparison sections and in the experimental sections: for the comparison group, between Fall 1997 and Fall 1999, Fall 1997 and Fall 2000, Fall 1997 and Spring 2002, Fall 1999 and Fall 2000, Fall 1999 and Spring 2002, and Fall 2000 and Spring 2002, and for the experimental group, between Fall 2002 and Spring 2003, Fall 2002 and Fall 2003, Fall 2002 and Spring 2004, Spring 2003 and Fall 2003, Spring 2003 and Spring 2004, and Fall 2003 and Spring 2004.  No statistically significant differences were found in the comparisons within the comparison group or in the comparisons within the experimental group.  So the data were combined across the 4 sections for the comparison group and for the experimental group.
     The means, standard deviations, t-test results, and effect sizes (ES) for quizzes and examinations are provided for the comparison and experimental groups in Table 2.  Independent t tests were calculated to assess statistical differences between the comparison group and the experimental group for the 10 quizzes, one midterm exam, and one final exam.  At the .01 level, results showed no statistically significant differences between the comparison group and the experimental group on 9 quiz scores and the final exam scores but showed statistically significant differences on Q2 and the midterm exam.  The students who were instructed without the online supplement scored higher, on average, on Q2 and the midterm, as indicated in Table 2.
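     As a rough check on Table 2, the t statistics and effect sizes can be recomputed from the reported summary statistics alone.  The sketch below assumes a pooled-variance (Student) independent t test; this is an inference from the reported degrees of freedom (n1 + n2 - 2), not something the text states.  The function name pooled_t_and_es is hypothetical.  The effect size follows note a of Table 2: the difference between the means divided by the comparison-group standard deviation.

```python
import math


def pooled_t_and_es(n_c, m_c, sd_c, n_e, m_e, sd_e):
    """Return (t, df, es) from group summary statistics.

    t  : pooled-variance t statistic, comparison minus experimental
         (the sign convention that matches the t column of Table 2).
    es : (experimental mean - comparison mean) / comparison-group SD,
         as described in note a of Table 2.
    Assumption: equal-variance (pooled) t test, inferred from the df.
    """
    df = n_c + n_e - 2
    pooled_var = ((n_c - 1) * sd_c ** 2 + (n_e - 1) * sd_e ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_c + 1 / n_e))
    t = (m_c - m_e) / std_err
    es = (m_e - m_c) / sd_c
    return t, df, es


# Summary statistics copied from Table 2: (n, M, SD) for the comparison
# group followed by (n, M, SD) for the experimental group.
table2_rows = {
    "Q2": (37, 84.57, 8.75, 47, 78.17, 9.83),
    "MT": (39, 86.39, 6.36, 47, 80.51, 7.82),
}

for test, stats in table2_rows.items():
    t, df, es = pooled_t_and_es(*stats)
    print(f"{test}: t({df}) = {t:.2f}, ES = {es:.2f}")
```

     Under these assumptions the two rows reproduce the tabled values: t(82) = 3.11 with ES = -0.73 for Q2, and t(84) = 3.77 with ES = -0.92 for the midterm.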
     Effect sizes (practical significance, that is, how much better the experimental group is compared with the comparison group) range from -.92 to .04 (-.92 for the midterm and -.73 for Q2), with one large effect size, 2 medium effect sizes, and 6 small effect sizes (see Table 2).  The statistically significant results were associated with the two largest effect sizes.  On the other 9 quizzes and the final, the data did not reveal any difference between the comparison and the experimental groups at the .01 level.  There were, however, differences between the without-online course and the with-online course when taking into account the significant differences in dropout rates for the 2 groups.  Dropout rates were much lower in the online supplement course (31% without online vs. 12% with online).  In the comparison group, only 76 of 110 students completed the course, whereas in the experimental group 98 of 112 students completed the course.
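     The dropout percentages follow directly from the completion counts reported above.  The text does not say which test was used to call the difference significant, so the two-proportion z test in the sketch below is only one plausible choice and is shown as an assumption; the helper name two_proportion_z is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z test for the difference between two proportions.

    Uses the pooled proportion for the standard error. Returns (z, p).
    Assumption: this is an illustrative test, not necessarily the one
    used in the study.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Enrolled and completed counts as reported in the text.
enrolled_c, completed_c = 110, 76   # comparison group (without online)
enrolled_e, completed_e = 112, 98   # experimental group (with online)

dropped_c = enrolled_c - completed_c
dropped_e = enrolled_e - completed_e

rate_c = dropped_c / enrolled_c
rate_e = dropped_e / enrolled_e
z, p_value = two_proportion_z(dropped_c, enrolled_c, dropped_e, enrolled_e)

print(f"dropout without online: {rate_c:.1%}")
print(f"dropout with online:    {rate_e:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

     With the reported counts, the dropout rates come to about 30.9% and 12.5%, close to the rounded 31% and 12% given in the text, and the illustrative z test gives a p value well below .01.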

The results suggest that language courses taught with online supplementary material make little difference in student test scores.  This study also suggests that students enrolled in the with-online course tend to stay in the course.  Because the dropout rate was 19 percentage points higher in the comparison group than in the experimental group by the end of the semester, the students who remained in the comparison group may have included a larger share of high achievers, which could have raised that group's mean scores.
     Student attitudes varied toward the online supplementary materials.  Some reported that they enjoyed the course and would recommend the course to other students.   Several students in the experimental group replied on the survey that the online assignments were very interesting and brought them satisfaction.  The online exercises were often lengthy and exhausting, yet some of the students viewed the online exercises as rewarding and kept practicing them.  Some students indicated on the test sheet that they liked the online exercises a lot and spent longer on them but often did not do well on the quizzes and exams, whereas several students who indicated that they did not like the online exercises very much and did not spend much time on online tasks did well on the quizzes and exams.  In this study some of the students who are in demanding academic programs may not have had as much time as other students to engage in extra online-language activities other than the classroom lessons.  They primarily focus and concentrate on their major course work, yet they still improve their language skills, as shown in the cases of some students in this study.  Many educators assume that lowering anxiety through online study at home can be a strong incentive and lead to higher achievement scores.  This was not the case in this study.
     An online bulletin board was made available for student use, but few students used it during the semester.  The reason is probably that the class met every day, so there may have been no need for students to receive immediate feedback from the instructor through the Internet bulletin board.
     Many other qualitative variations also were observed.  The comparison-group students in Fall 1997 and Fall 1999 appeared to be alert and to learn faster than the experimental-group students in Fall 2003 and Spring 2004.  The comparison-group students in Fall 1997 showed more ability to focus on each day's lesson than the experimental-group students in Spring 2004.  Often the comparison-group students in Fall 1997 exhibited qualities such as willingness to make mistakes, willingness to guess, giving opinions freely, and not being inhibited.

Limitations and Future Research
University scheduling of classes made random selection and a large sample size during one semester difficult for this type of study.  The study ran the risk of having a number of variables affecting the student scores due to the variances in the area of student background experience.  The author was cognizant of the limitations imposed on the generalizability of significant findings achieved in this study.  Nevertheless, because statistically significant data were obtained despite the nonrandomized sample, the suggestions made in this study are worthy of use in foreign-language course design when modified to meet the needs of different instructional settings.   For future studies, researchers should conduct several external replication studies based on new, independent participants in different foreign language areas.
     The amount of "outside class study time" of the student was not controlled for.  "Motivation" of each student toward language learning was not controlled for, either.  The instructor could not control the study rate of each student, nor could he record the exact contact hours of the student with the Internet assignment.  The author relied on students' self-reports of time spent online.  Use of a set of course tools such as Blackboard® or WebCT® would allow recording of the exact time that the student was logged on to the exercises.  Whether the student actually worked on the assignments or not, however, cannot be assessed from time logged on alone.  Regarding the use of online exercises, students must be familiar with the technological tools that were used in the online exercises and recognize their potential and limitations to fully benefit from the online exercises.  Students also must know whether the computer systems they are using are adequate for online exercises.
     Extraneous variables were very difficult to control for.  The ability to control extraneous variables is essential if the results of the study are to be considered valid and generalizable.  Care was taken to minimize the number of extraneous variables contaminating the experiment, but those that remained may cast doubt on its validity.  The t test is robust with respect to the violation of the homogeneity of variance assumption (Glass, Peckham, & Sanders, 1992).  Even though this study covered a 4-year period, it did not allow use of randomly selected samples.  This research took into account differences among students: age, gender, hours they studied outside class, and units carried during the semester.  The study, however, failed to tap motivation and the different learning styles of students that are related to the use of technologies.
     Well-designed and carefully controlled experiments will have sufficient statistical power to distinguish genuine effects even with a small sample size.  Research can be done under conditions typical of actual classrooms with ordinary teachers and without access to financial resources or outside support (Hickey, Kindfield, Horwitz, & Christie, 2003).  Factors such as cognitive, affective, and psychomotor skills might affect student test scores as a result of dealing with the complexities of technology tasks, that is, the "synergistic effect" of certain technologies (National Education Association, 1999, p. 25).

The number of comparisons that showed no statistically significant differences on most quizzes and the final exam indicates that online learning may not be as prudent as many educators and administrators think it is.  These findings suggest that online learning may be nothing more than an academic exercise.  Technology can leverage faculty time, but it also can replace human contact.  Other considerations for the instructor might be the following:  How much time would it take to prepare and to maintain the online supplement material?  Would the rewards outweigh the cost in time and energy?  Other problems often lie in technological or physical structures of schools.  Unwired Internet connections to classrooms make it difficult to use technology.  Also, teachers' attitudes and fears toward technology often are key factors associated with their uses of technology (Becker, 2000).  Unless a teacher has a positive attitude toward technology, it is unlikely that he or she will use it.  In addition, the constantly changing nature of technology makes it difficult for teachers to deal with new developments.  As a result, teachers may choose not to use it in their teaching unless there is a strong need or demand and reliable support for it.  Moreover, teachers who perceive pressure from colleagues are more likely to use computers for their own purposes, and teachers who receive help from colleagues are more likely to use computers with their students (Zhao, 2003).
     Online supplements appeared to decrease the dropout rate of students significantly.  What is unclear from the results of the present study is the exact cause of the higher achievement of some students.  It could be (a) high aptitude, (b) high motivation, or (c) opportunity outside of class.  What then accounts for the higher achievement?  The following questions were raised by this study:  Do students remain engaged in the learning process regardless of the delivery technique used, that is, when they study on their own at home on the Web?  Do online students have as many opportunities to interact and produce orally in the target language as they would in the classroom situation?
     The results of this study will provide teachers with some information for implementing more efficient course designs for language students in university environments.  Although this study was conducted specifically in a Japanese language class, the information obtained here may be applied to other foreign-language classes as well.  Language-learning variables such as the intelligence, aptitude, attention, motivation, opportunity, learning style, and strategy of the students should be taken into consideration in a future study.  Replication studies will also help establish the validity of the results.  With sound research, we can assist not only our students but also ourselves as educators.

The author would like to thank Professor Patricia Busk, University of San Francisco, for her helpful comments and suggestions on this paper.  He would also like to thank several colleagues who read earlier versions of this paper and provided helpful comments and suggestions.


Aberson, C. L., Berger, D. E., Healy, M. R., Kyle, D. J., & Romero, V. L. (2000). Evaluation of an interactive tutorial for teaching the central limit theorem. Teaching of Psychology, 27, 289-291.

Al-Jarf, R. S. (2004). The effect of Web-based learning on struggling EFL college writers. Foreign Language Annals, 37, 49-57.

Becker, H. J. (2000). Findings from the teaching, learning, and computing survey: Is Larry Cuban right? Education Policy Analysis Archives, 8(51). Retrieved July 1, 2004, from http://epaa.asu.edu/epaa/v8n51/

Glass, G. V., Peckham, P. D., & Sanders, J. R. (1992). Consequences of failure to meet assumptions underlying the fixed-effects analysis of variance and covariance. Review of Educational Research, 42, 237-288.

Hickey, D. T., Kindfield, A. H., Horwitz, P., & Christie, M. T. (2003). Integrating curriculum, instruction, assessment, and evaluation in a technology-supported genetics learning environment. American Educational Research Journal, 40, 495-538.

Kulik, C. C., & Kulik, J. A. (1986). Effectiveness of computer-based education in college. AEDS Journal, 19, 81-108.

Massy, W. (1997). Life on the wired campus: How information technology will shape institutional futures. In D. G. Oblinger & S. C. Gold (Eds.), The challenge of information technology in the academy (pp. 1-42). Boston: Anker.

National Education Association. (1999). What's the difference? A review of contemporary research on the effectiveness of distance learning in higher education. Washington DC: American Federation of Teachers. (NEA 1-42).

Robinson, D. H., & Levin, J. R. (1997). Reflection on statistical and substantive significance, with a slice of replication. Educational Researcher, 26(5), 21-27.

Ryan, M., Carlton, K. H., & Ali, N. S. (1999). Evaluation of traditional classroom teaching methods versus course delivery via the World Wide Web. Journal of Nursing Education, 38, 272-277.

The Pew Internet and American Life Project. (2002, September 15). College students say the Internet helps them. Retrieved August 1, 2004, from http://www.pewinternet.org/PPP/r/50/press_release.asp

Zhao, Y. (2003). Factors affecting technology uses in schools: An ecological perspective. American Educational Research Journal, 40, 807-840.

Table 1
Number of Examinees by Gender and Language Background

                                 Comparison Group        Experimental Group
                                 ________________        __________________
Characteristic                     f          %             f          %

Gender
     Male                         23      58.97            34      72.34
     Female                       16      41.03            13      27.66
     Total                        39     100.00            47     100.00

Language Background
     English                      24      61.54            25      53.19
     Chinese                       8      20.51            13      27.66
     Vietnamese                    2       5.13             4       8.51
     Korean                        3       7.69             1       2.13
     Spanish                       1       2.56             2       4.26
     Indonesian                    1       2.56             1       2.13
     Thai                          0       0.00             1       2.13
     Total                        39      99.99            47     100.01


Table 2
Means, Standard Deviations, t-test Results, and Effect Sizes for Quiz and Examination Data Broken Down for Comparison and Experimental Groups

             Comparison Group               Experimental Group
         (Fa97, Fa99, Fa01, Sp02)        (Fa02, Sp03, Fa03, Sp04)
         ________________________        ________________________
Test       n         M       SD           n         M       SD         t         df     ES a

Q1        39     95.33     6.24          45     93.51     5.48      1.42         82    -0.29
Q2        37     84.57     8.75          47     78.17     9.83      3.11**       82    -0.73
Q3        38     73.11    15.47          47     64.40    16.49      2.49*        83    -0.56
Q4        36     82.58    13.99          47     80.09    11.50      0.89         81    -0.18
Q5        36     79.25     9.12          46     79.63     9.97     -0.18         80     0.04
Q6        38     78.50    11.59          46     78.41     7.86      0.04         82    -0.01
Q7        38     79.32    10.13          46     75.13     9.15      1.99         82    -0.41
MT        39     86.39     6.36          47     80.51     7.82      3.77**       84    -0.92
Q8        38     80.31    10.77          45     77.93    10.14      1.04         81    -0.22
Q9        37     82.24     9.52          46     79.52     7.83      1.43         81    -0.29
Q10       36     68.44    13.86          46     64.52    14.48      1.24         80    -0.28
FNL       39     81.46     7.45          47     78.77     8.29      1.57         84    -0.36

*  Statistically significant at .05 level.
**  Statistically significant at .01 level, when overall error rate is controlled.
For Q9, F test of equal variances rejected at alpha of .05.
a  Effect sizes (ES) are based upon the difference between the means, divided by the comparison-group standard deviation (SDc).
