1. Introduction
In recent years, there has been an increase in research focused on the acquisition of spelling skills by second-language (L2) learners across different writing systems. However, overall, spelling ability has not received much attention in L2 research and teaching (Cook & Bassetti, 2005). This lack of attention to spelling was highlighted by Vivian Cook in the late 1990s, and he was among the first researchers to compare the spelling abilities of L2 English users with those of native speakers (Cook, 1997). Cook also emphasised the necessity of explicitly teaching spelling in L2 instruction (Cook, 1999).
Mastering spelling in alphabetic languages like German and English involves combining phonological, morphological, orthographic and semantic knowledge (Bahr et al., 2012), a complex process for L2 learners due to interactions between their first language (L1) and L2. This complexity is amplified by the linguistic and orthographic similarities or differences between the languages involved (Chung et al., 2019; Katz & Frost, 1992; Westergaard et al., 2017). Cross-linguistic influence can impact L2 spelling ability through language-general factors, which involve shared cognitive and linguistic processes, as well as through language-specific factors that derive from the unique properties of the individual writing systems involved. From a contrastive perspective, L2 writers often show signs of carrying over features of their L1 writing system, because literacy in their L1 has already shaped their spelling knowledge. Therefore, spelling error analyses can serve as a window not only onto the separate linguistic contributions to word spelling, but also onto cross-linguistic influence. In the current study, we conducted a detailed examination of L2 English spelling errors that were further categorised based on the instructional methods used.
2. Background
2.1. Error analyses
During spelling acquisition, learners rely on different types of information, and various skills support the spelling process. To understand the spelling errors made by young English-language learners (ELLs), dual-route models provide a valuable theoretical framework. Dual-route models of spelling (e.g., Barry, 1994) suggest that two distinct processes are necessary for proficient spelling. Depending on their individual strategies, spellers may rely on sub-lexical routes based on sound-letter mappings or lexical routes (involving whole word recognition) (Niolaki et al., 2023). The sub-lexical route is engaged for spelling non-words, pseudowords and low-frequency words with regular spelling patterns, while the lexical route is utilised for both familiar regularly spelled words and irregular words. Thus, we expect beginner spellers to demonstrate greater reliance on sub-lexical rather than lexical processing as many words young ELLs initially encounter will not yet be familiar. The sub-lexical route, however, is not an efficient route and the application of phoneme-grapheme correspondences (PGC) demands time and cognitive resources (e.g., Moats, 2010). With increasing spelling training and print exposure, phonological and orthographic information is amalgamated to create permanent, word-specific memory representations (Orthographic Mapping Theory; Ehri 1995, 2005, 2014). Emergent spellers learn to bond the specific letters in written words (orthography) with their sounds (phonology) and meanings through this process. This bonding is crucial for the automatic and accurate recall of word spellings. The development of such stable representations, a process known as orthographic mapping, is facilitated by interacting with print and reflects a principle similar to the self-teaching mechanism (Shahar-Yames & Share, 2008; Share, 2004). 
This process is an integral component of learning to spell, as strong orthographic representations for words are essential to save cognitive resources. Direct access via the lexical route of spelling (i.e., processing familiar words that already have lexical representations) makes spelling more efficient (Bahr et al., 2012; Joshi et al., 2008). More advanced spellers make use of an extensive orthographic lexicon, rely less on phonics instruction and have developed knowledge of the high level of inconsistency in English orthography, which is crucial for effective word spelling (Niolaki et al., 2023).
The Triple Word Form Theory (e.g., Bahr et al., 2012; Berninger et al., 2010) provides a comprehensive framework for spelling error analysis by examining the interplay between phonology, orthography and morphology (the multifaceted dimensions of knowledge that are fundamental to the spelling process). Recent studies have focused on analysing the spelling errors of ELLs in immersive English-speaking environments in North America (e.g., Harrison, 2021; Martin et al., 2020; Zaretsky, 2020) and within non-majority English as a foreign-language (EFL) classrooms (e.g., Hevia-Tuero et al., 2023; Russak, 2022). However, to our knowledge, no error analysis study has yet been undertaken specifically with young German L1 learners learning English in a foreign-language setting, a gap that the present research aims to address.
2.2. German and English spelling
One of the key interests in exploring spelling errors is the L1 influence on the mechanisms of spelling in English as an L2. Certain language-specific (structural similarities and differences) and language-general (cognitive and linguistic) aspects may be transferred across languages (Cook, 1997, 2016; Cook & Bassetti, 2005). Some cognitive skills involved in spelling (such as phonological processing and RAN) are foundational across varied alphabetic languages, including German and English, evidencing a universal cognitive framework for literacy development (for a review of the cognitive skills involved in L1 and L2 spelling, see Mlakar, 2022). Yet, the degree to which these skills influence spelling outcomes differs significantly depending on orthographic consistency (e.g., Moll et al., 2014).
Cross-linguistic influence (both language-general and language-specific) is shaped by the linguistic proximity between languages (phonological and orthographic proximity hypothesis; e.g., Chung et al., 2019). English and German both use the Latin alphabet, though German includes additional letters such as ä, ö, ü and ß. Despite this similarity, the orthographies reveal a stark contrast in grapheme-phoneme correspondences (GPCs) consistency and syllable complexity, crucially affecting literacy acquisition processes (orthographic depth hypothesis; Katz & Frost, 1992; Schmalz et al., 2015; Seymour et al., 2003). English is characterised by a relatively opaque orthography, with a high number of irregularities in GPCs. In contrast, while German GPC rules are more regular than English and facilitate initial reading acquisition, there are some complexities in the system, particularly when mapping phonemes back to graphemes for spelling (Frith et al., 1998; Landerl, 2017). There are a number of specific phonological and orthographic differences that L1 German learners encounter when spelling in L2 English. Nevertheless, English is more opaque than German for both reading and spelling. This opacity is exemplified by the presence of certain phonemes in English that do not exist in German, such as the dental fricative sounds /θ/ and /ð/. German learners may struggle with these unfamiliar sounds, often substituting them with more familiar ones from their native phonetic inventory, like /f/ for /θ/ in the word ‘thin’ or /s/ for /ð/ in ‘with’.
2.3. Teaching interventions
For decades, there has been ongoing controversy over how to teach reading and spelling in both the L1 and L2. Especially since the 2000s, phonics has been advocated as the most effective way to teach young children to read and spell in Anglophone countries. In recent years, particularly in the United Kingdom, there have been suggestions that principles of teaching phonics could inform English as an Additional Language or EFL literacy instruction (e.g., Bauckham, 2016; Ofsted, 2021), although research evidence in this area remains limited. Findings from available studies suggest that key principles of phonics instruction, such as teaching GPCs, might be beneficial for L2 phonological decoding and/or vocabulary acquisition (August & Shanahan, 2006; Huo & Wang, 2017; Li & Woore, 2021; Murphy Odo, 2021; Woore, 2021), as learners often do not master L2 GPC in the absence of phonics instruction. However, none of these studies tested whether L2 phonics instruction is effective in increasing L2 spelling accuracy.
In the context of our study with young ELLs who have already mastered the alphabetic principle in their L1 German, we define phonics-based instruction as an approach that demonstrates the contrasts and similarities in GPCs between German and English through explicit teaching. Existing research has primarily focused on explicit and implicit spelling instruction, with explicit and systematic instruction found to enhance spelling, reading and phonological awareness skills (for a meta-review, see Graham & Santangelo, 2014). Whereas phonics instruction focuses on the explicit teaching of GPCs, whole word instruction adopts a holistic approach that teaches reading and spelling visually at the word level. In this latter approach, children learn to say a word by recognising its written form, bypassing the decoding process (e.g., Blackwell & Laman, 2013). To our knowledge, there have been no intervention studies within L2 English contexts that utilise a combined approach of phonics and sight word teaching; however, such research has been conducted in L1 Anglophone settings (e.g., Shapiro & Solity, 2016).
3. The present study
Spelling is an under-researched component of foreign-language learning, with few studies on error analysis and the influence of a learner’s L1 on L2 spelling. We still lack a clear picture of the impact of specific approaches like phonics or whole word techniques on L2 English literacy attainment in instructed settings in Germany. To address this research gap, our study examined how phonics, whole word and combined literacy instruction over two years affect young ELLs’ reading, spelling and linguistic skills. By shedding more light on the phonological and orthographic knowledge beginner learners employ during the spelling process, we can intervene with tailored instruction. Our study aims to address the following research questions:
1a. How do spelling performance, reading skills and the prevalence of phonological versus orthographic error types differ among young German ELLs after two years of receiving phonics, whole word or combined literacy instruction?
Hypothesis:
We hypothesise that learners in the phonics instruction group will demonstrate greater accuracy in L2 real word and pseudoword spelling, attributed to the focused training in applying PGC (Murphy Odo, 2021). While reading comprehension skills are not expected to show significant differences among the instructional groups due to potential limitations in oral language proficiency (e.g., August & Shanahan, 2006), superior word decoding skills (both in rate and accuracy) are anticipated for the phonics group. In terms of error types, we posit that the phonics group will commit fewer phonological errors, as the instruction method emphasises sound-letter mappings.
1b. How do L2 English grammar and vocabulary knowledge develop among learners in each of the three instructional groups?
Hypothesis:
Although research is not extensive, evidence is emerging that phonics instruction in L2 may aid ELLs in expanding their vocabulary (e.g., Huo & Wang, 2017; Li & Woore, 2021; Woore, 2021). In line with Ehri’s Orthographic Mapping Theory (1995, 2005, 2014), we suggest that systematic training in PGC assists learners in the phonics group to develop better lexical representations. However, we do not expect significant group differences regarding the development of L2 grammatical knowledge.
2. To what extent do young German ELLs rely on their knowledge of phonology and/or orthography when spelling in English? Is this affected by instructional programmes (phonics, whole word, combined)?
Hypothesis:
We hypothesise, following the Triple Word Form Theory (e.g., Bahr et al., 2012; Berninger et al., 2010), that young German ELLs’ L2 English spelling errors will occur in both phonological and orthographic areas. Early spelling acquisition will likely be driven by sub-lexical/phonological knowledge across all instructional groups, with a greater dependence on sound-letter relationships. The phonics instruction group is expected to have fewer phonological errors due to enhanced phonological encoding.
3. Do phonemic and orthographic differences in L1 German affect L2 English spelling accuracy? Do accuracy levels differ between groups after having received a) phonics, b) whole word or c) combined instruction for two years?
Hypothesis:
We posit that the distinct phonemic and orthographic characteristics of L1 German could lead to specific spelling errors in L2 English, stemming from interference when dealing with non-corresponding English sounds like /æ/, /θ/, /ð/ and /ɒ/. L1 German learners may substitute these unfamiliar sounds with more familiar ones from their native phonological repertoire, such as <s>, <f> or <d> for the dental fricatives /θ/ (‘thin’) and /ð/ (‘this’). Additionally, we suggest that targeted phonics instruction can help address these difficulties by improving learners’ ability to recognise and spell these novel phonemes and their corresponding orthographic patterns.
3.1. Method
3.1.1. Participants, teachers and learning context
A total of 75 ELLs from two public primary schools in Lower Saxony, Germany, took part in this longitudinal, quasi-experimental study. In Germany, children typically begin their formal education with primary school, which starts at the age of six. The participants were divided into three classes, each assigned randomly to one of the three literacy intervention groups: phonics, whole word and combined. They were followed from grade 3 to grade 4. Before the study began, all the children had learned fundamental literacy skills in German.
In the phonics group, there were 26 children (13 females, 13 males) with an average age at the end of the intervention (T2) of 10.5 years (SD = 0.76), ranging from 9.72 to 11.76 years. The whole word group consisted of 12 girls and 11 boys with a mean age of 10.57 years (SD = 0.73), spanning from 9.94 to 11.99 years. The combined group included 26 participants with an average age at T2 of 10.72 years (SD = 0.60), with ages ranging from 9.98 to 11.84 years. ELLs were predominantly native German speakers (21 in phonics, 19 in whole word and 21 in combined). Children within these groups also spoke languages including Arabic, Bulgarian, Kurdish, Polish, Russian, Hungarian and Turkish as their L1, with German as their L2. Questionnaire data indicated that all non-native German-speaking participants were either born in Germany or had relocated before the start of formal education, allowing for early familiarisation with the Latin alphabet. Additionally, those children with non-Latin script backgrounds reported they were orally fluent but not literate in their first languages. Among the participant groups, the phonics group contained a child with dyslexia, the whole word group had two children with distinct learning disabilities (general learning difficulties and dyscalculia) and the combined group had a child with social-emotional challenges. Our study encompasses a wide range of learners, thereby reflecting classroom diversity in Germany, which aligns with the country’s inclusive educational policies. In addition, we used parental educational level and occupation to measure family socioeconomic status (SES). Table 1 summarises participants’ demographic information.
Table 1
Participant characteristics.
DETAILS | PHONICS | WHOLE WORD | COMBINED |
---|---|---|---|
n | 26 | 23 | 26 |
Primary School | 1 | 1 | 2 |
Teacher | A | A | B |
Sex | 13f, 13m | 12f, 11m | 12f, 14m |
Mean Age (SD) Time 2 | 10.5 years (.76) | 10.57 years (.73) | 10.72 years (.60) |
L1 other than German | 5 | 4 | 5 |
Languages (L1 other than German) | 3 Arabic, 1 Bulgarian, 1 Kurdish | 2 Polish, 1 Bulgarian, 1 Russian | 1 Arabic, 1 Bulgarian, 1 Hungarian, 1 Polish, 1 Turkish |
Special educational needs | 1 | 2 | 1 |
Low SES | 4 | 3 | 3 |
Medium SES | 15 | 14 | 16 |
High SES | 7 | 6 | 7 |
The EFL instruction for the three classes was delivered by two experienced teachers, both holding university degrees in teaching English as a foreign language and each with substantial experience in the field (35 and 26 years of teaching experience, respectively). In our study, we established the comparability of both the instructional conditions and English language exposure across the three experimental groups. Each group received EFL instruction twice a week, with each lesson lasting 45 minutes. Over the course of the study, the total instruction time amounted to 76 weeks for the group receiving phonics-based instruction, 77 weeks for the group focused on the whole word approach and 75 weeks for the combined method group. These differences were minimal and did not affect the overall comparability of the ‘dosage’ of English language input and literacy practice received by students in each group. All groups were provided with the same core instructional materials, which were adapted so that each set of materials supported the distinct teaching method of the respective intervention group. Parent questionnaires revealed that exposure to the English language had been very limited for all learners before entering grade 3. Contact time with English was mostly established through L2 teacher input, auditory learning materials, EFL textbooks and other written learning materials such as children’s books and worksheets.
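The size of these dosage differences can be illustrated with a rough calculation. The following Python sketch converts the reported weeks into approximate contact hours. It assumes, which the study does not state, that every counted week contained both scheduled 45-minute lessons, so the figures should be read as upper bounds rather than recorded contact time.

```python
# Rough upper-bound estimate of EFL contact time per group.
# Assumption (not from the study): every counted week included
# both scheduled 45-minute lessons.
LESSONS_PER_WEEK = 2
MINUTES_PER_LESSON = 45

weeks_of_instruction = {"phonics": 76, "whole word": 77, "combined": 75}

contact_hours = {
    group: weeks * LESSONS_PER_WEEK * MINUTES_PER_LESSON / 60
    for group, weeks in weeks_of_instruction.items()
}

# Largest gap between any two groups, in hours and relative to the smallest dose.
max_gap_hours = max(contact_hours.values()) - min(contact_hours.values())
max_gap_percent = 100 * max_gap_hours / min(contact_hours.values())

for group, hours in contact_hours.items():
    print(f"{group}: {hours:.1f} h")
print(f"largest gap: {max_gap_hours:.1f} h ({max_gap_percent:.1f}%)")
```

Under this assumption, the groups received roughly 114.0, 115.5 and 112.5 hours, respectively, a maximum gap of 3.0 hours (about 2.7% of the smallest dose).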
Intervention logs provided a means to track implementation fidelity. Fidelity was additionally checked using the Teacher Input Observation Scheme (Kersten et al., 2018) for 10 videotaped lesson recordings in each group to compare the number of cognitively stimulating activities, verbal input, non-verbal input and the support of learners’ output across groups. Three student assistants, who had undergone extensive training on the strategies, curricula and observational protocols of the interventions, conducted fidelity checks. These assistants were not directly involved in the implementation of the intervention. Their examination focused on assessing adherence to the intervention, frequency and duration of exposure, quality of delivery, participant responsiveness and programme differentiation.
3.1.2. Literacy intervention programmes (phonics, whole word and combined)
The literacy component of the intervention programmes focused on either phonics, whole word or combined instruction. We included the whole word approach in this study as 74% of German primary teachers reported using this method in the primary EFL classroom (Treutlein et al., 2013) in keeping with market-leading coursebooks.
Literacy programme A (phonics instruction) focused on the explicit teaching of the systematic relationships between letters and sounds in English as an alphabetic writing system. The key concepts were increasing children’s knowledge of English GPC/PGC and pointing to the differences between the two orthographies (German and English). This should have helped the children to learn to spell familiar and unfamiliar words, without them encoding the whole written form using L1 spelling strategies or attempting to retrieve the word form from memory. Regular high-frequency words were fully sounded out phonetically. For exception words, often referred to as “tricky words”, the children learned to identify and decode/encode the parts of the word that followed phonetic rules and to sound out/spell as much of the word as they could. In the phonics programme, PGC that are the same in English and German, such as /t/, /k/ and /m/, were not explicitly taught. Special attention was given to PGC that differ between the two orthographies, for example /j/ as in ‘jump’, ‘jam’ and ‘enjoy’; and the /oo/ sound in ‘food’, ‘room’ and ‘too’.
Literacy programme B (whole word instruction) targeted the lexical reading and spelling route through sight word training (i.e., recognising whole words from orthographic memory). This instruction focused on holistic word recognition, in which lexical items were learned, stored and retrieved by memorising their shape. In this teaching method, often also called sight word or ‘look and say’ instruction, individual words were not broken down into phonemes or other sub-lexical parts but were used as whole units. The focus was on the word as the minimum unit of meaning. In the classroom, words were shown to students in various demonstrations and forms provided by the two teachers (e.g., in different sentences and pictures). Learners built a large sight word vocabulary, labelled things in the classroom, used picture dictionaries, sorted whole sentences to retell stories, used visualisation techniques and played memory games to boost their visual memory. Traditional drill and practice was used as a flashcard technique.
Programme C (combined instruction) differed from programme A (phonics) in two distinct areas: the quantity of letter-sound mappings and the methodology employed for instructing high-frequency words. In programme C, children were taught to use two strategies when reading and spelling. For reading, they could either recognise the word by sight or decode it phonetically. For spelling, children were taught to either recall the spelling of high-frequency words from memory or to apply phonetic segmentation and recombine the sounds to spell the word accurately. This combined method contrasts with programme A, where children were exclusively taught phonic decoding and encoding. Literacy programme C targeted teaching a combination of phonics skills together with whole word training. Children were provided with phonics instruction that emphasised the most common pronunciation of each grapheme. Subsequently, they were introduced to a selected list of 50 sight words, which comprised both regular words (e.g., ‘had’, ‘up’ or ‘went’) and exception words (e.g., ‘have’, ‘does’ or ‘one’). The aim of the instruction was to help children recognise and retrieve each of these words as whole entities, rather than through the decoding of individual components. In contrast, programme A employed an alternative approach. In this case, participants were given explicit instructions to fully sound out regular words, systematically implementing phonics principles. When confronted with exception words, the methodology involved recognising phonically decodable segments within the words and decoding as much of the words as the rules permitted.
3.1.3. Data collection and procedure
Data collection protocols for our study were designed to incorporate literacy, cognitive and linguistic measures. Cognitive measures were administered as pre-test control variables at the beginning of grade 3, while linguistic variables were assessed both at the start and then again at the end of grade 4 to evaluate progress following the interventions. Testing was conducted in different settings to accommodate the nature of each task. Participants were tested individually in a quiet room for tasks measuring L2 oral reading rate, L2 oral reading accuracy, phonological short-term memory, working memory and phonological awareness. Group assessments for L2 reading comprehension, L2 real word and pseudoword spelling, L2 grammar/vocabulary and non-verbal intelligence were carried out in the classroom setting. For all measures, instructions were consistently given in German to ensure comprehension across the participant pool. To maintain the standardisation of data-collection processes, the order of test administration was identical for all participants. Prior to conducting the tests, written parental consent was obtained in accordance with the research ethics guidelines set by the school authority. The data-collection team comprised researchers and graduate students who had received thorough formal training in the administration and scoring of the specific measures used. Oral reading rate and accuracy were evaluated by two English native speaker raters, yielding a Cohen’s Kappa of 0.83.
3.1.4. Assessment measures
To examine the lexical and sub-lexical spelling abilities of the participants, we administered both a real-word and a pseudoword spelling task. The real-word spelling test aimed to evaluate the children’s ability to retrieve word-specific spellings from memory, a skill which relies on the lexical route often used for familiar and irregularly spelled words. In contrast, the pseudoword spelling test was designed to assess participants’ proficiency in applying PGC rules in the absence of established lexical representations, a critical aspect of encoding unfamiliar words. By employing pseudowords that require spellers to rely on phonological processing, we aimed to shed light on how the sub-lexical route supports the development of orthographic skills when encountering novel orthographic forms.
In the L2 real-word spelling task, we measured German L1 speakers’ ability to handle both familiar (non-novel) and unfamiliar (novel) phonological and orthographic properties of English. The test items were divided into four categories. The first category targeted the spelling of eight words containing four phonemes that are novel for German L1 speakers (/æ/, /θ/, /ð/ and /ɒ/). The second category focused on evaluating the spelling of four words with orthographic features that are unfamiliar in German orthography, such as the silent <e> (e.g., ‘like’) and /ʃ/ spelled <sh>. The third category covered seven high-frequency words, including words with regular (e.g., ‘three’) and irregular PGC (e.g., ‘does’). The fourth category consisted of four words with two non-novel phonemes, /e/ and initial /h/. Participants were presented with 23 British English words, each articulated three times, twice in isolation and once within a sentence. Learners then transcribed these words onto paper.
For the L2 pseudoword spelling task, we used the same words from categories I, II and IV of the real word spelling tasks to create pseudowords (e.g., ‘shof’ for ‘shop’ or ‘thip’ for ‘this’). We carefully restricted the number of orthographically plausible responses to ensure validity of the findings. Participants listened to each of the 16 pseudowords, pronounced twice in British English, then wrote down what they heard.
Turning to reading measures, L2 reading comprehension was assessed using two different tests. Test A (Halatschev et al., 2018) is a standardised test specifically designed for 4th-grade ELLs in Germany. This test involved silently reading a passage and answering ten multiple-choice questions. Test B, based on Otto, the Little Spider (van Genechten, 2015), consisted of 19 questions of mixed format, including multiple-choice, true/false, picture sequencing and sentence completion. Both assessments were administered in a paper-and-pencil format.
L2 oral reading rate and accuracy were assessed by having participants read aloud from the children’s book Pirate Pat (Mackinnon et al., 2010). The text was presented in plain format, devoid of any visual cues from the original publication, to ensure that the assessment focused solely on reading ability. Each participant read the text individually, and the performance was video-recorded for later analysis. The read-aloud test had no time constraints, allowing participants to read at their natural pace.
There were four cognitive measures pertaining to working memory, phonological short-term memory, non-verbal intelligence and phonological awareness. Children’s working memory was evaluated with the Wechsler Intelligence Scale for Children–4th Edition’s Digit Span Backwards (DSB) and Letter-Number Sequencing (LNS) subtests, with DSB testing reversed digit recall and LNS testing letter and number arrangement and recall.
For phonological short-term memory, the measurement of verbal immediate memory was conducted using the Digit Span Forward (DSF) sub-test from the Wechsler Intelligence Scale for Children–4th Edition (Petermann & Petermann, 2011), where learners repeated a series of numbers in the same order as they were presented on a sound recording.
Participants’ non-verbal reasoning abilities were evaluated by using a sub-test from the German school readiness test (Esser et al., 2008), in which incomplete matrices had to be completed by selecting the correct element from a range of five to eight alternatives.
Finally, the Test of Basic Competencies for Reading and Spelling Skills (Stock et al., 2017) was employed to evaluate learners’ phonological awareness at a phoneme level. Three sub-tests were chosen: non-word segmentation, vowel substitution and vowel length.
We also employed two L2 linguistic measures. Participants’ English receptive grammar was assessed through the Early Language and Intercultural Acquisition Studies Grammar Test II (Kersten et al., 2012), a comprehensive evaluation of 12 grammatical phenomena, such as possessive pronouns, negation or passive. A recent study by Koch et al. (2021) confirmed the test’s validity using a Rasch model and demonstrated a strong model fit in a bootstrap goodness-of-fit test. Second, L2 English receptive vocabulary was assessed through the norm-referenced British Picture Vocabulary Scale III (Dunn et al., 2009). The test was administered in a group setting using a recorded format. Participants each had their own paper-based test booklet, which presented four images per test item across sets A to J. Children were instructed to listen to each word played from the recording and then to mark the corresponding image in their booklet. In a study with 388 primary school children (Ponto, 2024), the internal consistency of the dataset for the British Picture Vocabulary Scale III was found to be high (α = 0.94).
3.1.5. Spelling error analysis
Spelling errors were categorised using a modified version of the Phonological, Orthographic and Morphological Assessment of Spelling (POMAS) framework (Bahr et al., 2012), which draws on the Triple Word Form Theory as a linguistically-informed, qualitative scoring system (e.g., Berninger et al., 2010). The POMAS coding scheme was adapted to focus exclusively on phonological and orthographic errors. Morphological errors were not included in the analysis as the words in the study sample did not involve the process of morphological word formation. As the learners in our study were emergent L2 spellers, we adapted the POMAS spelling error analysis to code the errors into clear categories (phonological or orthographic) depending on whether ELLs violated primarily phonological rules (affecting the phonological plausibility of a word) or orthographic rules (producing phonologically plausible spellings that violate orthographic conventions). Within our sample, phonological errors were defined as those influencing the phonetic representation of words. Examples included substitutions of similar-sounding phonemes, such as ‘wis’ for ‘with’ (<s> for /ð/), ‘fin’ for ‘thin’ (<f> for /θ/), ‘sauf’ for ‘south’ (<f> for /θ/), ‘swie’ for ‘three’ (<s> and <w> for /θ/ and /r/), or ‘dis’ for ‘this’ (<d> for /ð/). Orthographic errors in our sample typically involved the incorrect application of spelling patterns to represent sounds. For instance, learners might spell ‘does’ as ‘das’, ‘shop’ as ‘schop’, ‘want’ as ‘onet’, ‘had’ as ‘hat’ or ‘around’ as ‘eround’. These mistakes indicate that the children may not have fully acquired the complex spelling patterns of English or understood when to apply them. Next, we classified the errors in the sample on a more fine-grained level, according to specific linguistic features derived from general British English.
Specifically, we were looking for interference from L1 German for novel phonological (/æ/, /θ/, /ð/, /ɒ/) and orthographic (silent <e> and /ʃ/ spelled as <sh>) patterns. These errors are representative of language-specific differences between German and English. The sample was scored by one of the authors and rescored by two trained research group students. The inter-rater reliability, as analysed by Cohen’s Kappa coefficient, was 0.86 (almost perfect agreement).
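As a minimal illustration of the agreement statistic named above, the sketch below computes Cohen's Kappa for two coders who assign each misspelling to a phonological (P) or orthographic (O) category. The function name and the ten example codes are hypothetical and are not taken from the study data; how the three raters were combined into a single coefficient is not stated here, so the sketch covers the two-rater case only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: summed products of the raters' marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for ten misspellings (P = phonological, O = orthographic).
first_coder = ["P", "P", "O", "P", "O", "P", "P", "O", "P", "P"]
second_coder = ["P", "P", "O", "P", "P", "P", "P", "O", "P", "P"]
kappa = cohens_kappa(first_coder, second_coder)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because one category (phonological errors) is far more frequent than the other.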
3.2. Results
To ensure that the participating groups were cognitively comparable at the start of grade 3 (henceforth, T1), we conducted one-way ANOVAs for phonological awareness, short-term memory, working memory and non-verbal reasoning. The analysis revealed no significant differences between the groups, indicating baseline comparability. Detailed results can be found in Table 2. In our study, we considered p-values less than .05 to indicate statistical significance.
Table 2
Descriptive statistics for all three groups (phonics v whole word v combined) and results for the ANOVAs.
VARIABLES | PHONICS M | PHONICS SD | WHOLE WORD M | WHOLE WORD SD | COMBINED M | COMBINED SD | F(2, 72) | p-VALUE |
---|---|---|---|---|---|---|---|---|
T1 PA | 21.35 | 4.16 | 19.13 | 3.55 | 20.08 | 3.64 | 2.10 | .129 |
T1 PSTM | 6.54 | 1.21 | 7.04 | 1.22 | 6.65 | 1.50 | 0.97 | .385 |
T1 WM | 6.19 | 1.41 | 6.22 | 1.28 | 5.96 | 1.00 | 0.33 | .722 |
T1 NVI | 27.31 | 5.47 | 26.48 | 4.10 | 28.04 | 3.30 | 0.77 | .467 |
Note. We report raw scores for all measures. T1 PA = T1 phonological awareness, T1 PSTM = T1 phonological short-term memory, T1 WM = T1 working memory, T1 NVI = T1 non-verbal intelligence.
3.2.1. Group differences
This study aimed to compare L2 spelling, reading and linguistic abilities in young L1 German ELLs following two years of phonics, whole word or combined instruction. We assessed differences in these abilities at T2 (end of grade 4) using a univariate analysis of variance. The results, including the total number of phonological and orthographic errors, are presented in Table 3. The instructional method was treated as a between-subjects factor. At T2, significant differences were observed in L2 reading rate (p = .048) and phonological errors in spelling (p = .043) across the groups. No other significant differences were noted. These findings indicate that combined instruction seemed to enhance children’s L2 reading rate and that children in the phonics group committed the fewest phonological errors in spelling.
Table 3
Descriptive statistics for all three groups (phonics v whole word v combined) and results for the ANOVA.
VARIABLES | PHONICS M (n = 26) | PHONICS SD | WHOLE WORD M (n = 23) | WHOLE WORD SD | COMBINED M (n = 26) | COMBINED SD | F(2, 72) | p-VALUE |
---|---|---|---|---|---|---|---|---|
L2 RW Spelling | 11.85 | 4.99 | 9.09 | 3.00 | 10.85 | 4.51 | 2.56 | .085 |
L2 PW Spelling | 5.08 | 1.94 | 4.83 | 1.59 | 4.19 | 2.15 | 1.46 | .239 |
L2 Orth. Errors | 2.69 | 1.41 | 2.83 | 1.19 | 2.77 | 1.86 | 0.05 | .953 |
L2 Phon. Errors | 7.88 | 4.57 | 11.00 | 3.07 | 9.35 | 4.75 | 3.29 | .043* |
L2 Reading C. | 19.19 | 5.55 | 17.78 | 4.47 | 17.54 | 5.52 | 0.75 | .477 |
L2 Reading R. | 71.31 | 18.73 | 68.09 | 10.74 | 79.77 | 19.3 | 3.17 | .048* |
L2 Reading Acc. | 59.69 | 22.69 | 56.22 | 12.57 | 58.00 | 20.38 | 0.20 | .82 |
Note. We report raw scores for all measures. L2 RW Spelling = L2 Real Word Spelling, L2 PW Spelling = L2 Pseudoword Spelling, L2 Orth. Errors = L2 Orthographic Errors, L2 Phon. Errors = L2 Phonological Errors, L2 Reading C. = L2 Reading Comprehension, L2 Reading R. = L2 Reading Rate, L2 Reading Acc. = L2 Reading Accuracy. * p < .05.
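For readers who wish to check a row of Table 3, a one-way ANOVA F statistic can be recomputed from group sizes, means and standard deviations alone. The sketch below does this for L2 phonological errors; the function name is ours, and the inputs are the rounded values printed in the table, so the result (about 3.3) can differ slightly from the reported F(2, 72) = 3.29.

```python
def one_way_anova_from_summary(groups):
    """Return (F, df_between, df_within) from (n, mean, sd) triples."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (n, M, SD) for L2 phonological errors: phonics, whole word, combined.
phon_errors = [(26, 7.88, 4.57), (23, 11.00, 3.07), (26, 9.35, 4.75)]
f_stat, df_b, df_w = one_way_anova_from_summary(phon_errors)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

This kind of recomputation is only a plausibility check on the summary statistics; it does not replace the analysis of the raw scores.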
To determine which groups differed significantly for the variables with significant overall tests (i.e., L2 reading rate and L2 phonological errors), we conducted Bonferroni-corrected post-hoc tests. The p-values reported for these post-hoc tests were adjusted in accordance with the Bonferroni correction method as implemented in SPSS, where the observed empirical p-value is multiplied by the number of tests conducted. For L2 reading rate, the combined group (M = 79.77 words per minute) showed a higher mean reading rate compared to the whole word group (M = 68.09 words per minute), with this difference approaching significance (p = .055). Regarding L2 phonological errors, the whole word group (M = 11.00, SD = 3.08) had a significantly higher mean number of errors than the phonics group (M = 7.88, SD = 4.57), with a p-value of .037.
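The adjustment described above, in which each observed p-value is multiplied by the number of tests, can be sketched in a few lines. The raw p-values below are hypothetical and serve only to show the arithmetic; capping the adjusted value at 1 is an assumption of this sketch, made because a probability cannot exceed 1.

```python
def bonferroni_adjust(raw_p_values):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    n_tests = len(raw_p_values)
    return [min(1.0, p * n_tests) for p in raw_p_values]


# Hypothetical raw p-values for the three pairwise group comparisons.
raw = {
    "phonics vs whole word": 0.012,
    "phonics vs combined": 0.200,
    "whole word vs combined": 0.450,
}
adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))
for comparison, p in adjusted.items():
    print(f"{comparison}: adjusted p = {p:.3f}")
```

Because the adjusted values are already scaled, they are compared directly against the conventional .05 threshold rather than against .05 divided by the number of tests.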
We further examined the development of children’s receptive L2 grammar and L2 vocabulary knowledge across the three instructional groups. For both L2 grammar and L2 vocabulary measures, which were available at the beginning of grade 3 (pre-intervention) and at the end of grade 4 (post-intervention), we conducted a two-factorial ANOVA with measurement time as a within-subject factor and instructional method (phonics, whole word and combined) as a between-subject factor. This analysis allowed us to assess the interaction effects between time and instructional method on the learners’ language development. The results are presented in Table 4.
Table 4
Group means and standard deviations for L2 grammar and vocabulary for both measurement points.
GROUPS | L2 GRAMMAR T1 M | L2 GRAMMAR T1 SD | L2 GRAMMAR T2 M | L2 GRAMMAR T2 SD | L2 VOCABULARY T1 M | L2 VOCABULARY T1 SD | L2 VOCABULARY T2 M | L2 VOCABULARY T2 SD |
---|---|---|---|---|---|---|---|---|
Phonics | 17.62 | 2.30 | 19.85 | 2.71 | 50.85 | 7.71 | 74.73 | 12.91 |
Whole word | 16.13 | 2.62 | 20.43 | 2.98 | 53.09 | 6.44 | 63.48 | 8.55 |
Combined | 18.23 | 4.53 | 21.15 | 4.49 | 55.46 | 7.09 | 63.04 | 8.22 |
For L2 grammar, significant improvement over time was observed (F(1, 72) = 67.95, p < .001), with no significant differences between the instructional conditions (F(2, 72) = 1.45, p = .242) or their interaction with time (F(2, 72) = 2.46, p = .093). These findings suggest that while all groups improved from the beginning to the end of the study, the rate of improvement was consistent across different instructional methods.
For L2 vocabulary, we found a significant main effect of time (F(1, 72) = 190.78, p < .001) and a significant interaction between time and condition (F(2, 72) = 25.65, p < .001), but no main effect of condition (F(2, 72) = 2.43, p = .096). To interpret the significant interaction, we calculated conditional main effects. Comparing the average values between T1 and T2 within each group, we found significant conditional main effects of time for all three groups (all p < .001). At T1, we found no significant group differences in L2 vocabulary (F(2, 72) = 2.73, p = .072). At T2, however, we found a significant conditional main effect of condition (F(2, 72) = 10.82, p < .001). Bonferroni-corrected post hoc tests indicated that the phonics group showed significantly higher values than both other groups (both p < .001), while the whole word and combined groups did not differ significantly from each other (p > .05). In sum, the three groups did not differ in L2 vocabulary knowledge at the onset of the study, but their receptive lexicons grew at different rates, resulting in a significantly higher average value in the phonics group than in both other groups at T2.
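One way to read the time-by-condition interaction for L2 vocabulary is as a difference in mean gains between T1 and T2. The short sketch below derives those gains from the group means in Table 4; it is a descriptive illustration only and does not reproduce the mixed ANOVA, which requires the raw scores.

```python
# Group means for L2 receptive vocabulary at T1 and T2, copied from Table 4.
vocab_means = {
    "phonics": (50.85, 74.73),
    "whole word": (53.09, 63.48),
    "combined": (55.46, 63.04),
}

# Mean gain per group: T2 mean minus T1 mean, rounded to two decimals.
gains = {group: round(t2 - t1, 2) for group, (t1, t2) in vocab_means.items()}

for group, gain in gains.items():
    print(f"{group}: mean gain = {gain:.2f}")
```

The phonics group shows the largest mean gain although it started with the lowest T1 mean, which is the pattern the significant interaction reflects.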
3.2.2. Error analysis: Phonological and orthographic errors
Further, we investigated the extent to which young German ELLs used phonological and orthographic knowledge when spelling in English and the impact of a two-year instructional programme on this process.
The 75 children spelled a total of 1725 real words, of which 911 (52.81%) contained an error. Errors were found in both POMAS categories we used for error analysis, phonological and orthographic. Specifically, 701 errors were phonological in nature, representing 76.95% of all errors, while 210 errors were due to orthographic misspellings, making up 23.05% of the total. The prevalence of error types varied across groups (see Figure 1). For phonological errors, the overall group effect was significant, F(2, 72) = 3.29, p = .043, and the Bonferroni-corrected post-hoc tests located the difference between the phonics and whole word groups. The fewest phonological errors occurred in the phonics group (M = 7.88, SD = 4.57), followed by the combined (M = 9.35, SD = 4.75) and whole word group (M = 11.00, SD = 3.08). With regard to the distribution of orthographic errors, there were no significant differences between groups. Again, the fewest errors were made in the phonics group (M = 2.69, SD = 1.41), followed by the combined (M = 2.77, SD = 1.86) and whole word group (M = 2.83, SD = 1.19).
Figure 1
Number of phonological and orthographic errors across groups.
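The error proportions reported in this section follow directly from the raw counts, as the short check below shows. All counts are taken from the text; the number of words per child is derived from them rather than stated.

```python
# Counts reported in Section 3.2.2.
children = 75
words_total = 1725
errors_total = 911
phonological = 701
orthographic = 210

# Derived quantities.
words_per_child = words_total / children
error_rate = 100 * errors_total / words_total
phonological_share = 100 * phonological / errors_total
orthographic_share = 100 * orthographic / errors_total

print(f"words per child: {words_per_child:.0f}")
print(f"misspelled words: {error_rate:.2f}%")
print(f"phonological share of errors: {phonological_share:.2f}%")
print(f"orthographic share of errors: {orthographic_share:.2f}%")
```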
3.2.3. L1 Language-specific influence on L2 English
Our last research question aimed to investigate whether phonemic and orthographic differences between L1 German and L2 English affect L2 English spelling accuracy and whether accuracy levels differ between groups after two years of phonics, whole word or combined instruction. For each target phoneme or orthographic pattern, we calculated means and standard deviations and ran a one-way ANOVA across the three groups; the targets fall into two error categories (Table 5). Error Category I consisted of four novel phonemes, /æ/, /θ/, /ð/ and /ɒ/, which do not exist in L1 German. Error Category II comprised words that are orthographically novel for L1 German speakers, containing the silent <e> and /ʃ/ spelled as <sh>.
Table 5
Categorisation of error types, error rate, means, standard deviations and one-way ANOVA results across groups.
FOCUS | REAL WORDS | ERROR RATE % (N = 75) | PHONICS M (n = 26) | PHONICS SD | WHOLE WORD M (n = 23) | WHOLE WORD SD | COMBINED M (n = 26) | COMBINED SD | F(2, 72) | p-VALUE |
---|---|---|---|---|---|---|---|---|---|---|
I.1 /æ/ | hand had | 42.00% | 0.88 | 0.59 | 0.83 | 0.65 | 0.81 | 0.56 | 0.12 | .891 |
I.2 /θ/ | thin south | 92.67% | 1.77 | 0.43 | 1.96 | 0.21 | 1.85 | 0.37 | 1.73 | .185 |
I.3 /ð/ | this with | 67.33% | 1.00 | 0.85 | 1.70 | 0.56 | 1.38 | 0.8 | 5.23 | .008 |
I.4 /ɒ/ | not frog | 18.00% | 0.35 | 0.63 | 0.35 | 0.57 | 0.38 | 0.50 | 0.04 | .963 |
II.1 silent <e> | time like | 27.33% | 0.35 | 0.63 | 0.78 | 0.74 | 0.54 | 0.86 | 2.08 | .132 |
II.2 /ʃ/ spelled <sh> | shop shirt | 46.67% | 0.73 | 0.87 | 1.09 | 0.67 | 1.00 | 0.75 | 1.45 | .242 |
Errors were found in both categories stemming from phonological and orthographic differences between L1 German and L2 English, although the percentage of errors in each category differed. Analysing errors in Category I, learners had an average error percentage of 42% (M = 0.84, SD = 0.594) in sub-category I.1 (/æ/). Sub-category I.2 (/θ/) was where most errors were found across groups, with 92.67% of words spelled inaccurately (M = 1.85, SD = 0.356). Learners had an average error percentage of 67.33% (M = 1.35, SD = 0.797) in sub-category I.3 (/ð/), and with 18.00% (M = 0.36, SD = 0.561), the fewest errors were made in sub-category I.4 (/ɒ/). Target words in Category II contained orthographically novel patterns, with sub-category II.2 (/ʃ/, 46.67% misspellings, M = 0.93, SD = 0.777) being more difficult to acquire for ELLs than sub-category II.1 (silent <e>, 27.33% misspellings, M = 0.55, SD = 0.759). Learners in the phonics group made the fewest errors on four of the six measures (/θ/, /ð/, silent <e> and <sh>), while group means for /æ/ and /ɒ/ were nearly identical (see Table 5). Only the group differences for Category I.3 (/ð/) were statistically significant, F(2, 72) = 5.23, p = .008. Learners in the phonics group spelled the words ‘this’ and ‘with’ more accurately than children in the two other groups, indicating less influence from L1 German phonology. This suggests better mastery of L2 PGC.
4. Discussion
4.1. Instructional effects on literacy and linguistic attainment
Contrary to our hypothesis, no significant differences were found in learners’ spelling accuracy for real words and pseudowords; however, combined instruction led to an improved L2 reading rate. By integrating phonics with whole word training, ELLs may have developed two complementary approaches to word recognition (Dual Route Cascaded Model; Coltheart et al., 2001): the lexical route for direct word recognition and the sub-lexical route for phonological decoding. This finding could suggest that recognition of words by sight (whole word) is facilitated by knowing the details of PGC and letter patterns (phonics). This dual approach may lead to more efficient word recognition, allowing learners to flexibly choose the most effective strategy depending on the word context (Joshi et al., 2008). Furthermore, the combined instruction method could foster a more robust mental lexicon, enabling quicker retrieval of word meanings and pronunciations, and thereby accelerating reading rates.
Another noteworthy finding is that L2 phonics instruction may improve L2 receptive vocabulary acquisition, a tentative conclusion supported by a growing number of recent studies (e.g., Huo & Wang, 2017; Li & Woore, 2021). Phonics instruction emphasises mastering PGC, which appears to aid learners in developing robust mental word representations that tightly integrate written and spoken forms. Our findings resonate with Ehri’s Orthographic Mapping Theory (Ehri 1995, 2005, 2014), which provides a theoretical framework that explains how these integrations contribute to reading proficiency, word learning and spelling accuracy. According to Ehri’s theory, learners bond spellings to pronunciation and meaning through grapheme-phoneme correspondences (GPC), which helps to secure new vocabulary in memory. These stabilised connections (or “sight word” representations) enable fluent, automatic word recognition, allowing learners to channel cognitive resources into acquiring new vocabulary. Similarly, our findings parallel the tenets of Share’s Self-Teaching Hypothesis, in which phonological decoding acts as a self-teaching mechanism that enables the acquisition of new orthographic knowledge through repeated exposure to printed words (Share, 2004). The phonics group, due to their systematic and explicit phonics training, potentially developed more robust connections between the orthography, phonology and semantics of words.
4.2. Spelling error types and L1 German influence
An interesting aspect of our study centred on the predominance of error types (phonological or orthographic) observed among the three instructional groups. Across all participants, phonological errors comprised roughly three-quarters of the total errors (76.95%), while orthographic errors constituted roughly one quarter (23.05%). This finding indicates that, although learners at this developmental stage may not fully differentiate between the two spelling processes as outlined by the Dual Route Theory (e.g., Barry, 1994), it is possible to tailor instruction to strengthen the sub-lexical route and reduce dependence on less accurate phonological guesses. The predominance of phonological errors highlights a tendency to rely on L1 phonological knowledge during the early language development of L2 spellers before gradually incorporating more orthographic strategies with growing language proficiency and print exposure (e.g., Harrison, 2021; Martin et al., 2020; Russak & Kahn-Horwitz, 2015; Zaretsky, 2020). Such an early focus on phonology typically gives way to a gradual increase in orthographic errors as part of the normal developmental spelling trajectory for L2 learners.
Another reason for the reliance on L1 phonological strategies among learners in our study may stem from their orthographic background. Research suggests that learners from shallow, alphabetic L1 backgrounds tend to apply phonological skills when spelling in L2 due to the transparent PGC in their L1 (Bhide, 2015). In contrast, learners from languages with deeper, opaque and non-alphabetic writing systems often favour visual and orthographic strategies to cope with the more complex mappings and memorisation that their native languages require (e.g., Ziegler & Goswami, 2005). Given that the German orthography is comparably shallow with predictable PGC (Landerl, 2017), the German-speaking learners in our study were presumably more likely to employ their L1 phonological knowledge during the spelling process.
Although phonological errors were the prevalent error type in all three groups, the fewest phonological errors and the least L1 interference occurred in the phonics group. Our results resonate with the principles of Ehri’s Orthographic Mapping Theory (1995, 2005, 2014), which emphasises the role of phonological processes in forming connections between spoken and written language representations. The reduced frequency of phonological errors observed in the phonics group aligns with Ehri’s suggestion that systematic phonics instruction strengthens learners’ ability to form precise GPC, thereby facilitating more accurate orthographic mapping and spelling.
In all three groups, a large quantity of errors can be attributed to the learners’ difficulty perceiving novel phonological contrasts and interlingual influences based on typological differences between L1 German and L2 English. Our study revealed that children in the phonics group showed a general trend of better spelling proficiency for English-specific phonemes and orthographic patterns that are not represented in the German language. This was particularly observable in the better ability to spell the /ɒ/ in words like ‘frog’ and ‘not’, the voiceless dental fricative /θ/ in words like ‘thin’ and ‘south’, as well as complex orthographic patterns like the silent <e> construction in ‘time’ and ‘like’, and /ʃ/ spelled as <sh> in ‘shop’ and ‘shirt’. Comparative analysis of error rates across different phonemes revealed a notably high incidence of spelling difficulties with the /θ/ sound, as evidenced by a 92.67% error rate across groups. This contrasts with our findings on the silent <e>, with a markedly lower error rate of 27.33%. The low frequency of errors related to the silent <e> may indicate that the ELLs in our study are developing an understanding of this particular English orthographic pattern. However, the high error rate associated with the /θ/ sound demonstrates the difficulty learners continue to face with certain phonological elements of English. Both findings underscore the variability in ELLs’ acquisition of English spelling rules and highlight specific areas that may require targeted instructional support (Cook, 1999).
5. Limitations
Our study provides valuable insights into German speakers’ phonemic and orthographic challenges when spelling in English. Nonetheless, there are notable limitations. Due to the unforeseen interruptions caused by the COVID-19 pandemic, we were unable to conduct planned literacy assessments at the end of grade 3, limiting our analysis to a single post-intervention measure at the end of grade 4. This modification in our study design prevented us from conducting a comprehensive longitudinal analysis of the development of spelling skills. We also lacked direct measures of L1 literacy, which could have informed our understanding of the relationship between L1 and L2 spelling abilities. Furthermore, our error analysis was primarily quantitative, so we have limited insight into the learners’ cognitive strategies, their decision-making processes and the potential rules they may have applied when spelling in L2 English. An additional limitation concerns the assessment of L2 vocabulary growth. Although a two-factor ANOVA was conducted to ascertain initial vocabulary skill differences and their change over time, our methodology did not include a direct control for the potentially confounding influence of vocabulary improvement on children’s misspellings. This omission restricts our capacity to definitively attribute gains in spelling performance to the instructional interventions alone. Finally, the lack of delayed post-tests within our experimental framework limits the scope of our findings regarding the sustained impact of each instructional approach on learners’ spelling skills.
6. Conclusion and implications for the classroom
Understanding the linguistic underpinnings of particular spelling errors can inform models of spelling development so that effective spelling interventions can be designed for young ELLs. Our results suggest that the prevalent English as a foreign language (EFL) literacy instruction in German primary schools, characterised by an emphasis on holistic sight word instruction, may be insufficient for young L1 German learners to acquire the necessary skills for spelling English words. This aligns with findings from L1 spelling research, which emphasise the linguistically guided nature of learning to spell, particularly the central role of phonology during the initial stages of spelling development (e.g., Martin et al., 2020). Spelling acquisition extends beyond mere visual memorisation to include linguistic patterns and PGC. Consequently, whole word approaches may be less effective due to their reliance on memorisation without a phonological foundation and the constraints of visual memory capacity (Joshi et al., 2008), limiting learners’ ability to spell unfamiliar words independently (Share, 2004).
We suggest that the teaching of spelling should be multi-dimensional and target the strategies with which L2 learners struggle the most, such as knowledge of English phonology and key PGC, as most errors were made in this area. Given the reciprocal nature of spelling and reading, with robust reading skills serving as a predictor for spelling proficiency (e.g., Mlakar et al., in press), a comprehensive, explicit teaching approach that encompasses both seems to be beneficial for promoting L2 literacy skills. Some PGC may require less emphasis due to the overlap between English and German, but others, such as novel phonemes or orthographic patterns and spelling rules which do not exist in German, need explicit teaching to prevent the automatic triggering of incorrect or invented L1-based spellings. Further, the frequent phonological errors in our study might point to a need for more explicit pronunciation practice or phonics instruction. Our research supports Vivian Cook’s call for more explicit and systematic instruction in L2 spelling (Cook, 1997, 1999). Effective instruction must incorporate critical elements of word-specific knowledge, such as phonological and orthographic information. This careful integration is essential to foster the development of high-quality lexical representations, which are the foundation of proficient spelling.