FAQs: Benchmark Assessment System (BAS)

What is a benchmark assessment?

A benchmark assessment system is a series of texts that can be used to identify a student's current reading level and progress along a gradient of text levels over time. The word "benchmark" means a standard against which to measure something.

Why is benchmark assessment a valuable use of time?

You can:

  • Determine your students' independent and instructional reading levels.
  • Determine reading placement levels and group students for reading instruction.
  • Select texts that will be productive for students' instruction.
  • Assess the outcomes of teaching.
  • Assess a new student's reading level for independent reading and instruction.
  • Identify students who need intervention.
  • Document student progress across a school year and across grade levels.
  • Inform parent conferences.

What happens if a student scores well on the Where-to-Start Word Test but is unable to read the corresponding leveled Benchmark book?

It may be that your students have had an excellent word study program and are very good at word reading. This is a real strength but does not necessarily mean that they can perform equally well when reading continuous texts. There may be other instructional factors that make it difficult for them to process words "on the run" while reading. If you think this is the case, then adjust the chart on page 214 (Benchmark 1) or page 220 (Benchmark 2) and start two levels lower than indicated by the word test. If the students find the reading very easy, you can skip a level. This will save time in administering the assessment. Remember also that if a student is obviously finding a level too difficult, and you can see that he is making errors at a rate that will not meet the criterion, you can stop the reading and move quickly to an easier level.

Why is there a higher criterion for accuracy at levels L-Z?

The authors' response to this question is provided as a separate Word document.

What is the source of the words in the Benchmark word lists?

The Benchmark word lists were compiled to include the words that appear most frequently (in our survey of leveled texts) in the books that children read from earlier levels to about the end of grade four. In addition, all word lists were checked against several different lists, including both Spache and Dolch. They are not identical to either list, but there is a great deal of consistency across them. Finally, the lists were checked with teachers.

The words that appear on the word lists are "tier 1 words," meaning that they are frequently used in oral language and in general literature. The lists do not include "tier 2" and "tier 3" words—words not in common use or technical words related to content areas. These words are those that appear most frequently.

How does the Fountas & Pinnell Benchmark Assessment System address RTI compliance?

With the Fountas and Pinnell Benchmark Assessment System, you can monitor reading level three times each year. This assessment will yield level (with equivalent grade levels), accuracy, fluency, and detailed information and scores on comprehension. This system has been extensively field tested. You can have students complete a writing prompt to further assess comprehension. You can use optional assessments to monitor progress in phonemic awareness, phonics, letter learning, and high frequency word knowledge. You can establish expectations in each of these areas based on your own district's requirements. A grid is currently in development to establish criteria for each grade level, beginning, middle, and end.

What is the significance of the self-correction ratio? Why do we switch from reporting ratio to simply reporting the number of self-corrections when assessing reading at levels L through Z?

Self-correction fuels the development of the early reading process. Each time a child makes a substitution, it represents a decision made by using information. Stopping, checking the word, and either making another attempt or arriving at the correct word are all decision points that involve hypotheses based on the reader’s attention to meaning, language structure, and visual patterns. We cannot know exactly what is going on in the reader’s head, but errors and subsequent self-corrections give us an idea. For example, if the text reads, "I am going to be at home on Sunday," and a child substitutes the word here for home, we would infer that he has noticed the first letter. A self-correction could mean further searching the visual features of the word or noticing that the word here does not sound right in the sentence and isn’t meaningful. A good hypothesis would be that the child has now used meaning and language structure to self-correct.

As a reader moves through the text, what has come before provides a background of information that influences monitoring and self-correction. Hypotheses are easier to make because meaning and text structure support the thinking. That is why readers often gain momentum and pick up fluency towards the end of the text. A wide range of word-solving strategies, including the flexible use of meaning and language structure, allows the decision-making process to take place more rapidly.

A beginning reader’s self-corrections are overt; that’s why we can tell so much from looking at reading records. But over the next year and a half, the process will change. The self-correction begins to take place before the reader says the word aloud. Or, the reader may note the self-correction in passing but not bother to correct out loud. Too much self-correction is inefficient. Proficient readers generally self-correct only when necessary to read meaningfully.

This covert self-correction adds to the reader’s ability to produce the language in phrases. In the highly proficient reader around the middle of grade two, we would not expect to hear a great deal of overt self-correction if the reading is taking place with ease.

There is not a linear relationship between self-correction ratio and progress in reading. As children progress, observable self-correction decreases and may become nonexistent. We would not desire a 1:1 or 1:2 SC ratio in highly accurate reading. Just listen to a reader who is making quite a few errors and self-correcting almost every one of them; the reading will not sound good even though the accuracy would be almost 100%. Instead, we want high accuracy and only necessary self-correction.
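
For readers who keep reading records electronically, the two figures discussed here can be sketched in a few lines of Python. This is only an illustration and not part of the Benchmark Assessment System. It assumes the conventional running-record calculations: accuracy is the percentage of words read correctly (self-corrected words are not counted as errors), and the self-correction ratio is 1:N with N = (errors + self-corrections) / self-corrections. The function names are hypothetical.

```python
from typing import Optional


def accuracy_percent(running_words: int, errors: int) -> float:
    """Percentage of words read correctly.

    Assumption: self-corrected words are NOT counted as errors.
    """
    if running_words <= 0:
        raise ValueError("running_words must be positive")
    return 100.0 * (running_words - errors) / running_words


def self_correction_ratio(errors: int, self_corrections: int) -> Optional[int]:
    """Return N for a ratio reported as 1:N, or None if there were no self-corrections.

    Assumption: N = (errors + self_corrections) / self_corrections,
    rounded to the nearest whole number.
    """
    if self_corrections == 0:
        return None
    return round((errors + self_corrections) / self_corrections)


# A reader who self-corrects almost every error on a 100-word passage:
accuracy = accuracy_percent(running_words=100, errors=1)
ratio = self_correction_ratio(errors=1, self_corrections=9)
```

With 100 words, 1 uncorrected error, and 9 self-corrections, the sketch gives 99% accuracy and a ratio of 1:1. That is the pattern described above: highly accurate on paper, yet full of overt corrections that interrupt the reading.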

We want the kind of reading that ignores small errors and/or mentally corrects responses before saying them aloud. We assume that proficient readers are self-regulating both their oral and silent reading, but we will not be able to observe it. That is why we switch from reporting ratio to simply reporting the number of self-corrections when assessing reading at levels L through Z. If we find very high accuracy and also many self-corrections, we would notice it and work with the reader to get smoother processing. To read more about the important role of self-correction, see Change Over Time in Literacy Learning by Marie Clay.

Why are the little books for Benchmark 2 shorter than the ones for Benchmark 1?

The books for Levels L-N in Benchmark 1, while longer (16 pp), contain illustrations that give young readers picture support. The books from L-Z in Benchmark 2 are shorter (4 pp) and contain almost no illustrations, with the exception of nonfiction text features like diagrams and maps to support the older reader. Length is only one factor in text difficulty, and it is not a significant one unless you are talking about a large difference (50 to 100) in number of pages (which would inevitably place a greater burden on memory). A short text can be very hard, with difficult vocabulary, complex sentences, and complex ideas. A long text can be easy, with familiar concepts and vocabulary and simple sentences.

Another consideration was the amount of time required to administer the assessment. The length of selections in Benchmark System 2 provides an adequate sample for assessing an older child’s oral and silent reading, vocabulary, capacity to solve multisyllable words, and ability to interpret more sophisticated content.

How does the Fountas & Pinnell Benchmark Assessment System compare to DRA or Rigby PM Ultra?

How do Lexile Levels correlate to the F&P Text Level Gradient™?

There may be a statistical correlation between Lexile levels and F & P levels. For example, if you run measures on thousands of books and over many levels, there would be a correlation. We have not performed these analyses ourselves. The lower F & P levels, in general, would have lower Lexile scores. The higher F & P levels generally would have higher scores. But this kind of correlation is not the same as a precise matching of levels, for example, a claim that a given Lexile range corresponds to a specific A to Z level in a reliable way. The two systems are based on some of the same text factors but not all. MetaMetrics uses a mathematical formula, which they can explain. The F & P levels are based on the ten text factors named in several of our books. A group of raters reaches reliability after independent analysis. We cannot predict with confidence that a given book with a certain Lexile score will fall into a particular category on the F & P gradient. Every time we have looked at Lexile levels for texts that seem highly reliable on our scale, we have found a number of "outliers."

Will you be producing more books at each level so that teachers have greater choice?

We would not want to use more than two books (one fiction and one nonfiction) at each level of the gradient of text. Greater variety would greatly lower the reliability of the assessment. The fiction and nonfiction texts that now exist have been field tested to show their equivalence. Two at each level is enough. More discussion follows.

If a child reads a text and meets the criteria, you would go up to the next level. If the reader does not meet the criteria, you would go down. We would not recommend having the child read both books unless you have a special reason. By the next testing period, you should be able to start assessment at a higher level. If not, the alternative text is available.
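
The up-or-down movement described here can be pictured as a simple search along the A to Z gradient. The Python sketch below is a simplified illustration, not the administration procedure from the Assessment Guide. It assumes that a single yes/no judgment (`meets_criteria`, a hypothetical callback standing in for the teacher's scoring of accuracy and comprehension) is available for each level, and that a reader who cannot meet the criteria at one level would not meet them at higher levels.

```python
import string
from typing import Callable, Optional

# The text level gradient, "A" through "Z".
LEVELS = string.ascii_uppercase


def highest_level_meeting_criteria(
    start: str, meets_criteria: Callable[[str], bool]
) -> Optional[str]:
    """Search from a starting level for the highest level that meets the criteria.

    meets_criteria is a hypothetical stand-in for the teacher's judgment
    after the student reads one text at the given level.
    Returns None if no level from the start downward meets the criteria.
    """
    i = LEVELS.index(start)
    if meets_criteria(LEVELS[i]):
        # Criteria met: keep going up while the next level is also met.
        while i + 1 < len(LEVELS) and meets_criteria(LEVELS[i + 1]):
            i += 1
        return LEVELS[i]
    # Criteria not met: go down until a level is met.
    while i > 0:
        i -= 1
        if meets_criteria(LEVELS[i]):
            return LEVELS[i]
    return None


# Example: a reader who meets the criteria on every level up to N.
reads_through_n = lambda level: level <= "N"
from_below = highest_level_meeting_criteria("L", reads_through_n)
from_above = highest_level_meeting_criteria("P", reads_through_n)
```

In this example both searches end at level N, whether the assessment starts below it (at L) or above it (at P). A real administration involves more than a pass/fail decision at each level, so the sketch is only a way to visualize the direction of movement.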

Once in a while, you may feel that the child is reading well at the level because he has extensive background knowledge of the topic. Background knowledge is an important factor in reading level and comprehension. But it does not guarantee effective processing.

All nonfiction texts assume that the reader has some background knowledge of the topic. If there is none, the reader cannot process the text effectively. So, inevitably, background knowledge is a part of reading assessment, and it becomes more important as you move up the gradient. It cannot be separated out. One of the reasons for close observation and for the comprehension conversation is to gain insights into processing. We are all familiar with tests of "reading comprehension" in which almost all of the multiple-choice questions can be answered accurately by the reader who has extensive background knowledge. The comprehension conversation allows you to assess whether the reader grasped the nuances of the text as well as what he thinks about the topic.

It is true that there may be topics that for some individual readers are harder or easier than they would ordinarily be. This can happen because of varied background experiences. For example, you may think that the reader is processing a nonfiction text well because of background knowledge and that the assessment does not accurately reflect reading ability. In that case, you can try the fiction text. If that reader processes the fiction text well, then move up the gradient. If it is too difficult, move down. You have assessed the reader thoroughly at the level. If the next level is quite difficult, then the instructional level would be the highest level on which the reader meets the criteria. In the Benchmark Guide, you will find a number of scenarios, including what to do when you get no instructional level (the reader goes from too easy to too hard). Then, you have to look beyond the numbers.

In DRA2, at levels 24 and above, a student's comprehension is assessed through a written response. Would a student's reading levels be comparable if assessed using the Fountas & Pinnell Benchmark Assessment Comprehension Conversation? Is there any research on assessment of comprehension?

These are two very different types of assessments, so it will be difficult to have uniformity when the foundational premises are not the same.

The Fountas & Pinnell assessment is designed to gain evidence of students’ thinking through their talking about their understandings and through additional probes. The teacher is able to rephrase questions to be sure the reader understands what is being asked. Two issues are involved: one is the effect of writing, and the other is the use of retelling as a measure.

Fountas & Pinnell do not support using writing about reading as the single measure of comprehension so the Benchmark Assessment System has included writing about reading as an additional optional piece of evidence. Further, given the research on retellings, Fountas & Pinnell have chosen not to use that form of response. Some students do have understanding but may not be able to organize it and write it. There is a great deal of research documenting the questions related to using retellings as a comprehension measure.

The Benchmark Assessment measure of comprehension is a rigorous one, as the teacher gathers evidence of students’ understanding within, beyond, and about the text. One cannot be sure that the act of writing about one’s thinking represents the student’s full understanding.

My school has just purchased the Fountas & Pinnell Benchmark Assessment System 2. I am the school’s literacy teacher for 4th and 5th grades. I am confused because I see various levels listed for 4th grade in the Assessment Guide so I’m not sure which levels to choose for beginning, middle, and end of year goals.

We provide a range of levels for the beginning, middle, and end of the year. Level P or Q is a good expectation for beginning of the year, Level R for middle and Level S or T for the end of the year. We encourage you to look at these level suggestions and define them precisely for your school. If you need to choose one instructional level for the beginning of the year we would suggest Level P and for the end of the year, Level R.

If a student left third grade reading at instructional level O, the fourth grade teacher will retest the student at Level O at the beginning of the year. Students may lose information throughout the summer. In some cases, students end up going down. Is this an appropriate starting point at the beginning of a grade? On the other hand, is it the case that once a student has "passed" a level, they should be tested on the next higher level?

It is reasonable to test the student at the end-of-year level to see if competency has been maintained. That is made possible in the Benchmark System because there are alternative texts at each level. So, the student would not be reading the same material. Alternatively, the next level up could be attempted first. The teacher will know if it is too difficult and can then move down. The reliability is established when the student reads the highest level possible with the accuracy and comprehension criteria. If the district wants to make starting on last year's ending level a standard procedure, then they would have a quick assessment of whether the majority of students (or which students) have declined over the summer. This could be the impetus for planning some summer reading/writing programs.

What is the correlation between reading of high-frequency word lists in isolation and the student’s score on the Benchmark Assessment System?

There is a strong relationship between the ability to read high-frequency words in isolation and the ability to read with accuracy when processing continuous text. At this time we do not have statistics on the correlation between the optional word tests and the Benchmark text level for the F & P System, but these two kinds of measures are always related. In interpreting the tests, however, you should keep in mind that the two tasks are actually different in quality. Reading isolated words requires attention to visual features alone and matching them up with words in oral vocabulary. We do want to know the quantity of words that readers recognize quickly and easily. Reading continuous print requires not only word recognition but the orchestration of many different kinds of information: language syntax, meaning, dialogue, graphics, etc. The best measure of reading is to observe indicators of proficiency and understanding using continuous print.

For a fourth grader who begins the year reading at level H, how much progress should be made in a year?

We have begun to work on intervention lessons for students above grade three, but we do not have a large database for students who have made accelerated progress. In fact, we are not sure such a database exists, and building one would cost millions of dollars. Most research indicates that students this far behind continue to lag, and there is great variability in the quality and kind of extra help students receive. Simply going to a remedial reading group will not make enough difference. But our experience suggests that daily intensive teaching as an extra intervention can make a difference.

Progress always depends on the instruction a student receives. The fourth grader described here needs daily intervention of an intensive nature in addition to good classroom instruction. If individual tutoring is available, this student should receive it for as long as needed. If not, the student should participate in an intensive small group intervention lesson every day. We recommend a group size of about 3 or 4 to 1 teacher. The lessons should include daily reading of instructional level texts, writing about reading, and phonics or word work (attention to the structure of words). With this kind of regular intervention teaching, we should expect 2 years of growth within 1 year. That means that the goal for this student would be proficiently reading about level O or P. That still would not be on grade level, so the student would probably need intervention in grade 5 as well.

Remember that we are not talking about "pushing" the student to read texts that are too hard. This student needs strong teaching while engaging in effective reading behaviors on a text that is just right and offers new opportunities to learn. The intervention teacher must be a careful selector and sequencer of texts to build the student's reading abilities over time.

What is the best rate for reading?

Educators should be cautious in assessing a student's rate of reading. Words-per-minute is only one factor in fluency, and we believe that it is not even the most important factor. Proficient readers vary greatly in the speed of their reading. Rate depends on purpose for reading, content, literary quality, and genre. Excellent readers may have good reason to slow down and reflect on what they are reading. It is a natural part of reading to search back in the text to confirm memory or look for information. They pause to examine illustrations or graphics that provide more information. And, remember that it is also possible for students to read too fast. In other words, faster is not necessarily better.

We do not want readers to read in a slow, halting way; but, teachers who are concerned about students' fluency should attend not just to rate but to four other key factors:

  • Pausing—Readers reflect the meaning of the text by pausing appropriately. They are guided by the punctuation.
  • Phrasing—Readers read in meaningful phrase units to show the meaning of the text.
  • Stress—Readers emphasize some words more than others in a sentence to reflect the meaning.
  • Intonation—Readers' voices rise and fall to reflect punctuation and the meaning of the text.

To support fluency, teachers can place students in texts that they are capable of reading and then work consistently to support pausing, phrasing, stress, and intonation. Rate will ultimately be affected so that students read at a good pace—not too slow and not too fast.

Are the end of grade level benchmarks nationally normed?

The grade level benchmarks are not nationally normed. That would take a large random sample of students taken across the United States and Canada and a great deal of testing. It is just not appropriate for this kind of system.

The levels have, however, been tested in a large field study. The end-of-year expectations as defined in our system are consistent with recommended national standards from the National Center on Education and the Economy. Districts do have a choice in adjusting the expectations to meet their own standards. There are slight variations from place to place, but we have stated levels that indicate typical satisfactory progress.

In the Benchmark Assessment System, how did you determine how to provide a fiction and nonfiction text at the same reading level considering the differing comprehension demands between the two (particularly background knowledge)? If teachers record oral reading data from fiction and nonfiction reading at random, does this skew the results in terms of using Benchmark as a progress-monitoring tool?

We agree that there are very important differences between fiction and nonfiction. They require a different stance and different ways of understanding. Background knowledge is required for both, but content knowledge may figure more prominently in nonfiction and text knowledge may become more important in fiction. However, the reading process at every level must encompass all of these ways of comprehending texts. Reading fiction and nonfiction are not fundamentally different processes. Therefore, we have structured The Continuum of Literacy Learning to include characteristics of texts and behaviors and understandings to notice, teach, and support across fiction and nonfiction for every level A to Z. The understanding is that in our teaching, we need to attend to the full range of strategies needed to competently process both nonfiction and fiction (and different genres of fiction).

In every assessment measure, you will find variations in performance depending on contextual factors. In the F&P Benchmark Assessment System, we know that for some children there might be a variation in their reading of fiction and nonfiction—depending on readers' experiences and the teaching program. There are variations in individual readers when they are engaged in a complex process. We need to capture the complex process realizing that students might have a basic process that reaches to a certain level but may show some variation when the topic is more or less familiar. We have determined that the texts are equivalent, allowing for some differences in children.

We suggest alternating the genre to get a benchmark level. That means that with fiction or nonfiction the level will be right to support new learning. To administer both fiction and nonfiction texts at every level would take too much teacher time and if there is a slight discrepancy, it will not make a qualitative difference. If the student has less experience with nonfiction the student may need a bit more support at the level and vice versa. The level determined from the benchmark will be reliable to start the teaching.

We want to get away from the "tradition" that students read a higher level in fiction than nonfiction. If that is true in particular contexts, it is an artifact of teaching. A student should be able to demonstrate the wide range of strategic actions across both fiction and nonfiction at the level. If the student is doing well in fiction, for example, teachers would make good selections of nonfiction books at the level and teach hard for the kinds of understandings students need.