Discovery Tools for Science Education Content

 

 

A pioneering project to develop grade level stratification tools

for STEM education in grade levels K thru College

 

 

 

 

Interim Project Report

October 2008

 

 

 

 

 

 

Principal Investigator: David E. Wojick, Ph.D., PE

davidwojick@craigellachie.us

391 Flickertail Lane

Star Tannery, Virginia 22654

 

Co-investigators

Bernadette Monahan, MA

Diane W. Adams, MA

 

The STEM Education Center

http://www.stemed.info

 

 

 

Funded in part by

Office of Scientific and Technical Information

U.S. Department of Energy

http://www.osti.gov

 

 

 

 

Project: Discovery tools for science education content.

 

Table of Contents

 

A. Introduction: Significance of the Grade Level Stratification Problem & Opportunity

 

B. Technical Approach

 

C. Anticipated Public Benefits

 

D. Project Narrative: Results to Date and Demonstration of Technical Feasibility

 

1. Introduction: The STEM GLS challenge

2. Assessing the primitive state of grade level stratification today

3. Selected topic case study: magnetism and electricity

4. Focus on computer based GLS

5. Focus on state standards of learning (SOL): they are the basis for teaching and testing, hence the greatest user need

6. First SOL case: Virginia K-12 science

7. Step one: the initial grade level stratification -- 4th grade electricity concepts and clusters using the ST, VT, GT, DT and IT method.

8. Significant features of the 4th grade electricity stratification list

9. Step two: electricity stratification for all K-12 grades

10. First and second GLS trials: the Newton test bed

11. Second SOL case: California electricity and magnetism

12. We discover dramatic grade level differences: VA (4, 6, MS & HS) versus CA (4 & HS only)

13. First math GLS -- VA probability theory

14. College level GLS for magnetism and electricity -- a new method

 

E. Putting it all together: a 3-D spiral model of K-12 STEM education

 

F. From grade levels to learning levels

 

G. Conclusion: Our project demonstrates the feasibility of grade level stratification of STEM educational content

 

Appendix I: Basic topics for a complete K-16 science education GLS system

Appendix II: Principal Investigator and other Key Personnel

 


 

A. Introduction

Significance of the Grade Level Stratification Problem and Opportunity

 

Web-based educational content is being developed in vast numbers throughout both the scientific community and the educational world. Most of this material is being developed on an ad hoc basis and there is presently no easy or systematic way for teachers, parents or students to find it. Specifically, there is no search engine that finds educational content by grade level or degree of difficulty. This is the problem we are solving. We refer to this problem as that of "grade level stratification" or GLS.

 

This is a global problem of huge proportions. Simply put, there is no reasonable way for teachers, parents or students to search the Web for content that is suited to a specific grade level or degree of difficulty. There are thousands of small collections scattered across the Web. The only way to search them is one by one, which is prohibitively difficult. It is a cottage industry with 100,000 cottages.

 

The Web only works where search works. At the present time Web searching does not work for science education. Our effort is designed to develop the tools needed to solve this problem.

 

The Federal role in science education is complex. The Federal government spends about one billion dollars per year directly on science education, but very little directly on content development. Most of this billion dollar budget is spent by the National Science Foundation, with some also spent by the Department of Education. However, most of this funding goes to supporting schools and students and to purposes other than developing STEM (science, technology, engineering and mathematics) education content per se.

 

Formal development of STEM education content is generally left to the textbook publishers. However, since the advent of the World Wide Web an enormous amount of content has been developed by individuals, especially by scientists and science programs, together with science teachers and students. The Federal government spends tens of billions of dollars on scientific research and development each year and a significant amount of this funding is going into ad hoc educational content development. This is certainly true for DOE, which spends almost $10 billion per year on R&D. Every program and facility we have examined has some educational content, as do many individual projects. But because of a lack of formal programs for content development, this wealth of ad hoc content lies hidden.

 

We estimate that the Web may contain a million or more web pages and documents that are suitable for K-16 STEM education purposes. A significant proportion of these are funded by Federal R&D agencies, either directly or indirectly. The technical problem is how to find this content in a workable manner.

 

We note in passing that college level STEM materials are often needed by researchers as well as by students and teachers. Researchers often explore outside of their specialty in the course of their research. When they do so they often must begin with an educational phase. Thus while this project narrative is mostly concerned with the application of grade level stratification to education per se, much of what we are doing will have extensive application to the world of scientific and technological research and development. Likewise, while we talk about GLS as a search problem, it is an authoring problem as well. Our tools can be used to author content by degree of difficulty.

 

It is important to note that conventional search engines like Google do not distinguish educational content from advanced scientific content using the same search terms. Exploring the potential for computer based search methods that do make this distinction has been a central focus of our project. We have conclusively demonstrated the feasibility of solving this problem using systematic grade level stratification of search terms.

 

 

B. Technical Approach

 

This project seeks to develop a new grade level stratification (GLS) system for finding and collecting STEM educational content on the Web, especially ad hoc content funded by Federal R&D agencies. It is believed that there are hundreds of thousands of such items on the Web.

 

The GLS method we developed in Stage I is simple, yet powerful. We started with the content requirements for teaching one topic (electricity and magnetism) in one state (Virginia), in grade levels K-12. Electricity and magnetism represents about 3% of the K-12 science curriculum, but it is taught in almost every grade, which made it a good pilot topic. From this content we were able to devise search term lists for the concepts taught in each grade. Using these lists we are able to estimate the grade level of any given electricity content in our benchmark database.

 

The overall objective is to produce a working grade level stratification system for K-12 & college science, for all science topics and ultimately for all STEM topics.

 

What remains to be done are three things:

 

Technical objective 1: First and foremost, replicate the GLS method and extend the electricity prototype to include the whole of K-16 science. When this is done we will have a complete GLS for K-16 science. All K-16 science content will then be potentially searchable and findable.

 

Technical objective 2: Solve the problem that different states teach the same content in different grades. Our working hypothesis is that there are intrinsic sequences in K-16 content, which define degrees of difficulty. Simply put, science education builds knowledge in a specific, systematic way. This means that appropriate content can be found without regard for the numerical grade in which it is taught in any given state. This is true for electricity and magnetism. We will determine the extent to which it is true for the rest of K-16 science.

 

Technical objective 3: Refine and improve the relatively simple search algorithms developed so far, in order to improve the ranking of search results.

 

 

C. Anticipated Public Benefits

 

If we are successful the public benefits begin with greatly increasing the use of STEM education content found on the Web. This should improve STEM education, as well as providing a greater flow of knowledge from the scientific community into education. It also increases the return on Federal investment in science. In addition, we have determined that scientists themselves make heavy use of undergraduate level educational material when they are moving into new fields. Therefore, improving access to educational content has the potential to speed up science itself.

 

At the present time there are many thousands of small collections of educational content on the web, containing hundreds of thousands of documents and other materials. Much of this content is developed by scientists, much by teachers, some even by students. At the present time the only way to find this material is to search collection by collection, because search engines do not sort their hits by grade level. Even at the collection level there is almost no stratification by individual grades. Given that the Web only works when search works, the Web does not presently work for science education. The content is there but it cannot be found.

 

 


D. Project Narrative

 

Results to Date and Demonstration of Technical Feasibility

 

Our project to date has clearly demonstrated the feasibility of our approach to creating an efficient grade level search mechanism. In fact we have pioneered a new method of grade level stratification (GLS) of content for science, technology, engineering and mathematics (STEM) education. Our method will be useful not only for creating new search tools, but for content development as well. This is explained below.

 

1. Introduction: The STEM GLS challenge

 

Our original project proposal stated our objective as follows:

 

"Grade level search."

 

"One of our ultimate objectives is to explore the feasibility of sorting content by grade level or at least by ranges of grade level."

 

"Core concept search."

 

"At a minimum we hope to be able to distinguish educational content from advanced scientific content. We expect to be able to identify a set of core concepts that are typically always taught at different grade ranges, however broad. The presence of these concepts and the absence of advanced concepts should be diagnostic of educational content. This hypothesis will be tested."

(end of proposal quotation)

 

What we have found is a way to specify clusters of core concepts and words that precisely identify the grade level of the content, for a given state. This is far more than we expected, far more than the minimum we hoped for. In fact it sets the stage for a new technology of grade level specific content. What we did and what we found is described below.

 

2. Assessing the primitive state of grade level stratification today

 

We began by surveying the extensive education websites at several science institutions. We looked to see if grade level stratification (GLS) was being effectively utilized to establish the grade level readability of science education documents. In researching these sites it became apparent that many of the documents and even the sites themselves were not meeting their target groups' needs, especially those of elementary students.

 

In many cases the grade ranges that were used were too wide to be useful. In some cases Web resources were simply grouped as being "for Kids," with no grade level stratification at all. In most other cases the groupings were just "elementary, middle school, and high school." This is too broad to be useful, especially in the elementary grades. Also, much of the content was found to have a grade level that was higher than the category it was put under.

 

We also looked at a number of Web-based collections of K-12 science education resources. Here too we found that many resources were not correctly identified by grade level, or were categorized too broadly to be useful.

 

We also found that a lot of material mixes content that is suitable for one grade with content that is suitable for another. For example, an article with mostly 4th grade content will contain a high school level paragraph or two. Such material is not suitable for either grade.

 

3. Selected topic case study: magnetism and electricity

 

The team selected energy, with a concentration on magnetism and electricity, as the topic case study. There were several reasons for this choice. First, the team members are knowledgeable about the content. One member is an expert in the field of electric power and another teaches the subject as part of the curriculum in her grade level.

 

Second, magnetism and electricity is taught in many grades, from Kindergarten through advanced college. This makes it a good candidate for many levels of stratification. Third, the Department of Energy has a deep interest in magnetism and electricity. It does a lot of research in these areas of science and technology.

 

4. Focus on computer based GLS

 

We decided to focus our efforts on computer based grade level stratification. It was felt that the greatest need and opportunity for results was with this challenge. While there are powerful Web search engines, there is as yet no workable search engine that finds STEM documents by grade level. Nor is there a writing tool that will determine the grade level of a document, or coach a writer who is trying to achieve a certain grade level, as a spell checker does. That is the challenge we undertook.

 

One technical challenge that has emerged is that simple stemming of terms, which is often used in computer based search, does not work. This is because simple variants of technical terms may be taught at very different grade levels. For example, "electron" and "electronic." This difficulty has meant that in many cases variant search terms have to be hand crafted.

 

5. Focus on state standards of learning (SOL): they are the basis for teaching and testing, hence the greatest user need

 

In the United States, education is primarily a state and local responsibility. Each state is responsible for creating content standards in every subject. Science and math standards are required by federal law. Standards identify the academic content for essential components of the curriculum at different grade levels. These standards are used in every public school classroom by teachers to plan, prepare, teach, and assess students in every subject. In turn, these standards become the benchmark by which students are evaluated and assessed through the federal No Child Left Behind Act of 2001 (NCLB). In the state of Virginia these standards are called the Virginia Standards of Learning or SOLs. In this report we use the term "SOL" to refer to all state standards of learning.

 

Because SOLs are so important we selected them as the basis for our GLS work. Other candidates, such as textbooks, Web-based teaching resources, etc., were considered too difficult to work with and too variable. There exist a number of candidates for federal standards, such as the AAAS Benchmarks. However, these are divided into groups of several grades each, so are not precise enough for our needs. Moreover, their grade levels for various topics often do not correspond to the grade levels used in many states. In fact, as we discuss below, grade levels vary significantly from state to state for many concepts. This is a major finding of our project.

 

6. First SOL case: Virginia K-12 science

 

We chose to use the Virginia K-12 science SOLs as our baseline standard for two reasons. First, one member of the team has expertise in teaching the Virginia SOLs. Second, the Virginia science SOLs have very detailed content, compared to some other states. They also have even more detailed supporting guidance documentation. We therefore focused on the Virginia K-12 science SOLs for magnetism and electricity.

 

In Virginia there are grade specific SOLs for each grade from Kindergarten through 6th grade. Then there are topic specific SOLs for middle school and high school, not for specific grades. This is because different students take their science topics in different grades in middle and high school.

 

7. Step one: the initial grade level stratification -- 4th grade electricity concepts and clusters using the ST, VT, GT, DT and IT method.

 

Beginning with the fourth grade science SOLs in magnetism and electricity we isolated core concept terms that were specific to that standard. These are called simply SOL terms. We then developed a list of variations and otherwise related terms suitable for computer search, as explained below. (See listing)

 

By using the SOL term we first determined many of the grammatical variants of the key term that could be used in the same grade. This allows us to search on the SOL term (denoted ST in the listing) as well as possible grammatical variations of the term (denoted VT). This process is analogous to stemming but is more controlled because some terms with identical stems can have very different grade levels. For example, "magnet" is a kindergarten term while "magneto" is a high school term.
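
The stemming problem can be illustrated with a minimal sketch. The `crude_stem` function below is a hypothetical toy that simply truncates words, not a real stemming algorithm, and the plural forms are added only for illustration. The two grade levels for "magnet" and "magneto" are the ones given above.

```python
# Hand-crafted variant table: each surface form maps to the level where it is taught.
# "magnet" (K) and "magneto" (HS) come from the text; the plurals are illustrative.
HAND_CRAFTED = {
    "magnet": "K",
    "magnets": "K",
    "magneto": "HS",
    "magnetos": "HS",
}


def crude_stem(word, stem_length=6):
    """Toy stemmer: keep only the first few letters of the word."""
    return word.lower()[:stem_length]


def levels_by_stem(table):
    """Group the grade levels of all terms that share the same crude stem."""
    grouped = {}
    for term, level in table.items():
        grouped.setdefault(crude_stem(term), set()).add(level)
    return grouped


# A stem that maps to more than one level cannot be used for grade level search.
collisions = {
    stem: sorted(levels)
    for stem, levels in levels_by_stem(HAND_CRAFTED).items()
    if len(levels) > 1
}
print(collisions)  # {'magnet': ['HS', 'K']}
```

Searching on the stem alone would merge kindergarten and high school vocabulary, which is why the variant lists are built by hand.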

 

We also identified some of those terms found in a dictionary (denoted DT) that relate to the SOL term and are likely to be at the same grade level. Implied terms (denoted IT) were included when the team determined that they were probably needed to teach an SOL term. Guide terms (denoted GT) were added on occasion to further clarify the ST. Guide terms were pulled from the state guideline documents, which help teachers understand any standard that is not specific. For example, in the case of electricity the 4th grade SOL simply states that important historic figures will be taught. The guidelines identify Franklin, Faraday and Edison, so these were added as GTs. Teaching about Franklin implies teaching about lightning so this was added as an IT, and so on. The point is that the process of term selection is systematic, but it also requires judgement.

 

In this way we developed a method for systematically extracting and building up a comprehensive listing of ST, VT, GT, DT and IT clusters for a given grade. This is the first step toward grade level stratification.

 

Here is the complete fourth grade listing:

 

ST electricity

VT electric

VT electrical

VT electrically

 

ST conductors

VT conductor

VT conducting

VT conduct

VT conducts

 

ST insulators

VT insulates

VT insulate

VT insulated

VT insulating

VT insulation

IT nonconductor

 

ST circuits

VT circuit

DT circuit breaker

IT closed circuit

IT open circuit

IT parallel

IT series

 

ST static

VT statically

DT static charge

DT statically charged

DT static electricity

DT electric charge

DT electrically charged

GT battery

IT batteries

IT electric cell

IT dry cell

 

ST electrical energy

DT electrification

DT electrify

DT electrified

 

ST electromagnets

VT electromagnet

VT electromagnetism

VT electromagnetic

VT electromagnetically

DT magnetization

DT magnetizer

 

SOL calls for "historic figures"

ST (none)

GT Benjamin Franklin

VT Franklin

IT lightning

 

GT Michael Faraday

VT Faraday

 

GT Thomas Edison

VT Edison

IT light bulb
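
For computer use, a listing of this kind might be loaded into a simple data structure. The following is a minimal sketch only; the `Cluster` class and `parse_listing` function are hypothetical names, and the sketch assumes every cluster begins with an ST line, so the historic figures entries (which are headed by GT lines) would need extra handling.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One concept cluster: an SOL term plus its related terms, grouped by tag."""
    sol_term: str
    related: dict = field(default_factory=dict)

    def search_terms(self):
        """All terms in the cluster, SOL term first, in listing order."""
        terms = [self.sol_term]
        for tagged in self.related.values():
            terms.extend(tagged)
        return terms


def parse_listing(text):
    """Parse 'TAG term' lines; each ST line starts a new cluster."""
    clusters = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, term = line.partition(" ")
        if tag == "ST":
            clusters.append(Cluster(sol_term=term))
        elif tag in {"VT", "GT", "DT", "IT"} and clusters:
            clusters[-1].related.setdefault(tag, []).append(term)
    return clusters


# Two clusters copied from the fourth grade listing.
SAMPLE = """
ST electricity
VT electric
VT electrical
VT electrically

ST circuits
VT circuit
DT circuit breaker
IT closed circuit
IT open circuit
IT parallel
IT series
"""

clusters = parse_listing(SAMPLE)
for cluster in clusters:
    print(cluster.sol_term, "->", cluster.search_terms())
```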

 

8. Significant features of the 4th grade electricity stratification list

 

This list has several significant features. First and foremost, there are enough terms to support computer based search, analysis and identification of documents. Subsequent tests, described below, have borne this out. Thus we demonstrated the feasibility of grade level stratification, the goal of our initial effort.

 

Indeed, we note that a significant number of technical concepts are being taught. One of the most important ramifications of our method is that we can actually identify the number of major concepts that are being taught, for any topic in any grade. This has large implications for the utility of our technology. For example, we estimate that overall about 2000 distinct major concepts are taught in K-12 science. Given the number of hours typically devoted to science, this amounts to roughly one major concept for every half hour of instruction. These large numbers, combined with the multiplicity of topics to be mastered, help to explain why science is a hard subject, which many students find difficult to keep up with. Learning science is a marathon of sprints.
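
The arithmetic behind the half hour figure can be checked with a short calculation. The 2000 concepts and the half hour per concept are the estimates given above; the 13 grade years (K plus grades 1-12) and the 180 day school year are assumptions added here for illustration.

```python
MAJOR_CONCEPTS = 2000        # estimate from the text
HOURS_PER_CONCEPT = 0.5      # "one major concept for every half hour"
GRADE_YEARS = 13             # assumption: K plus grades 1-12
SCHOOL_DAYS_PER_YEAR = 180   # assumption, not a figure from this report

total_hours = MAJOR_CONCEPTS * HOURS_PER_CONCEPT
hours_per_year = total_hours / GRADE_YEARS
minutes_per_day = hours_per_year * 60 / SCHOOL_DAYS_PER_YEAR

print(f"{total_hours:.0f} hours in total")
print(f"{hours_per_year:.0f} hours per grade year")
print(f"{minutes_per_day:.0f} minutes per school day")
```

Under these assumptions the estimate corresponds to about 1000 hours of science instruction over K-12, or roughly 26 minutes per school day.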

 

The clustering of terms around very different major concept terms in the list is also very important. For example, circuits are a different topic than static electricity, or Benjamin Franklin. Looking ahead, we believe this clustering feature will enable us to develop a search algorithm that is independent of what combination of topics happens to be taught in a specific school in a given grade. Ideally a user will specify a specific concept cluster to be found, not a grade level. Exploring this challenge is a major technical objective of our project.

 

We also note that learning some concepts is dependent upon having already learned others. For example, electromagnets depend on circuits, circuits depend on conduction, and conduction depends on electricity. These concepts must be learned in sequence. This sequential structure has important implications for the use of our method. In particular, Project 2061 of the American Association for the Advancement of Science (AAAS) has demonstrated and mapped this kind of sequential dependency (at a coarser scale than grade level), as a tool for promoting science literacy. NSDL uses these AAAS maps as search aids. We hope to extend this work to a much finer, concept-by-concept set of learning dependencies. Exploring this challenge is another major technical objective of our project.
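
A dependency structure of this kind can be represented as a directed graph and ordered by topological sorting. The sketch below uses only the four concepts named in the example above; the `PREREQUISITES` table is a hypothetical encoding, and the standard library `graphlib` module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# concept -> set of concepts that must be learned first (from the example above)
PREREQUISITES = {
    "electromagnets": {"circuits"},
    "circuits": {"conduction"},
    "conduction": {"electricity"},
    "electricity": set(),
}

# static_order() yields each concept only after all of its prerequisites.
teaching_order = list(TopologicalSorter(PREREQUISITES).static_order())
print(teaching_order)
# ['electricity', 'conduction', 'circuits', 'electromagnets']
```

With a finer, concept-by-concept dependency map the same approach could suggest what a student needs to learn before a given document becomes readable.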

 

9. Step two: electricity stratification for all K-12 grades

 

Next we extended the same method of finding and clustering each ST, VT, GT, DT and IT for electricity and magnetism to all grade levels from kindergarten to high school. We created a master list for each grade level that teaches magnetism and electricity. Virginia has individual SOL for the elementary grades of K through 6. It has a single grade level for middle school (grades 7-8) and for high school (9-12), because the same science can be taught to different students in different grades. Thus there are nine SOL grade levels. Magnetism and electricity are taught in all but first and fifth grade, so there are 7 grade level listings for that topic.

 

We noted that some Virginia SOL terms appeared in more than one grade level listing. For our final list we determined that a term should appear only once in the grade level clusters, in the grade where it is first taught. Listing a term at its earliest occurrence establishes it as a core concept term in that grade level. In other words, even though a term might appear in the SOL at a later grade, it has already been taught in an earlier grade. So it is not being taught in the later grade, merely being used. Our goal is to identify the grade level at which each concept is being taught. We therefore located and listed only the first occurrence of each SOL term and its cluster of associated terms.
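
The first occurrence rule can be sketched as follows. The `first_occurrences` function is a hypothetical helper, and the input lists are illustrative only; they are not the actual Virginia SOL term lists.

```python
# The nine Virginia SOL levels, from earliest to latest.
LEVELS = ["K", "1", "2", "3", "4", "5", "6", "MS", "HS"]


def first_occurrences(terms_by_level):
    """Map each term to the earliest level whose term list mentions it."""
    first = {}
    for level in LEVELS:  # walk the levels from earliest to latest
        for term in terms_by_level.get(level, []):
            first.setdefault(term, level)  # later mentions are ignored
    return first


# Illustrative input: "magnet" and "circuit" each appear at two levels.
raw_lists = {
    "K": ["magnet"],
    "4": ["electricity", "circuit", "magnet"],
    "HS": ["circuit", "magneto"],
}

print(first_occurrences(raw_lists))
# {'magnet': 'K', 'electricity': '4', 'circuit': '4', 'magneto': 'HS'}
```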

 

At this point we had a completed K-12 grade level stratification (GLS) for electricity and magnetism. This GLS will allow a computer to search a document and determine if it has a grade specific vocabulary. It will also help an author design a document for a given grade level.

 

Grade level defined: At its simplest, the highest grade level that has terms in the document is the grade level of the document. For example, suppose an article on electricity uses terms that are taught in grades K, 2, 3 and 4, but no higher grades. This article can only be read by someone in grade 4 or higher, because it uses 4th grade terms. But in that group (4th grade or higher) it is only using concepts that are taught in 4th grade, so it is not suitable for teaching higher grades. The article is only suitable for teaching 4th grade, so that is its grade level.
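
This definition translates directly into a small routine. The sketch below is a simplified illustration, not the search software used in our trials: `document_level` is a hypothetical name, the term table holds only four sample terms, and only single word terms are matched.

```python
import re
from typing import Optional

# The nine Virginia SOL levels, from earliest to latest.
LEVELS = ["K", "1", "2", "3", "4", "5", "6", "MS", "HS"]

# Small sample of term -> level at which the term is first taught.
TERM_LEVEL = {"magnet": "K", "electricity": "4", "circuit": "4", "magneto": "HS"}


def document_level(text: str, term_level: dict) -> Optional[str]:
    """Return the highest level among the GLS terms found in the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    found = [LEVELS.index(level) for term, level in term_level.items() if term in words]
    return LEVELS[max(found)] if found else None


print(document_level("A magnet can be made with electricity and a circuit.", TERM_LEVEL))  # 4
print(document_level("A magneto uses a spinning magnet.", TERM_LEVEL))  # HS
print(document_level("Plants need water and light.", TERM_LEVEL))  # None
```

Whole word matching keeps "magnet" from matching inside "magneto", which matters because the two terms belong to different levels.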

 

10. First and second GLS trials: the Newton test bed

 

We next conducted our first trial of computer based search using our electricity GLS, in collaboration with OSTI and their search specialist, Deep Web Technologies. It was crudely successful, demonstrating the feasibility of GLS based computer search but in need of refinement.

 

We chose as our test bed the Newton collection at Argonne National Laboratory. We chose Newton for several reasons. First, it is a DOE laboratory product that OSTI plans to include in its federation of DOE science education collections. Second, Newton is a collection of about 20,000 science questions and answers, developed over many years via what is called the DOE "Ask a scientist" program. Each Q&A is a separate document in the Newton database, which simplifies the search problem by eliminating well known problems due to searching different kinds of documents. Exploring this latter challenge is a major project technical objective.

 

For our first test we merely used single word search, without Boolean combinations. We first identified all the documents in Newton that contained at least one word found in our GLS list. We then sorted these documents into grade levels based on the highest grade level word found. The stratification of Newton documents was basically successful and we demonstrated it at DOE's Office of Scientific and Technical Information in Oak Ridge, Tennessee on Jan 3, 2008.

 

However, one problem appeared that we needed to address. We call it the "outlier problem." This problem occurs when a document is not about electricity but contains a single word that appears on our electricity GLS. Such a document may contain advanced terms in its own topic, in conjunction with a relatively elementary electricity term. It will be incorrectly assigned the more elementary grade, according to its electricity term.

 

There are several possible solutions to this outlier problem. For one, it should not appear when a GLS for all of science is used. In that case, the more advanced term from another topic should be covered by the GLS for that topic. However, it occurred to us that the problem might also be minimized simply by searching for documents that contain more than one term from the electricity GLS. That is, try to find only those documents that are actually about electricity, the true hits.

 

We and Deep Web Technologies therefore conducted a second GLS trial, requiring that documents be selected only if they contained at least two of the electricity GLS terms. This simple change greatly decreased the incidence of false hits. This means that GLS search is roughly feasible without covering the whole of science. We expect to further explore this challenge as a technical objective. For example, searches requiring 3, 4 or 5 topic terms may be sufficiently precise to rule out most false hits, without ruling out too many true hits.
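
The minimum term requirement can be sketched as a simple filter. The function names and sample documents below are hypothetical, the term set is a small sample from the fourth grade listing, and counting distinct terms rather than total occurrences is an assumption of this sketch.

```python
import re


def matched_terms(text, gls_terms):
    """Distinct single word GLS terms that occur in the text."""
    return gls_terms & set(re.findall(r"[a-z]+", text.lower()))


def is_topic_hit(text, gls_terms, min_terms=2):
    """Accept a document only if it contains at least min_terms distinct GLS terms."""
    return len(matched_terms(text, gls_terms)) >= min_terms


# Sample of terms from the fourth grade electricity listing.
ELECTRICITY_TERMS = {"electricity", "circuit", "conductor", "battery", "static"}

# Hypothetical documents: one about electricity, one outlier from another topic.
docs = {
    "on_topic": "A battery pushes electricity around a circuit.",
    "outlier": "Mitochondria act as the battery of the cell.",
}

for name, text in docs.items():
    print(name, is_topic_hit(text, ELECTRICITY_TERMS))
# on_topic True
# outlier False
```

Raising `min_terms` to 3, 4 or 5 makes the filter stricter, at the cost of rejecting short documents that are genuinely about electricity.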

 

11. Second SOL case: California electricity and magnetism

 

We also decided to begin to apply the GLS method to other state standards, to assess its general applicability. We also wanted to compare other states to Virginia, to see how much variation there might be. The state we chose was California, for several reasons. First, California is considered one of the five top education states in the country, which influences not only trends in education but the STEM content marketplace. (The other states are Michigan, Texas, Florida, and New York.)

 

Second, one of the team members taught science in California for five years. She was there when the first standardized test, called the Standardized Testing and Reporting (STAR) program, was implemented. Like Virginia, the California SOL are very detailed, which facilitates GLS. Some state SOL, New York for instance, are relatively vague in comparison. GLS in these states will probably require extensive use of SOL guidance documents and textbooks. This is a technical objective question that we may address in the future.

 

We used the same step-by-step procedure for isolating the key terms for magnetism and electricity as we did for Virginia. From the California standards the team compiled a list of the key standard terms, and then identified the related variants, dictionary terms, implied terms and guidance terms.

 

12. We discover dramatic grade level differences: VA (4, 6, MS & HS) versus CA (4 & HS only)

 

To our surprise, the list that was generated for California was quite different from the Virginia list. Not in content, that was almost the same, but in when the concepts are taught. The results showed that there can be a vast difference in when a concept is taught from one state to another. For example, electricity is taught mainly in fourth grade and high school in California whereas in Virginia it is taught in a more sequential order from kindergarten through high school with an emphasis in fourth grade, sixth grade, middle school, and high school.

 

Another difference between the two states was when a fundamental term was being introduced for the first time. For example, a fundamental term like magnetism is introduced in kindergarten in Virginia, but not until fourth grade in California.

 

The team even performed a pilot analysis that determined which terms had the greatest distance between grade levels. These great differences in when a given concept is taught pose a potentially large problem for students who move from one state to another. Some concepts will be taught twice while others will not be taught at all. Not learning a major concept can be a serious problem, because later concepts build on earlier ones. Textbooks must face a similar problem. Our GLS product might be used to alert parents and schools to this problem. Moreover, using Web based materials to catch up is an obvious solution.
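
A comparison of this kind can be sketched as below. This is not the pilot analysis itself: `grade_gaps` is a hypothetical helper, and the two small tables are illustrative, using the magnetism example given above and fourth grade as the first level for electricity in both states. Gaps are measured in positions on the Virginia level scale, which is an assumption, since middle school and high school each span several grades.

```python
# The nine Virginia SOL levels, used here as a common scale for both states.
LEVELS = ["K", "1", "2", "3", "4", "5", "6", "MS", "HS"]


def grade_gaps(state_a, state_b):
    """For terms listed in both states, return (term, gap) pairs, largest gap first.

    The gap is the level position in state_b minus the position in state_a.
    """
    shared = state_a.keys() & state_b.keys()
    gaps = {term: LEVELS.index(state_b[term]) - LEVELS.index(state_a[term]) for term in shared}
    return sorted(gaps.items(), key=lambda item: (-abs(item[1]), item[0]))


# Illustrative first-taught levels for two terms.
VIRGINIA = {"magnetism": "K", "electricity": "4"}
CALIFORNIA = {"magnetism": "4", "electricity": "4"}

print(grade_gaps(VIRGINIA, CALIFORNIA))
# [('magnetism', 4), ('electricity', 0)]
```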

 

However, this significant finding poses a problem if GLS is going to look for terms by grade level. Strictly speaking, a specific concept grade level probably does not exist for many STEM concepts in the United States; rather, grade level is a state by state matter. In some cases it may even be school specific within a state. The team believes further investigation is needed to determine if this difference extends to other states, and how many concepts are affected. Here again our method should be of great value. This may be a serious national problem, a form of incoherence.

 

Short of doing GLS for every state, we plan to try to gear our search algorithm to concept clusters, not to grades per se. The concept clusters do not appear to change much from state to state. The basics of electricity and magnetism are not state specific, only the grades in which different concepts are taught.

 

13. First math GLS -- VA probability theory

 

We also applied our GLS method to a mathematical topic, to test its generality. Some science concepts presuppose certain math concepts, so math grade level may help determine science grade level. We chose probability theory in the Virginia SOL. Like electricity, probability theory is taught from kindergarten through high school.

 

The method worked well. However, we noted one important difference in the outcome, which probably reflects the difference in abstraction between math and science. In the electricity science GLS most of the concepts are concrete technical terms, like magnet, static charge, Ohm's law, etc. In probability many of the terms are non-technical, such as event, likely and outcome. This difference might require a difference in search algorithms, requiring the presence of key technical terms for example. Of course there are some technical terms in K-12 probability theory, like probability and normal distribution, but the proportion seems relatively small compared to science. This technical objective question may be explored further in future.

 

14. College level GLS for magnetism and electricity -- a new method

 

Our next step was a big one, extending the K-12 GLS to the college level. There were several compelling reasons to do this. First, using the K-12 SOL based term lists we had, we could not distinguish high school level content from more advanced content, so our K-12 GLS was still incomplete. In order to identify high school level content we need to separate it from content that is more advanced than high school. This requires using terms that are more advanced than high school and we had none in the K-12 set. We needed college level terms to identify high school level content.
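
The reasoning above can be illustrated with a minimal sketch. It assumes, purely for illustration, that content is assigned to the highest level whose term list it matches; the report does not specify the actual search algorithm. The function name, the ordinal level numbers, the term lists and the sample text are all hypothetical.

```python
def highest_level(text, term_lists):
    """Return the highest level whose term list has a term in the text, or None."""
    lowered = text.lower()
    # Crude substring matching; an assumption, not the project's method.
    hits = [level for level, terms in term_lists.items()
            if any(term in lowered for term in terms)]
    return max(hits, default=None)


# Hypothetical term lists keyed by an ordinal level (2 = high school, 3 = college).
k12_only = {1: ["magnet"], 2: ["ohm's law"]}
with_college = {**k12_only, 3: ["phasor", "impedance"]}

text = "Apply Ohm's law to each phasor to find the circuit impedance."
print(highest_level(text, k12_only))      # 2: looks like high school content
print(highest_level(text, with_college))  # 3: college terms reveal a higher level
```

With only the K-12 lists, the advanced passage is indistinguishable from high school content. Adding a college list is what lets the two be separated.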

 

Also, we were aware from our earlier research that researchers themselves often need college level STEM content, especially when they explore new topics as part of their research. An expert in one STEM field may be a beginner in most other fields. And of course there are the many undergraduate level students and teachers who need content at the basic college level. We are interested in serving all these college level users, as well as K-12 users.

 

We therefore determined to try to extend the electricity GLS to include not merely a college level, but two undergraduate college levels, basic and advanced. This would enable it to identify basic college level content, as well as high school level content. The latter will serve the high school student and teacher, while the former will serve the researcher, as well as the basic level college student and teacher.

 

After considerable trial and error we settled on using the indexes of several popular textbooks. We used a basic electricity textbook for the basic college level terms. Several advanced level undergraduate textbooks were used for the advanced term set. These included textbooks for electrical engineering, power systems, electromechanics and electromagnetics. In addition to the index terms we added simple grammatical variants.

 

A selection issue arose at that point, for there were several thousand terms in all of the advanced indexes taken together. We felt this was probably more than necessary to distinguish advanced level content from basic level, so we just used the 500 or so advanced terms that seemed most common. Whether a larger number of advanced terms is useful will be a technical objective question. We also pared down the basic electricity index to about 100 terms. This was done by choosing terms that also appeared in one or more advanced textbooks.
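
The paring of the basic index can be sketched as a set intersection. This is a minimal illustration under stated assumptions: the report does not say how terms were matched, so the case and whitespace normalization here is an assumption, and the index entries are invented.

```python
def normalize(term):
    """Lower-case a term and collapse whitespace so index entries compare cleanly."""
    return " ".join(term.lower().split())


def pare_basic_terms(basic_index, advanced_indexes):
    """Keep only basic-index terms that also occur in at least one advanced index."""
    advanced = set()
    for index in advanced_indexes:
        advanced.update(normalize(t) for t in index)
    return sorted({normalize(t) for t in basic_index} & advanced)


# Hypothetical index entries, invented for illustration only.
basic_index = ["Ohm's law", "Capacitor", "battery", "Resistance"]
advanced_indexes = [
    ["capacitor", "impedance", "ohm's law"],
    ["resistance", "phasor"],
]

pared = pare_basic_terms(basic_index, advanced_indexes)
print(pared)  # ['capacitor', "ohm's law", 'resistance']
```

In this toy data "battery" is dropped because it appears in no advanced index.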

 

It is an interesting research issue to determine whether this GLS approach could be used to distinguish advanced undergraduate college level content from the most advanced content, that used in postgraduate studies and actual research results, such as journal literature. It is not at all clear that this professional level research activity uses terms that are all that distinct from the advanced undergraduate terms, but it may.

 

 

E. Putting it all together: a 3-D spiral model of K-12 STEM education

 

In K-12 STEM education it is common to talk about "spiraling" in the context of multiple topics, each being taught progressively over a series of grades. Our GLS development work supports a new, precision approach to the search, modeling, analysis and visualization of this important concept. How STEM concepts are clustered by grade, which topics are taught when, and how this differs from state to state, are all part of spiraling. This is explained below, in terms of a simple 3D spiral model.

 

First divide the science to be taught into, say, 30 topics, each of which is taught progressively in several grades over the K-12 period. Electricity is one topic and a complete topic listing is given later in this report. Let a vertical pole 6' high represent each topic. Place the poles in parallel, standing on the floor, making a cluster of 30 vertical poles. Next divide each pole into vertical segments, say 20 per pole, each of which will represent a group of concepts that are normally taught together. In electricity, one group might be conduction, another Ohm's law.

 

Label the segments in sequence so that the concepts taught earliest for each topic are closest to the floor, and progress upward to the last concepts at the top end of each pole. The sequence of segments represents the fact that many concepts have to be taught sequentially. Assume for now that there is only one such sequence for each topic. We now have 30 poles with 20 segments each, or 600 segments in all. This is everything that will be taught in K-12 science. Given that only one thing can be taught at a time, the question is how to work through all the topics and all the segments, step by step?

 

Now let a string represent the actual sequence of teaching of each segment and topic. In effect the string represents the student's learning experience, studying one segment after another and moving from topic to topic. Attach the string to each of the 600 segments in the order in which these are taught. The basic model is now complete. Spiraling refers to the fact that the string will leave a given pole for a period of time then return, then it leaves again, returns again, and so on.

 

The amount of spiraling can vary enormously, depending on the overall sequence of teaching. At one extreme, suppose each pole is completely taught before another is begun. In this case the string would only jump 29 times, from the top of each pole (except the last) to the bottom of another. At the other extreme the string would jump to another pole after every segment, or 599 times.
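
The two extremes can be checked with a short sketch. It assumes a teaching sequence is represented as a list of pole (topic) numbers, one entry per segment taught; that representation and the function name are illustrative, not part of the project's tools.

```python
def count_jumps(sequence):
    """Count how often consecutive teaching steps fall on different poles."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


POLES, SEGMENTS = 30, 20

# Extreme 1: teach every segment of a pole before starting the next pole.
pole_by_pole = [pole for pole in range(POLES) for _ in range(SEGMENTS)]

# Extreme 2: move to a different pole after every single segment.
round_robin = [pole for _ in range(SEGMENTS) for pole in range(POLES)]

print(count_jumps(pole_by_pole))  # 29
print(count_jumps(round_robin))   # 599
```

Any real curriculum corresponds to some other ordering of the same 600 entries, so its jump count falls between these two values.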

 

In any given curriculum the amount of jumping is normally somewhere in between these two extremes, perhaps several hundred jumps. All of this jumping probably contributes significantly to the difficulty of learning. But if we minimize jumping by teaching more segments on each pole at once, before we jump to another pole, then we will maximize the time before we return to a pole once we leave it. This too will contribute to the difficulty of learning. Either way there is a potential problem. This is the dilemma of spiraling and it is fundamental to STEM education.

 

Moreover, the number of possible paths for the string, or sequences of concepts taught, is enormous. This means that two different curricula can have very different string sequences, even though they cover exactly the same material. Students moving from one to another will be taught some concepts twice and others not at all. Missing a major concept can be very troubling if later concepts depend upon it, as often happens. Students moving from state to state, or from one school system to another, may face this problem. So do textbooks, which cannot fit all the different spiraling patterns required by different states. This is probably a national problem of some significance, which may contribute substantially to the challenge of STEM education.

 

This simple spiral model shows just how complex technical education is. Note too that this model applies to math as well as science. Combining the two gives perhaps a 60 pole model. It may also apply to the other parts of the curriculum, such as reading, history, etc. The overall model is very complex, but that is just how the reality is. Teachers, students and content developers need to understand this.

 

 

 

 

F. From grade levels to learning levels

 

To solve the spiraling problem we have switched from grade levels to what we call "learning levels," to rank educational content from elementary to advanced. There are 10 levels, with level 1 being the most elementary and level 10 the most advanced level.

 

These 10 learning levels are presently based on the American grade level system. This ranges from kindergarten through the 12 primary and secondary grades, to the 4 undergraduate college "grades." This K-16 grade system thus has 17 grade levels in all. Our 10 levels span these 17 grades.

 

Ideally we would rank educational content by the grade at which it is taught. But the same content is taught in different grades in different schools, and even to different students in the same school, so this is not possible. There is no unique grade-to-content connection.

 

This is why we have created the 10 learning levels. Roughly speaking, each learning level corresponds to the average grade level range at which the content is taught in the USA. However, learning levels are averages, not actual levels, as explained below.

 

Learning Levels 1-10 are based on the K-16 grade ranges, as follows:

Learning Level 1 = grades K&1

Levels 2 through 6 = grades 2 through 6

Level 7 = middle school or junior high school

Level 8 = high school

Level 9 = basic undergraduate college

Learning Level 10 = advanced (BS degree) undergraduate college

 

In many cases we have used grade ranges, like middle school and basic college, rather than exact grades. This is because our data consists of state standards of learning for grades K-12 and college textbooks for the college grades. These sources only determine which content is used in the indicated ranges, not by exact grade.

 

Note too that the same content may be taught in different grades and grade ranges in different states. Thus the learning level is an average of various grades and grade ranges, across different states. This is somewhat confusing because there are two ranges in question. Middle school and high school are two grade ranges. But a given concept may be taught in both, so the range of grades for that concept may span both grade ranges.

 

But the learning level is not a normal average, where the average value is also the most common. This means that assigning a concept to a given learning level does not mean it is usually taught in the corresponding grade or grade range, although it may be. For example, if a concept is normally taught in either 4th or 6th grade it will be assigned to level 5, even though it ranges over grades 4 to 6, and is seldom taught in grade 5. Our model is not as simple as it may look.
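
The 4th/6th grade example can be worked through in a short sketch. The grade-to-level table follows the list given earlier in this section; the function names and the use of a plain arithmetic mean are assumptions, since the report does not state exactly how the average is computed or rounded.

```python
# Named grade ranges and their learning levels, as listed in this section.
NAMED_RANGES = {
    "middle school": 7,
    "high school": 8,
    "basic college": 9,
    "advanced college": 10,
}


def grade_to_level(grade):
    """Map a grade (0 = kindergarten, 1-6) or a named range to a learning level."""
    if isinstance(grade, str):
        return NAMED_RANGES[grade]
    if grade in (0, 1):
        return 1          # Learning Level 1 = grades K and 1
    if 2 <= grade <= 6:
        return grade      # Levels 2 through 6 = grades 2 through 6
    raise ValueError("grades above 6 must be given as a named range")


def learning_level(grades_taught):
    """Average the levels of all grades in which a concept is taught (assumed mean)."""
    levels = [grade_to_level(g) for g in grades_taught]
    return sum(levels) / len(levels)


# A concept taught in 4th grade in some states and 6th grade in others.
print(learning_level([4, 6]))  # 5.0, although no state teaches it in grade 5
```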

 

The learning level assigned to educational content is just an average over many different schools and states. A user who is seeking content for use in a particular school and grade may have to look at a higher or lower learning level in order to find it in our system.

 

 

G. Conclusion: Our project demonstrates high potential and the feasibility of grade level stratification of STEM educational content

 

To return to the topic of Web based search, the complexity of the spiraling model explains why Web based search for specific science content is so important. Teachers, parents and students need very specific content at every step of the way. But today's search engines do not do this job. This is a global problem of huge proportions. Simply put, there is no reasonable way for teachers, parents or students to search on the Web for content that is suited for a specific grade level or degree of difficulty. There are thousands of small collections scattered through the Web, including a host of Federally funded content. The only way to search them is one by one, which is prohibitively difficult.

 

The Web only works where search works. At the present time Web search does not work for science education. Our effort is designed to develop the tools needed to solve this problem.

 

We believe that our project to date clearly demonstrates the feasibility of developing GLS products and services with high potential for use. In fact our results provide a significant new understanding of the structure of STEM education, including new ways to measure and improve that structure. The proposed project to develop a full scale GLS for STEM education is described below.

 

 

Appendix I

 

Basic topics for a complete K-16 science education GLS system

 

A. Physical Science

Topic:

 

1. Basic principles of electricity and magnetism (done).

2. The basic nature of matter.

3. Models of atomic structure.

4. Chemical properties and use of the periodic table of elements.

5. Changes in matter and the Law of Conservation of Matter and Energy.

6. States and forms of energy and how energy is transferred and transformed.

7. Temperature scales, heat, and heat transfer.

8. Principles and technological applications of work, force, and motion.

9. Characteristics of sound and technological applications of sound waves.

10. The nature and technological applications of light.

 

 

B. Earth Science

 

Topic:

 

11. Characteristics of the Earth and the solar system.

12. Renewable and nonrenewable resources.

13. Composition and dynamics of the atmosphere.

14. How energy transfer from Sun to Earth drives weather and climate.

15. Freshwater resources and the water cycle.

16. Oceans as complex, interactive physical, chemical, and biological systems.

17. Rock-forming and ore minerals, and the rock cycle.

18. Geologic processes including plate tectonics.

19. History and evolution of the Earth and life, based on rocks and fossils.

20. Origin and evolution of the universe.

 

 

C. Life Science

 

Topic:

 

21. Cell theory.

22. Patterns and structures of cellular organization in organisms.

23. How organisms differ and can be classified.

24. Basic needs of organisms.

25. Photosynthesis and its importance to plant and animal life; the carbon cycle.

26. Interactions among members of a population.

27. Interactions among populations in a biological community; food chains and webs.

28. Adaptation to biotic and abiotic factors in an ecosystem.

29. Dynamics of ecosystems, communities, populations, and organisms.

30. Ecosystem dynamics and human activity.

31. How organisms reproduce and transmit genetic information to new generations.

32. Evolution and how organisms change over time.

 

 

 

 

 

Appendix II

 

Principal Investigator and other Key Personnel

 

Dr. David Wojick, the Principal Investigator, is an expert on the Web-based diffusion of scientific knowledge and the concept structure of science and technology. Diane Adams is a Web designer and Web research expert. Bernadette Monahan is a science teacher who specializes in educational technology and collecting science education Web content.

 

Wojick and Adams have been a team since 1976. During that period they have conducted or participated in a number of large scale projects that required developing new methods of search and analysis, in the context of real world applications. Much of this work has been done for the Federal government. Examples include the following:

 

a. Use of word search to identify hidden clusters and paths in Naval Research (Office of Naval Technology).

 

b. Use of word search to identify basic regulatory mechanisms in federal regulations (Office of Information and Regulatory Affairs, OMB).

 

c. Coherence analysis diagnostic system of 126 kinds of confusion in technical texts (Department of Commerce).

 

d. A method to measure allocation of content to scientific topics in Web pages (Office of Scientific and Technical Information, DOE).

 

e. Population modeling of the diffusion of scientific knowledge (Office of Scientific and Technical Information, DOE).