Social semiotics terms the immediate environment in which a text functions the ‘context of situation’ – an instance of the context of culture. The context of situation is defined by three parameters, FIELD, TENOR and MODE, which can be operationalized as the WHAT, the WHO and the HOW of a text functioning in a science classroom (Knain, 2015). Text and context mutually enable and constrain each other in acts of meaning. For something to be a text, it must both hang together internally and cohere externally in terms of the three contextual parameters (Halliday & Hasan, 2013). In this paper, we argue that although group work in science classes can be seen as joint text development, what is actually developed is often not a single text but a trajectory of different multimodal texts, each with its own text-context relationship. This is because the students sometimes jump between different topics, which point to different values of the contextual parameters. We present an analysis of video-recorded student group work in which the students produce a trajectory of multimodal texts and move between different contexts of situation, as judged by the values of the contextual parameters. There is, however, one main thread to which they repeatedly return. This thread is both internally cohesive and coherent with a (developing) context of situation, and thus constitutes a text. Our analyses suggest that one factor that helps the students return to this main thread is a drawing that they produce. A number of aspects of visual grammar, including framing, foregrounding and backgrounding, are used as indications of the continuous transformation of both the text and its context of situation. We suggest that this process of multimodal text development is likely to be characteristic of learning trajectories