Community-Building and Infrastructure Design for Data-Intensive Research in Computer Science Education

From PAWS Lab


Online educational systems, and the large-scale data streams that they generate, have the potential to transform education as well as our scientific understanding of learning. Computer Science Education (CSE) researchers are increasingly making use of large collections of data generated by the click streams coming from eTextbooks, interactive programming environments, and other smart content. However, CSE research faces barriers that slow progress:

  • Learning process and outcome data collected by one computer science learning system are not compatible with data collected by other systems.
  • Data from computer science problem solving and learning (e.g., open-ended coding solutions to complex problems) are quite different from the type of data (e.g., discrete answers to questions or verbal responses) that current educational data mining focuses on.

The project goal is to build community and capacity among CSE researchers, data scientists, and learning scientists in order to reduce these barriers and realize the full potential of data-intensive research on learning and on improving computer science education. We are bringing together CSE tool-building communities with learning science and technology researchers to create a software infrastructure that supports scaled and sustainable data-intensive research in CSE and contributes to the basic science of how humans learn complex problem solving. This goal is being pursued through a set of community-building and infrastructure capacity-building activities that aim to develop and disseminate infrastructure facilitating three aspects of CSE research:

  • development and broader re-use of innovative learning content that is instrumented for rich data collection,
  • formats and tools for analysis of learner data,
  • best practices to make large collections of learner data and associated analytics available to researchers in CSE, data science, or learning science.

We hope to engage a large community of researchers to define, develop, and use critical elements of this infrastructure to address specific data-intensive research questions. We are hosting workshops, meetings, and online forums, leveraging existing communities and building new capacities toward significant research outcomes and lasting infrastructure support. This is a collaborative project with Carnegie Mellon University and Virginia Tech teams. More information about the project, as well as project materials, can be found on the project home page.

Call for Collaborators: What we can offer

Reusable Smart Learning Content

We have many types of “smart learning content” (various interactive problems and examples) for Java, Python, and SQL. Content can be reused in other classes individually, or as a package through the personalized Mastery Grids interface.

Adaptive practice system

We have an infrastructure for building flexible practice systems (which students can use on their own) for Java, Python, and SQL. The system can be set up to support any course in the field: you define a sequence of topics, assign content to topics, and receive a ready-to-use system with an adaptive interface. The system supports our own smart content and several types of interoperable content created by our collaborators. In total, there are 7 types of content for Python, 6 types for Java, and 6 types for SQL. We can help you create custom support tailored to your course. The system can be used to run extensive classroom studies and collect data. Read more about it here. Also, watch (not the most recent) demo at
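To make the "sequence of topics, content assigned to topics" idea concrete, here is a minimal sketch in Python of what such a course definition might look like as data. It is only an illustration: the `Topic` and `Course` classes, the `add_topic` and `assign` methods, and the content identifiers are all hypothetical and are not the actual format or API of the practice system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """One step in the course sequence (hypothetical structure)."""
    name: str
    content_ids: List[str] = field(default_factory=list)


@dataclass
class Course:
    """An ordered sequence of topics for one course (hypothetical structure)."""
    title: str
    language: str
    topics: List[Topic] = field(default_factory=list)

    def add_topic(self, name: str) -> Topic:
        # Topics keep the order in which they are added.
        topic = Topic(name)
        self.topics.append(topic)
        return topic

    def assign(self, topic_name: str, content_id: str) -> None:
        # Attach one piece of smart content to an existing topic.
        for topic in self.topics:
            if topic.name == topic_name:
                topic.content_ids.append(content_id)
                return
        raise KeyError(f"unknown topic: {topic_name}")


# Hypothetical course with made-up content identifiers.
course = Course(title="Intro to Programming", language="Python")
course.add_topic("Variables")
course.add_topic("Loops")
course.assign("Variables", "example-001")
course.assign("Loops", "problem-017")
course.assign("Loops", "example-042")

outline = [(t.name, t.content_ids) for t in course.topics]
print(outline)
```

In this sketch the course author only supplies the topic order and the topic-to-content mapping; everything else (the adaptive interface, data collection) would be provided by the system itself.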

Call for Collaborators: What we are looking for

More Smart Learning Content!

We are interested in collaborating with other developers of smart learning content to make it more interoperable and to feature it in our personalized systems for Python, Java, and SQL.

More Studies with Data Collection

Our ultimate goal is to learn more about how people learn programming and to better support them with adaptive educational systems. To get there, we need to run more studies of different types of smart learning content and different personalization approaches. We want to partner with instructors of Java, Python, and database classes who are interested in engaging their students in practicing computer science concepts with different types of smart learning content. See what we have in the “What we can offer” section above and get in touch! It is a great chance to make interesting discoveries and publish good papers. In particular, we have just developed the PCLab system, with several new types of learning content that support learning program construction skills, the area for which we have the least data. We are especially interested in running studies of this new content. See PCLab demos for Python ( and Java (