Open Corpus Personalized Learning
Goal: This project challenges the assumption that adaptive hypermedia systems require expensive knowledge engineering for domain and content modeling. It replaces carefully crafted domain models with automatically created ones, lowering the cost of developing adaptive educational hypermedia software while also providing a wider range of instructional paths through the content. Adaptive educational hypermedia is known for its ability to improve learning outcomes and engagement, maximizing educational opportunity for learners with different levels of knowledge. The development of this more automatic, open-corpus approach to adaptive educational hypermedia will increase the volume and variety of resources available for meaningful online learning, especially for individuals learning on their own. Automatic knowledge indexing of educational content makes the system easy to maintain and update over time. These new open corpus user modeling techniques automatically adapt user models and personalized guidance to new materials as they are acquired. The ability to automatically organize, index, and adaptively recommend distributed educational content without manual processing by system developers enables new material to be integrated dynamically and with minimal effort in response to student needs.
This project merges research on text analysis, human learning, and personalization to enable open corpus personalized learning. It develops its models of the domain and human learning from an initial set of well-organized, manually selected materials. Automatic text analysis creates an ensemble of domain models with different characteristics. Each individual model may be flawed or incomplete; collectively, however, they provide comprehensive coverage of the topic from several perspectives, thus reducing the manual effort required to create adaptive educational hypermedia. Multiple perspectives also give the system more flexibility in how to guide each student. These domain models are used as a foundation for building and maintaining dynamic models of user knowledge. The ensemble of domain and user models is used to deliver reactive and proactive adaptive guidance in an open corpus context. The growth of a person's knowledge is inferred by observing learner behavior and obtaining occasional feedback. This exploratory research opens the way to open corpus personalized learning. The domain modeling, user modeling, and personalization techniques developed in this research will be evaluated using a multi-layer framework that includes assessment by subject experts, performance prediction, cross-validation, and user studies.
 The Project Team
 Knowledge Extraction
The Internet has dramatically increased both the volume and variety of online educational resources, such as online textbooks, online courses, and tutorials. The development of modern search techniques has made these resources even quicker to access. However, most of these educational resources are not well structured, which poses an important challenge: readers without sufficient background knowledge may find their content difficult to understand. To recommend the right content for each individual's knowledge level, the first critical step is to organize educational resources better. The project envisions two important components of organizing educational resources: (1) knowledge concept extraction; and (2) concept hierarchy extraction. Traditional solutions to these two problems rely heavily on experts' manual effort, which is time-consuming and does not scale.
Our goal for knowledge extraction is to provide a scalable solution to the above two problems. We pilot our study by extracting knowledge structures from textbooks, since they provide a comprehensive list of concepts and are often used as major educational resources in schools, colleges, and universities. In addition, textbooks come with structural information such as tables of contents and glossaries, which is very helpful in identifying concepts and their relationships.
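To illustrate how a table of contents can seed a concept hierarchy, here is a minimal sketch. It is not the project's extraction pipeline. It assumes each table-of-contents line starts with a dotted section number (for example "1.2 Inverted Index"); the function name `build_toc_hierarchy` and the dictionary layout are hypothetical choices made for this example.

```python
import re

def build_toc_hierarchy(toc_lines):
    """Turn numbered table-of-contents lines into a nested tree.

    Assumption: each useful line looks like '1.2 Inverted Index', i.e. a
    dotted section number followed by a title. The nesting depth is the
    number of components in the section number.
    """
    root = {"title": "ROOT", "children": []}
    stack = [(0, root)]  # (depth, node) pairs along the current path
    for line in toc_lines:
        match = re.match(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.+?)\s*$", line)
        if not match:
            continue  # skip lines that do not follow the assumed pattern
        number, title = match.groups()
        depth = number.count(".") + 1
        node = {"title": title, "children": []}
        # Pop back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root

# Hypothetical table of contents used only for illustration.
toc = [
    "1 Information Retrieval",
    "1.1 Boolean Retrieval",
    "1.2 Inverted Index",
    "2 Evaluation",
    "2.1 Precision and Recall",
]
tree = build_toc_hierarchy(toc)
for chapter in tree["children"]:
    print(chapter["title"], [c["title"] for c in chapter["children"]])
```

Section titles obtained this way are only candidate concepts; a glossary, where available, could be used to filter or extend them.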
 Learner Modeling
We have recently proposed a data-driven framework for dynamic knowledge modeling in textbook-based learning (UMAP 2016). We formulated the problem of modeling learning from reading as a reading-time prediction problem, reconstructed existing popular student models (such as Knowledge Tracing), and explored two automatic text analysis approaches (bag-of-words-based and latent semantic-based) to build the KC model. This framework can be applied to the broader context of open-corpus personalized learning, empowering learners to access the right reading content at the right moment despite the huge volume of online educational content. We are also working on applying Feature-Aware Student knowledge Tracing (FAST), our learner model proposed in 2014 with state-of-the-art predictive performance, to textbook-based learning environments.
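For readers unfamiliar with Knowledge Tracing, the sketch below shows the classic Bayesian Knowledge Tracing update for a single knowledge component with binary (correct/incorrect) observations. It is the textbook form of the model, not the reading-time reconstruction described above, and the parameter values are arbitrary placeholders chosen for illustration.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One step of standard Bayesian Knowledge Tracing.

    p_known: prior probability that the learner knows the skill.
    correct: whether the observed response was correct.
    p_slip, p_guess, p_learn: illustrative placeholder parameters.
    Returns the probability of knowing the skill after this step.
    """
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_total = evidence_known + (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_total = evidence_known + (1 - p_known) * (1 - p_guess)
    # Posterior given the observation (Bayes' rule).
    posterior = evidence_known / evidence_total
    # Account for the chance of learning the skill during this step.
    return posterior + (1 - posterior) * p_learn

def bkt_predict_correct(p_known, p_slip=0.1, p_guess=0.2):
    """Predicted probability of a correct response on the next attempt."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess

# Trace a hypothetical learner through three observations.
p = 0.5
for observed in [True, False, True]:
    p = bkt_update(p, observed)
    print(round(p, 3), round(bkt_predict_correct(p), 3))
```

In a reading setting the binary observation would have to be replaced by a signal derived from reading behavior, which is the part this sketch leaves out.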
Over the past years, our lab has developed CUMULATE, a centralized user modeling server built for the ADAPT2 architecture, to provide user modeling support for adaptive educational hypermedia (AEH) systems. We have proposed and implemented different learner models, including asymptotic assessment of user knowledge. We have explored several ways to improve learner modeling, including reducing the content model, better evaluation for practitioners, and applying network (graph) analysis.
 The Experimental Platform
To support students' learning in the classroom environment, we have implemented a web platform that gives students access to class materials, including textbooks, research publications, and web tutorials. More importantly, the system automatically records users' reading behaviors so that their student models can be built from this data. Currently, the reading system is able to render material in two formats:
- HTML-based (not tested yet in classroom studies)
We plan to support PDF document rendering soon.
The reading system consists of two main parts:
- The reader itself (see right side of the figure)
- The student reading data section (see left side of the figure)
In the student reading data section, users have access to two information sources. The first (see upper section) is a sunburst hierarchical visualization that shows their progress through the course readings, using a color scale from red (unread) to green (completely read). The earlier version of this visualization tool is called ReadingCircle. The second (see lower section) is the hierarchical index of the group.
The system is designed to hold learning material in a hierarchical structure similar to that of a book (chapter, subchapter, section, etc.). In addition, it allows multiple-choice questions to be included at the end of each section to test the knowledge students have acquired.
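The progress coloring described above can be sketched as follows. This is only an illustration under stated assumptions, not the platform's code: it assumes each leaf section stores a `read` fraction between 0 and 1, that a parent's progress is the unweighted mean of its children, and that colors are interpolated linearly in RGB. The names `progress` and `progress_color` are hypothetical.

```python
def progress(node):
    """Reading progress of a node in the chapter/section hierarchy.

    Assumption: leaves carry a 'read' fraction in [0, 1]; a parent's
    progress is the unweighted mean of its children's progress.
    """
    children = node.get("children", [])
    if not children:
        return node.get("read", 0.0)
    return sum(progress(child) for child in children) / len(children)

def progress_color(fraction):
    """Map progress to a hex color from red (0.0) to green (1.0)."""
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the valid range
    red = round(255 * (1 - fraction))
    green = round(255 * fraction)
    return f"#{red:02x}{green:02x}00"

# Hypothetical course structure used only for illustration.
course = {
    "title": "Course readings",
    "children": [
        {"title": "Chapter 1", "children": [
            {"title": "Section 1.1", "read": 1.0},
            {"title": "Section 1.2", "read": 0.0},
        ]},
        {"title": "Chapter 2", "read": 0.5},
    ],
}
for chapter in course["children"]:
    print(chapter["title"], progress_color(progress(chapter)))
```

Weighting each section by its length would be a natural alternative to the unweighted mean used here.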
- Huang, Yun and Yudelson, Michael and Han, Shuguang and He, Daqing and Brusilovsky, Peter. "A Framework for Dynamic Knowledge Modeling in Textbook-Based Learning." In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, pp. 141-150. ACM, 2016. (paper).
- Meng, Rui and Han, Shuguang and Huang, Yun and He, Daqing and Brusilovsky, Peter. "Knowledge-based Content Linking for Online Textbooks." In Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 13-16. IEEE Computer Society, 2016. (paper).