Difference between revisions of "CUMULATE user and domain adaptive user modeling"

(Publication)
m (Procedures)
 
(32 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{ambox| text='''This page is under construction. More content will be added soon'''|type=exclamation}}
 
 
 
This stream of work is aimed at improving [[CUMULATE]]'s [[CUMULATE asymptotic knowledge assessment|legacy]] ''one-fits-all'' algorithm for  modeling user's problem-solving activity and creating a context-sensitive user modeling algorithm adaptable/adaptive to individual users' cognitive abilities as well as to individual problem complexities.
 
This stream of work is aimed at improving [[CUMULATE]]'s [[CUMULATE asymptotic knowledge assessment|legacy]] ''one-fits-all'' algorithm for  modeling user's problem-solving activity and creating a context-sensitive user modeling algorithm adaptable/adaptive to individual users' cognitive abilities as well as to individual problem complexities.
  
Line 6: Line 4:
  
 
==Study 1==
 
==Study 1==
This study involves retrospective comparative evaluation of the [[CUMULATE]]'s [[CUMULATE asymptotic knowledge assessment|legacy]] and [[CUMULATE parametrized asymptotic knowledge assessment|parametrized]] user modeling algorithms. The evaluation is done using [[CUMULATE Usage Logs|usage logs]] collected from 6 Database Management courses offered during Fall 2007 and Spring 2008 semesters at the [http://www.pitt.edu University of Pittsburgh], [http://www.ncirl.ie National College of Ireland], and [http://www.dcu.ie Dublin City University]. Each course had roughly the same structure and an identical set of problems served by [[SQLKnoT]] system.
+
This study involves retrospective comparative evaluation of the [[CUMULATE]]'s [[CUMULATE asymptotic knowledge assessment|legacy]] and [[CUMULATE parametrized asymptotic knowledge assessment|parametrized]] user modeling algorithms. The evaluation is done using [[CUMULATE_Usage_Logs#Database_Management_6_Course_Dataset|usage logs]] collected from 6 Database Management courses offered during Fall 2007 and Spring 2008 semesters at the [http://www.pitt.edu University of Pittsburgh], [http://www.ncirl.ie National College of Ireland], and [http://www.dcu.ie Dublin City University]. Each course had roughly the same structure and an identical set of problems served by [[SQLKnoT]] system.
  
 
=== Scenario ===
 
=== Scenario ===
 
We were comparing [[CUMULATE asymptotic knowledge assessment|legacy]] CUMULATE algorithm and 3 versions of [[CUMULATE parametrized asymptotic knowledge assessment|parametrized]] algorithm. The versions differed in the parameters used for user modeling.
 
We were comparing [[CUMULATE asymptotic knowledge assessment|legacy]] CUMULATE algorithm and 3 versions of [[CUMULATE parametrized asymptotic knowledge assessment|parametrized]] algorithm. The versions differed in the parameters used for user modeling.
 
* First, was an attempt to shadow the [[CUMULATE asymptotic knowledge assessment|legacy]] algorithm by ''guessing'' the best parameters for modeling, without discriminating individual user and problem differences.
 
* First, was an attempt to shadow the [[CUMULATE asymptotic knowledge assessment|legacy]] algorithm by ''guessing'' the best parameters for modeling, without discriminating individual user and problem differences.
* The second version, did not discriminate users/problems as well. However, the parameters were obtained by fitting the global user parameter and a global problem parameter signature and then using them in model.
+
* The second version, did not discriminate users/problems as well. However, the parameters were obtained by fitting one global user parameter and one global problem parameter signature and then using them for the modeling.
 
* The third version of the [[CUMULATE parametrized asymptotic knowledge assessment|parametrized]] algorithm worked with a set of user specific parameters and problem specific parameter signatures for the modeling.
 
* The third version of the [[CUMULATE parametrized asymptotic knowledge assessment|parametrized]] algorithm worked with a set of user specific parameters and problem specific parameter signatures for the modeling.
  
Line 17: Line 15:
 
[http://en.wikipedia.org/wiki/Accuracy Accuracy] and [http://en.wikipedia.org/wiki/Sum_of_squared_error SSE] (sum of squared error) were used as the metrics of comparison. They were computed overall for each of the 6 semester logs and 4 versions of algorithms.
 
[http://en.wikipedia.org/wiki/Accuracy Accuracy] and [http://en.wikipedia.org/wiki/Sum_of_squared_error SSE] (sum of squared error) were used as the metrics of comparison. They were computed overall for each of the 6 semester logs and 4 versions of algorithms.
  
In the case of [[CUMULATE asymptotic knowledge assessment|legacy CUULATE]] and [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]-best-guess algorithms, the data was taken as it. The globally parametrized, and individually paramatrized CUMULATE algorithms were supplied with the pre-fit global/individual user/problem-specific parameters. The data of only one of the early courses was used to obtain the parameters. Data of all 6 was used to compute parametrized models. Refer to the table below for details and basic log statistics statistics.
+
In the case of [[CUMULATE asymptotic knowledge assessment|legacy CUMULATE]] and [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]-best-guess algorithms, the data was taken as it. The globally parametrized, and individually paramatrized CUMULATE algorithms were supplied with the pre-fit global/individual user/problem-specific parameters. The data of only one of the early courses was used to obtain the parameters. Data of all 6 was used to compute parametrized models. Refer to the table below for details and basic log statistics statistics.
  
 
{| border="1"
 
{| border="1"
Line 83: Line 81:
 
|22.82
 
|22.82
 
|}
 
|}
C<sub>L</sub> - [[CUMULATE asymptotic knowledge assessment|legacy CUULATE]]'s algorithm, C<sub>p:bg</sub> - [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm with guessed global parameters, C<sub>p:glob</sub>- [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm fit global user/problem parameters, C<sub>p:ind</sub> - [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm with individual user/problem parameters, C<sub>p:ind/glob</sub> - [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm with individual problem parameters, and global user parameter
+
C<sub>L</sub> - [[CUMULATE asymptotic knowledge assessment|legacy CUMULATE]]'s algorithm, C<sub>p:bg</sub> - [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm with guessed global parameters, C<sub>p:glob</sub>- [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm fit global user/problem parameters, C<sub>p:ind</sub> - [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm with individual user/problem parameters, C<sub>p:ind/glob</sub> - [[CUMULATE parametrized asymptotic knowledge assessment|parametrized CUMULATE]]'s algorithm with individual problem parameters, and global user parameter
  
 
=== Results ===
 
=== Results ===
 +
As we can see from the figures below, neither guessed not globally-fit parameters in a new [[CUMULATE parametrized asymptotic knowledge assessment|parametrized]] algorithm give it an edge. Individual problem-specific parameters, however, do give it an advantage. In majority of the cases this advantage is significant (with exception to the only graduate level course offered at [http://www.pitt.edu Pitt] in Fall 2007 semester)
 +
{|
 +
|
 +
[[Image:CUMULATE legacy vs parameterized - sql 6 semester dataset - accuracy.png]]
 +
<br/>([http://chart.apis.google.com/chart?cht=lxy&chs=300x240&chd=t:-1|61,32,63,65|-1|39,39,44,46|-1|40,40,45,54|-1|50,50,50,65|-1|66,66,66,75|-1|47,42,55,57&chco=24588E,4C9B46,F3A030,CF1E2B,78387B,7C807F&chxt=x,y&chxl=1:|.3|.4|.5|.6|.7|.8|0:|Cl|Cp:bg|Cp:al|Cp:ea/al&chxp=1,30,40,50,60,70,80|&chds=30,80&chxr=1,30,80,10&chm=o,24588E,0,-1,10|o,4C9B46,1,-1,10|o,F3A030,2,-1,10|o,CF1E2B,3,-1,10|o,78387B,4,-1,10|o,7C807F,5,-1,10&chg=-1,10&chdl=Pitt+08-1+U|Pitt+08-1+G|Pitt+08-2+U|NCI+08-2+U|NCI+08-2+U|DCU+08-2+U&chdlp=r&chtt=Algorithm+prediction+accuracy|for+different+semesters alternatively] via [http://code.google.com/apis/chart/ Google Chart API])
 +
|
 +
[[Image:CUMULATE legacy vs parameterized - sql 6 semester dataset - SSE.png]]
 +
<br/>([http://chart.apis.google.com/chart?cht=lxy&chs=300x240&chd=t:-1|33,67,28,28|-1|57,61,46,46|-1|59,60,45,42|-1|50,50,36,30|-1|34,34,24,21|-1|46,58,33,35&chco=24588E,4C9B46,F3A030,CF1E2B,78387B,7C807F&chxt=x,y&chxl=1:|.2|.3|.4|.5|.6|.7|0:|Cl|Cp:bg|Cp:al|Cp:ea/al&chxp=1,20,30,40,50,60,70|&chds=20,70&chxr=1,20,70,10&chm=o,24588E,0,-1,10|o,4C9B46,1,-1,10|o,F3A030,2,-1,10|o,CF1E2B,3,-1,10|o,78387B,4,-1,10|o,7C807F,5,-1,10&chg=-1,10&chdl=Pitt+08-1+U|Pitt+08-1+G|Pitt+08-2+U|NCI+08-2+U|NCI+08-2+U|DCU+08-2+U&chdlp=r&chtt=Algorithm+SSE|for+different+semesters alternatively] via [http://code.google.com/apis/chart/ Google Chart API])
 +
 +
|}
  
 
=== Publication ===
 
=== Publication ===
 
The results were presented during the [http://www.i-fest.pitt.edu/ i-fest 2009] poster competition at the [http://www.ischool.pitt.edu/ School of Information Sciences], [http://www.pitt.edu/ University of Pittsburgh] and got the second place in the PhD track. Poster can be accessed [http://www.pitt.edu/~mvy3/assets/i-fest%202009%20yudelson.pdf here].
 
The results were presented during the [http://www.i-fest.pitt.edu/ i-fest 2009] poster competition at the [http://www.ischool.pitt.edu/ School of Information Sciences], [http://www.pitt.edu/ University of Pittsburgh] and got the second place in the PhD track. Poster can be accessed [http://www.pitt.edu/~mvy3/assets/i-fest%202009%20yudelson.pdf here].
  
=== References ===
+
=== Addendum ===
 +
 
 +
MATLAB code used for computation can be downloaded [[Media:Cumulate legacy vs user n domain adaptive study1 matlab.src.zip|here]].
 +
 
 +
[[CUMULATE]] log data for the 6 Database Management Courses can be accessed [[CUMULATE_Usage_Logs#Database_Management_6_Course_Dataset|here]]
  
 
= Contacts =
 
= Contacts =
 
[[User:Myudelson|Michael V. Yudelson]]
 
[[User:Myudelson|Michael V. Yudelson]]

Latest revision as of 19:26, 6 May 2010

This stream of work is aimed at improving CUMULATE's legacy one-size-fits-all algorithm for modeling users' problem-solving activity and at creating a context-sensitive user modeling algorithm that is adaptable/adaptive to individual users' cognitive abilities as well as to individual problem complexities.

A new parametrized user modeling algorithm has been devised. A set of studies is set up to evaluate the new algorithm as well as its adaptability/adaptivity.

Study 1

This study involves a retrospective comparative evaluation of CUMULATE's legacy and parametrized user modeling algorithms. The evaluation is done using usage logs collected from 6 Database Management courses offered during the Fall 2007 and Spring 2008 semesters at the University of Pittsburgh, National College of Ireland, and Dublin City University. Each course had roughly the same structure and an identical set of problems served by the SQLKnoT system.

Scenario

We compared the legacy CUMULATE algorithm with 3 versions of the parametrized algorithm. The versions differed in the parameters used for user modeling.

  • The first version was an attempt to shadow the legacy algorithm by guessing the best parameters for modeling, without discriminating between individual users or individual problems.
  • The second version did not discriminate between users or problems either. However, its parameters were obtained by fitting one global user parameter and one global problem parameter signature, which were then used for the modeling.
  • The third version of the parametrized algorithm worked with a set of user-specific parameters and problem-specific parameter signatures for the modeling.
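
The three configurations above can be pictured as different ways of looking up a user parameter and a problem parameter signature. The following is a minimal sketch only: the class name `ParameterSet`, the representation of a signature as a tuple of floats, and all numeric values are assumptions made for illustration and are not taken from the CUMULATE implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical representation of a problem parameter signature.
Signature = Tuple[float, ...]


@dataclass
class ParameterSet:
    """Global defaults plus optional per-user / per-problem overrides."""
    global_user: float
    global_problem: Signature
    per_user: Dict[str, float] = field(default_factory=dict)
    per_problem: Dict[str, Signature] = field(default_factory=dict)

    def user_param(self, user_id: str) -> float:
        # Fall back to the global value when the user has no individual one.
        return self.per_user.get(user_id, self.global_user)

    def problem_signature(self, problem_id: str) -> Signature:
        # Fall back to the global signature for problems without their own.
        return self.per_problem.get(problem_id, self.global_problem)


# Version 1: guessed global parameters (placeholder values).
best_guess = ParameterSet(global_user=1.0, global_problem=(0.5, 0.5))

# Version 2: same shape, but the values would come from a fitting step.
global_fit = ParameterSet(global_user=0.8, global_problem=(0.4, 0.6))

# Version 3: individual overrides on top of the global defaults.
individual = ParameterSet(
    global_user=0.8,
    global_problem=(0.4, 0.6),
    per_user={"u1": 1.2},
    per_problem={"p7": (0.2, 0.9)},
)
```

In this sketch a user without an individual entry falls back to the global user parameter, which is one way to picture combining individual problem parameters with a global user parameter.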

Procedures

Accuracy and SSE (sum of squared error) were used as the metrics of comparison. They were computed overall for each of the 6 semester logs and each of the 4 versions of the algorithm.
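
For reference, the two metrics can be sketched as follows. This is an illustration under stated assumptions rather than the study's MATLAB code: it assumes the model outputs a probability of success in [0, 1] for each attempt, that observed outcomes are coded 0/1, and that a prediction counts as correct when thresholding it at 0.5 matches the outcome. The function names and the threshold are hypothetical, and whether the study normalized SSE by the number of datapoints is not stated here.

```python
from typing import Sequence


def accuracy(predicted: Sequence[float], observed: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of attempts whose thresholded prediction matches the outcome."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(int(p >= threshold) == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


def sse(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Sum of squared differences between predictions and outcomes."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must be of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Made-up example data: four attempts by one user.
predicted = [0.9, 0.2, 0.6, 0.4]
observed = [1, 0, 0, 1]

acc = accuracy(predicted, observed)  # two of four predictions match: 0.5
err = sse(predicted, observed)       # approximately 0.77
```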

In the case of the legacy CUMULATE and parametrized CUMULATE best-guess algorithms, the data was taken as is. The globally parametrized and individually parametrized CUMULATE algorithms were supplied with the pre-fit global/individual user/problem-specific parameters. The data of only one of the early courses was used to obtain the parameters. Data of all 6 courses was used to compute the parametrized models. Refer to the table below for details and basic log statistics.
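
The fit-on-one-course, apply-to-all-courses procedure can be sketched as below. The update rule here is a deliberately simple stand-in and is not CUMULATE's formula: it assumes a single hypothetical `rate` parameter, a knowledge estimate that moves toward 1 after each successful attempt, and a grid search that minimizes SSE. The course names and outcome sequences are made up.

```python
from typing import Dict, List, Sequence


def predict(outcomes: Sequence[int], rate: float) -> List[float]:
    """Predict each attempt from the knowledge estimate held before it."""
    knowledge = 0.0
    predictions = []
    for outcome in outcomes:
        predictions.append(knowledge)
        if outcome == 1:
            # Stand-in asymptotic update: close part of the gap to 1.
            knowledge += rate * (1.0 - knowledge)
    return predictions


def sse(outcomes: Sequence[int], rate: float) -> float:
    """Sum of squared error of the stand-in model for a given rate."""
    return sum((p - o) ** 2 for p, o in zip(predict(outcomes, rate), outcomes))


def fit_rate(outcomes: Sequence[int], grid: Sequence[float]) -> float:
    """Pick the grid value with the lowest SSE on the training log."""
    return min(grid, key=lambda r: sse(outcomes, r))


# Made-up logs: one training course and one held-out course.
logs: Dict[str, List[int]] = {
    "course_a": [0, 1, 1, 1, 1, 1],
    "course_b": [0, 0, 1, 1, 0, 1],
}
grid = [i / 10 for i in range(1, 10)]

# Fit on a single course only, then score every course with that value.
fitted = fit_rate(logs["course_a"], grid)
report = {name: sse(outcomes, fitted) for name, outcomes in logs.items()}
```

Fitting on one log and scoring all of them keeps the held-out courses free of any parameter tuning, so their error reflects how well the fitted values transfer.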

Semester | School | Level | Procedures | Users | Datapoints | Attempts per user | Problems per user
Fall'07 | Pitt | Und. | Obtain global/individual user/problem-specific parameters; compute CL, Cp:bg, Cp:glob, Cp:ind | 27 | 4224 | 156.44 | 29.96
Fall'07 | Pitt | Grad. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 20 | 1233 | 61.65 | 29.95
Spring'08 | Pitt | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 15 | 458 | 26.94 | 16.35
Spring'08 | NCI | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 17 | 216 | 12.71 | 6.59
Spring'08 | NCI | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 18 | 142 | 7.89 | 4.00
Spring'08 | DCU | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 52 | 4574 | 81.68 | 22.82

CL = legacy CUMULATE algorithm; Cp:bg = parametrized CUMULATE algorithm with guessed global parameters; Cp:glob = parametrized CUMULATE algorithm with fitted global user/problem parameters; Cp:ind = parametrized CUMULATE algorithm with individual user/problem parameters; Cp:ind/glob = parametrized CUMULATE algorithm with individual problem parameters and a global user parameter.

Results

As the figures below show, neither guessed nor globally fit parameters give the new parametrized algorithm an edge. Individual problem-specific parameters, however, do give it an advantage. In the majority of cases this advantage is significant (with the exception of the only graduate-level course, offered at Pitt in the Fall 2007 semester).

[Figure: Algorithm prediction accuracy for different semesters (CUMULATE legacy vs parameterized, SQL 6-semester dataset); also available via Google Chart API]

[Figure: Algorithm SSE for different semesters (CUMULATE legacy vs parameterized, SQL 6-semester dataset); also available via Google Chart API]

Publication

The results were presented at the i-fest 2009 poster competition at the School of Information Sciences, University of Pittsburgh, and took second place in the PhD track. The poster can be accessed at http://www.pitt.edu/~mvy3/assets/i-fest%202009%20yudelson.pdf.

Addendum

The MATLAB code used for the computation can be downloaded here.

The CUMULATE log data for the 6 Database Management courses can be accessed on the CUMULATE Usage Logs page (Database Management 6 Course Dataset section).

Contacts

Michael V. Yudelson