Contacts: Michael V. Yudelson
Latest revision as of 19:26, 6 May 2010
This stream of work aims to improve CUMULATE's legacy one-size-fits-all algorithm for modeling users' problem-solving activity and to create a context-sensitive user modeling algorithm that is adaptable/adaptive to individual users' cognitive abilities as well as to the complexity of individual problems.
A new parametrized user modeling algorithm has been devised. A set of studies is set up to evaluate the new algorithm as well as its adaptability/adaptivity.
Study 1
This study is a retrospective comparative evaluation of CUMULATE's legacy and parametrized user modeling algorithms. The evaluation uses usage logs collected from 6 Database Management courses offered during the Fall 2007 and Spring 2008 semesters at the University of Pittsburgh, National College of Ireland, and Dublin City University. Each course had roughly the same structure and an identical set of problems served by the SQLKnoT system.
Scenario
We compared the legacy CUMULATE algorithm with 3 versions of the parametrized algorithm. The versions differed in the parameters used for user modeling.
- The first version was an attempt to shadow the legacy algorithm by guessing the best parameters for modeling, without discriminating between individual users or problems.
- The second version did not discriminate between users or problems either. However, its parameters were obtained by fitting one global user parameter and one global problem parameter signature, which were then used for the modeling.
- The third version worked with a set of user-specific parameters and problem-specific parameter signatures.
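The three parametrized versions can be read as differing only in where the user parameter and the problem parameter signature come from. The sketch below illustrates that reading. It is not taken from the CUMULATE code base: the class, field names, and numeric values are hypothetical, and a problem parameter signature is assumed to be a small tuple of numbers.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterSet:
    """Parameter source for one algorithm version (names and values are hypothetical)."""
    global_user: float      # one user parameter shared by all users
    global_problem: tuple   # one problem parameter signature shared by all problems
    per_user: dict = field(default_factory=dict)     # user id -> individual user parameter
    per_problem: dict = field(default_factory=dict)  # problem id -> individual signature

    def user_param(self, user_id):
        # Fall back to the global value for users without an individual fit.
        return self.per_user.get(user_id, self.global_user)

    def problem_signature(self, problem_id):
        # Fall back to the global signature for problems without an individual fit.
        return self.per_problem.get(problem_id, self.global_problem)


# Version 1: guessed global parameters (values invented for illustration).
best_guess = ParameterSet(global_user=1.0, global_problem=(1.0, 1.0))

# Version 2: global parameters obtained by fitting (values invented for illustration).
global_fit = ParameterSet(global_user=0.8, global_problem=(1.2, 0.4))

# Version 3: individual parameters where available, global ones otherwise.
individual = ParameterSet(
    global_user=0.8,
    global_problem=(1.2, 0.4),
    per_user={"user_a": 0.6},
    per_problem={"problem_1": (1.5, 0.3)},
)

print(individual.user_param("user_a"))    # individual fit
print(individual.user_param("new_user"))  # global fallback
print(best_guess.problem_signature("problem_1"))
```

The fallback in `user_param` mirrors the situation where problems have individual parameters but a user was not part of the course used for fitting, so only a global user parameter is available.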
Procedures
Accuracy and SSE (sum of squared errors) were used as the metrics of comparison. They were computed overall for each of the 6 course logs and each of the 4 algorithm versions.
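As an illustration of the two metrics, here is a minimal sketch. It assumes that each model prediction is a probability of a correct answer, that each observed outcome is coded as 1 (correct) or 0 (incorrect), and that a prediction counts as "correct" at a 0.5 threshold. These conventions and the function names are assumptions made for the example, not details confirmed by the study.

```python
def accuracy(predicted, observed, threshold=0.5):
    """Share of attempts where the thresholded prediction matches the outcome."""
    hits = sum((p >= threshold) == bool(o) for p, o in zip(predicted, observed))
    return hits / len(observed)


def sse(predicted, observed):
    """Sum of squared differences between predictions and outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Toy data, invented for illustration only.
predicted = [0.9, 0.2, 0.6, 0.4]
observed = [1, 0, 0, 1]

print(accuracy(predicted, observed))  # 2 of 4 predictions match
print(sse(predicted, observed))
```

Accuracy rewards only the side of the threshold a prediction falls on, while SSE also penalizes how far a prediction is from the outcome, so the two metrics can rank algorithm versions differently.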
For the legacy CUMULATE and parametrized CUMULATE best-guess algorithms, the data was taken as is. The globally parametrized and individually parametrized CUMULATE algorithms were supplied with pre-fit global or individual user- and problem-specific parameters. The parameters were obtained from the data of only one of the early courses; the data of all 6 courses was used to compute the parametrized models. Refer to the table below for details and basic log statistics.
Semester | School | Level | Procedures | Users | Datapoints | Attempts per user | Problems per user |
---|---|---|---|---|---|---|---|
Fall'07 | Pitt | Und. | Obtain global/individual user/problem-specific parameters; compute CL, Cp:bg, Cp:glob, Cp:ind | 27 | 4224 | 156.44 | 29.96 |
Fall'07 | Pitt | Grad. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 20 | 1233 | 61.65 | 29.95 |
Spring'08 | Pitt | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 15 | 458 | 26.94 | 16.35 |
Spring'08 | NCI | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 17 | 216 | 12.71 | 6.59 |
Spring'08 | NCI | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 18 | 142 | 7.89 | 4.00 |
Spring'08 | DCU | Und. | Compute CL, Cp:bg, Cp:glob, Cp:ind/glob | 52 | 4574 | 81.68 | 22.82 |
Legend: CL is the legacy CUMULATE algorithm; Cp:bg is the parametrized CUMULATE algorithm with guessed global parameters; Cp:glob is the parametrized CUMULATE algorithm with fitted global user/problem parameters; Cp:ind is the parametrized CUMULATE algorithm with individual user/problem parameters; Cp:ind/glob is the parametrized CUMULATE algorithm with individual problem parameters and a global user parameter.
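The procedure summarized in the table amounts to scoring every algorithm version on every course log. A minimal sketch of that loop follows, under strong simplifications: each log is a list of (user, problem, outcome) records, each algorithm version is a stateless function returning a probability of a correct answer, and only accuracy is computed. In the actual algorithms a prediction presumably depends on the user's earlier attempts, which this sketch ignores. All names and data are hypothetical.

```python
def accuracy(predicted, observed, threshold=0.5):
    """Share of attempts where the thresholded prediction matches the outcome."""
    hits = sum((p >= threshold) == bool(o) for p, o in zip(predicted, observed))
    return hits / len(observed)


def evaluate(course_logs, variants):
    """Return {(course, variant): accuracy} for every course/variant pair."""
    results = {}
    for course, log in course_logs.items():
        observed = [outcome for _user, _problem, outcome in log]
        for name, predict in variants.items():
            predicted = [predict(user, problem) for user, problem, _outcome in log]
            results[(course, name)] = accuracy(predicted, observed)
    return results


# Toy stand-ins: two tiny logs and two constant predictors, invented for illustration.
course_logs = {
    "course_1": [("u1", "p1", 1), ("u1", "p2", 0)],
    "course_2": [("u2", "p1", 1), ("u2", "p2", 1)],
}
variants = {
    "always_correct": lambda user, problem: 1.0,
    "always_wrong": lambda user, problem: 0.0,
}

results = evaluate(course_logs, variants)
for key in sorted(results):
    print(key, results[key])
```

With 6 course logs and 4 algorithm versions, the same loop would produce the 24 values per metric that the comparison is based on.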
Results
As the figures below show, neither guessed nor globally fit parameters give the new parametrized algorithm an edge. Individual problem-specific parameters, however, do give it an advantage. In the majority of cases this advantage is significant; the exception is the only graduate-level course, offered at Pitt in the Fall 2007 semester.
[Figure: Algorithm prediction accuracy for different semesters]
[Figure: Algorithm SSE for different semesters]
Publication
The results were presented at the i-fest 2009 poster competition at the School of Information Sciences, University of Pittsburgh, and took second place in the PhD track. The poster is available at http://www.pitt.edu/~mvy3/assets/i-fest%202009%20yudelson.pdf.
Addendum
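MATLAB code used for the computation is available as the wiki attachment "Cumulate legacy vs user n domain adaptive study1 matlab.src.zip".
CUMULATE log data for the 6 Database Management courses is available in the "Database Management 6 Course Dataset" section of the CUMULATE Usage Logs page.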