It seems that there is a large opportunity to apply modern technologies to academic articles. Take a data source such as arXiv.org, and apply text analytics and quantitative algorithms to relate and rank articles. In academia, word choice tends to be more precise than in ordinary speech, so entity extraction should have a higher success rate. Furthermore, the format is mandated, which both makes entity extraction easier and allows interesting statistics to emerge.

For instance, in-text citations allow one to find out how many times an article is referenced (superior to simply using the bibliography, where each work appears only once), what text the article is cited to support, and so on. One could use that to create a document map of the article: who is cited where? Given the division of journal articles into sections, simply knowing who is most often cited in introductions for a particular field is useful. The time-based nature of academic progress also makes possible a number of interesting analyses for research.

In short, the avenues for interesting research are almost endless: you have vast quantities of data, in a semi-structured format, with the potential upside of both making research dramatically easier and possibly building a machine knowledge base. Pity there's not enough money in it.
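As a rough illustration of the "who is cited where" idea, here is a minimal sketch that counts citations per section of a paper's LaTeX source and then finds the most-cited works in introductions across several papers. It assumes the source is available (arXiv distributes many submissions as LaTeX) and that the paper uses plain `\section` headings and `\cite`, `\citep` or `\citet` commands; numeric or author-year citations in PDF text would need a different extractor. The function names and the two sample papers are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches \section{Title} and \section*{Title}; \subsection is not matched.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")
# Matches \cite, \citep, \citet (natbib) with optional [...] arguments,
# capturing the comma-separated list of citation keys.
CITE_RE = re.compile(r"\\cite[pt]?\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def _count_cites(text, counter):
    """Add every citation key found in `text` to `counter`."""
    for match in CITE_RE.finditer(text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                counter[key] += 1


def citations_by_section(latex_source):
    """Return {section title: Counter of citation keys} for one paper."""
    counts = defaultdict(Counter)
    # Anything before the first \section is filed under "(front matter)".
    current = "(front matter)"
    pos = 0
    for match in SECTION_RE.finditer(latex_source):
        _count_cites(latex_source[pos:match.start()], counts[current])
        current = match.group(1).strip()
        pos = match.end()
    _count_cites(latex_source[pos:], counts[current])
    return counts


def most_cited_in(papers, section_name, n=3):
    """Aggregate citation counts for one section name across many papers."""
    total = Counter()
    for source in papers:
        for title, counter in citations_by_section(source).items():
            if title.lower() == section_name.lower():
                total.update(counter)
    return total.most_common(n)


# Hypothetical sample papers, for illustration only.
PAPER_A = r"""
\section{Introduction}
Prior work \cite{smith2005, jones2007} ... see also \citep[p.~3]{smith2005}.
\section{Methods}
We follow \citet{lee2008}.
"""

PAPER_B = r"""
\section{Introduction}
As in \cite{smith2005} ...
\section*{Results}
Compare \cite{jones2007}.
"""

print(most_cited_in([PAPER_A, PAPER_B], "Introduction"))
# [('smith2005', 3), ('jones2007', 1)]
```

Matching sections by title is the crude part: real papers name their introductions inconsistently ("Introduction", "Background", "Motivation"), so a corpus-scale version would need some normalization of section names.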
Post Revisions:
- March 11, 2009 @ 19:19:26 [Current Revision] by Michael Griffiths
- March 11, 2009 @ 19:19:02 by Michael Griffiths