Text Analysis and Text Mining Using R
I will cover the broad set of tools for text analysis and natural language processing in R, with an emphasis on my R package quanteda but also covering other major tools in the R ecosystem for text analysis (e.g. stringi).
This tutorial covers how to perform common text analysis and natural language processing tasks using R.
Specifically, I will demonstrate how to format and input source texts, how to structure their metadata, and how to prepare them for analysis.
This includes common tasks such as tokenisation, including constructing ngrams and "skip-grams", removing stopwords, stemming words, and other forms of feature selection.
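The preparation steps named above could look roughly like the following. This is a minimal sketch, not the tutorial's own material: the two example sentences and the object names are invented for illustration, and it assumes a recent quanteda release in which the tokens_*() functions are available.

```r
library(quanteda)

# Two toy documents (made up for this illustration)
txt <- c(d1 = "The quick brown fox jumps over the lazy dog.",
         d2 = "Text mining in R is running smoothly.")

# Tokenise, dropping punctuation, then lowercase
toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_tolower(toks)

# Remove English stopwords, then stem what is left
toks_nostop <- tokens_remove(toks, stopwords("en"))
toks_stem <- tokens_wordstem(toks_nostop)

# Bigrams, and skip-grams that allow zero or one skipped token
bigrams <- tokens_ngrams(toks, n = 2)
skipgrams <- tokens_skipgrams(toks, n = 2, skip = 0:1)

print(toks_stem)
print(bigrams)
```

Stopword removal and stemming are applied here before any document-feature matrix is built, which is one reasonable ordering; other orderings are possible depending on the analysis.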
I will also show how to tag parts of speech and parse structural dependencies in texts.
For statistical analysis, I will show how R can be used to get summary statistics from text, search for and analyse keywords and phrases, analyse text for lexical diversity and readability, detect collocations, apply dictionaries, and measure term and document associations using distance measures.
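A few of these analyses can be sketched as follows. The texts, the dictionary keys and their entries are hypothetical, and the distance computation uses base R's dist() on the dense matrix as a stand-in rather than any quanteda-specific function. Lexical diversity, readability and collocation detection are left out because the package that provides those functions has changed across quanteda versions.

```r
library(quanteda)

# Three toy documents (made up for this illustration)
txt <- c(d1 = "Taxes are too high and public spending is too low.",
         d2 = "Lower taxes will help the economy grow.",
         d3 = "Public schools need more spending.")
toks <- tokens(txt, remove_punct = TRUE)

# Summary statistics: tokens per document
n_tokens <- ntoken(toks)

# Keywords in context: each match with three tokens on either side
kw <- kwic(toks, pattern = "taxes", window = 3)

# Apply a (hypothetical) dictionary; entries are glob patterns by default
dict <- dictionary(list(fiscal = c("tax*", "spending"),
                        growth = c("econom*", "grow*")))
dfmat_dict <- dfm(tokens_lookup(toks, dictionary = dict))

# Document-feature matrix and Euclidean distances between documents
dfmat <- dfm(toks)
doc_dist <- dist(as.matrix(dfmat))

print(n_tokens)
print(kw)
print(dfmat_dict)
print(doc_dist)
```

Converting to a dense matrix is only sensible for small examples like this one; for a realistic corpus a sparse-aware distance function would be the better choice.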
Our analysis covers basic text-related data processing in the R base language, but relies mostly on the quanteda package (https://github.com/kbenoit/quanteda) for the quantitative analysis of textual data.
We also cover how to pass the structured objects from quanteda into other text analytic packages for doing topic modelling, latent semantic analysis, regression models, and other forms of machine learning.
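As a small illustration of handing quanteda objects to other tools, the sketch below turns a document-feature matrix into a plain matrix and a data frame, both of which most modelling functions in R accept. The texts are invented. quanteda's convert() also documents targets such as "topicmodels", "stm" and "lda"; these are not run here because they may require the target package to be installed.

```r
library(quanteda)

# Three toy documents (made up for this illustration)
txt <- c(d1 = "Taxes are too high and public spending is too low.",
         d2 = "Lower taxes will help the economy grow.",
         d3 = "Public schools need more spending.")
dfmat <- dfm(tokens(txt, remove_punct = TRUE))

# Dense matrix: one row per document, one column per feature
m <- as.matrix(dfmat)

# Data frame: a document identifier column followed by the feature counts
df <- convert(dfmat, to = "data.frame")

print(dim(m))
print(df[, 1:4])
```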
Source: useR 2017