Text and Data Mining
Find text and data sets and explore related tools, methods, and analysis through our workshop, class instruction, and consultation offerings.
What do you want to do?
Whether you need a computational process for text and data mining, rather than analysis by hand, depends on the complexity of your project.
Review these questions as you consider what you want to find out:
- Do you need to analyze an entire corpus (body of work), or just selected items from it?
- Do you need the content to be easily read by humans, or only by machines?
- Do you need to download the entire contents, or can you conduct your analysis on the platform where the content already resides?
- What kind of analysis do you want to do?
Find resources for mining
The following research guides contain text and data in broad categories or genres — such as content from the last year of Twitter, Congressional hearings, or particular newspapers. Most are licensed for use only by U-M researchers.
- English Language and Literature
- News Content
- Social Media Content
- Government-produced data
- Linguistics/Text Corpora
- Bibliometrics and Citation Analysis
You aren’t limited to these resources. We can help get the data you need by locating additional text or data sources, advising on whether licensed content is available to mine, and, in some cases, negotiating access to datasets and text collections for U-M researchers.
How we can help
We can help you find and use text and data sets for mining. We also offer introductory consultations, workshops, and in-class instruction.
Request a consultation
We offer introductory training on specific tools for data processing and text analysis, and we make referrals when appropriate. We also provide consultations on data collections that the U-M Library owns or licenses.
Request a workshop or class instruction
Contact us to request a workshop or in-class instruction focused on text analysis. We offer overviews and introductions to text and data mining technologies, methodologies, and tools.