Finding patterns in text (literature, websites, etc.) #5

iamciera · 2014-10-10T18:05:50Z

I already started this a bit, but I have a project that incorporates a lot of what I want to learn: web scraping, Python data handling, and visualization. The end goal is to find patterns in my favorite book, Infinite Jest, but I found that the tools I was building/using could be used for any text. Another reason I want to use this book is that David Foster Wallace had an extensive math background and alluded to a "fractal" structuring of the plot. He was meticulous and incredibly calculated, so I think there could be some interesting visualizations in this book. I am not alone: this book also has a history of maniac fans who try to do data analysis and visualization BY HAND! I want to learn web scraping to take all their hard work and suck it into my dataset.

Anyway, someone could help build these tools for their favorite book alongside me. We wouldn't be carving a brand new path either; this is an entire field with a lot of people whose shoulders we can stand on.

Python Functions
Here is a list of functions that I want to build or have built:

Split book by:
Words ✓
Sentences
Chapters ✓
Paragraphs
Count occurrences of words (length = one word) ✓
Track position of words ✓
Count occurrences of phrases (length > one word) ✓
Count occurrences of phrases
Attach chronology information to chapters and position of occurrence
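
For anyone who wants to follow along, here is a rough sketch of a few of the functions in the list above, using only the standard library. It is not the code from my project. The function names, the regular expressions, and the `sample` text are all made up for illustration, and the sentence splitter is deliberately naive (abbreviations like "Dr." will fool it). Chapter splitting is left out because it depends on how a given text marks its chapter breaks.

```python
import re
from collections import Counter


def split_paragraphs(text):
    """Split on one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_words(text):
    """Lowercased word tokens; apostrophes inside a word are kept."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def phrase_positions(words, phrase):
    """Return the word index where each occurrence of `phrase` starts.

    Works for single words and multi-word phrases alike.
    """
    target = split_words(phrase)
    n = len(target)
    if n == 0:
        return []
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]


# Made-up sample text, standing in for a real book.
sample = (
    "The cat sat. The cat ran!\n\n"
    "A dog saw the cat."
)

words = split_words(sample)
word_counts = Counter(words)  # occurrences of every single word

print(len(split_paragraphs(sample)))
print(split_sentences(sample))
print(word_counts["cat"])
print(phrase_positions(words, "the cat"))
```

The length of the list returned by `phrase_positions` is the phrase count, so one function covers both "count occurrences" and "track position".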

Web Scraping
I want to work on writing and understanding web scrapers so I can take the fans' hard work and incorporate it into my dataset, for instance by scraping the entire list of characters and places from this site.
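
As a starting point, here is a minimal sketch of pulling a list of names out of a page with Python's built-in `html.parser`. It assumes the names sit in `<li>` elements, which may not match the real site, and it ignores nested lists. The `page` string is a made-up stand-in; for a live page the HTML could be fetched with something like `urllib.request.urlopen(url).read().decode()`.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collect the text content of every <li> element (nested lists not handled)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            self._in_item = False
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)


def extract_list_items(html):
    """Return the text of each <li> in `html`, in document order."""
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page content, not taken from any real site.
page = """
<html><body>
<h2>Characters</h2>
<ul>
  <li><a href="/wiki/a">Character A</a></li>
  <li>Character   B</li>
</ul>
</body></html>
"""

names = extract_list_items(page)
print(names)
```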

Visualization
The last step would be visualization. I did a simple visualization of a small subset of my favorite characters in the book in ggplot, but would like to map co-occurrence and things like that using D3. I need the dataset first though. Elgh.
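
To make "co-occurrence" concrete, here is a small sketch of how the dataset could be built once the text is split into chapters. It counts, for each pair of names, how many chapters mention both, then shapes the result as a nodes-and-links dictionary of the kind commonly fed to D3 force-directed layouts. The chapter strings and names are placeholders, and matching names by plain substring is an assumption that will miscount nicknames and partial matches.

```python
import json
from collections import Counter
from itertools import combinations


def cooccurrence(chapters, names):
    """Count, for each pair of names, the chapters in which both appear."""
    pair_counts = Counter()
    for chapter in chapters:
        lowered = chapter.lower()
        # Sort so each pair always has the same (a, b) order.
        present = sorted(n for n in names if n.lower() in lowered)
        pair_counts.update(combinations(present, 2))
    return pair_counts


def to_node_link(pair_counts, names):
    """Shape the counts as {"nodes": [...], "links": [...]} for export as JSON."""
    return {
        "nodes": [{"id": n} for n in names],
        "links": [
            {"source": a, "target": b, "value": count}
            for (a, b), count in sorted(pair_counts.items())
        ],
    }


# Placeholder chapters and names, not taken from any book.
chapters = [
    "Alice met Bob at the station.",
    "Bob wrote to Carol. Alice read the letter.",
    "Carol travelled alone.",
]
names = ["Alice", "Bob", "Carol"]

pairs = cooccurrence(chapters, names)
print(json.dumps(to_node_link(pairs, names), indent=2))
```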

danfulop · 2014-10-10T18:25:31Z

Mind blown!! :-) ...you're a freak! ...in a good way ;-) It would be super cool if you revealed a fractal pattern in this novel through scraping and data analysis.
