Finding patterns in text (literature, websites, etc.) #5

Open
iamciera opened this issue Oct 10, 2014 · 1 comment

Comments

@iamciera
Member

I have already started on this a bit, but I have a project that incorporates a lot of what I want to learn: web scraping, Python data handling, and visualization. The end goal is to find patterns in my favorite book, Infinite Jest, but I found that the tools I was building and using could be applied to any text. Another reason I want to use this book is that David Foster Wallace has an extensive math background and has alluded to a "fractal" structuring of the plot. He is meticulous and incredibly calculated, so I think there could be some interesting visualizations in this book. I am not alone: this book also has a history of maniac fans who try to do data analysis and visualization BY HAND! I want to learn web scraping to take all their hard work and suck it into my dataset.

Anyway, someone could help build these tools for their favorite book alongside me. We wouldn't be carving a brand new path either: this is an entire field, with a lot of people whose shoulders we can stand on.

Python Functions
Here is a list of functions that I want to build or have built:

Split Book by

  1. Words ✓
  2. Sentences
  3. Chapters ✓
  4. Paragraphs
  5. Count occurrences of words (length = one word) ✓
  6. Track position of words ✓
  7. Count occurrences of phrases (length > one word) ✓
  8. Count occurrences of phrases
  9. Attach chronology information to chapters and position of occurrence
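A few of the items above can be sketched with nothing but the standard library. This is only a rough illustration, not the code already written for this project: the function names are made up, the sentence splitter is deliberately naive (it will trip over abbreviations), and paragraphs are assumed to be separated by blank lines. Chapter splitting is left out because it depends on how a given book marks its chapters.

```python
import re
from collections import Counter


def split_words(text):
    """Lowercase the text and return its words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_paragraphs(text):
    """Assumes paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def count_words(text):
    """Map each word to its number of occurrences."""
    return Counter(split_words(text))


def phrase_positions(text, phrase):
    """Return the word index of every occurrence of a phrase of one or more words."""
    words = split_words(text)
    target = split_words(phrase)
    n = len(target)
    if n == 0:
        return []
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]


def count_phrase(text, phrase):
    """Count occurrences of a phrase of one or more words."""
    return len(phrase_positions(text, phrase))


sample = "The cat sat. The cat ran!\n\nA dog sat."
print(count_words(sample).most_common(3))
print(phrase_positions(sample, "the cat"))
```

Positions are word indexes rather than character offsets, so they can later be compared against chapter boundaries expressed the same way.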

Web Scraping
I want to learn to write and understand web scrapers so I can take all the fans' hard work and incorporate it into my dataset, for instance by scraping the entire list of characters and places from this site.
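As a starting point, here is a minimal sketch of pulling a list of names out of a page using only the standard library. It assumes the names sit in `<li>` elements, which may not be how the fan site is actually laid out, and it works on an HTML string (`SAMPLE_HTML` is invented for the example) rather than fetching anything. For a real page, the HTML could be downloaded first with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collect the visible text of every top-level <li> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0    # greater than 0 while inside an <li>
        self._buffer = []  # text fragments of the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._depth += 1
            if self._depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace and skip empty items.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.items.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_list_items(html):
    """Return the text of each list item found in an HTML string."""
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical markup, standing in for a real character list page.
SAMPLE_HTML = """
<ul>
  <li><a href="#">Hal Incandenza</a></li>
  <li>Don   Gately</li>
  <li></li>
</ul>
"""

print(extract_list_items(SAMPLE_HTML))
```

Third-party libraries make this much shorter, but the stdlib version shows what a scraper is actually doing: walking the tags and keeping the text it cares about.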

Visualization
The last step would be visualization. I did a simple visualization of a small subset of my favorite characters in the book in ggplot, but I would like to map co-occurrence and things like that using D3. I need the dataset first, though. Elgh.

[Image: screen shot 2014-10-10 at 10 52 48 am]
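One way the co-occurrence dataset could be built, assuming the book is already split into chapters and a list of character names is available: count, for every pair of names, how many chapters mention both. The sketch below uses plain substring matching (so a short name like "Hal" would also match inside "shall"), and the `nodes`/`links` layout is just one common shape for graph data in D3 examples, not a required format. The chapter strings are invented for the example.

```python
import json
from collections import Counter
from itertools import combinations


def cooccurrence(chapters, names):
    """Count, for each pair of names, the chapters in which both appear."""
    counts = Counter()
    for chapter in chapters:
        lowered = chapter.lower()
        # Substring match: crude, and will over-match short names.
        present = sorted(n for n in names if n.lower() in lowered)
        counts.update(combinations(present, 2))
    return counts


def to_node_link(counts, names):
    """Convert pair counts into a nodes/links dictionary ready for JSON."""
    nodes = [{"id": n} for n in sorted(names)]
    links = [
        {"source": a, "target": b, "value": v}
        for (a, b), v in sorted(counts.items())
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical chapter texts, standing in for the real book.
chapters = ["Hal met Orin.", "Hal and Gately talked.", "Orin called Hal."]
names = ["Hal", "Orin", "Gately"]

counts = cooccurrence(chapters, names)
print(json.dumps(to_node_link(counts, names), indent=2))
```

Each pair is stored in alphabetical order, so ("Hal", "Orin") and ("Orin", "Hal") are never counted separately.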

@danfulop

Mind blown!! :-) ...you're a freak! ...in a good way ;-) It would be super cool if you revealed a fractal pattern in this novel through scraping and data analysis.
