I’ll say it loud and proud: I’m excited to share this final project from my data visualization studio class about Anne Sexton’s poetry. I wasn’t sure I would be able to say that a few weeks ago, but I have overcome numerous hurdles and finally made it out of the Swamp of Despair. Read on for my journey and the process.
Why Anne Sexton?
A sociology professor introduced me to Transformations and Anne Sexton while I was an undergrad. I can still remember how these poems busted my Disney-fied interpretations of fairy tales. I remember being mortified and fascinated at the same time. I recall thinking the experience was a good one because, at the time, I was ready to face and look at the harsher realities of life. I was already waking up, experiencing life in a city outside of a privileged suburban life. It was the perfect time for Anne Sexton to come into my life.
Fast forward decades and, as a graduate student, I found myself proposing a data visualization project to get to know Anne Sexton and her poetry a bit more. Somehow the period of my life as an undergrad and my life now as a graduate student feel similar. It’s hard to explain, but one thing is clear: changes are taking place.
Python, bring it on.
Thinking back, I had no idea what I was taking on.
It was the beginning of the semester and I had made it a personal mission to learn and practice more coding. After some research on text analysis, I was certain Python was the right path. I read it can be applied in so many ways from visualization to physical computing to machine learning. If I’m going to learn, might as well go with a language that is versatile; more bang for your buck kind-of-thing. Right?
Well, I knew I would need some guidance so based on past experience, I decided to reach out to the data viz community for some leads on how to get started.
Data Viz Twitter
I’ve spent years with Photo Twitter and Design Twitter. Both are still very cool communities I dip my toe into, but Data Viz Twitter is cupcake for the foreseeable future. For example, when I tweeted a request for Python help, Mindy McAdams, a digital journalism professor at the University of Florida, was kind enough to tweet me one of her tutorials on setting up a Python environment.
Student tested! Take your time, and you can @ me. https://t.co/STXR5otmlL— Mindy McAdams (@macloo) August 29, 2019
Looking for something else, found this – https://t.co/QBfVg7KoKu – looks good for starting with NLP.— Mindy McAdams (@macloo) August 30, 2019
I struggled a bit at first, as I was not familiar with the command line and the idea of typing into Terminal was intimidating, but Mindy’s tutorial got me up and running with Jupyter Notebook and a fresh install of Python.
Miniconda over Anaconda. Check. Jupyter Notebook as my IDE. Check.
Data Wrangling: Gathering Her Poems
Many of Anne Sexton’s poems are available online, but not all, and I wanted to be efficient, so I went about learning how to scrape the content. In the middle of reading how to do just that, I discovered ParseHub. It was easy to use and, even limited to the free version, I collected a sizeable number of poems. The downsides were the duplication and the fact that I wasn’t an expert. So, to make sure I had all the poems, I bought Anne Sexton: The Complete Poems to cross-check my collection. It was a great idea because some online sites only featured parts of her poems.
Anne Sexton: The Complete Poems became my Bible and it eventually helped to structure my project. Given that I was not going to re-publish any poem and the data would remain in my hands for a school project, I felt better when I needed to do a bit of PDF editing magic to extract her poems.
In total, I collected 308 poems from online sources and ebooks.
A note about hidden characters
There were a lot of these funky characters —Â,Áí, Á ,Ä¶ Äî‚Äô— that caused quite a few hiccups. I thought if I made sure the text was UTF-8, all was good, but no such luck. There were invisible characters in addition to those above. Find/Replace came in handy for most but didn’t catch all and I found myself having to manually clean them up.
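For anyone facing the same debris, here is a minimal Python sketch of how this cleanup could be scripted rather than done with Find/Replace. It rests on an assumption: sequences like ‚Äô look like UTF-8 bytes that were read as Mac OS Roman at some point, so re-encoding and decoding again may undo them. If the text was damaged some other way, the repair step simply leaves it alone. The helper names `repair_mojibake` and `clean_line` are made up for this example.

```python
import re

# Zero-width and other invisible code points that often survive copy/paste.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff\u00ad]")


def repair_mojibake(text, wrong_encoding="mac_roman"):
    """Undo one round of 'UTF-8 bytes decoded with the wrong encoding'.

    Assumption: the wrong encoding was Mac OS Roman. If the text does not
    round-trip cleanly, it is returned unchanged.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def clean_line(line):
    # Remove zero-width characters first: they cannot be encoded as
    # Mac OS Roman and would otherwise block the repair step.
    line = ZERO_WIDTH.sub("", line)
    line = repair_mojibake(line)
    # Non-breaking spaces are replaced last, because before the repair
    # they may be part of a garbled sequence.
    return line.replace("\u00a0", " ")


print(clean_line("the poet‚Äôs voice"))  # -> the poet’s voice
```

Working line by line keeps one unrepairable line from blocking the rest of the poem. Anything the script leaves untouched still needs a manual pass.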
Python? What about R?
I’m not exactly sure when I realized I needed to say goodbye to Python but I felt that too much time had passed. After several weeks I felt I had not made any progress. I was worried.
What if I’m not able to pull this off?
Did I just set myself up for major failure?
Is that so bad?
Uh yeah…sort of.
I had been moving through Codecademy’s Python course but could not find much online that would help a beginner get started with the basics of text analysis using Python. I found many machine learning tutorials and many articles on how to use packages for text analysis, but I found a giant gap in understanding why.
I shared these concerns with fellow designer and classmate, Alyssa Fowers, and she suggested I give R and this book a try.
But I was stubborn. I had set a goal and I wanted to follow through. It wasn’t for another few weeks that I decided to let go of my grand ideas about Python…
R, I love you.
Yes, it happened that quickly. What can I say?
There’s no particular reason I didn’t start with R from the beginning. I just wanted to learn Python. But in October the university had a long weekend break. I had time to focus on one thing, so I decided to give R a run. If I could make progress over the weekend, any progress, I was game.
Using the online version of Text Mining with R and a DataCamp text analysis course and tutorial using the songs of Prince (Little Red Corvette anyone?), I taught myself enough R in one weekend to feel good and make progress on my project. I made my first charts in R by dutifully following tutorials, then started to practice with Sexton’s poems. Of course, there were many errors, charts that made no sense, and plenty of experimentation. I made useless chord diagrams and too many word clouds. It was fun.
The beauty of Text Mining with R is that while learning R, I was also learning about text analysis.
Some of the questions I had about her poems:
- How many words exist in the collection of published works?
- What are the most frequent words throughout the collection? What are the most frequent per poem?
- What is the diversity of words used per poem?
- Are there frequent word relationships?
- What is the overall sentiment of her poems? Does it change over time?
- What is the most positive poem and the most negative in each collection?
- What is the most negative and positive collection?
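The first few questions in the list come down to counting. The project itself used R, so the following is only a minimal Python sketch of the same ideas: total words, most frequent words, and word diversity per poem (measured here as distinct words divided by total words, often called the type-token ratio). The poem titles and text are made-up placeholders, and `tokenize`, `word_counts`, and `lexical_diversity` are hypothetical helper names.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(poems):
    """Word frequencies across a {title: text} mapping."""
    counts = Counter()
    for text in poems.values():
        counts.update(tokenize(text))
    return counts


def lexical_diversity(text):
    """Distinct words divided by total words (0.0 for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Placeholder lines, not Sexton's text.
poems = {
    "Poem A": "the red door and the red shoes",
    "Poem B": "a mirror, a mirror, a queen",
}

counts = word_counts(poems)
print(sum(counts.values()))    # total words in the collection
print(counts.most_common(3))   # most frequent words overall
for title, text in poems.items():
    print(title, round(lexical_diversity(text), 2))
```

Note that this simple tokenizer only recognizes straight apostrophes and unaccented letters, so curly quotes and accented characters would need handling before counting real poems.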
Text analysis and sentiment analysis are quite the specialization, and I hope to learn more since I’ll be working on a project next semester with text and images. It should not have surprised me, but sentiment analysis is incredibly subjective. Based on what I learned in Text Mining with R, I used the Bing lexicon for my project.
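The Bing lexicon labels words as either positive or negative, so a poem’s score can be as simple as positive words minus negative words. Here is a minimal Python sketch of that idea, not the R code behind the project. The tiny `TOY_LEXICON` is a stand-in whose entries are purely illustrative, the token lists are made up, and `net_sentiment` is a hypothetical helper name.

```python
from collections import Counter

# Stand-in for a positive/negative lexicon such as Bing.
# These entries are illustrative and not copied from the real lexicon.
TOY_LEXICON = {
    "love": "positive",
    "bright": "positive",
    "gentle": "positive",
    "fear": "negative",
    "ruin": "negative",
    "cold": "negative",
}


def net_sentiment(tokens, lexicon=TOY_LEXICON):
    """Positive word count minus negative word count; unknown words are skipped."""
    tally = Counter(lexicon[t] for t in tokens if t in lexicon)
    return tally["positive"] - tally["negative"]


# Made-up token lists, not Sexton's text.
poems = {
    "Poem A": ["cold", "fear", "and", "ruin", "then", "love"],
    "Poem B": ["bright", "gentle", "morning"],
}

scores = {title: net_sentiment(tokens) for title, tokens in poems.items()}
most_negative = min(scores, key=scores.get)
most_positive = max(scores, key=scores.get)
print(scores)
print(most_negative, most_positive)
```

Because words missing from the lexicon count for nothing, the score depends heavily on which lexicon is chosen, which is one reason the results are so subjective.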
Turns out, this project and process dovetailed nicely with an Artificial Intelligence class I took with Dr. Chuan. We had many conversations in class about ethics and AI. Each time, I have to remind myself to remain calm when I learn what people are doing just because they can.
One book I need to purchase is Bing Liu’s Sentiment Analysis and Opinion Mining. I think this book will help me understand a lot more, especially for future projects. Hmm, my capstone comes to mind.
I realized early on that I had to take notes; lots of them. At one point I accidentally filtered out the word “God” while removing stop words. There is probably some magical way of tracking variables and data frames, but I got a bit carried away and started to lose track. I had to make myself a chart to understand and remember that this set fed into this other set and created yet another set. Highlighters became my BFF.
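One way to guard against losing a word like “God” is to keep an explicit list of words that must survive stop-word removal. This is a minimal Python sketch under assumptions: the stop list below is hypothetical (with "god" deliberately slipped in to mimic the mistake), and `remove_stop_words` is a made-up helper, not a function from any text analysis package.

```python
def remove_stop_words(tokens, stop_words, keep=()):
    """Drop stop words, except any listed in `keep` (compared case-insensitively)."""
    blocked = {w.lower() for w in stop_words} - {w.lower() for w in keep}
    return [t for t in tokens if t.lower() not in blocked]


# Hypothetical stop list in which "god" has slipped in by mistake.
stop_words = {"the", "a", "and", "of", "to", "god"}
tokens = ["God", "and", "the", "kitchen"]

print(remove_stop_words(tokens, stop_words))                # ['kitchen']
print(remove_stop_words(tokens, stop_words, keep={"god"}))  # ['God', 'kitchen']
```

Printing a before-and-after comparison like this for a few important words is a cheap check to run whenever the stop list changes.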
Now I recognize that my workflow may be incredibly flawed but honestly, that was the least of my concerns. I had an assignment and deadlines and I did what I could while making a wonderful mess.
The notes saved me quite a few times because I had to redo some of my charts and having these notes was sunshine. Thank you past self!
First Draft: Creativity, Move Aside
“It looks too much like R charts.”
That is the one line of feedback that burst my R bubble. But yes, I agreed, it did look too clinical. I started to wonder what I could do differently when a different thought or question came to mind: Why is it that creativity went out the door? I somehow couldn’t think beyond getting things to work in R.
It brought to mind a quote I read from Daniel Hooper, the founder of Principle, an app for animating user interfaces:
Even if you’re a person who knows both design and engineering, your brain operates very differently in each state. Putting people in the technical engineering mindset suppresses their innate visual creative skills.
I lost sight of how to present what I was discovering about her poems in a compelling and cohesive way. Seems obvious. Yet I think many of us experience this phenomenon when learning anything for the first time, especially when it is technical. I have a love/hate relationship with learning to code (or learning anything new) for the simple reason that it is unfamiliar, and in that process you get stuck and stumble often. But once the light bulb goes off…BOOM.
I’m not sure when it happened but suddenly everything started to gel. I think at some point I finally had a clear direction. Coding the website was a big factor. I initially had plans to use scrollytelling but I soon realized I was trying to fit content into buckets or in this case, content into a popular storytelling format. A format that isn’t required nor is it always appropriate.
We had a brief discussion in class about interactivity and learning when it might be needed or not. More often than not, making a chart interactive wasn’t needed. I think this goes back to balancing design and engineering. Just because you can doesn’t mean you should or have to.
I started to play with parallax and split screens to see how charts, text, and images would play out “on the page,” so to speak. I also learned a bit of CSS animation to see if the word cloud I had in mind might work well animated. I also learned a bit about CSS Grid, which blew my mind. No more floats? Cool.
Essentially I was sketching with code in a way that made sense to me. Definitely not pretty but that wasn’t the goal.
If there’s one big thing I learned about coding through this process, it’s this: Start. Go ahead and break things. You won’t learn unless you start and stop worrying about breaking things. Change one thing. Test it. That way, if it breaks, you don’t lose track of what broke it. Firefox is also your best friend (ok, or Chrome).
Interlude: Getting to Know Anne
Reading her biography, I came to understand Anne Sexton much more intimately. I also started reading Anne Sexton: A Self-Portrait in Letters, which is giving me more insight and understanding. Some of what she did was downright shocking and self-indulgent, but perhaps that was her way of coping. Unfortunately, the impact some of her actions had on her family was heartbreaking.
Philip McGowan’s book is a more contemporary lens through which to understand Anne Sexton and her work. So many reviewers slammed her posthumously published works as ramblings. McGowan helped me remember that the mind is complex and that sometimes text can be limiting. How does, or how can, poetry push the boundaries of words? There’s definitely more to think and learn about that.
These two books, in addition to the wealth of information online, bridged the events in her life with her poetry. Where I could, it was important to give greater context to the charts because so much of what they show is negative.
The Big D3
So, one of my last charts needed to be interactive. This meant I needed to learn D3 tout de suite. Previous attempts, however, had not gone well, so I needed a strategy (a backup plan) because I was feeling good about how my project was shaping up.
I learned from my dad to always have a backup plan. The solution wasn’t ideal, but I had to find an alternative way to present the sentiment of each poem in each collection just in case I couldn’t learn enough D3 to pull it off. After trying Datawrapper, Flourish, Charticulator, and RAWGraphs, I settled on Tableau. The only pet peeve (ok, a big one) was the branded tab bar that is carried with embedded Tableau charts. I tried every way I knew to hide the bar but had no luck. The code was buried deep and I just couldn’t figure out a way to hack it.
Nope, Must Be D3
It’s interesting what motivates me. I wouldn’t have been all that disappointed to have the Tableau chart in my final project, but the lack of customization and that tab bar were enough to make me sit in Starbucks until midnight, after several hours there already, just to get at least a y-axis with negative values and an x-axis.
Plus, I just had to create a chart using D3. Python was one goal. D3 was the other in a long list of goals for the semester. Next semester I’ll be taking a course focused on D3 so I also wanted to get a head start because I want to focus on storytelling. The tool shouldn’t feel like one.
After much sputtering, the D3.js Graph Gallery, along with 3 tips (below) from my good friend and classmate, Qinyu, put me on the path toward success. Naturally, there were many other resources online, too many to list and frankly too many to track, but another goal for this semester was soon to be crossed off my list. Yes.
The Three D3 Tips = Priceless
Tip 1: Read in data this way
For some reason, this structure makes the most sense to me.
// data is now whole data set
// draw chart in here!
Tip 2: Group your data
Just like in R when I grouped the data by collection, I needed to group the data for D3 to process. I didn’t make that connection. In hindsight, not knowing this was the source of my frustration weeks ago. Of course, it makes a whole lotta sense now, but when I started to learn how to build a scatterplot in D3, I was looking at tutorials that used only two dimensions and linear data rather than ordinal data. Lodash, according to Qinyu, would help me a lot. Oh boy, did it ever. (Turns out, in the process of learning how to use Lodash with D3, I also learned that there is a nest function in D3. The professor for our D3 course next semester assures me we will cover this. Yay.)
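The grouping idea itself is not specific to D3 or R. As a minimal sketch, here it is in Python rather than JavaScript, just to show the shape the data takes once it is grouped (Lodash offers `_.groupBy` for this, and D3 versions 4 and 5 include `d3.nest`). The column names `collection`, `poem`, and `sentiment`, the sample rows, and the helper `group_by` are all hypothetical.

```python
from collections import defaultdict


def group_by(rows, key):
    """Collect rows into {key value: [rows...]}, keeping first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Made-up rows: one per poem, with a net sentiment score.
rows = [
    {"collection": "Collection A", "poem": "Poem 1", "sentiment": -4},
    {"collection": "Collection B", "poem": "Poem 2", "sentiment": 3},
    {"collection": "Collection A", "poem": "Poem 3", "sentiment": 1},
]

by_collection = group_by(rows, "collection")
for name, poems in by_collection.items():
    scores = [p["sentiment"] for p in poems]
    print(name, len(poems), min(scores), max(scores))
```

Once the rows are nested like this, each collection can be drawn as its own column or band, with one mark per poem inside it.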
Tip 3: scaleBand rather than scaleOrdinal or is it
So, I think one of the great aspects of learning these days is the on-demand results Google spits out. Like all searches, it depends on what and how you input keywords and phrases, but also on whether people have taken the time to optimize their content and web pages. Stack Overflow has been a gold mine, but it is also full of dated crap. There were loads of dead ends, which is fine, but the process can become demoralizing rather quickly. So, when Qinyu informed me that scaleBand would be better than scaleOrdinal, I went with it.
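A rough way to see why a band scale suits categories on an axis: it divides the pixel range into equal slots, one per category, whereas `scaleOrdinal` only maps each category to a value you list yourself (colors, for example). The sketch below mimics the arithmetic of a band scale with zero padding in plain Python. It illustrates the idea and is not D3’s implementation; `band_scale` and the collection names are made up.

```python
def band_scale(domain, range_start, range_end):
    """Return (position_of, bandwidth) for equal bands with no padding."""
    step = (range_end - range_start) / max(1, len(domain))
    index = {name: i for i, name in enumerate(domain)}

    def position_of(name):
        # Left edge of the band that belongs to this category.
        return range_start + index[name] * step

    return position_of, step


collections = ["Collection A", "Collection B", "Collection C", "Collection D"]
x, bandwidth = band_scale(collections, 0, 800)

print(bandwidth)          # 200.0
print(x("Collection C"))  # 400.0
```

In D3 the equivalent pieces would be the scale itself for the left edge and its `bandwidth()` for the slot width, with padding options on top of this basic arithmetic.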
While I realize there are many different ways to write code for D3 charts, these three tips helped me down a more successful path. One other concept I’ve been thinking about is this: In D3, the structure is counterintuitive because you create these invisible placeholders before you draw shapes and attach your data.
R, more CSS3, CSS Grid, parallax, CSS animation, a taste of Python, and D3 … wow. Sometimes a list is what you need to see what you’ve learned. Sentiment and text analysis are definitely something I want to practice more. I really had no idea what was involved in analyzing text. In fact, once I learn more I’d love to re-explore.
It dawned on me recently that I’m not even a year into learning about data visualization. In that case, I feel pretty damn good about my final project. This, I believe, is the beginning of what I hope will be the next phase of my life.