A Resource for Fellow Undergrads: Advice About Data in Professional Talks

You’ll be surprised how important this is. Remember how you learned to use Excel in junior high (or maybe earlier)? It was probably for a science project. Well, don’t worry, because as a researcher, you’ll never have to make another Excel graph in your life!

…because we use things like Origin and Igor instead.

…which are complex enough to have their own programming languages.


Why? Well, once you pull super-cool results from your research (which I’m starting to now, by the way, and it’s awesome), you’re presented with a problem: people don’t have much time, and you have a lot of information. Often you have tens of thousands of data points, from different tests and techniques over months of narrowly focused work, with layers of data analysis in between, and you somehow have to compile them in a way that these busy, tired people will understand.

And they don’t want to deal with lazily-made graphs.

If you think I’m making a mountain out of a molehill, think again – it’s this kind of stuff that makes papers take years to publish. But that’s why it’s good to get a handle on it now, when you’re just starting out in your career!


It’s easy to get mired in the details of each individual graphing program. You can probably use any of them, as long as it offers enough flexibility, so instead I’ll keep my advice general. The predominant choice here is Igor, which is the source of the plots in the “good” column. The “bad” column is a combination of MATLAB and Excel.

This wisdom comes from the EUREKA program and my research mentor.

Disclaimer: none of the “good” column graphs have titles. This is only because I like to interchange titles based on what I use the graphs for, without having to edit the graph image over and over. Always, always include a title.

Bad

[Figure: “M - Three times more filtering”]

You’ll probably see a lot of this. This is fresh out of MATLAB, and yes, something I actually sent my mentor at some point. It looks great at the time, since you’re buried in your process and don’t take a step back to look at it! But in reality,

  • The lines are too thin
  • The labels are tiny
  • Most of the labels are unnecessary, at least for a presentation
  • Two graphs are overplotted for no particular reason
  • It’s not cropped, and the axis range is poorly selected to include extraneous regions (including regions in which my filtering technique had some unprofessional hiccups!)
  • Image quality is low
  • Almost everything is whitespace or noise (extra “ink on the page” that doesn’t contribute to meaning) – which is a guaranteed way to make your graph confusing

But lastly, and most importantly, the purpose of the graph is not clear. Unless you need it to portray something, why would you show it?

Better (but not perfect)

[Figure: “MagneticTransition-2”]

This is the same analysis, but a version of it which could conceivably be used for professional work. It’s not quite what you would use for a paper (which is more formal), but it’s pretty decent for a presentation. And certainly it’s leagues ahead of the other one. The reasons why turn out to be good general tips:

  • Thicker lines
  • Bigger labels
  • Only the important and necessary things are labelled
  • Nicer colors* (this happens to coincide with color coding elsewhere in the presentation, which is another good strategy)
  • Overplotting can be fine, but if data sets cross each other, it can get confusing**
  • The most important point on the graph is clearly identified
  • High image quality – never screenshot; any good graphics program will have ways to export high-resolution figures

*  Never use red and green to distinguish between different data sets on the same graph. Red-green colorblindness is surprisingly common!

** Only important in a presentation – in papers, complicated graphs are often the only way to go.
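
For anyone plotting in Python rather than Igor, the tips above could be captured once as a set of matplotlib defaults. This is a minimal sketch, assuming matplotlib is the plotting library: the specific numbers (line width, font sizes, 300 dpi) and the names `PRESENTATION_RC`, `OKABE_ITO`, and `apply_presentation_style` are illustrative assumptions, not settings taken from the graphs shown here. The color list is the Okabe-Ito palette, a commonly used colorblind-friendly set.

```python
# Illustrative presentation defaults for matplotlib (all values are assumptions).
PRESENTATION_RC = {
    "lines.linewidth": 2.5,    # thicker lines
    "font.size": 16,           # bigger text overall
    "axes.labelsize": 18,      # bigger axis labels
    "xtick.labelsize": 14,     # readable tick labels
    "ytick.labelsize": 14,
    "axes.linewidth": 1.5,     # sturdier axis frame
    "legend.frameon": False,   # less ink that carries no meaning
    "savefig.dpi": 300,        # export at high resolution instead of screenshotting
    "savefig.bbox": "tight",   # trim surrounding whitespace on export
}

# Okabe-Ito palette: designed to stay distinguishable under red-green colorblindness.
OKABE_ITO = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]


def apply_presentation_style():
    """Apply the defaults above globally; requires matplotlib to be installed."""
    import matplotlib as mpl
    from cycler import cycler

    mpl.rcParams.update(PRESENTATION_RC)
    mpl.rcParams["axes.prop_cycle"] = cycler(color=OKABE_ITO)
```

Calling `apply_presentation_style()` once before plotting, and then saving with `fig.savefig("figure.png")`, would export figures with these settings. Axis ranges and which labels to keep still have to be chosen by hand for each graph.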

Bad

[Figure: “BadCVsDopantFit”]

Ah, the other ancient enemy of clear presentation: the default graphs from Excel. This is something I had for personal use, to organize the outputs of my data analysis. Don’t assume that you can just put it in your slides like this!

  • Even if you’re going to explain what it means, there should at minimum be some written clues
  • Both in the graph area and in the legend, there is duplicate data! That becomes noise to people who are trying to read your graph – or, in this case, it could make people think “Wow, that’s a really good fit!” when in fact the red points are what’s supposed to correspond to the trend line
  • Don’t use “e” or “E” for scientific notation. Write out ×10^x, if you need to. Whatever takes up the fewest characters is usually the best choice, so in this case, it would be better to just use decimals!
  • There are some extra pieces which are legacies from analysis work, some points which are unverified, etc. – you will be expected to explain every single item on your graphs, and that can take time away from the good science!
  • More of the same stylistic issues as before
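
The scientific-notation tip above can be sketched as a small helper that picks whichever label is shorter: a plain decimal or the ×10^x form. This is a hedged illustration only; the function name `short_label` and the default of three significant figures are assumptions, not part of any graphing package.

```python
from decimal import Decimal


def short_label(value, sig=3):
    """Return the shorter of a plain decimal label and an a×10^b label."""
    if value == 0:
        return "0"

    # Round to `sig` significant figures via Python's "e" format, e.g. "2.50e-03".
    sci_text = f"{value:.{sig - 1}e}"
    mantissa, exponent = sci_text.split("e")
    if "." in mantissa:
        mantissa = mantissa.rstrip("0").rstrip(".")  # "2.50" -> "2.5", "3.00" -> "3"
    sci_label = f"{mantissa}×10^{int(exponent)}"

    # Expand the same rounded number as a plain decimal, without any "e".
    plain = format(Decimal(sci_text), "f")
    if "." in plain:
        plain = plain.rstrip("0").rstrip(".")

    # Fewest characters wins; ties go to the plain decimal.
    return plain if len(plain) <= len(sci_label) else sci_label


print(short_label(0.0025))   # 0.0025
print(short_label(150000))   # 150000
print(short_label(3e12))     # 3×10^12
print(short_label(1.2e-7))   # 1.2×10^-7
```

In matplotlib, a helper like this could be attached to an axis with `matplotlib.ticker.FuncFormatter(lambda v, pos: short_label(v))`, though most graphing programs offer their own tick-format settings that do the same job.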

Better (but not perfect)

[Figure: “Graph0”]

So again, this is meant for a slide presentation. Readability and simplicity are key, as before. Here are some more things to point out:

  • Color choice is key. Notice that the red here is neither a primary color nor a default color – going slightly towards pastel from the brightest colors available to you often works well. (It comes across more clearly if you have a graph with many different colors, and makes you look like quite the professional.)
  • Credit is given to prior work – that fit line was obtained by a former group member, and was previously published. If you don’t attribute work to those who did it, the assumption will be that you did it – which can constitute plagiarism, with quite serious ramifications if it makes its way into a publication. Be overcautious!
  • Be creative with your axis labels. That doesn’t mean making them polka-dotted. That means choosing the format that works best for the graph you’re working with. You’ll notice that the x-axis variable is dimensionless. But I still had to convey what it meant! So rather than putting the units in parentheses, I included the chemical formula. I stopped short of explicitly pointing out the x in the formula because it’s expected that people will digest your graphs to some degree, and at some point more text just becomes noise.

I’ll leave you with a thinking point: how to deal with data that isn’t what you expected. Sometimes there are just deadlines, and you don’t have time to run the tests you would need to correct your mistakes. What can you do? Be honest. Still choose the most effective way to present the data, even if it’s presenting your mistakes, or things you’re still seeking to understand. Most of your work is complicated. People won’t blame you for setbacks. But, as you’ll see time and time again, they’ll nail you for hiding them.

Yes, that data is going in the opposite direction of the sensibly-expected trend, and is horrendously scattered. Such is life!

Can you see the setback?

News Flash: Research is Fun!

Album cover art?

Okay, of course it is, or else you wouldn’t hear people saying that all the time.

But skeptics, take it from me: research isn’t as drawn-out and repetitive as some people make it out to be.

I say this having worked in a materials science lab, both on a mechanical design and a strictly materials project, for just about half a year. In this field, it’s commonplace to have samples sitting in a furnace cooling down for over a week – measurements are complex and require many trials per growth – purity of your sample is left largely to chance – etc. Basically blows your chem lab class out of the water.

Preparation of a few samples for EDX (energy-dispersive x-ray spectroscopy), which helps determine overall sample composition. They’re all about 1 mm square.

It adds up, in weeks and months, to the slow completion of a project.

What may come as a surprise is this: that’s not at all a bad thing.

Quite the opposite. Maybe I just don’t get out enough during the school year (free time of any quantity becomes rare on a weekly basis), but this summer has been the time of my life. I sleep better. I socialize more. I’ve even started a fairly rigorous exercise routine (another thing that falls by the wayside during the school year), and rock climb on the side, in addition to resuming the personal projects I find important.

Now, that’s the easy stuff to understand: that’s living; it’s objectively fun, most would agree. Perhaps it’s a surprise that research allows you that, or perhaps not.

Here’s the shocker (not to me): in this field, where many stress the doldrums and repetition, I find myself doing highly varied work on a day-to-day and even hourly basis. It keeps me both engaged and happy, and I am far from mastering all of its nuances. Work in this lab has taken me everywhere from highly automated x-ray diffraction measurements (beautiful machines, by the way) to ludicrously small-scale tweezer work (I’m talking 50-micron gold wire). Everything has its own challenge.

There are those 50-micron wires (made wider by a coating of the silver adhesive). The scale is in quarter-millimeters (4x zoom).

But there are great rewards. I won’t even go into the whole contributing-to-science aspect; on a purely personal level, there are accomplishments and highs that take you by surprise.

I’m a bit of a programming enthusiast. For years, it’s been a hobby; something I knew would eventually become useful, but hadn’t found an application for yet.

This summer has brought that to fruition: first in determining the temperature dependence of a magnetic transition in the compound I’m studying, and second in distinguishing on a 0.02 Angstrom scale (0.1% of the measured quantity) between unit cell lengths of differently doped versions of that crystal.

Hopefully that doesn’t sound like gibberish, because it’s really cool. In both cases, the data analysis was done via MATLAB scripts I wrote from scratch. Nothing too complicated, true (just some elementary signal analysis, filtering, and numerical methods), but nevertheless instrumental to my research – and I implemented it myself.

That’s the most invigorating part – when you realize your mentor has stepped back, and you have at least an elementary skill set to start doing things yourself in a professional lab environment, making your own hours and prioritizing your own approaches. It’s a great feeling, and you should try it!