Learning a New Language – Computer Code!

Hey everyone! Luckily, my research has finally progressed past the experimental design phase and we are now in the data analysis phase! (WoooHoooo!) However, this phase of research comes with many of its own challenges. To begin with, depending on your project type, there may be a lot of data to handle, and there may not be an existing program that can analyse your data in the appropriate ways. This was the case with my project, with most experiments having about 3200 data points each! This meant that I had to learn a new language in order to properly design and write a program that would handle this large amount of data. Even though this was a daunting task, it was all possible through one of the greatest tools of this age: the internet!

So, how do you go from being a total novice at computer code to writing your own program in as short a time as possible?

Well, the first thing to do is to write out by hand exactly what you want your program to do. For example, if your experiment results in a data file, the first thing you will want your program to do is pull out the relevant data to be analysed from the output file. Now that you have your data pulled into the program or stored in a file your program can access, what do you do with it? This portion of the programming is highly dependent on your specific requirements, but for the purpose of example let's just say that your data files contain several position measurements, and you want to create a histogram of how frequently these position measurements fall into certain bounds.

The first step to doing this is to come up with a very small sample of your data and do the calculations by hand or with the help of a program such as Excel. For our histogram, we would want to count the number of cells with values in a certain range, tally the results, and then plot them on a bar graph. This step is crucial to writing a program that both works and works as you expect (a big distinction), because it gives you a known answer to check the program's output against.
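To make the hand calculation concrete, here is a rough sketch of the same tally written in Python. This is not the Mathematica code I actually used; the sample values, the bin edges, and the function name tally are all made up for illustration. The idea is just that the program's counts should match what you worked out by hand for the same small sample.

```python
# Hypothetical sample: a handful of position measurements (units arbitrary).
positions = [0.4, 1.2, 1.7, 2.5, 2.9, 3.1, 0.8, 1.9]

# Hypothetical bin edges, giving the ranges [0, 1), [1, 2), [2, 3), [3, 4).
bin_edges = [0.0, 1.0, 2.0, 3.0, 4.0]


def tally(values, edges):
    """Count how many values fall in each half-open range [low, high)."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break  # a value belongs to at most one range
    return counts


counts = tally(positions, bin_edges)

# Print a rough text bar chart to compare against the hand count.
for i, count in enumerate(counts):
    print(f"{bin_edges[i]:.1f} to {bin_edges[i + 1]:.1f}: {'#' * count}")
```

If the bars do not match your hand tally, the sample is small enough that you can trace each value and see which range it landed in.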

Now it’s almost time to start coding! But before you can do that, you have to decide what language or software to use. This would be a good time to talk to your research mentor to see what coding languages they know and what software the lab has available to write your code with. In my case, I chose to use Wolfram’s Mathematica to write my program. This particular piece of software isn’t strictly a platform for writing programs; it is more of a data analysis platform that accepts commands as code, which makes it easy enough to use but versatile enough to perform the tasks I wanted.

So, you’ve got your data files, a written-out example of what needs to be done to the data for proper analysis, and software in which to write your program. Now what? This is probably the most difficult step, as staring at a blank page with the hope of filling it with correct code is quite daunting. But this is where your written-out plan comes in! Looking back at our example, the first thing we needed to do was import the data. Assuming that no one we know already knows how to do this, a simple Google search for “importing data <coding language or software here>” will most likely yield a result. Once you figure out the correct function to import your data, it is important to run your program and make sure that nothing strange is happening to your data during the import process. This somewhat tedious process of checking each step of your program as you write it may seem annoying, but it is much better than writing a 200-line program and finding an error back in line 5 that requires substantial reworking of all the subsequent steps. Keeping this check-as-you-go philosophy in mind, continue searching the internet and the software documentation for ways to make the program do what your plan requires, and with enough time and effort a custom-made analysis program will come out of it!
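As an illustration of checking the import step before moving on, here is a minimal sketch in Python (again, not the Mathematica I used). It assumes the data is comma-separated with a column named position; the column names, the sample values, and the function name load_positions are all hypothetical. A short piece of text stands in for a real data file so the example runs on its own.

```python
import csv
import io

# Stand-in for a real data file; the column names and values are made up.
SAMPLE_FILE = """time,position
0.0,0.4
0.1,1.2
0.2,1.7
"""


def load_positions(handle):
    """Read the 'position' column from CSV text and return it as floats."""
    reader = csv.DictReader(handle)
    return [float(row["position"]) for row in reader]


# With a real file you would pass an opened file instead of the sample text.
positions = load_positions(io.StringIO(SAMPLE_FILE))

# Check the import right away, before writing anything that depends on it.
print("rows read:", len(positions))
print("first values:", positions[:3])
assert len(positions) == 3, "unexpected number of rows"
```

Printing the row count and the first few values takes seconds, and it catches things like a header being read as data or a column being off by one long before they can hide inside later steps.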

Of course, we’re not done yet. Now that your program works for one data file, it is time to make sure it works for all of them. This is where designing your experiments beforehand so that the data files come out in the same format every time comes in handy. If all your files are in the same format, you should be able to feed in each file and run the program. If they are not all similar, you might have to either adjust your program so that it can handle these differences (which might be impossible depending on how different each file is) or slightly tinker with your program each time you feed in a new data file. A good way of doing the latter is to place lines of code at the beginning of your program that define variables for the parts of your data files that are different. For example, if your code has a line that says “while column number is <5, perform function x and add 1 to the column number” and you have data files with different numbers of columns, change the 5 in the line of code to a variable, which you can define in the first few lines of code and then edit later on without having to sift through your functions to find the correct line to change.
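Here is a small Python sketch of that last idea, with the column count pulled out into a variable at the top. The names NUM_COLUMNS, process_column, and analyse are invented for the example, and averaging is only a placeholder for whatever "function x" is in your own analysis.

```python
# Settings that differ between data files live at the top, where they are easy to find.
NUM_COLUMNS = 3  # hypothetical: change this for files with a different column count


def process_column(values):
    """Placeholder for 'function x': here it just averages the column."""
    return sum(values) / len(values)


def analyse(rows, num_columns):
    """Apply process_column to each of the first num_columns columns."""
    results = []
    column = 0
    while column < num_columns:
        results.append(process_column([row[column] for row in rows]))
        column += 1  # move on to the next column
    return results


# Made-up rows standing in for one data file.
rows = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]
results = analyse(rows, NUM_COLUMNS)
print(results)
```

When a file with a different number of columns comes along, only the one line at the top needs to change, and the loop itself stays untouched.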

Hopefully this information will help someone who, like me, is trying to write a program without knowing any computer languages beforehand.

Best of luck, and happy debugging!