# Working with Data Files and Numpy Arrays

## Open files and load data

To load the file you just created (my_exp_1.data) you do the following:

```
In [2]: mf = B.get_file('my_exp_1.data') # B is the LT.box
```

`my_exp_1.data` is the name of the data file that you just created
and that resides in your current working directory (if this did not work, see the section "Python cannot find my files" below).
The line of code:

```
B.get_file('my_exp_1.data')
```

creates a `dfile` object, to which we assign the name `mf`.
Now you can start to 'play' with it. As an example, you can find the
column names of your data by doing the following:

```
In [3]: mf.show_keys()
```

The `()` is important. To look at the values associated with each name
you can do:

```
In [4]: mf.show_data('time')
In [5]: mf.show_data('dist')
In [6]: mf.show_data('d_err')
```

Or you can display them together:

```
In [7]: mf.show_data('time:dist:d_err')
```

This may be useful for showing them but in order to work with the data you need to get them into ipython as arrays. This is achieved by doing:

```
In [8]: t = mf['time']
```

If you remember from earlier, this is similar to accessing an element of a dictionary, but in this case you get
a so-called `numpy.array()`, which I then stored in the variable `t` (for time).
Numpy arrays are
one of the most important objects we will be using and will be
discussed some more below.

You can again display the array by just typing its name and hitting return:

```
In [9]: t
```

You should see something like (the number in the brackets will be different):

```
Out[9]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
```

To access a single element of an array you enter:

```
In [10]: t[1]
```

Try out other values for the index!

The integer (whole number) in the bracket is the index and in this case runs from 0 to 10, since there are 11 elements in the array. You can find the length of an array by typing:

```
In [11]: len(t)
```

And you should get back 11.
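The indexing rules above can be tried without the data file. The following is only a sketch: `t_demo` is a hand-made array (a hypothetical stand-in for `mf['time']`) holding the same 11 values.

```python
import numpy as np

# Hypothetical stand-in for mf['time']: 11 values from 0 to 10
t_demo = np.arange(11, dtype=float)

first = t_demo[0]                  # index 0 is the first element
second = t_demo[1]                 # index 1 is the second element
last = t_demo[len(t_demo) - 1]     # the largest valid index is len - 1

print(len(t_demo), first, second, last)
```

An index beyond the end of the array, such as `t_demo[11]` here, raises an `IndexError`.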

## Numpy arrays

A numpy array is a one- or multi-dimensional array of, most
frequently, numerical or logical data (but other data can also be
stored). You can find a very nice introduction at
numpy for beginners.
Information on its dimensions and its size
is stored in the `shape` attribute of the array. Try the following:

```
In [11]: t.shape
```

and you should get (11,).

You can now manipulate this array. For instance, you can multiply it by a
number. In the example you will multiply all values by 1000 to convert
to milliseconds and assign the new result to `tms`:

```
In [12]: tms = t*1000
In [13]: tms
```

The output now looks like this:

```
Out[13]: array([ 0., 1000., 2000., 3000., 4000., 5000., 6000.,
7000., 8000., 9000., 10000.])
```

Almost any mathematical operation is possible; check the documentation. Now get the distance data and the corresponding errors by doing:

```
In [14]: dexp = mf['dist']
In [15]: derr = mf['d_err']
```
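To illustrate the remark that almost any mathematical operation is possible, here is a small sketch with a made-up array `t_demo` (not taken from the data file). Each operation is applied to every element of the array.

```python
import numpy as np

t_demo = np.array([0.0, 1.0, 2.0, 3.0])   # made-up times

t_squared = t_demo**2        # each element squared
t_shifted = t_demo + 0.5     # the same constant added to every element
t_root = np.sqrt(t_demo)     # numpy functions also act element by element
t_sum = t_demo + t_squared   # two arrays of equal length combine pairwise

print(t_squared)
print(t_shifted)
print(t_root)
print(t_sum)
```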

You can also make arrays that have the same size as your data but
contain only 0's or 1's. In the example below we make an array
exactly like `derr` that contains only ones and another that
contains only zeros.

```
In [16]: err_one = np.ones_like(derr)
In [17]: err_0 = np.zeros_like(derr)
```

Look at `err_one` and `err_0` and verify that they contain only
1's or 0's and have exactly the same number of elements as your array `derr`.

You can convert a python list to a numpy array as follows:

```
In [12]: my_list = [1,2,3,4,5,6]
In [13]: my_array = np.array(my_list)
```

You can also convert a numpy array back to a list by doing:

```
In [12]: my_list = list(my_array)
```
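Two details of these conversions are worth knowing; the sketch below reuses the names `my_list` and `my_array` from above, and the other names are only for illustration. First, `list(my_array)` keeps the elements as numpy numbers, while the array method `tolist()` returns ordinary Python numbers. Second, the same operator can mean different things for lists and arrays.

```python
import numpy as np

my_list = [1, 2, 3, 4, 5, 6]
my_array = np.array(my_list)

as_list = list(my_array)        # elements are still numpy integers
as_plain = my_array.tolist()    # elements are ordinary Python ints
print(type(as_list[0]), type(as_plain[0]))

# Multiplication behaves differently for lists and arrays
doubled_list = my_list * 2      # repeats the list: 12 elements
doubled_array = my_array * 2    # doubles each value: still 6 elements
print(doubled_list)
print(doubled_array)
```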

Another very useful tool is `np.linspace(start, stop, num, endpoint)`. This function returns an array of `num` equally spaced
values between `start` and `stop`. By default `endpoint = True`,
meaning that the stop value is included in the array. If you set
`endpoint = False` the stop value is not included. Below are a few
examples.

```
In [21]: np.linspace(start = -2. ,stop = 5.,num = 10, endpoint = True) # Include the endpoint
Out[21]:
array([-2. , -1.22222222, -0.44444444, 0.33333333, 1.11111111,
1.88888889, 2.66666667, 3.44444444, 4.22222222, 5. ])
In [22]: np.linspace(-2., 5., 10) # short cut version
Out[22]:
array([-2. , -1.22222222, -0.44444444, 0.33333333, 1.11111111,
1.88888889, 2.66666667, 3.44444444, 4.22222222, 5. ])
In [23]: np.linspace(start = -2. ,stop = 5.,num = 10, endpoint = False) # without the end point
Out[23]: array([-2. , -1.3, -0.6, 0.1, 0.8, 1.5, 2.2, 2.9, 3.6, 4.3])
```
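The different spacing in the two cases follows from the number of intervals: with the endpoint included, `num` values span `num - 1` intervals, and without it they span `num` intervals. As a quick check, `np.linspace` can return the spacing itself through its `retstep` argument (the variable names below are only for illustration):

```python
import numpy as np

start, stop, num = -2.0, 5.0, 10

# retstep=True makes linspace return the spacing together with the array
x_incl, step_incl = np.linspace(start, stop, num, endpoint=True, retstep=True)
x_excl, step_excl = np.linspace(start, stop, num, endpoint=False, retstep=True)

print(step_incl, (stop - start) / (num - 1))   # num - 1 intervals
print(step_excl, (stop - start) / num)         # num intervals
```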

As an example, try the following from your spyder console (assuming `pyplot` has been preloaded):

```
In [24]: x = np.linspace(0., 2.*np.pi, 1000) # create an array with 1000 elements
In [25]: plot(sin(x), cos(x + np.pi/4.)) # create 2 Lissajous curves
In [26]: plot(sin(x), cos(5.*(x + np.pi/4.)))
```

Note that in the previous example the terms \(\sin(x)\) and \(\cos(x + \pi/4)\) are calculated for the entire array of 1000 values.
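To see what "calculated for the entire array" means, the sketch below compares the array version with an explicit loop over the elements. It calls `np.sin` directly instead of relying on a preloaded `sin`; the names `y_vec` and `y_loop` are only for illustration.

```python
import math
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 1000)

# Array version: one call handles all 1000 values
y_vec = np.sin(x)

# Equivalent explicit loop, one value at a time
y_loop = np.array([math.sin(xi) for xi in x])

print(y_vec.shape, np.allclose(y_vec, y_loop))
```

Both give the same numbers; the array version is shorter to write and usually much faster.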

## 2-dimensional arrays

You can combine 1-dimensional arrays of the same length into a two dimensional array by:

```
In [16]: time_distance = np.array([t, dexp])
In [17]: time_distance.shape
```

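This should now give you (2, 11): 2 rows (one per original array) and 11 columns. You can access each element by its two indices; the first runs from 0 to 1 and the second from 0 to 10. Try: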

```
In [18]: time_distance[1,3]
```
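The following sketch shows a few more ways to index a 2-dimensional array. It uses two short made-up arrays, `t_demo` and `d_demo`, in place of `t` and `dexp`.

```python
import numpy as np

t_demo = np.array([0.0, 1.0, 2.0, 3.0])
d_demo = np.array([3.5, 4.6, 8.0, 11.4])

td = np.array([t_demo, d_demo])   # shape (2, 4): row 0 = times, row 1 = distances

row_of_distances = td[1]          # the whole second row
one_value = td[1, 3]              # row 1, column 3
column_2 = td[:, 2]               # time and distance of data point 2
td_T = td.T                       # transpose: shape (4, 2), one row per data point

print(td.shape, td_T.shape)
print(row_of_distances, one_value, column_2)
```

An index that is too large for its axis, for example a first index of 2 when there are only 2 rows, raises an `IndexError`.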

## Selecting data from arrays / Logical operations on arrays

As with regular Python lists, numpy arrays support a wide variety of slicing operations to select a subset of data from an array:

```
In [16]: t_sub = tms[2:8]
In [17]: t_sub
Out[17]: array([2000., 3000., 4000., 5000., 6000., 7000.])
```

This selects elements 2 through 7 of the array `tms` and stores them in the array `t_sub`. You
can also place the indices of the array elements that you would like to access in an array,
as shown in the example below:

```
In [16]: i_s = np.array([2,3,5,7])
In [17]: t_s = tms[i_s]
In [18]: t_s
Out[18]: array([2000., 3000., 5000., 7000.])
```

Remember also:

```
In [16]: tms[0] # is the first element of tms or any array in general
Out[16]: 0.0
In [18]: tms[-1] # is the last element
Out[18]: 10000.0
In [19]: tms[-2] # is the 2nd to last element etc.
Out[19]: 9000.0
```
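A few more slicing forms are sketched below, using a hand-made array `tms_demo` with the same values as `tms`. Leaving out the start or the stop means "from the beginning" or "to the end", and a third number gives the step.

```python
import numpy as np

tms_demo = np.arange(11) * 1000.0   # 0, 1000, ..., 10000

first_three = tms_demo[:3]     # from the start up to, but not including, index 3
from_eight = tms_demo[8:]      # from index 8 to the end
every_other = tms_demo[::2]    # every second element
backwards = tms_demo[::-1]     # all elements in reverse order

print(first_three)
print(from_eight)
print(every_other)
print(backwards)
```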

Numpy arrays can also be used in logical operations. This is especially useful when you would like to select a subset of the data for further operations. Try out the following:

```
In [16]: big = tms > 4000
In [17]: small = tms < 7000.
In [17]: tms[big]
Out[17]: array([ 5000., 6000., 7000., 8000., 9000., 10000.])
In [18]: tms[small]
Out[18]: array([ 0., 1000., 2000., 3000., 4000., 5000., 6000.])
```

The arrays `big` and `small` are used to select elements from the
original array. The array `tms[big]` only contains those values of `tms` that are bigger than 4000 and `tms[small]` only
contains the values that are smaller than 7000. The arrays `big` and `small` contain the logical result (`True` or `False`) of the logical expression for each array element.

```
In [19]: small
Out[19]:
array([ True, True, True, True, True, True, True, False, False,
False, False])
```

They can also be combined as:

```
In [20]: both = big & small
In [21]: tms[both]
Out[21]: array([5000., 6000.])
```

Here `&` means `and` and `|` means `or`. The array `tms[both]` therefore contains
only those elements of `tms` that are between 4000 and 7000 (excluding the limits).

This can also be written in one line as:

```
In [21]: tms[ (4000 < tms) & (tms < 7000)]
Out[21]: array([5000., 6000.])
```
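A logical array made from one array can also select the matching elements of another array of the same length. The sketch below uses hand-made copies of the time and distance values (`tms_demo`, `dexp_demo`); it also shows `|` and the negation operator `~`.

```python
import numpy as np

tms_demo = np.arange(11) * 1000.0
dexp_demo = np.array([3.5, 4.6, 8.0, 11.4, 9.5, 14.0,
                      16.3, 17.1, 19.5, 20.3, 21.2])

inside = (4000 < tms_demo) & (tms_demo < 7000)      # and
outside = (tms_demo <= 4000) | (tms_demo >= 7000)   # or
also_outside = ~inside                              # not

n_inside = np.count_nonzero(inside)   # how many elements are selected
d_inside = dexp_demo[inside]          # distances measured at the selected times

print(n_inside, d_inside)
print(np.array_equal(outside, also_outside))
```

The parentheses around each comparison are needed because `&` and `|` bind more tightly than `<` and `>`.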

## Parameters in data files

You can also access the parameters that you defined in your file. First you can look at all the parameters that you defined by doing:

```
In [16]: mf.par.show_all_data()
pressure 1.e5
temperature 80.
```

In this case you see the two parameters called pressure and temperature with a value of 1.e5 and 80, respectively. To get these values and store them in variables you would do:

```
In [17]: T = mf.par['temperature']
In [18]: P = mf.par['pressure']
```

If you get an error message saying, for example, that `mf.par` does not exist, you have an error in
your parameter definition in the data file.
For more detailed information look at the datafile documentation (`pdfile`).

## Computations using arrays

Now all your data are in the form of variables and (numpy) arrays that can be used for computation. For instance you might want to know what percentage error each data point has. This can be done as follows:

```
In [16]: p_err = derr/dexp * 100.
In [17]: p_err
```

And the output should be:

```
Out[17]: array([ 31.42857143, 26.08695652, 12.5 , 13.15789474,
17.89473684, 8.57142857, 11.04294479, 7.60233918,
9.23076923, 5.91133005, 7.0754717 ])
```
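The same calculation can be reproduced without the data file. The sketch below types in the distance and error values shown in this section by hand (as `dexp_demo` and `derr_demo`) and then asks which data point has the largest and the smallest percentage error.

```python
import numpy as np

dexp_demo = np.array([3.5, 4.6, 8.0, 11.4, 9.5, 14.0,
                      16.3, 17.1, 19.5, 20.3, 21.2])
derr_demo = np.array([1.1, 1.2, 1.0, 1.5, 1.7, 1.2,
                      1.8, 1.3, 1.8, 1.2, 1.5])

p_err = derr_demo / dexp_demo * 100.0   # element-by-element division

worst = np.argmax(p_err)   # index of the largest percentage error
best = np.argmin(p_err)    # index of the smallest percentage error

print(np.round(p_err, 1))  # rounded to one decimal for display
print(worst, best)
```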

To get somewhat nicer output you can use a `for` loop. First some
information on loops. The simple `for` loop works as follows:

```
In [18]: for D in dexp:
....: _
```

The cursor will have moved to the right by about 4 spaces, the prompt has changed, and the cursor is typically just below the D.

Now enter at the location of the `_`:

```
print( 'distance = ', D )
```

Your console should now look like:

```
In [18]: for D in dexp:
....: print( 'distance = ', D )
....: _
```

The cursor is now just below the `p` of `print`. Now press `return` **twice** and the loop starts to run. Your output will look like (with the
first part of the loop):

```
In [18]: for D in dexp:
....: print( "distance = ", D )
....:
....:
distance = 3.5
distance = 4.6
distance = 8.0
distance = 11.4
distance = 9.5
distance = 14.0
distance = 16.3
distance = 17.1
distance = 19.5
distance = 20.3
distance = 21.2
```

What happened here:

- You created a `for` loop, where each element of `dexp` (one after another) gets assigned the name `D`.
- In the loop body (what comes below the `for ...` statement and is indented) the current value of `D` is printed together with the string `'distance = '`.
- **The loop ends where the indentation ends.** This is typical syntax in Python and is used for all other program blocks. In the beginning it can be a bit irritating (see indent_error for an example).
- Interactively, a block is closed with two returns.
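The role of indentation is easiest to see in a script, where no extra returns are needed. In this sketch (with a few made-up values in a list called `distances`) the two indented lines run once per element, while the unindented `print` runs only once, after the loop has finished.

```python
distances = [3.5, 4.6, 8.0]   # a few example values

total = 0.0
n_steps = 0
for D in distances:
    total = total + D        # indented: runs once per element
    n_steps = n_steps + 1    # indented: still inside the loop
print('sum =', total, 'after', n_steps, 'steps')   # not indented: runs once
```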

In order to print all values of `t`, `dexp` and `derr` in one for loop I
use `enumerate`. First I check what `enumerate` does:

```
In [19]: for i,D in enumerate(dexp):
....: print( 'i = ', i, 'D = ', D )
```

End the last line again with 2 returns. You should then see:

```
In [19]: for i,D in enumerate(dexp):
....: print( 'i = ', i, 'D = ', D )
....:
....:
i = 0 D = 3.5
i = 1 D = 4.6
i = 2 D = 8.0
i = 3 D = 11.4
i = 4 D = 9.5
i = 5 D = 14.0
i = 6 D = 16.3
i = 7 D = 17.1
i = 8 D = 19.5
i = 9 D = 20.3
i = 10 D = 21.2
```

In this variation `i` contains the index of `D` in `dexp`. Since the
corresponding values in `t`, `dexp` and `derr` all have the same index, I can
print them all in one loop as follows:

```
In [20]: for i,D in enumerate(dexp):
....: print( 'time = ', t[i], 'dist = ', D, ' error = ', derr[i] )
```

Again I close the loop with 2 returns. The output now is:

```
In [20]: for i,D in enumerate(dexp):
....: print( 'time = ', t[i], 'dist = ', D, ' error = ', derr[i] )
....:
....:
time = 0.0 dist = 3.5 error = 1.1
time = 1.0 dist = 4.6 error = 1.2
time = 2.0 dist = 8.0 error = 1.0
time = 3.0 dist = 11.4 error = 1.5
time = 4.0 dist = 9.5 error = 1.7
time = 5.0 dist = 14.0 error = 1.2
time = 6.0 dist = 16.3 error = 1.8
time = 7.0 dist = 17.1 error = 1.3
time = 8.0 dist = 19.5 error = 1.8
time = 9.0 dist = 20.3 error = 1.2
time = 10.0 dist = 21.2 error = 1.5
```
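As an alternative sketch (not the method used above), Python's built-in `zip` walks through several sequences in step, so no index is needed. The example also uses an f-string to line up the columns; the short lists are hand-typed copies of the first three data points.

```python
t_demo = [0.0, 1.0, 2.0]
dexp_demo = [3.5, 4.6, 8.0]
derr_demo = [1.1, 1.2, 1.0]

lines = []
for ti, di, ei in zip(t_demo, dexp_demo, derr_demo):
    # {value:5.1f} means: at least 5 characters wide, 1 digit after the point
    line = f'time = {ti:5.1f}  dist = {di:5.1f}  error = {ei:5.1f}'
    lines.append(line)
    print(line)
```

Because every number is printed with the same width, the columns stay aligned even when the values have different numbers of digits.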

Now you have learned how to get the data and how to loop over data.
There are many more loop possibilities in Python that you can find in
the documentation. For your needs in modern lab the `for` loop is
enough.

### Python cannot find my files!

This is a problem that many people encounter in the beginning. When you issue the command:

```
In [2]: mf = B.get_file('my_exp_1.data') # B is the LT.box
```

Python looks for the file in the `current working directory`. Where
is this? There are three commands that you can issue from within
ipython regarding the directory (or folder) that you are currently
working in:

```
In [1]: pwd # print working directory: displays where it is currently looking for files
Out[1]: '/Users/boeglinw'
In [2]: ls # list contents of the current directory
In [3]: cd Documents # change directory to the Documents which is part of boeglinw
In [4]: pwd
Out[4]: '/Users/boeglinw/Documents'
In [5]: cd .. # change directory back up to boeglinw
Out[5]: '/Users/boeglinw'
```

This works for all operating systems. Alternatively, use the file tab in spyder to set your working directory, or right-click on the tab in the editor window containing your file name and select 'Set console working directory'. If you need more help let me know.
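The same checks can also be made from inside a Python script with the standard `os` module. This is only a sketch: the file name is the one used in this section, and whether it is found depends on where you saved it.

```python
import os

cwd = os.getcwd()            # same information as pwd
entries = os.listdir(cwd)    # same information as ls

file_name = 'my_exp_1.data'
found = os.path.isfile(os.path.join(cwd, file_name))

print('working directory:', cwd)
print('number of entries:', len(entries))
print(file_name, 'found here:', found)
```

If `found` is `False`, either change the working directory or pass the full path of the file when opening it.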