Working with Data Files and Numpy Arrays
========================================

Open files and load data
++++++++++++++++++++++++

To load the file you just created (:ref:`my_exp_1.data `), do the following:

.. _load_data:

.. sourcecode:: ipython

    In [2]: mf = B.get_file('my_exp_1.data') # B is the LT.box

``my_exp_1.data`` is the name of the data file that you just created and that resides in your current working directory (if this did not work, look at :ref:`find my files `).
The line of code::

    B.get_file('my_exp_1.data')

creates a :class:`~LT.datafile.dfile` object, to which we assign the name ``mf``. Now you can start to 'play' with it. As an example, you can find the column names of your data by doing the following:

.. sourcecode:: ipython

    In [3]: mf.show_keys()

The ``()`` is important. To look at the values associated with each name you can do:

.. sourcecode:: ipython

    In [4]: mf.show_data('time')

    In [5]: mf.show_data('dist')

    In [6]: mf.show_data('d_err')

Or you can display them together:

.. sourcecode:: ipython

    In [7]: mf.show_data('time:dist:d_err')

.. _get_data:

This may be useful for displaying the values, but in order to work with the data you need to get them into `ipython`_ as arrays. This is achieved by doing:

.. sourcecode:: ipython

    In [8]: t = mf['time']

If you remember from earlier, this is similar to accessing an element of a dictionary, but in this case you get a so-called :func:`numpy.array`, which I then stored in the variable ``t`` (for time). `Numpy arrays `_ are one of the most important objects we will be using and are discussed in more detail below. You can display the array by typing its name and pressing return:

.. sourcecode:: ipython

    In [9]: t

You should see something like (the number in the brackets will be different):

.. sourcecode:: ipython

    Out[9]: array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])

To access a single element of an array you enter:

.. sourcecode:: ipython

    In [10]: t[1]

Try out other values for the index!
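If you want to experiment with bracket access and indexing without a data file, the following minimal sketch may help. It assumes only that ``numpy`` is installed; the dictionary ``columns`` is a hypothetical stand-in for the data file object and is not part of the LT package.

.. sourcecode:: python

    import numpy as np

    # Hypothetical stand-in for the data file: a plain dictionary of columns.
    columns = {
        "time": np.arange(11, dtype=float),   # 0., 1., ..., 10.
        "dist": np.array([3.5, 4.6, 8.0, 11.4, 9.5, 14.0,
                          16.3, 17.1, 19.5, 20.3, 21.2]),
    }

    t = columns["time"]   # same bracket syntax as mf['time']
    print(t)              # the whole array
    print(t[1])           # a single element, selected by its index
    print(t[-1])          # negative indices count from the end
    print(len(t))         # number of elements

Only the way the columns are created differs from the data file case; everything you do with ``t`` afterwards works the same way.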
The integer (whole number) in the brackets of ``t[1]`` is the index. In this case it runs from 0 to 10, since there are 11 elements in the array. You can find the length of an array by typing:

.. sourcecode:: ipython

    In [11]: len(t)

and you should get back 11.

Numpy arrays
++++++++++++

A numpy array is a one- or multi-dimensional array that most frequently holds numerical or logical data (but other data can also be stored). You can find a very nice introduction at `numpy for beginners `_ . Information on its dimensions and its size is stored in the ``shape`` attribute of the array. Try the following:

.. sourcecode:: ipython

    In [11]: t.shape

and you should get ``(11,)``. You can now manipulate this array. For instance, you can multiply it by a number. In this example you multiply all values by 1000 to convert them to milliseconds and assign the result to ``tms``:

.. sourcecode:: ipython

    In [12]: tms = t*1000

    In [13]: tms

The output now looks like this:

.. sourcecode:: ipython

    Out[13]:
    array([     0.,   1000.,   2000.,   3000.,   4000.,   5000.,   6000.,
             7000.,   8000.,   9000.,  10000.])

Almost any mathematical operation is possible; check the numpy documentation. Now get the distance data and the corresponding errors by doing:

.. sourcecode:: ipython

    In [14]: dexp = mf['dist']

    In [15]: derr = mf['d_err']

You can also make arrays that have the same size as your data but contain only 0's or 1's. In the example below we make an array exactly like ``derr`` that contains only ones and another that contains only zeros.

.. sourcecode:: ipython

    In [16]: err_one = np.ones_like(derr)

    In [17]: err_0 = np.zeros_like(derr)

Look at ``err_one`` and ``err_0`` and verify that they contain only 1's or 0's and have exactly the same number of elements as your array ``derr``. You can convert a Python list to a numpy array as follows:

.. sourcecode:: ipython

    In [12]: my_list = [1,2,3,4,5,6]

    In [13]: my_array = np.array(my_list)

You can also convert a numpy array back to a list by doing:

..
sourcecode:: ipython

    In [12]: my_list = list(my_array)

Another very useful tool is ``np.linspace(start, stop, num, endpoint)``. This function returns an array of ``num`` equally spaced values between ``start`` and ``stop``. By default ``endpoint = True``, meaning that the stop value is included in the array. If you set ``endpoint = False`` the stop value is not included. Below are a few examples.

.. sourcecode:: ipython

    In [21]: np.linspace(start = -2., stop = 5., num = 10, endpoint = True) # include the end point
    Out[21]:
    array([-2.        , -1.22222222, -0.44444444,  0.33333333,  1.11111111,
            1.88888889,  2.66666667,  3.44444444,  4.22222222,  5.        ])

    In [22]: np.linspace(-2., 5., 10) # shortcut version
    Out[22]:
    array([-2.        , -1.22222222, -0.44444444,  0.33333333,  1.11111111,
            1.88888889,  2.66666667,  3.44444444,  4.22222222,  5.        ])

    In [23]: np.linspace(start = -2., stop = 5., num = 10, endpoint = False) # without the end point
    Out[23]: array([-2. , -1.3, -0.6,  0.1,  0.8,  1.5,  2.2,  2.9,  3.6,  4.3])

As an example, try the following from your spyder console (assuming ``pyplot`` has been preloaded):

.. sourcecode:: ipython

    In [24]: x = np.linspace(0., 2.*np.pi, 1000) # create an array with 1000 elements

    In [25]: plot(sin(x), cos(x + np.pi/4.)) # create 2 Lissajous curves

    In [26]: plot(sin(x), cos(5.*(x + np.pi/4.)))

Note that in the previous example the terms :math:`\sin(x)` and :math:`\cos(x + \pi/4)` are calculated for the entire array of 1000 values.

2-dimensional arrays
++++++++++++++++++++

You can combine 1-dimensional arrays of the same length into a two-dimensional array by:

.. sourcecode:: ipython

    In [16]: time_distance = np.array([t, dexp])

    In [17]: time_distance.shape

The last command should now give you ``(2, 11)``. You can access each element by its indices; try:

..
sourcecode:: ipython

    In [16]: time_distance[1,3] # row 1 (the distances), column 3

Selecting data from arrays / Logical operations on arrays
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

As with regular Python lists, numpy arrays support a wide variety of slicing operations to select a subset of data from an array:

.. sourcecode:: ipython

    In [16]: t_sub = tms[2:8]

    In [17]: t_sub
    Out[17]: array([2000., 3000., 4000., 5000., 6000., 7000.])

This selects elements 2 through 7 of the array ``tms`` and stores them in the array ``t_sub``. You can also place the indices of the array elements that you would like to access in an array, as shown in the example below:

.. sourcecode:: ipython

    In [16]: i_s = np.array([2,3,5,7])

    In [17]: t_s = tms[i_s]

    In [18]: t_s
    Out[18]: array([2000., 3000., 5000., 7000.])

Remember also:

.. sourcecode:: ipython

    In [16]: tms[0] # is the first element of tms or any array in general
    Out[16]: 0.0

    In [18]: tms[-1] # is the last element
    Out[18]: 10000.0

    In [19]: tms[-2] # is the 2nd to last element etc.
    Out[19]: 9000.0

Numpy arrays can also be used in logical operations. This is especially useful when you would like to select a subset of the data for further operations. Try out the following:

.. sourcecode:: ipython

    In [15]: big = tms > 4000

    In [16]: small = tms < 7000.

    In [17]: tms[big]
    Out[17]: array([ 5000.,  6000.,  7000.,  8000.,  9000., 10000.])

    In [18]: tms[small]
    Out[18]: array([   0., 1000., 2000., 3000., 4000., 5000., 6000.])

The arrays ``big`` and ``small`` are used to select elements from the original array. The array ``tms[big]`` contains only those values of ``tms`` that are bigger than 4000, and ``tms[small]`` contains only the values that are smaller than 7000. The arrays ``big`` and ``small`` themselves contain the logical result (``True`` or ``False``) of the logical expression for each array element.

.. sourcecode:: ipython

    In [19]: small
    Out[19]:
    array([ True,  True,  True,  True,  True,  True,  True, False, False,
           False, False])

They can also be combined as

..
sourcecode:: ipython

    In [20]: both = big & small

    In [21]: tms[both]
    Out[21]: array([5000., 6000.])

Here ``&`` means ``and`` and ``|`` means ``or``. The array ``tms[both]`` therefore contains only those elements of ``tms`` that are between 4000 and 7000 (excluding the limits). This can also be written in one line as:

.. sourcecode:: ipython

    In [22]: tms[ (4000 < tms) & (tms < 7000)]
    Out[22]: array([5000., 6000.])

Parameters in data files
++++++++++++++++++++++++

You can also access the parameters that you defined in your file. First you can look at all the parameters that you defined by doing:

.. sourcecode:: ipython

    In [16]: mf.par.show_all_data()
    pressure 1.e5
    temperature 80.

In this case you see the two parameters called pressure and temperature, with values of 1.e5 and 80, respectively. To get these values and store them in variables you would do:

.. sourcecode:: ipython

    In [17]: T = mf.par['temperature']

    In [18]: P = mf.par['pressure']

If you get an error message saying, for example, that ``mf.par`` does not exist, you have an error in the parameter definition in your data file. For more detailed information look at the datafile documentation (:class:`~LT.pdatafile.pdfile`).

Computations using arrays
++++++++++++++++++++++++++

Now all your data are in the form of variables and (numpy) arrays that can be used for computation. For instance, you might want to know the percentage error of each data point. This can be done as follows:

.. sourcecode:: ipython

    In [16]: p_err = derr/dexp * 100.

    In [17]: p_err

And the output should be:

.. sourcecode:: ipython

    Out[17]:
    array([ 31.42857143,  26.08695652,  12.5       ,  13.15789474,
            17.89473684,   8.57142857,  11.04294479,   7.60233918,
             9.23076923,   5.91133005,   7.0754717 ])

To get somewhat nicer output you can use a ``for`` loop. First, some information on loops. The simple ``for`` loop works as follows:

..
sourcecode:: ipython

    In [18]: for D in dexp:
       ....:     _

The cursor will have moved to the right by about 4 spaces and the prompt has changed; the cursor is typically just below the ``D``. Now enter the following at the location of the ``_``:

.. sourcecode:: ipython

    print( 'distance = ', D )

The screen should now look like:

.. sourcecode:: ipython

    In [18]: for D in dexp:
       ....:     print( 'distance = ', D )
       ....:     _

The cursor is now just below the ``p`` of ``print``. Now press ``return`` **twice** and the loop starts to run. Your output will look like (with the first part of the loop):

.. sourcecode:: ipython

    In [18]: for D in dexp:
       ....:     print( 'distance = ', D )
       ....:
       ....:
    distance =  3.5
    distance =  4.6
    distance =  8.0
    distance =  11.4
    distance =  9.5
    distance =  14.0
    distance =  16.3
    distance =  17.1
    distance =  19.5
    distance =  20.3
    distance =  21.2

What happened here: you created a ``for`` loop in which each element of ``dexp``, one after another, is assigned the name ``D``. In the loop body (what comes below the ``for...`` statement and is indented) the current value of ``D`` is printed together with the string ``'distance = '``.

**The loop ends where the indentation ends.** This is typical syntax in Python and is used for all other program blocks. In the beginning it can be a bit irritating when you encounter it (see :ref:`indent_error ` for an example). Interactively, a block is closed with two returns.

In order to print all values of ``t``, ``dexp`` and ``derr`` in one ``for`` loop I use ``enumerate``. First I check what ``enumerate`` does:

.. sourcecode:: ipython

    In [19]: for i,D in enumerate(dexp):
       ....:     print( 'i = ', i, 'D = ', D )

and end the last line again with 2 returns. You should then see:

..
sourcecode:: ipython

    In [19]: for i,D in enumerate(dexp):
       ....:     print( 'i = ', i, 'D = ', D )
       ....:
       ....:
    i =  0 D =  3.5
    i =  1 D =  4.6
    i =  2 D =  8.0
    i =  3 D =  11.4
    i =  4 D =  9.5
    i =  5 D =  14.0
    i =  6 D =  16.3
    i =  7 D =  17.1
    i =  8 D =  19.5
    i =  9 D =  20.3
    i =  10 D =  21.2

In this variation ``i`` contains the index of ``D`` in ``dexp``. Since the corresponding values in ``t``, ``dexp`` and ``derr`` all have the same index, I can print them all in one loop as follows:

.. sourcecode:: ipython

    In [20]: for i,D in enumerate(dexp):
       ....:     print( 'time = ', t[i], 'dist = ', D, ' error = ', derr[i] )

Again I close the loop with 2 returns. The output is now:

.. sourcecode:: ipython

    In [20]: for i,D in enumerate(dexp):
       ....:     print( 'time = ', t[i], 'dist = ', D, ' error = ', derr[i] )
       ....:
       ....:
    time =  0.0 dist =  3.5  error =  1.1
    time =  1.0 dist =  4.6  error =  1.2
    time =  2.0 dist =  8.0  error =  1.0
    time =  3.0 dist =  11.4  error =  1.5
    time =  4.0 dist =  9.5  error =  1.7
    time =  5.0 dist =  14.0  error =  1.2
    time =  6.0 dist =  16.3  error =  1.8
    time =  7.0 dist =  17.1  error =  1.3
    time =  8.0 dist =  19.5  error =  1.8
    time =  9.0 dist =  20.3  error =  1.2
    time =  10.0 dist =  21.2  error =  1.5

Now you have learned how to get the data and how to loop over them. There are many more loop possibilities in Python that you can find in the documentation. For your needs in modern lab the ``for`` loop is enough.

.. _files:

Python cannot find my files!
-----------------------------

This is a problem that many people encounter in the beginning. When you issue the command:

.. sourcecode:: ipython

    In [2]: mf = B.get_file('my_exp_1.data') # B is the LT.box

Python looks for the file in the ``current working directory``. Where is this? There are three commands that you can issue from within `ipython`_ regarding the directory (or folder) that you are currently working in:

..
sourcecode:: ipython

    In [1]: pwd # print working directory: displays where it is currently looking for files
    Out[1]: '/Users/boeglinw'

    In [2]: ls # list contents of the current directory

    In [3]: cd Documents # change directory to Documents, which is inside boeglinw

    In [4]: pwd
    Out[4]: '/Users/boeglinw/Documents'

    In [5]: cd .. # change directory back up to boeglinw
    Out[5]: '/Users/boeglinw'

This works for all operating systems. Alternatively, use the file tab in spyder to set your working directory, or right-click on the tab in the editor window containing your file name and select 'Set console working directory'. If you need more help let me know.

.. include:: include/links.rst
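The same check can be made from plain Python code, for example inside a script. The sketch below uses only the standard library ``os`` module (it is not part of the LT package) and assumes the file name ``my_exp_1.data`` from the example above.

.. sourcecode:: python

    import os

    file_name = "my_exp_1.data"   # the data file from the example above

    cwd = os.getcwd()             # the directory Python is currently working in
    print("working directory:", cwd)

    found = os.path.isfile(file_name)
    if found:
        print(file_name, "is in the working directory")
    else:
        # Listing the directory often reveals a typo in the file name.
        print(file_name, "not found; files here:", sorted(os.listdir(cwd)))

If the file is not found, change the working directory as described above, or pass the full path of the file instead of just its name.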