datafile

Reads a file containing columns of data and creates a dictionary according to header information using the following format:

  1. comments start with # in the first column : # My Comment

  2. header information starts with #!, where # is also in the first column. It must precede the data.

  3. each column is represented by name[dtype,col.nr.]/ where dtype (optional) is the data type and col.nr. (starting at 0) is the column number

  4. data are separated by white space NOT commas!

  5. data types:

    • s : string

    • f : float

    • i : integer

  6. blank lines are ignored

NOTE: if data formats are entered they should be specified for all columns

Example data:

#! p_miss[0]/ siglt[1]/ s01[2]/ alt[3]/
200. 1.35e-4 -1.e-3    0.1
220. 2.56e-4 -2.e-4    -0.1
230. 3.47e-6 -3.e-5    1.1

The header can also contain data type information:

#! p_miss[f,0]/ siglt[f,1]/ s01[f,2]/ alt[f,3]/

200. 1.35e-4 -1.e-3    0.1
220. 2.56e-4 -2.e-4    -0.1
230. 3.47e-6 -3.e-5    1.1
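The format rules above can be illustrated with a small standalone parser. This is a minimal sketch, not the implementation used by LT.datafile: the names parse_header and parse_datafile and the regular expression are hypothetical, and it assumes that a column without an explicit data type is read as a float.

```python
import re

# Converters for the dtype letters s, f and i.
CONVERTERS = {"s": str, "f": float, "i": int}

# Matches one header token such as "p_miss[0]/" or "p_miss[f,0]/".
TOKEN = re.compile(r"(\w+)\[(?:([sfi]),)?\s*(\d+)\]/")


def parse_header(line):
    """Return a list of (name, dtype, column) tuples from a '#!' header line."""
    tokens = TOKEN.findall(line[2:])  # drop the leading '#!'
    # Assumption: a column without an explicit dtype is read as float.
    return [(name, dtype or "f", int(col)) for name, dtype, col in tokens]


def parse_datafile(lines):
    """Return a list of records (dicts), one per data line."""
    columns = None
    records = []
    for line in lines:
        if line.startswith("#!"):
            columns = parse_header(line)
        elif line.startswith("#") or not line.strip():
            continue  # comment line or blank line: ignored
        else:
            if columns is None:
                raise ValueError("the header line (#!) must precede the data")
            fields = line.split()  # whitespace separated, not commas
            records.append(
                {name: CONVERTERS[dtype](fields[col]) for name, dtype, col in columns}
            )
    return records


records = parse_datafile([
    "# My Comment",
    "#! p_miss[f,0]/ siglt[f,1]/ s01[f,2]/ alt[f,3]/",
    "",
    "200. 1.35e-4 -1.e-3    0.1",
    "220. 2.56e-4 -2.e-4    -0.1",
])
print(records[1]["alt"])  # -0.1
```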

Example that opens a file and creates a dfile object:

>>> from LT.datafile import dfile
>>> f0 = dfile('sig_LT.dat')

Loop over content:

>>> for l in f0:
...     pm = l['p_miss']
...     sig_lt = l['siglt']*10000.
...     print(pm, sig_lt, l['alt'])

Variables can also be accessed directly as numpy arrays (if installed):

>>> pm = f0['p_miss']
>>> sig_lt = f0['siglt']*10000.

They can also be converted to attributes:

>>> f0.make_attr()
>>> print(f0.p_miss)
>>> print(f0.siglt)
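Judging from the example, make_attr appears to create one attribute per key, named after the key itself (so the attribute is siglt, not sig_lt). The sketch below mimics that behavior with setattr; the class Columns is hypothetical and is not part of LT.datafile. Note that only keys that are valid Python identifiers can be accessed with the dot syntax.

```python
class Columns:
    """Hypothetical container that exposes dictionary keys as attributes."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def make_attr(self):
        # One attribute per key; the attribute name is the key itself.
        for key, values in self._columns.items():
            setattr(self, key, values)


c = Columns({"p_miss": [200.0, 220.0], "siglt": [1.35e-4, 2.56e-4]})
c.make_attr()
print(c.p_miss)               # [200.0, 220.0]
print(hasattr(c, "sig_lt"))   # False: only key names become attributes
```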

Variable names are also called keys.

Save the data as a csv file. This is useful for exporting the data and formatting it for documents (other than LaTeX):

>>> f0.write_csv(filename)
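For comparison, writing a list of records as CSV with Python's standard csv module could look like the sketch below. It does not use LT.datafile; the record layout (a list of dicts) and the helper name records_to_csv are assumptions for illustration, and the column order and number formatting produced by write_csv may differ.

```python
import csv
import io


def records_to_csv(records, keys, stream):
    """Write the values of `keys` for each record as CSV rows to `stream`."""
    writer = csv.writer(stream)
    writer.writerow(keys)  # first row: the variable names (keys)
    for rec in records:
        writer.writerow([rec[k] for k in keys])


records = [
    {"p_miss": 200.0, "siglt": 1.35e-4},
    {"p_miss": 220.0, "siglt": 2.56e-4},
]
buf = io.StringIO()
records_to_csv(records, ["p_miss", "siglt"], buf)
print(buf.getvalue())
```

When writing to a file on disk instead of a StringIO, open it with newline='' as the csv module documentation recommends.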

class LT.datafile.dfile(filename, debug=False, new=False, skip=True, use_numpy=True, fast=False, adata=None)

Open a file, read and interpret the contents and return a dfile object:

>>> df = dfile('my_datafile')

keywords:

debug = False (default) : if True, print additional information

fast = False (default) : if True, load the data in a fast way (using np.loadtxt). This has some limitations for string entries and is best used for large, purely numerical data.

use_numpy = True (if numpy is installed) : set automatically but can be overridden. Some attributes of datafile are not available when numpy is not present.

adata = my_lines (default = None) : use the data provided in a list of strings (my_lines) as the data file content.

The content of my_lines must follow the datafile syntax.
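Because adata only has to follow the datafile syntax, such a list of strings can be generated from Python values. A minimal sketch with a hypothetical helper make_lines (not part of LT.datafile); it assumes string values contain no white space, since white space separates the columns.

```python
def make_lines(columns, dtypes):
    """Build datafile-style lines (header + data) from column lists.

    columns: dict mapping key -> list of values (all lists of equal length)
    dtypes:  dict mapping key -> 's', 'f' or 'i'
    """
    keys = list(columns)
    # Header line: name[dtype,col.nr.]/ for every key, columns numbered from 0.
    header = "#! " + " ".join(f"{k}[{dtypes[k]},{i}]/" for i, k in enumerate(keys))
    rows = zip(*(columns[k] for k in keys))
    return [header] + [" ".join(str(v) for v in row) for row in rows]


my_lines = make_lines({"x": [1.0, 2.0], "n": [3, 4]}, {"x": "f", "n": "i"})
for line in my_lines:
    print(line)
# #! x[f,0]/ n[i,1]/
# 1.0 3
# 2.0 4
```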

add_data(keys, line)

add data to the data file

>>> df.add_data('x:y:z', 'xval yval zval')

xval, yval, zval are the values as they would be entered in a data file line

both arguments are strings and must describe a complete line, i.e. values for all variables defined in the header line.
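The pairing of the two string arguments can be made concrete with the sketch below, which matches a key string such as 'x:y:z' with a line such as '1.0 2.0 3.0'. It is an illustration with a hypothetical function pair_keys_values, not the code of add_data, and it leaves out type conversion.

```python
def pair_keys_values(keys, line):
    """Split 'x:y:z' and 'xval yval zval' and pair them up."""
    names = keys.split(":")
    values = line.split()  # values are whitespace separated, as in a data file
    if len(names) != len(values):
        raise ValueError(f"{len(names)} keys but {len(values)} values")
    return dict(zip(names, values))


print(pair_keys_values("x:y:z", "1.0 2.0 3.0"))  # {'x': '1.0', 'y': '2.0', 'z': '3.0'}
```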

add_header_comment(text)

add a comment line to the header. The # at the beginning is added automatically.

  • for a normal comment start with a space

  • to add a parameter start the comment with a

add_key(key, format='f')

add a new key. This is useful if you want to add new data and assign the new values in a loop, like:

>>> df.add_key('newkey',format='i')

The new data need to be stored as follows:

>>> for d in df.data:
...     d['newkey'] = some_new_value

where df is the datafile and 'newkey' is the new key. The new data set can be saved using fp = open('new_file', 'w') and df.write_all(fp); close the file afterwards with fp.close().
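A plain-Python analogue of this pattern, assuming (as the loop above suggests) that df.data is a list of dict-like records. Here the records are ordinary dicts, and the new integer key is derived from an existing column purely for illustration.

```python
# Stand-in for df.data: one dict per data line (assumption for illustration).
data = [
    {"p_miss": 200.0, "alt": 0.1},
    {"p_miss": 220.0, "alt": -0.1},
    {"p_miss": 230.0, "alt": 1.1},
]

# Fill a new integer column 'newkey' in a loop:
# 1 if the row has a positive 'alt' value, otherwise 0.
for d in data:
    d["newkey"] = int(d["alt"] > 0)

print([d["newkey"] for d in data])  # [1, 0, 1]
```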

add_parameter(name, value)

add a parameter to the comment section

check_data(func, data, key, *args)

function used to check if data fulfill a condition provided by the user. The function is assumed to return True or False.

delete_key(key)

remove a key and all values associated with it

eval_data(eval_str)

iterator over all data evaluating an expression contained in eval_str

get_all_data()

return a list of all data stored

get_data(key, sel_func=None, sel_args=None)

return all data for key subject to the results of a selector function.

one can define a selector function (sel_func) using the arguments stored in sel_args. This function is evaluated for each data record and only those data are returned for which the condition is fulfilled.

  1. sel_func: a user provided function returning True or False

  2. sel_args: a list of arguments used in the sel_func function

Example:

assume the data contain a variable (key) called 'name' and you want to select only those data where name contains a certain substring 'Jo'.

Define this function:

>>> def myfind(data, key, what):
...     where = data[key]
...     return what in where

now you can select the data using:

>>> df.get_data('name', myfind, ['name', 'Jo'])

This should return a list of names containing the substring 'Jo'.
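The selector mechanism can be sketched in plain Python as follows. This assumes, consistent with the example above, that each record is passed to sel_func as its first argument followed by the entries of sel_args; select_values is a hypothetical name and not part of LT.datafile.

```python
def myfind(data, key, what):
    """Selector: True if the string stored under `key` contains `what`."""
    return what in data[key]


def select_values(records, key, sel_func=None, sel_args=()):
    """Return the values of `key` for the records that pass the selector."""
    if sel_func is None:
        return [rec[key] for rec in records]
    # Assumption: the record comes first, then the unpacked sel_args.
    return [rec[key] for rec in records if sel_func(rec, *sel_args)]


records = [{"name": "John"}, {"name": "Mary"}, {"name": "Joan"}]
print(select_values(records, "name", myfind, ["name", "Jo"]))  # ['John', 'Joan']
```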

get_data_eval(key, eval_str)

return all data for the key under the condition that the expression in eval_str is True.

get_data_list(keylist, sel_func=None, sel_args=None)

return all the data corresponding to the key list as follows:

>>> a = df.get_data_list('key1:key2:key3')

>>> a = df.get_data_list('key1:key2:key3', myfind, ['name','J'])

return those data where the name values contain the character J

a contains the list of data

get_data_list_eval(keylist, eval_str)

similar to select_data_eval, but only returns the values for the keys defined in keylist.

get_full_header()

return all lines up to the header line

get_header()

return the header lines

get_keys()

return a list of keys

name()

print the filename associated with this instance

save(file=None)

Save the current datafile.

With the keyword file = 'new_file', the datafile is written to a file named new_file.

scale(key, factor)

multiply all values of key by a factor

select_data(sel_func=None, sel_args=None)

returns an iterator for the data. As in get_data(), a selector function and its arguments can be supplied to apply conditions:

  1. sel_func: a user provided function returning True or False

  2. sel_args: a list of arguments used in the sel_func function

The iterator returned can be used as follows:

>>> for d in df.select_data():
...     print(d)

or the result can be converted to a list:

>>> list(df.select_data())

Using a selector function:

Example: assume the data contain a variable (key) called 'name' and you want to select only those data where name contains a certain substring 'sub'.

Define this function:

>>> def myfind(data, key, what):
...     where = data[key]
...     return what in where

now you can select the data using:

>>> list(df.select_data(myfind, ['name', 'sub']))

select_data_eval(eval_str)

returns an iterator for the data selected with an eval expression stored in the string eval_str. Within the expression, each data item is accessed using the name data:

>>> df.select_data_eval("data['x'] >= 0.")

returns only those data items where the value of the x-column in the file is larger than or equal to 0.
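A minimal illustration of eval-based selection in plain Python, assuming, as the text describes, that each record is bound to the name data while eval_str is evaluated. The generator select_eval is hypothetical. Because eval executes arbitrary code, the expression should only come from trusted input.

```python
def select_eval(records, eval_str):
    """Yield the records for which eval_str evaluates to True."""
    code = compile(eval_str, "<eval_str>", "eval")  # compile once, reuse per record
    for data in records:
        # The expression sees the current record under the name 'data'.
        if eval(code, {}, {"data": data}):
            yield data


records = [{"x": -1.0}, {"x": 0.0}, {"x": 2.5}]
print(list(select_eval(records, "data['x'] >= 0.")))  # [{'x': 0.0}, {'x': 2.5}]
```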

show_all_data()

print all data and keys stored

show_data(keylist)

print all the data corresponding to the key list:

>>> df.show_data('key1:key2:key3')

show_keys()

print a list of variable names in the dictionary

sort(key, **kwargs)

sort the data according to the values in key
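Sorting a list of records by one key can be sketched with the built-in sorted function. This is only an analogue: which keyword arguments sort accepts through **kwargs is not documented here, so the reverse flag below belongs to Python's sorted, not necessarily to dfile.sort.

```python
from operator import itemgetter

records = [
    {"p_miss": 230.0, "alt": 1.1},
    {"p_miss": 200.0, "alt": 0.1},
    {"p_miss": 220.0, "alt": -0.1},
]

# Ascending order of p_miss; sorted() returns a new list.
by_pmiss = sorted(records, key=itemgetter("p_miss"))
print([r["p_miss"] for r in by_pmiss])  # [200.0, 220.0, 230.0]

# Descending order of alt.
by_alt = sorted(records, key=itemgetter("alt"), reverse=True)
print([r["alt"] for r in by_alt])  # [1.1, 0.1, -0.1]
```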

update_header()

update header line of this data file including format. This makes sure that the header line is in sync with the dictionary keys

write_all(fp, complete_header=False)

write all data to the file associated with fp

if complete_header = True, include the complete header with all comments

write_complete_header(fp)

write the entire header, including comments and internal parameters, to the file with handle fp. Example:

>>> fp = open('myfile.data', 'w')
>>> df.write_complete_header(fp)

write_csv(f)

save the current file as a csv file

f : file name to be used

write_header(fp)

write only the header line of this data file, including the format, into file fp

write_line(fp, i)

write data line i into file fp

write_selected(fp, index_list, complete_header=False)

write a new datafile with an identical header but include only those data with an index given in index_list. If an index does not exist, a message is printed and that index is skipped.

if complete_header = True, include the complete header with all comments
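The selection logic of write_selected (write only the listed indices, report and skip missing ones) can be sketched with a list of already formatted data lines. write_selected_lines is a hypothetical helper, the header is reduced to a single string, and treating negative indices as non-existent is an assumption of this sketch.

```python
import io
import sys


def write_selected_lines(fp, header, lines, index_list):
    """Write header plus lines[i] for each i in index_list; skip bad indices."""
    fp.write(header + "\n")
    for i in index_list:
        if 0 <= i < len(lines):
            fp.write(lines[i] + "\n")
        else:
            # Missing index: report it and continue with the next one.
            print(f"index {i} does not exist, skipped", file=sys.stderr)


buf = io.StringIO()
write_selected_lines(buf, "#! x[f,0]/ y[f,1]/",
                     ["1.0 2.0", "3.0 4.0", "5.0 6.0"], [0, 2, 7])
print(buf.getvalue())
```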