13. Intention/Extending

13.1. Intention and Remarks

13.1.1. Genesis

This package was programmed because of my personal need to fit multiple datasets together which differ in attributes defined by the measurements. A very common thing that is not included in numpy/scipy or most other fit programs. What I wanted is a numpy ndarray with its matrix like functionality for evaluating my data, but including attributes related to the data e.g. from a measurement. For multiple measurements I need a list of these with variable length. ==> dataArray and dataList.

As the used models are repeatedly the same a module with physical models was growing. A lot of these models are used frequently in Small Angle Scattering programs like SASview or SASfit. For my purpose the dynamic models as diffusion, ZIMM, ROUSE and other things mainly for protein dynamics were missing.

Some programs (under open license) are difficult to extend as the models are hidden in classes, or the access/reusage includes a special designed interface to get parameters instead of simple function calls. Here Python functions are easier to use for the non-programmers as most PhD-students are. Models are just Python functions (or one line lambda functions) with the arguments accessed by their name (keyword arguments). Scripting in Python with numpy/scipy is easy to learn even without extended programming skills.

The main difficulty beside finding the right model for your problem is proper multidimensional fitting including errors. This is included in dataArray/dataList using scipy.optimize to allow fitting of the models in an simple and easy way. The user can concentrate on reading data / model fitting / presenting results.

13.1.2. Scripting over GUI

Documentation of the evaluation of scientific data is difficult in GUI based programs (sequence of clicking buttons ???). Script oriented evaluation (MATLAB, Python, Jupyter,….) allows easy repetition with stepwise improvement and at the same time document what was done.

Complex models have multiple contributions, background contribution, … which can easily be defined in a short script including a documentation. I cannot guess if the background in a measurement is const linear, parabolic or whatever and each choice is also a limitation. Therefore the intention is to supply not obvious and complex models (with a scientific reference) and allow the user to adopt them to their needs e.g. add background and amplitude or resolution convolution. Simple models are fast implemented in one line as lambda functions or more complex things in scripts. The mathematical basis as integration or linear algebra can be used from scipy/numpy.

13.1.3. Plotting

Matplotlib seems to be the standard for numpy/scipy users. You can use it if you want. If you try to plot fast and live (interactive) it is complicated and slow. 3D plotting has strong limitations as it is not really 3D rendering.

Frequently I run scripts that show results of different datasets and I want to keep these for comparison open and be able to modify the plot. Some of this is possible in matplotlib but not the default. As I want to think about physics and not plotting, I like more xmgrace, with a GUI interface after plotting. A simple one line command should result in a 90% finished plot, final 10% fine adjustment can be done in the GUI if needed or from additional commands. I adopted the original Graceplot module (python interface to XmGrace) to my needs and added dataArray functionality. For the errorPlot of a fit a simple matplotlib interface is included. Meanwhile, the module mpl is a rudimentary interface to matplotlib to make plotting easier for beginners.

The nice thing about Xmgrace is that it stores the plot as ASCII text instead of the JPG or PDF. So its easy to reopen the plot and change the plot later if your supervisor/boss/reviewer asks for log-log or other colors or whatever. For data inspection zoom, hide of data, simple fitting for trends and else are possible on WYSIWYG/GUI basis. If you want to retrieve the data (or forgot to save your results separately) they are accessible in the ASCII file. Export in scientific paper quality is possible. A simple interface for annotations, lines, …. is included. Unfortunately its only 2D but this is 99% of my work.

13.1.4. Speed/Libraries

The most common libraries for scientific computing in python are NumPy and SciPy and these are the main obligatory dependencies for Jscatter (later added matplotlib and Pillow for image reading). Python in combination with numpy can be quite fast if the ndarrays methods are used consequently instead of explicit for loops. E.g. the numpy.einsum function immediately uses compiled C to do the computation. (See this and look for “Why are NumPy arrays efficient”). SciPy offers all the math needed and optimized algorithms, also from blas/lapack. To speed up, if needed, on a multiprocessor machine the module Parallel execution offers an easy interface to the standard python module multiprocessing within a single command. If your model still needs long computing time and needs speed up the common methods as Cython, Numba or f2py (Fortran with OpenMp for parallel processing) should be used in your model. As these are more difficult the advanced user may use it in their models.

Nevertheless the critical point in these cases is the model and not the small overhead in dataArray/dataList or fitting.

As some models depend on f2py and Fortran code an example is provided how to use f2py and finally contribute a function in Jscatter. Extending/Contributing/Fortran

Some resources :

13.1.5. Development environment/Testing

The development platform is mainly current Linux (Manjaro/Rocky Linux) with testing on Gitlab (wheel creation and testing on Ubuntu, MacOS and Windows). I regularly use Jscatter on macOS and on our Linux cluster. The code is Python 3.x compatible. I rarely use Windows (only if a manufacturer of an instrument forces me…) Jscatter works now (2026) under Windows. As matplotlib is slow fits give no intermediate output.

13.2. Extending/Contributing/Fortran

There are different ways to extend Jscatter by new models.
  • Write a python function (see Beginners Guide).

  • Write a Fortran function (preferred >f90) with a wrapper that contains the documentation.

  • Use Cython or numba …. (for experts, not the scope here).

The simplest way is writing a python function as most functions are written like that. The function should contain the basic model without background or other trivial contributions needed for fitting of real data. These should be added in user defined functions as the user may be better know how he wants to treat the background.

Contribution of a single function or a complete module dealing with a new topic are welcome. As Users should know what it is about, the documentation is important and should contain a scientific reference about the models.

Contributors accept that all new modules or functions included will be covered by the GPLv3 or later.

If modules or code from other projects under OpenSource are used the respective License and Copyright should be mentioned and respected in the source code. Please give important contributions also in the documentation.

Using Fortran

Premature optimization is the root of all evil – DonaldKnuth

The speedup to use more sophisticated methods is not too big (not x100) if the numpy and scipy functions were used.

Here an example with a factor of about 6:

The main part of the cloudScattering function is evaluation of

\[F(q)= \sum_N b_i(q) e^{iqr}\]

For e.g. N=20000 . We calc \(S(Q)=F(q) \cdot F^*(q) = |F(q)|^2\) and pass \(F(q)\) for calculation of \(S(q) = < |F(q)|^2 >\) and \(beta =|< F(q) >|^2 / < |F(q)|^2 >\) for spherical averaging.

def _scattering(point,r,q,blength,formfactor):
     fa=blength*np.interp(q, formfactor[0, :], formfactor[1, :])
     qx=q*point
     iqr=np.einsum('i,ji',qx,r)*1j
     beiqrsum=np.einsum('i,i',fa,np.exp(iqr))
     Sq=beiqrsum*beiqrsum.conj()
     return q,Sq.real,beiqrsum.real

The main work is done in np.einsum and np.exp(iqr) which both are already compiled C functions (standard numpy function). We may loose time in coming back to python between these steps.

We can reach a speedup of a factor of 6 if we use a Fortran function and use this in _scattering. The speedup is higher if the computation was done in Python loops (very inefficient), but we use numpy not to do this in Python loops. The speedup depends strongly on the model and how it is implemented. Often one is happy if the Python meanders.

The Fortran function/subroutine should be encapsulated in a module as it looks cleaner and the module may contain several functions. Additional it works smoothest in combination with f2py to use a module of functions only (always give intent(in) properly for smooth working). To avoid conflicts and dependency problems it is convenient to copy all needed routines into this module. Please respect Copyrights

module cloud
    implicit none
    use typesandconstants
    use utils
contains

function ffq(point,r,q,blength,formfactor) result(ff)
    ! point point on unit sphere 3 x 1
    real(dp), intent(in) :: point(:)
    real(dp)             :: ff(3)
    .....see fortran/cloud.f95 for all declarations

    fa=blength*interpolatesorted(sizerms*q, formfactor(1,:), formfactor(2,:))
    qx=q*point
    iqr= j1 * matmul( r,qx)
    Fq= sum( fa* exp(iqr) )
    ff(1)=q
    ff(2)=real(Fq*conjg( Fq ) )
    ff(3)=real(Fq)

end function ffq
end module

A function interface and the compilation is done by f2py resulting in a shared object file here fsca.so:

f2py -c filename.f95 -m fsca

The module can be imported and used like any other function in a module:

import fsca
def scattering_f(point,r,q,blength,formfactor):
    """ A doc is needed if it should be used
    Parameters...
    Returns.....
    """
    ret=fsca.cloud.ffq(point,r,q,blength,formfactor)
    return ret

The wrapper is also needed if this function should be called by parallel (using multiprocessing). The reason is that the parallel processed function needs to be serialized and f2py functions cannot. The wrapper can.

If multiprocessing is desired and each process needs to get large arrays the python multiprocessing uses a lot of memory (without shared memory each process gets a pickled copy) In this case Fortran with OpenMP is easier for multiprocessing with shared memory.

Parallel execution with Fortran using OpenMP is used in cloudScattering. Different compiler options for OpenMP with other compilers may lead to failure. The speedup is not very big. In a similar example as above (with individual atomic formfactors) its about 1.5 speedup on 6 core Ryzen for a very complex function. In that specific case the reduced memory was worth it.

Including in Jscatter clone

For contributing or your own development you need to clone the Jscatter repository (See gitlab for documentation.)

git clone https://gitlab.com/biehl/jscatter.git

To use the Jscatter clone do in the main clone directory (dont forget the last point “.” )

pip install --user -e .

Meanwhile (2025) cmake is used to compile via f2py. To include a new Fortran module with functions place the source code in the source folder and add the file name to the fortran_src_files in CMakeLists.txt. The install procedure compiles and builds the wrapper using cmake, gfortran and f2py for all given files. A module function can be accessed as e.g. jscatter.libs.fscatter.utils.wofz for the Faddeeva function in utils.

The function is accessible in a Jscatter module after import of fscatter. E.g the import in formfactor/cloudscattering.py is

from ..libs import fscatter
# use it
data = fscatter.cloud.ffq(q,..... )

The module can be imported were needed and the python wrapper with documentation can be placed in the appropriate module where it is used.

If you are happy and want to contribute, sent it to the author or use a merge request on gitlab.