Finite Element Fortran Program For Sale

The thermal analysis was performed using a finite difference program. The resulting temperatures were input to the finite element model through a special-purpose FORTRAN program that was written for this project to interpolate the thermal results and calculate finite element nodal temperatures.

First of all, let's start with the main question: what is a subroutine? It is user-written code that runs alongside the Finite Element (FE) model and allows users to request features which are not available by default in the commercial software Abaqus.

FORTRAN SUBROUTINES FOR FINITE ELEMENT ANALYSIS, by B. Merrifield. Summary: twelve subroutines, written in ICL 1900 Fortran, are presented for matrix and other operations which are commonly encountered in the finite element analysis of structures. Although the subroutines have been developed…

FETK Overview
The Finite Element ToolKit (FETK) is a collaboratively developed, evolving collection of adaptive finite element method (AFEM) software libraries and tools for solving coupled systems of nonlinear geometric partial differential equations (PDE). The FETK libraries and tools are written in an object-oriented form of ANSI-C and in C++, and include a common portability layer (MALOC) for all of FETK, a collection of standard numerical libraries (PUNC), a stand-alone high-quality surface and volume simplex mesh generator (GAMer), a stand-alone networked polygon display tool (SG), a general nonlinear finite element modeling kernel (MC), and a MATLAB toolkit (MCLite) for prototyping finite element methods and examining simplex meshes using MATLAB. The entire FETK Suite of tools is highly portable (from iPhone to Blue Gene/L), thanks to the use of a small abstraction layer (MALOC) and heavy use of the GNU Autoconf infrastructure.
The FETK libraries and tools are developed collaboratively by a number of people, and are released freely under open source licenses for maximal benefit to the mathematics, science, and engineering communities. The FETK project lead is Professor Michael Holst at the Center for Computational Mathematics at UC San Diego.
In June 2010 the source code tree for the entire FETK Project was released under the GNU LGPL (GNU Library General Public License), and can be downloaded as a single gzipped tar file from the FETK Download Page. Alternatively, you can follow the links below to information about the individual libraries, and download the individual libraries. To build the FETK libraries (all at once or individually), you will need a standard but very minimal set of UNIX tools such as a C compiler, make, and the Bourne shell for running configure scripts. Some of the libraries also require a C++ compiler, a FORTRAN compiler, and a Python interpreter.
The primary FETK ANSI-C software libraries are:
  • MALOC is a Minimal Abstraction Layer for Object-oriented C/C++ programs.
  • PUNC is Portable Understructure for Numerical Computing (requires MALOC).
  • GAMer is a Geometry-preserving Adaptive MeshER (requires MALOC).
  • SG is a Socket Graphics tool for displaying polygons (requires MALOC).
  • MC is a 2D/3D AFEM code for nonlinear geometric PDE (requires MALOC; optionally uses PUNC+GAMER+SG).
Application-specific software designed for use with the FETK software libraries is:
  • GPDE is a Geometric Partial Differential Equation solver (requires MALOC+PUNC+MC; optionally uses GAMER+SG).
  • APBS is an Adaptive Poisson-Boltzmann Equation Solver (requires MALOC+PUNC+MC; optionally uses GAMER+SG).
  • SMOL is a Smoluchowski Equation Solver (requires MALOC+PUNC+MC; optionally uses GAMER+SG).
MATLAB toolkits designed for use with MC and SG or as standalone packages:
  • MCLite is a simple 2D MATLAB version of MC designed for teaching.
  • FETKLab is a sophisticated 2D MATLAB adaptive PDE solver built on top of MCLite.
Related packages developed and maintained by FETK developers (included in PUNC above):
  • PMG is a Parallel Algebraic MultiGrid code for general semilinear elliptic equations.
  • CgCode is a package of Conjugate gradient Codes for large sparse linear systems.
Follow the individual links above for more information about a particular tool, including contact information for the primary developers and maintainers of the particular tool. The development mailing list archives can be found here.
CUDA Fortran

In the last CUDA Fortran post we investigated how shared memory can be used to optimize a matrix transpose, achieving roughly an order of magnitude improvement in effective bandwidth by using shared memory to coalesce global memory access. The topic of today’s post is to show how to use shared memory to enhance data reuse in a finite difference code. In addition to shared memory, we will also discuss constant memory, which is a read-only memory that is cached on chip and is optimized for uniform access across threads in a block (or warp).

Our example uses a three-dimensional grid of size 64³. For simplicity we assume periodic boundary conditions and only consider first-order derivatives, although extending the code to calculate higher-order derivatives with other types of boundary conditions is straightforward.

The finite difference method essentially uses a weighted summation of function values at neighboring points to approximate the derivative at a particular point. For a (2N+1)-point stencil with uniform spacing ∆x in the x-direction, the following equation gives a central finite difference scheme for the derivative in x. Equations for the other directions are similar.
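One conventional way to write such a scheme, reconstructed here in the notation of this post with the stencil coefficients C_i absorbing the 1/∆x factor (the convention used for the coefficients given below), is

$$\left.\frac{\partial f}{\partial x}\right|_{(x,y,z)} \approx \sum_{i=1}^{N} C_i \left[ f(x + i\,\Delta x,\, y,\, z) - f(x - i\,\Delta x,\, y,\, z) \right],$$

where the center point carries zero weight for a first derivative, so the 2N off-center points plus the center make up the (2N+1)-point stencil.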

The coefficients Ci are typically generated from Taylor series expansions and can be chosen to obtain a scheme with desired characteristics such as accuracy, and in the context of partial differential equations, dispersion and dissipation. For explicit finite difference schemes such as the type above, larger stencils typically have a higher order of accuracy. For this post we use a nine-point stencil which has eighth-order accuracy. We also choose a symmetric stencil, shown in the following equation.
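In the grid-index notation introduced in the next paragraph, the symmetric nine-point stencil can be written (a reconstruction using the coefficient names defined just below) as

$$\left.\frac{\partial f}{\partial x}\right|_{i,j,k} \approx a_x\,(f_{i+1,j,k} - f_{i-1,j,k}) + b_x\,(f_{i+2,j,k} - f_{i-2,j,k}) + c_x\,(f_{i+3,j,k} - f_{i-3,j,k}) + d_x\,(f_{i+4,j,k} - f_{i-4,j,k}).$$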

Here we specify values of the function on the computational grid using the grid indices i, j, k rather than the physical coordinates x, y, z. Here the coefficients are ax = 4/(5 ∆x), bx = −1/(5 ∆x), cx = 4/(105 ∆x), and dx = −1/(280 ∆x), which is a typical eighth-order scheme. For derivatives in the y- and z-directions, the index offsets in the above equation are simply applied to the j and k indices and the coefficients are the same except that ∆y and ∆z are used in place of ∆x.

Because we calculate an approximation to the derivative at each point on the 64³ periodic grid, the value of f at each point is used eight times: once for each right-hand-side term in the above expression. In designing a derivative kernel, our goal is to exploit this data reuse by storing blocks of data in shared memory to ensure we fetch the values of f from global memory as few times as possible.

We employ a tiling approach in which each thread block loads a tile of data from the multidimensional grid into shared memory, so that each thread in the block can access all elements of the shared memory tile as needed. How do we choose the best tile shape and size? Some experimentation is required, but characteristics of the finite difference stencil and grid size provide direction.

When choosing a tile shape for stencil calculations, the tiles typically overlap by half of the stencil size, as depicted on the left in the figure below.

[Figure: tile shapes for the shared-memory stencil. Left: a 16 × 16 tile (yellow) with two 4 × 16 halo sections (orange). Right: a 64 × 4 pencil tile.]

Here, in order to calculate the derivative in a 16 × 16 tile (in yellow), the values of f not only from this tile but also from two additional 4 × 16 sections (in orange) must be loaded by each thread block. Overall, the f values in the orange sections get loaded twice, once by the thread block that calculates the derivative at that location, and once by the neighboring thread block. As a result, 8 × 16 values out of 16 × 16, or half of the values, get loaded from global memory twice. In addition, coalescing on devices with a compute capability of 2.0 and higher will be suboptimal with 16 × 16 tiles because perfect coalescing on such devices requires access to data within 32 contiguous elements in global memory per load.

A better choice of tile (and thread block) size for our problem and data size is a 64 × 4 tile, as depicted on the right in the figure above. This tile avoids overlap altogether when calculating the x-derivative for our one-dimensional stencil on a grid of 64³ since the tile contains all points in the direction of the derivative. A minimal tile would have just one pencil, or one-dimensional array of all points in a direction. This would correspond to thread blocks of only 64 threads, however, so from an occupancy standpoint it is beneficial to use multiple pencils per tile. In our finite difference code, available for download on Github, we parameterize the number of pencils to allow experimentation. In addition to loading each value of f only once, every warp of threads loads contiguous data from global memory using this tile and therefore achieves perfectly coalesced accesses to global memory.

The implementation for the x-derivative kernel is:
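A minimal CUDA Fortran sketch of such a kernel, consistent with the description that follows, is shown below. The module name derivative_m, the kernel name derivative_x, and the exact halo-fill logic are illustrative choices rather than the code in the Github repository; the array dimensions, pencil count, shared-memory padding, and coefficient names follow the discussion in this post.

module derivative_m
  ! Sketch of a module holding the grid parameters, the constant-memory
  ! stencil coefficients, and the x-derivative kernel described in this post.
  implicit none
  integer, parameter :: mx = 64, my = 64, mz = 64   ! grid dimensions
  integer, parameter :: sPencils = 4                ! pencils per shared-memory tile
  real, constant :: ax_c, bx_c, cx_c, dx_c          ! stencil coefficients (set by the host)
contains
  attributes(global) subroutine derivative_x(f, df)
    implicit none
    real, intent(in)  :: f(mx, my, mz)
    real, intent(out) :: df(mx, my, mz)
    real, shared :: f_s(-3:mx+4, sPencils)   ! 4-element padding at each end of the first index
    integer :: i, j, k, j_l

    i   = threadIdx%x                                   ! x index, also the tile index
    j   = (blockIdx%x - 1) * blockDim%y + threadIdx%y   ! global y index
    j_l = threadIdx%y                                   ! local y index within the tile
    k   = blockIdx%y                                    ! z index

    ! each thread loads one value of f into the interior of the shared-memory tile
    f_s(i, j_l) = f(i, j, k)

    call syncthreads()

    ! threads with i <= 4 fill the periodic images in the padding,
    ! reading values that other threads have already placed in shared memory
    if (i <= 4) then
       f_s(i - 4,  j_l) = f_s(mx + i - 4, j_l)   ! left padding from the right end of the pencil
       f_s(mx + i, j_l) = f_s(i,          j_l)   ! right padding from the left end of the pencil
    end if

    call syncthreads()

    ! eighth-order central difference using the constant-memory coefficients
    df(i, j, k) = ax_c * (f_s(i+1, j_l) - f_s(i-1, j_l)) &
                + bx_c * (f_s(i+2, j_l) - f_s(i-2, j_l)) &
                + cx_c * (f_s(i+3, j_l) - f_s(i-3, j_l)) &
                + dx_c * (f_s(i+4, j_l) - f_s(i-4, j_l))
  end subroutine derivative_x
end module derivative_m

A launch consistent with this layout is call derivative_x<<<dim3(my/sPencils, mz, 1), dim3(mx, sPencils, 1)>>>(f_d, df_d), where f_d and df_d are device arrays (illustrative names); each thread block then owns sPencils pencils of a single z-plane.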

Here mx, my, and mz are the array dimensions, which are set to 64, and sPencils is 4, the number of pencils used to make the shared memory tile. The indices i, j, and k correspond to the coordinates in the 64³ mesh. The index i can also be used for the x-coordinate in the shared memory tile, while the index j_l is the local coordinate in the y-direction for the shared memory tile. This kernel is launched with a block of 64 × sPencils threads which calculates the derivatives on an x × y tile of 64 × sPencils, so each thread calculates the derivative at one point.

The shared memory tile is declared with a padding of 4 elements at each end of the first index to accommodate the periodic images needed to calculate the derivative at the endpoints of the x-direction.

Data from global memory are read into the shared memory tile for f_s(1:mx,1:sPencils) by the line:
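In the sketch above, this is the statement

f_s(i, j_l) = f(i, j, k)   ! threads in a warp read 32 consecutive values of f for fixed j and k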

These reads from global memory are perfectly coalesced. Data are copied within shared memory to fill out the periodic images in the x-direction by the following code.
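In the sketch above, the periodic images are filled by the following lines, shown here with the surrounding synchronization:

call syncthreads()

! threads with i <= 4 copy the periodic images into the padding cells,
! reading values that other threads have already written to shared memory
if (i <= 4) then
   f_s(i - 4,  j_l) = f_s(mx + i - 4, j_l)   ! left padding from the right end of the pencil
   f_s(mx + i, j_l) = f_s(i,          j_l)   ! right padding from the left end of the pencil
end if

call syncthreads()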

Note that in this operation we are reading from shared memory, not from global memory. Each element is read from global memory only once by the previous statement. Since a thread reads data from shared memory that another thread wrote, we need a syncthreads() call before the periodic images are written. Likewise, we need a syncthreads() call after the periodic images are written since they are accessed by different threads. After the second syncthreads() call, our finite difference approximation is calculated using the following code.
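In the sketch above, this is the final assignment:

! eighth-order central difference evaluated from the shared-memory tile
df(i, j, k) = ax_c * (f_s(i+1, j_l) - f_s(i-1, j_l)) &
            + bx_c * (f_s(i+2, j_l) - f_s(i-2, j_l)) &
            + cx_c * (f_s(i+3, j_l) - f_s(i-3, j_l)) &
            + dx_c * (f_s(i+4, j_l) - f_s(i-4, j_l))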

The coefficients ax_c, bx_c, cx_c, and dx_c are declared external to this kernel in constant memory.

Constant memory

Constant memory is a special-purpose memory space that is read-only from device code, but can be read and written by the host. Constant memory resides in device DRAM, and is cached on-chip. The constant memory cache has only one read port, but can broadcast data from this port across a warp. This means that constant memory access is effective when all threads in a warp read the same address, but when threads in a warp read different addresses the reads are serialized. Constant memory is perfect for coefficients and other data that are used uniformly across threads, as is the case with our coefficients ax_c, bx_c, etc.

In CUDA Fortran, constant data must be declared in the declaration section of a module, i.e. before the contains, and can be used in any code in the module or any host code that uses the module. In our finite difference code we have the following declaration, which uses the constant variable attribute.
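In the module sketch above, that declaration (placed before the contains) is

real, constant :: ax_c, bx_c, cx_c, dx_c   ! stencil coefficients kept in constant memory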

Constant data can be initialized by the host using an assignment statement, just as device memory is initialized. Constant memory is used in device code the same way global memory is used. In fact, the only difference between implementing constant and global memory is that constant memory must be declared at module scope, and is declared with the constant attribute rather than the device attribute. This makes experimenting with constant memory simple: just declare the variable in the module and change the variable attribute.
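A host-side initialization consistent with the coefficients given earlier, written in a host routine that uses the derivative_m module from the sketch above, might look like the following; the grid-spacing variable dx is an illustrative name, not one taken from the post.

! host code (sketch): assign the constant-memory coefficients directly,
! given a uniform grid spacing held in a host variable dx
ax_c =  4.0 / (  5.0 * dx)
bx_c = -1.0 / (  5.0 * dx)
cx_c =  4.0 / (105.0 * dx)
dx_c = -1.0 / (280.0 * dx)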

The x-derivative kernel on a Tesla C2050 achieves the following performance.

We will use this performance as a basis for comparison for derivatives in the other directions, which we will cover in the next CUDA Fortran post.
