Microsoft PowerPoint - 3Dp-2.ppt [互換モード]

Size: px
Start display at page:

Download "Microsoft PowerPoint - 3Dp-2.ppt [互換モード]"

Transcription

1 3D Parallel FEM (II) Kengo Nakajima Technical & Scientific Computing I (48-7) Seminar on Computer Science I (48-4) Parallel FEM

2 Parallel FEM 3D- Target Application Elastic Material Z U Z = Young s Modulus E Poisson s Ratio Rectangular Prism xx cubes (hexahedra) NZ NX, NY, NZ cubes in each direction Boundary Conditions NX NY Y Symmetric B.C. U X =@X= U Y =@Y= X U Z =@Z= Dirichlet B.C. U Z =@Z=Z max

3 Parallel FEM 3D- 3 Finite-Element Procedures Governing Equations Galerkin Method: Weak Form Element-by-Element Integration Element Matrix Global Matrix Boundary Conditions Linear Solver

4 Parallel FEM 3D- 4 FEM Procedures: Program Initialiation Control Data Node, Connectivity of Elements (N: Node#, NE: Elem#) Initialiation of Arrays (Global/Element Matrices) Element-Global Matrix Mapping (Index, Item) Generation of Matrix Element-by-Element Operations (do icel=, NE) Element matrices Accumulation to global matrix Boundary Conditions Linear Solver Conjugate Gradient Method Calculation of Stress

5 Parallel FEM 3D- 5 Procedures for Parallel FEM <$O-para>/mesh/ mgcube <$O-para>/mesh/ partition.log <$O-para>/mesh/ cube. <$O-para>/mesh/ part <$O-para>/run/ test.inp ParaVIEW File (fixed file name) Initial Global Mesh File (fixed file name) <$O-para>/mesh/ part.inp ParaVIEW File (fixed file name) <$O-para>/mesh/ <HEADER>.* Distributed Local Mesh Files <$O-para>/run/ sol <$O-para>/run/ INPUT.DAT

6 Parallel FEM 3D- 6 Parallel FEM Procedures: Program Initialiation Control Data Node, Connectivity of Elements (N: Node#, NE: Elem#) Initialiation of Arrays (Global/Element Matrices) Element-Global Matrix Mapping (Index, Item) Generation of Matrix Element-by-Element Operations (do icel=, NE) Element matrices Accumulation to global matrix Boundary Conditions Linear Solver Conjugate Gradient Method Calculation of Stress

7 Parallel FEM 3D- test_p main input_cntl Input of control data input_grid Input of mesh info find_node searching nodes mat_con connectivity of matrix msort sorting mat_con connectivity of matrix mat_ass_main coefficient matrix jacobi Jacobian mat_ass_bc boundary conditions Structure of fem3d (parallel) solve33 control of linear solver recover_stress stress calculation output_ucd visualiation cg_3 CG solver jacobi Jacobian

8 Parallel FEM 3D- 8 Main Part #include <stdio.h> #include <stdlib.h> FILE* fp_log; #define GLOBAL_VALUE_DEFINE #include "pfem_util.h" //#include "solver33.h" extern void PFEM_INIT(int,char**); extern void INPUT_CNTL(); extern void INPUT_GRID(); extern void MAT_CON(); extern void MAT_CON(); extern void MAT_ASS_MAIN(); extern void MAT_ASS_BC(); extern void SOLVE33(); extern void RECOVER_STRESS(); extern void OUTPUT_UCD(); extern void PFEM_FINALIZE(); int main(int argc,char* argv[]) { double START_TIME,END_TIME; PFEM_INIT(argc,argv); INPUT_CNTL(); INPUT_GRID(); MAT_CON(); MAT_CON(); MAT_ASS_MAIN(); MAT_ASS_BC() ; SOLVE33(); RECOVER_STRESS(); OUTPUT_UCD() ; PFEM_FINALIZE() ;

9 Parallel FEM 3D- 9 Global Variables: pfem_util.h (/4) Name Type Sie I/O Definition fname C [8] I Name of mesh file N,NP I I # Node (N: Internal, NP: Internal + External) ICELTOT I I # Element NODGRPtot I I # Node Group XYZ R [NP][3] I Node Coordinates ICELNOD I [ICELTOT][8] I Element Connectivity NODGRP_INDEX I [NODGRPtot+] I # Node in each Node Group NODGRP_ITEM NODGRP_NAME I C8 [NODGRP_INDEX[N ODGRPTOT+]] [NODGRP_INDEX[N ODGRPTOT+]] NL, NU I O I I Node ID in each Node Group Name of NodeGroup # Upper/Lower Triangular Non-Zero Off-Diagonals at each node NPL, NPU I O # Upper/Lower Triangular Non-Zero Off-Diagonals D R [9*NP] O Diagonal Block of Global Matrix B,X R [3*NP] O RHS, Unknown Vector

10 Parallel FEM 3D- Global Variables: pfem_util.h (/4) Name Type Sie I/O Definition ALUG R [9*NP] O Full LU factoriation of Diagonal Blocks D AL,AU R [9*NPL],[9*NPU] O Upper/Lower Triangular Components of Global Matrix indexl, indexu I [NP+] O # Non-Zero Off-Diagonal Blocks iteml, itemu I [NPL], [NPU] O Column ID of Non-Zero Off-Diagonal Blocks INL, INU I [NP] O Number of Off-Diagonal Blocks at Each Node IAL,IAU I [NP][NL],[NP][NU] O Off-Diagonal Blocks at Each Node IWKX I [NP][] O Work Arrays METHOD I I Iterative Method (fixed as ) PRECOND I I Preconditioning Method (: SSOR, : Block Diagonal Scaling) ITER, ITERactual I I Number of CG Iterations (MAX, Actual) RESID R I Convergence Criteria (fixed as.e-8) SIGMA_DIAG R I Coefficient for LU Factoriation (fixed as.) pfemiarray I [] O Integer Parameter Array pfemrarray R [] O Real Parameter Array

11 Parallel FEM 3D- Global Variables: pfem_util.h (3/4) Name Typ e Sie I/O Definition O8th R I =.5 i i i PNQ, PNE, PNT R [][][8] O,, i ~ 8at each Gaussian Quad. Point POS, WEI R [] O NCOL, NCOL I [] O Work arrays for sorting Coordinates, Weighting Factor at each Gaussian Quad. Point SHAPE R [][][][8] O N i (i=~8) at each Gaussian Quad Point PNX, PNY, PNZ R [][][][8] O i i i,, i ~ 8 at each Gaussian Quad. Point DETJ R [][][] O ELAST, POISSON SIGMA_N, TAU_N Determinant of Jacobian Matrix at each Gaussian Quad. Point R I Young s Modulus, Poisson s Ratio R [N][3] O Normal/Shear Stress at each Node N N x N N y N N

12 Parallel FEM 3D- Global Variables: pfem_util.h (4/4) Name Type Sie I/O Definition PETOT I O Number of PE s my_rank I O Process ID of MPI errno I O Error Flag NEIBPETOT I I Number of Neighbors NEIBPE I [NEIBPETOT] I ID of Neighbor IMPORT_INDEX EXPORT_INEDX I [NEIBPETOT+] I Sie of Import/Export Arrays for Communication Table IMPORT_ITEM I [NPimport] I EXPORT_ITEM I [NPexport] I Receiving Table (External Points) NPimport=IMPORT_INDEX[NEIBPETOT]) Sending Table (Boundary Points) NPexport=EXPORT_INDEX[NEIBPETOT]) ICELTOT_INT I I Number of Local Elements intelem_list I [ICELTOT_INT] I List of Local Elements iterpremax I I Number of ASDD Iterations

13 Parallel FEM 3D- 3 Start/End: MPI_Init/Finalie #include "pfem_util.h" void PFEM_INIT(int argc, char* argv[]) { int i; MPI_Init(&argc,&argv); MPI_Comm_sie(MPI_COMM_WORLD,&PETOT); MPI_Comm_rank(MPI_COMM_WORLD,&my_rank); for(i=;i<;i++) pfemrarray[i]=.; for(i=;i<;i++) pfemiarray[i]=; #include <stdio.h> #include <stdlib.h> #include "pfem_util.h" void PFEM_FINALIZE() { MPI_Finalie (); if( my_rank == ){ fprintf(stdout,"* normal terminatio n"); exit();

14 Parallel FEM 3D- 4 Reading Control File: INPUT_CNTL #include <stdio.h> #include <stdlib.h> #include "pfem_util.h" /** **/ void INPUT_CNTL() { FILE *fp; if( my_rank == ){ if( (fp=fopen("input.dat","r")) == NULL){ fprintf(stdout,"input file cannot be opened! n"); exit(); fscanf(fp,"%s",header); fscanf(fp,"%d %d",&method,&precond); fscanf(fp,"%d",&iterpremax); fscanf(fp,"%d",&iter); fclose(fp); MPI_Bcast(HEADER,8,MPI_CHAR,,MPI_COMM_WORLD); MPI_Bcast(&METHOD,,MPI_INTEGER,,MPI_COMM_WORLD); MPI_Bcast(&PRECOND,,MPI_INTEGER,,MPI_COMM_WORLD); MPI_Bcast(&iterPREmax,,MPI_INTEGER,,MPI_COMM_WORLD); MPI_Bcast(&ITER,,MPI_INTEGER,,MPI_COMM_WORLD); SIGMA_DIAG=.; SIGMA =.; RESID =.e-8; NSET = ; pfemrarray[]= RESID; pfemrarray[]= SIGMA_DIAG; pfemrarray[]= SIGMA; pfemiarray[]= ITER; pfemiarray[]= METHOD; pfemiarray[]= PRECOND; pfemiarray[3]= NSET; pfemiarray[4]= iterpremax;

15 Parallel FEM 3D- 5 Reading Meshes: INPUT_GRID (/4) #include <stdio.h> #include <stdlib.h> #include "pfem_util.h" #include "allocate.h" /*** external functions **/ extern void ERROR_EXIT (int, int); extern void DEFINE_FILE_NAME(char*,char*,int); /** **/ void INPUT_GRID() { FILE *fp; int i,j,k,ii,kk,kkk,nn,icel,is,ie,ic; int NTYPE,IMAT; int idummy; /** **/ DEFINE_FILE_NAME(HEADER, fname, my_rank); if( (fp=fopen(fname,"r")) == NULL){ fprintf(stdout,"input file cannot be opened! n"); exit(); NEIB-PE fscanf(fp,"%d",&kkk); fscanf(fp,"%d",&neibpetot); NEIBPE=(int*)allocate_vector(sieof(int),NEIBPETOT); for(i=;i<neibpetot;i++) fscanf(fp,"%d",&neibpe[i]); for(i=;i<neibpetot;i++){ if( NEIBPE[i] > PETOT- ){ ERROR_EXIT (,my_rank);

16 Parallel FEM 3D- 6 Name of Distributed Local Mesh File: DEFINE_FILE_NAME HEADER + Rank ID #include <stdio.h> #include <string.h> void DEFINE_FILE_NAME (char HEADERo[], char filename[],int my_rank) { char string[8]; sprintf(string,".%-d",my_rank); strcpy(filename,headero); strcat(filename,string);

17 Parallel FEM 3D- 7 allocate, deallocate #include <stdio.h> #include <stdlib.h> void* allocate_vector(int sie,int m) { void *a; if ( ( a=(void * )malloc( m * sie ) ) == NULL ) { fprintf(stdout,"error:memory does not enough! in vector n"); exit(); return a; void deallocate_vector(void *a) { free( a ); void** allocate_matrix(int sie,int m,int n) { void **aa; int i; if ( ( aa=(void ** )malloc( m * sieof(void*) ) ) == NULL ) { fprintf(stdout,"error:memory does not enough! aa in matrix n"); exit(); if ( ( aa[]=(void * )malloc( m * n * sie ) ) == NULL ) { fprintf(stdout,"error:memory does not enough! in matrix n"); exit(); for(i=;i<m;i++) aa[i]=(char*)aa[i-]+sie*n; return aa; void deallocate_matrix(void **aa) { free( aa ); Same interface with FORTRAN

18 Parallel FEM 3D- 8 aaa. Local Numbering: Neighbors/Nodes 6 aaa ID of PE NEIBPETOT: # neighbors NEIBPE(neib): ID of neighbors

19 Parallel FEM 3D- 9 Reading Meshes: INPUT_GRID (/4) /** **/ NODE fscanf(fp,"%d %d",&np,&n); XYZ =(KREAL**)allocate_matrix(sieof(KREAL),NP,3); NODE_ID=(KINT **)allocate_matrix(sieof(kint ),NP,); for(i=;i<np;i++){ for(j=;j<3;j++){ XYZ[i][j]=.; for(i=;i<np;i++){ /** **/ fscanf(fp,"%d %d %lf %lf %lf",&node_id[i][],&node_id[i][],&xyz[i][],&xyz[i][],&xyz[i][]); ELEMENT fscanf(fp,"%d %d",&iceltot,&iceltot_int); ICELNOD=(KINT**)allocate_matrix(sieof(KINT),ICELTOT,8); intelem_list=(kint*)allocate_vector(sieof(kint),iceltot); ELEM_ID=(KINT**)allocate_matrix(sieof(KINT),ICELTOT,); for(i=;i<iceltot;i++) fscanf(fp,"%d",&ntype); for(icel=;icel<iceltot;icel++){ fscanf(fp,"%d %d %d %d %d %d %d %d %d %d %d", &ELEM_ID[icel][],&ELEM_ID[icel][], &IMAT, &ICELNOD[icel][],&ICELNOD[icel][],&ICELNOD[icel][],&ICELNOD[icel][3], &ICELNOD[icel][4],&ICELNOD[icel][5],&ICELNOD[icel][6],&ICELNOD[icel][7]); for(ic=;ic<iceltot_int;ic++) fscanf(fp,"%d",&intelem_list[ic]);

20 Parallel FEM 3D- Local Numbering: Nodes Local node ID starts from in each PE Same program for -CPU can be used: SPMD Local element ID also starts from Numbering: Internal -> External Points Double Numbering Local node ID at its home PE ID of home PE

21 Parallel FEM 3D- aaa. Local Numbering: Nodes 6 aaa (# Nodes, Internal Pts.)

22 Parallel FEM 3D- aaa. Local Numbering: Nodes 6 aaa Home PE, Local ID Coordinates Home PE, Local ID Coordinates

23 Parallel FEM 3D- 3 aaa. Local Numbering: Nodes 6 aaa Home PE, Local ID Coordinates Home PE, Local ID Coordinates

24 Parallel FEM 3D- 4 Reading Meshes: INPUT_GRID (/4) /** **/ NODE fscanf(fp,"%d %d",&np,&n); XYZ =(KREAL**)allocate_matrix(sieof(KREAL),NP,3); NODE_ID=(KINT **)allocate_matrix(sieof(kint ),NP,); for(i=;i<np;i++){ for(j=;j<3;j++){ XYZ[i][j]=.; for(i=;i<np;i++){ /** **/ fscanf(fp,"%d %d %lf %lf %lf",&node_id[i][],&node_id[i][],&xyz[i][],&xyz[i][],&xyz[i][]); ELEMENT fscanf(fp,"%d %d",&iceltot,&iceltot_int); ICELNOD=(KINT**)allocate_matrix(sieof(KINT),ICELTOT,8); intelem_list=(kint*)allocate_vector(sieof(kint),iceltot); ELEM_ID=(KINT**)allocate_matrix(sieof(KINT),ICELTOT,); for(i=;i<iceltot;i++) fscanf(fp,"%d",&ntype); for(icel=;icel<iceltot;icel++){ fscanf(fp,"%d %d %d %d %d %d %d %d %d %d %d", &ELEM_ID[icel][],&ELEM_ID[icel][], &IMAT, &ICELNOD[icel][],&ICELNOD[icel][],&ICELNOD[icel][],&ICELNOD[icel][3], &ICELNOD[icel][4],&ICELNOD[icel][5],&ICELNOD[icel][6],&ICELNOD[icel][7]); for(ic=;ic<iceltot_int;ic++) fscanf(fp,"%d",&intelem_list[ic]);

25 Parallel FEM 3D- 5 aaa. Local Numbering: Elements 6 aaa

26 Parallel FEM 3D- 6 aaa. Local Numbering: Elements 6 aaa ( 全要素, 領域所属要素 ) Home PE of Element Defined by home of 8 nodes If all of 8 nodes are internal pts., home of the element is that of 8 nodes. If external nodes are included, the smallest number of ID of home of the nodes is selected. In this case, home PE s of elements in overlapped region are all. used in OUTPUT_UCD

27 Parallel FEM 3D- 7 aaa. Local Numbering: Elements 6 aaa (Element type for all elements)

28 Parallel FEM 3D- 8 aaa. Local Numbering: Elements 6 aaa Double Numbering for Element Local ID at home PE ID of home PE Material ID 8 Nodes Underlined local ID is used in the program

29 Parallel FEM 3D- 9 aaa. Local Numbering: Elements 6 aaa aaa., are Local Elements ( Home Elements ) aaa.,, 3 are Local Elements

30 Parallel FEM 3D- 3 Reading Meshes: INPUT_GRID (3/4) /** **/ COMMUNICATION table IMPORT_INDEX=(int*)allocate_vector(sieof(int),NEIBPETOT+); EXPORT_INDEX=(int*)allocate_vector(sieof(int),NEIBPETOT+); for(i=;i<neibpetot+;++i) IMPORT_INDEX[i]=; for(i=;i<neibpetot+;++i) EXPORT_INDEX[i]=; if( PETOT!= ) { for(i=;i<=neibpetot;i++) fscanf(fp,"%d",&import_index[i]); nn=import_index[neibpetot]; if( nn > ) IMPORT_ITEM=(int*)allocate_vector(sieof(int),nn); for(i=;i<nn;i++) fscanf(fp,"%d %d",&import_item[i],&idummy); /** **/ NODE grp. info. for(i=;i<=neibpetot;i++) fscanf(fp,"%d",&export_index[i]); nn=export_index[neibpetot]; if( nn > ) EXPORT_ITEM=(int*)allocate_vector(sieof(int),nn); for(i=;i<nn;i++) fscanf(fp,"%d",&export_item[i]); fscanf(fp,"%d",&nodgrptot); NODGRP_INDEX=(KINT* )allocate_vector(sieof(kint),nodgrptot+); NODGRP_NAME =(CHAR8*)allocate_vector(sieof(CHAR8),NODGRPtot); for(i=;i<nodgrptot+;i++) NODGRP_INDEX[i]=; for(i=;i<nodgrptot;i++) fscanf(fp,"%d",&nodgrp_index[i+]); nn=nodgrp_index[nodgrptot]; NODGRP_ITEM=(KINT*)allocate_vector(sieof(KINT),nn); for(k=;k<nodgrptot;k++){ is= NODGRP_INDEX[k]; ie= NODGRP_INDEX[k+]; fscanf(fp,"%s",nodgrp_name[k].name); nn= ie - is; if( nn!= ){ for(kk=is;kk<ie;kk++) fscanf(fp,"%d",&nodgrp_item[kk]);

31 Parallel FEM 3D- 3 PE-to-PE Communication Generalied Communication Tables Communication in parallel FEM means obtaining information of external points from their home PE s Communication Tables describe relationship of external points among PE s Send/Export, Recv/Import Sending information of boundary points Receiving information of external points

32 Parallel FEM 3D- 3 Generalied Comm. Table: Send Neighbors NeibPETot,NeibPE[neib] Message sie for each neighbor export_index[neib], neib=, NeibPETot- ID of boundary points export_item[k], k=, export_index[neibpetot]- Messages to each neighbor SendBuf[k], k=, export_index[neibpetot]-

33 Parallel FEM 3D- 33 Communication Table (Send/Export) aaa. 6 aaa export_index(neib) export_item 4 7 export_index Sie of Messages sent to Each Neighbor # Neighbors= in this case export_item Local ID of boundary points

34 Parallel FEM 3D- 34 SEND: MPI_Isend/Irecv/Waitall SendBuf neib# neib# neib# neib#3 BUFlength_e BUFlength_e BUFlength_e BUFlength_e export_index[] export_index[] export_index[] export_index[3] export_index[4] export_item (export_index[neib]:export_index[neib+]-) are sent to neib-th neighbor for (neib=; neib<neibpetot;neib++){ for (k=export_index[neib];k<export_index[neib+];k++){ kk= export_item[k]; SendBuf[k]= VAL[kk]; for (neib=; neib<neibpetot; neib++){ tag= ; is_e= export_index[neib]; ie_e= export_index[neib+]; BUFlength_e= ie_e - is_e Copies to sending buffers ierr= MPI_Isend (&SendBuf[iS_e], BUFlength_e, MPI_DOUBLE, NeibPE[neib],, MPI_COMM_WORLD, &ReqSend[neib]) MPI_Waitall(NeibPETot, ReqSend, StatSend);

35 Parallel FEM 3D- 35 Generalied Comm. Table: Receive Neighbors NeibPETot,NeibPE[neib] Message sie for each neighbor import_index[neib], neib=, NeibPETot- ID of external points import_item[k], k=, import_index[neibpetot]- Messages from each neighbor RecvBuf[k], k=, import_index[neibpetot]-

36 Parallel FEM 3D- 36 Communication Table (Recv/Import) aaa. 6 aaa import_index(neib) 3 import_item, home PE export_index(neib) export_item 4 7 import_index Sie of Messages recv. from Each Neighbor # Neighbors= in this case import_item Local ID of external points, ant their home

37 Parallel FEM 3D- 37 RECV: MPI_Isend/Irecv/Waitall for (neib=; neib<neibpetot; neib++){ tag= ; is_i= import_index[neib]; ie_i= import_index[neib+]; BUFlength_i= ie_i - is_i ierr= MPI_Irecv (&RecvBuf[iS_i], BUFlength_i, MPI_DOUBLE, NeibPE[neib],, MPI_COMM_WORLD, &ReqRecv[neib]) MPI_Waitall(NeibPETot, ReqRecv, StatRecv); for (neib=; neib<neibpetot;neib++){ for (k=import_index[neib];k<import_index[neib+];k++){ kk= import_item[k]; VAL[kk]= RecvBuf[k]; Copies from receiving buffers import_item (import_index[neib]:import_index[neib+]-) are received from neib-th neighbor RecvBuf neib# neib# neib# neib#3 BUFlength_i BUFlength_i BUFlength_i BUFlength_i import_index[] import_index[] import_index[] import_index[3] import_index[4]

38 Parallel FEM 3D- 38 Node-based Partitioning internal nodes - elements - external nodes PE# PE# PE# PE# PE#3 PE# PE# PE#

39 Parallel FEM 3D- 39 PE-to-PE comm. : Local Data PE# PE# PE# PE# (...) PE# PE# 7 7 3

40 Parallel FEM 3D- 4 PE-to-PE comm. : Local Data PE# PE# PE# PE# (...) ID of the PE # Neighbors 3 ID Neighbors NEIBPE= NEIBPE[]=3, NEIBPE[]= PE# PE# 7 7 3

41 Parallel FEM 3D- 4 PE-to-PE comm. : SEND PE# PE# PE# PE# (...) export_index PE# PE# export_index[]= export_index[]= export_index[]= +3 = 5 export_item[-4]=,4,4,5,6 4 th node is sent to two PE s

42 Parallel FEM 3D- 4 PE-to-PE comm. : RECV PE# PE# PE# PE# (...) import_index PE# PE# import_index[]= import_index[]= 3 import_index[]= 3+3 = 6 import_item[-5]=7,8,,9,, 7 7 3

43 Parallel FEM 3D- 43 Reading Meshes: INPUT_GRID (4/4) /** **/ ELEMENT grp. info. fscanf(fp,"%d",&elmgrptot); ELMGRP_INDEX=(KINT* )allocate_vector(sieof(int),elmgrptot+); ELMGRP_NAME =(CHAR8*)allocate_vector(sieof(CHAR8),ELMGRPtot); for(i=;i<elmgrptot+;i++) ELMGRP_INDEX[i]=; for(i=;i<elmgrptot;i++) fscanf(fp,"%d",&elmgrp_index[i+]); nn=elmgrp_index[elmgrptot]; ELMGRP_ITEM=(KINT *)allocate_vector(sieof(kint),nn); /** **/ for(k=;k<elmgrptot;k++){ is=elmgrp_index[k]; ie=elmgrp_index[k+]; fscanf(fp,"%s",elmgrp_name[k].name); nn= ie - is ; if( nn!= ){ for(kk=is;kk<ie;kk++) fscanf(fp,"%d",&elmgrp_item[kk]); SURFACE grp. info. fscanf(fp,"%d",&sufgrptot); SUFGRP_INDEX=(KINT* )allocate_vector(sieof(kint),sufgrptot+); SUFGRP_NAME =(CHAR8*)allocate_vector(sieof(CHAR8),SUFGRPtot); for(i=;i<sufgrptot+;i++) SUFGRP_INDEX[i]=; for(i=;i<sufgrptot;i++) fscanf(fp,"%d",&sufgrp_index[i+]); nn= SUFGRP_INDEX[SUFGRPtot]; SUFGRP_ITEM=(KINT*)allocate_vector(sieof(KINT),*nn); for(k=;k<sufgrptot;k++){ is=sufgrp_index[k]; ie=sufgrp_index[k+]; fscanf(fp,"%s",sufgrp_name[k].name); nn= ie - is; if( nn!= ){ for(kk=is;kk<ie;kk++) fscanf(fp,"%d",&sufgrp_item[*kk ]); for(kk=is;kk<ie;kk++) fscanf(fp,"%d",&sufgrp_item[*kk+]); fclose(fp);

44 Parallel FEM 3D- 44 Parallel FEM Procedures: Program Initialiation Control Data Node, Connectivity of Elements (N: Node#, NE: Elem#) Initialiation of Arrays (Global/Element Matrices) Element-Global Matrix Mapping (Index, Item) Generation of Matrix Element-by-Element Operations (do icel=, NE) Element matrices Accumulation to global matrix Boundary Conditions Linear Solver Conjugate Gradient Method Calculation of Stress

45 Parallel FEM 3D- test_p main input_cntl Input of control data input_grid Input of mesh info find_node searching nodes NOT so different from -CPU code in summer semester mat_con connectivity of matrix mat_con connectivity of matrix mat_ass_main coefficient matrix msort sorting jacobi Jacobian mat_ass_bc boundary conditions Structure of fem3d (parallel) solve33 control of linear solver recover_stress stress calculation output_ucd visualiation cg_3 CG solver jacobi Jacobian

46 Parallel FEM 3D- 46 Some Features of fem3d Non-Zero Off-Diagonals Upper/Lower triangular components are stored separately. U Stored as Block Vector: 3-components per node Matrix: 9-components per block Processed as block based on 3 variables on each node (not each component of matrix) L

47 Parallel FEM 3D- 47 Storing 3x3 Block (/3) Less memory requirement Index, Item i j i i j j i j

48 Parallel FEM 3D- 48 Storing 3x3 Block (/3) Computational Efficiency Ratio of (Computation/Indirect Memory Access) is larger >x speed-up both for vector/scalar processors Contiguous memory access, Cache Utiliation, Larger Flop/Byte do i=, 3*N Y(i)= D(i)*X(i) do k= index(i-)+, index(i) kk= item(k) Y(i)= Y(i) + AMAT(k)*X(k) enddo enddo do i=, N X= X(3*i-) X= X(3*i-) X3= X(3*i) Y(3*i-)= D(9*i-8)*X+D(9*i-7)*X+D(9*i-6)*X3 Y(3*i-)= D(9*i-5)*X+D(9*i-4)*X+D(9*i-3)*X3 Y(3*I )= D(9*i-)*X+D(9*i-)*X+D(9*I )*X3 do k= index(i-)+, index(i) kk= item(k) X= X(3*kk-) X= X(3*kk-) X3= X(3*kk) Y(3*i-)= Y(3*i-)+AMAT(9*k-8)*X+AMAT(9*k-7)*X & +AMAT(9*k-6)*X3 Y(3*i-)= Y(3*i-)+AMAT(9*k-5)*X+AMAT(9*k-4)*X & +AMAT(9*k-3)*X3 Y(3*I )= Y(3*I )+AMAT(9*k-)*X+AMAT(9*k-)*X & +AMAT(9*k )*X3 enddo enddo

49 Parallel FEM 3D- 49 Storing 3x3 Block (3/3) Stabiliation of Computation Instead of division by diagonal components, full LU factoriation of 3x3 Diagonal Block is applied. Effective for ill-conditioned problems i j i j i i j j

50 Parallel FEM 3D- 5 Definitions of Terms i j Block (Node): i i Component (DOF): j j i j

51 Parallel FEM 3D- 5 Towards Matrix Assembling In D, it was easy to obtain information related to index and item. non-ero off-diagonals for each node ID of non-ero off-diagonal : i+, i-, where i is node ID In 3D, situation is more complicated: Number of non-ero off-diagonal blocks is between 7 and 6 for the current target problem More complicated for real problems. Generally, there are no information related to number of non-ero off-diagonal blocks beforehand.

52 Parallel FEM 3D- 5 Towards Matrix Assembling In D, it was easy to obtain information related to index and item. non-ero off-diagonals for each node ID of non-ero off-diagonal : i+, i-, where i is node ID In 3D, situation is more complicated: Number of non-ero off-diagonal blocks is between 7 and 6 for the current target problem More complicated for real problems. Generally, there are no information related to number of non-ero off-diagonal blocks beforehand. Count number of non-ero off-diagonals using arrays: INL[N],INU[N],IAL[N][NL], IAU[N][NU]

53 Parallel FEM 3D- 53 Main Part #include <stdio.h> #include <stdlib.h> FILE* fp_log; #define GLOBAL_VALUE_DEFINE #include "pfem_util.h" //#include "solver33.h" extern void PFEM_INIT(int,char**); extern void INPUT_CNTL(); extern void INPUT_GRID(); extern void MAT_CON(); extern void MAT_CON(); extern void MAT_ASS_MAIN(); extern void MAT_ASS_BC(); extern void SOLVE33(); extern void RECOVER_STRESS(); extern void OUTPUT_UCD(); extern void PFEM_FINALIZE(); int main(int argc,char* argv[]) { double START_TIME,END_TIME; PFEM_INIT(argc,argv); INPUT_CNTL(); INPUT_GRID(); MAT_CON(); MAT_CON(); MAT_ASS_MAIN(); MAT_ASS_BC() ; SOLVE33(); RECOVER_STRESS(); OUTPUT_UCD() ; PFEM_FINALIZE() ; MAT_CON: generate INL,INU, IAL, IAU MAT_CON: generate index, item Storing node ID starting from (not )

54 Parallel FEM 3D- 54 MAT_CON: Overview do icel=, ICELTOT generate INL, INU, IAL, IAU according to 8 nodes of hexahedral element (FIND_NODE) enddo,, 8 7,, 5 6,,,, 4 3,,,, ,,,,,,

55 Parallel FEM 3D- 55 Generating Connectivity of Matrix MAT_CON (/4) #include <stdio.h> #include "pfem_util.h" #include "allocate.h" extern FILE *fp_log; extern void msort(int*, int*, int); static void FIND_TS_NODE (int,int); void MAT_CON() { int i,j,k,icel,in; int in,in,in3,in4,in5,in6,in7,in8; int NN; N= 56; not in use NU= 6; NL= 6; INL=(KINT* )allocate_vector(sieof(kint),np); IAL=(KINT**)allocate_matrix(sieof(KINT),NP,NL); INU=(KINT* )allocate_vector(sieof(kint),np); IAU=(KINT**)allocate_matrix(sieof(KINT),NP,NU); for(i=;i<np;i++) INL[i]=; for(i=;i<np;i++) for(j=;j<nl;j++) IAL[i][j]=; for(i=;i<np;i++) INU[i]=; for(i=;i<np;i++) for(j=;j<nu;j++) IAU[i][j]=; NU,NL: Number of maximum number of connected nodes to each node (number of upper/lower non-ero off-diagonal blocks) In the current problem, geometry is rather simple. Therefore we can specify NU and NL in this way. If it s not clear -> implement that in Report # of Summer Semester

56 Parallel FEM 3D- 56 Generating Connectivity of Matrix MAT_CON (/4) #include <stdio.h> #include "pfem_util.h" #include "allocate.h" extern FILE *fp_log; extern void msort(int*, int*, int); static void FIND_TS_NODE (int,int); void MAT_CON() { int i,j,k,icel,in; int in,in,in3,in4,in5,in6,in7,in8; int NN; N= 56; not in use NU= 6; NL= 6; Array Sie Description INL, INU IAL, IAU [NP] [NP][NL], [NP][NU] Number of connected nodes to each node (lower/upper) Corresponding connected node ID (column ID) INL=(KINT* )allocate_vector(sieof(kint),np); IAL=(KINT**)allocate_matrix(sieof(KINT),NP,NL); INU=(KINT* )allocate_vector(sieof(kint),np); IAU=(KINT**)allocate_matrix(sieof(KINT),NP,NU); for(i=;i<np;i++) INL[i]=; for(i=;i<np;i++) for(j=;j<nl;j++) IAL[i][j]=; for(i=;i<np;i++) INU[i]=; for(i=;i<np;i++) for(j=;j<nu;j++) IAU[i][j]=;

57 Parallel FEM 3D- 57 Generating Connectivity of Matrix MAT_CON (/4): Starting from for( icel=;icel< ICELTOT;icel++){ in=icelnod[icel][]; in=icelnod[icel][]; in3=icelnod[icel][]; in4=icelnod[icel][3]; in5=icelnod[icel][4]; in6=icelnod[icel][5]; in7=icelnod[icel][6]; in8=icelnod[icel][7]; FIND_TS_NODE (in,in); FIND_TS_NODE (in,in3); FIND_TS_NODE (in,in4); FIND_TS_NODE (in,in5); FIND_TS_NODE (in,in6); FIND_TS_NODE (in,in7); FIND_TS_NODE (in,in8);,,,,,, 8 7,, 5 6,,,, 4 3,,,,,, FIND_TS_NODE (in,in); FIND_TS_NODE (in,in3); FIND_TS_NODE (in,in4); FIND_TS_NODE (in,in5); FIND_TS_NODE (in,in6); FIND_TS_NODE (in,in7); FIND_TS_NODE (in,in8); FIND_TS_NODE (in3,in); FIND_TS_NODE (in3,in); FIND_TS_NODE (in3,in4); FIND_TS_NODE (in3,in5); FIND_TS_NODE (in3,in6); FIND_TS_NODE (in3,in7); FIND_TS_NODE (in3,in8);

58 Parallel FEM 3D- 58 FIND_TS_NODE: Search Connectivity INL,INU,IAL,IAU: Automatic Search static void FIND_TS_NODE (int ip,int ip) { int kk,icou; if( ip > ip ){ for(kk=;kk<inl[ip-];kk++){ if(ip == IAL[ip-][kk]) return; icou=inl[ip-]+; IAL[ip-][icou-]=ip; INL[ip-]=icou; return; if( ip > ip ) { for(kk=;kk<inu[ip-];kk++){ if(ip == IAU[ip-][kk]) return; icou=inu[ip-]+; IAU[ip-][icou-]=ip; INU[ip-]=icou; return; Array Sie Description INL, INU IAL, IAU [NP] [NP][NL], [NP][NU] Number of connected nodes to each node (lower/upper) Corresponding connected node ID (column ID)

59 Parallel FEM 3D- 59 FIND_TS_NODE: Search Connectivity INL,INU,IAL,IAU: Automatic Search static void FIND_TS_NODE (int ip,int ip) { int kk,icou; if( ip > ip ){ for(kk=;kk<inl[ip-];kk++){ if(ip == IAL[ip-][kk]) return; icou=inl[ip-]+; IAL[ip-][icou-]=ip; INL[ip-]=icou; return; If the target node is already included in IAL,IAU, proceed to next pair of nodes if( ip > ip ) { for(kk=;kk<inu[ip-];kk++){ if(ip == IAU[ip-][kk]) return; icou=inu[ip-]+; IAU[ip-][icou-]=ip; INU[ip-]=icou; return;

60 Parallel FEM 3D- 6 FIND_TS_NODE: Search Connectivity INL,INU,IAL,IAU: Automatic Search static void FIND_TS_NODE (int ip,int ip) { int kk,icou; if( ip > ip ){ for(kk=;kk<inl[ip-];kk++){ if(ip == IAL[ip-][kk]) return; icou=inl[ip-]+; IAL[ip-][icou-]=ip; INL[ip-]=icou; return; If the target node is NOT included in IAL,IAU, store the node in IAL/IAU, and add to INL/INU (Node ID in IAL and IAU starts from ). if( ip > ip ) { for(kk=;kk<inu[ip-];kk++){ if(ip == IAU[ip-][kk]) return; icou=inu[ip-]+; IAU[ip-][icou-]=ip; INU[ip-]=icou; return;

61 Parallel FEM 3D- 6 Generating Connectivity of Matrix MAT_CON (3/4) FIND_TS_NODE (in4,in); FIND_TS_NODE (in4,in); FIND_TS_NODE (in4,in3); FIND_TS_NODE (in4,in5); FIND_TS_NODE (in4,in6); FIND_TS_NODE (in4,in7); FIND_TS_NODE (in4,in8); FIND_TS_NODE (in5,in); FIND_TS_NODE (in5,in); FIND_TS_NODE (in5,in3); FIND_TS_NODE (in5,in4); FIND_TS_NODE (in5,in6); FIND_TS_NODE (in5,in7); FIND_TS_NODE (in5,in8); FIND_TS_NODE (in6,in); FIND_TS_NODE (in6,in); FIND_TS_NODE (in6,in3); FIND_TS_NODE (in6,in4); FIND_TS_NODE (in6,in5); FIND_TS_NODE (in6,in7); FIND_TS_NODE (in6,in8);,,,,,, 8 7,, 5 6,,,, 4 3,,,,,, FIND_TS_NODE (in7,in); FIND_TS_NODE (in7,in); FIND_TS_NODE (in7,in3); FIND_TS_NODE (in7,in4); FIND_TS_NODE (in7,in5); FIND_TS_NODE (in7,in6); FIND_TS_NODE (in7,in8);

62 Parallel FEM 3D- 6 Generating Connectivity of Matrix MAT_CON (4/4) FIND_TS_NODE (in8,in); FIND_TS_NODE (in8,in); FIND_TS_NODE (in8,in3); FIND_TS_NODE (in8,in4); FIND_TS_NODE (in8,in5); FIND_TS_NODE (in8,in6); FIND_TS_NODE (in8,in7); Sort IAL[i][k],IAU[i][k] in ascending order by bubble sorting for less than components. for(in=;in<n;in++){ NN=INL[in]; for (k=;k<nn;k++){ NCOL[k]=IAL[in][k]; msort(ncol,ncol,nn); for(k=nn;k>;k--){ IAL[in][NN-k]= NCOL[NCOL[k-]-]; NN=INU[in]; for (k=;k<nn;k++){ NCOL[k]=IAU[in][k]; msort(ncol,ncol,nn); for(k=nn;k>;k--){ IAU[in][NN-k]= NCOL[NCOL[k-]-];

63 Parallel FEM 3D- 63 MAT_CON: CRS format #include <stdio.h> #include "pfem_util.h" #include "allocate.h" extern FILE* fp_log; void MAT_CON() { int i,k,kk; indexl=(kint*)allocate_vector(sieof(kint),np+); indexu=(kint*)allocate_vector(sieof(kint),np+); for(i=;i<np+;i++) indexl[i]=; for(i=;i<np+;i++) indexu[i]=; for(i=;i<np;i++){ indexl[i+]=indexl[i]+inl[i]; indexu[i+]=indexu[i]+inu[i]; NPL=indexL[NP]; NPU=indexU[NP]; iteml=(kint*)allocate_vector(sieof(kint),npl); itemu=(kint*)allocate_vector(sieof(kint),npu); for(i=;i<np;i++){ for(k=;k<inl[i];k++){ kk=k+indexl[i]; iteml[kk]=ial[i][k]-; for(k=;k<inu[i];k++){ kk=k+indexu[i]; itemu[kk]=iau[i][k]-; deallocate_vector(inl); deallocate_vector(inu); deallocate_vector(ial); deallocate_vector(iau); C indexl[ i ] indexu[ i ] k k INL[ k] INU[ k] indexl[] indexu[] FORTRAN indexl[ i] indexu[ i] i k k indexl[] indexu[] i i i INL[ k] INU[ k]

64 Parallel FEM 3D- 64 MAT_CON: CRS format #include <stdio.h> #include "pfem_util.h" #include "allocate.h" extern FILE* fp_log; void MAT_CON() { int i,k,kk; indexl=(kint*)allocate_vector(sieof(kint),np+); indexu=(kint*)allocate_vector(sieof(kint),np+); for(i=;i<np+;i++) indexl[i]=; for(i=;i<np+;i++) indexu[i]=; for(i=;i<np;i++){ indexl[i+]=indexl[i]+inl[i]; indexu[i+]=indexu[i]+inu[i]; NPL=indexL[NP]; NPU=indexU[NP]; iteml=(kint*)allocate_vector(sieof(kint),npl); itemu=(kint*)allocate_vector(sieof(kint),npu); for(i=;i<np;i++){ for(k=;k<inl[i];k++){ kk=k+indexl[i]; iteml[kk]=ial[i][k]-; for(k=;k<inu[i];k++){ kk=k+indexu[i]; itemu[kk]=iau[i][k]-; deallocate_vector(inl); deallocate_vector(inu); deallocate_vector(ial); deallocate_vector(iau); NPL=indexL[N] Sie of array: iteml Total number of lower non-ero off-diagonal blocks NPU=indexU[N] Sie of array: itemu Total number of upper non-ero off-diagonal blocks

65 Parallel FEM 3D- 65 MAT_CON: CRS format #include <stdio.h> #include "pfem_util.h" #include "allocate.h" extern FILE* fp_log; void MAT_CON() { int i,k,kk; indexl=(kint*)allocate_vector(sieof(kint),np+); indexu=(kint*)allocate_vector(sieof(kint),np+); for(i=;i<np+;i++) indexl[i]=; for(i=;i<np+;i++) indexu[i]=; for(i=;i<np;i++){ indexl[i+]=indexl[i]+inl[i]; indexu[i+]=indexu[i]+inu[i]; NPL=indexL[NP]; NPU=indexU[NP]; iteml=(kint*)allocate_vector(sieof(kint),npl); itemu=(kint*)allocate_vector(sieof(kint),npu); for(i=;i<np;i++){ for(k=;k<inl[i];k++){ kk=k+indexl[i]; iteml[kk]=ial[i][k]-; for(k=;k<inu[i];k++){ kk=k+indexu[i]; itemu[kk]=iau[i][k]-; deallocate_vector(inl); deallocate_vector(inu); deallocate_vector(ial); deallocate_vector(iau); iteml,itemu store node ID starting from

66 Parallel FEM 3D- 66 Main Part #include <stdio.h> #include <stdlib.h> FILE* fp_log; #define GLOBAL_VALUE_DEFINE #include "pfem_util.h" //#include "solver33.h" extern void PFEM_INIT(int,char**); extern void INPUT_CNTL(); extern void INPUT_GRID(); extern void MAT_CON(); extern void MAT_CON(); extern void MAT_ASS_MAIN(); extern void MAT_ASS_BC(); extern void SOLVE33(); extern void RECOVER_STRESS(); extern void OUTPUT_UCD(); extern void PFEM_FINALIZE(); int main(int argc,char* argv[]) { double START_TIME,END_TIME; PFEM_INIT(argc,argv); INPUT_CNTL(); INPUT_GRID(); MAT_CON(); MAT_CON(); MAT_ASS_MAIN(); MAT_ASS_BC() ; SOLVE33(); RECOVER_STRESS(); OUTPUT_UCD() ; PFEM_FINALIZE() ;

67 Parallel FEM 3D- 67 MAT_ASS_MAIN: Overview do kpn=, Gaussian Quad. points in -direction do jpn=, Gaussian Quad. points in -direction do ipn=, Gaussian Quad. Pointe in -direction Define Shape Function at Gaussian Quad. Points (8-points) Its derivative on natural/local coordinate is also defined. enddo enddo enddo do icel=, ICELTOT Loop for Element Jacobian and derivative on global coordinate of shape functions at Gaussian Quad. Points are defined according to coordinates of 8 nodes.(jacobi) do ie=, 8 Local Node ID do je=, 8 Local Node ID Global Node ID: ip, jp Address of A ip,jp in iteml,itemu : kk do kpn=, Gaussian Quad. points in -direction do jpn=, Gaussian Quad. points in -direction do ipn=, Gaussian Quad. points in -direction integration on each element coefficients of element matrices accumulation to global matrix enddo enddo enddo enddo enddo enddo i e j e

68 Parallel FEM 3D- 68 Storing 3x3 Block i j i i j j i j

69 Parallel FEM 3D- 69 MAT_ASS_MAIN (/7) #include <stdio.h> #include <math.h> #include "pfem_util.h" #include "allocate.h" extern FILE *fp_log; extern void JACOBI(); void MAT_ASS_MAIN() { int i,k,kk; int ip,jp,kp; int ipn,jpn,kpn; int icel; int ie,je; int iis,iie; int in,in,in3,in4,in5,in6,in7,in8; double vala,valb,valx; double QP,QM,EP,EM,TP,TM; double X,X,X3,X4,X5,X6,X7,X8; double Y,Y,Y3,Y4,Y5,Y6,Y7,Y8; double Z,Z,Z3,Z4,Z5,Z6,Z7,Z8; double PNXi,PNYi,PNZi,PNXj,PNYj,PNZj; double VOL; double coef; double a,a,a3,a,a,a3,a3,a3,a33; KINT nodlocal[8]; AL=(KREAL*) allocate_vector(sieof(kreal),9*npl); AU=(KREAL*) allocate_vector(sieof(kreal),9*npu); B =(KREAL*) allocate_vector(sieof(kreal),3*np ); D =(KREAL*) allocate_vector(sieof(kreal),9*np); X =(KREAL*) allocate_vector(sieof(kreal),3*np); Lower Triangle Blocks Upper RHS Diagonal Blocks Unkonws for(i=;i<9*npl;i++) AL[i]=.; for(i=;i<9*npu;i++) AU[i]=.; for(i=;i<3*np;i++) B[i]=.; for(i=;i<9*np;i++) D[i]=.; for(i=;i<3*np;i++) X[i]=.;

70 Parallel FEM 3D- 7 /*** ***/ MAT_ASS_MAIN (/7) WEI[]=.e; WEI[]=.e; POS[]= e; POS[]= e; POS: WEI: Quad. Point Weighting Factor INIT. PNQ - st-order derivative of shape function by QSI PNE - st-order derivative of shape function by ETA PNT - st-order derivative of shape function by ZET vala= POISSON / (.e-poisson); valb= (.e-.e*poisson)/(.e*(.e-poisson)); valx= ELAST*(.e-POISSON)/((.e+POISSON)*(.e-.e*POISSON)); vala= vala * valx; valb= valb * valx; for(ip=;ip<;ip++){ for(jp=;jp<;jp++){ for(kp=;kp<;kp++){ QP=.e + POS[ip]; QM=.e - POS[ip]; EP=.e + POS[jp]; EM=.e - POS[jp]; TP=.e + POS[kp]; TM=.e - POS[kp]; SHAPE[ip][jp][kp][]= O8th * QM * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EP * TM; SHAPE[ip][jp][kp][3]= O8th * QM * EP * TM; SHAPE[ip][jp][kp][4]= O8th * QM * EM * TP; SHAPE[ip][jp][kp][5]= O8th * QP * EM * TP; SHAPE[ip][jp][kp][6]= O8th * QP * EP * TP; SHAPE[ip][jp][kp][7]= O8th * QM * EP * TP;

71 Parallel FEM 3D- 7 MAT_ASS_MAIN (/7) WEI[]=.e; WEI[]=.e; POS[]= e; POS[]= e; /*** ***/ INIT. PNQ - st-order derivative of shape function by QSI PNE - st-order derivative of shape function by ETA PNT - st-order derivative of shape function by ZET vala= POISSON / (.e-poisson); valb= (.e-.e*poisson)/(.e*(.e-poisson)); valx= ELAST*(.e-POISSON)/((.e+POISSON)*(.e-.e*POISSON)); vala= vala * valx; valb= valb * valx; for(ip=;ip<;ip++){ for(jp=;jp<;jp++){ for(kp=;kp<;kp++){ QP=.e + POS[ip]; QM=.e - POS[ip]; EP=.e + POS[jp]; EM=.e - POS[jp]; TP=.e + POS[kp]; TM=.e - POS[kp]; SHAPE[ip][jp][kp][]= O8th * QM * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EP * TM; SHAPE[ip][jp][kp][3]= O8th * QM * EP * TM; SHAPE[ip][jp][kp][4]= O8th * QM * EM * TP; SHAPE[ip][jp][kp][5]= O8th * QP * EM * TP; SHAPE[ip][jp][kp][6]= O8th * QP * EP * TP; SHAPE[ip][jp][kp][7]= O8th * QM * EP * TP;

72 Parallel FEM 3D- 7 Strain-Stress Relationship x y xy y x x y xy y x E D D

73 Parallel FEM 3D- 73 MAT_ASS_MAIN (/7) WEI[]=.e; WEI[]=.e; POS[]= e; POS[]= e; /*** ***/ INIT. PNQ - st-order derivative of shape function by QSI PNE - st-order derivative of shape function by ETA PNT - st-order derivative of shape function by ZET vala= POISSON / (.e-poisson); valb= (.e-.e*poisson)/(.e*(.e-poisson)); valx= ELAST*(.e-POISSON)/((.e+POISSON)*(.e-.e*POISSON)); vala= vala * valx; valb= valb * valx; for(ip=;ip<;ip++){ for(jp=;jp<;jp++){ for(kp=;kp<;kp++){ QP=.e + POS[ip]; QM=.e - POS[ip]; EP=.e + POS[jp]; EM=.e - POS[jp]; TP=.e + POS[kp]; TM=.e - POS[kp]; valx vala valb SHAPE[ip][jp][kp][]= O8th * QM * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EP * TM; SHAPE[ip][jp][kp][3]= O8th * QM * EP * TM; SHAPE[ip][jp][kp][4]= O8th * QM * EM * TP; SHAPE[ip][jp][kp][5]= O8th * QP * EM * TP; SHAPE[ip][jp][kp][6]= O8th * QP * EP * TP; SHAPE[ip][jp][kp][7]= O8th * QM * EP * TP; E E E

74 Parallel FEM 3D- 74 MAT_ASS_MAIN (/7) WEI[]=.e; WEI[]=.e; POS[]= e; POS[]= e; /*** ***/ INIT. PNQ - st-order derivative of shape function by QSI PNE - st-order derivative of shape function by ETA PNT - st-order derivative of shape function by ZET vala= POISSON / (.e-poisson); valb= (.e-.e*poisson)/(.e*(.e-poisson)); valx= ELAST*(.e-POISSON)/((.e+POISSON)*(.e-.e*POISSON)); vala= vala * valx; valb= valb * valx; for(ip=;ip<;ip++){ for(jp=;jp<;jp++){ for(kp=;kp<;kp++){ QP=.e + POS[ip]; QM=.e - POS[ip]; EP=.e + POS[jp]; EM=.e - POS[jp]; TP=.e + POS[kp]; TM=.e - POS[kp]; valx vala valb SHAPE[ip][jp][kp][]= O8th * QM * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EP * TM; SHAPE[ip][jp][kp][3]= O8th * QM * EP * TM; SHAPE[ip][jp][kp][4]= O8th * QM * EM * TP; SHAPE[ip][jp][kp][5]= O8th * QP * EM * TP; SHAPE[ip][jp][kp][6]= O8th * QP * EP * TP; SHAPE[ip][jp][kp][7]= O8th * QM * EP * TP; E E E E E

75 Parallel FEM 3D- 75 Strain-Stress Relationship x y xy y x x y xy y x valb valb valb valx vala vala vala valx vala vala vala valx D D

76 Parallel FEM 3D- 76 MAT_ASS_MAIN (/7) WEI[]=.e; WEI[]=.e; POS[]= e; POS[]= e; /*** ***/ INIT. PNQ - st-order derivative of shape function by QSI PNE - st-order derivative of shape function by ETA PNT - st-order derivative of shape function by ZET vala= POISSON / (.e-poisson); valb= (.e-.e*poisson)/(.e*(.e-poisson)); valx= ELAST*(.e-POISSON)/((.e+POISSON)*(.e-.e*POISSON)); vala= vala * valx; valb= valb * valx; for(ip=;ip<;ip++){ for(jp=;jp<;jp++){ for(kp=;kp<;kp++){ QP=.e + POS[ip]; QM=.e - POS[ip]; EP=.e + POS[jp]; EM=.e - POS[jp]; TP=.e + POS[kp]; TM=.e - POS[kp]; QP i EP TP k, QMi i j, EM j j, TMk k i i k SHAPE[ip][jp][kp][]= O8th * QM * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EP * TM; SHAPE[ip][jp][kp][3]= O8th * QM * EP * TM; SHAPE[ip][jp][kp][4]= O8th * QM * EM * TP; SHAPE[ip][jp][kp][5]= O8th * QP * EM * TP; SHAPE[ip][jp][kp][6]= O8th * QP * EP * TP; SHAPE[ip][jp][kp][7]= O8th * QM * EP * TP;

77 Parallel FEM 3D- 77 MAT_ASS_MAIN (/7) WEI[]=.e; WEI[]=.e; POS[]= e; POS[]= e; /*** ***/ INIT. PNQ - st-order derivative of shape function by QSI PNE - st-order derivative of shape function by ETA PNT - st-order derivative of shape function by ZET vala= POISSON / (.e-poisson); valb= (.e-.e*poisson)/(.e*(.e-poisson)); valx= ELAST*(.e-POISSON)/((.e+POISSON)*(.e-.e*POISSON)); vala= vala * valx; valb= valb * valx; for(ip=;ip<;ip++){ for(jp=;jp<;jp++){ for(kp=;kp<;kp++){ QP=.e + POS[ip]; QM=.e - POS[ip]; EP=.e + POS[jp]; EM=.e - POS[jp]; TP=.e + POS[kp]; TM=.e - POS[kp]; SHAPE[ip][jp][kp][]= O8th * QM * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EP * TM; SHAPE[ip][jp][kp][3]= O8th * QM * EP * TM; SHAPE[ip][jp][kp][4]= O8th * QM * EM * TP; SHAPE[ip][jp][kp][5]= O8th * QP * EM * TP; SHAPE[ip][jp][kp][6]= O8th * QP * EP * TP; SHAPE[ip][jp][kp][7]= O8th * QM * EP * TP;,,,,,, 8,, 5 6,, 4,, 7 3,,,,,,

78 Parallel FEM 3D- 78 MAT_ASS_MAIN (/7) WEI[]=.e; WEI[]=.e; POS[]= e; POS[]= e; /*** INIT. PNQ - st-order derivative of shape function by QSI PNE - st-order derivative of shape function by ETA PNT - st-order derivative of shape function by ZET ***/ vala= POISSON / (.e-poisson); valb= (.e-.e*poisson)/(.e*(.e-poisson)); valx= ELAST*(.e-POISSON)/((.e+POISSON)*(.e-.e*POISSON)); vala= vala * valx; valb= valb * valx; for(ip=;ip<;ip++){ for(jp=;jp<;jp++){ for(kp=;kp<;kp++){ QP=.e + POS[ip]; QM=.e - POS[ip]; EP=.e + POS[jp]; EM=.e - POS[jp]; TP=.e + POS[kp]; TM=.e - POS[kp]; SHAPE[ip][jp][kp][]= O8th * QM * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EM * TM; SHAPE[ip][jp][kp][]= O8th * QP * EP * TM; SHAPE[ip][jp][kp][3]= O8th * QM * EP * TM; SHAPE[ip][jp][kp][4]= O8th * QM * EM * TP; SHAPE[ip][jp][kp][5]= O8th * QP * EM * TP; SHAPE[ip][jp][kp][6]= O8th * QP * EP * TP; SHAPE[ip][jp][kp][7]= O8th * QM * EP * TP; 8 ),, ( 8 ),, ( 8 ),, ( 8 ),, ( 4 3 N N N N 8 ),, ( 8 ),, ( 8 ),, ( 8 ),, ( N N N N

79 Parallel FEM 3D- 79 MAT_ASS_MAIN (3/7) PNQ[jp][kp][]= - O8th * EM * TM; PNQ[jp][kp][]= + O8th * EM * TM; PNQ[jp][kp][]= + O8th * EP * TM; PNQ[jp][kp][3]= - O8th * EP * TM; PNQ[jp][kp][4]= - O8th * EM * TP; PNQ[jp][kp][5]= + O8th * EM * TP; PNQ[jp][kp][6]= + O8th * EP * TP; PNQ[jp][kp][7]= - O8th * EP * TP; PNE[ip][kp][]= - O8th * QM * TM; PNE[ip][kp][]= - O8th * QP * TM; PNE[ip][kp][]= + O8th * QP * TM; PNE[ip][kp][3]= + O8th * QM * TM; PNE[ip][kp][4]= - O8th * QM * TP; PNE[ip][kp][5]= - O8th * QP * TP; PNE[ip][kp][6]= + O8th * QP * TP; PNE[ip][kp][7]= + O8th * QM * TP; PNT[ip][jp][]= - O8th * QM * EM; PNT[ip][jp][]= - O8th * QP * EM; PNT[ip][jp][]= - O8th * QP * EP; PNT[ip][jp][3]= - O8th * QM * EP; PNT[ip][jp][4]= + O8th * QM * EM; PNT[ip][jp][5]= + O8th * QP * EM; PNT[ip][jp][6]= + O8th * QP * EP; PNT[ip][jp][7]= + O8th * QM * EP; for( icel=;icel< ICELTOT;icel++){ in=icelnod[icel][]; in=icelnod[icel][]; in3=icelnod[icel][]; in4=icelnod[icel][3]; in5=icelnod[icel][4]; in6=icelnod[icel][5]; in7=icelnod[icel][6]; in8=icelnod[icel][7]; N PNQ( j, k) l ( i, j, k ) N PNE( i, k) l ( i, j, k ) N PNT ( i, j) l ( i, j, k ) N ( i, j, k ) N ( i, j, k ) N3 ( i, j, k ) N3 ( i, j, k ) ( i j k,, ) j j j First Order Derivative of Shape Functions at j k k k k

80 Parallel FEM 3D- 8 MAT_ASS_MAIN (3/7) PNQ[jp][kp][]= - O8th * EM * TM; PNQ[jp][kp][]= + O8th * EM * TM; PNQ[jp][kp][]= + O8th * EP * TM; PNQ[jp][kp][3]= - O8th * EP * TM; PNQ[jp][kp][4]= - O8th * EM * TP; PNQ[jp][kp][5]= + O8th * EM * TP; PNQ[jp][kp][6]= + O8th * EP * TP; PNQ[jp][kp][7]= - O8th * EP * TP; PNE[ip][kp][]= - O8th * QM * TM; PNE[ip][kp][]= - O8th * QP * TM; PNE[ip][kp][]= + O8th * QP * TM; PNE[ip][kp][3]= + O8th * QM * TM; PNE[ip][kp][4]= - O8th * QM * TP; PNE[ip][kp][5]= - O8th * QP * TP; PNE[ip][kp][6]= + O8th * QP * TP; PNE[ip][kp][7]= + O8th * QM * TP; PNT[ip][jp][]= - O8th * QM * EM; PNT[ip][jp][]= - O8th * QP * EM; PNT[ip][jp][]= - O8th * QP * EP; PNT[ip][jp][3]= - O8th * QM * EP; PNT[ip][jp][4]= + O8th * QM * EM; PNT[ip][jp][5]= + O8th * QP * EM; PNT[ip][jp][6]= + O8th * QP * EP; PNT[ip][jp][7]= + O8th * QM * EP; for( icel=;icel< ICELTOT;icel++){ in=icelnod[icel][]; in=icelnod[icel][]; in3=icelnod[icel][]; in4=icelnod[icel][3]; in5=icelnod[icel][4]; in6=icelnod[icel][5]; in7=icelnod[icel][6]; in8=icelnod[icel][7];,,,, 5 6,,,,,, 8 4,, 7 3,,,,,,

81 Parallel FEM 3D- 8 MAT_ASS_MAIN (4/7) /** ** JACOBIAN & INVERSE JACOBIAN **/ nodlocal[]= in; nodlocal[]= in; nodlocal[]= in3; nodlocal[3]= in4; nodlocal[4]= in5; nodlocal[5]= in6; nodlocal[6]= in7; nodlocal[7]= in8; X=XYZ[in-][]; X=XYZ[in-][]; X3=XYZ[in3-][]; X4=XYZ[in4-][]; X5=XYZ[in5-][]; X6=XYZ[in6-][]; X7=XYZ[in7-][]; X8=XYZ[in8-][]; Y=XYZ[in-][]; Y=XYZ[in-][]; Y3=XYZ[in3-][]; Y4=XYZ[in4-][]; Y5=XYZ[in5-][]; Y6=XYZ[in6-][]; Y7=XYZ[in7-][]; Y8=XYZ[in8-][]; Z=XYZ[in-][]; Z=XYZ[in-][]; Z3=XYZ[in3-][]; Z4=XYZ[in4-][]; Z5=XYZ[in5-][]; Z6=XYZ[in6-][]; Z7=XYZ[in7-][]; Z8=XYZ[in8-][]; Node ID (Global) X-Coordinates of 8 nodes Y-Coordinates of 8 nodes Z-Coordinates of 8 nodes,,,,,,,, 8,, 5 6,, 4 7 3,,,,,, JACOBI(DETJ, PNQ, PNE, PNT, PNX, PNY, PNZ, X, X, X3, X4, X5, X6, X7, X8, Y, Y, Y3, Y4, Y5, Y6, Y7, Y8, Z, Z, Z3, Z4, Z5, Z6, Z7, Z8);

82 Parallel FEM 3D- 8 MAT_ASS_MAIN (5/7) /** **/ CONSTRUCT the GLOBAL MATRIX for(ie=;ie<8;ie++){ ip=nodlocal[ie]; for(je=;je<8;je++){ jp=nodlocal[je]; kk=; if( jp > ip ){ iis=indexu[ip-]; iie=indexu[ip ]; for( k=iis;k<iie;k++){ if( itemu[k] == jp- ){ kk=k; break; if( jp < ip ){ iis=indexl[ip-]; iie=indexl[ip ]; for( k=iis;k<iie;k++){ if( iteml[k] == jp- ){ kk=k; break; PNXi=.e; PNYi=.e; PNZi=.e; PNXj=.e; PNYj=.e; PNZj=.e; VOL=.e; Non-Zero Off-Diagonal Block in Global Matix A ip, jp kk:address in iteml, itemu starting from k: starting from ip,jp: node ID starting from

83 Parallel FEM 3D- 83 Element Matrix: 4 4 (u,v,w) components on each node are physically strongly-coupled: these three components are treated in block-wise manner: 8 8 matrix i e j e,, 8 7,, 5 6,,,, 4 3,,,,,,,,,, a a a i e j ie je ie je e 3 a a a i i e j e ie je e j e 3 a a a i i i e e j j e ie je e i e, j e 8

84 Parallel FEM 3D- 84 Global Matrix i p j p i p i p j p j p i p j p

85 Parallel FEM 3D- 85 MAT_ASS_MAIN (5/7) /** **/ CONSTRUCT the GLOBAL MATRIX for(ie=;ie<8;ie++){ ip=nodlocal[ie]; for(je=;je<8;je++){ jp=nodlocal[je]; kk=; if( jp > ip ){ iis=indexu[ip-]; iie=indexu[ip ]; for( k=iis;k<iie;k++){ if( itemu[k] == jp- ){ kk=k; break; if( jp < ip ){ iis=indexl[ip-]; iie=indexl[ip ]; for( k=iis;k<iie;k++){ if( iteml[k] == jp- ){ kk=k; break; Element Matrix(i e ~j e ): Local ID Global Matrix (i p ~j p ): Global ID kk:address in iteml, itemu starting from k: starting from ip,jp: node ID starting from PNXi=.e; PNYi=.e; PNZi=.e; PNXj=.e; PNYj=.e; PNZj=.e; VOL=.e;

86 Parallel FEM 3D- 86 MAT_ASS_MAIN (6/7) for(kpn=;kpn<;kpn++){ for(jpn=;jpn<;jpn++){ for(ipn=;ipn<;ipn++){ coef= -fabs(detj[ipn][jpn][kpn])*wei[ipn]*wei[jpn]*wei[kpn]; VOL+=coef; PNXi= PNX[ipn][jpn][kpn][ie]; PNYi= PNY[ipn][jpn][kpn][ie]; PNZi= PNZ[ipn][jpn][kpn][ie]; PNXj= PNX[ipn][jpn][kpn][je]; PNYj= PNY[ipn][jpn][kpn][je]; PNZj= PNZ[ipn][jpn][kpn][je]; a= (valx*pnxi*pnxj+valb*(pnyi*pnyj+pnzi*pnzj))*coef; a= (valx*pnyi*pnyj+valb*(pnzi*pnzj+pnxi*pnxj))*coef; a33= (valx*pnzi*pnzj+valb*(pnxi*pnxj+pnyi*pnyj))*coef; a= (vala*pnxi*pnyj + valb*pnxj*pnyi)*coef; a3= (vala*pnxi*pnzj + valb*pnxj*pnzi)*coef; a= (vala*pnyi*pnxj + valb*pnyj*pnxi)*coef; a3= (vala*pnyi*pnzj + valb*pnyj*pnzi)*coef; a3= (vala*pnzi*pnxj + valb*pnzj*pnxi)*coef; a3= (vala*pnzi*pnyj + valb*pnzj*pnyi)*coef; if (jp > ip) { AU[9*kk ]+=a; AU[9*kk+]+=a; AU[9*kk+]+=a3; AU[9*kk+3]+=a; AU[9*kk+4]+=a; AU[9*kk+5]+=a3; AU[9*kk+6]+=a3; AU[9*kk+7]+=a3; AU[9*kk+8]+=a33;

87 Parallel FEM 3D- 87 MAT_ASS_MAIN (6/7) for(kpn=;kpn<;kpn++){ for(jpn=;jpn<;jpn++){ for(ipn=;ipn<;ipn++){ coef= -fabs(detj[ipn][jpn][kpn])*wei[ipn]*wei[jpn]*wei[kpn]; VOL+=coef; PNXi= PNX[ipn][jpn][kpn][ie]; PNYi= PNY[ipn][jpn][kpn][ie]; PNZi= PNZ[ipn][jpn][kpn][ie]; PNXj= PNX[ipn][jpn][kpn][je]; PNYj= PNY[ipn][jpn][kpn][je]; PNZj= PNZ[ipn][jpn][kpn][je]; a= (valx*pnxi*pnxj+valb*(pnyi*pnyj+pnzi*pnzj))*coef; a= (valx*pnyi*pnyj+valb*(pnzi*pnzj+pnxi*pnxj))*coef; a33= (valx*pnzi*pnzj+valb*(pnxi*pnxj+pnyi*pnyj))*coef; a= (vala*pnxi*pnyj + valb*pnxj*pnyi)*coef; a3= (vala*pnxi*pnzj + valb*pnxj*pnzi)*coef; a= (vala*pnyi*pnxj + valb*pnyj*pnxi)*coef; a3= (vala*pnyi*pnzj + valb*pnyj*pnzi)*coef; a3= (vala*pnzi*pnxj + valb*pnzj*pnxi)*coef; a3= (vala*pnzi*pnyj + valb*pnzj*pnyi)*coef; if (jp > ip) { AU[9*kk ]+=a; AU[9*kk+]+=a; AU[9*kk+]+=a3; AU[9*kk+3]+=a; AU[9*kk+4]+=a; AU[9*kk+5]+=a3; AU[9*kk+6]+=a3; AU[9*kk+7]+=a3; AU[9*kk+8]+=a33; N k k j i k j i M j L i f W W W d d d f I ),, ( ),, ( d d d J N N y N y N b x N x N D dxdyd N N y N y N b x N x N D dv N N y N y N b x N x N D j i j i j i j i j i j i V j i j i j i det k j i k j i J W W W,, coef det ),, ( f

88 Parallel FEM 3D- 88 MAT_ASS_MAIN (6/7) for(kpn=;kpn<;kpn++){ for(jpn=;jpn<;jpn++){ for(ipn=;ipn<;ipn++){ coef= -fabs(detj[ipn][jpn][kpn])*wei[ipn]*wei[jpn]*wei[kpn]; VOL+=coef; PNXi= PNX[ipn][jpn][kpn][ie]; PNYi= PNY[ipn][jpn][kpn][ie]; PNZi= PNZ[ipn][jpn][kpn][ie]; PNXj= PNX[ipn][jpn][kpn][je]; PNYj= PNY[ipn][jpn][kpn][je]; PNZj= PNZ[ipn][jpn][kpn][je]; a= (valx*pnxi*pnxj+valb*(pnyi*pnyj+pnzi*pnzj))*coef; a= (valx*pnyi*pnyj+valb*(pnzi*pnzj+pnxi*pnxj))*coef; a33= (valx*pnzi*pnzj+valb*(pnxi*pnxj+pnyi*pnyj))*coef; a= (vala*pnxi*pnyj + valb*pnxj*pnyi)*coef; a3= (vala*pnxi*pnzj + valb*pnxj*pnzi)*coef; a= (vala*pnyi*pnxj + valb*pnyj*pnxi)*coef; a3= (vala*pnyi*pnzj + valb*pnyj*pnzi)*coef; a3= (vala*pnzi*pnxj + valb*pnzj*pnxi)*coef; a3= (vala*pnzi*pnyj + valb*pnzj*pnyi)*coef; x valx y vala vala xy y x if (jp > ip) { valaau[9*kk-9]+=a; AU[9*kk-8]+=a; valxau[9*kk-7]+=a3; vala AU[9*kk-6]+=a; valau[9*kk-5]+=a; valx AU[9*kk-4]+=a3; AU[9*kk-3]+=a3; AU[9*kk-]+=a3; AU[9*kk-]+=a33; valb valb x y xy y valb x

89 Parallel FEM 3D- 89 Equilibrium Eqn s in X-dir. a a a ij ij ij 3 a a a ij ij ij3 a a a ij3 ij 3 ij33 i, j 8 8 7,, 5 6,,,, 4,,,,,, 3,,,,,, a a a ij ij ij3 V V valx N vala N N N valb vala N i, x N j, valb Ni, N j, x V i, x i, x j, x j, y N valb N i, y i, y N N j, y j, x dv dv N i, N j, dv

90 Parallel FEM 3D- 9 Equilibrium Eqn s in Y-dir. a a a ij ij ij 3 a a a ij ij ij3 a a a ij3 ij 3 ij33 i, j 8 8 7,, 5 6,,,, 4,,,,,, 3,,,,,, a a a ij ij ij 3 V V vala N valx N N N valb N valb vala N i, y N j, valb Ni, N j, y V i, y i, y j, x j, y i, x N i, N N j, y j, dv dv N i, x N j, x dv

91 Parallel FEM 3D- 9 Equilibrium Eqn s in Z-dir. a a a ij ij ij 3 a a a ij ij ij3 a a a ij3 ij 3 ij33 i, j 8 8 7,, 5 6,,,, 4,,,,,, 3,,,,,, a a a ij3 ij3 ij33 V V vala N vala N N N valb N valb N N N vald N i, N j, valb Ni, x N j, x Ni, y N j, y V i, i, j, x j, y i, x i, y j, j, dv dv dv

92 Parallel FEM 3D- 9 MAT_ASS_MAIN (6/7) for(kpn=;kpn<;kpn++){ for(jpn=;jpn<;jpn++){ for(ipn=;ipn<;ipn++){ coef= -fabs(detj[ipn][jpn][kpn])*wei[ipn]*wei[jpn]*wei[kpn]; VOL+=coef; PNXi= PNX[ipn][jpn][kpn][ie]; PNYi= PNY[ipn][jpn][kpn][ie]; PNZi= PNZ[ipn][jpn][kpn][ie]; PNXj= PNX[ipn][jpn][kpn][je]; PNYj= PNY[ipn][jpn][kpn][je]; PNZj= PNZ[ipn][jpn][kpn][je]; a= (valx*pnxi*pnxj+valb*(pnyi*pnyj+pnzi*pnzj))*coef; a= (valx*pnyi*pnyj+valb*(pnzi*pnzj+pnxi*pnxj))*coef; a33= (valx*pnzi*pnzj+valb*(pnxi*pnxj+pnyi*pnyj))*coef; a= (vala*pnxi*pnyj + valb*pnxj*pnyi)*coef; a3= (vala*pnxi*pnzj + valb*pnxj*pnzi)*coef; a= (vala*pnyi*pnxj + valb*pnyj*pnxi)*coef; a3= (vala*pnyi*pnzj + valb*pnyj*pnzi)*coef; a3= (vala*pnzi*pnxj + valb*pnzj*pnxi)*coef; a3= (vala*pnzi*pnyj + valb*pnzj*pnyi)*coef; if (jp > ip) { AU[9*kk ]+=a; AU[9*kk+]+=a; AU[9*kk+]+=a3; AU[9*kk+3]+=a; AU[9*kk+4]+=a; AU[9*kk+5]+=a3; AU[9*kk+6]+=a3; AU[9*kk+7]+=a3; AU[9*kk+8]+=a33; kk:address in iteml, itemu starting from

93 Parallel FEM 3D- 93 Element Matrix: 4 4 (u,v,w) components on each node are physically strongly-coupled: these three components are treated in block-wise manner: 8 8 matrix i e j e,, 8 7,, 5 6,,,, 4 3,,,,,,,,,, a a a i e j ie je ie je e 3 a a a i i e j e ie je e j e 3 a a a i i i e e j j e ie je e i e, j e 8

94 Parallel FEM 3D- 94 MAT_ASS_MAIN (7/7) if (jp < ip) { AL[9*kk ]+=a; AL[9*kk+]+=a; AL[9*kk+]+=a3; AL[9*kk+3]+=a; AL[9*kk+4]+=a; AL[9*kk+5]+=a3; AL[9*kk+6]+=a3; AL[9*kk+7]+=a3; AL[9*kk+8]+=a33; if (jp == ip) { D[9*ip ]+=a; D[9*ip+]+=a; D[9*ip+]+=a3; D[9*ip+3]+=a; D[9*ip+4]+=a; D[9*ip+5]+=a3; D[9*ip+6]+=a3; D[9*ip+7]+=a3; D[9*ip+8]+=a33;

95 Parallel FEM 3D- 95 MAT_ASS_BC: Overview do i=, N Loop for Nodes Mark nodes where Dirichlet B.C. are applied (IWKX) enddo do i=, N Loop for Nodes if (IWKX(i,).eq.) then if marked nodes corresponding components of RHS (B), Diagonal Block (D) are corrected (row, col.) do k= indexl(i-)+, indexl(i) Lower Triangular Block corresponding comp. of non-ero off-diagonal (lower-triangular) blocks (AL) are corrected enddo do k= indexu(i-)+, indexu(i) Upper Triangular Block corresponding comp. of non-ero off-diagonal (upper-triangular) blocks (AU) are corrected enddo endif enddo do i=, N 節点ループ do k= indexl(i-)+, indexl(i) Lower Triangular Block if (IWKX(itemL(k),).eq.) then if corresponding node (off-diagonal block) is marked corresponding components of RHS and AL are corrected (col.) endif enddo do k= indexu(i-)+, indexu(i) Upper Triangular Block if (IWKX(itemU(k),).eq.) then if corresponding node (off-diagonal block) is marked corresponding components of RHS and AU are corrected (col.) endif enddo enddo

96 Parallel FEM 3D- 96 #include <stdio.h> #include <stdlib.h> #include <string.h> #include "pfem_util.h" #include "allocate.h" extern FILE *fp_log; void MAT_ASS_BC() { int i,j,k,in,ib,ib,icel; int in,in,in3,in4,in5,in6,in7,in8; int iq,iq,iq3,iq4,iq5,iq6,iq7,iq8; int is,ie; double STRESS,VAL; MAT_ASS_BC (/9) IWKX=(KINT**) allocate_matrix(sieof(kint),np,); for(i=;i<n;i++) for(j=;j<;j++) IWKX[i][j]=;

97 Parallel FEM 3D- 97 MAT_ASS_BC (/9) /** **/ Z=Zmax for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"zmax") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ B[3*in ]= B[3*in ] - D[9*in+]*.e; B[3*in+]= B[3*in+] - D[9*in+5]*.e; D[9*in+]=.e; D[9*in+5]=.e; D[9*in+6]=.e; D[9*in+7]=.e; D[9*in+8]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+6]=.e; AL[9*k+7]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+6]=.e; AU[9*k+7]=.e; AU[9*k+8]=.e; X Z NX U Z = NZ Y NY

98 Parallel FEM 3D- 98 MAT_ASS_BC (/9) /** **/ Z=Zmax for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"zmax") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ B[3*in ]= B[3*in ] - D[9*in+]*.e; B[3*in+]= B[3*in+] - D[9*in+5]*.e; D[9*in+]=.e; D[9*in+5]=.e; D[9*in+6]=.e; D[9*in+7]=.e; D[9*in+8]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+6]=.e; AL[9*k+7]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+6]=.e; AU[9*k+7]=.e; AU[9*k+8]=.e; If the node in is included in the node group Zmax IWKX[in-][]= where in is global node ID starting from.

99 Parallel FEM 3D- 99 MAT_ASS_BC (/9) /** **/ Z=Zmax for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"zmax") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ B[3*in ]= B[3*in ] - D[9*in+]*.e; B[3*in+]= B[3*in+] - D[9*in+5]*.e; D[9*in+]=.e; D[9*in+5]=.e; D[9*in+6]=.e; D[9*in+7]=.e; D[9*in+8]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+6]=.e; AL[9*k+7]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+6]=.e; AU[9*k+7]=.e; AU[9*k+8]=.e; If the node in is included in Zmax, displacement in Z- direction is equal to. i.e. B[3*in+]=. (in= ~N-) In FORTRAN: B[3*in-]=. (in=, N)

100 Parallel FEM 3D- MAT_ASS_BC (3/9) for(in=;in<np;in++){ is= indexl[in]; ie= indexl[in+]; for(k=is;k<=ie;k++){ if (IWKX[itemL[k]][] == ) { B[3*in ]= B[3*in ] - AL[9*k+]*.e; B[3*in+]= B[3*in+] - AL[9*k+5]*.e; B[3*in+]= B[3*in+] - AL[9*k+8]*.e; AL[9*k+]=.e; AL[9*k+5]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<=ie;k++){ if (IWKX[itemU[k]][] == ) { B[3*in ]= B[3*in ] - AU[9*k+]*.e; B[3*in+]= B[3*in+] - AU[9*k+5]*.e; B[3*in+]= B[3*in+] - AU[9*k+8]*.e; AU[9*k+]=.e; AU[9*k+5]=.e; AU[9*k+8]=.e;

101 Parallel FEM 3D- Global Matrix i p j p i p i p j p j p i p j p

102 Parallel FEM 3D- Same Procedure as D case Corresponding Row: Diag. Component=, Other Comp.= i p j p i p i p j p j p i p j p

103 Parallel FEM 3D- 3 Same Procedure as D case Corresponding Column: Move to RHS, Off-Diag. Component= i p j p i p i p j p j p i p j p

104 Parallel FEM 3D- 4 MAT_ASS_BC (/9) /** **/ Z=Zmax for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"zmax") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ B[3*in ]= B[3*in ] - D[9*in+]*.e; B[3*in+]= B[3*in+] - D[9*in+5]*.e; D[9*in+]=.e; D[9*in+5]=.e; D[9*in+6]=.e; D[9*in+7]=.e; D[9*in+8]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+6]=.e; AL[9*k+7]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+6]=.e; AU[9*k+7]=.e; AU[9*k+8]=.e; If the node in is included in the node group Zmax IWKX[in-][]= where in is global node ID starting from.

105 Parallel FEM 3D- 5 MAT_ASS_BC (/9) /** **/ Z=Zmax for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"zmax") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ B[3*in ]= B[3*in ] - D[9*in+]*.e; B[3*in+]= B[3*in+] - D[9*in+5]*.e; D[9*in+]=.e; Modify: B[3*in], B[3*in+] Then, clear: D[9*in+6],D[9*in+7] D[9*in+5]=.e; D[9*in+6]=.e; D[9*in+7]=.e; D[9*in+8]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+6]=.e; AL[9*k+7]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+6]=.e; AU[9*k+7]=.e; AU[9*k+8]=.e; i p j p i p j p i p j p i p j p

106 Parallel FEM 3D- 6 MAT_ASS_BC (/9) /** **/ Z=Zmax for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"zmax") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; i p i p j p i p for(in=;in<np;in++){ if( IWKX[in][] == ){ B[3*in ]= B[3*in ] - D[9*in+]*.e; B[3*in+]= B[3*in+] - D[9*in+5]*.e; D[9*in+]=.e; D[9*in+5]=.e; D[9*in+6]=.e; D[9*in+7]=.e; D[9*in+8]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+6]=.e; AL[9*k+7]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+6]=.e; AU[9*k+7]=.e; AU[9*k+8]=.e; j p i p j p j p

107 Parallel FEM 3D- 7 MAT_ASS_BC (3/9) for(in=;in<np;in++){ is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ if (IWKX[itemL[k]][] == ) { B[3*in ]= B[3*in ] - AL[9*k+]*.e; B[3*in+]= B[3*in+] - AL[9*k+5]*.e; B[3*in+]= B[3*in+] - AL[9*k+8]*.e; AL[9*k+]=.e; AL[9*k+5]=.e; is= indexu[in]; ie= indexu[in+]; AL[9*k+8]=.e; Modify RHS, then clear AL and AU for(k=is;k<ie;k++){ if (IWKX[itemU[k]][] == ) { B[3*in ]= B[3*in ] - AU[9*k+]*.e; B[3*in+]= B[3*in+] - AU[9*k+5]*.e; B[3*in+]= B[3*in+] - AU[9*k+8]*.e; AU[9*k+]=.e; AU[9*k+5]=.e; AU[9*k+8]=.e; i p i p j p j p i p j p

108 Parallel FEM 3D- 8 MAT_ASS_BC (4/9) /** **/ Z=Zmin for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"zmin") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ D[9*in+]=.e; D[9*in+5]=.e; D[9*in+6]=.e; D[9*in+7]=.e; D[9*in+8]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+6]=.e; AL[9*k+7]=.e; AL[9*k+8]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+6]=.e; AU[9*k+7]=.e; AU[9*k+8]=.e; w=@zmin

109 Parallel FEM 3D- 9 MAT_ASS_BC (5/9) for(in=;in<np;in++){ is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ if (IWKX[itemL[k]][] == ) { AL[9*k+]=.e; is= indexu[in]; ie= indexu[in+]; AL[9*k+5]=.e; AL[9*k+8]=.e; w=@zmin for(k=is;k<ie;k++){ if (IWKX[itemU[k]][] == ) { AU[9*k+]=.e; AU[9*k+5]=.e; AU[9*k+8]=.e;

110 Parallel FEM 3D- MAT_ASS_BC (6/9) /** **/ X=Xmin for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"xmin") == ) break; for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ D[9*in ]=.e; D[9*in+]=.e; D[9*in+]=.e; D[9*in+3]=.e; D[9*in+6]=.e; B[3*in ]=.e; u=@xmin is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k ]=.e; AL[9*k+]=.e; AL[9*k+]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k ]=.e; AU[9*k+]=.e; AU[9*k+]=.e;

111 Parallel FEM 3D- MAT_ASS_BC (7/9) for(in=;in<np;in++){ is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ if (IWKX[itemL[k]][] == ) { AL[9*k ]=.e; AL[9*k+3]=.e; AL[9*k+6]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ if (IWKX[itemU[k]][] == ) { AU[9*k ]=.e; AU[9*k+3]=.e; AU[9*k+6]=.e; u=@xmin

112 Parallel FEM 3D- MAT_ASS_BC (8/9) /** **/ Y=Ymin for(in=;in<np;in++) IWKX[in][]=; ib=-; for( ib=;ib<nodgrptot;ib++){ if( strcmp(nodgrp_name[ib].name,"ymin") == ) break; v=@ymin for( ib=nodgrp_index[ib];ib<nodgrp_index[ib+];ib++){ in=nodgrp_item[ib]; IWKX[in-][]=; for(in=;in<np;in++){ if( IWKX[in][] == ){ D[9*in+]=.e; D[9*in+4]=.e; D[9*in+7]=.e; D[9*in+3]=.e; D[9*in+5]=.e; B[3*in+]=.e; is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ AL[9*k+3]=.e; AL[9*k+4]=.e; AL[9*k+5]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ AU[9*k+3]=.e; AU[9*k+4]=.e; AU[9*k+5]=.e;

113 Parallel FEM 3D- 3 MAT_ASS_BC (9/9) for(in=;in<np;in++){ is= indexl[in]; ie= indexl[in+]; for(k=is;k<ie;k++){ if (IWKX[itemL[k]][] == ) { AL[9*k+]=.e; AL[9*k+4]=.e; AL[9*k+7]=.e; is= indexu[in]; ie= indexu[in+]; for(k=is;k<ie;k++){ if (IWKX[itemU[k]][] == ) { AU[9*k+]=.e; AU[9*k+4]=.e; AU[9*k+7]=.e; v=@ymin

114 Parallel FEM 3D- 4 Parallel FEM Procedures: Program Initialiation Control Data Node, Connectivity of Elements (N: Node#, NE: Elem#) Initialiation of Arrays (Global/Element Matrices) Element-Global Matrix Mapping (Index, Item) Generation of Matrix Element-by-Element Operations (do icel=, NE) Element matrices Accumulation to global matrix Boundary Conditions Linear Solver Conjugate Gradient Method Calculation of Stress

115 Parallel FEM 3D- test_p main input_cntl Input of control data input_grid Input of mesh info find_node searching nodes mat_con connectivity of matrix msort sorting mat_con connectivity of matrix mat_ass_main coefficient matrix jacobi Jacobian mat_ass_bc boundary conditions Structure of fem3d (parallel) solve33 control of linear solver recover_stress stress calculation output_ucd visualiation cg_3 CG solver jacobi Jacobian

116 Parallel FEM 3D- 6 Main Part #include <stdio.h> #include <stdlib.h> FILE* fp_log; #define GLOBAL_VALUE_DEFINE #include "pfem_util.h" //#include "solver33.h" extern void PFEM_INIT(int,char**); extern void INPUT_CNTL(); extern void INPUT_GRID(); extern void MAT_CON(); extern void MAT_CON(); extern void MAT_ASS_MAIN(); extern void MAT_ASS_BC(); extern void SOLVE33(); extern void RECOVER_STRESS(); extern void OUTPUT_UCD(); extern void PFEM_FINALIZE(); int main(int argc,char* argv[]) { double START_TIME,END_TIME; PFEM_INIT(argc,argv); INPUT_CNTL(); INPUT_GRID(); MAT_CON(); MAT_CON(); MAT_ASS_MAIN(); MAT_ASS_BC() ; SOLVE33(); RECOVER_STRESS(); OUTPUT_UCD() ; PFEM_FINALIZE() ;

117 Parallel FEM 3D- 7 Some Features of fem3d Non-Zero Off-Diagonals Upper/Lower triangular components are stored separately. U Stored as Block Vector: 3-components per node Matrix: 9-components per block Processed as block based on 3 variables on each node (not each component of matrix) L

118 Parallel FEM 3D- 8 Storing 3x3 Block (/3) Less memory requirement Index, Item i j i i j j i j

119 Parallel FEM 3D- 9 Storing 3x3 Block (/3) Computational Efficiency Ratio of (Comptation/Indirect Memory Access) is larger >x speed-up both for vector/scalar processors Contiguous memory access, Cache Utiliation, Larger do i=, Flop/Byte 3*N Y(i)= D(i)*X(i) do k= index(i-)+, index(i) kk= item(k) Y(i)= Y(i) + AMAT(k)*X(k) enddo enddo do i=, N X= X(3*i-) X= X(3*i-) X3= X(3*i) Y(3*i-)= D(9*i-8)*X+D(9*i-7)*X+D(9*i-6)*X3 Y(3*i-)= D(9*i-5)*X+D(9*i-4)*X+D(9*i-3)*X3 Y(3*I )= D(9*i-)*X+D(9*i-)*X+D(9*I )*X3 do k= index(i-)+, index(i) kk= item(k) X= X(3*kk-) X= X(3*kk-) X3= X(3*kk) Y(3*i-)= Y(3*i-)+AMAT(9*k-8)*X+AMAT(9*k-7)*X & +AMAT(9*k-6)*X3 Y(3*i-)= Y(3*i-)+AMAT(9*k-5)*X+AMAT(9*k-4)*X & +AMAT(9*k-3)*X3 Y(3*I )= Y(3*I )+AMAT(9*k-)*X+AMAT(9*k-)*X & +AMAT(9*k )*X3 enddo enddo

120 Parallel FEM 3D- Storing 3x3 Block (3/3) Stabiliation of Computation Instead of division by diagonal components, full LU factoriation of 3x3 Diagonal Block is applied. Effective for ill-conditioned problems i j i j i i j j

121 Parallel FEM 3D- Definitions of Terms i j Block (Node): i i Component (DOF): j j i j

122 Parallel FEM 3D- Global Variables: pfem_util.h (/4) Name Type Sie I/O Definition fname C [8] I Name of mesh file N,NP I I # Node (N: Internal, NP: Internal + External) ICELTOT I I # Element NODGRPtot I I # Node Group XYZ R [NP][3] I Node Coordinates ICELNOD I [ICELTOT][8] I Element Connectivity NODGRP_INDEX I [NODGRPtot+] I # Node in each Node Group NODGRP_ITEM NODGRP_NAME I C8 [NODGRP_INDEX[N ODGRPTOT+]] [NODGRP_INDEX[N ODGRPTOT+]] NL, NU I O I I Node ID in each Node Group Name of NodeGroup # Upper/Lower Triangular Non-Zero Off-Diagonals at each node NPL, NPU I O # Upper/Lower Triangular Non-Zero Off-Diagonals D R [9*NP] O Diagonal Block of Global Matrix B,X R [3*NP] O RHS, Unknown Vector

123 Parallel FEM 3D- 3 Global Variables: pfem_util.h (/4) Name Type Sie I/O Definition ALUG R [9*NP] O Full LU factoriation of Diagonal Blocks D AL,AU R [9*NPL],[9*NPU] O Upper/Lower Triangular Components of Global Matrix indexl, indexu I [NP+] O # Non-Zero Off-Diagonal Blocks iteml, itemu I [NPL], [NPU] O Column ID of Non-Zero Off-Diagonal Blocks INL, INU I [NP] O Number of Off-Diagonal Blocks at Each Node IAL,IAU I [NP][NL],[NP][NU] O Off-Diagonal Blocks at Each Node IWKX I [NP][] O Work Arrays METHOD I I Iterative Method (fixed as ) PRECOND I I Preconditioning Method (: SSOR, : Block Diagonal Scaling) ITER, ITERactual I I Number of CG Iterations (MAX, Actual) RESID R I Convergence Criteria (fixed as.e-8) SIGMA_DIAG R I Coefficient for LU Factoriation (fixed as.) pfemiarray I [] O Integer Parameter Array pfemrarray R [] O Real Parameter Array

124 Parallel FEM 3D- 4 Global Variables: pfem_util.h (3/4) Name Typ e Sie I/O Definition O8th R I =.5 i i i PNQ, PNE, PNT R [][][8] O,, i ~ 8at each Gaussian Quad. Point POS, WEI R [] O NCOL, NCOL I [] O Work arrays for sorting Coordinates, Weighting Factor at each Gaussian Quad. Point SHAPE R [][][][8] O N i (i=~8) at each Gaussian Quad Point PNX, PNY, PNZ R [][][][8] O i i i,, i ~ 8 at each Gaussian Quad. Point DETJ R [][][] O ELAST, POISSON SIGMA_N, TAU_N Determinant of Jacobian Matrix at each Gaussian Quad. Point R I Young s Modulus, Poisson s Ratio R [N][3] O Normal/Shear Stress at each Node N N x N N y N N

125 Parallel FEM 3D- 5 Global Variables: pfem_util.h (4/4) Name Type Sie I/O Definition PETOT I O Number of PE s my_rank I O Process ID of MPI errno I O Error Flag NEIBPETOT I I Number of Neighbors NEIBPE I [NEIBPETOT] I ID of Neighbor IMPORT_INDEX EXPORT_INEDX I [NEIBPETOT+] I Sie of Import/Export Arrays for Communication Table IMPORT_ITEM I [NPimport] I EXPORT_ITEM I [NPexport] I Receiving Table (External Points) NPimport=IMPORT_INDEX[NEIBPETOT]) Sending Table (Boundary Points) NPexport=EXPORT_INDEX[NEIBPETOT]) ICELTOT_INT I I Number of Local Elements intelem_list I [ICELTOT_INT] I List of Local Elements iterpremax I I Number of ASDD Iterations

126 Preconditioned Iterative Solver e.g. CG method (Conjugate Gradient) Compute r () = b-[a]x () for i=,, solve [M] (i-) = r (i-) i- = r (i-) (i-) if i= p () = () else i- = i- / i- p (i) = (i-) + i- endif q (i) = [A]p (i) i = i- /p (i) q (i) x (i) = x (i-) + i p (i) r (i) = r (i-) - i q (i) check convergence r end p (i-) Dot products Matrix-vector multiplication Preconditioners DAXPY Parallel FEM 3D- 6

127 Localied SGS/SSOR Preconditioning SGS/SSOR: Global Operations (Forward/Backward Substitution) NOT suitable for parallel computing Ignoring effects of external points for preconditioning Block-Jacobi Localied Preconditioning WEAKER than original SGS/SSOR More PE s, more iterations L Parallel FEM 3D- 7 r U!C!C !C {= [Minv]{r!C !C=== do i=, N W(i,Z)= W(i,R) enddo!c=== do i=, N WVAL= W(i,Z) do k= indexl(i-)+, indexl(i) WVAL= WVAL - AL(k) * W(itemL(k),Z) enddo W(i,Z)= WVAL / D(i) enddo do i= N,, - SW =.d do k= indexu(i), indexu(i-)+, - SW= SW + AU(k) * W(itemU(k),Z) enddo W(i,Z)= W(i,Z) SW / D(i) enddo

128 Localied SGS/SSOR Preconditioning A PE # PE # PE #3 PE #4 Considered : 3 Ignored : Parallel FEM 3D- 8

129 Parallel FEM 3D- 9 9 Overlapped Additive Schwart Domain Decomposition Method Stabiliation of Localied Preconditioning: ASDD Global Operation Local Operation Global Nesting Correction: Repeating -> Stable M r, r M r M ) ( n n n n M M r M ) ( n n n n M M r M

130 Parallel FEM 3D- 3 3 Overlapped Additive Schwart Domain Decomposition Method Stabiliation of Localied Preconditioning: ASDD Global Nesting Correction: Repeating -> Stable ) ( n n n n M M r M ) ( n n n n M M r M ) ( n n n n n n M M r M r M n n M M r r n n where r M

131 Overlapped Additive Schwart Domain Decomposition Method Effect of additive Schwart domain decomposition for solid mechanics example example with 3x44 3 DOF on Hitachi SR, Number of ASDD cycle/iteration=, = -8 NO Additive Schwart WITH Additive Schwart PE # Iter. # Sec. Speed Up Iter.# Sec. Speed Up Parallel FEM 3D- 3

132 Parallel FEM 3D- 3 SOLVE33(/4): INPUT_CNTL #include <stdio.h> #include <string.h> #include <math.h> #include "pfem_util.h" #include "allocate.h" extern FILE *fp_log; extern void CG_3(); void SOLVE33() { int i,j,k,ii,l; KREAL ALU[3][3]; KREAL PW[3]; double AL; int ERROR, ICFLAG=; CHAR_LENGTH BUF; /** **/ PARAMETERs ITER = pfemiarray[]; METHOD = pfemiarray[]; PRECOND = pfemiarray[]; NSET = pfemiarray[3]; iterpremax= pfemiarray[4]; NREST = pfemiarray[5]; not used RESID = pfemrarray[]; SIGMA_DIAG= pfemrarray[];. if( iterpremax < ) iterpremax= ; if (iterpremax > 4 ) iterpremax= 4;

133 Parallel FEM 3D Overlapped Additive Schwart Domain Decomposition Method Stabiliation of Localied Preconditioning: ASDD Local Operation (Forward/Backward Substitution) Global Nesting Correction:Repeating -> Stable ) ( n n n n M M r M ) ( n n n n M M r M do iterpre=, iterpremax enddo ) (. n n M M r M calc ) (. n n M M r M calc

134 Parallel FEM 3D- 34 SOLVE33 (/4) /** **/ BLOCK LUs if( ICFLAG == ){ ALUG =(KREAL*)allocate_vector(sieof(KREAL),9*N); ICFLAG= ; strcpy(buf.name,"### LINEAR SOLVER: 3x3 Block" ); if (METHOD == ) strcat(buf.name,"sscg"); if (METHOD == ) strcat(buf.name,"ssbicgstab"); if (PRECOND == ) { strcat(buf.name,"bilu()-no ASDD"); if (PRECOND!= ) strcat(buf.name,"block Scaling"); fprintf(stdout,"%s n",buf.name); fprintf(fp_log,"%s n",buf.name);

135 Parallel FEM 3D- 35 SOLVE33 (3/4) if( NSET == ){ for(i=;i<9*n;i++) ALUG[i]=.; for( ii=;ii<n;ii++){ ALU[][]= D[9*ii ]*SIGMA_DIAG; ALU[][]= D[9*ii+]; ALU[][]= D[9*ii+]; ALU[][]= D[9*ii+3]; ALU[][]= D[9*ii+4]*SIGMA_DIAG; ALU[][]= D[9*ii+5]; ALU[][]= D[9*ii+6]; ALU[][]= D[9*ii+7]; ALU[][]= D[9*ii+8]*SIGMA_DIAG; ALUG: full LU factoriation of D SIGMA_DIAG=. (INPUT_CNTL) for(k=;k<3;k++){ L=k; AL=fabs(ALU[L][k]); for( i=k+;i<3;i++){ if( fabs(alu[i][k]) > AL ){ L=i; AL=fabs(ALU[L][k]); ALU[k][k]=.e/ALU[k][k]; for(i=k+;i<3;i++){ ALU[i][k]*=ALU[k][k]; for(j=k+;j<3;j++){ PW[j]=ALU[i][j] - ALU[i][k]*ALU[k][j]; for(j=k+;j<3;j++){ ALU[i][j]=PW[j]; ALUG[9*ii ]=ALU[][]; ALUG[9*ii+]=ALU[][]; ALUG[9*ii+]=ALU[][]; ALUG[9*ii+3]=ALU[][]; ALUG[9*ii+4]=ALU[][]; ALUG[9*ii+5]=ALU[][]; ALUG[9*ii+6]=ALU[][]; ALUG[9*ii+7]=ALU[][]; ALUG[9*ii+8]=ALU[][];

136 Parallel FEM 3D- 36 SOLVE33 (3/4) if( NSET == ){ for(i=;i<9*n;i++) ALUG[i]=.; for( ii=;ii<n;ii++){ ALU[][]= D[9*ii ]*SIGMA_DIAG; ALU[][]= D[9*ii+]; ALU[][]= D[9*ii+]; ALU[][]= D[9*ii+3]; ALU[][]= D[9*ii+4]*SIGMA_DIAG; ALU[][]= D[9*ii+5]; ALU[][]= D[9*ii+6]; ALU[][]= D[9*ii+7]; ALU[][]= D[9*ii+8]*SIGMA_DIAG; ALUG: full LU factoriation of D SIGMA_DIAG=. (INPUT_CNTL) for(k=;k<3;k++){ L=k; AL=fabs(ALU[L][k]); for( i=k+;i<3;i++){ if( fabs(alu[i][k]) > AL ){ L=i; AL=fabs(ALU[L][k]); ALU[k][k]=.e/ALU[k][k]; for(i=k+;i<3;i++){ ALU[i][k]*=ALU[k][k]; for(j=k+;j<3;j++){ PW[j]=ALU[i][j] - ALU[i][k]*ALU[k][j]; for(j=k+;j<3;j++){ ALU[i][j]=PW[j]; ALUG[9*ii ]=ALU[][]; ALUG[9*ii+]=ALU[][]; ALUG[9*ii+]=ALU[][]; ALUG[9*ii+3]=ALU[][]; ALUG[9*ii+4]=ALU[][]; ALUG[9*ii+5]=ALU[][]; ALUG[9*ii+6]=ALU[][]; ALUG[9*ii+7]=ALU[][]; ALUG[9*ii+8]=ALU[][];

137 Parallel FEM 3D- 37 SOLVE33 (3/4) if( NSET == ){ for(i=;i<9*n;i++) ALUG[i]=.; for( ii=;ii<n;ii++){ ALU[][]= D[9*ii ]*SIGMA_DIAG; ALU[][]= D[9*ii+]; ALU[][]= D[9*ii+]; ALU[][]= D[9*ii+3]; ALU[][]= D[9*ii+4]*SIGMA_DIAG; ALU[][]= D[9*ii+5]; ALU[][]= D[9*ii+6]; ALU[][]= D[9*ii+7]; ALU[][]= D[9*ii+8]*SIGMA_DIAG; for(k=;k<3;k++){ L=k; AL=fabs(ALU[L][k]); for( i=k+;i<3;i++){ if( fabs(alu[i][k]) > AL ){ L=i; AL=fabs(ALU[L][k]); ALU[k][k]=.e/ALU[k][k]; for(i=k+;i<3;i++){ ALU[i][k]*=ALU[k][k]; for(j=k+;j<3;j++){ PW[j]=ALU[i][j] - ALU[i][k]*ALU[k][j]; for(j=k+;j<3;j++){ ALU[i][j]=PW[j]; ALUG[9*ii ]=ALU[][]; ALUG[9*ii+]=ALU[][]; ALUG[9*ii+]=ALU[][]; ALUG[9*ii+3]=ALU[][]; ALUG[9*ii+4]=ALU[][]; ALUG[9*ii+5]=ALU[][]; ALUG[9*ii+6]=ALU[][]; ALUG[9*ii+7]=ALU[][]; ALUG[9*ii+8]=ALU[][]; LU factoriation with Full Pivoting Component with the largest absolute value becomes pivot ALUG:full LU fact. of D

138 Parallel FEM 3D- 38 SOLVE33 (4/4) /*** ***/ ITERATIVE solver if (METHOD == ) { CG_3( N, NP, NPL, NPU, D, AL, indexl, iteml, AU, indexu, itemu, B, X, ALUG, RESID, ITER, &ERROR, my_rank, NEIBPETOT,NEIBPE,IMPORT_INDEX,IMPORT_ITEM, EXPORT_INDEX,EXPORT_ITEM, PRECOND, iterpremax); ITERactual= ITER;

139 Parallel FEM 3D- 39 CG (Initialiation) (/3) #include <stdio.h> #include <math.h> #include "mpi.h" #include "precision.h" #include "allocate.h" extern FILE *fp_log; extern void SOLVER_SEND_RECV_3 (); void CG_3( { KINT N,KINT NP,KINT NPL, KINT NPU,KREAL D[], KREAL AL[],KINT INL[], KINT IAL[], KREAL AU[],KINT INU[], KINT IAU[], KREAL B[],KREAL X[],KREAL ALU[], KREAL RESID,KINT ITER, KINT *ERROR,int my_rank, int NEIBPETOT,int NEIBPE[], int IMPORT_INDEX[], int IMPORT_ITEM[], int EXPORT_INDEX[], int EXPORT_ITEM[], KINT PRECOND, KINT iterpremax) int i,j,k; int iel,isl,ieu,isu; double X,X,X3; double WVAL,WVAL,WVAL3; double SW,SW,SW3; WV,WV,WV3; double BNRM,BNRM,DNRM,DNRM; double S_TIME,E_TIME; double ALPHA,BETA; double C,C,RHO,RHO,RHO; int iterpre; int indexa,indexb; KREAL *WS,*WR; KREAL **WW; KINT R=,Z=,Q=,P=,ZP=3; KINT MAXIT; KREAL TOL,W,SS; double COMPtime,COMMtime,R; double START_TIME,END_TIME;

140 Parallel FEM 3D- 4 Variables/Arrays Global I/R Sie CG_3 N,NP,NPL,NPU I N,NP,NPL,NPU D,B,X R [3*NP] D,B,X AL,AU R [9*NPL],[9*NPU] AL, AU indexl, indexu I [NP+] INL, INU iteml, itemu I [NPL], [NPU] IAL, IAU ALUG R [9*NP] ALU RESID R RESID ITER I ITER PRECOND I PRECOND iterpremax I iterpremax R [4][3*NP] WW

141 Parallel FEM 3D- 4 SOLVER_SEND_RECV_3 (/) #include <stdio.h> #include <math.h> #include "mpi.h" #include "precision.h" #include "allocate.h" static MPI_Status *sta,*sta; static MPI_Request *req,*req; static KINT NFLAG=; extern FILE *fp_log; void SOLVER_SEND_RECV_3( int N, int NEIBPETOT, int NEIBPE[], int IMPORT_INDEX[], int IMPORT_ITEM[], int EXPORT_INDEX[], int EXPORT_ITEM[], KREAL WS[], KREAL WR[], KREAL X[], int my_rank) { int ii,k,neib,istart,inum; /*** INIT. ***/ if( NFLAG == ){ sta=(mpi_status*)allocate_vector(sieof(mpi_status),neibpetot); sta=(mpi_status*)allocate_vector(sieof(mpi_status),neibpetot); req=(mpi_request*)allocate_vector(sieof(mpi_request),neibpetot); req=(mpi_request*)allocate_vector(sieof(mpi_request),neibpetot); NFLAG=; /*** SEND ***/ for( neib=;neib<=neibpetot;neib++){ istart=export_index[neib-]; inum =EXPORT_INDEX[neib]-istart; for( k=istart+;k<=istart+inum;k++){ ii=3*export_item[k-]; WS[3*k-3]=X[ii-3]; WS[3*k-]=X[ii-]; WS[3*k-]=X[ii-]; MPI_Isend(&WS[3*istart],3*inum,MPI_DOUBLE, NEIBPE[neib-],,MPI_COMM_WORLD,&req[neib-]);

142 Parallel FEM 3D- 4 SOLVER_SEND_RECV_3 (/) /*** ***/ RECEIVE for( neib=;neib<=neibpetot;neib++){ istart=import_index[neib-]; inum =IMPORT_INDEX[neib]-istart; MPI_Irecv(&WR[3*istart],3*inum,MPI_DOUBLE, NEIBPE[neib-],,MPI_COMM_WORLD,&req[neib-]); MPI_Waitall (NEIBPETOT, req, sta); for( neib=;neib<=neibpetot;neib++){ istart=import_index[neib-]; inum =IMPORT_INDEX[neib]-istart; for( k=istart+;k<=istart+inum;k++){ ii = 3*IMPORT_ITEM[k-]; X[ii-3]=WR[3*k-3]; X[ii-]=WR[3*k-]; X[ii-]=WR[3*k-]; MPI_Waitall (NEIBPETOT, req, sta);

143 Parallel FEM 3D- 43 CG (Initialiation) (/3) /*** ***/ INIT ERROR= ; COMPtime=.; COMMtime=.; WW=(KREAL**) allocate_matrix(sieof(kreal),4,3*np); WS=(KREAL* ) allocate_vector(sieof(kreal),3*np); WR=(KREAL* ) allocate_vector(sieof(kreal),3*np); MAXIT = ITER; TOL = RESID; for(i=;i<np;i++){ X[i]=.; for(i=;i<4;i++) for(j=;j<3*np;j++) WW[i][j]=.; for(i=;i<3*np;i++) WS[i]=.; for(i=;i<3*np;i++) WR[i]=.;

144 Parallel FEM 3D- 44 CG (Initialiation) (3/3) /*** ***/ INIT ERROR= ; COMPtime=.; COMMtime=.; WW=(KREAL**) allocate_matrix(sieof(kreal),4,3*np); WS=(KREAL* ) allocate_vector(sieof(kreal),3*np); WR=(KREAL* ) allocate_vector(sieof(kreal),3*np); Sending Buffer Receiving Buffer MAXIT = ITER; TOL = RESID; /** for(i=;i<np;i++){ X[i]=.; for(i=;i<4;i++) for(j=;j<3*np;j++) WW[i][j]=.; for(i=;i<3*np;i++) WS[i]=.; for(i=;i<3*np;i++) WR[i]=.;

145 Parallel FEM 3D- 45 /** {= [Minv]{r **/ if( PRECOND == ){ SGS/SSOR (/5) /** Block SSOR **/ for( i=;i<n;i++){ WW[ZP][3*i ]=WW[R][3*i ]; WW[ZP][3*i+]=WW[R][3*i+]; WW[ZP][3*i+]=WW[R][3*i+]; for( i=;i<np;i++){ WW[Z][3*i ]=.e; WW[Z][3*i+]=.e; WW[Z][3*i+]=.e; for(iterpre=;iterpre<iterpremax;iterpre++){ for( i=n;i<np;i++){ WW[ZP][3*i ]=.; WW[ZP][3*i+]=.; WW[ZP][3*i+]=.; loops for ASDD

146 Parallel FEM 3D- 46 /** FORWARD **/ for( i=;i<n;i++){ SW= WW[ZP][3*i ]; SW= WW[ZP][3*i+]; SW3= WW[ZP][3*i+]; SGS/SSOR (/5) indexl INL iteml IAL isl=inl[i]; iel=inl[i+]; for(j=isl;j<iel;j++){ k=ial[j]; X= WW[ZP][3*k ]; X= WW[ZP][3*k+]; X3= WW[ZP][3*k+]; SW+= - AL[9*j ]*X - AL[9*j+]*X - AL[9*j+]*X3; SW+= - AL[9*j+3]*X - AL[9*j+4]*X - AL[9*j+5]*X3; SW3+= - AL[9*j+6]*X - AL[9*j+7]*X - AL[9*j+8]*X3; X= SW; X= SW; X3= SW3; X= X - ALU[9*i+3]*X; X3= X3 - ALU[9*i+6]*X - ALU[9*i+7]*X; X3= ALU[9*i+8]* X3; X= ALU[9*i+4]*( X - ALU[9*i+5]*X3 ); X= ALU[9*i ]*( X - ALU[9*i+]*X3 - ALU[9*i+]*X); L r WW[ZP][3*i ]= X; WW[ZP][3*i+]= X; WW[ZP][3*i+]= X3; calc. calc. M M ( r ( r M M n n M M ) n n )

147 Parallel FEM 3D- 47 /** FORWARD **/ SGS/SSOR (/5): Initial for( i=;i<n;i++){ SW= WW[ZP][3*i ]; SW= WW[ZP][3*i+]; SW3= WW[ZP][3*i+]; indexl INL iteml IAL isl=inl[i]; iel=inl[i+]; for(j=isl;j<iel;j++){ k=ial[j]; X= WW[ZP][3*k ]; X= WW[ZP][3*k+]; X3= WW[ZP][3*k+]; SW+= - AL[9*j ]*X - AL[9*j+]*X - AL[9*j+]*X3; SW+= - AL[9*j+3]*X - AL[9*j+4]*X - AL[9*j+5]*X3; SW3+= - AL[9*j+6]*X - AL[9*j+7]*X - AL[9*j+8]*X3; L r X= SW; X= SW; X3= SW3; X= X - ALU[9*i+3]*X; X3= X3 - ALU[9*i+6]*X - ALU[9*i+7]*X; X3= ALU[9*i+8]* X3; X= ALU[9*i+4]*( X - ALU[9*i+5]*X3 ); X= ALU[9*i ]*( X - ALU[9*i+]*X3 - ALU[9*i+]*X); WW[ZP][3*i ]= X; WW[ZP][3*i+]= X; WW[ZP][3*i+]= X3; calc calc. M r. M r

148 Parallel FEM 3D- 48 /** BACKWARD **/ for(i=n-;i>=;i--){ isu= INU[i]; ieu= INU[i+]; SW=.e; SW=.e; SW3=.e; for(j=isu;j<ieu;j++){ k=iau[j]; SGS/SSOR (3/5) indexu INU itemu IAU U X=WW[ZP][3*k ]; X=WW[ZP][3*k+]; X3=WW[ZP][3*k+]; SW+= + AU[9*j ]*X + AU[9*j+]*X + AU[9*j+]*X3; SW+= + AU[9*j+3]*X + AU[9*j+4]*X + AU[9*j+5]*X3; SW3+= + AU[9*j+6]*X + AU[9*j+7]*X + AU[9*j+8]*X3; X= SW; X= SW; X3= SW3; X= X - ALU[9*i+3]*X; X3= X3 - ALU[9*i+6]*X - ALU[9*i+7]*X; X3= ALU[9*i+8]* X3; X= ALU[9*i+4]*( X - ALU[9*i+5]*X3 ); X= ALU[9*i ]*( X - ALU[9*i+]*X3 - ALU[9*i+]*X); WW[ZP][3*i ]+= -X; WW[ZP][3*i+]+= -X; WW[ZP][3*i+]+= -X3; calc. calc. M M ( r ( r M M n n M M n n ) )

149 Parallel FEM 3D- 49 SGS/SSOR (4/5) /** additive Schwart **/ SOLVER_SEND_RECV_3 ( NP, NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX, EXPORT_ITEM, WS, WR, WW[ZP], my_rank); indexa= ZP; indexb= Z; for( i=;i<np;i++){ WW[indexB][3*i ]+=WW[indexA][3*i ]; WW[indexB][3*i+]+=WW[indexA][3*i+]; WW[indexB][3*i+]+=WW[indexA][3*i+]; if (iterpre == iterpremax ) goto LABEL75; ) ( n n n n M M r M ) ( n n n n M M r M

150 Parallel FEM 3D- 5 SGS/SSOR (5/5) for( j=;j<n;j++){ X= WW[indexB][3*j ]; X= WW[indexB][3*j+]; X3= WW[indexB][3*j+]; WV= WW[R][3*j ] - D[9*j ]*X - D[9*j+]*X - D[9*j+]*X3; WV= WW[R][3*j+] - D[9*j+3]*X - D[9*j+4]*X - D[9*j+5]*X3; WV3= WW[R][3*j+] - D[9*j+6]*X - D[9*j+7]*X - D[9*j+8]*X3; for(k=inl[j];k<inl[j+];k++){ i=ial[k]; X= WW[indexB][3*i ]; X= WW[indexB][3*i+]; X3= WW[indexB][3*i+]; WV+= - AL[9*k ]*X - AL[9*k+]*X - AL[9*k+]*X3; WV+= - AL[9*k+3]*X - AL[9*k+4]*X - AL[9*k+5]*X3; WV3+= - AL[9*k+6]*X - AL[9*k+7]*X - AL[9*k+8]*X3; for(k=inu[j];k<inu[j+];k++){ i=iau[k]; X= WW[indexB][3*i ]; X= WW[indexB][3*i+]; X3= WW[indexB][3*i+]; WV+= - AU[9*k ]*X - AU[9*k+]*X - AU[9*k+]*X3; WV+= - AU[9*k+3]*X - AU[9*k+4]*X - AU[9*k+5]*X3; WV3+= - AU[9*k+6]*X - AU[9*k+7]*X - AU[9*k+8]*X3; WW[indexA][3*j ]=WV; WW[indexA][3*j+]=WV; WW[indexA][3*j+]=WV3; LABEL75: n continue; n M ( r M n M n n n n M ( r M M ) n )

151 Parallel FEM 3D- 5 Block Scaling /** **/ if (PRECOND!= ) { Block SCALING for(i=;i<n;i++){ WW[3*i ][Z]=WW[3*i ][R]; WW[3*i+][Z]=WW[3*i+][R]; WW[3*i+][Z]=WW[3*i+][R]; Forward/Backward Substitution by LU factoriation of D (ALU) for(i=;i<n;i++){ X=WW[3*i ][Z]; X=WW[3*i+][Z]; X3=WW[3*i+][Z]; X= X - ALU[9*i+3]*X; X3= X3 - ALU[9*i+6]*X - ALU[9*i+7]*X; X3= ALU[9*i+8]* X3; X= ALU[9*i+4]*( X - ALU[9*i+5]*X3 ); X= ALU[9*i ]*( X - ALU[9*i+]*X3 - ALU[9*i+]*X); WW[3*i ][Z]= X; WW[3*i+][Z]= X; WW[3*i+][Z]= X3;

152 Parallel FEM 3D- 5 Sparse Matrix-Vector Multiplication /*** {q= [A]{p ***/ SOLVER_SEND_RECV_3 ( NP, NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX, EXPORT_ITEM, WS, WR, WW[P], my_rank); for( j=;j<n;j++){ X=WW[P][3*j ]; X=WW[P][3*j+]; X3=WW[P][3*j+]; WVAL= D[9*j ]*X + D[9*j+]*X + D[9*j+]*X3; WVAL= D[9*j+3]*X + D[9*j+4]*X + D[9*j+5]*X3; WVAL3= D[9*j+6]*X + D[9*j+7]*X + D[9*j+8]*X3; for(k=inl[j];k<inl[j+];k++){ i=ial[k]; X=WW[P][3*i ]; X=WW[P][3*i+]; X3=WW[P][3*i+]; WVAL+= AL[9*k ]*X + AL[9*k+]*X + AL[9*k+]*X3; WVAL+= AL[9*k+3]*X + AL[9*k+4]*X + AL[9*k+5]*X3; WVAL3+= AL[9*k+6]*X + AL[9*k+7]*X + AL[9*k+8]*X3; for(k=inu[j];k<inu[j+];k++){ i=iau[k]; X=WW[P][3*i ]; X=WW[P][3*i+]; X3=WW[P][3*i+]; WVAL+= AU[9*k ]*X + AU[9*k+]*X + AU[9*k+]*X3; WVAL+= AU[9*k+3]*X + AU[9*k+4]*X + AU[9*k+5]*X3; WVAL3+= AU[9*k+6]*X + AU[9*k+7]*X + AU[9*k+8]*X3; WW[Q][3*j ]=WVAL; WW[Q][3*j+]=WVAL; WW[Q][3*j+]=WVAL3;

153 Parallel FEM 3D- 53 DAXPY, Dot Products /*** {x= {x + ALPHA*{p {r= {r - ALPHA*{q ***/ for(i=;i<n;i++){ X[3*i ]+=ALPHA *WW[3*i ][P]; X[3*i+]+=ALPHA *WW[3*i+][P]; X[3*i+]+=ALPHA *WW[3*i+][P]; WW[3*i ][R]+= -ALPHA *WW[3*i ][Q]; WW[3*i+][R]+= -ALPHA *WW[3*i+][Q]; WW[3*i+][R]+= -ALPHA *WW[3*i+][Q]; DNRM=.e; for(i=;i<n;i++){ DNRM+= WW[3*i ][R]*WW[3*i ][R] + WW[3*i+][R]*WW[3*i+][R] + WW[3*i+][R]*WW[3*i+][R]; MPI_Allreduce (&DNRM,&DNRM,, MPI_DOUBLE,MPI_SUM, MPI_COMM_WORLD); RESID= sqrt(dnrm/bnrm);

154 Parallel FEM 3D- 54 Stress So far, displacement at each node has been computed. Stress is important from the engineering point of view!! stress is calculated by stress, which is derivative of displacement (or rate of displacement)

155 Parallel FEM 3D- 55 Strain-Stress Relationship x y xy y x x y xy y x E D D

156 Parallel FEM 3D- 56 Strain-Stress Relationship x y xy y x x y xy y x valb valb valb valx vala vala vala valx vala vala vala valx D D

157 Parallel FEM 3D- 57 Normal Strain - Displacement PQ P Q ε x x dx u u dx x dx x u dx u x y dy R P u / y P dx R v / x Q Q ε ε ε x y u x v y w x

158 Parallel FEM 3D- 58 Shear Strain - Displacement R R xy u y v x y dy u / y P v / x Q y x v w x w y u P dx Q x

159 Parallel FEM 3D- 59 Computation of Stress Components w y v x u E E y x x

160 Parallel FEM 3D- 6 Stress So far, displacement at each node has been computed. Stress is important from the engineering point of view!! stress is calculated by stress, which is derivative of displacement Accurate stress components are calculated at Gussian Quad. Points. Stress at nodes (vertices) Averaged Value

161 Parallel FEM 3D- 6 Computation of Stress Components w y v x u E E y x x V y x T V y x T V x T dv W N V N U N E N dv w v u E N dv N { { {,,,,,, Galerkin Method: Ave. Elem. Stress X Volume

162 Parallel FEM 3D- 6 D: Elem.-by-Elem. Integration: {f X j x x X i Ni, N dn dn i j j, L L dx L dx L L T x / L XAL X N dv XA dx x / L V Body Force : A:Sectional Area L:Length of Element

163 Parallel FEM 3D- 63 D: Stress Components at Node N V T dv x Volume of Contribution X Stress at each node from surrounding elements N V T dv Volume Contribution at each node x V N V N T dv T x dv Average Stress at Each Node

164 Parallel FEM 3D- 64 RECOVER_STRESS: Stress (/4) #include <math.h> #include "pfem_util.h" #include "allocate.h" extern void JACOBI(); extern void SOLVER_SEND_RECV_3(); void RECOVER_STRESS() { int i,k,kk,icel; int ie,je,ip,jp; int ipn,jpn,kpn; int iis,iie; double RB; double UUi,VVi,WWi,UUj,VVj,WWj; double valx,vala,valb,e,poi,vol,coef; int in,in,in3,in4,in5,in6,in7,in8; double X,X,X3,X4,X5,X6,X7,X8; double Y,Y,Y3,Y4,Y5,Y6,Y7,Y8; double Z,Z,Z3,Z4,Z5,Z6,Z7,Z8; double SHi,SHj; double EPS_xx,EPS_yy,EPS_,GAM_xy,GAM_x,GAM_y; KINT nodlocal[8]; SIGMA_N=(KREAL*)allocate_vector(sieof(KREAL),3*NP); TAU_N =(KREAL*)allocate_vector(sieof(KREAL),3*NP); for(i=;i<3*np;i++) for(i=;i<3*np;i++) for(i=;i<3*np;i++) SIGMA_N[i]=.; TAU_N[i]=.; B[i]=.; for( icel=;icel< ICELTOT;icel++){ E = ELAST; POI= POISSON; vala= POI / (.e-poi); valb= (.e-.e*poi)/(.e*(.e-poi)); valx= E* (.e-poi)/((.e+poi)*(.e-.e*poi)); vala= vala * valx; valb= valb * valx;

165 Parallel FEM 3D- 65 RECOVER_STRESS: Stress (/4) in= ICELNOD[icel][]; in= ICELNOD[icel][]; in3= ICELNOD[icel][]; in4= ICELNOD[icel][3]; in5= ICELNOD[icel][4]; in6= ICELNOD[icel][5]; in7= ICELNOD[icel][6]; in8= ICELNOD[icel][7]; nodlocal[]= in; nodlocal[]= in; nodlocal[]= in3; nodlocal[3]= in4; nodlocal[4]= in5; nodlocal[5]= in6; nodlocal[6]= in7; nodlocal[7]= in8; X= XYZ[in-][]; X= XYZ[in-][]; ( 中略 ) X7= XYZ[in7-][]; X8= XYZ[in8-][]; Y= XYZ[in-][]; Y= XYZ[in-][]; ( 中略 ) Y7= XYZ[in7-][]; Y8= XYZ[in8-][]; Z= XYZ[in-][]; Z= XYZ[in-][]; ( 中略 ) Z7= XYZ[in7-][]; Z8= XYZ[in8-][]; /** JACOBIAN & inv-jacobian **/ JACOBI (DETJ, PNQ, PNE, PNT, PNX, PNY, PNZ, X, X, X3, X4, X5, X6, X7, X8, Y, Y, Y3, Y4, Y5, Y6, Y7, Y8, Z, Z, Z3, Z4, Z5, Z6, Z7, Z8 );

166 Parallel FEM 3D- 66 RECOVER_STRESS: Stress (3/4) /** MATRIX **/ for(ie=;ie<8;ie++){ ip= nodlocal[ie]; for(je=;je<8;je++){ jp= nodlocal[je]; UUj= X[3*jp-3]; VVj= X[3*jp-]; WWj= X[3*jp-]; UUi= X[3*ip-3]; VVi= X[3*ip-]; WWi= X[3*ip-]; Displacement at each node EPS_xx= EPS_yy= EPS_= GAM_xy= GAM_x= GAM_y= GAM_xy= GAM_x= GAM_y=.e;.e;.e;.e;.e;.e;.e;.e;.e; VOL =.e; for( ipn=;ipn<;ipn++){ for( jpn=;jpn<;jpn++){ for( kpn=;kpn<;kpn++){ coef= fabs(detj[ipn][jpn][kpn])*wei[ipn]*wei[jpn]*wei[kpn]; SHi= SHAPE[ipn][jpn][kpn][ie] * coef; EPS_xx+= SHi*PNX[ipn][jpn][kpn][je]; EPS_yy+= SHi*PNY[ipn][jpn][kpn][je]; EPS_+= SHi*PNZ[ipn][jpn][kpn][je]; GAM_xy+= SHi*PNX[ipn][jpn][kpn][je] * VVj + SHi*PNY[ipn][jpn][kpn][je] * UUj; GAM_x+= SHi*PNX[ipn][jpn][kpn][je] * WWj + SHi*PNZ[ipn][jpn][kpn][je] * UUj; GAM_y+= SHi*PNY[ipn][jpn][kpn][je] * WWj + SHi*PNZ[ipn][jpn][kpn][je] * VVj; VOL = VOL + SHi;

167 Parallel FEM 3D- 67 RECOVER_STRESS: Stress (3/4) /** MATRIX **/ for(ie=;ie<8;ie++){ ip= nodlocal[ie]; for(je=;je<8;je++){ jp= nodlocal[je]; UUj= X[3*jp-3]; VVj= X[3*jp-]; WWj= X[3*jp-]; UUi= X[3*ip-3]; VVi= X[3*ip-]; WWi= X[3*ip-]; EPS_xx= EPS_yy= EPS_= GAM_xy= GAM_x= GAM_y= GAM_xy= GAM_x= GAM_y=.e;.e;.e;.e;.e;.e;.e;.e;.e; u v, x, x w, x N, x { U, u, y N, y { U, u, N, N, x { V, v, y N, y { V N { W, w N { W, x,, { U VOL =.e; for( ipn=;ipn<;ipn++){ for( jpn=;jpn<;jpn++){ for( kpn=;kpn<;kpn++){ coef= fabs(detj[ipn][jpn][kpn])*wei[ipn]*wei[jpn]*wei[kpn]; SHi= SHAPE[ipn][jpn][kpn][ie] * coef; EPS_xx+= SHi*PNX[ipn][jpn][kpn][je]; EPS_yy+= SHi*PNY[ipn][jpn][kpn][je]; EPS_+= SHi*PNZ[ipn][jpn][kpn][je]; GAM_xy+= SHi*PNX[ipn][jpn][kpn][je] * VVj + SHi*PNY[ipn][jpn][kpn][je] * UUj; GAM_x+= SHi*PNX[ipn][jpn][kpn][je] * WWj + SHi*PNZ[ipn][jpn][kpn][je] * UUj; GAM_y+= SHi*PNY[ipn][jpn][kpn][je] * WWj + SHi*PNZ[ipn][jpn][kpn][je] * VVj; VOL = VOL + SHi;

168 Parallel FEM 3D- 68 RECOVER_STRESS: Stress (4/4) EPS_xx= EPS_xx * UUj; EPS_yy= EPS_yy * VVj; EPS_= EPS_ * WWj; SIGMA_N[3*ip-3]+= valx*eps_xx + vala*eps_yy + vala*eps_; SIGMA_N[3*ip-]+= vala*eps_xx + valx*eps_yy + vala*eps_; SIGMA_N[3*ip-]+= vala*eps_xx + vala*eps_yy + valx*eps_; TAU_N[3*ip-3]+= GAM_xy*valB; TAU_N[3*ip-]+= GAM_x*valB; TAU_N[3*ip-]+= GAM_y*valB; if (ip==jp) B[ip-]+=VOL; N, x { U, u, y N, y { U, u, N, /**** NODAL VALUE ***/ v, x N, x { V, v, y N, y { V for(i=;i<n;i++){ RB=.e/B[i]; w, x N, x { W, w, N, { W SIGMA_N[3*i ]*=RB; SIGMA_N[3*i+]*=RB; SIGMA_N[3*i+]*=RB; TAU_N[3*i ]*=RB; TAU_N[3*i+]*=RB; TAU_N[3*i+]*=RB; /**** UPDATE ***/ WS=(KREAL*)allocate_vector(sieof(KREAL),3*NP); WR=(KREAL*)allocate_vector(sieof(KREAL),3*NP); SOLVER_SEND_RECV_3( NP,NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX,EXPORT_ITEM, WS,WR,SIGMA_N, my_rank); SOLVER_SEND_RECV_3( NP,NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX,EXPORT_ITEM, WS,WR,TAU_N, my_rank); u, x { U deallocate_vector((kreal*)ws); deallocate_vector((kreal*)wr);

169 Parallel FEM 3D- 69 RECOVER_STRESS: Stress (4/4) /**** NODAL VALUE ***/ for(i=;i<n;i++){ RB=.e/B[i]; SIGMA_N[3*I ]*=RB; SIGMA_N[3*i+]*=RB; SIGMA_N[3*i+]*=RB; TAU_N[3*i ]*=RB; TAU_N[3*i+]*=RB; TAU_N[3*i+]*=RB; /**** UPDATE ***/ EPS_xx= EPS_xx * UUj; EPS_yy= EPS_yy * VVj; EPS_= EPS_ * WWj; SIGMA_N[3*ip-3]+= valx*eps_xx + vala*eps_yy + vala*eps_; SIGMA_N[3*ip-]+= vala*eps_xx + valx*eps_yy + vala*eps_; SIGMA_N[3*ip-]+= vala*eps_xx + vala*eps_yy + valx*eps_; TAU_N[3*ip-3]+= GAM_xy*valB; TAU_N[3*ip-]+= GAM_x*valB; TAU_N[3*ip-]+= GAM_y*valB; if (ip==jp) B[ip-]+=VOL; x valx y vala vala xy y x vala valx vala WS=(KREAL*)allocate_vector(sieof(KREAL),3*NP); WR=(KREAL*)allocate_vector(sieof(KREAL),3*NP); SOLVER_SEND_RECV_3( NP,NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX,EXPORT_ITEM, WS,WR,SIGMA_N, my_rank); SOLVER_SEND_RECV_3( NP,NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX,EXPORT_ITEM, WS,WR,TAU_N, my_rank); vala vala valx valb valb x y xy y valb x deallocate_vector((kreal*)ws); deallocate_vector((kreal*)wr);

170 Parallel FEM 3D- 7 RECOVER_STRESS: Stress (4/4) EPS_xx= EPS_xx * UUj; EPS_yy= EPS_yy * VVj; EPS_= EPS_ * WWj; SIGMA_N[3*ip-3]+= valx*eps_xx + vala*eps_yy + vala*eps_; SIGMA_N[3*ip-]+= vala*eps_xx + valx*eps_yy + vala*eps_; SIGMA_N[3*ip-]+= vala*eps_xx + vala*eps_yy + valx*eps_; TAU_N[3*ip-3]+= GAM_xy*valB; TAU_N[3*ip-]+= GAM_x*valB; TAU_N[3*ip-]+= GAM_y*valB; if (ip==jp) B[ip-]+=VOL; /**** NODAL VALUE ***/ Only Internal Points for(i=;i<n;i++){ RB=.e/B[i]; SIGMA_N[3*I ]*=RB; SIGMA_N[3*i+]*=RB; SIGMA_N[3*i+]*=RB; TAU_N[3*i ]*=RB; TAU_N[3*i+]*=RB; TAU_N[3*i+]*=RB; /**** UPDATE ***/ WS=(KREAL*)allocate_vector(sieof(KREAL),3*NP); WR=(KREAL*)allocate_vector(sieof(KREAL),3*NP); SOLVER_SEND_RECV_3( NP,NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX,EXPORT_ITEM, WS,WR, SIGMA_N, my_rank); SOLVER_SEND_RECV_3( NP,NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX,EXPORT_ITEM, WS,WR, TAU_N, my_rank); deallocate_vector((kreal*)ws); deallocate_vector((kreal*)wr);

171 Parallel FEM 3D- 7 Report #S3 (/) y x Implement and evaluate a 3D-linear-elastic-code solving the following problem: Cantilevered Beam with Rect. Section E=., (density)=.5, =.3 u=v=w=@x=

172 Parallel FEM 3D- 7 Report #S3 (/) Modify mesh generation program Modify parallel fem3d (BC, Body Force) Evaluation (Comparison with Analytical Solution, Effect of Mesh) Apply various numbers of Gaussian quad. points n=(fem3d),n=,n=3 Report Due on February 5 th (Sa), 4 at 7: ( is OK) Documents Report (Outline, Results, Discussions) (less than total pages) List of Source Code

173 Parallel FEM 3D- 73 Analytical Solution if L is sufficiently Large W 4EI 4 WL max 8EI x L x Lx 3L where L h 4 x L W E I disp. in -direction Length of Beam Density per Length Young s Modulus Geometrical Moment of Inertia (for rectangular section) I bh b 3 h

174 Parallel FEM 3D- 74 Gaussian Quadrature n= Quadrature Point:. Weighting Factor:. At Center of the Element

Microsoft PowerPoint - 08-pFEM3D-2C.ppt [互換モード]

Microsoft PowerPoint - 08-pFEM3D-2C.ppt [互換モード] 並列有限要素法による 三次元定常熱伝導解析プログラム (2/2)C 言語編 中島研吾東京大学情報基盤センター pfem3d-2 2 対象とする問題 : 三次元定常熱伝導 Z x T x T=0@Z=z max y NZ T y z 定常熱伝導 + 発熱 一様な熱伝導率 直方体 T z 一辺長さの立方体 ( 六面体 ) 要素 各方向にNX NY NZ 個 境界条件 Q x, y, z 0 X NY NX

More information

Microsoft PowerPoint - 08-pFEM3D-2F.ppt [互換モード]

Microsoft PowerPoint - 08-pFEM3D-2F.ppt [互換モード] 並列有限要素法による 三次元定常熱伝導解析プログラム (2/2)Fortran 編 中島研吾東京大学情報基盤センター pfem3d-2 2 対象とする問題 : 三次元定常熱伝導 Z x T x T=0@Z=z max y NZ T y z 定常熱伝導 + 発熱 一様な熱伝導率 直方体 T z 一辺長さ1の立方体 ( 六面体 ) 要素 各方向にNX NY NZ 個 境界条件 Q x, y, z 0 X

More information

Microsoft PowerPoint - 08-pFEM3D-2F.ppt [互換モード]

Microsoft PowerPoint - 08-pFEM3D-2F.ppt [互換モード] 並列有限要素法による 三次元定常熱伝導解析プログラム (2/2)Fortran 編 中島研吾東京大学情報基盤センター RIKEN AICS HPC Spring School 204 pfem3d-2 2 対象とする問題 : 三次元定常熱伝導 Z x T x T=0@Z=z max y NZ T y z 定常熱伝導 + 発熱 一様な熱伝導率 直方体 T z 一辺長さの立方体 ( 六面体 ) 要素 各方向にNX

More information

GeoFEM開発の経験から

GeoFEM開発の経験から FrontISTR における並列計算のしくみ < 領域分割に基づく並列 FEM> メッシュ分割 領域分割 領域分割 ( パーティショニングツール ) 全体制御 解析制御 メッシュ hecmw_ctrl.dat 境界条件 材料物性 計算制御パラメータ 可視化パラメータ 領域分割ツール 逐次計算 並列計算 Front ISTR FEM の主な演算 FrontISTR における並列計算のしくみ < 領域分割に基づく並列

More information

Microsoft PowerPoint - 07-pFEM3D-1.ppt [互換モード]

Microsoft PowerPoint - 07-pFEM3D-1.ppt [互換モード] 並列有限要素法による 三次元定常熱伝導解析プログラム (1/2) 中島研吾東京大学情報基盤センター RIKEN AICS HPC Spring School 201 pfem3d-1 2 fem3dの並列版 MPIによる並列化 扱うプログラム pfem3d-1 3 プログラムのインストール 実行 並列有限要素法の手順 領域分割とは? 本当の実行 データ構造 pfem3d-1 ファイルコピー on FX10

More information

Microsoft PowerPoint - 07-pFEM3D-1.ppt [互換モード]

Microsoft PowerPoint - 07-pFEM3D-1.ppt [互換モード] 並列有限要素法による 三次元定常熱伝導解析プログラム (1/2) 中島研吾東京大学情報基盤センター pfem3d-1 2 fem3dの並列版 MPIによる並列化 扱うプログラム pfem3d-1 3 プログラムのインストール 実行 並列有限要素法の手順 領域分割とは? 本当の実行 データ構造 pfem3d-1 4 ファイルコピー on FX10 FORTRAN ユーザー >$ cd ~/pfem >$

More information

Microsoft PowerPoint - 06-S2-ref-C.ppt [互換モード]

Microsoft PowerPoint - 06-S2-ref-C.ppt [互換モード] 並列有限要素法による 一次元定常熱伝導解析プログラム C 言語編 中島研吾東京大学情報基盤センター S2-ref 2 問題の概要, 実行方法 局所分散データの考え方 プログラムの説明 計算例 FEM1D 3 対象とする問題 : 一次元熱伝導問題 体積当たり一様発熱 Q x T x Q 0 x=0 (x min ) x= x max 一様な : 断面積 A, 熱伝導率 体積当たり一様発熱 ( 時間当たり

More information

並列有限要素法による 一次元定常熱伝導解析プログラム C 言語編 中島研吾 東京大学情報基盤センター

並列有限要素法による 一次元定常熱伝導解析プログラム C 言語編 中島研吾 東京大学情報基盤センター 並列有限要素法による 一次元定常熱伝導解析プログラム C 言語編 中島研吾 東京大学情報基盤センター S2-ref 2 問題の概要, 実行方法 プログラムの説明 計算例 FEM1D 3 対象とする問題 : 一次元熱伝導問題 体積当たり一様発熱 Q ɺ x T λ x + Qɺ = 0 x=0 (x min ) x= x max 一様な : 断面積 A, 熱伝導率 λ 体積当たり一様発熱 ( 時間当たり

More information

Microsoft PowerPoint - MPIprog-C2.ppt [互換モード]

Microsoft PowerPoint - MPIprog-C2.ppt [互換モード] MPI によるプログラミング概要 ( その ) C 言語編 RIKEN AICS HPC Summer School 01 中島研吾 ( 東大 情報基盤センター ) 横川三津夫 ( 神戸大学 計算科学教育センター ) 1 概要 MPI とは MPI の基礎 :Hello World 全体データと局所データ グループ通信 (Collective Communication) 1 対 1 通信 (Peer-to-Peer

More information

Microsoft PowerPoint - 06-S2-ref-F.pptx

Microsoft PowerPoint - 06-S2-ref-F.pptx 並列有限要素法による 一次元定常熱伝導解析プログラム Fortran 編 中島研吾東京大学情報基盤センター お試しアカウント付き講習会 MPI 応用編 : 並列有限要素法 S2-ref 2 問題の概要, 実行方法 プログラムの説明 計算例 FEM1D 3 対象とする問題 : 一次元熱伝導問題 体積当たり一様発熱 Q x T x Q 0 x=0 (x min ) x= x max 一様な : 断面積

More information

A Higher Weissenberg Number Analysis of Die-swell Flow of Viscoelastic Fluids Using a Decoupled Finite Element Method Iwata, Shuichi * 1/Aragaki, Tsut

A Higher Weissenberg Number Analysis of Die-swell Flow of Viscoelastic Fluids Using a Decoupled Finite Element Method Iwata, Shuichi * 1/Aragaki, Tsut A Higher Weissenberg Number Analysis of Die-swell Flow of Viscoelastic Fluids Using a Decoupled Finite Element Method Iwata, Shuichi * 1/Aragaki, Tsutomu * 1/Mori, Hideki * 1 Ishikawa, Satoshi * 1/Shin,

More information

第7章 有限要素法のプログラミング

第7章 有限要素法のプログラミング April 3, 2019 1 / 34 7.1 ( ) 2 Poisson 2 / 34 7.2 femfp.c [1] main( ) input( ) assem( ) ecm( ) f( ) solve( ) gs { solve( ) output( ) 3 / 34 7.3 fopen() #include FILE *fopen(char *fname, char

More information

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member (University of Tsukuba), Yasuharu Ohsawa, Member (Kobe

More information

h23w1.dvi

h23w1.dvi 24 I 24 2 8 10:00 12:30 1),. Do not open this problem booklet until the start of the examination is announced. 2) 3.. Answer the following 3 problems. Use the designated answer sheet for each problem.

More information

1 # include < stdio.h> 2 # include < string.h> 3 4 int main (){ 5 char str [222]; 6 scanf ("%s", str ); 7 int n= strlen ( str ); 8 for ( int i=n -2; i

1 # include < stdio.h> 2 # include < string.h> 3 4 int main (){ 5 char str [222]; 6 scanf (%s, str ); 7 int n= strlen ( str ); 8 for ( int i=n -2; i ABC066 / ARC077 writer: nuip 2017 7 1 For International Readers: English editorial starts from page 8. A : ringring a + b b + c a + c a, b, c a + b + c 1 # include < stdio.h> 2 3 int main (){ 4 int a,

More information

Microsoft PowerPoint - GeoFEM.ppt [互換モード]

Microsoft PowerPoint - GeoFEM.ppt [互換モード] 三次元並列有限要素法への OpenMP/MPI ハイブリッド 並列プログラミングモデル適用 中島研吾東京大学情報基盤センター RIKEN AICS Spring School 2014 2 Hybrid 並列プログラミング スレッド並列 + メッセージパッシング OpenMP+ MPI CUDA + MPI, OpenACC + MPI 個人的には自動並列化 +MPI のことを ハイブリッド とは呼んでほしくない

More information

Microsoft PowerPoint - MPIprog-F2.ppt [互換モード]

Microsoft PowerPoint - MPIprog-F2.ppt [互換モード] MPI によるプログラミング概要 ( その ) Fortran 言語編 RIKEN AICS HPC Summer School 01 中島研吾 ( 東大 情報基盤センター ) 横川三津夫 ( 神戸大学 計算科学教育センター ) 1 概要 MPI とは MPI の基礎 :Hello World 全体データと局所データ グループ通信 (Collective Communication) 1 対 1 通信

More information

Developement of Plastic Collocation Method Extension of Plastic Node Method by Yukio Ueda, Member Masahiko Fujikubo, Member Masahiro Miura, Member Sum

Developement of Plastic Collocation Method Extension of Plastic Node Method by Yukio Ueda, Member Masahiko Fujikubo, Member Masahiro Miura, Member Sum Developement of Plastic Collocation Method Extension of Plastic Node Method by Yukio Ueda, Member Masahiko Fujikubo, Member Masahiro Miura, Member Summary Previously, the authors developed the plastic

More information

JFE.dvi

JFE.dvi ,, Department of Civil Engineering, Chuo University Kasuga 1-13-27, Bunkyo-ku, Tokyo 112 8551, JAPAN E-mail : atsu1005@kc.chuo-u.ac.jp E-mail : kawa@civil.chuo-u.ac.jp SATO KOGYO CO., LTD. 12-20, Nihonbashi-Honcho

More information

23 Study on Generation of Sudoku Problems with Fewer Clues

23 Study on Generation of Sudoku Problems with Fewer Clues 23 Study on Generation of Sudoku Problems with Fewer Clues 1120254 2012 3 1 9 9 21 18 i Abstract Study on Generation of Sudoku Problems with Fewer Clues Norimasa NASU Sudoku is puzzle a kind of pencil

More information

r07.dvi

r07.dvi 19 7 ( ) 2019.4.20 1 1.1 (data structure ( (dynamic data structure 1 malloc C free C (garbage collection GC C GC(conservative GC 2 1.2 data next p 3 5 7 9 p 3 5 7 9 p 3 5 7 9 1 1: (single linked list 1

More information

ohp07.dvi

ohp07.dvi 19 7 ( ) 2019.4.20 1 (data structure) ( ) (dynamic data structure) 1 malloc C free 1 (static data structure) 2 (2) C (garbage collection GC) C GC(conservative GC) 2 2 conservative GC 3 data next p 3 5

More information

Introduction Purpose This training course describes the configuration and session features of the High-performance Embedded Workshop (HEW), a key tool

Introduction Purpose This training course describes the configuration and session features of the High-performance Embedded Workshop (HEW), a key tool Introduction Purpose This training course describes the configuration and session features of the High-performance Embedded Workshop (HEW), a key tool for developing software for embedded systems that

More information

120802_MPI.ppt

120802_MPI.ppt CPU CPU CPU CPU CPU SMP Symmetric MultiProcessing CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CP OpenMP MPI MPI CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU MPI MPI+OpenMP CPU CPU CPU CPU CPU CPU CPU CP

More information

Microsoft PowerPoint - 3D-FEM-2.ppt [互換モード]

Microsoft PowerPoint - 3D-FEM-2.ppt [互換モード] 三次元弾性解析コード (2/3) マトリクス生成 22 年夏学期 中島研吾 科学技術計算 Ⅰ(482-27) コンピュータ科学特別講義 Ⅰ(48-24) FEM3D-Prt2 2 初期化 有限要素法の処理 : プログラム 制御変数読み込み 座標読み込み 要素生成 (: 節点数 ICELTOT: 要素数 ) 配列初期化 ( 全体マトリクス 要素マトリクス ) 要素 全体マトリクスマッピング (IndeItem)

More information

4.1 % 7.5 %

4.1 % 7.5 % 2018 (412837) 4.1 % 7.5 % Abstract Recently, various methods for improving computial performance have been proposed. One of these various methods is Multi-core. Multi-core can execute processes in parallel

More information

MPI usage

MPI usage MPI (Version 0.99 2006 11 8 ) 1 1 MPI ( Message Passing Interface ) 1 1.1 MPI................................. 1 1.2............................... 2 1.2.1 MPI GATHER.......................... 2 1.2.2

More information

para02-2.dvi

para02-2.dvi 2002 2 2002 4 23 : MPI MPI 1 MPI MPI(Message Passing Interface) MPI UNIX Windows Machintosh OS, MPI 2 1 1 2 2.1 1 1 1 1 1 1 Fig. 1 A B C F Fig. 2 A B F Fig. 1 1 1 Fig. 2 2.2 Fig. 3 1 . Fig. 4 Fig. 3 Fig.

More information

joho09.ppt

joho09.ppt s M B e E s: (+ or -) M: B: (=2) e: E: ax 2 + bx + c = 0 y = ax 2 + bx + c x a, b y +/- [a, b] a, b y (a+b) / 2 1-2 1-3 x 1 A a, b y 1. 2. a, b 3. for Loop (b-a)/ 4. y=a*x*x + b*x + c 5. y==0.0 y (y2)

More information

コードのチューニング

コードのチューニング ハイブリッド並列 八木学 ( 理化学研究所計算科学研究機構 ) 謝辞 松本洋介氏 ( 千葉大学 ) KOBE HPC Spring School 2017 2017 年 3 月 14 日神戸大学計算科学教育センター MPI とは Message Passing Interface 分散メモリのプロセス間の通信規格(API) SPMD(Single Program Multi Data) が基本 -

More information

The Effect of the Circumferential Temperature Change on the Change in the Strain Energy of Carbon Steel during the Rotatory Bending Fatigue Test by Ch

The Effect of the Circumferential Temperature Change on the Change in the Strain Energy of Carbon Steel during the Rotatory Bending Fatigue Test by Ch The Effect of the Circumferential Temperature Change on the Change in the Strain Energy of Carbon Steel during the Rotatory Bending Fatigue Test by Chikara MINAMISAWA, Nozomu AOKI (Department of Mechanical

More information

JOURNAL OF THE JAPANESE ASSOCIATION FOR PETROLEUM TECHNOLOGY VOL. 66, NO. 6 (Nov., 2001) (Received August 10, 2001; accepted November 9, 2001) Alterna

JOURNAL OF THE JAPANESE ASSOCIATION FOR PETROLEUM TECHNOLOGY VOL. 66, NO. 6 (Nov., 2001) (Received August 10, 2001; accepted November 9, 2001) Alterna JOURNAL OF THE JAPANESE ASSOCIATION FOR PETROLEUM TECHNOLOGY VOL. 66, NO. 6 (Nov., 2001) (Received August 10, 2001; accepted November 9, 2001) Alternative approach using the Monte Carlo simulation to evaluate

More information

RX600 & RX200シリーズ アプリケーションノート RX用仮想EEPROM

RX600 & RX200シリーズ アプリケーションノート RX用仮想EEPROM R01AN0724JU0170 Rev.1.70 MCU EEPROM RX MCU 1 RX MCU EEPROM VEE VEE API MCU MCU API RX621 RX62N RX62T RX62G RX630 RX631 RX63N RX63T RX210 R01AN0724JU0170 Rev.1.70 Page 1 of 33 1.... 3 1.1... 3 1.2... 3

More information

25 II :30 16:00 (1),. Do not open this problem booklet until the start of the examination is announced. (2) 3.. Answer the following 3 proble

25 II :30 16:00 (1),. Do not open this problem booklet until the start of the examination is announced. (2) 3.. Answer the following 3 proble 25 II 25 2 6 13:30 16:00 (1),. Do not open this problem boolet until the start of the examination is announced. (2) 3.. Answer the following 3 problems. Use the designated answer sheet for each problem.

More information

44 6 MPI 4 : #LIB=-lmpich -lm 5 : LIB=-lmpi -lm 7 : mpi1: mpi1.c 8 : $(CC) -o mpi1 mpi1.c $(LIB) 9 : 10 : clean: 11 : -$(DEL) mpi1 make mpi1 1 % mpiru

44 6 MPI 4 : #LIB=-lmpich -lm 5 : LIB=-lmpi -lm 7 : mpi1: mpi1.c 8 : $(CC) -o mpi1 mpi1.c $(LIB) 9 : 10 : clean: 11 : -$(DEL) mpi1 make mpi1 1 % mpiru 43 6 MPI MPI(Message Passing Interface) MPI 1CPU/1 PC Cluster MPICH[5] 6.1 MPI MPI MPI 1 : #include 2 : #include 3 : #include 4 : 5 : #include "mpi.h" 7 : int main(int argc,

More information

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2 CHLAC 1 2 3 3,. (CHLAC), 1).,.,, CHLAC,.,. Suspicious Behavior Detection based on CHLAC Method Hideaki Imanishi, 1 Toyohiro Hayashi, 2 Shuichi Enokida 3 and Toshiaki Ejima 3 We have proposed a method for

More information

±é½¬£²¡§£Í£Ð£É½éÊâ

±é½¬£²¡§£Í£Ð£É½éÊâ 2012 8 7 1 / 52 MPI Hello World I ( ) Hello World II ( ) I ( ) II ( ) ( sendrecv) π ( ) MPI fortran C wget http://www.na.scitec.kobe-u.ac.jp/ yaguchi/riken2012/enshu2.zip unzip enshu2.zip 2 / 52 FORTRAN

More information

Study on Application of the cos a Method to Neutron Stress Measurement Toshihiko SASAKI*3 and Yukio HIROSE Department of Materials Science and Enginee

Study on Application of the cos a Method to Neutron Stress Measurement Toshihiko SASAKI*3 and Yukio HIROSE Department of Materials Science and Enginee Study on Application of the cos a Method to Neutron Stress Measurement Toshihiko SASAKI*3 and Yukio HIROSE Department of Materials Science and Engineering, Kanazawa University, Kakuma-machi, Kanazawa-shi,

More information

alternating current component and two transient components. Both transient components are direct currents at starting of the motor and are sinusoidal

alternating current component and two transient components. Both transient components are direct currents at starting of the motor and are sinusoidal Inrush Current of Induction Motor on Applying Electric Power by Takao Itoi Abstract The transient currents flow into the windings of the induction motors when electric sources are suddenly applied to the

More information

,,,, : - i -

,,,, : - i - 2017 Future University Hakodate 2017 System Information Science Practice Group Report Project Name Manga engineering Group Name Literacy Manga /Project No. 19 /Project Leader 1015131 Kiyomasa Murakami

More information

Journal of Geography 116 (6) Configuration of Rapid Digital Mapping System Using Tablet PC and its Application to Obtaining Ground Truth

Journal of Geography 116 (6) Configuration of Rapid Digital Mapping System Using Tablet PC and its Application to Obtaining Ground Truth Journal of Geography 116 (6) 749-758 2007 Configuration of Rapid Digital Mapping System Using Tablet PC and its Application to Obtaining Ground Truth Data: A Case Study of a Snow Survey in Chuetsu District,

More information

program.dvi

program.dvi 2001.06.19 1 programming semi ver.1.0 2001.06.19 1 GA SA 2 A 2.1 valuename = value value name = valuename # ; Fig. 1 #-----GA parameter popsize = 200 mutation rate = 0.01 crossover rate = 1.0 generation

More information

XcalableMP入門

XcalableMP入門 XcalableMP 1 HPC-Phys@, 2018 8 22 XcalableMP XMP XMP Lattice QCD!2 XMP MPI MPI!3 XMP 1/2 PCXMP MPI Fortran CCoarray C++ MPIMPI XMP OpenMP http://xcalablemp.org!4 XMP 2/2 SPMD (Single Program Multiple Data)

More information

Microsoft PowerPoint - FEM3D-C [互換モード]

Microsoft PowerPoint - FEM3D-C [互換モード] 有限要素法による 三次元定常熱伝導解析プログラム C 言語編 202 年夏季集中講義中島研吾 並列計算プログラミング (66-2057) 先端計算機演習 (66-4009) FEM3D 2 対象とする問題 : 三次元定常熱伝導 Z T T=0@Z= ma Z T 定常熱伝導 + 発熱 一様な熱伝導率 直方体 T 一辺長さの立方体 ( 六面体 ) 要素 各方向にX Y Z 個 境界条件 Q 0 X Y

More information

Visual Evaluation of Polka-dot Patterns Yoojin LEE and Nobuko NARUSE * Granduate School of Bunka Women's University, and * Faculty of Fashion Science,

Visual Evaluation of Polka-dot Patterns Yoojin LEE and Nobuko NARUSE * Granduate School of Bunka Women's University, and * Faculty of Fashion Science, Visual Evaluation of Polka-dot Patterns Yoojin LEE and Nobuko NARUSE * Granduate School of Bunka Women's University, and * Faculty of Fashion Science, Bunka Women's University, Shibuya-ku, Tokyo 151-8523

More information

/ SCHEDULE /06/07(Tue) / Basic of Programming /06/09(Thu) / Fundamental structures /06/14(Tue) / Memory Management /06/1

/ SCHEDULE /06/07(Tue) / Basic of Programming /06/09(Thu) / Fundamental structures /06/14(Tue) / Memory Management /06/1 I117 II I117 PROGRAMMING PRACTICE II 2 MEMORY MANAGEMENT 2 Research Center for Advanced Computing Infrastructure (RCACI) / Yasuhiro Ohara yasu@jaist.ac.jp / SCHEDULE 1. 2011/06/07(Tue) / Basic of Programming

More information

kubostat2018d p.2 :? bod size x and fertilization f change seed number? : a statistical model for this example? i response variable seed number : { i

kubostat2018d p.2 :? bod size x and fertilization f change seed number? : a statistical model for this example? i response variable seed number : { i kubostat2018d p.1 I 2018 (d) model selection and kubo@ees.hokudai.ac.jp http://goo.gl/76c4i 2018 06 25 : 2018 06 21 17:45 1 2 3 4 :? AIC : deviance model selection misunderstanding kubostat2018d (http://goo.gl/76c4i)

More information

Microsoft Word - PCM TL-Ed.4.4(特定電気用品適合性検査申込のご案内)

Microsoft Word - PCM TL-Ed.4.4(特定電気用品適合性検査申込のご案内) (2017.04 29 36 234 9 1 1. (1) 3 (2) 9 1 2 2. (1) 9 1 1 2 1 2 (2) 1 2 ( PSE-RE-101/205/306/405 2 PSE-RE-201 PSE-RE-301 PSE-RE-401 PSE-RE-302 PSE-RE-202 PSE-RE-303 PSE-RE-402 PSE-RE-203 PSE-RE-304 PSE-RE-403

More information

fx-9860G Manager PLUS_J

fx-9860G Manager PLUS_J fx-9860g J fx-9860g Manager PLUS http://edu.casio.jp k 1 k III 2 3 1. 2. 4 3. 4. 5 1. 2. 3. 4. 5. 1. 6 7 k 8 k 9 k 10 k 11 k k k 12 k k k 1 2 3 4 5 6 1 2 3 4 5 6 13 k 1 2 3 1 2 3 1 2 3 1 2 3 14 k a j.+-(),m1

More information

Studies of Foot Form for Footwear Design (Part 9) : Characteristics of the Foot Form of Young and Elder Women Based on their Sizes of Ball Joint Girth

Studies of Foot Form for Footwear Design (Part 9) : Characteristics of the Foot Form of Young and Elder Women Based on their Sizes of Ball Joint Girth Studies of Foot Form for Footwear Design (Part 9) : Characteristics of the Foot Form of Young and Elder Women Based on their Sizes of Ball Joint Girth and Foot Breadth Akiko Yamamoto Fukuoka Women's University,

More information

¥Ñ¥Ã¥±¡¼¥¸ Rhpc ¤Î¾õ¶·

¥Ñ¥Ã¥±¡¼¥¸ Rhpc ¤Î¾õ¶· Rhpc COM-ONE 2015 R 27 12 5 1 / 29 1 2 Rhpc 3 forign MPI 4 Windows 5 2 / 29 1 2 Rhpc 3 forign MPI 4 Windows 5 3 / 29 Rhpc, R HPC Rhpc, ( ), snow..., Rhpc worker call Rhpc lapply 4 / 29 1 2 Rhpc 3 forign

More information

ohp03.dvi

ohp03.dvi 19 3 ( ) 2019.4.20 CS 1 (comand line arguments) Unix./a.out aa bbb ccc ( ) C main void int main(int argc, char *argv[]) {... 2 (2) argc argv argc ( ) argv (C char ) ( 1) argc 4 argv NULL. / a. o u t \0

More information

16.16%

16.16% 2017 (411824) 16.16% Abstract Multi-core processor is common technique for high computing performance. In many multi-core processor architectures, all processors share L2 and last level cache memory. Thus,

More information

Vol. 48 No. 4 Apr LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for L

Vol. 48 No. 4 Apr LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for L Vol. 48 No. 4 Apr. 2007 LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for Learning to Associate LAN Construction Skills with TCP/IP

More information

Introduction Purpose This training course demonstrates the use of the High-performance Embedded Workshop (HEW), a key tool for developing software for

Introduction Purpose This training course demonstrates the use of the High-performance Embedded Workshop (HEW), a key tool for developing software for Introduction Purpose This training course demonstrates the use of the High-performance Embedded Workshop (HEW), a key tool for developing software for embedded systems that use microcontrollers (MCUs)

More information

2 T 1 N n T n α = T 1 nt n (1) α = 1 100% OpenMP MPI OpenMP OpenMP MPI (Message Passing Interface) MPI MPICH OpenMPI 1 OpenMP MPI MPI (trivial p

2 T 1 N n T n α = T 1 nt n (1) α = 1 100% OpenMP MPI OpenMP OpenMP MPI (Message Passing Interface) MPI MPICH OpenMPI 1 OpenMP MPI MPI (trivial p 22 6 22 MPI MPI 1 1 2 2 3 MPI 3 4 7 4.1.................................. 7 4.2 ( )................................ 10 4.3 (Allreduce )................................. 12 5 14 5.1........................................

More information

はじめに

はじめに IT 1 NPO (IPEC) 55.7 29.5 Web TOEIC Nice to meet you. How are you doing? 1 type (2002 5 )66 15 1 IT Java (IZUMA, Tsuyuki) James Robinson James James James Oh, YOU are Tsuyuki! Finally, huh? What's going

More information

C

C C 1 2 1.1........................... 2 1.2........................ 2 1.3 make................................................ 3 1.4....................................... 5 1.4.1 strip................................................

More information

XJTAG

XJTAG LDRA/ T-VEC/ MetaEdit+ Domain Specific Modeling Ashling/Jtag ARC SmartCards LAUTERBACH /Jtag ARM PowerPC K MIPS XJTAG HW Domain-Specific Modeling Domain-Specific Modeling Software Technology 30 Copyright

More information

(check matrices and minimum distances) H : a check matrix of C the minimum distance d = (the minimum # of column vectors of H which are linearly depen

(check matrices and minimum distances) H : a check matrix of C the minimum distance d = (the minimum # of column vectors of H which are linearly depen Hamming (Hamming codes) c 1 # of the lines in F q c through the origin n = qc 1 q 1 Choose a direction vector h i for each line. No two vectors are colinear. A linearly dependent system of h i s consists

More information

新版明解C言語 実践編

新版明解C言語 実践編 2 List - "max.h" a, b max List - max "max.h" #define max(a, b) ((a) > (b)? (a) : (b)) max List -2 List -2 max #include "max.h" int x, y; printf("x"); printf("y"); scanf("%d", &x); scanf("%d", &y); printf("max(x,

More information

9 8 7 (x-1.0)*(x-1.0) *(x-1.0) (a) f(a) (b) f(a) Figure 1: f(a) a =1.0 (1) a 1.0 f(1.0)

9 8 7 (x-1.0)*(x-1.0) *(x-1.0) (a) f(a) (b) f(a) Figure 1: f(a) a =1.0 (1) a 1.0 f(1.0) E-mail: takio-kurita@aist.go.jp 1 ( ) CPU ( ) 2 1. a f(a) =(a 1.0) 2 (1) a ( ) 1(a) f(a) a (1) a f(a) a =2(a 1.0) (2) 2 0 a f(a) a =2(a 1.0) = 0 (3) 1 9 8 7 (x-1.0)*(x-1.0) 6 4 2.0*(x-1.0) 6 2 5 4 0 3-2

More information

soturon.dvi

soturon.dvi 12 Exploration Method of Various Routes with Genetic Algorithm 1010369 2001 2 5 ( Genetic Algorithm: GA ) GA 2 3 Dijkstra Dijkstra i Abstract Exploration Method of Various Routes with Genetic Algorithm

More information

Vol. 42 No MUC-6 6) 90% 2) MUC-6 MET-1 7),8) 7 90% 1 MUC IREX-NE 9) 10),11) 1) MUCMET 12) IREX-NE 13) ARPA 1987 MUC 1992 TREC IREX-N

Vol. 42 No MUC-6 6) 90% 2) MUC-6 MET-1 7),8) 7 90% 1 MUC IREX-NE 9) 10),11) 1) MUCMET 12) IREX-NE 13) ARPA 1987 MUC 1992 TREC IREX-N Vol. 42 No. 6 June 2001 IREX-NE F 83.86 A Japanese Named Entity Extraction System Based on Building a Large-scale and High-quality Dictionary and Pattern-matching Rules Yoshikazu Takemoto, Toshikazu Fukushima

More information

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N GPU 1 1 2 1, 3 2, 3 (Graphics Unit: GPU) GPU GPU GPU Evaluation of GPU Computing Based on An Automatic Program Generation Technology Makoto Sugawara, 1 Katsuto Sato, 1 Kazuhiko Komatsu, 2 Hiroyuki Takizawa

More information

Title 生活年令による学級の等質化に関する研究 (1) - 生活年令と学業成績について - Author(s) 与那嶺, 松助 ; 東江, 康治 Citation 研究集録 (5): 33-47 Issue Date 1961-12 URL http://hdl.handle.net/20.500.12000/ Rights 46 STUDIES ON HOMOGENEOUS

More information

XMPによる並列化実装2

XMPによる並列化実装2 2 3 C Fortran Exercise 1 Exercise 2 Serial init.c init.f90 XMP xmp_init.c xmp_init.f90 Serial laplace.c laplace.f90 XMP xmp_laplace.c xmp_laplace.f90 #include int a[10]; program init integer

More information

エンタープライズサーチ・エンジンQ u i c k S o l u t i o n ® の開発

エンタープライズサーチ・エンジンQ u i c k S o l u t i o n ® の開発 Development of Enterprise Search Engine QuickSolution by Yoshinori Takenami, Masahiro Kishida and Yasuo Tanabe As document digitization and information sharing increase in enterprises, the volume of information

More information

24 Depth scaling of binocular stereopsis by observer s own movements

24 Depth scaling of binocular stereopsis by observer s own movements 24 Depth scaling of binocular stereopsis by observer s own movements 1130313 2013 3 1 3D 3D 3D 2 2 i Abstract Depth scaling of binocular stereopsis by observer s own movements It will become more usual

More information

Repatriation and International Development Assistance: Is the Relief-Development Continuum Becoming in the Chronic Political Emergencies? KOIZUMI Koichi In the 1990's the main focus of the global refugee

More information

double float

double float 2015 3 13 1 2 2 3 2.1.......................... 3 2.2............................. 3 3 4 3.1............................... 4 3.2 double float......................... 5 3.3 main.......................

More information

Present Situation and Problems on Aseismic Design of Pile Foundation By H. Hokugo, F. Ohsugi, A. Omika, S. Nomura, Y. Fukuda Concrete Journal, Vol. 29

Present Situation and Problems on Aseismic Design of Pile Foundation By H. Hokugo, F. Ohsugi, A. Omika, S. Nomura, Y. Fukuda Concrete Journal, Vol. 29 Present Situation and Problems on Aseismic Design of Pile Foundation By H. Hokugo, F. Ohsugi, A. Omika, S. Nomura, Y. Fukuda Concrete Journal, Vol. 29, No. 8, pp. 4-12, Aug. 1986 Synopsis The pile foundation

More information

AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len(

AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len( AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) 29 4 29 A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len(b)-1] == c[0]: print( YES ) else: print( NO ) 1 B:

More information

28 Docker Design and Implementation of Program Evaluation System Using Docker Virtualized Environment

28 Docker Design and Implementation of Program Evaluation System Using Docker Virtualized Environment 28 Docker Design and Implementation of Program Evaluation System Using Docker Virtualized Environment 1170288 2017 2 28 Docker,.,,.,,.,,.,. Docker.,..,., Web, Web.,.,.,, CPU,,. i ., OS..,, OS, VirtualBox,.,

More information

Isogai, T., Building a dynamic correlation network for fat-tailed financial asset returns, Applied Network Science (7):-24, 206,

Isogai, T., Building a dynamic correlation network for fat-tailed financial asset returns, Applied Network Science (7):-24, 206, H28. (TMU) 206 8 29 / 34 2 3 4 5 6 Isogai, T., Building a dynamic correlation network for fat-tailed financial asset returns, Applied Network Science (7):-24, 206, http://link.springer.com/article/0.007/s409-06-0008-x

More information

WinHPC ppt

WinHPC ppt MPI.NET C# 2 2009 1 20 MPI.NET MPI.NET C# MPI.NET C# MPI MPI.NET 1 1 MPI.NET C# Hello World MPI.NET.NET Framework.NET C# API C# Microsoft.NET java.net (Visual Basic.NET Visual C++) C# class Helloworld

More information

超初心者用

超初心者用 3 1999 10 13 1. 2. hello.c printf( Hello, world! n ); cc hello.c a.out./a.out Hello, world printf( Hello, world! n ); 2 Hello, world printf n printf 3. ( ) int num; num = 100; num 100 100 num int num num

More information

200708_LesHouches_02.ppt

200708_LesHouches_02.ppt Numerical Methods for Geodynamo Simulation Akira Kageyama Earth Simulator Center, JAMSTEC, Japan Part 2 Geodynamo Simulations in a Sphere or a Spherical Shell Outline 1. Various numerical methods used

More information

NUMAの構成

NUMAの構成 メッセージパッシング プログラミング 天野 共有メモリ対メッセージパッシング 共有メモリモデル 共有変数を用いた単純な記述自動並列化コンパイラ簡単なディレクティブによる並列化 :OpenMP メッセージパッシング 形式検証が可能 ( ブロッキング ) 副作用がない ( 共有変数は副作用そのもの ) コストが小さい メッセージパッシングモデル 共有変数は使わない 共有メモリがないマシンでも実装可能 クラスタ

More information

2 [1] 7 5 C 2.1 (kikuchi-fem-mac ) input.dat (cat input.dat type input.dat ))

2 [1] 7 5 C 2.1 (kikuchi-fem-mac ) input.dat (cat input.dat type input.dat )) 2.3 2007 8, 2011 6 21, 2012 5 29 1 (1) (2) (3) 2 Ω Poisson u = f (in Ω), u = 0 (on Γ 1 ), u n = 0 (on Γ 2) f Γ 1, Γ 2 Γ := Ω n Γ Ω (1980) Fortran : kikuchi-fem-mac.tar.gz 1 ( fem, 6701) tar xzf kikuchi-fem-mac.tar.gz

More information

listings-ext

listings-ext (6) Python (2) ( ) ohsaki@kwansei.ac.jp 5 Python (2) 1 5.1 (statement)........................... 1 5.2 (scope)......................... 11 5.3 (subroutine).................... 14 5 Python (2) Python 5.1

More information

ON A FEW INFLUENCES OF THE DENTAL CARIES IN THE ELEMENTARY SCHOOL PUPIL BY Teruko KASAKURA, Naonobu IWAI, Sachio TAKADA Department of Hygiene, Nippon Dental College (Director: Prof. T. Niwa) The relationship

More information

2) TA Hercules CAA 5 [6], [7] CAA BOSS [8] 2. C II C. ( 1 ) C. ( 2 ). ( 3 ) 100. ( 4 ) () HTML NFS Hercules ( )

2) TA Hercules CAA 5 [6], [7] CAA BOSS [8] 2. C II C. ( 1 ) C. ( 2 ). ( 3 ) 100. ( 4 ) () HTML NFS Hercules ( ) 1,a) 2 4 WC C WC C Grading Student programs for visualizing progress in classroom Naito Hiroshi 1,a) Saito Takashi 2 Abstract: To grade student programs in Computer-Aided Assessment system, we propose

More information

Microsoft PowerPoint - omp-c-02.ppt [互換モード]

Microsoft PowerPoint - omp-c-02.ppt [互換モード] 科学技術計算のための マルチコアプログラミング入門 C 言語編第 Ⅱ 部 : オーダリング 中島研吾 東京大学情報基盤センター 2 データ依存性の解決策は? オーダリング (Ordering) について Red-Black,Multicolor(MC) Cuthill-McKee(CM),Reverse-CM(RCM) オーダリングと収束の関係 オーダリングの実装 オーダリング付 ICCG 法の実装

More information

The Evaluation on Impact Strength of Structural Elements by Means of Drop Weight Test Elastic Response and Elastic Limit by Hiroshi Maenaka, Member Sh

The Evaluation on Impact Strength of Structural Elements by Means of Drop Weight Test Elastic Response and Elastic Limit by Hiroshi Maenaka, Member Sh The Evaluation on Impact Strength of Structural Elements by Means of Drop Weight Test Elastic Response and Elastic Limit by Hiroshi Maenaka, Member Shigeru Kitamura, Member Masaaki Sakuma Genya Aoki, Member

More information

Fig. 3 Coordinate system and notation Fig. 1 The hydrodynamic force and wave measured system Fig. 2 Apparatus of model testing

Fig. 3 Coordinate system and notation Fig. 1 The hydrodynamic force and wave measured system Fig. 2 Apparatus of model testing The Hydrodynamic Force Acting on the Ship in a Following Sea (1 St Report) Summary by Yutaka Terao, Member Broaching phenomena are most likely to occur in a following sea to relative small and fast craft

More information

2008 ( 13 ) C LAPACK 2008 ( 13 )C LAPACK p. 1

2008 ( 13 ) C LAPACK 2008 ( 13 )C LAPACK p. 1 2008 ( 13 ) C LAPACK LAPACK p. 1 Q & A Euler http://phase.hpcc.jp/phase/mppack/long.pdf KNOPPIX MT (Mersenne Twister) SFMT., ( ) ( ) ( ) ( ). LAPACK p. 2 C C, main Asir ( Asir ) ( ) (,,...), LAPACK p.

More information

untitled

untitled II 4 Yacc Lex 2005 : 0 1 Yacc 20 Lex 1 20 traverse 1 %% 2 [0-9]+ { yylval.val = atoi((char*)yytext); return NUM; 3 "+" { return + ; 4 "*" { return * ; 5 "-" { return - ; 6 "/" { return / ; 7 [ \t] { /*

More information

GPGPU

GPGPU GPGPU 2013 1008 2015 1 23 Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the

More information

Microsoft PowerPoint - 03-FEM3D-C.ppt [互換モード]

Microsoft PowerPoint - 03-FEM3D-C.ppt [互換モード] 有限要素法による 三次元定常熱伝導解析プログラム C 言語編 中島研吾東京大学情報基盤センター FEM3D 2 対象とする問題 : 三次元定常熱伝導 Z T T=0@Z= ma Z T 定常熱伝導 + 発熱 一様な熱伝導率 直方体 T 一辺長さの立方体 ( 六面体 ) 要素 各方向にX Y Z 個 境界条件 Q 0 X Y X Y T=0@Z= ma 体積当たり発熱量は位置 ( メッシュの中心の座標

More information

平成 31 年度 筑波大学大学院博士課程 システム情報工学研究科 コンピュータサイエンス専攻 博士前期課程 ( 一般入学試験 8 月期 ) 試験問題基礎科目 ( 数学, 情報基礎 ) Mathematics/Fundamentals of Computer Science [ 注意事項 ][Inst

平成 31 年度 筑波大学大学院博士課程 システム情報工学研究科 コンピュータサイエンス専攻 博士前期課程 ( 一般入学試験 8 月期 ) 試験問題基礎科目 ( 数学, 情報基礎 ) Mathematics/Fundamentals of Computer Science [ 注意事項 ][Inst 平成 31 年度 筑波大学大学院博士課程 システム情報工学研究科 コンピュータサイエンス専攻 博士前期課程 ( 一般入学試験 8 月期 ) 試験問題基礎科目 ( 数学, 情報基礎 ) Mathematics/Fundamentals of Computer Science [ 注意事項 ][Instructions] 1. 試験開始の合図があるまで, 問題の中を見てはいけません. また, 筆記用具を手に持ってはいけません.

More information

浜松医科大学紀要

浜松医科大学紀要 On the Statistical Bias Found in the Horse Racing Data (1) Akio NODA Mathematics Abstract: The purpose of the present paper is to report what type of statistical bias the author has found in the horse

More information

3_23.dvi

3_23.dvi Vol. 52 No. 3 1234 1244 (Mar. 2011) 1 1 mixi 1 Casual Scheduling Management and Shared System Using Avatar Takashi Yoshino 1 and Takayuki Yamano 1 Conventional scheduling management and shared systems

More information

K02LE indd

K02LE indd I Linear Stage Table of Contents 1. Features... 1 2. Description of part Number... 1 3. Maximum Speed Limit... 2 4. Specifications... 3 5. Accuracy Grade... 4 6. Motor and Motor Adaptor Flange... 4 6-1

More information

練習&演習問題

練習&演習問題 練習問題 ファイル入出力 練習問題 1 ファイルへのデータ出力 配列 a[ ] の値をファイル data.txt に出力するプログラムを作成しなさい #include #include /* srand(), rand() */ #include /* time() */ int main(void) { int i; double a[5];

More information

r03.dvi

r03.dvi 19 ( ) 019.4.0 CS 1 (comand line arguments) Unix./a.out aa bbb ccc ( ) C main void... argc argv argc ( ) argv (C char ) ( 1) argc 4 argv NULL. / a. o u t \0 a a \0 b b b \0 c c c \0 1: // argdemo1.c ---

More information

スライド 1

スライド 1 Parallel Programming in MPI part 2 1 1 Today's Topic ノンブロッキング通信 Non-Blocking Communication 通信の完了を待つ間に他の処理を行う Execute other instructions while waiting for the completion of a communication. 集団通信関数の実装 Implementation

More information

comment.dvi

comment.dvi ( ) (sample1.c) (sample1.c) 2 2 Nearest Neighbor 1 (2D-class1.dat) 2 (2D-class2.dat) (2D-test.dat) 3 Nearest Neighbor Nearest Neighbor ( 1) 2 1: NN 1 (sample1.c) /* -----------------------------------------------------------------

More information

「プログラミング言語」 SICP 第4章 ~超言語的抽象~ その6

「プログラミング言語」  SICP 第4章   ~超言語的抽象~   その6 SICP 4 6 igarashi@kuis.kyoto-u.ac.jp July 21, 2015 ( ) SICP 4 ( 6) July 21, 2015 1 / 30 4.3: Variations on a Scheme Non-deterministic Computing 4.3.1: amb 4.3.2: 4.3.3: amb ( ) SICP 4 ( 6) July 21, 2015

More information

2012年度HPCサマーセミナー_多田野.pptx

2012年度HPCサマーセミナー_多田野.pptx ! CCS HPC! I " tadano@cs.tsukuba.ac.jp" " 1 " " " " " " " 2 3 " " Ax = b" " " 4 Ax = b" A = a 11 a 12... a 1n a 21 a 22... a 2n...... a n1 a n2... a nn, x = x 1 x 2. x n, b = b 1 b 2. b n " " 5 Gauss LU

More information