ちょっとした実験

ちょっとした実験。配列へのアクセスパターンで実行時間がどのくらい変わるか。


#define N 10000
int a[N][N];
int main()
{
int i=0;
int j=0;

for(j=0;j
real 0m1.360s
user 0m1.045s
sys 0m0.308s


#define N 10000
int a[N][N];
int main()
{
int i=0;
int j=0;

for(j=0;j
real 0m0.514s
user 0m0.265s
sys 0m0.245s

3倍も変わるのが驚き

valgrindによる、キャッシュミスの測定もしてみた。まず前者


==7912== I refs: 800,175,357
==7912== I1 misses: 568
==7912== L2i misses: 564
==7912== I1 miss rate: 0.00%
==7912== L2i miss rate: 0.00%
==7912==
==7912== D refs: 500,096,605 (400,070,837 rd + 100,025,768 wr)
==7912== D1 misses: 100,001,142 ( 962 rd + 100,000,180 wr)
==7912== L2d misses: 6,261,058 ( 886 rd + 6,260,172 wr)
==7912== D1 miss rate: 19.9% ( 0.0% + 99.9% )
==7912== L2d miss rate: 1.2% ( 0.0% + 6.2% )
==7912==
==7912== L2 refs: 100,001,710 ( 1,530 rd + 100,000,180 wr)
==7912== L2 misses: 6,261,622 ( 1,450 rd + 6,260,172 wr)
==7912== L2 miss rate: 0.4% ( 0.0% + 6.2% )
次に後者

==7973== I refs: 106,678
==7973== I1 misses: 683
==7973== L2i misses: 678
==7973== I1 miss rate: 0.64%
==7973== L2i miss rate: 0.63%
==7973==
==7973== D refs: 56,049 (38,316 rd + 17,733 wr)
==7973== D1 misses: 784 ( 609 rd + 175 wr)
==7973== L2d misses: 725 ( 561 rd + 164 wr)
==7973== D1 miss rate: 1.3% ( 1.5% + 0.9% )
==7973== L2d miss rate: 1.2% ( 1.4% + 0.9% )
==7973==
==7973== L2 refs: 1,467 ( 1,292 rd + 175 wr)
==7973== L2 misses: 1,403 ( 1,239 rd + 164 wr)
==7973== L2 miss rate: 0.8% ( 0.8% + 0.9% )
D1キャッシュミスが19.9%=>1.3%にも激減している。