What is a model of computation?
A model of computation specifies:
- what operations an algorithm is allowed to use
- the cost (time, space, ...) of each operation
- cost of an algorithm = sum of its operation costs
The following are two kinds of models of computation.
Random Access Machine (RAM)
- memory is modeled by a big array of words
- the machine has $\Theta(1)$ registers
- In $\Theta(1)$ time, it can
- load a word from memory into a register
- compute (+, -, *, /, &, |, ^) on registers
- store a register $r_j$ into memory
- What is a word?
- a fixed number of bits; assume basic objects (e.g., an int) fit in one word
- The model is realistic and powerful → abstractions can be implemented on top of it
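As a rough illustration of RAM-style thinking (this is my own sketch, not part of the model; Python ints are not fixed-size machine words), memory is one big array and each word operation costs $\Theta(1)$:

```python
# Sketch: the RAM model's "big array" of memory words, with Theta(1)
# word operations.
memory = [0] * 1024        # the big array of words

def ram_demo():
    memory[0] = 7          # store a word: Theta(1)
    memory[1] = 35         # store a word: Theta(1)
    r0 = memory[0]         # load a word into "register" r0: Theta(1)
    r1 = memory[1]         # load a word into "register" r1: Theta(1)
    memory[2] = r0 + r1    # compute on registers and store: Theta(1)
    return memory[2]

print(ram_demo())          # 42
```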
Pointer machine
- can dynamically allocate objects
- object has $O(1)$ fields
- field = word (e.g., int) or pointer to object/null (a.k.a. reference)
- weaker than RAM
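As an illustration of a pointer-machine object, here is a minimal linked-list node in Python; the names Node, val, and next are my own, not from any library:

```python
# Sketch: a pointer-machine object with O(1) fields, each a word (val)
# or a pointer to another object / None (next).
class Node:
    def __init__(self, val, next=None):
        self.val = val      # word field
        self.next = next    # pointer field (reference to an object, or None)

# Dynamically allocate objects to build the list 1 -> 2 -> 3:
head = Node(1, Node(2, Node(3)))

x = head
while x is not None:        # each x = x.next costs Theta(1); n nodes cost Theta(n)
    print(x.val)
    x = x.next
```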
Python model
Python lets you use either mode of thinking, e.g.
- a “list” is actually an array in RAM:
L[i] = L[j] + 5: this operation costs only $\Theta(1)$ (constant) time
- an object with $O(1)$ attributes (including references) is like a pointer-machine object:
x = x.next costs $\Theta(1)$ time
The following are some Python operations and their costs. To determine a cost, imagine the implementation in terms of the two models above (RAM or pointer machine).
- list
(a) L.append(x): amortized $\Theta(1)$ time (via table doubling; see the sketch below)
(b) L = L1 + L2 ≡
L = []: costs $\Theta(1)$ time to build an empty list
for x in L1: L.append(x) costs $\Theta(1)$ per append, $\Theta(|L1|)$ in total over L1
for x in L2: L.append(x) costs $\Theta(1)$ per append, $\Theta(|L2|)$ in total over L2
Therefore, L = L1 + L2 costs $\Theta(1 + |L1| + |L2|)$ time
(c) L1.extend(L2) ≡ L1 += L2 ≡
for x in L2: L1.append(x) costs $\Theta(1)$ per append, $\Theta(|L2|)$ in total
Therefore, it costs $\Theta(1 + |L2|)$ time
(d) L2 = L1[i : j] ≡
L2 = []: $\Theta(1)$
for k in range(i, j): L2.append(L1[k]) costs $\Theta(1)$ per append
Therefore, it costs $\Theta(j − i + 1) = O(|L1|)$ time
(e) len(L): $\Theta(1)$ time, since a list stores its length in a field
(f) L.sort(): $\Theta(|L| \log |L|)$ time, via comparison sort
- tuple, str: similar to list
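As referenced in (a), here is a minimal sketch of the table-doubling idea behind append's amortized $\Theta(1)$ cost; the DynamicArray class is a toy illustration, not CPython's actual implementation:

```python
# Sketch: a dynamic array that doubles its fixed-size backing table when
# full. Over n appends the total copying work is Theta(n), so each append
# costs Theta(1) amortized.
class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity   # fixed-size backing array

    def append(self, x):
        if self.size == self.capacity:        # table full: double it
            bigger = [None] * (2 * self.capacity)
            for i in range(self.size):        # Theta(size) copy, but rare
                bigger[i] = self.slots[i]
            self.slots = bigger
            self.capacity *= 2
        self.slots[self.size] = x             # the common Theta(1) case
        self.size += 1

A = DynamicArray()
for i in range(10):
    A.append(i)
print(A.slots[:A.size])                       # [0, 1, ..., 9]
```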
- dict: each operation (e.g., D[key] = val) costs $\Theta(1)$ time, via hashing
- set: similar (think of it as a dict without values)
- heapq: heappush & heappop, via heaps → $\Theta(\log n)$ time each (usage sketch below)
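A small usage sketch of the standard-library heapq module, showing the $O(\log n)$ push and pop:

```python
import heapq

h = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(h, x)      # O(log n) per push

while h:
    print(heapq.heappop(h))   # O(log n) per pop; prints 1, 2, 3, 4, 5
```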
- long: multiplication via the Karatsuba algorithm
x + y → $O(|x| + |y|)$ time, where $|x|$ and $|y|$ are the numbers of words in x and y
x * y → $O((|x| + |y|)^{\log_2 3}) \approx O((|x| + |y|)^{1.58})$ time
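For intuition, here is a sketch of Karatsuba multiplication on Python ints; splitting on bits rather than words and the base-case threshold are my own simplifications:

```python
# Sketch: Karatsuba multiplies with three recursive half-size
# multiplications instead of four, giving the O(n^{log2 3}) ~ O(n^1.58)
# bound quoted above.
def karatsuba(x, y):
    if x < 16 or y < 16:                 # small case: multiply directly
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    x_hi, x_lo = x >> half, x & ((1 << half) - 1)   # x = x_hi * 2^half + x_lo
    y_hi, y_lo = y >> half, y & ((1 << half) - 1)   # y = y_hi * 2^half + y_lo
    a = karatsuba(x_hi, y_hi)                       # high * high
    b = karatsuba(x_lo, y_lo)                       # low * low
    c = karatsuba(x_hi + x_lo, y_hi + y_lo)         # cross terms, one multiply
    return (a << (2 * half)) + ((c - a - b) << half) + b

print(karatsuba(123456789, 987654321) == 123456789 * 987654321)   # True
```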
Document Distance Problem: compute $d(D_1, D_2)$
The problem is actually to measure the similarity between documents; it has applications in detecting duplicates and plagiarism, and in web search ($D_2$ = query).
In this problem, we define a word as a sequence of alphanumeric characters, and a document as a sequence of words (ignoring spaces and punctuation).
The idea is to define distance in terms of shared words. Think of a document $D$ as a vector, with one axis per word:
If three axes are defined by the three words the, dog, cat,
then vector $v_1$ could be “the cat”, vector $v_2$ could be “the dog”, and vector $v_3$ could be “cat dog”.
Once documents are viewed as vectors, we can apply mathematical methods to measure the distance between them, e.g., the angle between them:
$d(D_1, D_2) = \arccos\left(\frac{D_1 \cdot D_2}{|D_1|\,|D_2|}\right)$
The algorithm can then be formed as follows (see the sketch below):
- split each document into words
- count word frequencies (document vectors)
- compute the dot product (& divide by the vector lengths to get the angle)
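Putting the three steps together, here is a minimal sketch of the algorithm; the function names and the regular expression used to split words are my own choices:

```python
import math
import re

def word_frequencies(doc):
    """Steps 1 & 2: split doc into alphanumeric words and count them."""
    counts = {}
    for word in re.findall(r"[a-zA-Z0-9]+", doc.lower()):
        counts[word] = counts.get(word, 0) + 1      # Theta(1) dict operations
    return counts

def distance(d1, d2):
    """Step 3: angle between document vectors (arccos of normalized dot product)."""
    v1, v2 = word_frequencies(d1), word_frequencies(d2)
    dot = sum(v1[w] * v2.get(w, 0) for w in v1)     # shared-word dot product
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return math.acos(dot / (norm1 * norm2))

print(distance("the cat", "the dog"))   # ~1.047 rad (60 degrees)
print(distance("the cat", "the cat"))   # ~0.0 (identical documents)
```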