在证明随机矩阵满足限制等距性质以及估计限制等距常数时,Khintchine不等式、Hoeffding型不等式以及Bernstein型不等式都发挥着重要的作用。在实际应用当中,这三种不等式根据随机变量类别的不同又有很多变化。本文重点补充证明了当随机变量为矩阵时的Khintchine不等式、Hoeffding型不等式以及Bernstein型不等式,这些不等式在随机测量矩阵稀疏恢复证明过程中起到关键作用。 Khintchine’s inequality, Hoeffding’s inequality and Bernstein’s inequality play an important role in proving that random matrix satisfies the property of restricted isometry and estimating the constant of restricted isometry. In practical application, these three inequalities have many changes according to the different types of random variables. In this paper, we mainly prove the Khintchine inequality, Hoeffding type inequality and Bernstein type inequality when the random variable is a matrix. These inequalities play a key role in the proof of sparse restoration of random measurement matrix.
在证明随机矩阵满足限制等距性质以及估计限制等距常数时,Khintchine不等式、Hoeffding型不等式以及Bernstein型不等式都发挥着重要的作用。在实际应用当中,这三种不等式根据随机变量类别的不同又有很多变化。本文重点补充证明了当随机变量为矩阵时的Khintchine不等式、Hoeffding型不等式以及Bernstein型不等式,这些不等式在随机测量矩阵稀疏恢复证明过程中起到关键作用。
限制等距性质,亚高斯随机变量,标准高斯随机变量,Khintchine不等式,Hoeffding型不等式, Bernstein型不等式
Ke Zheng, Ruying Song
Department of Mathematics, Taiyuan Normal University, Jinzhong Shanxi
Received: Jun. 19th, 2021; accepted: Jul. 11th, 2021; published: Jul. 22nd, 2021
Khintchine’s inequality, Hoeffding’s inequality and Bernstein’s inequality play an important role in proving that random matrix satisfies the property of restricted isometry and estimating the constant of restricted isometry. In practical application, these three inequalities have many changes according to the different types of random variables. In this paper, we mainly prove the Khintchine inequality, Hoeffding type inequality and Bernstein type inequality when the random variable is a matrix. These inequalities play a key role in the proof of sparse restoration of random measurement matrix.
Keywords:Restricted Isometric Property, Sub-Gaussian Random Variable, Standard Gaussian Random Variable, Khintchine Inequality, Hoeffding Type Inequality, Bernstein Type Inequality
Copyright © 2021 by author(s) and beplay安卓登录
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
压缩感知的模型如下 y = A x ,当测量矩阵是随机测量矩阵的时候,满足RIP性质的概率需要足够大,在证明过程中需要借助Khintchine不等式(见文献 [
在文献 [
文献 [
在证明Noncommutative Bernstein不等式的同时需要Hoeffding型不等式,文献 [
在证明之前首先需要以下引理来辅助证明Khintchine不等式。
引理2.1:设 g 是一标准高斯变量, B ∈ ℂ d × d 是一自伴矩阵。那么满足以下等式
E exp ( g θ B ) = exp ( θ 2 B 2 / 2 ) .(1)
证明:对于标准高斯随机变量 g ,期望和方差满足以下条件
E g = 0 , E g 2 = 1 (2)
见文献 [
E g k = E ( g ) E ( g k − 1 ) = 0 × E ( g k − 1 ) = 0. (3)
我们将定理中等式左边按照泰勒展开式展开,即
E exp ( g θ B ) = ∑ k = 0 ∞ E ( g θ B ) k k ! = ∑ k = 0 ∞ E ( g ) k E ( θ B ) k k !
我们将(3)代入上式,可得
E exp ( g θ B ) = ∑ k = 0 ∞ E ( g θ B ) k k ! = ∑ k = 0 ∞ E ( g ) k E ( θ B ) k k ! = 1 + ∑ k = 1 ∞ E ( g ) 2 k E ( θ B ) 2 k ( 2 k ) ! ! + ∑ k = 1 ∞ E ( g ) 2 k − 1 E ( θ B ) 2 k − 1 ( 2 k − 1 ) ! ! = 1 + ∑ k = 1 ∞ E ( g ) 2 k E ( θ B ) 2 k ( 2 k ) ! ! = 1 + ∑ k = 1 ∞ 1 × ( θ B ) 2 k ( 2 k ) ! ! = ∑ k = 0 ∞ ( θ B ) 2 k ( 2 k ) ! ! = ∑ k = 0 ∞ ( θ B ) 2 k 2 k k ! = exp ( θ 2 B 2 / 2 )
等式结果得以证明。
以上引理证明时可以借鉴文献 [
引理2.2:设 ε 是一Rademacher随机变量, B ∈ ℂ d × d 是一自伴矩阵。那么满足以下等式
E exp ( ε θ B ) ≤ exp ( θ 2 B 2 / 2 ) .(4)
证明:对于Rademacher随机变量 ε ,期望和方差满足以下条件
E ε = 0 , E ε 2 = 1 (5)
我们将定理中等式左边按照泰勒展开式展开,即
E exp ( ε θ B ) = 1 2 ( exp ( θ B ) + exp ( − θ B ) ) = ∑ k = 0 ∞ ( θ B ) 2 k ( 2 k ) ! ≤ ∑ k = 0 ∞ ( θ B ) 2 k 2 k k ! = exp ( θ 2 B 2 / 2 ) (6)
等式结果得以证明。
该引理摘自文献 [
定理2.1:在以上定理成立的条件下,满足以下不等式
E ‖ ∑ j = 1 M g j B j ‖ 2 → 2 ≤ 2 ln ( 2 d ) ‖ ∑ j = 1 M B j 2 ‖ 2 → 2 1 / 2 (7)
成立。
并且,对于Rademacher序列 ε = ( ε 1 , ⋯ , ε M ) ,
E ‖ ∑ j = 1 M ε j B j ‖ 2 → 2 ≤ 2 ln ( 2 d ) ‖ ∑ j = 1 M B j 2 ‖ 2 → 2 1 / 2 (8)
成立。
证明:首先证明不等式(7),先从(7)中左边开始着手
E ‖ ∑ j = 1 M g j B j ‖ 2 → 2 = E ( λ max ( ∑ j = 1 M g j B j ) ) ≤ E ∑ j = 1 d λ j ( ∑ j = 1 M g j B j ) ≤ E t r ( ∑ j = 1 M g j B j ) = E ∑ j = 1 M t r ( g j B j ) ≤ ∑ j = 1 M t r ( ln E exp ( g j B j ) )
由于 t r ( A + B ) = t r A + t r B 以及引理2.1可知,当 θ = 1 时可得 E exp ( g j B j ) = exp ( B j 2 / 2 ) ,上式继续推导可得
= ∑ j = 1 M t r ( ln exp ( B j ) 2 2 ) = ∑ j = 1 M t r ( ( B j ) 2 2 ) = t r ∑ j = 1 M ( ( B j ) 2 2 ) = 2 2 ∑ i = 1 d λ i ( ∑ j = 1 M ( B j ) 2 ) ≤ 2 d 2 λ max ( ∑ j = 1 M ( B j ) 2 ) = 2 d 2 ‖ ∑ j = 1 M ( B j ) 2 ‖ 2 → 2 1 / 2
最终结果为 E ‖ ∑ j = 1 M g j B j ‖ 2 → 2 ≤ 2 d 2 ‖ ∑ j = 1 M ( B j ) 2 ‖ 2 → 2 1 / 2 ,当 2 d 2 ≤ 2 ln ( 2 d ) 成立时,即 d ≤ 4 ln ( 2 d ) , E ‖ ∑ j = 1 M g j B j ‖ 2 → 2 ≤ 2 d 2 ‖ ∑ j = 1 M ( B j ) 2 ‖ 2 → 2 1 / 2 ≤ 2 ln ( 2 d ) ‖ ∑ j = 1 M B j 2 ‖ 2 → 2 1 / 2 成立。
类似证明(8)也成立,先从(8)中左边开始着手
E ‖ ∑ j = 1 M ε j B j ‖ 2 → 2 = E ( λ max ( ∑ j = 1 M ε j B j ) ) ≤ E ∑ j = 1 d λ j ( ∑ j = 1 M ε j B j ) ≤ E t r ( ∑ j = 1 M ε j B j ) = E ∑ j = 1 M t r ( ε j B j ) ≤ ∑ j = 1 M t r ( ln E exp ( ε j B j ) )
由于 t r ( A + B ) = t r A + t r B 以及引理2.2可知,当 θ = 1 时可得 E exp ( ε j B j ) ≤ exp ( B j 2 / 2 ) ,上式继续推导可得
= ∑ j = 1 M t r ( ln exp ( B j ) 2 2 ) = ∑ j = 1 M t r ( ( B j ) 2 2 ) = t r ∑ j = 1 M ( ( B j ) 2 2 ) = 2 2 ∑ i = 1 d λ i ( ∑ j = 1 M ( B j ) 2 ) ≤ 2 d 2 λ max ( ∑ j = 1 M ( B j ) 2 ) = 2 d 2 ‖ ∑ j = 1 M ( B j ) 2 ‖ 2 → 2 1 / 2
最终结果为 E ‖ ∑ j = 1 M ε j B j ‖ 2 → 2 ≤ 2 d 2 ‖ ∑ j = 1 M ( B j ) 2 ‖ 2 → 2 1 / 2 ,当 2 d 2 ≤ 2 ln ( 2 d ) 成立时,即 d ≤ 4 ln ( 2 d ) , E ‖ ∑ j = 1 M ε j B j ‖ 2 → 2 ≤ 2 d 2 ‖ ∑ j = 1 M ( B j ) 2 ‖ 2 → 2 1 / 2 ≤ 2 ln ( 2 d ) ‖ ∑ j = 1 M B j 2 ‖ 2 → 2 1 / 2 成立。
在证明之前首先证明一个简单的定理,如下:
定理3.1:对于随机变量 X ,满足以下不等式
E X ≤ inf θ > 0 θ − 1 ln E [ exp ( θ X ) ] .
证明: E X = θ − 1 E ln [ exp ( θ X ) ] = θ − 1 ln E [ exp ( θ X ) ]
所以 θ − 1 ln E [ exp ( θ X ) ] 取下确界可得 E X ≤ inf θ > 0 θ − 1 ln E [ exp ( θ X ) ] 。
接下来开始证明矩阵高和Hoeffding型不等式,文献 [
定理3.2:设 g = ( g 1 , ⋯ , g M ) 是一个由独立标准高斯随机变量构成的向量,并且 B 1 , ⋯ , B M ∈ ℂ d × d 是自伴矩阵。引入 σ 2 = ‖ ∑ j = 1 M B j 2 ‖ 2 → 2 ,满足不等式
E exp ( θ ‖ ∑ j = 1 M g j B j ‖ 2 → 2 ) ≤ 2 d exp ( θ 2 σ 2 / 2 ) (9)
其中 θ > 0 。并且对于 t > 0 ,概率满足
P ( ‖ ∑ j = 1 M g j B j ‖ 2 → 2 ≥ t ) ≤ 2 d exp ( − t 2 2 σ 2 ) . (10)
证明:对于不等式(9),我们从(9)的左边开始证明
E exp ( θ ‖ ∑ j = 1 M g j B j ‖ 2 → 2 ) = E exp ( θ λ max ( ∑ j = 1 M g j B j ) ) = E λ max exp ( θ ∑ j = 1 M g j B j ) ≤ E ∑ j = 1 d λ j exp ( θ ∑ j = 1 M g j B j ) = E t r e θ ∑ j = 1 M g j B j ≤ t r exp ( ∑ j = 1 M ln E exp ( θ g j B j ) )
其中第二行的不等号应用了文献 [
≤ t r exp ( ∑ j = 1 M ln exp ( θ 2 B j 2 / 2 ) ) = t r exp ( θ 2 ∑ j = 1 M B j 2 2 ) = exp ( θ 2 ∑ j = 1 d λ j ( ∑ j = 1 M B j 2 ) 2 ) ≤ exp ( θ 2 d λ max ( ∑ j = 1 M B j 2 ) 2 ) = exp ( θ 2 d ‖ ∑ j = 1 M B j 2 ‖ 2 → 2 2 ) = exp ( θ 2 d σ 2 2 )
在 e d / 2 ≤ 2 d ( d ≤ 2 ln 2 d )成立的条件下,不等式 exp ( d θ 2 σ 2 / 2 ) ≤ 2 d exp ( θ 2 σ 2 / 2 ) 成立,即
E exp ( θ ‖ ∑ j = 1 M g j B j ‖ 2 → 2 ) ≤ exp ( d θ 2 σ 2 / 2 ) ≤ 2 d exp ( θ 2 σ 2 / 2 ) .
不等式(9)得以证明。
对于概率不等式(10),证明过程可以借鉴文献 [
E exp ( g θ B ) = exp ( θ 2 B 2 / 2 ) ,这里 B 2 是半正定的。因此,文献 [
[
ρ = ‖ ∑ l = 1 M B l 2 ‖ 2 → 2 = σ 2
其中 ∑ l = 1 M B l 2 是半正定的。由此
P ( ‖ ∑ l M g l B l ‖ 2 → 2 ≥ t ) ≤ P ( λ max ( ∑ l = 1 M g l B l ) ≥ t ) + P ( λ max ( − ∑ l = 1 M g l B l ) ≥ t ) ≤ 2 d inf θ > 0 e − θ t + θ 2 σ 2 / 2 = 2 d e − t 2 / ( 2 σ 2 ) .
这里 θ 的最优选择是 θ = t / σ 2 。定理3.2的(10)得以证明。
接下来我们介绍矩阵版本的非对易伯恩斯坦不等式的次指数版本。
首先给出证明时所需引理来辅助证明,如下:
引理4.1:设 Y ∈ ℂ d × d 是一个自伴随机矩阵。那么,对于 t ∈ ℝ ,
P ( λ max ( Y ) ≥ t ) ≤ inf θ > 0 exp { e − θ t E t r exp ( θ Y ) } . (11)
引理4.2:设 X 1 , ⋯ , X M ∈ ℂ d × d 是独立自伴随机矩阵。那么,对于 θ ∈ ℝ ,
E t r exp ( θ ∑ l = 1 M X l ) ≤ t r exp ( ∑ l = 1 M ln E exp ( θ X l ) ) . (12)
引理4.3:设 X 1 , ⋯ , X M ∈ ℂ d × d 是独立自伴随机矩阵。假设它们存在函数 g : ( 0 , ∞ ) → [ 0 , ∞ ) 以及固定的自伴矩阵 A 1 , ⋯ , A M 使得
E [ exp ( θ X l ) ] ≺ = exp ( g ( θ ) A k ) , 对 于 ∀ θ > 0 以 及 ∀ k ∈ [ M ] . (13)
那么, ρ : = λ max ( ∑ l = 1 M A l ) 。概率满足以下不等式
P ( λ max ( ∑ l = 1 M X l ) ≥ t ) ≤ d inf θ > 0 e − θ t + g ( θ ) ρ , t ∈ ℝ .
以上三个引理均摘自文献 [
定理4.1 (Noncommutative Bernstein不等式的次指数版本):设 X 1 , ⋯ , X M ∈ ℂ d × d 是独立零均值自伴随机矩阵。假设
E [ X l n ] ≺ = n ! R n − 2 σ l 2 B l 2 / 2 , l ∈ [ M ] , (14)
对于某些自伴矩阵 B l ,设
σ 2 : = ‖ ∑ l = 1 M B l 2 ‖ 2 → 2 .
则满足以下概率不等式
P ( λ max ( ∑ l = 1 M X l ) ≥ t ) ≤ d exp ( − t 2 / 2 σ 2 + R t ) . (15)
其中 t > 0 。
证明:通过克拉默定理的启示,我们估计 X l 的矩生成函数。将指数函数展开为泰勒级数,利用Fubini定理交换期望和求和得到
E [ exp ( θ X l ) ] = 1 + θ E [ X l ] + ∑ n = 2 ∞ θ n E [ X l n ] n ! = 1 + θ 2 σ l 2 B l 2 2 ∑ n = 2 ∞ θ n − 2 E [ X l n ] n ! σ l 2 B l 2 / 2 ,
由于 E [ X l ] = 0 。定义 F l ( θ ) = ∑ n = 2 ∞ θ n − 2 E [ X l n ] n ! σ l 2 B l 2 / 2 ,我们可得
E [ exp ( θ X l ) ] = 1 + θ 2 σ l 2 B l 2 F l ( θ ) / 2 ≤ exp ( θ 2 σ l 2 B l 2 F l ( θ ) / 2 ) .
引入 F ( θ ) = max l ∈ [ M ] F l ( θ ) 并且应用 σ 2 : = ‖ ∑ l = 1 M B l 2 ‖ 2 → 2 ,我们从克拉默定理中(文献 [
P ( λ max ( ∑ l = 1 M X l ) ≥ t ) ≤ inf θ > 0 { exp ( − θ t ) t r exp ( ∑ l = 1 M ln E exp ( θ X l ) ) } ≤ inf θ > 0 { exp ( − θ t ) t r exp ( ∑ l = 1 M ln exp ( θ 2 σ l 2 B l 2 F l ( θ ) / 2 ) ) } ≤ exp ( − θ t ) t r exp ( ∑ l = 1 M ( θ 2 σ l 2 B l 2 F l ( θ ) / 2 ) )
由于 E [ X l n ] ≤ E [ | X l | n ] ,假设文献 [
F l ( θ ) ≤ ∑ n = 2 ∞ θ n − 2 E [ | X l | n ] n ! σ l 2 B l 2 / 2 ≤ ∑ n = 2 ∞ θ n − 2 n ! R n − 2 σ l 2 B l 2 / 2 n ! σ l 2 B l 2 / 2 = ∑ n = 2 ∞ ( θ R ) n − 2 = 1 1 − R θ
这里 R θ < 1 。所以 F ( θ ) ≤ ( 1 − R θ ) − 1 代入到以上概率不等式可得
P ( λ max ( ∑ l = 1 M X l ) ≥ t ) ≤ exp ( − θ t ) t r exp ( ∑ l = 1 M ( θ 2 σ l 2 B l 2 F l ( θ ) / 2 ) ) ≤ exp ( − θ t ) t r exp ( ∑ l = 1 M ( θ 2 σ l 2 B l 2 2 ( 1 − R θ ) ) ) = exp ( − θ t ) ∑ i = 1 d λ i ( exp ( ∑ l = 1 M ( θ 2 σ l 2 B l 2 2 ( 1 − R θ ) ) ) )
≤ d exp ( − θ t ) λ max ( exp ( ∑ l = 1 M ( θ 2 σ l 2 B l 2 2 ( 1 − R θ ) ) ) ) ≤ d exp ( − θ t ) exp ( θ 2 2 ( 1 − R θ ) λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) ) ≤ d exp ( − θ t + θ 2 2 ( 1 − R θ ) λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) )
现在我们选择 θ = t / ( R t + λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) ) ,很明显满足 R θ < 1 。这便推导出
P ( λ max ( ∑ l = 1 M X l ) ≥ t ) ≤ d exp ( − θ t + θ 2 2 ( 1 − R θ ) λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) ) ≤ d exp ( − t 2 R t + λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) + t 2 ( R t + λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) ) 2 2 ( 1 − R t R t + λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) ) λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) )
= d exp ( − t 2 / 2 R t + λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) ) = d exp ( − t 2 / 2 R t + ‖ ∑ l = 1 M σ l 2 B l 2 ‖ 2 → 2 ) ≤ d exp ( − t 2 / 2 R t + ‖ ∑ l = 1 M B l 2 ‖ 2 → 2 ) = d exp ( − t 2 / 2 R t + λ max ( ∑ l = 1 M ( σ l 2 B l 2 ) ) ) = d exp ( − t 2 / 2 R t + ‖ ∑ l = 1 M σ l 2 B l 2 ‖ 2 → 2 ) ≤ d exp ( − t 2 / 2 R t + ‖ ∑ l = 1 M B l 2 ‖ 2 → 2 ) = d exp ( − t 2 / 2 R t + σ 2 )
其中 σ l 2 ∈ ( 0 , 1 ] ,这时满足不等式
P ( λ max ( ∑ l = 1 M X l ) ≥ t ) ≤ d exp ( − t 2 / 2 R t + σ 2 ) .
结论得证。特别的,当随机变量是标量时的Bernstein不等式如下:
定理4.2:设 X 1 , ⋯ , X M 是独立零均值随机变量,对于任意的整数 n ≥ 2 ,满足以下不等式
E | X l | n ≤ n ! R n − 2 σ l 2 / 2 , ∀ l ∈ [ M ] , (16)
对于某些常数 R > 0 以及 σ l > 0 , l ∈ [ M ] 。那么,对于 ∀ t > 0 ,满足概率不等式
P ( | ∑ l = 1 M X l | ≥ t ) ≤ 2 exp ( − t 2 / 2 σ 2 + R t ) . (17)
其中 σ 2 : = ∑ l = 1 M σ l 2 。
以上内容摘自文献 [
本文在文献 [
郑 珂,宋儒瑛. 随机矩阵概率不等式类的证明The Proof of Probability Inequality of Random Matrix[J]. 应用数学进展, 2021, 10(07): 2410-2418. https://doi.org/10.12677/AAM.2021.107253
https://doi.org/10.1007/978-1-4614-4544-9
https://doi.org/10.1007/978-0-8176-4948-7
https://doi.org/10.1007/978-1-4757-2545-2_3
https://doi.org/10.1109/TIT.2011.2104999
https://doi.org/10.1214/ECP.v15-1544
https://doi.org/10.1007/s10208-011-9099-z