2010: Watched an iPad app startup fail while trying to build a first-generation news-reading app; the founders were cloud architects who saw the rise of mobile but didn't see the power of ML.
2012: Started learning deep learning. Built the first GPU machines at BUPT and in the MSRA ML group; ported Theano to Windows; mined some bitcoins and litecoins with those GPUs, and later came to feel blockchain was a scam.
2013–2016: My supervisor, Dale Schuurmans, is the best. He never asked me to do anything; he just funded whatever I wanted to do, which deeply influenced my management style. When Facebook rescinded my offer after I lost the H‑1B lottery twice, he increased my scholarship so I could stay.
2015: After Facebook refused to sponsor my O‑1 visa (I didn't have 50 citations), I started paying attention to research and publication. I created an architecture identical to SqueezeNet, submitted it to ICONIP, then withdrew it because I thought the work was trivial and the conference quality too low. It later became a chapter of my master's thesis.
2016: In parallel, I created fast style transfer and helped Dato get acquired by Apple.
2019: In parallel, I developed EfficientNet-like ideas; a few weeks after I posted them internally, Google published the systematic paper.
2019: Built a "GEMM Net," which replaced convolutions with GEMMs for computer vision. But I was brain-damaged enough to use soft attention rather than self-attention.
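The source doesn't describe the "GEMM Net" architecture itself, but the underlying idea that convolution can be expressed as a single matrix multiply is standard. A minimal NumPy sketch of the classic im2col trick (function names and shapes are my own illustration, not the original design):

```python
import numpy as np

def conv2d_as_gemm(x, w):
    """Express a 2-D convolution as one GEMM via im2col.

    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Returns output of shape (K, H-R+1, W-S+1) (valid padding, stride 1).
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    OH, OW = H - R + 1, W - S + 1
    # im2col: unfold every receptive field into one column.
    cols = np.empty((C * R * S, OH * OW))
    for i in range(OH):
        for j in range(OW):
            cols[:, i * OW + j] = x[:, i:i + R, j:j + S].ravel()
    # A single GEMM: (K, C*R*S) @ (C*R*S, OH*OW).
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, OH, OW)
```

Once the convolution is a plain GEMM, it can ride on the heavily optimized matrix-multiply kernels that hardware vendors ship, which is the appeal of a GEMM-only vision network.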
2019: Advocated "The Bitter Lesson" at Facebook.
2023: Raising funds for HippoML gave me permanent PTSD about VCs, especially blonde-haired partners.