These are my recent Pinboard.in links:
-
Far better, if possible, to avoid direct confrontation and find ways to pursue infinite game play on the margins or edges of finite game institutions or in the white spaces not yet occupied by finite game institutions. By drawing attention to horizons that have not yet been explored and demonstrating the ability to make progress in drawing out more potential and possibility, infinite game players have a greater chance of shifting the game and attracting other players. By building parallel institutions and practices that pull others into their game, infinite game players can attract enough critical mass so that they can pursue their quests with lower risk of intervention from the finite game players who view such actions as deeply subversive. At our research center, JSB and I are now exploring these kinds of approaches as a way of achieving organizational change within large institutions.
-
[1107.0056] Fixed parameter algorithms for restricted coloring problems
In this paper, we obtain polynomial time algorithms to determine the acyclic chromatic number, the star chromatic number, the Thue chromatic number, the harmonious chromatic number and the clique chromatic number of $P_4$-tidy graphs and $(q,q-4)$-graphs, for every fixed $q$. These classes include cographs, $P_4$-sparse and $P_4$-lite graphs. All these coloring problems are known to be NP-hard for general graphs. These algorithms are fixed parameter tractable on the parameter $q(G)$, which is the minimum $q$ such that $G$ is a $(q,q-4)$-graph. We also prove that every connected $(q,q-4)$-graph with at least $q$ vertices is 2-clique-colorable and that every acyclic coloring of a cograph is also nonrepetitive.
-
Many features from texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books from 8 authors who lived in the 19th and 20th centuries, for which the following network measurements were obtained: clustering coefficient, average shortest path lengths, and betweenness. We found that the two factors with stronger dependency on the authors were the skewness in the distribution of word intermittency and the average shortest paths. Other factors such as the betweeness and the Zipf's law exponent show only weak dependency on authorship. Also assessed was the contribution from each measurement to authorship recognition using three machine learning methods. The best performance was a ca. 65 % accuracy upon combining complex network and intermittency features with the nearest neighbor algorithm. From a detailed analysis of the interdependence of the various metrics it is concluded that the methods used here are complementary for providing short- and long-scale perspectives of texts, which are useful for applications such as identification of topical words and information retrieval.
natural-language-processing document-clustering clustering feature-selection algorithms nudge-targets
-
[1108.1170] Convex Optimization without Projection Steps
For the general problem of minimizing a convex function over a compact convex domain, we will investigate a simple iterative approximation algorithm based on the method by Frank & Wolfe 1956, that does not need projection steps in order to stay inside the optimization domain. Instead of a projection step, the linearized problem defined by a current subgradient is solved, which gives a step direction that will naturally stay in the domain. Our framework generalizes the sparse greedy algorithm of Frank & Wolfe and its primal-dual analysis by Clarkson 2010 (and the low-rank SDP approach by Hazan 2008) to arbitrary convex domains. We give a convergence proof guaranteeing {epsilon}-small duality gap after O(1/{epsilon}) iterations.
The method allows us to understand the sparsity of approximate solutions for any l1-regularized convex optimization problem (and for optimization over the simplex), expressed as a function of the approximation quality. We obtain matching upper and lower bounds of {Theta}(1/{epsilon}) for the sparsity for l1-problems. The same bounds apply to low-rank semidefinite optimization with bounded trace, showing that rank O(1/{epsilon}) is best possible here as well. As another application, we obtain sparse matrices of O(1/{epsilon}) non-zero entries as {epsilon}-approximate solutions when optimizing any convex function over a class of diagonally dominant symmetric matrices.
We show that our proposed first-order method also applies to nuclear norm and max-norm matrix optimization problems. For nuclear norm regularized optimization, such as matrix completion and low-rank recovery, we demonstrate the practical efficiency and scalability of our algorithm for large matrix problems, as e.g. the Netflix dataset. For general convex optimization over bounded matrix max-norm, our algorithm is the first with a convergence guarantee, to the best of our knowledge.
-
[1112.6235] Detecting a Vector Based on Linear Measurements
We consider a situation where the state of a system is represented by a real-valued vector. Under normal circumstances, the vector is zero, while an event manifests as non-zero entries in this vector, possibly few. Our interest is in the design of algorithms that can reliably detect events (i.e., test whether the vector is zero or not) with the least amount of information. We place ourselves in a situation, now common in the signal processing literature, where information about the vector comes in the form of noisy linear measurements. We derive information bounds in an active learning setup and exhibit some simple near-optimal algorithms. In particular, our results show that the task of detection within this setting is at once much easier, simpler and different than the tasks of estimation and support recovery.
-
[1109.2215] Finding missing edges and communities in incomplete networks
Many algorithms have been proposed for predicting missing edges in networks, but they do not usually take account of which edges are missing. We focus on networks which have missing edges of the form that is likely to occur in real networks, and compare algorithms that find these missing edges. We also investigate the effect of this kind of missing data on community detection algorithms.
network-theory algorithms inference statistics nudge-targets
-
[1010.4735] Exploring the Energy Landscapes of Protein Folding Simulations with Bayesian Computation
Nested sampling is a Bayesian sampling technique developed to explore probability distributions lo- calised in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algo- rithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post-processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering (replica exchange). In this paper we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Go-type force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins which are commonly used for testing protein folding procedures: protein G, the SH3 domain of Src tyrosine kinase and chymotrypsin inhibitor 2. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used.
structural-biology biochemistry modeling algorithms statistics metamodeling
-
[1109.2618] Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schr"odinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute error of ~10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.
machine-learning learning-from-data biochemistry computational-science nudge-targets
-
[1101.2135] Bounded confidence model: addressed information maintain diversity of opinions
A community of agents is subject to a stream of messages, which are represented as points on a plane of issues. Messages are sent by media and by agents themselves. Messages from media shape the public opinion. They are unbiased, i.e. positive and negative opinions on a given issue appear with equal frequencies. In our previous work, the only criterion to receive a message by an agent is if the distance between this message and the ones received earlier does not exceed the given value of the tolerance parameter. Here we introduce a possibility to address a message to a given neighbour. We show that this option reduces the unanimity effect, what improves the collective performance.
agent-based communication network-theory machine-learning diversity