Items of some interest…

These are my recent Pinboard.in links:

  • [1108.4135] Complex-Valued Autoencoders

    "Autoencoders are unsupervised machine learning circuits whose learning goal is to minimize a distortion measure between inputs and outputs. Linear autoencoders can be defined over any field and only real-valued linear autoencoder have been studied so far. Here we study complex-valued linear autoencoders where the components of the training vectors and adjustable matrices are defined over the complex field with the $L_2$ norm. We provide simpler and more general proofs that unify the real-valued and complex-valued cases, showing that in both cases the landscape of the error function is invariant under certain groups of transformations. The landscape has no local minima, a family of global minima associated with Principal Component Analysis, and many families of saddle points associated with orthogonal projections onto sub-space spanned by sub-optimal subsets of eigenvectors of the covariance matrix. The theory yields several iterative, convergent, learning algorithms, a clear understanding of the generalization properties of the trained autoencoders, and can equally be applied to the hetero-associative case when external targets are provided. Partial results on deep architecture as well as the differential geometry of autoencoders are also presented. The general framework described here is useful to classify autoencoders and identify general common properties that ought to be investigated for each class, illuminating some of the connections between information theory, unsupervised learning, clustering, Hebbian learning, and auto encoders."

    neural-networks machine-learning classification encoding algorithms nudge-targets

  • [1108.5685] Predicting flow reversals in chaotic natural convection using data assimilation

    "A simplified model of natural convection, similar to the Lorenz (1963) system, is compared to computational fluid dynamics simulations in order to test data assimilation methods and better understand the dynamics of convection. The thermosyphon is represented by a long time flow simulation, which serves as a reference "truth". Forecasts are then made using the Lorenz-like model and synchronized to noisy and limited observations of the truth using data assimilation. The resulting analysis is observed to infer dynamics absent from the model when using short assimilation windows.

    Furthermore, chaotic flow reversal occurrence and residency times in each rotational state are forecast using analysis data. Flow reversals have been successfully forecast in the related Lorenz system, as part of a perfect model experiment, but never in the presence of significant model error or unobserved variables. Finally, we provide new details concerning the fluid dynamical processes present in the thermosyphon during these flow reversals."

    chaos dynamical-systems experiment prediction numerical-methods algorithms nudge-targets

  • [1108.1320] Compressed Matrix Multiplication

    "Motivated by the problems of computing sample covariance matrices, and of transforming a collection of vectors to a basis where they are sparse, we present a simple algorithm that computes an approximation of the product of two n-by-n real matrices A and B.…"

    approximation algorithms nudge-targets

  • [1110.5296] Computing a Longest Common Palindromic Subsequence

    "The {em longest common subsequence (LCS)} problem is a classic and well-studied problem in computer science. Palindrome is a word which reads the same forward as it does backward. The {em longest common palindromic subsequence (LCPS)} problem is an interesting variant of the classic LCS problem which finds the longest common subsequence between two given strings such that the computed subsequence is also a palindrome. In this paper, we study the LCPS problem and give efficient algorithms to solve this problem. To the best of our knowledge, this is the first attempt to study and solve this interesting problem."

    combinatorics strings algorithms nudge-targets

Items of some interest…

These are my recent Pinboard.in links:

  • [1110.0585] Discriminately Decreasing Discriminability with Learned Image Filters

    "In machine learning and computer vision, input images are often filtered to increase data discriminability. In some situations, however, one may wish to purposely decrease discriminability of one classification task (a "distractor" task), while simultaneously preserving information relevant to another (the task-of-interest): For example, it may be important to mask the identity of persons contained in face images before submitting them to a crowdsourcing site (e.g., Mechanical Turk) when labeling them for certain facial attributes. Another example is inter-dataset generalization: when training on a dataset with a particular covariance structure among multiple attributes, it may be useful to suppress one attribute while preserving another so that a trained classifier does not learn spurious correlations between attributes. In this paper we present an algorithm that finds optimal filters to give high discriminability to one task while simultaneously giving low discriminability to a distractor task. We present results showing the effectiveness of the proposed technique on both simulated data and natural face images."

    machine-learning data-preparation filtering algorithms nudge-targets

  • [1110.5183] Diffusion of Information in Robot Swarms

    "This work is devoted to communication approaches, which spread information in robot swarms. These mechanisms are useful for large-scale systems and also for such cases when a limited communication equipment does not allow routing of information packages. We focus on two approaches such as virtual fields and epidemic algorithms, discuss several aspects of hardware implementation and demonstrate experiments performed with microrobots "Jasmine"."

    agent-based swarms communication complex-systems epidemiology dynamical-systems experiment

  • [1109.5229] Distributed Algorithms for Optimal Power Flow Problem

    "Optimal power flow (OPF) is an important problem for power generation and it is in general non-convex. With the employment of renewable energy, it will be desirable if OPF can be solved very efficiently so its solution can be used in real time. With some special network structure, e.g. trees, the problem has been shown to have a zero duality gap and the convex dual problem yields the optimal solution. In this paper, we propose a primal and a dual algorithm to coordinate the smaller subproblems decomposed from the convexified OPF. We can arrange the subproblems to be solved sequentially and cumulatively in a central node or solved in parallel in distributed nodes. We test the algorithms on IEEE radial distribution test feeders, some random tree-structured networks, and the IEEE transmission system benchmarks. Simulation results show that the computation time can be improved dramatically with our algorithms over the centralized approach of solving the problem without decomposition, especially in tree-structured problems. The computation time grows linearly with the problem size with the cumulative approach while the distributed one can have size-independent computation time."

    operations-research algorithms network-theory infrastructure composition nudge-targets

  • Coupon collector’s problem – Wikipedia, the free encyclopedia

    "In probability theory, the coupon collector's problem describes the "collect all coupons and win" contests. It asks the following question: Suppose that there are n coupons, from which coupons are being collected with replacement. What is the probability that more than t sample trials are needed to collect all n coupons? An alternative statement is: Given n coupons, how many coupons do you expect you need to draw with replacement before having drawn each coupon at least once. The mathematical analysis of the problem reveals that the expected number of trials needed grows as Θ(nlog(n)). For example, when n = 50 it takes about 225 trials to collect all 50 coupons."

    probability-theory slic-representation nudge-targets

  • [1107.2379] Data Stability in Clustering: A Closer Look

    "This paper considers the model introduced by Bilu and Linial (2010), who study problems for which the optimal clustering does not change when the distances are perturbed by multiplicative factors. They show that even when a problem is NP-hard, it is sometimes possible to obtain polynomial-time algorithms for instances resilient to large perturbations, e.g. on the order of $O(sqrt{n})$ for max-cut clustering. Awasthi et al. (2010) extend this line of work by considering center-based objectives, and Balcan and Liang (2011) consider the $k$-median and min-sum objectives, giving efficient algorithms for instances resilient to certain constant multiplicative perturbations.

    Here, we are motivated by the question of to what extent these assumptions can be relaxed while allowing for efficient algorithms. We show there is little room to improve these results by giving NP-hardness lower bounds for both the $k$-median and min-sum objectives. On the other hand, we show that multiplicative resilience parameters, even only on the order of $Theta(1)$, can be so strong as to make the clustering problem trivial, and we exploit these assumptions to present a simple one pass streaming algorithm for the $k$-median objective. We also consider a model of additive perturbations and give a correspondence between additive and multiplicative notions of stability. Our results provide a close examination of the consequences of assuming, even constant, stability in data."

    clustering statistics algorithms robustness nudge-targets

  • [1109.5641] Energy-Aware Load Balancing in Content Delivery Networks

    "Internet-scale distributed systems such as content delivery networks (CDNs) operate hundreds of thousands of servers deployed in thousands of data center locations around the globe. Since the energy costs of operating such a large IT infrastructure are a significant fraction of the total operating costs, we argue for redesigning CDNs to incorporate energy optimizations as a first-order principle. We propose techniques to turn off CDN servers during periods of low load while seeking to balance three key design goals: maximize energy reduction, minimize the impact on client-perceived service availability (SLAs), and limit the frequency of on-off server transitions to reduce wear-and-tear and its impact on hardware reliability. We propose an optimal offline algorithm and an online algorithm to extract energy savings both at the level of local load balancing within a data center and global load balancing across data centers. We evaluate our algorithms using real production workload traces from a large commercial CDN. Our results show that it is possible to reduce the energy consumption of a CDN by more than 55% while ensuring a high level of availability that meets customer SLA requirements and incurring an average of one on-off transition per server per day. Further, we show that keeping even 10% of the servers as hot spares helps absorb load spikes due to global flash crowds with little impact on availability SLAs. Finally, we show that redistributing load across proximal data centers can enhance service availability significantly, but has only a modest impact on energy savings."

    algorithms load-balancing infrastructure nudge-targets operations-research

  • [1110.0264] Face Recognition using Optimal Representation Ensemble

    "Recently, the face recognizers based on linear representations have been shown to deliver state-of-the-art performance. In real-world applications, however, face images usually suffer from expressions, disguises and random occlusions. The problematic facial parts undermine the validity of the linear-subspace assumption and thus the recognition performance deteriorates significantly. In this work, we address the problem in a learning-inference-mixed fashion. By observing that the linear-subspace assumption is more reliable on certain face patches rather than on the holistic face, some Bayesian Patch Representations (BPRs) are randomly generated and interpreted according to the Bayes' theory. We then train an ensemble model over the patch-representations by minimizing the empirical risk w.r.t the "leave-one-out margins". The obtained model is termed Optimal Representation Ensemble (ORE), since it guarantees the optimality from the perspective of Empirical Risk Minimization. To handle the unknown patterns in test faces, a robust version of BPR is proposed by taking the non-face category into consideration. Equipped with the Robust-BPRs, the inference ability of ORE is increased dramatically and several record-breaking accuracies (99.9% on Yale-B and 99.5% on AR) and desirable efficiencies (below 20 ms per face in Matlab) are achieved. It also overwhelms other modular heuristics on the faces with random occlusions, extreme expressions and disguises. Furthermore, to accommodate immense BPRs sets, a boosting-like algorithm is also derived. The boosted model, a.k.a Boosted-ORE, obtains similar performance to its prototype. Besides the empirical superiorities, two desirable features of the proposed methods, namely, the training-determined model-selection and the data-weight-free boosting procedure, are also theoretically verified."

    image-analysis face-recognition algorithms nudge-targets

  • [1110.0957] Dictionary Learning for Deblurring and Digital Zoom

    "This paper proposes a novel approach to image deblurring and digital zooming using sparse local models of image appearance. These models, where small image patches are represented as linear combinations of a few elements drawn from some large set (dictionary) of candidates, have proven well adapted to several image restoration tasks. A key to their success has been to learn dictionaries adapted to the reconstruction of small image patches. In contrast, recent works have proposed instead to learn dictionaries which are not only adapted to data reconstruction, but also tuned for a specific task. We introduce here such an approach to deblurring and digital zoom, using pairs of blurry/sharp (or low-/high-resolution) images for training, as well as an effective stochastic gradient algorithm for solving the corresponding optimization task. Although this learning problem is not convex, once the dictionaries have been learned, the sharp/high-resolution image can be recovered via convex optimization at test time. Experiments with synthetic and real data demonstrate the effectiveness of the proposed approach, leading to state-of-the-art performance for non-blind image deblurring and digital zoom."

    image-processing image-analysis algorithms nudge-targets hyperresolution

  • [1110.0477] Distributed Evolutionary Graph Partitioning

    "We present a novel distributed evolutionary algorithm, KaFFPaE, to solve the Graph Partitioning Problem, which makes use of KaFFPa (Karlsruhe Fast Flow Partitioner). The use of our multilevel graph partitioner KaFFPa provides new effective crossover and mutation operators. By combining these with a scalable communication protocol we obtain a system that is able to improve the best known partitioning results for many inputs in a very short amount of time. For example, in Walshaw's well known benchmark tables we are able to improve or recompute 76% of entries for the tables with 1%, 3% and 5% imbalance."

    algorithms graph-theory evolutionary-algorithms nudge-targets

  • [1104.1605] Efficient Top-K Retrieval in Online Social Tagging Networks

    "We consider in this paper top-k query answering in social tagging systems, also known as folksonomies. This problem requires a significant departure from existing, socially agnostic techniques. In a network-aware context, one can (and should) exploit the social links, which can indicate how users relate to the seeker and how much weight their tagging actions should have in the result build-up. We propose an algorithm that has the potential to scale to current applications. While the problem has already been considered in previous literature, this was done either under strong simplifying assumptions or under choices that cannot scale to even moderate-size real world applications. We first consider a key aspect of the problem, which is accessing the closest or most relevant users for a given seeker. We describe how this can be done on the fly (without any pre-computations) for several possible choices – arguably the most natural ones – of proximity computation in a user network. Based on this, our top-k algorithm is sound and complete, while addressing the scalability issues of the existing ones. Importantly, our technique is instance optimal in the case when the search relies exclusively on the social weight of tagging actions. To further reduce response times, we then consider directions for efficiency by approximation. Extensive experiments on real world data show that our techniques can drastically improve the response time, without sacrificing precision."

    folksonomy tagging network-theory search-algorithms nudge-targets data-access

Items of some interest…

These are my recent Pinboard.in links:

  • Hebrew Typography

    beautiful lettering

    typography hebrew graphic-design calligraphy lettering

  • [1110.5376] A Quantitative Test of Population Genetics Using Spatio-Genetic Patterns in Bacterial Colonies

    "It is widely accepted that population genetics theory is the cornerstone of evolutionary analyses. Empirical tests of the theory, however, are challenging because of the complex relationships between space, dispersal, and evolution. Critically, we lack quantitative validation of the spatial models of population genetics. Here we combine analytics, on and off-lattice simulations, and experiments with bacteria to perform quantitative tests of the theory. We study two bacterial species, the gut microbe Escherichia coli and the opportunistic pathogen Pseudomonas aeruginosa, and show that spatio-genetic patterns in colony biofilms of both species are accurately described by an extension of the one-dimensional stepping-stone model. We use one empirical measure, genetic diversity at the colony periphery, to parameterize our models and show that we can then accurately predict another key variable: the degree of short-range cell migration along an edge. Moreover, the model allows us to estimate other key parameters including effective population size (density) at the expansion frontier. While our experimental system is a simplification of natural microbial community, we argue it is a proof of principle that the spatial models of population genetics can quantitatively capture organismal evolution."

    bacterial-genetics evolution microbiology experiment cute

  • NDFD Database Contents

    "You can access NDFD elements via file transfer protocol (ftp), http, eXtensible Markup Language (XML), or web browser. Links to the data, supporting information and software are listed below:…"

    weather data raw-data-now government2.0 nudge-targets reference forecasts

  • The Performativity of Networks – Kieran Healy

    "The “performativity thesis” is the claim that parts of contemporary economics and finance, when carried out into the world by professionals and popularizers, reformat and reorganize the phenomena they purport to describe, in ways that bring the world into line with theory. Practical technologies, calculative devices and portable algorithms give actors tools to implement particular models of action. I argue that social network analysis is performative in the same sense as the cases studied in this literature. Social network analysis and finance theory are similar in key aspects of their development and effects. For the case of economics, evidence for weaker versions of the performativity thesis in quite good, and the strong formulation is circumstantially supported. Network theory easily meets the evidential threshold for the weaker versions; I offer empirical examples that support the strong (or “Barnesian”) formulation. Whether these parallels are a mark in favor of the thesis or a strike against it is an open question. I argue that the social network technologies and models now being “performed” build out systems of generalized reciprocity, connectivity, and commons-based production. This is in contrast both to an earlier network imagery that emphasized self-interest and entrepreneurial exploitation of structural opportunities, and to the model of action typically considered to be performed by economic technologies."

    network-theory network-culture economics cultural-dynamics theory-and-practice-sitting-in-a-tree

Items of some interest…

These are my recent Pinboard.in links:

Items of some interest…

These are my recent Pinboard.in links:

  • Vaguery/collegeJournalOfMedicalScienceSeptember1857 – GitHub

    "I've scanned pages of the September 1857 issue of The College Journal of Medical Science, OCRed these to html (to preserve some formatting) with ABBYY FineReader 9, and am converting those html files into LaTeX files.

    The latest PDF version (as constructed on my computer) can be downloaded (via "view raw") from http://github.com/Vaguery/collegeJournalOfMedicalScienceSeptember1857/blob/master/work.pdf

    A collection of scripts and checklists is coming out of this: scripts to do the heavy lifting translating self-contained HTML to TeX intended to be strung together into a single work, and checklists of small proofreading and hand-formatting tasks that need to be completed on each page.

    The individual page TeX files are stitched together and typeset in ./work.tex, using XeLaTeX. Be sure to check the font assignments; I'm using purchased postscript fonts I own."

    project typesetting digitization experiment proofreading