Projects:graph openstack
Project Name
Detecting OpenStack Failures with Graph-Based Methods
Project Description
It is hard to operate and debug systems like OpenStack, which integrate many independently developed modules with multiple levels of abstraction. A major challenge is navigating the complex dependencies and relationships among the states of different modules or subsystems in order to ensure that these states are correct and consistent. We present a system that captures the runtime states and events from the entire OpenStack-Ceph stack and automatically organizes these data into a graph that we call the system operation state graph (SOSG). With SOSG, we can use intuitive graph traversal techniques to solve problems such as reasoning about the state of a virtual machine. Using graph-based anomaly detection, we can also automatically discover hidden problems in OpenStack. We have built a scalable implementation of SOSG and evaluated the approach on a 125-node production OpenStack cluster, finding a number of interesting problems.
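The graph idea can be made concrete with a small sketch. The code below is not the SOSG implementation: it assumes a hypothetical event format of (source, source type, relation, target, target type) tuples and illustrative node types such as `vm`, `host`, `volume`, and `pool`. It shows the two operations the description mentions: a breadth-first traversal that collects the states reachable from a virtual machine within a few hops, and a deliberately simple stand-in for graph-based anomaly detection that flags nodes whose neighborhood shape is rare among nodes of the same type. That heuristic is an illustrative substitute, not the detector described in the paper.

```python
from collections import Counter, defaultdict, deque


class StateGraph:
    """Toy state graph: nodes are system entities, edges are relations
    observed in runtime events. Node types and relation names are
    hypothetical, chosen only for illustration."""

    def __init__(self):
        self.node_type = {}            # node id -> type label
        self.adj = defaultdict(set)    # node id -> {(relation, neighbor id)}

    def add_event(self, src, src_type, relation, dst, dst_type):
        """Record one observed relation as an undirected, labeled edge."""
        self.node_type[src] = src_type
        self.node_type[dst] = dst_type
        self.adj[src].add((relation, dst))
        self.adj[dst].add((relation, src))

    def related_states(self, start, max_hops=3):
        """Breadth-first traversal: return {node: hop distance} for every
        node reachable from start within max_hops edges."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue  # do not expand past the hop limit
            for _, neighbor in self.adj.get(node, ()):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        return dist

    def neighbor_signature(self, node):
        """Sorted tuple of distinct (relation, neighbor type) pairs."""
        pairs = {(rel, self.node_type[n]) for rel, n in self.adj.get(node, ())}
        return tuple(sorted(pairs))

    def rare_signature_nodes(self, node_type, threshold=0.5):
        """Flag nodes of node_type whose signature is shared by less than
        `threshold` of all nodes of that type (a simple heuristic)."""
        nodes = [n for n, t in self.node_type.items() if t == node_type]
        counts = Counter(self.neighbor_signature(n) for n in nodes)
        return sorted(n for n in nodes
                      if counts[self.neighbor_signature(n)] / len(nodes) < threshold)


if __name__ == "__main__":
    g = StateGraph()
    for vm in ("vm1", "vm2", "vm3"):
        g.add_event(vm, "vm", "runs_on", "host1", "host")
    g.add_event("vm1", "vm", "attached", "vol1", "volume")
    g.add_event("vm2", "vm", "attached", "vol2", "volume")
    for vol in ("vol1", "vol2"):
        g.add_event(vol, "volume", "stored_in", "rbd", "pool")

    print(g.related_states("vm1"))          # states reachable from vm1
    print(g.rare_signature_nodes("vm"))     # ['vm3']
```

In this toy example `vm3` is flagged because it is the only VM without an attached volume. In a real deployment the signature definition and threshold would need tuning, and a missing edge could just as well reflect incomplete event collection as a genuine fault.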
Participants
Yong Xiang
Wei Xu
Related Materials
Project Progress
November 15, 2014
Project started
August 5, 2016
Paper published in Proceedings of the ACM SIGOPS Asia-Pacific Workshop on Systems (APSys '16), Hong Kong, China, 2016 [Best Paper Award]