Projects:graph openstack

From IIIS-Systems

Latest revision as of 16:35, 24 November 2016

Project Name

Detecting OpenStack Problems with Graph-Based Methods

Project Description

It is hard to operate and debug systems like OpenStack that integrate many independently developed modules with multiple levels of abstraction. A major challenge is navigating the complex dependencies and relationships among the states of different modules or subsystems in order to ensure that these states are correct and consistent. We present a system that captures runtime states and events from the entire OpenStack-Ceph stack and automatically organizes these data into a graph that we call the system operation state graph (SOSG). With SOSG, we can use intuitive graph traversal techniques to solve problems such as reasoning about the state of a virtual machine. Using graph-based anomaly detection, we can also automatically discover hidden problems in OpenStack. We have built a scalable implementation of SOSG and evaluated the approach on a 125-node production OpenStack cluster, where it found a number of interesting problems.
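The two ideas in the description, graph traversal to reason about a VM's state and graph-based anomaly detection, can be illustrated with a small in-memory graph. This is a hypothetical sketch, not the SOSG implementation: the `StateGraph` class, the node types (`vm`, `host`, `volume`) and the state strings are assumptions made for illustration. The anomaly check is a simple structural-outlier heuristic swapped in for this example (flag a node whose neighbor types differ from those of most nodes of the same type); it is not the detection algorithm used by the project.

```python
from collections import Counter, defaultdict, deque


class StateGraph:
    """Hypothetical toy stand-in for a system operation state graph.

    Nodes are system entities (VMs, hosts, volumes, ...) carrying a type and a
    runtime state; undirected edges record relationships between them.
    """

    def __init__(self):
        self.nodes = {}              # node id -> {"type": str, "state": str}
        self.adj = defaultdict(set)  # node id -> set of neighbor ids

    def add_node(self, node_id, node_type, state):
        self.nodes[node_id] = {"type": node_type, "state": state}

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighborhood(self, start, max_depth=2):
        """Breadth-first traversal: map each node within max_depth hops to its hop count."""
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_depth:
                continue  # do not expand beyond the depth limit
            for nxt in self.adj[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        return depth

    def related_states(self, start, max_depth=1):
        """States of everything near `start`, e.g. a VM plus its host and volume."""
        return {n: self.nodes[n]["state"] for n in self.neighborhood(start, max_depth)}

    def type_signature(self, node_id):
        """Multiset of neighbor types, as a hashable frozenset of (type, count) pairs."""
        counts = Counter(self.nodes[n]["type"] for n in self.adj[node_id])
        return frozenset(counts.items())

    def anomalies(self, node_type):
        """Flag nodes of `node_type` whose neighbor-type signature differs from the most common one."""
        sigs = {n: self.type_signature(n)
                for n, attrs in self.nodes.items() if attrs["type"] == node_type}
        if not sigs:
            return []
        majority, _ = Counter(sigs.values()).most_common(1)[0]
        return sorted(n for n, sig in sigs.items() if sig != majority)


# Toy example: three VMs on one host; only vm-a and vm-b have attached volumes.
g = StateGraph()
g.add_node("host-1", "host", "up")
for vm in ("vm-a", "vm-b", "vm-c"):
    g.add_node(vm, "vm", "ACTIVE")
    g.add_edge(vm, "host-1")
for vm, vol in (("vm-a", "vol-a"), ("vm-b", "vol-b")):
    g.add_node(vol, "volume", "in-use")
    g.add_edge(vm, vol)

print(g.related_states("vm-a"))  # states of vm-a, its host, and its volume
print(g.anomalies("vm"))         # ['vm-c']
```

In this toy graph `vm-c` is flagged because, unlike the other VMs, it has no attached volume. In a real deployment such a difference may be legitimate (for example, a VM booted from an image rather than a volume), so flagged nodes are best treated as candidates for inspection rather than confirmed faults.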

Participants

Yong Xiang

Wei Xu

Related Materials

Xy1.jpg

File:XY.pdf

Project Progress

15 November 2014

Project started

5 August 2016

In Proceedings of ACM SIGOPS Asia-Pacific Workshop on Systems (APSys'16) [BEST PAPER AWARD] Hong Kong, China, 2016

Related Literature

[1] https://www.openstack.org/

[2] https://www.rabbitmq.com/

[3] https://libvirt.org/

[4] http://www.linux-kvm.org/page/Main_Page

[5] http://openvswitch.org/

[6] http://ceph.com/

[7] Xu W, Huang L, Fox A, et al. “Detecting large-scale system problems by mining console logs.” Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 2009: 117-132.

[8] Yuan D, et al. “SherLog: error diagnosis by connecting clues from run-time logs.” ACM SIGARCH Computer Architecture News. Vol. 38. No. 1. ACM, 2010.

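[9] Hanam Q, Tan L, Holmes R, et al. “Finding Patterns in Static Analysis Alerts: Improving Actionable Alert Ranking.” Proceedings of the 11th Working Conference on Mining Software Repositories (MSR). ACM, 2014.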

[10] http://neo4j.com/

[11] Gonzalez J E, et al. “GraphX: Graph processing in a distributed dataflow framework.” 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014.

[12] Turner D, Levchenko K, Snoeren A C, et al. “California fault lines: understanding the causes and impact of network failures.” ACM SIGCOMM Computer Communication Review, 2011, 41(4): 315-326.

[13] Mace J, Roelke R, Fonseca R. “Pivot tracing: dynamic causal monitoring for distributed systems.” Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 2015.