
Flexjoin

Project Category: Entrepreneurial

If you have any questions, please join our presentation on April 13th

About our project

What is the problem?

Modern software is trending towards a microservice architecture, which causes companies to spread and separate their data across various data sources. However, this separated data becomes a problem when there are clear reasons why it should be connected together.

For example, Tesla stored data on which cars a customer owns in one service, and information on whether they own solar panels in another. This separation made it hard to get a full view of the asset, which here is the customer. One strong use case for this full view is up-selling: it is easier to up-sell a customer when there is a clear picture of what they already own.

Overall, the initial motivation for Flexjoin comes from the lack of developer solutions to easily combine data sources as described in the example with Tesla. Therefore, we have designed Flexjoin to be a Software as a Service (SaaS) solution to stream data from these separated data sources by creating modifiable relationships that connect together to form something we call a Data Model.

This Data Model can be anything: it could represent customers for a business, or it could represent assets, such as an oil rig, for a company. We showcase these models and their connections in a graph database because it can show relations between data dynamically. The finalized data can then be analyzed for any use case, such as up-selling.

What is our solution?

Purpose of Flexjoin

Entrepreneurial Aspects of Flexjoin

Our team has developed Flexjoin to have the following main features which allow it to differentiate itself from competitors:

  • Dynamic Data Models: Flexjoin allows customers to update, modify and delete relationships on their data model after creation.
  • Multiple Data Source Types: Flexjoin currently supports multiple SQL databases (MySQL, PostgreSQL, SQL Server). We aim to continue supporting these and to add the ability to combine data from NoSQL databases like MongoDB and from other sources of data like Excel.
  • Real-Time Data Streaming: Any change in the source database is streamed in real time to the data model, so the values on the data model are always up to date.
  • Lower Latency: One query to our model returns a result that already contains the combined data. Latency is low because this is only one call to one service. With current solutions, a user has to make a service call to every disconnected data source to connect the data, which has higher latency.
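
To make the latency point concrete, the sketch below compares the two access patterns by counting service round trips rather than measuring real network latency. The service names, the sample records, and the `CallCounter` helper are all hypothetical and are not part of Flexjoin.

```python
# Hypothetical illustration: counts round trips, not real network latency.


class CallCounter:
    """Wraps a lookup table and counts how many times it is queried."""

    def __init__(self, records):
        self.records = records
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self.records.get(key)


# Two separate services, each with its own data (assumed sample data).
car_service = CallCounter({"alice": ["Model 3", "Model Y"]})
solar_service = CallCounter({"alice": {"has_solar_panels": True}})

# A combined data model that already holds the joined view.
combined_model = CallCounter(
    {"alice": {"cars": ["Model 3", "Model Y"], "has_solar_panels": True}}
)


def fan_out_lookup(customer):
    """Without a combined model: one call per disconnected data source."""
    return {
        "cars": car_service.get(customer),
        "has_solar_panels": solar_service.get(customer)["has_solar_panels"],
    }


def combined_lookup(customer):
    """With a combined model: a single call returns the joined view."""
    return combined_model.get(customer)


fan_out_result = fan_out_lookup("alice")
combined_result = combined_lookup("alice")
fan_out_calls = car_service.calls + solar_service.calls
```

Both lookups return the same record, but the fan-out version needs one call per data source, so its cost grows with the number of sources while the combined lookup stays at one call.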

Unique Value Proposition

Competitors

  • In-House Solutions: This is the approach Tesla took when they needed to connect data. With this approach, a company develops its own solutions to connect data. However, the development time for a single relationship or connection is far too long (months to incorporate changes). Flexjoin allows data to be connected much more easily and quickly.
  • Power BI and Tableau: Both Power BI and Tableau mainly cover the analytics side of the data. They lack the real-time processing needed to take data from varying data sources and connect it together, which is what Flexjoin offers.
  • Pentaho: Pentaho is the closest current solution to what we offer. However, the relationships between data that Pentaho creates are only a snapshot of the data at that moment. Flexjoin creates data pipelines that run indefinitely, which keeps the Data Model up to date in real time.

Here are a few use-cases where customers can use the resulting combined data that is created by Flexjoin:

  • Upselling: As in the Tesla example, connecting data lets companies see which assets their customers own. This can help them identify loyal buyers as first contact points for new products they are launching.
  • Sentiment Analysis: If a company stores the reviews that customers leave on its products, it can connect all the reviews of a customer to a real person and run sentiment analysis on them. This can help a business decide which types of products it should or shouldn’t advertise to that customer, and which products it should or shouldn’t sell.
  • Trend Analysis: By using a stock as the basis for a data model, we could connect various data sources related to historical stock information and analyze it with machine learning to predict future trends.

These are just a few use cases of connected data. Overall, connecting data reveals context that businesses can analyze, and Flexjoin gives customers a fast path to that connected data!

Use Case(s) of Combined Data

Pricing and Future Development

Our solution is a Software as a Service offering which, when released, would be sold in separate tiers. Currently only one tier has been determined: $1,500 per month for 3 data models with a maximum of 5 connections each.

In terms of future development, the team will initially apply for funding through startup seeding organizations (after the capstone fair). If we receive substantial funding, we will consider the project as a possible full-time commitment. As of right now, we plan to continue developing Flexjoin part-time.


Meet our team members

Muhammad Qasim

Project Manager, Lead Backend Developer

 

Jimmy Truong

Infrastructure and Full-Stack Developer

 

Daniel Guieb

Full-Stack Developer

 

Trevor Le

Lead Frontend Developer

 

Details about our design

The following section gives a more technical view of Flexjoin than the previous sections, which were written for all viewers regardless of experience. We hope it answers any technical questions. Remember to join the Zoom room if you have any questions!

Flexjoin gives developers a front-facing user interface where they can define their separated data sources and then use them to create relationships for their model. Below we have provided the overall architecture of Flexjoin and the various moving parts that make it work. The front-end portion of the project is indicated by the numbers 1 and 2, whilst the majority of the project is in the back-end, which is labelled 3, 4 and 5.

Flexjoin Architecture

How our design addresses practical issues

Modern software is moving towards the use of microservices, which results in data becoming separated. This is because microservices often use a database-per-service approach, so every service has its own separate database.

However, there are various reasons to combine this data, as discussed in the first section. There are competing solutions to this problem, but Flexjoin aims to fix the following practical issues with them.

  • Real-Time Streaming
  • Lower Latency
  • Dynamic Relationships
  • Multiple Data Source Types

These were all discussed in the unique value proposition section, so we will not reiterate the same details.

Flexjoin combines industry-standard frameworks with growing technologies for its innovative solution.

The frontend application is made with React to create a user-interface that developers can use to create their data models. Overall, the frontend is used to define/update where the models get their data, and how it is connected.

The backend application is the main component of Flexjoin, and is where our team used Kafka, Java Spring and Neo4J together in a completely innovative way. We use Java Spring as our microservice basis to connect Kafka and Neo4J together.

Kafka is a robust and fault-tolerant data streaming platform that we use to capture and stream data from a data source indefinitely. The stream captures events on the source database, which Flexjoin uses to keep the data models up to date in real time (instead of relying on snapshots).
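
The difference between following a stream of change events and taking a snapshot can be sketched as follows. This is a simplified in-memory illustration that does not use Kafka itself; the event shape (`op`, `key`, `row`) and the sample rows are assumptions made for the example, not Flexjoin's actual message format.

```python
def apply_event(table, event):
    """Apply one change event to an in-memory copy of a source table."""
    op = event["op"]
    if op in ("insert", "update"):
        table[event["key"]] = event["row"]
    elif op == "delete":
        table.pop(event["key"], None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return table


# State of the source table when the pipeline starts (assumed sample data).
source = {1: {"Owner": "alice", "Model": "Model 3"}}

# A snapshot is copied once and never changes afterwards.
snapshot = dict(source)

# A streamed copy starts from the same state and then follows every event.
streamed = dict(source)
events = [
    {"op": "insert", "key": 2, "row": {"Owner": "bob", "Model": "Model S"}},
    {"op": "update", "key": 1, "row": {"Owner": "alice", "Model": "Model Y"}},
    {"op": "delete", "key": 2},
]
for event in events:
    apply_event(streamed, event)
```

After the three events, `streamed` reflects the latest state of the source, while `snapshot` still shows the data as it was when the copy was taken.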

The finalized data is all pushed to Neo4J, which is a graph database. The use of a graph database allows Flexjoin to be extremely dynamic, as the relationships between data from separate sources can be changed and updated by using our user-interface.
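
One reason a graph database suits changing relationships is that a relationship is stored separately from the nodes it connects, so it can be created or removed without touching the nodes. The sketch below builds Cypher statements for both operations. The Cypher syntax is standard, but whether Flexjoin issues statements like these is an assumption, and the relationship type `OWNS` and both helper functions are hypothetical. The labels and columns reuse the users and vehicleowners tables and the Name and Owner columns from the walkthrough.

```python
def link_statement(left_label, left_col, right_label, right_col, rel_type):
    """Build Cypher that links nodes whose join columns hold equal values."""
    return (
        f"MATCH (a:{left_label}), (b:{right_label}) "
        f"WHERE a.{left_col} = b.{right_col} "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


def unlink_statement(left_label, right_label, rel_type):
    """Build Cypher that removes the relationships but keeps the nodes."""
    return (
        f"MATCH (:{left_label})-[r:{rel_type}]->(:{right_label}) "
        "DELETE r"
    )


# OWNS is a hypothetical relationship type chosen for this example.
create = link_statement("users", "Name", "vehicleowners", "Owner", "OWNS")
remove = unlink_statement("users", "vehicleowners", "OWNS")
print(create)
print(remove)
```

Because labels and relationship types cannot be passed as query parameters in Cypher, they are interpolated into the text here; in practice they should only come from validated input.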

What makes our design innovative

What makes our design solution effective

  • Modern Architecture: Our architecture separates the main services it offers, and as a result we have created three microservices in the backend. This allows us to distribute calls across services so that none of them is overburdened.
  • Innovation with growing technologies: Kafka and Neo4J are not usually used together in such a dynamic way. A Spring service connects these technologies, allowing us to stream data from different sources and to elegantly connect and display the streamed data in Neo4J. Both of these technologies are still growing, and we have shown an effective way to use them together.
  • Focus on infrastructure, performance and security: Our team built Flexjoin to run entirely on AWS so that it can easily be deployed or re-deployed when needed. Furthermore, we have set up our services to work with Datadog so that we can monitor our backend calls and detect performance-related issues.

Our team consistently met with industry representatives, technical advisors, and teaching assistants to ensure that our solution was realistic and valid for industry usage. Our technical goal for Flexjoin was to demonstrate that we could connect data from different services, and that it could be streamed and modified in real time. We achieved this goal, as shown in the demo and the How to use Flexjoin section. This simple validation showcases our market-defining feature set and would be the basis for further developing and pursuing Flexjoin.

In terms of business-related validation, we consulted representatives of large and small businesses and learned that both types of business had use cases for data spread. We also learned that data spread can easily occur in businesses without microservices, as modern data ingestion methods constantly store separated data. These business representatives said that the solution we pitched would definitely be useful and a possible investment, which validates our solution as a business venture.

How we validated our Design Solution

Feasibility of our Design Solution

We believe that the positive industry feedback, together with the working showcase of Flexjoin, demonstrates the feasibility of the project. The current iteration serves as a proof of concept that data from different sources can be connected. We therefore believe that the design is feasible: this stage was about proving that it could be done technically, and the next stage would be about scalability and security.

How to use Flexjoin

The following steps walk through creating a Data Model and its associated pipelines using Flexjoin.

Step 1: Define and Create Data Sources

A customer for our service would initially define their data sources. These are the locations where the data we want to connect resides. The data/information for these would be known by a database administrator or software engineer for an organization.

As of right now we support MS SQL Server, PostgreSQL and MySQL. Future iterations would also include Excel and MongoDB.

Multiple Data sources a customer has defined

Step 2: Create Data Model

This step is the most complicated one and requires background knowledge on databases. Overall, to create a data model a customer has to go through the following steps:

  1. Data Sources: In this step we will define the data sources that a customer wants to connect. For this example we are connecting the CarService with the Users MySQL data sources. These data sources will have tables inside of them, which is what we are actually connecting together.
  2. Foreign Keys: In order to connect data sources, we need to define what they have in common, i.e. which columns hold the same values across tables. These columns are purple in our UI. As the second picture shows, we have connected the vehicleowners, credential and users tables through the matching columns Name and Owner.
  3. Columns: In addition to the foreign key columns, we also select any extra columns that interest us for our data model, e.g. the Model of the car that a customer owns. These chosen columns are shown in blue and beige.
  4. Primary Table: In this step, we must select one table as the basis of the data model, i.e. the table that every other table connects to. In this case, our users table is the basis of our model, as it defines the unique users or assets. This is needed because all added relationships connect to this primary table. The second picture shows this table with a blue highlight and a (Primary Table) tag.
  5. Relationships: Once a primary (basis) table is established, we must define how other tables are related to it: are they one-to-one or one-to-many connections? For this example, the users table has a one-to-one connection with credentials, as each user has only one credential, such as a username and password. However, the users table also has a one-to-many relationship with vehicleowners, as a user can own multiple vehicles. This step may seem confusing at first, so please join the Zoom call if you have any questions on it.
  6. Confirmation: Now we confirm the data model, and submit it.
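
The choices made in the steps above can be thought of as one small data structure. The sketch below is a hypothetical representation, not Flexjoin's actual schema: the class names, field names, and validation rules are assumptions, and the join column used for the credential table is assumed to be Name.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    table: str            # table being connected to the primary table
    primary_column: str   # join column on the primary table
    table_column: str     # matching join column on the connected table
    kind: str             # "one-to-one" or "one-to-many"


@dataclass
class DataModel:
    primary_table: str
    columns: dict = field(default_factory=dict)        # table -> extra columns
    relationships: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the model is valid."""
        problems = []
        for rel in self.relationships:
            if rel.kind not in ("one-to-one", "one-to-many"):
                problems.append(f"{rel.table}: unknown kind {rel.kind!r}")
            if rel.table == self.primary_table:
                problems.append(f"{rel.table}: cannot relate the primary table to itself")
        return problems


# The example from the walkthrough: users is the primary table.
model = DataModel(
    primary_table="users",
    columns={"vehicleowners": ["Model"]},
    relationships=[
        Relationship("credential", "Name", "Name", "one-to-one"),
        Relationship("vehicleowners", "Name", "Owner", "one-to-many"),
    ],
)

# A deliberately invalid model, to show what validation reports.
bad = DataModel(
    primary_table="users",
    relationships=[Relationship("users", "Name", "Name", "many-to-many")],
)
```

Calling `model.validate()` returns an empty list, while `bad.validate()` reports both the unsupported relationship kind and the self-referencing primary table.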

Create Data Model Screen, Has multiple steps
Finalized Model

Step 3: View and Use Connected Model

The picture on the right showcases what the connected model would look like in Neo4J after the model has been created. A few things to note:

  • Any time there is a change in the source tables that the data model represents, it will be persisted into the Neo4J database as well. We have shown this in the video demo that was created.
  • Each node (circle) in the viewer represents a row from a table that we have connected. Hovering over a node shows all of the columns that we selected when we created the model.
Visualized Data-Model in Neo4J Viewer
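
How rows become nodes can be sketched in a few lines. This is an in-memory illustration under stated assumptions rather than Flexjoin's implementation: the node and edge shapes, the `build_graph` helper, and the sample rows are hypothetical, while the users and vehicleowners tables and the Name and Owner join columns come from the example above.

```python
def build_graph(users, vehicles):
    """Turn table rows into nodes and link them where Name equals Owner."""
    nodes = []
    edges = []
    for i, row in enumerate(users):
        nodes.append({"id": f"users:{i}", "table": "users", "properties": row})
    for j, row in enumerate(vehicles):
        nodes.append(
            {"id": f"vehicleowners:{j}", "table": "vehicleowners", "properties": row}
        )
    # One edge per matching pair, which is what makes this one-to-many.
    for i, user in enumerate(users):
        for j, vehicle in enumerate(vehicles):
            if user["Name"] == vehicle["Owner"]:
                edges.append((f"users:{i}", f"vehicleowners:{j}"))
    return nodes, edges


users = [{"Name": "alice"}, {"Name": "bob"}]
vehicles = [
    {"Owner": "alice", "Model": "Model 3"},
    {"Owner": "alice", "Model": "Model Y"},
]
nodes, edges = build_graph(users, vehicles)
```

Each node carries the selected columns as its properties, which corresponds to what the viewer shows on hover. Here alice is linked to two vehicle rows and bob to none.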

Step 4: Modify and Update Model

Once a model has been created, we can modify it in the following ways:

  • Delete Model: If a model is no longer needed, we can delete the whole model, which also deletes all of its pipelines.
  • Modify existing relationship: We can modify any existing relationship by deleting it or by adding/removing columns that we want in our final model.
  • Add new relationship: If there is a new table in an existing data source, or new data source that we want to add, we are able to easily add it onto our model. This is shown in the video demo.
Overall view of model, can be modified as needed
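
The modification operations above can be summarized as a small bookkeeping sketch. It is a hypothetical illustration of how models, relationships, and pipelines might be tracked; the `ModelRegistry` class, its method names, and the model and relationship names are assumptions, not Flexjoin's API.

```python
class ModelRegistry:
    """Tracks which pipelines belong to which model and relationship."""

    def __init__(self):
        self.pipelines = {}  # model name -> {relationship name -> column list}

    def add_relationship(self, model, relationship, columns):
        self.pipelines.setdefault(model, {})[relationship] = list(columns)

    def set_columns(self, model, relationship, columns):
        if relationship not in self.pipelines.get(model, {}):
            raise KeyError(f"{relationship} is not part of {model}")
        self.pipelines[model][relationship] = list(columns)

    def delete_relationship(self, model, relationship):
        self.pipelines[model].pop(relationship)

    def delete_model(self, model):
        """Deleting a model removes every pipeline that feeds it."""
        return self.pipelines.pop(model, {})


registry = ModelRegistry()
registry.add_relationship("customers", "users-vehicleowners", ["Owner", "Model"])
registry.add_relationship("customers", "users-credential", ["Name"])
registry.set_columns("customers", "users-vehicleowners", ["Owner"])
registry.delete_relationship("customers", "users-credential")
remaining = dict(registry.pipelines["customers"])
removed = registry.delete_model("customers")
```

After these calls, `remaining` holds only the vehicleowners relationship with its reduced column list, and deleting the model leaves the registry empty.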

Overall, those are all the steps needed to create a data model using Flexjoin!


Partners and Mentors

Dr. Ann Barcomb

Technical Advisor

Dr. Roberto Medeiros de Souza

Technical Advisor

Aaakash Bhatt

Technical Teaching Assistant

Jose Menjivar Hernandez

Business Teaching Assistant

Our photo gallery

AWS Infrastructure, currently runs two of our services (Kafka is very heavy so not done)
Frontend Application Context