Workspace overview

A Flow Workspace is a collection of schemas, API specs and Taxi projects that describe your data sources (APIs, databases, message queues and serverless functions), including the data they expose and the capabilities they offer.

These descriptions are used to generate integrations on the fly, and to power your data catalog.

Data sources are described using some form of API spec language (Protobuf, OpenAPI, Avro, Taxi), along with additional metadata that defines how concepts are related semantically.

There are lots of different ways to tell Flow about your data sources. We believe that Flow should fit with the way you work today.

Components

A workspace generally has at least the following components:

  • A taxonomy project (built using Taxi) that defines the terms embedded in your API contracts. This is most commonly stored in Git, or locally on your machine (when you’re first getting started).

  • A series of API specs, enriched with metadata using terms from your taxonomy project.


Your core taxonomy

The core taxonomy is a collection of terms that describe your data attributes. Think of them as simple tags that describe your data - such as FirstName, LastName, etc.

In practice, these tags are actually semantic types, defined using Taxi.

type FirstName inherits String
type LastName inherits String
type CustomerId inherits Int

By using Taxi, you get the benefit of a full type system, rich build tooling to verify correctness, and an open source ecosystem to build your own plugins.

A common pattern for building a core taxonomy is:

  • It is created as a standalone project, independent of any specific API / Service

  • It lives in a Git repository

  • Flow monitors the Git repository and automatically deploys changes

Services and data sources

Services and data sources are described using existing API specs (OpenAPI / Protobuf / JSON Schema, etc), enriched with semantic tags from your core taxonomy.

ShoppingCart OpenAPI spec:

# An extract of the ShoppingCartApi OpenAPI spec:
components:
  schemas:
    ShoppingCart:
      properties:
        custId:               # Field name
          x-taxi-type:        # <-- Taxi extension
            name: CustomerId  # <-- Semantic type
          type: string

Example.proto:

import "org/taxilang/dataType.proto";

message Customer {
   optional string customer_name = 1 [(taxi.dataType)="CustomerName"];
   optional int32 customer_id = 2 [(taxi.dataType)="CustomerId"];
}

For more information, see the topics on using OpenAPI or Protobuf to describe your data sources.
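To illustrate how the OpenAPI extension ties a field to a semantic type, here is a minimal Python sketch that walks an already-parsed spec and collects each field's Taxi type. It assumes the `x-taxi-type` extension is an object with a `name` key, as in the ShoppingCart extract; the `semantic_types` helper is hypothetical and not part of Flow.

```python
def semantic_types(spec: dict) -> dict[str, str]:
    """Map 'Schema.field' to the Taxi type named in its x-taxi-type extension."""
    found = {}
    schemas = spec.get("components", {}).get("schemas", {})
    for schema_name, schema in schemas.items():
        for field, definition in schema.get("properties", {}).items():
            taxi_type = definition.get("x-taxi-type")
            # Fields without the extension carry no semantic type and are skipped.
            if isinstance(taxi_type, dict) and "name" in taxi_type:
                found[f"{schema_name}.{field}"] = taxi_type["name"]
    return found


# The ShoppingCart extract, already parsed into a dict.
spec = {
    "components": {
        "schemas": {
            "ShoppingCart": {
                "properties": {
                    "custId": {"x-taxi-type": {"name": "CustomerId"}, "type": "string"},
                }
            }
        }
    }
}

print(semantic_types(spec))
```

A listing like this can be a quick way to check which fields in a spec are still untagged.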

Use Taxi

Another option is authoring your API specs directly in Taxi:

model Customer {
  id : CustomerId inherits Int
  firstName : CustomerFirstName inherits String
}

service CustomerService {
  @HttpOperation(method = "GET", url = "/shoppingCart/{CustomerId}")
  operation getShoppingCartByCustomerId(CustomerId):ShoppingCart
}

Alternatively, you can generate schema definitions directly from code.

Once you’ve chosen how to describe your services, publish your specs to Flow.

Publish and update your specs

How and when you choose to publish / update these sources is up to you - teams tend to have different preferences around this.

You’re also free to mix and match. Just pick the publication method that best works for your tech and your team’s working style.

The following publication methods are available:

  • Pushing API specs to Flow (e.g., when your microservices start)

  • Pushing updates to Flow manually (e.g., within a CI/CD pipeline)

  • Having Flow poll sources for changes

Push API specs directly from your apps

Applications can generate Taxi directly from code and publish it to Flow.

This suits microservices and teams who prefer generating their API specs from code.

Pull API specs into Flow

Flow can be configured to fetch API specs (projects) from either Git repositories or the file system where Flow is running (intended for local development).

In addition to storing your core taxonomy in Git, you can fetch other API specs (such as Protobuf or OpenAPI) from a Git repository.

Flow will poll your Git repo, and automatically update as changes are merged into your target branch.

For more information, read about pulling API specs from Git or reading API specs from local disk.

Workspace.conf file

The workspace.conf file is a HOCON file that describes all the locations Flow pulls schemas from. (Data sources that push their own API specs are not included.)

Schemas can be pulled from several kinds of source, such as local file-system projects and Git repositories. Each kind is configured in its own section of the file.

Pass a workspace.conf file

By default, the configuration file is called workspace.conf. However, the location of the file can be changed by setting --flow.workspace.config-file=/path/to/workspace.conf on the command line, or through any of the supported configuration overriding mechanisms.

Read workspace.conf from Git

For production deployments, it’s often preferable to read config directly from Git. This is useful both for infrastructure-as-code and for deploying to services with ephemeral storage (like AWS ECS).

You can configure Flow to read a workspace.conf file from a Git repository by passing the following command line settings:

  • flow.workspace.git.url: The URL of the Git repo to clone. If pulling from GitHub, GitLab or Azure DevOps, use a personal access token in the URL (e.g., https://username:personalAccessToken@github.com/username/yourRepoName.git or https://[username]:[personalAccessToken]@dev.azure.com/[yourOrgName]/[yourProjectName]/_git/[yourRepoName]).

  • flow.workspace.git.branch: The name of the branch to check out.

  • flow.workspace.git.path: Optional. The path within the repo to the config file. Defaults to workspace.conf.
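As an illustration, the following Python sketch assembles these three settings into command line arguments. It assumes the settings are passed in the same `--name=value` form as `--flow.workspace.config-file`; the `git_workspace_args` helper and all values are hypothetical placeholders, not part of Flow.

```python
from urllib.parse import quote


def git_workspace_args(host_path: str, username: str, token: str,
                       branch: str, config_path: str = "workspace.conf") -> list[str]:
    """Build the three flow.workspace.git.* settings as --name=value arguments."""
    # Percent-encode the credentials so characters such as '@' or '/' in a
    # token cannot break the structure of the URL.
    url = f"https://{quote(username, safe='')}:{quote(token, safe='')}@{host_path}"
    return [
        f"--flow.workspace.git.url={url}",
        f"--flow.workspace.git.branch={branch}",
        f"--flow.workspace.git.path={config_path}",
    ]


# Placeholder values only; in practice the token would come from a secret store.
args = git_workspace_args("github.com/username/yourRepoName.git", "username", "abc/123", "main")
print(" ".join(args))
```

Because the token ends up in the process arguments, treat the resulting command line as sensitive.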

Use a single-project workspace

For demos / test config, it’s sometimes useful to start Flow with a single project configured.

You can bypass the workspace config, and point Flow directly to a single local file-system project.

To do this, start Flow with --flow.workspace.project-file=/path/to/taxi.conf

Configuration conventions

Durations are defined using ISO 8601 formats. For example:

  • 1 Day = P1D

  • 3 Seconds = PT3S
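To make the notation concrete, here is a minimal Python sketch that converts the day-and-time subset of ISO 8601 durations into `timedelta` values. The `parse_duration` helper is hypothetical and not part of Flow; it handles only days, hours, minutes and seconds, which covers the values used on this page.

```python
import re
from datetime import timedelta

# Hypothetical helper, not part of Flow: parses the day/time subset of
# ISO 8601 durations (e.g. P1D, PT3S, P1DT12H). Years, months and weeks
# are deliberately not supported.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    match = _DURATION.match(text)
    # Reject non-matches, a bare "P", and a trailing "T" with no time part.
    if not match or text == "P" or text.endswith("T"):
        raise ValueError(f"Unsupported ISO 8601 duration: {text!r}")
    parts = {name: float(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


for value in ("P1D", "PT3S", "PT30S"):
    print(value, "=", parse_duration(value).total_seconds(), "seconds")
```

Note that the `T` separator is required before any time component: `PT5S` is five seconds, while `P5S` is not a valid duration.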

Kitchen sink configuration example

file {
   changeDetectionMethod=WATCH
   incrementVersionOnChange=false
   projects=[
      {
        isEditable=true
        path="/opt/var/flow/schemas/taxi"
      }
   ]
   pollFrequency=PT5S
   recompilationFrequencyMillis=PT3S
}
git {
   checkoutRoot="/my/git/root"
   pollFrequency=PT30S
   repositories=[
      {
         branch=master
         name=my-git-project
         uri="https://github.com/something.git"
      }
   ]
}
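HOCON is a superset of JSON, so a workspace.conf can also be produced by any tool that emits JSON. The sketch below renders a Git-only configuration using just the keys from the example above; the `git_workspace_conf` helper is hypothetical, the values are placeholders, and it makes no claim about which keys Flow treats as required.

```python
import json


def git_workspace_conf(name: str, uri: str, branch: str,
                       checkout_root: str, poll_frequency: str = "PT30S") -> str:
    """Render a Git-only workspace config as JSON, which is also valid HOCON."""
    config = {
        "git": {
            "checkoutRoot": checkout_root,
            "pollFrequency": poll_frequency,  # ISO 8601 duration
            "repositories": [
                {"name": name, "uri": uri, "branch": branch},
            ],
        }
    }
    return json.dumps(config, indent=3)


conf_text = git_workspace_conf(
    name="my-git-project",
    uri="https://github.com/something.git",
    branch="master",
    checkout_root="/my/git/root",
)
print(conf_text)
```

Generating the file this way can help when the list of repositories is maintained elsewhere, for example in a deployment script.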

Include a README.md file

If you supply a README.md file at the root of the project, it will be read by Flow and displayed in the UI.

GitHub Flavored Markdown is supported, along with some special code snippets.


Taxi snippets

Standard Taxi can be included by using the following syntax:

```taxi
@DatabaseService(connection = "my-postgres-db")
service MyPostgresService {

   // table declares read operations, such as querying
   table ticketSales : VenueTicketSalesRecord[]

   // An upsert operation, for performing upsert queries
   @UpsertOperation
   write operation saveTicketSalesRecord(VenueTicketSalesRecord):VenueTicketSalesRecord
}
```

TaxiQL snippets

Taxi queries that can be run in the query editor in Flow can be included like this:

```taxiql
find { VenueTicketSales[] }
call MyPostgresService::saveTicketSalesRecord
```


Schema diagram snippets

Schema diagrams can be included with the following syntax:

```schemaDiagram
{
   "members" : {
      "TicketsS3Bucket" : {},
      "VenueTicketSales" : {},
      "TicketPricesApi" : {}
   }
}
```

This will produce the following interactive component within the README in Flow:

[Figure: interactive schema diagram rendered in the README]
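If a README is generated by a script, the snippet can be built from a list of member names. This is a minimal Python sketch: the `schema_diagram_snippet` helper is hypothetical, and it assumes each member maps to an empty object, as in the example above.

```python
import json

FENCE = "`" * 3  # built indirectly so this example can itself sit in a fenced block


def schema_diagram_snippet(members: list[str]) -> str:
    """Render a schemaDiagram README snippet listing the given members."""
    body = json.dumps({"members": {name: {} for name in members}}, indent=3)
    return f"{FENCE}schemaDiagram\n{body}\n{FENCE}"


print(schema_diagram_snippet(["TicketsS3Bucket", "VenueTicketSales", "TicketPricesApi"]))
```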

Continue reading

Continue learning about Flow in Connect your data sources.